All of lore.kernel.org
 help / color / mirror / Atom feed
From: juanfengpy@gmail.com
To: gregkh@linuxfoundation.org
Cc: jirislaby@kernel.org, ilpo.jarvinen@linux.intel.com,
	benbjiang@tencent.com, robinlai@tencent.com,
	linux-serial@vger.kernel.org, juanfengpy@gmail.com,
	Hui Li <caelli@tencent.com>, <stable@vger.kernel.org>
Subject: [PATCH v7] tty: fix hang on tty device with no_room set
Date: Fri, 17 Mar 2023 10:41:55 +0800	[thread overview]
Message-ID: <1679020915-8349-1-git-send-email-caelli@tencent.com> (raw)
In-Reply-To: <YrmdEJrM3z6Dbvgn@kroah.com>

From: Hui Li <caelli@tencent.com>

We have met a hang on pty device, the reader was blocking
at epoll on master side, the writer was sleeping at wait_woken
inside n_tty_write on slave side, and the write buffer on
tty_port was full, we found that the reader and writer would
never be woken again and blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
copy_from_read_buf()|
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
n_tty_kick_worker() |
                    |ldata->no_room = true

After writing to slave device, writer wakes up kworker to flush
data on tty_port to reader, and the kworker finds that reader
has no room to store data so room <= 0 is met. At this moment,
reader consumes all the data on reader buffer and calls
n_tty_kick_worker to check ldata->no_room which is false and
reader quits reading. Then kworker sets ldata->no_room=true
and quits too.

If write buffer is not full, writer will wake kworker to flush data
again after following writes, but if write buffer is full and writer
goes to sleep, kworker will never be woken again and tty device is
blocked.

This problem can be solved with a check for read buffer size inside
n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
is true, a call to n_tty_kick_worker is necessary to keep flushing
data to reader.

Cc: <stable@vger.kernel.org>
Fixes: 42458f41d08f ("n_tty: Ensure reader restarts worker for next reader")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Hui Li <caelli@tencent.com>
---
Patch changelogs between v1 and v2:
	-add barrier inside n_tty_read and n_tty_receive_buf_common;
	-comment why barrier is needed;
	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
	-in function n_tty_receive_buf_common, add unlikely to check
	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
	 is removed here to get locality;
	-change comment for barrier to show the race condition to make
	 comment easier to understand;
Patch changelogs between v3 and v4:
	-change subject from 'tty: fix a possible hang on tty device' to
	 'tty: fix hang on tty device with no_room set' to make subject 
	 more obvious;
Patch changelogs between v4 and v5:
	-name is changed from cael to caelli, li is added as the family
	 name and caelli is the fullname.
Patch changelogs between v5 and v6:
	-change from and Signed-off-by, from 'caelli <juanfengpy@gmail.com>'
	 to 'caelli <caelli@tencent.com>', later one is my corporate address.
Patch changelogs between v6 and v7:
	-change name from caelli to 'Hui Li', which is my name in chinese.
	-the comment for barrier is improved, and a Fixes and Reviewed-by
	 tags is added.

 drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c8f56c9b1a1c..8c17304fffcf 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -204,8 +204,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1698,7 +1698,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1729,6 +1729,27 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set, otherwise, following race may occur:
+		 * n_tty_receive_buf_common()
+		 *				n_tty_read()
+		 *   if (!chars_in_buffer(tty))->false
+		 *				  copy_from_read_buf()
+		 *				    read_tail=commit_head
+		 *				  n_tty_kick_worker()
+		 *				    if (ldata->no_room)->false
+		 *   ldata->no_room = 1
+		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+		 * smp_mb is paired with smp_mb() in n_tty_read().
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2282,8 +2303,25 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (old_tail != ldata->read_tail)
+	if (old_tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read in n_tty_kick_worker()
+		 * before setting ldata->read_tail in copy_from_read_buf(),
+		 * otherwise, following race may occur:
+		 * n_tty_read()
+		 *			n_tty_receive_buf_common()
+		 *   n_tty_kick_worker()
+		 *     if(ldata->no_room)->false
+		 *			  ldata->no_room = 1
+		 *			  if (!chars_in_buffer(tty))->false
+		 *   copy_from_read_buf()
+		 *     read_tail=commit_head
+		 * Both reader and kworker will fail to kick tty_buffer_restart_work(),
+		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0


  parent reply	other threads:[~2023-03-17  2:42 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-24  2:21 tty: fix a possible hang on tty device cael
2022-05-24  9:11 ` Ilpo Järvinen
2022-05-24 11:09   ` cael
2022-05-24 11:40     ` Ilpo Järvinen
2022-05-24 12:47       ` cael
2022-05-24 13:25         ` Ilpo Järvinen
2022-05-25 10:36           ` cael
2022-05-25 11:21             ` Ilpo Järvinen
2022-05-30 13:13               ` cael
2022-05-31 12:37                 ` Ilpo Järvinen
2022-06-01  9:38 ` Greg KH
2022-06-01 13:39   ` cael
2022-06-01 14:47     ` Greg KH
2022-06-01 15:28     ` Ilpo Järvinen
2022-06-06 13:40       ` cael
2022-06-06 14:43         ` Greg KH
2022-06-11  6:50           ` cael
2022-06-11  7:32             ` Greg KH
2022-06-13 12:30               ` [PATCH v3] tty: fix hang on tty device with no_room set juanfengpy
2022-06-13 17:20                 ` Greg KH
2022-06-15  3:45                   ` [PATCH v4] " cael
2022-06-15  5:00                     ` Greg KH
2022-06-15  7:57                       ` Ilpo Järvinen
2022-06-15  9:29                         ` Greg KH
2022-06-15 11:17                           ` [PATCH v5] " cael
2022-06-15 11:29                             ` Ilpo Järvinen
2022-06-15 13:33                               ` caelli
2022-06-27 12:05                             ` Greg KH
2022-06-27 13:53                               ` [PATCH v6] " juanfengpy
2023-03-17  2:37                               ` [PATCH v7] " juanfengpy
2023-03-17  2:41                               ` juanfengpy [this message]
2023-03-17  6:32                                 ` Jiri Slaby
2023-03-17  7:25                                   ` [PATCH v8] " juanfengpy
2023-04-06  2:44                                     ` [PATCH v9] " juanfengpy
2023-06-15 10:21                                       ` patch "tty: fix hang on tty device with no_room set" added to tty-testing gregkh
2023-06-16  6:14                                       ` patch "tty: fix hang on tty device with no_room set" added to tty-next gregkh
2023-03-17  2:24 [PATCH v7] tty: fix hang on tty device with no_room set juanfengpy
2023-04-04  2:49 ` Bagas Sanjaya
2023-04-04  2:55   ` Bagas Sanjaya
2023-04-04  7:02     ` caelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1679020915-8349-1-git-send-email-caelli@tencent.com \
    --to=juanfengpy@gmail.com \
    --cc=benbjiang@tencent.com \
    --cc=caelli@tencent.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jirislaby@kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=robinlai@tencent.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.