linux-serial.vger.kernel.org archive mirror
* tty: fix a possible hang on tty device
@ 2022-05-24  2:21 cael
  2022-05-24  9:11 ` Ilpo Järvinen
  2022-06-01  9:38 ` Greg KH
  0 siblings, 2 replies; 33+ messages in thread
From: cael @ 2022-05-24  2:21 UTC (permalink / raw)
  To: gregkh, jirislaby; +Cc: linux-serial

We have met a hang on pty device, the reader was blocking at
 epoll on master side, the writer was sleeping at wait_woken inside
 n_tty_write on slave side ,and the write buffer on tty_port was full, we
 found that the reader and writer would never be woken again and block
 forever.

We thought the problem was caused as a race between reader and
kworker as follows:
n_tty_read(reader)| n_tty_receive_buf_common(kworker)
                  |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                  |room <= 0
copy_from_read_buf|
n_tty_kick_worker |
                  |ldata->no_room = true

After writing to the slave device, the writer wakes up the kworker to flush
data on tty_port to the reader, and the kworker finds that the reader
has no room to store data, so room <= 0 is met. At this moment, the
reader consumes all the data in the read buffer and calls
n_tty_kick_worker, which checks ldata->no_room and finds that there
is no need to call tty_buffer_restart_work to flush data to the reader,
so the reader quits reading. Then the kworker sets ldata->no_room = true
and quits too.

If the write buffer is not full, the writer will wake the kworker to flush
data again on subsequent writes, but if the write buffer is full and the
writer goes to sleep, the kworker will never be woken again and the tty
device is blocked.

We think this problem can be solved with a check of the read buffer
inside n_tty_receive_buf_common: if the read buffer is empty and
ldata->no_room is true, the kworker has more data to flush
to the read buffer, so a call to n_tty_kick_worker is necessary.

Signed-off-by: cael <juanfengpy@gmail.com>
---
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..36c7bc033c78 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1663,6 +1663,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
        } else
                n_tty_check_throttle(tty);

+       if (!chars_in_buffer(tty))
+               n_tty_kick_worker(tty);
+
        up_read(&tty->termios_rwsem);

        return rcvd;
-- 
2.27.0
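
The interleaving described in the changelog above can be replayed step by step in user space. The following is a minimal single-threaded sketch, not kernel code: `struct toy_ldata`, `toy_kick_worker()`, `replay_race()` and `BUF_SIZE` are hypothetical stand-ins for the n_tty fields and helpers named in the patch, and the `work_queued` flag only models the effect of tty_buffer_restart_work. It shows the lost restart in the original ordering and how the proposed check would recover from it in this one schedule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE 4096u  /* hypothetical stand-in for N_TTY_BUF_SIZE */

/* Toy stand-in for the n_tty_data fields involved in the race. */
struct toy_ldata {
    size_t read_head;   /* advanced by the producer (kworker) */
    size_t read_tail;   /* advanced by the consumer (reader) */
    bool no_room;       /* producer stopped because the buffer was full */
    bool work_queued;   /* models tty_buffer_restart_work() having run */
};

/* Restart the producer only if it has reported that it ran out of room. */
static void toy_kick_worker(struct toy_ldata *ld)
{
    if (ld->no_room) {
        ld->no_room = false;
        ld->work_queued = true;
    }
}

/*
 * Replay the interleaving from the changelog in program order.
 * Returns true if the producer ends up stopped with nobody left
 * to restart it.
 */
static bool replay_race(bool with_fix)
{
    struct toy_ldata ld = { .read_head = BUF_SIZE, .read_tail = 0 };

    /* kworker: compute room from a snapshot of read_tail; buffer is full */
    size_t tail = ld.read_tail;
    size_t room = BUF_SIZE - (ld.read_head - tail);
    assert(room == 0);

    /* reader: drain everything, then kick; no_room is still false here */
    ld.read_tail = ld.read_head;
    toy_kick_worker(&ld);

    /* kworker: only now records that there was no room */
    ld.no_room = true;

    /* proposed fix: if the read buffer is empty, kick from the producer */
    if (with_fix && ld.read_head - ld.read_tail == 0)
        toy_kick_worker(&ld);

    return ld.no_room && !ld.work_queued;
}
```

Calling `replay_race(false)` leaves `no_room` set with no work queued, while `replay_race(true)` queues the work again. The sketch runs in strict program order, so it says nothing about what the two CPUs may observe when the accesses are reordered.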

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-24  2:21 tty: fix a possible hang on tty device cael
@ 2022-05-24  9:11 ` Ilpo Järvinen
  2022-05-24 11:09   ` cael
  2022-06-01  9:38 ` Greg KH
  1 sibling, 1 reply; 33+ messages in thread
From: Ilpo Järvinen @ 2022-05-24  9:11 UTC (permalink / raw)
  To: cael; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

On Tue, 24 May 2022, cael wrote:

> We have met a hang on pty device, the reader was blocking at
>  epoll on master side, the writer was sleeping at wait_woken inside
>  n_tty_write on slave side ,and the write buffer on tty_port was full, we

Space after comma. It would be also useful to tone down usage of "we" in 
the changelog.

>  found that the reader and writer would never be woken again and block
>  forever.
> 
> We thought the problem was caused as a race between reader and
> kworker as follows:
> n_tty_read(reader)| n_tty_receive_buf_common(kworker)
>                   |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                   |room <= 0
> copy_from_read_buf|
> n_tty_kick_worker |
>                   |ldata->no_room = true
>
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room and finds that there
> is no need to call tty_buffer_restart_work to flush data to reader
> and reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if writer buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> We think this problem can be solved with a check for read buffer
> inside function n_tty_receive_buf_common, if read buffer is empty and
> ldata->no_room is true, this means that kworker has more data to flush
> to read buffer, so a call to n_tty_kick_worker is necessary.
> 
> Signed-off-by: cael <juanfengpy@gmail.com>
> ---
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..36c7bc033c78 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1663,6 +1663,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>         } else
>                 n_tty_check_throttle(tty);
> 
> +       if (!chars_in_buffer(tty))
> +               n_tty_kick_worker(tty);
> +

chars_in_buffer() accesses ldata->read_tail in producer context so this 
probably just moves the race there?


-- 
 i.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-24  9:11 ` Ilpo Järvinen
@ 2022-05-24 11:09   ` cael
  2022-05-24 11:40     ` Ilpo Järvinen
  0 siblings, 1 reply; 33+ messages in thread
From: cael @ 2022-05-24 11:09 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

Thanks for the answer. Yes, there is a race between the reader and the
kworker, but it's OK. Before the kworker checks chars_in_buffer,
ldata->no_room is set to true. If the reader changes ldata->read_tail in
n_tty_read while the kworker checks this value, making the check fail, then
n_tty_kick_worker will still be called when the reader reaches the end of
n_tty_read. Besides, the kworker and the reader may call n_tty_kick_worker
at the same time, but this function only queues work on a workqueue, so
that is harmless.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-24 11:09   ` cael
@ 2022-05-24 11:40     ` Ilpo Järvinen
  2022-05-24 12:47       ` cael
  0 siblings, 1 reply; 33+ messages in thread
From: Ilpo Järvinen @ 2022-05-24 11:40 UTC (permalink / raw)
  To: cael; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

On Tue, 24 May 2022, cael wrote:

> Thanks for the answer, yes, there exists a race between reader and kworker,
> but it's OK. Before checking chars_in_buffer in kworker,
> ldata->no_room is set true,

Nothing seems to guarantee this.

> if reader changes ldata->read_tail in n_tty_read when kworker checks this value
> which makes the check fail, then when reader reaches end of n_tty_read,
> n_tty_kick_worker will also be called. Besides, kworker and reader may
> call n_tty_kick_worker at the same time, this function only queues work
> on workqueue, so it's harmless.

I'm not worried about the case where both cpus call n_tty_kick_worker but 
the case where producer cpu sees chars_in_buffer() > 0 and consumer cpu
!no_room.

-- 
 i.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-24 11:40     ` Ilpo Järvinen
@ 2022-05-24 12:47       ` cael
  2022-05-24 13:25         ` Ilpo Järvinen
  0 siblings, 1 reply; 33+ messages in thread
From: cael @ 2022-05-24 12:47 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

If ldata->no_room is not true, the kworker has flushed at least n
characters before breaking out of the while loop, so the return value of
n_tty_receive_buf_common is not zero and flush_to_ldisc will
continue to call this function to flush data to the reader as long as the
write buffer is not empty.
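
The claim above can be pictured as a loop that keeps feeding the line discipline until either nothing is pending or one call accepts nothing. The following is a toy model of that claim only, not the kernel's flush_to_ldisc: `toy_receive_buf()`, `toy_flush()` and the `chunk` parameter are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive step: accept at most *room of the count bytes offered. */
static size_t toy_receive_buf(size_t *room, size_t count)
{
    size_t n = count < *room ? count : *room;

    *room -= n;
    return n;
}

/*
 * Feed pending bytes in chunks until everything is delivered or a call
 * accepts nothing. Returns the number of bytes still pending.
 */
static size_t toy_flush(size_t pending, size_t room, size_t chunk)
{
    while (pending > 0) {
        size_t count = pending < chunk ? pending : chunk;
        size_t rcvd = toy_receive_buf(&room, count);

        if (rcvd == 0)
            break;      /* receiver is full: the worker stops here */
        pending -= rcvd;
    }
    return pending;
}
```

In this model a non-zero return from the receive step always leads to another attempt, so the worker only stops with data pending after a call that accepted nothing, which is the case where no_room matters.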

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-24 12:47       ` cael
@ 2022-05-24 13:25         ` Ilpo Järvinen
  2022-05-25 10:36           ` cael
  0 siblings, 1 reply; 33+ messages in thread
From: Ilpo Järvinen @ 2022-05-24 13:25 UTC (permalink / raw)
  To: cael; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

On Tue, 24 May 2022, cael wrote:

> if  ldata->no_room is not true, that means kworker has flushed
> at least n characters to break the while loop, so return value of
> n_tty_receive_buf_common is not zero, flush_to_ldisc will
> continue to call this function to flush data to reader if write buffer
> is not empty.

Now you switched to an entirely different case, not the one we were 
talking about. ...There is no ldisc->no_room = true race in the case
you now described.

-- 
 i.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-24 13:25         ` Ilpo Järvinen
@ 2022-05-25 10:36           ` cael
  2022-05-25 11:21             ` Ilpo Järvinen
  0 siblings, 1 reply; 33+ messages in thread
From: cael @ 2022-05-25 10:36 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

> Now you switched to an entirely different case, not the one we were
> talking about. ...There is no ldisc->no_room = true race in the case
> you now described.

So I think we should go back to the case ldata->no_room = true, as
ldata->no_room = false seems harmless.

> I'm not worried about the case where both cpus call n_tty_kick_worker but
> the case where producer cpu sees chars_in_buffer() > 0 and consumer cpu
> !no_room.

As ldata->no_room = true is set before checking chars_in_buffer(), if the
producer finds chars_in_buffer() > 0 while the reader is currently in
n_tty_read, n_tty_kick_worker will be called when the reader leaves
n_tty_read. If the reader has already exited n_tty_read, the reader still
has data to read, so the next time it reads it will call n_tty_kick_worker
inside n_tty_read too.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-25 10:36           ` cael
@ 2022-05-25 11:21             ` Ilpo Järvinen
  2022-05-30 13:13               ` cael
  0 siblings, 1 reply; 33+ messages in thread
From: Ilpo Järvinen @ 2022-05-25 11:21 UTC (permalink / raw)
  To: cael; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

On Wed, 25 May 2022, cael wrote:

> >Now you switched to an entirely different case, not the one we were
> >talking about. ...There is no ldisc->no_room = true race in the case
> >you now described.
> So, I think we should back to the case ldata->no_room=true as
> ldata->no_room=false seems harmless.
> 
> >I'm not worried about the case where both cpus call n_tty_kick_worker but
> >the case where producer cpu sees chars_in_buffer() > 0 and consumer cpu
> >!no_room.
>
> As ldata->no_room=true is set before checking chars_in_buffer()

Please take a brief look at Documentation/memory-barriers.txt and then 
tell me if you still find this claim to be true.

> if producer
> finds chars_in_buffer() > 0, then if reader is currently in n_tty_read,

...Then please do a similar analysis for ldata->read_tail. What guarantees 
its update is seen by the producer cpu when the reader is already past the 
point you think it still must be in?

> when reader quits n_tty_read, n_tty_kick_worker will be called. If reader
> has already exited n_tty_read, which means that reader still has data to read,
> next time reader will call n_tty_kick_worker inside n_tty_read too.

C-level analysis alone is not going to be very useful here given you're 
dealing with a concurrency challenge here.


-- 
 i.
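
The concern raised above matches the classic store-buffering pattern: each side stores to one variable and then loads the other. The following is a deliberately small, single-threaded toy model of one such schedule, assuming each CPU holds a single pending store in a private buffer and that a full barrier can be modelled as draining that buffer. Real hardware and the kernel memory model are considerably more subtle, and every name here (`toy_mem`, `toy_cpu`, `toy_store()`, `toy_load()`, `toy_drain()`, `wakeup_lost()`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared memory as seen once stores have drained. */
struct toy_mem { int no_room; int read_tail; };

/* One pending store per CPU: a minimal store buffer. */
struct toy_cpu { bool pending; int *addr; int val; };

/* A store goes into the CPU's private buffer first. */
static void toy_store(struct toy_cpu *c, int *addr, int val)
{
    c->pending = true;
    c->addr = addr;
    c->val = val;
}

/* Make the pending store visible to the other CPU. */
static void toy_drain(struct toy_cpu *c)
{
    if (c->pending) {
        *c->addr = c->val;
        c->pending = false;
    }
}

/* A load sees the CPU's own pending store, otherwise shared memory. */
static int toy_load(const struct toy_cpu *c, const int *addr)
{
    if (c->pending && c->addr == addr)
        return c->val;
    return *addr;
}

/* Returns true if, in this schedule, neither side kicks the worker. */
static bool wakeup_lost(bool full_barrier)
{
    enum { HEAD = 8 };
    struct toy_mem mem = { .no_room = 0, .read_tail = 0 };
    struct toy_cpu producer = { 0 }, consumer = { 0 };

    toy_store(&producer, &mem.no_room, 1);       /* kworker: no_room = true */
    toy_store(&consumer, &mem.read_tail, HEAD);  /* reader: buffer drained  */

    if (full_barrier)
        toy_drain(&producer);                    /* modelled full barrier */
    bool producer_sees_empty =
        (HEAD - toy_load(&producer, &mem.read_tail)) == 0;

    if (full_barrier)
        toy_drain(&consumer);                    /* modelled full barrier */
    bool consumer_sees_no_room = toy_load(&consumer, &mem.no_room) != 0;

    toy_drain(&producer);
    toy_drain(&consumer);
    return !producer_sees_empty && !consumer_sees_no_room;
}
```

Without the modelled barrier both loads return the old values, so the producer sees characters in the buffer and the reader sees no_room clear, and nobody restarts the worker. With the barrier on both sides, at least one of the two loads observes the other side's store in this schedule.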



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-25 11:21             ` Ilpo Järvinen
@ 2022-05-30 13:13               ` cael
  2022-05-31 12:37                 ` Ilpo Järvinen
  0 siblings, 1 reply; 33+ messages in thread
From: cael @ 2022-05-30 13:13 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

Thanks, you are right, a barrier is needed here. I changed the patch as follows:
1) WRITE_ONCE and READ_ONCE are used to access ldata->no_room, since
n_tty_kick_worker can be called on both the kworker and the reader cpu;
2) smp_mb is added in chars_in_buffer, as this function is called by both
the reader and the kworker and accesses commit_head and read_tail; it makes
sure that read_tail is not read before no_room is set in
n_tty_receive_buf_common;
3) smp_mb is added in n_tty_read to make sure that no_room is not read
before read_tail is set.
---
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..3327687da0d3 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
        struct n_tty_data *ldata = tty->disc_data;

        /* Did the input worker stop? Restart it */
-       if (unlikely(ldata->no_room)) {
-               ldata->no_room = 0;
+       if (unlikely(READ_ONCE(ldata->no_room))) {
+               WRITE_ONCE(ldata->no_room, 0);

                WARN_RATELIMIT(tty->port->itty == NULL,
                                "scheduling with invalid itty\n");
@@ -221,6 +221,7 @@ static ssize_t chars_in_buffer(struct tty_struct *tty)
        struct n_tty_data *ldata = tty->disc_data;
        ssize_t n = 0;

+       smp_mb();
        if (!ldata->icanon)
                n = ldata->commit_head - ldata->read_tail;
        else
@@ -1632,7 +1633,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
                        if (overflow && room < 0)
                                ldata->read_head--;
                        room = overflow;
-                       ldata->no_room = flow && !room;
+                       WRITE_ONCE(ldata->no_room, flow && !room);
                } else
                        overflow = 0;

@@ -1663,6 +1664,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
        } else
                n_tty_check_throttle(tty);

+       if (!chars_in_buffer(tty))
+               n_tty_kick_worker(tty);
+
        up_read(&tty->termios_rwsem);

        return rcvd;
@@ -2180,8 +2184,10 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
                if (time)
                        timeout = time;
        }
-       if (tail != ldata->read_tail)
+       if (tail != ldata->read_tail) {
+               smp_mb();
                n_tty_kick_worker(tty);
+       }
        up_read(&tty->termios_rwsem);

        remove_wait_queue(&tty->read_wait, &wait);
--
2.27.0
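
The pairing this revision aims for can be written as a user-space C11 analogue: each side does its store, then a full fence, then its load of the other side's variable. This is a sketch under assumptions, not the kernel code. Relaxed atomic accesses stand in for READ_ONCE/WRITE_ONCE, `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb, and `struct toy_ldata`, `toy_kick_worker()`, `toy_producer_out_of_room()`, `toy_consumer_drained()` and the `restarts` counter are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ldata {
    atomic_size_t commit_head;
    atomic_size_t read_tail;
    atomic_bool no_room;
    atomic_int restarts;    /* counts modelled restart-work calls */
};

/* Analogue of the kick: restart only if the producer reported no room. */
static void toy_kick_worker(struct toy_ldata *ld)
{
    if (atomic_load_explicit(&ld->no_room, memory_order_relaxed)) {
        atomic_store_explicit(&ld->no_room, false, memory_order_relaxed);
        atomic_fetch_add_explicit(&ld->restarts, 1, memory_order_relaxed);
    }
}

/* Producer side: publish no_room, full fence, then read read_tail. */
static void toy_producer_out_of_room(struct toy_ldata *ld)
{
    atomic_store_explicit(&ld->no_room, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* stands in for smp_mb() */

    size_t head = atomic_load_explicit(&ld->commit_head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&ld->read_tail, memory_order_relaxed);

    if (head - tail == 0)                       /* read buffer is empty */
        toy_kick_worker(ld);
}

/* Consumer side: publish the new read_tail, full fence, then read no_room. */
static void toy_consumer_drained(struct toy_ldata *ld, size_t new_tail)
{
    atomic_store_explicit(&ld->read_tail, new_tail, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* stands in for smp_mb() */
    toy_kick_worker(ld);
}
```

Run in either order on one thread, exactly one of the two calls ends up restarting the worker. Under the C11 model the two fences also rule out the outcome where both loads miss the other side's store when the calls run concurrently. As noted earlier in the thread, the check-then-clear in the kick is not atomic, so two concurrent callers may both restart the worker, which only queues the work twice.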

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-30 13:13               ` cael
@ 2022-05-31 12:37                 ` Ilpo Järvinen
  0 siblings, 0 replies; 33+ messages in thread
From: Ilpo Järvinen @ 2022-05-31 12:37 UTC (permalink / raw)
  To: cael; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-serial

[-- Attachment #1: Type: text/plain, Size: 4823 bytes --]

On Mon, 30 May 2022, cael wrote:

> Thanks, you are right, a barrier is needed here. I changed the patch as follows:
> 1) WRITE_ONCE and READ_ONCE are used to access ldata->no_room, since
> n_tty_kick_worker would be called on both the kworker and reader CPUs;
> 2) smp_mb is added in chars_in_buffer, as this function is called by both
> the reader and the kworker and accesses commit_head and read_tail; it also
> makes sure that read_tail is not read before no_room is set in
> n_tty_receive_buf_common;
> 3) smp_mb is added in n_tty_read to make sure that no_room is not read
> before read_tail is set.

Please include proper changelog to all revised patch submissions, not 
just list of changes you've made (and properly version the submissions 
with [PATCH v2] etc. in the subject).

> ---
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..3327687da0d3 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
>         struct n_tty_data *ldata = tty->disc_data;
> 
>         /* Did the input worker stop? Restart it */
> -       if (unlikely(ldata->no_room)) {
> -               ldata->no_room = 0;
> +       if (unlikely(READ_ONCE(ldata->no_room))) {
> +               WRITE_ONCE(ldata->no_room, 0);
>                 WARN_RATELIMIT(tty->port->itty == NULL,
>                                 "scheduling with invalid itty\n");
> @@ -221,6 +221,7 @@ static ssize_t chars_in_buffer(struct tty_struct *tty)
>         struct n_tty_data *ldata = tty->disc_data;
>         ssize_t n = 0;
> 
> +       smp_mb();

You should add the reason in comment for any barriers you add.

>         if (!ldata->icanon)
>                 n = ldata->commit_head - ldata->read_tail;
>         else
> @@ -1632,7 +1633,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>                         if (overflow && room < 0)
>                                 ldata->read_head--;
>                         room = overflow;
> -                       ldata->no_room = flow && !room;
> +                       WRITE_ONCE(ldata->no_room, flow && !room);
>                 } else
>                         overflow = 0;
> 
> @@ -1663,6 +1664,9 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>         } else
>                 n_tty_check_throttle(tty);
> 
> +       if (!chars_in_buffer(tty))
> +               n_tty_kick_worker(tty);
> +

Instead of having the barrier in chars_in_buffer() perhaps it would be 
more obvious what's going on here and also scope down to the cases where 
the barrier might be needed in the first place if you'd do:

	if (ldata->no_room) {
		/* ... */
		smp_mb();
		if (!chars_in_buffer(tty))
			n_tty_kick_worker(tty);
	}


-- 
 i.

>         up_read(&tty->termios_rwsem);
> 
>         return rcvd;
> @@ -2180,8 +2184,10 @@ static ssize_t n_tty_read(struct tty_struct
> *tty, struct file *file,
>                 if (time)
>                         timeout = time;
>         }
> -       if (tail != ldata->read_tail)
> +       if (tail != ldata->read_tail) {
> +               smp_mb();
>                 n_tty_kick_worker(tty);
> +       }
>         up_read(&tty->termios_rwsem);
> 
>         remove_wait_queue(&tty->read_wait, &wait);
> --
> 2.27.0
> 
> Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote on Wed, May 25, 2022 at 19:21:
> >
> > On Wed, 25 May 2022, cael wrote:
> >
> > > >Now you switched to an entirely different case, not the one we were
> > > >talking about. ...There is no ldisc->no_room = true race in the case
> > > >you now described.
> > > > So, I think we should go back to the case ldata->no_room=true, as
> > > > ldata->no_room=false seems harmless.
> > >
> > > >I'm not worried about the case where both cpus call n_tty_kick_worker but
> > > >the case where producer cpu sees chars_in_buffer() > 0 and consumer cpu
> > > >!no_room.
> > >
> > > As ldata->no_room=true is set before checking chars_in_buffer()
> >
> > Please take a brief look at Documentation/memory-barriers.txt and then
> > tell me if you still find this claim to be true.
> >
> > > if producer
> > > finds chars_in_buffer() > 0, then if reader is currently in n_tty_read,
> >
> > ...Then please do a similar analysis for ldata->read_tail. What guarantees
> > its update is seen by the producer cpu when the reader is already past the
> > point you think it still must be in?
> >
> > > when reader quits n_tty_read, n_tty_kick_worker will be called. If reader
> > > has already exited n_tty_read, which means that reader still has data to read,
> > > next time reader will call n_tty_kick_worker inside n_tty_read too.
> >
> > C-level analysis alone is not going to be very useful here given you're
> > dealing with a concurrency challenge here.
> >
> >
> > --
> >  i.
> >
> >
> 
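
[Archive note] The barrier pairing discussed in this message (no_room is set
before read_tail is read in the kworker; read_tail is set before no_room is
read in the reader) can be illustrated with a small model. This is only a
sketch under stated assumptions, not kernel code: it assumes each CPU has a
store buffer, so a store may become visible to the other CPU after a later
load has already executed, and it models a full barrier such as smp_mb() as
forcing the store to be visible before the load. All names in it (enum step,
run_schedule, lost_wakeup_possible) are hypothetical. CPU 0 stands for the
reader (stores read_tail, then loads no_room); CPU 1 stands for the kworker
(stores no_room, then loads read_tail).

```c
#include <assert.h>
#include <stdbool.h>

/* One step of a CPU: buffer a store, make it visible, or load the other flag. */
enum step { STORE, FLUSH, LOAD };

/* With a full barrier the store is visible before the load executes. */
static const enum step with_barrier[3] = { STORE, FLUSH, LOAD };
/* Without one, the store may linger in the store buffer past the load. */
static const enum step without_barrier[3] = { STORE, LOAD, FLUSH };

/*
 * Run one interleaving. Bit i of 'schedule' picks the CPU that runs slot i.
 * Returns true if both CPUs loaded the other's old value (the lost wakeup).
 */
static bool run_schedule(const enum step *prog0, const enum step *prog1,
			 unsigned int schedule)
{
	const enum step *prog[2] = { prog0, prog1 };
	int pc[2] = { 0, 0 };
	bool pending[2] = { false, false };	/* store still in store buffer */
	bool visible[2] = { false, false };	/* store reached shared memory */
	bool saw_other[2] = { false, false };

	for (int slot = 0; slot < 6; slot++) {
		int cpu = (schedule >> slot) & 1;

		switch (prog[cpu][pc[cpu]++]) {
		case STORE:
			pending[cpu] = true;
			break;
		case FLUSH:
			visible[cpu] = pending[cpu];
			break;
		case LOAD:
			saw_other[cpu] = visible[!cpu];
			break;
		}
	}
	return !saw_other[0] && !saw_other[1];
}

/* Number of slots given to CPU 1 (set bits in the low six bits). */
static int cpu1_slots(unsigned int schedule)
{
	int n = 0;

	for (int slot = 0; slot < 6; slot++)
		n += (schedule >> slot) & 1;
	return n;
}

/* Try every interleaving; report whether the lost wakeup is reachable. */
static bool lost_wakeup_possible(bool use_barrier)
{
	const enum step *choices[2] = { with_barrier, without_barrier };
	int nchoices = use_barrier ? 1 : 2;

	for (int a = 0; a < nchoices; a++)
		for (int b = 0; b < nchoices; b++)
			for (unsigned int s = 0; s < 64; s++)
				if (cpu1_slots(s) == 3 &&
				    run_schedule(choices[a], choices[b], s))
					return true;
	return false;
}
```

In this model, lost_wakeup_possible(false) finds an interleaving in which
neither side notices the other's store, while lost_wakeup_possible(true) finds
none. Real hardware memory models are richer than a single store buffer, so
this only illustrates why the barrier has to appear on both sides.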

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-05-24  2:21 tty: fix a possible hang on tty device cael
  2022-05-24  9:11 ` Ilpo Järvinen
@ 2022-06-01  9:38 ` Greg KH
  2022-06-01 13:39   ` cael
  1 sibling, 1 reply; 33+ messages in thread
From: Greg KH @ 2022-06-01  9:38 UTC (permalink / raw)
  To: cael; +Cc: jirislaby, linux-serial

On Tue, May 24, 2022 at 10:21:04AM +0800, cael wrote:
> We have met a hang on pty device, the reader was blocking at
>  epoll on master side, the writer was sleeping at wait_woken inside
>  n_tty_write on slave side ,and the write buffer on tty_port was full, we
>  found that the reader and writer would never be woken again and block
>  forever.
> 
> We thought the problem was caused as a race between reader and
> kworker as follows:
> n_tty_read(reader)| n_tty_receive_buf_common(kworker)
>                   |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                   |room <= 0
> copy_from_read_buf|
> n_tty_kick_worker |
>                   |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room and finds that there
> is no need to call tty_buffer_restart_work to flush data to reader
> and reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if writer buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> We think this problem can be solved with a check for read buffer
> inside function n_tty_receive_buf_common, if read buffer is empty and
> ldata->no_room is true, this means that kworker has more data to flush
> to read buffer, so a call to n_tty_kick_worker is necessary.
> 
> Signed-off-by: cael <juanfengpy@gmail.com>
> ---
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..36c7bc033c78 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1663,6 +1663,9 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>         } else
>                 n_tty_check_throttle(tty);
> 
> +       if (!chars_in_buffer(tty))
> +               n_tty_kick_worker(tty);
> +
>         up_read(&tty->termios_rwsem);
> 
>         return rcvd;
> -- 
> 2.27.0

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch is malformed (tabs converted to spaces, linewrapped, etc.)
  and can not be applied.  Please read the file,
  Documentation/email-clients.txt in order to fix this.

- You did not specify a description of why the patch is needed, or
  possibly, any description at all, in the email body.  Please read the
  section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what is needed in order to
  properly describe the change.

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

- It looks like you did not use your "real" name for the patch on either
  the Signed-off-by: line, or the From: line (both of which have to
  match).  Please read the kernel file, Documentation/SubmittingPatches
  for how to do this correctly.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-06-01  9:38 ` Greg KH
@ 2022-06-01 13:39   ` cael
  2022-06-01 14:47     ` Greg KH
  2022-06-01 15:28     ` Ilpo Järvinen
  0 siblings, 2 replies; 33+ messages in thread
From: cael @ 2022-06-01 13:39 UTC (permalink / raw)
  To: Greg KH; +Cc: Jiri Slaby, linux-serial

From: cael <juanfengpy@gmail.com>
Subject: [PATCH v2] tty: fix a possible hang on tty device

We have met a hang on pty device, the reader was blocking
at epoll on master side, the writer was sleeping at wait_woken
inside n_tty_write on slave side, and the write buffer on
tty_port was full, we found that the reader and writer would
never be woken again and block forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After writing to slave device, writer wakes up kworker to flush
data on tty_port to reader, and the kworker finds that reader
has no room to store data so room <= 0 is met. At this moment,
reader consumes all the data on reader buffer and call
n_tty_kick_worker to check ldata->no_room which is false and
reader quits reading. Then kworker sets ldata->no_room=true
and quits too.

If write buffer is not full, writer will wake kworker to flush data
again after following writes, but if writer buffer is full and writer
goes to sleep, kworker will never be woken again and tty device is
blocked.

This problem can be solved with a check for read buffer size inside
n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
is true, a call to n_tty_kick_worker is necessary to keep flushing
data to reader.

Signed-off-by: cael <juanfengpy@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

---
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..21241ea7cdb9 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
        struct n_tty_data *ldata = tty->disc_data;

        /* Did the input worker stop? Restart it */
-       if (unlikely(ldata->no_room)) {
-               ldata->no_room = 0;
+       if (unlikely(READ_ONCE(ldata->no_room))) {
+               WRITE_ONCE(ldata->no_room, 0);

                WARN_RATELIMIT(tty->port->itty == NULL,
                                "scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
const unsigned char *cp,
                        if (overflow && room < 0)
                                ldata->read_head--;
                        room = overflow;
-                       ldata->no_room = flow && !room;
+                       WRITE_ONCE(ldata->no_room, flow && !room);
                } else
                        overflow = 0;

@@ -1663,6 +1663,21 @@ n_tty_receive_buf_common(struct tty_struct
*tty, const unsigned char *cp,
        } else
                n_tty_check_throttle(tty);

+       if (READ_ONCE(ldata->no_room)) {
+               /*
+                * Reader ensures that read_tail is updated before
checking no_room,
+                * make sure that no_room is set before reading read_tail here.
+                * Now no_room is visible by reader, the race needs to
be handled is
+                * that reader has passed checkpoint for no_room and
reader buffer
+                * is empty, if so n_tty_kick_worker will not be
called by reader,
+                * instead, this function is called here.
+                * barrier is paired with smp_mb() in n_tty_read()
+                */
+               smp_mb();
+               if (!chars_in_buffer(tty))
+                       n_tty_kick_worker(tty);
+       }
+
        up_read(&tty->termios_rwsem);

        return rcvd;
@@ -2180,8 +2195,14 @@ static ssize_t n_tty_read(struct tty_struct
*tty, struct file *file,
                if (time)
                        timeout = time;
        }
-       if (tail != ldata->read_tail)
+       if (tail != ldata->read_tail) {
+               /*
+                * Make sure no_room is not read before setting read_tail,
+                * paired with smp_mb() in n_tty_receive_buf_common()
+                */

+               smp_mb();
                n_tty_kick_worker(tty);
+       }
        up_read(&tty->termios_rwsem);

        remove_wait_queue(&tty->read_wait, &wait);
--
2.27.0
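
[Archive note] The interleaving in the changelog above can be replayed
deterministically in user space. The following is a minimal sketch, assuming a
heavily simplified model of the line discipline state: struct model_ldata,
MODEL_BUF_SIZE and the model_* helpers are hypothetical stand-ins, and the
restarts counter stands in for tty_buffer_restart_work(). It does not model
memory ordering, only the order of the steps shown in the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BUF_SIZE 4	/* stand-in for N_TTY_BUF_SIZE */

struct model_ldata {
	size_t read_head;	/* producer index, advanced by the kworker */
	size_t read_tail;	/* consumer index, advanced by the reader */
	bool no_room;
	int restarts;		/* how often the worker was restarted */
};

static size_t model_chars_in_buffer(const struct model_ldata *l)
{
	return l->read_head - l->read_tail;
}

/* Mirrors the shape of n_tty_kick_worker(): restart only if no_room is set. */
static void model_kick_worker(struct model_ldata *l)
{
	if (l->no_room) {
		l->no_room = false;
		l->restarts++;
	}
}

/* Replay the race step by step; return how often the worker was restarted. */
static int replay_race(bool with_fix)
{
	struct model_ldata l = { .read_head = MODEL_BUF_SIZE };
	size_t room;

	/* kworker: the read buffer is full, so it computes room == 0 */
	room = MODEL_BUF_SIZE - (l.read_head - l.read_tail);

	/* reader: drains the buffer, then kicks; no_room is still false */
	l.read_tail = l.read_head;
	model_kick_worker(&l);

	/* kworker: acts on the stale room value and records no_room */
	l.no_room = (room == 0);

	/* proposed fix: re-check the buffer after no_room has been set */
	if (with_fix && l.no_room && model_chars_in_buffer(&l) == 0)
		model_kick_worker(&l);

	return l.restarts;
}
```

Without the extra check the model ends with no_room set and no restart
pending, which corresponds to the reported hang; with the check the kworker
restarts itself once.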

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-06-01 13:39   ` cael
@ 2022-06-01 14:47     ` Greg KH
  2022-06-01 15:28     ` Ilpo Järvinen
  1 sibling, 0 replies; 33+ messages in thread
From: Greg KH @ 2022-06-01 14:47 UTC (permalink / raw)
  To: cael; +Cc: Jiri Slaby, linux-serial

On Wed, Jun 01, 2022 at 09:39:27PM +0800, cael wrote:
> From: cael <juanfengpy@gmail.com>
> Subject: [PATCH v2] tty: fix a possible hang on tty device
> 
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and block forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
>                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                     |room <= 0
> copy_from_read_buf()|
> n_tty_kick_worker() |
>                     |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if writer buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> This problem can be solved with a check for read buffer size inside
> n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> is true, a call to n_tty_kick_worker is necessary to keep flushing
> data to reader.
> 
> Signed-off-by: cael <juanfengpy@gmail.com>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 
> ---
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..21241ea7cdb9 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
>         struct n_tty_data *ldata = tty->disc_data;
> 
>         /* Did the input worker stop? Restart it */
> -       if (unlikely(ldata->no_room)) {
> -               ldata->no_room = 0;
> +       if (unlikely(READ_ONCE(ldata->no_room))) {
> +               WRITE_ONCE(ldata->no_room, 0);
> 
>                 WARN_RATELIMIT(tty->port->itty == NULL,
>                                 "scheduling with invalid itty\n");
> @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>                         if (overflow && room < 0)
>                                 ldata->read_head--;
>                         room = overflow;
> -                       ldata->no_room = flow && !room;
> +                       WRITE_ONCE(ldata->no_room, flow && !room);
>                 } else
>                         overflow = 0;
> 
> @@ -1663,6 +1663,21 @@ n_tty_receive_buf_common(struct tty_struct
> *tty, const unsigned char *cp,
>         } else
>                 n_tty_check_throttle(tty);
> 
> +       if (READ_ONCE(ldata->no_room)) {
> +               /*
> +                * Reader ensures that read_tail is updated before
> checking no_room,
> +                * make sure that no_room is set before reading read_tail here.
> +                * Now no_room is visible by reader, the race needs to
> be handled is
> +                * that reader has passed checkpoint for no_room and
> reader buffer
> +                * is empty, if so n_tty_kick_worker will not be
> called by reader,
> +                * instead, this function is called here.
> +                * barrier is paired with smp_mb() in n_tty_read()
> +                */
> +               smp_mb();
> +               if (!chars_in_buffer(tty))
> +                       n_tty_kick_worker(tty);
> +       }
> +
>         up_read(&tty->termios_rwsem);
> 
>         return rcvd;
> @@ -2180,8 +2195,14 @@ static ssize_t n_tty_read(struct tty_struct
> *tty, struct file *file,
>                 if (time)
>                         timeout = time;
>         }
> -       if (tail != ldata->read_tail)
> +       if (tail != ldata->read_tail) {
> +               /*
> +                * Make sure no_room is not read before setting read_tail,
> +                * paired with smp_mb() in n_tty_receive_buf_common()
> +                */
> 
> +               smp_mb();
>                 n_tty_kick_worker(tty);
> +       }
>         up_read(&tty->termios_rwsem);
> 
>         remove_wait_queue(&tty->read_wait, &wait);
> --
> 2.27.0

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch is malformed (tabs converted to spaces, linewrapped, etc.)
  and can not be applied.  Please read the file,
  Documentation/email-clients.txt in order to fix this.

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

- It looks like you did not use your "real" name for the patch on either
  the Signed-off-by: line, or the From: line (both of which have to
  match).  Please read the kernel file, Documentation/SubmittingPatches
  for how to do this correctly.

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/SubmittingPatches for what needs to be done
  here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-06-01 13:39   ` cael
  2022-06-01 14:47     ` Greg KH
@ 2022-06-01 15:28     ` Ilpo Järvinen
  2022-06-06 13:40       ` cael
  1 sibling, 1 reply; 33+ messages in thread
From: Ilpo Järvinen @ 2022-06-01 15:28 UTC (permalink / raw)
  To: cael; +Cc: Greg KH, Jiri Slaby, linux-serial

[-- Attachment #1: Type: text/plain, Size: 4065 bytes --]

On Wed, 1 Jun 2022, cael wrote:

> From: cael <juanfengpy@gmail.com>
> Subject: [PATCH v2] tty: fix a possible hang on tty device
> 
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and block forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
>                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                     |room <= 0
> copy_from_read_buf()|
> n_tty_kick_worker() |
>                     |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if writer buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> This problem can be solved with a check for read buffer size inside
> n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> is true, a call to n_tty_kick_worker is necessary to keep flushing
> data to reader.
> 
> Signed-off-by: cael <juanfengpy@gmail.com>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

You should not add Reviewed-by on your own. Only after the person 
himself/herself gives that tag for you, include it.

> 
> ---
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..21241ea7cdb9 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
>         struct n_tty_data *ldata = tty->disc_data;
> 
>         /* Did the input worker stop? Restart it */
> -       if (unlikely(ldata->no_room)) {
> -               ldata->no_room = 0;
> +       if (unlikely(READ_ONCE(ldata->no_room))) {
> +               WRITE_ONCE(ldata->no_room, 0);
> 
>                 WARN_RATELIMIT(tty->port->itty == NULL,
>                                 "scheduling with invalid itty\n");
> @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>                         if (overflow && room < 0)
>                                 ldata->read_head--;
>                         room = overflow;
> -                       ldata->no_room = flow && !room;
> +                       WRITE_ONCE(ldata->no_room, flow && !room);
>                 } else
>                         overflow = 0;
> 
> @@ -1663,6 +1663,21 @@ n_tty_receive_buf_common(struct tty_struct
> *tty, const unsigned char *cp,
>         } else
>                 n_tty_check_throttle(tty);
> 
> +       if (READ_ONCE(ldata->no_room)) {

Hmm, since this function is only one setting it to non-zero value, perhaps 
the information could be carried over here in a no_room local var (and 
maybe unlikely() would be useful too similar to n_tty_kick_worker). After 
all, this check is just an optimization for the common case where we know 
no_room is definitely zero.

> +               /*
> +                * Reader ensures that read_tail is updated before
> checking no_room,
> +                * make sure that no_room is set before reading read_tail here.



> +                * Now no_room is visible by reader, the race needs to
> be handled is
> +                * that reader has passed checkpoint for no_room and
> reader buffer
> +                * is empty, if so n_tty_kick_worker will not be
> called by reader,
> +                * instead, this function is called here.

This part is hard to parse/understand. Please try to rephrase.


-- 
 i.
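
[Archive note] The suggestion above, carrying the "ran out of room" fact in a
local variable and hinting the branch as unlikely, might look roughly like the
user-space sketch below. It is an illustration under assumptions, not the
kernel function: struct model_ldata, model_receive_buf and the
concurrent_reader hook are hypothetical, model_unlikely() relies on the
GCC/Clang __builtin_expect builtin, and the memory barrier the real code needs
is only marked by a comment. The hook stands for a reader that has already
passed its own no_room check and then drains the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BUF_SIZE 4	/* stand-in for N_TTY_BUF_SIZE */
#define model_unlikely(x) __builtin_expect(!!(x), 0)

struct model_ldata {
	size_t read_head;
	size_t read_tail;
	bool no_room;
	int restarts;		/* stand-in for restarting the worker */
};

static void model_kick_worker(struct model_ldata *l)
{
	if (model_unlikely(l->no_room)) {
		l->no_room = false;
		l->restarts++;
	}
}

/* Reader that drains everything but has already done its no_room check. */
static void model_drain(struct model_ldata *l)
{
	l->read_tail = l->read_head;
}

/* Accept up to 'count' bytes; returns how many fit into the buffer. */
static size_t model_receive_buf(struct model_ldata *l, size_t count,
				void (*concurrent_reader)(struct model_ldata *))
{
	size_t room = MODEL_BUF_SIZE - (l->read_head - l->read_tail);
	size_t n = count < room ? count : room;
	bool no_room = false;	/* local copy: only this function sets it */

	l->read_head += n;
	if (n < count) {	/* data was left behind for lack of room */
		no_room = true;
		l->no_room = true;
	}

	/* Test hook: lets a caller interleave a reader at this point. */
	if (concurrent_reader)
		concurrent_reader(l);

	if (model_unlikely(no_room)) {
		/* The kernel code would need its smp_mb() here. */
		if (l->read_head == l->read_tail)
			model_kick_worker(l);
	}
	return n;
}

/* Helper: push 6 bytes into an empty buffer and report worker restarts. */
static int restarts_after_receive(bool reader_drains)
{
	struct model_ldata l = { 0 };

	model_receive_buf(&l, 6, reader_drains ? model_drain : NULL);
	return l.restarts;
}
```

The local variable keeps the common path (room was available) free of any
shared-state re-read; only the rare no-room path pays for the re-check.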

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: tty: fix a possible hang on tty device
  2022-06-01 15:28     ` Ilpo Järvinen
@ 2022-06-06 13:40       ` cael
  2022-06-06 14:43         ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: cael @ 2022-06-06 13:40 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Greg KH, Jiri Slaby, linux-serial

[-- Attachment #1: Type: text/plain, Size: 4903 bytes --]

From: cael <juanfengpy@gmail.com>
Subject: [PATCH v3] tty: fix a possible hang on tty device

We have met a hang on pty device, the reader was blocking
at epoll on master side, the writer was sleeping at wait_woken
inside n_tty_write on slave side, and the write buffer on
tty_port was full, we found that the reader and writer would
never be woken again and block forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After writing to slave device, writer wakes up kworker to flush
data on tty_port to reader, and the kworker finds that reader
has no room to store data so room <= 0 is met. At this moment,
reader consumes all the data on reader buffer and call
n_tty_kick_worker to check ldata->no_room which is false and
reader quits reading. Then kworker sets ldata->no_room=true
and quits too.

If write buffer is not full, writer will wake kworker to flush data
again after following writes, but if writer buffer is full and writer
goes to sleep, kworker will never be woken again and tty device is
blocked.

This problem can be solved with a check for read buffer size inside
n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
is true, a call to n_tty_kick_worker is necessary to keep flushing
data to reader.

Signed-off-by: cael <juanfengpy@gmail.com>

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..544f782b9a11 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
        struct n_tty_data *ldata = tty->disc_data;

        /* Did the input worker stop? Restart it */
-       if (unlikely(ldata->no_room)) {
-               ldata->no_room = 0;
+       if (unlikely(READ_ONCE(ldata->no_room))) {
+               WRITE_ONCE(ldata->no_room, 0);

                WARN_RATELIMIT(tty->port->itty == NULL,
                                "scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
const unsigned char *cp,
                        if (overflow && room < 0)
                                ldata->read_head--;
                        room = overflow;
-                       ldata->no_room = flow && !room;
+                       WRITE_ONCE(ldata->no_room, flow && !room);
                } else
                        overflow = 0;

@@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct
*tty, const unsigned char *cp,
        } else
                n_tty_check_throttle(tty);

+       if (unlikely(ldata->no_room)) {
+               /*
+                * Barrier here is to ensure to read the latest read_tail in
+                * chars_in_buffer() and to make sure that read_tail
is not loaded
+                * before ldata->no_room is set, otherwise, following
race may occur:
+                * n_tty_receive_buf_common() |n_tty_read()
+                * chars_in_buffer() > 0      |
+                *
|copy_from_read_buf()->chars_in_buffer()==0
+                *                            |if (ldata->no_room)
+                * ldata->no_room = 1         |
+                * Then both kworker and reader will fail to kick
n_tty_kick_worker(),
+                * smp_mb is paired with smp_mb() in n_tty_read().
+                */
+               smp_mb();
+               if (!chars_in_buffer(tty))
+                       n_tty_kick_worker(tty);
+       }
+
        up_read(&tty->termios_rwsem);

        return rcvd;
@@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct
*tty, struct file *file,
                if (time)
                        timeout = time;
        }
-       if (tail != ldata->read_tail)
+       if (tail != ldata->read_tail) {
+               /*
+                * Make sure no_room is not read before setting read_tail,
+                * otherwise, following race may occur:
+                * n_tty_read()
|n_tty_receive_buf_common()
+                * if(ldata->no_room)->false            |
+                *                                      |ldata->no_room = 1
+                *                                      |char_in_buffer() > 0
+                * ldata->read_tail = ldata->commit_head|
+                * Then copy_from_read_buf() in reader consumes all the data
+                * in read buffer, both reader and kworker will fail to kick
+                * tty_buffer_restart_work().
+                * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+                */
+               smp_mb();
                n_tty_kick_worker(tty);
+       }
        up_read(&tty->termios_rwsem);

        remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0

[-- Attachment #2: 0001-PATCH-v3-tty-fix-a-possible-hang-on-tty-device.patch --]
[-- Type: application/octet-stream, Size: 4314 bytes --]

From 6d213bd916fce9140557221a7eff0d65bd33df57 Mon Sep 17 00:00:00 2001
From: cael <juanfengpy@gmail.com>
Date: Mon, 23 May 2022 20:53:55 +0800
Subject: [PATCH] [PATCH v3] tty: fix a possible hang on tty device

We hit a hang on a pty device: the reader was blocked in epoll on
the master side, the writer was sleeping in wait_woken() inside
n_tty_write() on the slave side, and the write buffer on the
tty_port was full. Neither the reader nor the writer was ever
woken again, so both blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After a write to the slave device, the writer wakes up the kworker
to flush data from the tty_port to the reader. The kworker finds
that the read buffer has no room left, so room <= 0 holds. At this
moment the reader consumes all the data in the read buffer and
calls n_tty_kick_worker(), which checks ldata->no_room, finds it
still false, and does nothing, so the reader finishes its read.
Then the kworker sets ldata->no_room = true and exits as well.

If the write buffer is not full, the writer wakes the kworker again
on a later write and the data gets flushed. But if the write buffer
is full and the writer has gone to sleep, the kworker is never
woken again and the tty device stays blocked.

This can be solved by checking the read buffer inside
n_tty_receive_buf_common(): if the read buffer is empty and
ldata->no_room is true, n_tty_kick_worker() must be called to keep
flushing data to the reader.

Signed-off-by: cael <juanfengpy@gmail.com>

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..544f782b9a11 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set, otherwise, following race may occur:
+		 * n_tty_receive_buf_common() |n_tty_read()
+		 * chars_in_buffer() > 0      |
+		 *                            |copy_from_read_buf()->chars_in_buffer()==0
+		 *                            |if (ldata->no_room)
+		 * ldata->no_room = 1         |
+		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+		 * smp_mb is paired with smp_mb() in n_tty_read().
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (tail != ldata->read_tail)
+	if (tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read before setting read_tail,
+		 * otherwise, following race may occur:
+		 * n_tty_read()		                |n_tty_receive_buf_common()
+		 * if(ldata->no_room)->false            |
+		 *			                |ldata->no_room = 1
+		 *                                      |chars_in_buffer() > 0
+		 * ldata->read_tail = ldata->commit_head|
+		 * Then copy_from_read_buf() in reader consumes all the data
+		 * in read buffer, both reader and kworker will fail to kick
+		 * tty_buffer_restart_work().
+		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0
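
The interleaving described in the commit message above can be replayed
deterministically in user space. The following is a minimal sketch, not
kernel code: struct model, kick_worker() and replay_race() are
hypothetical names invented for illustration, BUF_SIZE merely stands in
for N_TTY_BUF_SIZE, and the real no_room computation (flow && !room) is
reduced to a plain "buffer was full" flag. It runs the steps in one
thread in the order shown in the diagram, so it illustrates the lost
kick but says nothing about memory ordering.

```c
#include <stdbool.h>

#define BUF_SIZE 4096UL	/* assumption: stands in for N_TTY_BUF_SIZE */

/* Hypothetical, simplified stand-in for the relevant n_tty_data fields. */
struct model {
	unsigned long read_head;
	unsigned long read_tail;
	bool no_room;
	int kicks;		/* how many times the worker was restarted */
};

static unsigned long chars_in_buffer(const struct model *m)
{
	return m->read_head - m->read_tail;
}

/* Mirrors the test-and-clear logic of n_tty_kick_worker(). */
static void kick_worker(struct model *m)
{
	if (m->no_room) {
		m->no_room = false;
		m->kicks++;
	}
}

/* Replays the interleaving from the commit message; returns the kick count. */
static int replay_race(bool with_fix)
{
	struct model m = { .read_head = BUF_SIZE, .read_tail = 0 };

	/* kworker: computes room and finds the read buffer full */
	long room = (long)BUF_SIZE - (long)(m.read_head - m.read_tail);
	bool was_full = room <= 0;

	/* reader: drains the buffer, then kicks; no_room is still false */
	m.read_tail = m.read_head;
	kick_worker(&m);

	/* kworker: records its now stale conclusion and would exit here */
	m.no_room = was_full;

	/* proposed fix: recheck the buffer after setting no_room */
	if (with_fix && m.no_room && chars_in_buffer(&m) == 0)
		kick_worker(&m);

	return m.kicks;
}
```

Without the recheck, replay_race(false) ends with no_room set and no
kick at all, which is the hang; with it, replay_race(true) restarts the
worker once.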



* Re: tty: fix a possible hang on tty device
  2022-06-06 13:40       ` cael
@ 2022-06-06 14:43         ` Greg KH
  2022-06-11  6:50           ` cael
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2022-06-06 14:43 UTC (permalink / raw)
  To: cael; +Cc: Ilpo Järvinen, Jiri Slaby, linux-serial

On Mon, Jun 06, 2022 at 09:40:16PM +0800, cael wrote:
> From: cael <juanfengpy@gmail.com>
> Subject:[PATCH v3] tty: fix a possible hang on tty device
> 
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and block forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
>                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                     |room <= 0
> copy_from_read_buf()|
> n_tty_kick_worker() |
>                     |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if writer buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> This problem can be solved with a check for read buffer size inside
> n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> is true, a call to n_tty_kick_worker is necessary to keep flushing
> data to reader.
> 
> Signed-off-by: cael <juanfengpy@gmail.com>
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..544f782b9a11 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
>         struct n_tty_data *ldata = tty->disc_data;
> 
>         /* Did the input worker stop? Restart it */
> -       if (unlikely(ldata->no_room)) {
> -               ldata->no_room = 0;
> +       if (unlikely(READ_ONCE(ldata->no_room))) {
> +               WRITE_ONCE(ldata->no_room, 0);
> 
>                 WARN_RATELIMIT(tty->port->itty == NULL,
>                                 "scheduling with invalid itty\n");
> @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>                         if (overflow && room < 0)
>                                 ldata->read_head--;
>                         room = overflow;
> -                       ldata->no_room = flow && !room;
> +                       WRITE_ONCE(ldata->no_room, flow && !room);
>                 } else
>                         overflow = 0;
> 
> @@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct
> *tty, const unsigned char *cp,
>         } else
>                 n_tty_check_throttle(tty);
> 
> +       if (unlikely(ldata->no_room)) {
> +               /*
> +                * Barrier here is to ensure to read the latest read_tail in
> +                * chars_in_buffer() and to make sure that read_tail
> is not loaded
> +                * before ldata->no_room is set, otherwise, following
> race may occur:
> +                * n_tty_receive_buf_common() |n_tty_read()
> +                * chars_in_buffer() > 0      |
> +                *
> |copy_from_read_buf()->chars_in_buffer()==0
> +                *                            |if (ldata->no_room)
> +                * ldata->no_room = 1         |
> +                * Then both kworker and reader will fail to kick
> n_tty_kick_worker(),
> +                * smp_mb is paired with smp_mb() in n_tty_read().
> +                */
> +               smp_mb();
> +               if (!chars_in_buffer(tty))
> +                       n_tty_kick_worker(tty);
> +       }
> +
>         up_read(&tty->termios_rwsem);
> 
>         return rcvd;
> @@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct
> *tty, struct file *file,
>                 if (time)
>                         timeout = time;
>         }
> -       if (tail != ldata->read_tail)
> +       if (tail != ldata->read_tail) {
> +               /*
> +                * Make sure no_room is not read before setting read_tail,
> +                * otherwise, following race may occur:
> +                * n_tty_read()
> |n_tty_receive_buf_common()
> +                * if(ldata->no_room)->false            |
> +                *                                      |ldata->no_room = 1
> +                *                                      |char_in_buffer() > 0
> +                * ldata->read_tail = ldata->commit_head|
> +                * Then copy_from_read_buf() in reader consumes all the data
> +                * in read buffer, both reader and kworker will fail to kick
> +                * tty_buffer_restart_work().
> +                * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
> +                */
> +               smp_mb();
>                 n_tty_kick_worker(tty);
> +       }
>         up_read(&tty->termios_rwsem);
> 
>         remove_wait_queue(&tty->read_wait, &wait);
> -- 
> 2.27.0


Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch is malformed (tabs converted to spaces, linewrapped, etc.)
  and can not be applied.  Please read the file,
  Documentation/email-clients.txt in order to fix this.

- Your patch was attached, please place it inline so that it can be
  applied directly from the email message itself.

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

- It looks like you did not use your "real" name for the patch on either
  the Signed-off-by: line, or the From: line (both of which have to
  match).  Please read the kernel file, Documentation/SubmittingPatches
  for how to do this correctly.

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/SubmittingPatches for what needs to be done
  here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot


* Re: tty: fix a possible hang on tty device
  2022-06-06 14:43         ` Greg KH
@ 2022-06-11  6:50           ` cael
  2022-06-11  7:32             ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: cael @ 2022-06-11  6:50 UTC (permalink / raw)
  To: Greg KH; +Cc: Ilpo Järvinen, Jiri Slaby, linux-serial

From: cael <juanfengpy@gmail.com>
Subject: [PATCH] [PATCH v3] tty: fix a possible hang on tty device

We hit a hang on a pty device: the reader was blocked in epoll on
the master side, the writer was sleeping in wait_woken() inside
n_tty_write() on the slave side, and the write buffer on the
tty_port was full. Neither the reader nor the writer was ever
woken again, so both blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After a write to the slave device, the writer wakes up the kworker
to flush data from the tty_port to the reader. The kworker finds
that the read buffer has no room left, so room <= 0 holds. At this
moment the reader consumes all the data in the read buffer and
calls n_tty_kick_worker(), which checks ldata->no_room, finds it
still false, and does nothing, so the reader finishes its read.
Then the kworker sets ldata->no_room = true and exits as well.

If the write buffer is not full, the writer wakes the kworker again
on a later write and the data gets flushed. But if the write buffer
is full and the writer has gone to sleep, the kworker is never
woken again and the tty device stays blocked.

This can be solved by checking the read buffer inside
n_tty_receive_buf_common(): if the read buffer is empty and
ldata->no_room is true, n_tty_kick_worker() must be called to keep
flushing data to the reader.

Signed-off-by: cael <juanfengpy@gmail.com>

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..544f782b9a11 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
        struct n_tty_data *ldata = tty->disc_data;

        /* Did the input worker stop? Restart it */
-       if (unlikely(ldata->no_room)) {
-               ldata->no_room = 0;
+       if (unlikely(READ_ONCE(ldata->no_room))) {
+               WRITE_ONCE(ldata->no_room, 0);

                WARN_RATELIMIT(tty->port->itty == NULL,
                                "scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
const unsigned char *cp,
                        if (overflow && room < 0)
                                ldata->read_head--;
                        room = overflow;
-                       ldata->no_room = flow && !room;
+                       WRITE_ONCE(ldata->no_room, flow && !room);
                } else
                        overflow = 0;

@@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct
*tty, const unsigned char *cp,
        } else
                n_tty_check_throttle(tty);

+       if (unlikely(ldata->no_room)) {
+               /*
+                * Barrier here is to ensure to read the latest read_tail in
+                * chars_in_buffer() and to make sure that read_tail
is not loaded
+                * before ldata->no_room is set, otherwise, following
race may occur:
+                * n_tty_receive_buf_common() |n_tty_read()
+                * chars_in_buffer() > 0      |
+                *
|copy_from_read_buf()->chars_in_buffer()==0
+                *                            |if (ldata->no_room)
+                * ldata->no_room = 1         |
+                * Then both kworker and reader will fail to kick
n_tty_kick_worker(),
+                * smp_mb is paired with smp_mb() in n_tty_read().
+                */
+               smp_mb();
+               if (!chars_in_buffer(tty))
+                       n_tty_kick_worker(tty);
+       }
+
        up_read(&tty->termios_rwsem);

        return rcvd;
@@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct
*tty, struct file *file,
                if (time)
                        timeout = time;
        }
-       if (tail != ldata->read_tail)
+       if (tail != ldata->read_tail) {
+               /*
+                * Make sure no_room is not read before setting read_tail,
+                * otherwise, following race may occur:
+                * n_tty_read()
|n_tty_receive_buf_common()
+                * if(ldata->no_room)->false            |
+                *                                      |ldata->no_room = 1
+                *                                      |char_in_buffer() > 0
+                * ldata->read_tail = ldata->commit_head|
+                * Then copy_from_read_buf() in reader consumes all the data
+                * in read buffer, both reader and kworker will fail to kick
+                * tty_buffer_restart_work().
+                * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+                */
+               smp_mb();
                n_tty_kick_worker(tty);
+       }
        up_read(&tty->termios_rwsem);

        remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0
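
The two smp_mb() calls in the patch above form a store/barrier/load
pairing: each side publishes its own update, issues a full barrier, and
only then looks at the other side's variable, so at least one of the two
is guaranteed to notice the other. A rough user-space analogue with C11
atomics is sketched below. It is an assumption-laden illustration, not
the kernel implementation: struct ldisc_model and all function names are
hypothetical, and atomic_thread_fence(memory_order_seq_cst) is used as a
stand-in for smp_mb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the fields shared by reader and kworker. */
struct ldisc_model {
	atomic_ulong read_head;
	atomic_ulong read_tail;
	atomic_bool no_room;
	atomic_int restarts;	/* how many times the worker was restarted */
};

static void model_init(struct ldisc_model *m, unsigned long buffered)
{
	atomic_init(&m->read_head, buffered);
	atomic_init(&m->read_tail, 0UL);
	atomic_init(&m->no_room, false);
	atomic_init(&m->restarts, 0);
}

/* Test-and-clear, loosely mirroring n_tty_kick_worker(). */
static void kick_worker(struct ldisc_model *m)
{
	if (atomic_load_explicit(&m->no_room, memory_order_relaxed)) {
		atomic_store_explicit(&m->no_room, false, memory_order_relaxed);
		atomic_fetch_add_explicit(&m->restarts, 1, memory_order_relaxed);
	}
}

/* Reader side: publish the new tail, full fence, then read no_room. */
static void reader_consume_all(struct ldisc_model *m)
{
	unsigned long head =
		atomic_load_explicit(&m->read_head, memory_order_relaxed);

	atomic_store_explicit(&m->read_tail, head, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	kick_worker(m);
}

/* Worker side: publish no_room, full fence, then read the tail. */
static void worker_out_of_room(struct ldisc_model *m)
{
	unsigned long head, tail;

	atomic_store_explicit(&m->no_room, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	head = atomic_load_explicit(&m->read_head, memory_order_relaxed);
	tail = atomic_load_explicit(&m->read_tail, memory_order_relaxed);
	if (head == tail)	/* buffer drained behind our back */
		kick_worker(m);
}
```

Whichever of reader_consume_all() and worker_out_of_room() runs second
sees the other's store and restarts the worker. If both ran concurrently
and both noticed, the worker could be restarted twice in this model,
which the sketch treats as harmless.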

On Mon, Jun 6, 2022 at 22:43, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jun 06, 2022 at 09:40:16PM +0800, cael wrote:
> > From: cael <juanfengpy@gmail.com>
> > Subject:[PATCH v3] tty: fix a possible hang on tty device
> >
> > We have met a hang on pty device, the reader was blocking
> > at epoll on master side, the writer was sleeping at wait_woken
> > inside n_tty_write on slave side, and the write buffer on
> > tty_port was full, we found that the reader and writer would
> > never be woken again and block forever.
> >
> > The problem was caused by a race between reader and kworker:
> > n_tty_read(reader):  n_tty_receive_buf_common(kworker):
> >                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
> >                     |room <= 0
> > copy_from_read_buf()|
> > n_tty_kick_worker() |
> >                     |ldata->no_room = true
> >
> > After writing to slave device, writer wakes up kworker to flush
> > data on tty_port to reader, and the kworker finds that reader
> > has no room to store data so room <= 0 is met. At this moment,
> > reader consumes all the data on reader buffer and call
> > n_tty_kick_worker to check ldata->no_room which is false and
> > reader quits reading. Then kworker sets ldata->no_room=true
> > and quits too.
> >
> > If write buffer is not full, writer will wake kworker to flush data
> > again after following writes, but if writer buffer is full and writer
> > goes to sleep, kworker will never be woken again and tty device is
> > blocked.
> >
> > This problem can be solved with a check for read buffer size inside
> > n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> > is true, a call to n_tty_kick_worker is necessary to keep flushing
> > data to reader.
> >
> > Signed-off-by: cael <juanfengpy@gmail.com>
> >
> > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> > index efc72104c840..544f782b9a11 100644
> > --- a/drivers/tty/n_tty.c
> > +++ b/drivers/tty/n_tty.c
> > @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
> >         struct n_tty_data *ldata = tty->disc_data;
> >
> >         /* Did the input worker stop? Restart it */
> > -       if (unlikely(ldata->no_room)) {
> > -               ldata->no_room = 0;
> > +       if (unlikely(READ_ONCE(ldata->no_room))) {
> > +               WRITE_ONCE(ldata->no_room, 0);
> >
> >                 WARN_RATELIMIT(tty->port->itty == NULL,
> >                                 "scheduling with invalid itty\n");
> > @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> > const unsigned char *cp,
> >                         if (overflow && room < 0)
> >                                 ldata->read_head--;
> >                         room = overflow;
> > -                       ldata->no_room = flow && !room;
> > +                       WRITE_ONCE(ldata->no_room, flow && !room);
> >                 } else
> >                         overflow = 0;
> >
> > @@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct
> > *tty, const unsigned char *cp,
> >         } else
> >                 n_tty_check_throttle(tty);
> >
> > +       if (unlikely(ldata->no_room)) {
> > +               /*
> > +                * Barrier here is to ensure to read the latest read_tail in
> > +                * chars_in_buffer() and to make sure that read_tail
> > is not loaded
> > +                * before ldata->no_room is set, otherwise, following
> > race may occur:
> > +                * n_tty_receive_buf_common() |n_tty_read()
> > +                * chars_in_buffer() > 0      |
> > +                *
> > |copy_from_read_buf()->chars_in_buffer()==0
> > +                *                            |if (ldata->no_room)
> > +                * ldata->no_room = 1         |
> > +                * Then both kworker and reader will fail to kick
> > n_tty_kick_worker(),
> > +                * smp_mb is paired with smp_mb() in n_tty_read().
> > +                */
> > +               smp_mb();
> > +               if (!chars_in_buffer(tty))
> > +                       n_tty_kick_worker(tty);
> > +       }
> > +
> >         up_read(&tty->termios_rwsem);
> >
> >         return rcvd;
> > @@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct
> > *tty, struct file *file,
> >                 if (time)
> >                         timeout = time;
> >         }
> > -       if (tail != ldata->read_tail)
> > +       if (tail != ldata->read_tail) {
> > +               /*
> > +                * Make sure no_room is not read before setting read_tail,
> > +                * otherwise, following race may occur:
> > +                * n_tty_read()
> > |n_tty_receive_buf_common()
> > +                * if(ldata->no_room)->false            |
> > +                *                                      |ldata->no_room = 1
> > +                *                                      |char_in_buffer() > 0
> > +                * ldata->read_tail = ldata->commit_head|
> > +                * Then copy_from_read_buf() in reader consumes all the data
> > +                * in read buffer, both reader and kworker will fail to kick
> > +                * tty_buffer_restart_work().
> > +                * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
> > +                */
> > +               smp_mb();
> >                 n_tty_kick_worker(tty);
> > +       }
> >         up_read(&tty->termios_rwsem);
> >
> >         remove_wait_queue(&tty->read_wait, &wait);
> > --
> > 2.27.0
>
>
> Hi,
>
> This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
> a patch that has triggered this response.  He used to manually respond
> to these common problems, but in order to save his sanity (he kept
> writing the same thing over and over, yet to different people), I was
> created.  Hopefully you will not take offence and will fix the problem
> in your patch and resubmit it so that it can be accepted into the Linux
> kernel tree.
>
> You are receiving this message because of the following common error(s)
> as indicated below:
>
> - Your patch is malformed (tabs converted to spaces, linewrapped, etc.)
>   and can not be applied.  Please read the file,
>   Documentation/email-clients.txt in order to fix this.
>
> - Your patch was attached, please place it inline so that it can be
>   applied directly from the email message itself.
>
> - You did not write a descriptive Subject: for the patch, allowing Greg,
>   and everyone else, to know what this patch is all about.  Please read
>   the section entitled "The canonical patch format" in the kernel file,
>   Documentation/SubmittingPatches for what a proper Subject: line should
>   look like.
>
> - It looks like you did not use your "real" name for the patch on either
>   the Signed-off-by: line, or the From: line (both of which have to
>   match).  Please read the kernel file, Documentation/SubmittingPatches
>   for how to do this correctly.
>
> - This looks like a new version of a previously submitted patch, but you
>   did not list below the --- line any changes from the previous version.
>   Please read the section entitled "The canonical patch format" in the
>   kernel file, Documentation/SubmittingPatches for what needs to be done
>   here to properly describe this.
>
> If you wish to discuss this problem further, or you have questions about
> how to resolve this issue, please feel free to respond to this email and
> Greg will reply once he has dug out from the pending patches received
> from other developers.
>
> thanks,
>
> greg k-h's patch email bot


* Re: tty: fix a possible hang on tty device
  2022-06-11  6:50           ` cael
@ 2022-06-11  7:32             ` Greg KH
  2022-06-13 12:30               ` [PATCH v3] tty: fix hang on tty device with no_room set juanfengpy
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2022-06-11  7:32 UTC (permalink / raw)
  To: cael; +Cc: Ilpo Järvinen, Jiri Slaby, linux-serial

On Sat, Jun 11, 2022 at 02:50:54PM +0800, cael wrote:
> From: cael <juanfengpy@gmail.com>
> Subject: [PATCH] [PATCH v3] tty: fix a possible hang on tty device
> 
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and block forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
>                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                     |room <= 0
> copy_from_read_buf()|
> n_tty_kick_worker() |
>                     |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if writer buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> This problem can be solved with a check for read buffer size inside
> n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> is true, a call to n_tty_kick_worker is necessary to keep flushing
> data to reader.
> 
> Signed-off-by: cael <juanfengpy@gmail.com>
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..544f782b9a11 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
>         struct n_tty_data *ldata = tty->disc_data;
> 
>         /* Did the input worker stop? Restart it */
> -       if (unlikely(ldata->no_room)) {
> -               ldata->no_room = 0;
> +       if (unlikely(READ_ONCE(ldata->no_room))) {
> +               WRITE_ONCE(ldata->no_room, 0);
> 
>                 WARN_RATELIMIT(tty->port->itty == NULL,
>                                 "scheduling with invalid itty\n");
> @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>                         if (overflow && room < 0)
>                                 ldata->read_head--;
>                         room = overflow;
> -                       ldata->no_room = flow && !room;
> +                       WRITE_ONCE(ldata->no_room, flow && !room);
>                 } else
>                         overflow = 0;
> 
> @@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct
> *tty, const unsigned char *cp,
>         } else
>                 n_tty_check_throttle(tty);
> 
> +       if (unlikely(ldata->no_room)) {
> +               /*
> +                * Barrier here is to ensure to read the latest read_tail in
> +                * chars_in_buffer() and to make sure that read_tail
> is not loaded
> +                * before ldata->no_room is set, otherwise, following
> race may occur:
> +                * n_tty_receive_buf_common() |n_tty_read()
> +                * chars_in_buffer() > 0      |
> +                *
> |copy_from_read_buf()->chars_in_buffer()==0
> +                *                            |if (ldata->no_room)
> +                * ldata->no_room = 1         |
> +                * Then both kworker and reader will fail to kick
> n_tty_kick_worker(),
> +                * smp_mb is paired with smp_mb() in n_tty_read().
> +                */
> +               smp_mb();
> +               if (!chars_in_buffer(tty))
> +                       n_tty_kick_worker(tty);
> +       }
> +
>         up_read(&tty->termios_rwsem);
> 
>         return rcvd;
> @@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct
> *tty, struct file *file,
>                 if (time)
>                         timeout = time;
>         }
> -       if (tail != ldata->read_tail)
> +       if (tail != ldata->read_tail) {
> +               /*
> +                * Make sure no_room is not read before setting read_tail,
> +                * otherwise, following race may occur:
> +                * n_tty_read()
> |n_tty_receive_buf_common()
> +                * if(ldata->no_room)->false            |
> +                *                                      |ldata->no_room = 1
> +                *                                      |char_in_buffer() > 0
> +                * ldata->read_tail = ldata->commit_head|
> +                * Then copy_from_read_buf() in reader consumes all the data
> +                * in read buffer, both reader and kworker will fail to kick
> +                * tty_buffer_restart_work().
> +                * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
> +                */
> +               smp_mb();
>                 n_tty_kick_worker(tty);
> +       }
>         up_read(&tty->termios_rwsem);
> 
>         remove_wait_queue(&tty->read_wait, &wait);
> -- 
> 2.27.0

Is there any specific reason you ignored all of the recommendations from
my previous email as to what needs to be changed in order for this patch
to be accepted?  It doesn't make any sense for me to just keep sending
the same information again :(

thanks,

greg k-h


* [PATCH v3] tty: fix hang on tty device with no_room set
  2022-06-11  7:32             ` Greg KH
@ 2022-06-13 12:30               ` juanfengpy
  2022-06-13 17:20                 ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: juanfengpy @ 2022-06-13 12:30 UTC (permalink / raw)
  To: gregkh
  Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial, caelli

From: caelli <caelli@tencent.com>

We hit a hang on a pty device: the reader was blocked in epoll on
the master side, the writer was sleeping in wait_woken() inside
n_tty_write() on the slave side, and the write buffer on the
tty_port was full. Neither the reader nor the writer was ever
woken again, so both blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After a write to the slave device, the writer wakes up the kworker
to flush data from the tty_port to the reader. The kworker finds
that the read buffer has no room left, so room <= 0 holds. At this
moment the reader consumes all the data in the read buffer and
calls n_tty_kick_worker(), which checks ldata->no_room, finds it
still false, and does nothing, so the reader finishes its read.
Then the kworker sets ldata->no_room = true and exits as well.

If the write buffer is not full, the writer wakes the kworker again
on a later write and the data gets flushed. But if the write buffer
is full and the writer has gone to sleep, the kworker is never
woken again and the tty device stays blocked.

This can be solved by checking the read buffer inside
n_tty_receive_buf_common(): if the read buffer is empty and
ldata->no_room is true, n_tty_kick_worker() must be called to keep
flushing data to the reader.

Signed-off-by: caelli <caelli@tencent.com>
---
Previous threads:
https://lore.kernel.org/all/CAPmgiULo4h8bOrzL+XJ5Pndw0kz80fBPfH_KNLx3c5j-Yj04SA@mail.gmail.com/t/

I fixed the formatting problems as recommended and switched my mail client to
git send-email, which should avoid them. The subject is changed from
'tty: fix a possible hang on tty device' to
'tty: fix hang on tty device with no_room set' to make it more descriptive.

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..544f782b9a11 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set, otherwise, following race may occur:
+		 * n_tty_receive_buf_common() |n_tty_read()
+		 * chars_in_buffer() > 0      |
+		 *                            |copy_from_read_buf()->chars_in_buffer()==0
+		 *                            |if (ldata->no_room)
+		 * ldata->no_room = 1         |
+		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+		 * smp_mb is paired with smp_mb() in n_tty_read().
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (tail != ldata->read_tail)
+	if (tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read before setting read_tail,
+		 * otherwise, following race may occur:
+		 * n_tty_read()		                |n_tty_receive_buf_common()
+		 * if(ldata->no_room)->false            |
+		 *			                |ldata->no_room = 1
+		 *                                      |chars_in_buffer() > 0
+		 * ldata->read_tail = ldata->commit_head|
+		 * Then copy_from_read_buf() in reader consumes all the data
+		 * in read buffer, both reader and kworker will fail to kick
+		 * tty_buffer_restart_work().
+		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0


* Re: [PATCH v3] tty: fix hang on tty device with no_room set
  2022-06-13 12:30               ` [PATCH v3] tty: fix hang on tty device with no_room set juanfengpy
@ 2022-06-13 17:20                 ` Greg KH
  2022-06-15  3:45                   ` [PATCH v4] " cael
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2022-06-13 17:20 UTC (permalink / raw)
  To: juanfengpy
  Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial, caelli

On Mon, Jun 13, 2022 at 08:30:29PM +0800, juanfengpy@gmail.com wrote:
> From: caelli <caelli@tencent.com>

This name/address does not match what you are sending it from, and I do
not think this is how you sign legal documents right?

For that reason alone, I can't take this :(

> 
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and blocked forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
>                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                     |room <= 0
> copy_from_read_buf()|
> n_tty_kick_worker() |
>                     |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if write buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> This problem can be solved with a check for read buffer size inside
> n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> is true, a call to n_tty_kick_worker is necessary to keep flushing
> data to reader.
> 
> Signed-off-by: caelli <caelli@tencent.com>
> ---
> Previous threads:
> https://lore.kernel.org/all/CAPmgiULo4h8bOrzL+XJ5Pndw0kz80fBPfH_KNLx3c5j-Yj04SA@mail.gmail.com/t/
> 
> I corrected some format problems as recommended and switched client to git send-email,
> which may be ok. And subject is changed from 'tty: fix a possible hang on tty device' to
> 'tty: fix hang on tty device with no_room set' to make subject more obvious.

Please properly version your patches like the documentation explains how
to, so we know what has changed from previous versions.  Otherwise they
all look identical to us.

thanks,

greg k-h

* [PATCH v4] tty: fix hang on tty device with no_room set
  2022-06-13 17:20                 ` Greg KH
@ 2022-06-15  3:45                   ` cael
  2022-06-15  5:00                     ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: cael @ 2022-06-15  3:45 UTC (permalink / raw)
  To: gregkh
  Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial, juanfengpy

We hit a hang on a pty device: the reader was blocked in epoll
on the master side, the writer was sleeping in wait_woken()
inside n_tty_write() on the slave side, and the write buffer on
the tty_port was full. The reader and writer were never woken
again and blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After writing to the slave device, the writer wakes up the kworker
to flush data on the tty_port to the reader, and the kworker finds
that the read buffer has no room left, so room <= 0 is met. At this
moment, the reader consumes all the data in the read buffer and
calls n_tty_kick_worker(), which sees that ldata->no_room is still
false and does nothing, and the reader stops reading. Then the
kworker sets ldata->no_room = true and quits too.

If the write buffer is not full, the writer will wake the kworker
again on a following write. But if the write buffer is full and the
writer goes to sleep, the kworker is never woken again and the tty
device is blocked.

This problem can be solved by checking the read buffer size inside
n_tty_receive_buf_common(): if the read buffer is empty and
ldata->no_room is true, a call to n_tty_kick_worker() is needed to
keep flushing data to the reader.

Signed-off-by: cael <juanfengpy@gmail.com>
---
Patch changelogs between v1 and v2:
	-add barrier inside n_tty_read and n_tty_receive_buf_common;
	-comment why barrier is needed;
	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
	-in function n_tty_receive_buf_common, add unlikely to check
	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
	 is removed here to get locality;
	-change comment for barrier to show the race condition to make
	 comment easier to understand;
Patch changelogs between v3 and v4:
	-change subject from 'tty: fix a possible hang on tty device' to
	 'tty: fix hang on tty device with no_room set' to make subject 
	 more obvious.

 drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..544f782b9a11 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set, otherwise, following race may occur:
+		 * n_tty_receive_buf_common() |n_tty_read()
+		 * chars_in_buffer() > 0      |
+		 *                            |copy_from_read_buf()->chars_in_buffer()==0
+		 *                            |if (ldata->no_room)
+		 * ldata->no_room = 1         |
+		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+		 * smp_mb is paired with smp_mb() in n_tty_read().
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (tail != ldata->read_tail)
+	if (tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read before setting read_tail,
+		 * otherwise, following race may occur:
+		 * n_tty_read()		                |n_tty_receive_buf_common()
+		 * if(ldata->no_room)->false            |
+		 *			                |ldata->no_room = 1
+		 *                                      |chars_in_buffer() > 0
+		 * ldata->read_tail = ldata->commit_head|
+		 * Then copy_from_read_buf() in reader consumes all the data
+		 * in read buffer, both reader and kworker will fail to kick
+		 * tty_buffer_restart_work().
+		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0


* Re: [PATCH v4] tty: fix hang on tty device with no_room set
  2022-06-15  3:45                   ` [PATCH v4] " cael
@ 2022-06-15  5:00                     ` Greg KH
  2022-06-15  7:57                       ` Ilpo Järvinen
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2022-06-15  5:00 UTC (permalink / raw)
  To: cael; +Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial

On Wed, Jun 15, 2022 at 11:45:10AM +0800, cael wrote:
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and blocked forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
>                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                     |room <= 0
> copy_from_read_buf()|
> n_tty_kick_worker() |
>                     |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and calls
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if write buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> This problem can be solved with a check for read buffer size inside
> n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> is true, a call to n_tty_kick_worker is necessary to keep flushing
> data to reader.
> 
> Signed-off-by: cael <juanfengpy@gmail.com>
> ---
> Patch changelogs between v1 and v2:
> 	-add barrier inside n_tty_read and n_tty_receive_buf_common;
> 	-comment why barrier is needed;
> 	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
> Patch changelogs between v2 and v3:
> 	-in function n_tty_receive_buf_common, add unlikely to check
> 	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
> 	 is removed here to get locality;
> 	-change comment for barrier to show the race condition to make
> 	 comment easier to understand;
> Patch changelogs between v3 and v4:
> 	-change subject from 'tty: fix a possible hang on tty device' to
> 	 'tty: fix hang on tty device with no_room set' to make subject 
> 	 more obvious.
> 
>  drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..544f782b9a11 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
>  	struct n_tty_data *ldata = tty->disc_data;
>  
>  	/* Did the input worker stop? Restart it */
> -	if (unlikely(ldata->no_room)) {
> -		ldata->no_room = 0;
> +	if (unlikely(READ_ONCE(ldata->no_room))) {
> +		WRITE_ONCE(ldata->no_room, 0);
>  
>  		WARN_RATELIMIT(tty->port->itty == NULL,
>  				"scheduling with invalid itty\n");
> @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>  			if (overflow && room < 0)
>  				ldata->read_head--;
>  			room = overflow;
> -			ldata->no_room = flow && !room;
> +			WRITE_ONCE(ldata->no_room, flow && !room);
>  		} else
>  			overflow = 0;
>  
> @@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>  	} else
>  		n_tty_check_throttle(tty);
>  
> +	if (unlikely(ldata->no_room)) {
> +		/*
> +		 * Barrier here is to ensure to read the latest read_tail in
> +		 * chars_in_buffer() and to make sure that read_tail is not loaded
> +		 * before ldata->no_room is set, otherwise, following race may occur:
> +		 * n_tty_receive_buf_common() |n_tty_read()
> +		 * chars_in_buffer() > 0      |
> +		 *                            |copy_from_read_buf()->chars_in_buffer()==0
> +		 *                            |if (ldata->no_room)
> +		 * ldata->no_room = 1         |
> +		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
> +		 * smp_mb is paired with smp_mb() in n_tty_read().
> +		 */
> +		smp_mb();
> +		if (!chars_in_buffer(tty))
> +			n_tty_kick_worker(tty);
> +	}
> +
>  	up_read(&tty->termios_rwsem);
>  
>  	return rcvd;
> @@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
>  		if (time)
>  			timeout = time;
>  	}
> -	if (tail != ldata->read_tail)
> +	if (tail != ldata->read_tail) {
> +		/*
> +		 * Make sure no_room is not read before setting read_tail,
> +		 * otherwise, following race may occur:
> +		 * n_tty_read()		                |n_tty_receive_buf_common()
> +		 * if(ldata->no_room)->false            |
> +		 *			                |ldata->no_room = 1
> +		 *                                      |char_in_buffer() > 0
> +		 * ldata->read_tail = ldata->commit_head|
> +		 * Then copy_from_read_buf() in reader consumes all the data
> +		 * in read buffer, both reader and kworker will fail to kick
> +		 * tty_buffer_restart_work().
> +		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
> +		 */
> +		smp_mb();
>  		n_tty_kick_worker(tty);
> +	}
>  	up_read(&tty->termios_rwsem);
>  
>  	remove_wait_queue(&tty->read_wait, &wait);
> -- 
> 2.27.0
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- It looks like you did not use your "real" name for the patch on either
  the Signed-off-by: line, or the From: line (both of which have to
  match).  Please read the kernel file, Documentation/SubmittingPatches
  for how to do this correctly.

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/SubmittingPatches for what needs to be done
  here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

* Re: [PATCH v4] tty: fix hang on tty device with no_room set
  2022-06-15  5:00                     ` Greg KH
@ 2022-06-15  7:57                       ` Ilpo Järvinen
  2022-06-15  9:29                         ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: Ilpo Järvinen @ 2022-06-15  7:57 UTC (permalink / raw)
  To: Greg KH; +Cc: cael, Jiri Slaby, benbjiang, robinlai, linux-serial

Hi Greg,

On Wed, 15 Jun 2022, Greg KH wrote:

> On Wed, Jun 15, 2022 at 11:45:10AM +0800, cael wrote:
> > We have met a hang on pty device, the reader was blocking
> > at epoll on master side, the writer was sleeping at wait_woken
> > inside n_tty_write on slave side, and the write buffer on
> > tty_port was full, we found that the reader and writer would
> > never be woken again and blocked forever.
> > 
> > The problem was caused by a race between reader and kworker:
> > n_tty_read(reader):  n_tty_receive_buf_common(kworker):
> >                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
> >                     |room <= 0
> > copy_from_read_buf()|
> > n_tty_kick_worker() |
> >                     |ldata->no_room = true
> > 
> > After writing to slave device, writer wakes up kworker to flush
> > data on tty_port to reader, and the kworker finds that reader
> > has no room to store data so room <= 0 is met. At this moment,
> > reader consumes all the data on reader buffer and calls
> > n_tty_kick_worker to check ldata->no_room which is false and
> > reader quits reading. Then kworker sets ldata->no_room=true
> > and quits too.
> > 
> > If write buffer is not full, writer will wake kworker to flush data
> > again after following writes, but if write buffer is full and writer
> > goes to sleep, kworker will never be woken again and tty device is
> > blocked.
> > 
> > This problem can be solved with a check for read buffer size inside
> > n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> > is true, a call to n_tty_kick_worker is necessary to keep flushing
> > data to reader.
> > 
> > Signed-off-by: cael <juanfengpy@gmail.com>
> > ---
> > Patch changelogs between v1 and v2:
> > 	-add barrier inside n_tty_read and n_tty_receive_buf_common;
> > 	-comment why barrier is needed;
> > 	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
> > Patch changelogs between v2 and v3:
> > 	-in function n_tty_receive_buf_common, add unlikely to check
> > 	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
> > 	 is removed here to get locality;
> > 	-change comment for barrier to show the race condition to make
> > 	 comment easier to understand;
> > Patch changelogs between v3 and v4:
> > 	-change subject from 'tty: fix a possible hang on tty device' to
> > 	 'tty: fix hang on tty device with no_room set' to make subject 
> > 	 more obvious.


> This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
> a patch that has triggered this response.  He used to manually respond
> to these common problems, but in order to save his sanity (he kept
> writing the same thing over and over, yet to different people), I was
> created.  Hopefully you will not take offence and will fix the problem
> in your patch and resubmit it so that it can be accepted into the Linux
> kernel tree.
> 
> You are receiving this message because of the following common error(s)
> as indicated below:

[...snip...]

> - This looks like a new version of a previously submitted patch, but you
>   did not list below the --- line any changes from the previous version.
>   Please read the section entitled "The canonical patch format" in the
>   kernel file, Documentation/SubmittingPatches for what needs to be done
>   here to properly describe this.

I think your bot's changelog heuristic got it wrong here. He provided
the list of changes as you can see above.

(The name thing might still be valid though, I've no idea which names are 
real and which are not).


-- 
 i.


* Re: [PATCH v4] tty: fix hang on tty device with no_room set
  2022-06-15  7:57                       ` Ilpo Järvinen
@ 2022-06-15  9:29                         ` Greg KH
  2022-06-15 11:17                           ` [PATCH v5] " cael
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2022-06-15  9:29 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: cael, Jiri Slaby, benbjiang, robinlai, linux-serial

On Wed, Jun 15, 2022 at 10:57:48AM +0300, Ilpo Järvinen wrote:
> Hi Greg,
> 
> On Wed, 15 Jun 2022, Greg KH wrote:
> 
> > On Wed, Jun 15, 2022 at 11:45:10AM +0800, cael wrote:
> > > We have met a hang on pty device, the reader was blocking
> > > at epoll on master side, the writer was sleeping at wait_woken
> > > inside n_tty_write on slave side, and the write buffer on
> > > tty_port was full, we found that the reader and writer would
> > > never be woken again and blocked forever.
> > > 
> > > The problem was caused by a race between reader and kworker:
> > > n_tty_read(reader):  n_tty_receive_buf_common(kworker):
> > >                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
> > >                     |room <= 0
> > > copy_from_read_buf()|
> > > n_tty_kick_worker() |
> > >                     |ldata->no_room = true
> > > 
> > > After writing to slave device, writer wakes up kworker to flush
> > > data on tty_port to reader, and the kworker finds that reader
> > > has no room to store data so room <= 0 is met. At this moment,
> > > reader consumes all the data on reader buffer and calls
> > > n_tty_kick_worker to check ldata->no_room which is false and
> > > reader quits reading. Then kworker sets ldata->no_room=true
> > > and quits too.
> > > 
> > > If write buffer is not full, writer will wake kworker to flush data
> > > again after following writes, but if write buffer is full and writer
> > > goes to sleep, kworker will never be woken again and tty device is
> > > blocked.
> > > 
> > > This problem can be solved with a check for read buffer size inside
> > > n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> > > is true, a call to n_tty_kick_worker is necessary to keep flushing
> > > data to reader.
> > > 
> > > Signed-off-by: cael <juanfengpy@gmail.com>
> > > ---
> > > Patch changelogs between v1 and v2:
> > > 	-add barrier inside n_tty_read and n_tty_receive_buf_common;
> > > 	-comment why barrier is needed;
> > > 	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
> > > Patch changelogs between v2 and v3:
> > > 	-in function n_tty_receive_buf_common, add unlikely to check
> > > 	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
> > > 	 is removed here to get locality;
> > > 	-change comment for barrier to show the race condition to make
> > > 	 comment easier to understand;
> > > Patch changelogs between v3 and v4:
> > > 	-change subject from 'tty: fix a possible hang on tty device' to
> > > 	 'tty: fix hang on tty device with no_room set' to make subject 
> > > 	 more obvious.
> 
> 
> > This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
> > a patch that has triggered this response.  He used to manually respond
> > to these common problems, but in order to save his sanity (he kept
> > writing the same thing over and over, yet to different people), I was
> > created.  Hopefully you will not take offence and will fix the problem
> > in your patch and resubmit it so that it can be accepted into the Linux
> > kernel tree.
> > 
> > You are receiving this message because of the following common error(s)
> > as indicated below:
> 
> [...snip...]
> 
> > - This looks like a new version of a previously submitted patch, but you
> >   did not list below the --- line any changes from the previous version.
> >   Please read the section entitled "The canonical patch format" in the
> >   kernel file, Documentation/SubmittingPatches for what needs to be done
> >   here to properly describe this.
> 
> I think your bot's changelog heuristic got it wrong here. He provided
> the list of changes as you can see above.

Ah, yeah, it didn't catch the changelog text at all, will go fix that
up...

> (The name thing might still be valid though, I've no idea which names are 
> real and which are not).

Yes, for the name reason alone we can't take this change :(

thanks,

greg k-h

* [PATCH v5] tty: fix hang on tty device with no_room set
  2022-06-15  9:29                         ` Greg KH
@ 2022-06-15 11:17                           ` cael
  2022-06-15 11:29                             ` Ilpo Järvinen
  2022-06-27 12:05                             ` Greg KH
  0 siblings, 2 replies; 33+ messages in thread
From: cael @ 2022-06-15 11:17 UTC (permalink / raw)
  To: gregkh
  Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial, juanfengpy

From: caelli <juanfengpy@gmail.com>

We hit a hang on a pty device: the reader was blocked in epoll
on the master side, the writer was sleeping in wait_woken()
inside n_tty_write() on the slave side, and the write buffer on
the tty_port was full. The reader and writer were never woken
again and blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After writing to the slave device, the writer wakes up the kworker
to flush data on the tty_port to the reader, and the kworker finds
that the read buffer has no room left, so room <= 0 is met. At this
moment, the reader consumes all the data in the read buffer and
calls n_tty_kick_worker(), which sees that ldata->no_room is still
false and does nothing, and the reader stops reading. Then the
kworker sets ldata->no_room = true and quits too.

If the write buffer is not full, the writer will wake the kworker
again on a following write. But if the write buffer is full and the
writer goes to sleep, the kworker is never woken again and the tty
device is blocked.

This problem can be solved by checking the read buffer size inside
n_tty_receive_buf_common(): if the read buffer is empty and
ldata->no_room is true, a call to n_tty_kick_worker() is needed to
keep flushing data to the reader.

Signed-off-by: caelli <juanfengpy@gmail.com>
---
Patch changelogs between v1 and v2:
	-add barrier inside n_tty_read and n_tty_receive_buf_common;
	-comment why barrier is needed;
	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
	-in function n_tty_receive_buf_common, add unlikely to check
	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
	 is removed here to get locality;
	-change comment for barrier to show the race condition to make
	 comment easier to understand;
Patch changelogs between v3 and v4:
	-change subject from 'tty: fix a possible hang on tty device' to
	 'tty: fix hang on tty device with no_room set' to make subject 
	 more obvious;
Patch changelogs between v4 and v5:
	-name is changed from cael to caelli, li is added as the family
	 name and caelli is the fullname.

 drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..544f782b9a11 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set, otherwise, following race may occur:
+		 * n_tty_receive_buf_common() |n_tty_read()
+		 * chars_in_buffer() > 0      |
+		 *                            |copy_from_read_buf()->chars_in_buffer()==0
+		 *                            |if (ldata->no_room)
+		 * ldata->no_room = 1         |
+		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+		 * smp_mb is paired with smp_mb() in n_tty_read().
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (tail != ldata->read_tail)
+	if (tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read before setting read_tail,
+		 * otherwise, following race may occur:
+		 * n_tty_read()		                |n_tty_receive_buf_common()
+		 * if(ldata->no_room)->false            |
+		 *			                |ldata->no_room = 1
+		 *                                      |chars_in_buffer() > 0
+		 * ldata->read_tail = ldata->commit_head|
+		 * Then copy_from_read_buf() in reader consumes all the data
+		 * in read buffer, both reader and kworker will fail to kick
+		 * tty_buffer_restart_work().
+		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0


* Re: [PATCH v5] tty: fix hang on tty device with no_room set
  2022-06-15 11:17                           ` [PATCH v5] " cael
@ 2022-06-15 11:29                             ` Ilpo Järvinen
  2022-06-15 13:33                               ` caelli
  2022-06-27 12:05                             ` Greg KH
  1 sibling, 1 reply; 33+ messages in thread
From: Ilpo Järvinen @ 2022-06-15 11:29 UTC (permalink / raw)
  To: cael; +Cc: Greg Kroah-Hartman, Jiri Slaby, benbjiang, robinlai, linux-serial

On Wed, 15 Jun 2022, cael wrote:

> From: caelli <juanfengpy@gmail.com>
> 
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and blocked forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
>                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                     |room <= 0
> copy_from_read_buf()|
> n_tty_kick_worker() |
>                     |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and calls
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
> 
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if write buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
> 
> This problem can be solved with a check for read buffer size inside
> n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> is true, a call to n_tty_kick_worker is necessary to keep flushing
> data to reader.
> 
> Signed-off-by: caelli <juanfengpy@gmail.com>
> ---
> Patch changelogs between v1 and v2:
> 	-add barrier inside n_tty_read and n_tty_receive_buf_common;
> 	-comment why barrier is needed;
> 	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
> Patch changelogs between v2 and v3:
> 	-in function n_tty_receive_buf_common, add unlikely to check
> 	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
> 	 is removed here to get locality;
> 	-change comment for barrier to show the race condition to make
> 	 comment easier to understand;
> Patch changelogs between v3 and v4:
> 	-change subject from 'tty: fix a possible hang on tty device' to
> 	 'tty: fix hang on tty device with no_room set' to make subject 
> 	 more obvious;
> Patch changelogs between v4 and v5:
> 	-name is changed from cael to caelli, li is added as the family
> 	 name and caelli is the fullname.
> 
>  drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..544f782b9a11 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
>  	struct n_tty_data *ldata = tty->disc_data;
>  
>  	/* Did the input worker stop? Restart it */
> -	if (unlikely(ldata->no_room)) {
> -		ldata->no_room = 0;
> +	if (unlikely(READ_ONCE(ldata->no_room))) {
> +		WRITE_ONCE(ldata->no_room, 0);
>  
>  		WARN_RATELIMIT(tty->port->itty == NULL,
>  				"scheduling with invalid itty\n");
> @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>  			if (overflow && room < 0)
>  				ldata->read_head--;
>  			room = overflow;
> -			ldata->no_room = flow && !room;
> +			WRITE_ONCE(ldata->no_room, flow && !room);
>  		} else
>  			overflow = 0;
>  
> @@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>  	} else
>  		n_tty_check_throttle(tty);
>  
> +	if (unlikely(ldata->no_room)) {
> +		/*
> +		 * Barrier here is to ensure to read the latest read_tail in
> +		 * chars_in_buffer() and to make sure that read_tail is not loaded
> +		 * before ldata->no_room is set, otherwise, following race may occur:
> +		 * n_tty_receive_buf_common() |n_tty_read()
> +		 * chars_in_buffer() > 0      |
> +		 *                            |copy_from_read_buf()->chars_in_buffer()==0
> +		 *                            |if (ldata->no_room)
> +		 * ldata->no_room = 1         |
> +		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
> +		 * smp_mb is paired with smp_mb() in n_tty_read().
> +		 */
> +		smp_mb();
> +		if (!chars_in_buffer(tty))
> +			n_tty_kick_worker(tty);
> +	}
> +
>  	up_read(&tty->termios_rwsem);
>  
>  	return rcvd;
> @@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
>  		if (time)
>  			timeout = time;
>  	}
> -	if (tail != ldata->read_tail)
> +	if (tail != ldata->read_tail) {
> +		/*
> +		 * Make sure no_room is not read before setting read_tail,
> +		 * otherwise, following race may occur:
> +		 * n_tty_read()		                |n_tty_receive_buf_common()
> +		 * if(ldata->no_room)->false            |
> +		 *			                |ldata->no_room = 1
> +		 *                                      |char_in_buffer() > 0
> +		 * ldata->read_tail = ldata->commit_head|
> +		 * Then copy_from_read_buf() in reader consumes all the data
> +		 * in read buffer, both reader and kworker will fail to kick
> +		 * tty_buffer_restart_work().
> +		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
> +		 */
> +		smp_mb();
>  		n_tty_kick_worker(tty);
> +	}
>  	up_read(&tty->termios_rwsem);
>  
>  	remove_wait_queue(&tty->read_wait, &wait);

I think the code looks fine. What I'm not entirely sure of is whether
there is supposed to be some other backup mechanism to handle this case.

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Note to Cael: you don't need to resend the patch just to add my
Reviewed-by; the tools will pick it up automatically. But if you need to
resend for other reasons, please add it in that case.


-- 
 i.

* Re: [PATCH v5] tty: fix hang on tty device with no_room set
  2022-06-15 11:29                             ` Ilpo Järvinen
@ 2022-06-15 13:33                               ` caelli
  0 siblings, 0 replies; 33+ messages in thread
From: caelli @ 2022-06-15 13:33 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Greg Kroah-Hartman, Jiri Slaby, benbjiang, robinlai, linux-serial

It seems to be done now; thanks for your feedback and help. The original
patch (without the barriers) was tested in our environment and appeared
to work. The main question is when to call n_tty_kick_worker(): calling
it periodically would also work, but the current solution seems more
reasonable and more obvious.
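
For readers following the thread, the "when to kick" argument can be
sketched in userspace. This is only a minimal model, assuming C11
atomics as a stand-in for the kernel primitives: struct ntty_model, the
model_*() helpers and MODEL_BUF_SIZE are hypothetical names, and
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). It is
not the kernel code. Each side first publishes its own store, then
issues a full fence, then looks at the other side's variable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BUF_SIZE 4096u

/* Hypothetical userspace model of the n_tty state touched by the patch. */
struct ntty_model {
	atomic_uint read_head;	/* advanced by the worker (producer) */
	atomic_uint read_tail;	/* advanced by the reader (consumer) */
	atomic_bool no_room;	/* worker stopped because the buffer was full */
	atomic_uint kicks;	/* how many times the worker was restarted */
};

static void model_init(struct ntty_model *m, unsigned int queued)
{
	assert(queued <= MODEL_BUF_SIZE);
	atomic_init(&m->read_head, queued);
	atomic_init(&m->read_tail, 0u);
	atomic_init(&m->no_room, false);
	atomic_init(&m->kicks, 0u);
}

static unsigned int model_chars_in_buffer(struct ntty_model *m)
{
	return atomic_load_explicit(&m->read_head, memory_order_relaxed) -
	       atomic_load_explicit(&m->read_tail, memory_order_relaxed);
}

/* Restart the worker only if it recorded that it had stopped. */
static void model_kick_worker(struct ntty_model *m)
{
	if (atomic_load_explicit(&m->no_room, memory_order_relaxed)) {
		atomic_store_explicit(&m->no_room, false, memory_order_relaxed);
		atomic_fetch_add_explicit(&m->kicks, 1u, memory_order_relaxed);
	}
}

/* Reader: publish the new read_tail, full fence, then test no_room. */
static void model_reader_side(struct ntty_model *m)
{
	unsigned int head =
		atomic_load_explicit(&m->read_head, memory_order_relaxed);

	atomic_store_explicit(&m->read_tail, head, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	model_kick_worker(m);
}

/* Worker: publish no_room, full fence, then re-check for an empty buffer. */
static void model_worker_side(struct ntty_model *m)
{
	atomic_store_explicit(&m->no_room, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	if (model_chars_in_buffer(m) == 0)
		model_kick_worker(m);
}
```

If model_reader_side() and model_worker_side() run on two threads, the
two sequentially consistent fences rule out the outcome where the
reader still sees no_room == false and the worker still sees a
non-empty buffer, so at least one side restarts the worker. Both sides
may kick in the same run; in this model that is treated as harmless.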

On Wed, Jun 15, 2022 at 19:29, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
>
> On Wed, 15 Jun 2022, cael wrote:
>
> > From: caelli <juanfengpy@gmail.com>
> >
> > We have met a hang on pty device, the reader was blocking
> > at epoll on master side, the writer was sleeping at wait_woken
> > inside n_tty_write on slave side, and the write buffer on
> > tty_port was full, we found that the reader and writer would
> > never be woken again and blocked forever.
> >
> > The problem was caused by a race between reader and kworker:
> > n_tty_read(reader):  n_tty_receive_buf_common(kworker):
> >                     |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
> >                     |room <= 0
> > copy_from_read_buf()|
> > n_tty_kick_worker() |
> >                     |ldata->no_room = true
> >
> > After writing to slave device, writer wakes up kworker to flush
> > data on tty_port to reader, and the kworker finds that reader
> > has no room to store data so room <= 0 is met. At this moment,
> > reader consumes all the data on reader buffer and calls
> > n_tty_kick_worker to check ldata->no_room which is false and
> > reader quits reading. Then kworker sets ldata->no_room=true
> > and quits too.
> >
> > If write buffer is not full, writer will wake kworker to flush data
> > again after following writes, but if write buffer is full and writer
> > goes to sleep, kworker will never be woken again and tty device is
> > blocked.
> >
> > This problem can be solved with a check for read buffer size inside
> > n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
> > is true, a call to n_tty_kick_worker is necessary to keep flushing
> > data to reader.
> >
> > Signed-off-by: caelli <juanfengpy@gmail.com>
> > ---
> > Patch changelogs between v1 and v2:
> >       -add barrier inside n_tty_read and n_tty_receive_buf_common;
> >       -comment why barrier is needed;
> >       -access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
> > Patch changelogs between v2 and v3:
> >       -in function n_tty_receive_buf_common, add unlikely to check
> >        ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
> >        is removed here to get locality;
> >       -change comment for barrier to show the race condition to make
> >        comment easier to understand;
> > Patch changelogs between v3 and v4:
> >       -change subject from 'tty: fix a possible hang on tty device' to
> >        'tty: fix hang on tty device with no_room set' to make subject
> >        more obvious;
> > Patch changelogs between v4 and v5:
> >       -name is changed from cael to caelli, li is added as the family
> >        name and caelli is the fullname.
> >
> >  drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 37 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> > index efc72104c840..544f782b9a11 100644
> > --- a/drivers/tty/n_tty.c
> > +++ b/drivers/tty/n_tty.c
> > @@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
> >       struct n_tty_data *ldata = tty->disc_data;
> >
> >       /* Did the input worker stop? Restart it */
> > -     if (unlikely(ldata->no_room)) {
> > -             ldata->no_room = 0;
> > +     if (unlikely(READ_ONCE(ldata->no_room))) {
> > +             WRITE_ONCE(ldata->no_room, 0);
> >
> >               WARN_RATELIMIT(tty->port->itty == NULL,
> >                               "scheduling with invalid itty\n");
> > @@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
> >                       if (overflow && room < 0)
> >                               ldata->read_head--;
> >                       room = overflow;
> > -                     ldata->no_room = flow && !room;
> > +                     WRITE_ONCE(ldata->no_room, flow && !room);
> >               } else
> >                       overflow = 0;
> >
> > @@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
> >       } else
> >               n_tty_check_throttle(tty);
> >
> > +     if (unlikely(ldata->no_room)) {
> > +             /*
> > +              * Barrier here is to ensure to read the latest read_tail in
> > +              * chars_in_buffer() and to make sure that read_tail is not loaded
> > +              * before ldata->no_room is set, otherwise, following race may occur:
> > +              * n_tty_receive_buf_common() |n_tty_read()
> > +              * chars_in_buffer() > 0      |
> > +              *                            |copy_from_read_buf()->chars_in_buffer()==0
> > +              *                            |if (ldata->no_room)
> > +              * ldata->no_room = 1         |
> > +              * Then both kworker and reader will fail to kick n_tty_kick_worker(),
> > +              * smp_mb is paired with smp_mb() in n_tty_read().
> > +              */
> > +             smp_mb();
> > +             if (!chars_in_buffer(tty))
> > +                     n_tty_kick_worker(tty);
> > +     }
> > +
> >       up_read(&tty->termios_rwsem);
> >
> >       return rcvd;
> > @@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
> >               if (time)
> >                       timeout = time;
> >       }
> > -     if (tail != ldata->read_tail)
> > +     if (tail != ldata->read_tail) {
> > +             /*
> > +              * Make sure no_room is not read before setting read_tail,
> > +              * otherwise, following race may occur:
> > +              * n_tty_read()                         |n_tty_receive_buf_common()
> > +              * if(ldata->no_room)->false            |
> > +              *                                      |ldata->no_room = 1
> > +              *                                      |char_in_buffer() > 0
> > +              * ldata->read_tail = ldata->commit_head|
> > +              * Then copy_from_read_buf() in reader consumes all the data
> > +              * in read buffer, both reader and kworker will fail to kick
> > +              * tty_buffer_restart_work().
> > +              * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
> > +              */
> > +             smp_mb();
> >               n_tty_kick_worker(tty);
> > +     }
> >       up_read(&tty->termios_rwsem);
> >
> >       remove_wait_queue(&tty->read_wait, &wait);
>
> I think the code looks fine. What I'm not entirely sure if there is
> supposed to be some other backup mechanism to handle this case.
>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> Note to Cael: you don't need to resend the patch just to add my reviewed
> by, it would be picked by the tools automatically. But if you need to
> resend due to other reasons, please add it in that case.
>
>
> --
>  i.

* Re: [PATCH v5] tty: fix hang on tty device with no_room set
  2022-06-15 11:17                           ` [PATCH v5] " cael
  2022-06-15 11:29                             ` Ilpo Järvinen
@ 2022-06-27 12:05                             ` Greg KH
  2022-06-27 13:53                               ` [PATCH v6] " juanfengpy
  2023-03-17  2:41                               ` [PATCH v7] " juanfengpy
  1 sibling, 2 replies; 33+ messages in thread
From: Greg KH @ 2022-06-27 12:05 UTC (permalink / raw)
  To: cael; +Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial

On Wed, Jun 15, 2022 at 07:17:01PM +0800, cael wrote:
> From: caelli <juanfengpy@gmail.com>

Can you please use the name you use to sign legal documents as the
number of different names being used and written here is totally
confusing.  Please use your native language if that's an issue, that
would be best.

Also, why not use your corporate email address as the From: and
signed-off-by line also?  That is preferred, and in this case, I'm going
to ask for it given the confusion that has happened so far in this
thread in the past.

thanks,

greg k-h

* [PATCH v6] tty: fix hang on tty device with no_room set
  2022-06-27 12:05                             ` Greg KH
@ 2022-06-27 13:53                               ` juanfengpy
  2023-03-17  2:41                               ` [PATCH v7] " juanfengpy
  1 sibling, 0 replies; 33+ messages in thread
From: juanfengpy @ 2022-06-27 13:53 UTC (permalink / raw)
  To: gregkh
  Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial,
	juanfengpy, caelli

From: caelli <caelli@tencent.com>

We hit a hang on a pty device: the reader was blocked in epoll on the
master side, the writer was sleeping in wait_woken() inside
n_tty_write() on the slave side, and the write buffer on the tty_port
was full. Neither the reader nor the writer was ever woken again, so
both blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
copy_from_read_buf()|
n_tty_kick_worker() |
                    |ldata->no_room = true

After a write to the slave device, the writer wakes the kworker to
flush data from the tty_port to the reader. The kworker finds that the
reader has no room to store data, so room <= 0. At that moment the
reader consumes all the data in the read buffer and calls
n_tty_kick_worker(), which finds ldata->no_room still false and does
nothing, and the reader stops reading. Then the kworker sets
ldata->no_room = true and exits too.

If the write buffer is not full, the writer wakes the kworker again on
a following write and the data is flushed. But if the write buffer is
full and the writer has gone to sleep, the kworker is never woken again
and the tty device stays blocked.

Fix this by checking the read buffer in n_tty_receive_buf_common(): if
the read buffer is empty and ldata->no_room is true, call
n_tty_kick_worker() to keep flushing data to the reader.

Signed-off-by: caelli <caelli@tencent.com>
---
Patch changelogs between v1 and v2:
	-add barrier inside n_tty_read and n_tty_receive_buf_common;
	-comment why barrier is needed;
	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
	-in function n_tty_receive_buf_common, add unlikely to check
	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
	 is removed here to get locality;
	-change comment for barrier to show the race condition to make
	 comment easier to understand;
Patch changelogs between v3 and v4:
	-change subject from 'tty: fix a possible hang on tty device' to
	 'tty: fix hang on tty device with no_room set' to make subject 
	 more obvious;
Patch changelogs between v4 and v5:
	-name is changed from cael to caelli, li is added as the family
	 name and caelli is the fullname.
Patch changelogs between v5 and v6:
	-change From and Signed-off-by from 'caelli <juanfengpy@gmail.com>'
	 to 'caelli <caelli@tencent.com>'; the latter is my corporate address.
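
Not part of the patch: the interleaving from the commit message can be
replayed deterministically in userspace. The sketch below is a
simplified, single-threaded model under stated assumptions: struct
ldisc_model, the helper names and MODEL_BUF_SIZE are hypothetical, and
no_room is set directly from room <= 0 instead of the kernel's
flow/overflow logic. It only shows why the extra empty-buffer check in
the worker restarts the flush.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BUF_SIZE 4096u

/* Hypothetical, simplified stand-in for the n_tty_data fields involved. */
struct ldisc_model {
	unsigned int read_head;	/* producer index (worker) */
	unsigned int read_tail;	/* consumer index (reader) */
	bool no_room;		/* worker stopped because the buffer was full */
	unsigned int restarts;	/* times the worker was restarted */
};

/* Models n_tty_kick_worker(): restart only if the worker said it stopped. */
static void model_kick_worker(struct ldisc_model *m)
{
	if (m->no_room) {
		m->no_room = false;
		m->restarts++;
	}
}

/* Worker, first half: compute the free room in the read buffer. */
static int worker_compute_room(const struct ldisc_model *m)
{
	return (int)(MODEL_BUF_SIZE - (m->read_head - m->read_tail));
}

/* Reader: consume everything, then try to kick the worker. */
static void reader_consume_all(struct ldisc_model *m)
{
	m->read_tail = m->read_head;
	model_kick_worker(m);
}

/* Worker, second half: record no_room; with the fix, re-check emptiness. */
static void worker_finish(struct ldisc_model *m, int room, bool with_fix)
{
	m->no_room = (room <= 0);
	if (with_fix && m->no_room && m->read_head == m->read_tail)
		model_kick_worker(m);
}

/* Replays the race from the commit message; returns the restart count. */
static unsigned int replay_race(bool with_fix)
{
	struct ldisc_model m = { .read_head = MODEL_BUF_SIZE };
	int room = worker_compute_room(&m);	/* buffer is full */

	assert(room <= 0);
	reader_consume_all(&m);	/* no_room not set yet: nothing happens */
	worker_finish(&m, room, with_fix);
	return m.restarts;
}
```

Without the re-check, the replay ends with an empty read buffer,
no_room set and no restart, which matches the hang described above.
With the re-check, the worker restarts itself once.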

 drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index efc72104c840..544f782b9a11 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -201,8 +201,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1632,7 +1632,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1663,6 +1663,24 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set, otherwise, following race may occur:
+		 * n_tty_receive_buf_common() |n_tty_read()
+		 * chars_in_buffer() > 0      |
+		 *                            |copy_from_read_buf()->chars_in_buffer()==0
+		 *                            |if (ldata->no_room)
+		 * ldata->no_room = 1         |
+		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+		 * smp_mb is paired with smp_mb() in n_tty_read().
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2180,8 +2198,23 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (tail != ldata->read_tail)
+	if (tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read before setting read_tail,
+		 * otherwise, following race may occur:
+		 * n_tty_read()		                |n_tty_receive_buf_common()
+		 * if(ldata->no_room)->false            |
+		 *			                |ldata->no_room = 1
+		 *                                      |chars_in_buffer() > 0
+		 * ldata->read_tail = ldata->commit_head|
+		 * Then copy_from_read_buf() in reader consumes all the data
+		 * in read buffer, both reader and kworker will fail to kick
+		 * tty_buffer_restart_work().
+		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0


* [PATCH v7] tty: fix hang on tty device with no_room set
  2022-06-27 12:05                             ` Greg KH
  2022-06-27 13:53                               ` [PATCH v6] " juanfengpy
@ 2023-03-17  2:41                               ` juanfengpy
  2023-03-17  6:32                                 ` Jiri Slaby
  1 sibling, 1 reply; 33+ messages in thread
From: juanfengpy @ 2023-03-17  2:41 UTC (permalink / raw)
  To: gregkh
  Cc: jirislaby, ilpo.jarvinen, benbjiang, robinlai, linux-serial,
	juanfengpy, Hui Li, stable

From: Hui Li <caelli@tencent.com>

We hit a hang on a pty device: the reader was blocked in epoll on the
master side, the writer was sleeping in wait_woken() inside
n_tty_write() on the slave side, and the write buffer on the tty_port
was full. Neither the reader nor the writer was ever woken again, so
both blocked forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
copy_from_read_buf()|
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
n_tty_kick_worker() |
                    |ldata->no_room = true

After a write to the slave device, the writer wakes the kworker to
flush data from the tty_port to the reader. The kworker finds that the
reader has no room to store data, so room <= 0. At that moment the
reader consumes all the data in the read buffer and calls
n_tty_kick_worker(), which finds ldata->no_room still false and does
nothing, and the reader stops reading. Then the kworker sets
ldata->no_room = true and exits too.

If the write buffer is not full, the writer wakes the kworker again on
a following write and the data is flushed. But if the write buffer is
full and the writer has gone to sleep, the kworker is never woken again
and the tty device stays blocked.

Fix this by checking the read buffer in n_tty_receive_buf_common(): if
the read buffer is empty and ldata->no_room is true, call
n_tty_kick_worker() to keep flushing data to the reader.

Cc: <stable@vger.kernel.org>
Fixes: 42458f41d08f ("n_tty: Ensure reader restarts worker for next reader")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Hui Li <caelli@tencent.com>
---
Patch changelogs between v1 and v2:
	-add barrier inside n_tty_read and n_tty_receive_buf_common;
	-comment why barrier is needed;
	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
	-in function n_tty_receive_buf_common, add unlikely to check
	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
	 is removed here to get locality;
	-change comment for barrier to show the race condition to make
	 comment easier to understand;
Patch changelogs between v3 and v4:
	-change subject from 'tty: fix a possible hang on tty device' to
	 'tty: fix hang on tty device with no_room set' to make subject 
	 more obvious;
Patch changelogs between v4 and v5:
	-name is changed from cael to caelli, li is added as the family
	 name and caelli is the fullname.
Patch changelogs between v5 and v6:
	-change From and Signed-off-by from 'caelli <juanfengpy@gmail.com>'
	 to 'caelli <caelli@tencent.com>'; the latter is my corporate address.
Patch changelogs between v6 and v7:
	-change name from caelli to 'Hui Li', which is my name in Chinese.
	-the comment for the barrier is improved, and Fixes and Reviewed-by
	 tags are added.

 drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c8f56c9b1a1c..8c17304fffcf 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -204,8 +204,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1698,7 +1698,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1729,6 +1729,27 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set, otherwise, following race may occur:
+		 * n_tty_receive_buf_common()
+		 *				n_tty_read()
+		 *   if (!chars_in_buffer(tty))->false
+		 *				  copy_from_read_buf()
+		 *				    read_tail=commit_head
+		 *				  n_tty_kick_worker()
+		 *				    if (ldata->no_room)->false
+		 *   ldata->no_room = 1
+		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+		 * smp_mb is paired with smp_mb() in n_tty_read().
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2282,8 +2303,25 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (old_tail != ldata->read_tail)
+	if (old_tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read in n_tty_kick_worker()
+		 * before setting ldata->read_tail in copy_from_read_buf(),
+		 * otherwise, following race may occur:
+		 * n_tty_read()
+		 *			n_tty_receive_buf_common()
+		 *   n_tty_kick_worker()
+		 *     if(ldata->no_room)->false
+		 *			  ldata->no_room = 1
+		 *			  if (!chars_in_buffer(tty))->false
+		 *   copy_from_read_buf()
+		 *     read_tail=commit_head
+		 * Both reader and kworker will fail to kick tty_buffer_restart_work(),
+		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0


* Re: [PATCH v7] tty: fix hang on tty device with no_room set
  2023-03-17  2:41                               ` [PATCH v7] " juanfengpy
@ 2023-03-17  6:32                                 ` Jiri Slaby
  2023-03-17  7:25                                   ` [PATCH v8] " juanfengpy
  0 siblings, 1 reply; 33+ messages in thread
From: Jiri Slaby @ 2023-03-17  6:32 UTC (permalink / raw)
  To: juanfengpy, gregkh
  Cc: ilpo.jarvinen, benbjiang, robinlai, linux-serial, Hui Li, stable

On 17. 03. 23, 3:41, juanfengpy@gmail.com wrote:
> From: Hui Li <caelli@tencent.com>
> 
> We have met a hang on pty device, the reader was blocking
> at epoll on master side, the writer was sleeping at wait_woken
> inside n_tty_write on slave side, and the write buffer on
> tty_port was full, we found that the reader and writer would
> never be woken again and blocked forever.
> 
> The problem was caused by a race between reader and kworker:
> n_tty_read(reader):  n_tty_receive_buf_common(kworker):
> copy_from_read_buf()|
>                      |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                      |room <= 0
> n_tty_kick_worker() |
>                      |ldata->no_room = true
> 
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and calls
> n_tty_kick_worker to check ldata->no_room which is false and
> reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
...
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
...
> @@ -1729,6 +1729,27 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>   	} else
>   		n_tty_check_throttle(tty);
>   
> +	if (unlikely(ldata->no_room)) {
> +		/*
> +		 * Barrier here is to ensure to read the latest read_tail in
> +		 * chars_in_buffer() and to make sure that read_tail is not loaded
> +		 * before ldata->no_room is set,


I am not sure I would keep the following part of the comment in the code:

 > otherwise, following race may occur:
> +		 * n_tty_receive_buf_common()
> +		 *				n_tty_read()
> +		 *   if (!chars_in_buffer(tty))->false
> +		 *				  copy_from_read_buf()
> +		 *				    read_tail=commit_head
> +		 *				  n_tty_kick_worker()
> +		 *				    if (ldata->no_room)->false
> +		 *   ldata->no_room = 1
> +		 * Then both kworker and reader will fail to kick n_tty_kick_worker(),
> +		 * smp_mb is paired with smp_mb() in n_tty_read().

I would only keep that ^^^ documented in the commit log, as you did.

> +		 */
> +		smp_mb();
> +		if (!chars_in_buffer(tty))
> +			n_tty_kick_worker(tty);
> +	}
> +
>   	up_read(&tty->termios_rwsem);
>   
>   	return rcvd;
> @@ -2282,8 +2303,25 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
>   		if (time)
>   			timeout = time;
>   	}
> -	if (old_tail != ldata->read_tail)
> +	if (old_tail != ldata->read_tail) {
> +		/*
> +		 * Make sure no_room is not read in n_tty_kick_worker()
> +		 * before setting ldata->read_tail in copy_from_read_buf(),

The same here (it's only repeated). I think the above two lines are 
enough for the comment. We have git blame after all.

> +		 * otherwise, following race may occur:
> +		 * n_tty_read()
> +		 *			n_tty_receive_buf_common()
> +		 *   n_tty_kick_worker()
> +		 *     if(ldata->no_room)->false
> +		 *			  ldata->no_room = 1
> +		 *			  if (!chars_in_buffer(tty))->false
> +		 *   copy_from_read_buf()
> +		 *     read_tail=commit_head
> +		 * Both reader and kworker will fail to kick tty_buffer_restart_work(),
> +		 * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
> +		 */
> +		smp_mb();
>   		n_tty_kick_worker(tty);
> +	}
>   	up_read(&tty->termios_rwsem);
>   
>   	remove_wait_queue(&tty->read_wait, &wait);

-- 
js


* [PATCH v8] tty: fix hang on tty device with no_room set
  2023-03-17  6:32                                 ` Jiri Slaby
@ 2023-03-17  7:25                                   ` juanfengpy
  2023-04-06  2:44                                     ` [PATCH v9] " juanfengpy
  0 siblings, 1 reply; 33+ messages in thread
From: juanfengpy @ 2023-03-17  7:25 UTC (permalink / raw)
  To: jirislaby, gregkh
  Cc: ilpo.jarvinen, benbjiang, robinlai, linux-serial, juanfengpy,
	Hui Li, stable

From: Hui Li <caelli@tencent.com>

We hit a hang on a pty device: the reader was blocked in epoll on the
master side, the writer was sleeping in wait_woken() inside
n_tty_write() on the slave side, and the write buffer on the tty_port
was full. The reader and the writer were never woken again and blocked
forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
copy_from_read_buf()|
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
n_tty_kick_worker() |
                    |ldata->no_room = true

After a write to the slave device, the writer wakes up the kworker to
flush data from the tty_port to the reader. The kworker finds that the
read buffer has no room left, so room <= 0 holds. At this moment, the
reader consumes all the data in the read buffer and calls
n_tty_kick_worker(), which sees that ldata->no_room is still false and
does nothing, so the reader stops reading. Then the kworker sets
ldata->no_room = true and quits too.

If the write buffer is not full, the writer wakes the kworker again on
a subsequent write and the data gets flushed. But if the write buffer
is full and the writer has gone to sleep, the kworker is never woken
again and the tty device is blocked.

Fix this by checking the read buffer inside n_tty_receive_buf_common():
if the read buffer is empty and ldata->no_room is true, call
n_tty_kick_worker() so that data keeps being flushed to the reader.

Cc: <stable@vger.kernel.org>
Fixes: 42458f41d08f ("n_tty: Ensure reader restarts worker for next reader")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Hui Li <caelli@tencent.com>
---
Patch changelogs between v1 and v2:
	-add barrier inside n_tty_read and n_tty_receive_buf_common;
	-comment why barrier is needed;
	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
	-in function n_tty_receive_buf_common, add unlikely to check
	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
	 is removed here to get locality;
	-change comment for barrier to show the race condition to make
	 comment easier to understand;
Patch changelogs between v3 and v4:
	-change subject from 'tty: fix a possible hang on tty device' to
	 'tty: fix hang on tty device with no_room set' to make subject 
	 more obvious;
Patch changelogs between v4 and v5:
	-name is changed from cael to caelli, li is added as the family
	 name and caelli is the fullname.
Patch changelogs between v5 and v6:
	-change From and Signed-off-by from 'caelli <juanfengpy@gmail.com>'
	 to 'caelli <caelli@tencent.com>'; the latter is my corporate address.
Patch changelogs between v6 and v7:
	-change name from caelli to 'Hui Li', which is my name in Chinese.
	-the comment for the barrier is improved, and Fixes and Reviewed-by
	 tags are added.
Patch changelogs between v7 and v8:
	-Simplify the comments for barriers.

 drivers/tty/n_tty.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c8f56c9b1a1c..4dff2f34e2d0 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -204,8 +204,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1698,7 +1698,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1729,6 +1729,17 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set.
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2282,8 +2293,14 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (old_tail != ldata->read_tail)
+	if (old_tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read in n_tty_kick_worker()
+		 * before setting ldata->read_tail in copy_from_read_buf().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0
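
The interleaving described in the commit message above can be replayed
deterministically in user space. The following is only a minimal sketch
under stated assumptions, not kernel code: struct ldisc_model, BUF_SIZE,
kick_worker() and replay_race() are hypothetical names made up for this
illustration, and the "worker" and "reader" steps are executed in one
thread in exactly the order the commit message gives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE 4	/* hypothetical stand-in for N_TTY_BUF_SIZE */

/* Hypothetical, heavily reduced model of the n_tty state involved. */
struct ldisc_model {
	size_t read_head;	/* advanced by the worker */
	size_t read_tail;	/* advanced by the reader */
	bool no_room;		/* worker stopped: read buffer was full */
	bool worker_queued;	/* a restart of the worker was scheduled */
	size_t pending;		/* bytes still waiting on the port side */
};

static size_t chars_in_buffer(const struct ldisc_model *m)
{
	return m->read_head - m->read_tail;
}

/* Models n_tty_kick_worker(): only restarts a worker that has stopped. */
static void kick_worker(struct ldisc_model *m)
{
	if (m->no_room) {
		m->no_room = false;
		m->worker_queued = true;
	}
}

/*
 * Replays the race from the commit message step by step.
 * Returns true if the model ends up hung: data is still pending,
 * the read buffer is empty, and nobody scheduled the worker again.
 */
static bool replay_race(bool with_fix)
{
	struct ldisc_model m = {
		.read_head = BUF_SIZE,	/* read buffer starts out full */
		.read_tail = 0,
		.no_room = false,
		.worker_queued = false,
		.pending = 1,
	};

	/* worker: computes room and finds none */
	size_t room = BUF_SIZE - chars_in_buffer(&m);
	assert(room == 0);

	/* reader: drains the buffer, then kicks; no_room is still false */
	m.read_tail = m.read_head;
	kick_worker(&m);

	/* worker: only now records that it stopped for lack of room */
	m.no_room = true;

	/* the fix: worker re-checks the buffer after setting no_room */
	if (with_fix && m.no_room && chars_in_buffer(&m) == 0)
		kick_worker(&m);

	return m.pending > 0 && !m.worker_queued;
}
```

In this model replay_race(false) reports a hang, because the reader's
kick happens before no_room is set and nothing runs afterwards, while
replay_race(true) does not, because the worker notices the empty buffer
and restarts itself. The single-threaded replay says nothing about memory
ordering; that is what the smp_mb() calls in the patch are for.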



* [PATCH v9] tty: fix hang on tty device with no_room set
  2023-03-17  7:25                                   ` [PATCH v8] " juanfengpy
@ 2023-04-06  2:44                                     ` juanfengpy
  0 siblings, 0 replies; 33+ messages in thread
From: juanfengpy @ 2023-04-06  2:44 UTC (permalink / raw)
  To: jirislaby, gregkh
  Cc: ilpo.jarvinen, bagasdotme, benbjiang, robinlai, linux-serial,
	juanfengpy, Hui Li, stable

From: Hui Li <caelli@tencent.com>

A pty device can hang in the following case: the reader is blocked in
epoll on the master side, the writer is sleeping in wait_woken() inside
n_tty_write() on the slave side, and the write buffer on the tty_port
is full. The reader and the writer are then never woken again and block
forever.

The problem was caused by a race between reader and kworker:
n_tty_read(reader):  n_tty_receive_buf_common(kworker):
copy_from_read_buf()|
                    |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
                    |room <= 0
n_tty_kick_worker() |
                    |ldata->no_room = true

After a write to the slave device, the writer wakes up the kworker to
flush data from the tty_port to the reader. The kworker finds that the
read buffer has no room left, so room <= 0 holds. At this moment, the
reader consumes all the data in the read buffer and calls
n_tty_kick_worker(), which sees that ldata->no_room is still false and
does nothing, so the reader stops reading. Then the kworker sets
ldata->no_room = true and quits too.

If the write buffer is not full, the writer wakes the kworker again on
a subsequent write and the data gets flushed. But if the write buffer
is full and the writer has gone to sleep, the kworker is never woken
again and the tty device is blocked.

Fix this by checking the read buffer inside n_tty_receive_buf_common():
if the read buffer is empty and ldata->no_room is true, call
n_tty_kick_worker() so that data keeps being flushed to the reader.

Cc: <stable@vger.kernel.org>
Fixes: 42458f41d08f ("n_tty: Ensure reader restarts worker for next reader")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Hui Li <caelli@tencent.com>
---
Patch changelogs between v1 and v2:
	-add barrier inside n_tty_read and n_tty_receive_buf_common;
	-comment why barrier is needed;
	-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
	-in function n_tty_receive_buf_common, add unlikely to check
	 ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
	 is removed here to get locality;
	-change comment for barrier to show the race condition to make
	 comment easier to understand;
Patch changelogs between v3 and v4:
	-change subject from 'tty: fix a possible hang on tty device' to
	 'tty: fix hang on tty device with no_room set' to make subject 
	 more obvious;
Patch changelogs between v4 and v5:
	-name is changed from cael to caelli, li is added as the family
	 name and caelli is the fullname.
Patch changelogs between v5 and v6:
	-change From and Signed-off-by from 'caelli <juanfengpy@gmail.com>'
	 to 'caelli <caelli@tencent.com>'; the latter is my corporate address.
Patch changelogs between v6 and v7:
	-change name from caelli to 'Hui Li', which is my name in Chinese.
	-the comment for the barrier is improved, and Fixes and Reviewed-by
	 tags are added.
Patch changelogs between v7 and v8:
	-Simplify the comments for barriers.
Patch changelogs between v8 and v9:
	-change the commit messages as suggested by Bagas Sanjaya.

 drivers/tty/n_tty.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c8f56c9b1a1c..4dff2f34e2d0 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -204,8 +204,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
 	struct n_tty_data *ldata = tty->disc_data;
 
 	/* Did the input worker stop? Restart it */
-	if (unlikely(ldata->no_room)) {
-		ldata->no_room = 0;
+	if (unlikely(READ_ONCE(ldata->no_room))) {
+		WRITE_ONCE(ldata->no_room, 0);
 
 		WARN_RATELIMIT(tty->port->itty == NULL,
 				"scheduling with invalid itty\n");
@@ -1698,7 +1698,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 			if (overflow && room < 0)
 				ldata->read_head--;
 			room = overflow;
-			ldata->no_room = flow && !room;
+			WRITE_ONCE(ldata->no_room, flow && !room);
 		} else
 			overflow = 0;
 
@@ -1729,6 +1729,17 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+	if (unlikely(ldata->no_room)) {
+		/*
+		 * Barrier here is to ensure to read the latest read_tail in
+		 * chars_in_buffer() and to make sure that read_tail is not loaded
+		 * before ldata->no_room is set.
+		 */
+		smp_mb();
+		if (!chars_in_buffer(tty))
+			n_tty_kick_worker(tty);
+	}
+
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
@@ -2282,8 +2293,14 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		if (time)
 			timeout = time;
 	}
-	if (old_tail != ldata->read_tail)
+	if (old_tail != ldata->read_tail) {
+		/*
+		 * Make sure no_room is not read in n_tty_kick_worker()
+		 * before setting ldata->read_tail in copy_from_read_buf().
+		 */
+		smp_mb();
 		n_tty_kick_worker(tty);
+	}
 	up_read(&tty->termios_rwsem);
 
 	remove_wait_queue(&tty->read_wait, &wait);
-- 
2.27.0
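
The two smp_mb() calls in the patch above form a "store, full barrier,
load" pairing: the reader stores read_tail and then loads no_room, while
the worker stores no_room and then loads read_tail. With a full barrier
on both sides, at least one of them must observe the other's store. A
rough user-space analogue with C11 atomics might look like the sketch
below. It assumes atomic_thread_fence(memory_order_seq_cst) as a stand-in
for smp_mb() and relaxed accesses as a stand-in for READ_ONCE() and
WRITE_ONCE(); struct shared, reader_side(), worker_side() and
kicks_in_order() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reduced state shared by the reader and the worker. */
struct shared {
	atomic_uint read_head;
	atomic_uint read_tail;
	atomic_bool no_room;
};

/* Models n_tty_kick_worker(): returns true if it restarts the worker. */
static bool kick(struct shared *s)
{
	if (atomic_load_explicit(&s->no_room, memory_order_relaxed)) {
		atomic_store_explicit(&s->no_room, false, memory_order_relaxed);
		return true;
	}
	return false;
}

/* Reader: publish the new read_tail, full fence, then look at no_room. */
static bool reader_side(struct shared *s)
{
	unsigned int head =
		atomic_load_explicit(&s->read_head, memory_order_relaxed);

	atomic_store_explicit(&s->read_tail, head, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_mb() */
	return kick(s);
}

/* Worker: publish no_room, full fence, then look at read_tail. */
static bool worker_side(struct shared *s)
{
	unsigned int head, tail;

	atomic_store_explicit(&s->no_room, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_mb() */
	head = atomic_load_explicit(&s->read_head, memory_order_relaxed);
	tail = atomic_load_explicit(&s->read_tail, memory_order_relaxed);
	if (head == tail)	/* buffer already drained: kick ourselves */
		return kick(s);
	return false;
}

/* Runs both sides back to back in one thread; returns the kick count. */
static int kicks_in_order(bool worker_first)
{
	struct shared s;
	int kicks = 0;

	atomic_init(&s.read_head, 4u);	/* buffer starts out non-empty */
	atomic_init(&s.read_tail, 0u);
	atomic_init(&s.no_room, false);

	if (worker_first) {
		kicks += worker_side(&s);
		kicks += reader_side(&s);
	} else {
		kicks += reader_side(&s);
		kicks += worker_side(&s);
	}
	assert(!atomic_load(&s.no_room));	/* the flag was consumed */
	return kicks;
}
```

kicks_in_order() only exercises the two sequential orderings, and in
both of them exactly one side restarts the worker. The fences matter when
reader_side() and worker_side() run on different CPUs at the same time:
without them, each side's load could be satisfied before its own store
becomes visible, and both could decide not to kick.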



Thread overview: 33+ messages
2022-05-24  2:21 tty: fix a possible hang on tty device cael
2022-05-24  9:11 ` Ilpo Järvinen
2022-05-24 11:09   ` cael
2022-05-24 11:40     ` Ilpo Järvinen
2022-05-24 12:47       ` cael
2022-05-24 13:25         ` Ilpo Järvinen
2022-05-25 10:36           ` cael
2022-05-25 11:21             ` Ilpo Järvinen
2022-05-30 13:13               ` cael
2022-05-31 12:37                 ` Ilpo Järvinen
2022-06-01  9:38 ` Greg KH
2022-06-01 13:39   ` cael
2022-06-01 14:47     ` Greg KH
2022-06-01 15:28     ` Ilpo Järvinen
2022-06-06 13:40       ` cael
2022-06-06 14:43         ` Greg KH
2022-06-11  6:50           ` cael
2022-06-11  7:32             ` Greg KH
2022-06-13 12:30               ` [PATCH v3] tty: fix hang on tty device with no_room set juanfengpy
2022-06-13 17:20                 ` Greg KH
2022-06-15  3:45                   ` [PATCH v4] " cael
2022-06-15  5:00                     ` Greg KH
2022-06-15  7:57                       ` Ilpo Järvinen
2022-06-15  9:29                         ` Greg KH
2022-06-15 11:17                           ` [PATCH v5] " cael
2022-06-15 11:29                             ` Ilpo Järvinen
2022-06-15 13:33                               ` caelli
2022-06-27 12:05                             ` Greg KH
2022-06-27 13:53                               ` [PATCH v6] " juanfengpy
2023-03-17  2:41                               ` [PATCH v7] " juanfengpy
2023-03-17  6:32                                 ` Jiri Slaby
2023-03-17  7:25                                   ` [PATCH v8] " juanfengpy
2023-04-06  2:44                                     ` [PATCH v9] " juanfengpy
