linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] drivers:tty:pty: Fix a race causing data loss on close
@ 2020-10-02 13:03 minyard
  2020-10-05 11:31 ` Greg Kroah-Hartman
  2020-10-12  7:13 ` Jiri Slaby
  0 siblings, 2 replies; 4+ messages in thread
From: minyard @ 2020-10-02 13:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jiri Slaby; +Cc: LKML, Corey Minyard

From: Corey Minyard <cminyard@mvista.com>

If you write to a pty master an immediately close the pty master, the
receiver might get a chunk of data dropped, but then receive some later
data.  That's obviously something rather unexpected for a user.  It
certainly confused my test program.

It turns out that tty_vhangup() gets called from pty_close(), and that
causes the data on the slave side to be flushed, but due to races more
data can be copied into the slave side's buffer after that.  Consider
the following sequence:

thread1          thread2         thread3
-------          -------         -------
 |                |-write data into buffer,
 |                | n_tty buffer is filled
 |                | along with other buffers
 |                |-pty_close()
 |                |--tty_vhangup()
 |                |---tty_ldisc_hangup()
 |                |----n_tty_flush_buffer()
 |                |-----reset_buffer_flags()
 |-n_tty_read()   |
 |--up_read(&tty->termios_rwsem);
 |                |------down_read(&tty->termios_rwsem)
 |                |------clear n_tty buffer contents
 |                |------up_read(&tty->termios_rwsem)
 |--tty_buffer_flush_work()       |
 |--schedules work calling        |
 |  flush_to_ldisc()              |
 |                                |-flush_to_ldisc()
 |                                |--receive_buf()
 |                                |---tty_port_default_receive_buf()
 |                                |----tty_ldisc_receive_buf()
 |                                |-----n_tty_receive_buf2()
 |                                |------n_tty_receive_buf_common()
 |                                |-------down_read(&tty->termios_rwsem)
 |                                |-------__receive_buf()
 |                                |-------copies data into n_tty buffer
 |                                |-------up_read(&tty->termios_rwsem)
 |--down_read(&tty->termios_rwsem)
 |--copy buffer data to user

From this sequence, you can see that thread2 writes to the buffer then
only clears the part of the buffer in n_tty.  The n_tty receive buffer
code then copies more data into the n_tty buffer.

This change checks to see if the tty is being hung up before copying
anything in n_tty_receive_buf_common().  It has to be done after the
tty->termios_rwsem semaphore is claimed, for reasons that should be
apparent from the sequence above.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
---

Changes since v1: Added lines to make the sequence diagram clearer.

I sent a program to reproduce this, I extended it to prove it wasn't the
test program that caused the issue, and I've uploaded it to:
  http://sourceforge.net/projects/ser2net/files/tmp/testpty.c
if you want to run it.  It has a lot of comments that explain what is
going on.

This is not a very satisfying fix, though.  It works reliably, but it
doesn't seem right to me.  My inclination was to remove the up and down
semaphore around tty_buffer_flush_work() in n_tty_read(), as it just
schedules some work, no need to unlock for that.  But that resulted
in a deadlock elsewhere, so that up/down on the semaphore is there for
another reason.

The locking in the tty code is really hard to follow.  I believe this is
actually a locking problem, but fixing it looks daunting to me.

Another way to fix this that occurred just occurred to me would be to
clear all the buffers in pty_close().

-corey


 drivers/tty/n_tty.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 1794d84e7bf6..1c33c26dc229 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1704,6 +1704,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 
 	down_read(&tty->termios_rwsem);
 
+	if (test_bit(TTY_HUPPING, &tty->flags))
+		goto out_upsem;
+
 	do {
 		/*
 		 * When PARMRK is set, each input char may take up to 3 chars
@@ -1760,6 +1763,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 	} else
 		n_tty_check_throttle(tty);
 
+out_upsem:
 	up_read(&tty->termios_rwsem);
 
 	return rcvd;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] drivers:tty:pty: Fix a race causing data loss on close
  2020-10-02 13:03 [PATCH v2] drivers:tty:pty: Fix a race causing data loss on close minyard
@ 2020-10-05 11:31 ` Greg Kroah-Hartman
  2020-10-06  4:48   ` Jiri Slaby
  2020-10-12  7:13 ` Jiri Slaby
  1 sibling, 1 reply; 4+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-05 11:31 UTC (permalink / raw)
  To: minyard, Jiri Slaby; +Cc: LKML, Corey Minyard

On Fri, Oct 02, 2020 at 08:03:04AM -0500, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> If you write to a pty master an immediately close the pty master, the
> receiver might get a chunk of data dropped, but then receive some later
> data.  That's obviously something rather unexpected for a user.  It
> certainly confused my test program.
> 
> It turns out that tty_vhangup() gets called from pty_close(), and that
> causes the data on the slave side to be flushed, but due to races more
> data can be copied into the slave side's buffer after that.  Consider
> the following sequence:
> 
> thread1          thread2         thread3
> -------          -------         -------
>  |                |-write data into buffer,
>  |                | n_tty buffer is filled
>  |                | along with other buffers
>  |                |-pty_close()
>  |                |--tty_vhangup()
>  |                |---tty_ldisc_hangup()
>  |                |----n_tty_flush_buffer()
>  |                |-----reset_buffer_flags()
>  |-n_tty_read()   |
>  |--up_read(&tty->termios_rwsem);
>  |                |------down_read(&tty->termios_rwsem)
>  |                |------clear n_tty buffer contents
>  |                |------up_read(&tty->termios_rwsem)
>  |--tty_buffer_flush_work()       |
>  |--schedules work calling        |
>  |  flush_to_ldisc()              |
>  |                                |-flush_to_ldisc()
>  |                                |--receive_buf()
>  |                                |---tty_port_default_receive_buf()
>  |                                |----tty_ldisc_receive_buf()
>  |                                |-----n_tty_receive_buf2()
>  |                                |------n_tty_receive_buf_common()
>  |                                |-------down_read(&tty->termios_rwsem)
>  |                                |-------__receive_buf()
>  |                                |-------copies data into n_tty buffer
>  |                                |-------up_read(&tty->termios_rwsem)
>  |--down_read(&tty->termios_rwsem)
>  |--copy buffer data to user
> 
> >From this sequence, you can see that thread2 writes to the buffer then
> only clears the part of the buffer in n_tty.  The n_tty receive buffer
> code then copies more data into the n_tty buffer.
> 
> This change checks to see if the tty is being hung up before copying
> anything in n_tty_receive_buf_common().  It has to be done after the
> tty->termios_rwsem semaphore is claimed, for reasons that should be
> apparent from the sequence above.
> 
> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> ---
> 
> Changes since v1: Added lines to make the sequence diagram clearer.
> 
> I sent a program to reproduce this, I extended it to prove it wasn't the
> test program that caused the issue, and I've uploaded it to:
>   http://sourceforge.net/projects/ser2net/files/tmp/testpty.c
> if you want to run it.  It has a lot of comments that explain what is
> going on.
> 
> This is not a very satisfying fix, though.  It works reliably, but it
> doesn't seem right to me.  My inclination was to remove the up and down
> semaphore around tty_buffer_flush_work() in n_tty_read(), as it just
> schedules some work, no need to unlock for that.  But that resulted
> in a deadlock elsewhere, so that up/down on the semaphore is there for
> another reason.
> 
> The locking in the tty code is really hard to follow.  I believe this is
> actually a locking problem, but fixing it looks daunting to me.
> 
> Another way to fix this that occurred just occurred to me would be to
> clear all the buffers in pty_close().
> 
> -corey
> 
> 
>  drivers/tty/n_tty.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index 1794d84e7bf6..1c33c26dc229 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1704,6 +1704,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>  
>  	down_read(&tty->termios_rwsem);
>  
> +	if (test_bit(TTY_HUPPING, &tty->flags))
> +		goto out_upsem;
> +
>  	do {
>  		/*
>  		 * When PARMRK is set, each input char may take up to 3 chars
> @@ -1760,6 +1763,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>  	} else
>  		n_tty_check_throttle(tty);
>  
> +out_upsem:
>  	up_read(&tty->termios_rwsem);
>  
>  	return rcvd;
> -- 
> 2.17.1
> 

Jiri, any thoughts about this?  At first glance, it looks sane to me,
but a second review would be great.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] drivers:tty:pty: Fix a race causing data loss on close
  2020-10-05 11:31 ` Greg Kroah-Hartman
@ 2020-10-06  4:48   ` Jiri Slaby
  0 siblings, 0 replies; 4+ messages in thread
From: Jiri Slaby @ 2020-10-06  4:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman, minyard; +Cc: LKML, Corey Minyard

On 05. 10. 20, 13:31, Greg Kroah-Hartman wrote:
> On Fri, Oct 02, 2020 at 08:03:04AM -0500, minyard@acm.org wrote:
>> From: Corey Minyard <cminyard@mvista.com>
>>
>> If you write to a pty master an immediately close the pty master, the
>> receiver might get a chunk of data dropped, but then receive some later
>> data.  That's obviously something rather unexpected for a user.  It
>> certainly confused my test program.
...
> Jiri, any thoughts about this?  At first glance, it looks sane to me,
> but a second review would be great.

Hi,

I haven't had a chance to look into it yet (labsconf this week). Moving
it upwards on my TODO.

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] drivers:tty:pty: Fix a race causing data loss on close
  2020-10-02 13:03 [PATCH v2] drivers:tty:pty: Fix a race causing data loss on close minyard
  2020-10-05 11:31 ` Greg Kroah-Hartman
@ 2020-10-12  7:13 ` Jiri Slaby
  1 sibling, 0 replies; 4+ messages in thread
From: Jiri Slaby @ 2020-10-12  7:13 UTC (permalink / raw)
  To: minyard, Greg Kroah-Hartman
  Cc: LKML, Corey Minyard, Peter Hurley, brian.bloniarz

On 02. 10. 20, 15:03, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> If you write to a pty master an immediately close the pty master, the
> receiver might get a chunk of data dropped, but then receive some later
> data.  That's obviously something rather unexpected for a user.  It
> certainly confused my test program.

Hmm, that's another instance of bugs which were fixed in the past. Like:
commit f8747d4a466ab2cafe56112c51b3379f9fdb7a12
Author: Peter Hurley <peter@hurleysoftware.com>
Date:   Fri Sep 27 13:27:05 2013 -0400

     tty: Fix pty master read() after slave closes

and
commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa
Author: Brian Bloniarz <brian.bloniarz@gmail.com>
Date:   Sun Mar 6 13:16:30 2016 -0800

     Fix OpenSSH pty regression on close


Ccing Peter who is involved in tty buffers a lot and Brian.

> It turns out that tty_vhangup() gets called from pty_close(), and that
> causes the data on the slave side to be flushed, but due to races more
> data can be copied into the slave side's buffer after that.  Consider
> the following sequence:
> 
> thread1          thread2         thread3
> -------          -------         -------
>   |                |-write data into buffer,
>   |                | n_tty buffer is filled
>   |                | along with other buffers
>   |                |-pty_close()
>   |                |--tty_vhangup()

This hangup is on slave, not master. This confused me initially.

>   |                |---tty_ldisc_hangup()
>   |                |----n_tty_flush_buffer()
>   |                |-----reset_buffer_flags()

So this flushes the data (as on the toilette; see below) slave already 
had in its buffer. That's not nice. As I understand it, programs like 
ssh rely on all the data to be delivered, actually.

>   |-n_tty_read()   |
>   |--up_read(&tty->termios_rwsem);
>   |                |------down_read(&tty->termios_rwsem)
>   |                |------clear n_tty buffer contents
>   |                |------up_read(&tty->termios_rwsem)
>   |--tty_buffer_flush_work()       |
>   |--schedules work calling        |
>   |  flush_to_ldisc()              |

"Flush" doesn't mean "flush" as on the toilette. Here, it's "flush" 
meaning actually "push" (that confused you given what I read in the note 
about trying to remove the rwsem lock below). It simply waits for 
flush_to_ldisc to finish the job.

>   |                                |-flush_to_ldisc()
>   |                                |--receive_buf()
>   |                                |---tty_port_default_receive_buf()
>   |                                |----tty_ldisc_receive_buf()
>   |                                |-----n_tty_receive_buf2()
>   |                                |------n_tty_receive_buf_common()
>   |                                |-------down_read(&tty->termios_rwsem)
>   |                                |-------__receive_buf()
>   |                                |-------copies data into n_tty buffer
>   |                                |-------up_read(&tty->termios_rwsem)

Sure, so we reset the head/tail in reset_buffer_flags (the data you 
miss) and queued another chunk of data (the data you get).

I currently don't see how to fix this properly. Any hint appreciated.

>   |--down_read(&tty->termios_rwsem)
>   |--copy buffer data to user
> 
>  From this sequence, you can see that thread2 writes to the buffer then
> only clears the part of the buffer in n_tty.  The n_tty receive buffer
> code then copies more data into the n_tty buffer.
> 
> This change checks to see if the tty is being hung up before copying
> anything in n_tty_receive_buf_common().  It has to be done after the
> tty->termios_rwsem semaphore is claimed, for reasons that should be
> apparent from the sequence above.
> 
> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> ---
> 
> Changes since v1: Added lines to make the sequence diagram clearer.
> 
> I sent a program to reproduce this, I extended it to prove it wasn't the
> test program that caused the issue, and I've uploaded it to:
>    http://sourceforge.net/projects/ser2net/files/tmp/testpty.c
> if you want to run it.  It has a lot of comments that explain what is
> going on.
> 
> This is not a very satisfying fix, though.  It works reliably, but it
> doesn't seem right to me.  My inclination was to remove the up and down
> semaphore around tty_buffer_flush_work() in n_tty_read(), as it just
> schedules some work, no need to unlock for that.  But that resulted
> in a deadlock elsewhere, so that up/down on the semaphore is there for
> another reason.
> 
> The locking in the tty code is really hard to follow.  I believe this is
> actually a locking problem, but fixing it looks daunting to me.
> 
> Another way to fix this that occurred just occurred to me would be to
> clear all the buffers in pty_close().
> 
> -corey
> 
> 
>   drivers/tty/n_tty.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index 1794d84e7bf6..1c33c26dc229 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1704,6 +1704,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>   
>   	down_read(&tty->termios_rwsem);
>   
> +	if (test_bit(TTY_HUPPING, &tty->flags))
> +		goto out_upsem;
> +
>   	do {
>   		/*
>   		 * When PARMRK is set, each input char may take up to 3 chars
> @@ -1760,6 +1763,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
>   	} else
>   		n_tty_check_throttle(tty);
>   
> +out_upsem:
>   	up_read(&tty->termios_rwsem);
>   
>   	return rcvd;
> 

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-10-12  7:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-02 13:03 [PATCH v2] drivers:tty:pty: Fix a race causing data loss on close minyard
2020-10-05 11:31 ` Greg Kroah-Hartman
2020-10-06  4:48   ` Jiri Slaby
2020-10-12  7:13 ` Jiri Slaby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).