linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tty: fix data race in flush_to_ldisc
@ 2015-09-01 20:11 Dmitry Vyukov
  2015-09-02 14:18 ` Peter Hurley
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-01 20:11 UTC (permalink / raw)
  To: gregkh, peter, jslaby, linux-kernel
  Cc: jslaby, andreyknvl, kcc, glider, Dmitry Vyukov

The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):

ThreadSanitizer: data-race in release_tty
Write of size 8 by thread T325 (K2579):
  release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
  tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
  __fput+0x15f/0x310 fs/file_table.c:207
  ____fput+0x1d/0x30 fs/file_table.c:243
  task_work_run+0x115/0x130 kernel/task_work.c:123
  do_notify_resume+0x73/0x80
    tracehook_notify_resume include/linux/tracehook.h:190
  do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
  int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
Previous read of size 8 by thread T19 (K16):
  flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
  process_one_work+0x47e/0x930 kernel/workqueue.c:2036
  worker_thread+0xb0/0x900 kernel/workqueue.c:2170
  kthread+0x150/0x170 kernel/kthread.c:207

flush_to_ldisc reads port->itty and checks that it is not NULL,
concurrently release_tty sets port->itty to NULL. It is possible
that flush_to_ldisc loads port->itty once, ensures that it is
not NULL, but then reloads it again and uses. The second load
can already return NULL, which will cause a crash.

Set port->itty to NULL after shutting down flush_to_ldisc work.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
---
 drivers/tty/tty_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 57fc6ee..0df24c1 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1684,9 +1684,9 @@ static void release_tty(struct tty_struct *tty, int idx)
 	tty_free_termios(tty);
 	tty_driver_remove_tty(tty->driver, tty);
 	tty->port->itty = NULL;
+	cancel_work_sync(&tty->port->buf.work);
 	if (tty->link)
 		tty->link->port->itty = NULL;
-	cancel_work_sync(&tty->port->buf.work);
 
 	tty_kref_put(tty->link);
 	tty_kref_put(tty);
-- 
2.5.0.457.gab17608


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-01 20:11 [PATCH] tty: fix data race in flush_to_ldisc Dmitry Vyukov
@ 2015-09-02 14:18 ` Peter Hurley
  2015-09-02 14:41   ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Hurley @ 2015-09-02 14:18 UTC (permalink / raw)
  To: Dmitry Vyukov, gregkh
  Cc: jslaby, linux-kernel, jslaby, andreyknvl, kcc, glider

On 09/01/2015 04:11 PM, Dmitry Vyukov wrote:
> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
> 
> ThreadSanitizer: data-race in release_tty
> Write of size 8 by thread T325 (K2579):
>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>   __fput+0x15f/0x310 fs/file_table.c:207
>   ____fput+0x1d/0x30 fs/file_table.c:243
>   task_work_run+0x115/0x130 kernel/task_work.c:123
>   do_notify_resume+0x73/0x80
>     tracehook_notify_resume include/linux/tracehook.h:190
>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
> Previous read of size 8 by thread T19 (K16):
>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>   kthread+0x150/0x170 kernel/kthread.c:207
> 
> flush_to_ldisc reads port->itty and checks that it is not NULL,
> concurrently release_tty sets port->itty to NULL. It is possible
> that flush_to_ldisc loads port->itty once, ensures that it is
> not NULL, but then reloads it again and uses. The second load
> can already return NULL, which will cause a crash.
> 
> Set port->itty to NULL after shutting down flush_to_ldisc work.
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> ---
>  drivers/tty/tty_io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 57fc6ee..0df24c1 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -1684,9 +1684,9 @@ static void release_tty(struct tty_struct *tty, int idx)
>  	tty_free_termios(tty);
>  	tty_driver_remove_tty(tty->driver, tty);
>  	tty->port->itty = NULL;
> +	cancel_work_sync(&tty->port->buf.work);
>  	if (tty->link)
>  		tty->link->port->itty = NULL;
> -	cancel_work_sync(&tty->port->buf.work);

This patch doesn't do anything.

Regards,
Peter Hurley

>  
>  	tty_kref_put(tty->link);
>  	tty_kref_put(tty);
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-02 14:18 ` Peter Hurley
@ 2015-09-02 14:41   ` Dmitry Vyukov
  2015-09-02 15:28     ` Peter Hurley
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-02 14:41 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Greg Kroah-Hartman, jslaby, LKML, Jiri Slaby, Andrey Konovalov,
	Kostya Serebryany, Alexander Potapenko

On Wed, Sep 2, 2015 at 4:18 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
> On 09/01/2015 04:11 PM, Dmitry Vyukov wrote:
>> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
>>
>> ThreadSanitizer: data-race in release_tty
>> Write of size 8 by thread T325 (K2579):
>>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>>   __fput+0x15f/0x310 fs/file_table.c:207
>>   ____fput+0x1d/0x30 fs/file_table.c:243
>>   task_work_run+0x115/0x130 kernel/task_work.c:123
>>   do_notify_resume+0x73/0x80
>>     tracehook_notify_resume include/linux/tracehook.h:190
>>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
>> Previous read of size 8 by thread T19 (K16):
>>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>>   kthread+0x150/0x170 kernel/kthread.c:207
>>
>> flush_to_ldisc reads port->itty and checks that it is not NULL,
>> concurrently release_tty sets port->itty to NULL. It is possible
>> that flush_to_ldisc loads port->itty once, ensures that it is
>> not NULL, but then reloads it again and uses. The second load
>> can already return NULL, which will cause a crash.
>>
>> Set port->itty to NULL after shutting down flush_to_ldisc work.
>>
>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>> ---
>>  drivers/tty/tty_io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>> index 57fc6ee..0df24c1 100644
>> --- a/drivers/tty/tty_io.c
>> +++ b/drivers/tty/tty_io.c
>> @@ -1684,9 +1684,9 @@ static void release_tty(struct tty_struct *tty, int idx)
>>       tty_free_termios(tty);
>>       tty_driver_remove_tty(tty->driver, tty);
>>       tty->port->itty = NULL;
>> +     cancel_work_sync(&tty->port->buf.work);
>>       if (tty->link)
>>               tty->link->port->itty = NULL;
>> -     cancel_work_sync(&tty->port->buf.work);
>
> This patch doesn't do anything.

It sets tty->link->port->itty to NULL _after_ waiting for
flush_to_ldisc work to finish. At least this is the intention.
Can you explain why it does not do anything?

Thank you.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-02 14:41   ` Dmitry Vyukov
@ 2015-09-02 15:28     ` Peter Hurley
  2015-09-02 17:53       ` Dmitry Vyukov
  2015-09-02 17:55       ` Dmitry Vyukov
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Hurley @ 2015-09-02 15:28 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Greg Kroah-Hartman, jslaby, LKML, Jiri Slaby, Andrey Konovalov,
	Kostya Serebryany, Alexander Potapenko

On 09/02/2015 10:41 AM, Dmitry Vyukov wrote:
> On Wed, Sep 2, 2015 at 4:18 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
>> On 09/01/2015 04:11 PM, Dmitry Vyukov wrote:
>>> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
>>>
>>> ThreadSanitizer: data-race in release_tty
>>> Write of size 8 by thread T325 (K2579):
>>>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>>>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>>>   __fput+0x15f/0x310 fs/file_table.c:207
>>>   ____fput+0x1d/0x30 fs/file_table.c:243
>>>   task_work_run+0x115/0x130 kernel/task_work.c:123
>>>   do_notify_resume+0x73/0x80
>>>     tracehook_notify_resume include/linux/tracehook.h:190
>>>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>>>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
>>> Previous read of size 8 by thread T19 (K16):
>>>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>>>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>>>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>>>   kthread+0x150/0x170 kernel/kthread.c:207
>>>
>>> flush_to_ldisc reads port->itty and checks that it is not NULL,
>>> concurrently release_tty sets port->itty to NULL. It is possible
>>> that flush_to_ldisc loads port->itty once, ensures that it is
>>> not NULL, but then reloads it again and uses. The second load
>>> can already return NULL, which will cause a crash.
>>>
>>> Set port->itty to NULL after shutting down flush_to_ldisc work.
>>>
>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>> ---
>>>  drivers/tty/tty_io.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>>> index 57fc6ee..0df24c1 100644
>>> --- a/drivers/tty/tty_io.c
>>> +++ b/drivers/tty/tty_io.c
>>> @@ -1684,9 +1684,9 @@ static void release_tty(struct tty_struct *tty, int idx)
>>>       tty_free_termios(tty);
>>>       tty_driver_remove_tty(tty->driver, tty);
>>>       tty->port->itty = NULL;
>>> +     cancel_work_sync(&tty->port->buf.work);
>>>       if (tty->link)
>>>               tty->link->port->itty = NULL;
>>> -     cancel_work_sync(&tty->port->buf.work);
>>
>> This patch doesn't do anything.
> 
> It sets tty->link->port->itty to NULL _after_ waiting for
> flush_to_ldisc work to finish. At least this is the intention.
> Can you explain why it does not do anything?

Because tty->link is a different tty than that corresponding
to the buffer work being cancelled.

Regards,
Peter Hurley


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-02 15:28     ` Peter Hurley
@ 2015-09-02 17:53       ` Dmitry Vyukov
  2015-09-03  0:50         ` Peter Hurley
  2015-09-02 17:55       ` Dmitry Vyukov
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-02 17:53 UTC (permalink / raw)
  To: gregkh, peter, jslaby, linux-kernel
  Cc: jslaby, andreyknvl, kcc, glider, Dmitry Vyukov

The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):

ThreadSanitizer: data-race in release_tty
Write of size 8 by thread T325 (K2579):
  release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
  tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
  __fput+0x15f/0x310 fs/file_table.c:207
  ____fput+0x1d/0x30 fs/file_table.c:243
  task_work_run+0x115/0x130 kernel/task_work.c:123
  do_notify_resume+0x73/0x80
    tracehook_notify_resume include/linux/tracehook.h:190
  do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
  int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
Previous read of size 8 by thread T19 (K16):
  flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
  process_one_work+0x47e/0x930 kernel/workqueue.c:2036
  worker_thread+0xb0/0x900 kernel/workqueue.c:2170
  kthread+0x150/0x170 kernel/kthread.c:207

flush_to_ldisc reads port->itty and checks that it is not NULL,
concurrently release_tty sets port->itty to NULL. It is possible
that flush_to_ldisc loads port->itty once, ensures that it is
not NULL, but then reloads it again and uses. The second load
can already return NULL, which will cause a crash.

Use READ_ONCE/WRITE_ONCE to read/update port->itty.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
---
 drivers/tty/tty_buffer.c | 2 +-
 drivers/tty/tty_io.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index 4cf263d..1f1031d 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -469,7 +469,7 @@ static void flush_to_ldisc(struct work_struct *work)
 	struct tty_struct *tty;
 	struct tty_ldisc *disc;
 
-	tty = port->itty;
+	tty = READ_ONCE(port->itty);
 	if (tty == NULL)
 		return;
 
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 57fc6ee..aad47df 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1685,7 +1685,7 @@ static void release_tty(struct tty_struct *tty, int idx)
 	tty_driver_remove_tty(tty->driver, tty);
 	tty->port->itty = NULL;
 	if (tty->link)
-		tty->link->port->itty = NULL;
+		WRITE_ONCE(tty->link->port->itty, NULL);
 	cancel_work_sync(&tty->port->buf.work);
 
 	tty_kref_put(tty->link);
-- 
2.5.0.457.gab17608


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-02 15:28     ` Peter Hurley
  2015-09-02 17:53       ` Dmitry Vyukov
@ 2015-09-02 17:55       ` Dmitry Vyukov
  1 sibling, 0 replies; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-02 17:55 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Greg Kroah-Hartman, jslaby, LKML, Jiri Slaby, Andrey Konovalov,
	Kostya Serebryany, Alexander Potapenko

On Wed, Sep 2, 2015 at 5:28 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
> On 09/02/2015 10:41 AM, Dmitry Vyukov wrote:
>> On Wed, Sep 2, 2015 at 4:18 PM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> On 09/01/2015 04:11 PM, Dmitry Vyukov wrote:
>>>> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
>>>>
>>>> ThreadSanitizer: data-race in release_tty
>>>> Write of size 8 by thread T325 (K2579):
>>>>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>>>>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>>>>   __fput+0x15f/0x310 fs/file_table.c:207
>>>>   ____fput+0x1d/0x30 fs/file_table.c:243
>>>>   task_work_run+0x115/0x130 kernel/task_work.c:123
>>>>   do_notify_resume+0x73/0x80
>>>>     tracehook_notify_resume include/linux/tracehook.h:190
>>>>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>>>>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
>>>> Previous read of size 8 by thread T19 (K16):
>>>>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>>>>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>>>>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>>>>   kthread+0x150/0x170 kernel/kthread.c:207
>>>>
>>>> flush_to_ldisc reads port->itty and checks that it is not NULL,
>>>> concurrently release_tty sets port->itty to NULL. It is possible
>>>> that flush_to_ldisc loads port->itty once, ensures that it is
>>>> not NULL, but then reloads it again and uses. The second load
>>>> can already return NULL, which will cause a crash.
>>>>
>>>> Set port->itty to NULL after shutting down flush_to_ldisc work.
>>>>
>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>>> ---
>>>>  drivers/tty/tty_io.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>>>> index 57fc6ee..0df24c1 100644
>>>> --- a/drivers/tty/tty_io.c
>>>> +++ b/drivers/tty/tty_io.c
>>>> @@ -1684,9 +1684,9 @@ static void release_tty(struct tty_struct *tty, int idx)
>>>>       tty_free_termios(tty);
>>>>       tty_driver_remove_tty(tty->driver, tty);
>>>>       tty->port->itty = NULL;
>>>> +     cancel_work_sync(&tty->port->buf.work);
>>>>       if (tty->link)
>>>>               tty->link->port->itty = NULL;
>>>> -     cancel_work_sync(&tty->port->buf.work);
>>>
>>> This patch doesn't do anything.
>>
>> It sets tty->link->port->itty to NULL _after_ waiting for
>> flush_to_ldisc work to finish. At least this is the intention.
>> Can you explain why it does not do anything?
>
> Because tty->link is a different tty than that corresponding
> to the buffer work being cancelled.

Got it.
Please take a look at the updated version of the patch.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-02 17:53       ` Dmitry Vyukov
@ 2015-09-03  0:50         ` Peter Hurley
  2015-09-04 19:28           ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Hurley @ 2015-09-03  0:50 UTC (permalink / raw)
  To: Dmitry Vyukov, gregkh
  Cc: jslaby, linux-kernel, jslaby, andreyknvl, kcc, glider

On 09/02/2015 01:53 PM, Dmitry Vyukov wrote:
> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
> 
> ThreadSanitizer: data-race in release_tty
> Write of size 8 by thread T325 (K2579):
>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>   __fput+0x15f/0x310 fs/file_table.c:207
>   ____fput+0x1d/0x30 fs/file_table.c:243
>   task_work_run+0x115/0x130 kernel/task_work.c:123
>   do_notify_resume+0x73/0x80
>     tracehook_notify_resume include/linux/tracehook.h:190
>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
> Previous read of size 8 by thread T19 (K16):
>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>   kthread+0x150/0x170 kernel/kthread.c:207

The stack traces are not really helpful in describing how the race
occurs; I would leave it out of the changelog.


> flush_to_ldisc reads port->itty and checks that it is not NULL,
> concurrently release_tty sets port->itty to NULL. It is possible
> that flush_to_ldisc loads port->itty once, ensures that it is
> not NULL, but then reloads it again and uses. The second load
> can already return NULL, which will cause a crash.
> 
> Use READ_ONCE/WRITE_ONCE to read/update port->itty.

See below.

> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> ---
>  drivers/tty/tty_buffer.c | 2 +-
>  drivers/tty/tty_io.c     | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
> index 4cf263d..1f1031d 100644
> --- a/drivers/tty/tty_buffer.c
> +++ b/drivers/tty/tty_buffer.c
> @@ -469,7 +469,7 @@ static void flush_to_ldisc(struct work_struct *work)
>  	struct tty_struct *tty;
>  	struct tty_ldisc *disc;
>  
> -	tty = port->itty;
> +	tty = READ_ONCE(port->itty);

This is fine.

>  	if (tty == NULL)
>  		return;
>  
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 57fc6ee..aad47df 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -1685,7 +1685,7 @@ static void release_tty(struct tty_struct *tty, int idx)
>  	tty_driver_remove_tty(tty->driver, tty);
>  	tty->port->itty = NULL;
>  	if (tty->link)
> -		tty->link->port->itty = NULL;
> +		WRITE_ONCE(tty->link->port->itty, NULL);

This isn't doing anything useful.

1. The compiler can't push the store past the cancel_work_sync() (because the
   compiler has no visibility into cancel_work_sync()), and,
2. There's no effect if the compiler hoists the store higher in the release_tty()
   because the line discipline has already been closed and killed (so the
   tty_ldisc_ref() in flush_to_ldisc() returns NULL anyway).


Regards,
Peter Hurley

>  	cancel_work_sync(&tty->port->buf.work);
>  
>  	tty_kref_put(tty->link);
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-03  0:50         ` Peter Hurley
@ 2015-09-04 19:28           ` Dmitry Vyukov
  2015-09-16  2:54             ` Peter Hurley
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-04 19:28 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Greg Kroah-Hartman, jslaby, LKML, Jiri Slaby, Andrey Konovalov,
	Kostya Serebryany, Alexander Potapenko

On Thu, Sep 3, 2015 at 2:50 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
> On 09/02/2015 01:53 PM, Dmitry Vyukov wrote:
>> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
>>
>> ThreadSanitizer: data-race in release_tty
>> Write of size 8 by thread T325 (K2579):
>>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>>   __fput+0x15f/0x310 fs/file_table.c:207
>>   ____fput+0x1d/0x30 fs/file_table.c:243
>>   task_work_run+0x115/0x130 kernel/task_work.c:123
>>   do_notify_resume+0x73/0x80
>>     tracehook_notify_resume include/linux/tracehook.h:190
>>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
>> Previous read of size 8 by thread T19 (K16):
>>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>>   kthread+0x150/0x170 kernel/kthread.c:207
>
> The stack traces are not really helpful in describing how the race
> occurs; I would leave it out of the changelog.

ok

>> flush_to_ldisc reads port->itty and checks that it is not NULL,
>> concurrently release_tty sets port->itty to NULL. It is possible
>> that flush_to_ldisc loads port->itty once, ensures that it is
>> not NULL, but then reloads it again and uses. The second load
>> can already return NULL, which will cause a crash.
>>
>> Use READ_ONCE/WRITE_ONCE to read/update port->itty.
>
> See below.
>
>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>> ---
>>  drivers/tty/tty_buffer.c | 2 +-
>>  drivers/tty/tty_io.c     | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
>> index 4cf263d..1f1031d 100644
>> --- a/drivers/tty/tty_buffer.c
>> +++ b/drivers/tty/tty_buffer.c
>> @@ -469,7 +469,7 @@ static void flush_to_ldisc(struct work_struct *work)
>>       struct tty_struct *tty;
>>       struct tty_ldisc *disc;
>>
>> -     tty = port->itty;
>> +     tty = READ_ONCE(port->itty);
>
> This is fine.
>
>>       if (tty == NULL)
>>               return;
>>
>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>> index 57fc6ee..aad47df 100644
>> --- a/drivers/tty/tty_io.c
>> +++ b/drivers/tty/tty_io.c
>> @@ -1685,7 +1685,7 @@ static void release_tty(struct tty_struct *tty, int idx)
>>       tty_driver_remove_tty(tty->driver, tty);
>>       tty->port->itty = NULL;
>>       if (tty->link)
>> -             tty->link->port->itty = NULL;
>> +             WRITE_ONCE(tty->link->port->itty, NULL);
>
> This isn't doing anything useful.
>
> 1. The compiler can't push the store past the cancel_work_sync() (because the
>    compiler has no visibility into cancel_work_sync()), and,
> 2. There's no effect if the compiler hoists the store higher in the release_tty()
>    because the line discipline has already been closed and killed (so the
>    tty_ldisc_ref() in flush_to_ldisc() returns NULL anyway).

OK, let me do one try at convincing you that WRITE_ONCE here is a good
idea. If you are not convinced then I will remove it.

WRITE_ONCE/READ_ONCE for all shared memory accesses:
1. Make the code more readable but highlighting important aspects.
2. Required by relevant standards and relieve you, me and everybody
else reading this code from spending time on proving that it cannot
break (think of multi-file compilation mode, store tearing which
compilers indeed known to do in some contexts, and compiler
transformations that we don't know of).
3. Allow tooling that finds undoubtedly harmful bugs, like this one,
or in fact, I've just mailed another fix for missed memory barriers in
this file (also found by KTSAN).

I've described these aspects in more detail here:
https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

I don't see any negative aspects to it. Do you see any? Because if you
see at least some value in at least on these points and don't see any
negative aspects, then it is worth doing.


-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat
sind, leiten Sie diese bitte nicht weiter, informieren Sie den
Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please
do not forward it, please inform the sender, and please erase this
e-mail including any attachments. Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-04 19:28           ` Dmitry Vyukov
@ 2015-09-16  2:54             ` Peter Hurley
  2015-09-17 11:42               ` Dmitry Vyukov
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Hurley @ 2015-09-16  2:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Greg Kroah-Hartman, Jiri Slaby, LKML, Jiri Slaby,
	Andrey Konovalov, Kostya Serebryany, Alexander Potapenko

On 09/04/2015 03:28 PM, Dmitry Vyukov wrote:
> On Thu, Sep 3, 2015 at 2:50 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>> On 09/02/2015 01:53 PM, Dmitry Vyukov wrote:
>>> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
>>>
>>> ThreadSanitizer: data-race in release_tty
>>> Write of size 8 by thread T325 (K2579):
>>>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>>>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>>>   __fput+0x15f/0x310 fs/file_table.c:207
>>>   ____fput+0x1d/0x30 fs/file_table.c:243
>>>   task_work_run+0x115/0x130 kernel/task_work.c:123
>>>   do_notify_resume+0x73/0x80
>>>     tracehook_notify_resume include/linux/tracehook.h:190
>>>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>>>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
>>> Previous read of size 8 by thread T19 (K16):
>>>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>>>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>>>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>>>   kthread+0x150/0x170 kernel/kthread.c:207
>>
>> The stack traces are not really helpful in describing how the race
>> occurs; I would leave it out of the changelog.
>
> ok

Just to clarify; my comment is only wrt this particular patch.
You may have other patches in the future with unclear execution paths
that may require the call stacks. It's just that this particular
race has invariant call stacks (for the relevant parts).


>>> flush_to_ldisc reads port->itty and checks that it is not NULL,
>>> concurrently release_tty sets port->itty to NULL. It is possible
>>> that flush_to_ldisc loads port->itty once, ensures that it is
>>> not NULL, but then reloads it again and uses. The second load
>>> can already return NULL, which will cause a crash.
>>>
>>> Use READ_ONCE/WRITE_ONCE to read/update port->itty.
>>
>> See below.
>>
>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>> ---
>>>  drivers/tty/tty_buffer.c | 2 +-
>>>  drivers/tty/tty_io.c     | 2 +-
>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
>>> index 4cf263d..1f1031d 100644
>>> --- a/drivers/tty/tty_buffer.c
>>> +++ b/drivers/tty/tty_buffer.c
>>> @@ -469,7 +469,7 @@ static void flush_to_ldisc(struct work_struct *work)
>>>       struct tty_struct *tty;
>>>       struct tty_ldisc *disc;
>>>
>>> -     tty = port->itty;
>>> +     tty = READ_ONCE(port->itty);
>>
>> This is fine.
>>
>>>       if (tty == NULL)
>>>               return;
>>>
>>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>>> index 57fc6ee..aad47df 100644
>>> --- a/drivers/tty/tty_io.c
>>> +++ b/drivers/tty/tty_io.c
>>> @@ -1685,7 +1685,7 @@ static void release_tty(struct tty_struct *tty, int idx)
>>>       tty_driver_remove_tty(tty->driver, tty);
>>>       tty->port->itty = NULL;
>>>       if (tty->link)
>>> -             tty->link->port->itty = NULL;
>>> +             WRITE_ONCE(tty->link->port->itty, NULL);
>>
>> This isn't doing anything useful.
>>
>> 1. The compiler can't push the store past the cancel_work_sync() (because the
>>    compiler has no visibility into cancel_work_sync()), and,
>> 2. There's no effect if the compiler hoists the store higher in the release_tty()
>>    because the line discipline has already been closed and killed (so the
>>    tty_ldisc_ref() in flush_to_ldisc() returns NULL anyway).
>
> OK, let me do one try at convincing you that WRITE_ONCE here is a good
> idea. If you are not convinced then I will remove it.

FWIW, I'm not the gatekeeper wrt tree-wide code/best-practices changes;
iow, convincing me is not a useful goal.  But I think this would be a
good topic/
presentation for either Kernel Summit or Linux Plumbers.

That said, I'll make some observations below that you should expect to read
from other contributors/maintainers regarding your point-of-view.

> WRITE_ONCE/READ_ONCE for all shared memory accesses:
> 1. Make the code more readable but highlighting important aspects.

Most would argue the additional macros detract from readability.

> 2. Required by relevant standards and relieve you, me and everybody
> else reading this code from spending time on proving that it cannot
> break (think of multi-file compilation mode, store tearing which
> compilers indeed known to do in some contexts, and compiler
> transformations that we don't know of).

Multi-file compilation and the compiler memory model are inherently
incompatible with kernel code. Instrumenting code rather than specializing
the tool (compiler) is progress in the wrong direction, imho.

Same with store tearing of primitive types.

Compiler "transforms" are an occasional hazard of kernel development,
as are straight-up compiler bugs; in my limited experience, each occurs
with equal frequency.

> 3. Allow tooling that finds undoubtedly harmful bugs, like this one,

While the READ_ONCE() fix improves robustness, I wouldn't categorize
this as a 'harmful' bug. I seriously doubt any compiler has generated
a reload of port->itty in flush_to_ldisc(). As such, I would disagree with
any submission to stable kernels for this fix.

> or in fact, I've just mailed another fix for missed memory barriers in
> this file (also found by KTSAN).

I really appreciate you uncovering those. But note that READ_ONCE/
WRITE_ONCE aren't necessary to _find_ those bugs, but rather to
eliminate false positives (which I can sympathize with).

And in fact, with every potential race, human analysis is required anyway.

> I've described these aspects in more detail here:
> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>
> I don't see any negative aspects to it. Do you see any? Because if you
> see at least some value in at least on these points and don't see any
> negative aspects, then it is worth doing.

Consider the converse suggestion; everywhere I even smell race I add
READ_ONCE/WRITE_ONCE. Kernel code would become unreadable,
(and slower on hot paths). Consequently, each change will need to be
justified, and thus, we've arrived where we started:  analyzing each
race on its own merits and fixing real races.

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tty: fix data race in flush_to_ldisc
  2015-09-16  2:54             ` Peter Hurley
@ 2015-09-17 11:42               ` Dmitry Vyukov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Vyukov @ 2015-09-17 11:42 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Greg Kroah-Hartman, Jiri Slaby, LKML, Jiri Slaby,
	Andrey Konovalov, Kostya Serebryany, Alexander Potapenko

On Wed, Sep 16, 2015 at 4:54 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
> On 09/04/2015 03:28 PM, Dmitry Vyukov wrote:
>> On Thu, Sep 3, 2015 at 2:50 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> On 09/02/2015 01:53 PM, Dmitry Vyukov wrote:
>>>> The data race is found with KernelThreadSanitizer (on rev 21bdb584af8c):
>>>>
>>>> ThreadSanitizer: data-race in release_tty
>>>> Write of size 8 by thread T325 (K2579):
>>>>   release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>>>>   tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>>>>   __fput+0x15f/0x310 fs/file_table.c:207
>>>>   ____fput+0x1d/0x30 fs/file_table.c:243
>>>>   task_work_run+0x115/0x130 kernel/task_work.c:123
>>>>   do_notify_resume+0x73/0x80
>>>>     tracehook_notify_resume include/linux/tracehook.h:190
>>>>   do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>>>>   int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
>>>> Previous read of size 8 by thread T19 (K16):
>>>>   flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>>>>   process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>>>>   worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>>>>   kthread+0x150/0x170 kernel/kthread.c:207
>>>
>>> The stack traces are not really helpful in describing how the race
>>> occurs; I would leave it out of the changelog.
>>
>> ok
>
> Just to clarify; my comment is only wrt this particular patch.
> You may have other patches in the future with unclear execution paths
> that may require the call stacks. It's just that this particular
> race has invariant call stacks (for the relevant parts).
>
>
>>>> flush_to_ldisc reads port->itty and checks that it is not NULL,
>>>> concurrently release_tty sets port->itty to NULL. It is possible
>>>> that flush_to_ldisc loads port->itty once, ensures that it is
>>>> not NULL, but then reloads it again and uses. The second load
>>>> can already return NULL, which will cause a crash.
>>>>
>>>> Use READ_ONCE/WRITE_ONCE to read/update port->itty.
>>>
>>> See below.
>>>
>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>>> ---
>>>>  drivers/tty/tty_buffer.c | 2 +-
>>>>  drivers/tty/tty_io.c     | 2 +-
>>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
>>>> index 4cf263d..1f1031d 100644
>>>> --- a/drivers/tty/tty_buffer.c
>>>> +++ b/drivers/tty/tty_buffer.c
>>>> @@ -469,7 +469,7 @@ static void flush_to_ldisc(struct work_struct *work)
>>>>       struct tty_struct *tty;
>>>>       struct tty_ldisc *disc;
>>>>
>>>> -     tty = port->itty;
>>>> +     tty = READ_ONCE(port->itty);
>>>
>>> This is fine.
>>>
>>>>       if (tty == NULL)
>>>>               return;
>>>>
>>>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>>>> index 57fc6ee..aad47df 100644
>>>> --- a/drivers/tty/tty_io.c
>>>> +++ b/drivers/tty/tty_io.c
>>>> @@ -1685,7 +1685,7 @@ static void release_tty(struct tty_struct *tty, int idx)
>>>>       tty_driver_remove_tty(tty->driver, tty);
>>>>       tty->port->itty = NULL;
>>>>       if (tty->link)
>>>> -             tty->link->port->itty = NULL;
>>>> +             WRITE_ONCE(tty->link->port->itty, NULL);
>>>
>>> This isn't doing anything useful.
>>>
>>> 1. The compiler can't push the store past the cancel_work_sync() (because the
>>>    compiler has no visibility into cancel_work_sync()), and,
>>> 2. There's no effect if the compiler hoists the store higher in the release_tty()
>>>    because the line discipline has already been closed and killed (so the
>>>    tty_ldisc_ref() in flush_to_ldisc() returns NULL anyway).
>>
>> OK, let me do one try at convincing you that WRITE_ONCE here is a good
>> idea. If you are not convinced then I will remove it.
>
> FWIW, I'm not the gatekeeper wrt tree-wide code/best-practices changes;
> iow, convincing me is not a useful goal.  But I think this would be a
> good topic/
> presentation for either Kernel Summit or Linux Plumbers.
>
> That said, I'll make some observations below that you should expect to read
> from other contributors/maintainers regarding your point-of-view.
>
>> WRITE_ONCE/READ_ONCE for all shared memory accesses:
>> 1. Make the code more readable but highlighting important aspects.
>
> Most would argue the additional macros detract from readability.
>
>> 2. Required by relevant standards and relieve you, me and everybody
>> else reading this code from spending time on proving that it cannot
>> break (think of multi-file compilation mode, store tearing which
>> compilers indeed known to do in some contexts, and compiler
>> transformations that we don't know of).
>
> Multi-file compilation and the compiler memory model are inherently
> incompatible with kernel code. Instrumenting code rather than specializing
> the tool (compiler) is progress in the wrong direction, imho.
>
> Same with store tearing of primitive types.
>
> Compiler "transforms" are an occasional hazard of kernel development,
> as are straight-up compiler bugs; in my limited experience, each occurs
> with equal frequency.
>
>> 3. Allow tooling that finds undoubtedly harmful bugs, like this one,
>
> While the READ_ONCE() fix improves robustness, I wouldn't categorize
> this as a 'harmful' bug. I seriously doubt any compiler has generated
> a reload of port->itty in flush_to_ldisc(). As such, I would disagree with
> any submission to stable kernels for this fix.
>
>> or in fact, I've just mailed another fix for missed memory barriers in
>> this file (also found by KTSAN).
>
> I really appreciate you uncovering those. But note that READ_ONCE/
> WRITE_ONCE aren't necessary to _find_ those bugs, but rather to
> eliminate false positives (which I can sympathize with).
>
> And in fact, with every potential race, human analysis is required anyway.
>
>> I've described these aspects in more detail here:
>> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>>
>> I don't see any negative aspects to it. Do you see any? Because if you
>> see at least some value in at least on these points and don't see any
>> negative aspects, then it is worth doing.
>
> Consider the converse suggestion; everywhere I even smell race I add
> READ_ONCE/WRITE_ONCE. Kernel code would become unreadable,
> (and slower on hot paths). Consequently, each change will need to be
> justified, and thus, we've arrived where we started:  analyzing each
> race on its own merits and fixing real races.


Thanks for the detailed explanation, Peter!


-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat
sind, leiten Sie diese bitte nicht weiter, informieren Sie den
Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please
do not forward it, please inform the sender, and please erase this
e-mail including any attachments. Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-09-17 11:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-01 20:11 [PATCH] tty: fix data race in flush_to_ldisc Dmitry Vyukov
2015-09-02 14:18 ` Peter Hurley
2015-09-02 14:41   ` Dmitry Vyukov
2015-09-02 15:28     ` Peter Hurley
2015-09-02 17:53       ` Dmitry Vyukov
2015-09-03  0:50         ` Peter Hurley
2015-09-04 19:28           ` Dmitry Vyukov
2015-09-16  2:54             ` Peter Hurley
2015-09-17 11:42               ` Dmitry Vyukov
2015-09-02 17:55       ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).