All of lore.kernel.org
 help / color / mirror / Atom feed
* Fwd: Potential data race in psmouse_interrupt
       [not found] <CAAeHK+wZ=fXCRDUvkQdSmFWxU1fqN5suF=P1kL_CT1L8p3ybDg@mail.gmail.com>
@ 2015-07-22 13:14 ` Andrey Konovalov
  2015-07-22 16:52   ` Dmitry Torokhov
  2015-07-23 13:08   ` Pali Rohár
  0 siblings, 2 replies; 19+ messages in thread
From: Andrey Konovalov @ 2015-07-22 13:14 UTC (permalink / raw)
  To: Dmitry Torokhov, Hans de Goede, Mathias Gottschlag,
	Shailendra Verma, Rusty Russell, Luis R. Rodriguez,
	Thomas Hellstrom, linux-input, linux-kernel
  Cc: Dmitry Vyukov, Kostya Serebryany, Alexander Potapenko, ktsan

Hi!

We are working on a dynamic data race detector for Linux kernel called
KernelThreadSanitizer (ktsan) (https://github.com/google/ktsan/wiki).

Here is a report we got while running ktsan (upstream revision
21bdb584af8cca7c6df3c44cba268be050a234eb).

==================================================================
ThreadSanitizer: data-race in psmouse_interrupt

Write of size 1 by thread T1 (K1):
 [<ffffffff819d5272>] psmouse_interrupt+0xd2/0x690
drivers/input/mouse/psmouse-base.c:337
 [<ffffffff819bcd16>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
 [<ffffffff819be5f0>] i8042_interrupt+0x240/0x4d0
drivers/input/serio/i8042.c:541
 [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
 [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
 [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
 [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
include/linux/irqdesc.h:146
 [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
 [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
 [<ffffffff81e726ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
 [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
 [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
 [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
./arch/x86/include/asm/apic.h:655
 [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
arch/x86/kernel/apic/apic.c:915
 [<ffffffff81e7299b>] apic_timer_interrupt+0x6b/0x70
arch/x86/entry/entry_64.S:782
 [<ffffffff8153223d>] kvasprintf+0x5d/0x90 lib/kasprintf.c:24
 [<ffffffff815322be>] kasprintf+0x4e/0x70 lib/kasprintf.c:40
 [<     inlined    >] module_add_driver+0x8f/0x180 make_driver_name
drivers/base/module.c:18
 [<ffffffff8183059f>] module_add_driver+0x8f/0x180 drivers/base/module.c:63
 [<ffffffff81812c94>] bus_add_driver+0x204/0x340 drivers/base/bus.c:713
 [<ffffffff81814e2b>] driver_register+0xdb/0x190 drivers/base/driver.c:168
 [<ffffffff81a5ed16>] __hid_register_driver+0xb6/0x100
drivers/hid/hid-core.c:2677
 [<ffffffff825ab67d>] apple_driver_init+0x24/0x31 drivers/hid/hid-apple.c:574
 [<ffffffff81000339>] do_one_initcall+0xb9/0x220 init/main.c:798
 [<     inlined    >] kernel_init_freeable+0x2ca/0x397
do_initcall_level init/main.c:863
 [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_initcalls
init/main.c:871
 [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_basic_setup
init/main.c:890
 [<ffffffff8253d919>] kernel_init_freeable+0x2ca/0x397 init/main.c:1013
 [<ffffffff81e5ad66>] kernel_init+0x16/0x140 init/main.c:941
 [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = ffff88053fd9fe68
DBG: cpu id = 3

Previous read of size 1 by thread T509 (K506):
 [<     inlined    >] cypress_send_ext_cmd+0x193/0x500
cypress_ps2_read_cmd_status drivers/input/mouse/cypress_ps2.c:123
 [<ffffffff819e6b23>] cypress_send_ext_cmd+0x193/0x500
drivers/input/mouse/cypress_ps2.c:192
 [<ffffffff819e6f5a>] cypress_detect+0x2a/0xb0
drivers/input/mouse/cypress_ps2.c:208
 [<ffffffff819d5e66>] psmouse_extensions+0x1f6/0x4e0
drivers/input/mouse/psmouse-base.c:849 (discriminator 1)
 [<ffffffff819d63fd>] psmouse_switch_protocol+0x2ad/0x330
drivers/input/mouse/psmouse-base.c:1467
 [<ffffffff819d7426>] psmouse_connect+0x2e6/0x450
drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
 [<     inlined    >] serio_driver_probe+0x57/0x80
serio_connect_driver drivers/input/serio/serio.c:65
 [<ffffffff819bcf77>] serio_driver_probe+0x57/0x80
drivers/input/serio/serio.c:796
 [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
drivers/base/dd.c:308
 [<ffffffff81813b1a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
 [<ffffffff81813e64>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
 [<ffffffff81810933>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
 [<ffffffff81813353>] driver_attach+0x33/0x50 drivers/base/dd.c:631
 [<     inlined    >] serio_handle_event+0x2d3/0x3b0
serio_attach_driver drivers/input/serio/serio.c:826
 [<ffffffff819bd8e3>] serio_handle_event+0x2d3/0x3b0
drivers/input/serio/serio.c:245
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = 0

DBG: addr: ffff88019d703909
DBG: first offset: 1, second offset: 1
DBG: T1 clock: {T1: 17075020, T509: 142163}
DBG: T509 clock: {T509: 142167}
==================================================================

It seems that accessing 'psmouse->pktcnt' in
'psmouse->packet[psmouse->pktcnt++] = data' in 'psmouse_interrupt()'
is racing with 'wait_event_timeout(ps2dev->wait, (psmouse->pktcnt >=
pktsize), msecs_to_jiffies(CYTP_CMD_TIMEOUT))' in
'cypress_ps2_read_cmd_status()'.
If that's the case, it's possible that 'psmouse->pktcnt' is
incremented, but no data is written yet, so '(psmouse->pktcnt >=
pktsize)' will evaluate to true and the following 'memcpy' will access
invalid data.

Could you confirm if this is a real data race?

Thanks in advance!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Fwd: Potential data race in psmouse_interrupt
  2015-07-22 13:14 ` Fwd: Potential data race in psmouse_interrupt Andrey Konovalov
@ 2015-07-22 16:52   ` Dmitry Torokhov
  2015-07-23 13:08   ` Pali Rohár
  1 sibling, 0 replies; 19+ messages in thread
From: Dmitry Torokhov @ 2015-07-22 16:52 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Hans de Goede, Mathias Gottschlag, Shailendra Verma,
	Rusty Russell, Luis R. Rodriguez, Thomas Hellstrom, linux-input,
	linux-kernel, Dmitry Vyukov, Kostya Serebryany,
	Alexander Potapenko, ktsan

On Wed, Jul 22, 2015 at 03:14:42PM +0200, Andrey Konovalov wrote:
> Hi!
> 
> We are working on a dynamic data race detector for Linux kernel called
> KernelThreadSanitizer (ktsan) (https://github.com/google/ktsan/wiki).
> 
> Here is a report we got while running ktsan (upstream revision
> 21bdb584af8cca7c6df3c44cba268be050a234eb).
> 
> ==================================================================
> ThreadSanitizer: data-race in psmouse_interrupt
> 
> Write of size 1 by thread T1 (K1):
>  [<ffffffff819d5272>] psmouse_interrupt+0xd2/0x690
> drivers/input/mouse/psmouse-base.c:337
>  [<ffffffff819bcd16>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>  [<ffffffff819be5f0>] i8042_interrupt+0x240/0x4d0
> drivers/input/serio/i8042.c:541
>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> include/linux/irqdesc.h:146
>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>  [<ffffffff81e726ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
> ./arch/x86/include/asm/apic.h:655
>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
> arch/x86/kernel/apic/apic.c:915
>  [<ffffffff81e7299b>] apic_timer_interrupt+0x6b/0x70
> arch/x86/entry/entry_64.S:782
>  [<ffffffff8153223d>] kvasprintf+0x5d/0x90 lib/kasprintf.c:24
>  [<ffffffff815322be>] kasprintf+0x4e/0x70 lib/kasprintf.c:40
>  [<     inlined    >] module_add_driver+0x8f/0x180 make_driver_name
> drivers/base/module.c:18
>  [<ffffffff8183059f>] module_add_driver+0x8f/0x180 drivers/base/module.c:63
>  [<ffffffff81812c94>] bus_add_driver+0x204/0x340 drivers/base/bus.c:713
>  [<ffffffff81814e2b>] driver_register+0xdb/0x190 drivers/base/driver.c:168
>  [<ffffffff81a5ed16>] __hid_register_driver+0xb6/0x100
> drivers/hid/hid-core.c:2677
>  [<ffffffff825ab67d>] apple_driver_init+0x24/0x31 drivers/hid/hid-apple.c:574
>  [<ffffffff81000339>] do_one_initcall+0xb9/0x220 init/main.c:798
>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397
> do_initcall_level init/main.c:863
>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_initcalls
> init/main.c:871
>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_basic_setup
> init/main.c:890
>  [<ffffffff8253d919>] kernel_init_freeable+0x2ca/0x397 init/main.c:1013
>  [<ffffffff81e5ad66>] kernel_init+0x16/0x140 init/main.c:941
>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = ffff88053fd9fe68
> DBG: cpu id = 3
> 
> Previous read of size 1 by thread T509 (K506):
>  [<     inlined    >] cypress_send_ext_cmd+0x193/0x500
> cypress_ps2_read_cmd_status drivers/input/mouse/cypress_ps2.c:123
>  [<ffffffff819e6b23>] cypress_send_ext_cmd+0x193/0x500
> drivers/input/mouse/cypress_ps2.c:192
>  [<ffffffff819e6f5a>] cypress_detect+0x2a/0xb0
> drivers/input/mouse/cypress_ps2.c:208
>  [<ffffffff819d5e66>] psmouse_extensions+0x1f6/0x4e0
> drivers/input/mouse/psmouse-base.c:849 (discriminator 1)
>  [<ffffffff819d63fd>] psmouse_switch_protocol+0x2ad/0x330
> drivers/input/mouse/psmouse-base.c:1467
>  [<ffffffff819d7426>] psmouse_connect+0x2e6/0x450
> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>  [<     inlined    >] serio_driver_probe+0x57/0x80
> serio_connect_driver drivers/input/serio/serio.c:65
>  [<ffffffff819bcf77>] serio_driver_probe+0x57/0x80
> drivers/input/serio/serio.c:796
>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> drivers/base/dd.c:308
>  [<ffffffff81813b1a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>  [<ffffffff81813e64>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>  [<ffffffff81810933>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>  [<ffffffff81813353>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> serio_attach_driver drivers/input/serio/serio.c:826
>  [<ffffffff819bd8e3>] serio_handle_event+0x2d3/0x3b0
> drivers/input/serio/serio.c:245
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = 0
> 
> DBG: addr: ffff88019d703909
> DBG: first offset: 1, second offset: 1
> DBG: T1 clock: {T1: 17075020, T509: 142163}
> DBG: T509 clock: {T509: 142167}
> ==================================================================
> 
> It seems that accessing 'psmouse->pktcnt' in
> 'psmouse->packet[psmouse->pktcnt++] = data' in 'psmouse_interrupt()'
> is racing with 'wait_event_timeout(ps2dev->wait, (psmouse->pktcnt >=
> pktsize), msecs_to_jiffies(CYTP_CMD_TIMEOUT))' in
> 'cypress_ps2_read_cmd_status()'.
> If that's the case, it's possible that 'psmouse->pktcnt' is
> incremented, but no data is written yet, so '(psmouse->pktcnt >=
> pktsize)' will evaluate to true and the following 'memcpy' will access
> invalid data.
> 
> Could you confirm if this is a real data race?

Yes, looks like real positive.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-07-22 13:14 ` Fwd: Potential data race in psmouse_interrupt Andrey Konovalov
  2015-07-22 16:52   ` Dmitry Torokhov
@ 2015-07-23 13:08   ` Pali Rohár
  2015-07-23 13:31       ` Andrey Konovalov
  1 sibling, 1 reply; 19+ messages in thread
From: Pali Rohár @ 2015-07-23 13:08 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Dmitry Torokhov, Hans de Goede, Mathias Gottschlag,
	Shailendra Verma, Rusty Russell, Luis R. Rodriguez,
	Thomas Hellstrom, linux-input, linux-kernel, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Wednesday 22 July 2015 15:14:42 Andrey Konovalov wrote:
> Hi!
> 
> We are working on a dynamic data race detector for Linux kernel called
> KernelThreadSanitizer (ktsan) (https://github.com/google/ktsan/wiki).
> 
> Here is a report we got while running ktsan (upstream revision
> 21bdb584af8cca7c6df3c44cba268be050a234eb).
> 
> ==================================================================
> ThreadSanitizer: data-race in psmouse_interrupt
> 
> Write of size 1 by thread T1 (K1):
>  [<ffffffff819d5272>] psmouse_interrupt+0xd2/0x690
> drivers/input/mouse/psmouse-base.c:337
>  [<ffffffff819bcd16>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>  [<ffffffff819be5f0>] i8042_interrupt+0x240/0x4d0
> drivers/input/serio/i8042.c:541
>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> include/linux/irqdesc.h:146
>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>  [<ffffffff81e726ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
> ./arch/x86/include/asm/apic.h:655
>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
> arch/x86/kernel/apic/apic.c:915
>  [<ffffffff81e7299b>] apic_timer_interrupt+0x6b/0x70
> arch/x86/entry/entry_64.S:782
>  [<ffffffff8153223d>] kvasprintf+0x5d/0x90 lib/kasprintf.c:24
>  [<ffffffff815322be>] kasprintf+0x4e/0x70 lib/kasprintf.c:40
>  [<     inlined    >] module_add_driver+0x8f/0x180 make_driver_name
> drivers/base/module.c:18
>  [<ffffffff8183059f>] module_add_driver+0x8f/0x180 drivers/base/module.c:63
>  [<ffffffff81812c94>] bus_add_driver+0x204/0x340 drivers/base/bus.c:713
>  [<ffffffff81814e2b>] driver_register+0xdb/0x190 drivers/base/driver.c:168
>  [<ffffffff81a5ed16>] __hid_register_driver+0xb6/0x100
> drivers/hid/hid-core.c:2677
>  [<ffffffff825ab67d>] apple_driver_init+0x24/0x31 drivers/hid/hid-apple.c:574
>  [<ffffffff81000339>] do_one_initcall+0xb9/0x220 init/main.c:798
>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397
> do_initcall_level init/main.c:863
>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_initcalls
> init/main.c:871
>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_basic_setup
> init/main.c:890
>  [<ffffffff8253d919>] kernel_init_freeable+0x2ca/0x397 init/main.c:1013
>  [<ffffffff81e5ad66>] kernel_init+0x16/0x140 init/main.c:941
>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = ffff88053fd9fe68
> DBG: cpu id = 3
> 
> Previous read of size 1 by thread T509 (K506):
>  [<     inlined    >] cypress_send_ext_cmd+0x193/0x500
> cypress_ps2_read_cmd_status drivers/input/mouse/cypress_ps2.c:123
>  [<ffffffff819e6b23>] cypress_send_ext_cmd+0x193/0x500
> drivers/input/mouse/cypress_ps2.c:192
>  [<ffffffff819e6f5a>] cypress_detect+0x2a/0xb0
> drivers/input/mouse/cypress_ps2.c:208
>  [<ffffffff819d5e66>] psmouse_extensions+0x1f6/0x4e0
> drivers/input/mouse/psmouse-base.c:849 (discriminator 1)
>  [<ffffffff819d63fd>] psmouse_switch_protocol+0x2ad/0x330
> drivers/input/mouse/psmouse-base.c:1467
>  [<ffffffff819d7426>] psmouse_connect+0x2e6/0x450
> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>  [<     inlined    >] serio_driver_probe+0x57/0x80
> serio_connect_driver drivers/input/serio/serio.c:65
>  [<ffffffff819bcf77>] serio_driver_probe+0x57/0x80
> drivers/input/serio/serio.c:796
>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> drivers/base/dd.c:308
>  [<ffffffff81813b1a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>  [<ffffffff81813e64>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>  [<ffffffff81810933>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>  [<ffffffff81813353>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> serio_attach_driver drivers/input/serio/serio.c:826
>  [<ffffffff819bd8e3>] serio_handle_event+0x2d3/0x3b0
> drivers/input/serio/serio.c:245
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = 0
> 
> DBG: addr: ffff88019d703909
> DBG: first offset: 1, second offset: 1
> DBG: T1 clock: {T1: 17075020, T509: 142163}
> DBG: T509 clock: {T509: 142167}
> ==================================================================
> 
> It seems that accessing 'psmouse->pktcnt' in
> 'psmouse->packet[psmouse->pktcnt++] = data' in 'psmouse_interrupt()'
> is racing with 'wait_event_timeout(ps2dev->wait, (psmouse->pktcnt >=
> pktsize), msecs_to_jiffies(CYTP_CMD_TIMEOUT))' in
> 'cypress_ps2_read_cmd_status()'.
> If that's the case, it's possible that 'psmouse->pktcnt' is
> incremented, but no data is written yet, so '(psmouse->pktcnt >=
> pktsize)' will evaluate to true and the following 'memcpy' will access
> invalid data.
> 
> Could you confirm if this is a real data race?
> 
> Thanks in advance!
> 

Hi Andrey,

I would like to ask you, did not you noticed some problem from Sanitizer
with psmouse-base.c and alps.c code?

There is unknown problem when alps.c receive invalid data from
psmouse_interrupt/serio_interrupt/i8042_interrupt.

And when I read your email with race condition & accesing invalid data
in cypress driver. I'm thinking if there cannot be something like that.

-- 
Pali Rohár
pali.rohar@gmail.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-07-23 13:08   ` Pali Rohár
@ 2015-07-23 13:31       ` Andrey Konovalov
  0 siblings, 0 replies; 19+ messages in thread
From: Andrey Konovalov @ 2015-07-23 13:31 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Dmitry Torokhov, Hans de Goede, Mathias Gottschlag,
	Shailendra Verma, Rusty Russell, Luis R. Rodriguez,
	Thomas Hellstrom, linux-input, linux-kernel, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko, ktsan

Hi Pali,

There are a few reports in psmouse-base.c/alps.c code as well.
Here is a couple of them that are printed one right after another.
The second one tells about a race on ps2dev->cmdcnt.

==================================================================
ThreadSanitizer: data-race in ps2_handle_ack

Write of size 8 by thread T6 (K6):
 [<ffffffff819c0b83>] ps2_handle_ack+0xf3/0x180 drivers/input/serio/libps2.c:322
 [<ffffffff819d55d7>] psmouse_interrupt+0x207/0x690
drivers/input/mouse/psmouse-base.c:317
 [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
 [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
drivers/input/serio/i8042.c:541
 [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
 [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
 [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
 [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
include/linux/irqdesc.h:146
 [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
 [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
 [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
 [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
 [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
 [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
./arch/x86/include/asm/apic.h:655
 [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
arch/x86/kernel/apic/apic.c:915
 [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
arch/x86/entry/entry_64.S:782
 [<ffffffff81086fe3>] _do_fork+0x133/0x500 kernel/fork.c:1719
 [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
 [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = ffff88053fd9fe68
DBG: cpu id = 3

Previous read of size 8 by thread T397 (K394):
 [<ffffffff819c0cf2>] ps2_sendbyte+0xe2/0x210 drivers/input/serio/libps2.c:45
 [<ffffffff819c1109>] __ps2_command+0x159/0x660 drivers/input/serio/libps2.c:214
 [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
 [<ffffffff819dd412>] alps_rpt_cmd+0x42/0xe0
drivers/input/mouse/alps.c:1665 (discriminator 1)
 [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
 [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
 [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
drivers/input/mouse/psmouse-base.c:752
 [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
drivers/input/mouse/psmouse-base.c:870
 [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
drivers/input/mouse/psmouse-base.c:1467
 [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
 [<     inlined    >] serio_driver_probe+0x57/0x80
serio_connect_driver drivers/input/serio/serio.c:65
 [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
drivers/input/serio/serio.c:796
 [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
drivers/base/dd.c:308
 [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
 [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
 [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
 [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
 [<     inlined    >] serio_handle_event+0x2d3/0x3b0
serio_attach_driver drivers/input/serio/serio.c:826
 [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
drivers/input/serio/serio.c:245
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = 0

DBG: addr: ffff88019a923a58
DBG: first offset: 0, second offset: 0
DBG: T6 clock: {T6: 2264295, T397: 174118}
DBG: T397 clock: {T397: 174122}
==================================================================
==================================================================
ThreadSanitizer: data-race in __ps2_command

Read of size 1 by thread T397 (K394):
 [<ffffffff819c1220>] __ps2_command+0x270/0x660
drivers/input/serio/libps2.c:230 (discriminator 21)
 [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
 [<ffffffff819dd47e>] alps_rpt_cmd+0xae/0xe0 drivers/input/mouse/alps.c:1674
 [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
 [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
 [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
drivers/input/mouse/psmouse-base.c:752
 [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
drivers/input/mouse/psmouse-base.c:870
 [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
drivers/input/mouse/psmouse-base.c:1467
 [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
 [<     inlined    >] serio_driver_probe+0x57/0x80
serio_connect_driver drivers/input/serio/serio.c:65
 [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
drivers/input/serio/serio.c:796
 [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
drivers/base/dd.c:308
 [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
 [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
 [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
 [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
 [<     inlined    >] serio_handle_event+0x2d3/0x3b0
serio_attach_driver drivers/input/serio/serio.c:826
 [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
drivers/input/serio/serio.c:245
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = ffff88053fc9fe68
DBG: cpu id = 1

Previous write of size 1 by thread T6 (K6):
 [<ffffffff819c09a6>] ps2_handle_response+0x46/0x130
drivers/input/serio/libps2.c:346
 [<ffffffff819d58a4>] psmouse_interrupt+0x4d4/0x690
drivers/input/mouse/psmouse-base.c:321
 [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
 [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
drivers/input/serio/i8042.c:541
 [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
 [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
 [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
 [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
include/linux/irqdesc.h:146
 [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
 [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
 [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
 [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
arch/x86/entry/entry_64.S:782
 [<     inlined    >] kvm_clock_get_cycles+0x3e/0x50 kvm_clock_read
arch/x86/kernel/kvmclock.c:85
 [<ffffffff8106e74e>] kvm_clock_get_cycles+0x3e/0x50
arch/x86/kernel/kvmclock.c:92
 [<     inlined    >] ktime_get_with_offset+0x99/0x180
timekeeping_get_delta kernel/time/timekeeping.c:210
 [<     inlined    >] ktime_get_with_offset+0x99/0x180
timekeeping_get_ns kernel/time/timekeeping.c:306
 [<ffffffff81114da9>] ktime_get_with_offset+0x99/0x180
kernel/time/timekeeping.c:720
 [<     inlined    >] copy_process.part.44+0x773/0x2c40
ktime_get_boottime include/linux/timekeeping.h:183
 [<     inlined    >] copy_process.part.44+0x773/0x2c40
ktime_get_boot_ns include/linux/timekeeping.h:214
 [<ffffffff810848b3>] copy_process.part.44+0x773/0x2c40 kernel/fork.c:1363
 [<     inlined    >] _do_fork+0x103/0x500 copy_process kernel/fork.c:1261
 [<ffffffff81086fb3>] _do_fork+0x103/0x500 kernel/fork.c:1709
 [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
 [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = 0

DBG: addr: ffff88019a923a68
DBG: first offset: 0, second offset: 0
DBG: T397 clock: {T397: 175738, T6: 2266233}
DBG: T6 clock: {T6: 2266336}
==================================================================

On Thu, Jul 23, 2015 at 3:08 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> On Wednesday 22 July 2015 15:14:42 Andrey Konovalov wrote:
>> Hi!
>>
>> We are working on a dynamic data race detector for Linux kernel called
>> KernelThreadSanitizer (ktsan) (https://github.com/google/ktsan/wiki).
>>
>> Here is a report we got while running ktsan (upstream revision
>> 21bdb584af8cca7c6df3c44cba268be050a234eb).
>>
>> ==================================================================
>> ThreadSanitizer: data-race in psmouse_interrupt
>>
>> Write of size 1 by thread T1 (K1):
>>  [<ffffffff819d5272>] psmouse_interrupt+0xd2/0x690
>> drivers/input/mouse/psmouse-base.c:337
>>  [<ffffffff819bcd16>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>>  [<ffffffff819be5f0>] i8042_interrupt+0x240/0x4d0
>> drivers/input/serio/i8042.c:541
>>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
>> include/linux/irqdesc.h:146
>>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>>  [<ffffffff81e726ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
>>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
>>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
>> ./arch/x86/include/asm/apic.h:655
>>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
>> arch/x86/kernel/apic/apic.c:915
>>  [<ffffffff81e7299b>] apic_timer_interrupt+0x6b/0x70
>> arch/x86/entry/entry_64.S:782
>>  [<ffffffff8153223d>] kvasprintf+0x5d/0x90 lib/kasprintf.c:24
>>  [<ffffffff815322be>] kasprintf+0x4e/0x70 lib/kasprintf.c:40
>>  [<     inlined    >] module_add_driver+0x8f/0x180 make_driver_name
>> drivers/base/module.c:18
>>  [<ffffffff8183059f>] module_add_driver+0x8f/0x180 drivers/base/module.c:63
>>  [<ffffffff81812c94>] bus_add_driver+0x204/0x340 drivers/base/bus.c:713
>>  [<ffffffff81814e2b>] driver_register+0xdb/0x190 drivers/base/driver.c:168
>>  [<ffffffff81a5ed16>] __hid_register_driver+0xb6/0x100
>> drivers/hid/hid-core.c:2677
>>  [<ffffffff825ab67d>] apple_driver_init+0x24/0x31 drivers/hid/hid-apple.c:574
>>  [<ffffffff81000339>] do_one_initcall+0xb9/0x220 init/main.c:798
>>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397
>> do_initcall_level init/main.c:863
>>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_initcalls
>> init/main.c:871
>>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_basic_setup
>> init/main.c:890
>>  [<ffffffff8253d919>] kernel_init_freeable+0x2ca/0x397 init/main.c:1013
>>  [<ffffffff81e5ad66>] kernel_init+0x16/0x140 init/main.c:941
>>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
>> DBG: cpu = ffff88053fd9fe68
>> DBG: cpu id = 3
>>
>> Previous read of size 1 by thread T509 (K506):
>>  [<     inlined    >] cypress_send_ext_cmd+0x193/0x500
>> cypress_ps2_read_cmd_status drivers/input/mouse/cypress_ps2.c:123
>>  [<ffffffff819e6b23>] cypress_send_ext_cmd+0x193/0x500
>> drivers/input/mouse/cypress_ps2.c:192
>>  [<ffffffff819e6f5a>] cypress_detect+0x2a/0xb0
>> drivers/input/mouse/cypress_ps2.c:208
>>  [<ffffffff819d5e66>] psmouse_extensions+0x1f6/0x4e0
>> drivers/input/mouse/psmouse-base.c:849 (discriminator 1)
>>  [<ffffffff819d63fd>] psmouse_switch_protocol+0x2ad/0x330
>> drivers/input/mouse/psmouse-base.c:1467
>>  [<ffffffff819d7426>] psmouse_connect+0x2e6/0x450
>> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>>  [<     inlined    >] serio_driver_probe+0x57/0x80
>> serio_connect_driver drivers/input/serio/serio.c:65
>>  [<ffffffff819bcf77>] serio_driver_probe+0x57/0x80
>> drivers/input/serio/serio.c:796
>>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
>> drivers/base/dd.c:308
>>  [<ffffffff81813b1a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>>  [<ffffffff81813e64>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>>  [<ffffffff81810933>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>>  [<ffffffff81813353>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
>> serio_attach_driver drivers/input/serio/serio.c:826
>>  [<ffffffff819bd8e3>] serio_handle_event+0x2d3/0x3b0
>> drivers/input/serio/serio.c:245
>>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
>> DBG: cpu = 0
>>
>> DBG: addr: ffff88019d703909
>> DBG: first offset: 1, second offset: 1
>> DBG: T1 clock: {T1: 17075020, T509: 142163}
>> DBG: T509 clock: {T509: 142167}
>> ==================================================================
>>
>> It seems that accessing 'psmouse->pktcnt' in
>> 'psmouse->packet[psmouse->pktcnt++] = data' in 'psmouse_interrupt()'
>> is racing with 'wait_event_timeout(ps2dev->wait, (psmouse->pktcnt >=
>> pktsize), msecs_to_jiffies(CYTP_CMD_TIMEOUT))' in
>> 'cypress_ps2_read_cmd_status()'.
>> If that's the case, it's possible that 'psmouse->pktcnt' is
>> incremented, but no data is written yet, so '(psmouse->pktcnt >=
>> pktsize)' will evaluate to true and the following 'memcpy' will access
>> invalid data.
>>
>> Could you confirm if this is a real data race?
>>
>> Thanks in advance!
>>
>
> Hi Andrey,
>
> I would like to ask you, did not you noticed some problem from Sanitizer
> with psmouse-base.c and alps.c code?
>
> There is unknown problem when alps.c receive invalid data from
> psmouse_interrupt/serio_interrupt/i8042_interrupt.
>
> And when I read your email with race condition & accesing invalid data
> in cypress driver. I'm thinking if there cannot be something like that.
>
> --
> Pali Rohár
> pali.rohar@gmail.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
@ 2015-07-23 13:31       ` Andrey Konovalov
  0 siblings, 0 replies; 19+ messages in thread
From: Andrey Konovalov @ 2015-07-23 13:31 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Dmitry Torokhov, Hans de Goede, Mathias Gottschlag,
	Shailendra Verma, Rusty Russell, Luis R. Rodriguez,
	Thomas Hellstrom, linux-input, linux-kernel, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko, ktsan

Hi Pali,

There are a few reports in psmouse-base.c/alps.c code as well.
Here is a couple of them that are printed one right after another.
The second one tells about a race on ps2dev->cmdcnt.

==================================================================
ThreadSanitizer: data-race in ps2_handle_ack

Write of size 8 by thread T6 (K6):
 [<ffffffff819c0b83>] ps2_handle_ack+0xf3/0x180 drivers/input/serio/libps2.c:322
 [<ffffffff819d55d7>] psmouse_interrupt+0x207/0x690
drivers/input/mouse/psmouse-base.c:317
 [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
 [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
drivers/input/serio/i8042.c:541
 [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
 [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
 [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
 [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
include/linux/irqdesc.h:146
 [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
 [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
 [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
 [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
 [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
 [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
./arch/x86/include/asm/apic.h:655
 [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
arch/x86/kernel/apic/apic.c:915
 [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
arch/x86/entry/entry_64.S:782
 [<ffffffff81086fe3>] _do_fork+0x133/0x500 kernel/fork.c:1719
 [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
 [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = ffff88053fd9fe68
DBG: cpu id = 3

Previous read of size 8 by thread T397 (K394):
 [<ffffffff819c0cf2>] ps2_sendbyte+0xe2/0x210 drivers/input/serio/libps2.c:45
 [<ffffffff819c1109>] __ps2_command+0x159/0x660 drivers/input/serio/libps2.c:214
 [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
 [<ffffffff819dd412>] alps_rpt_cmd+0x42/0xe0
drivers/input/mouse/alps.c:1665 (discriminator 1)
 [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
 [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
 [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
drivers/input/mouse/psmouse-base.c:752
 [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
drivers/input/mouse/psmouse-base.c:870
 [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
drivers/input/mouse/psmouse-base.c:1467
 [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
 [<     inlined    >] serio_driver_probe+0x57/0x80
serio_connect_driver drivers/input/serio/serio.c:65
 [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
drivers/input/serio/serio.c:796
 [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
drivers/base/dd.c:308
 [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
 [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
 [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
 [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
 [<     inlined    >] serio_handle_event+0x2d3/0x3b0
serio_attach_driver drivers/input/serio/serio.c:826
 [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
drivers/input/serio/serio.c:245
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = 0

DBG: addr: ffff88019a923a58
DBG: first offset: 0, second offset: 0
DBG: T6 clock: {T6: 2264295, T397: 174118}
DBG: T397 clock: {T397: 174122}
==================================================================
==================================================================
ThreadSanitizer: data-race in __ps2_command

Read of size 1 by thread T397 (K394):
 [<ffffffff819c1220>] __ps2_command+0x270/0x660
drivers/input/serio/libps2.c:230 (discriminator 21)
 [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
 [<ffffffff819dd47e>] alps_rpt_cmd+0xae/0xe0 drivers/input/mouse/alps.c:1674
 [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
 [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
 [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
drivers/input/mouse/psmouse-base.c:752
 [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
drivers/input/mouse/psmouse-base.c:870
 [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
drivers/input/mouse/psmouse-base.c:1467
 [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
 [<     inlined    >] serio_driver_probe+0x57/0x80
serio_connect_driver drivers/input/serio/serio.c:65
 [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
drivers/input/serio/serio.c:796
 [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
drivers/base/dd.c:308
 [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
 [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
 [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
 [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
 [<     inlined    >] serio_handle_event+0x2d3/0x3b0
serio_attach_driver drivers/input/serio/serio.c:826
 [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
drivers/input/serio/serio.c:245
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = ffff88053fc9fe68
DBG: cpu id = 1

Previous write of size 1 by thread T6 (K6):
 [<ffffffff819c09a6>] ps2_handle_response+0x46/0x130
drivers/input/serio/libps2.c:346
 [<ffffffff819d58a4>] psmouse_interrupt+0x4d4/0x690
drivers/input/mouse/psmouse-base.c:321
 [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
 [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
drivers/input/serio/i8042.c:541
 [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
 [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
 [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
 [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
include/linux/irqdesc.h:146
 [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
 [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
 [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
 [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
arch/x86/entry/entry_64.S:782
 [<     inlined    >] kvm_clock_get_cycles+0x3e/0x50 kvm_clock_read
arch/x86/kernel/kvmclock.c:85
 [<ffffffff8106e74e>] kvm_clock_get_cycles+0x3e/0x50
arch/x86/kernel/kvmclock.c:92
 [<     inlined    >] ktime_get_with_offset+0x99/0x180
timekeeping_get_delta kernel/time/timekeeping.c:210
 [<     inlined    >] ktime_get_with_offset+0x99/0x180
timekeeping_get_ns kernel/time/timekeeping.c:306
 [<ffffffff81114da9>] ktime_get_with_offset+0x99/0x180
kernel/time/timekeeping.c:720
 [<     inlined    >] copy_process.part.44+0x773/0x2c40
ktime_get_boottime include/linux/timekeeping.h:183
 [<     inlined    >] copy_process.part.44+0x773/0x2c40
ktime_get_boot_ns include/linux/timekeeping.h:214
 [<ffffffff810848b3>] copy_process.part.44+0x773/0x2c40 kernel/fork.c:1363
 [<     inlined    >] _do_fork+0x103/0x500 copy_process kernel/fork.c:1261
 [<ffffffff81086fb3>] _do_fork+0x103/0x500 kernel/fork.c:1709
 [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
 [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
 [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
 [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
 [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
 [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
DBG: cpu = 0

DBG: addr: ffff88019a923a68
DBG: first offset: 0, second offset: 0
DBG: T397 clock: {T397: 175738, T6: 2266233}
DBG: T6 clock: {T6: 2266336}
==================================================================

On Thu, Jul 23, 2015 at 3:08 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> On Wednesday 22 July 2015 15:14:42 Andrey Konovalov wrote:
>> Hi!
>>
>> We are working on a dynamic data race detector for Linux kernel called
>> KernelThreadSanitizer (ktsan) (https://github.com/google/ktsan/wiki).
>>
>> Here is a report we got while running ktsan (upstream revision
>> 21bdb584af8cca7c6df3c44cba268be050a234eb).
>>
>> ==================================================================
>> ThreadSanitizer: data-race in psmouse_interrupt
>>
>> Write of size 1 by thread T1 (K1):
>>  [<ffffffff819d5272>] psmouse_interrupt+0xd2/0x690
>> drivers/input/mouse/psmouse-base.c:337
>>  [<ffffffff819bcd16>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>>  [<ffffffff819be5f0>] i8042_interrupt+0x240/0x4d0
>> drivers/input/serio/i8042.c:541
>>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
>> include/linux/irqdesc.h:146
>>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>>  [<ffffffff81e726ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
>>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
>>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
>> ./arch/x86/include/asm/apic.h:655
>>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
>> arch/x86/kernel/apic/apic.c:915
>>  [<ffffffff81e7299b>] apic_timer_interrupt+0x6b/0x70
>> arch/x86/entry/entry_64.S:782
>>  [<ffffffff8153223d>] kvasprintf+0x5d/0x90 lib/kasprintf.c:24
>>  [<ffffffff815322be>] kasprintf+0x4e/0x70 lib/kasprintf.c:40
>>  [<     inlined    >] module_add_driver+0x8f/0x180 make_driver_name
>> drivers/base/module.c:18
>>  [<ffffffff8183059f>] module_add_driver+0x8f/0x180 drivers/base/module.c:63
>>  [<ffffffff81812c94>] bus_add_driver+0x204/0x340 drivers/base/bus.c:713
>>  [<ffffffff81814e2b>] driver_register+0xdb/0x190 drivers/base/driver.c:168
>>  [<ffffffff81a5ed16>] __hid_register_driver+0xb6/0x100
>> drivers/hid/hid-core.c:2677
>>  [<ffffffff825ab67d>] apple_driver_init+0x24/0x31 drivers/hid/hid-apple.c:574
>>  [<ffffffff81000339>] do_one_initcall+0xb9/0x220 init/main.c:798
>>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397
>> do_initcall_level init/main.c:863
>>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_initcalls
>> init/main.c:871
>>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_basic_setup
>> init/main.c:890
>>  [<ffffffff8253d919>] kernel_init_freeable+0x2ca/0x397 init/main.c:1013
>>  [<ffffffff81e5ad66>] kernel_init+0x16/0x140 init/main.c:941
>>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
>> DBG: cpu = ffff88053fd9fe68
>> DBG: cpu id = 3
>>
>> Previous read of size 1 by thread T509 (K506):
>>  [<     inlined    >] cypress_send_ext_cmd+0x193/0x500
>> cypress_ps2_read_cmd_status drivers/input/mouse/cypress_ps2.c:123
>>  [<ffffffff819e6b23>] cypress_send_ext_cmd+0x193/0x500
>> drivers/input/mouse/cypress_ps2.c:192
>>  [<ffffffff819e6f5a>] cypress_detect+0x2a/0xb0
>> drivers/input/mouse/cypress_ps2.c:208
>>  [<ffffffff819d5e66>] psmouse_extensions+0x1f6/0x4e0
>> drivers/input/mouse/psmouse-base.c:849 (discriminator 1)
>>  [<ffffffff819d63fd>] psmouse_switch_protocol+0x2ad/0x330
>> drivers/input/mouse/psmouse-base.c:1467
>>  [<ffffffff819d7426>] psmouse_connect+0x2e6/0x450
>> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>>  [<     inlined    >] serio_driver_probe+0x57/0x80
>> serio_connect_driver drivers/input/serio/serio.c:65
>>  [<ffffffff819bcf77>] serio_driver_probe+0x57/0x80
>> drivers/input/serio/serio.c:796
>>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
>> drivers/base/dd.c:308
>>  [<ffffffff81813b1a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>>  [<ffffffff81813e64>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>>  [<ffffffff81810933>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>>  [<ffffffff81813353>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
>> serio_attach_driver drivers/input/serio/serio.c:826
>>  [<ffffffff819bd8e3>] serio_handle_event+0x2d3/0x3b0
>> drivers/input/serio/serio.c:245
>>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
>> DBG: cpu = 0
>>
>> DBG: addr: ffff88019d703909
>> DBG: first offset: 1, second offset: 1
>> DBG: T1 clock: {T1: 17075020, T509: 142163}
>> DBG: T509 clock: {T509: 142167}
>> ==================================================================
>>
>> It seems that accessing 'psmouse->pktcnt' in
>> 'psmouse->packet[psmouse->pktcnt++] = data' in 'psmouse_interrupt()'
>> is racing with 'wait_event_timeout(ps2dev->wait, (psmouse->pktcnt >=
>> pktsize), msecs_to_jiffies(CYTP_CMD_TIMEOUT))' in
>> 'cypress_ps2_read_cmd_status()'.
>> If that's the case, it's possible that 'psmouse->pktcnt' is
>> incremented, but no data is written yet, so '(psmouse->pktcnt >=
>> pktsize)' will evaluate to true and the following 'memcpy' will access
>> invalid data.
>>
>> Could you confirm if this is a real data race?
>>
>> Thanks in advance!
>>
>
> Hi Andrey,
>
> I would like to ask you, did not you noticed some problem from Sanitizer
> with psmouse-base.c and alps.c code?
>
> There is unknown problem when alps.c receive invalid data from
> psmouse_interrupt/serio_interrupt/i8042_interrupt.
>
> And when I read your email with race condition & accesing invalid data
> in cypress driver. I'm thinking if there cannot be something like that.
>
> --
> Pali Rohár
> pali.rohar@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-07-23 13:31       ` Andrey Konovalov
@ 2015-07-29 11:53         ` Pali Rohár
  -1 siblings, 0 replies; 19+ messages in thread
From: Pali Rohár @ 2015-07-29 11:53 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Dmitry Torokhov, Hans de Goede, Mathias Gottschlag,
	Shailendra Verma, Rusty Russell, Luis R. Rodriguez,
	Thomas Hellstrom, linux-input, linux-kernel, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko, ktsan

Hi! This log looks like there can be race condition in initialization
code, but not in receiving ALPS packets after initialization...

On Thursday 23 July 2015 15:31:46 Andrey Konovalov wrote:
> Hi Pali,
> 
> There are a few reports in psmouse-base.c/alps.c code as well.
> Here is a couple of them that are printed one right after another.
> The second one tells about a race on ps2dev->cmdcnt.
> 
> ==================================================================
> ThreadSanitizer: data-race in ps2_handle_ack
> 
> Write of size 8 by thread T6 (K6):
>  [<ffffffff819c0b83>] ps2_handle_ack+0xf3/0x180 drivers/input/serio/libps2.c:322
>  [<ffffffff819d55d7>] psmouse_interrupt+0x207/0x690
> drivers/input/mouse/psmouse-base.c:317
>  [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>  [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
> drivers/input/serio/i8042.c:541
>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> include/linux/irqdesc.h:146
>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>  [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
> ./arch/x86/include/asm/apic.h:655
>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
> arch/x86/kernel/apic/apic.c:915
>  [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
> arch/x86/entry/entry_64.S:782
>  [<ffffffff81086fe3>] _do_fork+0x133/0x500 kernel/fork.c:1719
>  [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
>  [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = ffff88053fd9fe68
> DBG: cpu id = 3
> 
> Previous read of size 8 by thread T397 (K394):
>  [<ffffffff819c0cf2>] ps2_sendbyte+0xe2/0x210 drivers/input/serio/libps2.c:45
>  [<ffffffff819c1109>] __ps2_command+0x159/0x660 drivers/input/serio/libps2.c:214
>  [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
>  [<ffffffff819dd412>] alps_rpt_cmd+0x42/0xe0
> drivers/input/mouse/alps.c:1665 (discriminator 1)
>  [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
>  [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
>  [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
> drivers/input/mouse/psmouse-base.c:752
>  [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
> drivers/input/mouse/psmouse-base.c:870
>  [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
> drivers/input/mouse/psmouse-base.c:1467
>  [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>  [<     inlined    >] serio_driver_probe+0x57/0x80
> serio_connect_driver drivers/input/serio/serio.c:65
>  [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
> drivers/input/serio/serio.c:796
>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> drivers/base/dd.c:308
>  [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>  [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>  [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>  [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> serio_attach_driver drivers/input/serio/serio.c:826
>  [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
> drivers/input/serio/serio.c:245
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = 0
> 
> DBG: addr: ffff88019a923a58
> DBG: first offset: 0, second offset: 0
> DBG: T6 clock: {T6: 2264295, T397: 174118}
> DBG: T397 clock: {T397: 174122}
> ==================================================================
> ==================================================================
> ThreadSanitizer: data-race in __ps2_command
> 
> Read of size 1 by thread T397 (K394):
>  [<ffffffff819c1220>] __ps2_command+0x270/0x660
> drivers/input/serio/libps2.c:230 (discriminator 21)
>  [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
>  [<ffffffff819dd47e>] alps_rpt_cmd+0xae/0xe0 drivers/input/mouse/alps.c:1674
>  [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
>  [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
>  [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
> drivers/input/mouse/psmouse-base.c:752
>  [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
> drivers/input/mouse/psmouse-base.c:870
>  [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
> drivers/input/mouse/psmouse-base.c:1467
>  [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>  [<     inlined    >] serio_driver_probe+0x57/0x80
> serio_connect_driver drivers/input/serio/serio.c:65
>  [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
> drivers/input/serio/serio.c:796
>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> drivers/base/dd.c:308
>  [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>  [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>  [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>  [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> serio_attach_driver drivers/input/serio/serio.c:826
>  [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
> drivers/input/serio/serio.c:245
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = ffff88053fc9fe68
> DBG: cpu id = 1
> 
> Previous write of size 1 by thread T6 (K6):
>  [<ffffffff819c09a6>] ps2_handle_response+0x46/0x130
> drivers/input/serio/libps2.c:346
>  [<ffffffff819d58a4>] psmouse_interrupt+0x4d4/0x690
> drivers/input/mouse/psmouse-base.c:321
>  [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>  [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
> drivers/input/serio/i8042.c:541
>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> include/linux/irqdesc.h:146
>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>  [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>  [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
> arch/x86/entry/entry_64.S:782
>  [<     inlined    >] kvm_clock_get_cycles+0x3e/0x50 kvm_clock_read
> arch/x86/kernel/kvmclock.c:85
>  [<ffffffff8106e74e>] kvm_clock_get_cycles+0x3e/0x50
> arch/x86/kernel/kvmclock.c:92
>  [<     inlined    >] ktime_get_with_offset+0x99/0x180
> timekeeping_get_delta kernel/time/timekeeping.c:210
>  [<     inlined    >] ktime_get_with_offset+0x99/0x180
> timekeeping_get_ns kernel/time/timekeeping.c:306
>  [<ffffffff81114da9>] ktime_get_with_offset+0x99/0x180
> kernel/time/timekeeping.c:720
>  [<     inlined    >] copy_process.part.44+0x773/0x2c40
> ktime_get_boottime include/linux/timekeeping.h:183
>  [<     inlined    >] copy_process.part.44+0x773/0x2c40
> ktime_get_boot_ns include/linux/timekeeping.h:214
>  [<ffffffff810848b3>] copy_process.part.44+0x773/0x2c40 kernel/fork.c:1363
>  [<     inlined    >] _do_fork+0x103/0x500 copy_process kernel/fork.c:1261
>  [<ffffffff81086fb3>] _do_fork+0x103/0x500 kernel/fork.c:1709
>  [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
>  [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = 0
> 
> DBG: addr: ffff88019a923a68
> DBG: first offset: 0, second offset: 0
> DBG: T397 clock: {T397: 175738, T6: 2266233}
> DBG: T6 clock: {T6: 2266336}
> ==================================================================
> 
> On Thu, Jul 23, 2015 at 3:08 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Wednesday 22 July 2015 15:14:42 Andrey Konovalov wrote:
> >> Hi!
> >>
> >> We are working on a dynamic data race detector for Linux kernel called
> >> KernelThreadSanitizer (ktsan) (https://github.com/google/ktsan/wiki).
> >>
> >> Here is a report we got while running ktsan (upstream revision
> >> 21bdb584af8cca7c6df3c44cba268be050a234eb).
> >>
> >> ==================================================================
> >> ThreadSanitizer: data-race in psmouse_interrupt
> >>
> >> Write of size 1 by thread T1 (K1):
> >>  [<ffffffff819d5272>] psmouse_interrupt+0xd2/0x690
> >> drivers/input/mouse/psmouse-base.c:337
> >>  [<ffffffff819bcd16>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
> >>  [<ffffffff819be5f0>] i8042_interrupt+0x240/0x4d0
> >> drivers/input/serio/i8042.c:541
> >>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
> >>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
> >>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
> >>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> >> include/linux/irqdesc.h:146
> >>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
> >>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
> >>  [<ffffffff81e726ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
> >>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
> >>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
> >>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
> >> ./arch/x86/include/asm/apic.h:655
> >>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
> >> arch/x86/kernel/apic/apic.c:915
> >>  [<ffffffff81e7299b>] apic_timer_interrupt+0x6b/0x70
> >> arch/x86/entry/entry_64.S:782
> >>  [<ffffffff8153223d>] kvasprintf+0x5d/0x90 lib/kasprintf.c:24
> >>  [<ffffffff815322be>] kasprintf+0x4e/0x70 lib/kasprintf.c:40
> >>  [<     inlined    >] module_add_driver+0x8f/0x180 make_driver_name
> >> drivers/base/module.c:18
> >>  [<ffffffff8183059f>] module_add_driver+0x8f/0x180 drivers/base/module.c:63
> >>  [<ffffffff81812c94>] bus_add_driver+0x204/0x340 drivers/base/bus.c:713
> >>  [<ffffffff81814e2b>] driver_register+0xdb/0x190 drivers/base/driver.c:168
> >>  [<ffffffff81a5ed16>] __hid_register_driver+0xb6/0x100
> >> drivers/hid/hid-core.c:2677
> >>  [<ffffffff825ab67d>] apple_driver_init+0x24/0x31 drivers/hid/hid-apple.c:574
> >>  [<ffffffff81000339>] do_one_initcall+0xb9/0x220 init/main.c:798
> >>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397
> >> do_initcall_level init/main.c:863
> >>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_initcalls
> >> init/main.c:871
> >>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_basic_setup
> >> init/main.c:890
> >>  [<ffffffff8253d919>] kernel_init_freeable+0x2ca/0x397 init/main.c:1013
> >>  [<ffffffff81e5ad66>] kernel_init+0x16/0x140 init/main.c:941
> >>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> >> DBG: cpu = ffff88053fd9fe68
> >> DBG: cpu id = 3
> >>
> >> Previous read of size 1 by thread T509 (K506):
> >>  [<     inlined    >] cypress_send_ext_cmd+0x193/0x500
> >> cypress_ps2_read_cmd_status drivers/input/mouse/cypress_ps2.c:123
> >>  [<ffffffff819e6b23>] cypress_send_ext_cmd+0x193/0x500
> >> drivers/input/mouse/cypress_ps2.c:192
> >>  [<ffffffff819e6f5a>] cypress_detect+0x2a/0xb0
> >> drivers/input/mouse/cypress_ps2.c:208
> >>  [<ffffffff819d5e66>] psmouse_extensions+0x1f6/0x4e0
> >> drivers/input/mouse/psmouse-base.c:849 (discriminator 1)
> >>  [<ffffffff819d63fd>] psmouse_switch_protocol+0x2ad/0x330
> >> drivers/input/mouse/psmouse-base.c:1467
> >>  [<ffffffff819d7426>] psmouse_connect+0x2e6/0x450
> >> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
> >>  [<     inlined    >] serio_driver_probe+0x57/0x80
> >> serio_connect_driver drivers/input/serio/serio.c:65
> >>  [<ffffffff819bcf77>] serio_driver_probe+0x57/0x80
> >> drivers/input/serio/serio.c:796
> >>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> >> drivers/base/dd.c:308
> >>  [<ffffffff81813b1a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
> >>  [<ffffffff81813e64>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
> >>  [<ffffffff81810933>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
> >>  [<ffffffff81813353>] driver_attach+0x33/0x50 drivers/base/dd.c:631
> >>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> >> serio_attach_driver drivers/input/serio/serio.c:826
> >>  [<ffffffff819bd8e3>] serio_handle_event+0x2d3/0x3b0
> >> drivers/input/serio/serio.c:245
> >>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
> >>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
> >>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
> >>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> >> DBG: cpu = 0
> >>
> >> DBG: addr: ffff88019d703909
> >> DBG: first offset: 1, second offset: 1
> >> DBG: T1 clock: {T1: 17075020, T509: 142163}
> >> DBG: T509 clock: {T509: 142167}
> >> ==================================================================
> >>
> >> It seems that accessing 'psmouse->pktcnt' in
> >> 'psmouse->packet[psmouse->pktcnt++] = data' in 'psmouse_interrupt()'
> >> is racing with 'wait_event_timeout(ps2dev->wait, (psmouse->pktcnt >=
> >> pktsize), msecs_to_jiffies(CYTP_CMD_TIMEOUT))' in
> >> 'cypress_ps2_read_cmd_status()'.
> >> If that's the case, it's possible that 'psmouse->pktcnt' is
> >> incremented, but no data is written yet, so '(psmouse->pktcnt >=
> >> pktsize)' will evaluate to true and the following 'memcpy' will access
> >> invalid data.
> >>
> >> Could you confirm if this is a real data race?
> >>
> >> Thanks in advance!
> >>
> >
> > Hi Andrey,
> >
> > I would like to ask you, did not you noticed some problem from Sanitizer
> > with psmouse-base.c and alps.c code?
> >
> > There is unknown problem when alps.c receive invalid data from
> > psmouse_interrupt/serio_interrupt/i8042_interrupt.
> >
> > And when I read your email with race condition & accesing invalid data
> > in cypress driver. I'm thinking if there cannot be something like that.
> >
> > --
> > Pali Rohár
> > pali.rohar@gmail.com

-- 
Pali Rohár
pali.rohar@gmail.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
@ 2015-07-29 11:53         ` Pali Rohár
  0 siblings, 0 replies; 19+ messages in thread
From: Pali Rohár @ 2015-07-29 11:53 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Dmitry Torokhov, Hans de Goede, Mathias Gottschlag,
	Shailendra Verma, Rusty Russell, Luis R. Rodriguez,
	Thomas Hellstrom, linux-input, linux-kernel, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko, ktsan

Hi! This log looks like there can be race condition in initialization
code, but not in receiving ALPS packets after initialization...

On Thursday 23 July 2015 15:31:46 Andrey Konovalov wrote:
> Hi Pali,
> 
> There are a few reports in psmouse-base.c/alps.c code as well.
> Here is a couple of them that are printed one right after another.
> The second one tells about a race on ps2dev->cmdcnt.
> 
> ==================================================================
> ThreadSanitizer: data-race in ps2_handle_ack
> 
> Write of size 8 by thread T6 (K6):
>  [<ffffffff819c0b83>] ps2_handle_ack+0xf3/0x180 drivers/input/serio/libps2.c:322
>  [<ffffffff819d55d7>] psmouse_interrupt+0x207/0x690
> drivers/input/mouse/psmouse-base.c:317
>  [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>  [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
> drivers/input/serio/i8042.c:541
>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> include/linux/irqdesc.h:146
>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>  [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
> ./arch/x86/include/asm/apic.h:655
>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
> arch/x86/kernel/apic/apic.c:915
>  [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
> arch/x86/entry/entry_64.S:782
>  [<ffffffff81086fe3>] _do_fork+0x133/0x500 kernel/fork.c:1719
>  [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
>  [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = ffff88053fd9fe68
> DBG: cpu id = 3
> 
> Previous read of size 8 by thread T397 (K394):
>  [<ffffffff819c0cf2>] ps2_sendbyte+0xe2/0x210 drivers/input/serio/libps2.c:45
>  [<ffffffff819c1109>] __ps2_command+0x159/0x660 drivers/input/serio/libps2.c:214
>  [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
>  [<ffffffff819dd412>] alps_rpt_cmd+0x42/0xe0
> drivers/input/mouse/alps.c:1665 (discriminator 1)
>  [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
>  [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
>  [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
> drivers/input/mouse/psmouse-base.c:752
>  [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
> drivers/input/mouse/psmouse-base.c:870
>  [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
> drivers/input/mouse/psmouse-base.c:1467
>  [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>  [<     inlined    >] serio_driver_probe+0x57/0x80
> serio_connect_driver drivers/input/serio/serio.c:65
>  [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
> drivers/input/serio/serio.c:796
>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> drivers/base/dd.c:308
>  [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>  [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>  [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>  [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> serio_attach_driver drivers/input/serio/serio.c:826
>  [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
> drivers/input/serio/serio.c:245
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = 0
> 
> DBG: addr: ffff88019a923a58
> DBG: first offset: 0, second offset: 0
> DBG: T6 clock: {T6: 2264295, T397: 174118}
> DBG: T397 clock: {T397: 174122}
> ==================================================================
> ==================================================================
> ThreadSanitizer: data-race in __ps2_command
> 
> Read of size 1 by thread T397 (K394):
>  [<ffffffff819c1220>] __ps2_command+0x270/0x660
> drivers/input/serio/libps2.c:230 (discriminator 21)
>  [<ffffffff819c1645>] ps2_command+0x35/0x60 drivers/input/serio/libps2.c:260
>  [<ffffffff819dd47e>] alps_rpt_cmd+0xae/0xe0 drivers/input/mouse/alps.c:1674
>  [<ffffffff819e12b6>] alps_identify+0x36/0xc80 drivers/input/mouse/alps.c:2690
>  [<ffffffff819e3498>] alps_detect+0x28/0x150 drivers/input/mouse/alps.c:2960
>  [<     inlined    >] psmouse_extensions+0x25e/0x4e0 psmouse_do_detect
> drivers/input/mouse/psmouse-base.c:752
>  [<ffffffff819d60fe>] psmouse_extensions+0x25e/0x4e0
> drivers/input/mouse/psmouse-base.c:870
>  [<ffffffff819d662d>] psmouse_switch_protocol+0x2ad/0x330
> drivers/input/mouse/psmouse-base.c:1467
>  [<ffffffff819d7656>] psmouse_connect+0x2e6/0x450
> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
>  [<     inlined    >] serio_driver_probe+0x57/0x80
> serio_connect_driver drivers/input/serio/serio.c:65
>  [<ffffffff819bd1a7>] serio_driver_probe+0x57/0x80
> drivers/input/serio/serio.c:796
>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> drivers/base/dd.c:308
>  [<ffffffff81813d4a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
>  [<ffffffff81814094>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
>  [<ffffffff81810b63>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
>  [<ffffffff81813583>] driver_attach+0x33/0x50 drivers/base/dd.c:631
>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> serio_attach_driver drivers/input/serio/serio.c:826
>  [<ffffffff819bdb13>] serio_handle_event+0x2d3/0x3b0
> drivers/input/serio/serio.c:245
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = ffff88053fc9fe68
> DBG: cpu id = 1
> 
> Previous write of size 1 by thread T6 (K6):
>  [<ffffffff819c09a6>] ps2_handle_response+0x46/0x130
> drivers/input/serio/libps2.c:346
>  [<ffffffff819d58a4>] psmouse_interrupt+0x4d4/0x690
> drivers/input/mouse/psmouse-base.c:321
>  [<ffffffff819bcf46>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
>  [<ffffffff819be820>] i8042_interrupt+0x240/0x4d0
> drivers/input/serio/i8042.c:541
>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> include/linux/irqdesc.h:146
>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
>  [<ffffffff81e728ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
>  [<ffffffff81e72b9b>] apic_timer_interrupt+0x6b/0x70
> arch/x86/entry/entry_64.S:782
>  [<     inlined    >] kvm_clock_get_cycles+0x3e/0x50 kvm_clock_read
> arch/x86/kernel/kvmclock.c:85
>  [<ffffffff8106e74e>] kvm_clock_get_cycles+0x3e/0x50
> arch/x86/kernel/kvmclock.c:92
>  [<     inlined    >] ktime_get_with_offset+0x99/0x180
> timekeeping_get_delta kernel/time/timekeeping.c:210
>  [<     inlined    >] ktime_get_with_offset+0x99/0x180
> timekeeping_get_ns kernel/time/timekeeping.c:306
>  [<ffffffff81114da9>] ktime_get_with_offset+0x99/0x180
> kernel/time/timekeeping.c:720
>  [<     inlined    >] copy_process.part.44+0x773/0x2c40
> ktime_get_boottime include/linux/timekeeping.h:183
>  [<     inlined    >] copy_process.part.44+0x773/0x2c40
> ktime_get_boot_ns include/linux/timekeeping.h:214
>  [<ffffffff810848b3>] copy_process.part.44+0x773/0x2c40 kernel/fork.c:1363
>  [<     inlined    >] _do_fork+0x103/0x500 copy_process kernel/fork.c:1261
>  [<ffffffff81086fb3>] _do_fork+0x103/0x500 kernel/fork.c:1709
>  [<ffffffff810873ec>] kernel_thread+0x3c/0x60 kernel/fork.c:1772
>  [<ffffffff810a71dc>] __call_usermodehelper+0x5c/0x90 kernel/kmod.c:317
>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
>  [<ffffffff81e7215f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> DBG: cpu = 0
> 
> DBG: addr: ffff88019a923a68
> DBG: first offset: 0, second offset: 0
> DBG: T397 clock: {T397: 175738, T6: 2266233}
> DBG: T6 clock: {T6: 2266336}
> ==================================================================
> 
> On Thu, Jul 23, 2015 at 3:08 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Wednesday 22 July 2015 15:14:42 Andrey Konovalov wrote:
> >> Hi!
> >>
> >> We are working on a dynamic data race detector for Linux kernel called
> >> KernelThreadSanitizer (ktsan) (https://github.com/google/ktsan/wiki).
> >>
> >> Here is a report we got while running ktsan (upstream revision
> >> 21bdb584af8cca7c6df3c44cba268be050a234eb).
> >>
> >> ==================================================================
> >> ThreadSanitizer: data-race in psmouse_interrupt
> >>
> >> Write of size 1 by thread T1 (K1):
> >>  [<ffffffff819d5272>] psmouse_interrupt+0xd2/0x690
> >> drivers/input/mouse/psmouse-base.c:337
> >>  [<ffffffff819bcd16>] serio_interrupt+0x66/0xb0 drivers/input/serio/serio.c:1008
> >>  [<ffffffff819be5f0>] i8042_interrupt+0x240/0x4d0
> >> drivers/input/serio/i8042.c:541
> >>  [<ffffffff810f03b2>] handle_irq_event_percpu+0x62/0x2a0 kernel/irq/handle.c:143
> >>  [<ffffffff810f0692>] handle_irq_event+0xa2/0xf0 kernel/irq/handle.c:192
> >>  [<ffffffff810f5414>] handle_edge_irq+0x104/0x380 kernel/irq/chip.c:608
> >>  [<     inlined    >] handle_irq+0xf6/0x220 generic_handle_irq_desc
> >> include/linux/irqdesc.h:146
> >>  [<ffffffff81009f86>] handle_irq+0xf6/0x220 arch/x86/kernel/irq_64.c:81
> >>  [<ffffffff810093ce>] do_IRQ+0x5e/0x110 arch/x86/kernel/irq.c:223
> >>  [<ffffffff81e726ab>] ret_from_intr+0x0/0x1f arch/x86/entry/entry_64.S:601
> >>  [<     inlined    >] irq_exit+0x98/0xa0 invoke_softirq kernel/softirq.c:350
> >>  [<ffffffff8108edd8>] irq_exit+0x98/0xa0 kernel/softirq.c:391
> >>  [<     inlined    >] smp_apic_timer_interrupt+0x63/0x80 exiting_irq
> >> ./arch/x86/include/asm/apic.h:655
> >>  [<ffffffff8105c503>] smp_apic_timer_interrupt+0x63/0x80
> >> arch/x86/kernel/apic/apic.c:915
> >>  [<ffffffff81e7299b>] apic_timer_interrupt+0x6b/0x70
> >> arch/x86/entry/entry_64.S:782
> >>  [<ffffffff8153223d>] kvasprintf+0x5d/0x90 lib/kasprintf.c:24
> >>  [<ffffffff815322be>] kasprintf+0x4e/0x70 lib/kasprintf.c:40
> >>  [<     inlined    >] module_add_driver+0x8f/0x180 make_driver_name
> >> drivers/base/module.c:18
> >>  [<ffffffff8183059f>] module_add_driver+0x8f/0x180 drivers/base/module.c:63
> >>  [<ffffffff81812c94>] bus_add_driver+0x204/0x340 drivers/base/bus.c:713
> >>  [<ffffffff81814e2b>] driver_register+0xdb/0x190 drivers/base/driver.c:168
> >>  [<ffffffff81a5ed16>] __hid_register_driver+0xb6/0x100
> >> drivers/hid/hid-core.c:2677
> >>  [<ffffffff825ab67d>] apple_driver_init+0x24/0x31 drivers/hid/hid-apple.c:574
> >>  [<ffffffff81000339>] do_one_initcall+0xb9/0x220 init/main.c:798
> >>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397
> >> do_initcall_level init/main.c:863
> >>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_initcalls
> >> init/main.c:871
> >>  [<     inlined    >] kernel_init_freeable+0x2ca/0x397 do_basic_setup
> >> init/main.c:890
> >>  [<ffffffff8253d919>] kernel_init_freeable+0x2ca/0x397 init/main.c:1013
> >>  [<ffffffff81e5ad66>] kernel_init+0x16/0x140 init/main.c:941
> >>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> >> DBG: cpu = ffff88053fd9fe68
> >> DBG: cpu id = 3
> >>
> >> Previous read of size 1 by thread T509 (K506):
> >>  [<     inlined    >] cypress_send_ext_cmd+0x193/0x500
> >> cypress_ps2_read_cmd_status drivers/input/mouse/cypress_ps2.c:123
> >>  [<ffffffff819e6b23>] cypress_send_ext_cmd+0x193/0x500
> >> drivers/input/mouse/cypress_ps2.c:192
> >>  [<ffffffff819e6f5a>] cypress_detect+0x2a/0xb0
> >> drivers/input/mouse/cypress_ps2.c:208
> >>  [<ffffffff819d5e66>] psmouse_extensions+0x1f6/0x4e0
> >> drivers/input/mouse/psmouse-base.c:849 (discriminator 1)
> >>  [<ffffffff819d63fd>] psmouse_switch_protocol+0x2ad/0x330
> >> drivers/input/mouse/psmouse-base.c:1467
> >>  [<ffffffff819d7426>] psmouse_connect+0x2e6/0x450
> >> drivers/input/mouse/psmouse-base.c:1554 (discriminator 4)
> >>  [<     inlined    >] serio_driver_probe+0x57/0x80
> >> serio_connect_driver drivers/input/serio/serio.c:65
> >>  [<ffffffff819bcf77>] serio_driver_probe+0x57/0x80
> >> drivers/input/serio/serio.c:796
> >>  [<     inlined    >] driver_probe_device+0x1da/0x460 really_probe
> >> drivers/base/dd.c:308
> >>  [<ffffffff81813b1a>] driver_probe_device+0x1da/0x460 drivers/base/dd.c:414
> >>  [<ffffffff81813e64>] __driver_attach+0xc4/0xd0 drivers/base/dd.c:612
> >>  [<ffffffff81810933>] bus_for_each_dev+0x93/0xe0 drivers/base/bus.c:314
> >>  [<ffffffff81813353>] driver_attach+0x33/0x50 drivers/base/dd.c:631
> >>  [<     inlined    >] serio_handle_event+0x2d3/0x3b0
> >> serio_attach_driver drivers/input/serio/serio.c:826
> >>  [<ffffffff819bd8e3>] serio_handle_event+0x2d3/0x3b0
> >> drivers/input/serio/serio.c:245
> >>  [<ffffffff810adcf9>] process_one_work+0x2a9/0x6d0 kernel/workqueue.c:2036
> >>  [<ffffffff810aeddf>] worker_thread+0xbf/0x710 kernel/workqueue.c:2170
> >>  [<ffffffff810b700c>] kthread+0x15c/0x180 kernel/kthread.c:207
> >>  [<ffffffff81e71f5f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> >> DBG: cpu = 0
> >>
> >> DBG: addr: ffff88019d703909
> >> DBG: first offset: 1, second offset: 1
> >> DBG: T1 clock: {T1: 17075020, T509: 142163}
> >> DBG: T509 clock: {T509: 142167}
> >> ==================================================================
> >>
> >> It seems that accessing 'psmouse->pktcnt' in
> >> 'psmouse->packet[psmouse->pktcnt++] = data' in 'psmouse_interrupt()'
> >> is racing with 'wait_event_timeout(ps2dev->wait, (psmouse->pktcnt >=
> >> pktsize), msecs_to_jiffies(CYTP_CMD_TIMEOUT))' in
> >> 'cypress_ps2_read_cmd_status()'.
> >> If that's the case, it's possible that 'psmouse->pktcnt' is
> >> incremented, but no data is written yet, so '(psmouse->pktcnt >=
> >> pktsize)' will evaluate to true and the following 'memcpy' will access
> >> invalid data.
> >>
> >> Could you confirm if this is a real data race?
> >>
> >> Thanks in advance!
> >>
> >
> > Hi Andrey,
> >
> > I would like to ask you, did not you noticed some problem from Sanitizer
> > with psmouse-base.c and alps.c code?
> >
> > There is unknown problem when alps.c receive invalid data from
> > psmouse_interrupt/serio_interrupt/i8042_interrupt.
> >
> > And when I read your email with race condition & accesing invalid data
> > in cypress driver. I'm thinking if there cannot be something like that.
> >
> > --
> > Pali Rohár
> > pali.rohar@gmail.com

-- 
Pali Rohár
pali.rohar@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-07-29 11:53         ` Pali Rohár
  (?)
@ 2015-08-28 17:34         ` Dmitry Vyukov
  2015-08-28 17:51           ` Dmitry Torokhov
  -1 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2015-08-28 17:34 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Andrey Konovalov, Dmitry Torokhov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

Hello,

I am looking at this code in __ps2_command again:

/*
* The reset command takes a long time to execute.
*/
timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);

timeout = wait_event_timeout(ps2dev->wait,
    !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);

if (smp_load_acquire(&ps2dev->cmdcnt) &&
    !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
              timeout = ps2_adjust_timeout(ps2dev, command, timeout);
              wait_event_timeout(ps2dev->wait,
                        !(smp_load_acquire(&ps2dev->flags) &
PS2_FLAG_CMD), timeout);
}

if (param)
    for (i = 0; i < receive; i++)
              param[i] = ps2dev->cmdbuf[(receive - 1) - i];


Here are two moments I don't understand:
1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
is compared against 100ms). However, timeout is assigned to result of
wait_event_timeout, which returns 0 or 1. This does not make sense to
me. What am I missing?
2. This code pays great attention to timeouts, but in the end I don't
see how it handles timeouts. That is, if a timeout is happened, we
still copyout (garbage) from cmdbuf. What am I missing here?

Thank you

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-08-28 17:34         ` Dmitry Vyukov
@ 2015-08-28 17:51           ` Dmitry Torokhov
  2015-08-28 18:08             ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Torokhov @ 2015-08-28 17:51 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> I am looking at this code in __ps2_command again:
>
> /*
> * The reset command takes a long time to execute.
> */
> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>
> timeout = wait_event_timeout(ps2dev->wait,
>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>
> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>               wait_event_timeout(ps2dev->wait,
>                         !(smp_load_acquire(&ps2dev->flags) &
> PS2_FLAG_CMD), timeout);
> }
>
> if (param)
>     for (i = 0; i < receive; i++)
>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>
>
> Here are two moments I don't understand:
> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
> is compared against 100ms). However, timeout is assigned to result of
> wait_event_timeout, which returns 0 or 1. This does not make sense to
> me. What am I missing?

The fact that wait_event_timeout can return value greater than one:

 * Returns:
 * 0 if the @condition evaluated to %false after the @timeout elapsed,
 * 1 if the @condition evaluated to %true after the @timeout elapsed,
 * or the remaining jiffies (at least 1) if the @condition evaluated
              ^^^^^^^^^^^^^^^^^^^^^^^^^

> 2. This code pays great attention to timeouts, but in the end I don't
> see how it handles timeouts. That is, if a timeout is happened, we
> still copyout (garbage) from cmdbuf. What am I missing here?

Once upon a time wait_event() did not return positive value when
timeout expired and then condition satisfied. So we just examine the
final state (psmpouse->cmdcnt should be 0 if command actually
succeeded) and even if we copy in garbage nobody should care since
we'll return error in this case.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-08-28 17:51           ` Dmitry Torokhov
@ 2015-08-28 18:08             ` Dmitry Vyukov
  2015-08-28 18:32               ` Dmitry Torokhov
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2015-08-28 18:08 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> Hello,
>>
>> I am looking at this code in __ps2_command again:
>>
>> /*
>> * The reset command takes a long time to execute.
>> */
>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>
>> timeout = wait_event_timeout(ps2dev->wait,
>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>
>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>               wait_event_timeout(ps2dev->wait,
>>                         !(smp_load_acquire(&ps2dev->flags) &
>> PS2_FLAG_CMD), timeout);
>> }
>>
>> if (param)
>>     for (i = 0; i < receive; i++)
>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>
>>
>> Here are two moments I don't understand:
>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>> is compared against 100ms). However, timeout is assigned to result of
>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>> me. What am I missing?
>
> The fact that wait_event_timeout can return value greater than one:
>
>  * Returns:
>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>  * or the remaining jiffies (at least 1) if the @condition evaluated
>               ^^^^^^^^^^^^^^^^^^^^^^^^^


OK, makes sense now!

>> 2. This code pays great attention to timeouts, but in the end I don't
>> see how it handles timeouts. That is, if a timeout is happened, we
>> still copyout (garbage) from cmdbuf. What am I missing here?
>
> Once upon a time wait_event() did not return positive value when
> timeout expired and then condition satisfied. So we just examine the
> final state (psmpouse->cmdcnt should be 0 if command actually
> succeeded) and even if we copy in garbage nobody should care since
> we'll return error in this case.


I see.
But the cmdcnt is re-read after copying out response. So it is
possible that we read garbage response, but then read cmdcnt==0 and
return OK to caller.

So far I have something along the following lines to fix data races in libps2.c


diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
index 7551699..51c747f 100644
--- a/drivers/input/serio/libps2.c
+++ b/drivers/input/serio/libps2.c
@@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
char byte, int timeout)

         if (serio_write(ps2dev->serio, byte) == 0)
                 wait_event_timeout(ps2dev->wait,
-                                   !(ps2dev->flags & PS2_FLAG_ACK),
+                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
                                    msecs_to_jiffies(timeout));

         serio_pause_rx(ps2dev->serio);
@@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
char *param, int command)
         int receive = (command >> 8) & 0xf;
         int rc = -1;
         int i;
+        unsigned char cmdcnt;

         if (receive > sizeof(ps2dev->cmdbuf)) {
                 WARN_ON(1);
@@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
unsigned char *param, int command)
         timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);

         timeout = wait_event_timeout(ps2dev->wait,
-                                     !(ps2dev->flags &
PS2_FLAG_CMD1), timeout);
-
-        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
+                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);

+        if (READ_ONCE(&ps2dev->cmdcnt) &&
+                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
                 timeout = ps2_adjust_timeout(ps2dev, command, timeout);
                 wait_event_timeout(ps2dev->wait,
-                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
+                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
         }

-        if (param)
-                for (i = 0; i < receive; i++)
-                        param[i] = ps2dev->cmdbuf[(receive - 1) - i];
-
-        if (ps2dev->cmdcnt && (command != PS2_CMD_RESET_BAT ||
ps2dev->cmdcnt != 1))
-                goto out;
-
-        rc = 0;
+        cmdcnt = smp_load_acquire(&ps2dev->cmdcnt);
+        if (!cmdcnt || command == PS2_CMD_RESET_BAT && ps2dev->cmdcnt == 1) {
+                if (param)
+                        for (i = 0; i < receive; i++)
+                                param[i] = ps2dev->cmdbuf[(receive - 1) - i];
+                rc = 0;
+        }

  out:
         serio_pause_rx(ps2dev->serio);
@@ -284,19 +284,21 @@ EXPORT_SYMBOL(ps2_init);

 int ps2_handle_ack(struct ps2dev *ps2dev, unsigned char data)
 {
+        unsigned long flags = ps2dev->flags;
+
         switch (data) {
                 case PS2_RET_ACK:
                         ps2dev->nak = 0;
                         break;

                 case PS2_RET_NAK:
-                        ps2dev->flags |= PS2_FLAG_NAK;
+                        flags |= PS2_FLAG_NAK;
                         ps2dev->nak = PS2_RET_NAK;
                         break;

                 case PS2_RET_ERR:
-                        if (ps2dev->flags & PS2_FLAG_NAK) {
-                                ps2dev->flags &= ~PS2_FLAG_NAK;
+                        if (flags & PS2_FLAG_NAK) {
+                                flags &= ~PS2_FLAG_NAK;
                                 ps2dev->nak = PS2_RET_ERR;
                                 break;
                         }
@@ -308,7 +310,7 @@ int ps2_handle_ack(struct ps2dev *ps2dev, unsigned
char data)
                 case 0x00:
                 case 0x03:
                 case 0x04:
-                        if (ps2dev->flags & PS2_FLAG_WAITID) {
+                        if (flags & PS2_FLAG_WAITID) {
                                 ps2dev->nak = 0;
                                 break;
                         }
@@ -319,12 +321,12 @@ int ps2_handle_ack(struct ps2dev *ps2dev,
unsigned char data)


         if (!ps2dev->nak) {
-                ps2dev->flags &= ~PS2_FLAG_NAK;
+                flags &= ~PS2_FLAG_NAK;
                 if (ps2dev->cmdcnt)
-                        ps2dev->flags |= PS2_FLAG_CMD | PS2_FLAG_CMD1;
+                        flags |= PS2_FLAG_CMD | PS2_FLAG_CMD1;
         }

-        ps2dev->flags &= ~PS2_FLAG_ACK;
+        WRITE_ONCE(ps2dev->flags, flags & ~PS2_FLAG_ACK);
         wake_up(&ps2dev->wait);

         if (data != PS2_RET_ACK)
@@ -342,17 +344,21 @@ EXPORT_SYMBOL(ps2_handle_ack);

 int ps2_handle_response(struct ps2dev *ps2dev, unsigned char data)
 {
-        if (ps2dev->cmdcnt)
-                ps2dev->cmdbuf[--ps2dev->cmdcnt] = data;
+        if (ps2dev->cmdcnt) {
+                unsigned char cmdcnt = ps2dev->cmdcnt - 1;
+
+                ps2dev->cmdbuf[cmdcnt] = data;
+                smp_store_release(&ps2dev->cmdcnt, cmdcnt);
+        }

         if (ps2dev->flags & PS2_FLAG_CMD1) {
-                ps2dev->flags &= ~PS2_FLAG_CMD1;
+                smp_store_release(&ps2dev->flags, ps2dev->flags &
~PS2_FLAG_CMD1);
                 if (ps2dev->cmdcnt)
                         wake_up(&ps2dev->wait);
         }

         if (!ps2dev->cmdcnt) {
-                ps2dev->flags &= ~PS2_FLAG_CMD;
+                smp_store_release(&ps2dev->flags, ps2dev->flags &
~PS2_FLAG_CMD);
                 wake_up(&ps2dev->wait);
         }

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-08-28 18:08             ` Dmitry Vyukov
@ 2015-08-28 18:32               ` Dmitry Torokhov
  2015-08-28 18:34                 ` Dmitry Torokhov
  2015-09-01 18:46                 ` Dmitry Vyukov
  0 siblings, 2 replies; 19+ messages in thread
From: Dmitry Torokhov @ 2015-08-28 18:32 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> Hello,
>>>
>>> I am looking at this code in __ps2_command again:
>>>
>>> /*
>>> * The reset command takes a long time to execute.
>>> */
>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>
>>> timeout = wait_event_timeout(ps2dev->wait,
>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>
>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>               wait_event_timeout(ps2dev->wait,
>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>> PS2_FLAG_CMD), timeout);
>>> }
>>>
>>> if (param)
>>>     for (i = 0; i < receive; i++)
>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>
>>>
>>> Here are two moments I don't understand:
>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>> is compared against 100ms). However, timeout is assigned to result of
>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>> me. What am I missing?
>>
>> The fact that wait_event_timeout can return value greater than one:
>>
>>  * Returns:
>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>
>
> OK, makes sense now!
>
>>> 2. This code pays great attention to timeouts, but in the end I don't
>>> see how it handles timeouts. That is, if a timeout is happened, we
>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>
>> Once upon a time wait_event() did not return positive value when
>> timeout expired and then condition satisfied. So we just examine the
>> final state (psmpouse->cmdcnt should be 0 if command actually
>> succeeded) and even if we copy in garbage nobody should care since
>> we'll return error in this case.
>
>
> I see.
> But the cmdcnt is re-read after copying out response. So it is
> possible that we read garbage response, but then read cmdcnt==0 and
> return OK to caller.

That assumes that we actually timed out, and while we were copying the
data the response finally came.

>
> So far I have something along the following lines to fix data races in libps2.c

I don't know, maybe we should simply move call to
serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
and move copying of the buffer down, after checking cmdcnt.

>
>
> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
> index 7551699..51c747f 100644
> --- a/drivers/input/serio/libps2.c
> +++ b/drivers/input/serio/libps2.c
> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
> char byte, int timeout)
>
>          if (serio_write(ps2dev->serio, byte) == 0)
>                  wait_event_timeout(ps2dev->wait,
> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>                                     msecs_to_jiffies(timeout));
>
>          serio_pause_rx(ps2dev->serio);
> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
> char *param, int command)
>          int receive = (command >> 8) & 0xf;
>          int rc = -1;
>          int i;
> +        unsigned char cmdcnt;
>
>          if (receive > sizeof(ps2dev->cmdbuf)) {
>                  WARN_ON(1);
> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
> unsigned char *param, int command)
>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>
>          timeout = wait_event_timeout(ps2dev->wait,
> -                                     !(ps2dev->flags &
> PS2_FLAG_CMD1), timeout);
> -
> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>
> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>                  wait_event_timeout(ps2dev->wait,
> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);

What all these READ_ONCE()s give us?

>          }
>
> -        if (param)
> -                for (i = 0; i < receive; i++)
> -                        param[i] = ps2dev->cmdbuf[(receive - 1) - i];
> -
> -        if (ps2dev->cmdcnt && (command != PS2_CMD_RESET_BAT ||
> ps2dev->cmdcnt != 1))
> -                goto out;
> -
> -        rc = 0;
> +        cmdcnt = smp_load_acquire(&ps2dev->cmdcnt);
> +        if (!cmdcnt || command == PS2_CMD_RESET_BAT && ps2dev->cmdcnt == 1) {
> +                if (param)
> +                        for (i = 0; i < receive; i++)
> +                                param[i] = ps2dev->cmdbuf[(receive - 1) - i];
> +                rc = 0;
> +        }
>
>   out:
>          serio_pause_rx(ps2dev->serio);
> @@ -284,19 +284,21 @@ EXPORT_SYMBOL(ps2_init);
>
>  int ps2_handle_ack(struct ps2dev *ps2dev, unsigned char data)
>  {
> +        unsigned long flags = ps2dev->flags;
> +
>          switch (data) {
>                  case PS2_RET_ACK:
>                          ps2dev->nak = 0;
>                          break;
>
>                  case PS2_RET_NAK:
> -                        ps2dev->flags |= PS2_FLAG_NAK;
> +                        flags |= PS2_FLAG_NAK;
>                          ps2dev->nak = PS2_RET_NAK;
>                          break;
>
>                  case PS2_RET_ERR:
> -                        if (ps2dev->flags & PS2_FLAG_NAK) {
> -                                ps2dev->flags &= ~PS2_FLAG_NAK;
> +                        if (flags & PS2_FLAG_NAK) {
> +                                flags &= ~PS2_FLAG_NAK;
>                                  ps2dev->nak = PS2_RET_ERR;
>                                  break;
>                          }
> @@ -308,7 +310,7 @@ int ps2_handle_ack(struct ps2dev *ps2dev, unsigned
> char data)
>                  case 0x00:
>                  case 0x03:
>                  case 0x04:
> -                        if (ps2dev->flags & PS2_FLAG_WAITID) {
> +                        if (flags & PS2_FLAG_WAITID) {
>                                  ps2dev->nak = 0;
>                                  break;
>                          }
> @@ -319,12 +321,12 @@ int ps2_handle_ack(struct ps2dev *ps2dev,
> unsigned char data)
>
>
>          if (!ps2dev->nak) {
> -                ps2dev->flags &= ~PS2_FLAG_NAK;
> +                flags &= ~PS2_FLAG_NAK;
>                  if (ps2dev->cmdcnt)
> -                        ps2dev->flags |= PS2_FLAG_CMD | PS2_FLAG_CMD1;
> +                        flags |= PS2_FLAG_CMD | PS2_FLAG_CMD1;
>          }
>
> -        ps2dev->flags &= ~PS2_FLAG_ACK;
> +        WRITE_ONCE(ps2dev->flags, flags & ~PS2_FLAG_ACK);
>          wake_up(&ps2dev->wait);
>
>          if (data != PS2_RET_ACK)
> @@ -342,17 +344,21 @@ EXPORT_SYMBOL(ps2_handle_ack);
>
>  int ps2_handle_response(struct ps2dev *ps2dev, unsigned char data)
>  {
> -        if (ps2dev->cmdcnt)
> -                ps2dev->cmdbuf[--ps2dev->cmdcnt] = data;
> +        if (ps2dev->cmdcnt) {
> +                unsigned char cmdcnt = ps2dev->cmdcnt - 1;
> +
> +                ps2dev->cmdbuf[cmdcnt] = data;
> +                smp_store_release(&ps2dev->cmdcnt, cmdcnt);
> +        }
>
>          if (ps2dev->flags & PS2_FLAG_CMD1) {
> -                ps2dev->flags &= ~PS2_FLAG_CMD1;
> +                smp_store_release(&ps2dev->flags, ps2dev->flags &
> ~PS2_FLAG_CMD1);
>                  if (ps2dev->cmdcnt)
>                          wake_up(&ps2dev->wait);
>          }
>
>          if (!ps2dev->cmdcnt) {
> -                ps2dev->flags &= ~PS2_FLAG_CMD;
> +                smp_store_release(&ps2dev->flags, ps2dev->flags &
> ~PS2_FLAG_CMD);
>                  wake_up(&ps2dev->wait);
>          }

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-08-28 18:32               ` Dmitry Torokhov
@ 2015-08-28 18:34                 ` Dmitry Torokhov
  2015-09-01 18:46                 ` Dmitry Vyukov
  1 sibling, 0 replies; 19+ messages in thread
From: Dmitry Torokhov @ 2015-08-28 18:34 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Aug 28, 2015 at 11:32 AM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>> <dmitry.torokhov@gmail.com> wrote:
>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>> Hello,
>>>>
>>>> I am looking at this code in __ps2_command again:
>>>>
>>>> /*
>>>> * The reset command takes a long time to execute.
>>>> */
>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>
>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>
>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>               wait_event_timeout(ps2dev->wait,
>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>> PS2_FLAG_CMD), timeout);
>>>> }
>>>>
>>>> if (param)
>>>>     for (i = 0; i < receive; i++)
>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>
>>>>
>>>> Here are two moments I don't understand:
>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>> is compared against 100ms). However, timeout is assigned to result of
>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>> me. What am I missing?
>>>
>>> The fact that wait_event_timeout can return value greater than one:
>>>
>>>  * Returns:
>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>
>> OK, makes sense now!
>>
>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>
>>> Once upon a time wait_event() did not return positive value when
>>> timeout expired and then condition satisfied. So we just examine the
>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>> succeeded) and even if we copy in garbage nobody should care since
>>> we'll return error in this case.
>>
>>
>> I see.
>> But the cmdcnt is re-read after copying out response. So it is
>> possible that we read garbage response, but then read cmdcnt==0 and
>> return OK to caller.
>
> That assumes that we actually timed out, and while we were copying the
> data the response finally came.
>
>>
>> So far I have something along the following lines to fix data races in libps2.c
>
> I don't know, maybe we should simply move call to
> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
> and move copying of the buffer down, after checking cmdcnt.

By the way, please either drop ktsan group from public postngs or open
it to post from public.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-08-28 18:32               ` Dmitry Torokhov
  2015-08-28 18:34                 ` Dmitry Torokhov
@ 2015-09-01 18:46                 ` Dmitry Vyukov
  2015-09-04 16:56                   ` Dmitry Torokhov
  1 sibling, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2015-09-01 18:46 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Aug 28, 2015 at 8:32 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>> <dmitry.torokhov@gmail.com> wrote:
>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>> Hello,
>>>>
>>>> I am looking at this code in __ps2_command again:
>>>>
>>>> /*
>>>> * The reset command takes a long time to execute.
>>>> */
>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>
>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>
>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>               wait_event_timeout(ps2dev->wait,
>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>> PS2_FLAG_CMD), timeout);
>>>> }
>>>>
>>>> if (param)
>>>>     for (i = 0; i < receive; i++)
>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>
>>>>
>>>> Here are two moments I don't understand:
>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>> is compared against 100ms). However, timeout is assigned to result of
>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>> me. What am I missing?
>>>
>>> The fact that wait_event_timeout can return value greater than one:
>>>
>>>  * Returns:
>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>
>> OK, makes sense now!
>>
>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>
>>> Once upon a time wait_event() did not return positive value when
>>> timeout expired and then condition satisfied. So we just examine the
>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>> succeeded) and even if we copy in garbage nobody should care since
>>> we'll return error in this case.
>>
>>
>> I see.
>> But the cmdcnt is re-read after copying out response. So it is
>> possible that we read garbage response, but then read cmdcnt==0 and
>> return OK to caller.
>
> That assumes that we actually timed out, and while we were copying the
> data the response finally came.

Right.

>>
>> So far I have something along the following lines to fix data races in libps2.c
>
> I don't know, maybe we should simply move call to
> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
> and move copying of the buffer down, after checking cmdcnt.

I don't know about serio_pause_rx, but copying of response should be
done after checking cmdcnt.
Also you need to use smp_store_release/smp_load_acquire cmdcnt and
flags when they have dependent data. And READ_ONCE/WRITE_ONCE on
shared state otherwise is highly desirable.

>> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
>> index 7551699..51c747f 100644
>> --- a/drivers/input/serio/libps2.c
>> +++ b/drivers/input/serio/libps2.c
>> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
>> char byte, int timeout)
>>
>>          if (serio_write(ps2dev->serio, byte) == 0)
>>                  wait_event_timeout(ps2dev->wait,
>> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
>> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>>                                     msecs_to_jiffies(timeout));
>>
>>          serio_pause_rx(ps2dev->serio);
>> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
>> char *param, int command)
>>          int receive = (command >> 8) & 0xf;
>>          int rc = -1;
>>          int i;
>> +        unsigned char cmdcnt;
>>
>>          if (receive > sizeof(ps2dev->cmdbuf)) {
>>                  WARN_ON(1);
>> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
>> unsigned char *param, int command)
>>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>
>>          timeout = wait_event_timeout(ps2dev->wait,
>> -                                     !(ps2dev->flags &
>> PS2_FLAG_CMD1), timeout);
>> -
>> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
>> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>
>> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>                  wait_event_timeout(ps2dev->wait,
>> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
>
> What all these READ_ONCE()s give us?

I've wrote up the response here:
https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE


>By the way, please either drop ktsan group from public postngs or open
it to post from public.

Sorry, should be public now.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-09-01 18:46                 ` Dmitry Vyukov
@ 2015-09-04 16:56                   ` Dmitry Torokhov
  2015-09-04 19:32                     ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Torokhov @ 2015-09-04 16:56 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Tue, Sep 1, 2015 at 11:46 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Aug 28, 2015 at 8:32 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
>> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>>> <dmitry.torokhov@gmail.com> wrote:
>>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I am looking at this code in __ps2_command again:
>>>>>
>>>>> /*
>>>>> * The reset command takes a long time to execute.
>>>>> */
>>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>
>>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>
>>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>               wait_event_timeout(ps2dev->wait,
>>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>>> PS2_FLAG_CMD), timeout);
>>>>> }
>>>>>
>>>>> if (param)
>>>>>     for (i = 0; i < receive; i++)
>>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>>
>>>>>
>>>>> Here are two moments I don't understand:
>>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>>> is compared against 100ms). However, timeout is assigned to result of
>>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>>> me. What am I missing?
>>>>
>>>> The fact that wait_event_timeout can return value greater than one:
>>>>
>>>>  * Returns:
>>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>>
>>> OK, makes sense now!
>>>
>>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>>
>>>> Once upon a time wait_event() did not return positive value when
>>>> timeout expired and then condition satisfied. So we just examine the
>>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>>> succeeded) and even if we copy in garbage nobody should care since
>>>> we'll return error in this case.
>>>
>>>
>>> I see.
>>> But the cmdcnt is re-read after copying out response. So it is
>>> possible that we read garbage response, but then read cmdcnt==0 and
>>> return OK to caller.
>>
>> That assumes that we actually timed out, and while we were copying the
>> data the response finally came.
>
> Right.
>
>>>
>>> So far I have something along the following lines to fix data races in libps2.c
>>
>> I don't know, maybe we should simply move call to
>> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
>> and move copying of the buffer down, after checking cmdcnt.
>
> I don't know about serio_pause_rx, but copying of response should be
> done after checking cmdcnt.

It will stop the interrupt handler from running while we are examining
the cmdcnt and copy out the data, thus removing the race.

> Also you need to use smp_store_release/smp_load_acquire cmdcnt and
> flags when they have dependent data. And READ_ONCE/WRITE_ONCE on
> shared state otherwise is highly desirable.
>
>>> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
>>> index 7551699..51c747f 100644
>>> --- a/drivers/input/serio/libps2.c
>>> +++ b/drivers/input/serio/libps2.c
>>> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
>>> char byte, int timeout)
>>>
>>>          if (serio_write(ps2dev->serio, byte) == 0)
>>>                  wait_event_timeout(ps2dev->wait,
>>> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
>>> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>>>                                     msecs_to_jiffies(timeout));
>>>
>>>          serio_pause_rx(ps2dev->serio);
>>> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
>>> char *param, int command)
>>>          int receive = (command >> 8) & 0xf;
>>>          int rc = -1;
>>>          int i;
>>> +        unsigned char cmdcnt;
>>>
>>>          if (receive > sizeof(ps2dev->cmdbuf)) {
>>>                  WARN_ON(1);
>>> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
>>> unsigned char *param, int command)
>>>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>
>>>          timeout = wait_event_timeout(ps2dev->wait,
>>> -                                     !(ps2dev->flags &
>>> PS2_FLAG_CMD1), timeout);
>>> -
>>> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
>>> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>
>>> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>                  wait_event_timeout(ps2dev->wait,
>>> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
>>
>> What all these READ_ONCE()s give us?
>
> I've wrote up the response here:
> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

I read it and I still do not understand what READ_ONCE() in
wait_event* conditions will buy us.

Also if the following is true:

> As the consequence C compilers stopped guarantying that "word accesses are atomic".

a lot of stuff will break in the kernel. Maybe compilers should stop
moving towards the lala land?

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-09-04 16:56                   ` Dmitry Torokhov
@ 2015-09-04 19:32                     ` Dmitry Vyukov
  2015-09-04 20:27                       ` Dmitry Torokhov
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2015-09-04 19:32 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Sep 4, 2015 at 6:56 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Tue, Sep 1, 2015 at 11:46 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Aug 28, 2015 at 8:32 PM, Dmitry Torokhov
>> <dmitry.torokhov@gmail.com> wrote:
>>> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I am looking at this code in __ps2_command again:
>>>>>>
>>>>>> /*
>>>>>> * The reset command takes a long time to execute.
>>>>>> */
>>>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>
>>>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>
>>>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>               wait_event_timeout(ps2dev->wait,
>>>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>>>> PS2_FLAG_CMD), timeout);
>>>>>> }
>>>>>>
>>>>>> if (param)
>>>>>>     for (i = 0; i < receive; i++)
>>>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>>>
>>>>>>
>>>>>> Here are two moments I don't understand:
>>>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>>>> is compared against 100ms). However, timeout is assigned to result of
>>>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>>>> me. What am I missing?
>>>>>
>>>>> The fact that wait_event_timeout can return value greater than one:
>>>>>
>>>>>  * Returns:
>>>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>
>>>>
>>>> OK, makes sense now!
>>>>
>>>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>>>
>>>>> Once upon a time wait_event() did not return positive value when
>>>>> timeout expired and then condition satisfied. So we just examine the
>>>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>>>> succeeded) and even if we copy in garbage nobody should care since
>>>>> we'll return error in this case.
>>>>
>>>>
>>>> I see.
>>>> But the cmdcnt is re-read after copying out response. So it is
>>>> possible that we read garbage response, but then read cmdcnt==0 and
>>>> return OK to caller.
>>>
>>> That assumes that we actually timed out, and while we were copying the
>>> data the response finally came.
>>
>> Right.
>>
>>>>
>>>> So far I have something along the following lines to fix data races in libps2.c
>>>
>>> I don't know, maybe we should simply move call to
>>> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
>>> and move copying of the buffer down, after checking cmdcnt.
>>
>> I don't know about serio_pause_rx, but copying of response should be
>> done after checking cmdcnt.
>
> It will stop the interrupt handler from running while we are examining
> the cmdcnt and copy out the data, thus removing the race.
>
>> Also you need to use smp_store_release/smp_load_acquire cmdcnt and
>> flags when they have dependent data. And READ_ONCE/WRITE_ONCE on
>> shared state otherwise is highly desirable.
>>
>>>> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
>>>> index 7551699..51c747f 100644
>>>> --- a/drivers/input/serio/libps2.c
>>>> +++ b/drivers/input/serio/libps2.c
>>>> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
>>>> char byte, int timeout)
>>>>
>>>>          if (serio_write(ps2dev->serio, byte) == 0)
>>>>                  wait_event_timeout(ps2dev->wait,
>>>> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
>>>> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>>>>                                     msecs_to_jiffies(timeout));
>>>>
>>>>          serio_pause_rx(ps2dev->serio);
>>>> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
>>>> char *param, int command)
>>>>          int receive = (command >> 8) & 0xf;
>>>>          int rc = -1;
>>>>          int i;
>>>> +        unsigned char cmdcnt;
>>>>
>>>>          if (receive > sizeof(ps2dev->cmdbuf)) {
>>>>                  WARN_ON(1);
>>>> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
>>>> unsigned char *param, int command)
>>>>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>
>>>>          timeout = wait_event_timeout(ps2dev->wait,
>>>> -                                     !(ps2dev->flags &
>>>> PS2_FLAG_CMD1), timeout);
>>>> -
>>>> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
>>>> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>
>>>> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>                  wait_event_timeout(ps2dev->wait,
>>>> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
>>>
>>> What all these READ_ONCE()s give us?
>>
>> I've wrote up the response here:
>> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>
> I read it and I still do not understand what READ_ONCE() in
> wait_event* conditions will buy us.
>
> Also if the following is true:
>
>> As the consequence C compilers stopped guarantying that "word accesses are atomic".
>
> a lot of stuff will break in the kernel. Maybe compilers should stop
> moving towards the lala land?

It buys us:
1. More readable code but highlighting important aspects. Inter-thread
synchronization is important and complex, explicit is better than
implicit in such contexts.
2. Conformance to relevant standards and relieve you, me and everybody
else reading this code from spending time on proving that it cannot
break (which is not actually possible to do, "I don't see how it can
break" is not quite proof).
3. Allow tooling that finds undoubtedly harmful bugs, like this one.

I don't see any negative aspects to it. Do you see any? Because if you
see at least some value in at least on these points and don't see any
negative aspects, then it is worth doing.



-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat
sind, leiten Sie diese bitte nicht weiter, informieren Sie den
Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please
do not forward it, please inform the sender, and please erase this
e-mail including any attachments. Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-09-04 19:32                     ` Dmitry Vyukov
@ 2015-09-04 20:27                       ` Dmitry Torokhov
  2015-09-05 13:21                         ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Torokhov @ 2015-09-04 20:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Sep 4, 2015 at 12:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Sep 4, 2015 at 6:56 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
>> On Tue, Sep 1, 2015 at 11:46 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Fri, Aug 28, 2015 at 8:32 PM, Dmitry Torokhov
>>> <dmitry.torokhov@gmail.com> wrote:
>>>> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am looking at this code in __ps2_command again:
>>>>>>>
>>>>>>> /*
>>>>>>> * The reset command takes a long time to execute.
>>>>>>> */
>>>>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>>
>>>>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>>
>>>>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>>               wait_event_timeout(ps2dev->wait,
>>>>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>>>>> PS2_FLAG_CMD), timeout);
>>>>>>> }
>>>>>>>
>>>>>>> if (param)
>>>>>>>     for (i = 0; i < receive; i++)
>>>>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>>>>
>>>>>>>
>>>>>>> Here are two moments I don't understand:
>>>>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>>>>> is compared against 100ms). However, timeout is assigned to result of
>>>>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>>>>> me. What am I missing?
>>>>>>
>>>>>> The fact that wait_event_timeout can return value greater than one:
>>>>>>
>>>>>>  * Returns:
>>>>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>
>>>>>
>>>>> OK, makes sense now!
>>>>>
>>>>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>>>>
>>>>>> Once upon a time wait_event() did not return positive value when
>>>>>> timeout expired and then condition satisfied. So we just examine the
>>>>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>>>>> succeeded) and even if we copy in garbage nobody should care since
>>>>>> we'll return error in this case.
>>>>>
>>>>>
>>>>> I see.
>>>>> But the cmdcnt is re-read after copying out response. So it is
>>>>> possible that we read garbage response, but then read cmdcnt==0 and
>>>>> return OK to caller.
>>>>
>>>> That assumes that we actually timed out, and while we were copying the
>>>> data the response finally came.
>>>
>>> Right.
>>>
>>>>>
>>>>> So far I have something along the following lines to fix data races in libps2.c
>>>>
>>>> I don't know, maybe we should simply move call to
>>>> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
>>>> and move copying of the buffer down, after checking cmdcnt.
>>>
>>> I don't know about serio_pause_rx, but copying of response should be
>>> done after checking cmdcnt.
>>
>> It will stop the interrupt handler from running while we are examining
>> the cmdcnt and copy out the data, thus removing the race.
>>
>>> Also you need to use smp_store_release/smp_load_acquire cmdcnt and
>>> flags when they have dependent data. And READ_ONCE/WRITE_ONCE on
>>> shared state otherwise is highly desirable.
>>>
>>>>> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
>>>>> index 7551699..51c747f 100644
>>>>> --- a/drivers/input/serio/libps2.c
>>>>> +++ b/drivers/input/serio/libps2.c
>>>>> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
>>>>> char byte, int timeout)
>>>>>
>>>>>          if (serio_write(ps2dev->serio, byte) == 0)
>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
>>>>> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>>>>>                                     msecs_to_jiffies(timeout));
>>>>>
>>>>>          serio_pause_rx(ps2dev->serio);
>>>>> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
>>>>> char *param, int command)
>>>>>          int receive = (command >> 8) & 0xf;
>>>>>          int rc = -1;
>>>>>          int i;
>>>>> +        unsigned char cmdcnt;
>>>>>
>>>>>          if (receive > sizeof(ps2dev->cmdbuf)) {
>>>>>                  WARN_ON(1);
>>>>> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
>>>>> unsigned char *param, int command)
>>>>>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>
>>>>>          timeout = wait_event_timeout(ps2dev->wait,
>>>>> -                                     !(ps2dev->flags &
>>>>> PS2_FLAG_CMD1), timeout);
>>>>> -
>>>>> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
>>>>> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>
>>>>> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
>>>>
>>>> What all these READ_ONCE()s give us?
>>>
>>> I've wrote up the response here:
>>> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>>
>> I read it and I still do not understand what READ_ONCE() in
>> wait_event* conditions will buy us.
>>
>> Also if the following is true:
>>
>>> As the consequence C compilers stopped guarantying that "word accesses are atomic".
>>
>> a lot of stuff will break in the kernel. Maybe compilers should stop
>> moving towards the lala land?
>
> It buys us:
> 1. More readable code but highlighting important aspects. Inter-thread
> synchronization is important and complex, explicit is better than
> implicit in such contexts.

*Every* condition in wait_event* is modified by a separate thread,
there is no need to higlight anything.

> 2. Conformance to relevant standards and relieve you, me and everybody
> else reading this code from spending time on proving that it cannot
> break (which is not actually possible to do, "I don't see how it can
> break" is not quite proof).

I expect wait_event() API to ensure that the condition is re-evaluated
properly instead of sprinkling these annotations throughout entire
kernel. As far as I know prepare_to_wait* does provides necessary
barriers.

> 3. Allow tooling that finds undoubtedly harmful bugs, like this one.

You already found this bug without annotations, once it is fixed (by
expanding critical section) there is no longer a reason for using
slower access as there are no concurrency anymore.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-09-04 20:27                       ` Dmitry Torokhov
@ 2015-09-05 13:21                         ` Dmitry Vyukov
  2015-09-07 14:06                             ` Dmitry Vyukov
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Vyukov @ 2015-09-05 13:21 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

On Fri, Sep 4, 2015 at 10:27 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Fri, Sep 4, 2015 at 12:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Fri, Sep 4, 2015 at 6:56 PM, Dmitry Torokhov
>> <dmitry.torokhov@gmail.com> wrote:
>>> On Tue, Sep 1, 2015 at 11:46 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>> On Fri, Aug 28, 2015 at 8:32 PM, Dmitry Torokhov
>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>>>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am looking at this code in __ps2_command again:
>>>>>>>>
>>>>>>>> /*
>>>>>>>> * The reset command takes a long time to execute.
>>>>>>>> */
>>>>>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>>>
>>>>>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>>>
>>>>>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>>>               wait_event_timeout(ps2dev->wait,
>>>>>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>>>>>> PS2_FLAG_CMD), timeout);
>>>>>>>> }
>>>>>>>>
>>>>>>>> if (param)
>>>>>>>>     for (i = 0; i < receive; i++)
>>>>>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>>>>>
>>>>>>>>
>>>>>>>> Here are two moments I don't understand:
>>>>>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>>>>>> is compared against 100ms). However, timeout is assigned to result of
>>>>>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>>>>>> me. What am I missing?
>>>>>>>
>>>>>>> The fact that wait_event_timeout can return value greater than one:
>>>>>>>
>>>>>>>  * Returns:
>>>>>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>>>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>>>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>>>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>
>>>>>>
>>>>>> OK, makes sense now!
>>>>>>
>>>>>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>>>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>>>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>>>>>
>>>>>>> Once upon a time wait_event() did not return positive value when
>>>>>>> timeout expired and then condition satisfied. So we just examine the
>>>>>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>>>>>> succeeded) and even if we copy in garbage nobody should care since
>>>>>>> we'll return error in this case.
>>>>>>
>>>>>>
>>>>>> I see.
>>>>>> But the cmdcnt is re-read after copying out response. So it is
>>>>>> possible that we read garbage response, but then read cmdcnt==0 and
>>>>>> return OK to caller.
>>>>>
>>>>> That assumes that we actually timed out, and while we were copying the
>>>>> data the response finally came.
>>>>
>>>> Right.
>>>>
>>>>>>
>>>>>> So far I have something along the following lines to fix data races in libps2.c
>>>>>
>>>>> I don't know, maybe we should simply move call to
>>>>> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
>>>>> and move copying of the buffer down, after checking cmdcnt.
>>>>
>>>> I don't know about serio_pause_rx, but copying of response should be
>>>> done after checking cmdcnt.
>>>
>>> It will stop the interrupt handler from running while we are examining
>>> the cmdcnt and copy out the data, thus removing the race.
>>>
>>>> Also you need to use smp_store_release/smp_load_acquire cmdcnt and
>>>> flags when they have dependent data. And READ_ONCE/WRITE_ONCE on
>>>> shared state otherwise is highly desirable.
>>>>
>>>>>> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
>>>>>> index 7551699..51c747f 100644
>>>>>> --- a/drivers/input/serio/libps2.c
>>>>>> +++ b/drivers/input/serio/libps2.c
>>>>>> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
>>>>>> char byte, int timeout)
>>>>>>
>>>>>>          if (serio_write(ps2dev->serio, byte) == 0)
>>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>>> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
>>>>>> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>>>>>>                                     msecs_to_jiffies(timeout));
>>>>>>
>>>>>>          serio_pause_rx(ps2dev->serio);
>>>>>> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
>>>>>> char *param, int command)
>>>>>>          int receive = (command >> 8) & 0xf;
>>>>>>          int rc = -1;
>>>>>>          int i;
>>>>>> +        unsigned char cmdcnt;
>>>>>>
>>>>>>          if (receive > sizeof(ps2dev->cmdbuf)) {
>>>>>>                  WARN_ON(1);
>>>>>> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
>>>>>> unsigned char *param, int command)
>>>>>>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>
>>>>>>          timeout = wait_event_timeout(ps2dev->wait,
>>>>>> -                                     !(ps2dev->flags &
>>>>>> PS2_FLAG_CMD1), timeout);
>>>>>> -
>>>>>> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
>>>>>> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>
>>>>>> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
>>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>>> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
>>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
>>>>>
>>>>> What all these READ_ONCE()s give us?
>>>>
>>>> I've wrote up the response here:
>>>> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>>>
>>> I read it and I still do not understand what READ_ONCE() in
>>> wait_event* conditions will buy us.
>>>
>>> Also if the following is true:
>>>
>>>> As the consequence C compilers stopped guarantying that "word accesses are atomic".
>>>
>>> a lot of stuff will break in the kernel. Maybe compilers should stop
>>> moving towards the lala land?
>>
>> It buys us:
>> 1. More readable code but highlighting important aspects. Inter-thread
>> synchronization is important and complex, explicit is better than
>> implicit in such contexts.
>
> *Every* condition in wait_event* is modified by a separate thread,
> there is no need to higlight anything.

Yeah, but it does not cancel subsequent points. Also, "do this
everywhere except wait_event*" looks inconsistent.


>> 2. Conformance to relevant standards and relieve you, me and everybody
>> else reading this code from spending time on proving that it cannot
>> break (which is not actually possible to do, "I don't see how it can
>> break" is not quite proof).
>
> I expect wait_event() API to ensure that the condition is re-evaluated
> properly instead of sprinkling these annotations throughout entire
> kernel. As far as I know prepare_to_wait* does provides necessary
> barriers.

Barriers do not fix it. Plain racy accesses are bugs. The fact that we
don't see how it can break does not make it correct.


>> 3. Allow tooling that finds undoubtedly harmful bugs, like this one.
>
> You already found this bug without annotations, once it is fixed (by
> expanding critical section) there is no longer a reason for using
> slower access as there are no concurrency anymore.

We've found this bug, but we've spent unreasonably large amount of time.
We've also started blacklisting functions with data races. This saves
our time, but leads to missed bugs.
So, no, it is not OK to have lots of unfixed data races to efficiently
use such tool.


Regarding performance, this is misconception. You pay only for what
you need. If you pay just a bit less you end up with broken code.
READ_ONCE namely says to do a single load and don't mess with this
memory location in any other way. This is _precisely_ what you want
here.
There is no price of READ_ONCE that you don't have to pay here.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
  2015-09-05 13:21                         ` Dmitry Vyukov
@ 2015-09-07 14:06                             ` Dmitry Vyukov
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitry Vyukov @ 2015-09-07 14:06 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

I've mailed a separate patch that does serio_pause_rx before reading
out data ("input: fix data race __ps2_command").


On Sat, Sep 5, 2015 at 3:21 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Sep 4, 2015 at 10:27 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
>> On Fri, Sep 4, 2015 at 12:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Fri, Sep 4, 2015 at 6:56 PM, Dmitry Torokhov
>>> <dmitry.torokhov@gmail.com> wrote:
>>>> On Tue, Sep 1, 2015 at 11:46 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>> On Fri, Aug 28, 2015 at 8:32 PM, Dmitry Torokhov
>>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>>> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>>>>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>>>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am looking at this code in __ps2_command again:
>>>>>>>>>
>>>>>>>>> /*
>>>>>>>>> * The reset command takes a long time to execute.
>>>>>>>>> */
>>>>>>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>>>>
>>>>>>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>>>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>>>>
>>>>>>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>>>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>>>>               wait_event_timeout(ps2dev->wait,
>>>>>>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>>>>>>> PS2_FLAG_CMD), timeout);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> if (param)
>>>>>>>>>     for (i = 0; i < receive; i++)
>>>>>>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here are two moments I don't understand:
>>>>>>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>>>>>>> is compared against 100ms). However, timeout is assigned to result of
>>>>>>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>>>>>>> me. What am I missing?
>>>>>>>>
>>>>>>>> The fact that wait_event_timeout can return value greater than one:
>>>>>>>>
>>>>>>>>  * Returns:
>>>>>>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>>>>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>>>>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>>>>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>
>>>>>>>
>>>>>>> OK, makes sense now!
>>>>>>>
>>>>>>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>>>>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>>>>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>>>>>>
>>>>>>>> Once upon a time wait_event() did not return positive value when
>>>>>>>> timeout expired and then condition satisfied. So we just examine the
>>>>>>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>>>>>>> succeeded) and even if we copy in garbage nobody should care since
>>>>>>>> we'll return error in this case.
>>>>>>>
>>>>>>>
>>>>>>> I see.
>>>>>>> But the cmdcnt is re-read after copying out response. So it is
>>>>>>> possible that we read garbage response, but then read cmdcnt==0 and
>>>>>>> return OK to caller.
>>>>>>
>>>>>> That assumes that we actually timed out, and while we were copying the
>>>>>> data the response finally came.
>>>>>
>>>>> Right.
>>>>>
>>>>>>>
>>>>>>> So far I have something along the following lines to fix data races in libps2.c
>>>>>>
>>>>>> I don't know, maybe we should simply move call to
>>>>>> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
>>>>>> and move copying of the buffer down, after checking cmdcnt.
>>>>>
>>>>> I don't know about serio_pause_rx, but copying of response should be
>>>>> done after checking cmdcnt.
>>>>
>>>> It will stop the interrupt handler from running while we are examining
>>>> the cmdcnt and copy out the data, thus removing the race.
>>>>
>>>>> Also you need to use smp_store_release/smp_load_acquire cmdcnt and
>>>>> flags when they have dependent data. And READ_ONCE/WRITE_ONCE on
>>>>> shared state otherwise is highly desirable.
>>>>>
>>>>>>> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
>>>>>>> index 7551699..51c747f 100644
>>>>>>> --- a/drivers/input/serio/libps2.c
>>>>>>> +++ b/drivers/input/serio/libps2.c
>>>>>>> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
>>>>>>> char byte, int timeout)
>>>>>>>
>>>>>>>          if (serio_write(ps2dev->serio, byte) == 0)
>>>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>>>> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
>>>>>>> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>>>>>>>                                     msecs_to_jiffies(timeout));
>>>>>>>
>>>>>>>          serio_pause_rx(ps2dev->serio);
>>>>>>> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
>>>>>>> char *param, int command)
>>>>>>>          int receive = (command >> 8) & 0xf;
>>>>>>>          int rc = -1;
>>>>>>>          int i;
>>>>>>> +        unsigned char cmdcnt;
>>>>>>>
>>>>>>>          if (receive > sizeof(ps2dev->cmdbuf)) {
>>>>>>>                  WARN_ON(1);
>>>>>>> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
>>>>>>> unsigned char *param, int command)
>>>>>>>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>>
>>>>>>>          timeout = wait_event_timeout(ps2dev->wait,
>>>>>>> -                                     !(ps2dev->flags &
>>>>>>> PS2_FLAG_CMD1), timeout);
>>>>>>> -
>>>>>>> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
>>>>>>> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>>
>>>>>>> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
>>>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>>>> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
>>>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
>>>>>>
>>>>>> What all these READ_ONCE()s give us?
>>>>>
>>>>> I've wrote up the response here:
>>>>> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>>>>
>>>> I read it and I still do not understand what READ_ONCE() in
>>>> wait_event* conditions will buy us.
>>>>
>>>> Also if the following is true:
>>>>
>>>>> As the consequence C compilers stopped guarantying that "word accesses are atomic".
>>>>
>>>> a lot of stuff will break in the kernel. Maybe compilers should stop
>>>> moving towards the lala land?
>>>
>>> It buys us:
>>> 1. More readable code but highlighting important aspects. Inter-thread
>>> synchronization is important and complex, explicit is better than
>>> implicit in such contexts.
>>
>> *Every* condition in wait_event* is modified by a separate thread,
>> there is no need to higlight anything.
>
> Yeah, but it does not cancel subsequent points. Also, "do this
> everywhere except wait_event*" looks inconsistent.
>
>
>>> 2. Conformance to relevant standards and relieve you, me and everybody
>>> else reading this code from spending time on proving that it cannot
>>> break (which is not actually possible to do, "I don't see how it can
>>> break" is not quite proof).
>>
>> I expect wait_event() API to ensure that the condition is re-evaluated
>> properly instead of sprinkling these annotations throughout entire
>> kernel. As far as I know prepare_to_wait* does provides necessary
>> barriers.
>
> Barriers do not fix it. Plain racy accesses are bugs. The fact that we
> don't see how it can break does not make it correct.
>
>
>>> 3. Allow tooling that finds undoubtedly harmful bugs, like this one.
>>
>> You already found this bug without annotations, once it is fixed (by
>> expanding critical section) there is no longer a reason for using
>> slower access as there are no concurrency anymore.
>
> We've found this bug, but we've spent unreasonably large amount of time.
> We've also started blacklisting functions with data races. This saves
> our time, but leads to missed bugs.
> So, no, it is not OK to have lots of unfixed data races to efficiently
> use such tool.
>
>
> Regarding performance, this is misconception. You pay only for what
> you need. If you pay just a bit less you end up with broken code.
> READ_ONCE namely says to do a single load and don't mess with this
> memory location in any other way. This is _precisely_ what you want
> here.
> There is no price of READ_ONCE that you don't have to pay here.



-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat
sind, leiten Sie diese bitte nicht weiter, informieren Sie den
Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please
do not forward it, please inform the sender, and please erase this
e-mail including any attachments. Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Potential data race in psmouse_interrupt
@ 2015-09-07 14:06                             ` Dmitry Vyukov
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitry Vyukov @ 2015-09-07 14:06 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Pali Rohár, Andrey Konovalov, Hans de Goede,
	Mathias Gottschlag, Shailendra Verma, Rusty Russell,
	Luis R. Rodriguez, Thomas Hellstrom, linux-input, LKML,
	Kostya Serebryany, Alexander Potapenko, ktsan

I've mailed a separate patch that does serio_pause_rx before reading
out data ("input: fix data race __ps2_command").


On Sat, Sep 5, 2015 at 3:21 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Sep 4, 2015 at 10:27 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
>> On Fri, Sep 4, 2015 at 12:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Fri, Sep 4, 2015 at 6:56 PM, Dmitry Torokhov
>>> <dmitry.torokhov@gmail.com> wrote:
>>>> On Tue, Sep 1, 2015 at 11:46 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>> On Fri, Aug 28, 2015 at 8:32 PM, Dmitry Torokhov
>>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>>> On Fri, Aug 28, 2015 at 11:08 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>> On Fri, Aug 28, 2015 at 7:51 PM, Dmitry Torokhov
>>>>>>> <dmitry.torokhov@gmail.com> wrote:
>>>>>>>> On Fri, Aug 28, 2015 at 10:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am looking at this code in __ps2_command again:
>>>>>>>>>
>>>>>>>>> /*
>>>>>>>>> * The reset command takes a long time to execute.
>>>>>>>>> */
>>>>>>>>> timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>>>>
>>>>>>>>> timeout = wait_event_timeout(ps2dev->wait,
>>>>>>>>>     !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>>>>
>>>>>>>>> if (smp_load_acquire(&ps2dev->cmdcnt) &&
>>>>>>>>>     !(smp_load_acquire(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>>>>               timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>>>>               wait_event_timeout(ps2dev->wait,
>>>>>>>>>                         !(smp_load_acquire(&ps2dev->flags) &
>>>>>>>>> PS2_FLAG_CMD), timeout);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> if (param)
>>>>>>>>>     for (i = 0; i < receive; i++)
>>>>>>>>>               param[i] = ps2dev->cmdbuf[(receive - 1) - i];
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here are two moments I don't understand:
>>>>>>>>> 1. The last parameter of ps2_adjust_timeout is timeout in jiffies (it
>>>>>>>>> is compared against 100ms). However, timeout is assigned to result of
>>>>>>>>> wait_event_timeout, which returns 0 or 1. This does not make sense to
>>>>>>>>> me. What am I missing?
>>>>>>>>
>>>>>>>> The fact that wait_event_timeout can return value greater than one:
>>>>>>>>
>>>>>>>>  * Returns:
>>>>>>>>  * 0 if the @condition evaluated to %false after the @timeout elapsed,
>>>>>>>>  * 1 if the @condition evaluated to %true after the @timeout elapsed,
>>>>>>>>  * or the remaining jiffies (at least 1) if the @condition evaluated
>>>>>>>>               ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>>
>>>>>>>
>>>>>>> OK, makes sense now!
>>>>>>>
>>>>>>>>> 2. This code pays great attention to timeouts, but in the end I don't
>>>>>>>>> see how it handles timeouts. That is, if a timeout is happened, we
>>>>>>>>> still copyout (garbage) from cmdbuf. What am I missing here?
>>>>>>>>
>>>>>>>> Once upon a time wait_event() did not return positive value when
>>>>>>>> timeout expired and then condition satisfied. So we just examine the
>>>>>>>> final state (psmpouse->cmdcnt should be 0 if command actually
>>>>>>>> succeeded) and even if we copy in garbage nobody should care since
>>>>>>>> we'll return error in this case.
>>>>>>>
>>>>>>>
>>>>>>> I see.
>>>>>>> But the cmdcnt is re-read after copying out response. So it is
>>>>>>> possible that we read garbage response, but then read cmdcnt==0 and
>>>>>>> return OK to caller.
>>>>>>
>>>>>> That assumes that we actually timed out, and while we were copying the
>>>>>> data the response finally came.
>>>>>
>>>>> Right.
>>>>>
>>>>>>>
>>>>>>> So far I have something along the following lines to fix data races in libps2.c
>>>>>>
>>>>>> I don't know, maybe we should simply move call to
>>>>>> serio_pause_rx(ps2dev->serio) higher, before we check ps2dev->cmdcnt,
>>>>>> and move copying of the buffer down, after checking cmdcnt.
>>>>>
>>>>> I don't know about serio_pause_rx, but copying of response should be
>>>>> done after checking cmdcnt.
>>>>
>>>> It will stop the interrupt handler from running while we are examining
>>>> the cmdcnt and copy out the data, thus removing the race.
>>>>
>>>>> Also you need to use smp_store_release/smp_load_acquire cmdcnt and
>>>>> flags when they have dependent data. And READ_ONCE/WRITE_ONCE on
>>>>> shared state otherwise is highly desirable.
>>>>>
>>>>>>> diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
>>>>>>> index 7551699..51c747f 100644
>>>>>>> --- a/drivers/input/serio/libps2.c
>>>>>>> +++ b/drivers/input/serio/libps2.c
>>>>>>> @@ -43,7 +43,7 @@ int ps2_sendbyte(struct ps2dev *ps2dev, unsigned
>>>>>>> char byte, int timeout)
>>>>>>>
>>>>>>>          if (serio_write(ps2dev->serio, byte) == 0)
>>>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>>>> -                                   !(ps2dev->flags & PS2_FLAG_ACK),
>>>>>>> +                                   !(READ_ONCE(ps2dev->flags) & PS2_FLAG_ACK),
>>>>>>>                                     msecs_to_jiffies(timeout));
>>>>>>>
>>>>>>>          serio_pause_rx(ps2dev->serio);
>>>>>>> @@ -187,6 +187,7 @@ int __ps2_command(struct ps2dev *ps2dev, unsigned
>>>>>>> char *param, int command)
>>>>>>>          int receive = (command >> 8) & 0xf;
>>>>>>>          int rc = -1;
>>>>>>>          int i;
>>>>>>> +        unsigned char cmdcnt;
>>>>>>>
>>>>>>>          if (receive > sizeof(ps2dev->cmdbuf)) {
>>>>>>>                  WARN_ON(1);
>>>>>>> @@ -225,23 +226,22 @@ int __ps2_command(struct ps2dev *ps2dev,
>>>>>>> unsigned char *param, int command)
>>>>>>>          timeout = msecs_to_jiffies(command == PS2_CMD_RESET_BAT ? 4000 : 500);
>>>>>>>
>>>>>>>          timeout = wait_event_timeout(ps2dev->wait,
>>>>>>> -                                     !(ps2dev->flags &
>>>>>>> PS2_FLAG_CMD1), timeout);
>>>>>>> -
>>>>>>> -        if (ps2dev->cmdcnt && !(ps2dev->flags & PS2_FLAG_CMD1)) {
>>>>>>> +                !(READ_ONCE(ps2dev->flags) & PS2_FLAG_CMD1), timeout);
>>>>>>>
>>>>>>> +        if (READ_ONCE(&ps2dev->cmdcnt) &&
>>>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD1)) {
>>>>>>>                  timeout = ps2_adjust_timeout(ps2dev, command, timeout);
>>>>>>>                  wait_event_timeout(ps2dev->wait,
>>>>>>> -                                   !(ps2dev->flags & PS2_FLAG_CMD), timeout);
>>>>>>> +                        !(READ_ONCE(&ps2dev->flags) & PS2_FLAG_CMD), timeout);
>>>>>>
>>>>>> What all these READ_ONCE()s give us?
>>>>>
>>>>> I've wrote up the response here:
>>>>> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>>>>
>>>> I read it and I still do not understand what READ_ONCE() in
>>>> wait_event* conditions will buy us.
>>>>
>>>> Also if the following is true:
>>>>
>>>>> As the consequence C compilers stopped guarantying that "word accesses are atomic".
>>>>
>>>> a lot of stuff will break in the kernel. Maybe compilers should stop
>>>> moving towards the lala land?
>>>
>>> It buys us:
>>> 1. More readable code but highlighting important aspects. Inter-thread
>>> synchronization is important and complex, explicit is better than
>>> implicit in such contexts.
>>
>> *Every* condition in wait_event* is modified by a separate thread,
>> there is no need to higlight anything.
>
> Yeah, but it does not cancel subsequent points. Also, "do this
> everywhere except wait_event*" looks inconsistent.
>
>
>>> 2. Conformance to relevant standards and relieve you, me and everybody
>>> else reading this code from spending time on proving that it cannot
>>> break (which is not actually possible to do, "I don't see how it can
>>> break" is not quite proof).
>>
>> I expect wait_event() API to ensure that the condition is re-evaluated
>> properly instead of sprinkling these annotations throughout entire
>> kernel. As far as I know prepare_to_wait* does provides necessary
>> barriers.
>
> Barriers do not fix it. Plain racy accesses are bugs. The fact that we
> don't see how it can break does not make it correct.
>
>
>>> 3. Allow tooling that finds undoubtedly harmful bugs, like this one.
>>
>> You already found this bug without annotations, once it is fixed (by
>> expanding critical section) there is no longer a reason for using
>> slower access as there are no concurrency anymore.
>
> We've found this bug, but we've spent unreasonably large amount of time.
> We've also started blacklisting functions with data races. This saves
> our time, but leads to missed bugs.
> So, no, it is not OK to have lots of unfixed data races to efficiently
> use such tool.
>
>
> Regarding performance, this is misconception. You pay only for what
> you need. If you pay just a bit less you end up with broken code.
> READ_ONCE namely says to do a single load and don't mess with this
> memory location in any other way. This is _precisely_ what you want
> here.
> There is no price of READ_ONCE that you don't have to pay here.



-- 
Dmitry Vyukov, Software Engineer, dvyukov@google.com
Google Germany GmbH, Dienerstraße 12, 80331, München
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat
sind, leiten Sie diese bitte nicht weiter, informieren Sie den
Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please
do not forward it, please inform the sender, and please erase this
e-mail including any attachments. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-09-07 14:06 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAAeHK+wZ=fXCRDUvkQdSmFWxU1fqN5suF=P1kL_CT1L8p3ybDg@mail.gmail.com>
2015-07-22 13:14 ` Fwd: Potential data race in psmouse_interrupt Andrey Konovalov
2015-07-22 16:52   ` Dmitry Torokhov
2015-07-23 13:08   ` Pali Rohár
2015-07-23 13:31     ` Andrey Konovalov
2015-07-23 13:31       ` Andrey Konovalov
2015-07-29 11:53       ` Pali Rohár
2015-07-29 11:53         ` Pali Rohár
2015-08-28 17:34         ` Dmitry Vyukov
2015-08-28 17:51           ` Dmitry Torokhov
2015-08-28 18:08             ` Dmitry Vyukov
2015-08-28 18:32               ` Dmitry Torokhov
2015-08-28 18:34                 ` Dmitry Torokhov
2015-09-01 18:46                 ` Dmitry Vyukov
2015-09-04 16:56                   ` Dmitry Torokhov
2015-09-04 19:32                     ` Dmitry Vyukov
2015-09-04 20:27                       ` Dmitry Torokhov
2015-09-05 13:21                         ` Dmitry Vyukov
2015-09-07 14:06                           ` Dmitry Vyukov
2015-09-07 14:06                             ` Dmitry Vyukov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.