* ata: BUG in ata_sff_hsm_move @ 2016-01-28 11:35 Dmitry Vyukov 2016-01-29 11:52 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Dmitry Vyukov @ 2016-01-28 11:35 UTC (permalink / raw) To: Tejun Heo, linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin Hello, I episodically hit the following BUG while running syzkaller fuzzer: ata2: protocol 2 task_state 0 (dev_stat 0x41) ------------[ cut here ]------------ kernel BUG at drivers/ata/libata-sff.c:1302! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 1 PID: 10679 Comm: syz-executor Not tainted 4.5.0-rc1+ #300 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88002bd00000 ti: ffff88002e048000 task.ti: ffff88002e048000 RIP: 0010:[<ffffffff83a83409>] [<ffffffff83a83409>] ata_sff_hsm_move+0x619/0x1c60 RSP: 0000:ffff88003ed07c80 EFLAGS: 00010086 RAX: 000000000000002d RBX: 0000000000000041 RCX: 0000000000000000 RDX: 000000000000002d RSI: ffff88003ed16cd8 RDI: ffffed0007da0f84 RBP: ffff88003ed07d20 R08: 0000000000000001 R09: 0000000000000000 R10: ffff88002bd00000 R11: 1ffffffff134edc5 R12: 0000000000000000 R13: dffffc0000000000 R14: ffff880060bc8000 R15: ffff880060bc8198 FS: 00007fa131e89700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000451a40 CR3: 000000002f797000 CR4: 00000000000006e0 Stack: ffff88003ed07e08 ffffffff81455c49 0000000035b0df00 ffff88003dfe7d78 ffffffff87b5d0d8 dffffc0000000000 ffffffff87b5d0c0 0000000000000000 ffff880060bc81c0 ffff880060bca408 ffffed000c179038 ffff880060bc81a0 Call Trace: <IRQ> [<ffffffff83a84c31>] __ata_sff_port_intr+0x1e1/0x3a0 drivers/ata/libata-sff.c:1584 [<ffffffff83a85611>] ata_bmdma_port_intr+0x71/0x400 drivers/ata/libata-sff.c:2877 [< inline >] __ata_sff_interrupt drivers/ata/libata-sff.c:1629 [<ffffffff83a85bf3>] ata_bmdma_interrupt+0x253/0x580 drivers/ata/libata-sff.c:2902 [<ffffffff81479f98>] handle_irq_event_percpu+0x108/0x7e0 kernel/irq/handle.c:157 [<ffffffff8147a717>] handle_irq_event+0xa7/0x140 kernel/irq/handle.c:205 [<ffffffff81484573>] handle_edge_irq+0x1e3/0x8d0 kernel/irq/chip.c:623 [< inline >] generic_handle_irq_desc include/linux/irqdesc.h:146 [<ffffffff811a92bc>] handle_irq+0x10c/0x2a0 arch/x86/kernel/irq_64.c:78 [<ffffffff811a7e4d>] do_IRQ+0x7d/0x1a0 arch/x86/kernel/irq.c:240 [<ffffffff86653d4c>] common_interrupt+0x8c/0x8c arch/x86/entry/entry_64.S:520 <EOI> [< inline >] rcu_lock_acquire include/linux/rcupdate.h:490 [< inline >] rcu_read_lock include/linux/rcupdate.h:874 [<ffffffff8164b4a1>] filemap_map_pages+0x131/0xba0 mm/filemap.c:2145 [< inline >] do_fault_around mm/memory.c:2943 [< inline >] do_read_fault mm/memory.c:2962 [< inline >] do_fault mm/memory.c:3133 [< inline >] handle_pte_fault mm/memory.c:3308 [< inline >] __handle_mm_fault mm/memory.c:3418 [<ffffffff816efb16>] handle_mm_fault+0x2516/0x49a0 mm/memory.c:3447 [<ffffffff8127dc16>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238 [<ffffffff8127e358>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331 [<ffffffff8126f514>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264 [<ffffffff86655578>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986 Code: b6 0c 01 48 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 9e 13 00 00 41 8b 76 24 44 89 e1 48 c7 c7 40 21 d7 86 e8 ac 22 bc fd <0f> 0b 41 83 fc 01 75 84 e8 aa 6d ae fd 48 8b 55 c0 48 b8 00 00 RIP [<ffffffff83a83409>] ata_sff_hsm_move+0x619/0x1c60 drivers/ata/libata-sff.c:1299 RSP <ffff88003ed07c80> ---[ end trace 3f182793f8f54475 ]--- Kernel panic - not syncing: Fatal exception in interrupt Shutting down cpus with NMI Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt I've added the following debug print output of which you can see above: diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index cdf6215..30fd154 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -1296,6 +1296,9 @@ fsm_start: break; default: poll_next = 0; + pr_err("ata%u: protocol %d task_state %d (dev_stat 0x%X)\n", + ap->print_id, qc->tf.protocol, ap->hsm_task_state, + status); BUG(); } So the unexpected state is HSM_IDLE. It is not trivially reproducible. If you want to reproduce, follow instructions at: https://github.com/google/syzkaller/wiki/How-to-execute-syzkaller-programs Using the following file as program: https://gist.githubusercontent.com/dvyukov/7a9ee0544eb0cda5210d/raw/7076e5a7749c0ebdb59ee83b1683ac9d6ff76059/gistfile1.txt This is on commit 92e963f50fc74041b5e9e744c330dca48e04f08d. ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-28 11:35 ata: BUG in ata_sff_hsm_move Dmitry Vyukov @ 2016-01-29 11:52 ` Tejun Heo 2016-01-29 11:59 ` Dmitry Vyukov 2016-01-29 12:20 ` [PATCH libata/for-4.5-fixes] libata-sff: use WARN instead of BUG on illegal host state machine state Tejun Heo 0 siblings, 2 replies; 12+ messages in thread From: Tejun Heo @ 2016-01-29 11:52 UTC (permalink / raw) To: Dmitry Vyukov Cc: linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin Hello, Dmitry. On Thu, Jan 28, 2016 at 12:35:43PM +0100, Dmitry Vyukov wrote: > Hello, > > I episodically hit the following BUG while running syzkaller fuzzer: > > ata2: protocol 2 task_state 0 (dev_stat 0x41) > ------------[ cut here ]------------ > kernel BUG at drivers/ata/libata-sff.c:1302! ... > So the unexpected state is HSM_IDLE. Hmmm... the port interrupt handler checks for IDLE before calling into hsm_move, so the only explanation would be that something is resetting it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change race between ISR and PIO") describes and fixes the same problem. The fix seems correct and I can't find anywhere else where this can happen. :( Can you please post the kernel log leading to the BUG? Also, I don't think that condition needs to be BUG. I'll change it to WARN. Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 11:52 ` Tejun Heo @ 2016-01-29 11:59 ` Dmitry Vyukov 2016-01-29 12:23 ` Tejun Heo 2016-01-29 12:20 ` [PATCH libata/for-4.5-fixes] libata-sff: use WARN instead of BUG on illegal host state machine state Tejun Heo 1 sibling, 1 reply; 12+ messages in thread From: Dmitry Vyukov @ 2016-01-29 11:59 UTC (permalink / raw) To: Tejun Heo Cc: linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin On Fri, Jan 29, 2016 at 12:52 PM, Tejun Heo <tj@kernel.org> wrote: > Hello, Dmitry. > > On Thu, Jan 28, 2016 at 12:35:43PM +0100, Dmitry Vyukov wrote: >> Hello, >> >> I episodically hit the following BUG while running syzkaller fuzzer: >> >> ata2: protocol 2 task_state 0 (dev_stat 0x41) >> ------------[ cut here ]------------ >> kernel BUG at drivers/ata/libata-sff.c:1302! > ... >> So the unexpected state is HSM_IDLE. > > Hmmm... the port interrupt handler checks for IDLE before calling into > hsm_move, so the only explanation would be that something is resetting > it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change > race between ISR and PIO") describes and fixes the same problem. The > fix seems correct and I can't find anywhere else where this can > happen. :( > > Can you please post the kernel log leading to the BUG? Also, I don't > think that condition needs to be BUG. I'll change it to WARN. Here are two logs, in both cases no kernel messages prior to the bug: https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txt ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 11:59 ` Dmitry Vyukov @ 2016-01-29 12:23 ` Tejun Heo 2016-01-29 13:18 ` Dmitry Vyukov 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2016-01-29 12:23 UTC (permalink / raw) To: Dmitry Vyukov Cc: linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin Hello, Dmitry. On Fri, Jan 29, 2016 at 12:59:49PM +0100, Dmitry Vyukov wrote: > > Hmmm... the port interrupt handler checks for IDLE before calling into > > hsm_move, so the only explanation would be that something is resetting > > it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change > > race between ISR and PIO") describes and fixes the same problem. The > > fix seems correct and I can't find anywhere else where this can > > happen. :( > > > > Can you please post the kernel log leading to the BUG? Also, I don't > > think that condition needs to be BUG. I'll change it to WARN. > > Here are two logs, in both cases no kernel messages prior to the bug: > https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt > https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txt lol, this is kinda embarrassing. It looks like the poll path wasn't doing any locking. Can you please verify the following patch at least doesn't crash the machine immediately and if so keep it applied to the test kernel so that we can verify that the problem actually goes away? Thanks. diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 608677d..6991efc 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -1362,12 +1362,14 @@ static void ata_sff_pio_task(struct work_struct *work) u8 status; int poll_next; + spin_lock_irq(ap->lock); + BUG_ON(ap->sff_pio_task_link == NULL); /* qc can be NULL if timeout occurred */ qc = ata_qc_from_tag(ap, link->active_tag); if (!qc) { ap->sff_pio_task_link = NULL; - return; + goto out_unlock; } fsm_start: @@ -1382,11 +1384,14 @@ static void ata_sff_pio_task(struct work_struct *work) */ status = ata_sff_busy_wait(ap, ATA_BUSY, 5); if (status & ATA_BUSY) { + spin_unlock_irq(ap->lock); ata_msleep(ap, 2); + spin_lock_irq(ap->lock); + status = ata_sff_busy_wait(ap, ATA_BUSY, 10); if (status & ATA_BUSY) { ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); - return; + goto out_unlock; } } @@ -1403,6 +1408,8 @@ static void ata_sff_pio_task(struct work_struct *work) */ if (poll_next) goto fsm_start; +out_unlock: + spin_unlock_irq(ap->lock); } /** ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 12:23 ` Tejun Heo @ 2016-01-29 13:18 ` Dmitry Vyukov 2016-01-29 13:40 ` Dmitry Vyukov 0 siblings, 1 reply; 12+ messages in thread From: Dmitry Vyukov @ 2016-01-29 13:18 UTC (permalink / raw) To: syzkaller Cc: linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, Kostya Serebryany, Alexander Potapenko, Sasha Levin On Fri, Jan 29, 2016 at 1:23 PM, Tejun Heo <tj@kernel.org> wrote: > Hello, Dmitry. > > On Fri, Jan 29, 2016 at 12:59:49PM +0100, Dmitry Vyukov wrote: >> > Hmmm... the port interrupt handler checks for IDLE before calling into >> > hsm_move, so the only explanation would be that something is resetting >> > it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change >> > race between ISR and PIO") describes and fixes the same problem. The >> > fix seems correct and I can't find anywhere else where this can >> > happen. :( >> > >> > Can you please post the kernel log leading to the BUG? Also, I don't >> > think that condition needs to be BUG. I'll change it to WARN. >> >> Here are two logs, in both cases no kernel messages prior to the bug: >> https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt >> https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txt > > lol, this is kinda embarrassing. It looks like the poll path wasn't > doing any locking. Can you please verify the following patch at least > doesn't crash the machine immediately and if so keep it applied to the > test kernel so that we can verify that the problem actually goes away? Great that you managed to debug it without a repro! I've applied this patch to my tree and will rerun fuzzer. I will notify you if I see this warning again. Thanks > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 608677d..6991efc 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -1362,12 +1362,14 @@ static void ata_sff_pio_task(struct work_struct *work) > u8 status; > int poll_next; > > + spin_lock_irq(ap->lock); > + > BUG_ON(ap->sff_pio_task_link == NULL); > /* qc can be NULL if timeout occurred */ > qc = ata_qc_from_tag(ap, link->active_tag); > if (!qc) { > ap->sff_pio_task_link = NULL; > - return; > + goto out_unlock; > } > > fsm_start: > @@ -1382,11 +1384,14 @@ static void ata_sff_pio_task(struct work_struct *work) > */ > status = ata_sff_busy_wait(ap, ATA_BUSY, 5); > if (status & ATA_BUSY) { > + spin_unlock_irq(ap->lock); > ata_msleep(ap, 2); > + spin_lock_irq(ap->lock); > + > status = ata_sff_busy_wait(ap, ATA_BUSY, 10); > if (status & ATA_BUSY) { > ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); > - return; > + goto out_unlock; > } > } > > @@ -1403,6 +1408,8 @@ static void ata_sff_pio_task(struct work_struct *work) > */ > if (poll_next) > goto fsm_start; > +out_unlock: > + spin_unlock_irq(ap->lock); > } > > /** > > -- > You received this message because you are subscribed to the Google Groups "syzkaller" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 13:18 ` Dmitry Vyukov @ 2016-01-29 13:40 ` Dmitry Vyukov 2016-01-29 18:14 ` David Milburn 2016-01-29 20:23 ` Tejun Heo 0 siblings, 2 replies; 12+ messages in thread From: Dmitry Vyukov @ 2016-01-29 13:40 UTC (permalink / raw) To: syzkaller Cc: linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, Kostya Serebryany, Alexander Potapenko, Sasha Levin On Fri, Jan 29, 2016 at 2:18 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Fri, Jan 29, 2016 at 1:23 PM, Tejun Heo <tj@kernel.org> wrote: >> Hello, Dmitry. >> >> On Fri, Jan 29, 2016 at 12:59:49PM +0100, Dmitry Vyukov wrote: >>> > Hmmm... the port interrupt handler checks for IDLE before calling into >>> > hsm_move, so the only explanation would be that something is resetting >>> > it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change >>> > race between ISR and PIO") describes and fixes the same problem. The >>> > fix seems correct and I can't find anywhere else where this can >>> > happen. :( >>> > >>> > Can you please post the kernel log leading to the BUG? Also, I don't >>> > think that condition needs to be BUG. I'll change it to WARN. >>> >>> Here are two logs, in both cases no kernel messages prior to the bug: >>> https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt >>> https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txt >> >> lol, this is kinda embarrassing. It looks like the poll path wasn't >> doing any locking. Can you please verify the following patch at least >> doesn't crash the machine immediately and if so keep it applied to the >> test kernel so that we can verify that the problem actually goes away? > > > Great that you managed to debug it without a repro! > I've applied this patch to my tree and will rerun fuzzer. I will > notify you if I see this warning again. > Thanks It now deadlocks at this stack. It is probably not OK to call ata_sff_hsm_move holding the spinlock. [ 8.168809] NMI backtrace for cpu 2 [ 8.168809] CPU: 2 PID: 17 Comm: kworker/2:0 Not tainted 4.5.0-rc1+ #302 [ 8.168809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 8.168809] Workqueue: ata_sff ata_sff_pio_task [ 8.168809] task: ffff88006d24c740 ti: ffff88006d090000 task.ti: ffff88006d090000 [ 8.168809] RIP: 0010:[<ffffffff82c0e388>] [<ffffffff82c0e388>] delay_tsc+0x18/0x70 [ 8.168809] RSP: 0000:ffff88006d097a60 EFLAGS: 00000086 [ 8.168809] RAX: 000000008edaeede RBX: ffff880036e88028 RCX: 000000000000001e [ 8.168809] RDX: 0000000000000033 RSI: 000000338edaee9c RDI: 0000000000000001 [ 8.168809] RBP: ffff88006d097a60 R08: 0000000000000002 R09: 0000000000000000 [ 8.168809] R10: fffffbfff11baf82 R11: 1ffffffff134edc5 R12: ffff880036e88038 [ 8.168809] R13: 000000009a9d2d40 R14: ffff880036e88030 R15: 0000000099383d98 [ 8.168809] FS: 0000000000000000(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000 [ 8.168809] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 8.168809] CR2: 00000000ffffffff CR3: 0000000007a4b000 CR4: 00000000000006e0 [ 8.168809] Stack: [ 8.168809] ffff88006d097a70 ffffffff82c0e2da ffff88006d097aa8 ffffffff814661a9 [ 8.168809] ffff880036e88028 0000000000000082 ffff880036e90008 ffff880036e90010 [ 8.168809] 0000000000000001 ffff88006d097ad0 ffffffff86652c47 ffffffff83a80c4c [ 8.168809] Call Trace: [ 8.168809] [<ffffffff82c0e2da>] __delay+0xa/0x10 [ 8.168809] [<ffffffff814661a9>] do_raw_spin_lock+0x149/0x2b0 [ 8.168809] [<ffffffff86652c47>] _raw_spin_lock_irqsave+0xa7/0xd0 [ 8.168809] [<ffffffff83a80c4c>] ? ata_hsm_qc_complete+0x12c/0x420 [ 8.168809] [<ffffffff83a80c4c>] ata_hsm_qc_complete+0x12c/0x420 [ 8.168809] [<ffffffff83a82f42>] ata_sff_hsm_move+0x1e2/0x1c60 [ 8.168809] [<ffffffff83a85f17>] ? ata_sff_pio_task+0x87/0x530 [ 8.168809] [<ffffffff83a86296>] ata_sff_pio_task+0x406/0x530 [ 8.168809] [<ffffffff813a2386>] process_one_work+0x796/0x1440 [ 8.168809] [<ffffffff813a22ba>] ? process_one_work+0x6ca/0x1440 [ 8.168809] [<ffffffff813d6190>] ? finish_task_switch+0x120/0x5f0 [ 8.168809] [<ffffffff813a1bf0>] ? pwq_dec_nr_in_flight+0x2e0/0x2e0 [ 8.168809] [<ffffffff813a310b>] worker_thread+0xdb/0xfc0 [ 8.168809] [<ffffffff813b637f>] kthread+0x23f/0x2d0 [ 8.168809] [<ffffffff813a3030>] ? process_one_work+0x1440/0x1440 [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 [ 8.168809] [<ffffffff8665362f>] ret_from_fork+0x3f/0x70 [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 >> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c >> index 608677d..6991efc 100644 >> --- a/drivers/ata/libata-sff.c >> +++ b/drivers/ata/libata-sff.c >> @@ -1362,12 +1362,14 @@ static void ata_sff_pio_task(struct work_struct *work) >> u8 status; >> int poll_next; >> >> + spin_lock_irq(ap->lock); >> + >> BUG_ON(ap->sff_pio_task_link == NULL); >> /* qc can be NULL if timeout occurred */ >> qc = ata_qc_from_tag(ap, link->active_tag); >> if (!qc) { >> ap->sff_pio_task_link = NULL; >> - return; >> + goto out_unlock; >> } >> >> fsm_start: >> @@ -1382,11 +1384,14 @@ static void ata_sff_pio_task(struct work_struct *work) >> */ >> status = ata_sff_busy_wait(ap, ATA_BUSY, 5); >> if (status & ATA_BUSY) { >> + spin_unlock_irq(ap->lock); >> ata_msleep(ap, 2); >> + spin_lock_irq(ap->lock); >> + >> status = ata_sff_busy_wait(ap, ATA_BUSY, 10); >> if (status & ATA_BUSY) { >> ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); >> - return; >> + goto out_unlock; >> } >> } >> >> @@ -1403,6 +1408,8 @@ static void ata_sff_pio_task(struct work_struct *work) >> */ >> if (poll_next) >> goto fsm_start; >> +out_unlock: >> + spin_unlock_irq(ap->lock); >> } >> >> /** >> >> -- >> You received this message because you are subscribed to the Google Groups "syzkaller" group. >> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com. >> For more options, visit https://groups.google.com/d/optout. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 13:40 ` Dmitry Vyukov @ 2016-01-29 18:14 ` David Milburn 2016-01-29 20:24 ` Tejun Heo 2016-01-29 20:23 ` Tejun Heo 1 sibling, 1 reply; 12+ messages in thread From: David Milburn @ 2016-01-29 18:14 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzkaller, linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, Kostya Serebryany, Alexander Potapenko, Sasha Levin [-- Attachment #1: Type: text/plain, Size: 6849 bytes --] Hi Tejun, Can ata_sff_hsm_move grab the lock and save off the task_state, like this patch? Thanks, David On Fri, Jan 29, 2016 at 02:40:33PM +0100, Dmitry Vyukov wrote: > On Fri, Jan 29, 2016 at 2:18 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > > On Fri, Jan 29, 2016 at 1:23 PM, Tejun Heo <tj@kernel.org> wrote: > >> Hello, Dmitry. > >> > >> On Fri, Jan 29, 2016 at 12:59:49PM +0100, Dmitry Vyukov wrote: > >>> > Hmmm... the port interrupt handler checks for IDLE before calling into > >>> > hsm_move, so the only explanation would be that something is resetting > >>> > it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change > >>> > race between ISR and PIO") describes and fixes the same problem. The > >>> > fix seems correct and I can't find anywhere else where this can > >>> > happen. :( > >>> > > >>> > Can you please post the kernel log leading to the BUG? Also, I don't > >>> > think that condition needs to be BUG. I'll change it to WARN. > >>> > >>> Here are two logs, in both cases no kernel messages prior to the bug: > >>> https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt > >>> https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txt > >> > >> lol, this is kinda embarrassing. It looks like the poll path wasn't > >> doing any locking. Can you please verify the following patch at least > >> doesn't crash the machine immediately and if so keep it applied to the > >> test kernel so that we can verify that the problem actually goes away? > > > > > > Great that you managed to debug it without a repro! > > I've applied this patch to my tree and will rerun fuzzer. I will > > notify you if I see this warning again. > > Thanks > > It now deadlocks at this stack. It is probably not OK to call > ata_sff_hsm_move holding the spinlock. > > [ 8.168809] NMI backtrace for cpu 2 > [ 8.168809] CPU: 2 PID: 17 Comm: kworker/2:0 Not tainted 4.5.0-rc1+ #302 > [ 8.168809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS Bochs 01/01/2011 > [ 8.168809] Workqueue: ata_sff ata_sff_pio_task > [ 8.168809] task: ffff88006d24c740 ti: ffff88006d090000 task.ti: > ffff88006d090000 > [ 8.168809] RIP: 0010:[<ffffffff82c0e388>] [<ffffffff82c0e388>] > delay_tsc+0x18/0x70 > [ 8.168809] RSP: 0000:ffff88006d097a60 EFLAGS: 00000086 > [ 8.168809] RAX: 000000008edaeede RBX: ffff880036e88028 RCX: 000000000000001e > [ 8.168809] RDX: 0000000000000033 RSI: 000000338edaee9c RDI: 0000000000000001 > [ 8.168809] RBP: ffff88006d097a60 R08: 0000000000000002 R09: 0000000000000000 > [ 8.168809] R10: fffffbfff11baf82 R11: 1ffffffff134edc5 R12: ffff880036e88038 > [ 8.168809] R13: 000000009a9d2d40 R14: ffff880036e88030 R15: 0000000099383d98 > [ 8.168809] FS: 0000000000000000(0000) GS:ffff88006d600000(0000) > knlGS:0000000000000000 > [ 8.168809] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 8.168809] CR2: 00000000ffffffff CR3: 0000000007a4b000 CR4: 00000000000006e0 > [ 8.168809] Stack: > [ 8.168809] ffff88006d097a70 ffffffff82c0e2da ffff88006d097aa8 > ffffffff814661a9 > [ 8.168809] ffff880036e88028 0000000000000082 ffff880036e90008 > ffff880036e90010 > [ 8.168809] 0000000000000001 ffff88006d097ad0 ffffffff86652c47 > ffffffff83a80c4c > [ 8.168809] Call Trace: > [ 8.168809] [<ffffffff82c0e2da>] __delay+0xa/0x10 > [ 8.168809] [<ffffffff814661a9>] do_raw_spin_lock+0x149/0x2b0 > [ 8.168809] [<ffffffff86652c47>] _raw_spin_lock_irqsave+0xa7/0xd0 > [ 8.168809] [<ffffffff83a80c4c>] ? ata_hsm_qc_complete+0x12c/0x420 > [ 8.168809] [<ffffffff83a80c4c>] ata_hsm_qc_complete+0x12c/0x420 > [ 8.168809] [<ffffffff83a82f42>] ata_sff_hsm_move+0x1e2/0x1c60 > [ 8.168809] [<ffffffff83a85f17>] ? ata_sff_pio_task+0x87/0x530 > [ 8.168809] [<ffffffff83a86296>] ata_sff_pio_task+0x406/0x530 > [ 8.168809] [<ffffffff813a2386>] process_one_work+0x796/0x1440 > [ 8.168809] [<ffffffff813a22ba>] ? process_one_work+0x6ca/0x1440 > [ 8.168809] [<ffffffff813d6190>] ? finish_task_switch+0x120/0x5f0 > [ 8.168809] [<ffffffff813a1bf0>] ? pwq_dec_nr_in_flight+0x2e0/0x2e0 > [ 8.168809] [<ffffffff813a310b>] worker_thread+0xdb/0xfc0 > [ 8.168809] [<ffffffff813b637f>] kthread+0x23f/0x2d0 > [ 8.168809] [<ffffffff813a3030>] ? process_one_work+0x1440/0x1440 > [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 > [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 > [ 8.168809] [<ffffffff8665362f>] ret_from_fork+0x3f/0x70 > [ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0 > > > > > >> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > >> index 608677d..6991efc 100644 > >> --- a/drivers/ata/libata-sff.c > >> +++ b/drivers/ata/libata-sff.c > >> @@ -1362,12 +1362,14 @@ static void ata_sff_pio_task(struct work_struct *work) > >> u8 status; > >> int poll_next; > >> > >> + spin_lock_irq(ap->lock); > >> + > >> BUG_ON(ap->sff_pio_task_link == NULL); > >> /* qc can be NULL if timeout occurred */ > >> qc = ata_qc_from_tag(ap, link->active_tag); > >> if (!qc) { > >> ap->sff_pio_task_link = NULL; > >> - return; > >> + goto out_unlock; > >> } > >> > >> fsm_start: > >> @@ -1382,11 +1384,14 @@ static void ata_sff_pio_task(struct work_struct *work) > >> */ > >> status = ata_sff_busy_wait(ap, ATA_BUSY, 5); > >> if (status & ATA_BUSY) { > >> + spin_unlock_irq(ap->lock); > >> ata_msleep(ap, 2); > >> + spin_lock_irq(ap->lock); > >> + > >> status = ata_sff_busy_wait(ap, ATA_BUSY, 10); > >> if (status & ATA_BUSY) { > >> ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); > >> - return; > >> + goto out_unlock; > >> } > >> } > >> > >> @@ -1403,6 +1408,8 @@ static void ata_sff_pio_task(struct work_struct *work) > >> */ > >> if (poll_next) > >> goto fsm_start; > >> +out_unlock: > >> + spin_unlock_irq(ap->lock); > >> } > >> > >> /** > >> > >> -- > >> You received this message because you are subscribed to the Google Groups "syzkaller" group. > >> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com. > >> For more options, visit https://groups.google.com/d/optout. > -- > To unsubscribe from this list: send the line "unsubscribe linux-ide" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: libata-sff-check-task_state-hsm-patch --] [-- Type: text/plain, Size: 893 bytes --] drivers/ata/libata-sff.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index cdf6215..f70bc13 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -1050,6 +1050,7 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, struct ata_eh_info *ehi = &link->eh_info; unsigned long flags = 0; int poll_next; + unsigned int task_state; WARN_ON_ONCE((qc->flags & ATA_QCFLAG_ACTIVE) == 0); @@ -1063,7 +1064,11 @@ fsm_start: DPRINTK("ata%u: protocol %d task_state %d (dev_stat 0x%X)\n", ap->print_id, qc->tf.protocol, ap->hsm_task_state, status); - switch (ap->hsm_task_state) { + spin_lock_irq(ap->lock); + task_state = ap->hsm_task_state; + spin_unlock_irq(ap->lock); + + switch (task_state) { case HSM_ST_FIRST: /* Send first data block or PACKET CDB */ ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 18:14 ` David Milburn @ 2016-01-29 20:24 ` Tejun Heo 0 siblings, 0 replies; 12+ messages in thread From: Tejun Heo @ 2016-01-29 20:24 UTC (permalink / raw) To: David Milburn Cc: Dmitry Vyukov, syzkaller, linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, Kostya Serebryany, Alexander Potapenko, Sasha Levin Hello, David. On Fri, Jan 29, 2016 at 12:14:42PM -0600, David Milburn wrote: > Can ata_sff_hsm_move grab the lock and save off the task_state, > like this patch? It's not just the task_state. Everything the polling path is doing is dangerous. It's all pretty confused. Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 13:40 ` Dmitry Vyukov 2016-01-29 18:14 ` David Milburn @ 2016-01-29 20:23 ` Tejun Heo 2016-02-01 10:46 ` Dmitry Vyukov 1 sibling, 1 reply; 12+ messages in thread From: Tejun Heo @ 2016-01-29 20:23 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzkaller, linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, Kostya Serebryany, Alexander Potapenko, Sasha Levin Hello, On Fri, Jan 29, 2016 at 02:40:33PM +0100, Dmitry Vyukov wrote: > It now deadlocks at this stack. It is probably not OK to call > ata_sff_hsm_move holding the spinlock. Yeah, the locking is pretty messed up with the polling path. Can you please try the following? Thanks. diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index cdf6215..7dbba38 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -997,12 +997,9 @@ static inline int ata_hsm_ok_in_wq(struct ata_port *ap, static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) { struct ata_port *ap = qc->ap; - unsigned long flags; if (ap->ops->error_handler) { if (in_wq) { - spin_lock_irqsave(ap->lock, flags); - /* EH might have kicked in while host lock is * released. */ @@ -1014,8 +1011,6 @@ static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) } else ata_port_freeze(ap); } - - spin_unlock_irqrestore(ap->lock, flags); } else { if (likely(!(qc->err_mask & AC_ERR_HSM))) ata_qc_complete(qc); @@ -1024,10 +1019,8 @@ static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) } } else { if (in_wq) { - spin_lock_irqsave(ap->lock, flags); ata_sff_irq_on(ap); ata_qc_complete(qc); - spin_unlock_irqrestore(ap->lock, flags); } else ata_qc_complete(qc); } @@ -1048,9 +1041,10 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, { struct ata_link *link = qc->dev->link; struct ata_eh_info *ehi = &link->eh_info; - unsigned long flags = 0; int poll_next; + lockdep_assert_held(ap->lock); + WARN_ON_ONCE((qc->flags & ATA_QCFLAG_ACTIVE) == 0); /* Make sure ata_sff_qc_issue() does not throw things @@ -1112,14 +1106,6 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, } } - /* Send the CDB (atapi) or the first data block (ata pio out). - * During the state transition, interrupt handler shouldn't - * be invoked before the data transfer is complete and - * hsm_task_state is changed. Hence, the following locking. - */ - if (in_wq) - spin_lock_irqsave(ap->lock, flags); - if (qc->tf.protocol == ATA_PROT_PIO) { /* PIO data out protocol. * send first data block. @@ -1135,9 +1121,6 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, /* send CDB */ atapi_send_cdb(ap, qc); - if (in_wq) - spin_unlock_irqrestore(ap->lock, flags); - /* if polling, ata_sff_pio_task() handles the rest. * otherwise, interrupt handler takes over from here. */ @@ -1361,12 +1344,14 @@ static void ata_sff_pio_task(struct work_struct *work) u8 status; int poll_next; + spin_lock_irq(ap->lock); + BUG_ON(ap->sff_pio_task_link == NULL); /* qc can be NULL if timeout occurred */ qc = ata_qc_from_tag(ap, link->active_tag); if (!qc) { ap->sff_pio_task_link = NULL; - return; + goto out_unlock; } fsm_start: @@ -1381,11 +1366,14 @@ static void ata_sff_pio_task(struct work_struct *work) */ status = ata_sff_busy_wait(ap, ATA_BUSY, 5); if (status & ATA_BUSY) { + spin_unlock_irq(ap->lock); ata_msleep(ap, 2); + spin_lock_irq(ap->lock); + status = ata_sff_busy_wait(ap, ATA_BUSY, 10); if (status & ATA_BUSY) { ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); - return; + goto out_unlock; } } @@ -1402,6 +1390,8 @@ static void ata_sff_pio_task(struct work_struct *work) */ if (poll_next) goto fsm_start; +out_unlock: + spin_unlock_irq(ap->lock); } /** ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: ata: BUG in ata_sff_hsm_move 2016-01-29 20:23 ` Tejun Heo @ 2016-02-01 10:46 ` Dmitry Vyukov 2016-02-01 16:50 ` [PATCH libata/for-4.5-fixes] libata: fix sff host state machine locking while polling Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Dmitry Vyukov @ 2016-02-01 10:46 UTC (permalink / raw) To: Tejun Heo Cc: syzkaller, linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, Kostya Serebryany, Alexander Potapenko, Sasha Levin On Fri, Jan 29, 2016 at 9:23 PM, Tejun Heo <tj@kernel.org> wrote: > Hello, > > On Fri, Jan 29, 2016 at 02:40:33PM +0100, Dmitry Vyukov wrote: >> It now deadlocks at this stack. It is probably not OK to call >> ata_sff_hsm_move holding the spinlock. > > Yeah, the locking is pretty messed up with the polling path. Can you > please try the following? No crashes over the weekend. We can consider this fixed. Thanks! > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index cdf6215..7dbba38 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -997,12 +997,9 @@ static inline int ata_hsm_ok_in_wq(struct ata_port *ap, > static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) > { > struct ata_port *ap = qc->ap; > - unsigned long flags; > > if (ap->ops->error_handler) { > if (in_wq) { > - spin_lock_irqsave(ap->lock, flags); > - > /* EH might have kicked in while host lock is > * released. > */ > @@ -1014,8 +1011,6 @@ static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) > } else > ata_port_freeze(ap); > } > - > - spin_unlock_irqrestore(ap->lock, flags); > } else { > if (likely(!(qc->err_mask & AC_ERR_HSM))) > ata_qc_complete(qc); > @@ -1024,10 +1019,8 @@ static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) > } > } else { > if (in_wq) { > - spin_lock_irqsave(ap->lock, flags); > ata_sff_irq_on(ap); > ata_qc_complete(qc); > - spin_unlock_irqrestore(ap->lock, flags); > } else > ata_qc_complete(qc); > } > @@ -1048,9 +1041,10 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, > { > struct ata_link *link = qc->dev->link; > struct ata_eh_info *ehi = &link->eh_info; > - unsigned long flags = 0; > int poll_next; > > + lockdep_assert_held(ap->lock); > + > WARN_ON_ONCE((qc->flags & ATA_QCFLAG_ACTIVE) == 0); > > /* Make sure ata_sff_qc_issue() does not throw things > @@ -1112,14 +1106,6 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, > } > } > > - /* Send the CDB (atapi) or the first data block (ata pio out). > - * During the state transition, interrupt handler shouldn't > - * be invoked before the data transfer is complete and > - * hsm_task_state is changed. Hence, the following locking. > - */ > - if (in_wq) > - spin_lock_irqsave(ap->lock, flags); > - > if (qc->tf.protocol == ATA_PROT_PIO) { > /* PIO data out protocol. > * send first data block. > @@ -1135,9 +1121,6 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, > /* send CDB */ > atapi_send_cdb(ap, qc); > > - if (in_wq) > - spin_unlock_irqrestore(ap->lock, flags); > - > /* if polling, ata_sff_pio_task() handles the rest. > * otherwise, interrupt handler takes over from here. > */ > @@ -1361,12 +1344,14 @@ static void ata_sff_pio_task(struct work_struct *work) > u8 status; > int poll_next; > > + spin_lock_irq(ap->lock); > + > BUG_ON(ap->sff_pio_task_link == NULL); > /* qc can be NULL if timeout occurred */ > qc = ata_qc_from_tag(ap, link->active_tag); > if (!qc) { > ap->sff_pio_task_link = NULL; > - return; > + goto out_unlock; > } > > fsm_start: > @@ -1381,11 +1366,14 @@ static void ata_sff_pio_task(struct work_struct *work) > */ > status = ata_sff_busy_wait(ap, ATA_BUSY, 5); > if (status & ATA_BUSY) { > + spin_unlock_irq(ap->lock); > ata_msleep(ap, 2); > + spin_lock_irq(ap->lock); > + > status = ata_sff_busy_wait(ap, ATA_BUSY, 10); > if (status & ATA_BUSY) { > ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); > - return; > + goto out_unlock; > } > } > > @@ -1402,6 +1390,8 @@ static void ata_sff_pio_task(struct work_struct *work) > */ > if (poll_next) > goto fsm_start; > +out_unlock: > + spin_unlock_irq(ap->lock); > } > > /** ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH libata/for-4.5-fixes] libata: fix sff host state machine locking while polling 2016-02-01 10:46 ` Dmitry Vyukov @ 2016-02-01 16:50 ` Tejun Heo 0 siblings, 0 replies; 12+ messages in thread From: Tejun Heo @ 2016-02-01 16:50 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzkaller, linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, Kostya Serebryany, Alexander Potapenko, Sasha Levin Hello, Dmitry. Thanks a lot for confirming the fix. Applied the following to for-4.5-fixes branch w/ stable cc'd. Thanks again! ------ 8< ------ >From 8eee1d3ed5b6fc8e14389567c9a6f53f82bb7224 Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Mon, 1 Feb 2016 11:33:21 -0500 The bulk of ATA host state machine is implemented by ata_sff_hsm_move(). The function is called from either the interrupt handler or, if polling, a work item. Unlike from the interrupt path, the polling path calls the function without holding the host lock and ata_sff_hsm_move() selectively grabs the lock. This is completely broken. If an IRQ triggers while polling is in progress, the two can easily race and end up accessing the hardware and updating state machine state at the same time. This can put the state machine in an illegal state and lead to a crash like the following. kernel BUG at drivers/ata/libata-sff.c:1302! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 1 PID: 10679 Comm: syz-executor Not tainted 4.5.0-rc1+ #300 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88002bd00000 ti: ffff88002e048000 task.ti: ffff88002e048000 RIP: 0010:[<ffffffff83a83409>] [<ffffffff83a83409>] ata_sff_hsm_move+0x619/0x1c60 ... Call Trace: <IRQ> [<ffffffff83a84c31>] __ata_sff_port_intr+0x1e1/0x3a0 drivers/ata/libata-sff.c:1584 [<ffffffff83a85611>] ata_bmdma_port_intr+0x71/0x400 drivers/ata/libata-sff.c:2877 [< inline >] __ata_sff_interrupt drivers/ata/libata-sff.c:1629 [<ffffffff83a85bf3>] ata_bmdma_interrupt+0x253/0x580 drivers/ata/libata-sff.c:2902 [<ffffffff81479f98>] handle_irq_event_percpu+0x108/0x7e0 kernel/irq/handle.c:157 [<ffffffff8147a717>] handle_irq_event+0xa7/0x140 kernel/irq/handle.c:205 [<ffffffff81484573>] handle_edge_irq+0x1e3/0x8d0 kernel/irq/chip.c:623 [< inline >] generic_handle_irq_desc include/linux/irqdesc.h:146 [<ffffffff811a92bc>] handle_irq+0x10c/0x2a0 arch/x86/kernel/irq_64.c:78 [<ffffffff811a7e4d>] do_IRQ+0x7d/0x1a0 arch/x86/kernel/irq.c:240 [<ffffffff86653d4c>] common_interrupt+0x8c/0x8c arch/x86/entry/entry_64.S:520 <EOI> [< inline >] rcu_lock_acquire include/linux/rcupdate.h:490 [< inline >] rcu_read_lock include/linux/rcupdate.h:874 [<ffffffff8164b4a1>] filemap_map_pages+0x131/0xba0 mm/filemap.c:2145 [< inline >] do_fault_around mm/memory.c:2943 [< inline >] do_read_fault mm/memory.c:2962 [< inline >] do_fault mm/memory.c:3133 [< inline >] handle_pte_fault mm/memory.c:3308 [< inline >] __handle_mm_fault mm/memory.c:3418 [<ffffffff816efb16>] handle_mm_fault+0x2516/0x49a0 mm/memory.c:3447 [<ffffffff8127dc16>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238 [<ffffffff8127e358>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331 [<ffffffff8126f514>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264 [<ffffffff86655578>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986 Fix it by ensuring that the polling path is holding the host lock before entering ata_sff_hsm_move() so that all hardware accesses and state updates are performed under the host lock. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/g/CACT4Y+b_JsOxJu2EZyEf+mOXORc_zid5V1-pLZSroJVxyWdSpw@mail.gmail.com Cc: stable@vger.kernel.org --- drivers/ata/libata-sff.c | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-) diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 608677d..051b615 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -997,12 +997,9 @@ static inline int ata_hsm_ok_in_wq(struct ata_port *ap, static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) { struct ata_port *ap = qc->ap; - unsigned long flags; if (ap->ops->error_handler) { if (in_wq) { - spin_lock_irqsave(ap->lock, flags); - /* EH might have kicked in while host lock is * released. */ @@ -1014,8 +1011,6 @@ static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) } else ata_port_freeze(ap); } - - spin_unlock_irqrestore(ap->lock, flags); } else { if (likely(!(qc->err_mask & AC_ERR_HSM))) ata_qc_complete(qc); @@ -1024,10 +1019,8 @@ static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq) } } else { if (in_wq) { - spin_lock_irqsave(ap->lock, flags); ata_sff_irq_on(ap); ata_qc_complete(qc); - spin_unlock_irqrestore(ap->lock, flags); } else ata_qc_complete(qc); } @@ -1048,9 +1041,10 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, { struct ata_link *link = qc->dev->link; struct ata_eh_info *ehi = &link->eh_info; - unsigned long flags = 0; int poll_next; + lockdep_assert_held(ap->lock); + WARN_ON_ONCE((qc->flags & ATA_QCFLAG_ACTIVE) == 0); /* Make sure ata_sff_qc_issue() does not throw things @@ -1112,14 +1106,6 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, } } - /* Send the CDB (atapi) or the first data block (ata pio out). - * During the state transition, interrupt handler shouldn't - * be invoked before the data transfer is complete and - * hsm_task_state is changed. Hence, the following locking. - */ - if (in_wq) - spin_lock_irqsave(ap->lock, flags); - if (qc->tf.protocol == ATA_PROT_PIO) { /* PIO data out protocol. * send first data block. @@ -1135,9 +1121,6 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, /* send CDB */ atapi_send_cdb(ap, qc); - if (in_wq) - spin_unlock_irqrestore(ap->lock, flags); - /* if polling, ata_sff_pio_task() handles the rest. * otherwise, interrupt handler takes over from here. */ @@ -1362,12 +1345,14 @@ static void ata_sff_pio_task(struct work_struct *work) u8 status; int poll_next; + spin_lock_irq(ap->lock); + BUG_ON(ap->sff_pio_task_link == NULL); /* qc can be NULL if timeout occurred */ qc = ata_qc_from_tag(ap, link->active_tag); if (!qc) { ap->sff_pio_task_link = NULL; - return; + goto out_unlock; } fsm_start: @@ -1382,11 +1367,14 @@ static void ata_sff_pio_task(struct work_struct *work) */ status = ata_sff_busy_wait(ap, ATA_BUSY, 5); if (status & ATA_BUSY) { + spin_unlock_irq(ap->lock); ata_msleep(ap, 2); + spin_lock_irq(ap->lock); + status = ata_sff_busy_wait(ap, ATA_BUSY, 10); if (status & ATA_BUSY) { ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE); - return; + goto out_unlock; } } @@ -1403,6 +1391,8 @@ static void ata_sff_pio_task(struct work_struct *work) */ if (poll_next) goto fsm_start; +out_unlock: + spin_unlock_irq(ap->lock); } /** -- 2.5.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH libata/for-4.5-fixes] libata-sff: use WARN instead of BUG on illegal host state machine state 2016-01-29 11:52 ` Tejun Heo 2016-01-29 11:59 ` Dmitry Vyukov @ 2016-01-29 12:20 ` Tejun Heo 1 sibling, 0 replies; 12+ messages in thread From: Tejun Heo @ 2016-01-29 12:20 UTC (permalink / raw) To: Dmitry Vyukov Cc: linux-ide, LKML, Alan Cox, Jeff Garzik, Sergei Shtylyov, syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin >From a588afc920bc50e894f6ae2874c4281c795e0979 Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Fri, 29 Jan 2016 07:06:53 -0500 ata_sff_hsm_move() triggers BUG if it sees a host state machine state that it dind't expect. The risk for data corruption when the condition occurs is low as it's highly unlikely that it would lead to spurious completion of commands. The BUG occasionally triggered for subtle race conditions in the driver. Let's downgrade it to WARN so that it doesn't kill the machine unnecessarily. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> --- Applied to libata/for-4.5-fixes. Thanks. drivers/ata/libata-sff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index cdf6215..608677d 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -1296,7 +1296,8 @@ int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc, break; default: poll_next = 0; - BUG(); + WARN(true, "ata%d: SFF host state machine in invalid state %d", + ap->print_id, ap->hsm_task_state); } return poll_next; -- 2.5.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-02-01 16:50 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-28 11:35 ata: BUG in ata_sff_hsm_move Dmitry Vyukov 2016-01-29 11:52 ` Tejun Heo 2016-01-29 11:59 ` Dmitry Vyukov 2016-01-29 12:23 ` Tejun Heo 2016-01-29 13:18 ` Dmitry Vyukov 2016-01-29 13:40 ` Dmitry Vyukov 2016-01-29 18:14 ` David Milburn 2016-01-29 20:24 ` Tejun Heo 2016-01-29 20:23 ` Tejun Heo 2016-02-01 10:46 ` Dmitry Vyukov 2016-02-01 16:50 ` [PATCH libata/for-4.5-fixes] libata: fix sff host state machine locking while polling Tejun Heo 2016-01-29 12:20 ` [PATCH libata/for-4.5-fixes] libata-sff: use WARN instead of BUG on illegal host state machine state Tejun Heo
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.