All of lore.kernel.org
 help / color / mirror / Atom feed
* dmatest regression in 3.10-rc1
@ 2013-05-15 15:28 ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-15 15:28 UTC (permalink / raw)
  To: djbw, vinod.koul
  Cc: linux-kernel, linux-arm-kernel, andriy.shevchenko, viresh.kumar

Hello,

I've been observing a regression in the dmatest module with 3.10-rc1. It
manifests as either:

 - a spurious timeout on one or more of the channel threads
 - a complete kernel lockup (loss of console)
 - a panic (see below, noting that the callback [dmatest_callback] is
   dereferencing a NULL pointer)

If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
for") then things are rosy again, but I'm not sure if this is hiding another
problem.

I'm using 8 iterations, 8 threads per channel and a 128k buffer size,
although I've reproduced it with 1 iteration, 1 thread, default buffer size
too.

Any ideas? Some of the changes to dmatest.c since 3.9 have some fishy
looking synchronisation variables without barriers/atomic accesses, but I'm
reproducing this on a software model so I wouldn't expect too much memory
re-ordering.

Any help appreciated,

Will

--->8

[   18.911813] dmatest: thread dma0chan0-copy1 exited with status 0
[   18.911916] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   18.911974] pgd = c0003000
[   18.912020] [00000000] *pgd=80000080004003, *pmd=00000000
[   18.912097] Internal error: Oops: 207 [#1] PREEMPT SMP ARM
[   18.912144] Modules linked in: dmatest(-)
[   18.912230] CPU: 0 PID: 651 Comm: dma2chan0-copy3 Not tainted 3.10.0-rc1+ #15
[   18.912300] task: dfac8880 ti: df37a000 task.ti: df37a000
[   18.912378] PC is at _raw_spin_lock_irqsave+0x24/0x60
[   18.912448] LR is at __wake_up+0x20/0x50
[   18.912522] pc : [<c032ec60>]    lr : [<c0044054>]    psr: a0000193
[   18.912522] sp : df37bb78  ip : 00000000  fp : df37bb9c
[   18.912616] r10: df37a000  r9 : 00000000  r8 : 00000000
[   18.912679] r7 : 00000003  r6 : 00000000  r5 : df82c000  r4 : 00000000
[   18.912749] r3 : df37a000  r2 : 00000000  r1 : 40000103  r0 : a0000113
[   18.912818] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   18.912893] Control: 30c5387d  Table: 9f97ec80  DAC: fffffffd
[   18.912956] Process dma2chan0-copy3 (pid: 651, stack limit = 0xdf37a240)
[   18.913020] Stack: (0xdf37bb78 to 0xdf37c000)
[   18.913093] bb60:                                                       d28de712 00000001
[   18.913207] bb80: 0000006e df836878 df82c000 00000000 df82c040 df82c000 df37bbc4 c01e92d8
[   18.913322] bba0: 00000001 c0ce3a40 00000000 80000193 dfac88b8 00002e7b df82c04c 20000113
[   18.913438] bbc0: c0ce2e9c df836878 df836878 00000080 00000000 70b93fb1 8f46af61 df82c000
[   18.913551] bbe0: df82c004 00000000 c0556508 c0585d00 00000000 df37a000 00000006 c0025304
[   18.913665] bc00: 00000001 00000018 c055c09c df37a000 c055c080 40000102 00000018 c00256e0
[   18.913782] bc20: df80c950 df0850c0 e0802000 00000000 ffff90d1 00200040 df37bcec df37a000
[   18.913895] bc40: 00000060 00000000 e0802000 dfa00cc0 00000000 df0d5c00 df37bcec c0025ae8
[   18.914013] bc60: c0556e48 c000eb38 e080200c c0562820 df37bc90 c0008560 c00444b0 c032efe4
[   18.914130] bc80: 60000113 ffffffff df37bcc4 c000de80 c0ce3a40 dfac8880 df37a000 0000221b
[   18.914244] bca0: 00000000 c0ce3a40 dfacc840 00000002 dfa00cc0 00000000 df0d5c00 df37bcec
[   18.914362] bcc0: c0557a40 df37bcd8 c00444b0 c032efe4 60000113 ffffffff c0ce3a40 df37a000
[   18.914478] bce0: c0557a40 dfac8880 df37bd8c c032db74 00000001 c002bd68 c0586a54 c0586c54
[   18.914593] bd00: df37bd00 df37bd00 6475a018 00000004 00000004 c032e458 c0557a40 c0557a40
[   18.914706] bd20: 00000000 00000008 c055c08c c0025754 c05731c0 df81c840 c0ce6680 00000000
[   18.914821] bd40: ffff90cd 00200040 c01d4a84 df37a000 0000001b 00000000 e0802000 00007317
[   18.914936] bd60: c0556e48 0000001b 00000000 df37a000 80000113 ffffffff df37bdd4 c000dea0
[   18.915052] bd80: df37a000 0001beae df37bd9c c032e458 bf001fe0 bf001e70 00000000 c000deb8
[   18.915164] bda0: df15a200 dfb4ca40 0001beae 00020000 0001c8d5 00000000 0000000b 0000000c
[   18.915279] bdc0: 00007317 deea0000 0001beae 00000000 0001c8d4 df37bde8 bf001fe0 bf001e70
[   18.915393] bde0: 80000113 ffffffff 00020000 df15a200 00000000 00014b97 0000517d 00007317
[   18.915505] be00: dfb4ca40 0001beae 00000000 bf001fe0 0001beae 00000000 00000000 00000001
[   18.915618] be20: 00000080 00020000 0001beae bf002dfc dfb4c980 00000001 bf00262c 00000000
[   18.915731] be40: 00000000 00000001 00000080 ffffffff 000000c0 0000517d df37bf64 bf001cdc
[   18.915843] be60: 00014b97 00007317 dfb4ca40 00000001 0001beae 00000000 00000000 bf000e78
[   18.915954] be80: 9eea0000 00000000 df374000 00000002 9ee8517d 00000000 00000000 00000000
[   18.916066] bea0: df37401c 00000008 df37bf20 00000010 dfac8ab8 00000010 00000000 00000008
[   18.916181] bec0: df37be80 df82e014 df37be98 0000517d df37be80 df37be90 00000001 dfb4c980
[   18.916294] bee0: 00014b97 00000001 00000001 c0564504 c0591d00 00000000 00007317 df97e340
[   18.916408] bf00: 0000517d bf002dfc dfa0ac40 00000003 df37bf44 75b7c14f 00000001 df37bf20
[   18.916521] bf20: 00040004 df37bf24 df37bf24 00000007 00000007 00000000 df37bf38 df37bf38
[   18.916634] bf40: df37bf64 df161e98 00000000 df97e340 bf000b0c 00000000 00000000 00000000
[   18.916746] bf60: 00000000 c003c1a4 dfdfdfcf 00000000 dfdfdfcf df97e340 00000000 00000000
[   18.916861] bf80: df37bf80 df37bf80 00000000 00000000 df37bf90 df37bf90 df37bfac df161e98
[   18.916971] bfa0: c003c100 00000000 00000000 c000e318 00000000 00000000 00000000 00000000
[   18.917078] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   18.917189] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 dfdfdfcf cfdfdfdf
[   18.917318] [<c032ec60>] (_raw_spin_lock_irqsave+0x24/0x60) from [<c0044054>] (__wake_up+0x20/0x50)
[   18.917453] [<c0044054>] (__wake_up+0x20/0x50) from [<c01e92d8>] (pl330_tasklet+0x4c8/0x580)
[   18.917593] [<c01e92d8>] (pl330_tasklet+0x4c8/0x580) from [<c0025304>] (tasklet_action+0x74/0x100)
[   18.917734] [<c0025304>] (tasklet_action+0x74/0x100) from [<c00256e0>] (__do_softirq+0xe0/0x1b8)
[   18.917858] [<c00256e0>] (__do_softirq+0xe0/0x1b8) from [<c0025ae8>] (irq_exit+0x90/0xc8)
[   18.917972] [<c0025ae8>] (irq_exit+0x90/0xc8) from [<c000eb38>] (handle_IRQ+0x3c/0x94)
[   18.918089] [<c000eb38>] (handle_IRQ+0x3c/0x94) from [<c0008560>] (gic_handle_irq+0x28/0x5c)
[   18.918204] [<c0008560>] (gic_handle_irq+0x28/0x5c) from [<c000de80>] (__irq_svc+0x40/0x70)
[   18.918273] Exception stack(0xdf37bc90 to 0xdf37bcd8)
[   18.918357] bc80:                                     c0ce3a40 dfac8880 df37a000 0000221b
[   18.918470] bca0: 00000000 c0ce3a40 dfacc840 00000002 dfa00cc0 00000000 df0d5c00 df37bcec
[   18.918572] bcc0: c0557a40 df37bcd8 c00444b0 c032efe4 60000113 ffffffff
[   18.918697] [<c000de80>] (__irq_svc+0x40/0x70) from [<c032efe4>] (_raw_spin_unlock_irq+0x1c/0x44)
[   18.918831] [<c032efe4>] (_raw_spin_unlock_irq+0x1c/0x44) from [<c00444b0>] (finish_task_switch+0x54/0xdc)
[   18.918965] [<c00444b0>] (finish_task_switch+0x54/0xdc) from [<c032db74>] (__schedule+0x208/0x66c)
[   18.919097] [<c032db74>] (__schedule+0x208/0x66c) from [<c032e458>] (preempt_schedule_irq+0x40/0x64)
[   18.919116] dma2chan0-copy7: #1: No errors with src_off=0x5000 dst_off=0x5028 len=0x8d77 (0)
[   18.919292] [<c032e458>] (preempt_schedule_irq+0x40/0x64) from [<c000deb8>] (svc_preempt+0x8/0x18)
[   18.919431] [<c000deb8>] (svc_preempt+0x8/0x18) from [<bf001e70>] (dmatest_verify.isra.9+0x14/0xb4 [dmatest])
[   18.919499] unwind: Index not found bf001e70
[   18.919582] Code: e3c3303f e5931004 e2811001 e5831004 (e1923f9f)
[   18.919644] ---[ end trace f24842049530d274 ]---
[   18.919699] Kernel panic - not syncing: Fatal exception in interrupt
[   18.919754] CPU4: stopping
[   18.919822] CPU: 4 PID: 655 Comm: dma2chan0-copy7 Tainted: G      D      3.10.0-rc1+ #15
[   18.919949] [<c0013b50>] (unwind_backtrace+0x0/0xf8) from [<c0011434>] (show_stack+0x10/0x14)
[   18.920073] [<c0011434>] (show_stack+0x10/0x14) from [<c0012918>] (handle_IPI+0xf8/0x11c)
[   18.920183] [<c0012918>] (handle_IPI+0xf8/0x11c) from [<c000858c>] (gic_handle_irq+0x54/0x5c)
[   18.920299] [<c000858c>] (gic_handle_irq+0x54/0x5c) from [<c000de80>] (__irq_svc+0x40/0x70)
[   18.920369] Exception stack(0xdef23e18 to 0xdef23e60)
[   18.920442] 3e00:                                                       dfb4ce40 000106d3
[   18.920555] 3e20: defa0000 000015d2 00020000 00012292 00008a83 00000080 ffffffff 000000c0
[   18.920671] 3e40: 00008a83 def23f64 ffffff8e def23e60 bf000fb4 bf000fe8 80000113 ffffffff
[   18.920808] [<c000de80>] (__irq_svc+0x40/0x70) from [<bf000fe8>] (dmatest_func+0x4dc/0x12ec [dmatest])
[   18.920944] [<bf000fe8>] (dmatest_func+0x4dc/0x12ec [dmatest]) from [<c003c1a4>] (kthread+0xa4/0xb0)
[   18.921066] [<c003c1a4>] (kthread+0xa4/0xb0) from [<c000e318>] (ret_from_fork+0x14/0x3c)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* dmatest regression in 3.10-rc1
@ 2013-05-15 15:28 ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-15 15:28 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

I've been observing a regression in the dmatest module with 3.10-rc1. It
manifests as either:

 - a spurious timeout on one or more of the channel threads
 - a complete kernel lockup (loss of console)
 - a panic (see below, noting that the callback [dmatest_callback] is
   dereferencing a NULL pointer)

If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
for") then things are rosy again, but I'm not sure if this is hiding another
problem.

I'm using 8 iterations, 8 threads per channel and a 128k buffer size,
although I've reproduced it with 1 iteration, 1 thread, default buffer size
too.

Any ideas? Some of the changes to dmatest.c since 3.9 have some fishy
looking synchronisation variables without barriers/atomic accesses, but I'm
reproducing this on a software model so I wouldn't expect too much memory
re-ordering.

Any help appreciated,

Will

--->8

[   18.911813] dmatest: thread dma0chan0-copy1 exited with status 0
[   18.911916] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   18.911974] pgd = c0003000
[   18.912020] [00000000] *pgd=80000080004003, *pmd=00000000
[   18.912097] Internal error: Oops: 207 [#1] PREEMPT SMP ARM
[   18.912144] Modules linked in: dmatest(-)
[   18.912230] CPU: 0 PID: 651 Comm: dma2chan0-copy3 Not tainted 3.10.0-rc1+ #15
[   18.912300] task: dfac8880 ti: df37a000 task.ti: df37a000
[   18.912378] PC is at _raw_spin_lock_irqsave+0x24/0x60
[   18.912448] LR is at __wake_up+0x20/0x50
[   18.912522] pc : [<c032ec60>]    lr : [<c0044054>]    psr: a0000193
[   18.912522] sp : df37bb78  ip : 00000000  fp : df37bb9c
[   18.912616] r10: df37a000  r9 : 00000000  r8 : 00000000
[   18.912679] r7 : 00000003  r6 : 00000000  r5 : df82c000  r4 : 00000000
[   18.912749] r3 : df37a000  r2 : 00000000  r1 : 40000103  r0 : a0000113
[   18.912818] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   18.912893] Control: 30c5387d  Table: 9f97ec80  DAC: fffffffd
[   18.912956] Process dma2chan0-copy3 (pid: 651, stack limit = 0xdf37a240)
[   18.913020] Stack: (0xdf37bb78 to 0xdf37c000)
[   18.913093] bb60:                                                       d28de712 00000001
[   18.913207] bb80: 0000006e df836878 df82c000 00000000 df82c040 df82c000 df37bbc4 c01e92d8
[   18.913322] bba0: 00000001 c0ce3a40 00000000 80000193 dfac88b8 00002e7b df82c04c 20000113
[   18.913438] bbc0: c0ce2e9c df836878 df836878 00000080 00000000 70b93fb1 8f46af61 df82c000
[   18.913551] bbe0: df82c004 00000000 c0556508 c0585d00 00000000 df37a000 00000006 c0025304
[   18.913665] bc00: 00000001 00000018 c055c09c df37a000 c055c080 40000102 00000018 c00256e0
[   18.913782] bc20: df80c950 df0850c0 e0802000 00000000 ffff90d1 00200040 df37bcec df37a000
[   18.913895] bc40: 00000060 00000000 e0802000 dfa00cc0 00000000 df0d5c00 df37bcec c0025ae8
[   18.914013] bc60: c0556e48 c000eb38 e080200c c0562820 df37bc90 c0008560 c00444b0 c032efe4
[   18.914130] bc80: 60000113 ffffffff df37bcc4 c000de80 c0ce3a40 dfac8880 df37a000 0000221b
[   18.914244] bca0: 00000000 c0ce3a40 dfacc840 00000002 dfa00cc0 00000000 df0d5c00 df37bcec
[   18.914362] bcc0: c0557a40 df37bcd8 c00444b0 c032efe4 60000113 ffffffff c0ce3a40 df37a000
[   18.914478] bce0: c0557a40 dfac8880 df37bd8c c032db74 00000001 c002bd68 c0586a54 c0586c54
[   18.914593] bd00: df37bd00 df37bd00 6475a018 00000004 00000004 c032e458 c0557a40 c0557a40
[   18.914706] bd20: 00000000 00000008 c055c08c c0025754 c05731c0 df81c840 c0ce6680 00000000
[   18.914821] bd40: ffff90cd 00200040 c01d4a84 df37a000 0000001b 00000000 e0802000 00007317
[   18.914936] bd60: c0556e48 0000001b 00000000 df37a000 80000113 ffffffff df37bdd4 c000dea0
[   18.915052] bd80: df37a000 0001beae df37bd9c c032e458 bf001fe0 bf001e70 00000000 c000deb8
[   18.915164] bda0: df15a200 dfb4ca40 0001beae 00020000 0001c8d5 00000000 0000000b 0000000c
[   18.915279] bdc0: 00007317 deea0000 0001beae 00000000 0001c8d4 df37bde8 bf001fe0 bf001e70
[   18.915393] bde0: 80000113 ffffffff 00020000 df15a200 00000000 00014b97 0000517d 00007317
[   18.915505] be00: dfb4ca40 0001beae 00000000 bf001fe0 0001beae 00000000 00000000 00000001
[   18.915618] be20: 00000080 00020000 0001beae bf002dfc dfb4c980 00000001 bf00262c 00000000
[   18.915731] be40: 00000000 00000001 00000080 ffffffff 000000c0 0000517d df37bf64 bf001cdc
[   18.915843] be60: 00014b97 00007317 dfb4ca40 00000001 0001beae 00000000 00000000 bf000e78
[   18.915954] be80: 9eea0000 00000000 df374000 00000002 9ee8517d 00000000 00000000 00000000
[   18.916066] bea0: df37401c 00000008 df37bf20 00000010 dfac8ab8 00000010 00000000 00000008
[   18.916181] bec0: df37be80 df82e014 df37be98 0000517d df37be80 df37be90 00000001 dfb4c980
[   18.916294] bee0: 00014b97 00000001 00000001 c0564504 c0591d00 00000000 00007317 df97e340
[   18.916408] bf00: 0000517d bf002dfc dfa0ac40 00000003 df37bf44 75b7c14f 00000001 df37bf20
[   18.916521] bf20: 00040004 df37bf24 df37bf24 00000007 00000007 00000000 df37bf38 df37bf38
[   18.916634] bf40: df37bf64 df161e98 00000000 df97e340 bf000b0c 00000000 00000000 00000000
[   18.916746] bf60: 00000000 c003c1a4 dfdfdfcf 00000000 dfdfdfcf df97e340 00000000 00000000
[   18.916861] bf80: df37bf80 df37bf80 00000000 00000000 df37bf90 df37bf90 df37bfac df161e98
[   18.916971] bfa0: c003c100 00000000 00000000 c000e318 00000000 00000000 00000000 00000000
[   18.917078] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   18.917189] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 dfdfdfcf cfdfdfdf
[   18.917318] [<c032ec60>] (_raw_spin_lock_irqsave+0x24/0x60) from [<c0044054>] (__wake_up+0x20/0x50)
[   18.917453] [<c0044054>] (__wake_up+0x20/0x50) from [<c01e92d8>] (pl330_tasklet+0x4c8/0x580)
[   18.917593] [<c01e92d8>] (pl330_tasklet+0x4c8/0x580) from [<c0025304>] (tasklet_action+0x74/0x100)
[   18.917734] [<c0025304>] (tasklet_action+0x74/0x100) from [<c00256e0>] (__do_softirq+0xe0/0x1b8)
[   18.917858] [<c00256e0>] (__do_softirq+0xe0/0x1b8) from [<c0025ae8>] (irq_exit+0x90/0xc8)
[   18.917972] [<c0025ae8>] (irq_exit+0x90/0xc8) from [<c000eb38>] (handle_IRQ+0x3c/0x94)
[   18.918089] [<c000eb38>] (handle_IRQ+0x3c/0x94) from [<c0008560>] (gic_handle_irq+0x28/0x5c)
[   18.918204] [<c0008560>] (gic_handle_irq+0x28/0x5c) from [<c000de80>] (__irq_svc+0x40/0x70)
[   18.918273] Exception stack(0xdf37bc90 to 0xdf37bcd8)
[   18.918357] bc80:                                     c0ce3a40 dfac8880 df37a000 0000221b
[   18.918470] bca0: 00000000 c0ce3a40 dfacc840 00000002 dfa00cc0 00000000 df0d5c00 df37bcec
[   18.918572] bcc0: c0557a40 df37bcd8 c00444b0 c032efe4 60000113 ffffffff
[   18.918697] [<c000de80>] (__irq_svc+0x40/0x70) from [<c032efe4>] (_raw_spin_unlock_irq+0x1c/0x44)
[   18.918831] [<c032efe4>] (_raw_spin_unlock_irq+0x1c/0x44) from [<c00444b0>] (finish_task_switch+0x54/0xdc)
[   18.918965] [<c00444b0>] (finish_task_switch+0x54/0xdc) from [<c032db74>] (__schedule+0x208/0x66c)
[   18.919097] [<c032db74>] (__schedule+0x208/0x66c) from [<c032e458>] (preempt_schedule_irq+0x40/0x64)
[   18.919116] dma2chan0-copy7: #1: No errors with src_off=0x5000 dst_off=0x5028 len=0x8d77 (0)
[   18.919292] [<c032e458>] (preempt_schedule_irq+0x40/0x64) from [<c000deb8>] (svc_preempt+0x8/0x18)
[   18.919431] [<c000deb8>] (svc_preempt+0x8/0x18) from [<bf001e70>] (dmatest_verify.isra.9+0x14/0xb4 [dmatest])
[   18.919499] unwind: Index not found bf001e70
[   18.919582] Code: e3c3303f e5931004 e2811001 e5831004 (e1923f9f)
[   18.919644] ---[ end trace f24842049530d274 ]---
[   18.919699] Kernel panic - not syncing: Fatal exception in interrupt
[   18.919754] CPU4: stopping
[   18.919822] CPU: 4 PID: 655 Comm: dma2chan0-copy7 Tainted: G      D      3.10.0-rc1+ #15
[   18.919949] [<c0013b50>] (unwind_backtrace+0x0/0xf8) from [<c0011434>] (show_stack+0x10/0x14)
[   18.920073] [<c0011434>] (show_stack+0x10/0x14) from [<c0012918>] (handle_IPI+0xf8/0x11c)
[   18.920183] [<c0012918>] (handle_IPI+0xf8/0x11c) from [<c000858c>] (gic_handle_irq+0x54/0x5c)
[   18.920299] [<c000858c>] (gic_handle_irq+0x54/0x5c) from [<c000de80>] (__irq_svc+0x40/0x70)
[   18.920369] Exception stack(0xdef23e18 to 0xdef23e60)
[   18.920442] 3e00:                                                       dfb4ce40 000106d3
[   18.920555] 3e20: defa0000 000015d2 00020000 00012292 00008a83 00000080 ffffffff 000000c0
[   18.920671] 3e40: 00008a83 def23f64 ffffff8e def23e60 bf000fb4 bf000fe8 80000113 ffffffff
[   18.920808] [<c000de80>] (__irq_svc+0x40/0x70) from [<bf000fe8>] (dmatest_func+0x4dc/0x12ec [dmatest])
[   18.920944] [<bf000fe8>] (dmatest_func+0x4dc/0x12ec [dmatest]) from [<c003c1a4>] (kthread+0xa4/0xb0)
[   18.921066] [<c003c1a4>] (kthread+0xa4/0xb0) from [<c000e318>] (ret_from_fork+0x14/0x3c)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: dmatest regression in 3.10-rc1
  2013-05-15 15:28 ` Will Deacon
@ 2013-05-16 15:35   ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-16 15:35 UTC (permalink / raw)
  To: djbw, vinod.koul
  Cc: linux-kernel, linux-arm-kernel, andriy.shevchenko, viresh.kumar

On Wed, May 15, 2013 at 04:28:03PM +0100, Will Deacon wrote:
> I've been observing a regression in the dmatest module with 3.10-rc1. It
> manifests as either:
> 
>  - a spurious timeout on one or more of the channel threads
>  - a complete kernel lockup (loss of console)
>  - a panic (see below, noting that the callback [dmatest_callback] is
>    dereferencing a NULL pointer)
> 
> If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
> for") then things are rosy again, but I'm not sure if this is hiding another
> problem.

Right, so I think I understand what's causing this, but I'll leave it to
Andriy to suggest a fix. The problem comes about because the dmatest
module is now driven from debugfs, making it possible to unload the module
whilst a test run is in progress. In this case:

	- The DMA threads will return from wait_event_freezable_timeout(...)
	  due to kthread_should_stop() returning true, and subsequently
	  report failure because done.done is false.

	- The DMA engines may not be idle, so the asynchronous callback can
	  be invoked after we've started cleaning up, explaining the NULL
	  dereference I'm seeing.

The solutions are either fixing the module exit code to cope with concurrent
DMA transfers or to revert 77101ce578bb and not allow the channel threads to
return mid-transfer.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* dmatest regression in 3.10-rc1
@ 2013-05-16 15:35   ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-16 15:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 15, 2013 at 04:28:03PM +0100, Will Deacon wrote:
> I've been observing a regression in the dmatest module with 3.10-rc1. It
> manifests as either:
> 
>  - a spurious timeout on one or more of the channel threads
>  - a complete kernel lockup (loss of console)
>  - a panic (see below, noting that the callback [dmatest_callback] is
>    dereferencing a NULL pointer)
> 
> If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
> for") then things are rosy again, but I'm not sure if this is hiding another
> problem.

Right, so I think I understand what's causing this, but I'll leave it to
Andriy to suggest a fix. The problem comes about because the dmatest
module is now driven from debugfs, making it possible to unload the module
whilst a test run is in progress. In this case:

	- The DMA threads will return from wait_event_freezable_timeout(...)
	  due to kthread_should_stop() returning true, and subsequently
	  report failure because done.done is false.

	- The DMA engines may not be idle, so the asynchronous callback can
	  be invoked after we've started cleaning up, explaining the NULL
	  dereference I'm seeing.

The solutions are either fixing the module exit code to cope with concurrent
DMA transfers or to revert 77101ce578bb and not allow the channel threads to
return mid-transfer.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: dmatest regression in 3.10-rc1
  2013-05-16 15:35   ` Will Deacon
@ 2013-05-17 12:34     ` Vinod Koul
  -1 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-17 12:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: djbw, linux-kernel, linux-arm-kernel,
	andriy.shevchenko@linux.intel.com, viresh.kumar

On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> On Wed, May 15, 2013 at 04:28:03PM +0100, Will Deacon wrote:
> > I've been observing a regression in the dmatest module with 3.10-rc1. It
> > manifests as either:
> > 
> >  - a spurious timeout on one or more of the channel threads
> >  - a complete kernel lockup (loss of console)
> >  - a panic (see below, noting that the callback [dmatest_callback] is
> >    dereferencing a NULL pointer)
> > 
> > If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
> > for") then things are rosy again, but I'm not sure if this is hiding another
> > problem.
> 
> Right, so I think I understand what's causing this, but I'll leave it to
> Andriy to suggest a fix. The problem comes about because the dmatest
> module is now driven from debugfs, making it possible to unload the module
> whilst a test run is in progress. In this case:
> 
> 	- The DMA threads will return from wait_event_freezable_timeout(...)
> 	  due to kthread_should_stop() returning true, and subsequently
> 	  report failure because done.done is false.
> 
> 	- The DMA engines may not be idle, so the asynchronous callback can
> 	  be invoked after we've started cleaning up, explaining the NULL
> 	  dereference I'm seeing.
> 
> The solutions are either fixing the module exit code to cope with concurrent
> DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> return mid-transfer.
We need to properly abort the channels on removal. This is already handled in
the code but the kthread_stop is called after the transactions are aborted. It
should be the other way round. Can you try with below patch

---

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..4e8d581 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -822,6 +822,9 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
 	struct dmatest_thread	*_thread;
 	int			ret;
 
+	/* terminate all transfers on specified channels */
+	dmaengine_terminate_all(dtc->chan);
+
 	list_for_each_entry_safe(thread, _thread, &dtc->threads, node) {
 		ret = kthread_stop(thread->task);
 		pr_debug("dmatest: thread %s exited with status %d\n",
@@ -830,9 +833,6 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
 		kfree(thread);
 	}
 
-	/* terminate all transfers on specified channels */
-	dmaengine_terminate_all(dtc->chan);
-
 	kfree(dtc);
 }

--
~Vinod 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* dmatest regression in 3.10-rc1
@ 2013-05-17 12:34     ` Vinod Koul
  0 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-17 12:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> On Wed, May 15, 2013 at 04:28:03PM +0100, Will Deacon wrote:
> > I've been observing a regression in the dmatest module with 3.10-rc1. It
> > manifests as either:
> > 
> >  - a spurious timeout on one or more of the channel threads
> >  - a complete kernel lockup (loss of console)
> >  - a panic (see below, noting that the callback [dmatest_callback] is
> >    dereferencing a NULL pointer)
> > 
> > If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
> > for") then things are rosy again, but I'm not sure if this is hiding another
> > problem.
> 
> Right, so I think I understand what's causing this, but I'll leave it to
> Andriy to suggest a fix. The problem comes about because the dmatest
> module is now driven from debugfs, making it possible to unload the module
> whilst a test run is in progress. In this case:
> 
> 	- The DMA threads will return from wait_event_freezable_timeout(...)
> 	  due to kthread_should_stop() returning true, and subsequently
> 	  report failure because done.done is false.
> 
> 	- The DMA engines may not be idle, so the asynchronous callback can
> 	  be invoked after we've started cleaning up, explaining the NULL
> 	  dereference I'm seeing.
> 
> The solutions are either fixing the module exit code to cope with concurrent
> DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> return mid-transfer.
We need to properly abort the channels on removal. This is already handled in
the code but the kthread_stop is called after the transactions are aborted. It
should be the other way round. Can you try with below patch

---

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..4e8d581 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -822,6 +822,9 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
 	struct dmatest_thread	*_thread;
 	int			ret;
 
+	/* terminate all transfers on specified channels */
+	dmaengine_terminate_all(dtc->chan);
+
 	list_for_each_entry_safe(thread, _thread, &dtc->threads, node) {
 		ret = kthread_stop(thread->task);
 		pr_debug("dmatest: thread %s exited with status %d\n",
@@ -830,9 +833,6 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
 		kfree(thread);
 	}
 
-	/* terminate all transfers on specified channels */
-	dmaengine_terminate_all(dtc->chan);
-
 	kfree(dtc);
 }

--
~Vinod 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: dmatest regression in 3.10-rc1
  2013-05-17 12:34     ` Vinod Koul
@ 2013-05-17 14:18       ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-17 14:18 UTC (permalink / raw)
  To: Vinod Koul
  Cc: djbw, linux-kernel, linux-arm-kernel,
	andriy.shevchenko@linux.intel.com, viresh.kumar

Hi Vinod,

Thanks for the reply.

On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > Right, so I think I understand what's causing this, but I'll leave it to
> > Andriy to suggest a fix. The problem comes about because the dmatest
> > module is now driven from debugfs, making it possible to unload the module
> > whilst a test run is in progress. In this case:
> > 
> > 	- The DMA threads will return from wait_event_freezable_timeout(...)
> > 	  due to kthread_should_stop() returning true, and subsequently
> > 	  report failure because done.done is false.
> > 
> > 	- The DMA engines may not be idle, so the asynchronous callback can
> > 	  be invoked after we've started cleaning up, explaining the NULL
> > 	  dereference I'm seeing.
> > 
> > The solutions are either fixing the module exit code to cope with concurrent
> > DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> > return mid-transfer.
> We need to properly abort the channels on removal. This is already handled in
> the code but the kthread_stop is called after the transactions are aborted. It
> should be the other way round. Can you try with below patch

Unfortunately, I can trigger the exact same panic with this patch applied.

Isn't there a race between terminating the dmaengine transfers
(dmaengine_terminate_all) and killing the test threads (kthread_stop) where
a new transfer could be kicked off by dmatest_func?

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* dmatest regression in 3.10-rc1
@ 2013-05-17 14:18       ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-17 14:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Vinod,

Thanks for the reply.

On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > Right, so I think I understand what's causing this, but I'll leave it to
> > Andriy to suggest a fix. The problem comes about because the dmatest
> > module is now driven from debugfs, making it possible to unload the module
> > whilst a test run is in progress. In this case:
> > 
> > 	- The DMA threads will return from wait_event_freezable_timeout(...)
> > 	  due to kthread_should_stop() returning true, and subsequently
> > 	  report failure because done.done is false.
> > 
> > 	- The DMA engines may not be idle, so the asynchronous callback can
> > 	  be invoked after we've started cleaning up, explaining the NULL
> > 	  dereference I'm seeing.
> > 
> > The solutions are either fixing the module exit code to cope with concurrent
> > DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> > return mid-transfer.
> We need to properly abort the channels on removal. This is already handled in
> the code but the kthread_stop is called after the transactions are aborted. It
> should be the other way round. Can you try with below patch

Unfortunately, I can trigger the exact same panic with this patch applied.

Isn't there a race between terminating the dmaengine transfers
(dmaengine_terminate_all) and killing the test threads (kthread_stop) where
a new transfer could be kicked off by dmatest_func?

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: dmatest regression in 3.10-rc1
  2013-05-17 14:18       ` Will Deacon
@ 2013-05-20  7:52         ` Andy Shevchenko
  -1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-20  7:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: Vinod Koul, djbw, linux-kernel, linux-arm-kernel,
	andriy.shevchenko@linux.intel.com, viresh.kumar

On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote: 
> On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > Right, so I think I understand what's causing this, but I'll leave it to
> > > Andriy to suggest a fix.

Thanks for the report.

Unfortunately neither me nor other guys in the team could not reproduce
mentioned issue on our hardware.
Nevertheless I will try to allocate time and suggest possible fix if any
comes to my mind. So, your testing and analysis is appreciated.

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* dmatest regression in 3.10-rc1
@ 2013-05-20  7:52         ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-20  7:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote: 
> On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > Right, so I think I understand what's causing this, but I'll leave it to
> > > Andriy to suggest a fix.

Thanks for the report.

Unfortunately neither me nor other guys in the team could not reproduce
mentioned issue on our hardware.
Nevertheless I will try to allocate time and suggest possible fix if any
comes to my mind. So, your testing and analysis is appreciated.

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: dmatest regression in 3.10-rc1
  2013-05-20  7:52         ` Andy Shevchenko
@ 2013-05-20  9:58           ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-20  9:58 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Vinod Koul, djbw, linux-kernel, linux-arm-kernel,
	andriy.shevchenko@linux.intel.com, viresh.kumar

Hi Andy,

On Mon, May 20, 2013 at 08:52:34AM +0100, Andy Shevchenko wrote:
> On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote: 
> > On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > > Right, so I think I understand what's causing this, but I'll leave it to
> > > > Andriy to suggest a fix.
> 
> Thanks for the report.
> 
> Unfortunately neither me nor other guys in the team could not reproduce
> mentioned issue on our hardware.
> Nevertheless I will try to allocate time and suggest possible fix if any
> comes to my mind. So, your testing and analysis is appreciated.

Sure. They way I trigger it is to unload the module whilst the DMA is
ongoing (I'm using a software model, so I can make the DMA nice and slow --
I guess you could try using some large buffers).

My script (I just run from busybox) is:

  #!/bin/sh

  mount proc -t proc /proc
  mount sysfs -t sysfs /sys
  mount debugfs -t debugfs /sys/kernel/debug

  echo 9 > /proc/sysrq-trigger

  modprobe dmatest
  echo 8 > /sys/kernel/debug/dmatest/iterations
  echo 8 > /sys/kernel/debug/dmatest/threads_per_chan
  echo 131072 > /sys/kernel/debug/dmatest/test_buf_size
  echo 1 > /sys/kernel/debug/dmatest/run
  modprobe -r dmatest

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* dmatest regression in 3.10-rc1
@ 2013-05-20  9:58           ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-20  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Andy,

On Mon, May 20, 2013 at 08:52:34AM +0100, Andy Shevchenko wrote:
> On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote: 
> > On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > > Right, so I think I understand what's causing this, but I'll leave it to
> > > > Andriy to suggest a fix.
> 
> Thanks for the report.
> 
> Unfortunately neither me nor other guys in the team could not reproduce
> mentioned issue on our hardware.
> Nevertheless I will try to allocate time and suggest possible fix if any
> comes to my mind. So, your testing and analysis is appreciated.

Sure. They way I trigger it is to unload the module whilst the DMA is
ongoing (I'm using a software model, so I can make the DMA nice and slow --
I guess you could try using some large buffers).

My script (I just run from busybox) is:

  #!/bin/sh

  mount proc -t proc /proc
  mount sysfs -t sysfs /sys
  mount debugfs -t debugfs /sys/kernel/debug

  echo 9 > /proc/sysrq-trigger

  modprobe dmatest
  echo 8 > /sys/kernel/debug/dmatest/iterations
  echo 8 > /sys/kernel/debug/dmatest/threads_per_chan
  echo 131072 > /sys/kernel/debug/dmatest/test_buf_size
  echo 1 > /sys/kernel/debug/dmatest/run
  modprobe -r dmatest

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: dmatest regression in 3.10-rc1
  2013-05-20  9:58           ` Will Deacon
@ 2013-05-21 12:31             ` Andy Shevchenko
  -1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:31 UTC (permalink / raw)
  To: Will Deacon
  Cc: Vinod Koul, djbw, linux-kernel, linux-arm-kernel,
	andriy.shevchenko@linux.intel.com, viresh.kumar

On Mon, 2013-05-20 at 10:58 +0100, Will Deacon wrote: 
> Sure. They way I trigger it is to unload the module whilst the DMA is
> ongoing (I'm using a software model, so I can make the DMA nice and slow --
> I guess you could try using some large buffers).

Thank you for the script. It looks like I managed to reproduce it with
help of your script. I'm about to send the patch. I will appreciate to
collect a Tested-by tags in case it solves the issue.


-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* dmatest regression in 3.10-rc1
@ 2013-05-21 12:31             ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2013-05-20 at 10:58 +0100, Will Deacon wrote: 
> Sure. They way I trigger it is to unload the module whilst the DMA is
> ongoing (I'm using a software model, so I can make the DMA nice and slow --
> I guess you could try using some large buffers).

Thank you for the script. It looks like I managed to reproduce it with
help of your script. I'm about to send the patch. I will appreciate to
collect a Tested-by tags in case it solves the issue.


-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-16 15:35   ` Will Deacon
@ 2013-05-21 12:33     ` Andy Shevchenko
  -1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:33 UTC (permalink / raw)
  To: Vinod Koul, djbw @ fb . com, linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar,
	Will Deacon
  Cc: Andy Shevchenko

When thread is going to be stopped we have to unconditionally terminate all
ongoing transfers. Otherwise it would be possible that callback function will
be called on the next interrupt and will try to access to already freed
structures.

The patch introduces specific error message for this, though it doesn't
increase the counter of the failed tests.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Will Deacon <will.deacon@arm.com>
---
 drivers/dma/dmatest.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..f61bd55 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -92,6 +92,7 @@ enum dmatest_error_type {
 	DMATEST_ET_MAP_DST,
 	DMATEST_ET_PREP,
 	DMATEST_ET_SUBMIT,
+	DMATEST_ET_ABORT,
 	DMATEST_ET_TIMEOUT,
 	DMATEST_ET_DMA_ERROR,
 	DMATEST_ET_DMA_IN_PROGRESS,
@@ -366,6 +367,7 @@ static char *thread_result_get(const char *name,
 		[DMATEST_ET_MAP_DST]		= "dst mapping error",
 		[DMATEST_ET_PREP]		= "prep error",
 		[DMATEST_ET_SUBMIT]		= "submit error",
+		[DMATEST_ET_ABORT]		= "transfer aborted",
 		[DMATEST_ET_TIMEOUT]		= "test timed out",
 		[DMATEST_ET_DMA_ERROR]		=
 			"got completion callback (DMA_ERROR)",
@@ -720,6 +722,15 @@ static int dmatest_func(void *data)
 					     done.done || kthread_should_stop(),
 					     msecs_to_jiffies(params->timeout));
 
+		/* terminate all transfers on specified channel if needed */
+		if (kthread_should_stop()) {
+			dmaengine_terminate_all(chan);
+			thread_result_add(info, result, DMATEST_ET_ABORT,
+					  total_tests, src_off, dst_off,
+					  len, 0);
+			break;
+		}
+
 		status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
 
 		if (!done.done) {
-- 
1.8.2.rc0.22.gb3600c3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-21 12:33     ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:33 UTC (permalink / raw)
  To: linux-arm-kernel

When thread is going to be stopped we have to unconditionally terminate all
ongoing transfers. Otherwise it would be possible that callback function will
be called on the next interrupt and will try to access to already freed
structures.

The patch introduces specific error message for this, though it doesn't
increase the counter of the failed tests.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Will Deacon <will.deacon@arm.com>
---
 drivers/dma/dmatest.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..f61bd55 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -92,6 +92,7 @@ enum dmatest_error_type {
 	DMATEST_ET_MAP_DST,
 	DMATEST_ET_PREP,
 	DMATEST_ET_SUBMIT,
+	DMATEST_ET_ABORT,
 	DMATEST_ET_TIMEOUT,
 	DMATEST_ET_DMA_ERROR,
 	DMATEST_ET_DMA_IN_PROGRESS,
@@ -366,6 +367,7 @@ static char *thread_result_get(const char *name,
 		[DMATEST_ET_MAP_DST]		= "dst mapping error",
 		[DMATEST_ET_PREP]		= "prep error",
 		[DMATEST_ET_SUBMIT]		= "submit error",
+		[DMATEST_ET_ABORT]		= "transfer aborted",
 		[DMATEST_ET_TIMEOUT]		= "test timed out",
 		[DMATEST_ET_DMA_ERROR]		=
 			"got completion callback (DMA_ERROR)",
@@ -720,6 +722,15 @@ static int dmatest_func(void *data)
 					     done.done || kthread_should_stop(),
 					     msecs_to_jiffies(params->timeout));
 
+		/* terminate all transfers on specified channel if needed */
+		if (kthread_should_stop()) {
+			dmaengine_terminate_all(chan);
+			thread_result_add(info, result, DMATEST_ET_ABORT,
+					  total_tests, src_off, dst_off,
+					  len, 0);
+			break;
+		}
+
 		status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
 
 		if (!done.done) {
-- 
1.8.2.rc0.22.gb3600c3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-21 12:33     ` Andy Shevchenko
@ 2013-05-21 15:11       ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-21 15:11 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Vinod Koul, djbw @ fb . com, linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar

Hi Andy,

On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> When thread is going to be stopped we have to unconditionally terminate all
> ongoing transfers. Otherwise it would be possible that callback function will
> be called on the next interrupt and will try to access to already freed
> structures.
> 
> The patch introduces specific error message for this, though it doesn't
> increase the counter of the failed tests.
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reported-by: Will Deacon <will.deacon@arm.com>

Thanks for persevering with this! Although this patch definitely fixes the
panic I was seeing, I now observe buffer verification failures in subsequent
test runs after an aborted run:

[   43.888424] dma0chan0-copy0: #1: transfer aborted with src_off=0x81f0 dst_off=0x9f4 len=0xdf29 (0)
[   43.888526] dma0chan0-copy0: terminating after 1 tests, 0 failures (status 0)
[   43.888635] dmatest: thread dma0chan0-copy0 exited with status 0

<reload module and start new tests>

[   77.341405] dma0chan0-copy0: #1: errors with src_off=0x8e29 dst_off=0x11dc8 len=0x2b93 (370)
[   77.341505] dma0chan0-copy0: terminating after 1 tests, 1 failures (status 0)

Note that the non-aborted transfers (I have 4 dma controllers) seem fine.

Looking at the results file doesn't show anything obvious, although I do
see some strange entries:

	[...]
	dstbuf not copied! [0xffffffff] Expected ff, got ff
	dstbuf not copied! [0xffffffff] Expected ff, got ff
	dstbuf not copied! [0xffffffff] Expected 00, got 00
	dstbuf not copied! [0xffffffff] Expected 00, got 00
	dstbuf not copied! [0xffffffff] Expected ff, got ff
	[...]

The more alarming ones look like:

	[...]
	dstbuf not copied! [0x147e9] Expected d5, got 36
	dstbuf not copied! [0x147ea] Expected d4, got 35
	dstbuf not copied! [0x147eb] Expected d3, got 34
	dstbuf not copied! [0x147ec] Expected d2, got 33
	dstbuf not copied! [0x147ed] Expected d1, got 32
	dstbuf not copied! [0x147ee] Expected d0, got 31
	[...]

Of course, this could be a driver bug with the PL330 that is being exposed
by this aborting code.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-21 15:11       ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-21 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Andy,

On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> When thread is going to be stopped we have to unconditionally terminate all
> ongoing transfers. Otherwise it would be possible that callback function will
> be called on the next interrupt and will try to access to already freed
> structures.
> 
> The patch introduces specific error message for this, though it doesn't
> increase the counter of the failed tests.
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reported-by: Will Deacon <will.deacon@arm.com>

Thanks for persevering with this! Although this patch definitely fixes the
panic I was seeing, I now observe buffer verification failures in subsequent
test runs after an aborted run:

[   43.888424] dma0chan0-copy0: #1: transfer aborted with src_off=0x81f0 dst_off=0x9f4 len=0xdf29 (0)
[   43.888526] dma0chan0-copy0: terminating after 1 tests, 0 failures (status 0)
[   43.888635] dmatest: thread dma0chan0-copy0 exited with status 0

<reload module and start new tests>

[   77.341405] dma0chan0-copy0: #1: errors with src_off=0x8e29 dst_off=0x11dc8 len=0x2b93 (370)
[   77.341505] dma0chan0-copy0: terminating after 1 tests, 1 failures (status 0)

Note that the non-aborted transfers (I have 4 dma controllers) seem fine.

Looking at the results file doesn't show anything obvious, although I do
see some strange entries:

	[...]
	dstbuf not copied! [0xffffffff] Expected ff, got ff
	dstbuf not copied! [0xffffffff] Expected ff, got ff
	dstbuf not copied! [0xffffffff] Expected 00, got 00
	dstbuf not copied! [0xffffffff] Expected 00, got 00
	dstbuf not copied! [0xffffffff] Expected ff, got ff
	[...]

The more alarming ones look like:

	[...]
	dstbuf not copied! [0x147e9] Expected d5, got 36
	dstbuf not copied! [0x147ea] Expected d4, got 35
	dstbuf not copied! [0x147eb] Expected d3, got 34
	dstbuf not copied! [0x147ec] Expected d2, got 33
	dstbuf not copied! [0x147ed] Expected d1, got 32
	dstbuf not copied! [0x147ee] Expected d0, got 31
	[...]

Of course, this could be a driver bug with the PL330 that is being exposed
by this aborting code.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-21 15:11       ` Will Deacon
@ 2013-05-21 17:24         ` Andy Shevchenko
  -1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 17:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andy Shevchenko, Vinod Koul, djbw @ fb . com,
	linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar

On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Andy,
>
> On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> When thread is going to be stopped we have to unconditionally terminate all
>> ongoing transfers. Otherwise it would be possible that callback function will
>> be called on the next interrupt and will try to access to already freed
>> structures.
>>
>> The patch introduces specific error message for this, though it doesn't
>> increase the counter of the failed tests.
>>
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> Reported-by: Will Deacon <will.deacon@arm.com>
>
> Thanks for persevering with this! Although this patch definitely fixes the
> panic I was seeing, I now observe buffer verification failures in subsequent
> test runs after an aborted run:

I think the description to the commit adfa543e "dmatest: don't use
set_freezable_with_signal()" may shed light on this.

The background (if  I got it correctly) is in race with done flag. So,
we got a callback call from the DMA engine, but we don't know which
transfer triggers it.
I might be wrong. This is just an assumption.

Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
with old dmatest module)

--
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-21 17:24         ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 17:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Andy,
>
> On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> When thread is going to be stopped we have to unconditionally terminate all
>> ongoing transfers. Otherwise it would be possible that callback function will
>> be called on the next interrupt and will try to access to already freed
>> structures.
>>
>> The patch introduces specific error message for this, though it doesn't
>> increase the counter of the failed tests.
>>
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> Reported-by: Will Deacon <will.deacon@arm.com>
>
> Thanks for persevering with this! Although this patch definitely fixes the
> panic I was seeing, I now observe buffer verification failures in subsequent
> test runs after an aborted run:

I think the description to the commit adfa543e "dmatest: don't use
set_freezable_with_signal()" may shed light on this.

The background (if  I got it correctly) is in race with done flag. So,
we got a callback call from the DMA engine, but we don't know which
transfer triggers it.
I might be wrong. This is just an assumption.

Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
with old dmatest module)

--
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-21 17:24         ` Andy Shevchenko
@ 2013-05-22 12:41           ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-22 12:41 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andy Shevchenko, Vinod Koul, djbw @ fb . com,
	linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar

On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
> 
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
> 
> The background (if  I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.

I've not managed to work out exactly what's going on, but it's certainly
something like that. In fact, I just managed to trigger a case where all but
one of the transfers is aborted, whilst the remaining one fails. Looking at
the code, I can't see how that situation comes about, since the threads are
protected with the info mutex and kthread_stop is synchronous.

> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)

No, dmatest from 3.9 is completely reliable in my experience.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-22 12:41           ` Will Deacon
  0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-22 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
> 
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
> 
> The background (if  I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.

I've not managed to work out exactly what's going on, but it's certainly
something like that. In fact, I just managed to trigger a case where all but
one of the transfers is aborted, whilst the remaining one fails. Looking at
the code, I can't see how that situation comes about, since the threads are
protected with the info mutex and kthread_stop is synchronous.

> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)

No, dmatest from 3.9 is completely reliable in my experience.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-22 12:41           ` Will Deacon
@ 2013-05-22 13:26             ` Andy Shevchenko
  -1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-22 13:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andy Shevchenko, Vinod Koul, djbw @ fb . com,
	linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar

On Wed, May 22, 2013 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if  I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>
> I've not managed to work out exactly what's going on, but it's certainly
> something like that. In fact, I just managed to trigger a case where all but
> one of the transfers is aborted, whilst the remaining one fails. Looking at
> the code, I can't see how that situation comes about, since the threads are
> protected with the info mutex and kthread_stop is synchronous.

>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
>
> No, dmatest from 3.9 is completely reliable in my experience.

Yeah, I supposed that was a rhetorical question.

So, have I understood correctly that if you revert the 77101ce5
("cancel thread ...")
everything is working fine / as before?

--
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-22 13:26             ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-22 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 22, 2013 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if  I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>
> I've not managed to work out exactly what's going on, but it's certainly
> something like that. In fact, I just managed to trigger a case where all but
> one of the transfers is aborted, whilst the remaining one fails. Looking at
> the code, I can't see how that situation comes about, since the threads are
> protected with the info mutex and kthread_stop is synchronous.

>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
>
> No, dmatest from 3.9 is completely reliable in my experience.

Yeah, I supposed that was a rhetorical question.

So, have I understood correctly that if you revert the 77101ce5
("cancel thread ...")
everything is working fine / as before?

--
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-21 17:24         ` Andy Shevchenko
@ 2013-05-23 10:09           ` Vinod Koul
  -1 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:09 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Will Deacon, Andy Shevchenko, djbw @ fb . com,
	linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar

On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
> 
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
> 
> The background (if  I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.
> 
> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)
Terminate shouldnt cause the issue with buffer verfication, can you try this on
dw_dmac, do you see similar failures on verfication?

--
~Vinod

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-23 10:09           ` Vinod Koul
  0 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
> 
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
> 
> The background (if  I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.
> 
> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)
Terminate shouldnt cause the issue with buffer verfication, can you try this on
dw_dmac, do you see similar failures on verfication?

--
~Vinod

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-23 10:51             ` Andy Shevchenko
@ 2013-05-23 10:22               ` Vinod Koul
  -1 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:22 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Will Deacon, Andy Shevchenko, djbw @ fb . com,
	linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar

On Thu, May 23, 2013 at 01:51:29PM +0300, Andy Shevchenko wrote:
> On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> > On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> >> When thread is going to be stopped we have to unconditionally terminate all
> >> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> >> be called on the next interrupt and will try to access to already freed
> >> >> structures.
> >> >>
> >> >> The patch introduces specific error message for this, though it doesn't
> >> >> increase the counter of the failed tests.
> >> >>
> >> > Thanks for persevering with this! Although this patch definitely fixes the
> >> > panic I was seeing, I now observe buffer verification failures in subsequent
> >> > test runs after an aborted run:
> >>
> >> I think the description to the commit adfa543e "dmatest: don't use
> >> set_freezable_with_signal()" may shed light on this.
> >>
> >> The background (if  I got it correctly) is in race with done flag. So,
> >> we got a callback call from the DMA engine, but we don't know which
> >> transfer triggers it.
> >> I might be wrong. This is just an assumption.
> >>
> >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> >> with old dmatest module)
> > Terminate shouldnt cause the issue with buffer verfication, can you try this on
> > dw_dmac, do you see similar failures on verfication?
> 
> I saw the similar errors on dw_dmac on Intel Medfield device.
Ah, so may not be a driver issue.

> Anyway, I checked another approach with Will.
> For now I will send a quick fix that prevents tester to abort an
> ongoing transfer.
okay so taht should prevent regression, let me check on my setup if I find
anything

--
~Vinod
> In future we could implement a robust logic when transfers can be
> interrupted at any time.
> 
> -- 
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-23 10:22               ` Vinod Koul
  0 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 23, 2013 at 01:51:29PM +0300, Andy Shevchenko wrote:
> On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> > On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> >> When thread is going to be stopped we have to unconditionally terminate all
> >> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> >> be called on the next interrupt and will try to access to already freed
> >> >> structures.
> >> >>
> >> >> The patch introduces specific error message for this, though it doesn't
> >> >> increase the counter of the failed tests.
> >> >>
> >> > Thanks for persevering with this! Although this patch definitely fixes the
> >> > panic I was seeing, I now observe buffer verification failures in subsequent
> >> > test runs after an aborted run:
> >>
> >> I think the description to the commit adfa543e "dmatest: don't use
> >> set_freezable_with_signal()" may shed light on this.
> >>
> >> The background (if  I got it correctly) is in race with done flag. So,
> >> we got a callback call from the DMA engine, but we don't know which
> >> transfer triggers it.
> >> I might be wrong. This is just an assumption.
> >>
> >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> >> with old dmatest module)
> > Terminate shouldnt cause the issue with buffer verfication, can you try this on
> > dw_dmac, do you see similar failures on verfication?
> 
> I saw the similar errors on dw_dmac on Intel Medfield device.
Ah, so may not be a driver issue.

> Anyway, I checked another approach with Will.
> For now I will send a quick fix that prevents tester to abort an
> ongoing transfer.
okay so taht should prevent regression, let me check on my setup if I find
anything

--
~Vinod
> In future we could implement a robust logic when transfers can be
> interrupted at any time.
> 
> -- 
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] dmatest: abort transfers immediately when asked for
  2013-05-23 10:09           ` Vinod Koul
@ 2013-05-23 10:51             ` Andy Shevchenko
  -1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-23 10:51 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Will Deacon, Andy Shevchenko, djbw @ fb . com,
	linux-kernel @ vger . kernel . org,
	linux-arm-kernel @ lists . infradead . org, viresh.kumar

On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> >> When thread is going to be stopped we have to unconditionally terminate all
>> >> ongoing transfers. Otherwise it would be possible that callback function will
>> >> be called on the next interrupt and will try to access to already freed
>> >> structures.
>> >>
>> >> The patch introduces specific error message for this, though it doesn't
>> >> increase the counter of the failed tests.
>> >>
>> > Thanks for persevering with this! Although this patch definitely fixes the
>> > panic I was seeing, I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if  I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>>
>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
> Terminate shouldnt cause the issue with buffer verfication, can you try this on
> dw_dmac, do you see similar failures on verfication?

I saw the similar errors on dw_dmac on Intel Medfield device.
Anyway, I checked another approach with Will.
For now I will send a quick fix that prevents tester to abort an
ongoing transfer.
In future we could implement a robust logic when transfers can be
interrupted at any time.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-23 10:51             ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-23 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> >> When thread is going to be stopped we have to unconditionally terminate all
>> >> ongoing transfers. Otherwise it would be possible that callback function will
>> >> be called on the next interrupt and will try to access to already freed
>> >> structures.
>> >>
>> >> The patch introduces specific error message for this, though it doesn't
>> >> increase the counter of the failed tests.
>> >>
>> > Thanks for persevering with this! Although this patch definitely fixes the
>> > panic I was seeing, I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if  I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>>
>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
> Terminate shouldnt cause the issue with buffer verfication, can you try this on
> dw_dmac, do you see similar failures on verfication?

I saw the similar errors on dw_dmac on Intel Medfield device.
Anyway, I checked another approach with Will.
For now I will send a quick fix that prevents tester to abort an
ongoing transfer.
In future we could implement a robust logic when transfers can be
interrupted at any time.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-05-23 10:59 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-15 15:28 dmatest regression in 3.10-rc1 Will Deacon
2013-05-15 15:28 ` Will Deacon
2013-05-16 15:35 ` Will Deacon
2013-05-16 15:35   ` Will Deacon
2013-05-17 12:34   ` Vinod Koul
2013-05-17 12:34     ` Vinod Koul
2013-05-17 14:18     ` Will Deacon
2013-05-17 14:18       ` Will Deacon
2013-05-20  7:52       ` Andy Shevchenko
2013-05-20  7:52         ` Andy Shevchenko
2013-05-20  9:58         ` Will Deacon
2013-05-20  9:58           ` Will Deacon
2013-05-21 12:31           ` Andy Shevchenko
2013-05-21 12:31             ` Andy Shevchenko
2013-05-21 12:33   ` [PATCH] dmatest: abort transfers immediately when asked for Andy Shevchenko
2013-05-21 12:33     ` Andy Shevchenko
2013-05-21 15:11     ` Will Deacon
2013-05-21 15:11       ` Will Deacon
2013-05-21 17:24       ` Andy Shevchenko
2013-05-21 17:24         ` Andy Shevchenko
2013-05-22 12:41         ` Will Deacon
2013-05-22 12:41           ` Will Deacon
2013-05-22 13:26           ` Andy Shevchenko
2013-05-22 13:26             ` Andy Shevchenko
2013-05-23 10:09         ` Vinod Koul
2013-05-23 10:09           ` Vinod Koul
2013-05-23 10:51           ` Andy Shevchenko
2013-05-23 10:51             ` Andy Shevchenko
2013-05-23 10:22             ` Vinod Koul
2013-05-23 10:22               ` Vinod Koul

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.