* dmatest regression in 3.10-rc1
@ 2013-05-16 15:35 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-16 15:35 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, May 15, 2013 at 04:28:03PM +0100, Will Deacon wrote:
> I've been observing a regression in the dmatest module with 3.10-rc1. It
> manifests as either:
>
> - a spurious timeout on one or more of the channel threads
> - a complete kernel lockup (loss of console)
> - a panic (see below, noting that the callback [dmatest_callback] is
> dereferencing a NULL pointer)
>
> If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
> for") then things are rosy again, but I'm not sure if this is hiding another
> problem.
Right, so I think I understand what's causing this, but I'll leave it to
Andriy to suggest a fix. The problem comes about because the dmatest
module is now driven from debugfs, making it possible to unload the module
whilst a test run is in progress. In this case:
- The DMA threads will return from wait_event_freezable_timeout(...)
due to kthread_should_stop() returning true, and subsequently
report failure because done.done is false.
- The DMA engines may not be idle, so the asynchronous callback can
be invoked after we've started cleaning up, explaining the NULL
dereference I'm seeing.
The solutions are either fixing the module exit code to cope with concurrent
DMA transfers or to revert 77101ce578bb and not allow the channel threads to
return mid-transfer.
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: dmatest regression in 3.10-rc1
2013-05-16 15:35 ` Will Deacon
@ 2013-05-17 12:34 ` Vinod Koul
-1 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-17 12:34 UTC (permalink / raw)
To: Will Deacon
Cc: djbw, linux-kernel, linux-arm-kernel,
andriy.shevchenko@linux.intel.com, viresh.kumar
On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> On Wed, May 15, 2013 at 04:28:03PM +0100, Will Deacon wrote:
> > I've been observing a regression in the dmatest module with 3.10-rc1. It
> > manifests as either:
> >
> > - a spurious timeout on one or more of the channel threads
> > - a complete kernel lockup (loss of console)
> > - a panic (see below, noting that the callback [dmatest_callback] is
> > dereferencing a NULL pointer)
> >
> > If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
> > for") then things are rosy again, but I'm not sure if this is hiding another
> > problem.
>
> Right, so I think I understand what's causing this, but I'll leave it to
> Andriy to suggest a fix. The problem comes about because the dmatest
> module is now driven from debugfs, making it possible to unload the module
> whilst a test run is in progress. In this case:
>
> - The DMA threads will return from wait_event_freezable_timeout(...)
> due to kthread_should_stop() returning true, and subsequently
> report failure because done.done is false.
>
> - The DMA engines may not be idle, so the asynchronous callback can
> be invoked after we've started cleaning up, explaining the NULL
> dereference I'm seeing.
>
> The solutions are either fixing the module exit code to cope with concurrent
> DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> return mid-transfer.
We need to properly abort the channels on removal. This is already handled in
the code but the kthread_stop is called after the transactions are aborted. It
should be the other way round. Can you try with below patch
---
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..4e8d581 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -822,6 +822,9 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
struct dmatest_thread *_thread;
int ret;
+ /* terminate all transfers on specified channels */
+ dmaengine_terminate_all(dtc->chan);
+
list_for_each_entry_safe(thread, _thread, &dtc->threads, node) {
ret = kthread_stop(thread->task);
pr_debug("dmatest: thread %s exited with status %d\n",
@@ -830,9 +833,6 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
kfree(thread);
}
- /* terminate all transfers on specified channels */
- dmaengine_terminate_all(dtc->chan);
-
kfree(dtc);
}
--
~Vinod
^ permalink raw reply related [flat|nested] 30+ messages in thread
* dmatest regression in 3.10-rc1
@ 2013-05-17 12:34 ` Vinod Koul
0 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-17 12:34 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> On Wed, May 15, 2013 at 04:28:03PM +0100, Will Deacon wrote:
> > I've been observing a regression in the dmatest module with 3.10-rc1. It
> > manifests as either:
> >
> > - a spurious timeout on one or more of the channel threads
> > - a complete kernel lockup (loss of console)
> > - a panic (see below, noting that the callback [dmatest_callback] is
> > dereferencing a NULL pointer)
> >
> > If I revert 77101ce578bb ("dmatest: cancel thread immediately when asked
> > for") then things are rosy again, but I'm not sure if this is hiding another
> > problem.
>
> Right, so I think I understand what's causing this, but I'll leave it to
> Andriy to suggest a fix. The problem comes about because the dmatest
> module is now driven from debugfs, making it possible to unload the module
> whilst a test run is in progress. In this case:
>
> - The DMA threads will return from wait_event_freezable_timeout(...)
> due to kthread_should_stop() returning true, and subsequently
> report failure because done.done is false.
>
> - The DMA engines may not be idle, so the asynchronous callback can
> be invoked after we've started cleaning up, explaining the NULL
> dereference I'm seeing.
>
> The solutions are either fixing the module exit code to cope with concurrent
> DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> return mid-transfer.
We need to properly abort the channels on removal. This is already handled in
the code but the kthread_stop is called after the transactions are aborted. It
should be the other way round. Can you try with below patch
---
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..4e8d581 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -822,6 +822,9 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
struct dmatest_thread *_thread;
int ret;
+ /* terminate all transfers on specified channels */
+ dmaengine_terminate_all(dtc->chan);
+
list_for_each_entry_safe(thread, _thread, &dtc->threads, node) {
ret = kthread_stop(thread->task);
pr_debug("dmatest: thread %s exited with status %d\n",
@@ -830,9 +833,6 @@ static void dmatest_cleanup_channel(struct dmatest_chan *dtc)
kfree(thread);
}
- /* terminate all transfers on specified channels */
- dmaengine_terminate_all(dtc->chan);
-
kfree(dtc);
}
--
~Vinod
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: dmatest regression in 3.10-rc1
2013-05-17 12:34 ` Vinod Koul
@ 2013-05-17 14:18 ` Will Deacon
-1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-17 14:18 UTC (permalink / raw)
To: Vinod Koul
Cc: djbw, linux-kernel, linux-arm-kernel,
andriy.shevchenko@linux.intel.com, viresh.kumar
Hi Vinod,
Thanks for the reply.
On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > Right, so I think I understand what's causing this, but I'll leave it to
> > Andriy to suggest a fix. The problem comes about because the dmatest
> > module is now driven from debugfs, making it possible to unload the module
> > whilst a test run is in progress. In this case:
> >
> > - The DMA threads will return from wait_event_freezable_timeout(...)
> > due to kthread_should_stop() returning true, and subsequently
> > report failure because done.done is false.
> >
> > - The DMA engines may not be idle, so the asynchronous callback can
> > be invoked after we've started cleaning up, explaining the NULL
> > dereference I'm seeing.
> >
> > The solutions are either fixing the module exit code to cope with concurrent
> > DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> > return mid-transfer.
> We need to properly abort the channels on removal. This is already handled in
> the code but the kthread_stop is called after the transactions are aborted. It
> should be the other way round. Can you try with below patch
Unfortunately, I can trigger the exact same panic with this patch applied.
Isn't there a race between terminating the dmaengine transfers
(dmaengine_terminate_all) and killing the test threads (kthread_stop) where
a new transfer could be kicked off by dmatest_func?
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* dmatest regression in 3.10-rc1
@ 2013-05-17 14:18 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-17 14:18 UTC (permalink / raw)
To: linux-arm-kernel
Hi Vinod,
Thanks for the reply.
On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > Right, so I think I understand what's causing this, but I'll leave it to
> > Andriy to suggest a fix. The problem comes about because the dmatest
> > module is now driven from debugfs, making it possible to unload the module
> > whilst a test run is in progress. In this case:
> >
> > - The DMA threads will return from wait_event_freezable_timeout(...)
> > due to kthread_should_stop() returning true, and subsequently
> > report failure because done.done is false.
> >
> > - The DMA engines may not be idle, so the asynchronous callback can
> > be invoked after we've started cleaning up, explaining the NULL
> > dereference I'm seeing.
> >
> > The solutions are either fixing the module exit code to cope with concurrent
> > DMA transfers or to revert 77101ce578bb and not allow the channel threads to
> > return mid-transfer.
> We need to properly abort the channels on removal. This is already handled in
> the code but the kthread_stop is called after the transactions are aborted. It
> should be the other way round. Can you try with below patch
Unfortunately, I can trigger the exact same panic with this patch applied.
Isn't there a race between terminating the dmaengine transfers
(dmaengine_terminate_all) and killing the test threads (kthread_stop) where
a new transfer could be kicked off by dmatest_func?
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: dmatest regression in 3.10-rc1
2013-05-17 14:18 ` Will Deacon
@ 2013-05-20 7:52 ` Andy Shevchenko
-1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-20 7:52 UTC (permalink / raw)
To: Will Deacon
Cc: Vinod Koul, djbw, linux-kernel, linux-arm-kernel,
andriy.shevchenko@linux.intel.com, viresh.kumar
On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote:
> On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > Right, so I think I understand what's causing this, but I'll leave it to
> > > Andriy to suggest a fix.
Thanks for the report.
Unfortunately neither me nor other guys in the team could not reproduce
mentioned issue on our hardware.
Nevertheless I will try to allocate time and suggest possible fix if any
comes to my mind. So, your testing and analysis is appreciated.
--
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
^ permalink raw reply [flat|nested] 30+ messages in thread
* dmatest regression in 3.10-rc1
@ 2013-05-20 7:52 ` Andy Shevchenko
0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-20 7:52 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote:
> On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > Right, so I think I understand what's causing this, but I'll leave it to
> > > Andriy to suggest a fix.
Thanks for the report.
Unfortunately neither me nor other guys in the team could not reproduce
mentioned issue on our hardware.
Nevertheless I will try to allocate time and suggest possible fix if any
comes to my mind. So, your testing and analysis is appreciated.
--
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: dmatest regression in 3.10-rc1
2013-05-20 7:52 ` Andy Shevchenko
@ 2013-05-20 9:58 ` Will Deacon
-1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-20 9:58 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Vinod Koul, djbw, linux-kernel, linux-arm-kernel,
andriy.shevchenko@linux.intel.com, viresh.kumar
Hi Andy,
On Mon, May 20, 2013 at 08:52:34AM +0100, Andy Shevchenko wrote:
> On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote:
> > On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > > Right, so I think I understand what's causing this, but I'll leave it to
> > > > Andriy to suggest a fix.
>
> Thanks for the report.
>
> Unfortunately neither me nor other guys in the team could not reproduce
> mentioned issue on our hardware.
> Nevertheless I will try to allocate time and suggest possible fix if any
> comes to my mind. So, your testing and analysis is appreciated.
Sure. They way I trigger it is to unload the module whilst the DMA is
ongoing (I'm using a software model, so I can make the DMA nice and slow --
I guess you could try using some large buffers).
My script (I just run from busybox) is:
#!/bin/sh
mount proc -t proc /proc
mount sysfs -t sysfs /sys
mount debugfs -t debugfs /sys/kernel/debug
echo 9 > /proc/sysrq-trigger
modprobe dmatest
echo 8 > /sys/kernel/debug/dmatest/iterations
echo 8 > /sys/kernel/debug/dmatest/threads_per_chan
echo 131072 > /sys/kernel/debug/dmatest/test_buf_size
echo 1 > /sys/kernel/debug/dmatest/run
modprobe -r dmatest
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* dmatest regression in 3.10-rc1
@ 2013-05-20 9:58 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-20 9:58 UTC (permalink / raw)
To: linux-arm-kernel
Hi Andy,
On Mon, May 20, 2013 at 08:52:34AM +0100, Andy Shevchenko wrote:
> On Fri, 2013-05-17 at 15:18 +0100, Will Deacon wrote:
> > On Fri, May 17, 2013 at 01:34:23PM +0100, Vinod Koul wrote:
> > > On Thu, May 16, 2013 at 04:35:53PM +0100, Will Deacon wrote:
> > > > Right, so I think I understand what's causing this, but I'll leave it to
> > > > Andriy to suggest a fix.
>
> Thanks for the report.
>
> Unfortunately neither me nor other guys in the team could not reproduce
> mentioned issue on our hardware.
> Nevertheless I will try to allocate time and suggest possible fix if any
> comes to my mind. So, your testing and analysis is appreciated.
Sure. They way I trigger it is to unload the module whilst the DMA is
ongoing (I'm using a software model, so I can make the DMA nice and slow --
I guess you could try using some large buffers).
My script (I just run from busybox) is:
#!/bin/sh
mount proc -t proc /proc
mount sysfs -t sysfs /sys
mount debugfs -t debugfs /sys/kernel/debug
echo 9 > /proc/sysrq-trigger
modprobe dmatest
echo 8 > /sys/kernel/debug/dmatest/iterations
echo 8 > /sys/kernel/debug/dmatest/threads_per_chan
echo 131072 > /sys/kernel/debug/dmatest/test_buf_size
echo 1 > /sys/kernel/debug/dmatest/run
modprobe -r dmatest
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: dmatest regression in 3.10-rc1
2013-05-20 9:58 ` Will Deacon
@ 2013-05-21 12:31 ` Andy Shevchenko
-1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:31 UTC (permalink / raw)
To: Will Deacon
Cc: Vinod Koul, djbw, linux-kernel, linux-arm-kernel,
andriy.shevchenko@linux.intel.com, viresh.kumar
On Mon, 2013-05-20 at 10:58 +0100, Will Deacon wrote:
> Sure. They way I trigger it is to unload the module whilst the DMA is
> ongoing (I'm using a software model, so I can make the DMA nice and slow --
> I guess you could try using some large buffers).
Thank you for the script. It looks like I managed to reproduce it with
help of your script. I'm about to send the patch. I will appreciate to
collect a Tested-by tags in case it solves the issue.
--
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
^ permalink raw reply [flat|nested] 30+ messages in thread
* dmatest regression in 3.10-rc1
@ 2013-05-21 12:31 ` Andy Shevchenko
0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:31 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2013-05-20 at 10:58 +0100, Will Deacon wrote:
> Sure. They way I trigger it is to unload the module whilst the DMA is
> ongoing (I'm using a software model, so I can make the DMA nice and slow --
> I guess you could try using some large buffers).
Thank you for the script. It looks like I managed to reproduce it with
help of your script. I'm about to send the patch. I will appreciate to
collect a Tested-by tags in case it solves the issue.
--
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
2013-05-16 15:35 ` Will Deacon
@ 2013-05-21 12:33 ` Andy Shevchenko
-1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:33 UTC (permalink / raw)
To: Vinod Koul, djbw @ fb . com, linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar,
Will Deacon
Cc: Andy Shevchenko
When thread is going to be stopped we have to unconditionally terminate all
ongoing transfers. Otherwise it would be possible that callback function will
be called on the next interrupt and will try to access to already freed
structures.
The patch introduces specific error message for this, though it doesn't
increase the counter of the failed tests.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Will Deacon <will.deacon@arm.com>
---
drivers/dma/dmatest.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..f61bd55 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -92,6 +92,7 @@ enum dmatest_error_type {
DMATEST_ET_MAP_DST,
DMATEST_ET_PREP,
DMATEST_ET_SUBMIT,
+ DMATEST_ET_ABORT,
DMATEST_ET_TIMEOUT,
DMATEST_ET_DMA_ERROR,
DMATEST_ET_DMA_IN_PROGRESS,
@@ -366,6 +367,7 @@ static char *thread_result_get(const char *name,
[DMATEST_ET_MAP_DST] = "dst mapping error",
[DMATEST_ET_PREP] = "prep error",
[DMATEST_ET_SUBMIT] = "submit error",
+ [DMATEST_ET_ABORT] = "transfer aborted",
[DMATEST_ET_TIMEOUT] = "test timed out",
[DMATEST_ET_DMA_ERROR] =
"got completion callback (DMA_ERROR)",
@@ -720,6 +722,15 @@ static int dmatest_func(void *data)
done.done || kthread_should_stop(),
msecs_to_jiffies(params->timeout));
+ /* terminate all transfers on specified channel if needed */
+ if (kthread_should_stop()) {
+ dmaengine_terminate_all(chan);
+ thread_result_add(info, result, DMATEST_ET_ABORT,
+ total_tests, src_off, dst_off,
+ len, 0);
+ break;
+ }
+
status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
if (!done.done) {
--
1.8.2.rc0.22.gb3600c3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-21 12:33 ` Andy Shevchenko
0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 12:33 UTC (permalink / raw)
To: linux-arm-kernel
When thread is going to be stopped we have to unconditionally terminate all
ongoing transfers. Otherwise it would be possible that callback function will
be called on the next interrupt and will try to access to already freed
structures.
The patch introduces specific error message for this, though it doesn't
increase the counter of the failed tests.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Will Deacon <will.deacon@arm.com>
---
drivers/dma/dmatest.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index d8ce4ec..f61bd55 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -92,6 +92,7 @@ enum dmatest_error_type {
DMATEST_ET_MAP_DST,
DMATEST_ET_PREP,
DMATEST_ET_SUBMIT,
+ DMATEST_ET_ABORT,
DMATEST_ET_TIMEOUT,
DMATEST_ET_DMA_ERROR,
DMATEST_ET_DMA_IN_PROGRESS,
@@ -366,6 +367,7 @@ static char *thread_result_get(const char *name,
[DMATEST_ET_MAP_DST] = "dst mapping error",
[DMATEST_ET_PREP] = "prep error",
[DMATEST_ET_SUBMIT] = "submit error",
+ [DMATEST_ET_ABORT] = "transfer aborted",
[DMATEST_ET_TIMEOUT] = "test timed out",
[DMATEST_ET_DMA_ERROR] =
"got completion callback (DMA_ERROR)",
@@ -720,6 +722,15 @@ static int dmatest_func(void *data)
done.done || kthread_should_stop(),
msecs_to_jiffies(params->timeout));
+ /* terminate all transfers on specified channel if needed */
+ if (kthread_should_stop()) {
+ dmaengine_terminate_all(chan);
+ thread_result_add(info, result, DMATEST_ET_ABORT,
+ total_tests, src_off, dst_off,
+ len, 0);
+ break;
+ }
+
status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
if (!done.done) {
--
1.8.2.rc0.22.gb3600c3
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH] dmatest: abort transfers immediately when asked for
2013-05-21 12:33 ` Andy Shevchenko
@ 2013-05-21 15:11 ` Will Deacon
-1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-21 15:11 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Vinod Koul, djbw @ fb . com, linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar
Hi Andy,
On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> When thread is going to be stopped we have to unconditionally terminate all
> ongoing transfers. Otherwise it would be possible that callback function will
> be called on the next interrupt and will try to access to already freed
> structures.
>
> The patch introduces specific error message for this, though it doesn't
> increase the counter of the failed tests.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reported-by: Will Deacon <will.deacon@arm.com>
Thanks for persevering with this! Although this patch definitely fixes the
panic I was seeing, I now observe buffer verification failures in subsequent
test runs after an aborted run:
[ 43.888424] dma0chan0-copy0: #1: transfer aborted with src_off=0x81f0 dst_off=0x9f4 len=0xdf29 (0)
[ 43.888526] dma0chan0-copy0: terminating after 1 tests, 0 failures (status 0)
[ 43.888635] dmatest: thread dma0chan0-copy0 exited with status 0
<reload module and start new tests>
[ 77.341405] dma0chan0-copy0: #1: errors with src_off=0x8e29 dst_off=0x11dc8 len=0x2b93 (370)
[ 77.341505] dma0chan0-copy0: terminating after 1 tests, 1 failures (status 0)
Note that the non-aborted transfers (I have 4 dma controllers) seem fine.
Looking at the results file doesn't show anything obvious, although I do
see some strange entries:
[...]
dstbuf not copied! [0xffffffff] Expected ff, got ff
dstbuf not copied! [0xffffffff] Expected ff, got ff
dstbuf not copied! [0xffffffff] Expected 00, got 00
dstbuf not copied! [0xffffffff] Expected 00, got 00
dstbuf not copied! [0xffffffff] Expected ff, got ff
[...]
The more alarming ones look like:
[...]
dstbuf not copied! [0x147e9] Expected d5, got 36
dstbuf not copied! [0x147ea] Expected d4, got 35
dstbuf not copied! [0x147eb] Expected d3, got 34
dstbuf not copied! [0x147ec] Expected d2, got 33
dstbuf not copied! [0x147ed] Expected d1, got 32
dstbuf not copied! [0x147ee] Expected d0, got 31
[...]
Of course, this could be a driver bug with the PL330 that is being exposed
by this aborting code.
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-21 15:11 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-21 15:11 UTC (permalink / raw)
To: linux-arm-kernel
Hi Andy,
On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> When thread is going to be stopped we have to unconditionally terminate all
> ongoing transfers. Otherwise it would be possible that callback function will
> be called on the next interrupt and will try to access to already freed
> structures.
>
> The patch introduces specific error message for this, though it doesn't
> increase the counter of the failed tests.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reported-by: Will Deacon <will.deacon@arm.com>
Thanks for persevering with this! Although this patch definitely fixes the
panic I was seeing, I now observe buffer verification failures in subsequent
test runs after an aborted run:
[ 43.888424] dma0chan0-copy0: #1: transfer aborted with src_off=0x81f0 dst_off=0x9f4 len=0xdf29 (0)
[ 43.888526] dma0chan0-copy0: terminating after 1 tests, 0 failures (status 0)
[ 43.888635] dmatest: thread dma0chan0-copy0 exited with status 0
<reload module and start new tests>
[ 77.341405] dma0chan0-copy0: #1: errors with src_off=0x8e29 dst_off=0x11dc8 len=0x2b93 (370)
[ 77.341505] dma0chan0-copy0: terminating after 1 tests, 1 failures (status 0)
Note that the non-aborted transfers (I have 4 dma controllers) seem fine.
Looking at the results file doesn't show anything obvious, although I do
see some strange entries:
[...]
dstbuf not copied! [0xffffffff] Expected ff, got ff
dstbuf not copied! [0xffffffff] Expected ff, got ff
dstbuf not copied! [0xffffffff] Expected 00, got 00
dstbuf not copied! [0xffffffff] Expected 00, got 00
dstbuf not copied! [0xffffffff] Expected ff, got ff
[...]
The more alarming ones look like:
[...]
dstbuf not copied! [0x147e9] Expected d5, got 36
dstbuf not copied! [0x147ea] Expected d4, got 35
dstbuf not copied! [0x147eb] Expected d3, got 34
dstbuf not copied! [0x147ec] Expected d2, got 33
dstbuf not copied! [0x147ed] Expected d1, got 32
dstbuf not copied! [0x147ee] Expected d0, got 31
[...]
Of course, this could be a driver bug with the PL330 that is being exposed
by this aborting code.
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] dmatest: abort transfers immediately when asked for
2013-05-21 15:11 ` Will Deacon
@ 2013-05-21 17:24 ` Andy Shevchenko
-1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 17:24 UTC (permalink / raw)
To: Will Deacon
Cc: Andy Shevchenko, Vinod Koul, djbw @ fb . com,
linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar
On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Andy,
>
> On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> When thread is going to be stopped we have to unconditionally terminate all
>> ongoing transfers. Otherwise it would be possible that callback function will
>> be called on the next interrupt and will try to access to already freed
>> structures.
>>
>> The patch introduces specific error message for this, though it doesn't
>> increase the counter of the failed tests.
>>
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> Reported-by: Will Deacon <will.deacon@arm.com>
>
> Thanks for persevering with this! Although this patch definitely fixes the
> panic I was seeing, I now observe buffer verification failures in subsequent
> test runs after an aborted run:
I think the description to the commit adfa543e "dmatest: don't use
set_freezable_with_signal()" may shed light on this.
The background (if I got it correctly) is in race with done flag. So,
we got a callback call from the DMA engine, but we don't know which
transfer triggers it.
I might be wrong. This is just an assumption.
Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
with old dmatest module)
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-21 17:24 ` Andy Shevchenko
0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-21 17:24 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Andy,
>
> On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> When thread is going to be stopped we have to unconditionally terminate all
>> ongoing transfers. Otherwise it would be possible that callback function will
>> be called on the next interrupt and will try to access to already freed
>> structures.
>>
>> The patch introduces specific error message for this, though it doesn't
>> increase the counter of the failed tests.
>>
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> Reported-by: Will Deacon <will.deacon@arm.com>
>
> Thanks for persevering with this! Although this patch definitely fixes the
> panic I was seeing, I now observe buffer verification failures in subsequent
> test runs after an aborted run:
I think the description to the commit adfa543e "dmatest: don't use
set_freezable_with_signal()" may shed light on this.
The background (if I got it correctly) is in race with done flag. So,
we got a callback call from the DMA engine, but we don't know which
transfer triggers it.
I might be wrong. This is just an assumption.
Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
with old dmatest module)
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] dmatest: abort transfers immediately when asked for
2013-05-21 17:24 ` Andy Shevchenko
@ 2013-05-22 12:41 ` Will Deacon
-1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-22 12:41 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Andy Shevchenko, Vinod Koul, djbw @ fb . com,
linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar
On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
>
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
>
> The background (if I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.
I've not managed to work out exactly what's going on, but it's certainly
something like that. In fact, I just managed to trigger a case where all but
one of the transfers is aborted, whilst the remaining one fails. Looking at
the code, I can't see how that situation comes about, since the threads are
protected with the info mutex and kthread_stop is synchronous.
> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)
No, dmatest from 3.9 is completely reliable in my experience.
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-22 12:41 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2013-05-22 12:41 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
>
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
>
> The background (if I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.
I've not managed to work out exactly what's going on, but it's certainly
something like that. In fact, I just managed to trigger a case where all but
one of the transfers is aborted, whilst the remaining one fails. Looking at
the code, I can't see how that situation comes about, since the threads are
protected with the info mutex and kthread_stop is synchronous.
> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)
No, dmatest from 3.9 is completely reliable in my experience.
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] dmatest: abort transfers immediately when asked for
2013-05-22 12:41 ` Will Deacon
@ 2013-05-22 13:26 ` Andy Shevchenko
-1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-22 13:26 UTC (permalink / raw)
To: Will Deacon
Cc: Andy Shevchenko, Vinod Koul, djbw @ fb . com,
linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar
On Wed, May 22, 2013 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>
> I've not managed to work out exactly what's going on, but it's certainly
> something like that. In fact, I just managed to trigger a case where all but
> one of the transfers is aborted, whilst the remaining one fails. Looking at
> the code, I can't see how that situation comes about, since the threads are
> protected with the info mutex and kthread_stop is synchronous.
>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
>
> No, dmatest from 3.9 is completely reliable in my experience.
Yeah, I supposed that was a rhetorical question.
So, have I understood correctly that if you revert the 77101ce5
("cancel thread ...")
everything is working fine / as before?
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-22 13:26 ` Andy Shevchenko
0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-22 13:26 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, May 22, 2013 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, May 21, 2013 at 06:24:15PM +0100, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>
> I've not managed to work out exactly what's going on, but it's certainly
> something like that. In fact, I just managed to trigger a case where all but
> one of the transfers is aborted, whilst the remaining one fails. Looking at
> the code, I can't see how that situation comes about, since the threads are
> protected with the info mutex and kthread_stop is synchronous.
>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
>
> No, dmatest from 3.9 is completely reliable in my experience.
Yeah, I supposed that was a rhetorical question.
So, have I understood correctly that if you revert the 77101ce5
("cancel thread ...")
everything is working fine / as before?
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] dmatest: abort transfers immediately when asked for
2013-05-21 17:24 ` Andy Shevchenko
@ 2013-05-23 10:09 ` Vinod Koul
-1 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:09 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Will Deacon, Andy Shevchenko, djbw @ fb . com,
linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar
On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
>
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
>
> The background (if I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.
>
> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)
Terminate shouldnt cause the issue with buffer verfication, can you try this on
dw_dmac, do you see similar failures on verfication?
--
~Vinod
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-23 10:09 ` Vinod Koul
0 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:09 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Andy,
> >
> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> When thread is going to be stopped we have to unconditionally terminate all
> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> be called on the next interrupt and will try to access to already freed
> >> structures.
> >>
> >> The patch introduces specific error message for this, though it doesn't
> >> increase the counter of the failed tests.
> >>
> >> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Reported-by: Will Deacon <will.deacon@arm.com>
> >
> > Thanks for persevering with this! Although this patch definitely fixes the
> > panic I was seeing, I now observe buffer verification failures in subsequent
> > test runs after an aborted run:
>
> I think the description to the commit adfa543e "dmatest: don't use
> set_freezable_with_signal()" may shed light on this.
>
> The background (if I got it correctly) is in race with done flag. So,
> we got a callback call from the DMA engine, but we don't know which
> transfer triggers it.
> I might be wrong. This is just an assumption.
>
> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> with old dmatest module)
Terminate shouldnt cause the issue with buffer verfication, can you try this on
dw_dmac, do you see similar failures on verfication?
--
~Vinod
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] dmatest: abort transfers immediately when asked for
2013-05-23 10:09 ` Vinod Koul
@ 2013-05-23 10:51 ` Andy Shevchenko
-1 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-23 10:51 UTC (permalink / raw)
To: Vinod Koul
Cc: Will Deacon, Andy Shevchenko, djbw @ fb . com,
linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar
On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> >> When thread is going to be stopped we have to unconditionally terminate all
>> >> ongoing transfers. Otherwise it would be possible that callback function will
>> >> be called on the next interrupt and will try to access to already freed
>> >> structures.
>> >>
>> >> The patch introduces specific error message for this, though it doesn't
>> >> increase the counter of the failed tests.
>> >>
>> > Thanks for persevering with this! Although this patch definitely fixes the
>> > panic I was seeing, I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>>
>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
> Terminate shouldnt cause the issue with buffer verfication, can you try this on
> dw_dmac, do you see similar failures on verfication?
I saw the similar errors on dw_dmac on Intel Medfield device.
Anyway, I checked another approach with Will.
For now I will send a quick fix that prevents tester to abort an
ongoing transfer.
In future we could implement a robust logic when transfers can be
interrupted at any time.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-23 10:51 ` Andy Shevchenko
0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2013-05-23 10:51 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
>> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
>> >> When thread is going to be stopped we have to unconditionally terminate all
>> >> ongoing transfers. Otherwise it would be possible that callback function will
>> >> be called on the next interrupt and will try to access to already freed
>> >> structures.
>> >>
>> >> The patch introduces specific error message for this, though it doesn't
>> >> increase the counter of the failed tests.
>> >>
>> > Thanks for persevering with this! Although this patch definitely fixes the
>> > panic I was seeing, I now observe buffer verification failures in subsequent
>> > test runs after an aborted run:
>>
>> I think the description to the commit adfa543e "dmatest: don't use
>> set_freezable_with_signal()" may shed light on this.
>>
>> The background (if I got it correctly) is in race with done flag. So,
>> we got a callback call from the DMA engine, but we don't know which
>> transfer triggers it.
>> I might be wrong. This is just an assumption.
>>
>> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
>> with old dmatest module)
> Terminate shouldnt cause the issue with buffer verfication, can you try this on
> dw_dmac, do you see similar failures on verfication?
I saw the similar errors on dw_dmac on Intel Medfield device.
Anyway, I checked another approach with Will.
For now I will send a quick fix that prevents tester to abort an
ongoing transfer.
In future we could implement a robust logic when transfers can be
interrupted at any time.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] dmatest: abort transfers immediately when asked for
2013-05-23 10:51 ` Andy Shevchenko
@ 2013-05-23 10:22 ` Vinod Koul
-1 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:22 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Will Deacon, Andy Shevchenko, djbw @ fb . com,
linux-kernel @ vger . kernel . org,
linux-arm-kernel @ lists . infradead . org, viresh.kumar
On Thu, May 23, 2013 at 01:51:29PM +0300, Andy Shevchenko wrote:
> On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> > On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> >> When thread is going to be stopped we have to unconditionally terminate all
> >> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> >> be called on the next interrupt and will try to access to already freed
> >> >> structures.
> >> >>
> >> >> The patch introduces specific error message for this, though it doesn't
> >> >> increase the counter of the failed tests.
> >> >>
> >> > Thanks for persevering with this! Although this patch definitely fixes the
> >> > panic I was seeing, I now observe buffer verification failures in subsequent
> >> > test runs after an aborted run:
> >>
> >> I think the description to the commit adfa543e "dmatest: don't use
> >> set_freezable_with_signal()" may shed light on this.
> >>
> >> The background (if I got it correctly) is in race with done flag. So,
> >> we got a callback call from the DMA engine, but we don't know which
> >> transfer triggers it.
> >> I might be wrong. This is just an assumption.
> >>
> >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> >> with old dmatest module)
> > Terminate shouldnt cause the issue with buffer verfication, can you try this on
> > dw_dmac, do you see similar failures on verfication?
>
> I saw the similar errors on dw_dmac on Intel Medfield device.
Ah, so may not be a driver issue.
> Anyway, I checked another approach with Will.
> For now I will send a quick fix that prevents tester to abort an
> ongoing transfer.
okay so taht should prevent regression, let me check on my setup if I find
anything
--
~Vinod
> In future we could implement a robust logic when transfers can be
> interrupted at any time.
>
> --
> With Best Regards,
> Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH] dmatest: abort transfers immediately when asked for
@ 2013-05-23 10:22 ` Vinod Koul
0 siblings, 0 replies; 30+ messages in thread
From: Vinod Koul @ 2013-05-23 10:22 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, May 23, 2013 at 01:51:29PM +0300, Andy Shevchenko wrote:
> On Thu, May 23, 2013 at 1:09 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> > On Tue, May 21, 2013 at 08:24:15PM +0300, Andy Shevchenko wrote:
> >> On Tue, May 21, 2013 at 6:11 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Tue, May 21, 2013 at 01:33:17PM +0100, Andy Shevchenko wrote:
> >> >> When thread is going to be stopped we have to unconditionally terminate all
> >> >> ongoing transfers. Otherwise it would be possible that callback function will
> >> >> be called on the next interrupt and will try to access to already freed
> >> >> structures.
> >> >>
> >> >> The patch introduces specific error message for this, though it doesn't
> >> >> increase the counter of the failed tests.
> >> >>
> >> > Thanks for persevering with this! Although this patch definitely fixes the
> >> > panic I was seeing, I now observe buffer verification failures in subsequent
> >> > test runs after an aborted run:
> >>
> >> I think the description to the commit adfa543e "dmatest: don't use
> >> set_freezable_with_signal()" may shed light on this.
> >>
> >> The background (if I got it correctly) is in race with done flag. So,
> >> we got a callback call from the DMA engine, but we don't know which
> >> transfer triggers it.
> >> I might be wrong. This is just an assumption.
> >>
> >> Have you ever see such behaviour on pre v3.10-rc1 kernels? (I mean
> >> with old dmatest module)
> > Terminate shouldnt cause the issue with buffer verfication, can you try this on
> > dw_dmac, do you see similar failures on verfication?
>
> I saw the similar errors on dw_dmac on Intel Medfield device.
Ah, so may not be a driver issue.
> Anyway, I checked another approach with Will.
> For now I will send a quick fix that prevents tester to abort an
> ongoing transfer.
okay so taht should prevent regression, let me check on my setup if I find
anything
--
~Vinod
> In future we could implement a robust logic when transfers can be
> interrupted at any time.
>
> --
> With Best Regards,
> Andy Shevchenko
^ permalink raw reply [flat|nested] 30+ messages in thread