* [PATCH 0/3] dmaengine: rcar-dmac: fix resource freeing synchronization
@ 2017-03-28 22:40 Niklas Söderlund
  2017-03-28 22:40 ` [PATCH 1/3] dmaengine: rcar-dmac: store channel IRQ in struct rcar_dmac_chan Niklas Söderlund
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Niklas Söderlund @ 2017-03-28 22:40 UTC (permalink / raw)
  To: Vinod Koul, dmaengine, linux-renesas-soc
  Cc: Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama,
	Niklas Söderlund

Hi,

This series fixes resource freeing synchronization by:

1. Patch 1/3
   Storing the IRQ number in struct rcar_dmac_chan so it can be used
   later together with synchronize_irq().

2. Patch 2/3
   Adding support for the device_synchronize() callback.

3. Patch 3/3
   Waiting for any ISR that might still be running after the channel is
   halted prior to freeing its resources. This was previously part of a
   patch sent out by Yoshihiro Shimoda and authored by Hiroyuki
   Yokoyama, see [1].

   In that thread it was suggested by Lars-Peter Clausen to instead
   implement the device_synchronize() callback. Unfortunately this is not
   enough to solve the issue. rcar_dmac_free_chan_resources() halts the
   channel by calling rcar_dmac_chan_halt() and then moves directly on to
   freeing resources. A wait for any running ISR is still needed before
   the resources are freed, even though device_synchronize() has been
   added. This is because of the call chain:

   dma_release_channel()
     dma_chan_put()
       dmaengine_synchronize()
       rcar_dmac_free_chan_resources()
         rcar_dmac_chan_halt()

   Here dmaengine_synchronize() is called prior to rcar_dmac_chan_halt() 
   so an extra synchronisation to wait for any running ISR is still 
   needed.
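
The ordering problem above can be sketched as a small userspace model.
This is only an illustration, not driver code: struct model_chan and
the model_*() helpers are hypothetical names made up for this sketch,
and the model tracks just two facts, whether the hardware can still
raise interrupts and whether a handler is in flight.

```c
#include <stdbool.h>

/* Hypothetical model of one DMA channel; not driver code. */
struct model_chan {
	bool hw_running;	/* hardware may still raise interrupts */
	bool isr_in_flight;	/* a handler is currently executing */
};

/* An interrupt can only start a handler while the hardware runs. */
static void model_irq_fires(struct model_chan *c)
{
	if (c->hw_running)
		c->isr_in_flight = true;
}

/* Stands in for synchronize_irq(): returns once no handler runs. */
static void model_synchronize(struct model_chan *c)
{
	c->isr_in_flight = false;
}

/* Stands in for rcar_dmac_chan_halt(): no new interrupts after this. */
static void model_halt(struct model_chan *c)
{
	c->hw_running = false;
}

/*
 * Walk the release path with an interrupt arriving between the
 * framework-level synchronize and the halt. Returns true if it is
 * safe to free resources at the end, i.e. no handler is in flight.
 */
bool model_release(bool sync_after_halt)
{
	struct model_chan c = { .hw_running = true, .isr_in_flight = false };

	model_synchronize(&c);	/* dmaengine_synchronize() in dma_chan_put() */
	model_irq_fires(&c);	/* channel still running: a new ISR starts */
	model_halt(&c);		/* rcar_dmac_chan_halt() */
	if (sync_after_halt)
		model_synchronize(&c);	/* the wait added by patch 3/3 */

	return !c.isr_in_flight;
}
```

In this model, model_release(false) reports that a handler may still be
in flight when resources are freed, while model_release(true) does not.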

By both adding a device_synchronize() which can be used in conjunction
with device_terminate_all() and friends and by adding an explicit
synchronize_irq() when freeing channel resources I feel the
synchronisation for freeing channel resources is in much better
shape. It also solves the issue in the original mail thread.

The series is based on v4.11-rc1 and is tested on r8a7795 Salvator-X.

1. https://patchwork.kernel.org/patch/9557691/

Niklas Söderlund (3):
  dmaengine: rcar-dmac: store channel IRQ in struct rcar_dmac_chan
  dmaengine: rcar-dmac: implement device_synchronize()
  dmaengine: rcar-dmac: wait for ISR to finish before freeing resources

 drivers/dma/sh/rcar-dmac.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

-- 
2.12.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] dmaengine: rcar-dmac: store channel IRQ in struct rcar_dmac_chan
  2017-03-28 22:40 [PATCH 0/3] dmaengine: rcar-dmac: fix resource freeing synchronization Niklas Söderlund
@ 2017-03-28 22:40 ` Niklas Söderlund
  2017-03-28 22:40 ` [PATCH 2/3] dmaengine: rcar-dmac: implement device_synchronize() Niklas Söderlund
  2017-03-28 22:40 ` [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources Niklas Söderlund
  2 siblings, 0 replies; 14+ messages in thread
From: Niklas Söderlund @ 2017-03-28 22:40 UTC (permalink / raw)
  To: Vinod Koul, dmaengine, linux-renesas-soc
  Cc: Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama,
	Niklas Söderlund

The IRQ number is needed after probe to be able to add synchronisation
points in other places in the driver when freeing resources and to
implement a device_synchronize() callback. Store the IRQ number in the
struct rcar_dmac_chan so that it can be used later.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/dma/sh/rcar-dmac.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c
index 48b22d5c86026098..3038654f11b5c6ed 100644
--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -144,6 +144,7 @@ struct rcar_dmac_chan_map {
  * @chan: base DMA channel object
  * @iomem: channel I/O memory base
  * @index: index of this channel in the controller
+ * @irq: channel IRQ
  * @src: slave memory address and size on the source side
  * @dst: slave memory address and size on the destination side
  * @mid_rid: hardware MID/RID for the DMA client using this channel
@@ -161,6 +162,7 @@ struct rcar_dmac_chan {
 	struct dma_chan chan;
 	void __iomem *iomem;
 	unsigned int index;
+	int irq;
 
 	struct rcar_dmac_chan_slave src;
 	struct rcar_dmac_chan_slave dst;
@@ -1635,7 +1637,6 @@ static int rcar_dmac_chan_probe(struct rcar_dmac *dmac,
 	struct dma_chan *chan = &rchan->chan;
 	char pdev_irqname[5];
 	char *irqname;
-	int irq;
 	int ret;
 
 	rchan->index = index;
@@ -1652,8 +1653,8 @@ static int rcar_dmac_chan_probe(struct rcar_dmac *dmac,
 
 	/* Request the channel interrupt. */
 	sprintf(pdev_irqname, "ch%u", index);
-	irq = platform_get_irq_byname(pdev, pdev_irqname);
-	if (irq < 0) {
+	rchan->irq = platform_get_irq_byname(pdev, pdev_irqname);
+	if (rchan->irq < 0) {
 		dev_err(dmac->dev, "no IRQ specified for channel %u\n", index);
 		return -ENODEV;
 	}
@@ -1663,11 +1664,13 @@ static int rcar_dmac_chan_probe(struct rcar_dmac *dmac,
 	if (!irqname)
 		return -ENOMEM;
 
-	ret = devm_request_threaded_irq(dmac->dev, irq, rcar_dmac_isr_channel,
+	ret = devm_request_threaded_irq(dmac->dev, rchan->irq,
+					rcar_dmac_isr_channel,
 					rcar_dmac_isr_channel_thread, 0,
 					irqname, rchan);
 	if (ret) {
-		dev_err(dmac->dev, "failed to request IRQ %u (%d)\n", irq, ret);
+		dev_err(dmac->dev, "failed to request IRQ %u (%d)\n",
+			rchan->irq, ret);
 		return ret;
 	}
 
-- 
2.12.0


* [PATCH 2/3] dmaengine: rcar-dmac: implement device_synchronize()
  2017-03-28 22:40 [PATCH 0/3] dmaengine: rcar-dmac: fix resource freeing synchronization Niklas Söderlund
  2017-03-28 22:40 ` [PATCH 1/3] dmaengine: rcar-dmac: store channel IRQ in struct rcar_dmac_chan Niklas Söderlund
@ 2017-03-28 22:40 ` Niklas Söderlund
  2017-03-28 22:40 ` [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources Niklas Söderlund
  2 siblings, 0 replies; 14+ messages in thread
From: Niklas Söderlund @ 2017-03-28 22:40 UTC (permalink / raw)
  To: Vinod Koul, dmaengine, linux-renesas-soc
  Cc: Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama,
	Niklas Söderlund

Implement the device_synchronize() callback, which waits until a DMA
channel is stopped to provide a synchronization point.

This protects the driver from multiple race conditions when terminating
and freeing resources, e.g. the completion callback still running after
device_terminate_all() has completed.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/dma/sh/rcar-dmac.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c
index 3038654f11b5c6ed..4b90deb40d559bed 100644
--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -1353,6 +1353,13 @@ static void rcar_dmac_issue_pending(struct dma_chan *chan)
 	spin_unlock_irqrestore(&rchan->lock, flags);
 }
 
+static void rcar_dmac_device_synchronize(struct dma_chan *chan)
+{
+	struct rcar_dmac_chan *rchan = to_rcar_dmac_chan(chan);
+
+	synchronize_irq(rchan->irq);
+}
+
 /* -----------------------------------------------------------------------------
  * IRQ handling
  */
@@ -1834,6 +1841,7 @@ static int rcar_dmac_probe(struct platform_device *pdev)
 	engine->device_terminate_all = rcar_dmac_chan_terminate_all;
 	engine->device_tx_status = rcar_dmac_tx_status;
 	engine->device_issue_pending = rcar_dmac_issue_pending;
+	engine->device_synchronize = rcar_dmac_device_synchronize;
 
 	ret = dma_async_device_register(engine);
 	if (ret < 0)
-- 
2.12.0


* [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-03-28 22:40 [PATCH 0/3] dmaengine: rcar-dmac: fix resource freeing synchronization Niklas Söderlund
  2017-03-28 22:40 ` [PATCH 1/3] dmaengine: rcar-dmac: store channel IRQ in struct rcar_dmac_chan Niklas Söderlund
  2017-03-28 22:40 ` [PATCH 2/3] dmaengine: rcar-dmac: implement device_synchronize() Niklas Söderlund
@ 2017-03-28 22:40 ` Niklas Söderlund
  2017-03-29 12:31   ` Geert Uytterhoeven
  2 siblings, 1 reply; 14+ messages in thread
From: Niklas Söderlund @ 2017-03-28 22:40 UTC (permalink / raw)
  To: Vinod Koul, dmaengine, linux-renesas-soc
  Cc: Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama,
	Niklas Söderlund

This fixes a race condition where the channel resources could be freed
before the ISR had finished running, resulting in a NULL pointer
dereference from the ISR.

[  167.148934] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  167.157051] pgd = ffff80003c641000
[  167.160449] [00000000] *pgd=000000007c507003, *pud=000000007c4ff003, *pmd=0000000000000000
[  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[  167.174289] Modules linked in:
[  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted 4.11.0-rc1-00001-g8d92afddc2f6633a #73
[  167.186131] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
[  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
[  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
[  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400

Based on previous work by:
    Hiroyuki Yokoyama <hiroyuki.yokoyama.vx@renesas.com>.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
---
 drivers/dma/sh/rcar-dmac.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c
index 4b90deb40d559bed..0ec63600ebcc3a27 100644
--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -998,7 +998,11 @@ static void rcar_dmac_free_chan_resources(struct dma_chan *chan)
 	rcar_dmac_chan_halt(rchan);
 	spin_unlock_irq(&rchan->lock);
 
-	/* Now no new interrupts will occur */
+	/*
+	 * Now no new interrupts will occur, but one might already be
+	 * running. Wait for it to finish before freeing resources.
+	 */
+	synchronize_irq(rchan->irq);
 
 	if (rchan->mid_rid >= 0) {
 		/* The caller is holding dma_list_mutex */
-- 
2.12.0


* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-03-28 22:40 ` [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources Niklas Söderlund
@ 2017-03-29 12:31   ` Geert Uytterhoeven
  2017-03-29 13:30     ` Niklas Söderlund
  0 siblings, 1 reply; 14+ messages in thread
From: Geert Uytterhoeven @ 2017-03-29 12:31 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Vinod Koul, dmaengine, Linux-Renesas, Yoshihiro Shimoda,
	Lars-Peter Clausen, Hiroyuki Yokoyama

Hi Niklas,

On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund
<niklas.soderlund+renesas@ragnatech.se> wrote:
> This fixes a race condition where the channel resources could be freed
> before the ISR had finished running resulting in a NULL pointer
> reference from the ISR.
>
> [  167.148934] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [  167.157051] pgd = ffff80003c641000
> [  167.160449] [00000000] *pgd=000000007c507003, *pud=000000007c4ff003, *pmd=0000000000000000
> [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> [  167.174289] Modules linked in:
> [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> [  167.186131] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
> [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400

Do you have a test case to trigger this?

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-03-29 12:31   ` Geert Uytterhoeven
@ 2017-03-29 13:30     ` Niklas Söderlund
  2017-03-30  7:38       ` Niklas Söderlund
  0 siblings, 1 reply; 14+ messages in thread
From: Niklas Söderlund @ 2017-03-29 13:30 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Vinod Koul, dmaengine, Linux-Renesas, Yoshihiro Shimoda,
	Lars-Peter Clausen, Hiroyuki Yokoyama

Hi Geert,

On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> Hi Niklas,
> 
> On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund
> <niklas.soderlund+renesas@ragnatech.se> wrote:
> > This fixes a race condition where the channel resources could be freed
> > before the ISR had finished running resulting in a NULL pointer
> > reference from the ISR.
> >
> > [  167.148934] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [  167.157051] pgd = ffff80003c641000
> > [  167.160449] [00000000] *pgd=000000007c507003, *pud=000000007c4ff003, *pmd=0000000000000000
> > [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > [  167.174289] Modules linked in:
> > [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> > [  167.186131] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
> > [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> > [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> > [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> 
> Do you have a test case to trigger this?

Yes I have a test case, it's rather complex and involves both a kernel
module and a userspace application to stress the rcar-dmac. I'm
checking if I can share this publicly or not, please hold :-)

> 
> Thanks!
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

-- 
Regards,
Niklas Söderlund


* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-03-29 13:30     ` Niklas Söderlund
@ 2017-03-30  7:38       ` Niklas Söderlund
  2017-04-05  3:25         ` Vinod Koul
  0 siblings, 1 reply; 14+ messages in thread
From: Niklas Söderlund @ 2017-03-30  7:38 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Vinod Koul, dmaengine, Linux-Renesas, Yoshihiro Shimoda,
	Lars-Peter Clausen, Hiroyuki Yokoyama

Hi Geert,

On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
> Hi Geert,
> 
> On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> > Hi Niklas,
> > 
> > On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund
> > <niklas.soderlund+renesas@ragnatech.se> wrote:
> > > This fixes a race condition where the channel resources could be freed
> > > before the ISR had finished running resulting in a NULL pointer
> > > reference from the ISR.
> > >
> > > [  167.148934] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > [  167.157051] pgd = ffff80003c641000
> > > [  167.160449] [00000000] *pgd=000000007c507003, *pud=000000007c4ff003, *pmd=0000000000000000
> > > [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > > [  167.174289] Modules linked in:
> > > [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> > > [  167.186131] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
> > > [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> > > [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> > > [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> > 
> > Do you have a test case to trigger this?
> 
> Yes I have a testcase, it's rather complex and involves both a kernel 
> module and a userspaces application to stress the rcar-dmac. I'm 
> checking if I can share this publicly or not, please hold :-)

I have now received feedback that I'm unfortunately not allowed to share 
the test case :-(

The big picture in how to trigger this problem is that you start a DMA 
transfer like this:

struct dma_async_tx_descriptor *tx = ...;

...

tx->tx_submit(tx);

And then you directly call dma_release_channel() on this channel without
making sure the completion callback has run. Now if you are unlucky, the
ISR has not finished running for the DMA transfer when
dma_release_channel() starts to clean up resources. The synchronisation
point in the dma_release_channel() call path fixes this.
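
The race described above, and the effect of waiting for the handler
before freeing, can be sketched in userspace with POSIX threads. This is
a model only, assuming a thread can stand in for the ISR; struct
model_chan, model_isr(), model_synchronize() and
model_submit_and_release() are hypothetical names for this sketch and do
not exist in the driver.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a channel with one in-flight handler. */
struct model_chan {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool started;		/* handler has begun executing */
	bool in_flight;		/* handler is still using desc */
	int *desc;		/* resource the handler dereferences */
	int handled;		/* value the handler read from desc */
};

/* Plays the role of the ISR: dereferences desc while in flight. */
static void *model_isr(void *arg)
{
	struct model_chan *c = arg;
	int value;

	pthread_mutex_lock(&c->lock);
	c->in_flight = true;
	c->started = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);

	value = *c->desc;	/* would be invalid if desc were already freed */

	pthread_mutex_lock(&c->lock);
	c->handled = value;
	c->in_flight = false;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Plays the role of synchronize_irq(): block until the handler is done. */
static void model_synchronize(struct model_chan *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->in_flight)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Submit, then release right away; returns what the handler read. */
int model_submit_and_release(int payload)
{
	struct model_chan c = { .started = false, .in_flight = false };
	pthread_t isr;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.cond, NULL);
	c.desc = malloc(sizeof(*c.desc));
	if (!c.desc)
		return -1;
	*c.desc = payload;

	if (pthread_create(&isr, NULL, model_isr, &c)) {
		free(c.desc);
		return -1;
	}

	/* Make sure the handler really is in flight before releasing. */
	pthread_mutex_lock(&c.lock);
	while (!c.started)
		pthread_cond_wait(&c.cond, &c.lock);
	pthread_mutex_unlock(&c.lock);

	model_synchronize(&c);	/* without this, the free below races */
	free(c.desc);
	c.desc = NULL;

	pthread_join(isr, NULL);
	pthread_cond_destroy(&c.cond);
	pthread_mutex_destroy(&c.lock);
	return c.handled;
}
```

Dropping the model_synchronize() call lets free() run while the handler
thread may still be reading desc, which is the userspace analogue of
freeing channel resources under a running ISR.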

> 
> > 
> > Thanks!
> > 
> > Gr{oetje,eeting}s,
> > 
> >                         Geert
> > 
> > --
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> > 
> > In personal conversations with technical people, I call myself a hacker. But
> > when I'm talking to journalists I just say "programmer" or something like that.
> >                                 -- Linus Torvalds
> 
> -- 
> Regards,
> Niklas Söderlund

-- 
Regards,
Niklas Söderlund


* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-03-30  7:38       ` Niklas Söderlund
@ 2017-04-05  3:25         ` Vinod Koul
  2017-04-05  9:14           ` Niklas Söderlund
  0 siblings, 1 reply; 14+ messages in thread
From: Vinod Koul @ 2017-04-05  3:25 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Geert Uytterhoeven, dmaengine, Linux-Renesas, Yoshihiro Shimoda,
	Lars-Peter Clausen, Hiroyuki Yokoyama

On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote:
> Hi Geert,
> 
> On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
> > Hi Geert,
> > 
> > On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> > > Hi Niklas,
> > > 
> > > On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund
> > > <niklas.soderlund+renesas@ragnatech.se> wrote:
> > > > This fixes a race condition where the channel resources could be freed
> > > > before the ISR had finished running resulting in a NULL pointer
> > > > reference from the ISR.
> > > >
> > > > [  167.148934] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > > [  167.157051] pgd = ffff80003c641000
> > > > [  167.160449] [00000000] *pgd=000000007c507003, *pud=000000007c4ff003, *pmd=0000000000000000
> > > > [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > > > [  167.174289] Modules linked in:
> > > > [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> > > > [  167.186131] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
> > > > [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> > > > [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> > > > [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> > > 
> > > Do you have a test case to trigger this?
> > 
> > Yes I have a testcase, it's rather complex and involves both a kernel 
> > module and a userspaces application to stress the rcar-dmac. I'm 
> > checking if I can share this publicly or not, please hold :-)
> 
> I have now received feedback that I'm unfortunately not allowed to share 
> the test case :-(
> 
> The big picture in how to trigger this problem is that you start a DMA 
> transfer like this:
> 
> struct dma_async_tx_descriptor *tx = ...;
> 
> ...
> 
> tx->tx_submit(tx);
> 
> And then you directly call dma_release_channel() on this channel without 
> making sure the completion callback ran or anything. Now if you are 
> unlucky the ISR have not finished running for the DMA when 
> dma_release_channel() starts to clean up resources. The synchronisation 
> point in the dma_release_channel() call path fixes this.

Well the API expectation would be you abort the txn before calling release.
So the expected order should be:

dmaengine_terminate_all();
dma_release_channel();

Terminate should then stop the channel, i.e. abort the pending descriptors.

-- 
~Vinod


* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-04-05  3:25         ` Vinod Koul
@ 2017-04-05  9:14           ` Niklas Söderlund
  2017-04-05 10:40             ` Geert Uytterhoeven
  0 siblings, 1 reply; 14+ messages in thread
From: Niklas Söderlund @ 2017-04-05  9:14 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Geert Uytterhoeven, dmaengine, Linux-Renesas, Yoshihiro Shimoda,
	Lars-Peter Clausen, Hiroyuki Yokoyama

Hi Vinod,

On 2017-04-05 08:55:31 +0530, Vinod Koul wrote:
> On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote:
> > Hi Geert,
> > 
> > On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
> > > Hi Geert,
> > > 
> > > On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> > > > Hi Niklas,
> > > > 
> > > > On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund
> > > > <niklas.soderlund+renesas@ragnatech.se> wrote:
> > > > > This fixes a race condition where the channel resources could be freed
> > > > > before the ISR had finished running resulting in a NULL pointer
> > > > > reference from the ISR.
> > > > >
> > > > > [  167.148934] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > > > [  167.157051] pgd = ffff80003c641000
> > > > > [  167.160449] [00000000] *pgd=000000007c507003, *pud=000000007c4ff003, *pmd=0000000000000000
> > > > > [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > > > > [  167.174289] Modules linked in:
> > > > > [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> > > > > [  167.186131] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
> > > > > [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> > > > > [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> > > > > [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> > > > 
> > > > Do you have a test case to trigger this?
> > > 
> > > Yes I have a testcase, it's rather complex and involves both a kernel 
> > > module and a userspaces application to stress the rcar-dmac. I'm 
> > > checking if I can share this publicly or not, please hold :-)
> > 
> > I have now received feedback that I'm unfortunately not allowed to share 
> > the test case :-(
> > 
> > The big picture in how to trigger this problem is that you start a DMA 
> > transfer like this:
> > 
> > struct dma_async_tx_descriptor *tx = ...;
> > 
> > ...
> > 
> > tx->tx_submit(tx);
> > 
> > And then you directly call dma_release_channel() on this channel without 
> > making sure the completion callback ran or anything. Now if you are 
> > unlucky the ISR have not finished running for the DMA when 
> > dma_release_channel() starts to clean up resources. The synchronisation 
> > point in the dma_release_channel() call path fixes this.
> 
> Well the API expectation would be you abort the txn before calling release.
> So the expected order should be:
> 
> dmaengine_terminate_all();
> dma_release_channel();

Agreed, this is the correct way, and in this case patch 3/3 in this series
could be dropped. Then device_synchronize() would be added to rcar-dmac,
dmaengine_terminate_all() would turn off the IRQ and
dma_release_channel() would ensure that device_synchronize() is called
prior to calling the rcar-dmac device_free_chan_resources().

> 
> Terminate should then stop the channel, ie abort the pending descriptors..
> 

However, for reasons unknown to me, the rcar-dmac
device_free_chan_resources() implementation contains logic to turn off
IRQs before it frees the resources. It is because of this that patch 3/3
is needed, so that it can be sure no ISR is running before it frees
resources.

I don't know how best to proceed here. I agree it feels a bit odd that
device_free_chan_resources() is dealing with the IRQs, as such things
should be done before it's called. But on the other hand that code has
been part of the driver since it was added upstream. I feel a bit
uncomfortable just removing that part from
device_free_chan_resources() since the driver has been in use with it
for such a long time.

How would you prefer I try and resolve this?

-- 
Regards,
Niklas Söderlund


* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-04-05  9:14           ` Niklas Söderlund
@ 2017-04-05 10:40             ` Geert Uytterhoeven
  2017-04-07 11:33               ` Laurent Pinchart
  0 siblings, 1 reply; 14+ messages in thread
From: Geert Uytterhoeven @ 2017-04-05 10:40 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Vinod Koul, dmaengine, Linux-Renesas, Yoshihiro Shimoda,
	Lars-Peter Clausen, Hiroyuki Yokoyama, Laurent Pinchart

Hi Niklas,

(CC Laurent)

On Wed, Apr 5, 2017 at 11:14 AM, Niklas Söderlund
<niklas.soderlund@ragnatech.se> wrote:
> On 2017-04-05 08:55:31 +0530, Vinod Koul wrote:
>> On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote:
>> > On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
>> > > On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
>> > > > On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund
>> > > > <niklas.soderlund+renesas@ragnatech.se> wrote:
>> > > > > This fixes a race condition where the channel resources could be freed
>> > > > > before the ISR had finished running resulting in a NULL pointer
>> > > > > reference from the ISR.
>> > > > >
>> > > > > [  167.148934] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>> > > > > [  167.157051] pgd = ffff80003c641000
>> > > > > [  167.160449] [00000000] *pgd=000000007c507003, *pud=000000007c4ff003, *pmd=0000000000000000
>> > > > > [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
>> > > > > [  167.174289] Modules linked in:
>> > > > > [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted 4.11.0-rc1-00001-g8d92afddc2f6633a #73
>> > > > > [  167.186131] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
>> > > > > [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
>> > > > > [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
>> > > > > [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
>> > > >
>> > > > Do you have a test case to trigger this?
>> > >
>> > > Yes I have a testcase, it's rather complex and involves both a kernel
>> > > module and a userspaces application to stress the rcar-dmac. I'm
>> > > checking if I can share this publicly or not, please hold :-)
>> >
>> > I have now received feedback that I'm unfortunately not allowed to share
>> > the test case :-(
>> >
>> > The big picture in how to trigger this problem is that you start a DMA
>> > transfer like this:
>> >
>> > struct dma_async_tx_descriptor *tx = ...;
>> >
>> > ...
>> >
>> > tx->tx_submit(tx);
>> >
>> > And then you directly call dma_release_channel() on this channel without
>> > making sure the completion callback ran or anything. Now if you are
>> > unlucky the ISR have not finished running for the DMA when
>> > dma_release_channel() starts to clean up resources. The synchronisation
>> > point in the dma_release_channel() call path fixes this.
>>
>> Well the API expectation would be you abort the txn before calling release.
>> So the expected order should be:
>>
>> dmaengine_terminate_all();
>> dma_release_channel();
>
> Agree this is the correct way and in this case patch 3/3 in this series
> could be dropped. Then device_synchronize() would added to rcar-dmac,
> dmaengine_terminate_all() would turn of the IRQ and
> dma_release_channel() would ensure that device_synchronize() is called
> prior to calling rcar-dmac device_free_chan_resources().
>
>>
>> Terminate should then stop the channel, ie abort the pending descriptors..
>>
>
> However for reasons unknown to me the rcar-dmac
> device_free_chan_resources() implementation implements logic to turn of
> IRQs before it frees the resources. And it's because of this patch 3/3
> is needed so that it can be sure no ISR is running before it frees
> resources.
>
> I don't know how to best proceed here. I agree it feels a bit odd that
> device_free_chan_resources() is dealing with the IRQs as such things
> should be done before it's called. But on the other hand that code has
> been part of the driver since it was added upstream. I feel a bit
> uncomfortable just removing that part from the
> device_free_chan_resources() since the driver have been in use with it
> for such a long time.
>
> How would you prefer I try and resolve this?

Perhaps Laurent knows why it was implemented this way?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-04-05 10:40             ` Geert Uytterhoeven
@ 2017-04-07 11:33               ` Laurent Pinchart
  2017-05-12 12:49                 ` Niklas Söderlund
  0 siblings, 1 reply; 14+ messages in thread
From: Laurent Pinchart @ 2017-04-07 11:33 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Niklas Söderlund, Vinod Koul, dmaengine, Linux-Renesas,
	Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama

Hi Geert,

On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote:
> On Wed, Apr 5, 2017 at 11:14 AM, Niklas Söderlund wrote:
> > On 2017-04-05 08:55:31 +0530, Vinod Koul wrote:
> >> On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote:
> >>> On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
> >>>> On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> >>>>> On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund wrote:
> >>>>>> This fixes a race condition where the channel resources could be
> >>>>>> freed before the ISR had finished running resulting in a NULL
> >>>>>> pointer reference from the ISR.
> >>>>>> 
> >>>>>> [  167.148934] Unable to handle kernel NULL pointer dereference
> >>>>>> at virtual address 00000000
> >>>>>> [  167.157051] pgd = ffff80003c641000
> >>>>>> [  167.160449] [00000000] *pgd=000000007c507003,
> >>>>>> *pud=000000007c4ff003, *pmd=0000000000000000
> >>>>>> [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> >>>>>> [  167.174289] Modules linked in:
> >>>>>> [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted
> >>>>>> 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> >>>>>> [  167.186131] Hardware name: Renesas Salvator-X board based on
> >>>>>> r8a7795 (DT)
> >>>>>> [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> >>>>>> [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> >>>>>> [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> >>>>> 
> >>>>> Do you have a test case to trigger this?
> >>>> 
> >>>> Yes I have a testcase, it's rather complex and involves both a kernel
> >>>> module and a userspaces application to stress the rcar-dmac. I'm
> >>>> checking if I can share this publicly or not, please hold :-)
> >>> 
> >>> I have now received feedback that I'm unfortunately not allowed to
> >>> share the test case :-(
> >>> 
> >>> The big picture in how to trigger this problem is that you start a DMA
> >>> transfer like this:
> >>> 
> >>> struct dma_async_tx_descriptor *tx = ...;
> >>> 
> >>> ...
> >>> 
> >>> tx->tx_submit(tx);
> >>> 
> >>> And then you directly call dma_release_channel() on this channel
> >>> without making sure the completion callback ran or anything. Now if you
> >>> are unlucky the ISR has not finished running for the DMA when
> >>> dma_release_channel() starts to clean up resources. The synchronisation
> >>> point in the dma_release_channel() call path fixes this.
> >> 
> >> Well the API expectation would be you abort the txn before calling
> >> release. So the expected order should be:
> >> 
> >> dmaengine_terminate_all();
> >> dma_release_channel();
> > 
> > Agree this is the correct way and in this case patch 3/3 in this series
> > could be dropped. Then device_synchronize() would be added to rcar-dmac,
> > dmaengine_terminate_all() would turn off the IRQ and
> > dma_release_channel() would ensure that device_synchronize() is called
> > prior to calling rcar-dmac device_free_chan_resources().
> > 
> >> Terminate should then stop the channel, ie abort the pending
> >> descriptors..
> > 
> > However for reasons unknown to me the rcar-dmac
> > device_free_chan_resources() implementation implements logic to turn off
> > IRQs before it frees the resources. And it's because of this that patch 3/3
> > is needed so that it can be sure no ISR is running before it frees
> > resources.
> > 
> > I don't know how to best proceed here. I agree it feels a bit odd that
> > device_free_chan_resources() is dealing with the IRQs as such things
> > should be done before it's called. But on the other hand that code has
> > been part of the driver since it was added upstream. I feel a bit
> > uncomfortable just removing that part from the
> > device_free_chan_resources() since the driver has been in use with it
> > for such a long time.
> > 
> > How would you prefer I try and resolve this?
> 
> Perhaps Laurent knows why it was implemented this way?

That was nearly 3 years ago, and I can hardly remember reasons related to code 
I wrote 3 months ago :-)

I might just have been overcautious, guarding against conditions that should 
not happen if the caller behaves correctly. The situation might have changed 
since the driver was written. It might also be just a case of cargo-cult
programming, as shdma_free_chan_resources() has very similar code.

Given that freeing channel resources when the channel isn't idle can cause an 
oops, I think we should guard against that. This should probably be 
implemented in the dma-engine core, to make sure we catch the issue in as many 
drivers as possible.
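
To illustrate the kind of guard meant here, a minimal userspace sketch in C
follows. It is a model only, not dmaengine code: struct model_chan and the
model_*() helpers are hypothetical names, and returning -EBUSY is just one
assumed policy for how a core-level check could behave. The comments name the
kernel functions each helper loosely stands in for.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DMA channel; not the kernel's struct dma_chan. */
struct model_chan {
	bool halted;        /* hardware stopped, no new interrupts can fire */
	int isr_in_flight;  /* number of handlers still executing */
	void *resources;    /* descriptor memory a running ISR may touch */
};

/* Loosely models rcar_dmac_chan_halt(): stops new interrupts only. */
static void model_halt(struct model_chan *c)
{
	c->halted = true;
}

/*
 * Loosely models synchronize_irq(): in the kernel this waits for running
 * handlers to return; in this single-threaded model they simply finish here.
 */
static void model_synchronize(struct model_chan *c)
{
	c->isr_in_flight = 0;
}

/* A channel is idle only if it is halted and no handler is still running. */
static bool model_idle(const struct model_chan *c)
{
	return c->halted && c->isr_in_flight == 0;
}

/*
 * Assumed core-level guard: refuse to free resources of a non-idle channel
 * instead of letting a late ISR dereference freed memory.
 */
static int model_free_chan_resources(struct model_chan *c)
{
	if (!model_idle(c))
		return -EBUSY;

	free(c->resources);
	c->resources = NULL;
	return 0;
}
```

In this model, freeing an active channel returns -EBUSY and leaves the
resources intact; after model_halt() followed by model_synchronize() the same
call succeeds. Whether a real check should fail, warn, or wait is left open.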

-- 
Regards,

Laurent Pinchart

* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-04-07 11:33               ` Laurent Pinchart
@ 2017-05-12 12:49                 ` Niklas Söderlund
  2017-05-14 12:01                   ` Vinod Koul
  0 siblings, 1 reply; 14+ messages in thread
From: Niklas Söderlund @ 2017-05-12 12:49 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Geert Uytterhoeven, Vinod Koul, dmaengine, Linux-Renesas,
	Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama

On 2017-04-07 14:33:47 +0300, Laurent Pinchart wrote:
> Hi Geert,
> 
> On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote:
> > Perhaps Laurent knows why it was implemented this way?
> 
> That was nearly 3 years ago, and I can hardly remember reasons related to code 
> I wrote 3 months ago :-)
> 
> I might just have been overcautious, guarding against conditions that should 
> not happen if the caller behaves correctly. The situation might have changed 
> since the driver was written. It might also be just a case of cargo-cult 
> programming, as the shdma_free_chan_resources() has very similar code.

Since the driver has this behavior today, would it not be best to first
make sure it functions as expected and then, as a second step, see if we
can remove it altogether?

Vinod, would you be strongly opposed to picking up this series as is?

> 
> Given that freeing channel resources when the channel isn't idle can cause an 
> oops, I think we should guard against that. This should probably be 
> implemented in the dma-engine core, to make sure we catch the issue in as many 
> drivers as possible.
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 

-- 
Regards,
Niklas Söderlund

* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-05-12 12:49                 ` Niklas Söderlund
@ 2017-05-14 12:01                   ` Vinod Koul
  2017-05-15 23:12                     ` Niklas Söderlund
  0 siblings, 1 reply; 14+ messages in thread
From: Vinod Koul @ 2017-05-14 12:01 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Laurent Pinchart, Geert Uytterhoeven, dmaengine, Linux-Renesas,
	Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama

On Fri, May 12, 2017 at 02:49:38PM +0200, Niklas Söderlund wrote:
> On 2017-04-07 14:33:47 +0300, Laurent Pinchart wrote:
> > Hi Geert,
> > 
> > On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote:
> > > Perhaps Laurent knows why it was implemented this way?
> > 
> > That was nearly 3 years ago, and I can hardly remember reasons related to code 
> > I wrote 3 months ago :-)
> > 
> > I might just have been overcautious, guarding against conditions that should 
> > not happen if the caller behaves correctly. The situation might have changed 
> > since the driver was written. It might also be just a case of cargo-cult 
> > programming, as the shdma_free_chan_resources() has very similar code.
> 
> Since the driver today have this behavior would it not be best to first 
> make sure it functions as expected and then as a second step see if we 
> can remove it all together?
> 
> Vinod would you be strongly opposed to picking up this series as is?

If there are no objections then I don't mind picking it up. Please rebase on
-rc1 and resend.

-- 
~Vinod

* Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources
  2017-05-14 12:01                   ` Vinod Koul
@ 2017-05-15 23:12                     ` Niklas Söderlund
  0 siblings, 0 replies; 14+ messages in thread
From: Niklas Söderlund @ 2017-05-15 23:12 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Laurent Pinchart, Geert Uytterhoeven, dmaengine, Linux-Renesas,
	Yoshihiro Shimoda, Lars-Peter Clausen, Hiroyuki Yokoyama

Hi Vinod,

On 2017-05-14 17:31:36 +0530, Vinod Koul wrote:
> On Fri, May 12, 2017 at 02:49:38PM +0200, Niklas Söderlund wrote:
> > On 2017-04-07 14:33:47 +0300, Laurent Pinchart wrote:
> > > Hi Geert,
> > > 
> > > On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote:
> > > > Perhaps Laurent knows why it was implemented this way?
> > > 
> > > That was nearly 3 years ago, and I can hardly remember reasons related to code 
> > > I wrote 3 months ago :-)
> > > 
> > > I might just have been overcautious, guarding against conditions that should 
> > > not happen if the caller behaves correctly. The situation might have changed 
> > > since the driver was written. It might also be just a case of cargo-cult 
> > > programming, as the shdma_free_chan_resources() has very similar code.
> > 
> > Since the driver today have this behavior would it not be best to first 
> > make sure it functions as expected and then as a second step see if we 
> > can remove it all together?
> > 
> > Vinod would you be strongly opposed to picking up this series as is?
> 
> If there are no objections then I don't mind picking, please do rebase on
> -rc1 and resend

Thanks, I have sent out a rebased v2.

> 
> -- 
> ~Vinod

-- 
Regards,
Niklas Söderlund

Thread overview: 14+ messages
2017-03-28 22:40 [PATCH 0/3] dmaengine: rcar-dmac: fix resource freeing synchronization Niklas Söderlund
2017-03-28 22:40 ` [PATCH 1/3] dmaengine: rcar-dmac: store channel IRQ in struct rcar_dmac_chan Niklas Söderlund
2017-03-28 22:40 ` [PATCH 2/3] dmaengine: rcar-dmac: implement device_synchronize() Niklas Söderlund
2017-03-28 22:40 ` [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources Niklas Söderlund
2017-03-29 12:31   ` Geert Uytterhoeven
2017-03-29 13:30     ` Niklas Söderlund
2017-03-30  7:38       ` Niklas Söderlund
2017-04-05  3:25         ` Vinod Koul
2017-04-05  9:14           ` Niklas Söderlund
2017-04-05 10:40             ` Geert Uytterhoeven
2017-04-07 11:33               ` Laurent Pinchart
2017-05-12 12:49                 ` Niklas Söderlund
2017-05-14 12:01                   ` Vinod Koul
2017-05-15 23:12                     ` Niklas Söderlund