linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] dmaengine: tegra: crash fix observed during dma client(UART) stress testing
@ 2016-05-03 12:14 Shardar Shariff Md
  2016-05-03 14:09 ` Jon Hunter
  0 siblings, 1 reply; 4+ messages in thread
From: Shardar Shariff Md @ 2016-05-03 12:14 UTC (permalink / raw)
  To: ldewangan, vinod.koul, dan.j.williams, swarren, thierry.reding,
	gnurou, dmaengine, linux-tegra, linux-kernel, jonathanh
  Cc: smohammed

During DMA client(UART) stress testing, observed below crash:

[  167.041591] Unable to handle kernel paging request at virtual address 00100108
[  167.048818] pgd = ffffffc0de7ee000
[  167.052222] [00100108] *pgd=0000000000000000
[  167.056513] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[  167.084048] Modules linked in:
[  167.087126] CPU: 0 PID: 1786 Comm: uarttest Tainted: G        W    3.10.33-gb76f6f9 #5
[  167.095040] task: ffffffc0a5ba6ac0 ti: ffffffc094380000 task.ti: ffffffc094380000
[  167.102529] PC is at tegra_dma_tasklet+0x50/0xf4
[  167.107148] LR is at tegra_dma_tasklet+0xc0/0xf4
[  167.111767] pc : [<ffffffc00044acc8>] lr : [<ffffffc00044ad38>] pstate: 800001c5
[  167.119155] sp : ffffffc094383a60
[  167.122469] x29: ffffffc094383a60 x28: 0000000000000000

Issue: UART RX channel DMA completion EOC(End of completion) interrupt
occurs and dma driver schedules tasklet() to execute callback function
and empty the cb_desc (callback descriptor). Before dma driver tasklet
runs, UART RX EORD (end of receive data) interrupt occurs. Here UART RX
ISR handler calls tegra_dma_terminate_all() and re-configures the DMA
for RX. While re-configuring, the cb_node data is re-initialized but the
cb_desc list is not emptied. Now when dma driver tasklet callback function
tries to check cb_desc and delete the cb_node (re-initialized node) kernel
crashes.

Fix: Empty the cb_desc data structure during tegra_dma_terminate_all()
routine if there are no pending transfers.

Signed-off-by: Shardar Shariff Md <smohammed@nvidia.com>
---
 drivers/dma/tegra20-apb-dma.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
index 3871f29..34bb4cd 100644
--- a/drivers/dma/tegra20-apb-dma.c
+++ b/drivers/dma/tegra20-apb-dma.c
@@ -751,10 +751,8 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
 	bool was_busy;
 
 	spin_lock_irqsave(&tdc->lock, flags);
-	if (list_empty(&tdc->pending_sg_req)) {
-		spin_unlock_irqrestore(&tdc->lock, flags);
-		return 0;
-	}
+	if (list_empty(&tdc->pending_sg_req))
+		goto empty_cblist;
 
 	if (!tdc->busy)
 		goto skip_dma_stop;
@@ -787,6 +785,7 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
 skip_dma_stop:
 	tegra_dma_abort_all(tdc);
 
+empty_cblist:
 	while (!list_empty(&tdc->cb_desc)) {
 		dma_desc  = list_first_entry(&tdc->cb_desc,
 					typeof(*dma_desc), cb_node);
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] dmaengine: tegra: crash fix observed during dma client(UART) stress testing
  2016-05-03 12:14 [PATCH] dmaengine: tegra: crash fix observed during dma client(UART) stress testing Shardar Shariff Md
@ 2016-05-03 14:09 ` Jon Hunter
  2016-05-03 14:44   ` Shardar Mohammed
  0 siblings, 1 reply; 4+ messages in thread
From: Jon Hunter @ 2016-05-03 14:09 UTC (permalink / raw)
  To: Shardar Shariff Md, ldewangan, vinod.koul, dan.j.williams,
	swarren, thierry.reding, gnurou, dmaengine, linux-tegra,
	linux-kernel


On 03/05/16 13:14, Shardar Shariff Md wrote:
> During DMA client(UART) stress testing, observed below crash:
> 
> [  167.041591] Unable to handle kernel paging request at virtual address 00100108
> [  167.048818] pgd = ffffffc0de7ee000
> [  167.052222] [00100108] *pgd=0000000000000000
> [  167.056513] Internal error: Oops: 96000045 [#1] PREEMPT SMP
> [  167.084048] Modules linked in:
> [  167.087126] CPU: 0 PID: 1786 Comm: uarttest Tainted: G        W    3.10.33-gb76f6f9 #5
> [  167.095040] task: ffffffc0a5ba6ac0 ti: ffffffc094380000 task.ti: ffffffc094380000
> [  167.102529] PC is at tegra_dma_tasklet+0x50/0xf4
> [  167.107148] LR is at tegra_dma_tasklet+0xc0/0xf4
> [  167.111767] pc : [<ffffffc00044acc8>] lr : [<ffffffc00044ad38>] pstate: 800001c5
> [  167.119155] sp : ffffffc094383a60
> [  167.122469] x29: ffffffc094383a60 x28: 0000000000000000

This appears to be from quite an old kernel. I assume that this is still
valid for the latest mainline?

> Issue: UART RX channel DMA completion EOC(End of completion) interrupt
> occurs and dma driver schedules tasklet() to execute callback function
> and empty the cb_desc (callback descriptor). Before dma driver tasklet
> runs, UART RX EORD (end of receive data) interrupt occurs. Here UART RX
> ISR handler calls tegra_dma_terminate_all() and re-configures the DMA
> for RX. While re-configuring, the cb_node data is re-initialized but the
> cb_desc list is not emptied. Now when dma driver tasklet callback function
> tries to check cb_desc and delete the cb_node (re-initialized node) kernel
> crashes.

I am wondering if we can simplify the description a bit here.

Is the problem that the current implementation assumes that the tasklet
will run before the next transfer has been configured? And if this does
not happen then we may request the same descriptor for the next tranfer
that is currently on the callback queue waiting for the tasklet to run?

> Fix: Empty the cb_desc data structure during tegra_dma_terminate_all()
> routine if there are no pending transfers.

Does note really describe the fix. We are emptying a list of descriptors
that have a callback pending.

I would be tempted to change the subject slightly as this is fixing a
race condition that could be seen by various different clients.

Cheers
Jon

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] dmaengine: tegra: crash fix observed during dma client(UART) stress testing
  2016-05-03 14:09 ` Jon Hunter
@ 2016-05-03 14:44   ` Shardar Mohammed
  2016-05-03 15:17     ` Thierry Reding
  0 siblings, 1 reply; 4+ messages in thread
From: Shardar Mohammed @ 2016-05-03 14:44 UTC (permalink / raw)
  To: Jonathan Hunter, Laxman Dewangan, vinod.koul, dan.j.williams,
	swarren, thierry.reding, gnurou, dmaengine, linux-tegra,
	linux-kernel

Thanks for review. Please find my comments inline.

> On 03/05/16 13:14, Shardar Shariff Md wrote:
> > During DMA client(UART) stress testing, observed below crash:
> >
> > [  167.041591] Unable to handle kernel paging request at virtual
> > address 00100108 [  167.048818] pgd = ffffffc0de7ee000 [  167.052222]
> > [00100108] *pgd=0000000000000000 [  167.056513] Internal error: Oops:
> > 96000045 [#1] PREEMPT SMP [  167.084048] Modules linked in:
> > [  167.087126] CPU: 0 PID: 1786 Comm: uarttest Tainted: G        W    3.10.33-
> gb76f6f9 #5
> > [  167.095040] task: ffffffc0a5ba6ac0 ti: ffffffc094380000 task.ti:
> > ffffffc094380000 [  167.102529] PC is at tegra_dma_tasklet+0x50/0xf4 [
> > 167.107148] LR is at tegra_dma_tasklet+0xc0/0xf4 [  167.111767] pc :
> > [<ffffffc00044acc8>] lr : [<ffffffc00044ad38>] pstate: 800001c5 [
> > 167.119155] sp : ffffffc094383a60 [  167.122469] x29: ffffffc094383a60
> > x28: 0000000000000000
> 
> This appears to be from quite an old kernel. I assume that this is still valid for
> the latest mainline?
[Shardar] Yes.
> 
> > Issue: UART RX channel DMA completion EOC(End of completion) interrupt
> > occurs and dma driver schedules tasklet() to execute callback function
> > and empty the cb_desc (callback descriptor). Before dma driver tasklet
> > runs, UART RX EORD (end of receive data) interrupt occurs. Here UART
> > RX ISR handler calls tegra_dma_terminate_all() and re-configures the
> > DMA for RX. While re-configuring, the cb_node data is re-initialized
> > but the cb_desc list is not emptied. Now when dma driver tasklet
> > callback function tries to check cb_desc and delete the cb_node
> > (re-initialized node) kernel crashes.
> 
> I am wondering if we can simplify the description a bit here.
> 
[Shardar] Will modify and push new patch.

> Is the problem that the current implementation assumes that the tasklet will
> run before the next transfer has been configured? And if this does not
> happen then we may request the same descriptor for the next tranfer that is
> currently on the callback queue waiting for the tasklet to run?
[Shardar] Yes.
If tasklet has not yet run and meanwhile dma client (UART RX ISR for EORD case) isr handler runs then it calls the tegra_dma_terminate_all() and queries for the bytes transferred through dma and then acknowledges the dma_desc (async_tx_ack(rx_dma_desc)). As the dma desc is acknowledged the same descriptor is allocated to the next transfer and dma_desc is initialized (where cb_node list is initialized).
Now when the tasklet runs, channel callback descriptor queue will be pointed to reinitialized cb_node. Which will cause the crash when deleting cb_node.

> 
> > Fix: Empty the cb_desc data structure during tegra_dma_terminate_all()
> > routine if there are no pending transfers.
> 
> Does note really describe the fix. We are emptying a list of descriptors that
> have a callback pending.
> 
[Shardar] Will elaborate the fix and upload new patch set.
Here dma client (UART ISR) is calling the tegra_dma_terminate_all() before the tasklet is running, so there is no need to handle the functionality of tasklet i.e execute the callback function and deleting the callback nodes from the queue. So fix is to delete the callback queue when there are no pending transfers.

> I would be tempted to change the subject slightly as this is fixing a race
> condition that could be seen by various different clients.
> 
[Shardar] Will modify it.
> Cheers
> Jon

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] dmaengine: tegra: crash fix observed during dma client(UART) stress testing
  2016-05-03 14:44   ` Shardar Mohammed
@ 2016-05-03 15:17     ` Thierry Reding
  0 siblings, 0 replies; 4+ messages in thread
From: Thierry Reding @ 2016-05-03 15:17 UTC (permalink / raw)
  To: Shardar Mohammed
  Cc: Jonathan Hunter, Laxman Dewangan, vinod.koul, dan.j.williams,
	swarren, gnurou, dmaengine, linux-tegra, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]

On Tue, May 03, 2016 at 02:44:49PM +0000, Shardar Mohammed wrote:
> Thanks for review. Please find my comments inline.
> 
> > On 03/05/16 13:14, Shardar Shariff Md wrote:
> > > During DMA client(UART) stress testing, observed below crash:
> > >
> > > [  167.041591] Unable to handle kernel paging request at virtual
> > > address 00100108 [  167.048818] pgd = ffffffc0de7ee000 [  167.052222]
> > > [00100108] *pgd=0000000000000000 [  167.056513] Internal error: Oops:
> > > 96000045 [#1] PREEMPT SMP [  167.084048] Modules linked in:
> > > [  167.087126] CPU: 0 PID: 1786 Comm: uarttest Tainted: G        W    3.10.33-
> > gb76f6f9 #5
> > > [  167.095040] task: ffffffc0a5ba6ac0 ti: ffffffc094380000 task.ti:
> > > ffffffc094380000 [  167.102529] PC is at tegra_dma_tasklet+0x50/0xf4 [
> > > 167.107148] LR is at tegra_dma_tasklet+0xc0/0xf4 [  167.111767] pc :
> > > [<ffffffc00044acc8>] lr : [<ffffffc00044ad38>] pstate: 800001c5 [
> > > 167.119155] sp : ffffffc094383a60 [  167.122469] x29: ffffffc094383a60
> > > x28: 0000000000000000
> > 
> > This appears to be from quite an old kernel. I assume that this is still valid for
> > the latest mainline?
> [Shardar] Yes.

Please get into the habit of testing your patches against upstream
kernels. Merely testing that it applies is not enough.

All patches that are submitted for mainline inclusion must be *runtime*
tested against a mainline kernel.

Also it would be nice if you could offer steps on how to reproduce, that
way we can extend our testing to make sure we never regress.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-05-03 15:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-03 12:14 [PATCH] dmaengine: tegra: crash fix observed during dma client(UART) stress testing Shardar Shariff Md
2016-05-03 14:09 ` Jon Hunter
2016-05-03 14:44   ` Shardar Mohammed
2016-05-03 15:17     ` Thierry Reding

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).