Linux-Renesas-SoC Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2] ravb: On timeout disable IRQs to stop processing
@ 2020-05-18  4:54 Dirk Behme
  2020-05-18  5:12 ` Shashikant Suguni (RBEI/ECF1)
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Dirk Behme @ 2020-05-18  4:54 UTC (permalink / raw)
  To: sergei.shtylyov, linux-renesas-soc; +Cc: dirk.behme, Shashikant.Suguni

Analyzing [1] it seems there is a race condition where ravb_start_xmit()
can be called from interrupt while tx skbuffs are being torn down in
the scheduled timeout handling. The actual timeout work is done in
ravb_tx_timeout_work() during which the tx skbuffs are torn down via
invocations of ravb_ring_free(). But there seems to be no flag to tell
the driver it is shutting down so it continues to use the ring buffer
when it should not.

Fix this by disabling the interrupts in the timeout handler.

[1]

-- cut --
ravb e6800000.ethernet ethernet: transmit timed out, status 00000000, resetting...
ravb e6800000.ethernet ethernet: failed to switch device to config mode
Unable to handle kernel NULL pointer dereference at virtual address 00000018
Mem abort info:
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000046
  CM = 0, WnR = 1
user pgtable: 4k pages, 48-bit VAs, pgd = ffff80065622f000
[0000000000000018] *pgd=00000006962a7003
, *pud=00000006962a8003
, *pmd=0000000000000000
Internal error: Oops: 96000046 [#1] PREEMPT SMP
Modules linked in:
...
Process Thread1 (pid: 3132, stack limit = 0xffff000027dd0000)
CPU: 2 PID: 3132 Comm: Thread1 Tainted: G        WC 4.14.130-ltsi-g28acae87 #1
Hardware name: Board based on r8a7796 (DT)
task: ffff80064f2aaa00 task.stack: ffff000027dd0000
PC is at ravb_start_xmit+0x138/0x5a0
LR is at ravb_start_xmit+0x40/0x5a0
pc : [<ffff0000084d6324>] lr : [<ffff0000084d622c>] pstate: 600001c5
sp : ffff000027dd3550
x29: ffff000027dd3550
x28: 0000000000000076
x27: ffff80061035ff00
x26: ffff000027dd3694
x25: ffff80069624f800
x24: ffff80069624f000
x23: 0000000000000003
x22: 0000000000000001
x21: ffff80069624f000
x20: 0000000000000000
x19: ffff80069624f000
x18: 0000000000000014
x17: 0000ffff9b90ddb0
x16: ffff00000867d07c
x15: 0000155107b31031
x14: 000409000c000000
x13: 0000000003000001
x12: 0100050010000001
x11: 0000000003000001
x10: 0100010010000001
x9 : 20000000000000c0
x8 : 0000000000000000
x7 : ffff8006656f9388
x6 : 0000000000000002
x5 : 0000000000000000
x4 : ffff8006656f929c
x3 : ffff000027dd3694
x2 : 0000000000000018
x1 : 0000000000000000
x0 : 0000000000000003
Call trace:
Exception stack(0xffff000027dd3410 to 0xffff000027dd3550)
3400:                                   0000000000000003 0000000000000000
3420: 0000000000000018 ffff000027dd3694 ffff8006656f929c 0000000000000000
3440: 0000000000000002 ffff8006656f9388 0000000000000000 20000000000000c0
3460: 0100010010000001 0000000003000001 0100050010000001 0000000003000001
3480: 000409000c000000 0000155107b31031 ffff00000867d07c 0000ffff9b90ddb0
34a0: 0000000000000014 ffff80069624f000 0000000000000000 ffff80069624f000
34c0: 0000000000000001 0000000000000003 ffff80069624f000 ffff80069624f800
34e0: ffff000027dd3694 ffff80061035ff00 0000000000000076 ffff000027dd3550
3500: ffff0000084d622c ffff000027dd3550 ffff0000084d6324 00000000600001c5
3520: ffff00000921d008 ffff8006159b0d00 0000ffffffffffff ffff0000084d622c
3540: ffff000027dd3550 ffff0000084d6324
[<ffff0000084d6324>] ravb_start_xmit+0x138/0x5a0
[<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
[<ffff0000086c51c0>] sch_direct_xmit+0xb0/0x1a8
[<ffff0000086c54cc>] __qdisc_run+0x214/0x2ec
[<ffff00000869a348>] __dev_queue_xmit+0x35c/0x5b4
[<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
[<ffff000000b9765c>] register_vlan_dev+0xc74/0x10f8 [8021q]
[<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
[<ffff00000869a438>] __dev_queue_xmit+0x44c/0x5b4
[<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
[<ffff0000086a5b9c>] neigh_connected_output+0xc0/0xe4
[<ffff0000086d6bd8>] ip_finish_output2+0x3c0/0x3fc
[<ffff0000086d8178>] ip_finish_output+0xf8/0x1c4
[<ffff0000086d8b74>] ip_mc_output+0x258/0x308
[<ffff0000086d8400>] ip_local_out+0x44/0x54
[<ffff0000086d9594>] ip_send_skb+0x1c/0xa8
[<ffff0000086fea4c>] udp_send_skb+0x11c/0x244
[<ffff000008701084>] udp_sendmsg+0x534/0x6bc
[<ffff00000870b390>] inet_sendmsg+0x40/0xe0
[<ffff00000867b588>] sock_sendmsg+0x3c/0x58
[<ffff00000867bf18>] ___sys_sendmsg+0x228/0x278
[<ffff00000867d03c>] __sys_sendmsg+0x58/0x98
[<ffff00000867d08c>] SyS_sendmsg+0x10/0x20
Exception stack(0xffff000027dd3ec0 to 0xffff000027dd4000)
3ec0: 0000000000000012 0000ffff8a7fdf18 0000000000004000 0000000000000000
3ee0: 0000ffff8a7ff258 0000ffff8a7ff150 0000ffff8a7ff840 0000000000000000
3f00: 00000000000000d3 0100010010000001 0000000003000001 0100050010000001
3f20: 0000000003000001 000409000c000000 0000000000000047 0000155107b31031
3f40: 0000ffff9b67dfb8 0000ffff9b90ddb0 0000000000000014 0000000000004000
3f60: 0000000000000012 0000ffff8a7fdf18 0000ffff780017b0 0000000000004000
3f80: 0000000000000010 0000000000000001 0000ffff8a7fdf00 0000ffff9000b770
3fa0: 0000000000000012 0000ffff8a7fde60 0000ffff9b90de10 0000ffff8a7fde60
3fc0: 0000ffff9b90de28 0000000080000000 0000000000000012 00000000000000d3
3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[<ffff0000080832c0>] el0_svc_naked+0x34/0x38
Code: d37d7c01 d37d7c02 f90037a1 f9445f01 (f822683b)
---[ end trace eabda93d178d5bcb ]---
Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x1802004
Memory Limit: 6144 MB
-- cut --

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
---

Changes in v2: Dropped the RFC and added Sergei's Reviewed-by. No functional
               change.

 drivers/net/ethernet/renesas/ravb_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 067ad25553b9..0f91ab41b22b 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1447,6 +1447,11 @@ static void ravb_tx_timeout_work(struct work_struct *work)
 
 	netif_tx_stop_all_queues(ndev);
 
+	/* Disable interrupts by clearing the interrupt masks. */
+	ravb_write(ndev, 0, RIC0);
+	ravb_write(ndev, 0, RIC2);
+	ravb_write(ndev, 0, TIC);
+
 	/* Stop PTP Clock driver */
 	if (priv->chip_id == RCAR_GEN2)
 		ravb_ptp_stop(ndev);
-- 
2.20.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-18  4:54 [PATCH v2] ravb: On timeout disable IRQs to stop processing Dirk Behme
@ 2020-05-18  5:12 ` Shashikant Suguni (RBEI/ECF1)
  2020-05-18  8:36 ` Sergei Shtylyov
  2020-05-18  8:44 ` Sergei Shtylyov
  2 siblings, 0 replies; 9+ messages in thread
From: Shashikant Suguni (RBEI/ECF1) @ 2020-05-18  5:12 UTC (permalink / raw)
  To: Behme Dirk (CM/ESO2),
	sergei.shtylyov, linux-renesas-soc,
	Sasikumar Natarajan (RBEI/ECF1)

+ Sasikumar
	
Best regards,

Suguni Shashikant
RBEI/ECF1  

Tel. +91 80 6136-4404 


-----Original Message-----
From: Behme Dirk (CM/ESO2) <Dirk.Behme@de.bosch.com> 
Sent: Monday, May 18, 2020 10:25 AM
To: sergei.shtylyov@cogentembedded.com; linux-renesas-soc@vger.kernel.org
Cc: Behme Dirk (CM/ESO2) <Dirk.Behme@de.bosch.com>; Shashikant Suguni (RBEI/ECF1) <Shashikant.Suguni@in.bosch.com>
Subject: [PATCH v2] ravb: On timeout disable IRQs to stop processing

Analyzing [1] it seems there is a race condition where ravb_start_xmit() can be called from interrupt while tx skbuffs are being torn down in the scheduled timeout handling. The actual timeout work is done in
ravb_tx_timeout_work() during which the tx skbuffs are torn down via invocations of ravb_ring_free(). But there seems to be no flag to tell the driver it is shutting down so it continues to use the ring buffer when it should not.

Fix this by disabling the interrupts in the timeout handler.

[1]

-- cut --
ravb e6800000.ethernet ethernet: transmit timed out, status 00000000, resetting...
ravb e6800000.ethernet ethernet: failed to switch device to config mode Unable to handle kernel NULL pointer dereference at virtual address 00000018 Mem abort info:
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000046
  CM = 0, WnR = 1
user pgtable: 4k pages, 48-bit VAs, pgd = ffff80065622f000 [0000000000000018] *pgd=00000006962a7003 , *pud=00000006962a8003 , *pmd=0000000000000000 Internal error: Oops: 96000046 [#1] PREEMPT SMP Modules linked in:
...
Process Thread1 (pid: 3132, stack limit = 0xffff000027dd0000)
CPU: 2 PID: 3132 Comm: Thread1 Tainted: G        WC 4.14.130-ltsi-g28acae87 #1
Hardware name: Board based on r8a7796 (DT)
task: ffff80064f2aaa00 task.stack: ffff000027dd0000 PC is at ravb_start_xmit+0x138/0x5a0 LR is at ravb_start_xmit+0x40/0x5a0 pc : [<ffff0000084d6324>] lr : [<ffff0000084d622c>] pstate: 600001c5 sp : ffff000027dd3550
x29: ffff000027dd3550
x28: 0000000000000076
x27: ffff80061035ff00
x26: ffff000027dd3694
x25: ffff80069624f800
x24: ffff80069624f000
x23: 0000000000000003
x22: 0000000000000001
x21: ffff80069624f000
x20: 0000000000000000
x19: ffff80069624f000
x18: 0000000000000014
x17: 0000ffff9b90ddb0
x16: ffff00000867d07c
x15: 0000155107b31031
x14: 000409000c000000
x13: 0000000003000001
x12: 0100050010000001
x11: 0000000003000001
x10: 0100010010000001
x9 : 20000000000000c0
x8 : 0000000000000000
x7 : ffff8006656f9388
x6 : 0000000000000002
x5 : 0000000000000000
x4 : ffff8006656f929c
x3 : ffff000027dd3694
x2 : 0000000000000018
x1 : 0000000000000000
x0 : 0000000000000003
Call trace:
Exception stack(0xffff000027dd3410 to 0xffff000027dd3550)
3400:                                   0000000000000003 0000000000000000
3420: 0000000000000018 ffff000027dd3694 ffff8006656f929c 0000000000000000
3440: 0000000000000002 ffff8006656f9388 0000000000000000 20000000000000c0
3460: 0100010010000001 0000000003000001 0100050010000001 0000000003000001
3480: 000409000c000000 0000155107b31031 ffff00000867d07c 0000ffff9b90ddb0
34a0: 0000000000000014 ffff80069624f000 0000000000000000 ffff80069624f000
34c0: 0000000000000001 0000000000000003 ffff80069624f000 ffff80069624f800
34e0: ffff000027dd3694 ffff80061035ff00 0000000000000076 ffff000027dd3550
3500: ffff0000084d622c ffff000027dd3550 ffff0000084d6324 00000000600001c5
3520: ffff00000921d008 ffff8006159b0d00 0000ffffffffffff ffff0000084d622c
3540: ffff000027dd3550 ffff0000084d6324
[<ffff0000084d6324>] ravb_start_xmit+0x138/0x5a0 [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c [<ffff0000086c51c0>] sch_direct_xmit+0xb0/0x1a8 [<ffff0000086c54cc>] __qdisc_run+0x214/0x2ec [<ffff00000869a348>] __dev_queue_xmit+0x35c/0x5b4 [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18 [<ffff000000b9765c>] register_vlan_dev+0xc74/0x10f8 [8021q] [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c [<ffff00000869a438>] __dev_queue_xmit+0x44c/0x5b4 [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18 [<ffff0000086a5b9c>] neigh_connected_output+0xc0/0xe4 [<ffff0000086d6bd8>] ip_finish_output2+0x3c0/0x3fc [<ffff0000086d8178>] ip_finish_output+0xf8/0x1c4 [<ffff0000086d8b74>] ip_mc_output+0x258/0x308 [<ffff0000086d8400>] ip_local_out+0x44/0x54 [<ffff0000086d9594>] ip_send_skb+0x1c/0xa8 [<ffff0000086fea4c>] udp_send_skb+0x11c/0x244 [<ffff000008701084>] udp_sendmsg+0x534/0x6bc [<ffff00000870b390>] inet_sendmsg+0x40/0xe0 [<ffff00000867b588>] sock_sendmsg+0x3c/0x58 [<ffff00000867bf18>] ___sys_sendmsg+0x228/0x278 [<ffff00000867d03c>] __sys_sendmsg+0x58/0x98 [<ffff00000867d08c>] SyS_sendmsg+0x10/0x20 Exception stack(0xffff000027dd3ec0 to 0xffff000027dd4000)
3ec0: 0000000000000012 0000ffff8a7fdf18 0000000000004000 0000000000000000
3ee0: 0000ffff8a7ff258 0000ffff8a7ff150 0000ffff8a7ff840 0000000000000000
3f00: 00000000000000d3 0100010010000001 0000000003000001 0100050010000001
3f20: 0000000003000001 000409000c000000 0000000000000047 0000155107b31031
3f40: 0000ffff9b67dfb8 0000ffff9b90ddb0 0000000000000014 0000000000004000
3f60: 0000000000000012 0000ffff8a7fdf18 0000ffff780017b0 0000000000004000
3f80: 0000000000000010 0000000000000001 0000ffff8a7fdf00 0000ffff9000b770
3fa0: 0000000000000012 0000ffff8a7fde60 0000ffff9b90de10 0000ffff8a7fde60
3fc0: 0000ffff9b90de28 0000000080000000 0000000000000012 00000000000000d3
3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [<ffff0000080832c0>] el0_svc_naked+0x34/0x38
Code: d37d7c01 d37d7c02 f90037a1 f9445f01 (f822683b) ---[ end trace eabda93d178d5bcb ]--- Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x1802004
Memory Limit: 6144 MB
-- cut --

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
---

Changes in v2: Dropped the RFC and added Sergei's Reviewed-by. No functional
               change.

 drivers/net/ethernet/renesas/ravb_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 067ad25553b9..0f91ab41b22b 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1447,6 +1447,11 @@ static void ravb_tx_timeout_work(struct work_struct *work)
 
 	netif_tx_stop_all_queues(ndev);
 
+	/* Disable interrupts by clearing the interrupt masks. */
+	ravb_write(ndev, 0, RIC0);
+	ravb_write(ndev, 0, RIC2);
+	ravb_write(ndev, 0, TIC);
+
 	/* Stop PTP Clock driver */
 	if (priv->chip_id == RCAR_GEN2)
 		ravb_ptp_stop(ndev);
--
2.20.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-18  4:54 [PATCH v2] ravb: On timeout disable IRQs to stop processing Dirk Behme
  2020-05-18  5:12 ` Shashikant Suguni (RBEI/ECF1)
@ 2020-05-18  8:36 ` Sergei Shtylyov
  2020-05-28  8:12   ` Behme Dirk (CM/ESO2)
  2020-05-18  8:44 ` Sergei Shtylyov
  2 siblings, 1 reply; 9+ messages in thread
From: Sergei Shtylyov @ 2020-05-18  8:36 UTC (permalink / raw)
  To: Dirk Behme, linux-renesas-soc; +Cc: Shashikant.Suguni

Hello!

On 18.05.2020 7:54, Dirk Behme wrote:

> Analyzing [1] it seems there is a race condition where ravb_start_xmit()
> can be called from interrupt while tx skbuffs are being torn down in
> the scheduled timeout handling. The actual timeout work is done in
> ravb_tx_timeout_work() during which the tx skbuffs are torn down via
> invocations of ravb_ring_free(). But there seems to be no flag to tell
> the driver it is shutting down so it continues to use the ring buffer
> when it should not.
> 
> Fix this by disabling the interrupts in the timeout handler.
> 
> [1]
> 
> -- cut --
> ravb e6800000.ethernet ethernet: transmit timed out, status 00000000, resetting...
> ravb e6800000.ethernet ethernet: failed to switch device to config mode
> Unable to handle kernel NULL pointer dereference at virtual address 00000018
> Mem abort info:
>    Exception class = DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
> Data abort info:
>    ISV = 0, ISS = 0x00000046
>    CM = 0, WnR = 1
> user pgtable: 4k pages, 48-bit VAs, pgd = ffff80065622f000
> [0000000000000018] *pgd=00000006962a7003
> , *pud=00000006962a8003
> , *pmd=0000000000000000
> Internal error: Oops: 96000046 [#1] PREEMPT SMP
> Modules linked in:
> ...
> Process Thread1 (pid: 3132, stack limit = 0xffff000027dd0000)
> CPU: 2 PID: 3132 Comm: Thread1 Tainted: G        WC 4.14.130-ltsi-g28acae87 #1
> Hardware name: Board based on r8a7796 (DT)
> task: ffff80064f2aaa00 task.stack: ffff000027dd0000
> PC is at ravb_start_xmit+0x138/0x5a0
> LR is at ravb_start_xmit+0x40/0x5a0
> pc : [<ffff0000084d6324>] lr : [<ffff0000084d622c>] pstate: 600001c5
> sp : ffff000027dd3550
> x29: ffff000027dd3550
> x28: 0000000000000076
> x27: ffff80061035ff00
> x26: ffff000027dd3694
> x25: ffff80069624f800
> x24: ffff80069624f000
> x23: 0000000000000003
> x22: 0000000000000001
> x21: ffff80069624f000
> x20: 0000000000000000
> x19: ffff80069624f000
> x18: 0000000000000014
> x17: 0000ffff9b90ddb0
> x16: ffff00000867d07c
> x15: 0000155107b31031
> x14: 000409000c000000
> x13: 0000000003000001
> x12: 0100050010000001
> x11: 0000000003000001
> x10: 0100010010000001
> x9 : 20000000000000c0
> x8 : 0000000000000000
> x7 : ffff8006656f9388
> x6 : 0000000000000002
> x5 : 0000000000000000
> x4 : ffff8006656f929c
> x3 : ffff000027dd3694
> x2 : 0000000000000018
> x1 : 0000000000000000
> x0 : 0000000000000003
> Call trace:
> Exception stack(0xffff000027dd3410 to 0xffff000027dd3550)
> 3400:                                   0000000000000003 0000000000000000
> 3420: 0000000000000018 ffff000027dd3694 ffff8006656f929c 0000000000000000
> 3440: 0000000000000002 ffff8006656f9388 0000000000000000 20000000000000c0
> 3460: 0100010010000001 0000000003000001 0100050010000001 0000000003000001
> 3480: 000409000c000000 0000155107b31031 ffff00000867d07c 0000ffff9b90ddb0
> 34a0: 0000000000000014 ffff80069624f000 0000000000000000 ffff80069624f000
> 34c0: 0000000000000001 0000000000000003 ffff80069624f000 ffff80069624f800
> 34e0: ffff000027dd3694 ffff80061035ff00 0000000000000076 ffff000027dd3550
> 3500: ffff0000084d622c ffff000027dd3550 ffff0000084d6324 00000000600001c5
> 3520: ffff00000921d008 ffff8006159b0d00 0000ffffffffffff ffff0000084d622c
> 3540: ffff000027dd3550 ffff0000084d6324
> [<ffff0000084d6324>] ravb_start_xmit+0x138/0x5a0
> [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
> [<ffff0000086c51c0>] sch_direct_xmit+0xb0/0x1a8
> [<ffff0000086c54cc>] __qdisc_run+0x214/0x2ec
> [<ffff00000869a348>] __dev_queue_xmit+0x35c/0x5b4
> [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
> [<ffff000000b9765c>] register_vlan_dev+0xc74/0x10f8 [8021q]
> [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
> [<ffff00000869a438>] __dev_queue_xmit+0x44c/0x5b4
> [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
> [<ffff0000086a5b9c>] neigh_connected_output+0xc0/0xe4
> [<ffff0000086d6bd8>] ip_finish_output2+0x3c0/0x3fc
> [<ffff0000086d8178>] ip_finish_output+0xf8/0x1c4
> [<ffff0000086d8b74>] ip_mc_output+0x258/0x308
> [<ffff0000086d8400>] ip_local_out+0x44/0x54
> [<ffff0000086d9594>] ip_send_skb+0x1c/0xa8
> [<ffff0000086fea4c>] udp_send_skb+0x11c/0x244
> [<ffff000008701084>] udp_sendmsg+0x534/0x6bc
> [<ffff00000870b390>] inet_sendmsg+0x40/0xe0
> [<ffff00000867b588>] sock_sendmsg+0x3c/0x58
> [<ffff00000867bf18>] ___sys_sendmsg+0x228/0x278
> [<ffff00000867d03c>] __sys_sendmsg+0x58/0x98
> [<ffff00000867d08c>] SyS_sendmsg+0x10/0x20
> Exception stack(0xffff000027dd3ec0 to 0xffff000027dd4000)
> 3ec0: 0000000000000012 0000ffff8a7fdf18 0000000000004000 0000000000000000
> 3ee0: 0000ffff8a7ff258 0000ffff8a7ff150 0000ffff8a7ff840 0000000000000000
> 3f00: 00000000000000d3 0100010010000001 0000000003000001 0100050010000001
> 3f20: 0000000003000001 000409000c000000 0000000000000047 0000155107b31031
> 3f40: 0000ffff9b67dfb8 0000ffff9b90ddb0 0000000000000014 0000000000004000
> 3f60: 0000000000000012 0000ffff8a7fdf18 0000ffff780017b0 0000000000004000
> 3f80: 0000000000000010 0000000000000001 0000ffff8a7fdf00 0000ffff9000b770
> 3fa0: 0000000000000012 0000ffff8a7fde60 0000ffff9b90de10 0000ffff8a7fde60
> 3fc0: 0000ffff9b90de28 0000000080000000 0000000000000012 00000000000000d3
> 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [<ffff0000080832c0>] el0_svc_naked+0x34/0x38
> Code: d37d7c01 d37d7c02 f90037a1 f9445f01 (f822683b)
> ---[ end trace eabda93d178d5bcb ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x1802004
> Memory Limit: 6144 MB
> -- cut --
> 
> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
[...]

    You must post this to netdev@vget.kernel.org, else the patch won't get 
applied, ever...

MBR, Sergei

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-18  4:54 [PATCH v2] ravb: On timeout disable IRQs to stop processing Dirk Behme
  2020-05-18  5:12 ` Shashikant Suguni (RBEI/ECF1)
  2020-05-18  8:36 ` Sergei Shtylyov
@ 2020-05-18  8:44 ` Sergei Shtylyov
  2020-05-22  5:57   ` Yoshihiro Shimoda
  2 siblings, 1 reply; 9+ messages in thread
From: Sergei Shtylyov @ 2020-05-18  8:44 UTC (permalink / raw)
  To: Dirk Behme, linux-renesas-soc; +Cc: Shashikant.Suguni

On 18.05.2020 7:54, Dirk Behme wrote:

> Analyzing [1] it seems there is a race condition where ravb_start_xmit()
 > can be called from interrupt while tx skbuffs are being torn down in
 > the scheduled timeout handling. The actual timeout work is done in
 > ravb_tx_timeout_work() during which the tx skbuffs are torn down via
 > invocations of ravb_ring_free(). But there seems to be no flag to tell
 > the driver it is shutting down so it continues to use the ring buffer
 > when it should not.
 >
 > Fix this by disabling the interrupts in the timeout handler.

    Hm, given that we stop all TX queues prior to tearing down the buffers,
it is somewhat strange that you see the driver's send path called...
    But disabling the interrupts seems the Right Thing anyways...

> [1]
> 
> -- cut --
> ravb e6800000.ethernet ethernet: transmit timed out, status 00000000, resetting...
> ravb e6800000.ethernet ethernet: failed to switch device to config mode
> Unable to handle kernel NULL pointer dereference at virtual address 00000018
> Mem abort info:
>    Exception class = DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
> Data abort info:
>    ISV = 0, ISS = 0x00000046
>    CM = 0, WnR = 1
> user pgtable: 4k pages, 48-bit VAs, pgd = ffff80065622f000
> [0000000000000018] *pgd=00000006962a7003
> , *pud=00000006962a8003
> , *pmd=0000000000000000
> Internal error: Oops: 96000046 [#1] PREEMPT SMP
> Modules linked in:
> ...
> Process Thread1 (pid: 3132, stack limit = 0xffff000027dd0000)
> CPU: 2 PID: 3132 Comm: Thread1 Tainted: G        WC 4.14.130-ltsi-g28acae87 #1
> Hardware name: Board based on r8a7796 (DT)
> task: ffff80064f2aaa00 task.stack: ffff000027dd0000
> PC is at ravb_start_xmit+0x138/0x5a0
> LR is at ravb_start_xmit+0x40/0x5a0
> pc : [<ffff0000084d6324>] lr : [<ffff0000084d622c>] pstate: 600001c5
> sp : ffff000027dd3550
> x29: ffff000027dd3550
> x28: 0000000000000076
> x27: ffff80061035ff00
> x26: ffff000027dd3694
> x25: ffff80069624f800
> x24: ffff80069624f000
> x23: 0000000000000003
> x22: 0000000000000001
> x21: ffff80069624f000
> x20: 0000000000000000
> x19: ffff80069624f000
> x18: 0000000000000014
> x17: 0000ffff9b90ddb0
> x16: ffff00000867d07c
> x15: 0000155107b31031
> x14: 000409000c000000
> x13: 0000000003000001
> x12: 0100050010000001
> x11: 0000000003000001
> x10: 0100010010000001
> x9 : 20000000000000c0
> x8 : 0000000000000000
> x7 : ffff8006656f9388
> x6 : 0000000000000002
> x5 : 0000000000000000
> x4 : ffff8006656f929c
> x3 : ffff000027dd3694
> x2 : 0000000000000018
> x1 : 0000000000000000
> x0 : 0000000000000003
> Call trace:
> Exception stack(0xffff000027dd3410 to 0xffff000027dd3550)
> 3400:                                   0000000000000003 0000000000000000
> 3420: 0000000000000018 ffff000027dd3694 ffff8006656f929c 0000000000000000
> 3440: 0000000000000002 ffff8006656f9388 0000000000000000 20000000000000c0
> 3460: 0100010010000001 0000000003000001 0100050010000001 0000000003000001
> 3480: 000409000c000000 0000155107b31031 ffff00000867d07c 0000ffff9b90ddb0
> 34a0: 0000000000000014 ffff80069624f000 0000000000000000 ffff80069624f000
> 34c0: 0000000000000001 0000000000000003 ffff80069624f000 ffff80069624f800
> 34e0: ffff000027dd3694 ffff80061035ff00 0000000000000076 ffff000027dd3550
> 3500: ffff0000084d622c ffff000027dd3550 ffff0000084d6324 00000000600001c5
> 3520: ffff00000921d008 ffff8006159b0d00 0000ffffffffffff ffff0000084d622c
> 3540: ffff000027dd3550 ffff0000084d6324
> [<ffff0000084d6324>] ravb_start_xmit+0x138/0x5a0
> [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
> [<ffff0000086c51c0>] sch_direct_xmit+0xb0/0x1a8
> [<ffff0000086c54cc>] __qdisc_run+0x214/0x2ec
> [<ffff00000869a348>] __dev_queue_xmit+0x35c/0x5b4
> [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
> [<ffff000000b9765c>] register_vlan_dev+0xc74/0x10f8 [8021q]
> [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
> [<ffff00000869a438>] __dev_queue_xmit+0x44c/0x5b4
> [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
> [<ffff0000086a5b9c>] neigh_connected_output+0xc0/0xe4
> [<ffff0000086d6bd8>] ip_finish_output2+0x3c0/0x3fc
> [<ffff0000086d8178>] ip_finish_output+0xf8/0x1c4
> [<ffff0000086d8b74>] ip_mc_output+0x258/0x308
> [<ffff0000086d8400>] ip_local_out+0x44/0x54
> [<ffff0000086d9594>] ip_send_skb+0x1c/0xa8
> [<ffff0000086fea4c>] udp_send_skb+0x11c/0x244
> [<ffff000008701084>] udp_sendmsg+0x534/0x6bc
> [<ffff00000870b390>] inet_sendmsg+0x40/0xe0
> [<ffff00000867b588>] sock_sendmsg+0x3c/0x58
> [<ffff00000867bf18>] ___sys_sendmsg+0x228/0x278
> [<ffff00000867d03c>] __sys_sendmsg+0x58/0x98
> [<ffff00000867d08c>] SyS_sendmsg+0x10/0x20
> Exception stack(0xffff000027dd3ec0 to 0xffff000027dd4000)
> 3ec0: 0000000000000012 0000ffff8a7fdf18 0000000000004000 0000000000000000
> 3ee0: 0000ffff8a7ff258 0000ffff8a7ff150 0000ffff8a7ff840 0000000000000000
> 3f00: 00000000000000d3 0100010010000001 0000000003000001 0100050010000001
> 3f20: 0000000003000001 000409000c000000 0000000000000047 0000155107b31031
> 3f40: 0000ffff9b67dfb8 0000ffff9b90ddb0 0000000000000014 0000000000004000
> 3f60: 0000000000000012 0000ffff8a7fdf18 0000ffff780017b0 0000000000004000
> 3f80: 0000000000000010 0000000000000001 0000ffff8a7fdf00 0000ffff9000b770
> 3fa0: 0000000000000012 0000ffff8a7fde60 0000ffff9b90de10 0000ffff8a7fde60
> 3fc0: 0000ffff9b90de28 0000000080000000 0000000000000012 00000000000000d3
> 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [<ffff0000080832c0>] el0_svc_naked+0x34/0x38
> Code: d37d7c01 d37d7c02 f90037a1 f9445f01 (f822683b)
> ---[ end trace eabda93d178d5bcb ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x1802004
> Memory Limit: 6144 MB
> -- cut --
> 
> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
> ---
> 
> Changes in v2: Dropped the RFC and added Sergei's Reviewed-by. No functional
>                 change.
> 
>   drivers/net/ethernet/renesas/ravb_main.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index 067ad25553b9..0f91ab41b22b 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1447,6 +1447,11 @@ static void ravb_tx_timeout_work(struct work_struct *work)
>   
>   	netif_tx_stop_all_queues(ndev);
>   
> +	/* Disable interrupts by clearing the interrupt masks. */
> +	ravb_write(ndev, 0, RIC0);
> +	ravb_write(ndev, 0, RIC2);
> +	ravb_write(ndev, 0, TIC);
> +
>   	/* Stop PTP Clock driver */
>   	if (priv->chip_id == RCAR_GEN2)
>   		ravb_ptp_stop(ndev);

MBR, Sergei

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-18  8:44 ` Sergei Shtylyov
@ 2020-05-22  5:57   ` Yoshihiro Shimoda
  2020-05-22 11:16     ` Yoshihiro Shimoda
  0 siblings, 1 reply; 9+ messages in thread
From: Yoshihiro Shimoda @ 2020-05-22  5:57 UTC (permalink / raw)
  To: Sergei Shtylyov, REE dirk.behme@de.bosch.com, linux-renesas-soc
  Cc: Shashikant.Suguni

Hello Sergei-san,

> From: Sergei Shtylyov, Sent: Monday, May 18, 2020 5:45 PM
> 
> On 18.05.2020 7:54, Dirk Behme wrote:
> 
> > Analyzing [1] it seems there is a race condition where ravb_start_xmit()
>  > can be called from interrupt while tx skbuffs are being torn down in
>  > the scheduled timeout handling. The actual timeout work is done in
>  > ravb_tx_timeout_work() during which the tx skbuffs are torn down via
>  > invocations of ravb_ring_free(). But there seems to be no flag to tell
>  > the driver it is shutting down so it continues to use the ring buffer
>  > when it should not.
>  >
>  > Fix this by disabling the interrupts in the timeout handler.
> 
>     Hm, given that we stop all TX queues prior to tearing down the buffers,
> it is somewhat strange that you see the driver's send path called...
>     But disabling the interrupts seems the Right Thing anyways...

I didn't reproduce this issue and I didn't know the root cause yet.
But, I'm feeling to strange. Especially, "ravb_start_xmit() can be called from interrupt".
I didn't find where ravb_start_xmit() can be called from interrupt for now.

By the way, I'm thinking the following message is related to the issue.
> > ravb e6800000.ethernet ethernet: failed to switch device to config mode

The ravb_config() output the message if failed. And, ravb_tx_timeout_work()
calls ravb_config() via ravb_stop_dma() and ravb_dmac_init().
---
ravb_tx_timeout_work() {
	ravb_stop_dma()		// call ravb_config()

	ravb_ring_free()	// *1
<snip>
	ravb_dmac_init()	// call ravb_config()
<snip>
}

ravb_dmac_init()
{
<snip>
	error = ravb_config()	// "failed to switch device to config modes" here and -ETIMEDOUT
	if (error)
		return error	// *2
	ravb_ring_init()	// *3
<snip>
}
--

If ravb_config() failed at the *2, since ravb_ring_init() was not called,
any descriptors were not allocated and then the issue should happen.
Note that according to the datasheet, the controller cannot change the
mode from "Operation" to Configuration" when the controller is doing
TX or RX.

I don't know why the first ravb_config() doesn't fail for now.
But, my scenario is one of functions below enables the TX and RX
by calling ravb_rcv_snd_enable():
 - ravb_emac_interrupt_unlocked() ... if ESCR_LCHNG && !no_avb_link && PSR_LMON
 - ravb_adjust_link() ... if no_avb_link && phydev->link
 - ravb_set_rx_csum() ... if enabling NETIF_F_RXCSUM

ravb_tx_timeout_work() calls ravb_stop_dma(). And, ravb_stop_dma() calls
ravb_rcv_snd_disable(). So, we assumed ravb_tx_timeout_work() didn't transfer
anymore. However, if one of the above functions was called, this is possible
to enable TX and RX.

If this scenario is true, I'm thinking we can fix the issue by using
spin_{un}lock_irq{restore,save}() between ravb_stop_dma() to ravb_dmac_init().
# ravb_ptp_init() calls spin_lock API so that we should not spin lock ravb_ptp_{init,stop}().

What do you think?

I'll try to reproduce this issue on my environment again...
# I'm thinking ravb_adjust_link() is possible to cause this issue.
# But, Salvator-X doesn't have renesas,no-ether-link so that I'll try to add it.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-22  5:57   ` Yoshihiro Shimoda
@ 2020-05-22 11:16     ` Yoshihiro Shimoda
  2020-05-22 11:58       ` Yoshihiro Shimoda
  0 siblings, 1 reply; 9+ messages in thread
From: Yoshihiro Shimoda @ 2020-05-22 11:16 UTC (permalink / raw)
  To: Sergei Shtylyov, REE dirk.behme@de.bosch.com, linux-renesas-soc
  Cc: Shashikant.Suguni

Hi again,

> From: Yoshihiro Shimoda, Sent: Friday, May 22, 2020 2:58 PM
> 
> Hello Sergei-san,
> 
> > From: Sergei Shtylyov, Sent: Monday, May 18, 2020 5:45 PM
> >
> > On 18.05.2020 7:54, Dirk Behme wrote:
> >
> > > Analyzing [1] it seems there is a race condition where ravb_start_xmit()
> >  > can be called from interrupt while tx skbuffs are being torn down in
> >  > the scheduled timeout handling. The actual timeout work is done in
> >  > ravb_tx_timeout_work() during which the tx skbuffs are torn down via
> >  > invocations of ravb_ring_free(). But there seems to be no flag to tell
> >  > the driver it is shutting down so it continues to use the ring buffer
> >  > when it should not.
> >  >
> >  > Fix this by disabling the interrupts in the timeout handler.
> >
> >     Hm, given that we stop all TX queues prior to tearing down the buffers,
> > it is somewhat strange that you see the driver's send path called...
> >     But disabling the interrupts seems the Right Thing anyways...
> 
> I didn't reproduce this issue and I didn't know the root cause yet.
> But, I'm feeling to strange. Especially, "ravb_start_xmit() can be called from interrupt".
> I didn't find where ravb_start_xmit() can be called from interrupt for now.
> 
> By the way, I'm thinking the following message is related to the issue.
> > > ravb e6800000.ethernet ethernet: failed to switch device to config mode
> 
> The ravb_config() output the message if failed. And, ravb_tx_timeout_work()
> calls ravb_config() via ravb_stop_dma() and ravb_dmac_init().
> ---
> ravb_tx_timeout_work() {
> 	ravb_stop_dma()		// call ravb_config()
> 
> 	ravb_ring_free()	// *1
> <snip>
> 	ravb_dmac_init()	// call ravb_config()
> <snip>
> }
> 
> ravb_dmac_init()
> {
> <snip>
> 	error = ravb_config()	// "failed to switch device to config modes" here and -ETIMEDOUT
> 	if (error)
> 		return error	// *2
> 	ravb_ring_init()	// *3
> <snip>
> }
> --
> 
> If ravb_config() failed at the *2, since ravb_ring_init() was not called,
> any descriptors were not allocated and then the issue should happen.
> Note that according to the datasheet, the controller cannot change the
> mode from "Operation" to Configuration" when the controller is doing
> TX or RX.

I'm afraid but, this scenario seems wrong because if the controller
enters "Configuration" once, the driver needs to change the mode to
"Operation" for starting any transfer again.

> I don't know why the first ravb_config() doesn't fail for now.
> But, my scenario is one of functions below enables the TX and RX
> by calling ravb_rcv_snd_enable():
>  - ravb_emac_interrupt_unlocked() ... if ESCR_LCHNG && !no_avb_link && PSR_LMON
>  - ravb_adjust_link() ... if no_avb_link && phydev->link
>  - ravb_set_rx_csum() ... if enabling NETIF_F_RXCSUM
> 
> ravb_tx_timeout_work() calls ravb_stop_dma(). And, ravb_stop_dma() calls
> ravb_rcv_snd_disable(). So, we assumed ravb_tx_timeout_work() didn't transfer
> anymore. However, if one of the above functions was called, this is possible
> to enable TX and RX.
> 
> If this scenario is true, I'm thinking we can fix the issue by using
> spin_{un}lock_irq{restore,save}() between ravb_stop_dma() to ravb_dmac_init().

This is also wrong because ravb_ring_init() call kcalloc() with GFP_KERNEL.

I'll try to reproduce this issue again in next week...

Best regards,
Yoshihiro Shimoda

> # ravb_ptp_init() calls spin_lock API so that we should not spin lock ravb_ptp_{init,stop}().
> 
> What do you think?
> 
> I'll try to reproduce this issue on my environment again...
> # I'm thinking ravb_adjust_link() is possible to cause this issue.
> # But, Salvator-X doesn't have renesas,no-ether-link so that I'll try to add it.
> 
> Best regards,
> Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-22 11:16     ` Yoshihiro Shimoda
@ 2020-05-22 11:58       ` Yoshihiro Shimoda
  2020-05-26  9:51         ` Yoshihiro Shimoda
  0 siblings, 1 reply; 9+ messages in thread
From: Yoshihiro Shimoda @ 2020-05-22 11:58 UTC (permalink / raw)
  To: Sergei Shtylyov, REE dirk.behme@de.bosch.com, linux-renesas-soc
  Cc: Shashikant.Suguni

Hi again and again,

> From: Yoshihiro Shimoda, Sent: Friday, May 22, 2020 8:17 PM
> 
> Hi again,
> 
> > From: Yoshihiro Shimoda, Sent: Friday, May 22, 2020 2:58 PM
> >
> > Hello Sergei-san,
> >
> > > From: Sergei Shtylyov, Sent: Monday, May 18, 2020 5:45 PM
> > >
> > > On 18.05.2020 7:54, Dirk Behme wrote:
> > >
> > > > Analyzing [1] it seems there is a race condition where ravb_start_xmit()
> > >  > can be called from interrupt while tx skbuffs are being torn down in
> > >  > the scheduled timeout handling. The actual timeout work is done in
> > >  > ravb_tx_timeout_work() during which the tx skbuffs are torn down via
> > >  > invocations of ravb_ring_free(). But there seems to be no flag to tell
> > >  > the driver it is shutting down so it continues to use the ring buffer
> > >  > when it should not.
> > >  >
> > >  > Fix this by disabling the interrupts in the timeout handler.
> > >
> > >     Hm, given that we stop all TX queues prior to tearing down the buffers,
> > > it is somewhat strange that you see the driver's send path called...
> > >     But disabling the interrupts seems the Right Thing anyways...
> >
> > I didn't reproduce this issue and I didn't know the root cause yet.
> > But, I'm feeling to strange. Especially, "ravb_start_xmit() can be called from interrupt".
> > I didn't find where ravb_start_xmit() can be called from interrupt for now.
> >
> > By the way, I'm thinking the following message is related to the issue.
> > > > ravb e6800000.ethernet ethernet: failed to switch device to config mode
> >
> > The ravb_config() output the message if failed. And, ravb_tx_timeout_work()
> > calls ravb_config() via ravb_stop_dma() and ravb_dmac_init().
> > ---
> > ravb_tx_timeout_work() {
> > 	ravb_stop_dma()		// call ravb_config()
> >
> > 	ravb_ring_free()	// *1
> > <snip>
> > 	ravb_dmac_init()	// call ravb_config()
> > <snip>
> > }
> >
> > ravb_dmac_init()
> > {
> > <snip>
> > 	error = ravb_config()	// "failed to switch device to config modes" here and -ETIMEDOUT
> > 	if (error)
> > 		return error	// *2
> > 	ravb_ring_init()	// *3
> > <snip>
> > }
> > --
> >
> > If ravb_config() failed at the *2, since ravb_ring_init() was not called,
> > any descriptors were not allocated and then the issue should happen.
> > Note that according to the datasheet, the controller cannot change the
> > mode from "Operation" to Configuration" when the controller is doing
> > TX or RX.
> 
> I'm afraid but, this scenario seems wrong because if the controller
> enters "Configuration" once, the driver needs to change the mode to
> "Operation" for starting any transfer again.

I realized that ravb_stop_dma() is possible to return without ravb_rcv_snd_disable() and
ravb_config() calling if TCCR or CSR register indicates transmit is started.

> > I don't know why the first ravb_config() doesn't fail for now.
> > But, my scenario is one of functions below enables the TX and RX
> > by calling ravb_rcv_snd_enable():
> >  - ravb_emac_interrupt_unlocked() ... if ESCR_LCHNG && !no_avb_link && PSR_LMON
> >  - ravb_adjust_link() ... if no_avb_link && phydev->link
> >  - ravb_set_rx_csum() ... if enabling NETIF_F_RXCSUM

In such the case (ravb_stop_dma() returns without any message),
I think my scenario is still alive without any these functions calling
because ravb_rcv_snd_disable() is not called.

I'll check the TCCR and CSR registers in next week.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-22 11:58       ` Yoshihiro Shimoda
@ 2020-05-26  9:51         ` Yoshihiro Shimoda
  0 siblings, 0 replies; 9+ messages in thread
From: Yoshihiro Shimoda @ 2020-05-26  9:51 UTC (permalink / raw)
  To: Sergei Shtylyov, REE dirk.behme@de.bosch.com, linux-renesas-soc
  Cc: Shashikant.Suguni

Hi again,

> From: Yoshihiro Shimoda, Sent: Friday, May 22, 2020 8:59 PM
> In such the case (ravb_stop_dma() returns without any message),
> I think my scenario is still alive without any these functions calling
> because ravb_rcv_snd_disable() is not called.
> 
> I'll check the TCCR and CSR registers in next week.

I investigated these registers' feature and it is possible to return as my expectation.

So, I have submitted a patch as RFC like below.
https://patchwork.kernel.org/patch/11570217/

If you have any feedback about the patch, please let me know on the email.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] ravb: On timeout disable IRQs to stop processing
  2020-05-18  8:36 ` Sergei Shtylyov
@ 2020-05-28  8:12   ` Behme Dirk (CM/ESO2)
  0 siblings, 0 replies; 9+ messages in thread
From: Behme Dirk (CM/ESO2) @ 2020-05-28  8:12 UTC (permalink / raw)
  To: Sergei Shtylyov, linux-renesas-soc; +Cc: Shashikant.Suguni



On 18.05.2020 10:36, Sergei Shtylyov wrote:
> Hello!
> 
> On 18.05.2020 7:54, Dirk Behme wrote:
> 
>> Analyzing [1] it seems there is a race condition where ravb_start_xmit()
>> can be called from interrupt while tx skbuffs are being torn down in
>> the scheduled timeout handling. The actual timeout work is done in
>> ravb_tx_timeout_work() during which the tx skbuffs are torn down via
>> invocations of ravb_ring_free(). But there seems to be no flag to tell
>> the driver it is shutting down so it continues to use the ring buffer
>> when it should not.
>>
>> Fix this by disabling the interrupts in the timeout handler.
>>
>> [1]
>>
>> -- cut --
>> ravb e6800000.ethernet ethernet: transmit timed out, status 00000000, 
>> resetting...
>> ravb e6800000.ethernet ethernet: failed to switch device to config mode
>> Unable to handle kernel NULL pointer dereference at virtual address 
>> 00000018
>> Mem abort info:
>>    Exception class = DABT (current EL), IL = 32 bits
>>    SET = 0, FnV = 0
>>    EA = 0, S1PTW = 0
>> Data abort info:
>>    ISV = 0, ISS = 0x00000046
>>    CM = 0, WnR = 1
>> user pgtable: 4k pages, 48-bit VAs, pgd = ffff80065622f000
>> [0000000000000018] *pgd=00000006962a7003
>> , *pud=00000006962a8003
>> , *pmd=0000000000000000
>> Internal error: Oops: 96000046 [#1] PREEMPT SMP
>> Modules linked in:
>> ...
>> Process Thread1 (pid: 3132, stack limit = 0xffff000027dd0000)
>> CPU: 2 PID: 3132 Comm: Thread1 Tainted: G        WC 
>> 4.14.130-ltsi-g28acae87 #1
>> Hardware name: Board based on r8a7796 (DT)
>> task: ffff80064f2aaa00 task.stack: ffff000027dd0000
>> PC is at ravb_start_xmit+0x138/0x5a0
>> LR is at ravb_start_xmit+0x40/0x5a0
>> pc : [<ffff0000084d6324>] lr : [<ffff0000084d622c>] pstate: 600001c5
>> sp : ffff000027dd3550
>> x29: ffff000027dd3550
>> x28: 0000000000000076
>> x27: ffff80061035ff00
>> x26: ffff000027dd3694
>> x25: ffff80069624f800
>> x24: ffff80069624f000
>> x23: 0000000000000003
>> x22: 0000000000000001
>> x21: ffff80069624f000
>> x20: 0000000000000000
>> x19: ffff80069624f000
>> x18: 0000000000000014
>> x17: 0000ffff9b90ddb0
>> x16: ffff00000867d07c
>> x15: 0000155107b31031
>> x14: 000409000c000000
>> x13: 0000000003000001
>> x12: 0100050010000001
>> x11: 0000000003000001
>> x10: 0100010010000001
>> x9 : 20000000000000c0
>> x8 : 0000000000000000
>> x7 : ffff8006656f9388
>> x6 : 0000000000000002
>> x5 : 0000000000000000
>> x4 : ffff8006656f929c
>> x3 : ffff000027dd3694
>> x2 : 0000000000000018
>> x1 : 0000000000000000
>> x0 : 0000000000000003
>> Call trace:
>> Exception stack(0xffff000027dd3410 to 0xffff000027dd3550)
>> 3400:                                   0000000000000003 0000000000000000
>> 3420: 0000000000000018 ffff000027dd3694 ffff8006656f929c 0000000000000000
>> 3440: 0000000000000002 ffff8006656f9388 0000000000000000 20000000000000c0
>> 3460: 0100010010000001 0000000003000001 0100050010000001 0000000003000001
>> 3480: 000409000c000000 0000155107b31031 ffff00000867d07c 0000ffff9b90ddb0
>> 34a0: 0000000000000014 ffff80069624f000 0000000000000000 ffff80069624f000
>> 34c0: 0000000000000001 0000000000000003 ffff80069624f000 ffff80069624f800
>> 34e0: ffff000027dd3694 ffff80061035ff00 0000000000000076 ffff000027dd3550
>> 3500: ffff0000084d622c ffff000027dd3550 ffff0000084d6324 00000000600001c5
>> 3520: ffff00000921d008 ffff8006159b0d00 0000ffffffffffff ffff0000084d622c
>> 3540: ffff000027dd3550 ffff0000084d6324
>> [<ffff0000084d6324>] ravb_start_xmit+0x138/0x5a0
>> [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
>> [<ffff0000086c51c0>] sch_direct_xmit+0xb0/0x1a8
>> [<ffff0000086c54cc>] __qdisc_run+0x214/0x2ec
>> [<ffff00000869a348>] __dev_queue_xmit+0x35c/0x5b4
>> [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
>> [<ffff000000b9765c>] register_vlan_dev+0xc74/0x10f8 [8021q]
>> [<ffff000008699d60>] dev_hard_start_xmit+0xa8/0x24c
>> [<ffff00000869a438>] __dev_queue_xmit+0x44c/0x5b4
>> [<ffff00000869a5b0>] dev_queue_xmit+0x10/0x18
>> [<ffff0000086a5b9c>] neigh_connected_output+0xc0/0xe4
>> [<ffff0000086d6bd8>] ip_finish_output2+0x3c0/0x3fc
>> [<ffff0000086d8178>] ip_finish_output+0xf8/0x1c4
>> [<ffff0000086d8b74>] ip_mc_output+0x258/0x308
>> [<ffff0000086d8400>] ip_local_out+0x44/0x54
>> [<ffff0000086d9594>] ip_send_skb+0x1c/0xa8
>> [<ffff0000086fea4c>] udp_send_skb+0x11c/0x244
>> [<ffff000008701084>] udp_sendmsg+0x534/0x6bc
>> [<ffff00000870b390>] inet_sendmsg+0x40/0xe0
>> [<ffff00000867b588>] sock_sendmsg+0x3c/0x58
>> [<ffff00000867bf18>] ___sys_sendmsg+0x228/0x278
>> [<ffff00000867d03c>] __sys_sendmsg+0x58/0x98
>> [<ffff00000867d08c>] SyS_sendmsg+0x10/0x20
>> Exception stack(0xffff000027dd3ec0 to 0xffff000027dd4000)
>> 3ec0: 0000000000000012 0000ffff8a7fdf18 0000000000004000 0000000000000000
>> 3ee0: 0000ffff8a7ff258 0000ffff8a7ff150 0000ffff8a7ff840 0000000000000000
>> 3f00: 00000000000000d3 0100010010000001 0000000003000001 0100050010000001
>> 3f20: 0000000003000001 000409000c000000 0000000000000047 0000155107b31031
>> 3f40: 0000ffff9b67dfb8 0000ffff9b90ddb0 0000000000000014 0000000000004000
>> 3f60: 0000000000000012 0000ffff8a7fdf18 0000ffff780017b0 0000000000004000
>> 3f80: 0000000000000010 0000000000000001 0000ffff8a7fdf00 0000ffff9000b770
>> 3fa0: 0000000000000012 0000ffff8a7fde60 0000ffff9b90de10 0000ffff8a7fde60
>> 3fc0: 0000ffff9b90de28 0000000080000000 0000000000000012 00000000000000d3
>> 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [<ffff0000080832c0>] el0_svc_naked+0x34/0x38
>> Code: d37d7c01 d37d7c02 f90037a1 f9445f01 (f822683b)
>> ---[ end trace eabda93d178d5bcb ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> SMP: stopping secondary CPUs
>> Kernel Offset: disabled
>> CPU features: 0x1802004
>> Memory Limit: 6144 MB
>> -- cut --
>>
>> Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
>> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>> Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
> [...]
> 
>     You must post this to netdev@vget.kernel.org, else the patch won't 
> get applied, ever...


If I understood correctly, this patch is replaced by

https://patchwork.kernel.org/patch/11570217/

Best regards

Dirk



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-18  4:54 [PATCH v2] ravb: On timeout disable IRQs to stop processing Dirk Behme
2020-05-18  5:12 ` Shashikant Suguni (RBEI/ECF1)
2020-05-18  8:36 ` Sergei Shtylyov
2020-05-28  8:12   ` Behme Dirk (CM/ESO2)
2020-05-18  8:44 ` Sergei Shtylyov
2020-05-22  5:57   ` Yoshihiro Shimoda
2020-05-22 11:16     ` Yoshihiro Shimoda
2020-05-22 11:58       ` Yoshihiro Shimoda
2020-05-26  9:51         ` Yoshihiro Shimoda

Linux-Renesas-SoC Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-renesas-soc/0 linux-renesas-soc/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-renesas-soc linux-renesas-soc/ https://lore.kernel.org/linux-renesas-soc \
		linux-renesas-soc@vger.kernel.org
	public-inbox-index linux-renesas-soc

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-renesas-soc


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git