linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
@ 2022-03-10  4:53 Jeremy Linton
  2022-03-10 17:29 ` Peter Robinson
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jeremy Linton @ 2022-03-10  4:53 UTC (permalink / raw)
  To: netdev
  Cc: opendmb, f.fainelli, davem, kuba, bcm-kernel-feedback-list,
	linux-kernel, Jeremy Linton, Peter Robinson

GCC12 appears to be much smarter about its dependency tracking and is
aware that the relaxed variants are just normal loads and stores and
this is causing problems like:

[  210.074549] ------------[ cut here ]------------
[  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out
[  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
[  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat]
[  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
[  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E     5.17.0-rc7G12+ #58
[  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters
[  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
[  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  210.161358] pc : dev_watchdog+0x234/0x240
[  210.161364] lr : dev_watchdog+0x234/0x240
[  210.161368] sp : ffff8000080a3a40
[  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20
[  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08
[  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000
[  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff
[  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a
[  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0
[  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c
[  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
[  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
[  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044
[  210.269682] Call trace:
[  210.272133]  dev_watchdog+0x234/0x240
[  210.275811]  call_timer_fn+0x3c/0x15c
[  210.279489]  __run_timers.part.0+0x288/0x310
[  210.283777]  run_timer_softirq+0x48/0x80
[  210.287716]  __do_softirq+0x128/0x360
[  210.291392]  __irq_exit_rcu+0x138/0x140
[  210.295243]  irq_exit_rcu+0x1c/0x30
[  210.298745]  el1_interrupt+0x38/0x54
[  210.302334]  el1h_64_irq_handler+0x18/0x24
[  210.306445]  el1h_64_irq+0x7c/0x80
[  210.309857]  arch_cpu_idle+0x18/0x2c
[  210.313445]  default_idle_call+0x4c/0x140
[  210.317470]  cpuidle_idle_call+0x14c/0x1a0
[  210.321584]  do_idle+0xb0/0x100
[  210.324737]  cpu_startup_entry+0x30/0x8c
[  210.328675]  secondary_start_kernel+0xe4/0x110
[  210.333138]  __secondary_switched+0x94/0x98

The assumption when these were relaxed seems to be that device memory
would be mapped non reordering, and that other constructs
(spinlocks/etc) would provide the barriers to assure that packet data
and in memory rings/queues were ordered with respect to device
register reads/writes. This itself seems a bit sketchy, but the real
problem with GCC12 is that it is moving the actual reads/writes around
at will as though they were independent operations when in truth they
are not, but the compiler can't know that. When looking at the
assembly dumps for many of these routines its possible to see very
clean, but not strictly in program order operations occurring as the
compiler would be free to do if these weren't actually register
reads/write operations.

Its possible to suppress the timeout with a liberal bit of dma_mb()'s
sprinkled around but the device still seems unable to reliably
send/receive data. A better plan is to use the safer readl/writel
everywhere.

Since this partially reverts an older commit, which notes the use of
the relaxed variants for performance reasons. I would suggest that
any performance problems with this commit are targeted at relaxing only
the performance critical code paths after assuring proper barriers.

Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
Reported-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 87f1056e29ff..e907a2df299c 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -76,7 +76,7 @@ static inline void bcmgenet_writel(u32 value, void __iomem *offset)
 	if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
 		__raw_writel(value, offset);
 	else
-		writel_relaxed(value, offset);
+		writel(value, offset);
 }
 
 static inline u32 bcmgenet_readl(void __iomem *offset)
@@ -84,7 +84,7 @@ static inline u32 bcmgenet_readl(void __iomem *offset)
 	if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
 		return __raw_readl(offset);
 	else
-		return readl_relaxed(offset);
+		return readl(offset);
 }
 
 static inline void dmadesc_set_length_status(struct bcmgenet_priv *priv,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-10  4:53 [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering Jeremy Linton
@ 2022-03-10 17:29 ` Peter Robinson
  2022-03-10 18:59 ` Florian Fainelli
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Robinson @ 2022-03-10 17:29 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: netdev, opendmb, f.fainelli, davem, kuba,
	bcm-kernel-feedback-list, linux-kernel

On Thu, Mar 10, 2022 at 4:54 AM Jeremy Linton <jeremy.linton@arm.com> wrote:
>
> GCC12 appears to be much smarter about its dependency tracking and is
> aware that the relaxed variants are just normal loads and stores and
> this is causing problems like:
>
> [  210.074549] ------------[ cut here ]------------
> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out
> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat]
> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E     5.17.0-rc7G12+ #58
> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters
> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  210.161358] pc : dev_watchdog+0x234/0x240
> [  210.161364] lr : dev_watchdog+0x234/0x240
> [  210.161368] sp : ffff8000080a3a40
> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20
> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08
> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000
> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff
> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a
> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0
> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c
> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044
> [  210.269682] Call trace:
> [  210.272133]  dev_watchdog+0x234/0x240
> [  210.275811]  call_timer_fn+0x3c/0x15c
> [  210.279489]  __run_timers.part.0+0x288/0x310
> [  210.283777]  run_timer_softirq+0x48/0x80
> [  210.287716]  __do_softirq+0x128/0x360
> [  210.291392]  __irq_exit_rcu+0x138/0x140
> [  210.295243]  irq_exit_rcu+0x1c/0x30
> [  210.298745]  el1_interrupt+0x38/0x54
> [  210.302334]  el1h_64_irq_handler+0x18/0x24
> [  210.306445]  el1h_64_irq+0x7c/0x80
> [  210.309857]  arch_cpu_idle+0x18/0x2c
> [  210.313445]  default_idle_call+0x4c/0x140
> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
> [  210.321584]  do_idle+0xb0/0x100
> [  210.324737]  cpu_startup_entry+0x30/0x8c
> [  210.328675]  secondary_start_kernel+0xe4/0x110
> [  210.333138]  __secondary_switched+0x94/0x98
>
> The assumption when these were relaxed seems to be that device memory
> would be mapped non reordering, and that other constructs
> (spinlocks/etc) would provide the barriers to assure that packet data
> and in memory rings/queues were ordered with respect to device
> register reads/writes. This itself seems a bit sketchy, but the real
> problem with GCC12 is that it is moving the actual reads/writes around
> at will as though they were independent operations when in truth they
> are not, but the compiler can't know that. When looking at the
> assembly dumps for many of these routines its possible to see very
> clean, but not strictly in program order operations occurring as the
> compiler would be free to do if these weren't actually register
> reads/write operations.
>
> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
> sprinkled around but the device still seems unable to reliably
> send/receive data. A better plan is to use the safer readl/writel
> everywhere.
>
> Since this partially reverts an older commit, which notes the use of
> the relaxed variants for performance reasons. I would suggest that
> any performance problems with this commit are targeted at relaxing only
> the performance critical code paths after assuring proper barriers.
>
> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Peter Robinson <pbrobinson@gmail.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>

This fixes the issue I'm seeing on Fedora.

> ---
>  drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 87f1056e29ff..e907a2df299c 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -76,7 +76,7 @@ static inline void bcmgenet_writel(u32 value, void __iomem *offset)
>         if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
>                 __raw_writel(value, offset);
>         else
> -               writel_relaxed(value, offset);
> +               writel(value, offset);
>  }
>
>  static inline u32 bcmgenet_readl(void __iomem *offset)
> @@ -84,7 +84,7 @@ static inline u32 bcmgenet_readl(void __iomem *offset)
>         if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
>                 return __raw_readl(offset);
>         else
> -               return readl_relaxed(offset);
> +               return readl(offset);
>  }
>
>  static inline void dmadesc_set_length_status(struct bcmgenet_priv *priv,
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-10  4:53 [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering Jeremy Linton
  2022-03-10 17:29 ` Peter Robinson
@ 2022-03-10 18:59 ` Florian Fainelli
  2022-03-11  1:09   ` Jeremy Linton
  2022-03-18 19:04 ` Florian Fainelli
  2022-03-29 15:07 ` Peter Robinson
  3 siblings, 1 reply; 14+ messages in thread
From: Florian Fainelli @ 2022-03-10 18:59 UTC (permalink / raw)
  To: Jeremy Linton, netdev
  Cc: opendmb, f.fainelli, davem, kuba, bcm-kernel-feedback-list,
	linux-kernel, Peter Robinson

On 3/9/22 8:53 PM, Jeremy Linton wrote:
> GCC12 appears to be much smarter about its dependency tracking and is
> aware that the relaxed variants are just normal loads and stores and
> this is causing problems like:
> 
> [  210.074549] ------------[ cut here ]------------
> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out
> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat]
> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E     5.17.0-rc7G12+ #58
> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters
> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  210.161358] pc : dev_watchdog+0x234/0x240
> [  210.161364] lr : dev_watchdog+0x234/0x240
> [  210.161368] sp : ffff8000080a3a40
> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20
> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08
> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000
> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff
> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a
> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0
> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c
> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044
> [  210.269682] Call trace:
> [  210.272133]  dev_watchdog+0x234/0x240
> [  210.275811]  call_timer_fn+0x3c/0x15c
> [  210.279489]  __run_timers.part.0+0x288/0x310
> [  210.283777]  run_timer_softirq+0x48/0x80
> [  210.287716]  __do_softirq+0x128/0x360
> [  210.291392]  __irq_exit_rcu+0x138/0x140
> [  210.295243]  irq_exit_rcu+0x1c/0x30
> [  210.298745]  el1_interrupt+0x38/0x54
> [  210.302334]  el1h_64_irq_handler+0x18/0x24
> [  210.306445]  el1h_64_irq+0x7c/0x80
> [  210.309857]  arch_cpu_idle+0x18/0x2c
> [  210.313445]  default_idle_call+0x4c/0x140
> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
> [  210.321584]  do_idle+0xb0/0x100
> [  210.324737]  cpu_startup_entry+0x30/0x8c
> [  210.328675]  secondary_start_kernel+0xe4/0x110
> [  210.333138]  __secondary_switched+0x94/0x98
> 
> The assumption when these were relaxed seems to be that device memory
> would be mapped non reordering, and that other constructs
> (spinlocks/etc) would provide the barriers to assure that packet data
> and in memory rings/queues were ordered with respect to device
> register reads/writes. This itself seems a bit sketchy, but the real
> problem with GCC12 is that it is moving the actual reads/writes around
> at will as though they were independent operations when in truth they
> are not, but the compiler can't know that. When looking at the
> assembly dumps for many of these routines its possible to see very
> clean, but not strictly in program order operations occurring as the
> compiler would be free to do if these weren't actually register
> reads/write operations.
> 
> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
> sprinkled around but the device still seems unable to reliably
> send/receive data. A better plan is to use the safer readl/writel
> everywhere.
> 
> Since this partially reverts an older commit, which notes the use of
> the relaxed variants for performance reasons. I would suggest that
> any performance problems with this commit are targeted at relaxing only
> the performance critical code paths after assuring proper barriers.
> 
> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

I think this is the correct approach in that it favors correctness over
speed, however there is an opportunity for maintaining the speed and
correctness on non-2711 and non-7712 chips where the GENET core is
interfaced to a system bus (GISB) that guarantees no re-ordering and no
buffering. I suppose that until we prove that the extra barrier is
harmful to performance on those chips, we should go with your patch.

It seems like we missed the GENET_IO_MACRO() in bcmgenet.h, while most
of them deal with the control path which likely does not have any
re-ordering problem, there is an exception to that which are the
intrl2_0 and intrl2_1 macros, which I believe *have* to be ordered as
well in order to avoid spurious or missed interrupts, or maybe there is
enough barriers in the interrupt processing code that this is moot?
-- 
Florian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-10 18:59 ` Florian Fainelli
@ 2022-03-11  1:09   ` Jeremy Linton
  2022-03-11  3:57     ` Florian Fainelli
  0 siblings, 1 reply; 14+ messages in thread
From: Jeremy Linton @ 2022-03-11  1:09 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: opendmb, davem, kuba, bcm-kernel-feedback-list, linux-kernel,
	Peter Robinson

On 3/10/22 12:59, Florian Fainelli wrote:
> On 3/9/22 8:53 PM, Jeremy Linton wrote:
>> GCC12 appears to be much smarter about its dependency tracking and is
>> aware that the relaxed variants are just normal loads and stores and
>> this is causing problems like:
>>
>> [  210.074549] ------------[ cut here ]------------
>> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out
>> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
>> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat]
>> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
>> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E     5.17.0-rc7G12+ #58
>> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters
>> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
>> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  210.161358] pc : dev_watchdog+0x234/0x240
>> [  210.161364] lr : dev_watchdog+0x234/0x240
>> [  210.161368] sp : ffff8000080a3a40
>> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20
>> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08
>> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000
>> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff
>> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a
>> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0
>> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c
>> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
>> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
>> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044
>> [  210.269682] Call trace:
>> [  210.272133]  dev_watchdog+0x234/0x240
>> [  210.275811]  call_timer_fn+0x3c/0x15c
>> [  210.279489]  __run_timers.part.0+0x288/0x310
>> [  210.283777]  run_timer_softirq+0x48/0x80
>> [  210.287716]  __do_softirq+0x128/0x360
>> [  210.291392]  __irq_exit_rcu+0x138/0x140
>> [  210.295243]  irq_exit_rcu+0x1c/0x30
>> [  210.298745]  el1_interrupt+0x38/0x54
>> [  210.302334]  el1h_64_irq_handler+0x18/0x24
>> [  210.306445]  el1h_64_irq+0x7c/0x80
>> [  210.309857]  arch_cpu_idle+0x18/0x2c
>> [  210.313445]  default_idle_call+0x4c/0x140
>> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
>> [  210.321584]  do_idle+0xb0/0x100
>> [  210.324737]  cpu_startup_entry+0x30/0x8c
>> [  210.328675]  secondary_start_kernel+0xe4/0x110
>> [  210.333138]  __secondary_switched+0x94/0x98
>>
>> The assumption when these were relaxed seems to be that device memory
>> would be mapped non reordering, and that other constructs
>> (spinlocks/etc) would provide the barriers to assure that packet data
>> and in memory rings/queues were ordered with respect to device
>> register reads/writes. This itself seems a bit sketchy, but the real
>> problem with GCC12 is that it is moving the actual reads/writes around
>> at will as though they were independent operations when in truth they
>> are not, but the compiler can't know that. When looking at the
>> assembly dumps for many of these routines its possible to see very
>> clean, but not strictly in program order operations occurring as the
>> compiler would be free to do if these weren't actually register
>> reads/write operations.
>>
>> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
>> sprinkled around but the device still seems unable to reliably
>> send/receive data. A better plan is to use the safer readl/writel
>> everywhere.
>>
>> Since this partially reverts an older commit, which notes the use of
>> the relaxed variants for performance reasons. I would suggest that
>> any performance problems with this commit are targeted at relaxing only
>> the performance critical code paths after assuring proper barriers.
>>
>> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
>> Reported-by: Peter Robinson <pbrobinson@gmail.com>
>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> 
> I think this is the correct approach in that it favors correctness over
> speed, however there is an opportunity for maintaining the speed and
> correctness on non-2711 and non-7712 chips where the GENET core is
> interfaced to a system bus (GISB) that guarantees no re-ordering and no
> buffering. I suppose that until we prove that the extra barrier is
> harmful to performance on those chips, we should go with your patch.
> 
> It seems like we missed the GENET_IO_MACRO() in bcmgenet.h, while most
> of them deal with the control path which likely does not have any
> re-ordering problem, there is an exception to that which are the
> intrl2_0 and intrl2_1 macros, which I believe *have* to be ordered as
> well in order to avoid spurious or missed interrupts, or maybe there is
> enough barriers in the interrupt processing code that this is moot?


Ok, so I spent some time and tracked down exactly which barrier "fixes" 
this immediate problem on the rpi4.

static void bcmgenet_enable_dma(struct bcmgenet_priv *priv, u32 dma_ctrl)
  {
         u32 reg;

+       dma_mb(); //timeout fix
         reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
         reg |= dma_ctrl;


fixes it as well, and keeps all the existing code. Although, granted I 
didn't stress the adapter beyond a couple interactive ssh sessions. And 
as you mention there are a fair number of other accessors that I didn't 
touch which are still relaxed.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-11  1:09   ` Jeremy Linton
@ 2022-03-11  3:57     ` Florian Fainelli
  2022-03-15 15:29       ` Peter Robinson
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Fainelli @ 2022-03-11  3:57 UTC (permalink / raw)
  To: Jeremy Linton, netdev
  Cc: opendmb, davem, kuba, bcm-kernel-feedback-list, linux-kernel,
	Peter Robinson



On 3/10/2022 5:09 PM, Jeremy Linton wrote:
> On 3/10/22 12:59, Florian Fainelli wrote:
>> On 3/9/22 8:53 PM, Jeremy Linton wrote:
>>> GCC12 appears to be much smarter about its dependency tracking and is
>>> aware that the relaxed variants are just normal loads and stores and
>>> this is causing problems like:
>>>
>>> [  210.074549] ------------[ cut here ]------------
>>> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit 
>>> queue 1 timed out
>>> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 
>>> dev_watchdog+0x234/0x240
>>> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 
>>> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
>>> nft_reject nft_ct nft_chain_nat]
>>> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
>>> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            
>>> E     5.17.0-rc7G12+ #58
>>> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read 
>>> perf counters
>>> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 
>>> Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
>>> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS 
>>> BTYPE=--)
>>> [  210.161358] pc : dev_watchdog+0x234/0x240
>>> [  210.161364] lr : dev_watchdog+0x234/0x240
>>> [  210.161368] sp : ffff8000080a3a40
>>> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: 
>>> ffff8000080a3b20
>>> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: 
>>> ffffcd425af8ec08
>>> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: 
>>> ffff55b142688000
>>> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: 
>>> ffffffffffffffff
>>> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 
>>> 6d736e617274203a
>>> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: 
>>> ffffcd425b07d5f0
>>> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : 
>>> ffffcd4258bdad9c
>>> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 
>>> 0000000000000000
>>> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 
>>> 0000000000001000
>>> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 
>>> 0000000000000044
>>> [  210.269682] Call trace:
>>> [  210.272133]  dev_watchdog+0x234/0x240
>>> [  210.275811]  call_timer_fn+0x3c/0x15c
>>> [  210.279489]  __run_timers.part.0+0x288/0x310
>>> [  210.283777]  run_timer_softirq+0x48/0x80
>>> [  210.287716]  __do_softirq+0x128/0x360
>>> [  210.291392]  __irq_exit_rcu+0x138/0x140
>>> [  210.295243]  irq_exit_rcu+0x1c/0x30
>>> [  210.298745]  el1_interrupt+0x38/0x54
>>> [  210.302334]  el1h_64_irq_handler+0x18/0x24
>>> [  210.306445]  el1h_64_irq+0x7c/0x80
>>> [  210.309857]  arch_cpu_idle+0x18/0x2c
>>> [  210.313445]  default_idle_call+0x4c/0x140
>>> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
>>> [  210.321584]  do_idle+0xb0/0x100
>>> [  210.324737]  cpu_startup_entry+0x30/0x8c
>>> [  210.328675]  secondary_start_kernel+0xe4/0x110
>>> [  210.333138]  __secondary_switched+0x94/0x98
>>>
>>> The assumption when these were relaxed seems to be that device memory
>>> would be mapped non reordering, and that other constructs
>>> (spinlocks/etc) would provide the barriers to assure that packet data
>>> and in memory rings/queues were ordered with respect to device
>>> register reads/writes. This itself seems a bit sketchy, but the real
>>> problem with GCC12 is that it is moving the actual reads/writes around
>>> at will as though they were independent operations when in truth they
>>> are not, but the compiler can't know that. When looking at the
>>> assembly dumps for many of these routines its possible to see very
>>> clean, but not strictly in program order operations occurring as the
>>> compiler would be free to do if these weren't actually register
>>> reads/write operations.
>>>
>>> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
>>> sprinkled around but the device still seems unable to reliably
>>> send/receive data. A better plan is to use the safer readl/writel
>>> everywhere.
>>>
>>> Since this partially reverts an older commit, which notes the use of
>>> the relaxed variants for performance reasons. I would suggest that
>>> any performance problems with this commit are targeted at relaxing only
>>> the performance critical code paths after assuring proper barriers.
>>>
>>> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
>>> Reported-by: Peter Robinson <pbrobinson@gmail.com>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>
>> I think this is the correct approach in that it favors correctness over
>> speed, however there is an opportunity for maintaining the speed and
>> correctness on non-2711 and non-7712 chips where the GENET core is
>> interfaced to a system bus (GISB) that guarantees no re-ordering and no
>> buffering. I suppose that until we prove that the extra barrier is
>> harmful to performance on those chips, we should go with your patch.
>>
>> It seems like we missed the GENET_IO_MACRO() in bcmgenet.h, while most
>> of them deal with the control path which likely does not have any
>> re-ordering problem, there is an exception to that which are the
>> intrl2_0 and intrl2_1 macros, which I believe *have* to be ordered as
>> well in order to avoid spurious or missed interrupts, or maybe there is
>> enough barriers in the interrupt processing code that this is moot?
> 
> 
> Ok, so I spent some time and tracked down exactly which barrier "fixes" 
> this immediate problem on the rpi4.
> 
> static void bcmgenet_enable_dma(struct bcmgenet_priv *priv, u32 dma_ctrl)
>   {
>          u32 reg;
> 
> +       dma_mb(); //timeout fix
>          reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
>          reg |= dma_ctrl;
> 
> 
> fixes it as well, and keeps all the existing code. Although, granted I 
> didn't stress the adapter beyond a couple interactive ssh sessions. And 
> as you mention there are a fair number of other accessors that I didn't 
> touch which are still relaxed.

Thanks! This is really helpful. Doug told me earlier today that he 
wanted to take a closer look since your initial approach while correct 
appears a bit heavy handed.
-- 
Florian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-11  3:57     ` Florian Fainelli
@ 2022-03-15 15:29       ` Peter Robinson
  2022-03-18 19:01         ` Florian Fainelli
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Robinson @ 2022-03-15 15:29 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jeremy Linton, netdev, opendmb, davem, kuba,
	bcm-kernel-feedback-list, linux-kernel

On Fri, Mar 11, 2022 at 3:57 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 3/10/2022 5:09 PM, Jeremy Linton wrote:
> > On 3/10/22 12:59, Florian Fainelli wrote:
> >> On 3/9/22 8:53 PM, Jeremy Linton wrote:
> >>> GCC12 appears to be much smarter about its dependency tracking and is
> >>> aware that the relaxed variants are just normal loads and stores and
> >>> this is causing problems like:
> >>>
> >>> [  210.074549] ------------[ cut here ]------------
> >>> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit
> >>> queue 1 timed out
> >>> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529
> >>> dev_watchdog+0x234/0x240
> >>> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4
> >>> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> >>> nft_reject nft_ct nft_chain_nat]
> >>> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
> >>> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G
> >>> E     5.17.0-rc7G12+ #58
> >>> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read
> >>> perf counters
> >>> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4
> >>> Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
> >>> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
> >>> BTYPE=--)
> >>> [  210.161358] pc : dev_watchdog+0x234/0x240
> >>> [  210.161364] lr : dev_watchdog+0x234/0x240
> >>> [  210.161368] sp : ffff8000080a3a40
> >>> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27:
> >>> ffff8000080a3b20
> >>> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24:
> >>> ffffcd425af8ec08
> >>> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21:
> >>> ffff55b142688000
> >>> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18:
> >>> ffffffffffffffff
> >>> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15:
> >>> 6d736e617274203a
> >>> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12:
> >>> ffffcd425b07d5f0
> >>> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 :
> >>> ffffcd4258bdad9c
> >>> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 :
> >>> 0000000000000000
> >>> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 :
> >>> 0000000000001000
> >>> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 :
> >>> 0000000000000044
> >>> [  210.269682] Call trace:
> >>> [  210.272133]  dev_watchdog+0x234/0x240
> >>> [  210.275811]  call_timer_fn+0x3c/0x15c
> >>> [  210.279489]  __run_timers.part.0+0x288/0x310
> >>> [  210.283777]  run_timer_softirq+0x48/0x80
> >>> [  210.287716]  __do_softirq+0x128/0x360
> >>> [  210.291392]  __irq_exit_rcu+0x138/0x140
> >>> [  210.295243]  irq_exit_rcu+0x1c/0x30
> >>> [  210.298745]  el1_interrupt+0x38/0x54
> >>> [  210.302334]  el1h_64_irq_handler+0x18/0x24
> >>> [  210.306445]  el1h_64_irq+0x7c/0x80
> >>> [  210.309857]  arch_cpu_idle+0x18/0x2c
> >>> [  210.313445]  default_idle_call+0x4c/0x140
> >>> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
> >>> [  210.321584]  do_idle+0xb0/0x100
> >>> [  210.324737]  cpu_startup_entry+0x30/0x8c
> >>> [  210.328675]  secondary_start_kernel+0xe4/0x110
> >>> [  210.333138]  __secondary_switched+0x94/0x98
> >>>
> >>> The assumption when these were relaxed seems to be that device memory
> >>> would be mapped non reordering, and that other constructs
> >>> (spinlocks/etc) would provide the barriers to assure that packet data
> >>> and in memory rings/queues were ordered with respect to device
> >>> register reads/writes. This itself seems a bit sketchy, but the real
> >>> problem with GCC12 is that it is moving the actual reads/writes around
> >>> at will as though they were independent operations when in truth they
> >>> are not, but the compiler can't know that. When looking at the
> >>> assembly dumps for many of these routines its possible to see very
> >>> clean, but not strictly in program order operations occurring as the
> >>> compiler would be free to do if these weren't actually register
> >>> reads/write operations.
> >>>
> >>> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
> >>> sprinkled around but the device still seems unable to reliably
> >>> send/receive data. A better plan is to use the safer readl/writel
> >>> everywhere.
> >>>
> >>> Since this partially reverts an older commit, which notes the use of
> >>> the relaxed variants for performance reasons. I would suggest that
> >>> any performance problems with this commit are targeted at relaxing only
> >>> the performance critical code paths after assuring proper barriers.
> >>>
> >>> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
> >>> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> >>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> >>
> >> I think this is the correct approach in that it favors correctness over
> >> speed, however there is an opportunity for maintaining the speed and
> >> correctness on non-2711 and non-7712 chips where the GENET core is
> >> interfaced to a system bus (GISB) that guarantees no re-ordering and no
> >> buffering. I suppose that until we prove that the extra barrier is
> >> harmful to performance on those chips, we should go with your patch.
> >>
> >> It seems like we missed the GENET_IO_MACRO() in bcmgenet.h, while most
> >> of them deal with the control path which likely does not have any
> >> re-ordering problem, there is an exception to that which are the
> >> intrl2_0 and intrl2_1 macros, which I believe *have* to be ordered as
> >> well in order to avoid spurious or missed interrupts, or maybe there is
> >> enough barriers in the interrupt processing code that this is moot?
> >
> >
> > Ok, so I spent some time and tracked down exactly which barrier "fixes"
> > this immediate problem on the rpi4.
> >
> > static void bcmgenet_enable_dma(struct bcmgenet_priv *priv, u32 dma_ctrl)
> >   {
> >          u32 reg;
> >
> > +       dma_mb(); //timeout fix
> >          reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
> >          reg |= dma_ctrl;
> >
> >
> > fixes it as well, and keeps all the existing code. Although, granted I
> > didn't stress the adapter beyond a couple interactive ssh sessions. And
> > as you mention there are a fair number of other accessors that I didn't
> > touch which are still relaxed.
>
> Thanks! This is really helpful. Doug told me earlier today that he
> wanted to take a closer look since your initial approach while correct
> appears a bit heavy handed.

With 5.17 due in a couple of days could we get a fix in so it works
for users and optimise the approach with a follow up so that it's not
broken for common device?

Peter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-15 15:29       ` Peter Robinson
@ 2022-03-18 19:01         ` Florian Fainelli
  2022-03-18 19:20           ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Fainelli @ 2022-03-18 19:01 UTC (permalink / raw)
  To: Peter Robinson, Florian Fainelli
  Cc: Jeremy Linton, netdev, opendmb, davem, kuba,
	bcm-kernel-feedback-list, linux-kernel

On 3/15/22 8:29 AM, Peter Robinson wrote:
> On Fri, Mar 11, 2022 at 3:57 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>
>>
>> On 3/10/2022 5:09 PM, Jeremy Linton wrote:
>>> On 3/10/22 12:59, Florian Fainelli wrote:
>>>> On 3/9/22 8:53 PM, Jeremy Linton wrote:
>>>>> GCC12 appears to be much smarter about its dependency tracking and is
>>>>> aware that the relaxed variants are just normal loads and stores and
>>>>> this is causing problems like:
>>>>>
>>>>> [  210.074549] ------------[ cut here ]------------
>>>>> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit
>>>>> queue 1 timed out
>>>>> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529
>>>>> dev_watchdog+0x234/0x240
>>>>> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4
>>>>> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
>>>>> nft_reject nft_ct nft_chain_nat]
>>>>> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
>>>>> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G
>>>>> E     5.17.0-rc7G12+ #58
>>>>> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read
>>>>> perf counters
>>>>> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4
>>>>> Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
>>>>> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
>>>>> BTYPE=--)
>>>>> [  210.161358] pc : dev_watchdog+0x234/0x240
>>>>> [  210.161364] lr : dev_watchdog+0x234/0x240
>>>>> [  210.161368] sp : ffff8000080a3a40
>>>>> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27:
>>>>> ffff8000080a3b20
>>>>> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24:
>>>>> ffffcd425af8ec08
>>>>> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21:
>>>>> ffff55b142688000
>>>>> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18:
>>>>> ffffffffffffffff
>>>>> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15:
>>>>> 6d736e617274203a
>>>>> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12:
>>>>> ffffcd425b07d5f0
>>>>> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 :
>>>>> ffffcd4258bdad9c
>>>>> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 :
>>>>> 0000000000000000
>>>>> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 :
>>>>> 0000000000001000
>>>>> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 :
>>>>> 0000000000000044
>>>>> [  210.269682] Call trace:
>>>>> [  210.272133]  dev_watchdog+0x234/0x240
>>>>> [  210.275811]  call_timer_fn+0x3c/0x15c
>>>>> [  210.279489]  __run_timers.part.0+0x288/0x310
>>>>> [  210.283777]  run_timer_softirq+0x48/0x80
>>>>> [  210.287716]  __do_softirq+0x128/0x360
>>>>> [  210.291392]  __irq_exit_rcu+0x138/0x140
>>>>> [  210.295243]  irq_exit_rcu+0x1c/0x30
>>>>> [  210.298745]  el1_interrupt+0x38/0x54
>>>>> [  210.302334]  el1h_64_irq_handler+0x18/0x24
>>>>> [  210.306445]  el1h_64_irq+0x7c/0x80
>>>>> [  210.309857]  arch_cpu_idle+0x18/0x2c
>>>>> [  210.313445]  default_idle_call+0x4c/0x140
>>>>> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
>>>>> [  210.321584]  do_idle+0xb0/0x100
>>>>> [  210.324737]  cpu_startup_entry+0x30/0x8c
>>>>> [  210.328675]  secondary_start_kernel+0xe4/0x110
>>>>> [  210.333138]  __secondary_switched+0x94/0x98
>>>>>
>>>>> The assumption when these were relaxed seems to be that device memory
>>>>> would be mapped non reordering, and that other constructs
>>>>> (spinlocks/etc) would provide the barriers to assure that packet data
>>>>> and in memory rings/queues were ordered with respect to device
>>>>> register reads/writes. This itself seems a bit sketchy, but the real
>>>>> problem with GCC12 is that it is moving the actual reads/writes around
>>>>> at will as though they were independent operations when in truth they
>>>>> are not, but the compiler can't know that. When looking at the
>>>>> assembly dumps for many of these routines its possible to see very
>>>>> clean, but not strictly in program order operations occurring as the
>>>>> compiler would be free to do if these weren't actually register
>>>>> reads/write operations.
>>>>>
>>>>> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
>>>>> sprinkled around but the device still seems unable to reliably
>>>>> send/receive data. A better plan is to use the safer readl/writel
>>>>> everywhere.
>>>>>
>>>>> Since this partially reverts an older commit, which notes the use of
>>>>> the relaxed variants for performance reasons. I would suggest that
>>>>> any performance problems with this commit are targeted at relaxing only
>>>>> the performance critical code paths after assuring proper barriers.
>>>>>
>>>>> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
>>>>> Reported-by: Peter Robinson <pbrobinson@gmail.com>
>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>>>
>>>> I think this is the correct approach in that it favors correctness over
>>>> speed, however there is an opportunity for maintaining the speed and
>>>> correctness on non-2711 and non-7712 chips where the GENET core is
>>>> interfaced to a system bus (GISB) that guarantees no re-ordering and no
>>>> buffering. I suppose that until we prove that the extra barrier is
>>>> harmful to performance on those chips, we should go with your patch.
>>>>
>>>> It seems like we missed the GENET_IO_MACRO() in bcmgenet.h, while most
>>>> of them deal with the control path which likely does not have any
>>>> re-ordering problem, there is an exception to that which are the
>>>> intrl2_0 and intrl2_1 macros, which I believe *have* to be ordered as
>>>> well in order to avoid spurious or missed interrupts, or maybe there is
>>>> enough barriers in the interrupt processing code that this is moot?
>>>
>>>
>>> Ok, so I spent some time and tracked down exactly which barrier "fixes"
>>> this immediate problem on the rpi4.
>>>
>>> static void bcmgenet_enable_dma(struct bcmgenet_priv *priv, u32 dma_ctrl)
>>>   {
>>>          u32 reg;
>>>
>>> +       dma_mb(); //timeout fix
>>>          reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
>>>          reg |= dma_ctrl;
>>>
>>>
>>> fixes it as well, and keeps all the existing code. Although, granted I
>>> didn't stress the adapter beyond a couple interactive ssh sessions. And
>>> as you mention there are a fair number of other accessors that I didn't
>>> touch which are still relaxed.
>>
>> Thanks! This is really helpful. Doug told me earlier today that he
>> wanted to take a closer look since your initial approach while correct
>> appears a bit heavy handed.
> 
> With 5.17 due in a couple of days could we get a fix in so it works
> for users and optimise the approach with a follow up so that it's not
> broken for common device?

Given the time crunch we should go with Jeremy's patch that uses
stronger I/O access method and then we will work with Jeremy offline to
make sure that our version of GCC12 is exactly the same as his, as well
as the compiler options (like -mtune/-march) to reproduce this.

If we believe this is only a problem with GCC12 and 5.17 in Fedora, then
I would be inclined to remove the Fixes tag such that when we come up
with a more localized solution we do not have to revert "net: bcmgenet:
Use stronger register read/writes to assure ordering" from stable
branches. This would be mostly a courtesy to our future selves, but an
argument could be made that this probably has always existed, and that
different compilers could behave more or less like GCC12.
-- 
Florian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-10  4:53 [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering Jeremy Linton
  2022-03-10 17:29 ` Peter Robinson
  2022-03-10 18:59 ` Florian Fainelli
@ 2022-03-18 19:04 ` Florian Fainelli
  2022-03-21 21:26   ` Jakub Kicinski
  2022-03-29 15:07 ` Peter Robinson
  3 siblings, 1 reply; 14+ messages in thread
From: Florian Fainelli @ 2022-03-18 19:04 UTC (permalink / raw)
  To: Jeremy Linton, netdev
  Cc: opendmb, f.fainelli, davem, kuba, bcm-kernel-feedback-list,
	linux-kernel, Peter Robinson

On 3/9/22 8:53 PM, Jeremy Linton wrote:
> GCC12 appears to be much smarter about its dependency tracking and is
> aware that the relaxed variants are just normal loads and stores and
> this is causing problems like:
> 
> [  210.074549] ------------[ cut here ]------------
> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out
> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat]
> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E     5.17.0-rc7G12+ #58
> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters
> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  210.161358] pc : dev_watchdog+0x234/0x240
> [  210.161364] lr : dev_watchdog+0x234/0x240
> [  210.161368] sp : ffff8000080a3a40
> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20
> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08
> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000
> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff
> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a
> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0
> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c
> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044
> [  210.269682] Call trace:
> [  210.272133]  dev_watchdog+0x234/0x240
> [  210.275811]  call_timer_fn+0x3c/0x15c
> [  210.279489]  __run_timers.part.0+0x288/0x310
> [  210.283777]  run_timer_softirq+0x48/0x80
> [  210.287716]  __do_softirq+0x128/0x360
> [  210.291392]  __irq_exit_rcu+0x138/0x140
> [  210.295243]  irq_exit_rcu+0x1c/0x30
> [  210.298745]  el1_interrupt+0x38/0x54
> [  210.302334]  el1h_64_irq_handler+0x18/0x24
> [  210.306445]  el1h_64_irq+0x7c/0x80
> [  210.309857]  arch_cpu_idle+0x18/0x2c
> [  210.313445]  default_idle_call+0x4c/0x140
> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
> [  210.321584]  do_idle+0xb0/0x100
> [  210.324737]  cpu_startup_entry+0x30/0x8c
> [  210.328675]  secondary_start_kernel+0xe4/0x110
> [  210.333138]  __secondary_switched+0x94/0x98
> 
> The assumption when these were relaxed seems to be that device memory
> would be mapped non reordering, and that other constructs
> (spinlocks/etc) would provide the barriers to assure that packet data
> and in memory rings/queues were ordered with respect to device
> register reads/writes. This itself seems a bit sketchy, but the real
> problem with GCC12 is that it is moving the actual reads/writes around
> at will as though they were independent operations when in truth they
> are not, but the compiler can't know that. When looking at the
> assembly dumps for many of these routines its possible to see very
> clean, but not strictly in program order operations occurring as the
> compiler would be free to do if these weren't actually register
> reads/write operations.
> 
> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
> sprinkled around but the device still seems unable to reliably
> send/receive data. A better plan is to use the safer readl/writel
> everywhere.
> 
> Since this partially reverts an older commit, which notes the use of
> the relaxed variants for performance reasons. I would suggest that
> any performance problems with this commit are targeted at relaxing only
> the performance critical code paths after assuring proper barriers.
> 
> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-18 19:01         ` Florian Fainelli
@ 2022-03-18 19:20           ` Jakub Kicinski
  2022-03-18 21:26             ` Florian Fainelli
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2022-03-18 19:20 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Peter Robinson, Jeremy Linton, netdev, opendmb, davem,
	bcm-kernel-feedback-list, linux-kernel

On Fri, 18 Mar 2022 12:01:20 -0700 Florian Fainelli wrote:
> Given the time crunch we should go with Jeremy's patch that uses
> stronger I/O access method and then we will work with Jeremy offline to
> make sure that our version of GCC12 is exactly the same as his, as well
> as the compiler options (like -mtune/-march) to reproduce this.
> 
> If we believe this is only a problem with GCC12 and 5.17 in Fedora, then
> I would be inclined to remove the Fixes tag such that when we come up
> with a more localized solution we do not have to revert "net: bcmgenet:
> Use stronger register read/writes to assure ordering" from stable
> branches. This would be mostly a courtesy to our future selves, but an
> argument could be made that this probably has always existed, and that
> different compilers could behave more or less like GCC12.

Are you expecting this patch to make 5.17? If Linus cuts final this
weekend, as he most likely will, unless we do something special the
patch in question will end up getting merged during the merge window.
Without the Fixes tag you'll need to manually instruct Greg to pull 
it in. Is that the plan?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-18 19:20           ` Jakub Kicinski
@ 2022-03-18 21:26             ` Florian Fainelli
  2022-03-19  0:22               ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Fainelli @ 2022-03-18 21:26 UTC (permalink / raw)
  To: Jakub Kicinski, Florian Fainelli
  Cc: Peter Robinson, Jeremy Linton, netdev, opendmb, davem,
	bcm-kernel-feedback-list, linux-kernel

On 3/18/22 12:20 PM, Jakub Kicinski wrote:
> On Fri, 18 Mar 2022 12:01:20 -0700 Florian Fainelli wrote:
>> Given the time crunch we should go with Jeremy's patch that uses
>> stronger I/O access method and then we will work with Jeremy offline to
>> make sure that our version of GCC12 is exactly the same as his, as well
>> as the compiler options (like -mtune/-march) to reproduce this.
>>
>> If we believe this is only a problem with GCC12 and 5.17 in Fedora, then
>> I would be inclined to remove the Fixes tag such that when we come up
>> with a more localized solution we do not have to revert "net: bcmgenet:
>> Use stronger register read/writes to assure ordering" from stable
>> branches. This would be mostly a courtesy to our future selves, but an
>> argument could be made that this probably has always existed, and that
>> different compilers could behave more or less like GCC12.
> 
> Are you expecting this patch to make 5.17? If Linus cuts final this
> weekend, as he most likely will, unless we do something special the
> patch in question will end up getting merged during the merge window.
> Without the Fixes tag you'll need to manually instruct Greg to pull 
> it in. Is that the plan?

Maybe I should have refrained from making that comment after all :)
Having the Fixes: tag dramatically helps with getting this patch applied
all the way to the relevant stable trees and surely correctness over
speed should prevail. If we want to restore the performance loss (with
the onus on Doug and I to prove that there is a performance drop), then
we could send a fix with the appropriate localized barrier followed by a
revert of Jeremy's patch. And if we cared about getting those two
patches applied to stable, we would tag them with the appropriate Fixes tag.

It looks like there are a few 'net' changes that showed up, are you
going to send a pull request to Linus before 5.17 final is cut?
-- 
Florian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-18 21:26             ` Florian Fainelli
@ 2022-03-19  0:22               ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2022-03-19  0:22 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Peter Robinson, Jeremy Linton, netdev, opendmb, davem,
	bcm-kernel-feedback-list, linux-kernel

On Fri, 18 Mar 2022 14:26:36 -0700 Florian Fainelli wrote:
> Maybe I should have refrained from making that comment after all :)
> Having the Fixes: tag dramatically helps with getting this patch applied
> all the way to the relevant stable trees and surely correctness over
> speed should prevail. If we want to restore the performance loss (with
> the onus on Doug and I to prove that there is a performance drop), then
> we could send a fix with the appropriate localized barrier followed by a
> revert of Jeremy's patch. And if we cared about getting those two
> patches applied to stable, we would tag them with the appropriate Fixes tag.
> 
> It looks like there are a few 'net' changes that showed up, are you
> going to send a pull request to Linus before 5.17 final is cut?

Not unless there's -rc9. We'll just merge net into net-next before
submitting net-next, most likely.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-18 19:04 ` Florian Fainelli
@ 2022-03-21 21:26   ` Jakub Kicinski
  2022-03-21 21:28     ` Florian Fainelli
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2022-03-21 21:26 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jeremy Linton, netdev, opendmb, davem, bcm-kernel-feedback-list,
	linux-kernel, Peter Robinson

On Fri, 18 Mar 2022 12:04:34 -0700 Florian Fainelli wrote:
> On 3/9/22 8:53 PM, Jeremy Linton wrote:
> > GCC12 appears to be much smarter about its dependency tracking and is
> > aware that the relaxed variants are just normal loads and stores and
> > this is causing problems like:

> > Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
> > Reported-by: Peter Robinson <pbrobinson@gmail.com>
> > Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>  
> 
> Acked-by: Florian Fainelli <f.fainelli@gmail.com>

Commit 8d3ea3d402db ("net: bcmgenet: Use stronger register read/writes
to assure ordering") in net now, thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-21 21:26   ` Jakub Kicinski
@ 2022-03-21 21:28     ` Florian Fainelli
  0 siblings, 0 replies; 14+ messages in thread
From: Florian Fainelli @ 2022-03-21 21:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jeremy Linton, netdev, opendmb, davem, bcm-kernel-feedback-list,
	linux-kernel, Peter Robinson

On 3/21/22 14:26, Jakub Kicinski wrote:
> On Fri, 18 Mar 2022 12:04:34 -0700 Florian Fainelli wrote:
>> On 3/9/22 8:53 PM, Jeremy Linton wrote:
>>> GCC12 appears to be much smarter about its dependency tracking and is
>>> aware that the relaxed variants are just normal loads and stores and
>>> this is causing problems like:
> 
>>> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
>>> Reported-by: Peter Robinson <pbrobinson@gmail.com>
>>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>>
>> Acked-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Commit 8d3ea3d402db ("net: bcmgenet: Use stronger register read/writes
> to assure ordering") in net now, thanks!

Thank you Jakub!
-- 
Florian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering
  2022-03-10  4:53 [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering Jeremy Linton
                   ` (2 preceding siblings ...)
  2022-03-18 19:04 ` Florian Fainelli
@ 2022-03-29 15:07 ` Peter Robinson
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Robinson @ 2022-03-29 15:07 UTC (permalink / raw)
  To: Jeremy Linton, stable
  Cc: netdev, opendmb, f.fainelli, davem, kuba,
	bcm-kernel-feedback-list, linux-kernel

On Thu, Mar 10, 2022 at 4:54 AM Jeremy Linton <jeremy.linton@arm.com> wrote:
>
> GCC12 appears to be much smarter about its dependency tracking and is
> aware that the relaxed variants are just normal loads and stores and
> this is causing problems like:
>
> [  210.074549] ------------[ cut here ]------------
> [  210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out
> [  210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
> [  210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat]
> [  210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110
> [  210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E     5.17.0-rc7G12+ #58
> [  210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters
> [  210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022
> [  210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  210.161358] pc : dev_watchdog+0x234/0x240
> [  210.161364] lr : dev_watchdog+0x234/0x240
> [  210.161368] sp : ffff8000080a3a40
> [  210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20
> [  210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08
> [  210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000
> [  210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff
> [  210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a
> [  210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0
> [  210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c
> [  210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
> [  210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
> [  210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044
> [  210.269682] Call trace:
> [  210.272133]  dev_watchdog+0x234/0x240
> [  210.275811]  call_timer_fn+0x3c/0x15c
> [  210.279489]  __run_timers.part.0+0x288/0x310
> [  210.283777]  run_timer_softirq+0x48/0x80
> [  210.287716]  __do_softirq+0x128/0x360
> [  210.291392]  __irq_exit_rcu+0x138/0x140
> [  210.295243]  irq_exit_rcu+0x1c/0x30
> [  210.298745]  el1_interrupt+0x38/0x54
> [  210.302334]  el1h_64_irq_handler+0x18/0x24
> [  210.306445]  el1h_64_irq+0x7c/0x80
> [  210.309857]  arch_cpu_idle+0x18/0x2c
> [  210.313445]  default_idle_call+0x4c/0x140
> [  210.317470]  cpuidle_idle_call+0x14c/0x1a0
> [  210.321584]  do_idle+0xb0/0x100
> [  210.324737]  cpu_startup_entry+0x30/0x8c
> [  210.328675]  secondary_start_kernel+0xe4/0x110
> [  210.333138]  __secondary_switched+0x94/0x98
>
> The assumption when these were relaxed seems to be that device memory
> would be mapped non reordering, and that other constructs
> (spinlocks/etc) would provide the barriers to assure that packet data
> and in memory rings/queues were ordered with respect to device
> register reads/writes. This itself seems a bit sketchy, but the real
> problem with GCC12 is that it is moving the actual reads/writes around
> at will as though they were independent operations when in truth they
> are not, but the compiler can't know that. When looking at the
> assembly dumps for many of these routines its possible to see very
> clean, but not strictly in program order operations occurring as the
> compiler would be free to do if these weren't actually register
> reads/write operations.
>
> Its possible to suppress the timeout with a liberal bit of dma_mb()'s
> sprinkled around but the device still seems unable to reliably
> send/receive data. A better plan is to use the safer readl/writel
> everywhere.
>
> Since this partially reverts an older commit, which notes the use of
> the relaxed variants for performance reasons. I would suggest that
> any performance problems with this commit are targeted at relaxing only
> the performance critical code paths after assuring proper barriers.
>
> Fixes: 69d2ea9c79898 ("net: bcmgenet: Use correct I/O accessors")
> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

This is now in Linus's tree as commit 8d3ea3d402db would be good to
get it int 5.17.x

> ---
>  drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 87f1056e29ff..e907a2df299c 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -76,7 +76,7 @@ static inline void bcmgenet_writel(u32 value, void __iomem *offset)
>         if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
>                 __raw_writel(value, offset);
>         else
> -               writel_relaxed(value, offset);
> +               writel(value, offset);
>  }
>
>  static inline u32 bcmgenet_readl(void __iomem *offset)
> @@ -84,7 +84,7 @@ static inline u32 bcmgenet_readl(void __iomem *offset)
>         if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
>                 return __raw_readl(offset);
>         else
> -               return readl_relaxed(offset);
> +               return readl(offset);
>  }
>
>  static inline void dmadesc_set_length_status(struct bcmgenet_priv *priv,
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-03-29 15:07 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-10  4:53 [PATCH] net: bcmgenet: Use stronger register read/writes to assure ordering Jeremy Linton
2022-03-10 17:29 ` Peter Robinson
2022-03-10 18:59 ` Florian Fainelli
2022-03-11  1:09   ` Jeremy Linton
2022-03-11  3:57     ` Florian Fainelli
2022-03-15 15:29       ` Peter Robinson
2022-03-18 19:01         ` Florian Fainelli
2022-03-18 19:20           ` Jakub Kicinski
2022-03-18 21:26             ` Florian Fainelli
2022-03-19  0:22               ` Jakub Kicinski
2022-03-18 19:04 ` Florian Fainelli
2022-03-21 21:26   ` Jakub Kicinski
2022-03-21 21:28     ` Florian Fainelli
2022-03-29 15:07 ` Peter Robinson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).