* ARM: mvebu: ethernet packets corruption and I/O coherency
@ 2014-11-18 14:25 Francesco Dolcini
  2014-11-18 14:30 ` Thomas Petazzoni
  0 siblings, 1 reply; 5+ messages in thread
From: Francesco Dolcini @ 2014-11-18 14:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas and all,
I have seen your patches ("ARM: mvebu: no I/O coherency on non-SMP and
related updates") regarding I/O coherency and I think it might be
related to a problem I am experiencing.

I have a custom board based on the Marvell Armada 370 running kernel
3.13.11, and I see random corruption of outgoing Ethernet packets
(roughly one packet in every few million) with the mvneta driver.

I have tried Linux kernel 3.17 without any improvement.

What do you think? Can you backport the fix to the 3.13 kernel?

Regards,
	Francesco

From: Thomas Petazzoni @ 2014-11-18 14:30 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Francesco Dolcini,

On Tue, 18 Nov 2014 15:25:20 +0100, Francesco Dolcini wrote:

> I have seen your patches ("ARM: mvebu: no I/O coherency on non-SMP and
> related updates") regarding I/O coherency and I think it might be
> related to a problem I am experiencing.
> 
> I have a custom board based on the Marvell Armada 370 running kernel
> 3.13.11, and I see random corruption of outgoing Ethernet packets
> (roughly one packet in every few million) with the mvneta driver.
> 
> I have tried Linux kernel 3.17 without any improvement.
> 
> What do you think? Can you backport the fix to the 3.13 kernel?

It could indeed be related. I have marked the relevant patches in the 
"ARM: mvebu: no I/O coherency on non-SMP and related updates" series as
to be backported to stable up to v3.8, so when they get accepted, I'll
take care of backporting them.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

From: Francesco Dolcini @ 2014-11-19 16:01 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas and all

On Tue, Nov 18, 2014 at 03:30:35PM +0100, Thomas Petazzoni wrote:
> It could indeed be related. I have marked the relevant patches in the 
> "ARM: mvebu: no I/O coherency on non-SMP and related updates" series as
> to be backported to stable up to v3.8, so when they get accepted, I'll
> take care of backporting them.

I prepared and tested this small patch to fix the problem on kernel
3.13.11 and it seems to fix my ethernet packet corruption problem.
Do you think it is fine?

Do you think that this bug on I/O cache coherency could also trigger some
sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
skb_segment() and a kernel panic in put_page(), but I was never able to
reproduce any of them.

Thanks,
Francesco


--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -124,6 +124,11 @@
 {
 	struct device_node *np;
 
+	if (!is_smp()) {
+		pr_info("Coherency fabric disabled\n");
+		return 0;
+	}
+
 	np = of_find_matching_node(NULL, of_coherency_table);
 	if (np) {
 		struct resource res;
@@ -150,6 +155,9 @@
 {
 	struct device_node *np;
 
+	if (!is_smp())
+		return 0;
+
 	np = of_find_matching_node(NULL, of_coherency_table);
 	if (np) {
 		bus_register_notifier(&platform_bus_type,

From: Thomas Petazzoni @ 2014-11-19 16:40 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Francesco Dolcini,

On Wed, 19 Nov 2014 17:01:08 +0100, Francesco Dolcini wrote:

> On Tue, Nov 18, 2014 at 03:30:35PM +0100, Thomas Petazzoni wrote:
> > It could indeed be related. I have marked the relevant patches in the 
> > "ARM: mvebu: no I/O coherency on non-SMP and related updates" series as
> > to be backported to stable up to v3.8, so when they get accepted, I'll
> > take care of backporting them.
> 
> I prepared and tested this small patch to fix the problem on kernel
> 3.13.11 and it seems to fix my ethernet packet corruption problem.
> Do you think it is fine?

There's one thing missing: as of 3.13, the mvebu-mbus driver looks
directly at the DT to see whether it has a coherency fabric node, and
if it does, it sets the per-MBus-window bit indicating that the window
uses HW I/O coherency. I'm not sure this causes problems in practice,
since with your patch all the cache maintenance operations are properly
re-enabled anyway. Still, I'd suggest modifying the mvebu-mbus driver
accordingly. See:

	np = of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric");
	if (np) {
		mbus->hw_io_coherency = 1;
		of_node_put(np);
	}

in drivers/bus/mvebu-mbus.c.
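
One way the check quoted above could be gated on SMP is sketched here
as a small userspace model, not as actual driver code. The names
mbus_state_model and mbus_model_setup_coherency(), and the two boolean
parameters, are hypothetical: they stand in for the driver state, the
result of the of_find_compatible_node() lookup, and is_smp(). Only the
hw_io_coherency field comes from the snippet above.

```c
/*
 * Userspace model only. The real check lives in drivers/bus/mvebu-mbus.c
 * and relies on of_find_compatible_node() and is_smp(); everything named
 * *_model here is a hypothetical stand-in so the logic compiles alone.
 */
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; one field only. */
struct mbus_state_model {
	int hw_io_coherency;
};

/*
 * Enable the per-window HW I/O coherency flag only when the DT has a
 * coherency fabric node AND the kernel runs in SMP mode. This mirrors
 * the !is_smp() early return added to coherency.c by the patch.
 */
void mbus_model_setup_coherency(struct mbus_state_model *mbus,
				bool has_coherency_node, bool smp)
{
	mbus->hw_io_coherency = (has_coherency_node && smp) ? 1 : 0;
}
```

With this shape, a non-SMP boot leaves hw_io_coherency at 0 even when
the coherency fabric node is present, so the MBus windows and the cache
maintenance operations would agree on the coherency mode in use.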

> Do you think that this bug on I/O cache coherency could also trigger some
> sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
> skb_segment() and a kernel panic in put_page(), but I was never able to
> reproduce any of them.

It's hard to say exactly what could happen with the wrong I/O cache
coherency setup. I would expect only the buffers used for DMA to not be
updated properly, but I might be wrong.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

From: Willy Tarreau @ 2014-11-19 16:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 19, 2014 at 05:40:07PM +0100, Thomas Petazzoni wrote:
> > Do you think that this bug on I/O cache coherency could also trigger some
> > sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
> > skb_segment() and a kernel panic in put_page(), but I was never able to
> > reproduce any of them.
> 
> It's hard to say exactly what could happen with the wrong I/O cache
> coherency setup. I would expect only the buffers used for DMA to not be
> updated properly, but I might be wrong.

Interestingly, I used to experience random panics under high network
load on the Mirabox, and I never knew whether they were caused by the
power supply or by cache corruption. But since I modified the driver
and cache management to synchronize caches before the Rx loop, I haven't
encountered them anymore. It could be pure coincidence, or it could be
more or less related, perhaps because the cache is now synchronized much
earlier than the data is used, which changes the access patterns.

Just my few cents,
Willy
