All of lore.kernel.org
 help / color / mirror / Atom feed
* macb: should we revert 0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200") ?
@ 2020-12-06  9:20 Willy Tarreau
  2020-12-07 23:40 ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: Willy Tarreau @ 2020-12-06  9:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nicolas Ferre, Claudiu Beznea, Daniel Palmer, Alexandre Belloni,
	David Miller, netdev

Hi Jakub,

Two months ago I implemented a small change in the macb driver to
support the two Tx descriptors that AT91RM9200 supports. I implemented
this using the only compatible device I had which is the MSC313E-based
Breadbee board. Since then I've met situations where the chip would stop
sending, mostly when doing bidirectional traffic. I've spent a few week-
ends on this already trying various things including blocking interrupts
on a larger place, switching to the 32-bit register access on the MSC313E
(previous code was using two 16-bit accesses and I thought it was the
cause), and tracing status registers along the various operations. Each
time the pattern looks similar, a write when the chips declares having
room results in an overrun, but the preceeding condition doesn't leave
any info suggesting this may happen.

Sadly we don't have the datasheet for this SoC, what is visible is that it
supports AT91RM9200's tx mode and that it works well when there's a single
descriptor in flight. In this case it complies with AT91RM9200's datasheet.
The chip reports other undocumented bits in its status registers, that
cannot even be correlated to the issue by the way. I couldn't spot anything
wrong there in my changes, even after doing heavy changes to progress on
debugging, and the SoC's behavior reporting an overrun after a single write
when there's room contradicts the RM9200's datasheet. In addition we know
the chip also supports other descriptor-based modes, so it's very possible
it doesn't perfectly implement the RM9200's 2-desc mode and that my change
is still fine.

Yesterday I hope to get my old AT91SAM9G20 board to execute this code and
test it, but it clearly does not work when I remap the init and tx functions,
which indicates that it really does not implement a hidden compatibility
mode with the old chip.

Thus at this point I have no way to make sure that the original SoC for
which I changed the code still works fine or if I broke it. As such, I'd
feel more comfortable if we'd revert my patch until someone has the
opportunity to test it on the original hardware (Alexandre suggested he
might, but later).

The commit in question is the following one:

  0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200")

If the maintainers prefer that we wait for an issue to be reported before
reverting it, that's fine for me as well. What's important to me is that
this potential issue is identified so as not to waste someone else's time
on it.

Thanks!
Willy

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: macb: should we revert 0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200") ?
  2020-12-06  9:20 macb: should we revert 0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200") ? Willy Tarreau
@ 2020-12-07 23:40 ` Jakub Kicinski
  2020-12-07 23:52   ` Alexandre Belloni
  2020-12-08  5:17   ` Willy Tarreau
  0 siblings, 2 replies; 4+ messages in thread
From: Jakub Kicinski @ 2020-12-07 23:40 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Nicolas Ferre, Claudiu Beznea, Daniel Palmer, Alexandre Belloni,
	David Miller, netdev

On Sun, 6 Dec 2020 10:20:41 +0100 Willy Tarreau wrote:
> Hi Jakub,
> 
> Two months ago I implemented a small change in the macb driver to
> support the two Tx descriptors that AT91RM9200 supports. I implemented
> this using the only compatible device I had which is the MSC313E-based
> Breadbee board. Since then I've met situations where the chip would stop
> sending, mostly when doing bidirectional traffic. I've spent a few week-
> ends on this already trying various things including blocking interrupts
> on a larger place, switching to the 32-bit register access on the MSC313E
> (previous code was using two 16-bit accesses and I thought it was the
> cause), and tracing status registers along the various operations. Each
> time the pattern looks similar, a write when the chips declares having
> room results in an overrun, but the preceeding condition doesn't leave
> any info suggesting this may happen.
> 
> Sadly we don't have the datasheet for this SoC, what is visible is that it
> supports AT91RM9200's tx mode and that it works well when there's a single
> descriptor in flight. In this case it complies with AT91RM9200's datasheet.
> The chip reports other undocumented bits in its status registers, that
> cannot even be correlated to the issue by the way. I couldn't spot anything
> wrong there in my changes, even after doing heavy changes to progress on
> debugging, and the SoC's behavior reporting an overrun after a single write
> when there's room contradicts the RM9200's datasheet. In addition we know
> the chip also supports other descriptor-based modes, so it's very possible
> it doesn't perfectly implement the RM9200's 2-desc mode and that my change
> is still fine.
> 
> Yesterday I hope to get my old AT91SAM9G20 board to execute this code and
> test it, but it clearly does not work when I remap the init and tx functions,
> which indicates that it really does not implement a hidden compatibility
> mode with the old chip.
> 
> Thus at this point I have no way to make sure that the original SoC for
> which I changed the code still works fine or if I broke it. As such, I'd
> feel more comfortable if we'd revert my patch until someone has the
> opportunity to test it on the original hardware (Alexandre suggested he
> might, but later).
> 
> The commit in question is the following one:
> 
>   0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200")
> 
> If the maintainers prefer that we wait for an issue to be reported before
> reverting it, that's fine for me as well. What's important to me is that
> this potential issue is identified so as not to waste someone else's time
> on it.

Thanks for the report, I remember that one. In hindsight maybe we
should have punted it to 5.11...

Let's revert ASAP, 5.10 is going to be LTS, people will definitely
notice.

Would you mind sending a revert patch with the explanation in the
commit message?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: macb: should we revert 0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200") ?
  2020-12-07 23:40 ` Jakub Kicinski
@ 2020-12-07 23:52   ` Alexandre Belloni
  2020-12-08  5:17   ` Willy Tarreau
  1 sibling, 0 replies; 4+ messages in thread
From: Alexandre Belloni @ 2020-12-07 23:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Willy Tarreau, Nicolas Ferre, Claudiu Beznea, Daniel Palmer,
	David Miller, netdev

Hi,

On 07/12/2020 15:40:42-0800, Jakub Kicinski wrote:
> On Sun, 6 Dec 2020 10:20:41 +0100 Willy Tarreau wrote:
> > Hi Jakub,
> > 
> > Two months ago I implemented a small change in the macb driver to
> > support the two Tx descriptors that AT91RM9200 supports. I implemented
> > this using the only compatible device I had which is the MSC313E-based
> > Breadbee board. Since then I've met situations where the chip would stop
> > sending, mostly when doing bidirectional traffic. I've spent a few week-
> > ends on this already trying various things including blocking interrupts
> > on a larger place, switching to the 32-bit register access on the MSC313E
> > (previous code was using two 16-bit accesses and I thought it was the
> > cause), and tracing status registers along the various operations. Each
> > time the pattern looks similar, a write when the chips declares having
> > room results in an overrun, but the preceeding condition doesn't leave
> > any info suggesting this may happen.
> > 
> > Sadly we don't have the datasheet for this SoC, what is visible is that it
> > supports AT91RM9200's tx mode and that it works well when there's a single
> > descriptor in flight. In this case it complies with AT91RM9200's datasheet.
> > The chip reports other undocumented bits in its status registers, that
> > cannot even be correlated to the issue by the way. I couldn't spot anything
> > wrong there in my changes, even after doing heavy changes to progress on
> > debugging, and the SoC's behavior reporting an overrun after a single write
> > when there's room contradicts the RM9200's datasheet. In addition we know
> > the chip also supports other descriptor-based modes, so it's very possible
> > it doesn't perfectly implement the RM9200's 2-desc mode and that my change
> > is still fine.
> > 
> > Yesterday I hope to get my old AT91SAM9G20 board to execute this code and
> > test it, but it clearly does not work when I remap the init and tx functions,
> > which indicates that it really does not implement a hidden compatibility
> > mode with the old chip.
> > 
> > Thus at this point I have no way to make sure that the original SoC for
> > which I changed the code still works fine or if I broke it. As such, I'd
> > feel more comfortable if we'd revert my patch until someone has the
> > opportunity to test it on the original hardware (Alexandre suggested he
> > might, but later).
> > 
> > The commit in question is the following one:
> > 
> >   0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200")
> > 
> > If the maintainers prefer that we wait for an issue to be reported before
> > reverting it, that's fine for me as well. What's important to me is that
> > this potential issue is identified so as not to waste someone else's time
> > on it.
> 
> Thanks for the report, I remember that one. In hindsight maybe we
> should have punted it to 5.11...
> 
> Let's revert ASAP, 5.10 is going to be LTS, people will definitely
> notice.
> 
> Would you mind sending a revert patch with the explanation in the
> commit message?

FWIW, I went to the office and brought back my rm9200 today. I didn't
have the time to set up a test yet though. I'm not sure it will even
boot v5.10 so don't bet anyone apart me would notice Ethernet being
broken on this SoC.

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: macb: should we revert 0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200") ?
  2020-12-07 23:40 ` Jakub Kicinski
  2020-12-07 23:52   ` Alexandre Belloni
@ 2020-12-08  5:17   ` Willy Tarreau
  1 sibling, 0 replies; 4+ messages in thread
From: Willy Tarreau @ 2020-12-08  5:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nicolas Ferre, Claudiu Beznea, Daniel Palmer, Alexandre Belloni,
	David Miller, netdev

On Mon, Dec 07, 2020 at 03:40:42PM -0800, Jakub Kicinski wrote:
> Thanks for the report, I remember that one. In hindsight maybe we
> should have punted it to 5.11...

Well, not necessarily as it was simple, well documented and *appeared*
to work fine.

> Let's revert ASAP, 5.10 is going to be LTS, people will definitely
> notice.

It could take some time as we're speaking about crazy people running
5.10 on an old 180 MHz MCU :-)

> Would you mind sending a revert patch with the explanation in the
> commit message?

Sure, will do.

Thanks!
Willy

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-12-08  5:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-06  9:20 macb: should we revert 0a4e9ce17ba7 ("macb: support the two tx descriptors on at91rm9200") ? Willy Tarreau
2020-12-07 23:40 ` Jakub Kicinski
2020-12-07 23:52   ` Alexandre Belloni
2020-12-08  5:17   ` Willy Tarreau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.