All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: bcmgenet: Reset RBUF on first open
@ 2024-02-23 23:53 Maarten Vanraes
  2024-02-26 17:34 ` Florian Fainelli
  0 siblings, 1 reply; 13+ messages in thread
From: Maarten Vanraes @ 2024-02-23 23:53 UTC (permalink / raw)
  To: Doug Berger, Florian Fainelli
  Cc: netdev, Broadcom internal kernel review list, Phil Elwell,
	Maarten Vanraes

From: Phil Elwell <phil@raspberrypi.com>

If the RBUF logic is not reset when the kernel starts then there
may be some data left over from any network boot loader. If the
64-byte packet headers are enabled then this can be fatal.

Extend bcmgenet_dma_disable to do perform the reset, but not when
called from bcmgenet_resume in order to preserve a wake packet.

N.B. This different handling of resume is just based on a hunch -
why else wouldn't one reset the RBUF as well as the TBUF? If this
isn't the case then it's easy to change the patch to make the RBUF
reset unconditional.

See: https://github.com/raspberrypi/linux/issues/3850

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Maarten Vanraes <maarten@rmail.be>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're using
nfsroot), you fail to boot; or at least the boot takes longer than
30 minutes.

Doing a simple ping revealed that when the ping starts working again
(during the boot process), you have ping timings of ~1000ms, 2000ms or
even 3000ms; while in normal cases it would be around 0.2ms.

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 2d7ae71287b1..58995772cc00 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -3282,7 +3282,7 @@ static void bcmgenet_get_hw_addr(struct bcmgenet_priv *priv,
 }
 
 /* Returns a reusable dma control register value */
-static u32 bcmgenet_dma_disable(struct bcmgenet_priv *priv)
+static u32 bcmgenet_dma_disable(struct bcmgenet_priv *priv, bool flush_rx)
 {
 	unsigned int i;
 	u32 reg;
@@ -3307,6 +3307,14 @@ static u32 bcmgenet_dma_disable(struct bcmgenet_priv *priv)
 	udelay(10);
 	bcmgenet_umac_writel(priv, 0, UMAC_TX_FLUSH);
 
+	if (flush_rx) {
+	    reg = bcmgenet_rbuf_ctrl_get(priv);
+	    bcmgenet_rbuf_ctrl_set(priv, reg | BIT(0));
+	    udelay(10);
+	    bcmgenet_rbuf_ctrl_set(priv, reg);
+	    udelay(10);
+	}
+
 	return dma_ctrl;
 }
 
@@ -3370,8 +3378,8 @@ static int bcmgenet_open(struct net_device *dev)
 
 	bcmgenet_set_hw_addr(priv, dev->dev_addr);
 
-	/* Disable RX/TX DMA and flush TX queues */
-	dma_ctrl = bcmgenet_dma_disable(priv);
+	/* Disable RX/TX DMA and flush TX and RX queues */
+	dma_ctrl = bcmgenet_dma_disable(priv, true);
 
 	/* Reinitialize TDMA and RDMA and SW housekeeping */
 	ret = bcmgenet_init_dma(priv);
@@ -4237,7 +4245,7 @@ static int bcmgenet_resume(struct device *d)
 			bcmgenet_hfb_create_rxnfc_filter(priv, rule);
 
 	/* Disable RX/TX DMA and flush TX queues */
-	dma_ctrl = bcmgenet_dma_disable(priv);
+	dma_ctrl = bcmgenet_dma_disable(priv, false);
 
 	/* Reinitialize TDMA and RDMA and SW housekeeping */
 	ret = bcmgenet_init_dma(priv);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-02-23 23:53 [PATCH] net: bcmgenet: Reset RBUF on first open Maarten Vanraes
@ 2024-02-26 17:34 ` Florian Fainelli
  2024-02-26 19:14   ` Maarten
  2024-02-26 23:13   ` Doug Berger
  0 siblings, 2 replies; 13+ messages in thread
From: Florian Fainelli @ 2024-02-26 17:34 UTC (permalink / raw)
  To: Maarten Vanraes, Doug Berger
  Cc: netdev, Broadcom internal kernel review list, Phil Elwell

[-- Attachment #1: Type: text/plain, Size: 2258 bytes --]

On 2/23/24 15:53, Maarten Vanraes wrote:
> From: Phil Elwell <phil@raspberrypi.com>
> 
> If the RBUF logic is not reset when the kernel starts then there
> may be some data left over from any network boot loader. If the
> 64-byte packet headers are enabled then this can be fatal.
> 
> Extend bcmgenet_dma_disable to do perform the reset, but not when
> called from bcmgenet_resume in order to preserve a wake packet.
> 
> N.B. This different handling of resume is just based on a hunch -
> why else wouldn't one reset the RBUF as well as the TBUF? If this
> isn't the case then it's easy to change the patch to make the RBUF
> reset unconditional.

The real question is why is not the boot loader putting the GENET core 
into a quasi power-on-reset state, since this is what Linux expects, and 
also it seems the most conservative and prudent approach. Assuming the 
RDMA and Unimac RX are disabled, otherwise we would happily continuing 
to accept packets in DRAM, then the question is why is not the RBUF 
flushed too, or is it flushed, but this is insufficient, if so, have we 
determined why?

> 
> See: https://github.com/raspberrypi/linux/issues/3850
> 
> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
> Signed-off-by: Maarten Vanraes <maarten@rmail.be>
> ---
>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
> 
> This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're using
> nfsroot), you fail to boot; or at least the boot takes longer than
> 30 minutes.

This makes me wonder whether this also fixes the issues that Maxime 
reported a long time ago, which I can reproduce too, but have not been 
able to track down the source of:

https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/

> 
> Doing a simple ping revealed that when the ping starts working again
> (during the boot process), you have ping timings of ~1000ms, 2000ms or
> even 3000ms; while in normal cases it would be around 0.2ms.

I would prefer that we find a way to better qualify whether a RBUF reset 
is needed or not, but I suppose there is not any other way, since there 
is an "RBUF enabled" bit that we can key off.

Doug, what do you think?
-- 
Florian


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4221 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-02-26 17:34 ` Florian Fainelli
@ 2024-02-26 19:14   ` Maarten
  2024-02-26 23:13   ` Doug Berger
  1 sibling, 0 replies; 13+ messages in thread
From: Maarten @ 2024-02-26 19:14 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Doug Berger, netdev, Broadcom internal kernel review list, Phil Elwell

Florian Fainelli schreef op 2024-02-26 18:34:
> On 2/23/24 15:53, Maarten Vanraes wrote:
>> From: Phil Elwell <phil@raspberrypi.com>
>> 
>> If the RBUF logic is not reset when the kernel starts then there
>> may be some data left over from any network boot loader. If the
>> 64-byte packet headers are enabled then this can be fatal.
>> 
>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>> called from bcmgenet_resume in order to preserve a wake packet.
>> 
>> N.B. This different handling of resume is just based on a hunch -
>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>> isn't the case then it's easy to change the patch to make the RBUF
>> reset unconditional.
> 
> The real question is why is not the boot loader putting the GENET core
> into a quasi power-on-reset state, since this is what Linux expects,
> and also it seems the most conservative and prudent approach. Assuming
> the RDMA and Unimac RX are disabled, otherwise we would happily
> continuing to accept packets in DRAM, then the question is why is not
> the RBUF flushed too, or is it flushed, but this is insufficient, if
> so, have we determined why?

I can only say that when I was testing upstream kernels (6.7, 6.8) I had 
a lot of issue rebooting the RPI4B, and after some searched, I found 
this patch in the raspberrypi kernel (from 2020) and since I've used it, 
I do not have this issue anymore for at least 10 boots. Not sure if I 
should've added a Tested-By with myself?

>> 
>> See: https://github.com/raspberrypi/linux/issues/3850
>> 
>> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
>> Signed-off-by: Maarten Vanraes <maarten@rmail.be>
>> ---
>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 ++++++++++++----
>>   1 file changed, 12 insertions(+), 4 deletions(-)
>> 
>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're 
>> using
>> nfsroot), you fail to boot; or at least the boot takes longer than
>> 30 minutes.
> 
> This makes me wonder whether this also fixes the issues that Maxime
> reported a long time ago, which I can reproduce too, but have not been
> able to track down the source of:
> 
> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
> 
>> 
>> Doing a simple ping revealed that when the ping starts working again
>> (during the boot process), you have ping timings of ~1000ms, 2000ms or
>> even 3000ms; while in normal cases it would be around 0.2ms.
> 
> I would prefer that we find a way to better qualify whether a RBUF
> reset is needed or not, but I suppose there is not any other way,
> since there is an "RBUF enabled" bit that we can key off.
> 
> Doug, what do you think?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-02-26 17:34 ` Florian Fainelli
  2024-02-26 19:14   ` Maarten
@ 2024-02-26 23:13   ` Doug Berger
  2024-02-27 12:53     ` Paolo Abeni
                       ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Doug Berger @ 2024-02-26 23:13 UTC (permalink / raw)
  To: Florian Fainelli, Maarten Vanraes
  Cc: netdev, Broadcom internal kernel review list, Phil Elwell

On 2/26/2024 9:34 AM, Florian Fainelli wrote:
> On 2/23/24 15:53, Maarten Vanraes wrote:
>> From: Phil Elwell <phil@raspberrypi.com>
>>
>> If the RBUF logic is not reset when the kernel starts then there
>> may be some data left over from any network boot loader. If the
>> 64-byte packet headers are enabled then this can be fatal.
>>
>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>> called from bcmgenet_resume in order to preserve a wake packet.
>>
>> N.B. This different handling of resume is just based on a hunch -
>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>> isn't the case then it's easy to change the patch to make the RBUF
>> reset unconditional.
> 
> The real question is why is not the boot loader putting the GENET core 
> into a quasi power-on-reset state, since this is what Linux expects, and 
> also it seems the most conservative and prudent approach. Assuming the 
> RDMA and Unimac RX are disabled, otherwise we would happily continuing 
> to accept packets in DRAM, then the question is why is not the RBUF 
> flushed too, or is it flushed, but this is insufficient, if so, have we 
> determined why?
> 
>>
>> See: https://github.com/raspberrypi/linux/issues/3850
>>
>> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
>> Signed-off-by: Maarten Vanraes <maarten@rmail.be>
>> ---
>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 ++++++++++++----
>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're using
>> nfsroot), you fail to boot; or at least the boot takes longer than
>> 30 minutes.
> 
> This makes me wonder whether this also fixes the issues that Maxime 
> reported a long time ago, which I can reproduce too, but have not been 
> able to track down the source of:
> 
> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
> 
>>
>> Doing a simple ping revealed that when the ping starts working again
>> (during the boot process), you have ping timings of ~1000ms, 2000ms or
>> even 3000ms; while in normal cases it would be around 0.2ms.
> 
> I would prefer that we find a way to better qualify whether a RBUF reset 
> is needed or not, but I suppose there is not any other way, since there 
> is an "RBUF enabled" bit that we can key off.
> 
> Doug, what do you think?
I agree that the Linux driver expects the GENET core to be in a "quasi 
power-on-reset state" and it seems likely that in both Maxime's case and 
the one identified here that is not the case. It would appear that the 
Raspberry Pi bootloader and/or "firmware" are likely not disabling the 
GENET receiver after loading the kernel image and before invoking the 
kernel. They may be disabling the DMA, but that is insufficient since 
any received data would likely overflow the RBUF leaving it in a "bad" 
state which this patch apparently improves.

So it seems likely these issues are caused by improper 
bootloader/firmware behavior.

That said, I suppose it would be nice if the driver were more robust. 
However, we both know how finicky the receive path of the GENET core can 
be about its initialization. Therefore, I am unwilling to "bless" this 
change for upstream without more due diligence on our side.

-Doug

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-02-26 23:13   ` Doug Berger
@ 2024-02-27 12:53     ` Paolo Abeni
  2024-03-05 15:13     ` Jakub Kicinski
  2024-03-16 11:53     ` Maarten
  2 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2024-02-27 12:53 UTC (permalink / raw)
  To: Doug Berger, Florian Fainelli, Maarten Vanraes
  Cc: netdev, Broadcom internal kernel review list, Phil Elwell

On Mon, 2024-02-26 at 15:13 -0800, Doug Berger wrote:
> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
> > On 2/23/24 15:53, Maarten Vanraes wrote:
> > > From: Phil Elwell <phil@raspberrypi.com>
> > > 
> > > If the RBUF logic is not reset when the kernel starts then there
> > > may be some data left over from any network boot loader. If the
> > > 64-byte packet headers are enabled then this can be fatal.
> > > 
> > > Extend bcmgenet_dma_disable to do perform the reset, but not when
> > > called from bcmgenet_resume in order to preserve a wake packet.
> > > 
> > > N.B. This different handling of resume is just based on a hunch -
> > > why else wouldn't one reset the RBUF as well as the TBUF? If this
> > > isn't the case then it's easy to change the patch to make the RBUF
> > > reset unconditional.
> > 
> > The real question is why is not the boot loader putting the GENET core 
> > into a quasi power-on-reset state, since this is what Linux expects, and 
> > also it seems the most conservative and prudent approach. Assuming the 
> > RDMA and Unimac RX are disabled, otherwise we would happily continuing 
> > to accept packets in DRAM, then the question is why is not the RBUF 
> > flushed too, or is it flushed, but this is insufficient, if so, have we 
> > determined why?
> > 
> > > 
> > > See: https://github.com/raspberrypi/linux/issues/3850
> > > 
> > > Signed-off-by: Phil Elwell <phil@raspberrypi.com>
> > > Signed-off-by: Maarten Vanraes <maarten@rmail.be>
> > > ---
> > >   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 ++++++++++++----
> > >   1 file changed, 12 insertions(+), 4 deletions(-)
> > > 
> > > This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're using
> > > nfsroot), you fail to boot; or at least the boot takes longer than
> > > 30 minutes.
> > 
> > This makes me wonder whether this also fixes the issues that Maxime 
> > reported a long time ago, which I can reproduce too, but have not been 
> > able to track down the source of:
> > 
> > https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
> > 
> > > 
> > > Doing a simple ping revealed that when the ping starts working again
> > > (during the boot process), you have ping timings of ~1000ms, 2000ms or
> > > even 3000ms; while in normal cases it would be around 0.2ms.
> > 
> > I would prefer that we find a way to better qualify whether a RBUF reset 
> > is needed or not, but I suppose there is not any other way, since there 
> > is an "RBUF enabled" bit that we can key off.
> > 
> > Doug, what do you think?
> I agree that the Linux driver expects the GENET core to be in a "quasi 
> power-on-reset state" and it seems likely that in both Maxime's case and 
> the one identified here that is not the case. It would appear that the 
> Raspberry Pi bootloader and/or "firmware" are likely not disabling the 
> GENET receiver after loading the kernel image and before invoking the 
> kernel. They may be disabling the DMA, but that is insufficient since 
> any received data would likely overflow the RBUF leaving it in a "bad" 
> state which this patch apparently improves.
> 
> So it seems likely these issues are caused by improper 
> bootloader/firmware behavior.
> 
> That said, I suppose it would be nice if the driver were more robust. 
> However, we both know how finicky the receive path of the GENET core can 
> be about its initialization. Therefore, I am unwilling to "bless" this 
> change for upstream without more due diligence on our side.

Could you please report back in a reasonable timeframe? The issue
addressed here looks like relevant, and the patch quite self-
encapsulated.

We can keep the path in PW meanwhile.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-02-26 23:13   ` Doug Berger
  2024-02-27 12:53     ` Paolo Abeni
@ 2024-03-05 15:13     ` Jakub Kicinski
  2024-03-05 20:36       ` Maarten
  2024-03-16 11:53     ` Maarten
  2 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2024-03-05 15:13 UTC (permalink / raw)
  To: Doug Berger
  Cc: Florian Fainelli, Maarten Vanraes, netdev,
	Broadcom internal kernel review list, Phil Elwell

On Mon, 26 Feb 2024 15:13:57 -0800 Doug Berger wrote:
> I agree that the Linux driver expects the GENET core to be in a "quasi 
> power-on-reset state" and it seems likely that in both Maxime's case and 
> the one identified here that is not the case. It would appear that the 
> Raspberry Pi bootloader and/or "firmware" are likely not disabling the 
> GENET receiver after loading the kernel image and before invoking the 
> kernel. They may be disabling the DMA, but that is insufficient since 
> any received data would likely overflow the RBUF leaving it in a "bad" 
> state which this patch apparently improves.
> 
> So it seems likely these issues are caused by improper 
> bootloader/firmware behavior.
> 
> That said, I suppose it would be nice if the driver were more robust. 
> However, we both know how finicky the receive path of the GENET core can 
> be about its initialization. Therefore, I am unwilling to "bless" this 
> change for upstream without more due diligence on our side.

The patch has minor formatting issues (using spaces to indent).
Once you've gain sufficient confidence that it doesn't cause issues -
please mend that and repost.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-03-05 15:13     ` Jakub Kicinski
@ 2024-03-05 20:36       ` Maarten
  2024-03-05 21:07         ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Maarten @ 2024-03-05 20:36 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, Broadcom internal kernel review list
  Cc: Doug Berger, Florian Fainelli, Phil Elwell

Jakub Kicinski schreef op 2024-03-05 16:13:
> On Mon, 26 Feb 2024 15:13:57 -0800 Doug Berger wrote:
>> I agree that the Linux driver expects the GENET core to be in a "quasi
>> power-on-reset state" and it seems likely that in both Maxime's case 
>> and
>> the one identified here that is not the case. It would appear that the
>> Raspberry Pi bootloader and/or "firmware" are likely not disabling the
>> GENET receiver after loading the kernel image and before invoking the
>> kernel. They may be disabling the DMA, but that is insufficient since
>> any received data would likely overflow the RBUF leaving it in a "bad"
>> state which this patch apparently improves.
>> 
>> So it seems likely these issues are caused by improper
>> bootloader/firmware behavior.
>> 
>> That said, I suppose it would be nice if the driver were more robust.
>> However, we both know how finicky the receive path of the GENET core 
>> can
>> be about its initialization. Therefore, I am unwilling to "bless" this
>> change for upstream without more due diligence on our side.
> 
> The patch has minor formatting issues (using spaces to indent).
> Once you've gain sufficient confidence that it doesn't cause issues -
> please mend that and repost.

I'm sorry, it was blatantly obvious and I missed it :-( . I had added 
indent-with-non-tab to git core.whitespace , but it seems to only error 
when a full 8 spaces are present in indentation. By any chance, is there 
something to test this? In the main time, I'll do a git show -p --raw | 
hexdump -C to check this .

I've fixed that on my git (and fixed some similar issues in other 
patches) and will resend.

thanks,

Maarten

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-03-05 20:36       ` Maarten
@ 2024-03-05 21:07         ` Jakub Kicinski
  2024-03-06  8:03           ` Maarten
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2024-03-05 21:07 UTC (permalink / raw)
  To: Maarten
  Cc: netdev, Broadcom internal kernel review list, Doug Berger,
	Florian Fainelli, Phil Elwell

On Tue, 05 Mar 2024 21:36:03 +0100 Maarten wrote:
> > The patch has minor formatting issues (using spaces to indent).
> > Once you've gain sufficient confidence that it doesn't cause issues -
> > please mend that and repost.  
> 
> I'm sorry, it was blatantly obvious and I missed it :-( . I had added 
> indent-with-non-tab to git core.whitespace , but it seems to only error 
> when a full 8 spaces are present in indentation. By any chance, is there 
> something to test this? In the main time, I'll do a git show -p --raw | 
> hexdump -C to check this .
> 
> I've fixed that on my git (and fixed some similar issues in other 
> patches) and will resend.

I'd rather you waited with the resend until Doug or Florian confirms
the change is okay. No point having the patch rot in patchwork until
then.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-03-05 21:07         ` Jakub Kicinski
@ 2024-03-06  8:03           ` Maarten
  0 siblings, 0 replies; 13+ messages in thread
From: Maarten @ 2024-03-06  8:03 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, Broadcom internal kernel review list
  Cc: Doug Berger, Florian Fainelli, Phil Elwell

Jakub Kicinski schreef op 2024-03-05 22:07:
> On Tue, 05 Mar 2024 21:36:03 +0100 Maarten wrote:
>> > The patch has minor formatting issues (using spaces to indent).
>> > Once you've gain sufficient confidence that it doesn't cause issues -
>> > please mend that and repost.
>> 
>> I'm sorry, it was blatantly obvious and I missed it :-( . I had added
>> indent-with-non-tab to git core.whitespace , but it seems to only 
>> error
>> when a full 8 spaces are present in indentation. By any chance, is 
>> there
>> something to test this? In the main time, I'll do a git show -p --raw 
>> |
>> hexdump -C to check this .
>> 
>> I've fixed that on my git (and fixed some similar issues in other
>> patches) and will resend.
> 
> I'd rather you waited with the resend until Doug or Florian confirms
> the change is okay. No point having the patch rot in patchwork until
> then.

Ok, hopefully it won't take too long for them to do their due diligence.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-02-26 23:13   ` Doug Berger
  2024-02-27 12:53     ` Paolo Abeni
  2024-03-05 15:13     ` Jakub Kicinski
@ 2024-03-16 11:53     ` Maarten
  2024-03-19 16:56       ` Florian Fainelli
  2 siblings, 1 reply; 13+ messages in thread
From: Maarten @ 2024-03-16 11:53 UTC (permalink / raw)
  To: Doug Berger, Florian Fainelli, netdev,
	Broadcom internal kernel review list
  Cc: Phil Elwell

Doug Berger schreef op 2024-02-27 00:13:
> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
>> On 2/23/24 15:53, Maarten Vanraes wrote:
>>> From: Phil Elwell <phil@raspberrypi.com>
>>> 
>>> If the RBUF logic is not reset when the kernel starts then there
>>> may be some data left over from any network boot loader. If the
>>> 64-byte packet headers are enabled then this can be fatal.
>>> 
>>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>>> called from bcmgenet_resume in order to preserve a wake packet.
>>> 
>>> N.B. This different handling of resume is just based on a hunch -
>>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>>> isn't the case then it's easy to change the patch to make the RBUF
>>> reset unconditional.
>> 
>> The real question is why is not the boot loader putting the GENET core 
>> into a quasi power-on-reset state, since this is what Linux expects, 
>> and also it seems the most conservative and prudent approach. Assuming 
>> the RDMA and Unimac RX are disabled, otherwise we would happily 
>> continuing to accept packets in DRAM, then the question is why is not 
>> the RBUF flushed too, or is it flushed, but this is insufficient, if 
>> so, have we determined why?
>> 
>>> 
>>> See: https://github.com/raspberrypi/linux/issues/3850
>>> 
>>> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
>>> Signed-off-by: Maarten Vanraes <maarten@rmail.be>
>>> ---
>>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 
>>> ++++++++++++----
>>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>> 
>>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're 
>>> using
>>> nfsroot), you fail to boot; or at least the boot takes longer than
>>> 30 minutes.
>> 
>> This makes me wonder whether this also fixes the issues that Maxime 
>> reported a long time ago, which I can reproduce too, but have not been 
>> able to track down the source of:
>> 
>> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
>> 
>>> 
>>> Doing a simple ping revealed that when the ping starts working again
>>> (during the boot process), you have ping timings of ~1000ms, 2000ms 
>>> or
>>> even 3000ms; while in normal cases it would be around 0.2ms.
>> 
>> I would prefer that we find a way to better qualify whether a RBUF 
>> reset is needed or not, but I suppose there is not any other way, 
>> since there is an "RBUF enabled" bit that we can key off.
>> 
>> Doug, what do you think?
> I agree that the Linux driver expects the GENET core to be in a "quasi
> power-on-reset state" and it seems likely that in both Maxime's case
> and the one identified here that is not the case. It would appear that
> the Raspberry Pi bootloader and/or "firmware" are likely not disabling
> the GENET receiver after loading the kernel image and before invoking
> the kernel. They may be disabling the DMA, but that is insufficient
> since any received data would likely overflow the RBUF leaving it in a
> "bad" state which this patch apparently improves.
> 
> So it seems likely these issues are caused by improper
> bootloader/firmware behavior.
> 
> That said, I suppose it would be nice if the driver were more robust.
> However, we both know how finicky the receive path of the GENET core
> can be about its initialization. Therefore, I am unwilling to "bless"
> this change for upstream without more due diligence on our side.

Hey, did you guys have any chance to check this stuff out? any thoughts 
on it?

Regards,

Maarten

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-03-16 11:53     ` Maarten
@ 2024-03-19 16:56       ` Florian Fainelli
  2024-03-19 21:11         ` Maarten
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Fainelli @ 2024-03-19 16:56 UTC (permalink / raw)
  To: Maarten, Doug Berger, netdev, Broadcom internal kernel review list
  Cc: Phil Elwell

[-- Attachment #1: Type: text/plain, Size: 4031 bytes --]

On 3/16/24 04:53, Maarten wrote:
> Doug Berger schreef op 2024-02-27 00:13:
>> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
>>> On 2/23/24 15:53, Maarten Vanraes wrote:
>>>> From: Phil Elwell <phil@raspberrypi.com>
>>>>
>>>> If the RBUF logic is not reset when the kernel starts then there
>>>> may be some data left over from any network boot loader. If the
>>>> 64-byte packet headers are enabled then this can be fatal.
>>>>
>>>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>>>> called from bcmgenet_resume in order to preserve a wake packet.
>>>>
>>>> N.B. This different handling of resume is just based on a hunch -
>>>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>>>> isn't the case then it's easy to change the patch to make the RBUF
>>>> reset unconditional.
>>>
>>> The real question is why is not the boot loader putting the GENET 
>>> core into a quasi power-on-reset state, since this is what Linux 
>>> expects, and also it seems the most conservative and prudent 
>>> approach. Assuming the RDMA and Unimac RX are disabled, otherwise we 
>>> would happily continuing to accept packets in DRAM, then the question 
>>> is why is not the RBUF flushed too, or is it flushed, but this is 
>>> insufficient, if so, have we determined why?
>>>
>>>>
>>>> See: https://github.com/raspberrypi/linux/issues/3850
>>>>
>>>> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
>>>> Signed-off-by: Maarten Vanraes <maarten@rmail.be>
>>>> ---
>>>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 ++++++++++++----
>>>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're 
>>>> using
>>>> nfsroot), you fail to boot; or at least the boot takes longer than
>>>> 30 minutes.
>>>
>>> This makes me wonder whether this also fixes the issues that Maxime 
>>> reported a long time ago, which I can reproduce too, but have not 
>>> been able to track down the source of:
>>>
>>> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
>>>
>>>>
>>>> Doing a simple ping revealed that when the ping starts working again
>>>> (during the boot process), you have ping timings of ~1000ms, 2000ms or
>>>> even 3000ms; while in normal cases it would be around 0.2ms.
>>>
>>> I would prefer that we find a way to better qualify whether a RBUF 
>>> reset is needed or not, but I suppose there is not any other way, 
>>> since there is an "RBUF enabled" bit that we can key off.
>>>
>>> Doug, what do you think?
>> I agree that the Linux driver expects the GENET core to be in a "quasi
>> power-on-reset state" and it seems likely that in both Maxime's case
>> and the one identified here that is not the case. It would appear that
>> the Raspberry Pi bootloader and/or "firmware" are likely not disabling
>> the GENET receiver after loading the kernel image and before invoking
>> the kernel. They may be disabling the DMA, but that is insufficient
>> since any received data would likely overflow the RBUF leaving it in a
>> "bad" state which this patch apparently improves.
>>
>> So it seems likely these issues are caused by improper
>> bootloader/firmware behavior.
>>
>> That said, I suppose it would be nice if the driver were more robust.
>> However, we both know how finicky the receive path of the GENET core
>> can be about its initialization. Therefore, I am unwilling to "bless"
>> this change for upstream without more due diligence on our side.
> 
> Hey, did you guys have any chance to check this stuff out? any thoughts 
> on it?

We are both busy with higher priority work and I cannot see us being 
able to dedicate any time to this issue until April.

While we are sympathetic to your issue and you having upstreamed a fix 
for it, it is entirely self inflicted by having the VPU boot loader 
firmware not properly quiesce the GENET controller, at least based upon 
the description, therefore the natural fix should be... in the firmware.

 From my perspective: NAK.
-- 
Florian


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4221 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-03-19 16:56       ` Florian Fainelli
@ 2024-03-19 21:11         ` Maarten
  2024-03-19 21:50           ` Florian Fainelli
  0 siblings, 1 reply; 13+ messages in thread
From: Maarten @ 2024-03-19 21:11 UTC (permalink / raw)
  To: Florian Fainelli, netdev, Broadcom internal kernel review list
  Cc: Doug Berger, Phil Elwell

Florian Fainelli schreef op 2024-03-19 17:56:
> On 3/16/24 04:53, Maarten wrote:
>> Doug Berger schreef op 2024-02-27 00:13:
>>> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
>>>> On 2/23/24 15:53, Maarten Vanraes wrote:
>>>>> From: Phil Elwell <phil@raspberrypi.com>
>>>>> 
>>>>> If the RBUF logic is not reset when the kernel starts then there
>>>>> may be some data left over from any network boot loader. If the
>>>>> 64-byte packet headers are enabled then this can be fatal.
>>>>> 
>>>>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>>>>> called from bcmgenet_resume in order to preserve a wake packet.
>>>>> 
>>>>> N.B. This different handling of resume is just based on a hunch -
>>>>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>>>>> isn't the case then it's easy to change the patch to make the RBUF
>>>>> reset unconditional.
>>>> 
>>>> The real question is why is not the boot loader putting the GENET 
>>>> core into a quasi power-on-reset state, since this is what Linux 
>>>> expects, and also it seems the most conservative and prudent 
>>>> approach. Assuming the RDMA and Unimac RX are disabled, otherwise we 
>>>> would happily continuing to accept packets in DRAM, then the 
>>>> question is why is not the RBUF flushed too, or is it flushed, but 
>>>> this is insufficient, if so, have we determined why?
>>>> 
>>>>> 
>>>>> See: https://github.com/raspberrypi/linux/issues/3850
>>>>> 
>>>>> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
>>>>> Signed-off-by: Maarten Vanraes <maarten@rmail.be>
>>>>> ---
>>>>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 
>>>>> ++++++++++++----
>>>>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>>>> 
>>>>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if you're 
>>>>> using
>>>>> nfsroot), you fail to boot; or at least the boot takes longer than
>>>>> 30 minutes.
>>>> 
>>>> This makes me wonder whether this also fixes the issues that Maxime 
>>>> reported a long time ago, which I can reproduce too, but have not 
>>>> been able to track down the source of:
>>>> 
>>>> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
>>>> 
>>>>> 
>>>>> Doing a simple ping revealed that when the ping starts working 
>>>>> again
>>>>> (during the boot process), you have ping timings of ~1000ms, 2000ms 
>>>>> or
>>>>> even 3000ms; while in normal cases it would be around 0.2ms.
>>>> 
>>>> I would prefer that we find a way to better qualify whether a RBUF 
>>>> reset is needed or not, but I suppose there is not any other way, 
>>>> since there is an "RBUF enabled" bit that we can key off.
>>>> 
>>>> Doug, what do you think?
>>> I agree that the Linux driver expects the GENET core to be in a 
>>> "quasi
>>> power-on-reset state" and it seems likely that in both Maxime's case
>>> and the one identified here that is not the case. It would appear 
>>> that
>>> the Raspberry Pi bootloader and/or "firmware" are likely not 
>>> disabling
>>> the GENET receiver after loading the kernel image and before invoking
>>> the kernel. They may be disabling the DMA, but that is insufficient
>>> since any received data would likely overflow the RBUF leaving it in 
>>> a
>>> "bad" state which this patch apparently improves.
>>> 
>>> So it seems likely these issues are caused by improper
>>> bootloader/firmware behavior.
>>> 
>>> That said, I suppose it would be nice if the driver were more robust.
>>> However, we both know how finicky the receive path of the GENET core
>>> can be about its initialization. Therefore, I am unwilling to "bless"
>>> this change for upstream without more due diligence on our side.
>> 
>> Hey, did you guys have any chance to check this stuff out? any 
>> thoughts on it?
> 
> We are both busy with higher priority work and I cannot see us being
> able to dedicate any time to this issue until April.
> 
> While we are sympathetic to your issue and you having upstreamed a fix
> for it, it is entirely self inflicted by having the VPU boot loader
> firmware not properly quiesce the GENET controller, at least based
> upon the description, therefore the natural fix should be... in the
> firmware.

I totally agree that the natural fix should be in the firmware.

> From my perspective: NAK.

Fair enough, though I do think that there are often workarounds for 
faulty firmware, and making the driver more robust is not a bad thing 
either.

In any case, I try to raise this issue with the raspberry pi people in 
the hopes of fixing this in the proper place.

Thanks for the response,

Maarten

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] net: bcmgenet: Reset RBUF on first open
  2024-03-19 21:11         ` Maarten
@ 2024-03-19 21:50           ` Florian Fainelli
  0 siblings, 0 replies; 13+ messages in thread
From: Florian Fainelli @ 2024-03-19 21:50 UTC (permalink / raw)
  To: Maarten, Florian Fainelli, netdev, Broadcom internal kernel review list
  Cc: Doug Berger, Phil Elwell

[-- Attachment #1: Type: text/plain, Size: 5350 bytes --]

On 3/19/24 14:11, Maarten wrote:
> Florian Fainelli schreef op 2024-03-19 17:56:
>> On 3/16/24 04:53, Maarten wrote:
>>> Doug Berger schreef op 2024-02-27 00:13:
>>>> On 2/26/2024 9:34 AM, Florian Fainelli wrote:
>>>>> On 2/23/24 15:53, Maarten Vanraes wrote:
>>>>>> From: Phil Elwell <phil@raspberrypi.com>
>>>>>>
>>>>>> If the RBUF logic is not reset when the kernel starts then there
>>>>>> may be some data left over from any network boot loader. If the
>>>>>> 64-byte packet headers are enabled then this can be fatal.
>>>>>>
>>>>>> Extend bcmgenet_dma_disable to do perform the reset, but not when
>>>>>> called from bcmgenet_resume in order to preserve a wake packet.
>>>>>>
>>>>>> N.B. This different handling of resume is just based on a hunch -
>>>>>> why else wouldn't one reset the RBUF as well as the TBUF? If this
>>>>>> isn't the case then it's easy to change the patch to make the RBUF
>>>>>> reset unconditional.
>>>>>
>>>>> The real question is why is not the boot loader putting the GENET 
>>>>> core into a quasi power-on-reset state, since this is what Linux 
>>>>> expects, and also it seems the most conservative and prudent 
>>>>> approach. Assuming the RDMA and Unimac RX are disabled, otherwise 
>>>>> we would happily continuing to accept packets in DRAM, then the 
>>>>> question is why is not the RBUF flushed too, or is it flushed, but 
>>>>> this is insufficient, if so, have we determined why?
>>>>>
>>>>>>
>>>>>> See: https://github.com/raspberrypi/linux/issues/3850
>>>>>>
>>>>>> Signed-off-by: Phil Elwell <phil@raspberrypi.com>
>>>>>> Signed-off-by: Maarten Vanraes <maarten@rmail.be>
>>>>>> ---
>>>>>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 
>>>>>> ++++++++++++----
>>>>>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if 
>>>>>> you're using
>>>>>> nfsroot), you fail to boot; or at least the boot takes longer than
>>>>>> 30 minutes.
>>>>>
>>>>> This makes me wonder whether this also fixes the issues that Maxime 
>>>>> reported a long time ago, which I can reproduce too, but have not 
>>>>> been able to track down the source of:
>>>>>
>>>>> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/
>>>>>
>>>>>>
>>>>>> Doing a simple ping revealed that when the ping starts working again
>>>>>> (during the boot process), you have ping timings of ~1000ms, 
>>>>>> 2000ms or
>>>>>> even 3000ms; while in normal cases it would be around 0.2ms.
>>>>>
>>>>> I would prefer that we find a way to better qualify whether a RBUF 
>>>>> reset is needed or not, but I suppose there is not any other way, 
>>>>> since there is an "RBUF enabled" bit that we can key off.
>>>>>
>>>>> Doug, what do you think?
>>>> I agree that the Linux driver expects the GENET core to be in a "quasi
>>>> power-on-reset state" and it seems likely that in both Maxime's case
>>>> and the one identified here that is not the case. It would appear that
>>>> the Raspberry Pi bootloader and/or "firmware" are likely not disabling
>>>> the GENET receiver after loading the kernel image and before invoking
>>>> the kernel. They may be disabling the DMA, but that is insufficient
>>>> since any received data would likely overflow the RBUF leaving it in a
>>>> "bad" state which this patch apparently improves.
>>>>
>>>> So it seems likely these issues are caused by improper
>>>> bootloader/firmware behavior.
>>>>
>>>> That said, I suppose it would be nice if the driver were more robust.
>>>> However, we both know how finicky the receive path of the GENET core
>>>> can be about its initialization. Therefore, I am unwilling to "bless"
>>>> this change for upstream without more due diligence on our side.
>>>
>>> Hey, did you guys have any chance to check this stuff out? any 
>>> thoughts on it?
>>
>> We are both busy with higher priority work and I cannot see us being
>> able to dedicate any time to this issue until April.
>>
>> While we are sympathetic to your issue and you having upstreamed a fix
>> for it, it is entirely self inflicted by having the VPU boot loader
>> firmware not properly quiesce the GENET controller, at least based
>> upon the description, therefore the natural fix should be... in the
>> firmware.
> 
> I totally agree that the natural fix should be in the firmware.
> 
>> From my perspective: NAK.
> 
> Fair enough, though I do think that there are often workarounds for 
> faulty firmware, and making the driver more robust is not a bad thing 
> either.

That is fair enough, the kernel does indeed have a number of workarounds 
for bad firmware and hardware obviously. I am seriously concerned that 
doing this RBUF reset might be beneficial here on 2711, but it is not 
clear to me that this is not going to break the 15+ other SoCs that use 
BCMGENET (BCM7xxx). The 2711 use case is a specific chip with a 
completely different clocking and busing compared to the other chips 
where this block has been deployed, and there is also a very narrow use 
with RGMII only, whereas we have MII, reverse MII, GMII and RGMII being 
actively used.

> 
> In any case, I try to raise this issue with the raspberry pi people in 
> the hopes of fixing this in the proper place.
> 

Yes, let's try that route, and then maybe come April we have some spare 
cycles for running some tests.
-- 
Florian


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4221 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-03-19 21:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-23 23:53 [PATCH] net: bcmgenet: Reset RBUF on first open Maarten Vanraes
2024-02-26 17:34 ` Florian Fainelli
2024-02-26 19:14   ` Maarten
2024-02-26 23:13   ` Doug Berger
2024-02-27 12:53     ` Paolo Abeni
2024-03-05 15:13     ` Jakub Kicinski
2024-03-05 20:36       ` Maarten
2024-03-05 21:07         ` Jakub Kicinski
2024-03-06  8:03           ` Maarten
2024-03-16 11:53     ` Maarten
2024-03-19 16:56       ` Florian Fainelli
2024-03-19 21:11         ` Maarten
2024-03-19 21:50           ` Florian Fainelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.