On 3/19/24 14:11, Maarten wrote: > Florian Fainelli schreef op 2024-03-19 17:56: >> On 3/16/24 04:53, Maarten wrote: >>> Doug Berger schreef op 2024-02-27 00:13: >>>> On 2/26/2024 9:34 AM, Florian Fainelli wrote: >>>>> On 2/23/24 15:53, Maarten Vanraes wrote: >>>>>> From: Phil Elwell >>>>>> >>>>>> If the RBUF logic is not reset when the kernel starts then there >>>>>> may be some data left over from any network boot loader. If the >>>>>> 64-byte packet headers are enabled then this can be fatal. >>>>>> >>>>>> Extend bcmgenet_dma_disable to do perform the reset, but not when >>>>>> called from bcmgenet_resume in order to preserve a wake packet. >>>>>> >>>>>> N.B. This different handling of resume is just based on a hunch - >>>>>> why else wouldn't one reset the RBUF as well as the TBUF? If this >>>>>> isn't the case then it's easy to change the patch to make the RBUF >>>>>> reset unconditional. >>>>> >>>>> The real question is why is not the boot loader putting the GENET >>>>> core into a quasi power-on-reset state, since this is what Linux >>>>> expects, and also it seems the most conservative and prudent >>>>> approach. Assuming the RDMA and Unimac RX are disabled, otherwise >>>>> we would happily continuing to accept packets in DRAM, then the >>>>> question is why is not the RBUF flushed too, or is it flushed, but >>>>> this is insufficient, if so, have we determined why? >>>>> >>>>>> >>>>>> See: https://github.com/raspberrypi/linux/issues/3850 >>>>>> >>>>>> Signed-off-by: Phil Elwell >>>>>> Signed-off-by: Maarten Vanraes >>>>>> --- >>>>>>   drivers/net/ethernet/broadcom/genet/bcmgenet.c | 16 >>>>>> ++++++++++++---- >>>>>>   1 file changed, 12 insertions(+), 4 deletions(-) >>>>>> >>>>>> This patch fixes a problem on RPI 4B where in ~2/3 cases (if >>>>>> you're using >>>>>> nfsroot), you fail to boot; or at least the boot takes longer than >>>>>> 30 minutes. >>>>> >>>>> This makes me wonder whether this also fixes the issues that Maxime >>>>> reported a long time ago, which I can reproduce too, but have not >>>>> been able to track down the source of: >>>>> >>>>> https://lore.kernel.org/linux-kernel/20210706081651.diwks5meyaighx3e@gilmour/ >>>>> >>>>>> >>>>>> Doing a simple ping revealed that when the ping starts working again >>>>>> (during the boot process), you have ping timings of ~1000ms, >>>>>> 2000ms or >>>>>> even 3000ms; while in normal cases it would be around 0.2ms. >>>>> >>>>> I would prefer that we find a way to better qualify whether a RBUF >>>>> reset is needed or not, but I suppose there is not any other way, >>>>> since there is an "RBUF enabled" bit that we can key off. >>>>> >>>>> Doug, what do you think? >>>> I agree that the Linux driver expects the GENET core to be in a "quasi >>>> power-on-reset state" and it seems likely that in both Maxime's case >>>> and the one identified here that is not the case. It would appear that >>>> the Raspberry Pi bootloader and/or "firmware" are likely not disabling >>>> the GENET receiver after loading the kernel image and before invoking >>>> the kernel. They may be disabling the DMA, but that is insufficient >>>> since any received data would likely overflow the RBUF leaving it in a >>>> "bad" state which this patch apparently improves. >>>> >>>> So it seems likely these issues are caused by improper >>>> bootloader/firmware behavior. >>>> >>>> That said, I suppose it would be nice if the driver were more robust. >>>> However, we both know how finicky the receive path of the GENET core >>>> can be about its initialization. Therefore, I am unwilling to "bless" >>>> this change for upstream without more due diligence on our side. >>> >>> Hey, did you guys have any chance to check this stuff out? any >>> thoughts on it? >> >> We are both busy with higher priority work and I cannot see us being >> able to dedicate any time to this issue until April. >> >> While we are sympathetic to your issue and you having upstreamed a fix >> for it, it is entirely self inflicted by having the VPU boot loader >> firmware not properly quiesce the GENET controller, at least based >> upon the description, therefore the natural fix should be... in the >> firmware. > > I totally agree that the natural fix should be in the firmware. > >> From my perspective: NAK. > > Fair enough, though I do think that there are often workarounds for > faulty firmware, and making the driver more robust is not a bad thing > either. That is fair enough, the kernel does indeed have a number of workarounds for bad firmware and hardware obviously. I am seriously concerned that doing this RBUF reset might be beneficial here on 2711, but it is not clear to me that this is not going to break the 15+ other SoCs that use BCMGENET (BCM7xxx). The 2711 use case is a specific chip with a completely different clocking and busing compared to the other chips where this block has been deployed, and there is also a very narrow use with RGMII only, whereas we have MII, reverse MII, GMII and RGMII being actively used. > > In any case, I try to raise this issue with the raspberry pi people in > the hopes of fixing this in the proper place. > Yes, let's try that route, and then maybe come April we have some spare cycles for running some tests. -- Florian