netdev.vger.kernel.org archive mirror
* Re: lan78xx and phy_state_machine
       [not found]   ` <20191014192529.z7c5x6hzixxeplvw@beryllium.lan>
@ 2019-10-14 19:51     ` Stefan Wahren
  2019-10-14 20:20       ` Heiner Kallweit
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Wahren @ 2019-10-14 19:51 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Heiner Kallweit
  Cc: Daniel Wagner, Russell King - ARM Linux admin,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	netdev

[add more recipients]

On 14.10.19 at 21:25, Daniel Wagner wrote:
> On Mon, Oct 14, 2019 at 05:30:04PM +0100, Russell King - ARM Linux admin wrote:
>> On Mon, Oct 14, 2019 at 04:06:04PM +0200, Daniel Wagner wrote:
>>> Hi,
>>>
>>> I've been trying to boot a RPi 3 Model B+ in 64 bit mode. While I can get
>>> my configuration booting with v5.2.20, the current kernel v5.3.6 hangs
>>> when initializing the eth interface.
>>>
>>> Is this a known issue? Some configuration issue?
>> I don't see any successfully probed ethernet devices in the boot log, so
>> I've no idea which of the multitude of ethernet drivers to look at.  I
>> thought maybe I could look at the DT, but I've no idea where
>> "arm/bcm2837-rpi-3-b-plus.dts" is located, included by
>> arch/arm64/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts.
> Sorry about being so terse. I thought the RPi devices were well known. My bad.
> Anyway, the kernel reports that it is the lan78xx driver.
>
> ls -1 /sys/class/net/ | grep -v lo | xargs -n1 -I{} bash -c 'echo -n {} :" " ; basename `readlink -f /sys/class/net/{}/device/driver`'
> eth0 : lan78xx
>
>> The oops is because the PHY state machine has been started, but there
>> is no phydev->adjust_link set.  Can't say much more than that without
>> knowing what the driver is doing.
> This was a good tip! After a few printks I figured out what is happening.
>
> phy_connect_direct()
>    phy_attach_direct()
>      workqueue
>        phy_check_link_status()
>          phy_link_change
>
>
> Moving the phy_prepare_link() up in phy_connect_direct() ensures that
> phydev->adjust_link is set when the phy_check_link_status() is called.
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 9d2bbb13293e..2a61812bcb0d 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -951,11 +951,12 @@ int phy_connect_direct(struct net_device *dev, struct phy_device *phydev,
>         if (!dev)
>                 return -EINVAL;
>
> +       phy_prepare_link(phydev, handler);
> +
>         rc = phy_attach_direct(dev, phydev, phydev->dev_flags, interface);
>         if (rc)
>                 return rc;
>
> -       phy_prepare_link(phydev, handler);
>         if (phy_interrupt_is_valid(phydev))
>                 phy_request_interrupt(phydev);
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-14 19:51     ` lan78xx and phy_state_machine Stefan Wahren
@ 2019-10-14 20:20       ` Heiner Kallweit
  2019-10-14 22:12         ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 17+ messages in thread
From: Heiner Kallweit @ 2019-10-14 20:20 UTC (permalink / raw)
  To: Stefan Wahren, Andrew Lunn, Florian Fainelli
  Cc: Daniel Wagner, Russell King - ARM Linux admin,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	netdev

On 14.10.2019 21:51, Stefan Wahren wrote:
> [add more recipients]
> 
> On 14.10.19 at 21:25, Daniel Wagner wrote:
>> On Mon, Oct 14, 2019 at 05:30:04PM +0100, Russell King - ARM Linux admin wrote:
>>> On Mon, Oct 14, 2019 at 04:06:04PM +0200, Daniel Wagner wrote:
>>>> Hi,
>>>>
>>>> I've been trying to boot a RPi 3 Model B+ in 64 bit mode. While I can get
>>>> my configuration booting with v5.2.20, the current kernel v5.3.6 hangs
>>>> when initializing the eth interface.
>>>>
>>>> Is this a known issue? Some configuration issue?
>>> I don't see any successfully probed ethernet devices in the boot log, so
>>> I've no idea which of the multitude of ethernet drivers to look at.  I
>>> thought maybe I could look at the DT, but I've no idea where
>>> "arm/bcm2837-rpi-3-b-plus.dts" is located, included by
>>> arch/arm64/boot/dts/broadcom/bcm2837-rpi-3-b-plus.dts.
>> Sorry about being so terse. I thought the RPi devices were well known. My bad.
>> Anyway, the kernel reports that it is the lan78xx driver.
>>
>> ls -1 /sys/class/net/ | grep -v lo | xargs -n1 -I{} bash -c 'echo -n {} :" " ; basename `readlink -f /sys/class/net/{}/device/driver`'
>> eth0 : lan78xx
>>
>>> The oops is because the PHY state machine has been started, but there
>>> is no phydev->adjust_link set.  Can't say much more than that without
>>> knowing what the driver is doing.
>> This was a good tip! After a few printks I figured out what is happening.
>>
>> phy_connect_direct()
>>    phy_attach_direct()
>>      workqueue
>>        phy_check_link_status()
>>          phy_link_change
>>

What is interesting is what is special about your config, such that this
issue hasn't occurred on other systems yet.

>>
>> Moving the phy_prepare_link() up in phy_connect_direct() ensures that
>> phydev->adjust_link is set when the phy_check_link_status() is called.
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 9d2bbb13293e..2a61812bcb0d 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -951,11 +951,12 @@ int phy_connect_direct(struct net_device *dev, struct phy_device *phydev,
>>         if (!dev)
>>                 return -EINVAL;
>>
>> +       phy_prepare_link(phydev, handler);
>> +
>>         rc = phy_attach_direct(dev, phydev, phydev->dev_flags, interface);
>>         if (rc)

If phy_attach_direct() fails we may have to reset phydev->adjust_link to NULL,
as we do in phy_disconnect(). Apart from that, the change looks good to me.

>>                 return rc;
>>
>> -       phy_prepare_link(phydev, handler);
>>         if (phy_interrupt_is_valid(phydev))
>>                 phy_request_interrupt(phydev);
>>
> 

Heiner

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-14 20:20       ` Heiner Kallweit
@ 2019-10-14 22:12         ` Russell King - ARM Linux admin
  2019-10-15 19:38           ` Heiner Kallweit
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux admin @ 2019-10-14 22:12 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Stefan Wahren, Andrew Lunn, Florian Fainelli, Daniel Wagner,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	netdev

On Mon, Oct 14, 2019 at 10:20:15PM +0200, Heiner Kallweit wrote:
> On 14.10.2019 21:51, Stefan Wahren wrote:
> > [add more recipients]
> > 
> > On 14.10.19 at 21:25, Daniel Wagner wrote:
> >> Moving the phy_prepare_link() up in phy_connect_direct() ensures that
> >> phydev->adjust_link is set when the phy_check_link_status() is called.
> >>
> >> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> >> index 9d2bbb13293e..2a61812bcb0d 100644
> >> --- a/drivers/net/phy/phy_device.c
> >> +++ b/drivers/net/phy/phy_device.c
> >> @@ -951,11 +951,12 @@ int phy_connect_direct(struct net_device *dev, struct phy_device *phydev,
> >>         if (!dev)
> >>                 return -EINVAL;
> >>
> >> +       phy_prepare_link(phydev, handler);
> >> +
> >>         rc = phy_attach_direct(dev, phydev, phydev->dev_flags, interface);
> >>         if (rc)
> 
> If phy_attach_direct() fails we may have to reset phydev->adjust_link to NULL,
> as we do in phy_disconnect(). Apart from that, the change looks good to me.

Sorry, but it doesn't look good to me.

I think there's a deeper question here - why is the phy state machine
trying to call the link change function during attach?

At this point, the PHY hasn't been "started" so it shouldn't be
doing that.

Note the documentation, specifically phy.rst's "Keeping Close Tabs on
the PAL" section.  Drivers are at liberty to use phy_prepare_link()
_after_ phy_attach(), which means there is a window for
phydev->adjust_link to be NULL.  It should _not_ be called at this
point.
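
The window described above can be modeled in userspace. The following is a
minimal sketch and not phylib code: struct toy_phy, toy_prepare_link(),
toy_start() and toy_run_state_machine() are hypothetical names that only mimic
the ordering of the real calls. It illustrates that a callback left unset after
attach is harmless as long as the state machine does nothing before the PHY is
started, and that starting the PHY before the callback is set is what exposes
the NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in phylib. */
struct toy_phy {
	bool started;                             /* set by toy_start() */
	void (*adjust_link)(struct toy_phy *phy); /* NULL until toy_prepare_link() */
	int link_events;                          /* callbacks delivered so far */
};

static void toy_adjust_link(struct toy_phy *phy)
{
	phy->link_events++;
}

static void toy_prepare_link(struct toy_phy *phy,
			     void (*handler)(struct toy_phy *phy))
{
	phy->adjust_link = handler;
}

static void toy_start(struct toy_phy *phy)
{
	phy->started = true;
}

/*
 * One pass of the toy state machine.
 * Returns 0 if the PHY is not started (nothing to do), 1 if the callback
 * was delivered, and -1 where a real kernel would have called a NULL pointer.
 */
static int toy_run_state_machine(struct toy_phy *phy)
{
	if (!phy->started)
		return 0;
	if (phy->adjust_link == NULL)
		return -1;
	phy->adjust_link(phy);
	return 1;
}

/* Walks through the attach, start, prepare_link ordering discussed above. */
static void toy_window_demo(void)
{
	struct toy_phy phy = { 0 };

	assert(toy_run_state_machine(&phy) == 0);  /* attached, not started */
	toy_start(&phy);
	assert(toy_run_state_machine(&phy) == -1); /* started too early */
	toy_prepare_link(&phy, toy_adjust_link);
	assert(toy_run_state_machine(&phy) == 1);  /* callback delivered */
}
```

Calling toy_window_demo() from a test program exercises all three cases. The
-1 return is only a stand-in for the oops; the sketch says nothing about where
in the real driver the premature start comes from.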

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
       [not found] ` <20191015005327.GJ19861@lunn.ch>
@ 2019-10-15 17:16   ` Daniel Wagner
  2019-10-16 14:25     ` Daniel Wagner
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Wagner @ 2019-10-15 17:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	Woojung Huh, UNGLinuxDriver, netdev

Hi Andrew,

On Tue, Oct 15, 2019 at 02:53:27AM +0200, Andrew Lunn wrote:
> On Mon, Oct 14, 2019 at 04:06:04PM +0200, Daniel Wagner wrote:
> > Hi,
> > 
> > I've been trying to boot a RPi 3 Model B+ in 64 bit mode. While I can get
> > my configuration booting with v5.2.20, the current kernel v5.3.6 hangs
> > when initializing the eth interface.
> >
> > Is this a known issue? Some configuration issue?
> 
> Hi Daniel
> 
> Please could you add a WARN_ON(1); in phy_queue_state_machine() and
> post the stack dump. That might help us figure out what is going on.

I tried to get a stack dump from the WARN_ON(1), but 'make defconfig'
does not seem to enable it(?). Anyway, I played around a bit and noticed
that the problem disappears depending on which additional debug config
switch is enabled. The boot timing seems to matter.

From the feedback I got so far, I think my setup is 'special'
insofar as I don't boot from eMMC. Instead I rely on TFTP and NFS for
rootfs:

 - kernel is configured as 'make defconfig' +

	#
	# Built in drivers
	#
	CONFIG_USB_LAN78XX=y

	#
	# Networking
	#
	CONFIG_PACKET=y
	CONFIG_UNIX=y
	CONFIG_INET=y
	CONFIG_IP_PNP=y
	CONFIG_IP_PNP_DHCP=y

	# NFS
	CONFIG_NFS_FS=y
	CONFIG_NFS_V4=y
	CONFIG_NFS_V4_1=y
	CONFIG_NFS_V4_2=y

	#
	# Debugging
	#
	CONFIG_PRINTK_TIME=y
	CONFIG_DEBUG_KERNEL=y
	CONFIG_EARLY_PRINTK=y
	CONFIG_MESSAGE_LOGLEVEL_DEFAULT=7

	# Embedded config to kernel. /proc/config.gz
	CONFIG_IKCONFIG=y
	CONFIG_IKCONFIG_PROC=y

	CONFIG_KEXEC=y

 - u-boot enables network interface, does DHCP
 - fetches a PXE image
 - PXE loads DTB, kernel and starts the kernel
 - rootfs is supposed to be provided via NFS

Could it be that the network interface is still running (from
u-boot and PXE) when the driver is setting it up, so that the workqueue
is kicked prematurely?

Anyway, I keep trying to get some trace out of it.

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-14 22:12         ` Russell King - ARM Linux admin
@ 2019-10-15 19:38           ` Heiner Kallweit
  2019-10-15 22:09             ` Russell King - ARM Linux admin
  2019-10-16  5:48             ` Stefan Wahren
  0 siblings, 2 replies; 17+ messages in thread
From: Heiner Kallweit @ 2019-10-15 19:38 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Woojung Huh,
	Microchip Linux Driver Support
  Cc: Stefan Wahren, Andrew Lunn, Florian Fainelli, Daniel Wagner,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	netdev

On 15.10.2019 00:12, Russell King - ARM Linux admin wrote:
> On Mon, Oct 14, 2019 at 10:20:15PM +0200, Heiner Kallweit wrote:
>> On 14.10.2019 21:51, Stefan Wahren wrote:
>>> [add more recipients]
>>>
>>> On 14.10.19 at 21:25, Daniel Wagner wrote:
>>>> Moving the phy_prepare_link() up in phy_connect_direct() ensures that
>>>> phydev->adjust_link is set when the phy_check_link_status() is called.
>>>>
>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>> index 9d2bbb13293e..2a61812bcb0d 100644
>>>> --- a/drivers/net/phy/phy_device.c
>>>> +++ b/drivers/net/phy/phy_device.c
>>>> @@ -951,11 +951,12 @@ int phy_connect_direct(struct net_device *dev, struct phy_device *phydev,
>>>>         if (!dev)
>>>>                 return -EINVAL;
>>>>
>>>> +       phy_prepare_link(phydev, handler);
>>>> +
>>>>         rc = phy_attach_direct(dev, phydev, phydev->dev_flags, interface);
>>>>         if (rc)
>>
>> If phy_attach_direct() fails we may have to reset phydev->adjust_link to NULL,
>> as we do in phy_disconnect(). Apart from that, the change looks good to me.
> 
> Sorry, but it doesn't look good to me.
> 
> I think there's a deeper question here - why is the phy state machine
> trying to call the link change function during attach?
After your comment I had a closer look at the lan78xx driver and a few things
look suspicious:

- lan78xx_phy_init() (incl. the call to phy_connect_direct()) is called
  after register_netdev(). This may cause races.

- The following is wrong, irq = 0 doesn't mean polling.
  PHY_POLL is defined as -1. Also in case of irq = 0 phy_interrupt_is_valid()
  returns true.

	/* if phyirq is not set, use polling mode in phylib */
	if (dev->domain_data.phyirq > 0)
		phydev->irq = dev->domain_data.phyirq;
	else
		phydev->irq = 0;

- Manually calling genphy_config_aneg() in lan78xx_phy_init() isn't
  needed, however this should not cause our problem.

Bugs in the network driver would also explain why the issue doesn't occur
on other systems. Once we know more about the actual root cause
maybe phylib can be extended to detect that situation and warn.
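
The irq point can be checked with a small userspace sketch. The two constants
mirror the values I believe include/linux/phy.h had at the time (PHY_POLL is
-1 as stated above; PHY_IGNORE_INTERRUPT being -2 is an assumption that should
be verified against the tree). toy_interrupt_is_valid() and toy_pick_irq() are
hypothetical stand-ins, not kernel functions; toy_pick_irq() shows one possible
way the assignment could fall back to polling.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match include/linux/phy.h of that era; verify before relying on it. */
#define TOY_PHY_POLL             (-1)
#define TOY_PHY_IGNORE_INTERRUPT (-2)

/* Stand-in for phy_interrupt_is_valid(): any other value counts as a real IRQ. */
static bool toy_interrupt_is_valid(int irq)
{
	return irq != TOY_PHY_POLL && irq != TOY_PHY_IGNORE_INTERRUPT;
}

/* Hypothetical helper: use the mapped IRQ if there is one, else poll. */
static int toy_pick_irq(int mapped_irq)
{
	return mapped_irq > 0 ? mapped_irq : TOY_PHY_POLL;
}

static void toy_irq_demo(void)
{
	/* irq = 0, as the driver sets it today, still looks like a valid IRQ. */
	assert(toy_interrupt_is_valid(0));
	/* Falling back to the polling constant is what actually selects polling. */
	assert(!toy_interrupt_is_valid(toy_pick_irq(0)));
	/* A mapped interrupt is passed through unchanged. */
	assert(toy_pick_irq(79) == 79);
}
```

The value 79 is simply the IRQ number that shows up in the logs of this
thread; any positive number behaves the same way in the sketch.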

> At this point, the PHY hasn't been "started" so it shouldn't be
> doing that.
> 
> Note the documentation, specifically phy.rst's "Keeping Close Tabs on
> the PAL" section.  Drivers are at liberty to use phy_prepare_link()
> _after_ phy_attach(), which means there is a window for
> phydev->adjust_link to be NULL.  It should _not_ be called at this
> point.
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-15 19:38           ` Heiner Kallweit
@ 2019-10-15 22:09             ` Russell King - ARM Linux admin
  2019-10-16 15:36               ` Andrew Lunn
  2019-10-16  5:48             ` Stefan Wahren
  1 sibling, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux admin @ 2019-10-15 22:09 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Woojung Huh, Microchip Linux Driver Support, Andrew Lunn,
	Florian Fainelli, Daniel Wagner, netdev,
	bcm-kernel-feedback-list, Stefan Wahren, linux-arm-kernel,
	linux-rpi-kernel

On Tue, Oct 15, 2019 at 09:38:22PM +0200, Heiner Kallweit wrote:
> On 15.10.2019 00:12, Russell King - ARM Linux admin wrote:
> > On Mon, Oct 14, 2019 at 10:20:15PM +0200, Heiner Kallweit wrote:
> >> On 14.10.2019 21:51, Stefan Wahren wrote:
> >>> [add more recipients]
> >>>
> >>> On 14.10.19 at 21:25, Daniel Wagner wrote:
> >>>> Moving the phy_prepare_link() up in phy_connect_direct() ensures that
> >>>> phydev->adjust_link is set when the phy_check_link_status() is called.
> >>>>
> >>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> >>>> index 9d2bbb13293e..2a61812bcb0d 100644
> >>>> --- a/drivers/net/phy/phy_device.c
> >>>> +++ b/drivers/net/phy/phy_device.c
> >>>> @@ -951,11 +951,12 @@ int phy_connect_direct(struct net_device *dev, struct phy_device *phydev,
> >>>>         if (!dev)
> >>>>                 return -EINVAL;
> >>>>
> >>>> +       phy_prepare_link(phydev, handler);
> >>>> +
> >>>>         rc = phy_attach_direct(dev, phydev, phydev->dev_flags, interface);
> >>>>         if (rc)
> >>
> >> If phy_attach_direct() fails we may have to reset phydev->adjust_link to NULL,
> >> as we do in phy_disconnect(). Apart from that, the change looks good to me.
> > 
> > Sorry, but it doesn't look good to me.
> > 
> > I think there's a deeper question here - why is the phy state machine
> > trying to call the link change function during attach?
> After your comment I had a closer look at the lan78xx driver and a few things
> look suspicious:
> 
> - lan78xx_phy_init() (incl. the call to phy_connect_direct()) is called
>   after register_netdev(). This may cause races.

That isn't a problem.  We have lots of network device drivers that do
this - in their open() function.

> - The following is wrong, irq = 0 doesn't mean polling.
>   PHY_POLL is defined as -1. Also in case of irq = 0 phy_interrupt_is_valid()
>   returns true.
> 
> 	/* if phyirq is not set, use polling mode in phylib */
> 	if (dev->domain_data.phyirq > 0)
> 		phydev->irq = dev->domain_data.phyirq;
> 	else
> 		phydev->irq = 0;

Also unlikely to be the cause of this problem.  phy_connect_direct() is
called with an adjust link function, which is set via
phy_prepare_link() in phy_connect_direct(), before interrupts are even
considered.

So, the window for the bug is somewhere before the call to
phy_prepare_link() in phy_connect_direct(), but after
lan78xx_mdio_init().

> - Manually calling genphy_config_aneg() in lan78xx_phy_init() isn't
>   needed, however this should not cause our problem.

Again, way after the point where phydev->adjust_link is non-NULL,
so this can't be it.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-15 19:38           ` Heiner Kallweit
  2019-10-15 22:09             ` Russell King - ARM Linux admin
@ 2019-10-16  5:48             ` Stefan Wahren
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Wahren @ 2019-10-16  5:48 UTC (permalink / raw)
  To: Heiner Kallweit, Russell King - ARM Linux admin, Woojung Huh,
	Microchip Linux Driver Support
  Cc: Andrew Lunn, Florian Fainelli, Daniel Wagner, netdev,
	bcm-kernel-feedback-list, linux-arm-kernel, linux-rpi-kernel

On 15.10.19 at 21:38, Heiner Kallweit wrote:
> On 15.10.2019 00:12, Russell King - ARM Linux admin wrote:
>> On Mon, Oct 14, 2019 at 10:20:15PM +0200, Heiner Kallweit wrote:
>>> On 14.10.2019 21:51, Stefan Wahren wrote:
>>>> [add more recipients]
>>>>
>>>> On 14.10.19 at 21:25, Daniel Wagner wrote:
>>>>> Moving the phy_prepare_link() up in phy_connect_direct() ensures that
>>>>> phydev->adjust_link is set when the phy_check_link_status() is called.
>>>>>
>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>> index 9d2bbb13293e..2a61812bcb0d 100644
>>>>> --- a/drivers/net/phy/phy_device.c
>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>> @@ -951,11 +951,12 @@ int phy_connect_direct(struct net_device *dev, struct phy_device *phydev,
>>>>>         if (!dev)
>>>>>                 return -EINVAL;
>>>>>
>>>>> +       phy_prepare_link(phydev, handler);
>>>>> +
>>>>>         rc = phy_attach_direct(dev, phydev, phydev->dev_flags, interface);
>>>>>         if (rc)
>>> If phy_attach_direct() fails we may have to reset phydev->adjust_link to NULL,
>>> as we do in phy_disconnect(). Apart from that, the change looks good to me.
>> Sorry, but it doesn't look good to me.
>>
>> I think there's a deeper question here - why is the phy state machine
>> trying to call the link change function during attach?
> After your comment I had a closer look at the lan78xx driver and a few things
> look suspicious:
>
> - lan78xx_phy_init() (incl. the call to phy_connect_direct()) is called
>   after register_netdev(). This may cause races.
>
> - The following is wrong, irq = 0 doesn't mean polling.
>   PHY_POLL is defined as -1. Also in case of irq = 0 phy_interrupt_is_valid()
>   returns true.
>
> 	/* if phyirq is not set, use polling mode in phylib */
> 	if (dev->domain_data.phyirq > 0)
> 		phydev->irq = dev->domain_data.phyirq;
> 	else
> 		phydev->irq = 0;
>
> - Manually calling genphy_config_aneg() in lan78xx_phy_init() isn't
>   needed, however this should not cause our problem.
Thanks for this review. This may help to fix at least one of the
other issues with lan78xx.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-15 17:16   ` Daniel Wagner
@ 2019-10-16 14:25     ` Daniel Wagner
  2019-10-16 15:51       ` Andrew Lunn
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Wagner @ 2019-10-16 14:25 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	Woojung Huh, UNGLinuxDriver, netdev

On Tue, Oct 15, 2019 at 07:16:53PM +0200, Daniel Wagner wrote:
> Could it be that the network interface is still running (from
> u-boot and PXE) when the driver is setting it up, so that the workqueue
> is kicked prematurely?

I've dumped the registers before the device is set up and checked them
against the manual. The device is in the reset state documented in
FIGURE 13-1 of http://ww1.microchip.com/downloads/en/DeviceDoc/LAN7800-Data-Sheet-DS00001992G.pdf

After being burned several times, I like to check such things
first. Anyway, this rules out my boot setup.

> Anyway, I keep trying to get some trace out of it.

After adding ignore_loglevel to the command line, I finally get a
trace on the console. Note that with the WARN_ON the system boots, though
there still seems to be something wrong with the network, because there
is no reliable connection to the NFS server.

[    3.743559] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): No External EEPROM. Setting MAC Speed
[    3.754941] libphy: lan78xx-mdiobus: probed
[    3.815609] ------------[ cut here ]------------
[    3.820316] WARNING: CPU: 3 PID: 1 at drivers/net/phy/phy.c:496 phy_queue_state_machine+0xc/0x30
[    3.829226] Modules linked in:
[    3.832329] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc3-00018-g5bc52f64e884-dirty #32
[    3.840974] Hardware name: Raspberry Pi 3 Model B+ (DT)
[    3.846273] pstate: 60000005 (nZCv daif -PAN -UAO)
[    3.851132] pc : phy_queue_state_machine+0xc/0x30
[    3.855903] lr : phy_start+0x88/0xa0
[    3.859524] sp : ffff800010023b80
[    3.862882] x29: ffff800010023b80 x28: ffff000037c34000 
[    3.868270] x27: ffff8000111ac178 x26: 0000000000001002 
[    3.873657] x25: 0000000000000001 x24: 0000000000000000 
[    3.879046] x23: 0000000000001002 x22: ffff800010e3d850 
[    3.884433] x21: ffff000037c34800 x20: ffff000037328438 
[    3.889820] x19: ffff000037328000 x18: 000000000000000e 
[    3.895209] x17: 0000000000000001 x16: 0000000000000019 
[    3.900596] x15: 0000000000000000 x14: 0000000000000000 
[    3.905985] x13: 0000000000000000 x12: 0000000000001da9 
[    3.911372] x11: 0000000000000000 x10: 0000000000000000 
[    3.916759] x9 : ffff0000383b2750 x8 : ffff0000383b1dc0 
[    3.922148] x7 : ffff000037e900c0 x6 : 0000000000000002 
[    3.927535] x5 : 0000000000000001 x4 : ffff000037e90028 
[    3.932923] x3 : 0000000000000000 x2 : 0000000000000001 
[    3.938311] x1 : 0000000000000000 x0 : ffff000037328000 
[    3.943698] Call trace:
[    3.946179]  phy_queue_state_machine+0xc/0x30
[    3.950597]  phy_start+0x88/0xa0
[    3.953870]  lan78xx_open+0x30/0x140
[    3.957499]  __dev_open+0xc0/0x170
[    3.960950]  __dev_change_flags+0x160/0x1b8
[    3.965192]  dev_change_flags+0x20/0x60
[    3.969083]  ip_auto_config+0x254/0xe54
[    3.972974]  do_one_initcall+0x50/0x190
[    3.976865]  kernel_init_freeable+0x194/0x22c
[    3.981285]  kernel_init+0x10/0x100
[    3.984822]  ret_from_fork+0x10/0x18
[    3.988445] ---[ end trace a7b6e745fa28cd56 ]---
[    4.025682] random: crng init done
[    6.401142] ------------[ cut here ]------------
[    6.405854] irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts
[    6.413468] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x150/0x170
[    6.422642] Modules linked in:
[    6.425744] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.4.0-rc3-00018-g5bc52f64e884-dirty #32
[    6.435799] Hardware name: Raspberry Pi 3 Model B+ (DT)
[    6.441099] pstate: 60000005 (nZCv daif -PAN -UAO)
[    6.445957] pc : __handle_irq_event_percpu+0x150/0x170
[    6.451168] lr : __handle_irq_event_percpu+0x150/0x170
[    6.456375] sp : ffff800010003cc0
[    6.459732] x29: ffff800010003cc0 x28: 0000000000000060 
[    6.465120] x27: ffff8000110929a8 x26: ffff80001192d86b 
[    6.470508] x25: ffff800011782d40 x24: ffff0000374cde00 
[    6.475897] x23: 000000000000004f x22: ffff800010003d64 
[    6.481285] x21: 0000000000000000 x20: 0000000000000002 
[    6.486672] x19: ffff0000372ee180 x18: 0000000000000010 
[    6.492060] x17: 0000000000000001 x16: 0000000000000007 
[    6.497448] x15: ffff8000117831b0 x14: 747075727265746e 
[    6.502835] x13: 692064656c62616e x12: 65203878302f3078 
[    6.508223] x11: 302b72656c646e61 x10: 685f7972616d6972 
[    6.513611] x9 : 705f746c75616665 x8 : ffff800011952000 
[    6.518999] x7 : ffff80001066dce0 x6 : 0000000000000106 
[    6.524387] x5 : 0000000000000000 x4 : 0000000000000000 
[    6.529775] x3 : 00000000ffffffff x2 : ffff800011792440 
[    6.535163] x1 : 190f5ab71e843000 x0 : 0000000000000000 
[    6.540550] Call trace:
[    6.543032]  __handle_irq_event_percpu+0x150/0x170
[    6.547890]  handle_irq_event_percpu+0x30/0x88
[    6.552394]  handle_irq_event+0x44/0xc8
[    6.556283]  handle_simple_irq+0x90/0xc0
[    6.560260]  generic_handle_irq+0x24/0x38
[    6.564328]  intr_complete+0xb0/0xe0
[    6.567955]  __usb_hcd_giveback_urb+0x58/0xf8
[    6.572374]  usb_giveback_urb_bh+0xac/0x108
[    6.576618]  tasklet_action_common.isra.0+0x154/0x1a0
[    6.581742]  tasklet_hi_action+0x24/0x30
[    6.585720]  __do_softirq+0x120/0x23c
[    6.589434]  irq_exit+0xb8/0xd8
[    6.592617]  __handle_domain_irq+0x64/0xb8
[    6.596770]  bcm2836_arm_irqchip_handle_irq+0x60/0xc0
[    6.601892]  el1_irq+0xb8/0x180
[    6.605078]  arch_cpu_idle+0x10/0x18
[    6.608704]  do_idle+0x200/0x280
[    6.611975]  cpu_startup_entry+0x24/0x40
[    6.615954]  rest_init+0xd4/0xe0
[    6.619230]  arch_call_rest_init+0xc/0x14
[    6.623294]  start_kernel+0x420/0x44c
[    6.627004] ---[ end trace a7b6e745fa28cd57 ]---
[    6.631779] ------------[ cut here ]------------
[    6.636476] WARNING: CPU: 2 PID: 129 at drivers/net/phy/phy.c:496 phy_queue_state_machine+0xc/0x30
[    6.645561] Modules linked in:
[    6.648661] CPU: 2 PID: 129 Comm: irq/79-usb-001: Tainted: G        W         5.4.0-rc3-00018-g5bc52f64e884-dirty #32
[    6.659422] Hardware name: Raspberry Pi 3 Model B+ (DT)
[    6.664720] pstate: 40000005 (nZcv daif -PAN -UAO)
[    6.669580] pc : phy_queue_state_machine+0xc/0x30
[    6.674351] lr : phy_interrupt+0x94/0xa8
[    6.678325] sp : ffff800011d43d70
[    6.681682] x29: ffff800011d43d70 x28: ffff0000374b8dc0 
[    6.687071] x27: ffff0000374b8dc0 x26: ffff80001013d670 
[    6.692459] x25: 0000000000000001 x24: ffff80001013d760 
[    6.697848] x23: ffff0000374b8dc0 x22: ffff0000374cde00 
[    6.703235] x21: ffff0000372ee180 x20: ffff0000374cde00 
[    6.708623] x19: ffff000037328000 x18: 0000000000000014 
[    6.714011] x17: 0000000007ec1044 x16: 0000000059730e39 
[    6.719400] x15: 0000000024786c56 x14: 003d090000000000 
[    6.724787] x13: 00003d08ffff9c00 x12: 0000000000000000 
[    6.730175] x11: 0000000000000000 x10: 0000000000000990 
[    6.735564] x9 : ffff800011d43d20 x8 : ffff0000374b97b0 
[    6.740952] x7 : ffff0000383de780 x6 : ffff0000383ddd40 
[    6.746340] x5 : 000000000000b958 x4 : 0000000000000000 
[    6.751728] x3 : 0000000000000000 x2 : ffff8000107af9a0 
[    6.757115] x1 : 0000000000000000 x0 : ffff000037328000 
[    6.762501] Call trace:
[    6.764983]  phy_queue_state_machine+0xc/0x30
[    6.769402]  phy_interrupt+0x94/0xa8
[    6.773027]  irq_thread_fn+0x28/0x98
[    6.776651]  irq_thread+0x148/0x240
[    6.780190]  kthread+0xf0/0x120
[    6.783375]  ret_from_fork+0x10/0x18
[    6.786996] ---[ end trace a7b6e745fa28cd58 ]---
[    6.816767] Sending DHCP requests ..., OK
[   13.644910] IP-Config: Got DHCP answer from 192.168.19.2, my address is 192.168.19.53
[   13.652888] IP-Config: Complete:
[   13.656175]      device=eth0, hwaddr=b8:27:eb:85:c7:c9, ipaddr=192.168.19.53, mask=255.255.255.0, gw=192.168.19.1
[   13.666616]      host=192.168.19.53, domain=, nis-domain=(none)
[   13.672650]      bootserver=192.168.19.2, rootserver=192.168.19.2, rootpath=
[   13.672655]      nameserver0=192.168.19.2
[   13.684179] ALSA device list:
[   13.687214]   No soundcards found.
[   13.700948] VFS: Mounted root (nfs filesystem) on device 0:19.
[   13.707424] devtmpfs: mounted
[   13.716523] Freeing unused kernel memory: 5056K
[   13.736832] Run /sbin/init as init process
[  134.108849] nfs: server 192.168.19.2 not responding, still trying
[  134.108854] nfs: server 192.168.19.2 not responding, still trying
[  134.109781] nfs: server 192.168.19.2 not responding, still trying
[  134.109786] nfs: server 192.168.19.2 OK
[  134.132312] nfs: server 192.168.19.2 not responding, still trying
[  134.132316] nfs: server 192.168.19.2 OK
[  134.143314] nfs: server 192.168.19.2 OK
[  134.143345] nfs: server 192.168.19.2 not responding, still trying
[  134.154328] nfs: server 192.168.19.2 not responding, still trying
[  134.154332] nfs: server 192.168.19.2 OK
[  134.165397] nfs: server 192.168.19.2 OK
[  134.166306] nfs: server 192.168.19.2 OK
[  134.166319] nfs: server 192.168.19.2 OK
[  134.166362] nfs: server 192.168.19.2 OK
[  139.585336] systemd[1]: System time before build time, advancing clock.

Welcome to Debian GNU/Linux 9 (stretch)!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-15 22:09             ` Russell King - ARM Linux admin
@ 2019-10-16 15:36               ` Andrew Lunn
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2019-10-16 15:36 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Heiner Kallweit, Woojung Huh, Microchip Linux Driver Support,
	Florian Fainelli, Daniel Wagner, netdev,
	bcm-kernel-feedback-list, Stefan Wahren, linux-arm-kernel,
	linux-rpi-kernel

> > - lan78xx_phy_init() (incl. the call to phy_connect_direct()) is called
> >   after register_netdev(). This may cause races.
> 
> That isn't a problem.  We have lots of network device drivers that do
> this - in their open() function.

Hi Russell

Actually, here it is a problem. lan7801_phy_init() finds the PHY device and
connects it to the MAC. lan78xx_open() calls phy_start(), with the
assumption lan7801_phy_init() has been called.

But the stack trace just provided shows this assumption is wrong. As
soon as register_netdev() is called, the kernel auto configuration
kicks in and opens the device.

lan78xx_phy_init() needs to happen before register_netdev(), or inside
lan78xx_open().

	Andrew
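
The ordering problem described above can be sketched as a toy userspace model.
Nothing here is driver code: struct toy_netdev, toy_open(), toy_phy_init(),
toy_register() and toy_probe() are hypothetical names, and the assumption that
registration immediately opens the device stands in for what the IP
autoconfiguration path did in the posted trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names do not exist in the lan78xx driver. */
struct toy_netdev {
	bool phy_connected; /* set by toy_phy_init() */
	bool registered;    /* set by toy_register() */
	int open_result;    /* result of the open triggered by registration */
};

/* Opening only works once the PHY has been connected. */
static int toy_open(const struct toy_netdev *nd)
{
	return nd->phy_connected ? 0 : -1;
}

static void toy_phy_init(struct toy_netdev *nd)
{
	nd->phy_connected = true;
}

/* Assumption: as soon as the device is registered, something opens it. */
static void toy_register(struct toy_netdev *nd)
{
	nd->registered = true;
	nd->open_result = toy_open(nd);
}

/* Runs the two probe orderings and reports how the early open went. */
static int toy_probe(bool phy_first)
{
	struct toy_netdev nd = { 0 };

	if (phy_first) {
		toy_phy_init(&nd);
		toy_register(&nd);
	} else {
		toy_register(&nd);
		toy_phy_init(&nd);
	}
	return nd.open_result;
}

static void toy_order_demo(void)
{
	assert(toy_probe(false) == -1); /* register first: open races PHY init */
	assert(toy_probe(true) == 0);   /* PHY first: open finds it connected */
}
```

In the model the failure is deterministic; in the real driver it depends on
timing, which would be consistent with the problem disappearing when extra
debug options are enabled.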

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-16 14:25     ` Daniel Wagner
@ 2019-10-16 15:51       ` Andrew Lunn
  2019-10-17  6:52         ` Daniel Wagner
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Lunn @ 2019-10-16 15:51 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	Woojung Huh, UNGLinuxDriver, netdev

Hi Daniel

Please could you give this a go. It is totally untested, not even
compile tested...

Thanks
	Andrew

From 235549a687ad91c1500289fb32ee1c775d06d16d Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 16 Oct 2019 10:42:07 -0500
Subject: [PATCH] net: usb: lan78xx: Connect PHY before registering MAC

As soon as the netdev is registered, the kernel can start using the
interface. If the driver connects the MAC to the PHY after the netdev
is registered, there is a race condition where the interface can be
opened without having the PHY connected.

Change the order to close this race condition.

Fixes: 92571a1aae40 ("lan78xx: Connect phy early")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/usb/lan78xx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 58f5a219fb65..62948098191f 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3782,10 +3782,14 @@ static int lan78xx_probe(struct usb_interface *intf,
 	/* driver requires remote-wakeup capability during autosuspend. */
 	intf->needs_remote_wakeup = 1;
 
+	ret = lan78xx_phy_init(dev);
+	if (ret < 0)
+		goto out4;
+
 	ret = register_netdev(netdev);
 	if (ret != 0) {
 		netif_err(dev, probe, netdev, "couldn't register the device\n");
-		goto out4;
+		goto out5;
 	}
 
 	usb_set_intfdata(intf, dev);
@@ -3798,14 +3802,10 @@ static int lan78xx_probe(struct usb_interface *intf,
 	pm_runtime_set_autosuspend_delay(&udev->dev,
 					 DEFAULT_AUTOSUSPEND_DELAY);
 
-	ret = lan78xx_phy_init(dev);
-	if (ret < 0)
-		goto out5;
-
 	return 0;
 
 out5:
-	unregister_netdev(netdev);
+	phy_disconnect(netdev->phydev);
 out4:
 	usb_free_urb(dev->urb_intr);
 out3:
-- 
2.23.0
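
The label changes in the patch above follow the usual goto unwinding
rule: a failure jumps to the label that undoes only what has already
been set up, and the labels run in reverse order of acquisition. A
minimal userspace sketch of that rule, with hypothetical names and a
made-up fail_at parameter used purely to inject failures:

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct toy_state {
	bool urb_allocated;
	bool phy_connected;
	bool registered;
};

/*
 * fail_at selects which step fails: 1 = PHY init, 2 = registration,
 * any other value = no failure.
 */
static int toy_probe(struct toy_state *s, int fail_at)
{
	int ret = -1;

	s->urb_allocated = true;	/* acquired earlier in probe */

	if (fail_at == 1)
		goto out4;		/* PHY never connected: skip out5 */
	s->phy_connected = true;

	if (fail_at == 2)
		goto out5;		/* undo the PHY connection first */
	s->registered = true;

	return 0;

out5:
	s->phy_connected = false;	/* models phy_disconnect() */
out4:
	s->urb_allocated = false;	/* models usb_free_urb() */
	return ret;
}
```

Calling toy_probe() with fail_at set to 2 leaves nothing connected or
allocated, which mirrors why the failed register_netdev() path now jumps
to out5 rather than out4.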


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-16 15:51       ` Andrew Lunn
@ 2019-10-17  6:52         ` Daniel Wagner
  2019-10-17 13:15           ` Andrew Lunn
  2019-10-17 17:05           ` Stefan Wahren
  0 siblings, 2 replies; 17+ messages in thread
From: Daniel Wagner @ 2019-10-17  6:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	Woojung Huh, UNGLinuxDriver, netdev

On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
> Hi Daniel
> 
> Please could you give this a go. It is totally untested, not even
> compile tested...

Sure. The system boots but there is one splat:


[    2.213987] usb 1-1: new high-speed USB device number 2 using dwc2
[    2.426789] hub 1-1:1.0: USB hub found
[    2.430677] hub 1-1:1.0: 4 ports detected
[    2.721982] usb 1-1.1: new high-speed USB device number 3 using dwc2
[    2.826991] hub 1-1.1:1.0: USB hub found
[    2.831093] hub 1-1.1:1.0: 3 ports detected
[    3.489988] usb 1-1.1.1: new high-speed USB device number 4 using dwc2
[    3.729045] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): deferred multicast write 0x00007ca0
[    3.870518] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): No External EEPROM. Setting MAC Speed
[    3.881900] libphy: lan78xx-mdiobus: probed
[    3.893322] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): registered mdiobus bus usb-001:004
[    3.902984] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): phydev->irq = 79
[    4.283761] random: crng init done
[    4.958866] lan78xx 1-1.1.1:1.0 eth0: receive multicast hash filter
[    4.965311] lan78xx 1-1.1.1:1.0 eth0: deferred multicast write 0x00007ca2
[    6.502358] lan78xx 1-1.1.1:1.0 eth0: PHY INTR: 0x00020000
[    6.507935] ------------[ cut here ]------------
[    6.512635] irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts
[    6.520250] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x150/0x170
[    6.529424] Modules linked in:
[    6.532526] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3-00018-g5bc52f64e884-dirty #36
[    6.541172] Hardware name: Raspberry Pi 3 Model B+ (DT)
[    6.546471] pstate: 60000005 (nZCv daif -PAN -UAO)
[    6.551329] pc : __handle_irq_event_percpu+0x150/0x170
[    6.556539] lr : __handle_irq_event_percpu+0x150/0x170
[    6.561747] sp : ffff800010003cc0
[    6.565104] x29: ffff800010003cc0 x28: 0000000000000060 
[    6.570493] x27: ffff8000110fb9b0 x26: ffff800011a3daeb 
[    6.575882] x25: ffff800011892d40 x24: ffff000037525800 
[    6.581270] x23: 000000000000004f x22: ffff800010003d64 
[    6.586659] x21: 0000000000000000 x20: 0000000000000002 
[    6.592046] x19: ffff00003716fb00 x18: 0000000000000010 
[    6.597434] x17: 0000000000000001 x16: 0000000000000007 
[    6.602822] x15: ffff8000118931b0 x14: 747075727265746e 
[    6.608210] x13: 692064656c62616e x12: 65203878302f3078 
[    6.613598] x11: 302b72656c646e61 x10: 685f7972616d6972 
[    6.618986] x9 : 705f746c75616665 x8 : ffff800011a9f000 
[    6.624374] x7 : ffff800010681150 x6 : 00000000000000f9 
[    6.629761] x5 : 0000000000000000 x4 : 0000000000000000 
[    6.635148] x3 : 00000000ffffffff x2 : ffff8000118a2440 
[    6.640535] x1 : ab82878caf7c9e00 x0 : 0000000000000000 
[    6.645923] Call trace:
[    6.648404]  __handle_irq_event_percpu+0x150/0x170
[    6.653262]  handle_irq_event_percpu+0x30/0x88
[    6.657767]  handle_irq_event+0x44/0xc8
[    6.661659]  handle_simple_irq+0x90/0xc0
[    6.665635]  generic_handle_irq+0x24/0x38
[    6.669703]  intr_complete+0x104/0x178
[    6.673508]  __usb_hcd_giveback_urb+0x58/0xf8
[    6.677927]  usb_giveback_urb_bh+0xac/0x108
[    6.682173]  tasklet_action_common.isra.0+0x154/0x1a0
[    6.687298]  tasklet_hi_action+0x24/0x30
[    6.691277]  __do_softirq+0x120/0x23c
[    6.694990]  irq_exit+0xb8/0xd8
[    6.698174]  __handle_domain_irq+0x64/0xb8
[    6.702326]  bcm2836_arm_irqchip_handle_irq+0x60/0xc0
[    6.707449]  el1_irq+0xb8/0x180
[    6.710634]  arch_cpu_idle+0x10/0x18
[    6.714260]  do_idle+0x200/0x280
[    6.717532]  cpu_startup_entry+0x20/0x40
[    6.721512]  rest_init+0xd4/0xe0
[    6.724786]  arch_call_rest_init+0xc/0x14
[    6.728851]  start_kernel+0x420/0x44c
[    6.732562] ---[ end trace e770c2c68be5476f ]---
[    6.742776] lan78xx 1-1.1.1:1.0 eth0: speed: 1000 duplex: 1 anadv: 0x05e1 anlpa: 0xc1e1
[    6.750940] lan78xx 1-1.1.1:1.0 eth0: rx pause disabled, tx pause disabled
[    6.769976] Sending DHCP requests ..., OK
[   12.926088] IP-Config: Got DHCP answer from 192.168.19.2, my address is 192.168.19.53
[   12.934059] IP-Config: Complete:
[   12.937335]      device=eth0, hwaddr=b8:27:eb:85:c7:c9, ipaddr=192.168.19.53, mask=255.255.255.0, gw=192.168.19.1
[   12.947758]      host=192.168.19.53, domain=, nis-domain=(none)
[   12.953772]      bootserver=192.168.19.2, rootserver=192.168.19.2, rootpath=
[   12.953776]      nameserver0=192.168.19.2
[   12.965221] ALSA device list:
[   12.968246]   No soundcards found.
[   12.984397] VFS: Mounted root (nfs filesystem) on device 0:19.
[   12.991059] devtmpfs: mounted
[   13.000530] Freeing unused kernel memory: 5504K
[   13.018077] Run /sbin/init as init process
[   44.010022] nfs: server 192.168.19.2 not responding, still trying
[   44.010027] nfs: server 192.168.19.2 not responding, still trying
[   44.010033] nfs: server 192.168.19.2 not responding, still trying
[   44.010056] nfs: server 192.168.19.2 not responding, still trying
[   44.010070] nfs: server 192.168.19.2 not responding, still trying
[   44.017003] nfs: server 192.168.19.2 OK
[   44.028842] nfs: server 192.168.19.2 OK
[   44.035171] nfs: server 192.168.19.2 OK
[   44.035751] nfs: server 192.168.19.2 OK
[   44.035796] nfs: server 192.168.19.2 OK
[   46.056211] systemd[1]: System time before build time, advancing clock.
[   46.114708] systemd[1]: systemd 232 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
[   46.133593] systemd[1]: Detected architecture arm64.

Welcome to Debian GNU/Linux 9 (stretch)!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-17  6:52         ` Daniel Wagner
@ 2019-10-17 13:15           ` Andrew Lunn
  2019-10-17 17:05           ` Stefan Wahren
  1 sibling, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2019-10-17 13:15 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel,
	Woojung Huh, UNGLinuxDriver, netdev

On Thu, Oct 17, 2019 at 08:52:30AM +0200, Daniel Wagner wrote:
> On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
> > Hi Daniel
> > 
> > Please could you give this a go. It is totally untested, not even
> > compile tested...
> 
> Sure. The system boots but there is one splat:

Cool. So we are going in the right direction.

This splat looks completely different. But it might still be a race
condition with register_netdev(). We should look at what the power
management code is doing.

	   Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-17  6:52         ` Daniel Wagner
  2019-10-17 13:15           ` Andrew Lunn
@ 2019-10-17 17:05           ` Stefan Wahren
  2019-10-17 17:41             ` Daniel Wagner
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Wahren @ 2019-10-17 17:05 UTC (permalink / raw)
  To: Daniel Wagner, Andrew Lunn
  Cc: Woojung Huh, netdev, UNGLinuxDriver, bcm-kernel-feedback-list,
	linux-rpi-kernel, linux-arm-kernel

Hi Daniel,

Am 17.10.19 um 08:52 schrieb Daniel Wagner:
> On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
>> Hi Daniel
>>
>> Please could you give this a go. It is totally untested, not even
>> compile tested...
> Sure. The system boots but there is one splat:
>
This is a known issue since 4.20 [1], [2], so it is not related to the crash.

Unfortunately, you didn't write which kernel version works for you
(except for this splat). Only 5.3, or 5.4-rc3 too?

[1] - https://marc.info/?l=linux-netdev&m=154604180927252&w=2
[2] - https://patchwork.kernel.org/patch/10888797/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-17 17:05           ` Stefan Wahren
@ 2019-10-17 17:41             ` Daniel Wagner
  2019-10-17 17:52               ` Stefan Wahren
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Wagner @ 2019-10-17 17:41 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Andrew Lunn, Woojung Huh, netdev, UNGLinuxDriver,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel

Hi Stefan,

On Thu, Oct 17, 2019 at 07:05:32PM +0200, Stefan Wahren wrote:
> Am 17.10.19 um 08:52 schrieb Daniel Wagner:
> > On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
> >> Please could you give this a go. It is totally untested, not even
> >> compile tested...
> > Sure. The system boots but there is one splat:
> >
> This is a known issue since 4.20 [1], [2], so it is not related to the crash.

Oh, I see.

> Unfortunately, you didn't write which kernel version works for you
> (except for this splat). Only 5.3, or 5.4-rc3 too?

With v5.2.20 I was able to boot the system. But after this discussion
I would say that was just luck. The race seems to exist for longer and
only with my 'special' config I am able to reproduce it.

> [1] - https://marc.info/?l=linux-netdev&m=154604180927252&w=2
> [2] - https://patchwork.kernel.org/patch/10888797/

Indeed, the irq domain code looks suspicious, and Marc pointed out that
it is dead wrong. Could we just go with [2] and fix this up?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-17 17:41             ` Daniel Wagner
@ 2019-10-17 17:52               ` Stefan Wahren
  2019-10-17 18:14                 ` Daniel Wagner
  2019-10-17 18:25                 ` Andrew Lunn
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Wahren @ 2019-10-17 17:52 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Woojung Huh, Andrew Lunn, netdev, UNGLinuxDriver,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel

Hi Daniel,

Am 17.10.19 um 19:41 schrieb Daniel Wagner:
> Hi Stefan,
>
> On Thu, Oct 17, 2019 at 07:05:32PM +0200, Stefan Wahren wrote:
>> Am 17.10.19 um 08:52 schrieb Daniel Wagner:
>>> On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
>>>> Please could you give this a go. It is totally untested, not even
>>>> compile tested...
>>> Sure. The system boots but there is one splat:
>>>
>> This is a known issue since 4.20 [1], [2], so it is not related to the crash.
> Oh, I see.
>
>> Unfortunately, you didn't write which kernel version works for you
>> (except for this splat). Only 5.3, or 5.4-rc3 too?
> With v5.2.20 I was able to boot the system. But after this discussion
> I would say that was just luck. The race seems to exist for longer and
> only with my 'special' config I am able to reproduce it.
okay, let me rephrase my question. You said that 5.4-rc3 didn't even
boot in your setup. After applying Andrew's patch, does it boot or is it
a different issue?
>
>> [1] - https://marc.info/?l=linux-netdev&m=154604180927252&w=2
>> [2] - https://patchwork.kernel.org/patch/10888797/
> Indeed, the irq domain code looks suspicious and Marc pointed out that
> is dead wrong. Could we just go with [2] and fix this up?

Sorry, I cannot answer this question.

Stefan

>
> Thanks,
> Daniel
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-17 17:52               ` Stefan Wahren
@ 2019-10-17 18:14                 ` Daniel Wagner
  2019-10-17 18:25                 ` Andrew Lunn
  1 sibling, 0 replies; 17+ messages in thread
From: Daniel Wagner @ 2019-10-17 18:14 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Woojung Huh, Andrew Lunn, netdev, UNGLinuxDriver,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel

> >> Unfortunately, you didn't write which kernel version works for you
> >> (except for this splat). Only 5.3, or 5.4-rc3 too?
> > With v5.2.20 I was able to boot the system. But after this discussion
> > I would say that was just luck. The race seems to exist for longer and
> > only with my 'special' config I am able to reproduce it.
> okay, let me rephrase my question. You said that 5.4-rc3 didn't even
> boot in your setup. After applying Andrew's patch, does it boot or is it
> a different issue?

Yes, with Andrew's patch the initial problem is gone.

> >> [1] - https://marc.info/?l=linux-netdev&m=154604180927252&w=2
> >> [2] - https://patchwork.kernel.org/patch/10888797/
> > Indeed, the irq domain code looks suspicious and Marc pointed out that
> > is dead wrong. Could we just go with [2] and fix this up?
> 
> Sorry, i cannot answer this question.

Sure, I'm just trying to lobby :)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: lan78xx and phy_state_machine
  2019-10-17 17:52               ` Stefan Wahren
  2019-10-17 18:14                 ` Daniel Wagner
@ 2019-10-17 18:25                 ` Andrew Lunn
  1 sibling, 0 replies; 17+ messages in thread
From: Andrew Lunn @ 2019-10-17 18:25 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Daniel Wagner, Woojung Huh, netdev, UNGLinuxDriver,
	bcm-kernel-feedback-list, linux-rpi-kernel, linux-arm-kernel

On Thu, Oct 17, 2019 at 07:52:32PM +0200, Stefan Wahren wrote:
> Hi Daniel,
> 
> Am 17.10.19 um 19:41 schrieb Daniel Wagner:
> > Hi Stefan,
> >
> > On Thu, Oct 17, 2019 at 07:05:32PM +0200, Stefan Wahren wrote:
> >> Am 17.10.19 um 08:52 schrieb Daniel Wagner:
> >>> On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
> >>>> Please could you give this a go. It is totally untested, not even
> >>>> compile tested...
> >>> Sure. The system boots but there is one splat:
> >>>
> >> This is a known issue since 4.20 [1], [2], so it is not related to the crash.
> > Oh, I see.
> >
> >> Unfortunately, you didn't write which kernel version works for you
> >> (except for this splat). Only 5.3, or 5.4-rc3 too?
> > With v5.2.20 I was able to boot the system. But after this discussion
> > I would say that was just luck. The race seems to exist for longer and
> > only with my 'special' config I am able to reproduce it.
> okay, let me rephrase my question. You said that 5.4-rc3 didn't even
> boot in your setup. After applying Andrew's patch, does it boot or is it
> a different issue?

Hi Stefan

I would say I fixed a real issue with my patch. I will submit it to
David for stable. The problem has come to light because Daniel is
using the kernel ipconfig and NFS root. That makes the race condition
hit every time. But the issue could happen under other conditions as
well.

    Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-10-17 18:25 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191014140604.iddhmg5ckqhzlbkw@beryllium.lan>
     [not found] ` <20191014163004.GP25745@shell.armlinux.org.uk>
     [not found]   ` <20191014192529.z7c5x6hzixxeplvw@beryllium.lan>
2019-10-14 19:51     ` lan78xx and phy_state_machine Stefan Wahren
2019-10-14 20:20       ` Heiner Kallweit
2019-10-14 22:12         ` Russell King - ARM Linux admin
2019-10-15 19:38           ` Heiner Kallweit
2019-10-15 22:09             ` Russell King - ARM Linux admin
2019-10-16 15:36               ` Andrew Lunn
2019-10-16  5:48             ` Stefan Wahren
     [not found] ` <20191015005327.GJ19861@lunn.ch>
2019-10-15 17:16   ` Daniel Wagner
2019-10-16 14:25     ` Daniel Wagner
2019-10-16 15:51       ` Andrew Lunn
2019-10-17  6:52         ` Daniel Wagner
2019-10-17 13:15           ` Andrew Lunn
2019-10-17 17:05           ` Stefan Wahren
2019-10-17 17:41             ` Daniel Wagner
2019-10-17 17:52               ` Stefan Wahren
2019-10-17 18:14                 ` Daniel Wagner
2019-10-17 18:25                 ` Andrew Lunn
