All of lore.kernel.org
 help / color / mirror / Atom feed
* net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
@ 2019-01-22 19:16 John David Anglin
  2019-01-22 20:28 ` Andrew Lunn
  2019-01-22 23:12 ` net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs Andrew Lunn
  0 siblings, 2 replies; 52+ messages in thread
From: John David Anglin @ 2019-01-22 19:16 UTC (permalink / raw)
  To: Russell King; +Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]

I've been hacking on a espressobin board to try to improve ptp support,
etc.  However, I have
a big problem with link detection on the wan, lan0 and lan1 ports.

I have a standard bridge configuration using systemd-networkd. 
Currently, I'm working with linux
v4.20.2.

From power on, none of the wan, lan0, lan1 or br0 achieve link
(LOWER_UP).  networkctl shows
no carrier for these ports.  Disconnecting and reconnecting cables is
not detected and makes no
difference to link state.  I  added a debug printout in
mv88e6352_port_link_state, but the routine
is not called.  As far as I can tell, link state changes are not
detected using PHY interrupts.  And yet,
if the card is rebooted, link detection seems to magically work.

I know that the 88E6341 port registers detect port link (also RJ45 LED)
correctly.

The attached patch fixes link detection at power on.  However, link
state still doesn't update if a cable
is disconnected or moved.

I'm puzzled as to how this is supposed to work.  Thoughts?

Regards,
Dave Anglin

-- 
John David Anglin  dave.anglin@bell.net


[-- Attachment #2: phylink.c.d --]
[-- Type: text/plain, Size: 447 bytes --]

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9b8dd0d0ee42..c1ec13b320ee 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -405,6 +405,7 @@ static void phylink_resolve(struct work_struct *w)
 		case MLO_AN_PHY:
 			link_state = pl->phy_state;
 			phylink_resolve_flow(pl, &link_state);
+			phylink_get_mac_state(pl, &link_state);
 			phylink_mac_config(pl, &link_state);
 			break;
 

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 19:16 net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs John David Anglin
@ 2019-01-22 20:28 ` Andrew Lunn
  2019-01-22 21:40   ` John David Anglin
  2019-01-22 23:12 ` net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs Andrew Lunn
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-01-22 20:28 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Tue, Jan 22, 2019 at 02:16:09PM -0500, John David Anglin wrote:
> I've been hacking on a espressobin board to try to improve ptp support,
> etc.  However, I have
> a big problem with link detection on the wan, lan0 and lan1 ports.
> 
> I have a standard bridge configuration using systemd-networkd. 

Hi John

Does it make a difference if you do it by hand? Bring up the master
interface, wan, lan0, lan1, add any bridge you need, etc.

> From power on, none of the wan, lan0, lan1 or br0 achieve link
> (LOWER_UP). 

Is the master interface up? What does ip link show give?

   Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 20:28 ` Andrew Lunn
@ 2019-01-22 21:40   ` John David Anglin
  2019-01-22 22:36     ` Andrew Lunn
  0 siblings, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-22 21:40 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

Hi Andrew,

On 1/22/2019 3:28 PM, Andrew Lunn wrote:
> Does it make a difference if you do it by hand? Bring up the master
> interface, wan, lan0, lan1, add any bridge you need, etc.
>
>>  From power on, none of the wan, lan0, lan1 or br0 achieve link
>> (LOWER_UP).
I can explore this but I don't know at the moment.  I believe that all 
interfaces are configured.

In serdes.c, interrupts are used to monitor link changes.  However, 
phy.c doesn't do this and
it doesn't call phylink_mac_change().
> Is the master interface up? What does ip link show give?
The master interface (eth0 on mvneta) always comes up.  The link status 
of this interface
appears to be checked by polling at 1 second intervals.

After a power on boot, this is what I see:

root@espressobin:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
mode DEFAULT group default qlen 1000
     link/ether 96:80:ba:01:e9:31 brd ff:ff:ff:ff:ff:ff
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether 46:42:f8:2b:d1:4e brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
mode DEFAULT group default qlen 1024
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
5: wan@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
6: lan0@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
7: lan1@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
8: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state 
DOWN mode DEFAULT group default qlen 1000
     link/ether 00:50:43:25:fb:84 brd ff:ff:ff:ff:ff:ff
root@espressobin:~# networkctl
IDX LINK             TYPE               OPERATIONAL SETUP
   1 lo               loopback           carrier     unmanaged
   2 bond0            bond               off         unmanaged
   3 dummy0           ether              off         unmanaged
   4 eth0             ether              degraded    configured
   5 wan              dsa                no-carrier  configuring
   6 lan0             dsa                no-carrier  configuring
   7 lan1             dsa                no-carrier  configuring
   8 br0              bridge             no-carrier  configuring

8 links listed.

r0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 00:50:43:25:fb:84  txqueuelen 1000  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::3cc0:66ff:fe0a:dbba  prefixlen 64  scopeid 0x20<link>
         ether 3e:c0:66:0a:db:ba  txqueuelen 1024  (Ethernet)
         RX packets 2229  bytes 160191 (156.4 KiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 14  bytes 1116 (1.0 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 11

lan0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 3e:c0:66:0a:db:ba  txqueuelen 1000  (Ethernet)
         RX packets 1075  bytes 55969 (54.6 KiB)
         RX errors 0  dropped 551  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lan1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 3e:c0:66:0a:db:ba  txqueuelen 1000  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
         inet 127.0.0.1  netmask 255.0.0.0
         inet6 ::1  prefixlen 128  scopeid 0x10<host>
         loop  txqueuelen 1000  (Local Loopback)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wan: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 3e:c0:66:0a:db:ba  txqueuelen 1000  (Ethernet)
         RX packets 1154  bytes 55184 (53.8 KiB)
         RX errors 0  dropped 14  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Then, if I do a "shutdown -r +0", I see:

root@espressobin:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
mode DEFAULT group default qlen 1000
     link/ether 26:dd:a4:7c:26:3f brd ff:ff:ff:ff:ff:ff
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether 0a:73:61:5a:ae:dc brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
mode DEFAULT group default qlen 1024
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
5: wan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
master br0 state UP mode DEFAULT group default qlen 1000
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
6: lan0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
master br0 state UP mode DEFAULT group default qlen 1000
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
7: lan1@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
8: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state 
UP mode DEFAULT group default qlen 1000
     link/ether 00:50:43:25:fb:84 brd ff:ff:ff:ff:ff:ff
root@espressobin:~# networkctl
IDX LINK             TYPE               OPERATIONAL SETUP
   1 lo               loopback           carrier     unmanaged
   2 bond0            bond               off         unmanaged
   3 dummy0           ether              off         unmanaged
   4 eth0             ether              degraded    configured
   5 wan              dsa                degraded    configured
   6 lan0             dsa                degraded    configured
   7 lan1             dsa                no-carrier  configuring
   8 br0              bridge             routable    configured

8 links listed.
root@espressobin:~# ifconfig
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.0.55  netmask 255.255.255.0  broadcast 192.168.0.255
         inet6 fe80::250:43ff:fe25:fb84  prefixlen 64  scopeid 0x20<link>
         ether 00:50:43:25:fb:84  txqueuelen 1000  (Ethernet)
         RX packets 1184  bytes 237936 (232.3 KiB)
         RX errors 0  dropped 261  overruns 0  frame 0
         TX packets 193  bytes 19444 (18.9 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::f42d:a2ff:feab:324d  prefixlen 64  scopeid 0x20<link>
         ether f6:2d:a2:ab:32:4d  txqueuelen 1024  (Ethernet)
         RX packets 1331  bytes 276209 (269.7 KiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 233  bytes 24352 (23.7 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 11

lan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::f42d:a2ff:feab:324d  prefixlen 64  scopeid 0x20<link>
         ether f6:2d:a2:ab:32:4d  txqueuelen 1000  (Ethernet)
         RX packets 264  bytes 13773 (13.4 KiB)
         RX errors 0  dropped 135  overruns 0  frame 0
         TX packets 18  bytes 1368 (1.3 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lan1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether f6:2d:a2:ab:32:4d  txqueuelen 1000  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
         inet 127.0.0.1  netmask 255.0.0.0
         inet6 ::1  prefixlen 128  scopeid 0x10<host>
         loop  txqueuelen 1000  (Local Loopback)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wan: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::f42d:a2ff:feab:324d  prefixlen 64  scopeid 0x20<link>
         ether f6:2d:a2:ab:32:4d  txqueuelen 1000  (Ethernet)
         RX packets 1067  bytes 233154 (227.6 KiB)
         RX errors 0  dropped 4  overruns 0  frame 0
         TX packets 203  bytes 20240 (19.7 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Everything seems normal.  The wan and lan0 ports are connected. From 
syslog for p2 (lan0):

Jan 22 16:02:34 localhost kernel: [    5.249693] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Disabled
Jan 22 16:02:34 localhost kernel: [    5.394717] mv88e6085 
d0032004.mdio-mii:01: p2: Force link down
Jan 22 16:02:34 localhost kernel: [    5.400554] mv88e6085 
d0032004.mdio-mii:01: p2: Speed unforced
Jan 22 16:02:34 localhost kernel: [    5.406558] mv88e6085 
d0032004.mdio-mii:01: p2: Unforce half duplex
Jan 22 16:02:34 localhost kernel: [    5.413007] mv88e6085 
d0032004.mdio-mii:01: p2: Unforce link down
Jan 22 16:02:34 localhost kernel: [    5.421850] mv88e6085 
d0032004.mdio-mii:01: p2: 802.1QMode set to Disabled
Jan 22 16:02:34 localhost kernel: [    5.430197] mv88e6085 
d0032004.mdio-mii:01: p2: FID set to 0
Jan 22 16:02:34 localhost kernel: [    5.433696] mv88e6085 
d0032004.mdio-mii:01: p2: VLANTable set to 001
Jan 22 16:02:34 localhost kernel: [    6.136188] mv88e6085 
d0032004.mdio-mii:01 lan0 (uninitialized): PHY 
[!soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12] driver 
[Marvell 88E6390]
Jan 22 16:02:34 localhost kernel: [    8.890947] bridge: filtering via 
arp/ip/ip6tables is no longer available by default. Update your scripts 
to load br_netfilter if you need this.
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Joined netdev
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Bringing link up
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Flags change: +UP
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Set link
Jan 22 16:02:34 localhost kernel: [    9.017442] br0: port 2(lan0) 
entered blocking state
Jan 22 16:02:34 localhost kernel: [    9.017478] br0: port 2(lan0) 
entered disabled state
Jan 22 16:02:34 localhost kernel: [    9.018314] mv88e6085 
d0032004.mdio-mii:01: p2: VLANTable set to 009
Jan 22 16:02:34 localhost kernel: [    9.021257] device lan0 entered 
promiscuous mode
Jan 22 16:02:34 localhost kernel: [    9.033658] mv88e6085 
d0032004.mdio-mii:01: p2: VLANTable set to 00b
an 22 16:02:34 localhost kernel: [    9.138112] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Blocking/Listening
Jan 22 16:02:34 localhost kernel: [    9.138149] mv88e6085 
d0032004.mdio-mii:01 lan0: configuring for phy/ link mode
Jan 22 16:02:34 localhost kernel: [    9.138158] mv88e6085 
d0032004.mdio-mii:01: port=2, mode=0, link=0
Jan 22 16:02:34 localhost kernel: [    9.138267] mv88e6085 
d0032004.mdio-mii:01: port=2, mode=0, link=0
Jan 22 16:02:34 localhost kernel: [    9.138274] mv88e6085 
d0032004.mdio-mii:01 lan0: Link is Down - mac_link_dropped=0
Jan 22 16:02:34 localhost kernel: [    9.138885] IPv6: 
ADDRCONF(NETDEV_UP): lan0: link is not ready
Jan 22 16:02:34 localhost kernel: [    9.213888] mv88e6085 
d0032004.mdio-mii:01: port=2, mode=0, link=1
Jan 22 16:02:34 localhost kernel: [    9.213919] mv88e6085 
d0032004.mdio-mii:01 lan0: Link is Up - 1Gbps/Full - flow control rx/tx
Jan 22 16:02:34 localhost kernel: [    9.213962] IPv6: 
ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
Jan 22 16:02:34 localhost kernel: [   11.356713] br0: port 2(lan0) 
entered blocking state
Jan 22 16:02:34 localhost kernel: [   11.356719] br0: port 2(lan0) 
entered forwarding state
Jan 22 16:02:34 localhost kernel: [   11.359228] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Blocking/Listening
Jan 22 16:02:34 localhost kernel: [   11.360339] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Forwarding
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Flags change: 
+LOWER_UP +RUNNING
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Gained carrier
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Configured




-- 
John David Anglin    dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 21:40   ` John David Anglin
@ 2019-01-22 22:36     ` Andrew Lunn
  2019-01-22 23:52       ` John David Anglin
  2019-01-23  0:11       ` John David Anglin
  0 siblings, 2 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-01-22 22:36 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Tue, Jan 22, 2019 at 04:40:27PM -0500, John David Anglin wrote:
> Hi Andrew,
> 
> On 1/22/2019 3:28 PM, Andrew Lunn wrote:
> >Does it make a difference if you do it by hand? Bring up the master
> >interface, wan, lan0, lan1, add any bridge you need, etc.
> >
> >> From power on, none of the wan, lan0, lan1 or br0 achieve link
> >>(LOWER_UP).
> I can explore this but I don't know at the moment.  I believe that all
> interfaces are configured.

Hi John

In what order? The master interface has to be up first before any
slave interface is configured up.

> In serdes.c, interrupts are used to monitor link changes.  However, phy.c
> doesn't do this and
> it doesn't call phylink_mac_change().

It does not need to. There are two options here:

1) The PHY has no interrupt. phylib will poll the PHY once per second
   for link changes.

2) The PHY has in interrupt. Link changes will cause the interrupt to
   fire, and the phylib will then read the current state.

For PHYs embedded within a switch driver by mv88e6xxx interrupts
should always be used.

I will do some testing on my espressobin.

  Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 19:16 net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs John David Anglin
  2019-01-22 20:28 ` Andrew Lunn
@ 2019-01-22 23:12 ` Andrew Lunn
  2019-01-22 23:48   ` John David Anglin
  2019-01-23  0:00   ` John David Anglin
  1 sibling, 2 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-01-22 23:12 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Tue, Jan 22, 2019 at 02:16:09PM -0500, John David Anglin wrote:
> I've been hacking on a espressobin board to try to improve ptp support,
> etc.  However, I have
> a big problem with link detection on the wan, lan0 and lan1 ports.

Hi John

I just booted my espressobin with net-next. It is running Debian, and
i have the following in /etc/network/interfaces

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug wan
iface wan inet dhcp
  pre-up ip link set eth0 up

allow-hotplug lan0
iface lan0 inet static
  pre-up ip link set eth0 up
  address 10.42.42.42
  netmask 255.255.255.0

my wan port got its IP address from DHCP.

root@espressobin:~# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group defaul
t qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1508 qdisc mq state UP group defa
ult qlen 1024
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f2ad:4eff:fe03:699c/64 scope link 
       valid_lft forever preferred_lft forever
3: wan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP g
roup default qlen 1000
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 brd 10.0.0.255 scope global wan
       valid_lft forever preferred_lft forever
    inet6 fe80::f2ad:4eff:fe03:699c/64 scope link 
       valid_lft forever preferred_lft forever
4: lan0@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state L
OWERLAYERDOWN group default qlen 1000
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff
    inet 10.42.42.42/24 brd 10.42.42.255 scope global lan0
       valid_lft forever preferred_lft forever
5: lan1@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
 qlen 1000
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff

lan0 is correctly down, because the machine on the other end is down.

I then manually configured lan1 up and powered on the peer. I then
see:

[  543.113227] IPv6: ADDRCONF(NETDEV_UP): lan1: link is not ready
[  546.680276] mv88e6085 d0032004.mdio-mii:01 lan1: Link is Up - 1Gbps/Full - fl
ow control off
[  546.686106] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready

So for me, everything is working as it should.

I would suggest you manually configure your networking, just to
test. I would suspect systemd is not doing things correctly.

	Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 23:12 ` net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs Andrew Lunn
@ 2019-01-22 23:48   ` John David Anglin
  2019-01-23  0:00   ` John David Anglin
  1 sibling, 0 replies; 52+ messages in thread
From: John David Anglin @ 2019-01-22 23:48 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-22 6:12 p.m., Andrew Lunn wrote:
> I would suggest you manually configure your networking, just to
> test. I would suspect systemd is not doing things correctly.
I have armbian with debian stretch installed.  I did find that both
systemd-networkd and
networkmanager were enabled.  I disabled the networkmanager service. 
However, DHCP
on br0 didn't always get an IP.  It was after disabling networkmanager
that the ports didn't
come up on boot.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 22:36     ` Andrew Lunn
@ 2019-01-22 23:52       ` John David Anglin
  2019-01-23  0:11       ` John David Anglin
  1 sibling, 0 replies; 52+ messages in thread
From: John David Anglin @ 2019-01-22 23:52 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-22 5:36 p.m., Andrew Lunn wrote:
> In what order? The master interface has to be up first before any
> slave interface is configured up.
Possibly, that's a clue.  If I recall, that's determined by the naming
on the .network files in
/etc/systemd/network.  The slave interfaces are coming up in reverse
order: lan1->lan0->wan.
Need to check eth0 relative to the others.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 23:12 ` net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs Andrew Lunn
  2019-01-22 23:48   ` John David Anglin
@ 2019-01-23  0:00   ` John David Anglin
  2019-01-23  0:04     ` Florian Fainelli
  1 sibling, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-23  0:00 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-22 6:12 p.m., Andrew Lunn wrote:
> I just booted my espressobin with net-next. It is running Debian, and
> i have the following in /etc/network/interfaces
>
> source /etc/network/interfaces.d/*
>
> # The loopback network interface
> auto lo
> iface lo inet loopback
>
> # The primary network interface
> allow-hotplug wan
> iface wan inet dhcp
>   pre-up ip link set eth0 up
>
> allow-hotplug lan0
> iface lan0 inet static
>   pre-up ip link set eth0 up
>   address 10.42.42.42
>   netmask 255.255.255.0
>
> my wan port got its IP address from DHCP.
Your configuration is different.  I have the wan, lan0 and lan1 ports
configured as a bridge and they
don't get IP addresses.  The only port that gets an IP is br0.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-23  0:00   ` John David Anglin
@ 2019-01-23  0:04     ` Florian Fainelli
  0 siblings, 0 replies; 52+ messages in thread
From: Florian Fainelli @ 2019-01-23  0:04 UTC (permalink / raw)
  To: John David Anglin, Andrew Lunn; +Cc: Russell King, Vivien Didelot, netdev

On 1/22/19 4:00 PM, John David Anglin wrote:
> On 2019-01-22 6:12 p.m., Andrew Lunn wrote:
>> I just booted my espressobin with net-next. It is running Debian, and
>> i have the following in /etc/network/interfaces
>>
>> source /etc/network/interfaces.d/*
>>
>> # The loopback network interface
>> auto lo
>> iface lo inet loopback
>>
>> # The primary network interface
>> allow-hotplug wan
>> iface wan inet dhcp
>>   pre-up ip link set eth0 up
>>
>> allow-hotplug lan0
>> iface lan0 inet static
>>   pre-up ip link set eth0 up
>>   address 10.42.42.42
>>   netmask 255.255.255.0
>>
>> my wan port got its IP address from DHCP.
> Your configuration is different.  I have the wan, lan0 and lan1 ports
> configured as a bridge and they
> don't get IP addresses.  The only port that gets an IP is br0.

If you bridge all devices together that's expected. If you leave the
network devices interfaces outside of the bridge, then they act as
individual/standalone network ports, and that is where a DHCP client
would be running.

The point Andrew as trying to make is that if you make sure you always
bring-up the DSA master network device, typically ethX first, and then
the DSA network interfaces that are per-port, then you should not have
any issues with link being UP/DOWN. Whether IP stack works is a
different thing.
-- 
Florian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-22 22:36     ` Andrew Lunn
  2019-01-22 23:52       ` John David Anglin
@ 2019-01-23  0:11       ` John David Anglin
  2019-01-23  0:22         ` Andrew Lunn
  1 sibling, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-23  0:11 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-22 5:36 p.m., Andrew Lunn wrote:
> It does not need to. There are two options here:
>
> 1) The PHY has no interrupt. phylib will poll the PHY once per second
>    for link changes.
>
> 2) The PHY has in interrupt. Link changes will cause the interrupt to
>    fire, and the phylib will then read the current state.
>
> For PHYs embedded within a switch driver by mv88e6xxx interrupts
> should always be used.
I don't think option 2) is implemented.  Didn't see any irq code in phy.c.

If I remember correctly, one needs to use clause 45 accesses to get at
the PHY registers in the 88E6341.

Can you point me to the phylib code that does the polling?

Thanks,
Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-23  0:11       ` John David Anglin
@ 2019-01-23  0:22         ` Andrew Lunn
  2019-01-25 16:30           ` John David Anglin
  2019-01-30 17:08           ` John David Anglin
  0 siblings, 2 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-01-23  0:22 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> > It does not need to. There are two options here:
> >
> > 1) The PHY has no interrupt. phylib will poll the PHY once per second
> >    for link changes.
> >
> > 2) The PHY has in interrupt. Link changes will cause the interrupt to
> >    fire, and the phylib will then read the current state.
> >
> > For PHYs embedded within a switch driver by mv88e6xxx interrupts
> > should always be used.

Hi Dave

From my Espressobin

cat /proc/interrupts
...
 44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
 46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
 48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
 51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
 52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
 53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13

These are PHY interrupts.

> I don't think option 2) is implemented.  Didn't see any irq code in phy.c.

You would not. All the interrupt code is in the PHY core and the PHY
driver. drivers/net/dsa/mv88e6xxx/phy.c is just a bunch of helpers
which allow the mdio bus driver to access phy registers. The PHY
driver itself is drivers/net/phy/marvell.c, and the interrupt handling
is spread between that and drivers/net/phy/phy.c

> If I remember correctly, one needs to use clause 45 accesses to get at
> the PHY registers in the 88E6341.

Nope. The PHYs are c22 devices. The SERDES are probably C45, but those
are not being used here.

      Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-23  0:22         ` Andrew Lunn
@ 2019-01-25 16:30           ` John David Anglin
  2019-01-25 16:48             ` Russell King - ARM Linux admin
  2019-01-30 17:08           ` John David Anglin
  1 sibling, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-25 16:30 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

[-- Attachment #1: Type: text/plain, Size: 1756 bytes --]

On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> >From my Espressobin
>
> cat /proc/interrupts
> ...
>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>
> These are PHY interrupts.
>
Thanks.  That was the clue.  I had tried to enable hardware for support
for switch interrupts.  However,
the espressobin connects the switch interrupt output to a southbridge
pin that only supports edge interrupts.
Link detection works okay once the attached change is removed.  I may
try and see if moving the interrupt
to a northbridge pin works.

I have to wonder about the use of edge interrupts.  Given that a GPIO
interrupt isn't defined for the mv88e6xxx,
then the interrupt must be done by polling the global1 control register.

This also makes me wonder about this SD interrupt:
43:          0          0     GPIO1   3 Edge      d00d0000.sdhci cd
Many poeple have trouble with SD cards on reboot.

Although the driver doesn't handle AVB interrupts, I started down this
path to try get ptp4l working better.  I
have tweaked the polling but there are still circumstances where
timestamps are overwritten (or not written).

Dave

-- 
John David Anglin  dave.anglin@bell.net


[-- Attachment #2: armada-3720-espressobin.dts-2.d --]
[-- Type: text/plain, Size: 698 bytes --]

diff --git a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts
index 3ab25ad402b9..65a3ff4da610 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts
@@ -126,6 +126,14 @@
 
 		dsa,member = <0 0>;
 
+		interrupt-parent = <&gpiosb>;
+		/* Actually, the irq is active low but gpiosb doesn't
+		   support that.  Falling edge seems to work okay.
+		   We need this for ptp support */
+		interrupts = <23 IRQ_TYPE_EDGE_FALLING>;
+		interrupt-controller;
+		#interrupt-cells = <2>;
+
 		ports {
 			#address-cells = <1>;
 			#size-cells = <0>;

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-25 16:30           ` John David Anglin
@ 2019-01-25 16:48             ` Russell King - ARM Linux admin
  2019-01-25 18:38               ` John David Anglin
  0 siblings, 1 reply; 52+ messages in thread
From: Russell King - ARM Linux admin @ 2019-01-25 16:48 UTC (permalink / raw)
  To: John David Anglin; +Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev

On Fri, Jan 25, 2019 at 11:30:54AM -0500, John David Anglin wrote:
> This also makes me wonder about this SD interrupt:
> 43:          0          0     GPIO1   3 Edge      d00d0000.sdhci cd
> Many poeple have trouble with SD cards on reboot.

[off topic, but a relevant reply to the above]

Very likely unrelated to edge interrupts.

For a signal that goes low when a card is inserted and high when
removed, you really need an edge interrupt to avoid flooding the CPU
with a "stuck" interrupt - if you were to use an active low interrupt
for card detection, and the signal is pulled low by card insertion,
the interrupt remains active until you remove the card!

In any case, I've seen the same with some early SolidRun platforms
when using high speed modes.

Most issues with SD cards at reboot is where the card is left in low
voltage/high speed mode, the reboot happens, and then you try to talk
to it in low-speed mode (as required by the specs for card detection/
initialisation) and they don't respond.  The only way out of that is
to reset the card - and the only way to reset the card is to remove
power from it.

If you're lucky enough to have a "regulator" which can be switched
off at reboot, that may not be sufficient - if there's a capacitor on
the SD card's supply line downstream from the regulator, the card can
be drawing soo little current that the voltage doesn't fall
sufficiently to cause the card to power down.

If you don't have any means to reset the card like that, then the only
workaround is to reduce the maximum speed - and avoid going to 1.8V
signalling mode.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-25 16:48             ` Russell King - ARM Linux admin
@ 2019-01-25 18:38               ` John David Anglin
  0 siblings, 0 replies; 52+ messages in thread
From: John David Anglin @ 2019-01-25 18:38 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev

[-- Attachment #1: Type: text/plain, Size: 453 bytes --]

On 2019-01-25 11:48 a.m., Russell King - ARM Linux admin wrote:
> If you don't have any means to reset the card like that, then the only
> workaround is to reduce the maximum speed - and avoid going to 1.8V
> signalling mode.
There is no power controller on espressobin.  The regulator only
switches the I/O voltage.

The attached change fixed reboot on espressobin with problematic SD card.

Thanks,
Dave

-- 
John David Anglin  dave.anglin@bell.net


[-- Attachment #2: armada-3720-espressobin-sdhc.d --]
[-- Type: text/plain, Size: 511 bytes --]

diff --git a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts
index 3ab25ad402b9..df7eeda4b73c 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts
@@ -57,6 +57,8 @@
 &sdhci1 {
 	wp-inverted;
 	bus-width = <4>;
+	max-frequency = <50000000>;
+	no-1-8-v;
 	cd-gpios = <&gpionb 3 GPIO_ACTIVE_LOW>;
 	marvell,pad-type = "sd";
 	vqmmc-supply = <&vcc_sd_reg1>;

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-23  0:22         ` Andrew Lunn
  2019-01-25 16:30           ` John David Anglin
@ 2019-01-30 17:08           ` John David Anglin
  2019-01-30 17:28             ` Andrew Lunn
  1 sibling, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-30 17:08 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> >From my Espressobin
>
> cat /proc/interrupts
> ...
>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>
> These are PHY interrupts.
If we come back to my trying to use the INTn pin on the esspressobin, I
have found that clearing and resetting the interrupt
enable bits in the global control register (offset 0x4) restarts link
detection when the device is stuck.  This suggests that the
INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
on this pin.  Possibly, this is caused by the fact
that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
loading is done.  Another possibility might be race conditions
in processing interrupts.

Thoughts?

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-30 17:08           ` John David Anglin
@ 2019-01-30 17:28             ` Andrew Lunn
  2019-01-30 19:01               ` John David Anglin
  2019-01-30 22:24               ` John David Anglin
  0 siblings, 2 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-01-30 17:28 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Wed, Jan 30, 2019 at 12:08:39PM -0500, John David Anglin wrote:
> On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> > >From my Espressobin
> >
> > cat /proc/interrupts
> > ...
> >  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
> >  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
> >  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
> >  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
> >  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
> >  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
> >
> > These are PHY interrupts.
> If we come back to my trying to use the INTn pin on the esspressobin, I
> have found that clearing and resetting the interrupt
> enable bits in the global control register (offset 0x4) restarts link
> detection when the device is stuck.  This suggests that the
> INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
> on this pin.  Possibly, this is caused by the fact
> that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
> loading is done.  Another possibility might be race conditions
> in processing interrupts.
> 
> Thoughts?

Hi David

You need active low interrupts. Without it, i think you are always
going to have race conditions which will cause interrupts to get
stuck/lost.

I would suggest you remove the interrupt from your device tree and use
the mv88e6xxx polling method. If i remember correctly, it currently
polls 10 per second, so PHY link up/down is going to be 5 times faster
on average than when phylib is polling the PHY.

   Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-30 17:28             ` Andrew Lunn
@ 2019-01-30 19:01               ` John David Anglin
  2019-01-30 19:09                 ` Andrew Lunn
  2019-01-30 22:24               ` John David Anglin
  1 sibling, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-30 19:01 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> On Wed, Jan 30, 2019 at 12:08:39PM -0500, John David Anglin wrote:
>> On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
>>> >From my Espressobin
>>>
>>> cat /proc/interrupts
>>> ...
>>>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>>>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>>>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>>>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>>>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>>>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>>>
>>> These are PHY interrupts.
>> If we come back to my trying to use the INTn pin on the esspressobin, I
>> have found that clearing and resetting the interrupt
>> enable bits in the global control register (offset 0x4) restarts link
>> detection when the device is stuck.  This suggests that the
>> INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
>> on this pin.  Possibly, this is caused by the fact
>> that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
>> loading is done.  Another possibility might be race conditions
>> in processing interrupts.
>>
>> Thoughts?
> Hi David
>
> You need active low interrupts. Without it, i think you are always
> going to have race conditions which will cause interrupts to get
> stuck/lost.
>
> I would suggest you remove the interrupt from your device tree and use
> the mv88e6xxx polling method. If i remember correctly, it currently
> polls 10 per second, so PHY link up/down is going to be 5 times faster
> on average than when phylib is polling the PHY.
Hi Andrew,

The main motivation in doing this is to try to enable the AVB interrupt
and to improve the PTP support.
I agree that polling is perfectly fine for PHY link interrupts. 
Possibly, ATU and VTU might need faster
support but I'm not using that at the moment.

I have hacked on the time stamp code quit a bit to try and improve
things but there are still issues with
lost or overwritten time stamps:

Jan 28 11:15:05 localhost kernel: [234850.840883] mv88e6085
d0032004.mdio-mii:01
: timestamp discarded
Jan 28 11:15:05 localhost ptp4l: [234852.998] port 3: received SYNC
without time
stamp

I think when PTP packets other than PDELAY are too closely spaced, we
have problems accessing
the timestamp quick enough.  Also, timestamp access is dependent on CPU
speed and HZ.

It looks like I can easily connect MPP2_23 to MPP1_16 on the edge
connector P8.  I believe the northbridge
pins support level interrupts.

In /proc/interrupts, the switch interrupts are shown as edge.  The only
place that I see where this
is potentially set is mv88e6xxx_g2_watchdog_setup() where the call to
request_threaded_irq() passes
"IRQF_ONESHOT | IRQF_TRIGGER_FALLING".  Does this need to change?

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-30 19:01               ` John David Anglin
@ 2019-01-30 19:09                 ` Andrew Lunn
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-01-30 19:09 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> In /proc/interrupts, the switch interrupts are shown as edge.  The only
> place that I see where this
> is potentially set is mv88e6xxx_g2_watchdog_setup() where the call to
> request_threaded_irq() passes
> "IRQF_ONESHOT | IRQF_TRIGGER_FALLING".  Does this need to change?

Hi Dave

Device tree determines how the interrupt is configured.  It is the
second part of the interrupts property.

       Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-30 17:28             ` Andrew Lunn
  2019-01-30 19:01               ` John David Anglin
@ 2019-01-30 22:24               ` John David Anglin
  2019-01-30 22:38                 ` Andrew Lunn
  1 sibling, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-30 22:24 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> You need active low interrupts. Without it, i think you are always
> going to have race conditions which will cause interrupts to get
> stuck/lost.
I don't know if this is a hardware limitation or not, but currently the
armada 37xx doesn't support
level interrupts:

[    4.013280] genirq: Setting trigger mode 8 for irq 44 failed
(armada_37xx_irq_set_type+0x0/0x158)
[    4.014075] mv88e6085: probe of d0032004.mdio-mii:01 failed with
error -22

The function armada_37xx_irq_set_type() only supports edge interrupts.

On the plus side, DTC no longer objects to level interrupts on southbridge.

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-30 22:24               ` John David Anglin
@ 2019-01-30 22:38                 ` Andrew Lunn
  2019-01-31  1:27                   ` John David Anglin
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-01-30 22:38 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Wed, Jan 30, 2019 at 05:24:53PM -0500, John David Anglin wrote:
> On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> > You need active low interrupts. Without it, i think you are always
> > going to have race conditions which will cause interrupts to get
> > stuck/lost.
> I don't know if this is a hardware limitation or not, but currently the
> armada 37xx doesn't support
> level interrupts:

Yes, i had a quick look at the pinctrl driver. It only has code for
edges.

I'd suggest you take a look at the datasheet for the 37xx and check
what the hardware actually supports. You might need to extend the
driver.

	Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-30 22:38                 ` Andrew Lunn
@ 2019-01-31  1:27                   ` John David Anglin
  2019-01-31 17:27                     ` John David Anglin
  0 siblings, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-31  1:27 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-30 5:38 p.m., Andrew Lunn wrote:
> I'd suggest you take a look at the datasheet for the 37xx and check
> what the hardware actually supports. You might need to extend the
> driver.
I did look and the GIC does support level interrupts.  But all the
documentation is in
generic ARM documents that I don't currently have.  I'll see if I can
find them tomorrow.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
  2019-01-31  1:27                   ` John David Anglin
@ 2019-01-31 17:27                     ` John David Anglin
  2019-02-04 18:37                       ` [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering John David Anglin
  0 siblings, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-01-31 17:27 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-01-30 8:27 p.m., John David Anglin wrote:
> On 2019-01-30 5:38 p.m., Andrew Lunn wrote:
>> I'd suggest you take a look at the datasheet for the 37xx and check
>> what the hardware actually supports. You might need to extend the
>> driver.
> I did look and the GIC does support level interrupts.  But all the
> documentation is in
> generic ARM documents that I don't currently have.  I'll see if I can
> find them tomorrow.
On a closer look at MV-S110897-00C, I see that the north and south
bridge GPIO interrupt registers
only provide edge polarity control.  The GPIO pins don't appear to
support level interrupts on 88F37xx.

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
  2019-01-31 17:27                     ` John David Anglin
@ 2019-02-04 18:37                       ` John David Anglin
  2019-02-04 19:35                         ` Andrew Lunn
  2019-02-04 21:59                         ` [PATCH v2] net: " John David Anglin
  0 siblings, 2 replies; 52+ messages in thread
From: John David Anglin @ 2019-02-04 18:37 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

This change fixes a race condition in the setup of hardware irqs and the
code enabling PHY link
detection.

This was observed on the espressobin board where the GPIO interrupt
controller only supports edge
interrupts.  If the INTn output pin goes low before the GPIO interrupt
is enabled, PHY link interrupts
are not detected.

With this change, we
1) force INTn high by clearing all interrupt enables in global 1 control1,
2) setup the hardware irq, and
3) perform the remaining common setup.

This in fact simplifies the setup and allows some unnecessary code to be
removed.

In order to prevent races in mv88e6xxx_g1_irq_thread_work(), we clear
and restore the interrupt
enables in the normal path just prior to exit.

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c
b/drivers/net/dsa/mv88e6xxx/chip.c
index b2a0e59b6252..0dcbc25c1b4b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -260,7 +260,7 @@ static irqreturn_t
mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
     unsigned int nhandled = 0;
     unsigned int sub_irq;
     unsigned int n;
-    u16 reg;
+    u16 mask, reg;
     int err;
 
     mutex_lock(&chip->reg_lock);
@@ -277,6 +277,14 @@ static irqreturn_t
mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
             ++nhandled;
         }
     }
+
+    mutex_lock(&chip->reg_lock);
+    mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &reg);
+    mask = ~GENMASK(chip->g1_irq.nirqs, 0);
+    mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, (reg & mask));
+    mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, reg);
+    mutex_unlock(&chip->reg_lock);
+
 out:
     return (nhandled > 0 ? IRQ_HANDLED : IRQ_NONE);
 }
@@ -374,10 +382,30 @@ static void mv88e6xxx_g1_irq_free(struct
mv88e6xxx_chip *chip)
     mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_g1_irq_setup_masks(struct mv88e6xxx_chip *chip)
+{
+    int err;
+    u16 reg;
+
+    /* The INTn output must be high when hardware interrupts are setup.
+       The EEPROM done interrupt enable is set on reset, so clear all
+       interrupt enable bits to ensure INTn is not driven low */
+    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &reg);
+    if (err)
+        return err;
+    reg &= ~GENMASK(chip->info->g1_irqs, 0);
+    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, reg);
+    if (err)
+        return err;
+
+    /* Reading the interrupt status clears (most of) them */
+    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
+    return err;
+}
+
 static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 {
-    int err, irq, virq;
-    u16 reg, mask;
+    int irq;
 
     chip->g1_irq.nirqs = chip->info->g1_irqs;
     chip->g1_irq.domain = irq_domain_add_simple(
@@ -392,43 +420,14 @@ static int mv88e6xxx_g1_irq_setup_common(struct
mv88e6xxx_chip *chip)
     chip->g1_irq.chip = mv88e6xxx_g1_irq_chip;
     chip->g1_irq.masked = ~0;
 
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
-    if (err)
-        goto out_mapping;
-
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-
-    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-    if (err)
-        goto out_disable;
-
-    /* Reading the interrupt status clears (most of) them */
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
-    if (err)
-        goto out_disable;
-
     return 0;
-
-out_disable:
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-    mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-
-out_mapping:
-    for (irq = 0; irq < 16; irq++) {
-        virq = irq_find_mapping(chip->g1_irq.domain, irq);
-        irq_dispose_mapping(virq);
-    }
-
-    irq_domain_remove(chip->g1_irq.domain);
-
-    return err;
 }
 
 static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 {
     int err;
 
-    err = mv88e6xxx_g1_irq_setup_common(chip);
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
     if (err)
         return err;
 
@@ -437,9 +436,9 @@ static int mv88e6xxx_g1_irq_setup(struct
mv88e6xxx_chip *chip)
                    IRQF_ONESHOT | IRQF_SHARED,
                    dev_name(chip->dev), chip);
     if (err)
-        mv88e6xxx_g1_irq_free_common(chip);
+        return err;
 
-    return err;
+    return mv88e6xxx_g1_irq_setup_common(chip);
 }
 
 static void mv88e6xxx_irq_poll(struct kthread_work *work)
@@ -457,6 +456,10 @@ static int mv88e6xxx_irq_poll_setup(struct
mv88e6xxx_chip *chip)
 {
     int err;
 
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
+    if (err)
+        return err;
+
     err = mv88e6xxx_g1_irq_setup_common(chip);
     if (err)
         return err;

Signed-off-by: John David Anglin <dave.anglin@bell.net>
-- 

John David Anglin  dave.anglin@bell.net


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 18:37                       ` [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering John David Anglin
@ 2019-02-04 19:35                         ` Andrew Lunn
  2019-02-04 19:52                           ` John David Anglin
  2019-02-04 21:59                         ` [PATCH v2] net: " John David Anglin
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-04 19:35 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Mon, Feb 04, 2019 at 01:37:13PM -0500, John David Anglin wrote:
> This change fixes a race condition in the setup of hardware irqs and the
> code enabling PHY link
> detection.
> 
> This was observed on the espressobin board where the GPIO interrupt
> controller only supports edge
> interrupts.  If the INTn output pin goes low before the GPIO interrupt
> is enabled, PHY link interrupts
> are not detected.

Hi David

Please break this up into two patches.

Masking interrupts in the setup code before enabling the interrupt i'm
happy with.

The change to the interrupt handler i'm pretty sure is wrong. You have
to accept with edge interrupts you are going to loose interrupts.

    Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 19:35                         ` Andrew Lunn
@ 2019-02-04 19:52                           ` John David Anglin
  2019-02-04 20:19                             ` Andrew Lunn
  0 siblings, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-02-04 19:52 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

Hi Andrew,

On 2019-02-04 2:35 p.m., Andrew Lunn wrote:
> The change to the interrupt handler i'm pretty sure is wrong. You have
> to accept with edge interrupts you are going to loose interrupts.
Can you be more specific regarding what you think is wrong with this hunk?

I can see that an interrupt might be handled twice if the source wasn't
cleared before interrupts
are re-enabled but I think that would also occur with level interrupts.

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 19:52                           ` John David Anglin
@ 2019-02-04 20:19                             ` Andrew Lunn
  2019-02-04 21:38                               ` John David Anglin
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-04 20:19 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> Can you be more specific regarding what you think is wrong with this hunk?

Hi David

The IRQ core would do this if it was needed.

How many other irq thread work functions can you point to which do
something similar?

	  Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 20:19                             ` Andrew Lunn
@ 2019-02-04 21:38                               ` John David Anglin
  2019-02-04 22:47                                 ` Andrew Lunn
  0 siblings, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-02-04 21:38 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-02-04 3:19 p.m., Andrew Lunn wrote:
> The IRQ core would do this if it was needed.
>
> How many other irq thread work functions can you point to which do
> something similar?
This is comment for handle_edge_irq:

/**
 *    handle_edge_irq - edge type IRQ handler
 *    @desc:    the interrupt description structure for this irq
 *
 *    Interrupt occures on the falling and/or rising edge of a hardware
 *    signal. The occurrence is latched into the irq controller hardware
 *    and must be acked in order to be reenabled. After the ack another
 *    interrupt can happen on the same source even before the first one
 *    is handled by the associated event handler. If this happens it
 *    might be necessary to disable (mask) the interrupt depending on the
 *    controller hardware. This requires to reenable the interrupt inside
 *    of the loop which handles the interrupts which have arrived while
 *    the handler was running. If all pending interrupts are handled, the
 *    loop is left.
 */

As can be seen, the above comment suggests that it may be necessary to
disable (mask) interrupt
as I proposed.

I see no evidence from the Marvell functional specifications for the
88E6341 that it sequences
interrupts from the various sources although it might be that device
interrupts are sequenced
so INTn rises and falls.  I haven't seen any ports fail to link without
the hunk on espressobin
but it is hard to stress test the code.

Disabling and re-enabling interrupts in the global control register does
not affect their status.
Thus, at worst, the hunk adds a bit of unnecessary code.  It could be
skipped if we knew we
were using level interrupts.

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 18:37                       ` [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering John David Anglin
  2019-02-04 19:35                         ` Andrew Lunn
@ 2019-02-04 21:59                         ` John David Anglin
  2019-02-04 23:14                           ` Andrew Lunn
                                             ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: John David Anglin @ 2019-02-04 21:59 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

This change fixes a race condition in the setup of hardware irqs and the
code enabling PHY link detection in the mv88e6xxx driver.

This race was observed on the espressobin board where the GPIO interrupt
controller only supports edge interrupts.  If the INTn output pin goes low
before the GPIO interrupt is enabled, PHY link interrupts are not detected.

With this change, we
1) force INTn high by clearing all interrupt enables in global 1 control 1,
2) setup the hardware irq, and then
3) perform the remaining common setup.

This simplifies the setup and allows some unnecessary code to be removed.

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c
b/drivers/net/dsa/mv88e6xxx/chip.c
index b2a0e59b6252..9f5c416a3223 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -374,10 +374,29 @@ static void mv88e6xxx_g1_irq_free(struct
mv88e6xxx_chip *chip)
     mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_g1_irq_setup_masks(struct mv88e6xxx_chip *chip)
+{
+    int err;
+    u16 reg;
+
+    /* The INTn output must be high when hardware interrupts are setup.
+       The EEPROM done interrupt enable is set on reset, so clear all
+       interrupt enable bits to ensure INTn is not driven low */
+    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &reg);
+    if (err)
+        return err;
+    reg &= ~GENMASK(chip->info->g1_irqs, 0);
+    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, reg);
+    if (err)
+        return err;
+
+    /* Reading the interrupt status clears (most of) them */
+    return mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
+}
+
 static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 {
-    int err, irq, virq;
-    u16 reg, mask;
+    int irq;
 
     chip->g1_irq.nirqs = chip->info->g1_irqs;
     chip->g1_irq.domain = irq_domain_add_simple(
@@ -392,43 +411,14 @@ static int mv88e6xxx_g1_irq_setup_common(struct
mv88e6xxx_chip *chip)
     chip->g1_irq.chip = mv88e6xxx_g1_irq_chip;
     chip->g1_irq.masked = ~0;
 
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
-    if (err)
-        goto out_mapping;
-
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-
-    err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-    if (err)
-        goto out_disable;
-
-    /* Reading the interrupt status clears (most of) them */
-    err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
-    if (err)
-        goto out_disable;
-
     return 0;
-
-out_disable:
-    mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
-    mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
-
-out_mapping:
-    for (irq = 0; irq < 16; irq++) {
-        virq = irq_find_mapping(chip->g1_irq.domain, irq);
-        irq_dispose_mapping(virq);
-    }
-
-    irq_domain_remove(chip->g1_irq.domain);
-
-    return err;
 }
 
 static int mv88e6xxx_g1_irq_setup(struct mv88e6xxx_chip *chip)
 {
     int err;
 
-    err = mv88e6xxx_g1_irq_setup_common(chip);
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
     if (err)
         return err;
 
@@ -437,9 +427,9 @@ static int mv88e6xxx_g1_irq_setup(struct
mv88e6xxx_chip *chip)
                    IRQF_ONESHOT | IRQF_SHARED,
                    dev_name(chip->dev), chip);
     if (err)
-        mv88e6xxx_g1_irq_free_common(chip);
+        return err;
 
-    return err;
+    return mv88e6xxx_g1_irq_setup_common(chip);
 }
 
 static void mv88e6xxx_irq_poll(struct kthread_work *work)
@@ -457,6 +447,10 @@ static int mv88e6xxx_irq_poll_setup(struct
mv88e6xxx_chip *chip)
 {
     int err;
 
+    err = mv88e6xxx_g1_irq_setup_masks(chip);
+    if (err)
+        return err;
+
     err = mv88e6xxx_g1_irq_setup_common(chip);
     if (err)
         return err;

Signed-off-by: John David Anglin <dave.anglin@bell.net>

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 21:38                               ` John David Anglin
@ 2019-02-04 22:47                                 ` Andrew Lunn
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-02-04 22:47 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Mon, Feb 04, 2019 at 04:38:53PM -0500, John David Anglin wrote:
> On 2019-02-04 3:19 p.m., Andrew Lunn wrote:
> > The IRQ core would do this if it was needed.
> >
> > How many other irq thread work functions can you point to which do
> > something similar?
> This is comment for handle_edge_irq:
> 
> /**
>  *    handle_edge_irq - edge type IRQ handler
>  *    @desc:    the interrupt description structure for this irq
>  *
>  *    Interrupt occures on the falling and/or rising edge of a hardware
>  *    signal. The occurrence is latched into the irq controller hardware
>  *    and must be acked in order to be reenabled. After the ack another
>  *    interrupt can happen on the same source even before the first one
>  *    is handled by the associated event handler. If this happens it
>  *    might be necessary to disable (mask) the interrupt depending on the
>  *    controller hardware. This requires to reenable the interrupt inside
>  *    of the loop which handles the interrupts which have arrived while
>  *    the handler was running. If all pending interrupts are handled, the
>  *    loop is left.
>  */
> 
> As can be seen, the above comment suggests that it may be necessary to
> disable (mask) interrupt
> as I proposed.

Hi Dave

This comment is describing what handle_edge_irq() actually does. Read
the code. It does not say anything about that the handling thread
function should do.

	 Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 21:59                         ` [PATCH v2] net: " John David Anglin
@ 2019-02-04 23:14                           ` Andrew Lunn
  2019-02-05  0:38                             ` John David Anglin
  2019-02-05 18:37                           ` David Miller
  2019-02-11 18:40                           ` [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit John David Anglin
  2 siblings, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-04 23:14 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Mon, Feb 04, 2019 at 04:59:13PM -0500, John David Anglin wrote:
> This change fixes a race condition in the setup of hardware irqs and the
> code enabling PHY link detection in the mv88e6xxx driver.
> 
> This race was observed on the espressobin board where the GPIO interrupt
> controller only supports edge interrupts.  If the INTn output pin goes low
> before the GPIO interrupt is enabled, PHY link interrupts are not detected.
> 
> With this change, we
> 1) force INTn high by clearing all interrupt enables in global 1 control 1,
> 2) setup the hardware irq, and then
> 3) perform the remaining common setup.
> 
> This simplifies the setup and allows some unnecessary code to be removed.

Hi Dave

I took a closer look now. I don't actually see why the current code is
wrong.

mv88e6xxx_g1_irq_setup() calls mv88e6xxx_g1_irq_setup_common() and
then registers the interrupt handler.

mv88e6xxx_g1_irq_setup_common() does what you want, it masks all
interrupts in the hardware and clears any pending interrupts which can
be cleared.

The change you made is actually dangerous. As soon as you request the
interrupt, it is live, it can fire, and call
mv88e6xxx_g1_irq_thread_work(). That needs the irq domain. But the
change you made defers the creating of the domain until after the
interrupt is registered. So we can de-refernece a NULL pointer in the
interrupt handler.

	  Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 23:14                           ` Andrew Lunn
@ 2019-02-05  0:38                             ` John David Anglin
  2019-02-05  2:21                               ` Andrew Lunn
  0 siblings, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-02-05  0:38 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-02-04 6:14 p.m., Andrew Lunn wrote:
> On Mon, Feb 04, 2019 at 04:59:13PM -0500, John David Anglin wrote:
>> This change fixes a race condition in the setup of hardware irqs and the
>> code enabling PHY link detection in the mv88e6xxx driver.
>>
>> This race was observed on the espressobin board where the GPIO interrupt
>> controller only supports edge interrupts.  If the INTn output pin goes low
>> before the GPIO interrupt is enabled, PHY link interrupts are not detected.
>>
>> With this change, we
>> 1) force INTn high by clearing all interrupt enables in global 1 control 1,
>> 2) setup the hardware irq, and then
>> 3) perform the remaining common setup.
>>
>> This simplifies the setup and allows some unnecessary code to be removed.
> Hi Dave
>
> I took a closer look now. I don't actually see why the current code is
> wrong.
The problem is INTn can go low before the interrupt handler for it is
registered and enabled.
As a result, interrupts never occur if link happens to come up before
the interrupt handler
completes being enabled.
>
> mv88e6xxx_g1_irq_setup() calls mv88e6xxx_g1_irq_setup_common() and
> then registers the interrupt handler.
>
> mv88e6xxx_g1_irq_setup_common() does what you want, it masks all
> interrupts in the hardware and clears any pending interrupts which can
> be cleared.
>
> The change you made is actually dangerous. As soon as you request the
> interrupt, it is live, it can fire, and call
> mv88e6xxx_g1_irq_thread_work(). That needs the irq domain. But the
> change you made defers the creating of the domain until after the
> interrupt is registered. So we can de-refernece a NULL pointer in the
> interrupt handler.
This can't happen.  The domain is setup immediately after registering
the GPIO interrupt.
The interrupt can't fire until one of the enables is set.  These are set
by mv88e6xxx_g2_irq_setup(),
mv88e6xxx_g1_atu_prob_irq_setup() and
mv88e6xxx_g1_vtu_prob_irq_setup().  These irqs
are setup after mv88e6xxx_g1_irq_setup()/mv88e6xxx_irq_poll_setup() is
called.  Thus, the
irq domain is setup before the GPIO interrupt can fire.

I have tested both hardware and polled interrupts using espressobin with
v4.20.6 and networking
starts correctly.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-05  0:38                             ` John David Anglin
@ 2019-02-05  2:21                               ` Andrew Lunn
  2019-02-05 19:20                                 ` John David Anglin
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-05  2:21 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> The problem is INTn can go low before the interrupt handler for it is
> registered and enabled.

> This can't happen.  The domain is setup immediately after registering
> the GPIO interrupt.
> The interrupt can't fire until one of the enables is set.

These two statement seem to contradict each other?

> These are set
> by mv88e6xxx_g2_irq_setup(),
> mv88e6xxx_g1_atu_prob_irq_setup() and
> mv88e6xxx_g1_vtu_prob_irq_setup().  These irqs
> are setup after mv88e6xxx_g1_irq_setup()/mv88e6xxx_irq_poll_setup() is
> called.  Thus, the
> irq domain is setup before the GPIO interrupt can fire.

At what point is INTn going low, causing you all these problems? I've
yet to see a real description of the race. Please give us a blow by
blow of how the race happens. And then explain how your fix actually
fixes this.

Also, i'm not yet convinced this hardware can actually work correctly
with edge interrupts. You can probably reduce the size of the race
window, but i don't think you can eliminate it. And if you cannot
eliminate it, at some point it is going to hit you.

     Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-04 21:59                         ` [PATCH v2] net: " John David Anglin
  2019-02-04 23:14                           ` Andrew Lunn
@ 2019-02-05 18:37                           ` David Miller
  2019-02-11 18:40                           ` [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit John David Anglin
  2 siblings, 0 replies; 52+ messages in thread
From: David Miller @ 2019-02-05 18:37 UTC (permalink / raw)
  To: dave.anglin; +Cc: andrew, linux, vivien.didelot, f.fainelli, netdev


Something is really wrong with how your patches are submitted.

The patch shows up in between the commit message and your signoff
tags, also the patch appears to be mangled by your email client.

Please fix this before submitting any future work.

Thank you.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-05  2:21                               ` Andrew Lunn
@ 2019-02-05 19:20                                 ` John David Anglin
  2019-02-05 19:54                                   ` Andrew Lunn
  0 siblings, 1 reply; 52+ messages in thread
From: John David Anglin @ 2019-02-05 19:20 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-02-04 9:21 p.m., Andrew Lunn wrote:
>> The problem is INTn can go low before the interrupt handler for it is
>> registered and enabled.
>> This can't happen.  The domain is setup immediately after registering
>> the GPIO interrupt.
>> The interrupt can't fire until one of the enables is set.
> These two statement seem to contradict each other?
I don't think so.  In the first, I am referring to the enabling of the
GPIO pin interrupt in the
Armada 3720 CPU.  Specifically, this would be the setting of the enable
in the South Bridge
GPIO2 Interrupt Enable Register (RD0018C00).  In the second, I'm
referring to the enables
in the switch's Global Control Register.  Setting these enables to zero
forces the the switch's
INTn output high (assuming there isn't a short).  This shouldn't cause a
CPU interrupt if the
South Bridge GPIO2 Polarity Control is set correctly at the time.  INTn
goes low after a
switch reset because EEIntEn is set on reset, so clearing the enables
will causes a rising
edge on INTn.
>
>> These are set
>> by mv88e6xxx_g2_irq_setup(),
>> mv88e6xxx_g1_atu_prob_irq_setup() and
>> mv88e6xxx_g1_vtu_prob_irq_setup().  These irqs
>> are setup after mv88e6xxx_g1_irq_setup()/mv88e6xxx_irq_poll_setup() is
>> called.  Thus, the
>> irq domain is setup before the GPIO interrupt can fire.
> At what point is INTn going low, causing you all these problems? I've
> yet to see a real description of the race. Please give us a blow by
> blow of how the race happens. And then explain how your fix actually
> fixes this.
I'm going to withdraw my proposed change.  I had thought that calling
request_threaded_irq()
earlier fixed the issue at boot.  But given your mail, I reverted the
change in my 4.20.6 build
and I wasn't able to reproduce the problem.  Yet, my 4.20.4 build fails
every time doing a power
on boot.

We have the following interrupt status when it fails:

 44:          2          0     GPIO2  23 Edge      d0032004.mdio-mii:01
 48:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
 50:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
 52:          0          1  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
 55:          0          1  mv88e6xxx-g2   1 Edge     
!soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
 56:          0          0  mv88e6xxx-g2   2 Edge     
!soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
 57:          0          0  mv88e6xxx-g2   3 Edge     
!soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
 69:          0          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog

We have the following values in the Global1 registers:

MCLI> sys dumpGlobal1

------------------------------------------------------
Switch Global Status(0x0)                       c881
ATU FID Register(0x1)                           0000
VTU FID Register(0x2)                           0000
VTU SID Register(0x3)                           0000
Switch Global Control Register(0x4)             40a8

The DeviceInt bit is set in the Global Status register.  It would appear
we have missed an edge on GPIO2 23.
Yet, we have handled 2 interrupts on it.  So, this isn't the setup issue
that I thought it was.

>
> Also, i'm not yet convinced this hardware can actually work correctly
> with edge interrupts. You can probably reduce the size of the race
> window, but i don't think you can eliminate it. And if you cannot
> eliminate it, at some point it is going to hit you.
You could be right but I don't want to give up just yet.  I need to go
back and rebuild v4.20.4 and retest.
My hunch is the second hunk of the original patch will fix this.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
  2019-02-05 19:20                                 ` John David Anglin
@ 2019-02-05 19:54                                   ` Andrew Lunn
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-02-05 19:54 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> My hunch is the second hunk of the original patch will fix this.

O.K.

But please fully explain how the race happens and how the fix fully
fixes it. I don't really want to add code which just makes the race
window smaller, but the race still happens. That helps nobody.

In the end, i think you will end up polling. If so, feel free to
submit a patch which makes the poll interval build time configurable.

       Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-04 21:59                         ` [PATCH v2] net: " John David Anglin
  2019-02-04 23:14                           ` Andrew Lunn
  2019-02-05 18:37                           ` David Miller
@ 2019-02-11 18:40                           ` John David Anglin
  2019-02-11 23:33                             ` Andrew Lunn
  2019-02-14  2:07                             ` Andrew Lunn
  2 siblings, 2 replies; 52+ messages in thread
From: John David Anglin @ 2019-02-11 18:40 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

The GPIO interrupt controller on the espressobin board only supports edge interrupts.
If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
possible to miss an edge.  When this happens, the INTn pin on the Marvell switch is
stuck low and no further interrupts occur.

I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
a race in handling device interrupts (e.g. PHY link interrupts).  Some interrupts are
directly cleared by reading the Global 1 status register.  However, the device interrupt
flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
This is done by reading the relevant SERDES and PHY status register.

The code only services interrupts whose status bit is set at the time of reading its status
register.  If an interrupt event occurs after its status is read and before all interrupts
are serviced, then this event will not be serviced and the INTn output pin will remain low.

This is not a problem with polling or level interrupts since the handler will be called
again to process the event.  However, it's a big problem when using level interrupts.

The fix presented here is to add a loop around the code servicing switch interrupts.  If
any pending interrupts remain after the current set has been handled, we loop and process
the new set.  If there are no pending interrupts after servicing, we are sure that INTn has
gone high and we will get an edge when a new event occurs.

Tested on espressobin board.

Signed-off-by:  John David Anglin <dave.anglin@bell.net>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8dca2c949e73..12fd7ce3f1ff 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -261,6 +261,7 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
 	unsigned int sub_irq;
 	unsigned int n;
 	u16 reg;
+	u16 ctl1;
 	int err;

 	mutex_lock(&chip->reg_lock);
@@ -270,13 +271,28 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
 	if (err)
 		goto out;

-	for (n = 0; n < chip->g1_irq.nirqs; ++n) {
-		if (reg & (1 << n)) {
-			sub_irq = irq_find_mapping(chip->g1_irq.domain, n);
-			handle_nested_irq(sub_irq);
-			++nhandled;
+	do {
+		for (n = 0; n < chip->g1_irq.nirqs; ++n) {
+			if (reg & (1 << n)) {
+				sub_irq = irq_find_mapping(chip->g1_irq.domain,
+							   n);
+				handle_nested_irq(sub_irq);
+				++nhandled;
+			}
 		}
-	}
+
+		mutex_lock(&chip->reg_lock);
+		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &ctl1);
+		if (err)
+			goto unlock;
+		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
+unlock:
+		mutex_unlock(&chip->reg_lock);
+		if (err)
+			goto out;
+		ctl1 &= GENMASK(chip->g1_irq.nirqs, 0);
+	} while (reg & ctl1);
+
 out:
 	return (nhandled > 0 ? IRQ_HANDLED : IRQ_NONE);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-11 18:40                           ` [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit John David Anglin
@ 2019-02-11 23:33                             ` Andrew Lunn
  2019-02-12  0:57                               ` John David Anglin
  2019-02-14  2:07                             ` Andrew Lunn
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-11 23:33 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> Signed-off-by:  John David Anglin <dave.anglin@bell.net>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 8dca2c949e73..12fd7ce3f1ff 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -261,6 +261,7 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
>  	unsigned int sub_irq;
>  	unsigned int n;
>  	u16 reg;
> +	u16 ctl1;
>  	int err;
> 
>  	mutex_lock(&chip->reg_lock);
> @@ -270,13 +271,28 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
>  	if (err)
>  		goto out;
> 
> -	for (n = 0; n < chip->g1_irq.nirqs; ++n) {
> -		if (reg & (1 << n)) {
> -			sub_irq = irq_find_mapping(chip->g1_irq.domain, n);
> -			handle_nested_irq(sub_irq);
> -			++nhandled;
> +	do {
> +		for (n = 0; n < chip->g1_irq.nirqs; ++n) {
> +			if (reg & (1 << n)) {
> +				sub_irq = irq_find_mapping(chip->g1_irq.domain,
> +							   n);
> +				handle_nested_irq(sub_irq);
> +				++nhandled;
> +			}
>  		}
> -	}
> +
> +		mutex_lock(&chip->reg_lock);
> +		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &ctl1);
> +		if (err)
> +			goto unlock;
> +		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
> +unlock:
> +		mutex_unlock(&chip->reg_lock);
> +		if (err)
> +			goto out;
> +		ctl1 &= GENMASK(chip->g1_irq.nirqs, 0);
> +	} while (reg & ctl1);

Hi David

I just tested this on one of my boards. It loops endlessly:

[   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   

These are reg, ctl1, reg & ctl1.

So there is an unhandled device interrupt. I think this is because
device interrupts are not masked before installing the interrupt
handler. But i've not fully got to the bottom of this yet.

	 Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-11 23:33                             ` Andrew Lunn
@ 2019-02-12  0:57                               ` John David Anglin
  2019-02-12  1:21                                 ` Andrew Lunn
  2019-02-12  3:58                                 ` Andrew Lunn
  0 siblings, 2 replies; 52+ messages in thread
From: John David Anglin @ 2019-02-12  0:57 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-02-11 6:33 p.m., Andrew Lunn wrote:
>> Signed-off-by:  John David Anglin <dave.anglin@bell.net>
>> ---
>>  drivers/net/dsa/mv88e6xxx/chip.c | 28 ++++++++++++++++++++++------
>>  1 file changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
>> index 8dca2c949e73..12fd7ce3f1ff 100644
>> --- a/drivers/net/dsa/mv88e6xxx/chip.c
>> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
>> @@ -261,6 +261,7 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
>>  	unsigned int sub_irq;
>>  	unsigned int n;
>>  	u16 reg;
>> +	u16 ctl1;
>>  	int err;
>>
>>  	mutex_lock(&chip->reg_lock);
>> @@ -270,13 +271,28 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
>>  	if (err)
>>  		goto out;
>>
>> -	for (n = 0; n < chip->g1_irq.nirqs; ++n) {
>> -		if (reg & (1 << n)) {
>> -			sub_irq = irq_find_mapping(chip->g1_irq.domain, n);
>> -			handle_nested_irq(sub_irq);
>> -			++nhandled;
>> +	do {
>> +		for (n = 0; n < chip->g1_irq.nirqs; ++n) {
>> +			if (reg & (1 << n)) {
>> +				sub_irq = irq_find_mapping(chip->g1_irq.domain,
>> +							   n);
>> +				handle_nested_irq(sub_irq);
>> +				++nhandled;
>> +			}
>>  		}
>> -	}
>> +
>> +		mutex_lock(&chip->reg_lock);
>> +		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &ctl1);
>> +		if (err)
>> +			goto unlock;
>> +		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
>> +unlock:
>> +		mutex_unlock(&chip->reg_lock);
>> +		if (err)
>> +			goto out;
>> +		ctl1 &= GENMASK(chip->g1_irq.nirqs, 0);
>> +	} while (reg & ctl1);
> Hi David
>
> I just tested this on one of my boards. It loops endlessly:
>
> [   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> [   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> [   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> [   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> [   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   
>
> These are reg, ctl1, reg & ctl1.
>
> So there is an unhandled device interrupt. I think this is because
> device interrupts are not masked before installing the interrupt
> handler. But i've not fully got to the bottom of this yet.
Yes, it is true the PHY and SERDES enables in Global 2 should be cleared before the interrupt handler
is installed for device interrupts.  That's what is done for the interrupts enables in Global 1.  I'm
not seeing that these enables are initialized.

Which switch?  The device interrupts are not be cleared properly on that board.  Would it be possible
to also print the Global 2 status and enables?  Unplugging the cable that's causing the loop might
cause the loop to stop.

I suspect the same would happen if level interrupts were used.

I tested both edge and polling on espressobin with Armada 3700.  There's no problem with
looping there.  I've booted it many times.  I've unplugged and plugged cables many times.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12  0:57                               ` John David Anglin
@ 2019-02-12  1:21                                 ` Andrew Lunn
  2019-02-12  3:58                                 ` Andrew Lunn
  1 sibling, 0 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-02-12  1:21 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> Yes, it is true the PHY and SERDES enables in Global 2 should be
> cleared before the interrupt handler is installed for device
> interrupts.  That's what is done for the interrupts enables in
> Global 1.  I'm not seeing that these enables are initialized.
> 
> Which switch?

6390X.

> The device interrupts are not be cleared properly on that board.

I added in code to mask all interrupts. It did not help. I need to go
deeper and see if it is a PHY problem.

> I suspect the same would happen if level interrupts were used.

I've not seen it loop. Which is why i want to understand it fully.

     Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12  0:57                               ` John David Anglin
  2019-02-12  1:21                                 ` Andrew Lunn
@ 2019-02-12  3:58                                 ` Andrew Lunn
  2019-02-12  6:51                                   ` Heiner Kallweit
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-12  3:58 UTC (permalink / raw)
  To: John David Anglin, Heiner Kallweit
  Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

> > Hi David
> >
> > I just tested this on one of my boards. It loops endlessly:
> >
> > [   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> > [   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> > [   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> > [   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> > [   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   
> >
> > These are reg, ctl1, reg & ctl1.
> >
> > So there is an unhandled device interrupt.

Hi Heiner

Your patch Fixes: 2b3e88ea6528 ("net: phy: improve phy state
checking") is causing me problems with interrupts for the Marvell
switches.

That change means we don't check the PHY device if it caused an
interrupt when its state is less than UP.

What i'm seeing is that the PHY is interrupting pretty early on after
a reboot when the previous boot had the interface up.

[   10.125702] Marvell 88E6390 mv88e6xxx-0:02: phy_start_interrupts
[   10.162798] Marvell 88E6390 mv88e6xxx-0:02: phy_enable_interrupts
[   10.168931] Marvell 88E6390 mv88e6xxx-0:02: marvell_ack_interrupt
[   10.180164] Marvell 88E6390 mv88e6xxx-0:02: marvell_config_intr 1

a little later it interrupts:

[   12.999717] mv88e6xxx_g1_irq_thread_fn
[   13.007253] mv88e6xxx_g2_irq_thread_fn: 4 811c 4
[   13.012015] libphy: __phy_is_started: phydev->state 1 PHY_UP 3
[   13.017941] Marvell 88E6390 mv88e6xxx-0:02: phy_interrupt: phy_is_started(phydev) 0

The current code just causes it to be ignored. So the interrupts fires
again, and again...

If i change to code to call into the PHY driver and let it handle the
interrupts, things keep running. A little bit later the interface is
configured up:

[   15.921326] mv88e6085 gpio-0:00 red: configuring for phy/gmii link mode
[   15.928693] libphy: __phy_is_started: phydev->state 3 PHY_UP 3
[   15.929442] IPv6: ADDRCONF(NETDEV_UP): red: link is not ready
[   15.935596] Marvell 88E6390 mv88e6xxx-0:02: m88e6390_config_aneg
[   15.935608] Marvell 88E6390 mv88e6xxx-0:02: m88e6390_errata

[   16.071364] Marvell 88E6390 mv88e6xxx-0:02: m88e1510_config_aneg
[   16.112362] Marvell 88E6390 mv88e6xxx-0:02: m88e1318_config_aneg
[   16.151245] Marvell 88E6390 mv88e6xxx-0:02: m88e1121_config_aneg
[   16.368206] Marvell 88E6390 mv88e6xxx-0:02: PHY state change UP -> NOLINK

and after another interrupt the link goes up.

[   19.519840] mv88e6xxx_g1_irq_thread_fn
[   19.528546] mv88e6xxx_g2_irq_thread_fn: 4 811c 4
[   19.534152] libphy: __phy_is_started: phydev->state 5 PHY_UP 3
[   19.540030] Marvell 88E6390 mv88e6xxx-0:02: phy_interrupt: phy_is_started(phydev) 1
[   19.547721] Marvell 88E6390 mv88e6xxx-0:02: m88e1121_did_interrupt
[   19.559829] Marvell 88E6390 mv88e6xxx-0:02: marvell_ack_interrupt
[   19.590753] Marvell 88E6390 mv88e6xxx-0:02: marvell_read_status
[   19.596712] Marvell 88E6390 mv88e6xxx-0:02: marvell_update_link
[   19.628387] Marvell 88E6390 mv88e6xxx-0:02: PHY state change NOLINK -> RUNNING
[   19.628453] mv88e6085 gpio-0:00 red: Link is Up - 1Gbps/Full - flow control off
[   19.635920] IPv6: ADDRCONF(NETDEV_CHANGE): red: link becomes ready

I don't yet know why the first interrupt happens, before we configure
auto-neg, etc. But it is not too unreasonable. We have configured
interrupts, so it could be reporting link down etc.

So i think we might need to revert part of this change, call into the
driver so long as the PHY is not in state PHY_HALTED.

What do you think?

     Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12  3:58                                 ` Andrew Lunn
@ 2019-02-12  6:51                                   ` Heiner Kallweit
  2019-02-12 12:56                                     ` Andrew Lunn
  2019-02-12 16:30                                     ` Russell King - ARM Linux admin
  0 siblings, 2 replies; 52+ messages in thread
From: Heiner Kallweit @ 2019-02-12  6:51 UTC (permalink / raw)
  To: Andrew Lunn, John David Anglin
  Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 12.02.2019 04:58, Andrew Lunn wrote:
>>> Hi David
>>>
>>> I just tested this on one of my boards. It loops endlessly:
>>>
>>> [   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>> [   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   
>>>
>>> These are reg, ctl1, reg & ctl1.
>>>
>>> So there is an unhandled device interrupt.
> 
> Hi Heiner
> 
> Your patch Fixes: 2b3e88ea6528 ("net: phy: improve phy state
> checking") is causing me problems with interrupts for the Marvell
> switches.
> 
Hi Andrew,

what kernel version is it?
And the PHY driver in use is "Marvell 88E6390" ?

> That change means we don't check the PHY device if it caused an
> interrupt when its state is less than UP.
> 
> What i'm seeing is that the PHY is interrupting pretty early on after
> a reboot when the previous boot had the interface up.
> 
So this means that when going down for reboot the interrupts are not
properly masked / disabled? Because (at least for net-next) we enable
interrupts in phy_start() only.


> [   10.125702] Marvell 88E6390 mv88e6xxx-0:02: phy_start_interrupts
> [   10.162798] Marvell 88E6390 mv88e6xxx-0:02: phy_enable_interrupts
> [   10.168931] Marvell 88E6390 mv88e6xxx-0:02: marvell_ack_interrupt
> [   10.180164] Marvell 88E6390 mv88e6xxx-0:02: marvell_config_intr 1
> 
> a little later it interrupts:
> 
> [   12.999717] mv88e6xxx_g1_irq_thread_fn
> [   13.007253] mv88e6xxx_g2_irq_thread_fn: 4 811c 4
> [   13.012015] libphy: __phy_is_started: phydev->state 1 PHY_UP 3
> [   13.017941] Marvell 88E6390 mv88e6xxx-0:02: phy_interrupt: phy_is_started(phydev) 0
> 
> The current code just causes it to be ignored. So the interrupts fires
> again, and again...
> 
I would have more expected the opposite. If the interrupt is ignored
(IRQ_NONE returned), then it doesn't get acked. And if it's not acked
new interrupts should be blocked. Or is it different with this chip?

> If i change to code to call into the PHY driver and let it handle the
> interrupts, things keep running. A little bit later the interface is
> configured up:
> 
> [   15.921326] mv88e6085 gpio-0:00 red: configuring for phy/gmii link mode
> [   15.928693] libphy: __phy_is_started: phydev->state 3 PHY_UP 3
> [   15.929442] IPv6: ADDRCONF(NETDEV_UP): red: link is not ready
> [   15.935596] Marvell 88E6390 mv88e6xxx-0:02: m88e6390_config_aneg
> [   15.935608] Marvell 88E6390 mv88e6xxx-0:02: m88e6390_errata
> 
> [   16.071364] Marvell 88E6390 mv88e6xxx-0:02: m88e1510_config_aneg
> [   16.112362] Marvell 88E6390 mv88e6xxx-0:02: m88e1318_config_aneg
> [   16.151245] Marvell 88E6390 mv88e6xxx-0:02: m88e1121_config_aneg
> [   16.368206] Marvell 88E6390 mv88e6xxx-0:02: PHY state change UP -> NOLINK
> 
> and after another interrupt the link goes up.
> 
> [   19.519840] mv88e6xxx_g1_irq_thread_fn
> [   19.528546] mv88e6xxx_g2_irq_thread_fn: 4 811c 4
> [   19.534152] libphy: __phy_is_started: phydev->state 5 PHY_UP 3
> [   19.540030] Marvell 88E6390 mv88e6xxx-0:02: phy_interrupt: phy_is_started(phydev) 1
> [   19.547721] Marvell 88E6390 mv88e6xxx-0:02: m88e1121_did_interrupt
> [   19.559829] Marvell 88E6390 mv88e6xxx-0:02: marvell_ack_interrupt
> [   19.590753] Marvell 88E6390 mv88e6xxx-0:02: marvell_read_status
> [   19.596712] Marvell 88E6390 mv88e6xxx-0:02: marvell_update_link
> [   19.628387] Marvell 88E6390 mv88e6xxx-0:02: PHY state change NOLINK -> RUNNING
> [   19.628453] mv88e6085 gpio-0:00 red: Link is Up - 1Gbps/Full - flow control off
> [   19.635920] IPv6: ADDRCONF(NETDEV_CHANGE): red: link becomes ready
> 
> I don't yet know why the first interrupt happens, before we configure
> auto-neg, etc. But it is not too unreasonable. We have configured
> interrupts, so it could be reporting link down etc.
> 
> So i think we might need to revert part of this change, call into the
> driver so long as the PHY is not in state PHY_HALTED.
> 
> What do you think?
> 
I will take a closer look later.

>      Andrew
> 
Heiner

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12  6:51                                   ` Heiner Kallweit
@ 2019-02-12 12:56                                     ` Andrew Lunn
  2019-02-12 18:42                                       ` Heiner Kallweit
  2019-02-12 20:09                                       ` John David Anglin
  2019-02-12 16:30                                     ` Russell King - ARM Linux admin
  1 sibling, 2 replies; 52+ messages in thread
From: Andrew Lunn @ 2019-02-12 12:56 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: John David Anglin, Russell King, Vivien Didelot,
	Florian Fainelli, netdev

On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
> On 12.02.2019 04:58, Andrew Lunn wrote:
> >>> Hi David
> >>>
> >>> I just tested this on one of my boards. It loops endlessly:
> >>>
> >>> [   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> >>> [   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> >>> [   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> >>> [   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
> >>> [   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   
> >>>
> >>> These are reg, ctl1, reg & ctl1.
> >>>
> >>> So there is an unhandled device interrupt.
> > 
> > Hi Heiner
> > 
> > Your patch Fixes: 2b3e88ea6528 ("net: phy: improve phy state
> > checking") is causing me problems with interrupts for the Marvell
> > switches.
> > 
> Hi Andrew,
> 
> what kernel version is it?

It is a little bit old, 5.0-rc1 net-next. I should rebase and
retest. I'm testing on a ZII board which is not fully in mainline So i
need some patches.

> And the PHY driver in use is "Marvell 88E6390" ?

Yes, the marvell 1G driver.

     Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12  6:51                                   ` Heiner Kallweit
  2019-02-12 12:56                                     ` Andrew Lunn
@ 2019-02-12 16:30                                     ` Russell King - ARM Linux admin
  2019-02-12 20:11                                       ` Heiner Kallweit
  2019-02-12 20:54                                       ` Heiner Kallweit
  1 sibling, 2 replies; 52+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-12 16:30 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Andrew Lunn, John David Anglin, Vivien Didelot, Florian Fainelli, netdev

On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
> On 12.02.2019 04:58, Andrew Lunn wrote:
> > That change means we don't check the PHY device if it caused an
> > interrupt when its state is less than UP.
> > 
> > What i'm seeing is that the PHY is interrupting pretty early on after
> > a reboot when the previous boot had the interface up.
> > 
> So this means that when going down for reboot the interrupts are not
> properly masked / disabled? Because (at least for net-next) we enable
> interrupts in phy_start() only.

Looking at Linus' tree as opposed to net-next, things do look rather
broken wrt interrupts:

+-phy_attach_direct
  `-phydev->state = PHY_READY
+-phy_prepare_link
+-phy_start_machine
  `-phy_trigger_machine()
`-phy_start_interrupts
  +-request_threaded_irq()
  `-phy_enable_interrupts()
    +-phy_clear_interrupt()
    `-phy_config_interrupt(, PHY_INTERRUPT_ENABLED)

At this point, the PHY is then able to generate interrupts, which,
because phy_start() has not been called and phy_interrupt() checks
that phydev->state >= PHY_UP, get ignored by the interrupt handler
exactly as Andrew is finding.

So it looks like 5.0-rc is already in need of this being fixed.

In looking at this, I came across this chunk of code:

static inline bool __phy_is_started(struct phy_device *phydev)
{
        WARN_ON(!mutex_is_locked(&phydev->lock));

        return phydev->state >= PHY_UP;
}

/**
 * phy_is_started - Convenience function to check whether PHY is started
 * @phydev: The phy_device struct
 */
static inline bool phy_is_started(struct phy_device *phydev)
{
        bool started;

        mutex_lock(&phydev->lock);
        started = __phy_is_started(phydev);
        mutex_unlock(&phydev->lock);

        return started;
}

which looks to me like over-complication.  The mutex locking there is
completely pointless - what are you trying to achieve with it?

Let's go through this.  The above is exactly equivalent to:

bool phy_is_started(phydev)
{
	int state;

	mutex_lock(&phydev->lock);
	state = phydev->state;
	mutex_unlock(&phydev->lock);

	return state >= PHY_UP;
}

since when we do the test is irrelevant.  Architectures that Linux
runs on are single-copy atomic, which means that reading phydev->state
itself is an atomic operation.  So, the mutex locking around that
doesn't add to the atomicity of the entire operation.

How, depending on what you do with the rest of this function depends
whether the entire operation is safe or not.  For example, let's take
this code at the end of phy_state_machine():

        if (phy_polling_mode(phydev) && phy_is_started(phydev))
                phy_queue_state_machine(phydev, PHY_STATE_TIME);

state = PHY_UP
		thread 0			thread 1
						phy_disconnect()
						+-phy_is_started()
		phy_is_started()                |
						`-phy_stop()
						  +-phydev->state = PHY_HALTED
						  `-phy_stop_machine()
						    `-cancel_delayed_work_sync()
		phy_queue_state_machine()
		`-mod_delayed_work()

At this point, the phydev->state_queue() has been added back onto the
system workqueue despite phy_stop_machine() having been called and
cancel_delayed_work_sync() called on it.

The original code in 4.20 did not have this race condition.

Basically, the lock inside phy_is_started() does nothing useful, and
I'd say is dangerously misleading.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12 12:56                                     ` Andrew Lunn
@ 2019-02-12 18:42                                       ` Heiner Kallweit
  2019-02-12 20:09                                       ` John David Anglin
  1 sibling, 0 replies; 52+ messages in thread
From: Heiner Kallweit @ 2019-02-12 18:42 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: John David Anglin, Russell King, Vivien Didelot,
	Florian Fainelli, netdev

On 12.02.2019 13:56, Andrew Lunn wrote:
> On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
>> On 12.02.2019 04:58, Andrew Lunn wrote:
>>>>> Hi David
>>>>>
>>>>> I just tested this on one of my boards. It loops endlessly:
>>>>>
>>>>> [   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>>>> [   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>>>> [   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>>>> [   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
>>>>> [   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   
>>>>>
>>>>> These are reg, ctl1, reg & ctl1.
>>>>>
>>>>> So there is an unhandled device interrupt.
>>>
>>> Hi Heiner
>>>
>>> Your patch Fixes: 2b3e88ea6528 ("net: phy: improve phy state
>>> checking") is causing me problems with interrupts for the Marvell
>>> switches.
>>>
>> Hi Andrew,
>>
>> what kernel version is it?
> 
> It is a little bit old, 5.0-rc1 net-next. I should rebase and
> retest. I'm testing on a ZII board which is not fully in mainline So i
> need some patches.
> 
Thanks, Andrew. Indeed 5.0 needs a fix, as also pointed out by Russell.
I think I will simply remove the following:

if (!phy_is_started(phydev))
	return IRQ_NONE;	

Then we basically do the same like phy_mac_interrupt(), we always run
the state machine. If it has nothing to do, then it does nothing.
Therefore also state HALTED doesn't need a special handling.
This way we handle interrupts (incl. spurious ones) gracefully.

>> And the PHY driver in use is "Marvell 88E6390" ?
> 
> Yes, the marvell 1G driver.
> 
>      Andrew
> .
> 
Heiner

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12 12:56                                     ` Andrew Lunn
  2019-02-12 18:42                                       ` Heiner Kallweit
@ 2019-02-12 20:09                                       ` John David Anglin
  1 sibling, 0 replies; 52+ messages in thread
From: John David Anglin @ 2019-02-12 20:09 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit
  Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On 2019-02-12 7:56 a.m., Andrew Lunn wrote:
> It is a little bit old, 5.0-rc1 net-next. I should rebase and
> retest. I'm testing on a ZII board which is not fully in mainline So i
> need some patches.
I attempted to test 5.0.0-rc5 on espressobin but it dies on boot:
Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.0.0-rc5+ (root@espressobin) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PREEMPT Tue Feb 12
10:02:30 EST 2019
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Board
[    0.000000] earlycon: ar3700_uart0 at MMIO 0x00000000d0012000 (options '')
[    0.000000] printk: bootconsole [ar3700_uart0] enabled
[    3.218865] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    3.224521] Modules linked in:
[    3.227661] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5+ #1
[    3.234107] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[    3.240740] pstate: 20000005 (nzCv daif -PAN -UAO)
[    3.245676] pc : dma_direct_map_page+0x48/0x1d8
[    3.250331] lr : mv_xor_channel_add+0x3b0/0xb28
[    3.254984] sp : ffffff8010033a60
[    3.258389] x29: ffffff8010033a60 x28: ffffffc03bf70480
[    3.263854] x27: ffffff8010e97068 x26: 0000000000000000
[    3.269320] x25: 0000000000000029 x24: 0000000000000083
[    3.274785] x23: 0000000000000000 x22: 0000000000000002
[    3.280251] x21: 0000000000000080 x20: ffffff8010ecd000
[    3.285716] x19: 0000000000000000 x18: ffffffffffffffff
[    3.291182] x17: 000000000000000c x16: 000000000000000a
[    3.296648] x15: ffffff8010ecd6c8 x14: ffffffc03bf46783
[    3.302113] x13: ffffffc03bf46782 x12: 0000000000000038
[    3.307579] x11: 0000000000001fff x10: 0000000000000001
[    3.313044] x9 : 0000000000000000 x8 : ffffff8010dbe000
[    3.318510] x7 : ffffff8010fbe000 x6 : ffffffbf00000000
[    3.323976] x5 : 0000000000000000 x4 : 0000000000000002
[    3.329441] x3 : 0000000000000002 x2 : 00000000000006ac
[    3.334907] x1 : ffffffbf00efdc00 x0 : 0000000000000000
[    3.340373] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[    3.347272] Call trace:
[    3.349784]  dma_direct_map_page+0x48/0x1d8
[    3.354084]  mv_xor_channel_add+0x3b0/0xb28
[    3.358384]  mv_xor_probe+0x20c/0x4b8
[    3.362150]  platform_drv_probe+0x50/0xb0
[    3.366269]  really_probe+0x1fc/0x2c0
[    3.370032]  driver_probe_device+0x58/0x100
[    3.374332]  __driver_attach+0xd8/0xe0
[    3.378188]  bus_for_each_dev+0x68/0xc8
[    3.382129]  driver_attach+0x20/0x28
[    3.385803]  bus_add_driver+0x108/0x228
[    3.389744]  driver_register+0x60/0x110
[    3.393687]  __platform_driver_register+0x44/0x50
[    3.398529]  mv_xor_driver_init+0x18/0x20
[    3.402648]  do_one_initcall+0x58/0x170
[    3.406591]  kernel_init_freeable+0x190/0x234
[    3.411072]  kernel_init+0x10/0x108
[    3.414653]  ret_from_fork+0x10/0x1c
[    3.418329] Code: 2a0403f6 934cfc00 aa0503f7 7100047f (f9412663)
[    3.424610] ---[ end trace f7751570455a07a0 ]---
[    3.429423] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    3.437232] SMP: stopping secondary CPUs
[    3.441265] Kernel Offset: disabled
[    3.444849] CPU features: 0x002,2000200c
[    3.448878] Memory Limit: none
[    3.452018] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12 16:30                                     ` Russell King - ARM Linux admin
@ 2019-02-12 20:11                                       ` Heiner Kallweit
  2019-02-12 20:54                                       ` Heiner Kallweit
  1 sibling, 0 replies; 52+ messages in thread
From: Heiner Kallweit @ 2019-02-12 20:11 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Lunn, John David Anglin, Vivien Didelot, Florian Fainelli, netdev

On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
>> On 12.02.2019 04:58, Andrew Lunn wrote:
>>> That change means we don't check the PHY device if it caused an
>>> interrupt when its state is less than UP.
>>>
>>> What i'm seeing is that the PHY is interrupting pretty early on after
>>> a reboot when the previous boot had the interface up.
>>>
>> So this means that when going down for reboot the interrupts are not
>> properly masked / disabled? Because (at least for net-next) we enable
>> interrupts in phy_start() only.
> 
> Looking at Linus' tree as opposed to net-next, things do look rather
> broken wrt interrupts:
> 
> +-phy_attach_direct
>   `-phydev->state = PHY_READY
> +-phy_prepare_link
> +-phy_start_machine
>   `-phy_trigger_machine()
> `-phy_start_interrupts
>   +-request_threaded_irq()
>   `-phy_enable_interrupts()
>     +-phy_clear_interrupt()
>     `-phy_config_interrupt(, PHY_INTERRUPT_ENABLED)
> 
> At this point, the PHY is then able to generate interrupts, which,
> because phy_start() has not been called and phy_interrupt() checks
> that phydev->state >= PHY_UP, get ignored by the interrupt handler
> exactly as Andrew is finding.
> 
> So it looks like 5.0-rc is already in need of this being fixed.
> 
> In looking at this, I came across this chunk of code:
> 
> static inline bool __phy_is_started(struct phy_device *phydev)
> {
>         WARN_ON(!mutex_is_locked(&phydev->lock));
> 
>         return phydev->state >= PHY_UP;
> }
> 
> /**
>  * phy_is_started - Convenience function to check whether PHY is started
>  * @phydev: The phy_device struct
>  */
> static inline bool phy_is_started(struct phy_device *phydev)
> {
>         bool started;
> 
>         mutex_lock(&phydev->lock);
>         started = __phy_is_started(phydev);
>         mutex_unlock(&phydev->lock);
> 
>         return started;
> }
> 
> which looks to me like over-complication.  The mutex locking there is
> completely pointless - what are you trying to achieve with it?
> 
Even though this code is new it's kind of heritage in phylib that each
access (read or write) to phydev->state is protected by this lock.
I also once wondered whether it's actually needed but didn't spend
effort so far on challenging this. Seems that now the time has come ..

> Let's go through this.  The above is exactly equivalent to:
> 
> bool phy_is_started(phydev)
> {
> 	int state;
> 
> 	mutex_lock(&phydev->lock);
> 	state = phydev->state;
> 	mutex_unlock(&phydev->lock);
> 
> 	return state >= PHY_UP;
> }
> 
> since when we do the test is irrelevant.  Architectures that Linux
> runs on are single-copy atomic, which means that reading phydev->state
> itself is an atomic operation.  So, the mutex locking around that
> doesn't add to the atomicity of the entire operation.
> 
> How, depending on what you do with the rest of this function depends
> whether the entire operation is safe or not.  For example, let's take
> this code at the end of phy_state_machine():
> 
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> 
> state = PHY_UP
> 		thread 0			thread 1
> 						phy_disconnect()
> 						+-phy_is_started()
> 		phy_is_started()                |
> 						`-phy_stop()
> 						  +-phydev->state = PHY_HALTED
> 						  `-phy_stop_machine()
> 						    `-cancel_delayed_work_sync()
> 		phy_queue_state_machine()
> 		`-mod_delayed_work()
> 
Thanks for describing this scenario, I'll have a closer look at it.

> At this point, the phydev->state_queue() has been added back onto the
> system workqueue despite phy_stop_machine() having been called and
> cancel_delayed_work_sync() called on it.
> 
> The original code in 4.20 did not have this race condition.
> 
> Basically, the lock inside phy_is_started() does nothing useful, and
> I'd say is dangerously misleading.
> 

Heiner

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12 16:30                                     ` Russell King - ARM Linux admin
  2019-02-12 20:11                                       ` Heiner Kallweit
@ 2019-02-12 20:54                                       ` Heiner Kallweit
  2019-02-12 22:55                                         ` Russell King - ARM Linux admin
  1 sibling, 1 reply; 52+ messages in thread
From: Heiner Kallweit @ 2019-02-12 20:54 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Lunn, John David Anglin, Vivien Didelot, Florian Fainelli, netdev

On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
>> On 12.02.2019 04:58, Andrew Lunn wrote:
>>> That change means we don't check the PHY device if it caused an
>>> interrupt when its state is less than UP.
>>>
>>> What i'm seeing is that the PHY is interrupting pretty early on after
>>> a reboot when the previous boot had the interface up.
>>>
>> So this means that when going down for reboot the interrupts are not
>> properly masked / disabled? Because (at least for net-next) we enable
>> interrupts in phy_start() only.
> 
[..]
> In looking at this, I came across this chunk of code:
> 
> static inline bool __phy_is_started(struct phy_device *phydev)
> {
>         WARN_ON(!mutex_is_locked(&phydev->lock));
> 
>         return phydev->state >= PHY_UP;
> }
> 
> /**
>  * phy_is_started - Convenience function to check whether PHY is started
>  * @phydev: The phy_device struct
>  */
> static inline bool phy_is_started(struct phy_device *phydev)
> {
>         bool started;
> 
>         mutex_lock(&phydev->lock);
>         started = __phy_is_started(phydev);
>         mutex_unlock(&phydev->lock);
> 
>         return started;
> }
> 
> which looks to me like over-complication.  The mutex locking there is
> completely pointless - what are you trying to achieve with it?
> 
> Let's go through this.  The above is exactly equivalent to:
> 
> bool phy_is_started(phydev)
> {
> 	int state;
> 
> 	mutex_lock(&phydev->lock);
> 	state = phydev->state;
> 	mutex_unlock(&phydev->lock);
> 
> 	return state >= PHY_UP;
> }
> 
> since when we do the test is irrelevant.  Architectures that Linux
> runs on are single-copy atomic, which means that reading phydev->state
> itself is an atomic operation.  So, the mutex locking around that
> doesn't add to the atomicity of the entire operation.
> 
> How, depending on what you do with the rest of this function depends
> whether the entire operation is safe or not.  For example, let's take
> this code at the end of phy_state_machine():
> 
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> 
> state = PHY_UP
> 		thread 0			thread 1
> 						phy_disconnect()
> 						+-phy_is_started()
> 		phy_is_started()                |
> 						`-phy_stop()
> 						  +-phydev->state = PHY_HALTED
> 						  `-phy_stop_machine()
> 						    `-cancel_delayed_work_sync()
> 		phy_queue_state_machine()
> 		`-mod_delayed_work()
> 
> At this point, the phydev->state_queue() has been added back onto the
> system workqueue despite phy_stop_machine() having been called and
> cancel_delayed_work_sync() called on it.
> 
> The original code in 4.20 did not have this race condition.
> 
> Basically, the lock inside phy_is_started() does nothing useful, and
> I'd say is dangerously misleading.
> 
Then idea would be to first remove the locking from phy_is_started()
and in a second step do the following to prevent the described race
(phy_stop() takes phydev->lock too).

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c1ed03800..69dc64a4d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -957,8 +957,10 @@ void phy_state_machine(struct work_struct *work)
         * state machine would be pointless and possibly error prone when
         * called from phy_disconnect() synchronously.
         */
+       mutex_lock(&phydev->lock);
        if (phy_polling_mode(phydev) && phy_is_started(phydev))
                phy_queue_state_machine(phydev, PHY_STATE_TIME);
+       mutex_unlock(&phydev->lock);
 }

Heiner

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-12 20:54                                       ` Heiner Kallweit
@ 2019-02-12 22:55                                         ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 52+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-12 22:55 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Andrew Lunn, John David Anglin, Vivien Didelot, Florian Fainelli, netdev

On Tue, Feb 12, 2019 at 09:54:55PM +0100, Heiner Kallweit wrote:
> On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> > On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
> >> On 12.02.2019 04:58, Andrew Lunn wrote:
> >>> That change means we don't check the PHY device if it caused an
> >>> interrupt when its state is less than UP.
> >>>
> >>> What i'm seeing is that the PHY is interrupting pretty early on after
> >>> a reboot when the previous boot had the interface up.
> >>>
> >> So this means that when going down for reboot the interrupts are not
> >> properly masked / disabled? Because (at least for net-next) we enable
> >> interrupts in phy_start() only.
> > 
> [..]
> > In looking at this, I came across this chunk of code:
> > 
> > static inline bool __phy_is_started(struct phy_device *phydev)
> > {
> >         WARN_ON(!mutex_is_locked(&phydev->lock));
> > 
> >         return phydev->state >= PHY_UP;
> > }
> > 
> > /**
> >  * phy_is_started - Convenience function to check whether PHY is started
> >  * @phydev: The phy_device struct
> >  */
> > static inline bool phy_is_started(struct phy_device *phydev)
> > {
> >         bool started;
> > 
> >         mutex_lock(&phydev->lock);
> >         started = __phy_is_started(phydev);
> >         mutex_unlock(&phydev->lock);
> > 
> >         return started;
> > }
> > 
> > which looks to me like over-complication.  The mutex locking there is
> > completely pointless - what are you trying to achieve with it?
> > 
> > Let's go through this.  The above is exactly equivalent to:
> > 
> > bool phy_is_started(phydev)
> > {
> > 	int state;
> > 
> > 	mutex_lock(&phydev->lock);
> > 	state = phydev->state;
> > 	mutex_unlock(&phydev->lock);
> > 
> > 	return state >= PHY_UP;
> > }
> > 
> > since when we do the test is irrelevant.  Architectures that Linux
> > runs on are single-copy atomic, which means that reading phydev->state
> > itself is an atomic operation.  So, the mutex locking around that
> > doesn't add to the atomicity of the entire operation.
> > 
> > How, depending on what you do with the rest of this function depends
> > whether the entire operation is safe or not.  For example, let's take
> > this code at the end of phy_state_machine():
> > 
> >         if (phy_polling_mode(phydev) && phy_is_started(phydev))
> >                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> > 
> > state = PHY_UP
> > 		thread 0			thread 1
> > 						phy_disconnect()
> > 						+-phy_is_started()
> > 		phy_is_started()                |
> > 						`-phy_stop()
> > 						  +-phydev->state = PHY_HALTED
> > 						  `-phy_stop_machine()
> > 						    `-cancel_delayed_work_sync()
> > 		phy_queue_state_machine()
> > 		`-mod_delayed_work()
> > 
> > At this point, the phydev->state_queue() has been added back onto the
> > system workqueue despite phy_stop_machine() having been called and
> > cancel_delayed_work_sync() called on it.
> > 
> > The original code in 4.20 did not have this race condition.
> > 
> > Basically, the lock inside phy_is_started() does nothing useful, and
> > I'd say is dangerously misleading.
> > 
> Then idea would be to first remove the locking from phy_is_started()
> and in a second step do the following to prevent the described race
> (phy_stop() takes phydev->lock too).
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index c1ed03800..69dc64a4d 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -957,8 +957,10 @@ void phy_state_machine(struct work_struct *work)
>          * state machine would be pointless and possibly error prone when
>          * called from phy_disconnect() synchronously.
>          */
> +       mutex_lock(&phydev->lock);
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> +       mutex_unlock(&phydev->lock);
>  }

Yep, that approach would certainly be better.  I didn't exhaustively
audit the 5.0-rc code though.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-11 18:40                           ` [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit John David Anglin
  2019-02-11 23:33                             ` Andrew Lunn
@ 2019-02-14  2:07                             ` Andrew Lunn
  2019-02-14  4:47                               ` David Miller
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-14  2:07 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev

On Mon, Feb 11, 2019 at 01:40:21PM -0500, John David Anglin wrote:
> The GPIO interrupt controller on the espressobin board only supports edge interrupts.
> If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
> possible to miss an edge.  When this happens, the INTn pin on the Marvell switch is
> stuck low and no further interrupts occur.
> 
> I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
> a race in handling device interrupts (e.g. PHY link interrupts).  Some interrupts are
> directly cleared by reading the Global 1 status register.  However, the device interrupt
> flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
> This is done by reading the relevant SERDES and PHY status register.
> 
> The code only services interrupts whose status bit is set at the time of reading its status
> register.  If an interrupt event occurs after its status is read and before all interrupts
> are serviced, then this event will not be serviced and the INTn output pin will remain low.
> 
> This is not a problem with polling or level interrupts since the handler will be called
> again to process the event.  However, it's a big problem when using level interrupts.
> 
> The fix presented here is to add a loop around the code servicing switch interrupts.  If
> any pending interrupts remain after the current set has been handled, we loop and process
> the new set.  If there are no pending interrupts after servicing, we are sure that INTn has
> gone high and we will get an edge when a new event occurs.
> 
> Tested on espressobin board.
> 
> Signed-off-by:  John David Anglin <dave.anglin@bell.net>

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")

Tested-by: Andrew Lunn <andrew@lunn.ch>

David, please ensure that Heiner's patch:

net: phy: fix interrupt handling in non-started states

is applied first. Otherwise we can get into an interrupt storm.

    Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-14  2:07                             ` Andrew Lunn
@ 2019-02-14  4:47                               ` David Miller
  2019-02-14  4:50                                 ` Andrew Lunn
  0 siblings, 1 reply; 52+ messages in thread
From: David Miller @ 2019-02-14  4:47 UTC (permalink / raw)
  To: andrew; +Cc: dave.anglin, linux, vivien.didelot, f.fainelli, netdev

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 14 Feb 2019 03:07:23 +0100

> On Mon, Feb 11, 2019 at 01:40:21PM -0500, John David Anglin wrote:
>> The GPIO interrupt controller on the espressobin board only supports edge interrupts.
>> If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
>> possible to miss an edge.  When this happens, the INTn pin on the Marvell switch is
>> stuck low and no further interrupts occur.
>> 
>> I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
>> a race in handling device interrupts (e.g. PHY link interrupts).  Some interrupts are
>> directly cleared by reading the Global 1 status register.  However, the device interrupt
>> flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
>> This is done by reading the relevant SERDES and PHY status register.
>> 
>> The code only services interrupts whose status bit is set at the time of reading its status
>> register.  If an interrupt event occurs after its status is read and before all interrupts
>> are serviced, then this event will not be serviced and the INTn output pin will remain low.
>> 
>> This is not a problem with polling or level interrupts since the handler will be called
>> again to process the event.  However, it's a big problem when using level interrupts.
>> 
>> The fix presented here is to add a loop around the code servicing switch interrupts.  If
>> any pending interrupts remain after the current set has been handled, we loop and process
>> the new set.  If there are no pending interrupts after servicing, we are sure that INTn has
>> gone high and we will get an edge when a new event occurs.
>> 
>> Tested on espressobin board.
>> 
>> Signed-off-by:  John David Anglin <dave.anglin@bell.net>
> 
> Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
> 
> Tested-by: Andrew Lunn <andrew@lunn.ch>
> 
> David, please ensure that Heiner's patch:
> 
> net: phy: fix interrupt handling in non-started states
> 
> is applied first. Otherwise we can get into an interrupt storm.

Ok, all done.

Should I queue just this one for -stable?  I didn't queue up Heiner's change for
-stable because it fixes a 5.0-rcX regression.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-14  4:47                               ` David Miller
@ 2019-02-14  4:50                                 ` Andrew Lunn
  2019-02-14 15:27                                   ` David Miller
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Lunn @ 2019-02-14  4:50 UTC (permalink / raw)
  To: David Miller; +Cc: dave.anglin, linux, vivien.didelot, f.fainelli, netdev

> Ok, all done.

Thanks

> Should I queue just this one for -stable?  I didn't queue up Heiner's change for
> -stable because it fixes a 5.0-rcX regression.

Yes please.

    Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  2019-02-14  4:50                                 ` Andrew Lunn
@ 2019-02-14 15:27                                   ` David Miller
  0 siblings, 0 replies; 52+ messages in thread
From: David Miller @ 2019-02-14 15:27 UTC (permalink / raw)
  To: andrew; +Cc: dave.anglin, linux, vivien.didelot, f.fainelli, netdev

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 14 Feb 2019 05:50:19 +0100

>> Should I queue just this one for -stable?  I didn't queue up Heiner's change for
>> -stable because it fixes a 5.0-rcX regression.
> 
> Yes please.

Done.

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2019-02-14 15:27 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-22 19:16 net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs John David Anglin
2019-01-22 20:28 ` Andrew Lunn
2019-01-22 21:40   ` John David Anglin
2019-01-22 22:36     ` Andrew Lunn
2019-01-22 23:52       ` John David Anglin
2019-01-23  0:11       ` John David Anglin
2019-01-23  0:22         ` Andrew Lunn
2019-01-25 16:30           ` John David Anglin
2019-01-25 16:48             ` Russell King - ARM Linux admin
2019-01-25 18:38               ` John David Anglin
2019-01-30 17:08           ` John David Anglin
2019-01-30 17:28             ` Andrew Lunn
2019-01-30 19:01               ` John David Anglin
2019-01-30 19:09                 ` Andrew Lunn
2019-01-30 22:24               ` John David Anglin
2019-01-30 22:38                 ` Andrew Lunn
2019-01-31  1:27                   ` John David Anglin
2019-01-31 17:27                     ` John David Anglin
2019-02-04 18:37                       ` [PATCH] net: phylink: dsa: mv88e6xxx: Revise irq setup ordering John David Anglin
2019-02-04 19:35                         ` Andrew Lunn
2019-02-04 19:52                           ` John David Anglin
2019-02-04 20:19                             ` Andrew Lunn
2019-02-04 21:38                               ` John David Anglin
2019-02-04 22:47                                 ` Andrew Lunn
2019-02-04 21:59                         ` [PATCH v2] net: " John David Anglin
2019-02-04 23:14                           ` Andrew Lunn
2019-02-05  0:38                             ` John David Anglin
2019-02-05  2:21                               ` Andrew Lunn
2019-02-05 19:20                                 ` John David Anglin
2019-02-05 19:54                                   ` Andrew Lunn
2019-02-05 18:37                           ` David Miller
2019-02-11 18:40                           ` [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit John David Anglin
2019-02-11 23:33                             ` Andrew Lunn
2019-02-12  0:57                               ` John David Anglin
2019-02-12  1:21                                 ` Andrew Lunn
2019-02-12  3:58                                 ` Andrew Lunn
2019-02-12  6:51                                   ` Heiner Kallweit
2019-02-12 12:56                                     ` Andrew Lunn
2019-02-12 18:42                                       ` Heiner Kallweit
2019-02-12 20:09                                       ` John David Anglin
2019-02-12 16:30                                     ` Russell King - ARM Linux admin
2019-02-12 20:11                                       ` Heiner Kallweit
2019-02-12 20:54                                       ` Heiner Kallweit
2019-02-12 22:55                                         ` Russell King - ARM Linux admin
2019-02-14  2:07                             ` Andrew Lunn
2019-02-14  4:47                               ` David Miller
2019-02-14  4:50                                 ` Andrew Lunn
2019-02-14 15:27                                   ` David Miller
2019-01-22 23:12 ` net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs Andrew Lunn
2019-01-22 23:48   ` John David Anglin
2019-01-23  0:00   ` John David Anglin
2019-01-23  0:04     ` Florian Fainelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.