* ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
@ 2021-11-16 10:11 Jordan Vrtanoski
  2021-11-16 10:51 ` Russell King (Oracle)
  0 siblings, 1 reply; 11+ messages in thread
From: Jordan Vrtanoski @ 2021-11-16 10:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I am trying to build kernel 5.4.150 for the SolidRun ClearFog GT 8K. The kernel is stable and I have not experienced any stability issues. The only issue I am facing is with enabling support for the SFP+ module.

I am using an SFP-H10GB-CU1M direct-attach copper cable module. The module works on an earlier kernel version, 5.1.x (verified on the same hardware); however, once the device is booted under 5.4.150, the transceiver is not recognised and the interface remains down. There are no error messages reported by the kernel.

The 5.1.0 kernel is the one distributed by SolidRun and is available in the SolidRun image repository as the standard Debian image. The 5.4.150 kernel is vanilla (no patches applied) and was built on the device under Ubuntu 20.04.

I have enabled debug on the SFP and MVPP2 modules on both kernels and compared the messages:

echo 'module sfp +pfl' > /sys/kernel/debug/dynamic_debug/control
echo 'module mvpp2 +pfl' > /sys/kernel/debug/dynamic_debug/control
echo 'module phylink +pfl' > /sys/kernel/debug/dynamic_debug/control
echo 'module marvell +pfl' > /sys/kernel/debug/dynamic_debug/control
echo 'module marvell10g +pfl' > /sys/kernel/debug/dynamic_debug/control

Following are the messages in both cases
5.1.0
--------------- 
[  131.008032] sfp_check_state:1729: sfp sfp-cp0-eth0: mod-def0 0 -> 1
[  131.008038] sfp_sm_event:1489: sfp sfp-cp0-eth0: SM: enter empty:up:down event insert
[  131.008041] sfp_module_tx_disable:1138: sfp sfp-cp0-eth0: tx disable 1 -> 1
[  131.008054] sfp_sm_event:1623: sfp sfp-cp0-eth0: SM: exit probe:up:down
[  131.328672] sfp_sm_event:1489: sfp sfp-cp0-eth0: SM: enter probe:up:down event timeout
[  131.340025] sfp sfp-cp0-eth0: module OEM              SFP-H10GB-CU1M   rev R    sn CSC200902040339  dc 200918
[  131.350180] phylink_sfp_module_insert:1630: mvpp2 f2000000.ethernet eth0: requesting link mode inband/10gbase-kr with support 0,00000600,0000e040
[  131.350184] sfp_module_tx_enable:1146: sfp sfp-cp0-eth0: tx disable 1 -> 0
[  131.350195] sfp_sm_event:1623: sfp sfp-cp0-eth0: SM: exit present:up:init
[  131.680671] sfp_sm_event:1489: sfp sfp-cp0-eth0: SM: enter present:up:init event timeout
[  131.680676] sfp_sm_event:1623: sfp sfp-cp0-eth0: SM: exit present:up:link_up
[  131.680695] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control off

5.4.150
--------------- 
[  135.613202] sfp_check_state:1795: sfp sfp-cp0-eth0: mod-def0 0 -> 1
[  135.613210] sfp_sm_event:1551: sfp sfp-cp0-eth0: SM: enter empty:up:down event insert
[  135.613214] sfp_module_tx_disable:1203: sfp sfp-cp0-eth0: tx disable 1 -> 1
[  135.613228] sfp_sm_event:1686: sfp sfp-cp0-eth0: SM: exit probe:up:down
[  135.935898] sfp_sm_event:1551: sfp sfp-cp0-eth0: SM: enter probe:up:down event timeout
[  135.947254] sfp sfp-cp0-eth0: module OEM              SFP-H10GB-CU1M   rev R    sn CSC200902040339  dc 200918
[  135.947264] phylink_sfp_module_insert:1749: mvpp2 f2000000.ethernet eth0: requesting link mode inband/10gbase-kr with support 00,00000600,00008040
[  135.947269] sfp_module_tx_enable:1211: sfp sfp-cp0-eth0: tx disable 1 -> 0
[  135.947280] sfp_sm_event:1686: sfp sfp-cp0-eth0: SM: exit present:up:init
[  136.255901] sfp_sm_event:1551: sfp sfp-cp0-eth0: SM: enter present:up:init event timeout
[  136.255910] sfp_sm_event:1686: sfp sfp-cp0-eth0: SM: exit present:up:link_up

I notice that the SFP state machine reaches link_up in both cases, but on 5.4.150 the message "mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control off" is missing.
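
One further difference between the two logs is the "support" bitmap printed by phylink_sfp_module_insert: the lowest word is 0000e040 on 5.1.0 and 00008040 on 5.4.150. A minimal sketch for isolating the differing bits follows. The helper names are made up for illustration, and reading bits 13 and 14 as Pause and Asym_Pause assumes the ethtool link mode bit numbering from include/uapi/linux/ethtool.h. Whether this difference is related to the missing link is not established here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the bits that differ between two 32-bit
 * words of a link mode bitmap. */
static uint32_t support_word_diff(uint32_t a, uint32_t b)
{
	return a ^ b;
}

/* Hypothetical helper: returns 1 if bit number 'bit' (0..31) is set in
 * 'word', otherwise 0. */
static int support_bit_set(uint32_t word, unsigned int bit)
{
	assert(bit < 32);
	return (int)((word >> bit) & 1u);
}
```

Applied to the lowest words above, support_word_diff(0x0000e040, 0x00008040) yields 0x00006000, that is, bits 13 and 14.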

Comparison of the device tree for both kernels shows minor changes not related to the SFP module. The 5.4.150 device tree is from the kernel source (not patched).

“lshw” shows some differences, but I am not sure if this is relevant for the issue.

5.1.0
----------------
  *-network:6
       description: Ethernet interface
       physical id: 7
       logical name: eth0
       serial: 76:28:b3:7e:9e:00
       size: 10Gbit/s
       capabilities: ethernet physical autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=mvpp2 driverversion=1.0 duplex=full ip=10.0.1.188 link=yes multicast=yes speed=10Gbit/s


5.4.150
-----------------
  *-network:8
       description: Ethernet interface
       physical id: d
       logical name: eth0
       serial: be:a2:a4:8a:82:7b
       size: 10Gbit/s
       capabilities: ethernet physical 2500bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=mvpp2 driverversion=1.0 duplex=full link=no multicast=yes speed=10Gbit/s


ethtool on 5.4.150 shows that the module is identified over the I2C bus and that its EEPROM can be read:
root@cdev-desktop:~# ethtool -m eth0
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x21 (Copper pigtail)
        Transceiver codes                         : 0x00 0x00 0x00 0x00 0x00 0x04 0x00 0x00 0x00
        Transceiver type                          : Passive Cable
        Encoding                                  : 0x00 (unspecified)
        BR, Nominal                               : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 0km
        Length (SMF)                              : 0m
        Length (50um)                             : 0m
        Length (62.5um)                           : 0m
        Length (Copper)                           : 1m
        Length (OM3)                              : 0m
        Passive Cu cmplnce.                       : 0x01 (SFF-8431 appendix E) [SFF-8472 rev10.4 only]
        Vendor name                               : OEM
        Vendor OUI                                : 00:40:20
        Vendor PN                                 : SFP-H10GB-CU1M
        Vendor rev                                : R
        Option values                             : 0x00 0x00
        BR margin, max                            : 0%
        BR margin, min                            : 0%
        Vendor SN                                 : CSC200902040339
        Date code                                 : 200918
root@cdev-desktop:~# ethtool eth0
Settings for eth0:
        Supported ports: [ ]
        Supported link modes:   2500baseX/Full
                                1000baseX/Full
                                10000baseCR/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  2500baseX/Full
                                1000baseX/Full
                                10000baseCR/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Link detected: no
root@cdev-desktop:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 4e:ad:1d:fa:a5:d9 brd ff:ff:ff:ff:ff:ff
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 6a:8d:f8:e0:8c:d7 brd ff:ff:ff:ff:ff:ff
4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 2048
    link/ether be:a2:a4:8a:82:7b brd ff:ff:ff:ff:ff:ff
5: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 2048
    link/ether 22:1a:5f:3a:5b:f7 brd ff:ff:ff:ff:ff:ff
6: eth2: <BROADCAST,MULTICAST> mtu 1508 qdisc noop state DOWN mode DEFAULT group default qlen 2048
    link/ether 3a:9f:d4:6d:7d:d8 brd ff:ff:ff:ff:ff:ff
7: lan2@eth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:9f:d4:6d:7d:d8 brd ff:ff:ff:ff:ff:ff
8: lan1@eth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:9f:d4:6d:7d:d8 brd ff:ff:ff:ff:ff:ff
9: lan4@eth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:9f:d4:6d:7d:d8 brd ff:ff:ff:ff:ff:ff
10: lan3@eth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:9f:d4:6d:7d:d8 brd ff:ff:ff:ff:ff:ff

The kernel is compiled with the following configuration options set to "y":
CONFIG_NET_VENDOR_MARVELL=y
CONFIG_MVMDIO=y
CONFIG_MVPP2=y
CONFIG_MDIO_DEVICE=y
CONFIG_MDIO_BUS=y
CONFIG_MDIO_BITBANG=y
CONFIG_MDIO_BUS_MUX=y
CONFIG_MDIO_BUS_MUX_MMIOREG=y
CONFIG_MDIO_BUS_MUX_MULTIPLEXER=y
CONFIG_MDIO_I2C=y
CONFIG_PHYLINK=y
CONFIG_PHYLIB=y
CONFIG_SWPHY=y
CONFIG_SFP=y
CONFIG_FIXED_PHY=y
CONFIG_MARVELL_PHY=y
CONFIG_MARVELL_10G_PHY=y

The boot messages related to MVPP2, SFP and PHY are:
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081]
[    0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys).
[    1.620372] ahci f4540000.sata: f4540000.sata supply phy not found, using dummy regulator
[    1.704302] libphy: Fixed MDIO Bus: probed
[    1.783828] libphy: orion_mdio_bus: probed
[    1.821979] mvpp2 f2000000.ethernet: using 8 per-cpu buffers
[    1.834108] mvpp2 f4000000.ethernet: using 8 per-cpu buffers
[    3.983502] ahci f4540000.sata: f4540000.sata supply phy not found, using dummy regulator
[    4.060353] libphy: SFP I2C Bus: probed
[    4.064268] sfp sfp-cp0-eth0: Host maximum power 2.0W
[    4.070274] mvpp2 f2000000.ethernet: using 8 per-cpu buffers
[    4.085370] mvpp2 f2000000.ethernet eth0: Using random mac address e2:b6:4b:8e:ff:2d
[    4.105582] mvpp2 f4000000.ethernet: using 8 per-cpu buffers
[    4.123359] mvpp2 f4000000.ethernet eth1: Using random mac address 72:03:f1:90:06:3a
[    4.132815] mvpp2 f4000000.ethernet eth2: Using random mac address 1e:77:ca:54:86:1b
[    4.430625] sfp sfp-cp0-eth0: module OEM              SFP-H10GB-CU1M   rev R    sn CSC200902040339  dc 200918
[   27.205578] libphy: mdio: probed
[   27.364802] mvpp2 f2000000.ethernet eth0: configuring for inband/10gbase-kr link mode
[   27.473516] mvpp2 f4000000.ethernet eth1: PHY [f412a200.mdio-mii:00] driver [Marvell 88E1510]
[   27.474724] mvpp2 f4000000.ethernet eth1: configuring for phy/sgmii link mode
[   27.947046] mvpp2 f4000000.ethernet: all ports have a low MTU, switching to per-cpu buffers
[   27.982011] mvpp2 f4000000.ethernet: using 8 per-cpu buffers
[   28.000887] mvpp2 f4000000.ethernet eth1: PHY [f412a200.mdio-mii:00] driver [Marvell 88E1510]
[   28.002071] mvpp2 f4000000.ethernet eth1: configuring for phy/sgmii link mode


It looks to me like MVPP2 is not bringing the interface up, but I was not able to find anything that helps me debug the problem further.

I need help identifying and fixing the problem. Is there anything else I should enable for debugging?

Thanks,
Jordan
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-16 10:11 ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150 Jordan Vrtanoski
@ 2021-11-16 10:51 ` Russell King (Oracle)
  2021-11-20 18:57   ` Jordan Vrtanoski
  0 siblings, 1 reply; 11+ messages in thread
From: Russell King (Oracle) @ 2021-11-16 10:51 UTC (permalink / raw)
  To: Jordan Vrtanoski; +Cc: linux-arm-kernel

On Tue, Nov 16, 2021 at 02:11:54PM +0400, Jordan Vrtanoski wrote:
> Hi,
> 
> I am trying to build kernel 5.4.150 for the SolidRun ClearFog GT 8K. The kernel is stable and I have not experienced any stability issues. The only issue I am facing is with enabling support for the SFP+ module.
> 
> I am using an SFP-H10GB-CU1M direct-attach copper cable module. The module works on an earlier kernel version, 5.1.x (verified on the same hardware); however, once the device is booted under 5.4.150, the transceiver is not recognised and the interface remains down. There are no error messages reported by the kernel.

The transceiver is recognised, but for some reason the link isn't coming
up - and from the debug kernel messages, there's no information pointing
to why not. At a guess, mvpp2 is not reporting that it has link for some
reason.

I don't think it's an issue with the module or the SFP layer - since
that is going into "link up" state and the TX_DISABLE signal is being
deasserted (not that they mean much for a DA cable.) So, I'd be thinking
that the problem is with the comphy or mvpp2 drivers.

mvpp2 doesn't have much in the way of debugging messages to help with
this, but phylink should at least be printing "mac link" messages in
both 5.1.x and 5.4.x kernels - but it doesn't appear to be. That
suggests enabling dynamic debug is not sufficient to get those messages
out... I'm afraid I've never used dynamic debug so can't help with that.
Maybe someone who knows how netdev_dbg() interacts with dynamic debug
can help with that. Strangely, you are getting the netdev_dbg() from
within phylink_sfp_module_insert(), so I don't really understand why
you aren't getting anything else from phylink.

It all seems rather odd.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-16 10:51 ` Russell King (Oracle)
@ 2021-11-20 18:57   ` Jordan Vrtanoski
  2021-11-20 19:41     ` Russell King (Oracle)
  0 siblings, 1 reply; 11+ messages in thread
From: Jordan Vrtanoski @ 2021-11-20 18:57 UTC (permalink / raw)
  To: Russell King (Oracle), mw; +Cc: linux-arm-kernel

I have tried the same config on 5.10.80: the transceiver is properly initialised, the link between PHY and MAC is established, and an IP address is obtained through DHCP.
This is definitely a regression in 5.4.x.

What I have found so far, by adding debug messages to phylink and mvpp2, is that when phylink_resolve
is invoked, it calls mvpp2_phylink_mac_link_state (through pl->ops).
The state that is returned is the same as the old state, so phylink_mac_link_up is never executed.
From what I can conclude from the code, it looks like MVPP22_XLG_STATUS_LINK_UP is not set in the MVPP22_XLG_STATUS register.

I have also noticed that mvpp2_link_status_isr is not invoked when the transceiver is inserted. Is this expected behaviour?
What should set MVPP22_XLG_STATUS_LINK_UP after it is detected that the module was inserted in the cage?
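
The resolve behaviour described above can be modelled in a few lines of user-space C. This is only a sketch of the logic as I understand it, not the kernel code: the struct and function names (model_link_state, model_phylink, model_resolve) are invented for illustration, and the real phylink_resolve handles much more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the link state reported by the MAC driver.
 * This is not the kernel's struct phylink_link_state. */
struct model_link_state {
	bool link;
};

/* Simplified stand-in for the phylink instance. */
struct model_phylink {
	bool old_link_state;	/* last state that was acted upon */
	int link_up_calls;	/* how often the "link up" path ran */
};

/* Model of the resolve step: act only when the reported state changed. */
static void model_resolve(struct model_phylink *pl,
			  const struct model_link_state *mac)
{
	assert(pl != NULL && mac != NULL);

	if (mac->link == pl->old_link_state)
		return;		/* same as old state: nothing is called */

	pl->old_link_state = mac->link;
	if (mac->link)
		pl->link_up_calls++;	/* stands in for the link-up call */
}
```

In this model, as long as the MAC keeps reporting link == false, link_up_calls stays at zero no matter how often model_resolve runs, which matches what I see on 5.4.150.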



* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-20 18:57   ` Jordan Vrtanoski
@ 2021-11-20 19:41     ` Russell King (Oracle)
  2021-11-21  5:37       ` Jordan Vrtanoski
  0 siblings, 1 reply; 11+ messages in thread
From: Russell King (Oracle) @ 2021-11-20 19:41 UTC (permalink / raw)
  To: Jordan Vrtanoski; +Cc: mw, linux-arm-kernel

On Sat, Nov 20, 2021 at 10:57:39PM +0400, Jordan Vrtanoski wrote:
> I have tried the same config on 5.10.80: the transceiver is properly initialised, the link between PHY and MAC is established, and an IP address is obtained through DHCP.
> This is definitely a regression in 5.4.x.
> 
> What I have found so far, by adding debug messages to phylink and mvpp2, is that when phylink_resolve
> is invoked, it calls mvpp2_phylink_mac_link_state (through pl->ops).
> The state that is returned is the same as the old state, so phylink_mac_link_up is never executed.
> From what I can conclude from the code, it looks like MVPP22_XLG_STATUS_LINK_UP is not set in the MVPP22_XLG_STATUS register.
> 
> I have also noticed that mvpp2_link_status_isr is not invoked when the transceiver is inserted. Is this expected behaviour?
> What should set MVPP22_XLG_STATUS_LINK_UP after it is detected that the module was inserted in the cage?

The hardware does, when it detects valid 10GBASE-R signal being received
from the SFP. It could mean something is misconfigured with the COMPHY
block, or something wrong with the mvpp2 driver.

I'm afraid that getting a bug report against a stable kernel is about
the worst situation for me - I don't track the stable kernels or what
goes into them. To be honest, I don't remember much about the 5.4
kernel (I blame that on getting old!)

Please can you test vanilla 5.4? If that works, then please bisect
between 5.4 and the 5.4.x kernel that you've identified as failing.
It could be some patch has been backported into that stable kernel
that should not have been.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-20 19:41     ` Russell King (Oracle)
@ 2021-11-21  5:37       ` Jordan Vrtanoski
  2021-11-22 10:51         ` Jordan Vrtanoski
  0 siblings, 1 reply; 11+ messages in thread
From: Jordan Vrtanoski @ 2021-11-21  5:37 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: mw, linux-arm-kernel

I also suspect that it could be a hardware misconfiguration,
and I believe it is coming from the MVPP2 driver.

I will try 5.4.0 and, if it works, bisect up to 5.4.150 to see when the defect was introduced.


* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-21  5:37       ` Jordan Vrtanoski
@ 2021-11-22 10:51         ` Jordan Vrtanoski
  2021-11-22 13:15           ` Jordan Vrtanoski
  2021-11-22 19:53             ` Russell King (Oracle)
  0 siblings, 2 replies; 11+ messages in thread
From: Jordan Vrtanoski @ 2021-11-22 10:51 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: mw, linux-arm-kernel, stefanc

Hi,
    After bisecting, I found that the regression was introduced in 5.4.90 by the following patch:
"[PATCH net v3] net: mvpp2: disable force link UP during port init procedure"

    The patch changes the configuration of MVPP22_XLG_CTRL0_REG during port initialisation, which
on the ClearFog GT 8K prevents MVPP2 from properly starting the MAC after the transceiver is detected. After reverting
the patch, the transceiver works properly.

    Here is the patch I applied to 5.4.150 (reverting "[PATCH net v3] net: mvpp2: disable force link UP during port init procedure"):

--- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c      2021-11-22 09:36:45.176710763 +0000
+++ drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c      2021-09-30 08:09:26.000000000 +0000
@@ -4545,7 +4545,7 @@
        struct mvpp2 *priv = port->priv;
        struct mvpp2_txq_pcpu *txq_pcpu;
        unsigned int thread;
-       int queue, err;
+       int queue, err, val;

        /* Checks for hardware constraints */
        if (port->first_rxq + port->nrxqs >
@@ -4559,6 +4559,18 @@
        mvpp2_egress_disable(port);
        mvpp2_port_disable(port);

+       if (mvpp2_is_xlg(port->phy_interface)) {
+               val = readl(port->base + MVPP22_XLG_CTRL0_REG);
+               val &= ~MVPP22_XLG_CTRL0_FORCE_LINK_PASS;
+               val |= MVPP22_XLG_CTRL0_FORCE_LINK_DOWN;
+               writel(val, port->base + MVPP22_XLG_CTRL0_REG);
+       } else {
+               val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+               val &= ~MVPP2_GMAC_FORCE_LINK_PASS;
+               val |= MVPP2_GMAC_FORCE_LINK_DOWN;
+               writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+       }
+
        port->tx_time_coal = MVPP2_TXDONE_COAL_USEC;

        port->txqs = devm_kcalloc(dev, port->ntxqs, sizeof(*port->txqs),



* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-22 10:51         ` Jordan Vrtanoski
@ 2021-11-22 13:15           ` Jordan Vrtanoski
  2021-11-22 19:53             ` Russell King (Oracle)
  1 sibling, 0 replies; 11+ messages in thread
From: Jordan Vrtanoski @ 2021-11-22 13:15 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: mw, linux-arm-kernel, stefanc

Hi,
    Sorry, I didn't notice that I created the patch in the opposite direction; here is the correct one.

Jordan

--- linux-5.4.150/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c      2021-09-30 08:09:26.000000000 +0000
+++ linux-5.4.150/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c      2021-11-22 09:36:45.176710763 +0000
@@ -4545,7 +4545,7 @@
        struct mvpp2 *priv = port->priv;
        struct mvpp2_txq_pcpu *txq_pcpu;
        unsigned int thread;
-       int queue, err, val;
+       int queue, err;

        /* Checks for hardware constraints */
        if (port->first_rxq + port->nrxqs >
@@ -4559,18 +4559,6 @@
        mvpp2_egress_disable(port);
        mvpp2_port_disable(port);

-       if (mvpp2_is_xlg(port->phy_interface)) {
-               val = readl(port->base + MVPP22_XLG_CTRL0_REG);
-               val &= ~MVPP22_XLG_CTRL0_FORCE_LINK_PASS;
-               val |= MVPP22_XLG_CTRL0_FORCE_LINK_DOWN;
-               writel(val, port->base + MVPP22_XLG_CTRL0_REG);
-       } else {
-               val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-               val &= ~MVPP2_GMAC_FORCE_LINK_PASS;
-               val |= MVPP2_GMAC_FORCE_LINK_DOWN;
-               writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-       }
-
        port->tx_time_coal = MVPP2_TXDONE_COAL_USEC;

        port->txqs = devm_kcalloc(dev, port->ntxqs, sizeof(*port->txqs),
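
The read-modify-write removed by the revert above can be looked at in isolation with a small user-space model. This is a sketch under assumptions: the EX_* bit positions are placeholders chosen for illustration (the real values are in the mvpp2 driver headers), and ex_release_force is a hypothetical counterpart that does not correspond to any specific function in the driver. It only shows that once "force link down" is set, it stays set until some later code clears it; whether such a clear happens on this path in 5.4.x is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions for illustration only; the real definitions
 * live in the mvpp2 driver headers. */
#define EX_FORCE_LINK_DOWN	(1u << 2)
#define EX_FORCE_LINK_PASS	(1u << 3)

/* Mirrors the removed hunk: clear "force link pass", set "force link down". */
static uint32_t ex_force_link_down(uint32_t val)
{
	val &= ~EX_FORCE_LINK_PASS;
	val |= EX_FORCE_LINK_DOWN;

	/* Post-condition: forced down, never forced up. */
	assert((val & EX_FORCE_LINK_DOWN) && !(val & EX_FORCE_LINK_PASS));
	return val;
}

/* Hypothetical counterpart: drop both force bits so the hardware link
 * status is no longer overridden. */
static uint32_t ex_release_force(uint32_t val)
{
	return val & ~(EX_FORCE_LINK_DOWN | EX_FORCE_LINK_PASS);
}
```

Unrelated bits in the register value are preserved by both helpers, as in the original readl()/writel() sequence.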


> On 22.11.2021, at 14:51 , Jordan Vrtanoski <jordan.vrtanoski@gmail.com> wrote:
> 
> Hi,
>    After bisecting, the regression defect was introduced in 5.4.90 with the following patch:
> "[PATCH net v3] net: mvpp2: disable force link UP during port init procedure”
> 
>    The patch is changing the configuration of the port during the initialisation of MVPP22_XLG_CTRL0_REG, which
> on ClearFog GT 8K is preventing the MVPP2 to properly start the MAC after the transceiver is detected. After reverting 
> the patch, the transceiver works properly.
> 
>    Here is the patch for 5.4.150 I applied (reversing the patch "[PATCH net v3] net: mvpp2: disable force link UP during port init procedure”):
> 
> --- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c      2021-11-22 09:36:45.176710763 +0000
> +++ drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c      2021-09-30 08:09:26.000000000 +0000
> @@ -4545,7 +4545,7 @@
>        struct mvpp2 *priv = port->priv;
>        struct mvpp2_txq_pcpu *txq_pcpu;
>        unsigned int thread;
> -       int queue, err;
> +       int queue, err, val;
> 
>        /* Checks for hardware constraints */
>        if (port->first_rxq + port->nrxqs >
> @@ -4559,6 +4559,18 @@
>        mvpp2_egress_disable(port);
>        mvpp2_port_disable(port);
> 
> +       if (mvpp2_is_xlg(port->phy_interface)) {
> +               val = readl(port->base + MVPP22_XLG_CTRL0_REG);
> +               val &= ~MVPP22_XLG_CTRL0_FORCE_LINK_PASS;
> +               val |= MVPP22_XLG_CTRL0_FORCE_LINK_DOWN;
> +               writel(val, port->base + MVPP22_XLG_CTRL0_REG);
> +       } else {
> +               val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> +               val &= ~MVPP2_GMAC_FORCE_LINK_PASS;
> +               val |= MVPP2_GMAC_FORCE_LINK_DOWN;
> +               writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
> +       }
> +
>        port->tx_time_coal = MVPP2_TXDONE_COAL_USEC;
> 
>        port->txqs = devm_kcalloc(dev, port->ntxqs, sizeof(*port->txqs),
> 
> 
>> On 21.11.2021, at 09:37 , Jordan Vrtanoski <jordan.vrtanoski@gmail.com> wrote:
>> 
>> I also suspect that it could be a case of the misconfiguration of the hardware,
>> and I believe it’s coming from MVPP2 driver.
>> 
>> I will try 5.4.0, and if it works, I will bisect till 5.4.150 to see when the defect was introduced.
>> 
>>> On 20.11.2021, at 23:41 , Russell King (Oracle) <linux@armlinux.org.uk> wrote:
>>> 
>>> On Sat, Nov 20, 2021 at 10:57:39PM +0400, Jordan Vrtanoski wrote:
>>>> I have tried the same config on 5.10.80 and the transceiver is properly initialised, the link between PHY and MAC is established and an IP address is obtained through DHCP.
>>>> This is definitely a regression in 5.4.x.
>>>> 
>>>> What I have found so far, by adding debugging messages to phylink and mvpp2, is that when phylink_resolve
>>>> is invoked, it calls mvpp2_phylink_mac_link_state (through pl->ops).
>>>> The state that is returned is the same as the old state, so phylink_mac_link_up is never executed.
>>>> From what I can conclude from the code, it looks like MVPP22_XLG_STATUS_LINK_UP is not set in the MVPP22_XLG_STATUS register.
>>>> 
>>>> I have also noticed that mvpp2_link_status_isr is not invoked when the transceiver is inserted. Is this expected behaviour?
>>>> Who should set MVPP22_XLG_STATUS_LINK_UP once it is detected that the module was inserted in the cage?
>>> 
>>> The hardware does, when it detects valid 10GBASE-R signal being received
>>> from the SFP. It could mean something is misconfigured with the COMPHY
>>> block, or something wrong with the mvpp2 driver.
>>> 
>>> I'm afraid that getting a bug report against a stable kernel is about
>>> the worst situation for me - I don't track the stable kernels or what
>>> goes into them. To be honest, I don't remember much about the 5.4
>>> kernel (I blame that on getting old!)
>>> 
>>> Please can you test vanilla 5.4? If that works, then please bisect
>>> between 5.4 and the 5.4.x kernel that you've identified as failing.
>>> It could be some patch has been backported into that stable kernel
>>> that should not have been.
>>> 
>>> Thanks.
>>> 
>>> -- 
>>> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>>> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
>> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-22 10:51         ` Jordan Vrtanoski
@ 2021-11-22 19:53             ` Russell King (Oracle)
  2021-11-22 19:53             ` Russell King (Oracle)
  1 sibling, 0 replies; 11+ messages in thread
From: Russell King (Oracle) @ 2021-11-22 19:53 UTC (permalink / raw)
  To: Jordan Vrtanoski, stable; +Cc: mw, linux-arm-kernel, stefanc

On Mon, Nov 22, 2021 at 02:51:36PM +0400, Jordan Vrtanoski wrote:
> Hi,
>     After bisecting, I found that the regression was introduced in 5.4.90 by the following patch:
> "[PATCH net v3] net: mvpp2: disable force link UP during port init procedure"
> 
>     The patch changes the configuration of the port during the initialisation of MVPP22_XLG_CTRL0_REG, which
> on ClearFog GT 8K prevents the MVPP2 from properly starting the MAC after the transceiver is detected. After reverting
> the patch, the transceiver works properly.

Right, the problem will be 875082244853 ("net: mvpp2: disable force
link UP during port init procedure") that has been backported to
kernels that it shouldn't have been applied to.

There is a subtle interaction between that commit and development work
leading up to it that wasn't obvious during the review. Specifically,
any kernel without fefeae73ac7a ("net: mvpp2: ensure the port is forced
down while changing modes") will now be broken.

However, fefeae73ac7a is development work, and so can't be backported.

Adding stable to this thread so they're aware of the issue.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
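
The symptom reported earlier in the thread (the link state never changes, so the link-up hook never runs) can be reproduced in a small user-space toy model. This is only a sketch under stated assumptions: the bit positions, the `toy_*` names and the `toy_release_force_down()` helper are hypothetical, and the model assumes that a forced-down port never reports link up in its status. It does not claim to show what fefeae73ac7a actually changes in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions: assumptions for this sketch only. */
#define XLG_CTRL0_FORCE_LINK_DOWN (1u << 2)
#define XLG_CTRL0_FORCE_LINK_PASS (1u << 3)

struct toy_port {
	uint32_t xlg_ctrl0;         /* stands in for MVPP22_XLG_CTRL0_REG */
	bool signal_present;        /* valid 10GBASE-R signal from the SFP */
	bool old_link;              /* last link state seen by the resolver */
	unsigned int link_up_calls; /* how often the link-up hook ran */
};

/* Same read-modify-write as the XLG branch of the hunk quoted in the thread:
 * clear force-link-pass, set force-link-down. */
static void toy_port_init_force_down(struct toy_port *p)
{
	uint32_t val;

	assert(p != NULL);
	val = p->xlg_ctrl0;
	val &= ~XLG_CTRL0_FORCE_LINK_PASS;
	val |= XLG_CTRL0_FORCE_LINK_DOWN;
	p->xlg_ctrl0 = val;
}

/* Assumption: a forced-down port never reports link up, a forced-pass port
 * always does, otherwise the status follows the received signal. */
static bool toy_status_link_up(const struct toy_port *p)
{
	if (p->xlg_ctrl0 & XLG_CTRL0_FORCE_LINK_DOWN)
		return false;
	if (p->xlg_ctrl0 & XLG_CTRL0_FORCE_LINK_PASS)
		return true;
	return p->signal_present;
}

/* Models the behaviour described in the thread: the link-up hook is only
 * invoked when the reported state differs from the previous one. */
static void toy_resolve(struct toy_port *p)
{
	bool now = toy_status_link_up(p);

	if (now != p->old_link) {
		p->old_link = now;
		if (now)
			p->link_up_calls++;
	}
}

/* Hypothetical stand-in for whatever releases force-link-down elsewhere. */
static void toy_release_force_down(struct toy_port *p)
{
	p->xlg_ctrl0 &= ~XLG_CTRL0_FORCE_LINK_DOWN;
}
```

In this model, calling `toy_port_init_force_down()`, setting `signal_present` and then running `toy_resolve()` leaves `link_up_calls` at zero, which matches the observation that the interface stays down with no error message. Only after something clears the force-down bit does the next `toy_resolve()` see a state change and run the hook.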

* Re: ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150
  2021-11-22 19:53             ` Russell King (Oracle)
@ 2021-11-23 12:20               ` Greg KH
  -1 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2021-11-23 12:20 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jordan Vrtanoski, stable, mw, linux-arm-kernel, stefanc

On Mon, Nov 22, 2021 at 07:53:44PM +0000, Russell King (Oracle) wrote:
> On Mon, Nov 22, 2021 at 02:51:36PM +0400, Jordan Vrtanoski wrote:
> > Hi,
> >     After bisecting, I found that the regression was introduced in 5.4.90 by the following patch:
> > "[PATCH net v3] net: mvpp2: disable force link UP during port init procedure"
> > 
> >     The patch changes the configuration of the port during the initialisation of MVPP22_XLG_CTRL0_REG, which
> > on ClearFog GT 8K prevents the MVPP2 from properly starting the MAC after the transceiver is detected. After reverting
> > the patch, the transceiver works properly.
> 
> Right, the problem will be 875082244853 ("net: mvpp2: disable force
> link UP during port init procedure") that has been backported to
> kernels that it shouldn't have been applied to.
> 
> There is a subtle interaction between that commit and development work
> leading up to it that wasn't obvious during the review. Specifically,
> any kernel without fefeae73ac7a ("net: mvpp2: ensure the port is forced
> down while changing modes") will now be broken.
> 
> However, fefeae73ac7a is development work, and so can't be backported.
> 
> Adding stable to this thread so they're aware of the issue.

I've now reverted the commit in the 5.4.y tree.

thanks,

greg k-h

Thread overview: 11+ messages
2021-11-16 10:11 ClearFog GT 8K not initialising SFP-H10GB-CU1M transceiver on 5.4.150 Jordan Vrtanoski
2021-11-16 10:51 ` Russell King (Oracle)
2021-11-20 18:57   ` Jordan Vrtanoski
2021-11-20 19:41     ` Russell King (Oracle)
2021-11-21  5:37       ` Jordan Vrtanoski
2021-11-22 10:51         ` Jordan Vrtanoski
2021-11-22 13:15           ` Jordan Vrtanoski
2021-11-22 19:53           ` Russell King (Oracle)
2021-11-22 19:53             ` Russell King (Oracle)
2021-11-23 12:20             ` Greg KH
2021-11-23 12:20               ` Greg KH
