* No traffic with Marvell switch and latest linux-next @ 2019-02-17 15:34 Heiner Kallweit 2019-02-17 15:40 ` Russell King - ARM Linux admin ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Heiner Kallweit @ 2019-02-17 15:34 UTC (permalink / raw) To: Andrew Lunn, Florian Fainelli, Russell King - ARM Linux; +Cc: netdev When testing latest linux-next on the ZII DTU I face the issue that no traffic is flowing over the switch ports, even though in dmesg everything looks good. Also PHY properly establishes the link. With 4.20.10 I don't have the issue and with 5.0-rc6 also not. However on 5.0-rc6 I got the following, also number of network interrupts seems to be very high (few minutes after boot). Any idea what's going on? Andrew, IIRC you recently fixed some interrupt-related issue: 7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts") But the description doesn't seem to match this trace. irq 56: nobody cared (try booting with the "irqpoll" option) CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) [<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14) [<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0) [<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294) [<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4) [<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0) [<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4) [<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc) [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78) [<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc) [<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148) [<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c) Exception stack(0x9f6c7fb0 to 0x9f6c7ff8) 7fa0: 00000000 00000000 00000000 00000000 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 handlers: [<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt Disabling IRQ #56 36: 2030566 mscm-ir 79 Edge 400d1000.ethernet 38: 1010437 gpio-vf610 2 Level 400d1000.ethernet-1:00 42: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-g1-atu-prob 44: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-g1-vtu-prob 46: 1010435 mv88e6xxx-g1 7 Edge mv88e6xxx-g2 49: 0 mv88e6xxx-g2 1 Edge mv88e6xxx-1:01 53: 0 mv88e6xxx-g2 5 Edge mv88e6xxx-1:05 54: 0 mv88e6xxx-g2 6 Edge mv88e6xxx-1:06 56: 100000 mv88e6xxx-g2 8 Edge mv88e6xxx-1:08 63: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-watchdog Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit @ 2019-02-17 15:40 ` Russell King - ARM Linux admin 2019-02-17 15:50 ` Heiner Kallweit 2019-02-17 15:45 ` Andrew Lunn 2019-02-17 15:51 ` Andrew Lunn 2 siblings, 1 reply; 32+ messages in thread From: Russell King - ARM Linux admin @ 2019-02-17 15:40 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Andrew Lunn, Florian Fainelli, netdev On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote: > When testing latest linux-next on the ZII DTU I face the issue that no > traffic is flowing over the switch ports, even though in dmesg > everything looks good. Also PHY properly establishes the link. > > With 4.20.10 I don't have the issue and with 5.0-rc6 also not. > However on 5.0-rc6 I got the following, also number of network > interrupts seems to be very high (few minutes after boot). > Any idea what's going on? > > Andrew, IIRC you recently fixed some interrupt-related issue: > 7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts") > But the description doesn't seem to match this trace. I have a fix for the trace you have below, but it has nothing to do with no traffic. I'll send it out shortly. Which protocol are you using (ipv4 or ipv6)? Have you setup a bridge device containing the ports you wish to switch network traffic. Without a bridge device, DSA will by default treat each port as a separate port. The other thing that gets people is the ethernet interface connected to the DSA switch must be up _before_ bringing up any of the switch ports. > > irq 56: nobody cared (try booting with the "irqpoll" option) > CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1 > Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) > [<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14) > [<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0) > [<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294) > [<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4) > [<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0) > [<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4) > [<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc) > [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78) > [<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc) > [<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148) > [<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c) > Exception stack(0x9f6c7fb0 to 0x9f6c7ff8) > 7fa0: 00000000 00000000 00000000 00000000 > 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > handlers: > [<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt > Disabling IRQ #56 > > > 36: 2030566 mscm-ir 79 Edge 400d1000.ethernet > 38: 1010437 gpio-vf610 2 Level 400d1000.ethernet-1:00 > 42: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-g1-atu-prob > 44: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-g1-vtu-prob > 46: 1010435 mv88e6xxx-g1 7 Edge mv88e6xxx-g2 > 49: 0 mv88e6xxx-g2 1 Edge mv88e6xxx-1:01 > 53: 0 mv88e6xxx-g2 5 Edge mv88e6xxx-1:05 > 54: 0 mv88e6xxx-g2 6 Edge mv88e6xxx-1:06 > 56: 100000 mv88e6xxx-g2 8 Edge mv88e6xxx-1:08 > 63: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-watchdog > > Heiner > -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:40 ` Russell King - ARM Linux admin @ 2019-02-17 15:50 ` Heiner Kallweit 2019-02-17 16:40 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-17 15:50 UTC (permalink / raw) To: Russell King - ARM Linux admin; +Cc: Andrew Lunn, Florian Fainelli, netdev On 17.02.2019 16:40, Russell King - ARM Linux admin wrote: > On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote: >> When testing latest linux-next on the ZII DTU I face the issue that no >> traffic is flowing over the switch ports, even though in dmesg >> everything looks good. Also PHY properly establishes the link. >> >> With 4.20.10 I don't have the issue and with 5.0-rc6 also not. >> However on 5.0-rc6 I got the following, also number of network >> interrupts seems to be very high (few minutes after boot). >> Any idea what's going on? >> >> Andrew, IIRC you recently fixed some interrupt-related issue: >> 7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts") >> But the description doesn't seem to match this trace. > > I have a fix for the trace you have below, but it has nothing to do > with no traffic. I'll send it out shortly. > > Which protocol are you using (ipv4 or ipv6)? Have you setup a > bridge device containing the ports you wish to switch network > traffic. Without a bridge device, DSA will by default treat each > port as a separate port. The other thing that gets people is the > ethernet interface connected to the DSA switch must be up _before_ > bringing up any of the switch ports. > ipv4, a simple ping. No bridge. Device is connected to a switch that is always on. Technical environment and userspace is always the same, so it seems to be the kernel version. >> >> irq 56: nobody cared (try booting with the "irqpoll" option) >> CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1 >> Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) >> [<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14) >> [<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0) >> [<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294) >> [<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4) >> [<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0) >> [<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4) >> [<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc) >> [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78) >> [<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc) >> [<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148) >> [<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c) >> Exception stack(0x9f6c7fb0 to 0x9f6c7ff8) >> 7fa0: 00000000 00000000 00000000 00000000 >> 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >> 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 >> handlers: >> [<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt >> Disabling IRQ #56 >> >> >> 36: 2030566 mscm-ir 79 Edge 400d1000.ethernet >> 38: 1010437 gpio-vf610 2 Level 400d1000.ethernet-1:00 >> 42: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-g1-atu-prob >> 44: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-g1-vtu-prob >> 46: 1010435 mv88e6xxx-g1 7 Edge mv88e6xxx-g2 >> 49: 0 mv88e6xxx-g2 1 Edge mv88e6xxx-1:01 >> 53: 0 mv88e6xxx-g2 5 Edge mv88e6xxx-1:05 >> 54: 0 mv88e6xxx-g2 6 Edge mv88e6xxx-1:06 >> 56: 100000 mv88e6xxx-g2 8 Edge mv88e6xxx-1:08 >> 63: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-watchdog >> >> Heiner >> > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:50 ` Heiner Kallweit @ 2019-02-17 16:40 ` Heiner Kallweit 2019-02-17 16:57 ` Andrew Lunn 0 siblings, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-17 16:40 UTC (permalink / raw) To: Russell King - ARM Linux admin; +Cc: Andrew Lunn, Florian Fainelli, netdev On 17.02.2019 16:50, Heiner Kallweit wrote: > On 17.02.2019 16:40, Russell King - ARM Linux admin wrote: >> On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote: >>> When testing latest linux-next on the ZII DTU I face the issue that no >>> traffic is flowing over the switch ports, even though in dmesg >>> everything looks good. Also PHY properly establishes the link. >>> >>> With 4.20.10 I don't have the issue and with 5.0-rc6 also not. >>> However on 5.0-rc6 I got the following, also number of network >>> interrupts seems to be very high (few minutes after boot). >>> Any idea what's going on? >>> >>> Andrew, IIRC you recently fixed some interrupt-related issue: >>> 7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts") >>> But the description doesn't seem to match this trace. >> >> I have a fix for the trace you have below, but it has nothing to do >> with no traffic. I'll send it out shortly. >> >> Which protocol are you using (ipv4 or ipv6)? Have you setup a >> bridge device containing the ports you wish to switch network >> traffic. Without a bridge device, DSA will by default treat each >> port as a separate port. The other thing that gets people is the >> ethernet interface connected to the DSA switch must be up _before_ >> bringing up any of the switch ports. >> > ipv4, a simple ping. No bridge. Device is connected to a switch > that is always on. > > Technical environment and userspace is always the same, so it seems > to be the kernel version. > There haven't been that many changes to mv88e8xxx since 5.0-rc6. I reverted 7c0db24cc431 ("dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit") who looked like a candidate and bingo: network is working again. Obviously something is wrong with this patch. >>> >>> irq 56: nobody cared (try booting with the "irqpoll" option) >>> CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1 >>> Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) >>> [<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14) >>> [<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0) >>> [<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294) >>> [<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4) >>> [<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0) >>> [<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4) >>> [<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc) >>> [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78) >>> [<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc) >>> [<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148) >>> [<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c) >>> Exception stack(0x9f6c7fb0 to 0x9f6c7ff8) >>> 7fa0: 00000000 00000000 00000000 00000000 >>> 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >>> 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 >>> handlers: >>> [<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt >>> Disabling IRQ #56 >>> >>> >>> 36: 2030566 mscm-ir 79 Edge 400d1000.ethernet >>> 38: 1010437 gpio-vf610 2 Level 400d1000.ethernet-1:00 >>> 42: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-g1-atu-prob >>> 44: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-g1-vtu-prob >>> 46: 1010435 mv88e6xxx-g1 7 Edge mv88e6xxx-g2 >>> 49: 0 mv88e6xxx-g2 1 Edge mv88e6xxx-1:01 >>> 53: 0 mv88e6xxx-g2 5 Edge mv88e6xxx-1:05 >>> 54: 0 mv88e6xxx-g2 6 Edge mv88e6xxx-1:06 >>> 56: 100000 mv88e6xxx-g2 8 Edge mv88e6xxx-1:08 >>> 63: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-watchdog >>> >>> Heiner >>> >> > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 16:40 ` Heiner Kallweit @ 2019-02-17 16:57 ` Andrew Lunn 2019-02-17 17:06 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-17 16:57 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev > There haven't been that many changes to mv88e8xxx since 5.0-rc6. > I reverted 7c0db24cc431 ("dsa: mv88e6xxx: Ensure all pending interrupts > are handled prior to exit") who looked like a candidate and bingo: > network is working again. Obviously something is wrong with this patch. O.K. I tested it on an edge interrupt system, but not a level interrupt. I wounder if it is related to that somehow? DTU should be using level interrupts. Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 16:57 ` Andrew Lunn @ 2019-02-17 17:06 ` Heiner Kallweit 2019-02-17 17:10 ` Andrew Lunn 0 siblings, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-17 17:06 UTC (permalink / raw) To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev On 17.02.2019 17:57, Andrew Lunn wrote: >> There haven't been that many changes to mv88e8xxx since 5.0-rc6. >> I reverted 7c0db24cc431 ("dsa: mv88e6xxx: Ensure all pending interrupts >> are handled prior to exit") who looked like a candidate and bingo: >> network is working again. Obviously something is wrong with this patch. > > O.K. I tested it on an edge interrupt system, but not a level > interrupt. I wounder if it is related to that somehow? DTU should be > using level interrupts. > Sorry, I may have been too fast with this statement. With this patch reverted it worked, but now I have a build with this patch still included, and it works too. Need to dig deeper .. > Andrew > Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 17:06 ` Heiner Kallweit @ 2019-02-17 17:10 ` Andrew Lunn 2019-02-18 18:16 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-17 17:10 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev > Sorry, I may have been too fast with this statement. With this patch > reverted it worked, but now I have a build with this patch still included, > and it works too. Need to dig deeper .. Hi Heiner Watch out for boot vs reboot, and when rebooting if port 8 had link or not before you reboot. Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 17:10 ` Andrew Lunn @ 2019-02-18 18:16 ` Heiner Kallweit 2019-02-18 18:21 ` Andrew Lunn 0 siblings, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-18 18:16 UTC (permalink / raw) To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev On 17.02.2019 18:10, Andrew Lunn wrote: >> Sorry, I may have been too fast with this statement. With this patch >> reverted it worked, but now I have a build with this patch still included, >> and it works too. Need to dig deeper .. > > Hi Heiner > > Watch out for boot vs reboot, and when rebooting if port 8 had link or > not before you reboot. > Will do. Is there some known issue or bug? > Andrew > Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-18 18:16 ` Heiner Kallweit @ 2019-02-18 18:21 ` Andrew Lunn 2019-02-23 21:48 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-18 18:21 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev > > Hi Heiner > > > > Watch out for boot vs reboot, and when rebooting if port 8 had link or > > not before you reboot. > > > Will do. Is there some known issue or bug? Hi Heiner No, but it is a variable which can make a difference. The fix i made for the Freescale GPIO controller was not an issue for cold boot, but reboot with link up did cause interrupt problems, etc. Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-18 18:21 ` Andrew Lunn @ 2019-02-23 21:48 ` Heiner Kallweit 2019-02-23 23:42 ` Andrew Lunn 2019-02-23 23:55 ` Florian Fainelli 0 siblings, 2 replies; 32+ messages in thread From: Heiner Kallweit @ 2019-02-23 21:48 UTC (permalink / raw) To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev On 18.02.2019 19:21, Andrew Lunn wrote: >>> Hi Heiner >>> >>> Watch out for boot vs reboot, and when rebooting if port 8 had link or >>> not before you reboot. >>> >> Will do. Is there some known issue or bug? > > Hi Heiner > > No, but it is a variable which can make a difference. The fix i made > for the Freescale GPIO controller was not an issue for cold boot, but > reboot with link up did cause interrupt problems, etc. > Hi Andrew, it took me quite some time to debug this issue .. At first a bisect pointed to one of my commits: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") Further digging lead me to some suspicious dsa code: In dsa_port_fixed_link_register_of() there's a call to genphy_read_status(). At the time of the call phydev->advertising is empty, therefore the fixed phy settings are overwritten with defaults (10/half) what breaks the system. Worth to be mentioned is that for the PHY these two flags are set: - is_pseudo_fixed_link (that's ok) - autoneg -> I'm not sure this is correct. It seems that you once added the code in question: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex") I did what I like to do most and removed some code. W/o the calls to genphy_config_init() and genphy_read_status() it works again. Do these calls have some purpose here with a fixed link? My commit exposed the issue because before it genphy_read_status() read the advertisement from chip registers instead of using phydev->advertising. Very close to this function is dsa_port_setup_phy_of() which uses genphy_resume() and genphy_read_status() and also looks somewhat suspicious. This code makes quite some assumptions: - PHY is a C22 PHY - PHY is compatible with the generic PHY driver > Andrew > Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-23 21:48 ` Heiner Kallweit @ 2019-02-23 23:42 ` Andrew Lunn 2019-02-24 9:10 ` Heiner Kallweit 2019-02-24 15:31 ` Russell King - ARM Linux admin 2019-02-23 23:55 ` Florian Fainelli 1 sibling, 2 replies; 32+ messages in thread From: Andrew Lunn @ 2019-02-23 23:42 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev > it took me quite some time to debug this issue .. > > At first a bisect pointed to one of my commits: > 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") > > Further digging lead me to some suspicious dsa code: > In dsa_port_fixed_link_register_of() there's a call to genphy_read_status(). > At the time of the call phydev->advertising is empty, therefore the fixed phy > settings are overwritten with defaults (10/half) what breaks the system. > > Worth to be mentioned is that for the PHY these two flags are set: > - is_pseudo_fixed_link (that's ok) > - autoneg -> I'm not sure this is correct. > > It seems that you once added the code in question: > 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex") Hi Heiner For Ethernet switches, you can have device trees like this: switch0: switch@0 { compatible = "marvell,mv88e6085"; pinctrl-0 = <&pinctrl_gpio_switch0>; pinctrl-names = "default"; reg = <0>; dsa,member = <0 0>; interrupt-parent = <&gpio0>; interrupts = <27 IRQ_TYPE_LEVEL_LOW>; interrupt-controller; #interrupt-cells = <2>; eeprom-length = <512>; ports { #address-cells = <1>; #size-cells = <0>; port@0 { reg = <0>; label = "lan0"; phy-handle = <&switch0phy0>; }; ... switch0port5: port@5 { reg = <5>; label = "dsa"; phy-mode = "rgmii-txid"; link = <&switch1port6 &switch2port9>; fixed-link { speed = <1000>; full-duplex; }; }; This is a DSA port. It is used to connect two Ethernet switches together. In this case, the MACs are connected back to back. There are no PHYs involved. These ports don't have a netdev associated with them, since they are just pipes between switches, not user interfaces. If i remember correctly, the code you are looking at was to deal with the "rgmii-txid". Without the TXID, i could not get frames to pass between the ports. There is a second use case as well. The Vybrid FEC ethernet controller is a Fast Ethernet port. The switch ports are 1G. DSA drivers set the port connecting to the SoC to its highest speed. Again, there is no netdev associated to the switch port, and generally the MAC ports are connected back to back. This 100 vs 1000 does not work for the Vybrid. So we add a fixed PHY to the CPU port. port@6 { reg = <6>; label = "cpu"; ethernet = <&fec1>; fixed-link { speed = <100>; full-duplex; }; }; And there is a third use case: port@4 { reg = <4>; label = "optical4"; fixed-link { speed = <1000>; full-duplex; link-gpios = <&gpio6 3 GPIO_ACTIVE_HIGH>; }; }; We have a GPIO which tells us if the link it up, because there is not a PHY here, but an optical module. The GPIO tells us about loss of signal. Unfortunately we cannot use the SFP driver here, because of a hardware issue. In all cases end up in the same code. We want to tell the MAC to configure RGMII delays, or to configure the port speed, or to have the correct initial link state. The adjust_link() call can do this, if passed a phydev. And we have a phydev for the fixed-link PHY. However, since it has never been attached to a netdev, phy_start() called, etc, the state information is maybe not correct. So we call config_init() and read_status(), so phydev should reflect the state of the fixed-link PHY. And a fixed-link PHY is always c22, and it can be driven by genphy. Take a look at drivers/net/phy/swphy.c which is the core of the simulation, and fixed_phy.c > I did what I like to do most and removed some code. > W/o the calls to genphy_config_init() and genphy_read_status() it works again. > Do these calls have some purpose here with a fixed link? Maybe with all the core changes, these calls are no longer needed? We just need to ensure the state in phydev reflects the state of the fixed link, i.e. in this case 100 Full. Looking forward, at some point we are going to have to make fixed-link support higher speeds. That probably means we need a swphy-c45 which emulates the standard registers for 2.5G, 5G and 10G. At that point genphy will not work... Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-23 23:42 ` Andrew Lunn @ 2019-02-24 9:10 ` Heiner Kallweit 2019-02-24 15:04 ` Andrew Lunn 2019-02-24 15:31 ` Russell King - ARM Linux admin 1 sibling, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-24 9:10 UTC (permalink / raw) To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev On 24.02.2019 00:42, Andrew Lunn wrote: >> it took me quite some time to debug this issue .. >> >> At first a bisect pointed to one of my commits: >> 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") >> >> Further digging lead me to some suspicious dsa code: >> In dsa_port_fixed_link_register_of() there's a call to genphy_read_status(). >> At the time of the call phydev->advertising is empty, therefore the fixed phy >> settings are overwritten with defaults (10/half) what breaks the system. >> >> Worth to be mentioned is that for the PHY these two flags are set: >> - is_pseudo_fixed_link (that's ok) >> - autoneg -> I'm not sure this is correct. >> >> It seems that you once added the code in question: >> 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex") > > Hi Heiner > > For Ethernet switches, you can have device trees like this: > > switch0: switch@0 { > compatible = "marvell,mv88e6085"; > pinctrl-0 = <&pinctrl_gpio_switch0>; > pinctrl-names = "default"; > reg = <0>; > dsa,member = <0 0>; > interrupt-parent = <&gpio0>; > interrupts = <27 IRQ_TYPE_LEVEL_LOW>; > interrupt-controller; > #interrupt-cells = <2>; > eeprom-length = <512>; > > ports { > #address-cells = <1>; > #size-cells = <0>; > > port@0 { > reg = <0>; > label = "lan0"; > phy-handle = <&switch0phy0>; > }; > > ... > switch0port5: port@5 { > reg = <5>; > label = "dsa"; > phy-mode = "rgmii-txid"; > link = <&switch1port6 > &switch2port9>; > fixed-link { > speed = <1000>; > full-duplex; > }; > }; > > This is a DSA port. It is used to connect two Ethernet switches > together. In this case, the MACs are connected back to back. There are > no PHYs involved. These ports don't have a netdev associated with > them, since they are just pipes between switches, not user interfaces. > > If i remember correctly, the code you are looking at was to deal with > the "rgmii-txid". Without the TXID, i could not get frames to pass > between the ports. > > There is a second use case as well. The Vybrid FEC ethernet controller > is a Fast Ethernet port. The switch ports are 1G. DSA drivers set the > port connecting to the SoC to its highest speed. Again, there is no > netdev associated to the switch port, and generally the MAC ports are > connected back to back. This 100 vs 1000 does not work for the > Vybrid. So we add a fixed PHY to the CPU port. > > port@6 { > reg = <6>; > label = "cpu"; > ethernet = <&fec1>; > > fixed-link { > speed = <100>; > full-duplex; > }; > }; > > And there is a third use case: > > port@4 { > reg = <4>; > label = "optical4"; > > fixed-link { > speed = <1000>; > full-duplex; > link-gpios = <&gpio6 3 > GPIO_ACTIVE_HIGH>; > }; > }; > > We have a GPIO which tells us if the link it up, because there is not > a PHY here, but an optical module. The GPIO tells us about loss of > signal. Unfortunately we cannot use the SFP driver here, because of a > hardware issue. > > In all cases end up in the same code. We want to tell the MAC to > configure RGMII delays, or to configure the port speed, or to have the > correct initial link state. The adjust_link() call can do this, if > passed a phydev. And we have a phydev for the fixed-link PHY. However, > since it has never been attached to a netdev, phy_start() called, etc, > the state information is maybe not correct. So we call config_init() > and read_status(), so phydev should reflect the state of the > fixed-link PHY. And a fixed-link PHY is always c22, and it can be > driven by genphy. Take a look at drivers/net/phy/swphy.c which is the > core of the simulation, and fixed_phy.c > Thanks a lot for the very comprehensive explanation! I think what's not correct is that phydev->autoneg is set (by phy_device_create) for a fixed link. genphy_config_init() ensures that aneg isn't set in the "supported" bitmap, but it doesn't care about phydev->autoneg. IMO we need to consider this at few places: - genphy_config_init It should clear phydev->autoneg if aneg isn't supported. - phy_probe After having read the chip features we should clear phydev->autoneg if aneg isn't supported. - In __fixed_phy_register() we should clear phydev->autoneg. >> I did what I like to do most and removed some code. >> W/o the calls to genphy_config_init() and genphy_read_status() it works again. >> Do these calls have some purpose here with a fixed link? > > Maybe with all the core changes, these calls are no longer needed? We > just need to ensure the state in phydev reflects the state of the > fixed link, i.e. in this case 100 Full. > I think at least the call to genphy_config_init can be removed. I don't really like the name of this function anyway, because it doesn't initialize anything but is a sanity check that reads chip capabilities and removes unsupported modes from the "supported" bitmap. genphy_read_status() we need to read the link status from GPIO (if applicable in the use case). To be more precise, most likely calling genphy_update_link() would be sufficient because speed and duplex are fixed. > Looking forward, at some point we are going to have to make fixed-link > support higher speeds. That probably means we need a swphy-c45 which > emulates the standard registers for 2.5G, 5G and 10G. At that point > genphy will not work... > > Andrew > . > Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 9:10 ` Heiner Kallweit @ 2019-02-24 15:04 ` Andrew Lunn 2019-02-24 15:15 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-24 15:04 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev > I think what's not correct is that phydev->autoneg is set > (by phy_device_create) for a fixed link. Fixed-link tries to emulate auto-neg: bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE; Maybe it needs better emulation of auto-neg? Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 15:04 ` Andrew Lunn @ 2019-02-24 15:15 ` Russell King - ARM Linux admin 2019-02-24 15:28 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Russell King - ARM Linux admin @ 2019-02-24 15:15 UTC (permalink / raw) To: Andrew Lunn; +Cc: Heiner Kallweit, Florian Fainelli, netdev On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote: > > I think what's not correct is that phydev->autoneg is set > > (by phy_device_create) for a fixed link. > > Fixed-link tries to emulate auto-neg: > > bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE; > > Maybe it needs better emulation of auto-neg? Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3 (BMSR_ANEGCAPABLE). In any case, 0.12 (BMCR_ANENABLE) is not set, so according to 802.3-2015, we should not be setting 1.5 (BMSR_ANEGCOMPLETE). However, swphy does try to emulate autonegotiation - we do have cases where swphy is used in situations where the speed and duplex are not fixed. It returns an emulated link partner advertisement for the current speed, which would suggest that we should set BMCR_ANENABLE. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 15:15 ` Russell King - ARM Linux admin @ 2019-02-24 15:28 ` Heiner Kallweit 2019-02-24 15:34 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-24 15:28 UTC (permalink / raw) To: Russell King - ARM Linux admin, Andrew Lunn; +Cc: Florian Fainelli, netdev On 24.02.2019 16:15, Russell King - ARM Linux admin wrote: > On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote: >>> I think what's not correct is that phydev->autoneg is set >>> (by phy_device_create) for a fixed link. >> >> Fixed-link tries to emulate auto-neg: >> >> bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE; >> >> Maybe it needs better emulation of auto-neg? > > Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3 > (BMSR_ANEGCAPABLE). In any case, 0.12 (BMCR_ANENABLE) is not set, > so according to 802.3-2015, we should not be setting 1.5 > (BMSR_ANEGCOMPLETE). > > However, swphy does try to emulate autonegotiation - we do have cases > where swphy is used in situations where the speed and duplex are not > fixed. It returns an emulated link partner advertisement for the > current speed, which would suggest that we should set BMCR_ANENABLE. > If we emulate auto-neg, then it's not needed to set the speed bits in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT don't support forced mode. So we may have to go with auto-neg. To avoid the original issue it should be sufficient to copy supported -> advertising at a suited place. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 15:28 ` Heiner Kallweit @ 2019-02-24 15:34 ` Russell King - ARM Linux admin 2019-02-24 15:39 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Russell King - ARM Linux admin @ 2019-02-24 15:34 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Andrew Lunn, Florian Fainelli, netdev On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote: > On 24.02.2019 16:15, Russell King - ARM Linux admin wrote: > > On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote: > >>> I think what's not correct is that phydev->autoneg is set > >>> (by phy_device_create) for a fixed link. > >> > >> Fixed-link tries to emulate auto-neg: > >> > >> bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE; > >> > >> Maybe it needs better emulation of auto-neg? > > > > Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3 > > (BMSR_ANEGCAPABLE). In any case, 0.12 (BMCR_ANENABLE) is not set, > > so according to 802.3-2015, we should not be setting 1.5 > > (BMSR_ANEGCOMPLETE). > > > > However, swphy does try to emulate autonegotiation - we do have cases > > where swphy is used in situations where the speed and duplex are not > > fixed. It returns an emulated link partner advertisement for the > > current speed, which would suggest that we should set BMCR_ANENABLE. > > > If we emulate auto-neg, then it's not needed to set the speed bits > in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT > don't support forced mode. So we may have to go with auto-neg. Sure. > To avoid the original issue it should be sufficient to copy > supported -> advertising at a suited place. Why bother - the software PHY emulation is an emulation to allow existing userspace that pre-dates the ethtool API to get some link parameters. If we augment the PHY emulation in non-standard ways, userspace will need to be updated to handle those non-standard ways. If userspace needs to be updated, why not just bite the bullet and update to ethtool APIs rather than adding more complication through an emulation layer? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 15:34 ` Russell King - ARM Linux admin @ 2019-02-24 15:39 ` Heiner Kallweit 2019-02-24 15:49 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-24 15:39 UTC (permalink / raw) To: Russell King - ARM Linux admin; +Cc: Andrew Lunn, Florian Fainelli, netdev On 24.02.2019 16:34, Russell King - ARM Linux admin wrote: > On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote: >> On 24.02.2019 16:15, Russell King - ARM Linux admin wrote: >>> On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote: >>>>> I think what's not correct is that phydev->autoneg is set >>>>> (by phy_device_create) for a fixed link. >>>> >>>> Fixed-link tries to emulate auto-neg: >>>> >>>> bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE; >>>> >>>> Maybe it needs better emulation of auto-neg? >>> >>> Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3 >>> (BMSR_ANEGCAPABLE). In any case, 0.12 (BMCR_ANENABLE) is not set, >>> so according to 802.3-2015, we should not be setting 1.5 >>> (BMSR_ANEGCOMPLETE). >>> >>> However, swphy does try to emulate autonegotiation - we do have cases >>> where swphy is used in situations where the speed and duplex are not >>> fixed. It returns an emulated link partner advertisement for the >>> current speed, which would suggest that we should set BMCR_ANENABLE. >>> >> If we emulate auto-neg, then it's not needed to set the speed bits >> in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT >> don't support forced mode. So we may have to go with auto-neg. > > Sure. > >> To avoid the original issue it should be sufficient to copy >> supported -> advertising at a suited place. > Sorry, seems this wasn't clear enough. I don't mean to change swphy but the user side. > Why bother - the software PHY emulation is an emulation to allow > existing userspace that pre-dates the ethtool API to get some link > parameters. If we augment the PHY emulation in non-standard ways, > userspace will need to be updated to handle those non-standard > ways. If userspace needs to be updated, why not just bite the > bullet and update to ethtool APIs rather than adding more > complication through an emulation layer? > It's not only userspace. Based on my limited knowledge of DSA this code also uses e.g. genphy_read_status() with a fixed link. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 15:39 ` Heiner Kallweit @ 2019-02-24 15:49 ` Russell King - ARM Linux admin 2019-02-24 16:32 ` Florian Fainelli 0 siblings, 1 reply; 32+ messages in thread From: Russell King - ARM Linux admin @ 2019-02-24 15:49 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Andrew Lunn, Florian Fainelli, netdev On Sun, Feb 24, 2019 at 04:39:30PM +0100, Heiner Kallweit wrote: > On 24.02.2019 16:34, Russell King - ARM Linux admin wrote: > > On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote: > >> On 24.02.2019 16:15, Russell King - ARM Linux admin wrote: > >>> On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote: > >>>>> I think what's not correct is that phydev->autoneg is set > >>>>> (by phy_device_create) for a fixed link. > >>>> > >>>> Fixed-link tries to emulate auto-neg: > >>>> > >>>> bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE; > >>>> > >>>> Maybe it needs better emulation of auto-neg? > >>> > >>> Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3 > >>> (BMSR_ANEGCAPABLE). In any case, 0.12 (BMCR_ANENABLE) is not set, > >>> so according to 802.3-2015, we should not be setting 1.5 > >>> (BMSR_ANEGCOMPLETE). > >>> > >>> However, swphy does try to emulate autonegotiation - we do have cases > >>> where swphy is used in situations where the speed and duplex are not > >>> fixed. It returns an emulated link partner advertisement for the > >>> current speed, which would suggest that we should set BMCR_ANENABLE. > >>> > >> If we emulate auto-neg, then it's not needed to set the speed bits > >> in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT > >> don't support forced mode. So we may have to go with auto-neg. > > > > Sure. > > > >> To avoid the original issue it should be sufficient to copy > >> supported -> advertising at a suited place. > > > Sorry, seems this wasn't clear enough. I don't mean to change > swphy but the user side. > > > Why bother - the software PHY emulation is an emulation to allow > > existing userspace that pre-dates the ethtool API to get some link > > parameters. If we augment the PHY emulation in non-standard ways, > > userspace will need to be updated to handle those non-standard > > ways. If userspace needs to be updated, why not just bite the > > bullet and update to ethtool APIs rather than adding more > > complication through an emulation layer? > > > It's not only userspace. Based on my limited knowledge of DSA this > code also uses e.g. genphy_read_status() with a fixed link. DSA has support for phylink, which is perfectly capable of supporting fixed links without using fixed-phy.c, although we have no way to create that without DT. Support could be added for non-DT though. That would avoid using phylib functions to read back from a fixed-link PHY. Since phylink presents to the MAC a fixed link in the same abstract manner as a real PHY, it should result in a more elegant implementation. DSA already has phylink support to support SFPs. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 15:49 ` Russell King - ARM Linux admin @ 2019-02-24 16:32 ` Florian Fainelli 2019-02-24 17:04 ` Andrew Lunn 0 siblings, 1 reply; 32+ messages in thread From: Florian Fainelli @ 2019-02-24 16:32 UTC (permalink / raw) To: Russell King - ARM Linux admin, Heiner Kallweit; +Cc: Andrew Lunn, netdev Le 2/24/19 à 7:49 AM, Russell King - ARM Linux admin a écrit : > On Sun, Feb 24, 2019 at 04:39:30PM +0100, Heiner Kallweit wrote: >> On 24.02.2019 16:34, Russell King - ARM Linux admin wrote: >>> On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote: >>>> On 24.02.2019 16:15, Russell King - ARM Linux admin wrote: >>>>> On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote: >>>>>>> I think what's not correct is that phydev->autoneg is set >>>>>>> (by phy_device_create) for a fixed link. >>>>>> >>>>>> Fixed-link tries to emulate auto-neg: >>>>>> >>>>>> bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE; >>>>>> >>>>>> Maybe it needs better emulation of auto-neg? >>>>> >>>>> Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3 >>>>> (BMSR_ANEGCAPABLE). In any case, 0.12 (BMCR_ANENABLE) is not set, >>>>> so according to 802.3-2015, we should not be setting 1.5 >>>>> (BMSR_ANEGCOMPLETE). >>>>> >>>>> However, swphy does try to emulate autonegotiation - we do have cases >>>>> where swphy is used in situations where the speed and duplex are not >>>>> fixed. It returns an emulated link partner advertisement for the >>>>> current speed, which would suggest that we should set BMCR_ANENABLE. >>>>> >>>> If we emulate auto-neg, then it's not needed to set the speed bits >>>> in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT >>>> don't support forced mode. So we may have to go with auto-neg. >>> >>> Sure. >>> >>>> To avoid the original issue it should be sufficient to copy >>>> supported -> advertising at a suited place. >>> >> Sorry, seems this wasn't clear enough. I don't mean to change >> swphy but the user side. >> >>> Why bother - the software PHY emulation is an emulation to allow >>> existing userspace that pre-dates the ethtool API to get some link >>> parameters. If we augment the PHY emulation in non-standard ways, >>> userspace will need to be updated to handle those non-standard >>> ways. If userspace needs to be updated, why not just bite the >>> bullet and update to ethtool APIs rather than adding more >>> complication through an emulation layer? >>> >> It's not only userspace. Based on my limited knowledge of DSA this >> code also uses e.g. genphy_read_status() with a fixed link. > > DSA has support for phylink, which is perfectly capable of supporting > fixed links without using fixed-phy.c, although we have no way to > create that without DT. Support could be added for non-DT though. I had a branch at some point that provided feature parity with what DT registration is capable of doing, including representing links between switches etc [1]. I am not entirely sure it makes to support such a configuration for platform devices, since they are not so many these days that would not use DT (except x86 maybe?). It would make sense to support ACPI-based registration for PHYLINK, if there was a standard for describing PHYs, SFP/SFFs which is not apparently the case. [1]: https://github.com/ffainelli/linux/commits/dsa-lbk-pdata2 > > That would avoid using phylib functions to read back from a fixed-link > PHY. Since phylink presents to the MAC a fixed link in the same > abstract manner as a real PHY, it should result in a more elegant > implementation. > > DSA already has phylink support to support SFPs. > The added difficulty here and the reason why Andrew went with the approach that is used by the code currently is because neither do the CPU or DSA ports are backed by a net_device. It is somewhere on my TODO to permit the use of PHYLINK without the need of a net_device to cover those specific DSA cases unless we just brute force the whole thing and allocate a net_device structure but not register that net_device? Yes in fact, why don't we do that? -- Florian ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 16:32 ` Florian Fainelli @ 2019-02-24 17:04 ` Andrew Lunn 2019-02-24 21:26 ` Florian Fainelli 0 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-24 17:04 UTC (permalink / raw) To: Florian Fainelli; +Cc: Russell King - ARM Linux admin, Heiner Kallweit, netdev > The added difficulty here and the reason why Andrew went with the > approach that is used by the code currently is because neither do the > CPU or DSA ports are backed by a net_device. It is somewhere on my TODO > to permit the use of PHYLINK without the need of a net_device to cover > those specific DSA cases unless we just brute force the whole thing and > allocate a net_device structure but not register that net_device? Yes in > fact, why don't we do that? Hi Florian At the moment, we are using a phydev which is not connected to a MAC. That is rather odd, but the phylib maintainers mostly know about this, and keep an eye out for changes which might break any assumptions. And the phylib API is quite small. How many assumptions are going to break with a netdev which is not registered? The API is much bigger, more people hack on it, and it is going to be much harder to review changes to make sure assumptions are not changed. If we are going to do something odd, we should keep the scope as small as possible. Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 17:04 ` Andrew Lunn @ 2019-02-24 21:26 ` Florian Fainelli 2019-02-24 21:42 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Florian Fainelli @ 2019-02-24 21:26 UTC (permalink / raw) To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Heiner Kallweit, netdev On February 24, 2019 9:04:55 AM PST, Andrew Lunn <andrew@lunn.ch> wrote: >> The added difficulty here and the reason why Andrew went with the >> approach that is used by the code currently is because neither do the >> CPU or DSA ports are backed by a net_device. It is somewhere on my >TODO >> to permit the use of PHYLINK without the need of a net_device to >cover >> those specific DSA cases unless we just brute force the whole thing >and >> allocate a net_device structure but not register that net_device? Yes >in >> fact, why don't we do that? > >Hi Florian > >At the moment, we are using a phydev which is not connected to a >MAC. That is rather odd, but the phylib maintainers mostly know about >this, and keep an eye out for changes which might break any >assumptions. And the phylib API is quite small. I would argue that this very thread is a proof against your argument since we all failed to predict that Heiner's changes would change those assumptions. Having a certain of assumptions is fine but given all the recent PHYLIB helpers that have been added I am not sure how well that will scale. > >How many assumptions are going to break with a netdev which is not >registered? The API is much bigger, more people hack on it, and it is >going to be much harder to review changes to make sure assumptions are >not changed. A non registered net_device appears easier to manage and debug since there is state tracking all over the network stack for those cases. > >If we are going to do something odd, we should keep the scope as small >as possible. Hence my suggestion to allocate a dummy net_device object just so calls to netif_carrier_{on,off} (and possibly more in the future) do nothing. I don't think that teaching either PHYLIB or PHYLINK about a NULL net_device is going to scale really well over time nor make it easier for respective maintainers. If we make the net_device optional, it will be harder to review changes as well as make sure that we do not create locking/object interactions issues. Another approach could be to define a minimal network port object (struct devlink, maybe?) which could be used independently from a net_device, or a lightweight net_device with no visibility into existing namespaces. None of these ideas are new though and would probably require several cycles to get done right. Heiner, Russell, which approach would you take? -- Florian ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 21:26 ` Florian Fainelli @ 2019-02-24 21:42 ` Heiner Kallweit 0 siblings, 0 replies; 32+ messages in thread From: Heiner Kallweit @ 2019-02-24 21:42 UTC (permalink / raw) To: Florian Fainelli, Andrew Lunn; +Cc: Russell King - ARM Linux admin, netdev On 24.02.2019 22:26, Florian Fainelli wrote: > > > On February 24, 2019 9:04:55 AM PST, Andrew Lunn <andrew@lunn.ch> wrote: >>> The added difficulty here and the reason why Andrew went with the >>> approach that is used by the code currently is because neither do the >>> CPU or DSA ports are backed by a net_device. It is somewhere on my >> TODO >>> to permit the use of PHYLINK without the need of a net_device to >> cover >>> those specific DSA cases unless we just brute force the whole thing >> and >>> allocate a net_device structure but not register that net_device? Yes >> in >>> fact, why don't we do that? >> >> Hi Florian >> >> At the moment, we are using a phydev which is not connected to a >> MAC. That is rather odd, but the phylib maintainers mostly know about >> this, and keep an eye out for changes which might break any >> assumptions. And the phylib API is quite small. > > I would argue that this very thread is a proof against your argument since we all failed to predict that Heiner's changes would change those assumptions. Having a certain of assumptions is fine but given all the recent PHYLIB helpers that have been added I am not sure how well that will scale. > >> >> How many assumptions are going to break with a netdev which is not >> registered? The API is much bigger, more people hack on it, and it is >> going to be much harder to review changes to make sure assumptions are >> not changed. > > A non registered net_device appears easier to manage and debug since there is state tracking all over the network stack for those cases. > >> >> If we are going to do something odd, we should keep the scope as small >> as possible. > > Hence my suggestion to allocate a dummy net_device object just so calls to netif_carrier_{on,off} (and possibly more in the future) do nothing. I don't think that teaching either PHYLIB or PHYLINK about a NULL net_device is going to scale really well over time nor make it easier for respective maintainers. If we make the net_device optional, it will be harder to review changes as well as make sure that we do not create locking/object interactions issues. > > Another approach could be to define a minimal network port object (struct devlink, maybe?) which could be used independently from a net_device, or a lightweight net_device with no visibility into existing namespaces. None of these ideas are new though and would probably require several cycles to get done right. > > Heiner, Russell, which approach would you take? > Given my 2 days of experience with DSA (feels like 3 already) I would like to spend few more minutes on thinking before I answer. Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-23 23:42 ` Andrew Lunn 2019-02-24 9:10 ` Heiner Kallweit @ 2019-02-24 15:31 ` Russell King - ARM Linux admin 2019-02-24 17:28 ` Andrew Lunn 1 sibling, 1 reply; 32+ messages in thread From: Russell King - ARM Linux admin @ 2019-02-24 15:31 UTC (permalink / raw) To: Andrew Lunn; +Cc: Heiner Kallweit, Florian Fainelli, netdev On Sun, Feb 24, 2019 at 12:42:35AM +0100, Andrew Lunn wrote: > Looking forward, at some point we are going to have to make fixed-link > support higher speeds. That probably means we need a swphy-c45 which > emulates the standard registers for 2.5G, 5G and 10G. At that point > genphy will not work... Do we _need_ to emulate Clause 45 PHYs? Today, the MII interface does not work with Clause 45 PHYs, so there is no userspace accessing these PHYs out there. I have a patch that adds support for the MII ioctls which gives me the ability to poke around in the 88x3310 for investigatory / debug purposes. However, I don't see the point of adding what would be very complex Clause 45 support (we'd have to emulate the PMA/PMD, PCS and AN as a minimum) when we have the ethtool API, and then we have the issue that not all advertisement modes are standardised in Clause 45 - there are some missing. As I understand it, the swphy for fixed-links only exists because we have existing tooling that expects Clause 22 PHYs to exist. This will break even if we were to add Clause 45 support as many bits in the first 16 registers have different meanings. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 15:31 ` Russell King - ARM Linux admin @ 2019-02-24 17:28 ` Andrew Lunn 2019-02-24 19:41 ` Russell King - ARM Linux admin 0 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-24 17:28 UTC (permalink / raw) To: Russell King - ARM Linux admin; +Cc: Heiner Kallweit, Florian Fainelli, netdev On Sun, Feb 24, 2019 at 03:31:26PM +0000, Russell King - ARM Linux admin wrote: > On Sun, Feb 24, 2019 at 12:42:35AM +0100, Andrew Lunn wrote: > > Looking forward, at some point we are going to have to make fixed-link > > support higher speeds. That probably means we need a swphy-c45 which > > emulates the standard registers for 2.5G, 5G and 10G. At that point > > genphy will not work... > > Do we _need_ to emulate Clause 45 PHYs? Hi Russell One use case would be a mv88e6390X port 9 or 10 connected to a SoC which can only do 2.5G. We have defined that DSA drivers should configure CPU and DSA ports to their maximum speed. So if port 9 or 10 is used, it should be configured to 10G. We then need some way to reconfigure the MAC to a slower speed. We need to do this with ZII boards. We do this with: port@0 { reg = <0>; label = "cpu"; ethernet = <&fec1>; fixed-link { speed = <100>; full-duplex; }; }; The FEC can only do 100Mbs, but the switch defaults to 1G. So the fixed link it used to tell the switch MAC to use 100/Full. In the example of the 6390X, we would want to set the link speed to 2500, which we cannot do at the moment. Either we need fixed-link to support higher speeds, or we need a different mechanism. We also have a similar issue on the SoC side. The FEC has no PHY connected to it. It needs to be told what speed to do: &fec1 { phy-mode = "rmii"; pinctrl-names = "default"; pinctrl-0 = <&pinctrl_fec1>; status = "okay"; fixed-link { speed = <100>; full-duplex; }; In the case of a SoC with an interface which can do 2.5G, you need to tell it to do 2.5G. Ideally we want a mechanism that allows a MAC to 'see' a PHY operating at 2.5G using the standard phylib/phylink API. In the past this has been achieved with an emulated PHY. But so long as the phydev/phylink structure has the correct values, it does not matter how they get those values. Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-24 17:28 ` Andrew Lunn @ 2019-02-24 19:41 ` Russell King - ARM Linux admin 0 siblings, 0 replies; 32+ messages in thread From: Russell King - ARM Linux admin @ 2019-02-24 19:41 UTC (permalink / raw) To: Andrew Lunn; +Cc: Heiner Kallweit, Florian Fainelli, netdev On Sun, Feb 24, 2019 at 06:28:48PM +0100, Andrew Lunn wrote: > On Sun, Feb 24, 2019 at 03:31:26PM +0000, Russell King - ARM Linux admin wrote: > > On Sun, Feb 24, 2019 at 12:42:35AM +0100, Andrew Lunn wrote: > > > Looking forward, at some point we are going to have to make fixed-link > > > support higher speeds. That probably means we need a swphy-c45 which > > > emulates the standard registers for 2.5G, 5G and 10G. At that point > > > genphy will not work... > > > > Do we _need_ to emulate Clause 45 PHYs? > > Hi Russell > > One use case would be a mv88e6390X port 9 or 10 connected to a SoC > which can only do 2.5G. > > We have defined that DSA drivers should configure CPU and DSA ports to > their maximum speed. So if port 9 or 10 is used, it should be > configured to 10G. > > We then need some way to reconfigure the MAC to a slower speed. > > We need to do this with ZII boards. We do this with: > > port@0 { > reg = <0>; > label = "cpu"; > ethernet = <&fec1>; > > fixed-link { > speed = <100>; > full-duplex; > }; > }; > > The FEC can only do 100Mbs, but the switch defaults to 1G. So the > fixed link it used to tell the switch MAC to use 100/Full. > > In the example of the 6390X, we would want to set the link speed to > 2500, which we cannot do at the moment. Either we need fixed-link to > support higher speeds, or we need a different mechanism. > > We also have a similar issue on the SoC side. The FEC has no PHY > connected to it. It needs to be told what speed to do: > > &fec1 { > phy-mode = "rmii"; > pinctrl-names = "default"; > pinctrl-0 = <&pinctrl_fec1>; > status = "okay"; > > fixed-link { > speed = <100>; > full-duplex; > }; > > In the case of a SoC with an interface which can do 2.5G, you need to > tell it to do 2.5G. Ideally we want a mechanism that allows a MAC to > 'see' a PHY operating at 2.5G using the standard phylib/phylink API. > > In the past this has been achieved with an emulated PHY. But so long > as the phydev/phylink structure has the correct values, it does not > matter how they get those values. You'll get a mac_config() with the mode set to MLO_AN_FIXED. state->speed set to whatever was set in the "speed =" and state->duplex set according to the duplex property. If pause modes are also defined, you'll get those through state->pause MLO_PAUSE_SYM and MLO_PAUSE_ASYM, but there won't be any resolution of those modes since a fixed link doesn't know the other ends properties, and of course AN is disabled. Apart from the pause modes, the information passed to the MAC for a fixed link is exactly the same as if it came from a PHY in phylib, which basically means the MAC side doesn't actually need to care whether it has a PHY or fixed-link - but without the complication of needing phylib to provide an emulated PHY to the network driver. phylink does provide an emulated PHY for the mii ioctls since that is an established interface for speeds <= 1G. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-23 21:48 ` Heiner Kallweit 2019-02-23 23:42 ` Andrew Lunn @ 2019-02-23 23:55 ` Florian Fainelli 1 sibling, 0 replies; 32+ messages in thread From: Florian Fainelli @ 2019-02-23 23:55 UTC (permalink / raw) To: Heiner Kallweit, Andrew Lunn; +Cc: Russell King - ARM Linux admin, netdev Le 2/23/19 à 1:48 PM, Heiner Kallweit a écrit : > On 18.02.2019 19:21, Andrew Lunn wrote: >>>> Hi Heiner >>>> >>>> Watch out for boot vs reboot, and when rebooting if port 8 had link or >>>> not before you reboot. >>>> >>> Will do. Is there some known issue or bug? >> >> Hi Heiner >> >> No, but it is a variable which can make a difference. The fix i made >> for the Freescale GPIO controller was not an issue for cold boot, but >> reboot with link up did cause interrupt problems, etc. >> > Hi Andrew, > > it took me quite some time to debug this issue .. > > At first a bisect pointed to one of my commits: > 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status") This is the same issue that Michal reported here: https://www.spinics.net/lists/netdev/msg552955.html > > Further digging lead me to some suspicious dsa code: > In dsa_port_fixed_link_register_of() there's a call to genphy_read_status(). > At the time of the call phydev->advertising is empty, therefore the fixed phy > settings are overwritten with defaults (10/half) what breaks the system. > > Worth to be mentioned is that for the PHY these two flags are set: > - is_pseudo_fixed_link (that's ok) > - autoneg -> I'm not sure this is correct. > > It seems that you once added the code in question: > 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex") > > I did what I like to do most and removed some code. > W/o the calls to genphy_config_init() and genphy_read_status() it works again. > Do these calls have some purpose here with a fixed link? > > My commit exposed the issue because before it genphy_read_status() read the > advertisement from chip registers instead of using phydev->advertising. > > Very close to this function is dsa_port_setup_phy_of() which uses genphy_resume() > and genphy_read_status() and also looks somewhat suspicious. This code makes > quite some assumptions: > - PHY is a C22 PHY > - PHY is compatible with the generic PHY driver > >> Andrew >> > Heiner > -- Florian ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit 2019-02-17 15:40 ` Russell King - ARM Linux admin @ 2019-02-17 15:45 ` Andrew Lunn 2019-02-17 15:48 ` Heiner Kallweit 2019-02-17 15:51 ` Andrew Lunn 2 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-17 15:45 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote: > When testing latest linux-next on the ZII DTU I face the issue that no > traffic is flowing over the switch ports, even though in dmesg > everything looks good. Also PHY properly establishes the link. Hi Heiner Do you have commit b79555d5d8d32643e9d7193341dcaff13bf9ffcd Author: Heiner Kallweit <hkallweit1@gmail.com> Date: Tue Feb 12 19:56:15 2019 +0100 net: phy: fix interrupt handling in non-started states Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:45 ` Andrew Lunn @ 2019-02-17 15:48 ` Heiner Kallweit 2019-02-17 15:57 ` Andrew Lunn 0 siblings, 1 reply; 32+ messages in thread From: Heiner Kallweit @ 2019-02-17 15:48 UTC (permalink / raw) To: Andrew Lunn; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev On 17.02.2019 16:45, Andrew Lunn wrote: > On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote: >> When testing latest linux-next on the ZII DTU I face the issue that no >> traffic is flowing over the switch ports, even though in dmesg >> everything looks good. Also PHY properly establishes the link. > > Hi Heiner > > Do you have > > commit b79555d5d8d32643e9d7193341dcaff13bf9ffcd > Author: Heiner Kallweit <hkallweit1@gmail.com> > Date: Tue Feb 12 19:56:15 2019 +0100 > > net: phy: fix interrupt handling in non-started states > In linux-next from Feb 15th this patch is included already. > Andrew > Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:48 ` Heiner Kallweit @ 2019-02-17 15:57 ` Andrew Lunn 2019-02-17 16:01 ` Heiner Kallweit 0 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-17 15:57 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev > In linux-next from Feb 15th this patch is included already. So why is port 8 not clearing its interrupt? Maybe put a printk in m88e1121_did_interrupt(), marvell_ack_interrupt(), and marvell_config_intr() and see if they are getting called. Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:57 ` Andrew Lunn @ 2019-02-17 16:01 ` Heiner Kallweit 0 siblings, 0 replies; 32+ messages in thread From: Heiner Kallweit @ 2019-02-17 16:01 UTC (permalink / raw) To: Andrew Lunn; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev On 17.02.2019 16:57, Andrew Lunn wrote: >> In linux-next from Feb 15th this patch is included already. > > So why is port 8 not clearing its interrupt? > > Maybe put a printk in m88e1121_did_interrupt(), > marvell_ack_interrupt(), and marvell_config_intr() and see if they are > getting called. > I think we're going to mix two things here: The trace was from 5.0-rc6, there the phylib fix isn't included yet. > Andrew > Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit 2019-02-17 15:40 ` Russell King - ARM Linux admin 2019-02-17 15:45 ` Andrew Lunn @ 2019-02-17 15:51 ` Andrew Lunn 2019-02-17 15:55 ` Heiner Kallweit 2 siblings, 1 reply; 32+ messages in thread From: Andrew Lunn @ 2019-02-17 15:51 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev > 36: 2030566 mscm-ir 79 Edge 400d1000.ethernet > 38: 1010437 gpio-vf610 2 Level 400d1000.ethernet-1:00 > 42: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-g1-atu-prob > 44: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-g1-vtu-prob > 46: 1010435 mv88e6xxx-g1 7 Edge mv88e6xxx-g2 > 49: 0 mv88e6xxx-g2 1 Edge mv88e6xxx-1:01 > 53: 0 mv88e6xxx-g2 5 Edge mv88e6xxx-1:05 > 54: 0 mv88e6xxx-g2 6 Edge mv88e6xxx-1:06 > 56: 100000 mv88e6xxx-g2 8 Edge mv88e6xxx-1:08 > 63: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-watchdog The PHY for Port 8 is not clearing its interrupt. In fact, it is not even saying to the kernel it had an interrupt. After 100,000 interrupts which nobody claimed, the kernel just disables it. So this looks like your fix: - if (!phy_is_started(phydev)) - return IRQ_NONE; I think this went into net first. So it takes a little while to make it into net-next. Maybe you don't have it yet? Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: No traffic with Marvell switch and latest linux-next 2019-02-17 15:51 ` Andrew Lunn @ 2019-02-17 15:55 ` Heiner Kallweit 0 siblings, 0 replies; 32+ messages in thread From: Heiner Kallweit @ 2019-02-17 15:55 UTC (permalink / raw) To: Andrew Lunn; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev On 17.02.2019 16:51, Andrew Lunn wrote: >> 36: 2030566 mscm-ir 79 Edge 400d1000.ethernet >> 38: 1010437 gpio-vf610 2 Level 400d1000.ethernet-1:00 >> 42: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-g1-atu-prob >> 44: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-g1-vtu-prob >> 46: 1010435 mv88e6xxx-g1 7 Edge mv88e6xxx-g2 >> 49: 0 mv88e6xxx-g2 1 Edge mv88e6xxx-1:01 >> 53: 0 mv88e6xxx-g2 5 Edge mv88e6xxx-1:05 >> 54: 0 mv88e6xxx-g2 6 Edge mv88e6xxx-1:06 >> 56: 100000 mv88e6xxx-g2 8 Edge mv88e6xxx-1:08 >> 63: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-watchdog > > The PHY for Port 8 is not clearing its interrupt. In fact, it is not > even saying to the kernel it had an interrupt. After 100,000 interrupts > which nobody claimed, the kernel just disables it. So this looks like > your fix: > > - if (!phy_is_started(phydev)) > - return IRQ_NONE; > > I think this went into net first. So it takes a little while to make > it into net-next. Maybe you don't have it yet? > I have it. And the root cause seems to be another one as Russell mentioned he has a fix for the trace. > Andrew > Heiner ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2019-02-24 21:42 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit 2019-02-17 15:40 ` Russell King - ARM Linux admin 2019-02-17 15:50 ` Heiner Kallweit 2019-02-17 16:40 ` Heiner Kallweit 2019-02-17 16:57 ` Andrew Lunn 2019-02-17 17:06 ` Heiner Kallweit 2019-02-17 17:10 ` Andrew Lunn 2019-02-18 18:16 ` Heiner Kallweit 2019-02-18 18:21 ` Andrew Lunn 2019-02-23 21:48 ` Heiner Kallweit 2019-02-23 23:42 ` Andrew Lunn 2019-02-24 9:10 ` Heiner Kallweit 2019-02-24 15:04 ` Andrew Lunn 2019-02-24 15:15 ` Russell King - ARM Linux admin 2019-02-24 15:28 ` Heiner Kallweit 2019-02-24 15:34 ` Russell King - ARM Linux admin 2019-02-24 15:39 ` Heiner Kallweit 2019-02-24 15:49 ` Russell King - ARM Linux admin 2019-02-24 16:32 ` Florian Fainelli 2019-02-24 17:04 ` Andrew Lunn 2019-02-24 21:26 ` Florian Fainelli 2019-02-24 21:42 ` Heiner Kallweit 2019-02-24 15:31 ` Russell King - ARM Linux admin 2019-02-24 17:28 ` Andrew Lunn 2019-02-24 19:41 ` Russell King - ARM Linux admin 2019-02-23 23:55 ` Florian Fainelli 2019-02-17 15:45 ` Andrew Lunn 2019-02-17 15:48 ` Heiner Kallweit 2019-02-17 15:57 ` Andrew Lunn 2019-02-17 16:01 ` Heiner Kallweit 2019-02-17 15:51 ` Andrew Lunn 2019-02-17 15:55 ` Heiner Kallweit
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.