All of lore.kernel.org
 help / color / mirror / Atom feed
* No traffic with Marvell switch and latest linux-next
@ 2019-02-17 15:34 Heiner Kallweit
  2019-02-17 15:40 ` Russell King - ARM Linux admin
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-17 15:34 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Russell King - ARM Linux; +Cc: netdev

When testing latest linux-next on the ZII DTU I face the issue that no
traffic is flowing over the switch ports, even though in dmesg
everything looks good. Also PHY properly establishes the link.

With 4.20.10 I don't have the issue and with 5.0-rc6 also not.
However on 5.0-rc6 I got the following, also number of network
interrupts seems to be very high (few minutes after boot).
Any idea what's going on?

Andrew, IIRC you recently fixed some interrupt-related issue:
7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts")
But the description doesn't seem to match this trace.

irq 56: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14)
[<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0)
[<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294)
[<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4)
[<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0)
[<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4)
[<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc)
[<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78)
[<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc)
[<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148)
[<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0x9f6c7fb0 to 0x9f6c7ff8)
7fa0:                                     00000000 00000000 00000000 00000000
7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
handlers:
[<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt
Disabling IRQ #56


36:    2030566   mscm-ir  79 Edge      400d1000.ethernet
 38:    1010437  gpio-vf610   2 Level     400d1000.ethernet-1:00
 42:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
 44:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
 46:    1010435  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
 49:          0  mv88e6xxx-g2   1 Edge      mv88e6xxx-1:01
 53:          0  mv88e6xxx-g2   5 Edge      mv88e6xxx-1:05
 54:          0  mv88e6xxx-g2   6 Edge      mv88e6xxx-1:06
 56:     100000  mv88e6xxx-g2   8 Edge      mv88e6xxx-1:08
 63:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog

Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit
@ 2019-02-17 15:40 ` Russell King - ARM Linux admin
  2019-02-17 15:50   ` Heiner Kallweit
  2019-02-17 15:45 ` Andrew Lunn
  2019-02-17 15:51 ` Andrew Lunn
  2 siblings, 1 reply; 32+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-17 15:40 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Andrew Lunn, Florian Fainelli, netdev

On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote:
> When testing latest linux-next on the ZII DTU I face the issue that no
> traffic is flowing over the switch ports, even though in dmesg
> everything looks good. Also PHY properly establishes the link.
> 
> With 4.20.10 I don't have the issue and with 5.0-rc6 also not.
> However on 5.0-rc6 I got the following, also number of network
> interrupts seems to be very high (few minutes after boot).
> Any idea what's going on?
> 
> Andrew, IIRC you recently fixed some interrupt-related issue:
> 7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts")
> But the description doesn't seem to match this trace.

I have a fix for the trace you have below, but it has nothing to do
with no traffic.  I'll send it out shortly.

Which protocol are you using (ipv4 or ipv6)?  Have you setup a
bridge device containing the ports you wish to switch network
traffic.  Without a bridge device, DSA will by default treat each
port as a separate port.  The other thing that gets people is the
ethernet interface connected to the DSA switch must be up _before_
bringing up any of the switch ports.

> 
> irq 56: nobody cared (try booting with the "irqpoll" option)
> CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1
> Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
> [<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14)
> [<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0)
> [<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294)
> [<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4)
> [<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0)
> [<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4)
> [<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc)
> [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78)
> [<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc)
> [<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148)
> [<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
> Exception stack(0x9f6c7fb0 to 0x9f6c7ff8)
> 7fa0:                                     00000000 00000000 00000000 00000000
> 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> handlers:
> [<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt
> Disabling IRQ #56
> 
> 
> 36:    2030566   mscm-ir  79 Edge      400d1000.ethernet
>  38:    1010437  gpio-vf610   2 Level     400d1000.ethernet-1:00
>  42:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>  44:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>  46:    1010435  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>  49:          0  mv88e6xxx-g2   1 Edge      mv88e6xxx-1:01
>  53:          0  mv88e6xxx-g2   5 Edge      mv88e6xxx-1:05
>  54:          0  mv88e6xxx-g2   6 Edge      mv88e6xxx-1:06
>  56:     100000  mv88e6xxx-g2   8 Edge      mv88e6xxx-1:08
>  63:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog
> 
> Heiner
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit
  2019-02-17 15:40 ` Russell King - ARM Linux admin
@ 2019-02-17 15:45 ` Andrew Lunn
  2019-02-17 15:48   ` Heiner Kallweit
  2019-02-17 15:51 ` Andrew Lunn
  2 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-17 15:45 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev

On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote:
> When testing latest linux-next on the ZII DTU I face the issue that no
> traffic is flowing over the switch ports, even though in dmesg
> everything looks good. Also PHY properly establishes the link.

Hi Heiner

Do you have

commit b79555d5d8d32643e9d7193341dcaff13bf9ffcd
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Tue Feb 12 19:56:15 2019 +0100

    net: phy: fix interrupt handling in non-started states
    
Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:45 ` Andrew Lunn
@ 2019-02-17 15:48   ` Heiner Kallweit
  2019-02-17 15:57     ` Andrew Lunn
  0 siblings, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-17 15:48 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev

On 17.02.2019 16:45, Andrew Lunn wrote:
> On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote:
>> When testing latest linux-next on the ZII DTU I face the issue that no
>> traffic is flowing over the switch ports, even though in dmesg
>> everything looks good. Also PHY properly establishes the link.
> 
> Hi Heiner
> 
> Do you have
> 
> commit b79555d5d8d32643e9d7193341dcaff13bf9ffcd
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Tue Feb 12 19:56:15 2019 +0100
> 
>     net: phy: fix interrupt handling in non-started states
>     
In linux-next from Feb 15th this patch is included already.

> Andrew
> 
Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:40 ` Russell King - ARM Linux admin
@ 2019-02-17 15:50   ` Heiner Kallweit
  2019-02-17 16:40     ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-17 15:50 UTC (permalink / raw)
  To: Russell King - ARM Linux admin; +Cc: Andrew Lunn, Florian Fainelli, netdev

On 17.02.2019 16:40, Russell King - ARM Linux admin wrote:
> On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote:
>> When testing latest linux-next on the ZII DTU I face the issue that no
>> traffic is flowing over the switch ports, even though in dmesg
>> everything looks good. Also PHY properly establishes the link.
>>
>> With 4.20.10 I don't have the issue and with 5.0-rc6 also not.
>> However on 5.0-rc6 I got the following, also number of network
>> interrupts seems to be very high (few minutes after boot).
>> Any idea what's going on?
>>
>> Andrew, IIRC you recently fixed some interrupt-related issue:
>> 7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts")
>> But the description doesn't seem to match this trace.
> 
> I have a fix for the trace you have below, but it has nothing to do
> with no traffic.  I'll send it out shortly.
> 
> Which protocol are you using (ipv4 or ipv6)?  Have you setup a
> bridge device containing the ports you wish to switch network
> traffic.  Without a bridge device, DSA will by default treat each
> port as a separate port.  The other thing that gets people is the
> ethernet interface connected to the DSA switch must be up _before_
> bringing up any of the switch ports.
> 
ipv4, a simple ping. No bridge. Device is connected to a switch
that is always on.

Technical environment and userspace is always the same, so it seems
to be the kernel version.

>>
>> irq 56: nobody cared (try booting with the "irqpoll" option)
>> CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1
>> Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
>> [<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14)
>> [<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0)
>> [<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294)
>> [<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4)
>> [<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0)
>> [<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4)
>> [<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc)
>> [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78)
>> [<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc)
>> [<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148)
>> [<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
>> Exception stack(0x9f6c7fb0 to 0x9f6c7ff8)
>> 7fa0:                                     00000000 00000000 00000000 00000000
>> 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> handlers:
>> [<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt
>> Disabling IRQ #56
>>
>>
>> 36:    2030566   mscm-ir  79 Edge      400d1000.ethernet
>>  38:    1010437  gpio-vf610   2 Level     400d1000.ethernet-1:00
>>  42:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>>  44:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>>  46:    1010435  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>>  49:          0  mv88e6xxx-g2   1 Edge      mv88e6xxx-1:01
>>  53:          0  mv88e6xxx-g2   5 Edge      mv88e6xxx-1:05
>>  54:          0  mv88e6xxx-g2   6 Edge      mv88e6xxx-1:06
>>  56:     100000  mv88e6xxx-g2   8 Edge      mv88e6xxx-1:08
>>  63:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog
>>
>> Heiner
>>
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit
  2019-02-17 15:40 ` Russell King - ARM Linux admin
  2019-02-17 15:45 ` Andrew Lunn
@ 2019-02-17 15:51 ` Andrew Lunn
  2019-02-17 15:55   ` Heiner Kallweit
  2 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-17 15:51 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev

> 36:    2030566   mscm-ir  79 Edge      400d1000.ethernet
>  38:    1010437  gpio-vf610   2 Level     400d1000.ethernet-1:00
>  42:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>  44:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>  46:    1010435  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>  49:          0  mv88e6xxx-g2   1 Edge      mv88e6xxx-1:01
>  53:          0  mv88e6xxx-g2   5 Edge      mv88e6xxx-1:05
>  54:          0  mv88e6xxx-g2   6 Edge      mv88e6xxx-1:06
>  56:     100000  mv88e6xxx-g2   8 Edge      mv88e6xxx-1:08
>  63:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog

The PHY for Port 8 is not clearing its interrupt. In fact, it is not
even saying to the kernel it had an interrupt. After 100,000 interrupts
which nobody claimed, the kernel just disables it. So this looks like
your fix:

-       if (!phy_is_started(phydev))
-               return IRQ_NONE;

I think this went into net first. So it takes a little while to make
it into net-next. Maybe you don't have it yet?

     Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:51 ` Andrew Lunn
@ 2019-02-17 15:55   ` Heiner Kallweit
  0 siblings, 0 replies; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-17 15:55 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev

On 17.02.2019 16:51, Andrew Lunn wrote:
>> 36:    2030566   mscm-ir  79 Edge      400d1000.ethernet
>>  38:    1010437  gpio-vf610   2 Level     400d1000.ethernet-1:00
>>  42:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>>  44:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>>  46:    1010435  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>>  49:          0  mv88e6xxx-g2   1 Edge      mv88e6xxx-1:01
>>  53:          0  mv88e6xxx-g2   5 Edge      mv88e6xxx-1:05
>>  54:          0  mv88e6xxx-g2   6 Edge      mv88e6xxx-1:06
>>  56:     100000  mv88e6xxx-g2   8 Edge      mv88e6xxx-1:08
>>  63:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog
> 
> The PHY for Port 8 is not clearing its interrupt. In fact, it is not
> even saying to the kernel it had an interrupt. After 100,000 interrupts
> which nobody claimed, the kernel just disables it. So this looks like
> your fix:
> 
> -       if (!phy_is_started(phydev))
> -               return IRQ_NONE;
> 
> I think this went into net first. So it takes a little while to make
> it into net-next. Maybe you don't have it yet?
> 
I have it. And the root cause seems to be another one as Russell
mentioned he has a fix for the trace.

>      Andrew
> 
Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:48   ` Heiner Kallweit
@ 2019-02-17 15:57     ` Andrew Lunn
  2019-02-17 16:01       ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-17 15:57 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev

> In linux-next from Feb 15th this patch is included already.

So why is port 8 not clearing its interrupt?

Maybe put a printk in m88e1121_did_interrupt(),
marvell_ack_interrupt(), and marvell_config_intr() and see if they are
getting called.

	Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:57     ` Andrew Lunn
@ 2019-02-17 16:01       ` Heiner Kallweit
  0 siblings, 0 replies; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-17 16:01 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, Russell King - ARM Linux, netdev

On 17.02.2019 16:57, Andrew Lunn wrote:
>> In linux-next from Feb 15th this patch is included already.
> 
> So why is port 8 not clearing its interrupt?
> 
> Maybe put a printk in m88e1121_did_interrupt(),
> marvell_ack_interrupt(), and marvell_config_intr() and see if they are
> getting called.
> 
I think we're going to mix two things here:
The trace was from 5.0-rc6, there the phylib fix isn't included yet.

> 	Andrew
> 
Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 15:50   ` Heiner Kallweit
@ 2019-02-17 16:40     ` Heiner Kallweit
  2019-02-17 16:57       ` Andrew Lunn
  0 siblings, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-17 16:40 UTC (permalink / raw)
  To: Russell King - ARM Linux admin; +Cc: Andrew Lunn, Florian Fainelli, netdev

On 17.02.2019 16:50, Heiner Kallweit wrote:
> On 17.02.2019 16:40, Russell King - ARM Linux admin wrote:
>> On Sun, Feb 17, 2019 at 04:34:32PM +0100, Heiner Kallweit wrote:
>>> When testing latest linux-next on the ZII DTU I face the issue that no
>>> traffic is flowing over the switch ports, even though in dmesg
>>> everything looks good. Also PHY properly establishes the link.
>>>
>>> With 4.20.10 I don't have the issue and with 5.0-rc6 also not.
>>> However on 5.0-rc6 I got the following, also number of network
>>> interrupts seems to be very high (few minutes after boot).
>>> Any idea what's going on?
>>>
>>> Andrew, IIRC you recently fixed some interrupt-related issue:
>>> 7ae710f9f8b2 ("gpio: vf610: Mask all GPIO interrupts")
>>> But the description doesn't seem to match this trace.
>>
>> I have a fix for the trace you have below, but it has nothing to do
>> with no traffic.  I'll send it out shortly.
>>
>> Which protocol are you using (ipv4 or ipv6)?  Have you setup a
>> bridge device containing the ports you wish to switch network
>> traffic.  Without a bridge device, DSA will by default treat each
>> port as a separate port.  The other thing that gets people is the
>> ethernet interface connected to the DSA switch must be up _before_
>> bringing up any of the switch ports.
>>
> ipv4, a simple ping. No bridge. Device is connected to a switch
> that is always on.
> 
> Technical environment and userspace is always the same, so it seems
> to be the kernel version.
> 
There haven't been that many changes to mv88e8xxx since 5.0-rc6.
I reverted 7c0db24cc431 ("dsa: mv88e6xxx: Ensure all pending interrupts
are handled prior to exit") who looked like a candidate and bingo:
network is working again. Obviously something is wrong with this patch.

>>>
>>> irq 56: nobody cared (try booting with the "irqpoll" option)
>>> CPU: 0 PID: 577 Comm: irq/38-400d1000 Not tainted 5.0.0-rc6 #1
>>> Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
>>> [<8010c898>] (unwind_backtrace) from [<8010ad98>] (show_stack+0x10/0x14)
>>> [<8010ad98>] (show_stack) from [<80149660>] (__report_bad_irq+0x38/0xb0)
>>> [<80149660>] (__report_bad_irq) from [<80149478>] (note_interrupt+0x10c/0x294)
>>> [<80149478>] (note_interrupt) from [<80149cac>] (handle_nested_irq+0xd8/0xf4)
>>> [<80149cac>] (handle_nested_irq) from [<80384a64>] (mv88e6xxx_g2_irq_thread_fn+0x90/0xc0)
>>> [<80384a64>] (mv88e6xxx_g2_irq_thread_fn) from [<80149c60>] (handle_nested_irq+0x8c/0xf4)
>>> [<80149c60>] (handle_nested_irq) from [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work+0x98/0xcc)
>>> [<8037ccd0>] (mv88e6xxx_g1_irq_thread_work) from [<80147ff4>] (irq_thread_fn+0x1c/0x78)
>>> [<80147ff4>] (irq_thread_fn) from [<80148280>] (irq_thread+0x124/0x1cc)
>>> [<80148280>] (irq_thread) from [<8012f8e8>] (kthread+0x140/0x148)
>>> [<8012f8e8>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
>>> Exception stack(0x9f6c7fb0 to 0x9f6c7ff8)
>>> 7fa0:                                     00000000 00000000 00000000 00000000
>>> 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>> handlers:
>>> [<b09c70df>] irq_default_primary_handler threaded [<44d6803f>] phy_interrupt
>>> Disabling IRQ #56
>>>
>>>
>>> 36:    2030566   mscm-ir  79 Edge      400d1000.ethernet
>>>  38:    1010437  gpio-vf610   2 Level     400d1000.ethernet-1:00
>>>  42:          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>>>  44:          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>>>  46:    1010435  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>>>  49:          0  mv88e6xxx-g2   1 Edge      mv88e6xxx-1:01
>>>  53:          0  mv88e6xxx-g2   5 Edge      mv88e6xxx-1:05
>>>  54:          0  mv88e6xxx-g2   6 Edge      mv88e6xxx-1:06
>>>  56:     100000  mv88e6xxx-g2   8 Edge      mv88e6xxx-1:08
>>>  63:          0  mv88e6xxx-g2  15 Edge      mv88e6xxx-watchdog
>>>
>>> Heiner
>>>
>>
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 16:40     ` Heiner Kallweit
@ 2019-02-17 16:57       ` Andrew Lunn
  2019-02-17 17:06         ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-17 16:57 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

> There haven't been that many changes to mv88e8xxx since 5.0-rc6.
> I reverted 7c0db24cc431 ("dsa: mv88e6xxx: Ensure all pending interrupts
> are handled prior to exit") who looked like a candidate and bingo:
> network is working again. Obviously something is wrong with this patch.

O.K. I tested it on an edge interrupt system, but not a level
interrupt. I wounder if it is related to that somehow? DTU should be
using level interrupts.

       Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 16:57       ` Andrew Lunn
@ 2019-02-17 17:06         ` Heiner Kallweit
  2019-02-17 17:10           ` Andrew Lunn
  0 siblings, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-17 17:06 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

On 17.02.2019 17:57, Andrew Lunn wrote:
>> There haven't been that many changes to mv88e8xxx since 5.0-rc6.
>> I reverted 7c0db24cc431 ("dsa: mv88e6xxx: Ensure all pending interrupts
>> are handled prior to exit") who looked like a candidate and bingo:
>> network is working again. Obviously something is wrong with this patch.
> 
> O.K. I tested it on an edge interrupt system, but not a level
> interrupt. I wounder if it is related to that somehow? DTU should be
> using level interrupts.
> 
Sorry, I may have been too fast with this statement. With this patch
reverted it worked, but now I have a build with this patch still included,
and it works too. Need to dig deeper ..

>        Andrew
> 

Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 17:06         ` Heiner Kallweit
@ 2019-02-17 17:10           ` Andrew Lunn
  2019-02-18 18:16             ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-17 17:10 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

> Sorry, I may have been too fast with this statement. With this patch
> reverted it worked, but now I have a build with this patch still included,
> and it works too. Need to dig deeper ..

Hi Heiner

Watch out for boot vs reboot, and when rebooting if port 8 had link or
not before you reboot.

    Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-17 17:10           ` Andrew Lunn
@ 2019-02-18 18:16             ` Heiner Kallweit
  2019-02-18 18:21               ` Andrew Lunn
  0 siblings, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-18 18:16 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

On 17.02.2019 18:10, Andrew Lunn wrote:
>> Sorry, I may have been too fast with this statement. With this patch
>> reverted it worked, but now I have a build with this patch still included,
>> and it works too. Need to dig deeper ..
> 
> Hi Heiner
> 
> Watch out for boot vs reboot, and when rebooting if port 8 had link or
> not before you reboot.
> 
Will do. Is there some known issue or bug?

>     Andrew
> 
Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-18 18:16             ` Heiner Kallweit
@ 2019-02-18 18:21               ` Andrew Lunn
  2019-02-23 21:48                 ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-18 18:21 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

> > Hi Heiner
> > 
> > Watch out for boot vs reboot, and when rebooting if port 8 had link or
> > not before you reboot.
> > 
> Will do. Is there some known issue or bug?

Hi Heiner

No, but it is a variable which can make a difference. The fix i made
for the Freescale GPIO controller was not an issue for cold boot, but
reboot with link up did cause interrupt problems, etc.

       Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-18 18:21               ` Andrew Lunn
@ 2019-02-23 21:48                 ` Heiner Kallweit
  2019-02-23 23:42                   ` Andrew Lunn
  2019-02-23 23:55                   ` Florian Fainelli
  0 siblings, 2 replies; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-23 21:48 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

On 18.02.2019 19:21, Andrew Lunn wrote:
>>> Hi Heiner
>>>
>>> Watch out for boot vs reboot, and when rebooting if port 8 had link or
>>> not before you reboot.
>>>
>> Will do. Is there some known issue or bug?
> 
> Hi Heiner
> 
> No, but it is a variable which can make a difference. The fix i made
> for the Freescale GPIO controller was not an issue for cold boot, but
> reboot with link up did cause interrupt problems, etc.
> 
Hi Andrew,

it took me quite some time to debug this issue ..

At first a bisect pointed to one of my commits:
5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")

Further digging lead me to some suspicious dsa code:
In dsa_port_fixed_link_register_of() there's a call to genphy_read_status().
At the time of the call phydev->advertising is empty, therefore the fixed phy
settings are overwritten with defaults (10/half) what breaks the system.

Worth to be mentioned is that for the PHY these two flags are set:
- is_pseudo_fixed_link (that's ok)
- autoneg -> I'm not sure this is correct.

It seems that you once added the code in question:
39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex")

I did what I like to do most and removed some code.
W/o the calls to genphy_config_init() and genphy_read_status() it works again.
Do these calls have some purpose here with a fixed link?

My commit exposed the issue because before it genphy_read_status() read the
advertisement from chip registers instead of using phydev->advertising.

Very close to this function is dsa_port_setup_phy_of() which uses genphy_resume()
and genphy_read_status() and also looks somewhat suspicious. This code makes
quite some assumptions:
- PHY is a C22 PHY
- PHY is compatible with the generic PHY driver

>        Andrew
> 
Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-23 21:48                 ` Heiner Kallweit
@ 2019-02-23 23:42                   ` Andrew Lunn
  2019-02-24  9:10                     ` Heiner Kallweit
  2019-02-24 15:31                     ` Russell King - ARM Linux admin
  2019-02-23 23:55                   ` Florian Fainelli
  1 sibling, 2 replies; 32+ messages in thread
From: Andrew Lunn @ 2019-02-23 23:42 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

> it took me quite some time to debug this issue ..
> 
> At first a bisect pointed to one of my commits:
> 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
> 
> Further digging lead me to some suspicious dsa code:
> In dsa_port_fixed_link_register_of() there's a call to genphy_read_status().
> At the time of the call phydev->advertising is empty, therefore the fixed phy
> settings are overwritten with defaults (10/half) what breaks the system.
> 
> Worth to be mentioned is that for the PHY these two flags are set:
> - is_pseudo_fixed_link (that's ok)
> - autoneg -> I'm not sure this is correct.
> 
> It seems that you once added the code in question:
> 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex")

Hi Heiner

For Ethernet switches, you can have device trees like this:

                       switch0: switch@0 {
                                compatible = "marvell,mv88e6085";
                                pinctrl-0 = <&pinctrl_gpio_switch0>;
                                pinctrl-names = "default";
                                reg = <0>;
                                dsa,member = <0 0>;
                                interrupt-parent = <&gpio0>;
                                interrupts = <27 IRQ_TYPE_LEVEL_LOW>;
                                interrupt-controller;
                                #interrupt-cells = <2>;
                                eeprom-length = <512>;

                                ports {
                                        #address-cells = <1>;
                                        #size-cells = <0>;

                                        port@0 {
                                                reg = <0>;
                                                label = "lan0";
                                                phy-handle = <&switch0phy0>;
                                        };

...
                                        switch0port5: port@5 {
                                                reg = <5>;
                                                label = "dsa";
                                                phy-mode = "rgmii-txid";
                                                link = <&switch1port6
                                                        &switch2port9>;
                                                fixed-link {
                                                        speed = <1000>;
                                                        full-duplex;
                                                };
                                        };

This is a DSA port. It is used to connect two Ethernet switches
together. In this case, the MACs are connected back to back. There are
no PHYs involved. These ports don't have a netdev associated with
them, since they are just pipes between switches, not user interfaces.

If i remember correctly, the code you are looking at was to deal with
the "rgmii-txid". Without the TXID, i could not get frames to pass
between the ports.

There is a second use case as well. The Vybrid FEC ethernet controller
is a Fast Ethernet port. The switch ports are 1G. DSA drivers set the
port connecting to the SoC to its highest speed. Again, there is no
netdev associated to the switch port, and generally the MAC ports are
connected back to back. This 100 vs 1000 does not work for the
Vybrid. So we add a fixed PHY to the CPU port.

                                        port@6 {
                                                reg = <6>;
                                                label = "cpu";
                                                ethernet = <&fec1>;

                                                fixed-link {
                                                        speed = <100>;
                                                        full-duplex;
                                                };
                                        };

And there is a third use case:

                                       port@4 {
                                                reg = <4>;
                                                label = "optical4";

                                                fixed-link {
                                                        speed = <1000>;
                                                        full-duplex;
                                                        link-gpios = <&gpio6 3
                                                              GPIO_ACTIVE_HIGH>;
                                                };
                                        };

We have a GPIO which tells us if the link it up, because there is not
a PHY here, but an optical module. The GPIO tells us about loss of
signal. Unfortunately we cannot use the SFP driver here, because of a
hardware issue.

In all cases end up in the same code. We want to tell the MAC to
configure RGMII delays, or to configure the port speed, or to have the
correct initial link state. The adjust_link() call can do this, if
passed a phydev. And we have a phydev for the fixed-link PHY. However,
since it has never been attached to a netdev, phy_start() called, etc,
the state information is maybe not correct. So we call config_init()
and read_status(), so phydev should reflect the state of the
fixed-link PHY. And a fixed-link PHY is always c22, and it can be
driven by genphy. Take a look at drivers/net/phy/swphy.c which is the
core of the simulation, and fixed_phy.c

> I did what I like to do most and removed some code.
> W/o the calls to genphy_config_init() and genphy_read_status() it works again.
> Do these calls have some purpose here with a fixed link?

Maybe with all the core changes, these calls are no longer needed? We
just need to ensure the state in phydev reflects the state of the
fixed link, i.e. in this case 100 Full.

Looking forward, at some point we are going to have to make fixed-link
support higher speeds. That probably means we need a swphy-c45 which
emulates the standard registers for 2.5G, 5G and 10G. At that point
genphy will not work...

      Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-23 21:48                 ` Heiner Kallweit
  2019-02-23 23:42                   ` Andrew Lunn
@ 2019-02-23 23:55                   ` Florian Fainelli
  1 sibling, 0 replies; 32+ messages in thread
From: Florian Fainelli @ 2019-02-23 23:55 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn; +Cc: Russell King - ARM Linux admin, netdev

Le 2/23/19 à 1:48 PM, Heiner Kallweit a écrit :
> On 18.02.2019 19:21, Andrew Lunn wrote:
>>>> Hi Heiner
>>>>
>>>> Watch out for boot vs reboot, and when rebooting if port 8 had link or
>>>> not before you reboot.
>>>>
>>> Will do. Is there some known issue or bug?
>>
>> Hi Heiner
>>
>> No, but it is a variable which can make a difference. The fix i made
>> for the Freescale GPIO controller was not an issue for cold boot, but
>> reboot with link up did cause interrupt problems, etc.
>>
> Hi Andrew,
> 
> it took me quite some time to debug this issue ..
> 
> At first a bisect pointed to one of my commits:
> 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")

This is the same issue that Michal reported here:

https://www.spinics.net/lists/netdev/msg552955.html

> 
> Further digging lead me to some suspicious dsa code:
> In dsa_port_fixed_link_register_of() there's a call to genphy_read_status().
> At the time of the call phydev->advertising is empty, therefore the fixed phy
> settings are overwritten with defaults (10/half) what breaks the system.
> 
> Worth to be mentioned is that for the PHY these two flags are set:
> - is_pseudo_fixed_link (that's ok)
> - autoneg -> I'm not sure this is correct.
> 
> It seems that you once added the code in question:
> 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex")
> 
> I did what I like to do most and removed some code.
> W/o the calls to genphy_config_init() and genphy_read_status() it works again.
> Do these calls have some purpose here with a fixed link?
> 
> My commit exposed the issue because before it genphy_read_status() read the
> advertisement from chip registers instead of using phydev->advertising.
> 
> Very close to this function is dsa_port_setup_phy_of() which uses genphy_resume()
> and genphy_read_status() and also looks somewhat suspicious. This code makes
> quite some assumptions:
> - PHY is a C22 PHY
> - PHY is compatible with the generic PHY driver
> 
>>        Andrew
>>
> Heiner
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-23 23:42                   ` Andrew Lunn
@ 2019-02-24  9:10                     ` Heiner Kallweit
  2019-02-24 15:04                       ` Andrew Lunn
  2019-02-24 15:31                     ` Russell King - ARM Linux admin
  1 sibling, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-24  9:10 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

On 24.02.2019 00:42, Andrew Lunn wrote:
>> it took me quite some time to debug this issue ..
>>
>> At first a bisect pointed to one of my commits:
>> 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
>>
>> Further digging lead me to some suspicious dsa code:
>> In dsa_port_fixed_link_register_of() there's a call to genphy_read_status().
>> At the time of the call phydev->advertising is empty, therefore the fixed phy
>> settings are overwritten with defaults (10/half) what breaks the system.
>>
>> Worth to be mentioned is that for the PHY these two flags are set:
>> - is_pseudo_fixed_link (that's ok)
>> - autoneg -> I'm not sure this is correct.
>>
>> It seems that you once added the code in question:
>> 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port speeds/duplex")
> 
> Hi Heiner
> 
> For Ethernet switches, you can have device trees like this:
> 
>                        switch0: switch@0 {
>                                 compatible = "marvell,mv88e6085";
>                                 pinctrl-0 = <&pinctrl_gpio_switch0>;
>                                 pinctrl-names = "default";
>                                 reg = <0>;
>                                 dsa,member = <0 0>;
>                                 interrupt-parent = <&gpio0>;
>                                 interrupts = <27 IRQ_TYPE_LEVEL_LOW>;
>                                 interrupt-controller;
>                                 #interrupt-cells = <2>;
>                                 eeprom-length = <512>;
> 
>                                 ports {
>                                         #address-cells = <1>;
>                                         #size-cells = <0>;
> 
>                                         port@0 {
>                                                 reg = <0>;
>                                                 label = "lan0";
>                                                 phy-handle = <&switch0phy0>;
>                                         };
> 
> ...
>                                         switch0port5: port@5 {
>                                                 reg = <5>;
>                                                 label = "dsa";
>                                                 phy-mode = "rgmii-txid";
>                                                 link = <&switch1port6
>                                                         &switch2port9>;
>                                                 fixed-link {
>                                                         speed = <1000>;
>                                                         full-duplex;
>                                                 };
>                                         };
> 
> This is a DSA port. It is used to connect two Ethernet switches
> together. In this case, the MACs are connected back to back. There are
> no PHYs involved. These ports don't have a netdev associated with
> them, since they are just pipes between switches, not user interfaces.
> 
> If i remember correctly, the code you are looking at was to deal with
> the "rgmii-txid". Without the TXID, i could not get frames to pass
> between the ports.
> 
> There is a second use case as well. The Vybrid FEC ethernet controller
> is a Fast Ethernet port. The switch ports are 1G. DSA drivers set the
> port connecting to the SoC to its highest speed. Again, there is no
> netdev associated to the switch port, and generally the MAC ports are
> connected back to back. This 100 vs 1000 does not work for the
> Vybrid. So we add a fixed PHY to the CPU port.
> 
>                                         port@6 {
>                                                 reg = <6>;
>                                                 label = "cpu";
>                                                 ethernet = <&fec1>;
> 
>                                                 fixed-link {
>                                                         speed = <100>;
>                                                         full-duplex;
>                                                 };
>                                         };
> 
> And there is a third use case:
> 
>                                        port@4 {
>                                                 reg = <4>;
>                                                 label = "optical4";
> 
>                                                 fixed-link {
>                                                         speed = <1000>;
>                                                         full-duplex;
>                                                         link-gpios = <&gpio6 3
>                                                               GPIO_ACTIVE_HIGH>;
>                                                 };
>                                         };
> 
> We have a GPIO which tells us if the link it up, because there is not
> a PHY here, but an optical module. The GPIO tells us about loss of
> signal. Unfortunately we cannot use the SFP driver here, because of a
> hardware issue.
> 
> In all cases end up in the same code. We want to tell the MAC to
> configure RGMII delays, or to configure the port speed, or to have the
> correct initial link state. The adjust_link() call can do this, if
> passed a phydev. And we have a phydev for the fixed-link PHY. However,
> since it has never been attached to a netdev, phy_start() called, etc,
> the state information is maybe not correct. So we call config_init()
> and read_status(), so phydev should reflect the state of the
> fixed-link PHY. And a fixed-link PHY is always c22, and it can be
> driven by genphy. Take a look at drivers/net/phy/swphy.c which is the
> core of the simulation, and fixed_phy.c
> 
Thanks a lot for the very comprehensive explanation!

I think what's not correct is that phydev->autoneg is set
(by phy_device_create) for a fixed link. genphy_config_init() ensures
that aneg isn't set in the "supported" bitmap, but it doesn't care
about phydev->autoneg. IMO we need to consider this at few places:

- genphy_config_init
  It should clear phydev->autoneg if aneg isn't supported.

- phy_probe
  After having read the chip features we should clear phydev->autoneg
  if aneg isn't supported.

- In __fixed_phy_register() we should clear phydev->autoneg.

>> I did what I like to do most and removed some code.
>> W/o the calls to genphy_config_init() and genphy_read_status() it works again.
>> Do these calls have some purpose here with a fixed link?
> 
> Maybe with all the core changes, these calls are no longer needed? We
> just need to ensure the state in phydev reflects the state of the
> fixed link, i.e. in this case 100 Full.
> 
I think at least the call to genphy_config_init can be removed. I don't
really like the name of this function anyway, because it doesn't
initialize anything but is a sanity check that reads chip capabilities
and removes unsupported modes from the "supported" bitmap.

genphy_read_status() we need to read the link status from GPIO
(if applicable in the use case). To be more precise, most likely
calling genphy_update_link() would be sufficient because speed and
duplex are fixed.

> Looking forward, at some point we are going to have to make fixed-link
> support higher speeds. That probably means we need a swphy-c45 which
> emulates the standard registers for 2.5G, 5G and 10G. At that point
> genphy will not work...
> 
>       Andrew
> .
> 
Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24  9:10                     ` Heiner Kallweit
@ 2019-02-24 15:04                       ` Andrew Lunn
  2019-02-24 15:15                         ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-24 15:04 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Russell King - ARM Linux admin, Florian Fainelli, netdev

> I think what's not correct is that phydev->autoneg is set
> (by phy_device_create) for a fixed link.

Fixed-link tries to emulate auto-neg:

                bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;

Maybe it needs better emulation of auto-neg?

      Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 15:04                       ` Andrew Lunn
@ 2019-02-24 15:15                         ` Russell King - ARM Linux admin
  2019-02-24 15:28                           ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-24 15:15 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Heiner Kallweit, Florian Fainelli, netdev

On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote:
> > I think what's not correct is that phydev->autoneg is set
> > (by phy_device_create) for a fixed link.
> 
> Fixed-link tries to emulate auto-neg:
> 
>                 bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
> 
> Maybe it needs better emulation of auto-neg?

Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3
(BMSR_ANEGCAPABLE).  In any case, 0.12 (BMCR_ANENABLE) is not set,
so according to 802.3-2015, we should not be setting 1.5
(BMSR_ANEGCOMPLETE).

However, swphy does try to emulate autonegotiation - we do have cases
where swphy is used in situations where the speed and duplex are not
fixed.  It returns an emulated link partner advertisement for the
current speed, which would suggest that we should set BMCR_ANENABLE.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 15:15                         ` Russell King - ARM Linux admin
@ 2019-02-24 15:28                           ` Heiner Kallweit
  2019-02-24 15:34                             ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-24 15:28 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Andrew Lunn; +Cc: Florian Fainelli, netdev

On 24.02.2019 16:15, Russell King - ARM Linux admin wrote:
> On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote:
>>> I think what's not correct is that phydev->autoneg is set
>>> (by phy_device_create) for a fixed link.
>>
>> Fixed-link tries to emulate auto-neg:
>>
>>                 bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
>>
>> Maybe it needs better emulation of auto-neg?
> 
> Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3
> (BMSR_ANEGCAPABLE).  In any case, 0.12 (BMCR_ANENABLE) is not set,
> so according to 802.3-2015, we should not be setting 1.5
> (BMSR_ANEGCOMPLETE).
> 
> However, swphy does try to emulate autonegotiation - we do have cases
> where swphy is used in situations where the speed and duplex are not
> fixed.  It returns an emulated link partner advertisement for the
> current speed, which would suggest that we should set BMCR_ANENABLE.
> 
If we emulate auto-neg, then it's not needed to set the speed bits
in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT
don't support forced mode. So we may have to go with auto-neg.
To avoid the original issue it should be sufficient to copy
supported -> advertising at a suited place.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-23 23:42                   ` Andrew Lunn
  2019-02-24  9:10                     ` Heiner Kallweit
@ 2019-02-24 15:31                     ` Russell King - ARM Linux admin
  2019-02-24 17:28                       ` Andrew Lunn
  1 sibling, 1 reply; 32+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-24 15:31 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Heiner Kallweit, Florian Fainelli, netdev

On Sun, Feb 24, 2019 at 12:42:35AM +0100, Andrew Lunn wrote:
> Looking forward, at some point we are going to have to make fixed-link
> support higher speeds. That probably means we need a swphy-c45 which
> emulates the standard registers for 2.5G, 5G and 10G. At that point
> genphy will not work...

Do we _need_ to emulate Clause 45 PHYs?  Today, the MII interface does
not work with Clause 45 PHYs, so there is no userspace accessing these
PHYs out there.  I have a patch that adds support for the MII ioctls
which gives me the ability to poke around in the 88x3310 for
investigatory / debug purposes.

However, I don't see the point of adding what would be very complex
Clause 45 support (we'd have to emulate the PMA/PMD, PCS and AN as a
minimum) when we have the ethtool API, and then we have the issue
that not all advertisement modes are standardised in Clause 45 -
there are some missing.

As I understand it, the swphy for fixed-links only exists because we
have existing tooling that expects Clause 22 PHYs to exist.  This will
break even if we were to add Clause 45 support as many bits in the
first 16 registers have different meanings.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 15:28                           ` Heiner Kallweit
@ 2019-02-24 15:34                             ` Russell King - ARM Linux admin
  2019-02-24 15:39                               ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-24 15:34 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Andrew Lunn, Florian Fainelli, netdev

On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote:
> On 24.02.2019 16:15, Russell King - ARM Linux admin wrote:
> > On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote:
> >>> I think what's not correct is that phydev->autoneg is set
> >>> (by phy_device_create) for a fixed link.
> >>
> >> Fixed-link tries to emulate auto-neg:
> >>
> >>                 bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
> >>
> >> Maybe it needs better emulation of auto-neg?
> > 
> > Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3
> > (BMSR_ANEGCAPABLE).  In any case, 0.12 (BMCR_ANENABLE) is not set,
> > so according to 802.3-2015, we should not be setting 1.5
> > (BMSR_ANEGCOMPLETE).
> > 
> > However, swphy does try to emulate autonegotiation - we do have cases
> > where swphy is used in situations where the speed and duplex are not
> > fixed.  It returns an emulated link partner advertisement for the
> > current speed, which would suggest that we should set BMCR_ANENABLE.
> > 
> If we emulate auto-neg, then it's not needed to set the speed bits
> in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT
> don't support forced mode. So we may have to go with auto-neg.

Sure.

> To avoid the original issue it should be sufficient to copy
> supported -> advertising at a suited place.

Why bother - the software PHY emulation is an emulation to allow
existing userspace that pre-dates the ethtool API to get some link
parameters.  If we augment the PHY emulation in non-standard ways,
userspace will need to be updated to handle those non-standard
ways.  If userspace needs to be updated, why not just bite the
bullet and update to ethtool APIs rather than adding more
complication through an emulation layer?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 15:34                             ` Russell King - ARM Linux admin
@ 2019-02-24 15:39                               ` Heiner Kallweit
  2019-02-24 15:49                                 ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-24 15:39 UTC (permalink / raw)
  To: Russell King - ARM Linux admin; +Cc: Andrew Lunn, Florian Fainelli, netdev

On 24.02.2019 16:34, Russell King - ARM Linux admin wrote:
> On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote:
>> On 24.02.2019 16:15, Russell King - ARM Linux admin wrote:
>>> On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote:
>>>>> I think what's not correct is that phydev->autoneg is set
>>>>> (by phy_device_create) for a fixed link.
>>>>
>>>> Fixed-link tries to emulate auto-neg:
>>>>
>>>>                 bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
>>>>
>>>> Maybe it needs better emulation of auto-neg?
>>>
>>> Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3
>>> (BMSR_ANEGCAPABLE).  In any case, 0.12 (BMCR_ANENABLE) is not set,
>>> so according to 802.3-2015, we should not be setting 1.5
>>> (BMSR_ANEGCOMPLETE).
>>>
>>> However, swphy does try to emulate autonegotiation - we do have cases
>>> where swphy is used in situations where the speed and duplex are not
>>> fixed.  It returns an emulated link partner advertisement for the
>>> current speed, which would suggest that we should set BMCR_ANENABLE.
>>>
>> If we emulate auto-neg, then it's not needed to set the speed bits
>> in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT
>> don't support forced mode. So we may have to go with auto-neg.
> 
> Sure.
> 
>> To avoid the original issue it should be sufficient to copy
>> supported -> advertising at a suited place.
> 
Sorry, seems this wasn't clear enough. I don't mean to change
swphy but the user side.

> Why bother - the software PHY emulation is an emulation to allow
> existing userspace that pre-dates the ethtool API to get some link
> parameters.  If we augment the PHY emulation in non-standard ways,
> userspace will need to be updated to handle those non-standard
> ways.  If userspace needs to be updated, why not just bite the
> bullet and update to ethtool APIs rather than adding more
> complication through an emulation layer?
> 
It's not only userspace. Based on my limited knowledge of DSA this
code also uses e.g. genphy_read_status() with a fixed link.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 15:39                               ` Heiner Kallweit
@ 2019-02-24 15:49                                 ` Russell King - ARM Linux admin
  2019-02-24 16:32                                   ` Florian Fainelli
  0 siblings, 1 reply; 32+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-24 15:49 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Andrew Lunn, Florian Fainelli, netdev

On Sun, Feb 24, 2019 at 04:39:30PM +0100, Heiner Kallweit wrote:
> On 24.02.2019 16:34, Russell King - ARM Linux admin wrote:
> > On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote:
> >> On 24.02.2019 16:15, Russell King - ARM Linux admin wrote:
> >>> On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote:
> >>>>> I think what's not correct is that phydev->autoneg is set
> >>>>> (by phy_device_create) for a fixed link.
> >>>>
> >>>> Fixed-link tries to emulate auto-neg:
> >>>>
> >>>>                 bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
> >>>>
> >>>> Maybe it needs better emulation of auto-neg?
> >>>
> >>> Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3
> >>> (BMSR_ANEGCAPABLE).  In any case, 0.12 (BMCR_ANENABLE) is not set,
> >>> so according to 802.3-2015, we should not be setting 1.5
> >>> (BMSR_ANEGCOMPLETE).
> >>>
> >>> However, swphy does try to emulate autonegotiation - we do have cases
> >>> where swphy is used in situations where the speed and duplex are not
> >>> fixed.  It returns an emulated link partner advertisement for the
> >>> current speed, which would suggest that we should set BMCR_ANENABLE.
> >>>
> >> If we emulate auto-neg, then it's not needed to set the speed bits
> >> in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT
> >> don't support forced mode. So we may have to go with auto-neg.
> > 
> > Sure.
> > 
> >> To avoid the original issue it should be sufficient to copy
> >> supported -> advertising at a suited place.
> > 
> Sorry, seems this wasn't clear enough. I don't mean to change
> swphy but the user side.
> 
> > Why bother - the software PHY emulation is an emulation to allow
> > existing userspace that pre-dates the ethtool API to get some link
> > parameters.  If we augment the PHY emulation in non-standard ways,
> > userspace will need to be updated to handle those non-standard
> > ways.  If userspace needs to be updated, why not just bite the
> > bullet and update to ethtool APIs rather than adding more
> > complication through an emulation layer?
> > 
> It's not only userspace. Based on my limited knowledge of DSA this
> code also uses e.g. genphy_read_status() with a fixed link.

DSA has support for phylink, which is perfectly capable of supporting
fixed links without using fixed-phy.c, although we have no way to
create that without DT.  Support could be added for non-DT though.

That would avoid using phylib functions to read back from a fixed-link
PHY.  Since phylink presents to the MAC a fixed link in the same
abstract manner as a real PHY, it should result in a more elegant
implementation.

DSA already has phylink support to support SFPs.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 15:49                                 ` Russell King - ARM Linux admin
@ 2019-02-24 16:32                                   ` Florian Fainelli
  2019-02-24 17:04                                     ` Andrew Lunn
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Fainelli @ 2019-02-24 16:32 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Heiner Kallweit; +Cc: Andrew Lunn, netdev

Le 2/24/19 à 7:49 AM, Russell King - ARM Linux admin a écrit :
> On Sun, Feb 24, 2019 at 04:39:30PM +0100, Heiner Kallweit wrote:
>> On 24.02.2019 16:34, Russell King - ARM Linux admin wrote:
>>> On Sun, Feb 24, 2019 at 04:28:32PM +0100, Heiner Kallweit wrote:
>>>> On 24.02.2019 16:15, Russell King - ARM Linux admin wrote:
>>>>> On Sun, Feb 24, 2019 at 04:04:03PM +0100, Andrew Lunn wrote:
>>>>>>> I think what's not correct is that phydev->autoneg is set
>>>>>>> (by phy_device_create) for a fixed link.
>>>>>>
>>>>>> Fixed-link tries to emulate auto-neg:
>>>>>>
>>>>>>                 bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
>>>>>>
>>>>>> Maybe it needs better emulation of auto-neg?
>>>>>
>>>>> Or maybe it needs to represent a fixed-speed PHY by clearing bit 1.3
>>>>> (BMSR_ANEGCAPABLE).  In any case, 0.12 (BMCR_ANENABLE) is not set,
>>>>> so according to 802.3-2015, we should not be setting 1.5
>>>>> (BMSR_ANEGCOMPLETE).
>>>>>
>>>>> However, swphy does try to emulate autonegotiation - we do have cases
>>>>> where swphy is used in situations where the speed and duplex are not
>>>>> fixed.  It returns an emulated link partner advertisement for the
>>>>> current speed, which would suggest that we should set BMCR_ANENABLE.
>>>>>
>>>> If we emulate auto-neg, then it's not needed to set the speed bits
>>>> in BMCR. Also what just comes to my mind, certain speeds like 1000BaseT
>>>> don't support forced mode. So we may have to go with auto-neg.
>>>
>>> Sure.
>>>
>>>> To avoid the original issue it should be sufficient to copy
>>>> supported -> advertising at a suited place.
>>>
>> Sorry, seems this wasn't clear enough. I don't mean to change
>> swphy but the user side.
>>
>>> Why bother - the software PHY emulation is an emulation to allow
>>> existing userspace that pre-dates the ethtool API to get some link
>>> parameters.  If we augment the PHY emulation in non-standard ways,
>>> userspace will need to be updated to handle those non-standard
>>> ways.  If userspace needs to be updated, why not just bite the
>>> bullet and update to ethtool APIs rather than adding more
>>> complication through an emulation layer?
>>>
>> It's not only userspace. Based on my limited knowledge of DSA this
>> code also uses e.g. genphy_read_status() with a fixed link.
> 
> DSA has support for phylink, which is perfectly capable of supporting
> fixed links without using fixed-phy.c, although we have no way to
> create that without DT.  Support could be added for non-DT though.

I had a branch at some point that provided feature parity with what DT
registration is capable of doing, including representing links between
switches etc [1]. I am not entirely sure it makes to support such a
configuration for platform devices, since they are not so many these
days that would not use DT (except x86 maybe?). It would make sense to
support ACPI-based registration for PHYLINK, if there was a standard for
describing PHYs, SFP/SFFs which is not apparently the case.

[1]: https://github.com/ffainelli/linux/commits/dsa-lbk-pdata2

> 
> That would avoid using phylib functions to read back from a fixed-link
> PHY.  Since phylink presents to the MAC a fixed link in the same
> abstract manner as a real PHY, it should result in a more elegant
> implementation.
> 
> DSA already has phylink support to support SFPs.
> 

The added difficulty here and the reason why Andrew went with the
approach that is used by the code currently is because neither do the
CPU or DSA ports are backed by a net_device. It is somewhere on my TODO
to permit the use of PHYLINK without the need of a net_device to cover
those specific DSA cases unless we just brute force the whole thing and
allocate a net_device structure but not register that net_device? Yes in
fact, why don't we do that?
-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 16:32                                   ` Florian Fainelli
@ 2019-02-24 17:04                                     ` Andrew Lunn
  2019-02-24 21:26                                       ` Florian Fainelli
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-24 17:04 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Russell King - ARM Linux admin, Heiner Kallweit, netdev

> The added difficulty here and the reason why Andrew went with the
> approach that is used by the code currently is because neither do the
> CPU or DSA ports are backed by a net_device. It is somewhere on my TODO
> to permit the use of PHYLINK without the need of a net_device to cover
> those specific DSA cases unless we just brute force the whole thing and
> allocate a net_device structure but not register that net_device? Yes in
> fact, why don't we do that?

Hi Florian

At the moment, we are using a phydev which is not connected to a
MAC. That is rather odd, but the phylib maintainers mostly know about
this, and keep an eye out for changes which might break any
assumptions. And the phylib API is quite small.

How many assumptions are going to break with a netdev which is not
registered? The API is much bigger, more people hack on it, and it is
going to be much harder to review changes to make sure assumptions are
not changed.

If we are going to do something odd, we should keep the scope as small
as possible.

   Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 15:31                     ` Russell King - ARM Linux admin
@ 2019-02-24 17:28                       ` Andrew Lunn
  2019-02-24 19:41                         ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Lunn @ 2019-02-24 17:28 UTC (permalink / raw)
  To: Russell King - ARM Linux admin; +Cc: Heiner Kallweit, Florian Fainelli, netdev

On Sun, Feb 24, 2019 at 03:31:26PM +0000, Russell King - ARM Linux admin wrote:
> On Sun, Feb 24, 2019 at 12:42:35AM +0100, Andrew Lunn wrote:
> > Looking forward, at some point we are going to have to make fixed-link
> > support higher speeds. That probably means we need a swphy-c45 which
> > emulates the standard registers for 2.5G, 5G and 10G. At that point
> > genphy will not work...
> 
> Do we _need_ to emulate Clause 45 PHYs?

Hi Russell

One use case would be a mv88e6390X port 9 or 10 connected to a SoC
which can only do 2.5G.

We have defined that DSA drivers should configure CPU and DSA ports to
their maximum speed. So if port 9 or 10 is used, it should be
configured to 10G.

We then need some way to reconfigure the MAC to a slower speed.

We need to do this with ZII boards. We do this with:

                                        port@0 {
                                                reg = <0>;
                                                label = "cpu";
                                                ethernet = <&fec1>;

                                                fixed-link {
                                                        speed = <100>;
                                                        full-duplex;
                                                };
                                        };

The FEC can only do 100Mbs, but the switch defaults to 1G. So the
fixed link it used to tell the switch MAC to use 100/Full.

In the example of the 6390X, we would want to set the link speed to
2500, which we cannot do at the moment. Either we need fixed-link to
support higher speeds, or we need a different mechanism.

We also have a similar issue on the SoC side. The FEC has no PHY
connected to it. It needs to be told what speed to do:

&fec1 {
        phy-mode = "rmii";
        pinctrl-names = "default";
        pinctrl-0 = <&pinctrl_fec1>;
        status = "okay";

        fixed-link {
                   speed = <100>;
                   full-duplex;
        };

In the case of a SoC with an interface which can do 2.5G, you need to
tell it to do 2.5G. Ideally we want a mechanism that allows a MAC to
'see' a PHY operating at 2.5G using the standard phylib/phylink API.

In the past this has been achieved with an emulated PHY. But so long
as the phydev/phylink structure has the correct values, it does not
matter how they get those values.

      Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 17:28                       ` Andrew Lunn
@ 2019-02-24 19:41                         ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 32+ messages in thread
From: Russell King - ARM Linux admin @ 2019-02-24 19:41 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Heiner Kallweit, Florian Fainelli, netdev

On Sun, Feb 24, 2019 at 06:28:48PM +0100, Andrew Lunn wrote:
> On Sun, Feb 24, 2019 at 03:31:26PM +0000, Russell King - ARM Linux admin wrote:
> > On Sun, Feb 24, 2019 at 12:42:35AM +0100, Andrew Lunn wrote:
> > > Looking forward, at some point we are going to have to make fixed-link
> > > support higher speeds. That probably means we need a swphy-c45 which
> > > emulates the standard registers for 2.5G, 5G and 10G. At that point
> > > genphy will not work...
> > 
> > Do we _need_ to emulate Clause 45 PHYs?
> 
> Hi Russell
> 
> One use case would be a mv88e6390X port 9 or 10 connected to a SoC
> which can only do 2.5G.
> 
> We have defined that DSA drivers should configure CPU and DSA ports to
> their maximum speed. So if port 9 or 10 is used, it should be
> configured to 10G.
> 
> We then need some way to reconfigure the MAC to a slower speed.
> 
> We need to do this with ZII boards. We do this with:
> 
>                                         port@0 {
>                                                 reg = <0>;
>                                                 label = "cpu";
>                                                 ethernet = <&fec1>;
> 
>                                                 fixed-link {
>                                                         speed = <100>;
>                                                         full-duplex;
>                                                 };
>                                         };
> 
> The FEC can only do 100Mbs, but the switch defaults to 1G. So the
> fixed link it used to tell the switch MAC to use 100/Full.
> 
> In the example of the 6390X, we would want to set the link speed to
> 2500, which we cannot do at the moment. Either we need fixed-link to
> support higher speeds, or we need a different mechanism.
> 
> We also have a similar issue on the SoC side. The FEC has no PHY
> connected to it. It needs to be told what speed to do:
> 
> &fec1 {
>         phy-mode = "rmii";
>         pinctrl-names = "default";
>         pinctrl-0 = <&pinctrl_fec1>;
>         status = "okay";
> 
>         fixed-link {
>                    speed = <100>;
>                    full-duplex;
>         };
> 
> In the case of a SoC with an interface which can do 2.5G, you need to
> tell it to do 2.5G. Ideally we want a mechanism that allows a MAC to
> 'see' a PHY operating at 2.5G using the standard phylib/phylink API.
> 
> In the past this has been achieved with an emulated PHY. But so long
> as the phydev/phylink structure has the correct values, it does not
> matter how they get those values.

You'll get a mac_config() with the mode set to MLO_AN_FIXED.

state->speed set to whatever was set in the "speed =" and state->duplex
set according to the duplex property.  If pause modes are also defined,
you'll get those through state->pause MLO_PAUSE_SYM and MLO_PAUSE_ASYM,
but there won't be any resolution of those modes since a fixed link
doesn't know the other ends properties, and of course AN is disabled.

Apart from the pause modes, the information passed to the MAC for a
fixed link is exactly the same as if it came from a PHY in phylib,
which basically means the MAC side doesn't actually need to care
whether it has a PHY or fixed-link - but without the complication of
needing phylib to provide an emulated PHY to the network driver.

phylink does provide an emulated PHY for the mii ioctls since that is
an established interface for speeds <= 1G.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 17:04                                     ` Andrew Lunn
@ 2019-02-24 21:26                                       ` Florian Fainelli
  2019-02-24 21:42                                         ` Heiner Kallweit
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Fainelli @ 2019-02-24 21:26 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King - ARM Linux admin, Heiner Kallweit, netdev



On February 24, 2019 9:04:55 AM PST, Andrew Lunn <andrew@lunn.ch> wrote:
>> The added difficulty here and the reason why Andrew went with the
>> approach that is used by the code currently is because neither do the
>> CPU or DSA ports are backed by a net_device. It is somewhere on my
>TODO
>> to permit the use of PHYLINK without the need of a net_device to
>cover
>> those specific DSA cases unless we just brute force the whole thing
>and
>> allocate a net_device structure but not register that net_device? Yes
>in
>> fact, why don't we do that?
>
>Hi Florian
>
>At the moment, we are using a phydev which is not connected to a
>MAC. That is rather odd, but the phylib maintainers mostly know about
>this, and keep an eye out for changes which might break any
>assumptions. And the phylib API is quite small.

I would argue that this very thread is a proof against your argument since we all failed to predict that Heiner's changes would change those assumptions. Having a certain of assumptions is fine but given all the recent PHYLIB helpers that have been added I am not sure how well that will scale.

>
>How many assumptions are going to break with a netdev which is not
>registered? The API is much bigger, more people hack on it, and it is
>going to be much harder to review changes to make sure assumptions are
>not changed.

A non registered net_device appears easier to manage and debug since there is state tracking all over the network stack for those cases.

>
>If we are going to do something odd, we should keep the scope as small
>as possible.

Hence my suggestion to allocate a dummy net_device object just so calls to netif_carrier_{on,off} (and possibly more in the future) do nothing. I don't think that teaching either PHYLIB or PHYLINK about a NULL net_device is going to scale really well over time nor make it easier  for respective maintainers. If we make the net_device optional, it will be harder to review changes as well as make sure that we do not create locking/object interactions  issues.

Another approach could be to define a minimal network port object (struct devlink, maybe?) which could be used independently from a net_device, or a lightweight net_device with no visibility into existing namespaces. None of these ideas are new though and would probably require several cycles to get done right.

Heiner, Russell, which approach would you take?
-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: No traffic with Marvell switch and latest linux-next
  2019-02-24 21:26                                       ` Florian Fainelli
@ 2019-02-24 21:42                                         ` Heiner Kallweit
  0 siblings, 0 replies; 32+ messages in thread
From: Heiner Kallweit @ 2019-02-24 21:42 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn; +Cc: Russell King - ARM Linux admin, netdev

On 24.02.2019 22:26, Florian Fainelli wrote:
> 
> 
> On February 24, 2019 9:04:55 AM PST, Andrew Lunn <andrew@lunn.ch> wrote:
>>> The added difficulty here and the reason why Andrew went with the
>>> approach that is used by the code currently is because neither do the
>>> CPU or DSA ports are backed by a net_device. It is somewhere on my
>> TODO
>>> to permit the use of PHYLINK without the need of a net_device to
>> cover
>>> those specific DSA cases unless we just brute force the whole thing
>> and
>>> allocate a net_device structure but not register that net_device? Yes
>> in
>>> fact, why don't we do that?
>>
>> Hi Florian
>>
>> At the moment, we are using a phydev which is not connected to a
>> MAC. That is rather odd, but the phylib maintainers mostly know about
>> this, and keep an eye out for changes which might break any
>> assumptions. And the phylib API is quite small.
> 
> I would argue that this very thread is a proof against your argument since we all failed to predict that Heiner's changes would change those assumptions. Having a certain of assumptions is fine but given all the recent PHYLIB helpers that have been added I am not sure how well that will scale.
> 
>>
>> How many assumptions are going to break with a netdev which is not
>> registered? The API is much bigger, more people hack on it, and it is
>> going to be much harder to review changes to make sure assumptions are
>> not changed.
> 
> A non registered net_device appears easier to manage and debug since there is state tracking all over the network stack for those cases.
> 
>>
>> If we are going to do something odd, we should keep the scope as small
>> as possible.
> 
> Hence my suggestion to allocate a dummy net_device object just so calls to netif_carrier_{on,off} (and possibly more in the future) do nothing. I don't think that teaching either PHYLIB or PHYLINK about a NULL net_device is going to scale really well over time nor make it easier  for respective maintainers. If we make the net_device optional, it will be harder to review changes as well as make sure that we do not create locking/object interactions  issues.
> 
> Another approach could be to define a minimal network port object (struct devlink, maybe?) which could be used independently from a net_device, or a lightweight net_device with no visibility into existing namespaces. None of these ideas are new though and would probably require several cycles to get done right.
> 
> Heiner, Russell, which approach would you take?
> 
Given my 2 days of experience with DSA (feels like 3 already) I would like to spend few more minutes on thinking before I answer.

Heiner

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-02-24 21:42 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-17 15:34 No traffic with Marvell switch and latest linux-next Heiner Kallweit
2019-02-17 15:40 ` Russell King - ARM Linux admin
2019-02-17 15:50   ` Heiner Kallweit
2019-02-17 16:40     ` Heiner Kallweit
2019-02-17 16:57       ` Andrew Lunn
2019-02-17 17:06         ` Heiner Kallweit
2019-02-17 17:10           ` Andrew Lunn
2019-02-18 18:16             ` Heiner Kallweit
2019-02-18 18:21               ` Andrew Lunn
2019-02-23 21:48                 ` Heiner Kallweit
2019-02-23 23:42                   ` Andrew Lunn
2019-02-24  9:10                     ` Heiner Kallweit
2019-02-24 15:04                       ` Andrew Lunn
2019-02-24 15:15                         ` Russell King - ARM Linux admin
2019-02-24 15:28                           ` Heiner Kallweit
2019-02-24 15:34                             ` Russell King - ARM Linux admin
2019-02-24 15:39                               ` Heiner Kallweit
2019-02-24 15:49                                 ` Russell King - ARM Linux admin
2019-02-24 16:32                                   ` Florian Fainelli
2019-02-24 17:04                                     ` Andrew Lunn
2019-02-24 21:26                                       ` Florian Fainelli
2019-02-24 21:42                                         ` Heiner Kallweit
2019-02-24 15:31                     ` Russell King - ARM Linux admin
2019-02-24 17:28                       ` Andrew Lunn
2019-02-24 19:41                         ` Russell King - ARM Linux admin
2019-02-23 23:55                   ` Florian Fainelli
2019-02-17 15:45 ` Andrew Lunn
2019-02-17 15:48   ` Heiner Kallweit
2019-02-17 15:57     ` Andrew Lunn
2019-02-17 16:01       ` Heiner Kallweit
2019-02-17 15:51 ` Andrew Lunn
2019-02-17 15:55   ` Heiner Kallweit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.