linux-arm-kernel.lists.infradead.org archive mirror
* Multi-PHYs and multiple-ports bonding support
@ 2022-10-17  8:51 Maxime Chevallier
  2022-10-17  9:24 ` Russell King (Oracle)
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Maxime Chevallier @ 2022-10-17  8:51 UTC
  To: netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Andrew Lunn, Russell King - ARM Linux admin,
	Tobias Waldekranz, Oleksij Rempel, Jakub Kicinski

Hello everyone,

I'm reaching out to discuss a PHY topic that we would like to see
upstreamed, to support multiple ports attached to a MAC.

The end-goal is to achieve some redundancy in case of a physical link
interruption, in a transparent manner, but using only one network
interface (1 MAC).

We've been made aware that some products in the wild propose this
feature-set, using 2 PHYs connected to the same MAC, using some custom
logic to switch back and forth between the 2 PHYs, and that's the main
use-case we'd like to see supported :

                            +-------+
                     /----- |  PHY  | --- BaseT port
 +-------+           |      +-------+
 |  MAC  |-- RGMII --|
 +-------+           |      +-------+
                     \----- |  PHY  | --- BaseT port
                            +-------+

This configuration comes with quite a lot of challenges since we bend
the existing standards in numerous ways :

 - We have 2 PHYs on the same xMII bus, and they can't be active on that
   bus at the same time. To solve that, we have 2 strategies: 
 
   - Put the inactive PHY in isolate mode: it can still perform link
     detection and reporting, but won't communicate on the MII bus.
     This can have side effects if both links are connected to the same
     network, which can be addressed by sending gratuitous ARPs so that
     the switches on that network learn which link is now in use.

   - Power the inactive PHY down entirely: select an active PHY, and
     when the link goes down on that PHY, switch to the other one. This
     was used on products whose PHYs had a broken isolate mode.
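To make the difference between the two strategies concrete, here is a minimal userspace sketch (not kernel code) that computes the value to write to each PHY's Clause 22 control register (BMCR, register 0). The BMCR_PDOWN and BMCR_ISOLATE bit values are the standard IEEE 802.3 Clause 22 ones, with the same names as in <linux/mii.h>; the enum and the bmcr_for_phy() helper are hypothetical, and a real driver would go through phylib rather than compute raw register values like this.

```c
#include <assert.h>
#include <stdint.h>

/* Standard IEEE 802.3 Clause 22 BMCR (register 0) bits; the kernel's
 * <linux/mii.h> uses the same names and values. */
#define BMCR_ANENABLE 0x1000 /* auto-negotiation enable */
#define BMCR_PDOWN    0x0800 /* power down */
#define BMCR_ISOLATE  0x0400 /* electrically isolate the PHY from the MII */

/* Hypothetical names for the two strategies described above. */
enum standby_strategy {
	STANDBY_ISOLATE,   /* inactive PHY still performs link detection */
	STANDBY_POWERDOWN, /* inactive PHY is powered down entirely */
};

/* Return the BMCR value to write to one PHY, starting from its current
 * value, so that only the active PHY drives the shared xMII bus. */
static uint16_t bmcr_for_phy(uint16_t bmcr, int is_active,
			     enum standby_strategy strategy)
{
	assert(strategy == STANDBY_ISOLATE || strategy == STANDBY_POWERDOWN);

	/* Clear both standby bits first; the active PHY needs neither. */
	bmcr &= (uint16_t)~(BMCR_PDOWN | BMCR_ISOLATE);
	if (is_active)
		return bmcr;

	if (strategy == STANDBY_ISOLATE)
		return bmcr | BMCR_ISOLATE; /* off the bus, link still seen */
	return bmcr | BMCR_PDOWN;           /* no link detection at all */
}
```

With the isolate strategy the standby PHY can keep reporting link changes, which is what makes a transparent switch-over possible; with the power-down strategy, the standby PHY presumably has to be powered up again before its link state is known.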
     
Upstream, we have one device that does something a bit similar, which is
the macchiatobin, using the 88x3310 PHY. This PHY exports both an SFP
interface as well as a copper BaseT interface. These 2 interfaces are
connected to the same MAC and are mutually exclusive.

It looks like this :

 +-------+              +---------+   |---- Copper BaseT
 |  MAC  | -- xxxMII -- |   PHY   |---|
 +-------+              +---------+   |---- SFP
 
We don't have any way to control which port gets used: whichever port
gets a link first is the one that is used.

Ideally we would like to be able to configure every aspect of these
2 cases, such as:
 - Which link we use
 - Whether we switch automatically from one to the other
 - Which links are available
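One possible way to encode those three knobs, as a toy sketch: a hypothetical per-interface configuration (preferred link, automatic switching on or off) and a selector that takes the per-port link state as input. All names are made up for illustration, and the fallback rule used here (lowest-numbered port with a link) is an arbitrary choice, not what any existing driver does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface settings answering the questions above. */
struct port_select_cfg {
	int preferred;    /* link the user wants, or -1 for no preference */
	bool auto_switch; /* move to another link when the chosen one drops */
};

/* Pick the port to use from the per-port link states ("which links are
 * available"). Returns a port index, or -1 if no port can be used. */
static int select_port(const struct port_select_cfg *cfg,
		       const bool *link_up, size_t n_ports)
{
	assert(cfg != NULL && link_up != NULL);

	if (cfg->preferred >= 0 && (size_t)cfg->preferred < n_ports) {
		/* Without auto-switching, stick to the chosen link even
		 * while it is down. */
		if (link_up[cfg->preferred] || !cfg->auto_switch)
			return cfg->preferred;
	}

	/* Fallback: lowest-numbered port that currently has a link. */
	for (size_t i = 0; i < n_ports; i++)
		if (link_up[i])
			return (int)i;
	return -1;
}
```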

I see 4 different aspects of this that would need to be added for this
whole mechanism to work :

1) DT representation

To support that, we would need a way to tell the kernel about the
number of physical ports that are connected to a given MAC.
In the dual-phy mode, it's pretty straightforward, since we would
"just" need to pass multiple phy handles to the mac node. In the MCBin
case, it's a bit more complex, since we don't have a clear view on the
number of ports connected to a given phy.

The assumption is that we have only one port per PHY, and its nature is
derived from the presence of an sfp=<> phandle in the DT, plus the
driver itself specifying the phydev->port field (which, to my knowledge,
isn't used that much?)
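The single-port assumption can be written down as a small decision function. This is a hypothetical sketch: the enums stand in for the DT lookup of the sfp phandle and for the PORT_* value a PHY driver may store in phydev->port, and the precedence given to the SFP phandle is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PORT_* value a driver may put in
 * phydev->port. */
enum drv_port { DRV_PORT_UNSET, DRV_PORT_TP, DRV_PORT_FIBRE };

/* Hypothetical nature of the one port assumed behind a PHY. */
enum port_kind {
	PORT_KIND_UNKNOWN,
	PORT_KIND_BASET,
	PORT_KIND_FIBRE,
	PORT_KIND_SFP,
};

static enum port_kind derive_port_kind(bool dt_has_sfp_phandle,
				       enum drv_port drv)
{
	assert(drv <= DRV_PORT_FIBRE); /* reject out-of-range values */

	/* An sfp = <&...> phandle means the line side goes to an SFP cage. */
	if (dt_has_sfp_phandle)
		return PORT_KIND_SFP;

	/* Otherwise rely on what the PHY driver reported, if anything. */
	switch (drv) {
	case DRV_PORT_TP:
		return PORT_KIND_BASET;
	case DRV_PORT_FIBRE:
		return PORT_KIND_FIBRE;
	default:
		return PORT_KIND_UNKNOWN;
	}
}
```

This shows why the MCBin case doesn't fit: the 88x3310 has both an SFP phandle and a copper BaseT side, but a single derived value can only describe one of them, so the copper port disappears from the kernel's view.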

The subject of describing the ports a PHY exposes in a sensible way that
doesn't require changing all DTs out there has been discussed in the
past here :
https://lore.kernel.org/netdev/20201119152246.085514e1@bootlin.com/

If we only focus on the dual-phy use-case - and not the single-phy
dual-port - we might not have to deal with extensive DT changes at all.

2) Changes in Phylink

This might be the tricky part, as we need to track several ports,
possibly connected to different PHYs, to get their state. For now, I
haven't prototyped any of this yet.

The goal would be to allow automatic switching, as is already done by
the 3310 driver, but at a higher level. Phylink might not be the right
place to do that, so maybe we just want to expose an API to get the
possible ports on a given interface, their respective state, and a way
to select one.

My idea would be to introduce the notion of a struct phy_port that would
describe a physical port. Ports would be controlled by a PHY (or by the
MAC, if the MAC outputs 1000BaseX for example), and one PHY could
possibly control multiple ports.
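A rough shape for such a struct phy_port, as a hypothetical userspace sketch with invented field names: each port records what controls it, a PHY or the MAC itself, so one controller can own several ports. The integer owner id stands in for what would presumably be a pointer to the phy_device in a real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: who drives a given physical port. */
enum port_owner_kind {
	PORT_OWNER_PHY, /* line side of a PHY (BaseT, SFP cage, ...) */
	PORT_OWNER_MAC, /* MAC driving the port directly, e.g. 1000BaseX */
};

/* Hypothetical description of one physical port. */
struct phy_port {
	const char *name;                /* e.g. "copper" or "sfp" */
	enum port_owner_kind owner_kind; /* PHY or MAC */
	int owner_id;                    /* stand-in for a phy_device pointer */
	bool link_up;                    /* last reported link state */
};

/* Count the ports controlled by one PHY; with a PHY such as the 88x3310
 * this can be more than one. */
static size_t ports_of_phy(const struct phy_port *ports, size_t n,
			   int phy_id)
{
	size_t count = 0;

	assert(ports != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ports[i].owner_kind == PORT_OWNER_PHY &&
		    ports[i].owner_id == phy_id)
			count++;
	return count;
}
```

The redundancy logic would then only walk a list of such ports, whether two ports share one PHY (the MCBin case) or each sits behind its own PHY (the dual-PHY case).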

The whole link redundancy would then be done by manipulating ports, giving
a layer of abstraction on the hardware topology itself.

We would therefore abstract the logic by having :
                         +--------+
                     /---|  Port  |
 +-------------+     |   +--------+
 |  netdevice  | ----|
 +-------------+     |
                     |   +---------+
                     \---|  Port   |
                         +---------+

This is the representation the userspace would know about, without
necessarily having to worry about the PHYs in between.

I don't see that as a breaking change, since as of today, most systems
only have one port per netdevice. We would need to add a way to deal
with multiple ports per netdevice.
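The netdevice-centric view could look like the toy model below: userspace sees a list of ports and which one is active, and a single-port netdevice degenerates to today's behaviour. The structure names, the fixed-size array, and the rule that the interface carrier follows the active port are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace-visible view: ports only, no PHYs. */
struct port_view {
	const char *name;
	bool link_up;
};

struct netdev_view {
	const char *ifname;
	struct port_view ports[4];
	size_t n_ports; /* 1 on most systems today */
	size_t active;  /* index of the port currently carrying traffic */
};

/* The port that legacy, single-port tools would keep seeing. */
static const struct port_view *netdev_port(const struct netdev_view *dev)
{
	assert(dev->n_ports > 0 && dev->active < dev->n_ports);
	return &dev->ports[dev->active];
}

/* Assumed rule: the interface has carrier iff its active port has a link. */
static bool netdev_carrier(const struct netdev_view *dev)
{
	return netdev_port(dev)->link_up;
}
```

With n_ports == 1 and active == 0, both helpers behave exactly like today's one-port-per-netdevice model, which is why this need not be a breaking change.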

3) Adding a L2 bonding driver

If the link switching logic is moved outside of phylink, we might
want a generic way of bonding ports on an interface, and of configuring
the policy to use for the switching (automatic, manual selection, maybe
more elaborate ones such as electing the link with the highest speed?).
This is also where we would handle sending the gratuitous ARPs upon
link switching.
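The policies listed above can be sketched as a single election function. This is a toy, hypothetical model: the policy names, the speed field, and the rule that a gratuitous ARP is needed whenever a different port becomes active are assumptions, and actually building and sending the ARP frame is out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switching policies for an L2 port-bonding layer. */
enum bond_policy {
	BOND_POLICY_MANUAL,  /* always use the port chosen by the user */
	BOND_POLICY_AUTO,    /* keep the current port while it has a link */
	BOND_POLICY_FASTEST, /* elect the linked port with the highest speed */
};

struct bond_port {
	bool link_up;
	int speed_mbps; /* only meaningful when link_up is true */
};

/* Return the index of the port that should be active (-1 if none).
 * *send_garp is set whenever a different, valid port becomes active,
 * i.e. when a gratuitous ARP should go out on the new link. */
static int bond_elect(enum bond_policy policy, const struct bond_port *p,
		      size_t n, int current, int manual, bool *send_garp)
{
	int next = current;

	assert(p != NULL && send_garp != NULL);
	assert(current < (int)n && manual < (int)n);

	switch (policy) {
	case BOND_POLICY_MANUAL:
		next = manual;
		break;
	case BOND_POLICY_AUTO:
		if (current >= 0 && p[current].link_up)
			break; /* no reason to move */
		next = -1;
		for (size_t i = 0; i < n; i++) {
			if (p[i].link_up) {
				next = (int)i;
				break;
			}
		}
		break;
	case BOND_POLICY_FASTEST:
		next = -1;
		for (size_t i = 0; i < n; i++)
			if (p[i].link_up &&
			    (next < 0 || p[i].speed_mbps > p[next].speed_mbps))
				next = (int)i;
		break;
	}

	*send_garp = (next >= 0 && next != current);
	return next;
}
```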

4) UAPI

From userspace, we would need ways to list the ports, their state, and
possibly to configure the bonding parameters. For now, ethtool has no
notion of ports as such: we just have 1 netdevice == 1 port. Should we
therefore create one netdevice per port, or stick to that one interface
and refer to its ports with some ethtool parameters?

All of these are open questions, as this topic spans quite a lot of
aspects in the stack. Any input, ideas, or comments are very welcome.

Thanks,

Maxime

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-17  8:51 Multi-PHYs and multiple-ports bonding support Maxime Chevallier
@ 2022-10-17  9:24 ` Russell King (Oracle)
  2022-10-17 13:03   ` Andrew Lunn
  2022-10-18  8:02   ` Maxime Chevallier
  2022-10-17  9:45 ` Jiri Pirko
  2022-10-17 10:03 ` Oleksij Rempel
  2 siblings, 2 replies; 9+ messages in thread
From: Russell King (Oracle) @ 2022-10-17  9:24 UTC
  To: Maxime Chevallier
  Cc: netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Andrew Lunn, Tobias Waldekranz, Oleksij Rempel,
	Jakub Kicinski

On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:
> 2) Changes in Phylink
> 
> This might be the tricky part, as we need to track several ports,
> possibly connected to different PHYs, to get their state. For now, I
> haven't prototyped any of this yet.

The problem is _way_ larger than phylink. It's a fundamental assumption
throughout the net layer that there is a one-PHY to one-MAC relationship.
Phylink just adopts this because it is the established norm, and trying
to fix it is rather ridiculous without a use case.

See code such as the ethtool code, where the MAC and associated layers
are entirely bypassed with all the PHY-accessing ethtool commands, and
the commands are passed directly to phylib for the PHY registered
against the netdev.

We do have use cases though - consider a setup such as the mcbin with
the 3310 in SGMII mode on the fibre link and a copper SFP module plugged
in with its own PHY - a stacked PHY situation (which we don't support
right now). Which PHY should the MII ioctls, ethtool, and possibly the
PTP timestamp code be accessing with a copper SFP module plugged in?

This needs to be solved for your multi-PHY case, because you need to
deal with programming e.g. the link advertisement in both PHYs, not
just one - and with the above model, you have no choice which PHY gets
the call, it's always going to be the one registered with the netdev.

The point I'm making is that you're suggesting this is a phylink issue,
but it isn't, it's a generic networking layering bypass issue. If the
net code always forwarded the ethtool etc stuff to the MAC and let the
MAC make appropriate decisions about how these were handled, then we
would have a properly layered approach where each layer can decide
how a particular interface is implemented - to cope with situations
such as the one you describe.
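A toy model of the two call paths may help illustrate the point. In this hypothetical userspace sketch (none of these types are kernel structures), the bypass path can only reach the single PHY registered against the netdev, while the layered path lets the MAC decide, here by programming the link advertisement into every PHY behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, not kernel structures. A "PHY" only remembers the
 * advertisement it was last programmed with. */
struct toy_phy {
	unsigned int advertising;
};

struct toy_mac {
	struct toy_phy *phys[2]; /* every PHY the MAC can be connected to */
	size_t n_phys;
};

struct toy_netdev {
	struct toy_phy *phydev; /* the one PHY registered against the netdev */
	struct toy_mac *mac;
};

/* Bypass path: the net layer goes straight to netdev->phydev, so only
 * that PHY is ever programmed. */
static void set_adv_bypass(struct toy_netdev *dev, unsigned int adv)
{
	assert(dev->phydev != NULL);
	dev->phydev->advertising = adv;
}

/* Layered path: the request is handed to the MAC, which knows about all
 * the PHYs behind it and decides how to apply it (here: to all of them). */
static void set_adv_layered(struct toy_netdev *dev, unsigned int adv)
{
	for (size_t i = 0; i < dev->mac->n_phys; i++)
		dev->mac->phys[i]->advertising = adv;
}
```

After a bypass call, the second PHY keeps a stale advertisement; after a layered call, both PHYs agree.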

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-17  8:51 Multi-PHYs and multiple-ports bonding support Maxime Chevallier
  2022-10-17  9:24 ` Russell King (Oracle)
@ 2022-10-17  9:45 ` Jiri Pirko
  2022-10-17 10:03 ` Oleksij Rempel
  2 siblings, 0 replies; 9+ messages in thread
From: Jiri Pirko @ 2022-10-17  9:45 UTC
  To: Maxime Chevallier
  Cc: netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Andrew Lunn, Russell King - ARM Linux admin,
	Tobias Waldekranz, Oleksij Rempel, Jakub Kicinski

Mon, Oct 17, 2022 at 10:51:00AM CEST, maxime.chevallier@bootlin.com wrote:

[...]


>4) UAPI
>
>From userspace, we would need ways to list the ports, their state, and
>possibly to configure the bonding parameters. For now, ethtool has no
>notion of ports as such: we just have 1 netdevice == 1 port. Should we
>therefore create one netdevice per port, or stick to that one interface
>and refer to its ports with some ethtool parameters?

I don't like the idea of having 1 netdev per port. Netdev represents
mostly the MAC entity, and there is only one.

[...]


* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-17  8:51 Multi-PHYs and multiple-ports bonding support Maxime Chevallier
  2022-10-17  9:24 ` Russell King (Oracle)
  2022-10-17  9:45 ` Jiri Pirko
@ 2022-10-17 10:03 ` Oleksij Rempel
  2 siblings, 0 replies; 9+ messages in thread
From: Oleksij Rempel @ 2022-10-17 10:03 UTC
  To: Maxime Chevallier
  Cc: netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Andrew Lunn, Russell King - ARM Linux admin,
	Tobias Waldekranz, Jakub Kicinski

Hi Maxime,

On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:
> Hello everyone,
> 
> I'm reaching out to discuss a PHY topic that we would like to see
> upstreamed, to support multiple ports attached to a MAC.
> 
> The end-goal is to achieve some redundancy in case of a physical link
> interruption, in a transparent manner, but using only one network
> interface (1 MAC).
> 
> We've been made aware that some products in the wild propose this
> feature-set, using 2 PHYs connected to the same MAC, using some custom
> logic to switch back and forth between the 2 PHYs, and that's the main
> use-case we'd like to see supported :
> 
>                             +-------+
>                      /----- |  PHY  | --- BaseT port
>  +-------+           |      +-------+
>  |  MAC  |-- RGMII --|
>  +-------+           |      +-------+
>                      \----- |  PHY  | --- BaseT port
>                             +-------+
> 

I can add more cases:

- case 1:
	Similar HW can be found in combination with AX88772B:
	https://cms.nacsemi.com/content/AuthDatasheets/ASIXS00048-1.pdf
	Page 6

	Current ASIX driver only takes care to power down internal PHY if
	external is present:
	https://elixir.bootlin.com/linux/latest/source/drivers/net/usb/asix_devices.c#L659
	But I can imagine someone wanting to implement hot switching between
	the internal PHY and an external PHY, or a direct RMII connection, too.

- case 2:
	A $CUSTOMER of us has a system where the RGMII from the MAC is routed
	via an analog multiplexer to a PHY or to an optional external
	board where the RGMII connects to the host port of a switch chip
	supported by DSA.

> This configuration comes with quite a lot of challenges since we bend
> the existing standards in numerous ways :
> 
>  - We have 2 PHYs on the same xMII bus, and they can't be active on that
>    bus at the same time. To solve that, we have 2 strategies: 
>  
>    - Put the inactive PHY in isolate mode: it can still perform link
>      detection and reporting, but won't communicate on the MII bus.
>      This can have side effects if both links are connected to the same
>      network, which can be addressed by sending gratuitous ARPs so that
>      the switches on that network learn which link is now in use.

Can we "announce" a topology change/reset by toggling the link state?
Usually, switches should drop the forwarding entries for a port in the
down state. But the problem gets complicated if there are multiple
bridges... :/

>    - Power the inactive PHY down entirely: select an active PHY, and
>      when the link goes down on that PHY, switch to the other one. This
>      was used on products whose PHYs had a broken isolate mode.

This is probably the better way to go. I assume that in the use cases
where this kind of redundancy is used, it is preferable to reduce
weight, cost and power consumption.

> Upstream, we have one device that does something a bit similar, which is
> the macchiatobin, using the 88x3310 PHY. This PHY exports both an SFP
> interface as well as a copper BaseT interface. These 2 interfaces are
> connected to the same MAC and are mutually exclusive.
> 
> It looks like this :
> 
>  +-------+              +---------+   |---- Copper BaseT
>  |  MAC  | -- xxxMII -- |   PHY   |---|
>  +-------+              +---------+   |---- SFP
>  
> We don't have any way to control which port gets used: whichever port
> gets a link first is the one that is used.
> 
> Ideally we would like to be able to configure every aspect of these
> 2 cases, such as:
>  - Which link we use
>  - Whether we switch automatically from one to the other
>  - Which links are available
> 
> I see 4 different aspects of this that would need to be added for this
> whole mechanism to work :
> 
> 1) DT representation
> 
> To support that, we would need a way to tell the kernel about the
> number of physical ports that are connected to a given MAC.
> In the dual-phy mode, it's pretty straightforward, since we would
> "just" need to pass multiple phy handles to the mac node. In the MCBin
> case, it's a bit more complex, since we don't have a clear view on the
> number of ports connected to a given phy.
> 
> The assumption is that we have only one port per PHY, and its nature is
> derived from the presence of an sfp=<> phandle in the DT, plus the
> driver itself specifying the phydev->port field (which, to my knowledge,
> isn't used that much?)
> 
> The subject of describing the ports a PHY exposes in a sensible way that
> doesn't require changing all DTs out there has been discussed in the
> past here :
> https://lore.kernel.org/netdev/20201119152246.085514e1@bootlin.com/
> 
> If we only focus on the dual-phy use-case - and not the single-phy
> dual-port - we might not have to deal with extensive DT changes at all.
> 
> 2) Changes in Phylink
> 
> This might be the tricky part, as we need to track several ports,
> possibly connected to different PHYs, to get their state. For now, I
> haven't prototyped any of this yet.
> 
> The goal would be to allow automatic switching, as is already done by
> the 3310 driver, but at a higher level. Phylink might not be the right
> place to do that, so maybe we just want to expose an API to get the
> possible ports on a given interface, their respective state, and a way
> to select one.
> 
> My idea would be to introduce the notion of a struct phy_port that would
> describe a physical port. Ports would be controlled by a PHY (or by the
> MAC, if the MAC outputs 1000BaseX for example), and one PHY could
> possibly control multiple ports.
> 
> The whole link redundancy would then be done by manipulating ports, giving
> a layer of abstraction on the hardware topology itself.
> 
> We would therefore abstract the logic by having :
>                          +--------+
>                      /---|  Port  |
>  +-------------+     |   +--------+
>  |  netdevice  | ----|
>  +-------------+     |
>                      |   +---------+
>                      \---|  Port   |
>                          +---------+
> 
> This is the representation the userspace would know about, without
> necessarily having to worry about the PHYs in between.
> 
> I don't see that as a breaking change, since as of today, most systems
> only have one port per netdevice. We would need to add a way to deal
> with multiple ports per netdevice.
> 
> 3) Adding a L2 bonding driver
> 
> If the link switching logic is moved outside of phylink, we might
> want a generic way of bonding ports on an interface, and of configuring
> the policy to use for the switching (automatic, manual selection, maybe
> more elaborate ones such as electing the link with the highest speed?).
> This is also where we would handle sending the gratuitous ARPs upon
> link switching.
> 
> 4) UAPI
> 
> From userspace, we would need ways to list the ports, their state, and
> possibly to configure the bonding parameters. For now, ethtool has no
> notion of ports as such: we just have 1 netdevice == 1 port. Should we
> therefore create one netdevice per port, or stick to that one interface
> and refer to its ports with some ethtool parameters?
> 
> All of these are open questions, as this topic spans quite a lot of
> aspects in the stack. Any input, ideas, or comments are very welcome.

What about this use case:

MAC with more than one PHY. One PHY is active, and you want to do cable
testing and/or check the signal quality with SQI. Both are currently
triggered via ethtool on an interface, not on a specific PHY.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-17  9:24 ` Russell King (Oracle)
@ 2022-10-17 13:03   ` Andrew Lunn
  2022-10-18 11:45     ` Maxime Chevallier
  2022-10-18  8:02   ` Maxime Chevallier
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Lunn @ 2022-10-17 13:03 UTC
  To: Russell King (Oracle)
  Cc: Maxime Chevallier, netdev, linux-arm-kernel, Thomas Petazzoni,
	Antoine Tenart, David S. Miller, Heiner Kallweit,
	Florian Fainelli, Vivien Didelot, Tobias Waldekranz,
	Oleksij Rempel, Jakub Kicinski

On Mon, Oct 17, 2022 at 10:24:49AM +0100, Russell King (Oracle) wrote:
> On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:
> > 2) Changes in Phylink
> > 
> > This might be the tricky part, as we need to track several ports,
> > possibly connected to different PHYs, to get their state. For now, I
> > haven't prototyped any of this yet.
> 
> The problem is _way_ larger than phylink. It's a fundamental assumption
> throughout the net layer that there is a one-PHY to one-MAC relationship.
> Phylink just adopts this because it is the established norm, and trying
> to fix it is rather ridiculous without a use case.
> 
> See code such as the ethtool code, where the MAC and associated layers
> are entirely bypassed with all the PHY-accessing ethtool commands, and
> the commands are passed directly to phylib for the PHY registered
> against the netdev.

We probably need to model the MII MUX. We can then have netdev->phydev
and netdev->sfp_bus point to the MUX, which then defers to the
currently active PHY/SFP for backwards compatibility. Additionally,
for netlink ethtool, we can add a new property which allows a specific
PHY/SFP hanging off the MUX to be addressed.
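A minimal sketch of the MUX idea, assuming that a request without a PHY identifier keeps today's semantics (it goes to the active PHY) while an explicit identifier addresses one PHY behind the MUX. That would, for instance, let a cable test target a PHY that is not currently active, as in the cable-testing case raised elsewhere in this thread. All names are hypothetical; the identifier stands in for the new netlink ethtool attribute, which is only a proposal at this point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a phy_device. */
struct toy_phy {
	const char *name;
};

/* Hypothetical MII MUX, sitting where netdev->phydev points today. */
struct mii_mux {
	struct toy_phy *phys[4];
	size_t n_phys;
	size_t active; /* PHY currently connected to the MAC */
};

/* Resolve the PHY a request should reach. A negative id means "no PHY
 * identifier given" and keeps today's behaviour: use the active PHY.
 * A non-negative id addresses one PHY behind the MUX explicitly.
 * Returns NULL for an out-of-range id. */
static struct toy_phy *mux_resolve(const struct mii_mux *mux, int id)
{
	assert(mux != NULL && mux->active < mux->n_phys);

	if (id < 0)
		return mux->phys[mux->active];
	if ((size_t)id >= mux->n_phys)
		return NULL;
	return mux->phys[id];
}
```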

Modeling the MUX probably helps us with the overall architecture.  As
Maxime described, there are at least two different architectures: the
MUX is between the MAC and the PHYs, and the MUX is inside the PHY
between the host interface and the line interfaces. There are at least
4 PHYs like this.

We also have Russell's problem of two PHYs on one path. It would be
nice to solve that at the same time, which the additional identifier
attribute should help solve.

I would probably start this work from the uAPI. How does the uAPI
work?

	Andrew

* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-17  9:24 ` Russell King (Oracle)
  2022-10-17 13:03   ` Andrew Lunn
@ 2022-10-18  8:02   ` Maxime Chevallier
  2022-10-18  8:13     ` Russell King (Oracle)
  1 sibling, 1 reply; 9+ messages in thread
From: Maxime Chevallier @ 2022-10-18  8:02 UTC
  To: Russell King (Oracle)
  Cc: netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Andrew Lunn, Tobias Waldekranz, Oleksij Rempel,
	Jakub Kicinski, Jiri Pirko

Hello Russell,

On Mon, 17 Oct 2022 10:24:49 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:
> > 2) Changes in Phylink
> > 
> > This might be the tricky part, as we need to track several ports,
> > possibly connected to different PHYs, to get their state. For now, I
> > haven't prototyped any of this yet.  
> 
> The problem is _way_ larger than phylink. It's a fundamental
> assumption throughout the net layer that there is a one-PHY to one-MAC
> relationship. Phylink just adopts this because it is the established
> norm, and trying to fix it is rather ridiculous without a use case.
> 
> See code such as the ethtool code, where the MAC and associated layers
> are entirely bypassed with all the PHY-accessing ethtool commands, and
> the commands are passed directly to phylib for the PHY registered
> against the netdev.
> 
> We do have use cases though - consider a setup such as the mcbin with
> the 3310 in SGMII mode on the fibre link and a copper SFP module plugged
> in with its own PHY - a stacked PHY situation (which we don't support
> right now). Which PHY should the MII ioctls, ethtool, and possibly the
> PTP timestamp code be accessing with a copper SFP module plugged in?
> 
> This needs to be solved for your multi-PHY case, because you need to
> deal with programming e.g. the link advertisement in both PHYs, not
> just one - and with the above model, you have no choice which PHY gets
> the call, it's always going to be the one registered with the netdev.
> 
> The point I'm making is that you're suggesting this is a phylink
> issue, but it isn't, it's a generic networking layering bypass issue.
> If the net code always forwarded the ethtool etc stuff to the MAC and
> let the MAC make appropriate decisions about how these were handled,
> then we would have a properly layered approach where each layer can
> decide how a particular interface is implemented - to cope with
> situations such as the one you describe.

I agree with all you say, and indeed this problem is a good opportunity
IMO to consider the other use-cases like the one you mention and come
up with a nice solution.

My intention was never to imply that this is a phylink issue. Quite the
contrary: what I'm saying is that phylink, as it is, would need to be
extended to take all the above-mentioned use-cases into account.

Regarding ethtool bypassing the MAC layer and talking to phylib: since
phylink has the overall view of the link, and abstracts the PHY away
from the MAC, I would think it is a good place to manage this tree of
PHYs/ports. On the other hand, that would add quite a lot of complexity
to phylink.

Maxime




* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-18  8:02   ` Maxime Chevallier
@ 2022-10-18  8:13     ` Russell King (Oracle)
  2022-10-18  9:20       ` Maxime Chevallier
  0 siblings, 1 reply; 9+ messages in thread
From: Russell King (Oracle) @ 2022-10-18  8:13 UTC
  To: Maxime Chevallier
  Cc: netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Andrew Lunn, Tobias Waldekranz, Oleksij Rempel,
	Jakub Kicinski, Jiri Pirko

On Tue, Oct 18, 2022 at 10:02:05AM +0200, Maxime Chevallier wrote:
> Hello Russell,
> 
> On Mon, 17 Oct 2022 10:24:49 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:
> > > 2) Changes in Phylink
> > > 
> > > This might be the tricky part, as we need to track several ports,
> > > possibly connected to different PHYs, to get their state. For now, I
> > > haven't prototyped any of this yet.  
> > 
> > The problem is _way_ larger than phylink. It's a fundamental
> > assumption throughout the net layer that there is a one-PHY to one-MAC
> > relationship. Phylink just adopts this because it is the established
> > norm, and trying to fix it is rather ridiculous without a use case.
> > 
> > See code such as the ethtool code, where the MAC and associated layers
> > are entirely bypassed with all the PHY-accessing ethtool commands, and
> > the commands are passed directly to phylib for the PHY registered
> > against the netdev.
> > 
> > We do have use cases though - consider a setup such as the mcbin with
> > the 3310 in SGMII mode on the fibre link and a copper SFP module plugged
> > in with its own PHY - a stacked PHY situation (which we don't support
> > right now). Which PHY should the MII ioctls, ethtool, and possibly the
> > PTP timestamp code be accessing with a copper SFP module plugged in?
> > 
> > This needs to be solved for your multi-PHY case, because you need to
> > deal with programming e.g. the link advertisement in both PHYs, not
> > just one - and with the above model, you have no choice which PHY gets
> > the call, it's always going to be the one registered with the netdev.
> > 
> > The point I'm making is that you're suggesting this is a phylink
> > issue, but it isn't, it's a generic networking layering bypass issue.
> > If the net code always forwarded the ethtool etc stuff to the MAC and
> > let the MAC make appropriate decisions about how these were handled,
> > then we would have a properly layered approach where each layer can
> > decide how a particular interface is implemented - to cope with
> > situations such as the one you describe.
> 
> I agree with all you say, and indeed this problem is a good opportunity
> IMO to consider the other use-cases like the one you mention and come
> up with a nice solution.

However, this isn't really "other use-cases" that I'm talking about
above, but a problem that needs solving for your case.

> Regarding ethtool bypassing the MAC layer and talking to phylib: since
> phylink has the overall view of the link, and abstracts the PHY away
> from the MAC, I would think it is a good place to manage this tree of
> PHYs/ports. On the other hand, that would add quite a lot of complexity
> to phylink.

phylink doesn't abstract the PHY from the networking layer. What we
have are these call paths through the layers:

net --> mac --> phylink --> phy
 |                           ^
 `---------------------------'
      (bypass call path)

That bypass call path will be a problem as soon as you start talking
about having more than one PHY for a MAC.

Yes, changing phylink fixes some of the issues, but doesn't get away
from the fundamental issue that both the MAC and phylink are bypassed
for certain paths.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-18  8:13     ` Russell King (Oracle)
@ 2022-10-18  9:20       ` Maxime Chevallier
  0 siblings, 0 replies; 9+ messages in thread
From: Maxime Chevallier @ 2022-10-18  9:20 UTC
  To: Russell King (Oracle)
  Cc: netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Andrew Lunn, Tobias Waldekranz, Oleksij Rempel,
	Jakub Kicinski, Jiri Pirko

On Tue, 18 Oct 2022 09:13:14 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Oct 18, 2022 at 10:02:05AM +0200, Maxime Chevallier wrote:
> > Hello Russell,
> > 
> > On Mon, 17 Oct 2022 10:24:49 +0100
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> >   
> > > On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier
> > > wrote:  
> > > > 2) Changes in Phylink
> > > > 
> > > > This might be the tricky part, as we need to track several
> > > > ports, possibly connected to different PHYs, to get their
> > > > state. For now, I haven't prototyped any of this yet.    
> > > 
> > > The problem is _way_ larger than phylink. It's a fundamental
> > > assumption throughout the net layer that there is a one-PHY to
> > > one-MAC relationship. Phylink just adopts this because it is the
> > > established norm, and trying to fix it is rather ridiculous
> > > without a use case.
> > > 
> > > See code such as the ethtool code, where the MAC and associated
> > > layers are entirely bypassed with all the PHY-accessing ethtool
> > > commands, and the commands are passed directly to phylib for the
> > > PHY registered against the netdev.
> > > 
> > > We do have use cases though - consider a setup such as the mcbin
> > > with the 3310 in SGMII mode on the fibre link and a copper SFP
> > > module plugged in with its own PHY - a stacked PHY situation (which
> > > we don't support right now). Which PHY should the MII ioctls,
> > > ethtool, and possibly the PTP timestamp code be accessing with a
> > > copper SFP module plugged in?
> > > 
> > > This needs to be solved for your multi-PHY case, because you need
> > > to deal with programming e.g. the link advertisement in both
> > > PHYs, not just one - and with the above model, you have no choice
> > > which PHY gets the call, it's always going to be the one
> > > registered with the netdev.
> > > 
> > > The point I'm making is that you're suggesting this is a phylink
> > > issue, but it isn't, it's a generic networking layering bypass
> > > issue. If the net code always forwarded the ethtool etc stuff to
> > > the MAC and let the MAC make appropriate decisions about how
> > > these were handled, then we would have a properly layered
> > > approach where each layer can decide how a particular interface
> > > is implemented - to cope with situations such as the one you
> > > describe.  
> > 
> > I agree with all you say, and indeed this problem is a good
> > opportunity IMO to consider the other use-cases like the one you
> > mention and come up with a nice solution.  
> 
> However, this isn't really "other use-cases" that I'm talking about
> above, but a problem that needs solving for your case.
> 
> > When you mention that ethtool bypasses the MAC layer and talks to
> > phylib, since phylink has the overall view of the link, and
> > abstracts the phy away from the MAC, I would think this is a good
> > place to manage this tree of PHYs/ports, but on the other hand
> > that's adding quite a lot of complexity to phylink.  
> 
> phylink doesn't abstract the PHY from the networking layer. What we
> have are these call paths through the layers:
> 
> net --> mac --> phylink --> phy
>  |                           ^
>  `---------------------------'
>       (bypass call path)
> 
> That bypass call path will be a problem as soon as you start talking
> about having more than one PHY for a MAC.
> 
> Yes, changing phylink fixes some of the issues, but doesn't get away
> from the fundamental issue that both the MAC and phylink are bypassed
> for certain paths.
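
The bypass problem can be modelled in a few lines. This is a minimal
sketch with hypothetical types: model_netdev, model_phy and
model_mac_ops are illustrative names, not kernel structures. A request
on the bypass path only reaches the PHY registered with the netdev. A
request routed through the MAC can instead be applied to every attached
PHY, for example to program the link advertisement in both PHYs.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PHYS 2

/* Hypothetical stand-in for a PHY: only the advertisement is modelled. */
struct model_phy {
	uint32_t advertising;
};

struct model_netdev;

/* Hypothetical MAC-level hook that a layered net core would call
 * instead of talking to the PHY directly. */
struct model_mac_ops {
	void (*set_advertising)(struct model_netdev *dev, uint32_t adv);
};

struct model_netdev {
	struct model_phy *phydev;        /* the single PHY the core knows about */
	struct model_phy phys[MAX_PHYS]; /* all PHYs actually wired to the MAC */
	size_t nphys;
	const struct model_mac_ops *mac_ops;
};

/* Bypass path (net -> phy): the MAC and phylink never see the request. */
static void core_set_advertising_bypass(struct model_netdev *dev, uint32_t adv)
{
	if (dev->phydev)
		dev->phydev->advertising = adv;
}

/* Layered path (net -> mac -> ...): the MAC decides how to apply it. */
static void core_set_advertising_layered(struct model_netdev *dev, uint32_t adv)
{
	dev->mac_ops->set_advertising(dev, adv);
}

/* A multi-PHY-aware MAC applies the advertisement to every PHY. */
static void mac_set_advertising_all(struct model_netdev *dev, uint32_t adv)
{
	for (size_t i = 0; i < dev->nphys; i++)
		dev->phys[i].advertising = adv;
}

static const struct model_mac_ops multi_phy_mac_ops = {
	.set_advertising = mac_set_advertising_all,
};
```

With two PHYs and phydev pointing at the first one, the bypass call
leaves the second PHY untouched. The layered call updates both.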

You're right, that will need to be addressed. Andrew's proposal of having
an identifier to address each part of the link (end-points or
intermediary PHYs/link-converters/PCSs) would be a first step towards
fixing this. We would then have to rewrite that call chain to make sure
we address the proper member, defaulting to the active PHY for backwards
compatibility, as Andrew said.
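
A minimal sketch of that addressing rule follows, under stated
assumptions. The link_member and link_chain types are hypothetical, and
so is the convention that identifier 0 means "unspecified". A request
without an identifier keeps today's behaviour and targets the active
PHY. A request with an identifier targets that exact member of the
chain, whether it is a PHY, a PCS or something else.

```c
#include <stddef.h>

/* Hypothetical kinds of link-chain members. */
enum link_member_type { LM_PHY, LM_PCS, LM_SFP, LM_CONVERTER };

struct link_member {
	unsigned int id;             /* hypothetical per-netdev identifier, non-zero */
	enum link_member_type type;
};

struct link_chain {
	struct link_member *members;
	size_t count;
	struct link_member *active_phy; /* what netdev->phydev points at today */
};

/* Assumption: 0 means the request carried no identifier. */
#define LINK_ID_UNSPEC 0u

/* Resolve the target of a request. Legacy requests (no identifier)
 * fall back to the active PHY for backwards compatibility. */
static struct link_member *link_chain_resolve(struct link_chain *chain,
					      unsigned int id)
{
	if (id == LINK_ID_UNSPEC)
		return chain->active_phy;

	for (size_t i = 0; i < chain->count; i++)
		if (chain->members[i].id == id)
			return &chain->members[i];

	return NULL; /* unknown identifier: the caller would reject the request */
}
```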

I'll poke around and try to come up with what this could look like on
the userspace side, perhaps with some ethtool examples, to get a
clearer view.

Thanks,

Maxime


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: Multi-PHYs and multiple-ports bonding support
  2022-10-17 13:03   ` Andrew Lunn
@ 2022-10-18 11:45     ` Maxime Chevallier
  0 siblings, 0 replies; 9+ messages in thread
From: Maxime Chevallier @ 2022-10-18 11:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King (Oracle),
	netdev, linux-arm-kernel, Thomas Petazzoni, Antoine Tenart,
	David S. Miller, Heiner Kallweit, Florian Fainelli,
	Vivien Didelot, Tobias Waldekranz, Oleksij Rempel,
	Jakub Kicinski

Hello Andrew,

On Mon, 17 Oct 2022 15:03:59 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Mon, Oct 17, 2022 at 10:24:49AM +0100, Russell King (Oracle) wrote:
> > On Mon, Oct 17, 2022 at 10:51:00AM +0200, Maxime Chevallier wrote:  
> > > 2) Changes in Phylink
> > > 
> > > This might be the tricky part, as we need to track several ports,
> > > possibly connected to different PHYs, to get their state. For
> > > now, I haven't prototyped any of this yet.  
> > 
> > The problem is _way_ larger than phylink. It's a fundamental
> > throughout the net layer that there is one-PHY to one-MAC
> > relationship. Phylink just adopts this because it is the
> > established norm, and trying to fix it is rather ridiculous without
> > a use case.
> > 
> > See code such as the ethtool code, where the MAC and associated
> > layers are entirely bypassed with all the PHY-accessing ethtool
> > commands and the commands are passed directly to phylib for the PHY
> > registered against the netdev.  
> 
> We probably need to model the MII MUX. We can then have netdev->phydev
> and netdev->sfp_bus point to the MUX, which then defers to the
> currently active PHY/SFP for backwards compatibility. Additionally,
> for netlink ethtool, we can add a new property which allows a specific
> PHY/SFP hanging off the MUX to be addressed.

That's a good idea! I find it pretty elegant, and it would also be the
right place to implement the switching logic.
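
Here is a minimal sketch of the MUX idea. It assumes a hypothetical
struct mii_mux that owns the PHYs sharing one MII bus. Legacy users
asking for "the PHY" get the active one, which is what netdev->phydev
would resolve to. All other PHYs are kept isolated from the bus, and a
simple failover policy lives in the MUX itself. All names are
illustrative, not existing kernel APIs. A real policy would probably
also need debouncing and a preferred-port setting.

```c
#include <stdbool.h>
#include <stddef.h>

#define MUX_MAX_PORTS 4

/* Hypothetical model of a PHY sitting behind the MUX. */
struct mux_phy {
	const char *name;
	bool link_up;
	bool isolated; /* not driving the shared MII bus */
};

struct mii_mux {
	struct mux_phy *phys[MUX_MAX_PORTS];
	size_t nphys;
	size_t active;
};

/* What legacy callers of netdev->phydev would be handed. */
static struct mux_phy *mii_mux_active_phy(struct mii_mux *mux)
{
	return mux->phys[mux->active];
}

/* Only one PHY may drive the MII bus: isolate every other one. */
static void mii_mux_select(struct mii_mux *mux, size_t idx)
{
	for (size_t i = 0; i < mux->nphys; i++)
		mux->phys[i]->isolated = (i != idx);
	mux->active = idx;
}

/* Failover policy: stay on the active PHY while its link is up,
 * otherwise switch to the first PHY that reports link. */
static void mii_mux_link_changed(struct mii_mux *mux)
{
	if (mii_mux_active_phy(mux)->link_up)
		return;

	for (size_t i = 0; i < mux->nphys; i++) {
		if (mux->phys[i]->link_up) {
			mii_mux_select(mux, i);
			return;
		}
	}
}
```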

> Modeling the MUX probably helps us with the overall architecture.  As
> Maxime described, there are at least two different architectures: the
> MUX is between the MAC and the PHYs, and the MUX is inside the PHY
> between the host interface and the line interfaces. There are at least
> 4 PHYs like this.
> 
> We also have Russell's problem of two PHYs on one path. It would be
> nice to solve that at the same time, which the additional identifier
> attribute should help solve.
> 
> I would probably start this work from the uAPI. How does the uAPI
> work?

From the documentation of struct ethtool_link_settings, there already
seems to be an attempt to support this:

"* Some hardware interfaces may have multiple PHYs and/or physical
 * connectors fitted or do not allow the driver to detect which are
 * fitted.  For these interfaces @port and/or @phy_address may be
 * writable, possibly dependent on @autoneg being %AUTONEG_DISABLE.
 * Otherwise, attempts to write different values may be ignored or
 * rejected.
"

https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/ethtool.h#L2047

However, this doesn't allow clearly enumerating the available
interfaces, and it relies on the MDIO address + port. It also doesn't
address chained PHYs, as we don't have a clear view of the topology
(but do we need one?)
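
For reference, this is roughly what selecting a PHY through the
existing uAPI amounts to. The sketch uses the real struct
ethtool_link_settings, ETHTOOL_SLINKSETTINGS, PORT_TP and
AUTONEG_DISABLE from <linux/ethtool.h>. The helper name
prepare_phy_select() is hypothetical, and it only fills in the request
rather than issuing the SIOCETHTOOL ioctl. The only selectors available
are the connector type and an MDIO address. Nothing tells userspace
which MDIO bus that address lives on, or how the PHYs are chained, and
whether a driver honors such a write is driver-specific, as the quoted
comment says.

```c
#include <linux/ethtool.h>

/* Hypothetical helper: turn settings previously read with
 * ETHTOOL_GLINKSETTINGS into a request selecting the PHY at
 * @phy_address on connector @port. Sending it is left to the caller. */
static void prepare_phy_select(struct ethtool_link_settings *req,
			       __u8 port, __u8 phy_address)
{
	req->cmd = ETHTOOL_SLINKSETTINGS;
	req->port = port;
	req->phy_address = phy_address;
	/* Per the documentation, these fields may only be writable
	 * with autonegotiation disabled. */
	req->autoneg = AUTONEG_DISABLE;
}
```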

I very much like the concept of having a way to address the interfaces,
or the individual parts of the link chain.

Could we introduce a new ethtool command to enumerate the ports and
discover the topology, and another one to get the equivalent of
ethtool_link_settings for each block in the chain?
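
As an illustration only, an enumeration command could return a flat
list of nodes, each carrying an identifier and its parent's identifier.
The topo_node layout and the helper names below are hypothetical. This
is enough to express both MUX architectures Andrew describes, as well
as Russell's stacked case (3310 followed by a copper SFP module with its
own PHY). It also lets userspace work out how many PHYs sit between a
given port and the MAC.

```c
#include <stddef.h>

/* Hypothetical node types an "enumerate link topology" command could report. */
enum topo_type { TOPO_MAC, TOPO_MUX, TOPO_PHY, TOPO_SFP, TOPO_PORT };

struct topo_node {
	unsigned int id;        /* passed back by userspace in later requests */
	unsigned int parent_id; /* 0 for the MAC, which is the root */
	enum topo_type type;
	const char *name;
};

static const struct topo_node *topo_find(const struct topo_node *nodes,
					 size_t n, unsigned int id)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].id == id)
			return &nodes[i];
	return NULL;
}

/* Count the PHYs on the path from @port_id up to the MAC. The hop
 * limit guards against malformed (cyclic) input. */
static size_t topo_phys_on_path(const struct topo_node *nodes, size_t n,
				unsigned int port_id)
{
	size_t phys = 0;
	const struct topo_node *node = topo_find(nodes, n, port_id);

	for (size_t hops = 0; node && node->type != TOPO_MAC && hops < n; hops++) {
		if (node->type == TOPO_PHY)
			phys++;
		node = topo_find(nodes, n, node->parent_id);
	}
	return phys;
}
```

In the stacked case the copper port sits behind two PHYs. In the
MAC-side MUX case, each port sits behind a single PHY.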

I'll try to prototype a few things to get a clearer picture...

Thanks a lot for your input,

Maxime

> 	Andrew



end of thread

Thread overview: 9+ messages
2022-10-17  8:51 Multi-PHYs and multiple-ports bonding support Maxime Chevallier
2022-10-17  9:24 ` Russell King (Oracle)
2022-10-17 13:03   ` Andrew Lunn
2022-10-18 11:45     ` Maxime Chevallier
2022-10-18  8:02   ` Maxime Chevallier
2022-10-18  8:13     ` Russell King (Oracle)
2022-10-18  9:20       ` Maxime Chevallier
2022-10-17  9:45 ` Jiri Pirko
2022-10-17 10:03 ` Oleksij Rempel
