netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] Rate adaptation for Felix DSA switch
@ 2020-01-16 18:19 Vladimir Oltean
  2020-01-16 18:19 ` [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result Vladimir Oltean
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Vladimir Oltean @ 2020-01-16 18:19 UTC (permalink / raw)
  To: davem, netdev, linux
  Cc: andrew, f.fainelli, vivien.didelot, claudiu.manoil, Vladimir Oltean

When operating the MAC at 2.5Gbps (2500Base-X and USXGMII/QSXGMII) and
in combination with certain PHYs, it is possible that the copper side
may operate at lower link speeds. In this case, it is the PHY who has a
MAC inside of it that emits pause frames towards the switch's MAC,
telling it to slow down so that the transmission is lossless.

These patches are the support needed for the switch side of things to
work.

Alex Marginean (2):
  net: dsa: felix: Handle PAUSE RX regardless of AN result
  net: dsa: felix: Allow PHY to AN 10/100/1000 with 2500 serdes link

 drivers/net/dsa/ocelot/felix.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result
  2020-01-16 18:19 [PATCH net-next 0/2] Rate adaptation for Felix DSA switch Vladimir Oltean
@ 2020-01-16 18:19 ` Vladimir Oltean
  2020-01-20  9:43   ` Russell King - ARM Linux admin
  2020-01-16 18:19 ` [PATCH net-next 2/2] net: dsa: felix: Allow PHY to AN 10/100/1000 with 2500 serdes link Vladimir Oltean
  2020-01-19 15:00 ` [PATCH net-next 0/2] Rate adaptation for Felix DSA switch David Miller
  2 siblings, 1 reply; 7+ messages in thread
From: Vladimir Oltean @ 2020-01-16 18:19 UTC (permalink / raw)
  To: davem, netdev, linux
  Cc: andrew, f.fainelli, vivien.didelot, claudiu.manoil,
	Alex Marginean, Vladimir Oltean

From: Alex Marginean <alexandru.marginean@nxp.com>

Flow control is used with 2500Base-X and AQR PHYs to do rate adaptation
between line side 100/1000 links and MAC running at 2.5G.

This is independent of the flow control configuration settled on line
side though AN.

In general, allowing the MAC to handle flow control even if not
negotiated with the link partner should not be a problem, so the patch
just enables it in all cases.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/ocelot/felix.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index d6ee089dbfe1..46334436a8fe 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -222,8 +222,12 @@ static void felix_phylink_mac_config(struct dsa_switch *ds, int port,
 	 * specification in incoming pause frames.
 	 */
 	mac_fc_cfg = SYS_MAC_FC_CFG_FC_LINK_SPEED(state->speed);
-	if (state->pause & MLO_PAUSE_RX)
-		mac_fc_cfg |= SYS_MAC_FC_CFG_RX_FC_ENA;
+
+	/* handle Rx pause in all cases, with 2500base-X this is used for rate
+	 * adaptation.
+	 */
+	mac_fc_cfg |= SYS_MAC_FC_CFG_RX_FC_ENA;
+
 	if (state->pause & MLO_PAUSE_TX)
 		mac_fc_cfg |= SYS_MAC_FC_CFG_TX_FC_ENA |
 			      SYS_MAC_FC_CFG_PAUSE_VAL_CFG(0xffff) |
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next 2/2] net: dsa: felix: Allow PHY to AN 10/100/1000 with 2500 serdes link
  2020-01-16 18:19 [PATCH net-next 0/2] Rate adaptation for Felix DSA switch Vladimir Oltean
  2020-01-16 18:19 ` [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result Vladimir Oltean
@ 2020-01-16 18:19 ` Vladimir Oltean
  2020-01-19 15:00 ` [PATCH net-next 0/2] Rate adaptation for Felix DSA switch David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: Vladimir Oltean @ 2020-01-16 18:19 UTC (permalink / raw)
  To: davem, netdev, linux
  Cc: andrew, f.fainelli, vivien.didelot, claudiu.manoil,
	Alex Marginean, Vladimir Oltean

From: Alex Marginean <alexandru.marginean@nxp.com>

If the serdes link is set to 2500 using interfce type 2500base-X, lower
link speeds over on the line side should still be supported.
Rate adaptation is done out of band, in our case using AQR PHYs this is
done using flow control.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/ocelot/felix.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 46334436a8fe..8108aaef96f8 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -172,11 +172,10 @@ static void felix_phylink_validate(struct dsa_switch *ds, int port,
 	phylink_set(mask, Autoneg);
 	phylink_set(mask, Pause);
 	phylink_set(mask, Asym_Pause);
-	if (state->interface != PHY_INTERFACE_MODE_2500BASEX) {
-		phylink_set(mask, 10baseT_Full);
-		phylink_set(mask, 100baseT_Full);
-		phylink_set(mask, 1000baseT_Full);
-	}
+	phylink_set(mask, 10baseT_Full);
+	phylink_set(mask, 100baseT_Full);
+	phylink_set(mask, 1000baseT_Full);
+
 	/* The internal ports that run at 2.5G are overclocked GMII */
 	if (state->interface == PHY_INTERFACE_MODE_GMII ||
 	    state->interface == PHY_INTERFACE_MODE_2500BASEX ||
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 0/2] Rate adaptation for Felix DSA switch
  2020-01-16 18:19 [PATCH net-next 0/2] Rate adaptation for Felix DSA switch Vladimir Oltean
  2020-01-16 18:19 ` [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result Vladimir Oltean
  2020-01-16 18:19 ` [PATCH net-next 2/2] net: dsa: felix: Allow PHY to AN 10/100/1000 with 2500 serdes link Vladimir Oltean
@ 2020-01-19 15:00 ` David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2020-01-19 15:00 UTC (permalink / raw)
  To: olteanv; +Cc: netdev, linux, andrew, f.fainelli, vivien.didelot, claudiu.manoil

From: Vladimir Oltean <olteanv@gmail.com>
Date: Thu, 16 Jan 2020 20:19:31 +0200

> When operating the MAC at 2.5Gbps (2500Base-X and USXGMII/QSXGMII) and
> in combination with certain PHYs, it is possible that the copper side
> may operate at lower link speeds. In this case, it is the PHY who has a
> MAC inside of it that emits pause frames towards the switch's MAC,
> telling it to slow down so that the transmission is lossless.
> 
> These patches are the support needed for the switch side of things to
> work.

Series applied, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result
  2020-01-16 18:19 ` [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result Vladimir Oltean
@ 2020-01-20  9:43   ` Russell King - ARM Linux admin
  2020-01-21  8:18     ` Alexandru Marginean
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2020-01-20  9:43 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: davem, netdev, andrew, f.fainelli, vivien.didelot,
	claudiu.manoil, Alex Marginean, Vladimir Oltean

On Thu, Jan 16, 2020 at 08:19:32PM +0200, Vladimir Oltean wrote:
> From: Alex Marginean <alexandru.marginean@nxp.com>
> 
> Flow control is used with 2500Base-X and AQR PHYs to do rate adaptation
> between line side 100/1000 links and MAC running at 2.5G.
> 
> This is independent of the flow control configuration settled on line
> side though AN.
> 
> In general, allowing the MAC to handle flow control even if not
> negotiated with the link partner should not be a problem, so the patch
> just enables it in all cases.
> 
> Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

I think this is not the best approach - you're working around the
issue in your network driver, rather than recognising that it's a
larger problem than just your network driver.  Rate adaption is
present in other PHYs using exactly the same mechanism so why do we
want to hack around this in each network driver?  It is a property
of the PHY, not of the network driver.

Surely it not be better to address this in phylib/phylink - after
all, there are several aspects to this:

1) separation of the MAC configuration (reported to the MAC) from
   the negotiation results (reported to the user).
2) we need the MAC to be able to receive and act on flow control.
3) we need to report the correct speed setting to the MAC.

I already have patches to improve the current phylib method of
reporting the flow control information to MAC drivers with the resolved
flow state rather than just the current link partner advertisement
bits, which should make (2) fairly easy to achieve.  (1) and (3) will
require additional work.

> ---
>  drivers/net/dsa/ocelot/felix.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
> index d6ee089dbfe1..46334436a8fe 100644
> --- a/drivers/net/dsa/ocelot/felix.c
> +++ b/drivers/net/dsa/ocelot/felix.c
> @@ -222,8 +222,12 @@ static void felix_phylink_mac_config(struct dsa_switch *ds, int port,
>  	 * specification in incoming pause frames.
>  	 */
>  	mac_fc_cfg = SYS_MAC_FC_CFG_FC_LINK_SPEED(state->speed);
> -	if (state->pause & MLO_PAUSE_RX)
> -		mac_fc_cfg |= SYS_MAC_FC_CFG_RX_FC_ENA;
> +
> +	/* handle Rx pause in all cases, with 2500base-X this is used for rate
> +	 * adaptation.
> +	 */
> +	mac_fc_cfg |= SYS_MAC_FC_CFG_RX_FC_ENA;
> +
>  	if (state->pause & MLO_PAUSE_TX)
>  		mac_fc_cfg |= SYS_MAC_FC_CFG_TX_FC_ENA |
>  			      SYS_MAC_FC_CFG_PAUSE_VAL_CFG(0xffff) |
> -- 
> 2.17.1
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result
  2020-01-20  9:43   ` Russell King - ARM Linux admin
@ 2020-01-21  8:18     ` Alexandru Marginean
  2020-01-21 11:55       ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 7+ messages in thread
From: Alexandru Marginean @ 2020-01-21  8:18 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Vladimir Oltean
  Cc: davem, netdev, andrew, f.fainelli, vivien.didelot,
	Claudiu Manoil, Vladimir Oltean

resent from NXP account, apparently google thinks this is just spam.

On 1/20/2020 10:43 AM, Russell King - ARM Linux admin wrote:
> On Thu, Jan 16, 2020 at 08:19:32PM +0200, Vladimir Oltean wrote:
>> From: Alex Marginean <alexandru.marginean@nxp.com>
>>
>> Flow control is used with 2500Base-X and AQR PHYs to do rate adaptation
>> between line side 100/1000 links and MAC running at 2.5G.
>>
>> This is independent of the flow control configuration settled on line
>> side though AN.
>>
>> In general, allowing the MAC to handle flow control even if not
>> negotiated with the link partner should not be a problem, so the patch
>> just enables it in all cases.
>>
>> Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
>> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> I think this is not the best approach - you're working around the
> issue in your network driver, rather than recognising that it's a
> larger problem than just your network driver.

Both are true, this is a work-around so our system isn't functionally 
broken and it's clear that the general issue has to be handled elsewhere.


> Rate adaption is present in other PHYs using exactly the same
> mechanism so why do we want to hack around this in each network
> driver?  It is a property of the PHY, not of the network driver.
> 
> Surely it not be better to address this in phylib/phylink - after
> all, there are several aspects to this:
> 
> 1) separation of the MAC configuration (reported to the MAC) from
>     the negotiation results (reported to the user).
> 2) we need the MAC to be able to receive and act on flow control.
> 3) we need to report the correct speed setting to the MAC.
> 
> I already have patches to improve the current phylib method of
> reporting the flow control information to MAC drivers with the resolved
> flow state rather than just the current link partner advertisement
> bits, which should make (2) fairly easy to achieve.  (1) and (3) will
> require additional work.

I think this won't be trivial to address, let's see how it will go.

At the PHY level we would add a capability indicating that flow control 
can be used as a way to do rate adaptation.  If that's just a capability 
bit declared at probing, it should probably imply that the PHY driver 
also provides run-time configuration.  This way flow control can be 
enabled/disabled so that the PHY configuration is adjusted based on 
phylink resolved state, this is probably the ideal case.

I'm not sure how many PHYs are going to be that flexible though, we may 
need quirks for:
- the PHY supports flow control but it's not configurable, potentially 
baked into the firmware,
- flow control is available for certain interface types but not for others,
- flow control is available for certain link speeds but not for others, 
or other restrictions like these.

Should PHY level indication of flow control support be a static flag, or 
a function of interface type/link speed?  The latter would allow the PHY 
driver to address any quirkiness internally.

We've been working with Aquantia PHYs and they are somewhat quirky, it 
would be useful to hear about other PHYs if anyone has any first-hand 
experience with other PHYs doing flow control.

Phylink should then take in MAC capabilites, PHY capabilities and if:
- MAC supports flow control configuration or Rx is always on, and
- PHY supports flow control (either as a generic capability flag plus 
optional quirks, or as a function of interface type/link speed), and
- the current interface type does not allow dealing with rate adaptation 
internally (like XFI, 2500base-x as currently used in the kernel), and maybe
- link speed on line side is below the capacity of the system interface,
then instruct the PHY to do rate adaptation using flow control and 
enable flow control Rx in the MAC.

If the conditions aren't met maybe phylink should issue a warning or 
some sort of indication to the user.  If the system ends up with a 1G 
link on line side, XFI and no rate adaptation, some networking protocols 
aren't going to work too well.

At the user level I think we're going to have to present more 
information, what the peers advertised during AN, what the result of the 
AN was and also what the configuration of the MAC is, as now this could 
be different due to rate adaptation.  That way the user can tell for 
instance that flow control was disabled as part of AN but it is enabled 
in the MAC for the purpose of doing rate adaptation between MAC and PHY.

ethtool -A should probably not control the actual MAC configuration 
either, to keep rate adaptation sane, but rather be part of the phylink 
algorithm.  This is a little tricky to do, given that ethtool ops are 
now implemented by Eth drivers.

Is this how you see it working out?

Thanks!
Alex

> 
>> ---
>>   drivers/net/dsa/ocelot/felix.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
>> index d6ee089dbfe1..46334436a8fe 100644
>> --- a/drivers/net/dsa/ocelot/felix.c
>> +++ b/drivers/net/dsa/ocelot/felix.c
>> @@ -222,8 +222,12 @@ static void felix_phylink_mac_config(struct dsa_switch *ds, int port,
>>   	 * specification in incoming pause frames.
>>   	 */
>>   	mac_fc_cfg = SYS_MAC_FC_CFG_FC_LINK_SPEED(state->speed);
>> -	if (state->pause & MLO_PAUSE_RX)
>> -		mac_fc_cfg |= SYS_MAC_FC_CFG_RX_FC_ENA;
>> +
>> +	/* handle Rx pause in all cases, with 2500base-X this is used for rate
>> +	 * adaptation.
>> +	 */
>> +	mac_fc_cfg |= SYS_MAC_FC_CFG_RX_FC_ENA;
>> +
>>   	if (state->pause & MLO_PAUSE_TX)
>>   		mac_fc_cfg |= SYS_MAC_FC_CFG_TX_FC_ENA |
>>   			      SYS_MAC_FC_CFG_PAUSE_VAL_CFG(0xffff) |
>> -- 
>> 2.17.1
>>
>>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result
  2020-01-21  8:18     ` Alexandru Marginean
@ 2020-01-21 11:55       ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2020-01-21 11:55 UTC (permalink / raw)
  To: Alexandru Marginean
  Cc: Vladimir Oltean, davem, netdev, andrew, f.fainelli,
	vivien.didelot, Claudiu Manoil, Vladimir Oltean

On Tue, Jan 21, 2020 at 08:18:10AM +0000, Alexandru Marginean wrote:
> resent from NXP account, apparently google thinks this is just spam.
> 
> On 1/20/2020 10:43 AM, Russell King - ARM Linux admin wrote:
> > On Thu, Jan 16, 2020 at 08:19:32PM +0200, Vladimir Oltean wrote:
> >> From: Alex Marginean <alexandru.marginean@nxp.com>
> >>
> >> Flow control is used with 2500Base-X and AQR PHYs to do rate adaptation
> >> between line side 100/1000 links and MAC running at 2.5G.
> >>
> >> This is independent of the flow control configuration settled on line
> >> side though AN.
> >>
> >> In general, allowing the MAC to handle flow control even if not
> >> negotiated with the link partner should not be a problem, so the patch
> >> just enables it in all cases.
> >>
> >> Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
> >> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > I think this is not the best approach - you're working around the
> > issue in your network driver, rather than recognising that it's a
> > larger problem than just your network driver.
> 
> Both are true, this is a work-around so our system isn't functionally 
> broken and it's clear that the general issue has to be handled elsewhere.
> 
> 
> > Rate adaption is present in other PHYs using exactly the same
> > mechanism so why do we want to hack around this in each network
> > driver?  It is a property of the PHY, not of the network driver.
> > 
> > Surely it not be better to address this in phylib/phylink - after
> > all, there are several aspects to this:
> > 
> > 1) separation of the MAC configuration (reported to the MAC) from
> >     the negotiation results (reported to the user).
> > 2) we need the MAC to be able to receive and act on flow control.
> > 3) we need to report the correct speed setting to the MAC.
> > 
> > I already have patches to improve the current phylib method of
> > reporting the flow control information to MAC drivers with the resolved
> > flow state rather than just the current link partner advertisement
> > bits, which should make (2) fairly easy to achieve.  (1) and (3) will
> > require additional work.
> 
> I think this won't be trivial to address, let's see how it will go.

I think you're trying to inflate the complexity of the problem.

We've had "rate adaption" in ethernet networks for years. Consider a
10/100/1G switch, where some devices are connected at 1G speed and
others at a slower speed.

The switch has to do "rate adaption" to allow devices operating at
dis-similar speeds to communicate with each other.

Whether flow control is used or not depends on the supported
capabilities of each end of each link, and also the user configuration
of the interface.

So, rate adaption itself is nothing new.  However, where it is being
performed is new.

> At the PHY level we would add a capability indicating that flow control 
> can be used as a way to do rate adaptation.  If that's just a capability 
> bit declared at probing, it should probably imply that the PHY driver 
> also provides run-time configuration.  This way flow control can be 
> enabled/disabled so that the PHY configuration is adjusted based on 
> phylink resolved state, this is probably the ideal case.
> 
> I'm not sure how many PHYs are going to be that flexible though, we may 
> need quirks for:
> - the PHY supports flow control but it's not configurable, potentially 
> baked into the firmware,

That isn't a problem.

We have situations today where the user disables flow control on the
interface despite it having been negotiated with the link partner.  An
example of this is a buggy switch, which advertises flow control, but
if it's used, the switch misbehaves.  The normal solution adopted is
to disable flow control via ethtool -A.  This may or may not change
the interface's advertisement (that seems to be driver specific).

So we can already end up with flow control advertised and negotiated,
but disabled at the interface.  No one has complained about that.

Let me present a couple of examples:

 MAC <--10GBASE-R--> PHY <--1G--> SWITCH <--100M--> PARTNER

If the MAC is flow control capable, the PHY performs rate adaption,
but the switch is not flow control capable.  Whether or not the
partner supports flow control is irrelevent to my scenario.

Flow control receive is force-enabled at the MAC as we've decided
that should happen when the PHY is performing rate adaption.

The MAC sends packets, and fills the PHY rate adaption buffer.  The
PHY sends flow control packets to the MAC to stop transmission.  No
packet loss occurs at this stage.

The PHY sends the packets out at 1G line rate to the switch, which
fills its own rate adaption buffer for sending on to the 100M partner.
There is no flow control, so the switch starts dropping packets.

Now consider:

 MAC <--10GBASE-R--> PHY <--10G--> SWITCH <--100M--> PARTNER

with the same capabilities.  The PHY is not performing rate adaption.
Let's say that the PHY doesn't have that capability.

The MAC sends packets, which are passed through by the PHY to the
switch, filling its rate adaption buffer.  As the switch has no flow
control, it starts dropping packets.

Both scenarios are not very different - packets get lost in both
situations, but there are two differences:

1) where they are lost is slightly different.
2) the latency of the connection is different.

To think about the second point, consider that the higher layers on
the MAC side want to send a retrainsmission of some of the packets
that were dropped.

In the first case, the MAC is stalled from sending packets until the
PHY permits it, meaning that there's packets held at the MAC until
the PHY can send them at 1G speed - and probably many of them will
be dropped by the switch.  So, we have to wait for these to be sent
at 1G speed, assuming that the retransmission is sent out in queued
packet order.

In the second case, all packets will be sent at 10G speed and
discarded by the switch once the switch has filled its buffers, but
the retransmission is not delayed.

That said, I'm not sufficiently into the details of networking to know
whether one situation is better than the other, I suspect that depends
on how the protocol that is suffering packet loss has been designed.

> - flow control is available for certain interface types but not for others,
> - flow control is available for certain link speeds but not for others, 
> or other restrictions like these.

Given that flow control is fundamentally just a normal ethernet packet
with a certain destination address, I don't see that being an issue.
In any case, if flow control were not available, this is no different
from Ethernet without flow control.

> Should PHY level indication of flow control support be a static flag, or 
> a function of interface type/link speed?  The latter would allow the PHY 
> driver to address any quirkiness internally.

It depends on the configuration of the PHY.  For the Marvell 88x3310,
the host interface can be configured in several modes, which include:

1) switching between 10GBASE-R, 5GBASE-R, 2500BASE-X and SGMII on the
   fly depending on the negotiated link speed (the MAC side interface
   changes.)

2) fixed 10GBASE-R with rate adaption, which *may* include the PHY
   sending flow control when the rate adaption buffers fill up.  Rate
   adaption will only be used if there are dis-similar speeds.  From
   what I read, some PHYs may not be capable of flow control in this
   mode, and where flow control is not used, apparently the IPG
   "should" be extended.

So:
- rate adaption is a function of how the PHY is configured and the
  negotiated link speed.
- whether flow control should be used is a separate property from
  whether rate adaption is enabled.
- if flow control is not used, the IPG in use by the MAC "should" be
  extended to match the PHY egress rate.

> We've been working with Aquantia PHYs and they are somewhat quirky, it 
> would be useful to hear about other PHYs if anyone has any first-hand 
> experience with other PHYs doing flow control.
> 
> Phylink should then take in MAC capabilites, PHY capabilities and if:
> - MAC supports flow control configuration or Rx is always on, and
> - PHY supports flow control (either as a generic capability flag plus 
> optional quirks, or as a function of interface type/link speed), and
> - the current interface type does not allow dealing with rate adaptation 
> internally (like XFI, 2500base-x as currently used in the kernel), and maybe
> - link speed on line side is below the capacity of the system interface,
> then instruct the PHY to do rate adaptation using flow control and 
> enable flow control Rx in the MAC.
> 
> If the conditions aren't met maybe phylink should issue a warning or 
> some sort of indication to the user.  If the system ends up with a 1G 
> link on line side, XFI and no rate adaptation, some networking protocols 
> aren't going to work too well.

If a system has 1G on the line side but XFI on the MAC side, and no
rate adaption, then it simply can't work.  Rate adaption is providing
themechanism to allow dis-similar speeds on either side of the block
doing rate adaption to be able to communicate.  So, I think you
actually mean flow control here.

I'd say the networking protocols haven't been well designed then -
Ethernet without flow control is a lossy networking system.

> At the user level I think we're going to have to present more 
> information, what the peers advertised during AN, what the result of the 
> AN was and also what the configuration of the MAC is, as now this could 
> be different due to rate adaptation.  That way the user can tell for 
> instance that flow control was disabled as part of AN but it is enabled 
> in the MAC for the purpose of doing rate adaptation between MAC and PHY.

Ethtool gives the following information:

- via ksettings API, access to the negotiation link modes.  Ethtool
  reports the link modes that each end advertised, and also via
  ethtool -a, calculates the resolution from the advertised link
  modes reported via the ksettings API.  In no way does this reflect
  how the MAC was configured.

- via the pauseparam API, access to the manual configuration of the
  MAC - whether the results of AN are configured to be used, and the
  forced flow control settings if pause AN is disabled.

So, there is already no way today to actually tell what the MAC is
doing with respect to flow control.

> ethtool -A should probably not control the actual MAC configuration 
> either, to keep rate adaptation sane, but rather be part of the phylink 
> algorithm.  This is a little tricky to do, given that ethtool ops are 
> now implemented by Eth drivers.

ethtool -A should _definitely_ continue to control the MAC
configuration, so that the user can decide to have flow control
disabled if there is a switch that causes problems when flow
control is in use (they exist), and without that ability, the users
only other solution is to throw out the problematical switch.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-01-21 11:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-16 18:19 [PATCH net-next 0/2] Rate adaptation for Felix DSA switch Vladimir Oltean
2020-01-16 18:19 ` [PATCH net-next 1/2] net: dsa: felix: Handle PAUSE RX regardless of AN result Vladimir Oltean
2020-01-20  9:43   ` Russell King - ARM Linux admin
2020-01-21  8:18     ` Alexandru Marginean
2020-01-21 11:55       ` Russell King - ARM Linux admin
2020-01-16 18:19 ` [PATCH net-next 2/2] net: dsa: felix: Allow PHY to AN 10/100/1000 with 2500 serdes link Vladimir Oltean
2020-01-19 15:00 ` [PATCH net-next 0/2] Rate adaptation for Felix DSA switch David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).