All of lore.kernel.org
 help / color / mirror / Atom feed
* Ethernet padding - ti_cpsw vs DSA tail tag
@ 2021-05-31 12:40 Ben Hutchings
  2021-05-31 14:29 ` Vladimir Oltean
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Hutchings @ 2021-05-31 12:40 UTC (permalink / raw)
  To: netdev
  Cc: Grygorii Strashko, linux-omap, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean

I'm working on a system that uses a TI Sitara SoC with one of its
Ethernet ports connected to the host port of a Microchip KSZ8795
switch.  I'm updating the kernel from 4.14.y to 5.10.y.  Currently I
am using the ti_cpsw driver, but it looks like the ti_cpsw_new driver
has the same issue.

The Microchip switch expects a tail tag on ingress from the host port
to control which external port(s) to forward to.  This must appear
immediately before the frame checksum.  The DSA core correctly pads
outgoing skbs to at least 60 bytes before tag_ksz appends the tag.

However, since commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min
eth packet size"), the cpsw driver pads outgoing skbs to at least 64
bytes.  This means that in smaller packets the tag byte is no longer
at the tail.

It's not obvious to me where this should be fixed.  Should drivers
that pad in ndo_start_xmit be aware of any tail tag that needs to be
moved?  Should DSA be aware that a lower driver has a minimum size >
60 bytes?

Ben.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Ethernet padding - ti_cpsw vs DSA tail tag
  2021-05-31 12:40 Ethernet padding - ti_cpsw vs DSA tail tag Ben Hutchings
@ 2021-05-31 14:29 ` Vladimir Oltean
  2021-05-31 15:22   ` Florian Fainelli
  2021-06-11 14:12   ` Grygorii Strashko
  0 siblings, 2 replies; 4+ messages in thread
From: Vladimir Oltean @ 2021-05-31 14:29 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, Grygorii Strashko, linux-omap, Andrew Lunn,
	Vivien Didelot, Florian Fainelli

Hi Ben,

On Mon, May 31, 2021 at 02:40:52PM +0200, Ben Hutchings wrote:
> I'm working on a system that uses a TI Sitara SoC with one of its
> Ethernet ports connected to the host port of a Microchip KSZ8795
> switch.  I'm updating the kernel from 4.14.y to 5.10.y.  Currently I
> am using the ti_cpsw driver, but it looks like the ti_cpsw_new driver
> has the same issue.
> 
> The Microchip switch expects a tail tag on ingress from the host port
> to control which external port(s) to forward to.  This must appear
> immediately before the frame checksum.  The DSA core correctly pads
> outgoing skbs to at least 60 bytes before tag_ksz appends the tag.
> 
> However, since commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min
> eth packet size"), the cpsw driver pads outgoing skbs to at least 64
> bytes.  This means that in smaller packets the tag byte is no longer
> at the tail.
> 
> It's not obvious to me where this should be fixed.  Should drivers
> that pad in ndo_start_xmit be aware of any tail tag that needs to be
> moved?  Should DSA be aware that a lower driver has a minimum size >
> 60 bytes?

These are good questions.

In principle, DSA needs a hint from the master driver for tail taggers
to work properly. We should pad to ETH_ZLEN + <the hint value> before
inserting the tail tag. This is for correctness, to ensure we do not
operate in marginal conditions which are not guaranteed to work.

A naive approach would be to take the hint from master->min_mtu.
However, the first issue that appears is that the dev->min_mtu value is
not always set quite correctly.

The MTU in general measures the number of bytes in the L2 payload (i.e.
not counting the Ethernet + VLAN header, nor FCS). The DSA tag is
considered to be a part of the L2 payload from the perspective of a
DSA-unaware master.

But ether_setup() sets up dev->min_mtu by default to ETH_MIN_MTU (68),
which cites RFC791. This says:

    Every internet module must be able to forward a datagram of 68
    octets without further fragmentation.  This is because an internet
    header may be up to 60 octets, and the minimum fragment is 8 octets.

But many drivers simply don't set dev->min_mtu = 0, even if they support
sending minimum-sized Ethernet frames. Many set dev->min_mtu to ETH_ZLEN,
proving nothing except the fact that they don't understand that the
Ethernet header should not be counted by the MTU anyway.

So to work with these drivers which leave dev->min_mtu = ETH_MIN_MTU, we
would have to pad the packets in DSA to ETH_ZLEN + ETH_MIN_MTU. This is
not quite ideal, so even if it would be the correct approach, a large
amount of drivers would have to be converted to set dev->min_mtu = 0
before we could consider switching to that and not have too many
regressions.

Also, dev->min_mtu does not appear to have a very strict definition
anywhere other than "Interface Minimum MTU value". My hopes were some
guarantees along the lines of "if you try to send a packet with a
smaller L2 payload than dev->mtu, the controller might pad the packet".
But no luck with that, it seems.

Going to commit 9421c9015047, it looks like that took a shortcut for
performance reasons, and omitted to check whether the skb is actually
VLAN-tagged or not, and if egress untagging was requested or not.
My understanding is that packets smaller than CPSW_MIN_PACKET_SIZE _can_
be sent, it's only that the value was chosen (too) conservatively as
VLAN_ETH_ZLEN. The cpsw driver might be able to check whether the packet
is a VLAN tagged one by looking at skb->protocol, and choose the pad
size dynamically. Although I can understand why Grygorii might not want
to do that.

The pitfall is that even if we declare the proper min_mtu value for
every master driver, it would still not avoid padding in the cpsw case.
This is because the reason cpsw pads is due to VLAN, but VLAN is not
part of the L2 payload, so cpsw would still declare dev->min_mtu = 0 in
spite of needing to pad.

The only honest solution might be to extend struct net_device and add a
pad_size value somewhere in there. You might be able to find a hole with
pahole or something, and it doesn't need to be larger than an u8 (for up
to 255 bytes of padding). Then cpsw can set master->pad_size, and DSA
can look at it for tail taggers.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Ethernet padding - ti_cpsw vs DSA tail tag
  2021-05-31 14:29 ` Vladimir Oltean
@ 2021-05-31 15:22   ` Florian Fainelli
  2021-06-11 14:12   ` Grygorii Strashko
  1 sibling, 0 replies; 4+ messages in thread
From: Florian Fainelli @ 2021-05-31 15:22 UTC (permalink / raw)
  To: Vladimir Oltean, Ben Hutchings
  Cc: netdev, Grygorii Strashko, linux-omap, Andrew Lunn, Vivien Didelot



On 5/31/2021 7:29 AM, Vladimir Oltean wrote:
> Hi Ben,
> 
> On Mon, May 31, 2021 at 02:40:52PM +0200, Ben Hutchings wrote:
>> I'm working on a system that uses a TI Sitara SoC with one of its
>> Ethernet ports connected to the host port of a Microchip KSZ8795
>> switch.  I'm updating the kernel from 4.14.y to 5.10.y.  Currently I
>> am using the ti_cpsw driver, but it looks like the ti_cpsw_new driver
>> has the same issue.
>>
>> The Microchip switch expects a tail tag on ingress from the host port
>> to control which external port(s) to forward to.  This must appear
>> immediately before the frame checksum.  The DSA core correctly pads
>> outgoing skbs to at least 60 bytes before tag_ksz appends the tag.
>>
>> However, since commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min
>> eth packet size"), the cpsw driver pads outgoing skbs to at least 64
>> bytes.  This means that in smaller packets the tag byte is no longer
>> at the tail.
>>
>> It's not obvious to me where this should be fixed.  Should drivers
>> that pad in ndo_start_xmit be aware of any tail tag that needs to be
>> moved?  Should DSA be aware that a lower driver has a minimum size >
>> 60 bytes?
> 
> These are good questions.
> 
> In principle, DSA needs a hint from the master driver for tail taggers
> to work properly. We should pad to ETH_ZLEN + <the hint value> before
> inserting the tail tag. This is for correctness, to ensure we do not
> operate in marginal conditions which are not guaranteed to work.
> 
> A naive approach would be to take the hint from master->min_mtu.
> However, the first issue that appears is that the dev->min_mtu value is
> not always set quite correctly.
> 
> The MTU in general measures the number of bytes in the L2 payload (i.e.
> not counting the Ethernet + VLAN header, nor FCS). The DSA tag is
> considered to be a part of the L2 payload from the perspective of a
> DSA-unaware master.
> 
> But ether_setup() sets up dev->min_mtu by default to ETH_MIN_MTU (68),
> which cites RFC791. This says:
> 
>     Every internet module must be able to forward a datagram of 68
>     octets without further fragmentation.  This is because an internet
>     header may be up to 60 octets, and the minimum fragment is 8 octets.
> 
> But many drivers simply don't set dev->min_mtu = 0, even if they support
> sending minimum-sized Ethernet frames. Many set dev->min_mtu to ETH_ZLEN,
> proving nothing except the fact that they don't understand that the
> Ethernet header should not be counted by the MTU anyway.
> 
> So to work with these drivers which leave dev->min_mtu = ETH_MIN_MTU, we
> would have to pad the packets in DSA to ETH_ZLEN + ETH_MIN_MTU. This is
> not quite ideal, so even if it would be the correct approach, a large
> amount of drivers would have to be converted to set dev->min_mtu = 0
> before we could consider switching to that and not have too many
> regressions.
> 
> Also, dev->min_mtu does not appear to have a very strict definition
> anywhere other than "Interface Minimum MTU value". My hopes were some
> guarantees along the lines of "if you try to send a packet with a
> smaller L2 payload than dev->mtu, the controller might pad the packet".
> But no luck with that, it seems.
> 
> Going to commit 9421c9015047, it looks like that took a shortcut for
> performance reasons, and omitted to check whether the skb is actually
> VLAN-tagged or not, and if egress untagging was requested or not.
> My understanding is that packets smaller than CPSW_MIN_PACKET_SIZE _can_
> be sent, it's only that the value was chosen (too) conservatively as
> VLAN_ETH_ZLEN. The cpsw driver might be able to check whether the packet
> is a VLAN tagged one by looking at skb->protocol, and choose the pad
> size dynamically. Although I can understand why Grygorii might not want
> to do that.

Agree, that specific commit seems to be possibly by off by 4 in most cases.

> 
> The pitfall is that even if we declare the proper min_mtu value for
> every master driver, it would still not avoid padding in the cpsw case.
> This is because the reason cpsw pads is due to VLAN, but VLAN is not
> part of the L2 payload, so cpsw would still declare dev->min_mtu = 0 in
> spite of needing to pad.
> 
> The only honest solution might be to extend struct net_device and add a
> pad_size value somewhere in there. You might be able to find a hole with
> pahole or something, and it doesn't need to be larger than an u8 (for up
> to 255 bytes of padding). Then cpsw can set master->pad_size, and DSA
> can look at it for tail taggers.

Do we need another way for drivers to be left a chance to be wrong? Even
if we document the semantics of net_device::pad_size correctly this
probably won't cut it.

TBH, I don't fully understand why the network stack has left so much
leeway for Ethernet drivers to do their own padding as opposed to making
sure that non-tagged (VLAN, DSA, whatever) frames are guaranteed to be
at least 60 bytes when they reach ndo_start_xmit() and then just leave
the stacking of devices to add their bytes where they need them, with
the special trailer case that is a tiny bit harder to figure out. Maybe
back in the days most Ethernet NICs would hardware pad and this only
became a concern with newer/cheaper/embedded SoCs NICs that can no
longer hardware pad by default? I can understand the argument about raw
sockets which should permit an application to have full control over the
minimum packet length, but again, in general we have a real link partner
on the other side that is not going to be very tolerant.
-- 
Florian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Ethernet padding - ti_cpsw vs DSA tail tag
  2021-05-31 14:29 ` Vladimir Oltean
  2021-05-31 15:22   ` Florian Fainelli
@ 2021-06-11 14:12   ` Grygorii Strashko
  1 sibling, 0 replies; 4+ messages in thread
From: Grygorii Strashko @ 2021-06-11 14:12 UTC (permalink / raw)
  To: Vladimir Oltean, Ben Hutchings
  Cc: netdev, linux-omap, Andrew Lunn, Vivien Didelot, Florian Fainelli



On 31/05/2021 17:29, Vladimir Oltean wrote:
> Hi Ben,
> 
> On Mon, May 31, 2021 at 02:40:52PM +0200, Ben Hutchings wrote:
>> I'm working on a system that uses a TI Sitara SoC with one of its
>> Ethernet ports connected to the host port of a Microchip KSZ8795
>> switch.  I'm updating the kernel from 4.14.y to 5.10.y.  Currently I
>> am using the ti_cpsw driver, but it looks like the ti_cpsw_new driver
>> has the same issue.
>>
>> The Microchip switch expects a tail tag on ingress from the host port
>> to control which external port(s) to forward to.  This must appear
>> immediately before the frame checksum.  The DSA core correctly pads
>> outgoing skbs to at least 60 bytes before tag_ksz appends the tag.
>>
>> However, since commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min
>> eth packet size"), the cpsw driver pads outgoing skbs to at least 64
>> bytes.  This means that in smaller packets the tag byte is no longer
>> at the tail.
>>
>> It's not obvious to me where this should be fixed.  Should drivers
>> that pad in ndo_start_xmit be aware of any tail tag that needs to be
>> moved?  Should DSA be aware that a lower driver has a minimum size >
>> 60 bytes?
> 
> These are good questions.
> 
> In principle, DSA needs a hint from the master driver for tail taggers
> to work properly. We should pad to ETH_ZLEN + <the hint value> before
> inserting the tail tag. This is for correctness, to ensure we do not
> operate in marginal conditions which are not guaranteed to work.
> 
> A naive approach would be to take the hint from master->min_mtu.
> However, the first issue that appears is that the dev->min_mtu value is
> not always set quite correctly.
> 
> The MTU in general measures the number of bytes in the L2 payload (i.e.
> not counting the Ethernet + VLAN header, nor FCS). The DSA tag is
> considered to be a part of the L2 payload from the perspective of a
> DSA-unaware master.
> 
> But ether_setup() sets up dev->min_mtu by default to ETH_MIN_MTU (68),
> which cites RFC791. This says:
> 
>      Every internet module must be able to forward a datagram of 68
>      octets without further fragmentation.  This is because an internet
>      header may be up to 60 octets, and the minimum fragment is 8 octets.
> 
> But many drivers simply don't set dev->min_mtu = 0, even if they support
> sending minimum-sized Ethernet frames. Many set dev->min_mtu to ETH_ZLEN,
> proving nothing except the fact that they don't understand that the
> Ethernet header should not be counted by the MTU anyway.
> 
> So to work with these drivers which leave dev->min_mtu = ETH_MIN_MTU, we
> would have to pad the packets in DSA to ETH_ZLEN + ETH_MIN_MTU. This is
> not quite ideal, so even if it would be the correct approach, a large
> amount of drivers would have to be converted to set dev->min_mtu = 0
> before we could consider switching to that and not have too many
> regressions.
> 
> Also, dev->min_mtu does not appear to have a very strict definition
> anywhere other than "Interface Minimum MTU value". My hopes were some
> guarantees along the lines of "if you try to send a packet with a
> smaller L2 payload than dev->mtu, the controller might pad the packet".
> But no luck with that, it seems.
> 
> Going to commit 9421c9015047, it looks like that took a shortcut for
> performance reasons, and omitted to check whether the skb is actually
> VLAN-tagged or not, and if egress untagging was requested or not.
> My understanding is that packets smaller than CPSW_MIN_PACKET_SIZE _can_
> be sent, it's only that the value was chosen (too) conservatively as
> VLAN_ETH_ZLEN.

It can. HW limit is ETH_ZLEN and issue fixed by referred commit happens only
for packets sent from Host to Ext. Port with vid un-tagged enabled.

> The cpsw driver might be able to check whether the packet
> is a VLAN tagged one by looking at skb->protocol, and choose the pad
> size dynamically. Although I can understand why Grygorii might not want
> to do that.

Yeah. It's pretty complicated as will require maintain list of untugged vlans and
analyze frame in driver (seems vlan info not provided by upper layer when packet reaches
driver and !NETIF_F_HW_VLAN_CTAG_TX). Internally we carry very simple patch which
allows to change it (but it's module parameter).

Posted patch for cpsw-switch [1] and think I'll just post revert for legacy cpsw.

> 
> The pitfall is that even if we declare the proper min_mtu value for
> every master driver, it would still not avoid padding in the cpsw case.
> This is because the reason cpsw pads is due to VLAN, but VLAN is not
> part of the L2 payload, so cpsw would still declare dev->min_mtu = 0 in
> spite of needing to pad.

Yah. all MTU staff is L3 specific.

> 
> The only honest solution might be to extend struct net_device and add a
> pad_size value somewhere in there. You might be able to find a hole with
> pahole or something, and it doesn't need to be larger than an u8 (for up
> to 255 bytes of padding). Then cpsw can set master->pad_size, and DSA
> can look at it for tail taggers.
> 

There are:
	unsigned int		mtu;
	unsigned short		needed_headroom;
	unsigned short		needed_tailroom;
	unsigned short		hard_header_len;
	unsigned int		min_mtu;
	unsigned int		max_mtu;
	unsigned char		min_header_len;

  *	@mtu:		Interface MTU value
  *	@min_mtu:	Interface Minimum MTU value
  *	@max_mtu:	Interface Maximum MTU value
  *	@hard_header_len: Maximum hardware header length.
  *	@min_header_len:  Minimum hardware header length
  *	@needed_headroom: Extra headroom the hardware may need, but not in all
  *			  cases can this be guaranteed
  *	@needed_tailroom: Extra tailroom the hardware may need, but not in all
  *			  cases can this be guaranteed. Some cases also use
  *			  LL_MAX_HEADER instead to allocate the skb

but none of them seems suitable as here we are talking about min Eth frame lengs.
Any way, even if decided to add smth. new - it need to be configuarble.

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20210611132732.10690-1-grygorii.strashko@ti.com/
-- 
Best regards,
grygorii

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-06-11 14:12 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-31 12:40 Ethernet padding - ti_cpsw vs DSA tail tag Ben Hutchings
2021-05-31 14:29 ` Vladimir Oltean
2021-05-31 15:22   ` Florian Fainelli
2021-06-11 14:12   ` Grygorii Strashko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.