netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Florian Fainelli <f.fainelli@gmail.com>
To: Vladimir Oltean <olteanv@gmail.com>,
	Ben Hutchings <ben.hutchings@essensium.com>
Cc: netdev@vger.kernel.org,
	Grygorii Strashko <grygorii.strashko@ti.com>,
	linux-omap@vger.kernel.org, Andrew Lunn <andrew@lunn.ch>,
	Vivien Didelot <vivien.didelot@gmail.com>
Subject: Re: Ethernet padding - ti_cpsw vs DSA tail tag
Date: Mon, 31 May 2021 08:22:03 -0700	[thread overview]
Message-ID: <cfa6494d-0d7c-a260-6f9f-d7b8d74b287e@gmail.com> (raw)
In-Reply-To: <20210531142914.bfvcbhglqz55us6s@skbuf>



On 5/31/2021 7:29 AM, Vladimir Oltean wrote:
> Hi Ben,
> 
> On Mon, May 31, 2021 at 02:40:52PM +0200, Ben Hutchings wrote:
>> I'm working on a system that uses a TI Sitara SoC with one of its
>> Ethernet ports connected to the host port of a Microchip KSZ8795
>> switch.  I'm updating the kernel from 4.14.y to 5.10.y.  Currently I
>> am using the ti_cpsw driver, but it looks like the ti_cpsw_new driver
>> has the same issue.
>>
>> The Microchip switch expects a tail tag on ingress from the host port
>> to control which external port(s) to forward to.  This must appear
>> immediately before the frame checksum.  The DSA core correctly pads
>> outgoing skbs to at least 60 bytes before tag_ksz appends the tag.
>>
>> However, since commit 9421c9015047 ("net: ethernet: ti: cpsw: fix min
>> eth packet size"), the cpsw driver pads outgoing skbs to at least 64
>> bytes.  This means that in smaller packets the tag byte is no longer
>> at the tail.
>>
>> It's not obvious to me where this should be fixed.  Should drivers
>> that pad in ndo_start_xmit be aware of any tail tag that needs to be
>> moved?  Should DSA be aware that a lower driver has a minimum size >
>> 60 bytes?
> 
> These are good questions.
> 
> In principle, DSA needs a hint from the master driver for tail taggers
> to work properly. We should pad to ETH_ZLEN + <the hint value> before
> inserting the tail tag. This is for correctness, to ensure we do not
> operate in marginal conditions which are not guaranteed to work.
> 
> A naive approach would be to take the hint from master->min_mtu.
> However, the first issue that appears is that the dev->min_mtu value is
> not always set quite correctly.
> 
> The MTU in general measures the number of bytes in the L2 payload (i.e.
> not counting the Ethernet + VLAN header, nor FCS). The DSA tag is
> considered to be a part of the L2 payload from the perspective of a
> DSA-unaware master.
> 
> But ether_setup() sets up dev->min_mtu by default to ETH_MIN_MTU (68),
> which cites RFC791. This says:
> 
>     Every internet module must be able to forward a datagram of 68
>     octets without further fragmentation.  This is because an internet
>     header may be up to 60 octets, and the minimum fragment is 8 octets.
> 
> But many drivers simply don't set dev->min_mtu = 0, even if they support
> sending minimum-sized Ethernet frames. Many set dev->min_mtu to ETH_ZLEN,
> proving nothing except the fact that they don't understand that the
> Ethernet header should not be counted by the MTU anyway.
> 
> So to work with these drivers which leave dev->min_mtu = ETH_MIN_MTU, we
> would have to pad the packets in DSA to ETH_ZLEN + ETH_MIN_MTU. This is
> not quite ideal, so even if it would be the correct approach, a large
> amount of drivers would have to be converted to set dev->min_mtu = 0
> before we could consider switching to that and not have too many
> regressions.
> 
> Also, dev->min_mtu does not appear to have a very strict definition
> anywhere other than "Interface Minimum MTU value". My hopes were some
> guarantees along the lines of "if you try to send a packet with a
> smaller L2 payload than dev->mtu, the controller might pad the packet".
> But no luck with that, it seems.
> 
> Going to commit 9421c9015047, it looks like that took a shortcut for
> performance reasons, and omitted to check whether the skb is actually
> VLAN-tagged or not, and if egress untagging was requested or not.
> My understanding is that packets smaller than CPSW_MIN_PACKET_SIZE _can_
> be sent, it's only that the value was chosen (too) conservatively as
> VLAN_ETH_ZLEN. The cpsw driver might be able to check whether the packet
> is a VLAN tagged one by looking at skb->protocol, and choose the pad
> size dynamically. Although I can understand why Grygorii might not want
> to do that.

Agree, that specific commit seems to be possibly by off by 4 in most cases.

> 
> The pitfall is that even if we declare the proper min_mtu value for
> every master driver, it would still not avoid padding in the cpsw case.
> This is because the reason cpsw pads is due to VLAN, but VLAN is not
> part of the L2 payload, so cpsw would still declare dev->min_mtu = 0 in
> spite of needing to pad.
> 
> The only honest solution might be to extend struct net_device and add a
> pad_size value somewhere in there. You might be able to find a hole with
> pahole or something, and it doesn't need to be larger than an u8 (for up
> to 255 bytes of padding). Then cpsw can set master->pad_size, and DSA
> can look at it for tail taggers.

Do we need another way for drivers to be left a chance to be wrong? Even
if we document the semantics of net_device::pad_size correctly this
probably won't cut it.

TBH, I don't fully understand why the network stack has left so much
leeway for Ethernet drivers to do their own padding as opposed to making
sure that non-tagged (VLAN, DSA, whatever) frames are guaranteed to be
at least 60 bytes when they reach ndo_start_xmit() and then just leave
the stacking of devices to add their bytes where they need them, with
the special trailer case that is a tiny bit harder to figure out. Maybe
back in the days most Ethernet NICs would hardware pad and this only
became a concern with newer/cheaper/embedded SoCs NICs that can no
longer hardware pad by default? I can understand the argument about raw
sockets which should permit an application to have full control over the
minimum packet length, but again, in general we have a real link partner
on the other side that is not going to be very tolerant.
-- 
Florian

  reply	other threads:[~2021-05-31 17:14 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-31 12:40 Ethernet padding - ti_cpsw vs DSA tail tag Ben Hutchings
2021-05-31 14:29 ` Vladimir Oltean
2021-05-31 15:22   ` Florian Fainelli [this message]
2021-06-11 14:12   ` Grygorii Strashko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cfa6494d-0d7c-a260-6f9f-d7b8d74b287e@gmail.com \
    --to=f.fainelli@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=ben.hutchings@essensium.com \
    --cc=grygorii.strashko@ti.com \
    --cc=linux-omap@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=vivien.didelot@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).