Re: [PATCH net] net: lan78xx: Disable hardware vlan filtering in promiscuous mode

From: "Markus Blöchl" <Markus.Bloechl@ipetronik.com>
To: Vladimir Oltean <olteanv@gmail.com>, Jakub Kicinski <kuba@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	Ido Schimmel <idosch@idosch.org>, Andrew Lunn <andrew@lunn.ch>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Woojung Huh <woojung.huh@microchip.com>,
	"David S. Miller" <davem@davemloft.net>,
	Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] net: lan78xx: Disable hardware vlan filtering in promiscuous mode
Date: Thu, 19 Nov 2020 16:37:51 +0100	[thread overview]
Message-ID: <20201119153751.ix73o5h4n6dgv4az@ipetronik.com> (raw)
In-Reply-To: <20201114181103.2eeh3eexcdnbcfj2@skbuf>

On Sat, Nov 14, 2020 at 08:11:03PM +0200, Vladimir Oltean wrote:
> On Thu, Nov 12, 2020 at 11:53:15AM +0100, Markus Blöchl wrote:
> > From what I can see, most other drivers use a special hardware register
> > flag to enable promiscuous mode, which overrules all other filters.
> 
> Yes, but it may not mean what you think.
> 
> > e.g. from the ASIX AX88178 datasheet:
> >
> >   PRO:  PACKET_TYPE_PROMISCUOUS.
> >     1: All frames received by the ASIC are forwarded up toward the host.
> >     0: Disabled (default).
> 
> See, that's one definition of promiscuity that is really strange, and
> not backed by any standards body that I know of (if you know otherwise,
> please speak up).
> 
> > It is just so that the lan78xx controllers don't have this explicit flag.
> 
> Which is not surprising, at least under that description. Others may be
> inclined to call that behavior "packet trapping" when applying it to
> e.g. an Ethernet switch.
> 
> There might be an entire discussion about how promiscuity does _not_
> mean "deliver all packets to the CPU" that might be of interest to you:
> https://lkml.org/lkml/2019/8/29/255

If I glanced over this discussion correctly, it is about avoiding
promiscuity under certain circumstances (while promiscuity was
only requested for switching and not by userspace) because HW promiscuity
is not needed in that particular case.

As I currently see it, there are two common use cases for promiscuity:

1) Applying filtering, switching and other logic on the CPU.
   This could be due to limited resources in the NIC, e.g. when there
   are too many unicast addresses configured on that interface or
   simply an unavailable hardware capability.

2) Sniffing the wire.

The kernel uses IFF_PROMISC (or `__dev_set_promiscuity()`) for both,
which obviously can be overkill in the first case.
The question remains, what does IFF_PROMISC exactly mean for userspace
(which, I guess, most often uses it for 2)?

> 
> > But since my change is more controversial than I anticipated, I would like
> > to take a step back and ask some general questions first:
> >
> > We used to connect something akin to a port mirror to a lan7800 NIC
> > (and a few others) in order to record and debug some VLAN heavy traffic.
> > On older kernel versions putting the interface into promiscuous mode
> > and opening and binding an AF_PACKET socket (or just throwing tcpdump
> > or libpcap at it) was sufficient.
> > After a kernel upgrade the same setup missed most of the traffic.
> > Does this qualify as a regression? Why not?
> 
> If something used to work but no longer does, that's what a regression
> is. But fixing it depends on whether it should have worked like that in
> the first place or not. That we don't know.

Admittedly, I certainly don't know, but hoped to find whom who does on
this list. ;-)

> 
> > Should there be a documented and future proof way to receive *all*
> > valid ethernet frames from an interface?
> 
> Yes, there should.
> 
> > This could be something like:
> >
> > a) - Bring up the interface
> >    - Put the interface into promiscuous mode
> 
> This one would be necessary in order to not drop packets with unknown
> addresses, not more.
> 
> >    - Open, bind and read a raw AF_PACKET socket with ETH_P_ALL
> >    - Patch up the 801.1Q headers if required.
> 
> What do you mean by "patching up"? Are you talking about the fact that
> packets are untagged by the stack in the receive path anyway, and the
> VLAN header would need to be reconstructed?
> https://patchwork.ozlabs.org/project/netdev/patch/e06dbb47-2d1c-03ca-4cd7-cc958b6a939e@gmail.com/

Yes, that's exactly what I was referring to.
I find it slightly annoying on RAW sockets, but it's documented and
(hopefully) consistent behaviour now, so okay for me.

> 
> >
> > b) - The same as a)
> >    - Additionally enumerate and disable all available offloading features
> 
> If said offloading features have to do with the CPU not receiving some
> frames any longer, and you _do_ want the CPU to see them, then obviously
> said offloading features should be disabled. This includes not only VLAN
> filtering, but also bridging offload, IP forwarding offload, etc.
> 
> I'd say that (b) should be sufficient, but not future-proof in the sense
> that new offloading features might come every day, and they would need
> to be disabled on a case-by-case basis.
> 
> For the forwarding offload, there would be no question whatsoever that
> you'd need to turn it off, or add a mirroring rule towards the CPU, or
> do _something_ user-visible, to get that traffic. But as for VLAN
> filtering offload in particular, there's the (not negligible at all)
> precedent created by the bridge driver, that Ido pointed out. That would
> be a decision for the maintainers to make, if the Linux network stack
> creates its own definition of promiscuity which openly contradicts IEEE's.
> One might perhaps try to argue that the VLAN ID is an integral part of
> the station's address (which is true if you look at it from the
> perspective of an IEEE 802.1Q bridge), and therefore promiscuity should
> apply on the VLAN ID too, not just the MAC address. Either way, that
> should be made more clear than it currently is, once a decision is
> taken.

In that case, maybe new features which could alter user-visible behaviour
should not be enabled by default?
If I am not mistaken, bridging offload, IP forwarding offload and
similar have to be enabled explicitly, at least.

I am not convinced that this definition would indeed contradict the
IEEE standard. It might just be a stronger one with more guarantees.
Assuming you have a very stupid NIC without any filtering or offloading
capabilities, which would always forward all frames to the CPU.
Would this NIC not comply to the standard?

@Jakub
May I take your answer as a final decision or would you prefer some more
input on this?

> 
> > c) - Use libpcap / Do whatever libpcap does (like with TPACKET)
> >    In this case you need to help me convince the tcpdump folks that this
> >    is a bug on their side... ;-)
> 
> Well, that assumes that today, tcpdump can always capture all traffic on
> all types of interfaces, something which is already false (see bridging
> offload / IP forwarding offload). There, it was even argued that you'd
> better be 100% sure that you want to see all received traffic, as the
> interfaces can be very high-speed, and not even a mirroring rule might
> guarantee reception of 100% of the traffic. That's why things like sFlow
> / tc-sample exist.
> 
> > d) - Read the controller datasheet
> >    - Read the kernel documentation
> >    - Read your kernels and drivers sources
> >    - Do whatever might be necessary
> 
> Yes, in general reading is helpful, but I'm not quite sure where you're
> going with this?

Well, that just meant, that there should always be a way, but no
universal one.
Which one depends on your exact hardware setup or maybe even the current
constellation of stars...

> 
> > e) - No, there is no guaranteed way to to this
> 
> No, there should be a way to achieve that through some sort of
> configuration.
> 
> > Any opinions on these questions?
> 
> My 2 cents have just been shared.
> 
> > After those are answered, I am open to suggestions on how to fix this
> > differently (if still needed).
> 
> Turn off VLAN filtering, or get a commonly accepted definition of
> promiscuity?

Other Drivers
=============

So I tried to figure out what other existing hardware drivers do
by grepping for drivers which do something on a change to
NETIF_F_HW_VLAN_CTAG_FILTER.

Here are the results of me trying to understand the drivers quickly.
I hope it's somewhat close and helps:

1) aqc111
   This controller has a HW register flag for promiscuous mode.
   I am not sure what it does, but since this is another ASIX
   device, the documentation from above should apply.

2) lan78xx
   Here it began ...

3) ixgbe
   This driver disables HW_VLAN_CTAG_FILTER if IFF_PROMISC is set.

4) ice
   I don't know.

5) ocelot_net
   This driver does not support promiscuous mode with the following
   explanation:
	/* This doesn't handle promiscuous mode because the bridge core is
	 * setting IFF_PROMISC on all slave interfaces and all frames would be
	 * forwarded to the CPU port.
	 */
   See the already mentioned discussion https://lkml.org/lkml/2019/8/29/255
   for more.

6) liquidio
   This driver also has a HW flag for promiscuous mode.
   I don't know what it does, though.

7) mvpp2
   This driver disables vid filtering if IFF_PROMISC is set.

8) efx
	/* Disable VLAN filtering by default.  It may be enforced if
	 * the feature is fixed (i.e. VLAN filters are required to
	 * receive VLAN tagged packets due to vPort restrictions).
	 */
   I did not find a variant that really honors this feature.

9) ef100
   I don't know.

10) mlx5
   This driver sets a HW promisc flag, adds an `ANY` vlan filter rule
   and ignores further changes to vlan filtering if IFF_PROMISC is set.

11) nfp
   This driver uses a HW promisc flag.

12) atlantic
   I don't know.

13) xlgmac
   This one has an interesting comment in `xlgmac_set_promiscuous_mode`:
	/* Hardware will still perform VLAN filtering in promiscuous mode */
	if (enable) {
		xlgmac_disable_rx_vlan_filtering(pdata);
	} else {
		if (pdata->netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
			xlgmac_enable_rx_vlan_filtering(pdata);
	}

14) enetc
   I think, the feature is overridden everytime in `set_rx_mode` as long
   as IFF_PROMISC is set.

15) xgbe
   In `xgbe_set_promiscuous_mode` once again:
	/* Hardware will still perform VLAN filtering in promiscuous mode */
	if (enable) {
		xgbe_disable_rx_vlan_filtering(pdata);
	} else {
		if (pdata->netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
			xgbe_enable_rx_vlan_filtering(pdata);
	}

Implementation
==============

I then tried to come up with a solution in net/core that would
universally disable vlan filtering in promiscuous mode.
Removing the features in `netdev_fix_features` is easily done, but
updating the active set of features whenever IFF_PROMISC changes
seems hard.

`__dev_set_promiscuity` is often called in atomic context but
`.ndo_set_features` can sleep in many drivers.

Adding a work_queue or similar to every net_device seems clumsy and
inappropriate.

Rewriting ndo_set_features of all drivers to be atomic is not a task
you should ask from me...

Calling `netdev_update_features` after every entrypoint that locks
the rtnl seems too error-prone and also clumsy.

Only updating the features when promiscuity was explicitly requested
by userspace in `dev_change_flags` might leave the device in a
weird inconsistent state.

Continue to let each driver enforce the kernels definition of
promiscuity. They know how to update the features atomically.
Then I am back at my original patch...

I'm afraid, I might need some guidance on how to approach this.

Thanks for your help and all of your responses

Markus