netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] net: Avoid spurious rx_dropped increases with tap and rx_handler
@ 2020-05-25  5:01 Benjamin Poirier
  2020-05-25  5:41 ` Eric Dumazet
  2020-05-26  1:00 ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Benjamin Poirier @ 2020-05-25  5:01 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov, Eric Dumazet, Jiri Pirko

Consider an skb which doesn't match a ptype_base/ptype_specific handler. If
this skb is delivered to a ptype_all handler, it does not count as a drop.
However, if the skb is also processed by an rx_handler which returns
RX_HANDLER_PASS, the frame is now counted as a drop because pt_prev was
reset. An example of this situation is an LLDP frame received on a bridge
port while lldpd is listening on a packet socket with ETH_P_ALL (ex. by
specifying `lldpd -c`).

Fix by adding an extra condition variable to record if the skb was
delivered to a packet tap before running an rx_handler.

The situation is similar for RX_HANDLER_EXACT frames so their accounting is
also changed. OTOH, the behavior is unchanged for RX_HANDLER_ANOTHER frames
- they are accounted according to what happens with the new skb->dev.

Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
Message-Id: <20200522011420.263574-1-bpoirier@cumulusnetworks.com>
Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
---
 net/core/dev.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ae37586f6ee8..07957a0f57e6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5061,11 +5061,11 @@ static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
 static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 				    struct packet_type **ppt_prev)
 {
+	bool deliver_exact = false, rx_tapped = false;
 	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
 	struct sk_buff *skb = *pskb;
 	struct net_device *orig_dev;
-	bool deliver_exact = false;
 	int ret = NET_RX_DROP;
 	__be16 type;
 
@@ -5158,12 +5158,14 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 		if (pt_prev) {
 			ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = NULL;
+			rx_tapped = true;
 		}
 		switch (rx_handler(&skb)) {
 		case RX_HANDLER_CONSUMED:
 			ret = NET_RX_SUCCESS;
 			goto out;
 		case RX_HANDLER_ANOTHER:
+			rx_tapped = false;
 			goto another_round;
 		case RX_HANDLER_EXACT:
 			deliver_exact = true;
@@ -5234,11 +5236,13 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
 			goto drop;
 		*ppt_prev = pt_prev;
 	} else {
+		if (!rx_tapped) {
 drop:
-		if (!deliver_exact)
-			atomic_long_inc(&skb->dev->rx_dropped);
-		else
-			atomic_long_inc(&skb->dev->rx_nohandler);
+			if (!deliver_exact)
+				atomic_long_inc(&skb->dev->rx_dropped);
+			else
+				atomic_long_inc(&skb->dev->rx_nohandler);
+		}
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: Avoid spurious rx_dropped increases with tap and rx_handler
  2020-05-25  5:01 [PATCH net-next] net: Avoid spurious rx_dropped increases with tap and rx_handler Benjamin Poirier
@ 2020-05-25  5:41 ` Eric Dumazet
  2020-05-25  8:25   ` Benjamin Poirier
  2020-05-26  1:00 ` David Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2020-05-25  5:41 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: netdev, Nikolay Aleksandrov, Jiri Pirko

On Sun, May 24, 2020 at 10:02 PM Benjamin Poirier
<bpoirier@cumulusnetworks.com> wrote:
>
> Consider an skb which doesn't match a ptype_base/ptype_specific handler. If
> this skb is delivered to a ptype_all handler, it does not count as a drop.
> However, if the skb is also processed by an rx_handler which returns
> RX_HANDLER_PASS, the frame is now counted as a drop because pt_prev was
> reset. An example of this situation is an LLDP frame received on a bridge
> port while lldpd is listening on a packet socket with ETH_P_ALL (ex. by
> specifying `lldpd -c`).
>
> Fix by adding an extra condition variable to record if the skb was
> delivered to a packet tap before running an rx_handler.
>
> The situation is similar for RX_HANDLER_EXACT frames so their accounting is
> also changed. OTOH, the behavior is unchanged for RX_HANDLER_ANOTHER frames
> - they are accounted according to what happens with the new skb->dev.
>
> Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")

I disagree.

> Message-Id: <20200522011420.263574-1-bpoirier@cumulusnetworks.com>
> Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>
> ---
>  net/core/dev.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ae37586f6ee8..07957a0f57e6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5061,11 +5061,11 @@ static inline int nf_ingress(struct sk_buff *skb, struct packet_type **pt_prev,
>  static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
>                                     struct packet_type **ppt_prev)
>  {
> +       bool deliver_exact = false, rx_tapped = false;
>         struct packet_type *ptype, *pt_prev;
>         rx_handler_func_t *rx_handler;
>         struct sk_buff *skb = *pskb;
>         struct net_device *orig_dev;
> -       bool deliver_exact = false;
>         int ret = NET_RX_DROP;
>         __be16 type;
>
> @@ -5158,12 +5158,14 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
>                 if (pt_prev) {
>                         ret = deliver_skb(skb, pt_prev, orig_dev);
>                         pt_prev = NULL;
> +                       rx_tapped = true;
>                 }
>                 switch (rx_handler(&skb)) {
>                 case RX_HANDLER_CONSUMED:
>                         ret = NET_RX_SUCCESS;
>                         goto out;
>                 case RX_HANDLER_ANOTHER:
> +                       rx_tapped = false;
>                         goto another_round;
>                 case RX_HANDLER_EXACT:
>                         deliver_exact = true;
> @@ -5234,11 +5236,13 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc,
>                         goto drop;
>                 *ppt_prev = pt_prev;
>         } else {
> +               if (!rx_tapped) {
>  drop:
> -               if (!deliver_exact)
> -                       atomic_long_inc(&skb->dev->rx_dropped);
> -               else
> -                       atomic_long_inc(&skb->dev->rx_nohandler);
> +                       if (!deliver_exact)
> +                               atomic_long_inc(&skb->dev->rx_dropped);
> +                       else
> +                               atomic_long_inc(&skb->dev->rx_nohandler);
> +               }

This does not make sense to me.

Here we call kfree_skb() meaning this packet is _dropped_.
I understand it does not please some people, because they do not
always understand the meaning of this counter, but it is a mere fact.

Fact that a packet capture made a clone of this packet should not
matter, tcpdump should not hide that a packet is _dropped_.

>                 kfree_skb(skb);
>                 /* Jamal, now you will not able to escape explaining
>                  * me how you were going to use this. :-)


Can we please not add code in the fast path ?

Why are LLDP packets so special, why are they not consumed ?

I would suggest fixing the layer that handles LLDP packets so that we
do not have to workaround this issue in a critical fast path.

Thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: Avoid spurious rx_dropped increases with tap and rx_handler
  2020-05-25  5:41 ` Eric Dumazet
@ 2020-05-25  8:25   ` Benjamin Poirier
  0 siblings, 0 replies; 4+ messages in thread
From: Benjamin Poirier @ 2020-05-25  8:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Nikolay Aleksandrov, Jiri Pirko

On 2020-05-24 22:41 -0700, Eric Dumazet wrote:
> On Sun, May 24, 2020 at 10:02 PM Benjamin Poirier
> <bpoirier@cumulusnetworks.com> wrote:
> >
> > Consider an skb which doesn't match a ptype_base/ptype_specific handler. If
> > this skb is delivered to a ptype_all handler, it does not count as a drop.
> > However, if the skb is also processed by an rx_handler which returns
> > RX_HANDLER_PASS, the frame is now counted as a drop because pt_prev was
> > reset. An example of this situation is an LLDP frame received on a bridge
> > port while lldpd is listening on a packet socket with ETH_P_ALL (ex. by
> > specifying `lldpd -c`).
> >
> > Fix by adding an extra condition variable to record if the skb was
> > delivered to a packet tap before running an rx_handler.
> >
> > The situation is similar for RX_HANDLER_EXACT frames so their accounting is
> > also changed. OTOH, the behavior is unchanged for RX_HANDLER_ANOTHER frames
> > - they are accounted according to what happens with the new skb->dev.
> >
> > Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
> 
> I disagree.
> 
> > Message-Id: <20200522011420.263574-1-bpoirier@cumulusnetworks.com>
[...]
> > +               if (!rx_tapped) {
> >  drop:
> > -               if (!deliver_exact)
> > -                       atomic_long_inc(&skb->dev->rx_dropped);
> > -               else
> > -                       atomic_long_inc(&skb->dev->rx_nohandler);
> > +                       if (!deliver_exact)
> > +                               atomic_long_inc(&skb->dev->rx_dropped);
> > +                       else
> > +                               atomic_long_inc(&skb->dev->rx_nohandler);
> > +               }
> 
> This does not make sense to me.
> 
> Here we call kfree_skb() meaning this packet is _dropped_.
> I understand it does not please some people, because they do not
> always understand the meaning of this counter, but it is a mere fact.

IMO, the core of the issue is calling deliver_skb() before running the
rx_handler function. The rx_handler may not even attempt to deliver the
skb anywhere (RX_HANDLER_PASS). Because of that deliver_skb() call, the
packet_type handler (packet_rcv()) makes a spurious skb_clone(). Once a
useless copy has been made, it has to be freed somewhere. That's why
with my patch there may be kfree_skb() without an increase of the
dropped counter.

> 
> Fact that a packet capture made a clone of this packet should not
> matter, tcpdump should not hide that a packet is _dropped_.

What this patch intends to fix is that the behavior is inconsistent
depending on whether the interface has an rx_handler or not:
	eth0 nomaster -> tapped frames don't count as dropped
	eth0 master br0 -> tapped frames count as dropped
That has been the case since the counter was introduced.

This patch makes tapped frames uniformly not count as drops. If we
should move in the other direction (always count frames that were only
delivered to ptype_all handlers as dropped), I'll work on a different
patch.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: Avoid spurious rx_dropped increases with tap and rx_handler
  2020-05-25  5:01 [PATCH net-next] net: Avoid spurious rx_dropped increases with tap and rx_handler Benjamin Poirier
  2020-05-25  5:41 ` Eric Dumazet
@ 2020-05-26  1:00 ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2020-05-26  1:00 UTC (permalink / raw)
  To: bpoirier; +Cc: netdev, nikolay, edumazet, jiri

From: Benjamin Poirier <bpoirier@cumulusnetworks.com>
Date: Mon, 25 May 2020 14:01:37 +0900

> Consider an skb which doesn't match a ptype_base/ptype_specific handler. If
> this skb is delivered to a ptype_all handler, it does not count as a drop.
> However, if the skb is also processed by an rx_handler which returns
> RX_HANDLER_PASS, the frame is now counted as a drop because pt_prev was
> reset. An example of this situation is an LLDP frame received on a bridge
> port while lldpd is listening on a packet socket with ETH_P_ALL (ex. by
> specifying `lldpd -c`).
> 
> Fix by adding an extra condition variable to record if the skb was
> delivered to a packet tap before running an rx_handler.
> 
> The situation is similar for RX_HANDLER_EXACT frames so their accounting is
> also changed. OTOH, the behavior is unchanged for RX_HANDLER_ANOTHER frames
> - they are accounted according to what happens with the new skb->dev.
> 
> Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
> Message-Id: <20200522011420.263574-1-bpoirier@cumulusnetworks.com>
> Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com>

You can state over and over about the semantics of PASS and other
rx_handler return values, but as Eric pointed out we free this SKB
so the counter should be bumped.

I'm sorry if it is confusing, but the counter is counting the event
accurately.

If you can make that kfree_skb() not happen, then maybe it shouldn't
be incremented.  But until then...

I'm not applying this, sorry.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-05-26  1:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25  5:01 [PATCH net-next] net: Avoid spurious rx_dropped increases with tap and rx_handler Benjamin Poirier
2020-05-25  5:41 ` Eric Dumazet
2020-05-25  8:25   ` Benjamin Poirier
2020-05-26  1:00 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).