All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
@ 2016-01-22 19:11 Jarod Wilson
  2016-01-22 20:59 ` Jay Vosburgh
                   ` (7 more replies)
  0 siblings, 8 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-22 19:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

The network core tries to keep track of dropped packets, but some packets
you wouldn't really call dropped, so much as intentionally ignored, under
certain circumstances. One such case is that of bonding and team device
slaves that are currently inactive. Their respective rx_handler functions
return RX_HANDLER_EXACT (the only places in the kernel that return that),
which ends up tracking into the network core's __netif_receive_skb_core()
function's drop path, with no pt_prev set. On a noisy network, this can
result in a very rapidly incrementing rx_dropped counter, not only on the
inactive slave(s), but also on the master device, such as the following:

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  p7p1: 14783346  140430    0 140428    0     0          0      2040      680       8    0    0    0     0       0          0
  p7p2: 14805198  140648    0    0    0     0          0      2034        0       0    0    0    0     0       0          0
 bond0: 53365248  532798    0 421160    0     0          0    115151     2040      24    0    0    0     0       0          0
    lo:    5420      54    0    0    0     0          0         0     5420      54    0    0    0     0       0          0
  p5p1: 19292195  196197    0 140368    0     0          0     56564      680       8    0    0    0     0       0          0
  p5p2: 19289707  196171    0 140364    0     0          0     56547      680       8    0    0    0     0       0          0
   em3: 20996626  158214    0    0    0     0          0       383        0       0    0    0    0     0       0          0
   em2: 14065122  138462    0    0    0     0          0       310        0       0    0    0    0     0       0          0
   em1: 14063162  138440    0    0    0     0          0       308        0       0    0    0    0     0       0          0
   em4: 21050830  158729    0    0    0     0          0       385    71662     469    0    0    0     0       0          0
   ib0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
active-backup bond0, and you can see that all three have high drop counts,
with the master bond0 showing a tally of all three.

I know that this was previously discussed some here:

    http://www.spinics.net/lists/netdev/msg226341.html

It seems additional counters never came to fruition, but honestly, for
this particular case, I'm not even sure they're warranted, I'd be inclined
to say just silently drop these packets without incrementing a counter. At
least, that's probably what would make someone who has complained loudly
about this issue happy, as they have monitoring tools that are squaking
loudly at any increments to rx_dropped.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..1354c7b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4153,8 +4153,11 @@ ncls:
 		else
 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
+		if (deliver_exact)
+			goto inactive; /* bond or team inactive slave */
 drop:
 		atomic_long_inc(&skb->dev->rx_dropped);
+inactive:
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
@ 2016-01-22 20:59 ` Jay Vosburgh
  2016-01-23  8:26   ` Jiri Pirko
  2016-01-23  8:07 ` Jiri Pirko
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 64+ messages in thread
From: Jay Vosburgh @ 2016-01-22 20:59 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Veaceslav Falico, Andy Gospodarek,
	netdev

Jarod Wilson <jarod@redhat.com> wrote:

>The network core tries to keep track of dropped packets, but some packets
>you wouldn't really call dropped, so much as intentionally ignored, under
>certain circumstances. One such case is that of bonding and team device
>slaves that are currently inactive. Their respective rx_handler functions
>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>which ends up tracking into the network core's __netif_receive_skb_core()
>function's drop path, with no pt_prev set. On a noisy network, this can
>result in a very rapidly incrementing rx_dropped counter, not only on the
>inactive slave(s), but also on the master device, such as the following:
[...]
>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>active-backup bond0, and you can see that all three have high drop counts,
>with the master bond0 showing a tally of all three.
>
>I know that this was previously discussed some here:
>
>    http://www.spinics.net/lists/netdev/msg226341.html
>
>It seems additional counters never came to fruition, but honestly, for
>this particular case, I'm not even sure they're warranted, I'd be inclined
>to say just silently drop these packets without incrementing a counter. At
>least, that's probably what would make someone who has complained loudly
>about this issue happy, as they have monitoring tools that are squaking
>loudly at any increments to rx_dropped.

	I don't think the kernel should silently drop packets; there
should be a counter somewhere.  If a packet is being thrown away
deliberately, it should not just vanish into the screaming void of
space.  Someday someone will try and track down where that packet is
being dropped.

	I've had that same conversation with customers who insist on
accounting for every packet drop (from the "any drop is an error"
mindset), so I understand the issue.

	Thinking about the prior discussion, the rx_drop_inactive is
still a good idea, but I'd actually today get good use from a
"rx_drop_unforwardable" (or an equivalent but shorter name) counter that
counts every time a packet is dropped due to is_skb_forwardable()
returning false.  __dev_forward_skb does this (and hits rx_dropped), as
does the bridge (and does not count it).

	-J

>CC: "David S. Miller" <davem@davemloft.net>
>CC: Eric Dumazet <edumazet@google.com>
>CC: Jiri Pirko <jiri@mellanox.com>
>CC: Daniel Borkmann <daniel@iogearbox.net>
>CC: Tom Herbert <tom@herbertland.com>
>CC: Jay Vosburgh <j.vosburgh@gmail.com>
>CC: Veaceslav Falico <vfalico@gmail.com>
>CC: Andy Gospodarek <gospo@cumulusnetworks.com>
>CC: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>---
> net/core/dev.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 8cba3d8..1354c7b 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -4153,8 +4153,11 @@ ncls:
> 		else
> 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> 	} else {
>+		if (deliver_exact)
>+			goto inactive; /* bond or team inactive slave */
> drop:
> 		atomic_long_inc(&skb->dev->rx_dropped);
>+inactive:
> 		kfree_skb(skb);
> 		/* Jamal, now you will not able to escape explaining
> 		 * me how you were going to use this. :-)
>-- 
>1.8.3.1
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
  2016-01-22 20:59 ` Jay Vosburgh
@ 2016-01-23  8:07 ` Jiri Pirko
  2016-01-23 14:19 ` Andy Gospodarek
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 64+ messages in thread
From: Jiri Pirko @ 2016-01-23  8:07 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

Fri, Jan 22, 2016 at 08:11:22PM CET, jarod@redhat.com wrote:
>The network core tries to keep track of dropped packets, but some packets
>you wouldn't really call dropped, so much as intentionally ignored, under
>certain circumstances. One such case is that of bonding and team device
>slaves that are currently inactive. Their respective rx_handler functions
>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>which ends up tracking into the network core's __netif_receive_skb_core()
>function's drop path, with no pt_prev set. On a noisy network, this can
>result in a very rapidly incrementing rx_dropped counter, not only on the
>inactive slave(s), but also on the master device, such as the following:
>
>Inter-|   Receive                                                |  Transmit
> face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
>  p7p1: 14783346  140430    0 140428    0     0          0      2040      680       8    0    0    0     0       0          0
>  p7p2: 14805198  140648    0    0    0     0          0      2034        0       0    0    0    0     0       0          0
> bond0: 53365248  532798    0 421160    0     0          0    115151     2040      24    0    0    0     0       0          0
>    lo:    5420      54    0    0    0     0          0         0     5420      54    0    0    0     0       0          0
>  p5p1: 19292195  196197    0 140368    0     0          0     56564      680       8    0    0    0     0       0          0
>  p5p2: 19289707  196171    0 140364    0     0          0     56547      680       8    0    0    0     0       0          0
>   em3: 20996626  158214    0    0    0     0          0       383        0       0    0    0    0     0       0          0
>   em2: 14065122  138462    0    0    0     0          0       310        0       0    0    0    0     0       0          0
>   em1: 14063162  138440    0    0    0     0          0       308        0       0    0    0    0     0       0          0
>   em4: 21050830  158729    0    0    0     0          0       385    71662     469    0    0    0     0       0          0
>   ib0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
>
>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>active-backup bond0, and you can see that all three have high drop counts,
>with the master bond0 showing a tally of all three.
>
>I know that this was previously discussed some here:
>
>    http://www.spinics.net/lists/netdev/msg226341.html
>
>It seems additional counters never came to fruition, but honestly, for
>this particular case, I'm not even sure they're warranted, I'd be inclined
>to say just silently drop these packets without incrementing a counter. At
>least, that's probably what would make someone who has complained loudly
>about this issue happy, as they have monitoring tools that are squaking
>loudly at any increments to rx_dropped.
>
>CC: "David S. Miller" <davem@davemloft.net>
>CC: Eric Dumazet <edumazet@google.com>
>CC: Jiri Pirko <jiri@mellanox.com>
>CC: Daniel Borkmann <daniel@iogearbox.net>
>CC: Tom Herbert <tom@herbertland.com>
>CC: Jay Vosburgh <j.vosburgh@gmail.com>
>CC: Veaceslav Falico <vfalico@gmail.com>
>CC: Andy Gospodarek <gospo@cumulusnetworks.com>
>CC: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson <jarod@redhat.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

I think this should be considered as a bug and go to -net.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-22 20:59 ` Jay Vosburgh
@ 2016-01-23  8:26   ` Jiri Pirko
  0 siblings, 0 replies; 64+ messages in thread
From: Jiri Pirko @ 2016-01-23  8:26 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Jarod Wilson, linux-kernel, David S. Miller, Eric Dumazet,
	Jiri Pirko, Daniel Borkmann, Tom Herbert, Veaceslav Falico,
	Andy Gospodarek, netdev

Fri, Jan 22, 2016 at 09:59:12PM CET, jay.vosburgh@canonical.com wrote:
>Jarod Wilson <jarod@redhat.com> wrote:
>
>>The network core tries to keep track of dropped packets, but some packets
>>you wouldn't really call dropped, so much as intentionally ignored, under
>>certain circumstances. One such case is that of bonding and team device
>>slaves that are currently inactive. Their respective rx_handler functions
>>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>>which ends up tracking into the network core's __netif_receive_skb_core()
>>function's drop path, with no pt_prev set. On a noisy network, this can
>>result in a very rapidly incrementing rx_dropped counter, not only on the
>>inactive slave(s), but also on the master device, such as the following:
>[...]
>>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>>active-backup bond0, and you can see that all three have high drop counts,
>>with the master bond0 showing a tally of all three.
>>
>>I know that this was previously discussed some here:
>>
>>    http://www.spinics.net/lists/netdev/msg226341.html
>>
>>It seems additional counters never came to fruition, but honestly, for
>>this particular case, I'm not even sure they're warranted, I'd be inclined
>>to say just silently drop these packets without incrementing a counter. At
>>least, that's probably what would make someone who has complained loudly
>>about this issue happy, as they have monitoring tools that are squaking
>>loudly at any increments to rx_dropped.

In this case, it is delivered with exact delivery according to per-dev
registered callback. We just have to avoid it gets to bond. So this case
is not "to drop", but rather "to block skb to don't get where it does
not belong".

>
>	I don't think the kernel should silently drop packets; there
>should be a counter somewhere.  If a packet is being thrown away
>deliberately, it should not just vanish into the screaming void of
>space.  Someday someone will try and track down where that packet is
>being dropped.
>
>	I've had that same conversation with customers who insist on
>accounting for every packet drop (from the "any drop is an error"
>mindset), so I understand the issue.
>
>	Thinking about the prior discussion, the rx_drop_inactive is
>still a good idea, but I'd actually today get good use from a
>"rx_drop_unforwardable" (or an equivalent but shorter name) counter that
>counts every time a packet is dropped due to is_skb_forwardable()
>returning false.  __dev_forward_skb does this (and hits rx_dropped), as
>does the bridge (and does not count it).
>
>	-J
>
>>CC: "David S. Miller" <davem@davemloft.net>
>>CC: Eric Dumazet <edumazet@google.com>
>>CC: Jiri Pirko <jiri@mellanox.com>
>>CC: Daniel Borkmann <daniel@iogearbox.net>
>>CC: Tom Herbert <tom@herbertland.com>
>>CC: Jay Vosburgh <j.vosburgh@gmail.com>
>>CC: Veaceslav Falico <vfalico@gmail.com>
>>CC: Andy Gospodarek <gospo@cumulusnetworks.com>
>>CC: netdev@vger.kernel.org
>>Signed-off-by: Jarod Wilson <jarod@redhat.com>
>>---
>> net/core/dev.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 8cba3d8..1354c7b 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -4153,8 +4153,11 @@ ncls:
>> 		else
>> 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>> 	} else {
>>+		if (deliver_exact)
>>+			goto inactive; /* bond or team inactive slave */
>> drop:
>> 		atomic_long_inc(&skb->dev->rx_dropped);
>>+inactive:
>> 		kfree_skb(skb);
>> 		/* Jamal, now you will not able to escape explaining
>> 		 * me how you were going to use this. :-)
>>-- 
>>1.8.3.1
>>
>
>---
>	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
  2016-01-22 20:59 ` Jay Vosburgh
  2016-01-23  8:07 ` Jiri Pirko
@ 2016-01-23 14:19 ` Andy Gospodarek
  2016-01-23 15:23 ` Eric Dumazet
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 64+ messages in thread
From: Andy Gospodarek @ 2016-01-23 14:19 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	netdev

On Fri, Jan 22, 2016 at 02:11:22PM -0500, Jarod Wilson wrote:
> The network core tries to keep track of dropped packets, but some packets
> you wouldn't really call dropped, so much as intentionally ignored, under
> certain circumstances. One such case is that of bonding and team device
> slaves that are currently inactive. Their respective rx_handler functions
> return RX_HANDLER_EXACT (the only places in the kernel that return that),
> which ends up tracking into the network core's __netif_receive_skb_core()
> function's drop path, with no pt_prev set. On a noisy network, this can
> result in a very rapidly incrementing rx_dropped counter, not only on the
> inactive slave(s), but also on the master device, such as the following:
> 
> Inter-|   Receive                                                |  Transmit
>  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
>   p7p1: 14783346  140430    0 140428    0     0          0      2040      680       8    0    0    0     0       0          0
>   p7p2: 14805198  140648    0    0    0     0          0      2034        0       0    0    0    0     0       0          0
>  bond0: 53365248  532798    0 421160    0     0          0    115151     2040      24    0    0    0     0       0          0
>     lo:    5420      54    0    0    0     0          0         0     5420      54    0    0    0     0       0          0
>   p5p1: 19292195  196197    0 140368    0     0          0     56564      680       8    0    0    0     0       0          0
>   p5p2: 19289707  196171    0 140364    0     0          0     56547      680       8    0    0    0     0       0          0
>    em3: 20996626  158214    0    0    0     0          0       383        0       0    0    0    0     0       0          0
>    em2: 14065122  138462    0    0    0     0          0       310        0       0    0    0    0     0       0          0
>    em1: 14063162  138440    0    0    0     0          0       308        0       0    0    0    0     0       0          0
>    em4: 21050830  158729    0    0    0     0          0       385    71662     469    0    0    0     0       0          0
>    ib0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
> 
> In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
> active-backup bond0, and you can see that all three have high drop counts,
> with the master bond0 showing a tally of all three.
> 
> I know that this was previously discussed some here:
> 
>     http://www.spinics.net/lists/netdev/msg226341.html
> 
> It seems additional counters never came to fruition, but honestly, for
> this particular case, I'm not even sure they're warranted, I'd be inclined
> to say just silently drop these packets without incrementing a counter. At
> least, that's probably what would make someone who has complained loudly
> about this issue happy, as they have monitoring tools that are squaking
> loudly at any increments to rx_dropped.

I completely agree.

> CC: "David S. Miller" <davem@davemloft.net>
> CC: Eric Dumazet <edumazet@google.com>
> CC: Jiri Pirko <jiri@mellanox.com>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> CC: Tom Herbert <tom@herbertland.com>
> CC: Jay Vosburgh <j.vosburgh@gmail.com>
> CC: Veaceslav Falico <vfalico@gmail.com>
> CC: Andy Gospodarek <gospo@cumulusnetworks.com>
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>

Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>

> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>  		else
>  			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>  	} else {
> +		if (deliver_exact)
> +			goto inactive; /* bond or team inactive slave */
>  drop:
>  		atomic_long_inc(&skb->dev->rx_dropped);
> +inactive:
>  		kfree_skb(skb);
>  		/* Jamal, now you will not able to escape explaining
>  		 * me how you were going to use this. :-)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
                   ` (2 preceding siblings ...)
  2016-01-23 14:19 ` Andy Gospodarek
@ 2016-01-23 15:23 ` Eric Dumazet
  2016-01-26 21:14   ` Jarod Wilson
  2016-01-25  6:42 ` David Miller
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-23 15:23 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:

> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>  		else
>  			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>  	} else {
> +		if (deliver_exact)
> +			goto inactive; /* bond or team inactive slave */
>  drop:
>  		atomic_long_inc(&skb->dev->rx_dropped);
> +inactive:
>  		kfree_skb(skb);
>  		/* Jamal, now you will not able to escape explaining
>  		 * me how you were going to use this. :-)

Note that if you still have a kfree_skb() instead of consume_skb(),
some tools will still give you a wrong signal (packet dropped ...).

But then maybe the signal is telling some truth.

We receive a packet, and decide to drop it because no one was willing to
handle it.

Maybe someone wants to know a particular slave receives 10,000 such
frames per second and hurts performance with useless work.

We should at least increment some counter and maybe dump it with
"ethtool -S" or something.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
                   ` (3 preceding siblings ...)
  2016-01-23 15:23 ` Eric Dumazet
@ 2016-01-25  6:42 ` David Miller
  2016-01-25 14:27   ` Jarod Wilson
  2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 64+ messages in thread
From: David Miller @ 2016-01-25  6:42 UTC (permalink / raw)
  To: jarod
  Cc: linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh, vfalico,
	gospo, netdev

From: Jarod Wilson <jarod@redhat.com>
Date: Fri, 22 Jan 2016 14:11:22 -0500

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>  		else
>  			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>  	} else {
> +		if (deliver_exact)
> +			goto inactive; /* bond or team inactive slave */
>  drop:
>  		atomic_long_inc(&skb->dev->rx_dropped);
> +inactive:
>  		kfree_skb(skb);
>  		/* Jamal, now you will not able to escape explaining
>  		 * me how you were going to use this. :-)
> -- 
> 1.8.3.1
> 

I agree that rx_dropped is not the correct stat to bump here, but
I'm totally against the event disappearing completely into thin
air.

You have to replace the rx_dropped bump with _something_.

The only reason this hasn't been "fixed" yet is that everyone is
too damn lazy to implement that "something".

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-25  6:42 ` David Miller
@ 2016-01-25 14:27   ` Jarod Wilson
  2016-01-26  4:45     ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-01-25 14:27 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh, vfalico,
	gospo, netdev

On Sun, Jan 24, 2016 at 10:42:22PM -0800, David Miller wrote:
> From: Jarod Wilson <jarod@redhat.com>
> Date: Fri, 22 Jan 2016 14:11:22 -0500
> 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8cba3d8..1354c7b 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4153,8 +4153,11 @@ ncls:
> >  		else
> >  			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> >  	} else {
> > +		if (deliver_exact)
> > +			goto inactive; /* bond or team inactive slave */
> >  drop:
> >  		atomic_long_inc(&skb->dev->rx_dropped);
> > +inactive:
> >  		kfree_skb(skb);
> >  		/* Jamal, now you will not able to escape explaining
> >  		 * me how you were going to use this. :-)
> > -- 
> > 1.8.3.1
> > 
> 
> I agree that rx_dropped is not the correct stat to bump here, but
> I'm totally against the event disappearing completely into thin
> air.
> 
> You have to replace the rx_dropped bump with _something_.
> 
> The only reason this hasn't been "fixed" yet is that everyone is
> too damn lazy to implement that "something".

Would you want to see all things that shouldn't increment rx_dropped come
in one shot, along with the four or so other counters, as discussed in the
prior thread, or can they be done piecemeal? To date, I'm really only
familiar with this particular case, and could probably get something
together this week. To address the rest, I'd have to poke around a bit
more and see what there is to see and do.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-25 14:27   ` Jarod Wilson
@ 2016-01-26  4:45     ` Jarod Wilson
  0 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-26  4:45 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh, vfalico,
	gospo, netdev

On Mon, Jan 25, 2016 at 09:27:20AM -0500, Jarod Wilson wrote:
> On Sun, Jan 24, 2016 at 10:42:22PM -0800, David Miller wrote:
> > From: Jarod Wilson <jarod@redhat.com>
> > Date: Fri, 22 Jan 2016 14:11:22 -0500
> > 
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 8cba3d8..1354c7b 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -4153,8 +4153,11 @@ ncls:
> > >  		else
> > >  			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> > >  	} else {
> > > +		if (deliver_exact)
> > > +			goto inactive; /* bond or team inactive slave */
> > >  drop:
> > >  		atomic_long_inc(&skb->dev->rx_dropped);
> > > +inactive:
> > >  		kfree_skb(skb);
> > >  		/* Jamal, now you will not able to escape explaining
> > >  		 * me how you were going to use this. :-)
> > 
> > I agree that rx_dropped is not the correct stat to bump here, but
> > I'm totally against the event disappearing completely into thin
> > air.
> > 
> > You have to replace the rx_dropped bump with _something_.
> > 
> > The only reason this hasn't been "fixed" yet is that everyone is
> > too damn lazy to implement that "something".
> 
> Would you want to see all things that shouldn't increment rx_dropped come
> in one shot, along with the four or so other counters, as discussed in the
> prior thread, or can they be done piecemeal? To date, I'm really only
> familiar with this particular case, and could probably get something
> together this week. To address the rest, I'd have to poke around a bit
> more and see what there is to see and do.

Spent a while hacking around today, now have this, p7p1 and p5p2 are
the inactive slaves in the bond:

[root@dell-per720-06 ~]# cat /proc/net/dev
Inter-|   Receive                                                       |  Transmit
 face |bytes    packets errs drop drop_i fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  p6p1:   16024     238    0    0      0    0     0          0       521        0       0    0    0    0     0       0          0
  p7p1: 1691386   16537    0    0  16568    0     0          0       488        0       0    0    0    0     0       0          0
  p7p2: 1709438   16718    0    0      0    0     0          0       561        0       0    0    0    0     0       0          0
 bond0: 6183056   63065    0    0  33151    0     0          0     13964    24747     193    0    0    0     0       0          0
  p4p1:       0       0    0    0      0    0     0          0         0        0       0    0    0    0     0       0          0
  p4p2:       0       0    0    0      0    0     0          0         0        0       0    0    0    0     0       0          0
    lo:    4928      50    0    0      0    0     0          0         0     4928      50    0    0    0     0       0          0
  p5p1: 2259498   23401    0    0      0    0     0          0      6740    24747     193    0    0    0     0       0          0
  p5p2: 2232172   23127    0    0  16583    0     0          0      6736        0       0    0    0    0     0       0          0
   em4: 2347251   18224    0    0      0    0     0          0        90     4541      47    0    0    0     0       0          0
   em2: 1590296   16061    0    0      0    0     0          0        81        0       0    0    0    0     0       0          0
   em1: 1590180   16060    0    0      0    0     0          0        79        0       0    0    0    0     0       0          0
   em3: 2343156   18209    0    0      0    0     0          0        94        0       0    0    0    0     0       0          0
[root@dell-per720-06 ~]# cat /sys/devices/virtual/net/bond0/statistics/rx_dropped_inactive
33181

Haven't yet thrown together anything for ethtool -S output as Eric had
suggested, but I'll dig into that tomorrow.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-23 15:23 ` Eric Dumazet
@ 2016-01-26 21:14   ` Jarod Wilson
  2016-01-26 21:21     ` David Miller
  2016-01-26 21:24     ` Eric Dumazet
  0 siblings, 2 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-26 21:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote:
> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:
> 
> > ---
> >  net/core/dev.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8cba3d8..1354c7b 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4153,8 +4153,11 @@ ncls:
> >  		else
> >  			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> >  	} else {
> > +		if (deliver_exact)
> > +			goto inactive; /* bond or team inactive slave */
> >  drop:
> >  		atomic_long_inc(&skb->dev->rx_dropped);
> > +inactive:
> >  		kfree_skb(skb);
> >  		/* Jamal, now you will not able to escape explaining
> >  		 * me how you were going to use this. :-)
> 
> Note that if you still have a kfree_skb() instead of consume_skb(),
> some tools will still give you a wrong signal (packet dropped ...).
> 
> But then maybe the signal is telling some truth.
> 
> We receive a packet, and decide to drop it because no one was willing to
> handle it.
> 
> Maybe someone wants to know a particular slave receives 10,000 such
> frames per second and hurts performance with useless work.
> 
> We should at least increment some counter and maybe dump it with
> "ethtool -S" or something.

I've been digging into ethtool -S a little bit, and am somewhat at a loss
as to how I would wire into this. From what I've been able to figure out,
it's entirely device-specific-ish counters spit out. On my sfc cards, I
get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for
the core network stack, these are actually added up and shoved into
rx_dropped, and no other network driver has those two individual counters.

By itself, rx_dropped isn't output directly anywhere from ethtool, 
so far as I can see. And ethtool -S bondX shows absolutely nothing.
*Should* ethtool -S be dumping all the network core stats? I have to say I
was more than a little surprised at this:

# ethtool -S bond0
no stats available

Particularly given that if I look in /proc/net/dev or
/sys/devices/virtual/net/bond0/statistics/*, there are quite a few stats
that are being tracked and make their way out to userspace...

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-26 21:14   ` Jarod Wilson
@ 2016-01-26 21:21     ` David Miller
  2016-01-26 21:36       ` Jarod Wilson
  2016-01-26 21:24     ` Eric Dumazet
  1 sibling, 1 reply; 64+ messages in thread
From: David Miller @ 2016-01-26 21:21 UTC (permalink / raw)
  To: jarod
  Cc: eric.dumazet, linux-kernel, edumazet, jiri, daniel, tom,
	j.vosburgh, vfalico, gospo, netdev

From: Jarod Wilson <jarod@redhat.com>
Date: Tue, 26 Jan 2016 16:14:53 -0500

> # ethtool -S bond0
> no stats available

ethtool -S is for device specific stats.

Some drivers use this facility to provide per-RX-queue and per-TX-queue
versions of the existing core netdev stats.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-26 21:14   ` Jarod Wilson
  2016-01-26 21:21     ` David Miller
@ 2016-01-26 21:24     ` Eric Dumazet
  2016-01-26 21:35       ` Jarod Wilson
  1 sibling, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-26 21:24 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Eric Dumazet, LKML, David S. Miller, Jiri Pirko, Daniel Borkmann,
	Tom Herbert, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	netdev

On Tue, Jan 26, 2016 at 1:14 PM, Jarod Wilson <jarod@redhat.com> wrote:
> On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote:
>> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:
>>
>> > ---
>> >  net/core/dev.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/net/core/dev.c b/net/core/dev.c
>> > index 8cba3d8..1354c7b 100644
>> > --- a/net/core/dev.c
>> > +++ b/net/core/dev.c
>> > @@ -4153,8 +4153,11 @@ ncls:
>> >             else
>> >                     ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>> >     } else {
>> > +           if (deliver_exact)
>> > +                   goto inactive; /* bond or team inactive slave */
>> >  drop:
>> >             atomic_long_inc(&skb->dev->rx_dropped);
>> > +inactive:
>> >             kfree_skb(skb);
>> >             /* Jamal, now you will not able to escape explaining
>> >              * me how you were going to use this. :-)
>>
>> Note that if you still have a kfree_skb() instead of consume_skb(),
>> some tools will still give you a wrong signal (packet dropped ...).
>>
>> But then maybe the signal is telling some truth.
>>
>> We receive a packet, and decide to drop it because no one was willing to
>> handle it.
>>
>> Maybe someone wants to know a particular slave receives 10,000 such
>> frames per second and hurts performance with useless work.
>>
>> We should at least increment some counter and maybe dump it with
>> "ethtool -S" or something.
>
> I've been digging into ethtool -S a little bit, and am somewhat at a loss
> as to how I would wire into this. From what I've been able to figure out,
> it's entirely device-specific-ish counters spit out. On my sfc cards, I
> get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for
> the core network stack, these are actually added up and shoved into
> rx_dropped, and no other network driver has those two individual counters.
>
> By itself, rx_dropped isn't output directly anywhere from ethtool,
> so far as I can see. And ethtool -S bondX shows absolutely nothing.
> *Should* ethtool -S be dumping all the network core stats? I have to say I
> was more than a little surprised at this:

# ip -s -s link sh dev eth0
15: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether 3c:97:0e:be:91:7b brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    RX errors: length  crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0
    TX errors: aborted fifo    window  heartbeat
               0        0       0       0

So start with the following patch :

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b78090594..7c762cf1e4d5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -68,6 +68,8 @@ struct rtnl_link_stats64 {
        /* for cslip etc */
        __u64   rx_compressed;
        __u64   tx_compressed;
+
+       __u64   rx_nohandler;           /* packet was of no interest */
 };

 /* The struct should be in sync with struct ifmap */

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-26 21:24     ` Eric Dumazet
@ 2016-01-26 21:35       ` Jarod Wilson
  0 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-26 21:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, LKML, David S. Miller, Jiri Pirko, Daniel Borkmann,
	Tom Herbert, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	netdev

On Tue, Jan 26, 2016 at 01:24:59PM -0800, Eric Dumazet wrote:
> On Tue, Jan 26, 2016 at 1:14 PM, Jarod Wilson <jarod@redhat.com> wrote:
> > On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote:
> >> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:
> >>
> >> > ---
> >> >  net/core/dev.c | 3 +++
> >> >  1 file changed, 3 insertions(+)
> >> >
> >> > diff --git a/net/core/dev.c b/net/core/dev.c
> >> > index 8cba3d8..1354c7b 100644
> >> > --- a/net/core/dev.c
> >> > +++ b/net/core/dev.c
> >> > @@ -4153,8 +4153,11 @@ ncls:
> >> >             else
> >> >                     ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> >> >     } else {
> >> > +           if (deliver_exact)
> >> > +                   goto inactive; /* bond or team inactive slave */
> >> >  drop:
> >> >             atomic_long_inc(&skb->dev->rx_dropped);
> >> > +inactive:
> >> >             kfree_skb(skb);
> >> >             /* Jamal, now you will not able to escape explaining
> >> >              * me how you were going to use this. :-)
> >>
> >> Note that if you still have a kfree_skb() instead of consume_skb(),
> >> some tools will still give you a wrong signal (packet dropped ...).
> >>
> >> But then maybe the signal is telling some truth.
> >>
> >> We receive a packet, and decide to drop it because no one was willing to
> >> handle it.
> >>
> >> Maybe someone wants to know a particular slave receives 10,000 such
> >> frames per second and hurts performance with useless work.
> >>
> >> We should at least increment some counter and maybe dump it with
> >> "ethtool -S" or something.
> >
> > I've been digging into ethtool -S a little bit, and am somewhat at a loss
> > as to how I would wire into this. From what I've been able to figure out,
> > it's entirely device-specific-ish counters spit out. On my sfc cards, I
> > get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for
> > the core network stack, these are actually added up and shoved into
> > rx_dropped, and no other network driver has those two individual counters.
> >
> > By itself, rx_dropped isn't output directly anywhere from ethtool,
> > so far as I can see. And ethtool -S bondX shows absolutely nothing.
> > *Should* ethtool -S be dumping all the network core stats? I have to say I
> > was more than a little surprised at this:
> 
> # ip -s -s link sh dev eth0
> 15: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
> pfifo_fast state DOWN mode DEFAULT group default qlen 1000
>     link/ether 3c:97:0e:be:91:7b brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          0        0       0       0       0
>     RX errors: length  crc     frame   fifo    missed
>                0        0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0
>     TX errors: aborted fifo    window  heartbeat
>                0        0       0       0
> 
> So start with the following patch :
> 
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index a30b78090594..7c762cf1e4d5 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -68,6 +68,8 @@ struct rtnl_link_stats64 {
>         /* for cslip etc */
>         __u64   rx_compressed;
>         __u64   tx_compressed;
> +
> +       __u64   rx_nohandler;           /* packet was of no interest */
>  };
> 
>  /* The struct should be in sync with struct ifmap */

I'm already well past that point, using rx_dropped_inactive as the stat
name though, based on the prior discussion. I can certainly convert that
over to rx_nohandler easily enough. It looks like adding a column to ip's
output there would be as simple as fetching the stat over netlink and
spitting it out.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
  2016-01-26 21:21     ` David Miller
@ 2016-01-26 21:36       ` Jarod Wilson
  0 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-26 21:36 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, linux-kernel, edumazet, jiri, daniel, tom,
	j.vosburgh, vfalico, gospo, netdev

On Tue, Jan 26, 2016 at 01:21:00PM -0800, David Miller wrote:
> From: Jarod Wilson <jarod@redhat.com>
> Date: Tue, 26 Jan 2016 16:14:53 -0500
> 
> > # ethtool -S bond0
> > no stats available
> 
> ethtool -S is for device specific stats.

Okay, good, that was what it looked like to me. Glad I'm not completely
lost here. :)

So this sort of output wouldn't belong there, it should show up in sysfs,
procfs, and be available to ip over netlink.

> Some drivers use this facility to provide per-RX-queue and per-TX-queue
> versions of the existing core netdev stats.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
                   ` (4 preceding siblings ...)
  2016-01-25  6:42 ` David Miller
@ 2016-01-27 20:21 ` Jarod Wilson
  2016-01-27 20:21   ` [PATCH net 1/4] " Jarod Wilson
                     ` (4 more replies)
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
  2016-01-28 16:22 ` [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
  7 siblings, 5 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-27 20:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev


The network core tries to keep track of dropped packets, but some packets
you wouldn't really call dropped, so much as intentionally ignored, under
certain circumstances. One such case is that of bonding and team device
slaves that are currently inactive. Their respective rx_handler functions
return RX_HANDLER_EXACT (the only places in the kernel that return that),
which ends up tracking into the network core's __netif_receive_skb_core()
function's drop path, with no pt_prev set. On a noisy network, this can
result in a very rapidly incrementing rx_dropped counter, not only on the
inactive slave(s), but also on the master device, such as the following:

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  p7p1: 14783346  140430    0 140428    0     0          0      2040      680       8    0    0    0     0       0          0
  p7p2: 14805198  140648    0    0    0     0          0      2034        0       0    0    0    0     0       0          0
 bond0: 53365248  532798    0 421160    0     0          0    115151     2040      24    0    0    0     0       0          0
    lo:    5420      54    0    0    0     0          0         0     5420      54    0    0    0     0       0          0
  p5p1: 19292195  196197    0 140368    0     0          0     56564      680       8    0    0    0     0       0          0
  p5p2: 19289707  196171    0 140364    0     0          0     56547      680       8    0    0    0     0       0          0
   em3: 20996626  158214    0    0    0     0          0       383        0       0    0    0    0     0       0          0
   em2: 14065122  138462    0    0    0     0          0       310        0       0    0    0    0     0       0          0
   em1: 14063162  138440    0    0    0     0          0       308        0       0    0    0    0     0       0          0
   em4: 21050830  158729    0    0    0     0          0       385    71662     469    0    0    0     0       0          0
   ib0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
active-backup bond0, and you can see that all three have high drop counts,
with the master bond0 showing a tally of all three.

I know that this was previously discussed some here:

    http://www.spinics.net/lists/netdev/msg226341.html

It seems additional counters never came to fruition, so this is a first
attempt at creating one of them, so that we stop calling these drops,
which for users monitoring rx_dropped, causes great alarm, and renders the
counter much less useful for them.

This adds a sysfs statistics node and copies the counter out via netlink,
procfs additions will be handled separately, as I'm unsure if the current
output should be considered a stable interface...

Additionally, I'm not certain if this set qualifies for net, or if it
should be put aside and resubmitted for net-next after 4.5 is put to
bed, but I do have users who consider this an important bugfix.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 include/linux/netdevice.h    | 4 ++++
 include/uapi/linux/if_link.h | 4 ++++
 net/core/dev.c               | 6 +++++-
 net/core/net-sysfs.c         | 2 ++
 net/core/rtnetlink.c         | 2 ++
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c231..7973ab5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -180,6 +180,7 @@ struct net_device_stats {
 	unsigned long	tx_window_errors;
 	unsigned long	rx_compressed;
 	unsigned long	tx_compressed;
+	unsigned long	rx_unhandled;
 };
 
 
@@ -1397,6 +1398,8 @@ enum netdev_priv_flags {
  *			do not use this in drivers
  *	@tx_dropped:	Dropped packets by core network,
  *			do not use this in drivers
+ *	@rx_unhandled:	Unhandled dropped packets by core network on
+ *			inactive devices, do not use this in drivers
  *
  *	@wireless_handlers:	List of functions to handle Wireless Extensions,
  *				instead of ioctl,
@@ -1611,6 +1614,7 @@ struct net_device {
 
 	atomic_long_t		rx_dropped;
 	atomic_long_t		tx_dropped;
+	atomic_long_t		rx_unhandled;
 
 #ifdef CONFIG_WIRELESS_EXT
 	const struct iw_handler_def *	wireless_handlers;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b780..533639d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -35,6 +35,8 @@ struct rtnl_link_stats {
 	/* for cslip etc */
 	__u32	rx_compressed;
 	__u32	tx_compressed;
+
+	__u32	rx_unhandled;		/* dropped, no handler found	*/
 };
 
 /* The main device statistics structure */
@@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
 	/* for cslip etc */
 	__u64	rx_compressed;
 	__u64	tx_compressed;
+
+	__u64	rx_unhandled;		/* dropped, no handler found	*/
 };
 
 /* The struct should be in sync with struct ifmap */
diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..7700ca6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4154,7 +4154,10 @@ ncls:
 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
 drop:
-		atomic_long_inc(&skb->dev->rx_dropped);
+		if (!deliver_exact)
+			atomic_long_inc(&skb->dev->rx_dropped);
+		else
+			atomic_long_inc(&skb->dev->rx_unhandled);
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
@@ -7300,6 +7303,7 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 	}
 	storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
 	storage->tx_dropped += atomic_long_read(&dev->tx_dropped);
+	storage->rx_unhandled += atomic_long_read(&dev->rx_unhandled);
 	return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b6c8a66..fd22276 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -574,6 +574,7 @@ NETSTAT_ENTRY(tx_heartbeat_errors);
 NETSTAT_ENTRY(tx_window_errors);
 NETSTAT_ENTRY(rx_compressed);
 NETSTAT_ENTRY(tx_compressed);
+NETSTAT_ENTRY(rx_unhandled);
 
 static struct attribute *netstat_attrs[] = {
 	&dev_attr_rx_packets.attr,
@@ -599,6 +600,7 @@ static struct attribute *netstat_attrs[] = {
 	&dev_attr_tx_window_errors.attr,
 	&dev_attr_rx_compressed.attr,
 	&dev_attr_tx_compressed.attr,
+	&dev_attr_rx_unhandled.attr,
 	NULL
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d735e85..2e0f656 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -804,6 +804,8 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
 
 	a->rx_compressed = b->rx_compressed;
 	a->tx_compressed = b->tx_compressed;
+
+	a->rx_unhandled = b->rx_unhandled;
 }
 
 static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
-- 
1.8.3.1


Jarod Wilson (4):
  net: add rx_unhandled counter for tracking inactive slave drops
  net-procfs: show rx_unhandled counters
  team: track sum of rx_unhandled for all slaves
  bond: track sum of rx_unhandled for all slaves

 drivers/net/bonding/bond_main.c |  1 +
 drivers/net/team/team.c         | 10 +++++++---
 include/linux/if_team.h         |  1 +
 include/linux/netdevice.h       |  4 ++++
 include/uapi/linux/if_link.h    |  4 ++++
 net/core/dev.c                  |  6 +++++-
 net/core/net-procfs.c           | 11 ++++++-----
 net/core/net-sysfs.c            |  2 ++
 net/core/rtnetlink.c            |  2 ++
 9 files changed, 32 insertions(+), 9 deletions(-)

-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net 1/4] net: add rx_unhandled stat counter
  2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
@ 2016-01-27 20:21   ` Jarod Wilson
  2016-01-27 20:21   ` [PATCH net 2/4] net-procfs: show rx_unhandled counters Jarod Wilson
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-27 20:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

This adds an rx_unhandled stat counter, along with a sysfs statistics
node, and copies the counter out via netlink as well.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 include/linux/netdevice.h    | 4 ++++
 include/uapi/linux/if_link.h | 4 ++++
 net/core/dev.c               | 6 +++++-
 net/core/net-sysfs.c         | 2 ++
 net/core/rtnetlink.c         | 2 ++
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c231..7973ab5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -180,6 +180,7 @@ struct net_device_stats {
 	unsigned long	tx_window_errors;
 	unsigned long	rx_compressed;
 	unsigned long	tx_compressed;
+	unsigned long	rx_unhandled;
 };
 
 
@@ -1397,6 +1398,8 @@ enum netdev_priv_flags {
  *			do not use this in drivers
  *	@tx_dropped:	Dropped packets by core network,
  *			do not use this in drivers
+ *	@rx_unhandled:	Unhandled dropped packets by core network on
+ *			inactive devices, do not use this in drivers
  *
  *	@wireless_handlers:	List of functions to handle Wireless Extensions,
  *				instead of ioctl,
@@ -1611,6 +1614,7 @@ struct net_device {
 
 	atomic_long_t		rx_dropped;
 	atomic_long_t		tx_dropped;
+	atomic_long_t		rx_unhandled;
 
 #ifdef CONFIG_WIRELESS_EXT
 	const struct iw_handler_def *	wireless_handlers;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b780..533639d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -35,6 +35,8 @@ struct rtnl_link_stats {
 	/* for cslip etc */
 	__u32	rx_compressed;
 	__u32	tx_compressed;
+
+	__u32	rx_unhandled;		/* dropped, no handler found	*/
 };
 
 /* The main device statistics structure */
@@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
 	/* for cslip etc */
 	__u64	rx_compressed;
 	__u64	tx_compressed;
+
+	__u64	rx_unhandled;		/* dropped, no handler found	*/
 };
 
 /* The struct should be in sync with struct ifmap */
diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..7700ca6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4154,7 +4154,10 @@ ncls:
 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
 drop:
-		atomic_long_inc(&skb->dev->rx_dropped);
+		if (!deliver_exact)
+			atomic_long_inc(&skb->dev->rx_dropped);
+		else
+			atomic_long_inc(&skb->dev->rx_unhandled);
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
@@ -7300,6 +7303,7 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 	}
 	storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
 	storage->tx_dropped += atomic_long_read(&dev->tx_dropped);
+	storage->rx_unhandled += atomic_long_read(&dev->rx_unhandled);
 	return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b6c8a66..fd22276 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -574,6 +574,7 @@ NETSTAT_ENTRY(tx_heartbeat_errors);
 NETSTAT_ENTRY(tx_window_errors);
 NETSTAT_ENTRY(rx_compressed);
 NETSTAT_ENTRY(tx_compressed);
+NETSTAT_ENTRY(rx_unhandled);
 
 static struct attribute *netstat_attrs[] = {
 	&dev_attr_rx_packets.attr,
@@ -599,6 +600,7 @@ static struct attribute *netstat_attrs[] = {
 	&dev_attr_tx_window_errors.attr,
 	&dev_attr_rx_compressed.attr,
 	&dev_attr_tx_compressed.attr,
+	&dev_attr_rx_unhandled.attr,
 	NULL
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d735e85..2e0f656 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -804,6 +804,8 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
 
 	a->rx_compressed = b->rx_compressed;
 	a->tx_compressed = b->tx_compressed;
+
+	a->rx_unhandled = b->rx_unhandled;
 }
 
 static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net 2/4] net-procfs: show rx_unhandled counters
  2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
  2016-01-27 20:21   ` [PATCH net 1/4] " Jarod Wilson
@ 2016-01-27 20:21   ` Jarod Wilson
  2016-01-27 20:21   ` [PATCH net 3/4] team: track sum of rx_unhandled for all slaves Jarod Wilson
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-27 20:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, netdev

The rx_unhandled counters get output here in a column after drop with
a heading name of unh. I'm not sure if perhaps /proc/net/dev should be
considered a stable interface that shouldn't be mucked with. If so,
this can certainly be dropped, without impacting the core functionality.

CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 net/core/net-procfs.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 2bf8329..9f88f34 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -79,11 +79,12 @@ static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 	struct rtnl_link_stats64 temp;
 	const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
 
-	seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu %4llu %5llu %10llu %9llu "
-		   "%8llu %7llu %4llu %4llu %4llu %5llu %7llu %10llu\n",
+	seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu %4llu %4llu %5llu %10llu "
+		   "%9llu %8llu %7llu %4llu %4llu %4llu %5llu %7llu %10llu\n",
 		   dev->name, stats->rx_bytes, stats->rx_packets,
 		   stats->rx_errors,
 		   stats->rx_dropped + stats->rx_missed_errors,
+		   stats->rx_unhandled,
 		   stats->rx_fifo_errors,
 		   stats->rx_length_errors + stats->rx_over_errors +
 		    stats->rx_crc_errors + stats->rx_frame_errors,
@@ -107,9 +108,9 @@ static int dev_seq_show(struct seq_file *seq, void *v)
 	if (v == SEQ_START_TOKEN)
 		seq_puts(seq, "Inter-|   Receive                            "
 			      "                    |  Transmit\n"
-			      " face |bytes    packets errs drop fifo frame "
-			      "compressed multicast|bytes    packets errs "
-			      "drop fifo colls carrier compressed\n");
+			      " face |bytes    packets errs drop  unh fifo "
+			      "frame compressed multicast|bytes    packets "
+			      "errs drop fifo colls carrier compressed\n");
 	else
 		dev_seq_printf_stats(seq, v);
 	return 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net 3/4] team: track sum of rx_unhandled for all slaves
  2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
  2016-01-27 20:21   ` [PATCH net 1/4] " Jarod Wilson
  2016-01-27 20:21   ` [PATCH net 2/4] net-procfs: show rx_unhandled counters Jarod Wilson
@ 2016-01-27 20:21   ` Jarod Wilson
  2016-01-27 20:21   ` [PATCH net 4/4] bond: " Jarod Wilson
  2016-01-27 21:09   ` [PATCH net 0/4] net: add rx_unhandled stat counter Eric Dumazet
  4 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-27 20:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Jiri Pirko, netdev

CC: Jiri Pirko <jiri@resnulli.us>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/team/team.c | 10 +++++++---
 include/linux/if_team.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 718ceea..4460a8c 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -758,6 +758,8 @@ static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
 		u64_stats_update_end(&pcpu_stats->syncp);
 
 		skb->dev = team->dev;
+	} else if (res == RX_HANDLER_EXACT) {
+		this_cpu_inc(team->pcpu_stats->rx_unhandled);
 	} else {
 		this_cpu_inc(team->pcpu_stats->rx_dropped);
 	}
@@ -1807,7 +1809,7 @@ team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	struct team *team = netdev_priv(dev);
 	struct team_pcpu_stats *p;
 	u64 rx_packets, rx_bytes, rx_multicast, tx_packets, tx_bytes;
-	u32 rx_dropped = 0, tx_dropped = 0;
+	u32 rx_dropped = 0, tx_dropped = 0, rx_unhandled = 0;
 	unsigned int start;
 	int i;
 
@@ -1828,14 +1830,16 @@ team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 		stats->tx_packets	+= tx_packets;
 		stats->tx_bytes		+= tx_bytes;
 		/*
-		 * rx_dropped & tx_dropped are u32, updated
-		 * without syncp protection.
+		 * rx_dropped, tx_dropped & rx_unhandled are u32,
+		 * updated without syncp protection.
 		 */
 		rx_dropped	+= p->rx_dropped;
 		tx_dropped	+= p->tx_dropped;
+		rx_unhandled	+= p->rx_unhandled;
 	}
 	stats->rx_dropped	= rx_dropped;
 	stats->tx_dropped	= tx_dropped;
+	stats->rx_unhandled	= rx_unhandled;
 	return stats;
 }
 
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index b84e49c..787dbd2 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -24,6 +24,7 @@ struct team_pcpu_stats {
 	struct u64_stats_sync	syncp;
 	u32			rx_dropped;
 	u32			tx_dropped;
+	u32			rx_unhandled;
 };
 
 struct team;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net 4/4] bond: track sum of rx_unhandled for all slaves
  2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
                     ` (2 preceding siblings ...)
  2016-01-27 20:21   ` [PATCH net 3/4] team: track sum of rx_unhandled for all slaves Jarod Wilson
@ 2016-01-27 20:21   ` Jarod Wilson
  2016-01-27 21:09   ` [PATCH net 0/4] net: add rx_unhandled stat counter Eric Dumazet
  4 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-27 20:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, netdev

Sample output with this set applied for an active-backup bond:

$ cat /proc/net/dev
Inter-|   Receive                                                       |  Transmit
 face |bytes    packets errs drop  unh fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  p6p1:   16024     238    0    0    0    0     0          0       521        0       0    0    0    0     0       0          0
  p7p1: 1691386   16537    0    0 16568    0     0          0       488        0       0    0    0    0     0       0          0
  p7p2: 1709438   16718    0    0    0    0     0          0       561        0       0    0    0    0     0       0          0
 bond0: 6183056   63065    0    0 33151    0     0          0     13964    24747     193    0    0    0     0       0          0
  p4p1:       0       0    0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  p4p2:       0       0    0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
    lo:    4928      50    0    0    0    0     0          0         0     4928      50    0    0    0     0       0          0
  p5p1: 2259498   23401    0    0    0    0     0          0      6740    24747     193    0    0    0     0       0          0
  p5p2: 2232172   23127    0    0 16583    0     0          0      6736        0       0    0    0    0     0       0          0
   em4: 2347251   18224    0    0    0    0     0          0        90     4541      47    0    0    0     0       0          0
   em2: 1590296   16061    0    0    0    0     0          0        81        0       0    0    0    0     0       0          0
   em1: 1590180   16060    0    0    0    0     0          0        79        0       0    0    0    0     0       0          0
   em3: 2343156   18209    0    0    0    0     0          0        94        0       0    0    0    0     0       0          0
   ib0:       0       0    0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
$ cat /sys/devices/virtual/net/bond0/statistics/rx_unhandled
33181

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b5605..72164dc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3309,6 +3309,7 @@ static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
 		stats->rx_bytes += sstats->rx_bytes - pstats->rx_bytes;
 		stats->rx_errors += sstats->rx_errors - pstats->rx_errors;
 		stats->rx_dropped += sstats->rx_dropped - pstats->rx_dropped;
+		stats->rx_unhandled += sstats->rx_unhandled - pstats->rx_unhandled;
 
 		stats->tx_packets += sstats->tx_packets - pstats->tx_packets;;
 		stats->tx_bytes += sstats->tx_bytes - pstats->tx_bytes;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
                     ` (3 preceding siblings ...)
  2016-01-27 20:21   ` [PATCH net 4/4] bond: " Jarod Wilson
@ 2016-01-27 21:09   ` Eric Dumazet
  2016-01-28  6:02     ` Jarod Wilson
  4 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-27 21:09 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Wed, 2016-01-27 at 15:21 -0500, Jarod Wilson wrote:

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 289c231..7973ab5 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -180,6 +180,7 @@ struct net_device_stats {
>  	unsigned long	tx_window_errors;
>  	unsigned long	rx_compressed;
>  	unsigned long	tx_compressed;
> +	unsigned long	rx_unhandled;
>  };
>  

This structure is deprecated, please do not add new fields in it,
as it will increase netlink answers for no good reason.

rtnl_link_stats64 is what really matters these days.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-27 21:09   ` [PATCH net 0/4] net: add rx_unhandled stat counter Eric Dumazet
@ 2016-01-28  6:02     ` Jarod Wilson
  2016-01-28  6:10       ` Jarod Wilson
                         ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28  6:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Wed, Jan 27, 2016 at 01:09:47PM -0800, Eric Dumazet wrote:
> On Wed, 2016-01-27 at 15:21 -0500, Jarod Wilson wrote:
> 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 289c231..7973ab5 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -180,6 +180,7 @@ struct net_device_stats {
> >  	unsigned long	tx_window_errors;
> >  	unsigned long	rx_compressed;
> >  	unsigned long	tx_compressed;
> > +	unsigned long	rx_unhandled;
> >  };
> >  
> 
> This structure is deprecated, please do not add new fields in it,
> as it will increase netlink answers for no good reason.
> 
> rtnl_link_stats64 is what really matters these days.

I'll respin the set without that, along with s/unhandled/nohandler/, which
I somehow got screwed up in my head and realized a split second after
hitting send. Outside of that, does this approach look sane? Should I
bother with touching /proc/net/dev output or not?

Thanks much,

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28  6:02     ` Jarod Wilson
@ 2016-01-28  6:10       ` Jarod Wilson
  2016-01-28  6:18       ` Jarod Wilson
  2016-01-28 13:00       ` Eric Dumazet
  2 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28  6:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, Jan 28, 2016 at 01:02:15AM -0500, Jarod Wilson wrote:
> On Wed, Jan 27, 2016 at 01:09:47PM -0800, Eric Dumazet wrote:
> > On Wed, 2016-01-27 at 15:21 -0500, Jarod Wilson wrote:
> > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 289c231..7973ab5 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -180,6 +180,7 @@ struct net_device_stats {
> > >  	unsigned long	tx_window_errors;
> > >  	unsigned long	rx_compressed;
> > >  	unsigned long	tx_compressed;
> > > +	unsigned long	rx_unhandled;
> > >  };
> > >  
> > 
> > This structure is deprecated, please do not add new fields in it,
> > as it will increase netlink answers for no good reason.
> > 
> > rtnl_link_stats64 is what really matters these days.
> 
> I'll respin the set without that, along with s/unhandled/nohandler/, which
> I somehow got screwed up in my head and realized a split second after
> hitting send. Outside of that, does this approach look sane? Should I
> bother with touching /proc/net/dev output or not?

Also, please excuse the poor excuse for a cover-letter that had a
duplicate of patch 1 in it. I'll fix that the next pass too.

/me hangs head in shame...

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28  6:02     ` Jarod Wilson
  2016-01-28  6:10       ` Jarod Wilson
@ 2016-01-28  6:18       ` Jarod Wilson
  2016-01-28 13:00         ` Eric Dumazet
  2016-01-28 13:00       ` Eric Dumazet
  2 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28  6:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, Jan 28, 2016 at 01:02:15AM -0500, Jarod Wilson wrote:
> On Wed, Jan 27, 2016 at 01:09:47PM -0800, Eric Dumazet wrote:
> > On Wed, 2016-01-27 at 15:21 -0500, Jarod Wilson wrote:
> > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 289c231..7973ab5 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -180,6 +180,7 @@ struct net_device_stats {
> > >  	unsigned long	tx_window_errors;
> > >  	unsigned long	rx_compressed;
> > >  	unsigned long	tx_compressed;
> > > +	unsigned long	rx_unhandled;
> > >  };
> > >  
> > 
> > This structure is deprecated, please do not add new fields in it,
> > as it will increase netlink answers for no good reason.
> > 
> > rtnl_link_stats64 is what really matters these days.
> 
> I'll respin the set without that

Or not. Now I remember why I added that in the first place:

In file included from ./arch/x86/include/asm/uaccess.h:7:0,
                 from net/core/dev.c:75:
net/core/dev.c: In function 'netdev_stats_to_stats64':
include/linux/compiler.h:484:20: error: call to '__compiletime_assert_7263' declared with attribute error: BUILD_BUG_ON failed: sizeof(*stats64) != sizeof(*netdev_stats)
    prefix ## suffix();    \
                    ^

Things are actually hard-wired to require that addition at the moment, or
you get the above build failure. Not sure if it's safe to remove that
BUILD_BUG_ON() yet, haven't looked closely, it's past my bed time. :)

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28  6:18       ` Jarod Wilson
@ 2016-01-28 13:00         ` Eric Dumazet
  2016-01-28 14:38           ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-28 13:00 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, 2016-01-28 at 01:18 -0500, Jarod Wilson wrote:

> Or not. Now I remember why I added that in the first place:
> 
> In file included from ./arch/x86/include/asm/uaccess.h:7:0,
>                  from net/core/dev.c:75:
> net/core/dev.c: In function 'netdev_stats_to_stats64':
> include/linux/compiler.h:484:20: error: call to '__compiletime_assert_7263' declared with attribute error: BUILD_BUG_ON failed: sizeof(*stats64) != sizeof(*netdev_stats)
>     prefix ## suffix();    \
>                     ^
> 
> Things are actually hard-wired to require that addition at the moment, or
> you get the above build failure. Not sure if it's safe to remove that
> BUILD_BUG_ON() yet, haven't looked closely, it's past my bed time. :)
> 

This was done for the transition from "unsigned long" to "u64", which is
a nop on 64bit arches.

But as we do not need to be compatible, since no linux kernel in the
past had this new field in struct net_device_stats,

and we do not need to add this new field as it is only accessed from
core networking stack [1], you need to adapt this helper.

[1] And maybe some virtual devices like bonding/team 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28  6:02     ` Jarod Wilson
  2016-01-28  6:10       ` Jarod Wilson
  2016-01-28  6:18       ` Jarod Wilson
@ 2016-01-28 13:00       ` Eric Dumazet
  2 siblings, 0 replies; 64+ messages in thread
From: Eric Dumazet @ 2016-01-28 13:00 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, 2016-01-28 at 01:02 -0500, Jarod Wilson wrote:
> Outside of that, does this approach look sane? Should I
> bother with touching /proc/net/dev output or not?

Please do not touch /proc/net/dev 

This is legacy stuff and really should not be touched anymore.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28 13:00         ` Eric Dumazet
@ 2016-01-28 14:38           ` Jarod Wilson
  2016-01-28 14:42             ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 14:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, Jan 28, 2016 at 05:00:02AM -0800, Eric Dumazet wrote:
> On Thu, 2016-01-28 at 01:18 -0500, Jarod Wilson wrote:
> 
> > Or not. Now I remember why I added that in the first place:
> > 
> > In file included from ./arch/x86/include/asm/uaccess.h:7:0,
> >                  from net/core/dev.c:75:
> > net/core/dev.c: In function 'netdev_stats_to_stats64':
> > include/linux/compiler.h:484:20: error: call to '__compiletime_assert_7263' declared with attribute error: BUILD_BUG_ON failed: sizeof(*stats64) != sizeof(*netdev_stats)
> >     prefix ## suffix();    \
> >                     ^
> > 
> > Things are actually hard-wired to require that addition at the moment, or
> > you get the above build failure. Not sure if it's safe to remove that
> > BUILD_BUG_ON() yet, haven't looked closely, it's past my bed time. :)
> > 
> 
> This was done for the transition from "unsigned long" to "u64", which is
> a nop on 64bit arches.
> 
> But as we do not need to be compatible, since no linux kernel in the
> past had this new field in struct net_device_stats,
> 
> and we do not need to add this new field as it is only accessed from
> core networking stack [1], you need to adapt this helper.

Something like this then:

diff --git a/net/core/dev.c b/net/core/dev.c
index 82334c6..2ca3eab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7262,15 +7262,16 @@ void netdev_run_todo(void)
 void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 			     const struct net_device_stats *netdev_stats)
 {
+	memset(stats64, 0, sizeof(*stats64));
 #if BITS_PER_LONG == 64
-	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
+	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
 	memcpy(stats64, netdev_stats, sizeof(*stats64));
 #else
 	size_t i, n = sizeof(*stats64) / sizeof(u64);
 	const unsigned long *src = (const unsigned long *)netdev_stats;
 	u64 *dst = (u64 *)stats64;
 
-	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
+	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
 		     sizeof(*stats64) / sizeof(u64));
 	for (i = 0; i < n; i++)
 		dst[i] = src[i];

Compiles locally w/o that net_device_stats addition, seems sane to me.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28 14:38           ` Jarod Wilson
@ 2016-01-28 14:42             ` Eric Dumazet
  2016-01-28 14:44               ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-28 14:42 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, 2016-01-28 at 09:38 -0500, Jarod Wilson wrote:

> Something like this then:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 82334c6..2ca3eab 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7262,15 +7262,16 @@ void netdev_run_todo(void)
>  void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
>  			     const struct net_device_stats *netdev_stats)
>  {
> +	memset(stats64, 0, sizeof(*stats64));
>  #if BITS_PER_LONG == 64
> -	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
> +	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
>  	memcpy(stats64, netdev_stats, sizeof(*stats64));
>  #else
>  	size_t i, n = sizeof(*stats64) / sizeof(u64);
>  	const unsigned long *src = (const unsigned long *)netdev_stats;
>  	u64 *dst = (u64 *)stats64;
>  
> -	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
> +	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
>  		     sizeof(*stats64) / sizeof(u64));
>  	for (i = 0; i < n; i++)
>  		dst[i] = src[i];
> 
> Compiles locally w/o that net_device_stats addition, seems sane to me.
> 

Sure, you also can set stats64->rx_unhandled to 0 here, just to be 100%
safe.

Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28 14:42             ` Eric Dumazet
@ 2016-01-28 14:44               ` Eric Dumazet
  2016-01-28 14:46                 ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-28 14:44 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, 2016-01-28 at 06:42 -0800, Eric Dumazet wrote:

> 
> Sure, you also can set stats64->rx_unhandled to 0 here, just to be 100%
> safe.

And not add the memset(stats64, 0, sizeof(*stats64)), since we have the
guarantee to properly init whole stats64 structure.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28 14:44               ` Eric Dumazet
@ 2016-01-28 14:46                 ` Eric Dumazet
  2016-01-28 15:11                   ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-28 14:46 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, 2016-01-28 at 06:44 -0800, Eric Dumazet wrote:
> On Thu, 2016-01-28 at 06:42 -0800, Eric Dumazet wrote:
> 
> > 
> > Sure, you also can set stats64->rx_unhandled to 0 here, just to be 100%
> > safe.
> 
> And not add the memset(stats64, 0, sizeof(*stats64)), since we have the
> guarantee to properly init whole stats64 structure.

Or a more tricky

memset((char *)stats64 + sizeof(struct net_device_stats),
       0,
       sizeof(*stats64) - sizeof(struct net_device_stats));

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net 0/4] net: add rx_unhandled stat counter
  2016-01-28 14:46                 ` Eric Dumazet
@ 2016-01-28 15:11                   ` Jarod Wilson
  0 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 15:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Thu, Jan 28, 2016 at 06:46:50AM -0800, Eric Dumazet wrote:
> On Thu, 2016-01-28 at 06:44 -0800, Eric Dumazet wrote:
> > On Thu, 2016-01-28 at 06:42 -0800, Eric Dumazet wrote:
> > 
> > > 
> > > Sure, you also can set stats64->rx_unhandled to 0 here, just to be 100%
> > > safe.
> > 
> > And not add the memset(stats64, 0, sizeof(*stats64)), since we have the
> > guarantee to properly init whole stats64 structure.
> 
> Or a more tricky
> 
> memset((char *)stats64 + sizeof(struct net_device_stats),
>        0,
>        sizeof(*stats64) - sizeof(struct net_device_stats));

I think I like this best, since it won't require someone to notice they
have to add an explicit initialization if they add a new counter, as well
as saving us from memset'ing the entire struct, the majority of which
we'll initialize moments later. Just needs some documentation updates,
which I've done locally. Will rebuild, re-test, then hopefully get an
updated patchset out the door.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net v2 0/4] net: add and use rx_nohandler stat counter
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
                   ` (5 preceding siblings ...)
  2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
@ 2016-01-28 15:49 ` Jarod Wilson
  2016-01-28 15:49   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
                     ` (6 more replies)
  2016-01-28 16:22 ` [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
  7 siblings, 7 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 15:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

The network core tries to keep track of dropped packets, but some packets
you wouldn't really call dropped, so much as intentionally ignored, under
certain circumstances. One such case is that of bonding and team device
slaves that are currently inactive. Their respective rx_handler functions
return RX_HANDLER_EXACT (the only places in the kernel that return that),
which ends up tracking into the network core's __netif_receive_skb_core()
function's drop path, with no pt_prev set. On a noisy network, this can
result in a very rapidly incrementing rx_dropped counter, not only on the
inactive slave(s), but also on the master device, such as the following:

$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  p7p1: 14783346  140430    0 140428    0     0          0      2040      680       8    0    0    0     0       0          0
  p7p2: 14805198  140648    0    0    0     0          0      2034        0       0    0    0    0     0       0          0
 bond0: 53365248  532798    0 421160    0     0          0    115151     2040      24    0    0    0     0       0          0
    lo:    5420      54    0    0    0     0          0         0     5420      54    0    0    0     0       0          0
  p5p1: 19292195  196197    0 140368    0     0          0     56564      680       8    0    0    0     0       0          0
  p5p2: 19289707  196171    0 140364    0     0          0     56547      680       8    0    0    0     0       0          0
   em3: 20996626  158214    0    0    0     0          0       383        0       0    0    0    0     0       0          0
   em2: 14065122  138462    0    0    0     0          0       310        0       0    0    0    0     0       0          0
   em1: 14063162  138440    0    0    0     0          0       308        0       0    0    0    0     0       0          0
   em4: 21050830  158729    0    0    0     0          0       385    71662     469    0    0    0     0       0          0
   ib0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
active-backup bond0, and you can see that all three have high drop counts,
with the master bond0 showing a tally of all three.

I know that this was previously discussed some here:

    http://www.spinics.net/lists/netdev/msg226341.html

It seems additional counters never came to fruition, so this is a first
attempt at creating one of them, so that we stop calling these drops,
which for users monitoring rx_dropped, causes great alarm, and renders the
counter much less useful for them.

This adds a sysfs statistics node and makes the counter available via
netlink.

Additionally, I'm not certain if this set qualifies for net, or if it
should be put aside and resubmitted for net-next after 4.5 is put to
bed, but I do have users who consider this an important bugfix.

Jarod Wilson (4):
  net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  net: add rx_nohandler stat counter
  team: track sum of rx_nohandler for all slaves
  bond: track sum of rx_nohandler for all slaves

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>

 drivers/net/bonding/bond_main.c |  1 +
 drivers/net/team/team.c         | 10 +++++++---
 include/linux/if_team.h         |  1 +
 include/linux/netdevice.h       |  3 +++
 include/uapi/linux/if_link.h    |  4 ++++
 net/core/dev.c                  | 19 ++++++++++++++-----
 net/core/net-sysfs.c            |  2 ++
 net/core/rtnetlink.c            |  2 ++
 8 files changed, 34 insertions(+), 8 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
@ 2016-01-28 15:49   ` Jarod Wilson
  2016-01-28 15:49   ` [PATCH net v2 2/4] net: add rx_nohandler stat counter Jarod Wilson
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 15:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Eric Dumazet

The netdev_stats_to_stats64 function copies the deprecated
net_device_stats format stats into rtnl_link_stats64 for legacy support
purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
extend rtnl_link_stats64 without also extending net_device_stats. Relax
the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
zero out all the stat counters that aren't present in net_device_stats.

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 net/core/dev.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..575a7df 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7253,25 +7253,30 @@ void netdev_run_todo(void)
 	}
 }
 
-/* Convert net_device_stats to rtnl_link_stats64.  They have the same
- * fields in the same order, with only the type differing.
+/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
+ * all the same fields in the same order as net_device_stats, with only
+ * the type differing, but rtnl_link_stats64 may have additional fields
+ * at the end for newer counters.
  */
 void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 			     const struct net_device_stats *netdev_stats)
 {
 #if BITS_PER_LONG == 64
-	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
+	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
 	memcpy(stats64, netdev_stats, sizeof(*stats64));
 #else
 	size_t i, n = sizeof(*stats64) / sizeof(u64);
 	const unsigned long *src = (const unsigned long *)netdev_stats;
 	u64 *dst = (u64 *)stats64;
 
-	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
+	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
 		     sizeof(*stats64) / sizeof(u64));
 	for (i = 0; i < n; i++)
 		dst[i] = src[i];
 #endif
+	/* zero out counters that only exist in rtnl_link_stats64 */
+	memset((char *)stats64 + sizeof(*netdev_stats), 0,
+	       sizeof(*stats64) - sizeof(*netdev_stats));
 }
 EXPORT_SYMBOL(netdev_stats_to_stats64);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net v2 2/4] net: add rx_nohandler stat counter
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
  2016-01-28 15:49   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
@ 2016-01-28 15:49   ` Jarod Wilson
  2016-01-28 15:49   ` [PATCH net v2 3/4] team: track sum of rx_nohandler for all slaves Jarod Wilson
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 15:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

This adds an rx_nohandler stat counter, along with a sysfs statistics
node, and copies the counter out via netlink as well.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 include/linux/netdevice.h    | 3 +++
 include/uapi/linux/if_link.h | 4 ++++
 net/core/dev.c               | 6 +++++-
 net/core/net-sysfs.c         | 2 ++
 net/core/rtnetlink.c         | 2 ++
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c231..78a20ce 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1397,6 +1397,8 @@ enum netdev_priv_flags {
  *			do not use this in drivers
  *	@tx_dropped:	Dropped packets by core network,
  *			do not use this in drivers
+ *	@rx_nohandler:	nohandler dropped packets by core network on
+ *			inactive devices, do not use this in drivers
  *
  *	@wireless_handlers:	List of functions to handle Wireless Extensions,
  *				instead of ioctl,
@@ -1611,6 +1613,7 @@ struct net_device {
 
 	atomic_long_t		rx_dropped;
 	atomic_long_t		tx_dropped;
+	atomic_long_t		rx_nohandler;
 
 #ifdef CONFIG_WIRELESS_EXT
 	const struct iw_handler_def *	wireless_handlers;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b780..d3e90b9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -35,6 +35,8 @@ struct rtnl_link_stats {
 	/* for cslip etc */
 	__u32	rx_compressed;
 	__u32	tx_compressed;
+
+	__u32	rx_nohandler;		/* dropped, no handler found	*/
 };
 
 /* The main device statistics structure */
@@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
 	/* for cslip etc */
 	__u64	rx_compressed;
 	__u64	tx_compressed;
+
+	__u64	rx_nohandler;		/* dropped, no handler found	*/
 };
 
 /* The struct should be in sync with struct ifmap */
diff --git a/net/core/dev.c b/net/core/dev.c
index 575a7df..fef2351 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4154,7 +4154,10 @@ ncls:
 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
 drop:
-		atomic_long_inc(&skb->dev->rx_dropped);
+		if (!deliver_exact)
+			atomic_long_inc(&skb->dev->rx_dropped);
+		else
+			atomic_long_inc(&skb->dev->rx_nohandler);
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
@@ -7305,6 +7308,7 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 	}
 	storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
 	storage->tx_dropped += atomic_long_read(&dev->tx_dropped);
+	storage->rx_nohandler += atomic_long_read(&dev->rx_nohandler);
 	return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b6c8a66..da7dbc2 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -574,6 +574,7 @@ NETSTAT_ENTRY(tx_heartbeat_errors);
 NETSTAT_ENTRY(tx_window_errors);
 NETSTAT_ENTRY(rx_compressed);
 NETSTAT_ENTRY(tx_compressed);
+NETSTAT_ENTRY(rx_nohandler);
 
 static struct attribute *netstat_attrs[] = {
 	&dev_attr_rx_packets.attr,
@@ -599,6 +600,7 @@ static struct attribute *netstat_attrs[] = {
 	&dev_attr_tx_window_errors.attr,
 	&dev_attr_rx_compressed.attr,
 	&dev_attr_tx_compressed.attr,
+	&dev_attr_rx_nohandler.attr,
 	NULL
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d735e85..20d7135 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -804,6 +804,8 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
 
 	a->rx_compressed = b->rx_compressed;
 	a->tx_compressed = b->tx_compressed;
+
+	a->rx_nohandler = b->rx_nohandler;
 }
 
 static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net v2 3/4] team: track sum of rx_nohandler for all slaves
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
  2016-01-28 15:49   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
  2016-01-28 15:49   ` [PATCH net v2 2/4] net: add rx_nohandler stat counter Jarod Wilson
@ 2016-01-28 15:49   ` Jarod Wilson
  2016-01-28 15:49   ` [PATCH net v2 4/4] bond: " Jarod Wilson
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 15:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Jiri Pirko, netdev

CC: Jiri Pirko <jiri@resnulli.us>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/team/team.c | 10 +++++++---
 include/linux/if_team.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 718ceea..00558e1 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -758,6 +758,8 @@ static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
 		u64_stats_update_end(&pcpu_stats->syncp);
 
 		skb->dev = team->dev;
+	} else if (res == RX_HANDLER_EXACT) {
+		this_cpu_inc(team->pcpu_stats->rx_nohandler);
 	} else {
 		this_cpu_inc(team->pcpu_stats->rx_dropped);
 	}
@@ -1807,7 +1809,7 @@ team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	struct team *team = netdev_priv(dev);
 	struct team_pcpu_stats *p;
 	u64 rx_packets, rx_bytes, rx_multicast, tx_packets, tx_bytes;
-	u32 rx_dropped = 0, tx_dropped = 0;
+	u32 rx_dropped = 0, tx_dropped = 0, rx_nohandler = 0;
 	unsigned int start;
 	int i;
 
@@ -1828,14 +1830,16 @@ team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 		stats->tx_packets	+= tx_packets;
 		stats->tx_bytes		+= tx_bytes;
 		/*
-		 * rx_dropped & tx_dropped are u32, updated
-		 * without syncp protection.
+		 * rx_dropped, tx_dropped & rx_nohandler are u32,
+		 * updated without syncp protection.
 		 */
 		rx_dropped	+= p->rx_dropped;
 		tx_dropped	+= p->tx_dropped;
+		rx_nohandler	+= p->rx_nohandler;
 	}
 	stats->rx_dropped	= rx_dropped;
 	stats->tx_dropped	= tx_dropped;
+	stats->rx_nohandler	= rx_nohandler;
 	return stats;
 }
 
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index b84e49c..174f43f 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -24,6 +24,7 @@ struct team_pcpu_stats {
 	struct u64_stats_sync	syncp;
 	u32			rx_dropped;
 	u32			tx_dropped;
+	u32			rx_nohandler;
 };
 
 struct team;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net v2 4/4] bond: track sum of rx_nohandler for all slaves
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
                     ` (2 preceding siblings ...)
  2016-01-28 15:49   ` [PATCH net v2 3/4] team: track sum of rx_nohandler for all slaves Jarod Wilson
@ 2016-01-28 15:49   ` Jarod Wilson
  2016-01-30  3:37   ` [PATCH net v2 0/4] net: add and use rx_nohandler stat counter David Miller
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 15:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, netdev

Sample output with this set applied for an active-backup bond:

$ cat /sys/devices/virtual/net/bond0/lower_p7p1/statistics/rx_nohandler
16568
$ cat /sys/devices/virtual/net/bond0/lower_p5p2/statistics/rx_nohandler
16583
$ cat /sys/devices/virtual/net/bond0/statistics/rx_nohandler
33151

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b5605..6587929 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3309,6 +3309,7 @@ static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
 		stats->rx_bytes += sstats->rx_bytes - pstats->rx_bytes;
 		stats->rx_errors += sstats->rx_errors - pstats->rx_errors;
 		stats->rx_dropped += sstats->rx_dropped - pstats->rx_dropped;
+		stats->rx_nohandler += sstats->rx_nohandler - pstats->rx_nohandler;
 
 		stats->tx_packets += sstats->tx_packets - pstats->tx_packets;;
 		stats->tx_bytes += sstats->tx_bytes - pstats->tx_bytes;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
                   ` (6 preceding siblings ...)
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
@ 2016-01-28 16:22 ` Jarod Wilson
  7 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-28 16:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Eric Dumazet, netdev

The netdev_stats_to_stats64 function copies the deprecated
net_device_stats format stats into rtnl_link_stats64 for legacy support
purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
extend rtnl_link_stats64 without also extending net_device_stats. Relax
the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
zero out all the stat counters that aren't present in net_device_stats.

CC: Eric Dumazet <edumazet@google.com>
CC: netdev@ger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
v3: get the CC list right, moron

 net/core/dev.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..575a7df 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7253,25 +7253,30 @@ void netdev_run_todo(void)
 	}
 }
 
-/* Convert net_device_stats to rtnl_link_stats64.  They have the same
- * fields in the same order, with only the type differing.
+/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
+ * all the same fields in the same order as net_device_stats, with only
+ * the type differing, but rtnl_link_stats64 may have additional fields
+ * at the end for newer counters.
  */
 void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 			     const struct net_device_stats *netdev_stats)
 {
 #if BITS_PER_LONG == 64
-	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
+	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
 	memcpy(stats64, netdev_stats, sizeof(*stats64));
 #else
 	size_t i, n = sizeof(*stats64) / sizeof(u64);
 	const unsigned long *src = (const unsigned long *)netdev_stats;
 	u64 *dst = (u64 *)stats64;
 
-	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
+	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
 		     sizeof(*stats64) / sizeof(u64));
 	for (i = 0; i < n; i++)
 		dst[i] = src[i];
 #endif
+	/* zero out counters that only exist in rtnl_link_stats64 */
+	memset((char *)stats64 + sizeof(*netdev_stats), 0,
+	       sizeof(*stats64) - sizeof(*netdev_stats));
 }
 EXPORT_SYMBOL(netdev_stats_to_stats64);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net v2 0/4] net: add and use rx_nohandler stat counter
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
                     ` (3 preceding siblings ...)
  2016-01-28 15:49   ` [PATCH net v2 4/4] bond: " Jarod Wilson
@ 2016-01-30  3:37   ` David Miller
  2016-01-30 18:16     ` Jarod Wilson
  2016-01-30 18:19   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
  2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
  6 siblings, 1 reply; 64+ messages in thread
From: David Miller @ 2016-01-30  3:37 UTC (permalink / raw)
  To: jarod
  Cc: linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh, vfalico,
	gospo, netdev

From: Jarod Wilson <jarod@redhat.com>
Date: Thu, 28 Jan 2016 10:49:44 -0500

> The network core tries to keep track of dropped packets, but some packets
> you wouldn't really call dropped, so much as intentionally ignored, under
> certain circumstances. One such case is that of bonding and team device
> slaves that are currently inactive. Their respective rx_handler functions
> return RX_HANDLER_EXACT (the only places in the kernel that return that),
> which ends up tracking into the network core's __netif_receive_skb_core()
> function's drop path, with no pt_prev set. On a noisy network, this can
> result in a very rapidly incrementing rx_dropped counter, not only on the
> inactive slave(s), but also on the master device, such as the following:
 ...

Both my inbox and patchwork only show patch 2, 3, and 4.  Where is #1?

Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v2 0/4] net: add and use rx_nohandler stat counter
  2016-01-30  3:37   ` [PATCH net v2 0/4] net: add and use rx_nohandler stat counter David Miller
@ 2016-01-30 18:16     ` Jarod Wilson
  0 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-30 18:16 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh, vfalico,
	gospo, netdev

On Fri, Jan 29, 2016 at 07:37:51PM -0800, David Miller wrote:
> From: Jarod Wilson <jarod@redhat.com>
> Date: Thu, 28 Jan 2016 10:49:44 -0500
> 
> > The network core tries to keep track of dropped packets, but some packets
> > you wouldn't really call dropped, so much as intentionally ignored, under
> > certain circumstances. One such case is that of bonding and team device
> > slaves that are currently inactive. Their respective rx_handler functions
> > return RX_HANDLER_EXACT (the only places in the kernel that return that),
> > which ends up tracking into the network core's __netif_receive_skb_core()
> > function's drop path, with no pt_prev set. On a noisy network, this can
> > result in a very rapidly incrementing rx_dropped counter, not only on the
> > inactive slave(s), but also on the master device, such as the following:
>  ...
> 
> Both my inbox and patchwork only show patch 2, 3, and 4.  Where is #1?

Crap. It seems I fat-fingered netdev@vger.kernel.org without the v in
front of ger. Will re-send in a sec, hopefully finallly not screwing it up
this time.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
                     ` (4 preceding siblings ...)
  2016-01-30  3:37   ` [PATCH net v2 0/4] net: add and use rx_nohandler stat counter David Miller
@ 2016-01-30 18:19   ` Jarod Wilson
  2016-01-30 18:34     ` Eric Dumazet
  2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
  6 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-01-30 18:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Eric Dumazet, netdev

The netdev_stats_to_stats64 function copies the deprecated
net_device_stats format stats into rtnl_link_stats64 for legacy support
purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
extend rtnl_link_stats64 without also extending net_device_stats. Relax
the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
zero out all the stat counters that aren't present in net_device_stats.

CC: Eric Dumazet <edumazet@google.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
Re-re-sending, hopefully getting the patch to the right list this time.

 net/core/dev.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..575a7df 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7253,25 +7253,30 @@ void netdev_run_todo(void)
 	}
 }
 
-/* Convert net_device_stats to rtnl_link_stats64.  They have the same
- * fields in the same order, with only the type differing.
+/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
+ * all the same fields in the same order as net_device_stats, with only
+ * the type differing, but rtnl_link_stats64 may have additional fields
+ * at the end for newer counters.
  */
 void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 			     const struct net_device_stats *netdev_stats)
 {
 #if BITS_PER_LONG == 64
-	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
+	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
 	memcpy(stats64, netdev_stats, sizeof(*stats64));
 #else
 	size_t i, n = sizeof(*stats64) / sizeof(u64);
 	const unsigned long *src = (const unsigned long *)netdev_stats;
 	u64 *dst = (u64 *)stats64;
 
-	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
+	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
 		     sizeof(*stats64) / sizeof(u64));
 	for (i = 0; i < n; i++)
 		dst[i] = src[i];
 #endif
+	/* zero out counters that only exist in rtnl_link_stats64 */
+	memset((char *)stats64 + sizeof(*netdev_stats), 0,
+	       sizeof(*stats64) - sizeof(*netdev_stats));
 }
 EXPORT_SYMBOL(netdev_stats_to_stats64);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-30 18:19   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
@ 2016-01-30 18:34     ` Eric Dumazet
  2016-01-30 20:39       ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-01-30 18:34 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-kernel, Eric Dumazet, netdev

On Sat, 2016-01-30 at 13:19 -0500, Jarod Wilson wrote:
> The netdev_stats_to_stats64 function copies the deprecated
> net_device_stats format stats into rtnl_link_stats64 for legacy support
> purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
> extend rtnl_link_stats64 without also extending net_device_stats. Relax
> the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
> zero out all the stat counters that aren't present in net_device_stats.
> 
> CC: Eric Dumazet <edumazet@google.com>
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
> Re-re-sending, hopefully getting the patch to the right list this time.
> 
>  net/core/dev.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..575a7df 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7253,25 +7253,30 @@ void netdev_run_todo(void)
>  	}
>  }
>  
> -/* Convert net_device_stats to rtnl_link_stats64.  They have the same
> - * fields in the same order, with only the type differing.
> +/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
> + * all the same fields in the same order as net_device_stats, with only
> + * the type differing, but rtnl_link_stats64 may have additional fields
> + * at the end for newer counters.
>   */
>  void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
>  			     const struct net_device_stats *netdev_stats)
>  {
>  #if BITS_PER_LONG == 64
> -	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
> +	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
>  	memcpy(stats64, netdev_stats, sizeof(*stats64));
>  #else
>  	size_t i, n = sizeof(*stats64) / sizeof(u64);
>  	const unsigned long *src = (const unsigned long *)netdev_stats;
>  	u64 *dst = (u64 *)stats64;
>  
> -	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
> +	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
>  		     sizeof(*stats64) / sizeof(u64));
>  	for (i = 0; i < n; i++)
>  		dst[i] = src[i];
>  #endif
> +	/* zero out counters that only exist in rtnl_link_stats64 */
> +	memset((char *)stats64 + sizeof(*netdev_stats), 0,
> +	       sizeof(*stats64) - sizeof(*netdev_stats));

Are you sure it works on 32bit arches ?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-30 18:34     ` Eric Dumazet
@ 2016-01-30 20:39       ` Jarod Wilson
  2016-01-30 20:53         ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-01-30 20:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Eric Dumazet, netdev

On Sat, Jan 30, 2016 at 10:34:32AM -0800, Eric Dumazet wrote:
> On Sat, 2016-01-30 at 13:19 -0500, Jarod Wilson wrote:
> > The netdev_stats_to_stats64 function copies the deprecated
> > net_device_stats format stats into rtnl_link_stats64 for legacy support
> > purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
> > extend rtnl_link_stats64 without also extending net_device_stats. Relax
> > the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
> > zero out all the stat counters that aren't present in net_device_stats.
> > 
> > CC: Eric Dumazet <edumazet@google.com>
> > CC: netdev@vger.kernel.org
> > Signed-off-by: Jarod Wilson <jarod@redhat.com>
> > ---
> > Re-re-sending, hopefully getting the patch to the right list this time.
> > 
> >  net/core/dev.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8cba3d8..575a7df 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -7253,25 +7253,30 @@ void netdev_run_todo(void)
> >  	}
> >  }
> >  
> > -/* Convert net_device_stats to rtnl_link_stats64.  They have the same
> > - * fields in the same order, with only the type differing.
> > +/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
> > + * all the same fields in the same order as net_device_stats, with only
> > + * the type differing, but rtnl_link_stats64 may have additional fields
> > + * at the end for newer counters.
> >   */
> >  void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
> >  			     const struct net_device_stats *netdev_stats)
> >  {
> >  #if BITS_PER_LONG == 64
> > -	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
> > +	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
> >  	memcpy(stats64, netdev_stats, sizeof(*stats64));
> >  #else
> >  	size_t i, n = sizeof(*stats64) / sizeof(u64);
> >  	const unsigned long *src = (const unsigned long *)netdev_stats;
> >  	u64 *dst = (u64 *)stats64;
> >  
> > -	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
> > +	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
> >  		     sizeof(*stats64) / sizeof(u64));
> >  	for (i = 0; i < n; i++)
> >  		dst[i] = src[i];
> >  #endif
> > +	/* zero out counters that only exist in rtnl_link_stats64 */
> > +	memset((char *)stats64 + sizeof(*netdev_stats), 0,
> > +	       sizeof(*stats64) - sizeof(*netdev_stats));
> 
> Are you sure it works on 32bit arches ?

Ew, no, it won't work correctly on 32-bit. The for loop is going to copy
data into dst from beyond the end of netdev_stats, and the range looks
like it won't be right either, only half of the added stats64 space will
get zeroed out. Okay, I'll fix that up correctly.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-30 20:39       ` Jarod Wilson
@ 2016-01-30 20:53         ` Jarod Wilson
  2016-01-30 23:26           ` David Miller
  0 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-01-30 20:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Eric Dumazet, netdev

On Sat, Jan 30, 2016 at 03:39:01PM -0500, Jarod Wilson wrote:
> On Sat, Jan 30, 2016 at 10:34:32AM -0800, Eric Dumazet wrote:
> > On Sat, 2016-01-30 at 13:19 -0500, Jarod Wilson wrote:
> > > The netdev_stats_to_stats64 function copies the deprecated
> > > net_device_stats format stats into rtnl_link_stats64 for legacy support
> > > purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
> > > extend rtnl_link_stats64 without also extending net_device_stats. Relax
> > > the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
> > > zero out all the stat counters that aren't present in net_device_stats.
> > > 
> > > CC: Eric Dumazet <edumazet@google.com>
> > > CC: netdev@vger.kernel.org
> > > Signed-off-by: Jarod Wilson <jarod@redhat.com>
> > > ---
> > > Re-re-sending, hopefully getting the patch to the right list this time.
> > > 
> > >  net/core/dev.c | 13 +++++++++----
> > >  1 file changed, 9 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 8cba3d8..575a7df 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -7253,25 +7253,30 @@ void netdev_run_todo(void)
> > >  	}
> > >  }
> > >  
> > > -/* Convert net_device_stats to rtnl_link_stats64.  They have the same
> > > - * fields in the same order, with only the type differing.
> > > +/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
> > > + * all the same fields in the same order as net_device_stats, with only
> > > + * the type differing, but rtnl_link_stats64 may have additional fields
> > > + * at the end for newer counters.
> > >   */
> > >  void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
> > >  			     const struct net_device_stats *netdev_stats)
> > >  {
> > >  #if BITS_PER_LONG == 64
> > > -	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
> > > +	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
> > >  	memcpy(stats64, netdev_stats, sizeof(*stats64));
> > >  #else
> > >  	size_t i, n = sizeof(*stats64) / sizeof(u64);
> > >  	const unsigned long *src = (const unsigned long *)netdev_stats;
> > >  	u64 *dst = (u64 *)stats64;
> > >  
> > > -	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
> > > +	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) >
> > >  		     sizeof(*stats64) / sizeof(u64));
> > >  	for (i = 0; i < n; i++)
> > >  		dst[i] = src[i];
> > >  #endif
> > > +	/* zero out counters that only exist in rtnl_link_stats64 */
> > > +	memset((char *)stats64 + sizeof(*netdev_stats), 0,
> > > +	       sizeof(*stats64) - sizeof(*netdev_stats));
> > 
> > Are you sure it works on 32bit arches ?
> 
> Ew, no, it won't work correctly on 32-bit. The for loop is going to copy
> data into dst from beyond the end of netdev_stats, and the range looks
> like it won't be right either, only half of the added stats64 space will
> get zeroed out. Okay, I'll fix that up correctly.

Completely untested as of yet, but I think something like the following
looks correct. I'll give it a spin as soon as I can.

diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..65863e5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7253,24 +7253,31 @@ void netdev_run_todo(void)
 	}
 }
 
-/* Convert net_device_stats to rtnl_link_stats64.  They have the same
- * fields in the same order, with only the type differing.
+/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
+ * all the same fields in the same order as net_device_stats, with only
+ * the type differing, but rtnl_link_stats64 may have additional fields
+ * at the end for newer counters.
  */
 void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 			     const struct net_device_stats *netdev_stats)
 {
 #if BITS_PER_LONG == 64
-	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
+	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
 	memcpy(stats64, netdev_stats, sizeof(*stats64));
+	/* zero out counters that only exist in rtnl_link_stats64 */
+	memset((char *)stats64 + sizeof(*netdev_stats), 0,
+	       sizeof(*stats64) - sizeof(*netdev_stats));
 #else
-	size_t i, n = sizeof(*stats64) / sizeof(u64);
+	size_t i, n = sizeof(*netdev_stats) / sizeof(unsigned long);
 	const unsigned long *src = (const unsigned long *)netdev_stats;
 	u64 *dst = (u64 *)stats64;
 
-	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
-		     sizeof(*stats64) / sizeof(u64));
+	BUILD_BUG_ON(n > sizeof(*stats64) / sizeof(u64));
 	for (i = 0; i < n; i++)
 		dst[i] = src[i];
+	/* zero out counters that only exist in rtnl_link_stats64 */
+	memset((char *)stats64 + n * sizeof(u64), 0,
+	       sizeof(*stats64) - n * sizeof(u64));
 #endif
 }
 EXPORT_SYMBOL(netdev_stats_to_stats64);

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-30 20:53         ` Jarod Wilson
@ 2016-01-30 23:26           ` David Miller
  2016-01-31 18:07             ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: David Miller @ 2016-01-30 23:26 UTC (permalink / raw)
  To: jarod; +Cc: eric.dumazet, linux-kernel, edumazet, netdev

From: Jarod Wilson <jarod@redhat.com>
Date: Sat, 30 Jan 2016 15:53:05 -0500

> On Sat, Jan 30, 2016 at 03:39:01PM -0500, Jarod Wilson wrote:
>> Ew, no, it won't work correctly on 32-bit. The for loop is going to copy
>> data into dst from beyond the end of netdev_stats, and the range looks
>> like it won't be right either, only half of the added stats64 space will
>> get zeroed out. Okay, I'll fix that up correctly.
> 
> Completely untested as of yet, but I think something like the following
> looks correct. I'll give it a spin as soon as I can.

Jarod, please respin your entire series as a v3 once you sort this out.

Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-01-30 23:26           ` David Miller
@ 2016-01-31 18:07             ` Jarod Wilson
  0 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-01-31 18:07 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, linux-kernel, edumazet, netdev

On Sat, Jan 30, 2016 at 03:26:35PM -0800, David Miller wrote:
> From: Jarod Wilson <jarod@redhat.com>
> Date: Sat, 30 Jan 2016 15:53:05 -0500
> 
> > On Sat, Jan 30, 2016 at 03:39:01PM -0500, Jarod Wilson wrote:
> >> Ew, no, it won't work correctly on 32-bit. The for loop is going to copy
> >> data into dst from beyond the end of netdev_stats, and the range looks
> >> like it won't be right either, only half of the added stats64 space will
> >> get zeroed out. Okay, I'll fix that up correctly.
> > 
> > Completely untested as of yet, but I think something like the following
> > looks correct. I'll give it a spin as soon as I can.
> 
> Jarod, please respin your entire series as a v3 once you sort this out.

Will do. I should be able to get it tested out on a 32-bit setup Monday.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net v3 0/4] net: add and use rx_nohandler stat counter
  2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
                     ` (5 preceding siblings ...)
  2016-01-30 18:19   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
@ 2016-02-01 23:51   ` Jarod Wilson
  2016-02-01 23:51     ` [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
                       ` (4 more replies)
  6 siblings, 5 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-02-01 23:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

The network core tries to keep track of dropped packets, but some packets
you wouldn't really call dropped, so much as intentionally ignored, under
certain circumstances. One such case is that of bonding and team device
slaves that are currently inactive. Their respective rx_handler functions
return RX_HANDLER_EXACT (the only places in the kernel that return that),
which ends up tracking into the network core's __netif_receive_skb_core()
function's drop path, with no pt_prev set. On a noisy network, this can
result in a very rapidly incrementing rx_dropped counter, not only on the
inactive slave(s), but also on the master device, such as the following:

$ cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  p7p1: 14783346  140430    0 140428    0     0          0      2040      680       8    0    0    0     0       0          0
  p7p2: 14805198  140648    0    0    0     0          0      2034        0       0    0    0    0     0       0          0
 bond0: 53365248  532798    0 421160    0     0          0    115151     2040      24    0    0    0     0       0          0
    lo:    5420      54    0    0    0     0          0         0     5420      54    0    0    0     0       0          0
  p5p1: 19292195  196197    0 140368    0     0          0     56564      680       8    0    0    0     0       0          0
  p5p2: 19289707  196171    0 140364    0     0          0     56547      680       8    0    0    0     0       0          0
   em3: 20996626  158214    0    0    0     0          0       383        0       0    0    0    0     0       0          0
   em2: 14065122  138462    0    0    0     0          0       310        0       0    0    0    0     0       0          0
   em1: 14063162  138440    0    0    0     0          0       308        0       0    0    0    0     0       0          0
   em4: 21050830  158729    0    0    0     0          0       385    71662     469    0    0    0     0       0          0
   ib0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0

In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
active-backup bond0, and you can see that all three have high drop counts,
with the master bond0 showing a tally of all three.

I know that this was previously discussed some here:

    http://www.spinics.net/lists/netdev/msg226341.html

It seems additional counters never came to fruition, so this is a first
attempt at creating one of them, so that we stop calling these drops,
which for users monitoring rx_dropped, causes great alarm, and renders the
counter much less useful for them.

This adds a sysfs statistics node and makes the counter available via
netlink.

Additionally, I'm not certain if this set qualifies for net, or if it
should be put aside and resubmitted for net-next after 4.5 is put to
bed, but I do have users who consider this an important bugfix.

This has been tested quite a bit on x86_64, and now lightly on i686 as
well, to verify functionality of updates to netdev_stats_to_stats64()
on 32-bit arches.

Jarod Wilson (4):
  net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  net: add rx_nohandler stat counter
  team: track sum of rx_nohandler for all slaves
  bond: track sum of rx_nohandler for all slaves

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>

 drivers/net/bonding/bond_main.c |  1 +
 drivers/net/team/team.c         | 10 +++++++---
 include/linux/if_team.h         |  1 +
 include/linux/netdevice.h       |  3 +++
 include/uapi/linux/if_link.h    |  4 ++++
 net/core/dev.c                  | 25 ++++++++++++++++++-------
 net/core/net-sysfs.c            |  2 ++
 net/core/rtnetlink.c            |  2 ++
 8 files changed, 38 insertions(+), 10 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64
  2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
@ 2016-02-01 23:51     ` Jarod Wilson
  2016-02-01 23:51     ` [PATCH net v3 2/4] net: add rx_nohandler stat counter Jarod Wilson
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-02-01 23:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Eric Dumazet, netdev

The netdev_stats_to_stats64 function copies the deprecated
net_device_stats format stats into rtnl_link_stats64 for legacy support
purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to
extend rtnl_link_stats64 without also extending net_device_stats. Relax
the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and
zero out all the stat counters that aren't present in net_device_stats.

CC: Eric Dumazet <edumazet@google.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 net/core/dev.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..65863e5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7253,24 +7253,31 @@ void netdev_run_todo(void)
 	}
 }
 
-/* Convert net_device_stats to rtnl_link_stats64.  They have the same
- * fields in the same order, with only the type differing.
+/* Convert net_device_stats to rtnl_link_stats64. rtnl_link_stats64 has
+ * all the same fields in the same order as net_device_stats, with only
+ * the type differing, but rtnl_link_stats64 may have additional fields
+ * at the end for newer counters.
  */
 void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 			     const struct net_device_stats *netdev_stats)
 {
 #if BITS_PER_LONG == 64
-	BUILD_BUG_ON(sizeof(*stats64) != sizeof(*netdev_stats));
+	BUILD_BUG_ON(sizeof(*stats64) < sizeof(*netdev_stats));
 	memcpy(stats64, netdev_stats, sizeof(*stats64));
+	/* zero out counters that only exist in rtnl_link_stats64 */
+	memset((char *)stats64 + sizeof(*netdev_stats), 0,
+	       sizeof(*stats64) - sizeof(*netdev_stats));
 #else
-	size_t i, n = sizeof(*stats64) / sizeof(u64);
+	size_t i, n = sizeof(*netdev_stats) / sizeof(unsigned long);
 	const unsigned long *src = (const unsigned long *)netdev_stats;
 	u64 *dst = (u64 *)stats64;
 
-	BUILD_BUG_ON(sizeof(*netdev_stats) / sizeof(unsigned long) !=
-		     sizeof(*stats64) / sizeof(u64));
+	BUILD_BUG_ON(n > sizeof(*stats64) / sizeof(u64));
 	for (i = 0; i < n; i++)
 		dst[i] = src[i];
+	/* zero out counters that only exist in rtnl_link_stats64 */
+	memset((char *)stats64 + n * sizeof(u64), 0,
+	       sizeof(*stats64) - n * sizeof(u64));
 #endif
 }
 EXPORT_SYMBOL(netdev_stats_to_stats64);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
  2016-02-01 23:51     ` [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
@ 2016-02-01 23:51     ` Jarod Wilson
  2016-02-07 19:37       ` Stephen Hemminger
  2016-02-01 23:51     ` [PATCH net v3 3/4] team: track sum of rx_nohandler for all slaves Jarod Wilson
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-02-01 23:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

This adds an rx_nohandler stat counter, along with a sysfs statistics
node, and copies the counter out via netlink as well.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 include/linux/netdevice.h    | 3 +++
 include/uapi/linux/if_link.h | 4 ++++
 net/core/dev.c               | 6 +++++-
 net/core/net-sysfs.c         | 2 ++
 net/core/rtnetlink.c         | 2 ++
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c231..78a20ce 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1397,6 +1397,8 @@ enum netdev_priv_flags {
  *			do not use this in drivers
  *	@tx_dropped:	Dropped packets by core network,
  *			do not use this in drivers
+ *	@rx_nohandler:	nohandler dropped packets by core network on
+ *			inactive devices, do not use this in drivers
  *
  *	@wireless_handlers:	List of functions to handle Wireless Extensions,
  *				instead of ioctl,
@@ -1611,6 +1613,7 @@ struct net_device {
 
 	atomic_long_t		rx_dropped;
 	atomic_long_t		tx_dropped;
+	atomic_long_t		rx_nohandler;
 
 #ifdef CONFIG_WIRELESS_EXT
 	const struct iw_handler_def *	wireless_handlers;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b780..d3e90b9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -35,6 +35,8 @@ struct rtnl_link_stats {
 	/* for cslip etc */
 	__u32	rx_compressed;
 	__u32	tx_compressed;
+
+	__u32	rx_nohandler;		/* dropped, no handler found	*/
 };
 
 /* The main device statistics structure */
@@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
 	/* for cslip etc */
 	__u64	rx_compressed;
 	__u64	tx_compressed;
+
+	__u64	rx_nohandler;		/* dropped, no handler found	*/
 };
 
 /* The struct should be in sync with struct ifmap */
diff --git a/net/core/dev.c b/net/core/dev.c
index 65863e5..f128483 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4154,7 +4154,10 @@ ncls:
 			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 	} else {
 drop:
-		atomic_long_inc(&skb->dev->rx_dropped);
+		if (!deliver_exact)
+			atomic_long_inc(&skb->dev->rx_dropped);
+		else
+			atomic_long_inc(&skb->dev->rx_nohandler);
 		kfree_skb(skb);
 		/* Jamal, now you will not able to escape explaining
 		 * me how you were going to use this. :-)
@@ -7307,6 +7310,7 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
 	}
 	storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
 	storage->tx_dropped += atomic_long_read(&dev->tx_dropped);
+	storage->rx_nohandler += atomic_long_read(&dev->rx_nohandler);
 	return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b6c8a66..da7dbc2 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -574,6 +574,7 @@ NETSTAT_ENTRY(tx_heartbeat_errors);
 NETSTAT_ENTRY(tx_window_errors);
 NETSTAT_ENTRY(rx_compressed);
 NETSTAT_ENTRY(tx_compressed);
+NETSTAT_ENTRY(rx_nohandler);
 
 static struct attribute *netstat_attrs[] = {
 	&dev_attr_rx_packets.attr,
@@ -599,6 +600,7 @@ static struct attribute *netstat_attrs[] = {
 	&dev_attr_tx_window_errors.attr,
 	&dev_attr_rx_compressed.attr,
 	&dev_attr_tx_compressed.attr,
+	&dev_attr_rx_nohandler.attr,
 	NULL
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d735e85..20d7135 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -804,6 +804,8 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
 
 	a->rx_compressed = b->rx_compressed;
 	a->tx_compressed = b->tx_compressed;
+
+	a->rx_nohandler = b->rx_nohandler;
 }
 
 static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net v3 3/4] team: track sum of rx_nohandler for all slaves
  2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
  2016-02-01 23:51     ` [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
  2016-02-01 23:51     ` [PATCH net v3 2/4] net: add rx_nohandler stat counter Jarod Wilson
@ 2016-02-01 23:51     ` Jarod Wilson
  2016-02-01 23:51     ` [PATCH net v3 4/4] bond: " Jarod Wilson
  2016-02-06  8:00     ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter David Miller
  4 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-02-01 23:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jarod Wilson, Jiri Pirko, netdev

CC: Jiri Pirko <jiri@resnulli.us>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/team/team.c | 10 +++++++---
 include/linux/if_team.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 718ceea..00558e1 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -758,6 +758,8 @@ static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
 		u64_stats_update_end(&pcpu_stats->syncp);
 
 		skb->dev = team->dev;
+	} else if (res == RX_HANDLER_EXACT) {
+		this_cpu_inc(team->pcpu_stats->rx_nohandler);
 	} else {
 		this_cpu_inc(team->pcpu_stats->rx_dropped);
 	}
@@ -1807,7 +1809,7 @@ team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	struct team *team = netdev_priv(dev);
 	struct team_pcpu_stats *p;
 	u64 rx_packets, rx_bytes, rx_multicast, tx_packets, tx_bytes;
-	u32 rx_dropped = 0, tx_dropped = 0;
+	u32 rx_dropped = 0, tx_dropped = 0, rx_nohandler = 0;
 	unsigned int start;
 	int i;
 
@@ -1828,14 +1830,16 @@ team_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 		stats->tx_packets	+= tx_packets;
 		stats->tx_bytes		+= tx_bytes;
 		/*
-		 * rx_dropped & tx_dropped are u32, updated
-		 * without syncp protection.
+		 * rx_dropped, tx_dropped & rx_nohandler are u32,
+		 * updated without syncp protection.
 		 */
 		rx_dropped	+= p->rx_dropped;
 		tx_dropped	+= p->tx_dropped;
+		rx_nohandler	+= p->rx_nohandler;
 	}
 	stats->rx_dropped	= rx_dropped;
 	stats->tx_dropped	= tx_dropped;
+	stats->rx_nohandler	= rx_nohandler;
 	return stats;
 }
 
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index b84e49c..174f43f 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -24,6 +24,7 @@ struct team_pcpu_stats {
 	struct u64_stats_sync	syncp;
 	u32			rx_dropped;
 	u32			tx_dropped;
+	u32			rx_nohandler;
 };
 
 struct team;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH net v3 4/4] bond: track sum of rx_nohandler for all slaves
  2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
                       ` (2 preceding siblings ...)
  2016-02-01 23:51     ` [PATCH net v3 3/4] team: track sum of rx_nohandler for all slaves Jarod Wilson
@ 2016-02-01 23:51     ` Jarod Wilson
  2016-02-06  8:00     ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter David Miller
  4 siblings, 0 replies; 64+ messages in thread
From: Jarod Wilson @ 2016-02-01 23:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jarod Wilson, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, netdev

Sample output with this set applied for an active-backup bond:

$ cat /sys/devices/virtual/net/bond0/lower_p7p1/statistics/rx_nohandler
16568
$ cat /sys/devices/virtual/net/bond0/lower_p5p2/statistics/rx_nohandler
16583
$ cat /sys/devices/virtual/net/bond0/statistics/rx_nohandler
33151

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b5605..6587929 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3309,6 +3309,7 @@ static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
 		stats->rx_bytes += sstats->rx_bytes - pstats->rx_bytes;
 		stats->rx_errors += sstats->rx_errors - pstats->rx_errors;
 		stats->rx_dropped += sstats->rx_dropped - pstats->rx_dropped;
+		stats->rx_nohandler += sstats->rx_nohandler - pstats->rx_nohandler;
 
 		stats->tx_packets += sstats->tx_packets - pstats->tx_packets;;
 		stats->tx_bytes += sstats->tx_bytes - pstats->tx_bytes;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 0/4] net: add and use rx_nohandler stat counter
  2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
                       ` (3 preceding siblings ...)
  2016-02-01 23:51     ` [PATCH net v3 4/4] bond: " Jarod Wilson
@ 2016-02-06  8:00     ` David Miller
  4 siblings, 0 replies; 64+ messages in thread
From: David Miller @ 2016-02-06  8:00 UTC (permalink / raw)
  To: jarod
  Cc: linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh, vfalico,
	gospo, netdev

From: Jarod Wilson <jarod@redhat.com>
Date: Mon,  1 Feb 2016 18:51:03 -0500

 ...
> It seems additional counters never came to fruition, so this is a first
> attempt at creating one of them, so that we stop calling these drops,
> which for users monitoring rx_dropped, causes great alarm, and renders the
> counter much less useful for them.
> 
> This adds a sysfs statistics node and makes the counter available via
> netlink.
> 
> Additionally, I'm not certain if this set qualifies for net, or if it
> should be put aside and resubmitted for net-next after 4.5 is put to
> bed, but I do have users who consider this an important bugfix.
> 
> This has been tested quite a bit on x86_64, and now lightly on i686 as
> well, to verify functionality of updates to netdev_stats_to_stats64()
> on 32-bit arches.

Series applied, thanks Jarod.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-01 23:51     ` [PATCH net v3 2/4] net: add rx_nohandler stat counter Jarod Wilson
@ 2016-02-07 19:37       ` Stephen Hemminger
  2016-02-07 19:46         ` David Miller
  0 siblings, 1 reply; 64+ messages in thread
From: Stephen Hemminger @ 2016-02-07 19:37 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, David S. Miller, Eric Dumazet, Jiri Pirko,
	Daniel Borkmann, Tom Herbert, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, netdev

On Mon,  1 Feb 2016 18:51:05 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -35,6 +35,8 @@ struct rtnl_link_stats {
>  	/* for cslip etc */
>  	__u32	rx_compressed;
>  	__u32	tx_compressed;
> +
> +	__u32	rx_nohandler;		/* dropped, no handler found	*/
>  };
>  
>  /* The main device statistics structure */
> @@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
>  	/* for cslip etc */
>  	__u64	rx_compressed;
>  	__u64	tx_compressed;
> +
> +	__u64	rx_nohandler;		/* dropped, no handler found	*/
>  };

Why was this userspace ABI change allowed?
The stats structure is exposed to user space via netlink
and changing the size of responses will break iproute2 commands.

The code will be expecting one size and the response will vary and
break existing code.  Yes, the code should check the size
of the response, but it doesn't and I am sure iproute2 is not
the only code that does this.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-07 19:37       ` Stephen Hemminger
@ 2016-02-07 19:46         ` David Miller
  2016-02-07 20:19           ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: David Miller @ 2016-02-07 19:46 UTC (permalink / raw)
  To: stephen
  Cc: jarod, linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh,
	vfalico, gospo, netdev

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Sun, 7 Feb 2016 11:37:32 -0800

> On Mon,  1 Feb 2016 18:51:05 -0500
> Jarod Wilson <jarod@redhat.com> wrote:
> 
>> --- a/include/uapi/linux/if_link.h
>> +++ b/include/uapi/linux/if_link.h
>> @@ -35,6 +35,8 @@ struct rtnl_link_stats {
>>  	/* for cslip etc */
>>  	__u32	rx_compressed;
>>  	__u32	tx_compressed;
>> +
>> +	__u32	rx_nohandler;		/* dropped, no handler found	*/
>>  };
>>  
>>  /* The main device statistics structure */
>> @@ -68,6 +70,8 @@ struct rtnl_link_stats64 {
>>  	/* for cslip etc */
>>  	__u64	rx_compressed;
>>  	__u64	tx_compressed;
>> +
>> +	__u64	rx_nohandler;		/* dropped, no handler found	*/
>>  };
> 
> Why was this userspace ABI change allowed?
> The stats structure is exposed to user space via netlink
> and changing the size of responses will break iproute2 commands.
> 
> The code will be expecting one size and the response will vary and
> break existing code.  Yes, the code should check the size
> of the response, but it doesn't and I am sure iproute2 is not
> the only code that does this.

Jarod, please look into this.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-07 19:46         ` David Miller
@ 2016-02-07 20:19           ` Eric Dumazet
  2016-02-08 18:32             ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-02-07 20:19 UTC (permalink / raw)
  To: David Miller
  Cc: stephen, jarod, linux-kernel, edumazet, jiri, daniel, tom,
	j.vosburgh, vfalico, gospo, netdev

On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote:

> > Why was this userspace ABI change allowed?
> > The stats structure is exposed to user space via netlink
> > and changing the size of responses will break iproute2 commands.

I do not think it breaks anything.

iproute2 always assumed kernel was sending at least 23 u64, and does not
check at all if the kernel sends more. (or less, so iproute2 can print
garbage if kernel is malicious)

an iproute2 patch will be needed to automatically detect if new kernels
are sending more data and print it accordingly.

> > 
> > The code will be expecting one size and the response will vary and
> > break existing code.  Yes, the code should check the size
> > of the response, but it doesn't and I am sure iproute2 is not
> > the only code that does this.
> 
> Jarod, please look into this.

Running latest net-next, and old iproute2 is just fine.

# ip -s link sh dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid
001a11fffec30d80 state UP mode DEFAULT group default qlen 16000
    link/ether 00:1a:11:c3:0d:7f brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    533766     1875     0       0       0       135     
    TX: bytes  packets  errors  dropped carrier collsns 
    209204     1858     0       0       0       0       

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-07 20:19           ` Eric Dumazet
@ 2016-02-08 18:32             ` Jarod Wilson
  2016-02-08 19:38               ` Stephen Hemminger
  0 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-02-08 18:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, stephen, linux-kernel, edumazet, jiri, daniel, tom,
	j.vosburgh, vfalico, gospo, netdev

On Sun, Feb 07, 2016 at 12:19:28PM -0800, Eric Dumazet wrote:
> On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote:
> 
> > > Why was this userspace ABI change allowed?
> > > The stats structure is exposed to user space via netlink
> > > and changing the size of responses will break iproute2 commands.
> 
> I do not think it breaks anything.
> 
> iproute2 always assumed kernel was sending at least 23 u64, and does not
> check at all if the kernel sends more. (or less, so iproute2 can print
> garbage if kernel is malicious)
> 
> an iproute2 patch will be needed to automatically detect if new kernels
> are sending more data and print it accordingly.

My TODO list did include poking at iproute2 to expose the new info, I can
take a closer look for possible issues as well, but...

> > > The code will be expecting one size and the response will vary and
> > > break existing code.  Yes, the code should check the size
> > > of the response, but it doesn't and I am sure iproute2 is not
> > > the only code that does this.
> > 
> > Jarod, please look into this.
> 
> Running latest net-next, and old iproute2 is just fine.

...I haven't run into anything that didn't work with current iproute2
either while testing out functionality of these patches. If there's
something in particular that seems most suspect that I perhaps simply
haven't tried, I can give that a go as well.

In any case, I'm definitely due to take a look at iproute2 as it relates
to this patchset.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-08 18:32             ` Jarod Wilson
@ 2016-02-08 19:38               ` Stephen Hemminger
  2016-02-08 22:57                 ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: Stephen Hemminger @ 2016-02-08 19:38 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Eric Dumazet, David Miller, linux-kernel, edumazet, jiri, daniel,
	tom, j.vosburgh, vfalico, gospo, netdev

On Mon, 8 Feb 2016 13:32:54 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> On Sun, Feb 07, 2016 at 12:19:28PM -0800, Eric Dumazet wrote:
> > On Sun, 2016-02-07 at 14:46 -0500, David Miller wrote:
> > 
> > > > Why was this userspace ABI change allowed?
> > > > The stats structure is exposed to user space via netlink
> > > > and changing the size of responses will break iproute2 commands.
> > 
> > I do not think it breaks anything.
> > 
> > iproute2 always assumed kernel was sending at least 23 u64, and does not
> > check at all if the kernel sends more. (or less, so iproute2 can print
> > garbage if kernel is malicious)
> > 
> > an iproute2 patch will be needed to automatically detect if new kernels
> > are sending more data and print it accordingly.
> 
> My TODO list did include poking at iproute2 to expose the new info, I can
> take a closer look for possible issues as well, but...
> 
> > > > The code will be expecting one size and the response will vary and
> > > > break existing code.  Yes, the code should check the size
> > > > of the response, but it doesn't and I am sure iproute2 is not
> > > > the only code that does this.
> > > 
> > > Jarod, please look into this.
> > 
> > Running latest net-next, and old iproute2 is just fine.
> 
> ...I haven't run into anything that didn't work with current iproute2
> either while testing out functionality of these patches. If there's
> something in particular that seems most suspect that I perhaps simply
> haven't tried, I can give that a go as well.
> 
> In any case, I'm definitely due to take a look at iproute2 as it relates
> to this patchset.
> 

The iproute2 command can be fixed, but adding dependency on size of response
gets gross fast.  Imagine when 4 more fields get added, this doesn't scale well.

Also, the definition of userspace ABI is that structures can't change.
There are many other utilities that are not visible that may get broken.
Traditionally Linux has guaranteed that programs will continue to work
no matter how they were coded.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-08 19:38               ` Stephen Hemminger
@ 2016-02-08 22:57                 ` Eric Dumazet
  2016-02-09  8:40                   ` David Miller
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-02-08 22:57 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarod Wilson, David Miller, linux-kernel, edumazet, jiri, daniel,
	tom, j.vosburgh, vfalico, gospo, netdev

On Mon, 2016-02-08 at 11:38 -0800, Stephen Hemminger wrote:

> The iproute2 command can be fixed, but adding dependency on size of response
> gets gross fast.  Imagine when 4 more fields get added, this doesn't scale well.

Really ? I see no problem at all doing the proper tests.

> 
> Also, the definition of userspace ABI is that structures can't change.
> There are many other utilities that are not visible that may get broken.
> Traditionally Linux has guaranteed that programs will continue to work
> no matter how they were coded.

Stephen, we have been doing that for years.

Whole point of TLV is that it allows us to add new fields at the end of
the structures.

Yes, some buggy programs might need a fix.

Look at iproute2, you were the one adding in 2004 code to cope with
various tcp_info sizes.

So 12 years later, you cannot say it does not work anymore.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-08 22:57                 ` Eric Dumazet
@ 2016-02-09  8:40                   ` David Miller
  2016-02-09 10:56                     ` Jamal Hadi Salim
  0 siblings, 1 reply; 64+ messages in thread
From: David Miller @ 2016-02-09  8:40 UTC (permalink / raw)
  To: eric.dumazet
  Cc: stephen, jarod, linux-kernel, edumazet, jiri, daniel, tom,
	j.vosburgh, vfalico, gospo, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 08 Feb 2016 14:57:40 -0800

> Whole point of TLV is that it allows us to add new fields at the end of
> the structures.
 ...
> Look at iproute2, you were the one adding in 2004 code to cope with
> various tcp_info sizes.
> 
> So 12 years later, you cannot say it does not work anymore.

+1

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net v3 2/4] net: add rx_nohandler stat counter
  2016-02-09  8:40                   ` David Miller
@ 2016-02-09 10:56                     ` Jamal Hadi Salim
  2016-02-09 19:17                       ` [PATCH net-next iproute2] iplink: display rx nohandler stats Stephen Hemminger
  0 siblings, 1 reply; 64+ messages in thread
From: Jamal Hadi Salim @ 2016-02-09 10:56 UTC (permalink / raw)
  To: David Miller, eric.dumazet
  Cc: stephen, jarod, linux-kernel, edumazet, jiri, daniel, tom,
	j.vosburgh, vfalico, gospo, netdev

On 16-02-09 03:40 AM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 08 Feb 2016 14:57:40 -0800
>
>> Whole point of TLV is that it allows us to add new fields at the end of
>> the structures.
>   ...
>> Look at iproute2, you were the one adding in 2004 code to cope with
>> various tcp_info sizes.
>>
>> So 12 years later, you cannot say it does not work anymore.
>
> +1
>

The TLV L should be canonical way to determine length. i.e should be
sufficient to just look at L and understand that content has changed.
But:
Using sizeof could be dangerous unless the data is packed to be
32-bit aligned. Looking INET_DIAG_INFO check for sizeof
there is a small 8 bit hole in tcp_info I think between
these two fields:

----
__u8    tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4;
__u32   tcpi_rto;
---

The kernel will pad to make sure the TLV data is 32-bit aligned.
I am not sure if that will be the same length as sizeof() in all
hardware + compilers... For this case,
it is almost safe to just add a version field - probably in the hole.
Or have a #define to say what the expected length should be. Or add
an 8 bit pad.

In general adding new fields that are non-optional is problematic. i.e
by non-optional i mean always expected to be present.
I think a good test is old kernel with new iproute2. If the new field
is non-optional, it will fail (example iproute2 may try to print a value
that it expects but because old kernel doesnt understand it; it is 
non-existent).

cheers,
jamal

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH net-next iproute2] iplink: display rx nohandler stats
  2016-02-09 10:56                     ` Jamal Hadi Salim
@ 2016-02-09 19:17                       ` Stephen Hemminger
  2016-02-09 23:51                         ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Stephen Hemminger @ 2016-02-09 19:17 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, eric.dumazet, jarod, linux-kernel, edumazet, jiri,
	daniel, tom, j.vosburgh, vfalico, gospo, netdev

Support for the new rx_nohandler statistic.
This code is designed to handle the case where the kernel reported statistic
structure is smaller than the larger structure in later releases (and vice versa).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 ip/ipaddress.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 9d254d2..c4a8fc3 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -481,7 +481,8 @@ static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
 	/* RX error stats */
 	if (show_stats > 1) {
 		fprintf(fp, "%s", _SL_);
-		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s", _SL_);
+		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s%s",
+			s->rx_nohandler ? "   nohandler" : "",  _SL_);
 
 		fprintf(fp, "               ");
 		print_num(fp, 8, s->rx_length_errors);
@@ -489,6 +490,9 @@ static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
 		print_num(fp, 7, s->rx_frame_errors);
 		print_num(fp, 7, s->rx_fifo_errors);
 		print_num(fp, 7, s->rx_missed_errors);
+		if (s->rx_nohandler)
+			print_num(fp, 7, s->rx_nohandler);
+
 	}
 	fprintf(fp, "%s", _SL_);
 
@@ -496,7 +500,6 @@ static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
 	fprintf(fp, "    TX: bytes  packets  errors  dropped carrier collsns %s%s",
 		s->tx_compressed ? "compressed" : "", _SL_);
 
-
 	fprintf(fp, "    ");
 	print_num(fp, 10, s->tx_bytes);
 	print_num(fp, 8, s->tx_packets);
@@ -546,13 +549,16 @@ static void print_link_stats32(FILE *fp, const struct rtnl_link_stats *s,
 	/* RX error stats */
 	if (show_stats > 1) {
 		fprintf(fp, "%s", _SL_);
-		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s", _SL_);
+		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s%s",
+			s->rx_nohandler ? "   nohandler" : "",  _SL_);
 		fprintf(fp, "               ");
 		print_num(fp, 8, s->rx_length_errors);
 		print_num(fp, 7, s->rx_crc_errors);
 		print_num(fp, 7, s->rx_frame_errors);
 		print_num(fp, 7, s->rx_fifo_errors);
 		print_num(fp, 7, s->rx_missed_errors);
+		if (s->rx_nohandler)
+			print_num(fp, 7, s->rx_nohandler);
 	}
 	fprintf(fp, "%s", _SL_);
 
@@ -590,12 +596,23 @@ static void print_link_stats32(FILE *fp, const struct rtnl_link_stats *s,
 
 static void __print_link_stats(FILE *fp, struct rtattr **tb)
 {
-	if (tb[IFLA_STATS64])
-		print_link_stats64(fp, RTA_DATA(tb[IFLA_STATS64]),
-					tb[IFLA_CARRIER_CHANGES]);
-	else if (tb[IFLA_STATS])
-		print_link_stats32(fp, RTA_DATA(tb[IFLA_STATS]),
-					tb[IFLA_CARRIER_CHANGES]);
+	const struct rtattr *carrier_changes = tb[IFLA_CARRIER_CHANGES];
+
+	if (tb[IFLA_STATS64]) {
+		struct rtnl_link_stats64 stats = { 0 };
+
+		memcpy(&stats, RTA_DATA(tb[IFLA_STATS64]),
+		       MIN(RTA_PAYLOAD(tb[IFLA_STATS64]), sizeof(stats)));
+
+		print_link_stats64(fp, &stats, carrier_changes);
+	} else if (tb[IFLA_STATS]) {
+		struct rtnl_link_stats stats = { 0 };
+
+		memcpy(&stats, RTA_DATA(tb[IFLA_STATS]),
+		       MIN(RTA_PAYLOAD(tb[IFLA_STATS]), sizeof(stats)));
+
+		print_link_stats32(fp, &stats, carrier_changes);
+	}
 }
 
 static void print_link_stats(FILE *fp, struct nlmsghdr *n)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next iproute2] iplink: display rx nohandler stats
  2016-02-09 19:17                       ` [PATCH net-next iproute2] iplink: display rx nohandler stats Stephen Hemminger
@ 2016-02-09 23:51                         ` Jarod Wilson
  2016-02-10  1:41                           ` Stephen Hemminger
  0 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-02-09 23:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jamal Hadi Salim, David Miller, eric.dumazet, linux-kernel,
	edumazet, jiri, daniel, tom, j.vosburgh, vfalico, gospo, netdev

On Tue, Feb 09, 2016 at 11:17:57AM -0800, Stephen Hemminger wrote:
> Support for the new rx_nohandler statistic.
> This code is designed to handle the case where the kernel reported statistic
> structure is smaller than the larger structure in later releases (and vice versa).

This seems to work here, for the most part. However, if you are running a
kernel with the new counter, and the counter happens to contain 0, aren't
we going to not print anything?

I've got a tweaked version here locally that gets a touch messy, where I
get a count of members from RTA_DATA(IFLA_STATS{,64} / sizeof(__u{32,64}),
pass that into the print functions, and key off that length for whether or
not to print the extra members, so they'll show up even when 0, if they're
supported. This does rely on strict ordering of the struct members, no
reordering, no removals, etc., but I think everyone is already in favor of
that. Looks like the same sort of length checks could be used for
rx_compressed and tx_compressed as well, as I think they fall victim to
the same issue of not printing if those counters are legitimately 0. Yes,
it's a little uglier, and more brittle, but more accurate output.

Work-in-progress patch:

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 9d254d2..ae4359a 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -462,7 +462,8 @@ static void print_vf_stats64(FILE *fp, struct rtattr *vfstats)
 }
 
 static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
-                               const struct rtattr *carrier_changes)
+                               const struct rtattr *carrier_changes,
+			       unsigned slen)
 {
 	/* RX stats */
 	fprintf(fp, "    RX: bytes  packets  errors  dropped overrun mcast   %s%s",
@@ -481,14 +482,16 @@ static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
 	/* RX error stats */
 	if (show_stats > 1) {
 		fprintf(fp, "%s", _SL_);
-		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s", _SL_);
-
+		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s%s",
+			slen > 23 ? "  nohandler" : "", _SL_);
 		fprintf(fp, "               ");
 		print_num(fp, 8, s->rx_length_errors);
 		print_num(fp, 7, s->rx_crc_errors);
 		print_num(fp, 7, s->rx_frame_errors);
 		print_num(fp, 7, s->rx_fifo_errors);
 		print_num(fp, 7, s->rx_missed_errors);
+		if (slen > 23)
+			print_num(fp, 7, s->rx_nohandler);
 	}
 	fprintf(fp, "%s", _SL_);
 
@@ -496,7 +499,6 @@ static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
 	fprintf(fp, "    TX: bytes  packets  errors  dropped carrier collsns %s%s",
 		s->tx_compressed ? "compressed" : "", _SL_);
 
-
 	fprintf(fp, "    ");
 	print_num(fp, 10, s->tx_bytes);
 	print_num(fp, 8, s->tx_packets);
@@ -526,13 +528,13 @@ static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
 }
 
 static void print_link_stats32(FILE *fp, const struct rtnl_link_stats *s,
-			       const struct rtattr *carrier_changes)
+			       const struct rtattr *carrier_changes,
+			       unsigned slen)
 {
 	/* RX stats */
 	fprintf(fp, "    RX: bytes  packets  errors  dropped overrun mcast   %s%s",
 		s->rx_compressed ? "compressed" : "", _SL_);
 
-
 	fprintf(fp, "    ");
 	print_num(fp, 10, s->rx_bytes);
 	print_num(fp, 8, s->rx_packets);
@@ -546,13 +548,16 @@ static void print_link_stats32(FILE *fp, const struct rtnl_link_stats *s,
 	/* RX error stats */
 	if (show_stats > 1) {
 		fprintf(fp, "%s", _SL_);
-		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s", _SL_);
+		fprintf(fp, "    RX errors: length   crc     frame   fifo    missed%s%s",
+			slen > 23 ? "  nohandler" : "", _SL_);
 		fprintf(fp, "               ");
 		print_num(fp, 8, s->rx_length_errors);
 		print_num(fp, 7, s->rx_crc_errors);
 		print_num(fp, 7, s->rx_frame_errors);
 		print_num(fp, 7, s->rx_fifo_errors);
 		print_num(fp, 7, s->rx_missed_errors);
+		if (slen > 23)
+			print_num(fp, 7, s->rx_nohandler);
 	}
 	fprintf(fp, "%s", _SL_);
 
@@ -590,12 +595,27 @@ static void print_link_stats32(FILE *fp, const struct rtnl_link_stats *s,
 
 static void __print_link_stats(FILE *fp, struct rtattr **tb)
 {
-	if (tb[IFLA_STATS64])
-		print_link_stats64(fp, RTA_DATA(tb[IFLA_STATS64]),
-					tb[IFLA_CARRIER_CHANGES]);
-	else if (tb[IFLA_STATS])
-		print_link_stats32(fp, RTA_DATA(tb[IFLA_STATS]),
-					tb[IFLA_CARRIER_CHANGES]);
+	const struct rtattr *carrier_changes = tb[IFLA_CARRIER_CHANGES];
+	unsigned slen;
+
+	if (tb[IFLA_STATS64]) {
+		struct rtnl_link_stats64 stats = { 0 };
+		slen = RTA_PAYLOAD(tb[IFLA_STATS64]) / sizeof(__u64);
+
+		memcpy(&stats, RTA_DATA(tb[IFLA_STATS64]),
+		       MIN(RTA_PAYLOAD(tb[IFLA_STATS64]), sizeof(stats)));
+
+		print_link_stats64(fp, &stats, carrier_changes, slen);
+	} else if (tb[IFLA_STATS]) {
+		struct rtnl_link_stats stats = { 0 };
+		slen = RTA_PAYLOAD(tb[IFLA_STATS]) / sizeof(__u32);
+
+		memcpy(&stats, RTA_DATA(tb[IFLA_STATS]),
+		       MIN(RTA_PAYLOAD(tb[IFLA_STATS]), sizeof(stats)));
+
+		print_link_stats32(fp, &stats, carrier_changes, slen);
+	}
+
 }
 
 static void print_link_stats(FILE *fp, struct nlmsghdr *n)

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next iproute2] iplink: display rx nohandler stats
  2016-02-09 23:51                         ` Jarod Wilson
@ 2016-02-10  1:41                           ` Stephen Hemminger
  2016-02-10  4:52                             ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: Stephen Hemminger @ 2016-02-10  1:41 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Jamal Hadi Salim, David Miller, eric.dumazet, linux-kernel,
	edumazet, jiri, daniel, tom, j.vosburgh, vfalico, gospo, netdev

On Tue, 9 Feb 2016 18:51:35 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> On Tue, Feb 09, 2016 at 11:17:57AM -0800, Stephen Hemminger wrote:
> > Support for the new rx_nohandler statistic.
> > This code is designed to handle the case where the kernel reported statistic
> > structure is smaller than the larger structure in later releases (and vice versa).
> 
> This seems to work here, for the most part. However, if you are running a
> kernel with the new counter, and the counter happens to contain 0, aren't
> we going to not print anything?

That is the desirable outcome, since if run on older system the
output format will not change from current format.


> I've got a tweaked version here locally that gets a touch messy, where I
> get a count of members from RTA_DATA(IFLA_STATS{,64} / sizeof(__u{32,64}),
> pass that into the print functions, and key off that length for whether or
> not to print the extra members, so they'll show up even when 0, if they're
> supported. This does rely on strict ordering of the struct members, no
> reordering, no removals, etc., but I think everyone is already in favor of
> that. Looks like the same sort of length checks could be used for
> rx_compressed and tx_compressed as well, as I think they fall victim to
> the same issue of not printing if those counters are legitimately 0. Yes,
> it's a little uglier, and more brittle, but more accurate output.
> 

I don't like the added complexity.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next iproute2] iplink: display rx nohandler stats
  2016-02-10  1:41                           ` Stephen Hemminger
@ 2016-02-10  4:52                             ` Eric Dumazet
  2016-02-10 13:20                               ` Jarod Wilson
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2016-02-10  4:52 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarod Wilson, Jamal Hadi Salim, David Miller, linux-kernel,
	edumazet, jiri, daniel, tom, j.vosburgh, vfalico, gospo, netdev

On Tue, 2016-02-09 at 17:41 -0800, Stephen Hemminger wrote:
> On Tue, 9 Feb 2016 18:51:35 -0500
> Jarod Wilson <jarod@redhat.com> wrote:
> 
> > On Tue, Feb 09, 2016 at 11:17:57AM -0800, Stephen Hemminger wrote:
> > > Support for the new rx_nohandler statistic.
> > > This code is designed to handle the case where the kernel reported statistic
> > > structure is smaller than the larger structure in later releases (and vice versa).
> > 
> > This seems to work here, for the most part. However, if you are running a
> > kernel with the new counter, and the counter happens to contain 0, aren't
> > we going to not print anything?
> 
> That is the desirable outcome, since if run on older system the
> output format will not change from current format.

The problem here is that a change in output might break some user
scripts using sed/whatever games.

So it might be better to output a zero field, so that such breakages are
detected early, even if no packet was dropped at the time the new kernel
was tested.

Having a binary that adds the new field only in some cases hides the
change. It looks fine for us humans, but not for programs processing the
output.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next iproute2] iplink: display rx nohandler stats
  2016-02-10  4:52                             ` Eric Dumazet
@ 2016-02-10 13:20                               ` Jarod Wilson
  2016-02-10 15:06                                 ` Andy Gospodarek
  0 siblings, 1 reply; 64+ messages in thread
From: Jarod Wilson @ 2016-02-10 13:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, Jamal Hadi Salim, David Miller, linux-kernel,
	edumazet, jiri, daniel, tom, j.vosburgh, vfalico, gospo, netdev

On Tue, Feb 09, 2016 at 08:52:38PM -0800, Eric Dumazet wrote:
> On Tue, 2016-02-09 at 17:41 -0800, Stephen Hemminger wrote:
> > On Tue, 9 Feb 2016 18:51:35 -0500
> > Jarod Wilson <jarod@redhat.com> wrote:
> > 
> > > On Tue, Feb 09, 2016 at 11:17:57AM -0800, Stephen Hemminger wrote:
> > > > Support for the new rx_nohandler statistic.
> > > > This code is designed to handle the case where the kernel reported statistic
> > > > structure is smaller than the larger structure in later releases (and vice versa).
> > > 
> > > This seems to work here, for the most part. However, if you are running a
> > > kernel with the new counter, and the counter happens to contain 0, aren't
> > > we going to not print anything?
> > 
> > That is the desirable outcome, since if run on older system the
> > output format will not change from current format.
> 
> The problem here is that a change in output might break some user
> scripts using sed/whatever games.
> 
> So it might be better to output a zero field, so that such breakages are
> detected early, even if no packet was dropped at the time the new kernel
> was tested.
> 
> Having a binary that adds the new field only in some cases hides the
> change. It looks fine for us humans, but not for programs processing the
> output.

On my test setup, my bond's active interface currently has 0, while the
backup interface has a few thousand, so I can alternate back and forth
checking the interfaces, and one doesn't print the counter while the other
does, which is what seemed odd to me and prompted the added ugliness. But
most setups (anything outside of bond/team currently) should never have
this counter incremented, we do have prior art with the compressed fields,
and scripts really probably ought to be scraping stats out of sysfs rather
than using ip, so I can sort of understand not wanting the added ugliness.
I do tend to prefer consistency though.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH net-next iproute2] iplink: display rx nohandler stats
  2016-02-10 13:20                               ` Jarod Wilson
@ 2016-02-10 15:06                                 ` Andy Gospodarek
  0 siblings, 0 replies; 64+ messages in thread
From: Andy Gospodarek @ 2016-02-10 15:06 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: Eric Dumazet, Stephen Hemminger, Jamal Hadi Salim, David Miller,
	linux-kernel, edumazet, jiri, daniel, tom, j.vosburgh, vfalico,
	netdev

On Wed, Feb 10, 2016 at 08:20:59AM -0500, Jarod Wilson wrote:
> On Tue, Feb 09, 2016 at 08:52:38PM -0800, Eric Dumazet wrote:
> > On Tue, 2016-02-09 at 17:41 -0800, Stephen Hemminger wrote:
> > > On Tue, 9 Feb 2016 18:51:35 -0500
> > > Jarod Wilson <jarod@redhat.com> wrote:
> > > 
> > > > On Tue, Feb 09, 2016 at 11:17:57AM -0800, Stephen Hemminger wrote:
> > > > > Support for the new rx_nohandler statistic.
> > > > > This code is designed to handle the case where the kernel reported statistic
> > > > > structure is smaller than the larger structure in later releases (and vice versa).
> > > > 
> > > > This seems to work here, for the most part. However, if you are running a
> > > > kernel with the new counter, and the counter happens to contain 0, aren't
> > > > we going to not print anything?
> > > 
> > > That is the desirable outcome, since if run on older system the
> > > output format will not change from current format.
> > 
> > The problem here is that a change in output might break some user
> > scripts using sed/whatever games.
> > 
> > So it might be better to output a zero field, so that such breakages are
> > detected early, even if no packet was dropped at the time the new kernel
> > was tested.
> > 
> > Having a binary that adds the new field only in some cases hides the
> > change. It looks fine for us humans, but not for programs processing the
> > output.
> 
> On my test setup, my bond's active interface currently has 0, while the
> backup interface has a few thousand, so I can alternate back and forth
> checking the interfaces, and one doesn't print the counter while the other
> does, which is what seemed odd to me and prompted the added ugliness. But
> most setups (anything outside of bond/team currently) should never have
> this counter incremented, we do have prior art with the compressed fields,
> and scripts really probably ought to be scraping stats out of sysfs rather
> than using ip, so I can sort of understand not wanting the added ugliness.
> I do tend to prefer consistency though.

FWIW, I tend to agree with Jarod and Eric on this.  Consistency seems
better, even if 0 all the time.

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2016-02-10 15:06 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-22 19:11 [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves Jarod Wilson
2016-01-22 20:59 ` Jay Vosburgh
2016-01-23  8:26   ` Jiri Pirko
2016-01-23  8:07 ` Jiri Pirko
2016-01-23 14:19 ` Andy Gospodarek
2016-01-23 15:23 ` Eric Dumazet
2016-01-26 21:14   ` Jarod Wilson
2016-01-26 21:21     ` David Miller
2016-01-26 21:36       ` Jarod Wilson
2016-01-26 21:24     ` Eric Dumazet
2016-01-26 21:35       ` Jarod Wilson
2016-01-25  6:42 ` David Miller
2016-01-25 14:27   ` Jarod Wilson
2016-01-26  4:45     ` Jarod Wilson
2016-01-27 20:21 ` [PATCH net 0/4] net: add rx_unhandled stat counter Jarod Wilson
2016-01-27 20:21   ` [PATCH net 1/4] " Jarod Wilson
2016-01-27 20:21   ` [PATCH net 2/4] net-procfs: show rx_unhandled counters Jarod Wilson
2016-01-27 20:21   ` [PATCH net 3/4] team: track sum of rx_unhandled for all slaves Jarod Wilson
2016-01-27 20:21   ` [PATCH net 4/4] bond: " Jarod Wilson
2016-01-27 21:09   ` [PATCH net 0/4] net: add rx_unhandled stat counter Eric Dumazet
2016-01-28  6:02     ` Jarod Wilson
2016-01-28  6:10       ` Jarod Wilson
2016-01-28  6:18       ` Jarod Wilson
2016-01-28 13:00         ` Eric Dumazet
2016-01-28 14:38           ` Jarod Wilson
2016-01-28 14:42             ` Eric Dumazet
2016-01-28 14:44               ` Eric Dumazet
2016-01-28 14:46                 ` Eric Dumazet
2016-01-28 15:11                   ` Jarod Wilson
2016-01-28 13:00       ` Eric Dumazet
2016-01-28 15:49 ` [PATCH net v2 0/4] net: add and use rx_nohandler " Jarod Wilson
2016-01-28 15:49   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
2016-01-28 15:49   ` [PATCH net v2 2/4] net: add rx_nohandler stat counter Jarod Wilson
2016-01-28 15:49   ` [PATCH net v2 3/4] team: track sum of rx_nohandler for all slaves Jarod Wilson
2016-01-28 15:49   ` [PATCH net v2 4/4] bond: " Jarod Wilson
2016-01-30  3:37   ` [PATCH net v2 0/4] net: add and use rx_nohandler stat counter David Miller
2016-01-30 18:16     ` Jarod Wilson
2016-01-30 18:19   ` [PATCH net v2 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
2016-01-30 18:34     ` Eric Dumazet
2016-01-30 20:39       ` Jarod Wilson
2016-01-30 20:53         ` Jarod Wilson
2016-01-30 23:26           ` David Miller
2016-01-31 18:07             ` Jarod Wilson
2016-02-01 23:51   ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter Jarod Wilson
2016-02-01 23:51     ` [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson
2016-02-01 23:51     ` [PATCH net v3 2/4] net: add rx_nohandler stat counter Jarod Wilson
2016-02-07 19:37       ` Stephen Hemminger
2016-02-07 19:46         ` David Miller
2016-02-07 20:19           ` Eric Dumazet
2016-02-08 18:32             ` Jarod Wilson
2016-02-08 19:38               ` Stephen Hemminger
2016-02-08 22:57                 ` Eric Dumazet
2016-02-09  8:40                   ` David Miller
2016-02-09 10:56                     ` Jamal Hadi Salim
2016-02-09 19:17                       ` [PATCH net-next iproute2] iplink: display rx nohandler stats Stephen Hemminger
2016-02-09 23:51                         ` Jarod Wilson
2016-02-10  1:41                           ` Stephen Hemminger
2016-02-10  4:52                             ` Eric Dumazet
2016-02-10 13:20                               ` Jarod Wilson
2016-02-10 15:06                                 ` Andy Gospodarek
2016-02-01 23:51     ` [PATCH net v3 3/4] team: track sum of rx_nohandler for all slaves Jarod Wilson
2016-02-01 23:51     ` [PATCH net v3 4/4] bond: " Jarod Wilson
2016-02-06  8:00     ` [PATCH net v3 0/4] net: add and use rx_nohandler stat counter David Miller
2016-01-28 16:22 ` [PATCH net v3 1/4] net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64 Jarod Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.