netdev.vger.kernel.org archive mirror
* [PATCH net-next] af_packet: convert pending frame counter to atomic_t
@ 2019-06-28 14:52 Neil Horman
  2019-06-28 15:15 ` Willem de Bruijn
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Neil Horman @ 2019-06-28 14:52 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, David S. Miller, Willem de Bruijn

The AF_PACKET protocol, when running as a memory-mapped socket, uses a
pending frame counter to track the number of skbs in flight during
transmission.  It is incremented during the sendmsg call (via
tpacket_snd) and decremented (possibly asynchronously) in the skb
destructor when the skb is freed.
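
Concretely, the counter's life cycle looks roughly like this (a simplified
sketch, not the literal af_packet.c code; keep_going and more_frames_queued
are placeholders for the real tx ring bookkeeping):

	/* 1) sendmsg path (tpacket_snd), serialized by pg_vec_lock */
	skb->destructor = tpacket_destruct_skb;
	packet_inc_pending(&po->tx_ring);	/* one more skb in flight */
	po->xmit(skb);				/* normally dev_queue_xmit() */

	/* 2) skb destructor (tpacket_destruct_skb), possibly on another CPU */
	packet_dec_pending(&po->tx_ring);	/* frame completed */

	/* 3) tpacket_snd() keeps looping while more frames are queued or,
	 * when need_wait (i.e. !MSG_DONTWAIT), while frames are in flight
	 */
	keep_going = more_frames_queued ||
		     (need_wait && packet_read_pending(&po->tx_ring));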

The counter is currently implemented as a percpu variable for each open
socket, but for reads (via packet_read_pending), we iterate over every
cpu variable, accumulating the total pending count.

Given that the socket transmit path is an exclusive path (locked via the
pg_vec_lock mutex), we do not have the ability to increment this counter
on multiple cpus in parallel.  This implementation also seems to have
the potential to be broken, in that, should an skb be freed on a cpu
other than the one that it was initially transmitted on, we may
decrement a counter that was not initially incremented, leading to
underflow.
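
For instance (hypothetical values): if one frame is sent on CPU0 and freed on
CPU1, the two per-cpu values end up as

	unsigned int cpu0 = 1;	/* this_cpu_inc() ran here (tpacket_snd)    */
	unsigned int cpu1 = -1;	/* this_cpu_dec() ran here (skb destructor),
				 * wraps to 0xffffffff                      */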

As such, adjust the packet socket struct to convert the per-cpu counter
to an atomic_t variable (to enforce consistency between the send path
and the skb free path).  This saves us some space in the packet_sock
structure, prevents the possibility of underflow, and should reduce the
run time of packet_read_pending, as we only need to read a single
variable, instead of having to loop over every available cpu variable
instance.

Tested by running a small program that sends frames via AF_PACKET on
multiple CPUs in parallel, with good results.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Willem de Bruijn <willemb@google.com>
---
 net/packet/af_packet.c | 40 +++++-----------------------------------
 net/packet/internal.h  |  2 +-
 2 files changed, 6 insertions(+), 36 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8d54f3047768..25ffb486fac9 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1154,43 +1154,17 @@ static void packet_increment_head(struct packet_ring_buffer *buff)
 
 static void packet_inc_pending(struct packet_ring_buffer *rb)
 {
-	this_cpu_inc(*rb->pending_refcnt);
+	atomic_inc(&rb->pending_refcnt);
 }
 
 static void packet_dec_pending(struct packet_ring_buffer *rb)
 {
-	this_cpu_dec(*rb->pending_refcnt);
+	atomic_dec(&rb->pending_refcnt);
 }
 
 static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
 {
-	unsigned int refcnt = 0;
-	int cpu;
-
-	/* We don't use pending refcount in rx_ring. */
-	if (rb->pending_refcnt == NULL)
-		return 0;
-
-	for_each_possible_cpu(cpu)
-		refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);
-
-	return refcnt;
-}
-
-static int packet_alloc_pending(struct packet_sock *po)
-{
-	po->rx_ring.pending_refcnt = NULL;
-
-	po->tx_ring.pending_refcnt = alloc_percpu(unsigned int);
-	if (unlikely(po->tx_ring.pending_refcnt == NULL))
-		return -ENOBUFS;
-
-	return 0;
-}
-
-static void packet_free_pending(struct packet_sock *po)
-{
-	free_percpu(po->tx_ring.pending_refcnt);
+	atomic_read(&rb->pending_refcnt);
 }
 
 #define ROOM_POW_OFF	2
@@ -3046,7 +3020,6 @@ static int packet_release(struct socket *sock)
 	/* Purge queues */
 
 	skb_queue_purge(&sk->sk_receive_queue);
-	packet_free_pending(po);
 	sk_refcnt_debug_release(sk);
 
 	sock_put(sk);
@@ -3236,9 +3209,8 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	po->num = proto;
 	po->xmit = dev_queue_xmit;
 
-	err = packet_alloc_pending(po);
-	if (err)
-		goto out2;
+	atomic_set(&po->tx_ring.pending_refcnt, 0);
+	atomic_set(&po->rx_ring.pending_refcnt, 0);
 
 	packet_cached_dev_reset(po);
 
@@ -3273,8 +3245,6 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	preempt_enable();
 
 	return 0;
-out2:
-	sk_free(sk);
 out:
 	return err;
 }
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 82fb2b10f790..b0fdb54bb91b 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -68,7 +68,7 @@ struct packet_ring_buffer {
 	unsigned int		pg_vec_pages;
 	unsigned int		pg_vec_len;
 
-	unsigned int __percpu	*pending_refcnt;
+	atomic_t		pending_refcnt;
 
 	struct tpacket_kbdq_core	prb_bdqc;
 };
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] af_packet: convert pending frame counter to atomic_t
  2019-06-28 14:52 [PATCH net-next] af_packet: convert pending frame counter to atomic_t Neil Horman
@ 2019-06-28 15:15 ` Willem de Bruijn
  2019-06-28 19:54   ` Neil Horman
  2019-06-28 20:46 ` kbuild test robot
  2019-07-02 21:03 ` David Miller
  2 siblings, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2019-06-28 15:15 UTC (permalink / raw)
  To: Neil Horman
  Cc: Network Development, David S. Miller, Willem de Bruijn, Daniel Borkmann

On Fri, Jun 28, 2019 at 10:53 AM Neil Horman <nhorman@tuxdriver.com> wrote:
>
> The AF_PACKET protocol, when running as a memory-mapped socket, uses a
> pending frame counter to track the number of skbs in flight during
> transmission.  It is incremented during the sendmsg call (via
> tpacket_snd) and decremented (possibly asynchronously) in the skb
> destructor when the skb is freed.
>
> The counter is currently implemented as a percpu variable for each open
> socket, but for reads (via packet_read_pending), we iterate over every
> cpu variable, accumulating the total pending count.
>
> Given that the socket transmit path is an exclusive path (locked via the
> pg_vec_lock mutex), we do not have the ability to increment this counter
> on multiple cpus in parallel.  This implementation also seems to have
> the potential to be broken, in that, should an skb be freed on a cpu
> other than the one that it was initially transmitted on, we may
> decrement a counter that was not initially incremented, leading to
> underflow.
>
> As such, adjust the packet socket struct to convert the per-cpu counter
> to an atomic_t variable (to enforce consistency between the send path
> and the skb free path).  This saves us some space in the packet_sock
> structure, prevents the possibility of underflow, and should reduce the
> run time of packet_read_pending, as we only need to read a single
> variable, instead of having to loop over every available cpu variable
> instance.
>
> Tested by running a small program that sends frames via AF_PACKET on
> multiple CPUs in parallel, with good results.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: Willem de Bruijn <willemb@google.com>
> ---

This is essentially a revert of commit b013840810c2 ("packet: use
percpu mmap tx frame pending refcount"). That commit has some benchmark
numbers and also discusses the overflow issue.

I think it would be more interesting to eschew the counter when
MSG_DONTWAIT is set, as it is only used to know when to exit the loop
if need_wait.

But, IMHO packet sockets are deprecated in favor of AF_XDP and
should be limited to bug fixes at this point.

>  net/packet/af_packet.c | 40 +++++-----------------------------------
>  net/packet/internal.h  |  2 +-
>  2 files changed, 6 insertions(+), 36 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 8d54f3047768..25ffb486fac9 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1154,43 +1154,17 @@ static void packet_increment_head(struct packet_ring_buffer *buff)
>
>  static void packet_inc_pending(struct packet_ring_buffer *rb)
>  {
> -       this_cpu_inc(*rb->pending_refcnt);
> +       atomic_inc(&rb->pending_refcnt);
>  }

If making this change, can also remove these helper functions. The
layer of indirection just hinders readability.
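
Concretely, a sketch of open-coding them at the call sites (untested;
still_pending is just a placeholder name):

	/* in tpacket_snd(): */
	atomic_inc(&po->tx_ring.pending_refcnt);

	/* in tpacket_destruct_skb(): */
	atomic_dec(&po->tx_ring.pending_refcnt);

	/* in the transmit loop's exit check, instead of packet_read_pending(): */
	still_pending = atomic_read(&po->tx_ring.pending_refcnt);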

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] af_packet: convert pending frame counter to atomic_t
  2019-06-28 15:15 ` Willem de Bruijn
@ 2019-06-28 19:54   ` Neil Horman
  0 siblings, 0 replies; 5+ messages in thread
From: Neil Horman @ 2019-06-28 19:54 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, David S. Miller, Willem de Bruijn, Daniel Borkmann

On Fri, Jun 28, 2019 at 11:15:09AM -0400, Willem de Bruijn wrote:
> On Fri, Jun 28, 2019 at 10:53 AM Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > The AF_PACKET protocol, when running as a memory-mapped socket, uses a
> > pending frame counter to track the number of skbs in flight during
> > transmission.  It is incremented during the sendmsg call (via
> > tpacket_snd) and decremented (possibly asynchronously) in the skb
> > destructor when the skb is freed.
> >
> > The counter is currently implemented as a percpu variable for each open
> > socket, but for reads (via packet_read_pending), we iterate over every
> > cpu variable, accumulating the total pending count.
> >
> > Given that the socket transmit path is an exclusive path (locked via the
> > pg_vec_lock mutex), we do not have the ability to increment this counter
> > on multiple cpus in parallel.  This implementation also seems to have
> > the potential to be broken, in that, should an skb be freed on a cpu
> > other than the one that it was initially transmitted on, we may
> > decrement a counter that was not initially incremented, leading to
> > underflow.
> >
> > As such, adjust the packet socket struct to convert the per-cpu counter
> > to an atomic_t variable (to enforce consistency between the send path
> > and the skb free path).  This saves us some space in the packet_sock
> > structure, prevents the possibility of underflow, and should reduce the
> > run time of packet_read_pending, as we only need to read a single
> > variable, instead of having to loop over every available cpu variable
> > instance.
> >
> > Tested by running a small program that sends frames via AF_PACKET on
> > multiple CPUs in parallel, with good results.
> >
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > CC: Willem de Bruijn <willemb@google.com>
> > ---
> 
> This essentially is a revert of commit b013840810c2 ("packet: use
> percpu mmap tx frame pending refcount"). That has some benchmark
> numbers and also discusses the overflow issue.
> 
Yes, it is.  I was looking at it thinking that the accumulation issue was more
serious now that we actually sleep until we get a completion, rather than just
schedule.  I.e., if we're freeing frames in parallel, I was thinking it was
possible for the ordering to result in neither instance getting a return value
of 0 from packet_read_pending, which would lead to a hang/maximal timeout, but
as I look back, I think I was wrong about that.

As for the performance, you're right.  They're almost the same, but I did some
perf runs, and over 10000 iterations this patch is about 0.1% slower (measuring
system time).  I kind of wonder if it isn't worth the code and data savings, but
faster is faster.  Though that's on a 6-CPU system.  I suppose some more testing
might be warranted on high-CPU-count systems (i.e., is there a point at which
looping over the cpu_possible_mask becomes more expensive), though perhaps that's
just so small it doesn't matter.

> I think more interesting would be to eschew the counter when
> MSG_DONTWAIT, as it is only used to know when to exit the loop if
> need_wait.
> 
Yeah, that might be interesting.  Though that would mean having
tpacket_destruct_skb be aware of whether or not the frame being freed was sent
as part of a WAIT/DONTWAIT flagged call.


> But, IMHO packet sockets are deprecated in favor of AF_XDP and
> should be limited to bug fixes at this point.
> 
I don't see any documentation listing AF_PACKET as deprecated.

People can use AF_XDP to do raw frame sends if they like,
but AF_PACKET will still be around (ostensibly in perpetuity) to support
existing applications.  I see no need to avoid improving it when we can.

Neil


> >  net/packet/af_packet.c | 40 +++++-----------------------------------
> >  net/packet/internal.h  |  2 +-
> >  2 files changed, 6 insertions(+), 36 deletions(-)
> >
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 8d54f3047768..25ffb486fac9 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -1154,43 +1154,17 @@ static void packet_increment_head(struct packet_ring_buffer *buff)
> >
> >  static void packet_inc_pending(struct packet_ring_buffer *rb)
> >  {
> > -       this_cpu_inc(*rb->pending_refcnt);
> > +       atomic_inc(&rb->pending_refcnt);
> >  }
> 
> If making this change, can also remove these helper functions. The
> layer of indirection just hinders readability.
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] af_packet: convert pending frame counter to atomic_t
  2019-06-28 14:52 [PATCH net-next] af_packet: convert pending frame counter to atomic_t Neil Horman
  2019-06-28 15:15 ` Willem de Bruijn
@ 2019-06-28 20:46 ` kbuild test robot
  2019-07-02 21:03 ` David Miller
  2 siblings, 0 replies; 5+ messages in thread
From: kbuild test robot @ 2019-06-28 20:46 UTC (permalink / raw)
  To: Neil Horman
  Cc: kbuild-all, netdev, Neil Horman, David S. Miller, Willem de Bruijn

[-- Attachment #1: Type: text/plain, Size: 1578 bytes --]

Hi Neil,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Neil-Horman/af_packet-convert-pending-frame-counter-to-atomic_t/20190629-032437
config: x86_64-randconfig-x007-201925 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-9) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   net/packet/af_packet.c: In function 'packet_read_pending':
>> net/packet/af_packet.c:1168:1: warning: no return statement in function returning non-void [-Wreturn-type]
    }
    ^

vim +1168 net/packet/af_packet.c

b0138408 Daniel Borkmann 2014-01-15  1164  
b0138408 Daniel Borkmann 2014-01-15  1165  static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
b0138408 Daniel Borkmann 2014-01-15  1166  {
5e9e5c90 Neil Horman     2019-06-28  1167  	atomic_read(&rb->pending_refcnt);
b0138408 Daniel Borkmann 2014-01-15 @1168  }
b0138408 Daniel Borkmann 2014-01-15  1169  

:::::: The code at line 1168 was first introduced by commit
:::::: b013840810c221f2b0cf641d01531526052dc1fb packet: use percpu mmap tx frame pending refcount

:::::: TO: Daniel Borkmann <dborkman@redhat.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32812 bytes --]
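
For reference, the warning above points at the missing return statement in the
converted helper; assuming the atomic_t conversion is kept, the one-line fix
would be:

	static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
	{
		return atomic_read(&rb->pending_refcnt);
	}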

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] af_packet: convert pending frame counter to atomic_t
  2019-06-28 14:52 [PATCH net-next] af_packet: convert pending frame counter to atomic_t Neil Horman
  2019-06-28 15:15 ` Willem de Bruijn
  2019-06-28 20:46 ` kbuild test robot
@ 2019-07-02 21:03 ` David Miller
  2 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2019-07-02 21:03 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, willemb

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 28 Jun 2019 10:52:06 -0400

> Given that the socket transmit path is an exclusive path (locked via the
> pg_vec_lock mutex), we do not have the ability to increment this counter
> on multiple cpus in parallel.  This implementation also seems to have
> the potential to be broken, in that, should an skb be freed on a cpu
> other than the one that it was initially transmitted on, we may
> decrement a counter that was not initially incremented, leading to
> underflow.

This isn't a problem.  There is only one valid "read" operation and that
is the "summation" of all of the per-cpu counters.

All of the overflows and underflows cancel out in the situation you
describe, so all is fine.
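
For example (hypothetical numbers, 32-bit unsigned per-cpu counters):

	unsigned int cpu0 = 0 + 3;		/* three this_cpu_inc() on the sending CPU  */
	unsigned int cpu1 = 0 - 3;		/* three this_cpu_dec() on the freeing CPU,
						 * i.e. it wraps to 0xfffffffd              */
	unsigned int sum  = cpu0 + cpu1;	/* 3 + 0xfffffffd wraps back around to 0    */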

I'm hesitant to get behind any fiddling in this area.

Thanks Neil.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-07-03  0:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-28 14:52 [PATCH net-next] af_packet: convert pending frame counter to atomic_t Neil Horman
2019-06-28 15:15 ` Willem de Bruijn
2019-06-28 19:54   ` Neil Horman
2019-06-28 20:46 ` kbuild test robot
2019-07-02 21:03 ` David Miller
