From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>,
	davem@davemloft.net, edumazet@google.com, hkchu@google.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
Date: Wed, 19 Jun 2013 12:11:32 +0300
Message-ID: <20130619091132.GA2816@redhat.com>
In-Reply-To: <1371623518.3252.267.camel@edumazet-glaptop>

On Tue, Jun 18, 2013 at 11:31:58PM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote:
> > Currently, we use kcalloc to allocate rx/tx queues for a net device, which can
> > easily lead to a high-order memory allocation request when initializing a
> > multiqueue net device. We can simply avoid this by switching to a flex array,
> > which always allocates at order zero.
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  include/linux/netdevice.h |   13 ++++++----
> >  net/core/dev.c            |   57 ++++++++++++++++++++++++++++++++------------
> >  net/core/net-sysfs.c      |   15 +++++++----
> >  3 files changed, 58 insertions(+), 27 deletions(-)
> > 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 09b4188..c0b5d04 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -32,6 +32,7 @@
> >  #include <linux/atomic.h>
> >  #include <asm/cache.h>
> >  #include <asm/byteorder.h>
> > +#include <linux/flex_array.h>
> >  
> >  #include <linux/percpu.h>
> >  #include <linux/rculist.h>
> > @@ -1230,7 +1231,7 @@ struct net_device {
> >  
> > 
> >  #ifdef CONFIG_RPS
> > -	struct netdev_rx_queue	*_rx;
> > +	struct flex_array	*_rx;
> >  
> >  	/* Number of RX queues allocated at register_netdev() time */
> >  	unsigned int		num_rx_queues;
> > @@ -1250,7 +1251,7 @@ struct net_device {
> >  /*
> >   * Cache lines mostly used on transmit path
> >   */
> > -	struct netdev_queue	*_tx ____cacheline_aligned_in_smp;
> > +	struct flex_array	*_tx ____cacheline_aligned_in_smp;
> >  
> 
> Using flex_array and adding overhead in this super critical part of
> network stack, only to avoid order-1 allocations done in GFP_KERNEL
> context is simply insane.
> 
> We can revisit this in 2050 if we ever need order-4 allocations or so,
> and still use 4K pages.
> 
> 

Well, KVM supports up to 160 VCPUs on x86.

Creating a queue per CPU is very reasonable, and
assuming a cache line size of 64 bytes, netdev_queue seems to be 320
bytes; that's 320*160 = 51200 bytes. So 12.5 pages of 4K, which rounds
up to an order-4 allocation.
I agree most people don't have such systems yet, but
they do exist.

We can cut the size of netdev_queue by moving kobj, which
does not seem to be used on the data path, out to a separate structure.
kobj is 64 bytes in size, so that leaves exactly 256 bytes.
That gets us to 256*160 = 40960 bytes (10 pages), which is an order-3
allocation for up to 128 queues but still order-4 for 160, and there's
some padding there so we won't immediately increase it
the moment we add some fields.
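
The arithmetic above can be checked with a small userspace sketch. It assumes 4K pages and that the allocation is rounded up to a power-of-two number of pages; `alloc_order` and `PAGE_SIZE_BYTES` are names made up for this illustration, not kernel symbols.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u	/* assumption: 4K pages */

/*
 * Illustrative helper (not a kernel function): smallest order n such
 * that 2^n pages of PAGE_SIZE_BYTES can hold `size` bytes.
 */
static unsigned int alloc_order(size_t size)
{
	/* Round the byte count up to whole pages. */
	size_t pages = (size + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
	unsigned int order = 0;

	/* Round the page count up to the next power of two. */
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under these assumptions, `alloc_order(320 * 160)` is 4 (51200 bytes, 13 pages rounded up to 16). With a 256-byte structure, `alloc_order(256 * 128)` is 3, while `alloc_order(256 * 160)` is 4, since 40960 bytes is 10 pages and order-3 covers only 8.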

Comments on this idea?

Instead of always using a flex array, we could have
+	struct netdev_queue     *_tx; /* Used with small # of queues */
+#ifdef CONFIG_NETDEV_HUGE_NUMBER_OF_QUEUES
+	struct flex_array     *_tx_large; /* Used with large # of queues */
+#endif

Then fix the wrappers to use _tx if it is not NULL, and _tx_large otherwise.

If configured in, it's an extra branch on the data path, but probably less
costly than the extra indirection.
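
A minimal userspace sketch of such a wrapper follows. All of the types and names here (`struct queue`, `struct chunked_queues`, `struct dev`, `model_get_tx_queue`) are hypothetical stand-ins for illustration; the chunked container only models the idea of a flex array and is not the kernel's flex_array API.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct netdev_queue. */
struct queue {
	int id;
};

/*
 * Hypothetical stand-in for a flex-array-like container: elements are
 * stored in separately allocated chunks, reached through a pointer table.
 */
struct chunked_queues {
	size_t per_chunk;	/* elements per chunk */
	struct queue **chunks;	/* table of chunk pointers */
};

/* Hypothetical stand-in for the relevant part of struct net_device. */
struct dev {
	struct queue *tx;			/* used with a small number of queues */
	struct chunked_queues *tx_large;	/* used with a large number of queues */
};

/* Use the flat array when present, otherwise go through the chunk table. */
static struct queue *model_get_tx_queue(const struct dev *dev, size_t index)
{
	const struct chunked_queues *cq;

	if (dev->tx)	/* common case: one branch, direct indexing */
		return &dev->tx[index];

	cq = dev->tx_large;	/* rare case: one extra indirection */
	return &cq->chunks[index / cq->per_chunk][index % cq->per_chunk];
}
```

In this model, devices with few queues pay only the NULL check, and the extra pointer chase is confined to the large configuration.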

-- 
MST

Thread overview: 26+ messages
2013-06-19  5:40 [net-next rfc 0/3] increase the limit of tuntap queues Jason Wang
2013-06-19  5:40 ` [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array Jason Wang
2013-06-19  6:31   ` Eric Dumazet
2013-06-19  7:14     ` Jason Wang
2013-06-19  9:11     ` Michael S. Tsirkin [this message]
2013-06-19  9:56       ` Eric Dumazet
2013-06-19 12:22         ` Michael S. Tsirkin
2013-06-19 15:40         ` Michael S. Tsirkin
2013-06-19 15:58           ` Eric Dumazet
2013-06-19 16:06             ` David Laight
2013-06-19 16:28               ` Eric Dumazet
2013-06-19 18:07             ` Michael S. Tsirkin
2013-06-20  8:15               ` [PATCH net-next] net: allow large number of tx queues Eric Dumazet
2013-06-20  8:35                 ` Michael S. Tsirkin
2013-06-21  6:41                   ` Jason Wang
2013-06-21  7:12                     ` Eric Dumazet
2013-06-23 10:29                       ` Michael S. Tsirkin
2013-06-24  6:57                 ` David Miller
2013-06-20  5:14         ` [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array Jason Wang
2013-06-20  6:05           ` Eric Dumazet
2013-06-19  5:40 ` [net-next rfc 2/3] tuntap: reduce the size of tun_struct " Jason Wang
2013-06-19  5:40 ` [net-next rfc 3/3] tuntap: increase the max queues to 16 Jason Wang
2013-06-19  6:34   ` Eric Dumazet
2013-06-19  7:15     ` Jason Wang
2013-06-19 19:16     ` Jerry Chu
