netdev.vger.kernel.org archive mirror
From: Alexander Duyck <alexander.h.duyck@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Jiafei.Pan@freescale.com" <Jiafei.Pan@freescale.com>,
	David Miller <davem@davemloft.net>,
	"jkosina@suse.cz" <jkosina@suse.cz>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"LeoLi@freescale.com" <LeoLi@freescale.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
Date: Thu, 16 Oct 2014 15:12:37 -0700	[thread overview]
Message-ID: <544042D5.5090501@redhat.com> (raw)
In-Reply-To: <1413495626.28798.38.camel@edumazet-glaptop2.roam.corp.google.com>


On 10/16/2014 02:40 PM, Eric Dumazet wrote:
> On Thu, 2014-10-16 at 11:20 -0700, Alexander Duyck wrote:
>
>> My concern would be that we are off by a factor of 2 and prematurely
>> collapse the TCP receive queue with this change.
> That is the opposite actually. We can consume 4K but we pretend we
> consume 2K in some worst cases.

The only case where we consume the full 4K but only list it as 2K should 
be if we have memory from the wrong node and we want to flush it from 
the descriptor queue.  For all other cases we should be using the page 
at least twice per buffer.  So the the first page that was assigned for 
an Rx descriptor might be flushed but then after that reuse should take 
hold and stay in place as long as the NAPI poll doesn't change NUMA nodes.
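The half-page flip-and-reuse scheme being described can be sketched roughly as
follows. This is an illustrative model only, not any real driver's code; the
names (rx_buf, page_offset, local_node, rx_buf_reuse) are all made up, and the
wrong-node flush is reduced to a simple flag check:

```c
/* Hypothetical sketch of half-page Rx buffer reuse: a 4K page is split
 * into two 2K halves, and the driver flips between them so each page
 * normally backs at least two buffers before being recycled.  A page
 * from the wrong NUMA node is not reused and gets flushed instead. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ     4096
#define RX_BUF_SIZE 2048   /* half a page per Rx buffer */

struct rx_buf {
	size_t page_offset;    /* 0 or 2048 within the backing page */
	bool   local_node;     /* page allocated on the NAPI poll's node? */
};

/* Flip to the other half of the page; returns false when the page
 * should be flushed (wrong node) rather than reused. */
static bool rx_buf_reuse(struct rx_buf *buf)
{
	if (!buf->local_node)
		return false;
	buf->page_offset ^= RX_BUF_SIZE;
	return true;
}
```

In this model, only the first page stocked from the wrong node pays the full-page
cost; every reused page is consumed in 2K halves, which is what makes 2K a fair
per-buffer truesize.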

That should be no worse than the case where the remaining space in a 
large page is not large enough to use as a buffer. You still use the 
current size as your truesize; you don't include the overhead of the 
unused space in your calculation.

>>   For example if you are
>> looking at a socket that is holding pages for a long period of time
>> there would be a good chance of it ending up with both halves of the
>> page.  In this case is it fair to charge it for 8K of memory use when in
>> reality it is only using 4K?
> It's better to collapse too soon than too late.
>
> If you want to avoid collapses because one host has plenty of memory,
> all you need to do is increase tcp_rmem.
>
> Why are you referring to 8K ? PAGE_SIZE is 4K

With your change the truesize would be reported as 8K rather than 4K for 
two half pages if we were to hand off both halves of a page to the same 
socket.

The 2K value makes sense and is consistent with how we handle this in 
other cases where we are partitioning pages for use as network buffers.  
I think increasing this to 4K is just going to cause performance issues 
as flows are going to get choked off prematurely for memory usage that 
they aren't actually getting.
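The accounting difference above is simple arithmetic, but worth spelling out.
The sketch below is purely illustrative (the function name and constants are
made up, and the real kernel charges truesize per skb fragment, not through a
helper like this): with a 2K truesize per half-page fragment, a socket holding
both halves of one page is charged exactly what it uses; with a 4K truesize
per fragment, it is charged double and hits its tcp_rmem limit twice as fast.

```c
/* Illustrative arithmetic only: how the per-fragment truesize choice
 * changes the memory charge against tcp_rmem for a socket holding both
 * 2K halves of a single 4K page.  Not kernel code. */
#include <assert.h>

#define PAGE_SZ   4096u
#define HALF_PAGE 2048u

/* Total truesize charged for nfrags fragments at a given per-fragment
 * truesize value. */
static unsigned int charged(unsigned int nfrags, unsigned int per_frag)
{
	return nfrags * per_frag;
}

/* 2K per fragment: two halves charge 4K, matching the 4K actually used.
 * 4K per fragment: the same two halves charge 8K for 4K of real usage. */
```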

Part of my hesitation is that I spent the last couple of years 
explaining to our performance testing team and customers that they 
needed to adjust tcp_rmem after all of the changes made to truesize and 
the base network drivers, and I would prefer not to go through another 
round of that.  Then again, I probably won't have to anyway since I am 
no longer doing drivers for Intel, but my reaction to this kind of 
change is what it is.

Thanks,

Alex





Thread overview: 38+ messages
2014-10-15  3:26 [PATCH] net: use hardware buffer pool to allocate skb Pan Jiafei
2014-10-15  4:15 ` Eric Dumazet
2014-10-15  4:26   ` David Miller
2014-10-15  5:43   ` Jiafei.Pan
2014-10-15  9:15     ` Eric Dumazet
2014-10-15  4:25 ` David Miller
2014-10-15  5:34   ` Jiafei.Pan
2014-10-15  9:15     ` Eric Dumazet
2014-10-16  2:17       ` Jiafei.Pan
2014-10-16  4:15         ` Eric Dumazet
2014-10-16  5:15           ` Jiafei.Pan
2014-10-16 15:28             ` Alexander Duyck
2014-10-16 16:57               ` Eric Dumazet
2014-10-16 17:10                 ` Alexander Duyck
2014-10-16 17:45                   ` Eric Dumazet
2014-10-16 18:20                     ` Alexander Duyck
2014-10-16 21:40                       ` Eric Dumazet
2014-10-16 22:12                         ` Alexander Duyck [this message]
2014-10-17  9:11                       ` David Laight
2014-10-17 14:40                         ` Alexander Duyck
2014-10-17 16:55                           ` Eric Dumazet
2014-10-17 18:28                             ` Alexander Duyck
2014-10-17 18:53                               ` Eric Dumazet
2014-10-18  0:26                                 ` Eric Dumazet
2014-10-17 19:02                               ` Eric Dumazet
2014-10-17 19:38                                 ` Alexander Duyck
2014-10-17 19:51                                   ` Eric Dumazet
2014-10-17 22:13                                     ` Alexander Duyck
2014-10-17  2:35               ` Jiafei.Pan
2014-10-17 14:05                 ` Eric Dumazet
2014-10-17 14:12                   ` Alexander Duyck
2014-10-16  2:17       ` Jiafei.Pan
2014-10-15 15:51     ` David Miller
2014-10-15  4:59 ` Oliver Hartkopp
2014-10-15  5:47   ` Jiafei.Pan
2014-10-15  8:57 ` David Laight
2014-10-15  9:33 ` Stephen Hemminger
2014-10-16  2:30   ` Jiafei.Pan
