linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dennis Zhou <dennisz@fb.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@linux.com>,
	kernel-team@fb.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Dennis Zhou <dennisszhou@gmail.com>
Subject: Re: [PATCH 00/10] percpu: replace percpu area map allocator with bitmap allocator
Date: Mon, 24 Jul 2017 17:14:41 -0400	[thread overview]
Message-ID: <20170724211440.GD91613@dennisz-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20170718191527.GA4009@destiny>

Hi Josef,

On Tue, Jul 18, 2017 at 03:15:28PM -0400, Josef Bacik wrote:
 
> Ok so you say that this test is better for the area map allocator, so presumably
> this is worst case for the bitmap allocator?  What does the average case look
> like?

The bitmap allocator should perform relatively consistent in most
allocation schemes. The worst case would be where constant scanning is
required like the allocation scenario of 4-bytes with 8-byte alignment.
As far as average case, I would expect similar performance on average.
These are updated values with v2.

  Area Map Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |        310      |     4770
             16B    |        557      |     1325
             64B    |        436      |      273
            256B    |        776      |      131
           1024B    |       3280      |      122

  Bitmap Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |        490      |       70
             16B    |        515      |       75
             64B    |        610      |       80
            256B    |        950      |      100
           1024B    |       3520      |      200

> Trading 2x allocation latency for a pretty significant free latency
> reduction seems ok, but are we allocating or freeing more?  Are both allocations
> and free's done in performance critical areas, or do allocations only happen in
> performance critical areas, making the increased allocation latency hurt more?

I unfortunately do not have any data to support whether we are
allocating or freeing more in critical areas. I would defer use case
questions to those consuming percpu memory. There is one issue that
stood out though with BPF that is causing the cpu to hang due to the
percpu allocator. This leads me to believe that freeing is of more
concern. I would believe that if allocation latency is a major problem,
it is easier for the user to preallocate than it is for them to manage
delayed freeing.

> And lastly, why are we paying a 2x latency cost?  What is it about the bitmap
> allocator that makes it much worse than the area map allocator?

So it turns out v1 was making poor use of control logic and the
compiler wasn't doing me any favors. I've since done a rather large
refactor with v2 which shows better performance overall. The bitmap
allocator is paying more in that it has to manage metadata that the area
map allocator does not. In the optimal case, the area map allocator is
just an append without any heavy operations.

A worse performing scenario for the area map allocator is when the
second half of a chunk is occupied by a lot of small allocations. In
that case, each allocation has to memmove the rest of the area map. This
scenario is nontrivial to reproduce, so I provide data below for a
similar scenario where the second half of each page is filled.

  Area Map Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |       4118      |     4892
             16B    |       1651      |     1163
             64B    |        598      |      285
            256B    |        771      |      158
           1024B    |       3034      |      160

  Bitmap Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
              4B    |        481      |       67
             16B    |        506      |       69
             64B    |        636      |       75
            256B    |        892      |       90
           1024B    |       3262      |      147

This shows that in less ideal chunk states, the allocation path can fail
just like the freeing path can with non-ideal deallocation patterns.

Thanks,
Dennis

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      reply	other threads:[~2017-07-24 21:15 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-16  2:23 [PATCH 00/10] percpu: replace percpu area map allocator with bitmap allocator Dennis Zhou
2017-07-16  2:23 ` [PATCH 01/10] percpu: pcpu-stats change void buffer to int buffer Dennis Zhou
2017-07-17 14:44   ` Tejun Heo
2017-07-16  2:23 ` [PATCH 02/10] percpu: change the format for percpu_stats output Dennis Zhou
2017-07-17 14:46   ` Tejun Heo
2017-07-16  2:23 ` [PATCH 03/10] percpu: expose pcpu_nr_empty_pop_pages in pcpu_stats Dennis Zhou
2017-07-17 14:47   ` Tejun Heo
2017-07-16  2:23 ` [PATCH 04/10] percpu: update the header comment and pcpu_build_alloc_info comments Dennis Zhou
2017-07-17 14:53   ` Tejun Heo
2017-07-16  2:23 ` [PATCH 05/10] percpu: change reserved_size to end page aligned Dennis Zhou
2017-07-16  4:01   ` kbuild test robot
2017-07-16  5:11   ` kbuild test robot
2017-07-17 16:46   ` Tejun Heo
2017-07-17 19:10     ` Dennis Zhou
2017-07-24 20:04     ` Dennis Zhou
2017-07-16  2:23 ` [PATCH 06/10] percpu: modify base_addr to be region specific Dennis Zhou
2017-07-17 18:57   ` Tejun Heo
2017-07-18 19:26   ` Josef Bacik
2017-07-18 19:36     ` Matthew Wilcox
2017-07-19 14:20       ` Josef Bacik
2017-07-16  2:23 ` [PATCH 07/10] percpu: fix misnomer in schunk/dchunk variable names Dennis Zhou
2017-07-17 19:10   ` Tejun Heo
2017-07-24 20:07     ` Dennis Zhou
2017-07-16  2:23 ` [PATCH 08/10] percpu: change the number of pages marked in the first_chunk bitmaps Dennis Zhou
2017-07-17 19:26   ` Tejun Heo
2017-07-24 20:13     ` Dennis Zhou
2017-07-16  2:23 ` [PATCH 09/10] percpu: replace area map allocator with bitmap allocator Dennis Zhou
2017-07-17 23:27   ` Tejun Heo
2017-07-24 21:37     ` Dennis Zhou
2017-07-19 19:11   ` Josef Bacik
2017-07-19 22:19     ` Dennis Zhou
2017-07-19 19:16   ` Josef Bacik
2017-07-19 22:13     ` Dennis Zhou
2017-07-16  2:23 ` [PATCH 10/10] percpu: add optimizations on allocation path for the " Dennis Zhou
2017-07-17 23:32   ` Tejun Heo
2017-07-18 19:15 ` [PATCH 00/10] percpu: replace percpu area map allocator with " Josef Bacik
2017-07-24 21:14   ` Dennis Zhou [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170724211440.GD91613@dennisz-mbp.dhcp.thefacebook.com \
    --to=dennisz@fb.com \
    --cc=cl@linux.com \
    --cc=dennisszhou@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).