From: Josef Bacik <josef@toxicpanda.com>
To: Dennis Zhou <dennisz@fb.com>
Cc: Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@linux.com>,
kernel-team@fb.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Dennis Zhou <dennisszhou@gmail.com>
Subject: Re: [PATCH 09/10] percpu: replace area map allocator with bitmap allocator
Date: Wed, 19 Jul 2017 19:11:06 +0000 [thread overview]
Message-ID: <20170719191105.GC23135@li70-116.members.linode.com> (raw)
In-Reply-To: <20170716022315.19892-10-dennisz@fb.com>
On Sat, Jul 15, 2017 at 10:23:14PM -0400, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com>
>
> The percpu memory allocator is experiencing scalability issues when
> allocating and freeing large numbers of counters as in BPF.
> Additionally, there is a corner case where iteration is triggered over
> all chunks if the contig_hint is the right size, but wrong alignment.
>
> Implementation:
> This patch removes the area map allocator in favor of a bitmap allocator
> backed by metadata blocks. The primary goal is to provide consistency
> in performance and memory footprint with a focus on small allocations
> (< 64 bytes). The bitmap removes the heavy memmove from the freeing
> critical path and provides a consistent memory footprint. The metadata
> blocks provide a bound on the amount of scanning required by maintaining
> a set of hints.
>
> The chunks previously were managed by free_size, a value maintained in
> bytes. Now, the chunks are managed in terms of bits, which is just a
> scaled value of free_size down by PCPU_MIN_ALLOC_SIZE.
>
> There is one caveat with this implementation. In an effort to make
> freeing fast, the only time metadata is updated on the free path is if a
> whole block becomes free or the freed area spans across metadata blocks.
> This causes the chunka??s contig_hint to be potentially smaller than what
> it could allocate by up to a block. If the chunka??s contig_hint is
> smaller than a block, a check occurs and the hint is kept accurate.
> Metadata is always kept accurate on allocation and therefore the
> situation where a chunk has a larger contig_hint than available will
> never occur.
>
> Evaluation:
> I have primarily done testing against a simple workload of allocation of
> 1 million objects of varying size. Deallocation was done by in order,
> alternating, and in reverse. These numbers were collected after rebasing
> ontop of a80099a152. I present the worst-case numbers here:
>
> Area Map Allocator:
>
> Object Size | Alloc Time (ms) | Free Time (ms)
> ----------------------------------------------
> 4B | 335 | 4960
> 16B | 485 | 1150
> 64B | 445 | 280
> 128B | 505 | 177
> 1024B | 3385 | 140
>
> Bitmap Allocator:
>
> Object Size | Alloc Time (ms) | Free Time (ms)
> ----------------------------------------------
> 4B | 725 | 70
> 16B | 760 | 70
> 64B | 855 | 80
> 128B | 910 | 90
> 1024B | 3770 | 260
>
> This data demonstrates the inability for the area map allocator to
> handle less than ideal situations. In the best case of reverse
> deallocation, the area map allocator was able to perform within range
> of the bitmap allocator. In the worst case situation, freeing took
> nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator
> dramatically improves the consistency of the free path. The small
> allocations performed nearly identical regardless of the freeing
> pattern.
>
> While it does add to the allocation latency, the allocation scenario
> here is optimal for the area map allocator. The second problem of
> additional scanning can result in the area map allocator completing in
> 52 minutes. The same workload takes only 14 seconds to complete for the
> bitmap allocator. This was produced under a more contrived scenario of
> allocating 1 milion 4-byte objects with 8-byte alignment.
>
This was a bear to review, I feel like it could be split into a few smaller
pieces. You are changing hinting, allocating/freeing, and how you find chunks.
Those seem like good logical divisions to me. Overall the design seems sound
and I didn't spot any major problems. Once you've split them up I'll do another
thorough comb through and then add my reviewed by. Thanks,
Josef
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-07-19 19:11 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-16 2:23 [PATCH 00/10] percpu: replace percpu area map allocator with bitmap allocator Dennis Zhou
2017-07-16 2:23 ` [PATCH 01/10] percpu: pcpu-stats change void buffer to int buffer Dennis Zhou
2017-07-17 14:44 ` Tejun Heo
2017-07-16 2:23 ` [PATCH 02/10] percpu: change the format for percpu_stats output Dennis Zhou
2017-07-17 14:46 ` Tejun Heo
2017-07-16 2:23 ` [PATCH 03/10] percpu: expose pcpu_nr_empty_pop_pages in pcpu_stats Dennis Zhou
2017-07-17 14:47 ` Tejun Heo
2017-07-16 2:23 ` [PATCH 04/10] percpu: update the header comment and pcpu_build_alloc_info comments Dennis Zhou
2017-07-17 14:53 ` Tejun Heo
2017-07-16 2:23 ` [PATCH 05/10] percpu: change reserved_size to end page aligned Dennis Zhou
2017-07-16 4:01 ` kbuild test robot
2017-07-16 5:11 ` kbuild test robot
2017-07-17 16:46 ` Tejun Heo
2017-07-17 19:10 ` Dennis Zhou
2017-07-24 20:04 ` Dennis Zhou
2017-07-16 2:23 ` [PATCH 06/10] percpu: modify base_addr to be region specific Dennis Zhou
2017-07-17 18:57 ` Tejun Heo
2017-07-18 19:26 ` Josef Bacik
2017-07-18 19:36 ` Matthew Wilcox
2017-07-19 14:20 ` Josef Bacik
2017-07-16 2:23 ` [PATCH 07/10] percpu: fix misnomer in schunk/dchunk variable names Dennis Zhou
2017-07-17 19:10 ` Tejun Heo
2017-07-24 20:07 ` Dennis Zhou
2017-07-16 2:23 ` [PATCH 08/10] percpu: change the number of pages marked in the first_chunk bitmaps Dennis Zhou
2017-07-17 19:26 ` Tejun Heo
2017-07-24 20:13 ` Dennis Zhou
2017-07-16 2:23 ` [PATCH 09/10] percpu: replace area map allocator with bitmap allocator Dennis Zhou
2017-07-17 23:27 ` Tejun Heo
2017-07-24 21:37 ` Dennis Zhou
2017-07-19 19:11 ` Josef Bacik [this message]
2017-07-19 22:19 ` Dennis Zhou
2017-07-19 19:16 ` Josef Bacik
2017-07-19 22:13 ` Dennis Zhou
2017-07-16 2:23 ` [PATCH 10/10] percpu: add optimizations on allocation path for the " Dennis Zhou
2017-07-17 23:32 ` Tejun Heo
2017-07-18 19:15 ` [PATCH 00/10] percpu: replace percpu area map allocator with " Josef Bacik
2017-07-24 21:14 ` Dennis Zhou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170719191105.GC23135@li70-116.members.linode.com \
--to=josef@toxicpanda.com \
--cc=cl@linux.com \
--cc=dennisszhou@gmail.com \
--cc=dennisz@fb.com \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).