From: Uladzislau Rezki <urezki@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Nicholas Piggin <npiggin@gmail.com>,
urezki@gmail.com
Subject: Re: [PATCH 2/2] mm/vmalloc: Use kvmalloc to allocate the table of pages
Date: Wed, 24 Mar 2021 19:41:19 +0100 [thread overview]
Message-ID: <20210324184119.GA25852@pc638.lan> (raw)
In-Reply-To: <20210323203924.GA52881@pc638.lan>
On Tue, Mar 23, 2021 at 09:39:24PM +0100, Uladzislau Rezki wrote:
> > On Tue, Mar 23, 2021 at 01:04:36PM +0100, Uladzislau Rezki wrote:
> > > On Mon, Mar 22, 2021 at 11:03:11PM +0000, Matthew Wilcox wrote:
> > > > I suspect the vast majority of the time is spent calling alloc_pages_node()
> > > > 1024 times. Have you looked at Mel's patch to do ... well, exactly what
> > > > vmalloc() wants?
> > > >
> > > <snip>
> > > - __vmalloc_node_range
> > > - 45.25% __alloc_pages_nodemask
> > > - 37.59% get_page_from_freelist
> > [...]
> > > - 44.61% 0xffffffffc047348d
> > > - __vunmap
> > > - 35.56% free_unref_page
> >
> > Hmm! I hadn't been thinking about the free side of things.
> > Does this make a difference?
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 4f5f8c907897..61d5b769fea0 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2277,16 +2277,8 @@ static void __vunmap(const void *addr, int deallocate_pages)
> > vm_remove_mappings(area, deallocate_pages);
> >
> > if (deallocate_pages) {
> > - int i;
> > -
> > - for (i = 0; i < area->nr_pages; i++) {
> > - struct page *page = area->pages[i];
> > -
> > - BUG_ON(!page);
> > - __free_pages(page, 0);
> > - }
> > + release_pages(area->pages, area->nr_pages);
> > atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
> > -
> > kvfree(area->pages);
> > }
> >
> Same test. 4MB allocation on a single CPU:
>
> default: loops: 1000000 avg: 93601889 usec
> patch: loops: 1000000 avg: 98217904 usec
>
> <snip default>
> - __vunmap
> - 41.17% free_unref_page
> - 28.42% free_pcppages_bulk
> - 6.38% __mod_zone_page_state
> 4.79% check_preemption_disabled
> 2.63% __list_del_entry_valid
> 2.63% __list_add_valid
> - 7.50% free_unref_page_commit
> 2.15% check_preemption_disabled
> 2.01% __list_add_valid
> 2.31% free_unref_page_prepare.part.86
> 0.70% free_pcp_prepare
> <snip default>
>
> <snip patch>
> - __vunmap
> - 45.36% release_pages
> - 37.70% free_unref_page_list
> - 24.70% free_pcppages_bulk
> - 5.42% __mod_zone_page_state
> 4.23% check_preemption_disabled
> 2.31% __list_add_valid
> 2.07% __list_del_entry_valid
> - 7.58% free_unref_page_commit
> 2.47% check_preemption_disabled
> 1.75% __list_add_valid
> 3.43% free_unref_page_prepare.part.86
> - 2.39% mem_cgroup_uncharge_list
> uncharge_page
> <snip patch>
>
> It is obvious that the default version is slightly better. It requires
> less things to be done comparing with release_pages() variant.
>
> >
> > release_pages does a bunch of checks that are unnecessary ... we could
> > probably just do:
> >
> > LIST_HEAD(pages_to_free);
> >
> > for (i = 0; i < area->nr_pages; i++) {
> > struct page *page = area->pages[i];
> > if (put_page_testzero(page))
> > list_add(&page->lru, &pages_to_free);
> > }
> > free_unref_page_list(&pages_to_free);
> >
> > but let's see if the provided interface gets us the performance we want.
> >
> I will test it tomorrow. From the first glance it looks like a more light version :)
>
Here we go:
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4f5f8c907897..349024768ba6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2254,6 +2254,7 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
static void __vunmap(const void *addr, int deallocate_pages)
{
struct vm_struct *area;
+ LIST_HEAD(pages_to_free);
if (!addr)
return;
@@ -2282,11 +2283,12 @@ static void __vunmap(const void *addr, int deallocate_pages)
for (i = 0; i < area->nr_pages; i++) {
struct page *page = area->pages[i];
- BUG_ON(!page);
- __free_pages(page, 0);
+ if (put_page_testzero(page))
+ list_add(&page->lru, &pages_to_free);
}
- atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
+ free_unref_page_list(&pages_to_free);
+ atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
kvfree(area->pages);
}
<snip>
# patch
4MB allocation, single cpu, loops: 1000000 avg: 89065758 usec
4MB allocation, single cpu, loops: 1000000 avg: 90258523 usec
4MB allocation, single cpu, loops: 1000000 avg: 89363057 usec
4MB allocation, single cpu, loops: 1000000 avg: 89271685 usec
4MB allocation, single cpu, loops: 1000000 avg: 89247375 usec
# default
4MB allocation, single cpu, loops: 1000000 avg: 89258814 usec
4MB allocation, single cpu, loops: 1000000 avg: 89364194 usec
4MB allocation, single cpu, loops: 1000000 avg: 89226816 usec
4MB allocation, single cpu, loops: 1000000 avg: 89247360 usec
4MB allocation, single cpu, loops: 1000000 avg: 89330116 usec
Do not see any difference.
See below some profiling regarding cache misses:
<snip>
- __vunmap
- 32.15% free_unref_page_list
- 23.54% free_pcppages_bulk
- 6.33% __mod_zone_page_state
4.65% check_preemption_disabled
<snip>
free_unref_page_list():
│ free_unref_page_list():
│ffffffff8125152a: mov 0x8(%rbp),%rax
31.81 │ffffffff8125152e: lea 0x8(%rbp),%r12
│ffffffff81251532: mov %rbp,%r14
14.40 │ffffffff81251535: lea -0x8(%rax),%rbp
(gdb) l *0xffffffff8125152e
0xffffffff8125152e is in free_unref_page_list (mm/page_alloc.c:3271).
3266 struct page *page, *next;
3267 unsigned long flags, pfn;
3268 int batch_count = 0;
3269
3270 /* Prepare pages for freeing */
3271 list_for_each_entry_safe(page, next, list, lru) {
3272 pfn = page_to_pfn(page);
3273 if (!free_unref_page_prepare(page, pfn))
3274 list_del(&page->lru);
3275 set_page_private(page, pfn);
(gdb)
free_pcppages_bulk():
│ PageBuddy():
0.59 │ffffffff8124f523: mov 0x30(%rax),%edi
13.59 │ffffffff8124f526: and $0xf0000080,%edi
(gdb) l *0xffffffff8124f526
0xffffffff8124f526 is in free_pcppages_bulk (./include/linux/page-flags.h:742).
737
738 /*
739 * PageBuddy() indicates that the page is free and in the buddy system
740 * (see mm/page_alloc.c).
741 */
742 PAGE_TYPE_OPS(Buddy, buddy)
743
744 /*
745 * PageOffline() indicates that the page is logically offline although the
746 * containing section is online. (e.g. inflated in a balloon driver or
(gdb)
Looks like it would be good to have a free_pages_bulk_array() :)
--
Vlad Rezki
prev parent reply other threads:[~2021-03-24 18:42 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-22 19:38 [PATCH 1/2] mm/util: Add kvmalloc_node_caller Matthew Wilcox (Oracle)
2021-03-22 19:38 ` [PATCH 2/2] mm/vmalloc: Use kvmalloc to allocate the table of pages Matthew Wilcox (Oracle)
2021-03-22 22:36 ` Uladzislau Rezki
2021-03-22 23:03 ` Matthew Wilcox
2021-03-23 12:04 ` Uladzislau Rezki
2021-03-23 12:39 ` Matthew Wilcox
2021-03-23 13:39 ` Uladzislau Rezki
2021-03-23 14:07 ` Matthew Wilcox
2021-03-23 20:49 ` Uladzislau Rezki
2021-03-23 20:39 ` Uladzislau Rezki
2021-03-24 18:41 ` Uladzislau Rezki [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210324184119.GA25852@pc638.lan \
--to=urezki@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@gmail.com \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.