All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Nicholas Piggin <npiggin@gmail.com>,
	urezki@gmail.com
Subject: Re: [PATCH 2/2] mm/vmalloc: Use kvmalloc to allocate the table of pages
Date: Wed, 24 Mar 2021 19:41:19 +0100	[thread overview]
Message-ID: <20210324184119.GA25852@pc638.lan> (raw)
In-Reply-To: <20210323203924.GA52881@pc638.lan>

On Tue, Mar 23, 2021 at 09:39:24PM +0100, Uladzislau Rezki wrote:
> > On Tue, Mar 23, 2021 at 01:04:36PM +0100, Uladzislau Rezki wrote:
> > > On Mon, Mar 22, 2021 at 11:03:11PM +0000, Matthew Wilcox wrote:
> > > > I suspect the vast majority of the time is spent calling alloc_pages_node()
> > > > 1024 times.  Have you looked at Mel's patch to do ... well, exactly what
> > > > vmalloc() wants?
> > > > 
> > > <snip>
> > >          - __vmalloc_node_range
> > >             - 45.25% __alloc_pages_nodemask
> > >                - 37.59% get_page_from_freelist
> > [...]
> > >       - 44.61% 0xffffffffc047348d
> > >          - __vunmap
> > >             - 35.56% free_unref_page
> > 
> > Hmm!  I hadn't been thinking about the free side of things.
> > Does this make a difference?
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 4f5f8c907897..61d5b769fea0 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2277,16 +2277,8 @@ static void __vunmap(const void *addr, int deallocate_pages)
> >  	vm_remove_mappings(area, deallocate_pages);
> >  
> >  	if (deallocate_pages) {
> > -		int i;
> > -
> > -		for (i = 0; i < area->nr_pages; i++) {
> > -			struct page *page = area->pages[i];
> > -
> > -			BUG_ON(!page);
> > -			__free_pages(page, 0);
> > -		}
> > +		release_pages(area->pages, area->nr_pages);
> >  		atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
> > -
> >  		kvfree(area->pages);
> >  	}
> >
> Same test. 4MB allocation on a single CPU:
> 
> default: loops: 1000000 avg: 93601889 usec
> patch:   loops: 1000000 avg: 98217904 usec
> 
> <snip default>
>             - __vunmap
>                - 41.17% free_unref_page
>                   - 28.42% free_pcppages_bulk
>                      - 6.38% __mod_zone_page_state
>                           4.79% check_preemption_disabled
>                        2.63% __list_del_entry_valid
>                        2.63% __list_add_valid
>                   - 7.50% free_unref_page_commit
>                        2.15% check_preemption_disabled
>                        2.01% __list_add_valid
>                     2.31% free_unref_page_prepare.part.86
>                     0.70% free_pcp_prepare
> <snip default>
> 
> <snip patch>
>         - __vunmap
>                - 45.36% release_pages
>                   - 37.70% free_unref_page_list
>                      - 24.70% free_pcppages_bulk
>                         - 5.42% __mod_zone_page_state
>                              4.23% check_preemption_disabled
>                           2.31% __list_add_valid
>                           2.07% __list_del_entry_valid
>                      - 7.58% free_unref_page_commit
>                           2.47% check_preemption_disabled
>                           1.75% __list_add_valid
>                        3.43% free_unref_page_prepare.part.86
>                   - 2.39% mem_cgroup_uncharge_list
>                        uncharge_page
> <snip patch>
> 
> It is obvious that the default version is slightly better. It requires
> less things to be done comparing with release_pages() variant.
> 
> > 
> > release_pages does a bunch of checks that are unnecessary ... we could
> > probably just do:
> > 
> > 		LIST_HEAD(pages_to_free);
> > 
> > 		for (i = 0; i < area->nr_pages; i++) {
> > 			struct page *page = area->pages[i];
> > 			if (put_page_testzero(page))
> > 				list_add(&page->lru, &pages_to_free);
> > 		}
> > 		free_unref_page_list(&pages_to_free);
> > 
> > but let's see if the provided interface gets us the performance we want.
> >  
> I will test it tomorrow. From the first glance it looks like a more light version :)
> 
Here we go:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4f5f8c907897..349024768ba6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2254,6 +2254,7 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 static void __vunmap(const void *addr, int deallocate_pages)
 {
        struct vm_struct *area;
+       LIST_HEAD(pages_to_free);
 
        if (!addr)
                return;
@@ -2282,11 +2283,12 @@ static void __vunmap(const void *addr, int deallocate_pages)
                for (i = 0; i < area->nr_pages; i++) {
                        struct page *page = area->pages[i];
 
-                       BUG_ON(!page);
-                       __free_pages(page, 0);
+                       if (put_page_testzero(page))
+                               list_add(&page->lru, &pages_to_free);
                }
-               atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
 
+               free_unref_page_list(&pages_to_free);
+               atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
                kvfree(area->pages);
        }
<snip>

# patch
4MB allocation, single cpu, loops: 1000000 avg: 89065758 usec
4MB allocation, single cpu, loops: 1000000 avg: 90258523 usec
4MB allocation, single cpu, loops: 1000000 avg: 89363057 usec
4MB allocation, single cpu, loops: 1000000 avg: 89271685 usec
4MB allocation, single cpu, loops: 1000000 avg: 89247375 usec

# default
4MB allocation, single cpu, loops: 1000000 avg: 89258814 usec
4MB allocation, single cpu, loops: 1000000 avg: 89364194 usec
4MB allocation, single cpu, loops: 1000000 avg: 89226816 usec
4MB allocation, single cpu, loops: 1000000 avg: 89247360 usec
4MB allocation, single cpu, loops: 1000000 avg: 89330116 usec

Do not see any difference.

See below some profiling regarding cache misses:

<snip>
         - __vunmap
            - 32.15% free_unref_page_list                                                                                                                                               
               - 23.54% free_pcppages_bulk
                  - 6.33% __mod_zone_page_state
                       4.65% check_preemption_disabled
<snip>

free_unref_page_list():
       │        free_unref_page_list():
       │ffffffff8125152a:   mov    0x8(%rbp),%rax
 31.81 │ffffffff8125152e:   lea    0x8(%rbp),%r12                                                                                                                                        
       │ffffffff81251532:   mov    %rbp,%r14
 14.40 │ffffffff81251535:   lea    -0x8(%rax),%rbp  


(gdb) l *0xffffffff8125152e
0xffffffff8125152e is in free_unref_page_list (mm/page_alloc.c:3271).
3266            struct page *page, *next;
3267            unsigned long flags, pfn;
3268            int batch_count = 0;
3269
3270            /* Prepare pages for freeing */
3271            list_for_each_entry_safe(page, next, list, lru) {
3272                    pfn = page_to_pfn(page);
3273                    if (!free_unref_page_prepare(page, pfn))
3274                            list_del(&page->lru);
3275                    set_page_private(page, pfn);
(gdb)

free_pcppages_bulk():
       │        PageBuddy():
  0.59 │ffffffff8124f523:   mov        0x30(%rax),%edi
 13.59 │ffffffff8124f526:   and        $0xf0000080,%edi    

(gdb) l *0xffffffff8124f526
0xffffffff8124f526 is in free_pcppages_bulk (./include/linux/page-flags.h:742).
737
738     /*
739      * PageBuddy() indicates that the page is free and in the buddy system
740      * (see mm/page_alloc.c).
741      */
742     PAGE_TYPE_OPS(Buddy, buddy)
743
744     /*
745      * PageOffline() indicates that the page is logically offline although the
746      * containing section is online. (e.g. inflated in a balloon driver or
(gdb)

Looks like it would be good to have a free_pages_bulk_array() :)

--
Vlad Rezki

      reply	other threads:[~2021-03-24 18:42 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-22 19:38 [PATCH 1/2] mm/util: Add kvmalloc_node_caller Matthew Wilcox (Oracle)
2021-03-22 19:38 ` [PATCH 2/2] mm/vmalloc: Use kvmalloc to allocate the table of pages Matthew Wilcox (Oracle)
2021-03-22 22:36   ` Uladzislau Rezki
2021-03-22 23:03     ` Matthew Wilcox
2021-03-23 12:04       ` Uladzislau Rezki
2021-03-23 12:39         ` Matthew Wilcox
2021-03-23 13:39           ` Uladzislau Rezki
2021-03-23 14:07             ` Matthew Wilcox
2021-03-23 20:49               ` Uladzislau Rezki
2021-03-23 20:39           ` Uladzislau Rezki
2021-03-24 18:41             ` Uladzislau Rezki [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210324184119.GA25852@pc638.lan \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@gmail.com \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.