* [RFC] put page to pcp->lists[] tail if it is not on the same node
@ 2018-10-19  4:33 Wei Yang
  2018-10-19  8:38 ` Mel Gorman
  2018-10-19 13:43 ` Vlastimil Babka
  0 siblings, 2 replies; 10+ messages in thread
From: Wei Yang @ 2018-10-19  4:33 UTC (permalink / raw)
  To: willy, mhocko, mgorman; +Cc: richard.weiyang, linux-mm, akpm

node
Reply-To: Wei Yang <richard.weiyang@gmail.com>

Masters,

While reading the code, I came up with this idea.

    If we put some intelligence about the NUMA node into pcp->lists[], we may
    get better performance.

The idea is simple:

    Put pages from other nodes at the tail of pcp->lists[], because we
    allocate from the head and free from the tail.
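
A minimal userspace sketch of this ordering policy may make it easier to
discuss. It is not the kernel code: the list helpers are simplified
stand-ins for the kernel's list_add() and list_add_tail(), and the `nid`
field, free_to_list() and page_of() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical page: only the fields this sketch needs. */
struct page {
	int nid;                /* node the page belongs to (assumed field) */
	struct list_head lru;
};

static void init_list(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry between two neighbouring list nodes. */
static void insert_between(struct list_head *entry,
			   struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert right after the head, like the kernel's list_add(). */
static void list_add(struct list_head *entry, struct list_head *head)
{
	insert_between(entry, head, head->next);
}

/* Insert right before the head, like the kernel's list_add_tail(). */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	insert_between(entry, head->prev, head);
}

/* Proposed policy: local pages go to the head, remote pages to the tail. */
static void free_to_list(struct page *page, struct list_head *head, int cpu_nid)
{
	assert(page && head);
	if (page->nid == cpu_nid)
		list_add(&page->lru, head);
	else
		list_add_tail(&page->lru, head);
}

/* Map a list node back to its page; NULL if the node is the list head. */
static struct page *page_of(struct list_head *node, struct list_head *head)
{
	if (node == head)
		return NULL;
	return (struct page *)((char *)node - offsetof(struct page, lru));
}
```

With this layout, page_of(head.next, &head) is the next allocation
candidate and page_of(head.prev, &head) is the first page to be drained.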

Since my desktop just has one numa node, I couldn't test the effect. I
just ran a kernel build test to see whether it would degrade the current kernel.
The result does not look bad.

                    make -j4 bzImage
           base-line:
           
           real    6m15.947s        
           user    21m14.481s       
           sys     2m34.407s        
           
           real    6m16.089s        
           user    21m18.295s       
           sys     2m35.551s        
           
           real    6m16.239s        
           user    21m17.590s       
           sys     2m35.252s        
           
           patched:
           
           real    6m14.558s
           user    21m18.374s
           sys     2m33.143s
           
           real    6m14.606s
           user    21m14.969s
           sys     2m32.039s
           
           real    6m15.264s
           user    21m16.698s
           sys     2m33.024s

Sorry for sending this without a real justification. Hope this will not
make you uncomfortable. I would be very glad if you suggest some
verifications that I could do.

Below is my testing patch; I look forward to your comments.

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-19  4:33 [RFC] put page to pcp->lists[] tail if it is not on the same node Wei Yang
@ 2018-10-19  8:38 ` Mel Gorman
  2018-10-20  0:54   ` Wei Yang
  2018-10-20 16:33   ` Wei Yang
  2018-10-19 13:43 ` Vlastimil Babka
  1 sibling, 2 replies; 10+ messages in thread
From: Mel Gorman @ 2018-10-19  8:38 UTC (permalink / raw)
  To: Wei Yang; +Cc: willy, mhocko, linux-mm, akpm

On Fri, Oct 19, 2018 at 04:33:03AM +0000, Wei Yang wrote:
> node
> Reply-To: Wei Yang <richard.weiyang@gmail.com>
> 
> Masters,
> 
> While reading the code, I came up with this idea.
> 
>     If we put some intelligence about the NUMA node into pcp->lists[], we may
>     get better performance.
> 

Why?

> The idea is simple:
> 
>     Put pages from other nodes at the tail of pcp->lists[], because we
>     allocate from the head and free from the tail.
> 

Pages from remote nodes are not placed on local lists. Even in the slab
context, such objects are placed on alien caches which have special
handling.

> Since my desktop just has one numa node, I couldn't test the effect.

I suspect it would eventually cause a crash or at least weirdness as the
page zone ids would not match due to different nodes.

> Sorry for sending this without a real justification. Hope this will not
> make you uncomfortable. I would be very glad if you suggest some
> verifications that I could do.
> 
> Below is my testing patch; I look forward to your comments.
> 

I commend you trying to understand how the page allocator works but I
suggest you take a step back, pick a workload that is of interest and
profile it to see where hot spots are that may pinpoint where an
improvement can be made.

-- 
Mel Gorman
SUSE Labs

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-19  4:33 [RFC] put page to pcp->lists[] tail if it is not on the same node Wei Yang
  2018-10-19  8:38 ` Mel Gorman
@ 2018-10-19 13:43 ` Vlastimil Babka
  2018-10-20  1:38   ` Wei Yang
  2018-10-20 16:10   ` Wei Yang
  1 sibling, 2 replies; 10+ messages in thread
From: Vlastimil Babka @ 2018-10-19 13:43 UTC (permalink / raw)
  To: Wei Yang, willy, mhocko, mgorman; +Cc: linux-mm, akpm

On 10/19/18 6:33 AM, Wei Yang wrote:
> @@ -2763,7 +2764,14 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>  	}
>  
>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
> -	list_add(&page->lru, &pcp->lists[migratetype]);

My impression is that you think there's only one pcp per cpu. But the
"pcp" here is already specific to the zone (and thus node) of the page
being freed. So it doesn't matter whether we put the page at the head or
the tail of the list. For allocation we already typically prefer local nodes, thus local
zones, thus pcp's containing only local pages.
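
As a rough illustration of the paragraph above, here is a userspace
sketch under simplifying assumptions: the structures are reduced to a
counter, and pcp_for_page() and free_page_on_cpu() are hypothetical
helpers, not kernel functions. It shows only that the pcp is selected
through the page's zone first and the CPU second.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct per_cpu_pages {
	int count;              /* pages currently cached on this pcp */
};

struct zone {
	int node;               /* node this zone belongs to */
	struct per_cpu_pages pageset[NR_CPUS];  /* one pcp per CPU, per zone */
};

struct page {
	struct zone *zone;      /* zone (and thus node) the page comes from */
};

/* Pick the pcp by the page's zone first, then by the CPU doing the free. */
static struct per_cpu_pages *pcp_for_page(struct page *page, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return &page->zone->pageset[cpu];
}

/* Free a page on the given CPU; it can only land on its own zone's pcp. */
static void free_page_on_cpu(struct page *page, int cpu)
{
	pcp_for_page(page, cpu)->count++;
}
```

In this model, a page from node 1 freed on a CPU of node 0 ends up on
node 1's zone pcp for that CPU, never on a list shared with node 0 pages.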

> +	/*
> +	 * If the page has the same node_id as this cpu, put the page at head.
> +	 * Otherwise, put at the end.
> +	 */
> +	if (page_node == pcp->node)

So this should in fact be always true due to what I explained above.

Otherwise I second the recommendation from Mel.

Cheers,
Vlastimil

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-19  8:38 ` Mel Gorman
@ 2018-10-20  0:54   ` Wei Yang
  2018-10-20 16:33   ` Wei Yang
  1 sibling, 0 replies; 10+ messages in thread
From: Wei Yang @ 2018-10-20  0:54 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Wei Yang, willy, mhocko, linux-mm, akpm

On Fri, Oct 19, 2018 at 09:38:18AM +0100, Mel Gorman wrote:
>On Fri, Oct 19, 2018 at 04:33:03AM +0000, Wei Yang wrote:
>> node
>> Reply-To: Wei Yang <richard.weiyang@gmail.com>
>> 
>> Masters,
>> 
>> While reading the code, I came up with this idea.
>> 
>>     If we put some intelligence about the NUMA node into pcp->lists[], we may
>>     get better performance.
>> 
>
>Why?
>
>> The idea is simple:
>> 
>>     Put pages from other nodes at the tail of pcp->lists[], because we
>>     allocate from the head and free from the tail.
>> 
>
>Pages from remote nodes are not placed on local lists. Even in the slab
>context, such objects are placed on alien caches which have special
>handling.
>

Hmm... ok, I need to read the code again.

>> Since my desktop just has one numa node, I couldn't test the effect.
>
>I suspect it would eventually cause a crash or at least weirdness as the
>page zone ids would not match due to different nodes.
>
>> Sorry for sending this without a real justification. Hope this will not
>> make you uncomfortable. I would be very glad if you suggest some
>> verifications that I could do.
>> 
>> Below is my testing patch; I look forward to your comments.
>> 
>
>I commend you trying to understand how the page allocator works but I
>suggest you take a step back, pick a workload that is of interest and
>profile it to see where hot spots are that may pinpoint where an
>improvement can be made.
>

Thanks for your words.

>-- 
>Mel Gorman
>SUSE Labs

-- 
Wei Yang
Help you, Help me

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-19 13:43 ` Vlastimil Babka
@ 2018-10-20  1:38   ` Wei Yang
  2018-10-20 16:10   ` Wei Yang
  1 sibling, 0 replies; 10+ messages in thread
From: Wei Yang @ 2018-10-20  1:38 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Wei Yang, willy, mhocko, mgorman, linux-mm, akpm

On Fri, Oct 19, 2018 at 03:43:29PM +0200, Vlastimil Babka wrote:
>On 10/19/18 6:33 AM, Wei Yang wrote:
>> @@ -2763,7 +2764,14 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>>  	}
>>  
>>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
>> -	list_add(&page->lru, &pcp->lists[migratetype]);
>
>My impression is that you think there's only one pcp per cpu. But the
>"pcp" here is already specific to the zone (and thus node) of the page
>being freed. So it doesn't matter whether we put the page at the head or
>the tail of the list. For allocation we already typically prefer local nodes, thus local
>zones, thus pcp's containing only local pages.
>

Your guess is right. :-)

I took a look at the code:

    zone->pageset = alloc_percpu(struct per_cpu_pageset);

Each zone has its own pageset.

This means only a portion of the pageset is used on a multi-node
system, since a CPU belongs to just one node. Could we allocate just
this part or initialize just this part? Maybe it is too small to polish.

Well, I am lost on when we would allocate a page from a remote node. Let me
try to understand :-)

>> +	/*
>> +	 * If the page has the same node_id as this cpu, put the page at head.
>> +	 * Otherwise, put at the end.
>> +	 */
>> +	if (page_node == pcp->node)
>
>So this should in fact be always true due to what I explained above.
>
>Otherwise I second the recommendation from Mel.
>

Sure, I have to say you are right.

BTW, is there another channel, less formal than the mailing list, for
raising questions or discussions? Reading the code alone is not that
exciting, and sometimes when I have an idea or get confused, I would really
like to chat with someone to understand why it is so.

The mailing list does not seem to be the proper channel; maybe IRC is a better way?

>Cheers,
>Vlastimil

-- 
Wei Yang
Help you, Help me

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-19 13:43 ` Vlastimil Babka
  2018-10-20  1:38   ` Wei Yang
@ 2018-10-20 16:10   ` Wei Yang
  1 sibling, 0 replies; 10+ messages in thread
From: Wei Yang @ 2018-10-20 16:10 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Wei Yang, willy, mhocko, mgorman, linux-mm, akpm

On Fri, Oct 19, 2018 at 03:43:29PM +0200, Vlastimil Babka wrote:
>On 10/19/18 6:33 AM, Wei Yang wrote:
>> @@ -2763,7 +2764,14 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
>>  	}
>>  
>>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
>> -	list_add(&page->lru, &pcp->lists[migratetype]);
>
>My impression is that you think there's only one pcp per cpu. But the
>"pcp" here is already specific to the zone (and thus node) of the page
>being freed. So it doesn't matter whether we put the page at the head or
>the tail of the list. For allocation we already typically prefer local nodes, thus local
>zones, thus pcp's containing only local pages.
>
>> +	/*
>> +	 * If the page has the same node_id as this cpu, put the page at head.
>> +	 * Otherwise, put at the end.
>> +	 */
>> +	if (page_node == pcp->node)
>
>So this should in fact be always true due to what I explained above.

Vlastimil,

After looking at the code, I got some new understanding of the pcp
pages, which may be a little different from yours.

Every zone has a per_cpu_pageset for each cpu, and the pages held in a
per_cpu_pageset are either all on the same node as this *cpu* or all on a
different node.

So this comparison (page_node == pcp->node) would be either always true or
always false for a particular per_cpu_pageset.
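
To make this concrete, here is a small sketch under stated assumptions:
a hypothetical two-node, four-CPU topology with one zone per node, and
made-up helper names (cpu_node, pageset_is_local, count_local_pagesets).
Since every page on a pageset comes from that pageset's zone, the outcome
of the comparison depends only on the (zone, cpu) pair, not on the page.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2
#define NR_CPUS  4

/* Hypothetical topology: CPUs 0-1 sit on node 0, CPUs 2-3 on node 1. */
static const int cpu_node[NR_CPUS] = { 0, 0, 1, 1 };

/*
 * All pages on the pageset of (zone_node, cpu) belong to zone_node, so the
 * per-page test "page_node == pcp->node" reduces to this per-pageset value.
 */
static bool pageset_is_local(int zone_node, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return zone_node == cpu_node[cpu];
}

/* Count the pagesets for which the comparison would always be true. */
static int count_local_pagesets(void)
{
	int local = 0;

	for (int node = 0; node < NR_NODES; node++)
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (pageset_is_local(node, cpu))
				local++;
	return local;
}
```

In this toy topology half of the pagesets would always take the head
branch and the other half would always take the tail branch, so the order
within any single list would not change relative to a fixed policy.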

Well, one thing is for sure: putting a page at the tail will not improve
locality.

>
>Otherwise I second the recommendation from Mel.
>
>Cheers,
>Vlastimil

-- 
Wei Yang
Help you, Help me

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-19  8:38 ` Mel Gorman
  2018-10-20  0:54   ` Wei Yang
@ 2018-10-20 16:33   ` Wei Yang
  2018-10-21  2:36     ` Wei Yang
  2018-10-21 12:12     ` Mel Gorman
  1 sibling, 2 replies; 10+ messages in thread
From: Wei Yang @ 2018-10-20 16:33 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Wei Yang, willy, mhocko, linux-mm, akpm

On Fri, Oct 19, 2018 at 09:38:18AM +0100, Mel Gorman wrote:
>On Fri, Oct 19, 2018 at 04:33:03AM +0000, Wei Yang wrote:
>> node
>> Reply-To: Wei Yang <richard.weiyang@gmail.com>
>> 
>> Masters,
>> 
>> While reading the code, I came up with this idea.
>> 
>>     If we put some intelligence about the NUMA node into pcp->lists[], we may
>>     get better performance.
>> 
>
>Why?
>
>> The idea is simple:
>> 
>>     Put pages from other nodes at the tail of pcp->lists[], because we
>>     allocate from the head and free from the tail.
>> 
>
>Pages from remote nodes are not placed on local lists. Even in the slab
>context, such objects are placed on alien caches which have special
>handling.
>

Hmm... I am not sure I get your point correctly.

As I mentioned in the reply to Vlastimil, every zone has a
per_cpu_pageset for each cpu. The per_cpu_pagesets of one zone will
only contain pages from this zone. This means some per_cpu_pagesets
hold pages with the same node id as their cpu, while others do
not.

I don't get your point about the slab context. Does slab use a different list
instead of pcp->lists[]? If you could give me a hint, I may be able to catch up.

>> Since my desktop just has one numa node, I couldn't test the effect.
>
>I suspect it would eventually cause a crash or at least weirdness as the
>page zone ids would not match due to different nodes.
>

If my analysis is correct, there are only two possible relationships between
the node_id of the pages in a pcp and the pcp's node_id: either the same or
not.

Let me try with a QEMU-emulated NUMA system. :-)

>> Sorry for sending this without a real justification. Hope this will not
>> make you uncomfortable. I would be very glad if you suggest some
>> verifications that I could do.
>> 
>> Below is my testing patch; I look forward to your comments.
>> 
>
>I commend you trying to understand how the page allocator works but I
>suggest you take a step back, pick a workload that is of interest and
>profile it to see where hot spots are that may pinpoint where an
>improvement can be made.
>
>-- 
>Mel Gorman
>SUSE Labs

-- 
Wei Yang
Help you, Help me

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-20 16:33   ` Wei Yang
@ 2018-10-21  2:36     ` Wei Yang
  2018-10-21 12:12     ` Mel Gorman
  1 sibling, 0 replies; 10+ messages in thread
From: Wei Yang @ 2018-10-21  2:36 UTC (permalink / raw)
  To: Wei Yang; +Cc: Mel Gorman, willy, mhocko, linux-mm, akpm

On Sat, Oct 20, 2018 at 04:33:18PM +0000, Wei Yang wrote:
>>
>>I suspect it would eventually cause a crash or at least weirdness as the
>>page zone ids would not match due to different nodes.
>>
>
>If my analysis is correct, there are only two possible relationships between
>the node_id of the pages in a pcp and the pcp's node_id: either the same or
>not.
>
>Let me try with a QEMU-emulated NUMA system. :-)
>

I just ran an emulated system with 4 NUMA nodes in QEMU; the kernel with
this change looks fine.

Nothing to be happy about, though; I just wanted to keep you informed.

-- 
Wei Yang
Help you, Help me

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-20 16:33   ` Wei Yang
  2018-10-21  2:36     ` Wei Yang
@ 2018-10-21 12:12     ` Mel Gorman
  2018-10-22  1:24       ` Wei Yang
  1 sibling, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2018-10-21 12:12 UTC (permalink / raw)
  To: Wei Yang; +Cc: willy, mhocko, linux-mm, akpm

On Sat, Oct 20, 2018 at 04:33:18PM +0000, Wei Yang wrote:
> >Pages from remote nodes are not placed on local lists. Even in the slab
> >context, such objects are placed on alien caches which have special
> >handling.
> >
> 
> Hmm... I am not sure I get your point correctly.
> 

The point is that one list should not contain a mix of pages belonging to
different nodes or zones or it'll result in unexpected behaviour. If you
are just shuffling the ordering of pages in the list, it needs justification
as to why that makes sense.

-- 
Mel Gorman
SUSE Labs

* Re: [RFC] put page to pcp->lists[] tail if it is not on the same node
  2018-10-21 12:12     ` Mel Gorman
@ 2018-10-22  1:24       ` Wei Yang
  0 siblings, 0 replies; 10+ messages in thread
From: Wei Yang @ 2018-10-22  1:24 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Wei Yang, willy, mhocko, linux-mm, akpm

On Sun, Oct 21, 2018 at 01:12:51PM +0100, Mel Gorman wrote:
>On Sat, Oct 20, 2018 at 04:33:18PM +0000, Wei Yang wrote:
>> >Pages from remote nodes are not placed on local lists. Even in the slab
>> >context, such objects are placed on alien caches which have special
>> >handling.
>> >
>> 
>> Hmm... I am not sure I get your point correctly.
>> 
>
>The point is that one list should not contain a mix of pages belonging to
>different nodes or zones or it'll result in unexpected behaviour. If you
>are just shuffling the ordering of pages in the list, it needs justification
>as to why that makes sense.
>

Yep, you are right :-)

>-- 
>Mel Gorman
>SUSE Labs

-- 
Wei Yang
Help you, Help me
