From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751753AbeCLOW6 (ORCPT <rfc822;w@1wt.eu>);
        Mon, 12 Mar 2018 10:22:58 -0400
Received: from mx2.suse.de ([195.135.220.15]:37285 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751280AbeCLOW4 (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 12 Mar 2018 10:22:56 -0400
Subject: Re: [PATCH v4 2/3] mm/free_pcppages_bulk: do not hold lock when
 picking pages to free
To: Aaron Lu <aaron.lu@intel.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
        Huang Ying <ying.huang@intel.com>, Dave Hansen <dave.hansen@intel.com>,
        Kemi Wang <kemi.wang@intel.com>, Tim Chen <tim.c.chen@linux.intel.com>,
        Andi Kleen <ak@linux.intel.com>, Michal Hocko <mhocko@suse.com>,
        Mel Gorman <mgorman@techsingularity.net>,
        Matthew Wilcox <willy@infradead.org>,
        David Rientjes <rientjes@google.com>
References: <20180301062845.26038-1-aaron.lu@intel.com>
 <20180301062845.26038-3-aaron.lu@intel.com>
From: Vlastimil Babka <vbabka@suse.cz>
Message-ID: <9cad642d-9fe5-b2c3-456c-279065c32337@suse.cz>
Date: Mon, 12 Mar 2018 15:22:53 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.6.0
MIME-Version: 1.0
In-Reply-To: <20180301062845.26038-3-aaron.lu@intel.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/01/2018 07:28 AM, Aaron Lu wrote:
> When freeing a batch of pages from Per-CPU-Pages(PCP) back to buddy,
> the zone->lock is held and then pages are chosen from PCP's migratetype
> list. While there is actually no need to do this 'choose part' under
> lock since it's PCP pages, the only CPU that can touch them is us and
> irq is also disabled.
> 
> Moving this part outside could reduce lock held time and improve
> performance. Test with will-it-scale/page_fault1 full load:
> 
> kernel      Broadwell(2S)  Skylake(2S)   Broadwell(4S)  Skylake(4S)
> v4.16-rc2+  9034215        7971818       13667135       15677465
> this patch  9536374 +5.6%  8314710 +4.3% 14070408 +3.0% 16675866 +6.4%
> 
> What the test does is: starts $nr_cpu processes and each will repeatedly
> do the following for 5 minutes:
> 1 mmap 128M anonymouse space;
> 2 write access to that space;
> 3 munmap.
> The score is the aggregated iteration.
> 
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c
> 
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---
>  mm/page_alloc.c | 39 +++++++++++++++++++++++----------------
>  1 file changed, 23 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index faa33eac1635..dafdcdec9c1f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1116,12 +1116,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  	int migratetype = 0;
>  	int batch_free = 0;
>  	bool isolated_pageblocks;
> -
> -	spin_lock(&zone->lock);
> -	isolated_pageblocks = has_isolate_pageblock(zone);
> +	struct page *page, *tmp;
> +	LIST_HEAD(head);
>  
>  	while (count) {
> -		struct page *page;
>  		struct list_head *list;
>  
>  		/*
> @@ -1143,27 +1141,36 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			batch_free = count;
>  
>  		do {
> -			int mt;	/* migratetype of the to-be-freed page */
> -
>  			page = list_last_entry(list, struct page, lru);
> -			/* must delete as __free_one_page list manipulates */
> +			/* must delete to avoid corrupting pcp list */
>  			list_del(&page->lru);

Well, since bulkfree_pcp_prepare() doesn't care about page->lru, you
could maybe use list_move_tail() instead of list_del() +
list_add_tail()? That avoids temporarily writing poison values.

Hm actually, you are reversing the list in the process, because page is
obtained by list_last_entry and you use list_add_tail. That could have
unintended performance consequences?

Also maybe list_cut_position() could be faster than shuffling pages one
by one? I guess not really, because batch_free will be generally low?

>  			pcp->count--;
>  
> -			mt = get_pcppage_migratetype(page);
> -			/* MIGRATE_ISOLATE page should not go to pcplists */
> -			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> -			/* Pageblock could have been isolated meanwhile */
> -			if (unlikely(isolated_pageblocks))
> -				mt = get_pageblock_migratetype(page);
> -
>  			if (bulkfree_pcp_prepare(page))
>  				continue;
>  
> -			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> -			trace_mm_page_pcpu_drain(page, 0, mt);
> +			list_add_tail(&page->lru, &head);
>  		} while (--count && --batch_free && !list_empty(list));
>  	}
> +
> +	spin_lock(&zone->lock);
> +	isolated_pageblocks = has_isolate_pageblock(zone);
> +
> +	/*
> +	 * Use safe version since after __free_one_page(),
> +	 * page->lru.next will not point to original list.
> +	 */
> +	list_for_each_entry_safe(page, tmp, &head, lru) {
> +		int mt = get_pcppage_migratetype(page);
> +		/* MIGRATE_ISOLATE page should not go to pcplists */
> +		VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> +		/* Pageblock could have been isolated meanwhile */
> +		if (unlikely(isolated_pageblocks))
> +			mt = get_pageblock_migratetype(page);
> +
> +		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> +		trace_mm_page_pcpu_drain(page, 0, mt);
> +	}
>  	spin_unlock(&zone->lock);
>  }
>  
>