From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Subject: [PATCH 0/4] page_ext/page_owner init fixes
Date: Thu, 20 Jul 2017 15:40:25 +0200
Message-Id: <20170720134029.25268-1-vbabka@suse.cz>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Michal Hocko, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang, Vlastimil Babka

This is a followup to the mail thread [1] where we discussed some issues with
the current page_owner (and thus also page_ext) init code.

Patch 1 should be a straightforward optimization found during the process.
Patches 2 and 3 are preparation for patch 4, with the main issue described in
its commit log. It's an RFC because other solutions may be possible.

[1] http://marc.info/?l=linux-mm&m=149883233221147&w=2

Vlastimil Babka (4):
  mm, page_owner: make init_pages_in_zone() faster
  mm, page_ext: periodically reschedule during page_ext_init()
  mm, page_owner: don't grab zone->lock for init_pages_in_zone()
  mm, page_ext: move page_ext_init() after page_alloc_init_late()

 init/main.c     |  3 ++-
 mm/page_ext.c   |  5 ++---
 mm/page_owner.c | 35 ++++++++++++++++++++++++++++-------
 3 files changed, 32 insertions(+), 11 deletions(-)

--
2.13.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Subject: [PATCH 2/4] mm, page_ext: periodically reschedule during page_ext_init()
Date: Thu, 20 Jul 2017 15:40:27 +0200
Message-Id: <20170720134029.25268-3-vbabka@suse.cz>
In-Reply-To: <20170720134029.25268-1-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Michal Hocko, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang, Vlastimil Babka

page_ext_init() can take a long time on large machines, so add a
cond_resched() point after each section is processed. This will allow moving
the init to a later point at boot without triggering lockup reports.

Signed-off-by: Vlastimil Babka
---
 mm/page_ext.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 88ccc044b09a..24cf8abefc8d 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -409,6 +409,7 @@ void __init page_ext_init(void)
 				continue;
 			if (init_section_page_ext(pfn, nid))
 				goto oom;
+			cond_resched();
 		}
 	}
 	hotplug_memory_notifier(page_ext_callback, 0);
--
2.13.2
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Subject: [PATCH 3/4] mm, page_owner: don't grab zone->lock for init_pages_in_zone()
Date: Thu, 20 Jul 2017 15:40:28 +0200
Message-Id: <20170720134029.25268-4-vbabka@suse.cz>
In-Reply-To: <20170720134029.25268-1-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Michal Hocko, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang, Vlastimil Babka

init_pages_in_zone() is run under zone->lock, which means a long lock hold
time and disabled interrupts on large machines. This is currently not an
issue since it runs early in boot, but a later patch will change that.

However, like other pfn scanners, we don't actually need zone->lock even when
other cpus are running. The only potentially dangerous operation here is
reading a bogus buddy page order due to a race, and we already know how to
handle that. The worst that can happen is that we skip some early allocated
pages, which should not affect the debugging power of page_owner noticeably.
Signed-off-by: Vlastimil Babka
---
 mm/page_owner.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5aa21ca237d9..cf6568d1dc14 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -567,11 +567,17 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				continue;

 			/*
-			 * We are safe to check buddy flag and order, because
-			 * this is init stage and only single thread runs.
+			 * To avoid having to grab zone->lock, be a little
+			 * careful when reading buddy page order. The only
+			 * danger is that we skip too much and potentially miss
+			 * some early allocated pages, which is better than
+			 * heavy lock contention.
 			 */
 			if (PageBuddy(page)) {
-				pfn += (1UL << page_order(page)) - 1;
+				unsigned long order = page_order_unsafe(page);
+
+				if (order > 0 && order < MAX_ORDER)
+					pfn += (1UL << order) - 1;
 				continue;
 			}

@@ -590,6 +596,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 			__set_page_owner_init(page_ext, init_handle);
 			count++;
 		}
+		cond_resched();
 	}

 	pr_info("Node %d, zone %8s: page owner found early allocated %lu pages\n",
@@ -600,15 +607,12 @@ static void init_zones_in_node(pg_data_t *pgdat)
 {
 	struct zone *zone;
 	struct zone *node_zones = pgdat->node_zones;
-	unsigned long flags;

 	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
 		if (!populated_zone(zone))
 			continue;

-		spin_lock_irqsave(&zone->lock, flags);
 		init_pages_in_zone(pgdat, zone);
-		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 }
--
2.13.2
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Subject: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
Date: Thu, 20 Jul 2017 15:40:26 +0200
Message-Id: <20170720134029.25268-2-vbabka@suse.cz>
In-Reply-To: <20170720134029.25268-1-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Michal Hocko, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang, Vlastimil Babka

In init_pages_in_zone() we currently use the generic set_page_owner()
function to initialize page_owner info for early allocated pages. This means
we needlessly do lookup_page_ext() twice for each page, and more importantly
save_stack(), which has to unwind the stack and find the corresponding stack
depot handle. Because the stack is always the same for the initialization,
unwind it once in init_pages_in_zone() and reuse the handle. Also avoid the
repeated lookup_page_ext().

This can significantly reduce boot times with page_owner=on on large
machines, especially for kernels built without frame pointers, where the
stack unwinding is noticeably slower.
Signed-off-by: Vlastimil Babka
---
 mm/page_owner.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 401feb070335..5aa21ca237d9 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -183,6 +183,20 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
 	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 }

+static void __set_page_owner_init(struct page_ext *page_ext,
+					depot_stack_handle_t handle)
+{
+	struct page_owner *page_owner;
+
+	page_owner = get_page_owner(page_ext);
+	page_owner->handle = handle;
+	page_owner->order = 0;
+	page_owner->gfp_mask = 0;
+	page_owner->last_migrate_reason = -1;
+
+	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+}
+
 void __set_page_owner_migrate_reason(struct page *page, int reason)
 {
 	struct page_ext *page_ext = lookup_page_ext(page);
@@ -520,10 +534,13 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 	unsigned long pfn = zone->zone_start_pfn, block_end_pfn;
 	unsigned long end_pfn = pfn + zone->spanned_pages;
 	unsigned long count = 0;
+	depot_stack_handle_t init_handle;

 	/* Scan block by block. First and last block may be incomplete */
 	pfn = zone->zone_start_pfn;

+	init_handle = save_stack(0);
+
 	/*
 	 * Walk the zone in pageblock_nr_pages steps. If a page block spans
 	 * a zone boundary, it will be double counted between zones. This does
@@ -570,7 +587,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				continue;

 			/* Found early allocated page */
-			set_page_owner(page, 0, 0);
+			__set_page_owner_init(page_ext, init_handle);
 			count++;
 		}
 	}
--
2.13.2
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Subject: [RFC PATCH 4/4] mm, page_ext: move page_ext_init() after page_alloc_init_late()
Date: Thu, 20 Jul 2017 15:40:29 +0200
Message-Id: <20170720134029.25268-5-vbabka@suse.cz>
In-Reply-To: <20170720134029.25268-1-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Michal Hocko, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang, Vlastimil Babka

Commit b8f1a75d61d8 ("mm: call page_ext_init() after all struct pages are
initialized") avoided a NULL pointer dereference due to
DEFERRED_STRUCT_PAGE_INIT clashing with page_ext, by calling page_ext_init()
only after the deferred struct page init has finished. Later, commit
fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") avoided the
underlying issue differently and moved the page_ext_init() call back to where
it was before.

However, there are two problems with the current code:

- On very large machines, page_ext_init() may fail to allocate the page_ext
  structures, because deferred struct page init hasn't yet started and the
  pre-inited part might be too small. This has been observed on a 3TB machine
  with page_owner=on.
  Although it was an older kernel, where page_owner hadn't yet been converted
  to stack depot and page_ext was therefore larger, the fundamental problem
  is still present in mainline.

- page_owner's init_pages_in_zone() is called before deferred struct page
  init has started, so it will encounter uninitialized struct pages. This
  currently happens to cause no harm, because the memmap array is pre-zeroed
  on allocation and thus the "if (page_zone(page) != zone)" check is
  negative, but that pre-zeroing guarantee might change soon.

The second problem could also be solved by limiting init_pages_in_zone() by
pgdat->first_deferred_pfn, but fixing the first issue would be more
problematic. So this patch again moves page_ext_init() to wait for deferred
struct page init to finish. This has some performance implications for boot
time, which should be acceptable when enabling debugging functionality. We
do, however, keep the benefits of parallel initialization (one kthread per
node), so it's better than e.g. disabling DEFERRED_STRUCT_PAGE_INIT
completely when page_ext is being used.

This effectively reverts commit fe53ca54270a757f0a28ee6bf3a54d952b550ed0.
Signed-off-by: Vlastimil Babka
---
 init/main.c   | 3 ++-
 mm/page_ext.c | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/init/main.c b/init/main.c
index f866510472d7..7b6517fe0980 100644
--- a/init/main.c
+++ b/init/main.c
@@ -628,7 +628,6 @@ asmlinkage __visible void __init start_kernel(void)
 		initrd_start = 0;
 	}
 #endif
-	page_ext_init();
 	debug_objects_mem_init();
 	kmemleak_init();
 	setup_per_cpu_pageset();
@@ -1035,6 +1034,8 @@ static noinline void __init kernel_init_freeable(void)
 	sched_init_smp();

 	page_alloc_init_late();
+	/* Initialize page ext after all struct pages are initialized */
+	page_ext_init();

 	do_basic_setup();

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 24cf8abefc8d..8522ebd784ac 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -402,10 +402,8 @@ void __init page_ext_init(void)
 		 * We know some arch can have a nodes layout such as
 		 * -------------pfn-------------->
 		 * N0 | N1 | N2 | N0 | N1 | N2|....
-		 *
-		 * Take into account DEFERRED_STRUCT_PAGE_INIT.
 		 */
-		if (early_pfn_to_nid(pfn) != nid)
+		if (pfn_to_nid(pfn) != nid)
 			continue;
 		if (init_section_page_ext(pfn, nid))
 			goto oom;
--
2.13.2

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
Date: Mon, 24 Jul 2017 14:38:43 +0200
Message-ID: <20170724123843.GH25221@dhcp22.suse.cz>
In-Reply-To: <20170720134029.25268-2-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On Thu 20-07-17 15:40:26, Vlastimil Babka wrote:
> In init_pages_in_zone() we currently use the generic set_page_owner() function
> to initialize page_owner info for early allocated pages. This means we
> needlessly do lookup_page_ext() twice for each page, and more importantly
> save_stack(), which has to unwind the stack and find the corresponding stack
> depot handle. Because the stack is always the same for the initialization,
> unwind it once in init_pages_in_zone() and reuse the handle. Also avoid the
> repeated lookup_page_ext().

Yes, this looks like an improvement, but I have to admit that I do not
really get why we even do save_stack() at all here. Those pages might have
been allocated from anywhere, so we could very well provide a statically
allocated "fake" stack trace, no? Memory allocated for the stackdepot
storage can be tracked inside depot_alloc_stack as well I guess (again
with a statically preallocated storage).

> This can significantly reduce boot times with page_owner=on on large machines,
> especially for kernels built without frame pointers, where the stack unwinding
> is noticeably slower.
Some numbers would be really nice here.

> Signed-off-by: Vlastimil Babka
> ---
>  mm/page_owner.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 401feb070335..5aa21ca237d9 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -183,6 +183,20 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
>  	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
>  }
>
> +static void __set_page_owner_init(struct page_ext *page_ext,
> +					depot_stack_handle_t handle)
> +{
> +	struct page_owner *page_owner;
> +
> +	page_owner = get_page_owner(page_ext);
> +	page_owner->handle = handle;
> +	page_owner->order = 0;
> +	page_owner->gfp_mask = 0;
> +	page_owner->last_migrate_reason = -1;
> +
> +	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
> +}

Do we need to duplicate a part of __set_page_owner? Can we pull both the
owner and the handle out of __set_page_owner?

-- 
Michal Hocko
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: [PATCH 2/4] mm, page_ext: periodically reschedule during page_ext_init()
Date: Mon, 24 Jul 2017 14:45:06 +0200
Message-ID: <20170724124505.GI25221@dhcp22.suse.cz>
In-Reply-To: <20170720134029.25268-3-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-3-vbabka@suse.cz>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On Thu 20-07-17 15:40:27, Vlastimil Babka wrote:
> page_ext_init() can take a long time on large machines, so add a
> cond_resched() point after each section is processed. This will allow moving
> the init to a later point at boot without triggering lockup reports.
>
> Signed-off-by: Vlastimil Babka

Acked-by: Michal Hocko

> ---
>  mm/page_ext.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 88ccc044b09a..24cf8abefc8d 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -409,6 +409,7 @@ void __init page_ext_init(void)
>  				continue;
>  			if (init_section_page_ext(pfn, nid))
>  				goto oom;
> +			cond_resched();
>  		}
>  	}
>  	hotplug_memory_notifier(page_ext_callback, 0);
> --
> 2.13.2

-- 
Michal Hocko
SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: [PATCH 3/4] mm, page_owner: don't grab zone->lock for init_pages_in_zone()
Date: Mon, 24 Jul 2017 14:50:15 +0200
Message-ID: <20170724125015.GJ25221@dhcp22.suse.cz>
In-Reply-To: <20170720134029.25268-4-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-4-vbabka@suse.cz>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On Thu 20-07-17 15:40:28, Vlastimil Babka wrote:
> init_pages_in_zone() is run under zone->lock, which means a long lock hold
> time and disabled interrupts on large machines. This is currently not an
> issue since it runs early in boot, but a later patch will change that.
> However, like other pfn scanners, we don't actually need zone->lock even
> when other cpus are running. The only potentially dangerous operation here
> is reading a bogus buddy page order due to a race, and we already know how
> to handle that. The worst that can happen is that we skip some early
> allocated pages, which should not affect the debugging power of page_owner
> noticeably.

Makes sense to me.

> Signed-off-by: Vlastimil Babka

Acked-by: Michal Hocko

> ---
>  mm/page_owner.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 5aa21ca237d9..cf6568d1dc14 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -567,11 +567,17 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
>  				continue;
>
>  			/*
> -			 * We are safe to check buddy flag and order, because
> -			 * this is init stage and only single thread runs.
> +			 * To avoid having to grab zone->lock, be a little
> +			 * careful when reading buddy page order. The only
> +			 * danger is that we skip too much and potentially miss
> +			 * some early allocated pages, which is better than
> +			 * heavy lock contention.
>  			 */
>  			if (PageBuddy(page)) {
> -				pfn += (1UL << page_order(page)) - 1;
> +				unsigned long order = page_order_unsafe(page);
> +
> +				if (order > 0 && order < MAX_ORDER)
> +					pfn += (1UL << order) - 1;
>  				continue;
>  			}
>
> @@ -590,6 +596,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
>  			__set_page_owner_init(page_ext, init_handle);
>  			count++;
>  		}
> +		cond_resched();
>  	}
>
>  	pr_info("Node %d, zone %8s: page owner found early allocated %lu pages\n",
> @@ -600,15 +607,12 @@ static void init_zones_in_node(pg_data_t *pgdat)
>  {
>  	struct zone *zone;
>  	struct zone *node_zones = pgdat->node_zones;
> -	unsigned long flags;
>
>  	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
>  		if (!populated_zone(zone))
>  			continue;
>
> -		spin_lock_irqsave(&zone->lock, flags);
>  		init_pages_in_zone(pgdat, zone);
> -		spin_unlock_irqrestore(&zone->lock, flags);
>  	}
>  }
>
> --
> 2.13.2

-- 
Michal Hocko
SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: [RFC PATCH 4/4] mm, page_ext: move page_ext_init() after page_alloc_init_late()
Date: Mon, 24 Jul 2017 15:06:13 +0200
Message-ID: <20170724130613.GK25221@dhcp22.suse.cz>
In-Reply-To: <20170720134029.25268-5-vbabka@suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-5-vbabka@suse.cz>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On Thu 20-07-17 15:40:29, Vlastimil Babka wrote:
> Commit b8f1a75d61d8 ("mm: call page_ext_init() after all struct pages are
> initialized") avoided a NULL pointer dereference due to
> DEFERRED_STRUCT_PAGE_INIT clashing with page_ext, by calling page_ext_init()
> only after the deferred struct page init has finished. Later, commit
> fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") avoided the
> underlying issue differently and moved the page_ext_init() call back to
> where it was before.
>
> However, there are two problems with the current code:
> - On very large machines, page_ext_init() may fail to allocate the page_ext
>   structures, because deferred struct page init hasn't yet started and the
>   pre-inited part might be too small. This has been observed on a 3TB
>   machine with page_owner=on. Although it was an older kernel, where
>   page_owner hadn't yet been converted to stack depot and page_ext was
>   therefore larger, the fundamental problem is still present in mainline.

I was about to suggest using the memblock/bootmem allocator, but it seems
that page_ext_init() is called past mm_init. Is there any specific reason
why we cannot do the per-section initialization along with the rest of the
memory section init code, which should have an early allocator available?

> - page_owner's init_pages_in_zone() is called before deferred struct page
>   init has started, so it will encounter uninitialized struct pages. This
>   currently happens to cause no harm, because the memmap array is pre-zeroed
>   on allocation and thus the "if (page_zone(page) != zone)" check is
>   negative, but that pre-zeroing guarantee might change soon.

Yes, this is annoying, and a bug IMHO. We shouldn't consider spanned pages
but rather the maximum valid pfn for the zone. The rest simply cannot be
used by anybody, so there shouldn't be any page_ext work to do. Or am I
missing something?

-- 
Michal Hocko
SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
Date: Wed, 23 Aug 2017 08:47:37 +0200
Message-ID: <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz>
In-Reply-To: <20170724123843.GH25221@dhcp22.suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz> <20170724123843.GH25221@dhcp22.suse.cz>
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On 07/24/2017 02:38 PM, Michal Hocko wrote:
> On Thu 20-07-17 15:40:26, Vlastimil Babka wrote:
>> In init_pages_in_zone() we currently use the generic set_page_owner() function
>> to initialize page_owner info for early allocated pages. This means we
>> needlessly do lookup_page_ext() twice for each page, and more importantly
>> save_stack(), which has to unwind the stack and find the corresponding stack
>> depot handle. Because the stack is always the same for the initialization,
>> unwind it once in init_pages_in_zone() and reuse the handle. Also avoid the
>> repeated lookup_page_ext().
>
> Yes, this looks like an improvement, but I have to admit that I do not
> really get why we even do save_stack() at all here. Those pages might have
> been allocated from anywhere, so we could very well provide a statically
> allocated "fake" stack trace, no?

We could, but it's much simpler to do it this way than to try to extend
stack depot/stack saving to support creating such fakes. Would it be worth
the effort?

> Memory allocated for the stackdepot storage can be tracked inside
> depot_alloc_stack as well I guess (again with a statically preallocated
> storage).

I'm not sure I get your point here? The pages we have to "fake" are not
just the stackdepot storage itself, but everything that has been allocated
before page_owner is initialized.

>> This can significantly reduce boot times with page_owner=on on large machines,
>> especially for kernels built without frame pointers, where the stack
>> unwinding is noticeably slower.
>
> Some numbers would be really nice here

Well, the problem was that on a 3TB machine I just gave up and rebooted it
after ~30 minutes of waiting for the init to finish. After this patch it
was maybe 5 minutes.

>> Signed-off-by: Vlastimil Babka
>> ---
>>  mm/page_owner.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index 401feb070335..5aa21ca237d9 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -183,6 +183,20 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
>>  	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
>>  }
>>
>> +static void __set_page_owner_init(struct page_ext *page_ext,
>> +					depot_stack_handle_t handle)
>> +{
>> +	struct page_owner *page_owner;
>> +
>> +	page_owner = get_page_owner(page_ext);
>> +	page_owner->handle = handle;
>> +	page_owner->order = 0;
>> +	page_owner->gfp_mask = 0;
>> +	page_owner->last_migrate_reason = -1;
>> +
>> +	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
>> +}
>
> Do we need to duplicate a part of __set_page_owner? Can we pull both the
> owner and the handle out of __set_page_owner?

I wanted to avoid overhead in __set_page_owner() by introducing an extra
shared function, but I'll check if that can be helped.
Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
From: Vlastimil Babka
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz> <20170724123843.GH25221@dhcp22.suse.cz> <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz>
Date: Thu, 24 Aug 2017 09:01:52 +0200
In-Reply-To: <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz>
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On 08/23/2017 08:47 AM, Vlastimil Babka wrote:
> On 07/24/2017 02:38 PM, Michal Hocko wrote:
>>
>> Do we need to duplicated a part of __set_page_owner? Can we pull out
>> both owner and handle out __set_page_owner?
>
> I wanted to avoid overhead in __set_page_owner() by introducing extra
> shared function, but I'll check if that can be helped.

Ok, here's a -fix for that.
----8<----

Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
From: Vlastimil Babka
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz> <20170724123843.GH25221@dhcp22.suse.cz> <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz>
Message-ID: <45813564-2342-fc8d-d31a-f4b68a724325@suse.cz>
Date: Thu, 31 Aug 2017 09:55:25 +0200
In-Reply-To: <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz>
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On 08/23/2017 08:47 AM, Vlastimil Babka wrote:
> On 07/24/2017 02:38 PM, Michal Hocko wrote:
>> On Thu 20-07-17 15:40:26, Vlastimil Babka wrote:
>>> In init_pages_in_zone() we currently use the generic set_page_owner() function
>>> to initialize page_owner info for early allocated pages. This means we
>>> needlessly do lookup_page_ext() twice for each page, and more importantly
>>> save_stack(), which has to unwind the stack and find the corresponding stack
>>> depot handle. Because the stack is always the same for the initialization,
>>> unwind it once in init_pages_in_zone() and reuse the handle. Also avoid the
>>> repeated lookup_page_ext().
>>
>> Yes this looks like an improvement but I have to admit that I do not
>> really get why we even do save_stack at all here. Those pages might
>> got allocated from anywhere so we could very well provide a statically
>> allocated "fake" stack trace, no?
>
> We could, but it's much simpler to do it this way than try to extend
> stack depot/stack saving to support creating such fakes. Would it be
> worth the effort?

Ah, I've noticed we already do this for the dummy (prevent recursion)
stack and failure stack. So here you go. It will also make the fake
stack more obvious after "[PATCH 2/2] mm, page_owner: Skip unnecessary
stack_trace entries" is merged, which would otherwise remove
init_page_owner() from the stack.

----8<----

Date: Wed, 6 Sep 2017 15:38:42 +0200
From: Michal Hocko
Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
Message-ID: <20170906133842.md6gt76nt2z46fdz@dhcp22.suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz> <20170724123843.GH25221@dhcp22.suse.cz> <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

[Sorry for the late reply]

On Thu 24-08-17 09:01:52, Vlastimil Babka wrote:
> On 08/23/2017 08:47 AM, Vlastimil Babka wrote:
> > On 07/24/2017 02:38 PM, Michal Hocko wrote:
> >>
> >> Do we need to duplicated a part of __set_page_owner? Can we pull out
> >> both owner and handle out __set_page_owner?
> >
> > I wanted to avoid overhead in __set_page_owner() by introducing extra
> > shared function, but I'll check if that can be helped.
>
> Ok, here's a -fix for that.
>
> ----8<----
> From b607d021d52c5f4b64874a7e738a62e3f0e5ddea Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka
> Date: Thu, 24 Aug 2017 08:39:58 +0200
> Subject: [PATCH] mm, page_owner: make init_pages_in_zone() faster-fix
>
> Don't duplicate code of __set_page_owner(), per Michal Hocko.

Looks good. I assume Andrew will fold this into
mm-page_owner-make-init_pages_in_zone-faster.patch.
Anyway
Acked-by: Michal Hocko

> ---
>  mm/page_owner.c | 31 +++++++++++++------------------
>  1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 5aa21ca237d9..cd72a74d41a0 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -165,17 +165,13 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
>  	return handle;
>  }
>
> -noinline void __set_page_owner(struct page *page, unsigned int order,
> -			gfp_t gfp_mask)
> +static inline void __set_page_owner_handle(struct page_ext *page_ext,
> +	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
>  {
> -	struct page_ext *page_ext = lookup_page_ext(page);
>  	struct page_owner *page_owner;
>
> -	if (unlikely(!page_ext))
> -		return;
> -
>  	page_owner = get_page_owner(page_ext);
> -	page_owner->handle = save_stack(gfp_mask);
> +	page_owner->handle = handle;
>  	page_owner->order = order;
>  	page_owner->gfp_mask = gfp_mask;
>  	page_owner->last_migrate_reason = -1;
> @@ -183,18 +179,17 @@ noinline void __set_page_owner(struct page *page, unsigned int order,
>  	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
>  }
>
> -static void __set_page_owner_init(struct page_ext *page_ext,
> -					depot_stack_handle_t handle)
> +noinline void __set_page_owner(struct page *page, unsigned int order,
> +			gfp_t gfp_mask)
>  {
> -	struct page_owner *page_owner;
> +	struct page_ext *page_ext = lookup_page_ext(page);
> +	depot_stack_handle_t handle;
>
> -	page_owner = get_page_owner(page_ext);
> -	page_owner->handle = handle;
> -	page_owner->order = 0;
> -	page_owner->gfp_mask = 0;
> -	page_owner->last_migrate_reason = -1;
> +	if (unlikely(!page_ext))
> +		return;
>
> -	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
> +	handle = save_stack(gfp_mask);
> +	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
>  }
>
>  void __set_page_owner_migrate_reason(struct page *page, int reason)
> @@ -582,12 +577,12 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
>  		if (unlikely(!page_ext))
>  			continue;
>
> -		/* Maybe overraping zone */
> +		/* Maybe overlaping zone */
>  		if (test_bit(PAGE_EXT_OWNER, &page_ext->flags))
>  			continue;
>
>  		/* Found early allocated page */
> -		__set_page_owner_init(page_ext, init_handle);
> +		__set_page_owner_handle(page_ext, init_handle, 0, 0);
>  		count++;
>  	}
> }
> --
> 2.14.1
>

-- 
Michal Hocko
SUSE Labs

Date: Wed, 6 Sep 2017 15:49:08 +0200
From: Michal Hocko
Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
Message-ID: <20170906134908.xv7esjffv2xmpbq4@dhcp22.suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz> <20170724123843.GH25221@dhcp22.suse.cz> <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz> <45813564-2342-fc8d-d31a-f4b68a724325@suse.cz>
In-Reply-To: <45813564-2342-fc8d-d31a-f4b68a724325@suse.cz>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On Thu 31-08-17 09:55:25, Vlastimil Babka wrote:
> On 08/23/2017 08:47 AM, Vlastimil Babka wrote:
> > On 07/24/2017 02:38 PM, Michal Hocko wrote:
> >> On Thu 20-07-17 15:40:26, Vlastimil Babka wrote:
> >>> In init_pages_in_zone() we currently use the generic set_page_owner() function
> >>> to initialize page_owner info for early allocated pages. This means we
> >>> needlessly do lookup_page_ext() twice for each page, and more importantly
> >>> save_stack(), which has to unwind the stack and find the corresponding stack
> >>> depot handle. Because the stack is always the same for the initialization,
> >>> unwind it once in init_pages_in_zone() and reuse the handle. Also avoid the
> >>> repeated lookup_page_ext().
> >>
> >> Yes this looks like an improvement but I have to admit that I do not
> >> really get why we even do save_stack at all here. Those pages might
> >> got allocated from anywhere so we could very well provide a statically
> >> allocated "fake" stack trace, no?
> >
> > We could, but it's much simpler to do it this way than try to extend
> > stack depot/stack saving to support creating such fakes. Would it be
> > worth the effort?
>
> Ah, I've noticed we already do this for the dummy (prevent recursion)
> stack and failure stack. So here you go. It will also make the fake
> stack more obvious after "[PATCH 2/2] mm, page_owner: Skip unnecessary
> stack_trace entries" is merged, which would otherwise remove
> init_page_owner() from the stack.

Yes this is what I've had in mind.

> ----8<----
> From 9804a5e62fc768e12b86fd4a3184e692c59ebfd1 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka
> Date: Thu, 31 Aug 2017 09:46:46 +0200
> Subject: [PATCH] mm, page_owner: make init_pages_in_zone() faster-fix2
>
> Create statically allocated fake stack trace for early allocated pages, per
> Michal Hocko.

Yes this looks good to me. I am just wondering why we need 3 different
fake stacks. I do not see any code that would special case them when
dumping traces. Maybe this can be done on top?

> Signed-off-by: Vlastimil Babka

Anyway
Acked-by: Michal Hocko

> ---
>  mm/page_owner.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 54d49fc8035e..262503f3ea66 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -30,6 +30,7 @@ DEFINE_STATIC_KEY_FALSE(page_owner_inited);
>
>  static depot_stack_handle_t dummy_handle;
>  static depot_stack_handle_t failure_handle;
> +static depot_stack_handle_t early_handle;
>
>  static void init_early_allocated_pages(void);
>
> @@ -53,7 +54,7 @@ static bool need_page_owner(void)
>  	return true;
>  }
>
> -static noinline void register_dummy_stack(void)
> +static __always_inline depot_stack_handle_t create_dummy_stack(void)
>  {
>  	unsigned long entries[4];
>  	struct stack_trace dummy;
> @@ -64,21 +65,22 @@ static noinline void register_dummy_stack(void)
>  	dummy.skip = 0;
>
>  	save_stack_trace(&dummy);
> -	dummy_handle = depot_save_stack(&dummy, GFP_KERNEL);
> +	return depot_save_stack(&dummy, GFP_KERNEL);
>  }
>
> -static noinline void register_failure_stack(void)
> +static noinline void register_dummy_stack(void)
>  {
> -	unsigned long entries[4];
> -	struct stack_trace failure;
> +	dummy_handle = create_dummy_stack();
> +}
>
> -	failure.nr_entries = 0;
> -	failure.max_entries = ARRAY_SIZE(entries);
> -	failure.entries = &entries[0];
> -	failure.skip = 0;
> +static noinline void register_failure_stack(void)
> +{
> +	failure_handle = create_dummy_stack();
> +}
>
> -	save_stack_trace(&failure);
> -	failure_handle = depot_save_stack(&failure, GFP_KERNEL);
> +static noinline void register_early_stack(void)
> +{
> +	early_handle = create_dummy_stack();
>  }
>
>  static void init_page_owner(void)
> @@ -88,6 +90,7 @@ static void init_page_owner(void)
>
>  	register_dummy_stack();
>  	register_failure_stack();
> +	register_early_stack();
>  	static_branch_enable(&page_owner_inited);
>  	init_early_allocated_pages();
>  }
> @@ -529,13 +532,10 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
>  	unsigned long pfn = zone->zone_start_pfn, block_end_pfn;
>  	unsigned long end_pfn = pfn + zone->spanned_pages;
>  	unsigned long count = 0;
> -	depot_stack_handle_t init_handle;
>
>  	/* Scan block by block. First and last block may be incomplete */
>  	pfn = zone->zone_start_pfn;
>
> -	init_handle = save_stack(0);
> -
>  	/*
>  	 * Walk the zone in pageblock_nr_pages steps. If a page block spans
>  	 * a zone boundary, it will be double counted between zones. This does
> @@ -588,7 +588,7 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
>  			continue;
>
>  		/* Found early allocated page */
> -		__set_page_owner_handle(page_ext, init_handle, 0, 0);
> +		__set_page_owner_handle(page_ext, early_handle, 0, 0);
>  		count++;
>  	}
>  	cond_resched();
> --
> 2.14.1
>

-- 
Michal Hocko
SUSE Labs

Received: from mx1.suse.de (mx2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id s26si391175wma.233.2017.09.06.06.55.24 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 06 Sep 2017 06:55:24 -0700 (PDT)
Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz> <20170724123843.GH25221@dhcp22.suse.cz> <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz> <45813564-2342-fc8d-d31a-f4b68a724325@suse.cz> <20170906134908.xv7esjffv2xmpbq4@dhcp22.suse.cz>
From: Vlastimil Babka
Date: Wed, 6 Sep 2017 15:55:22 +0200
In-Reply-To: <20170906134908.xv7esjffv2xmpbq4@dhcp22.suse.cz>
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On 09/06/2017 03:49 PM, Michal Hocko wrote:
> On Thu 31-08-17 09:55:25, Vlastimil Babka wrote:
>> On 08/23/2017 08:47 AM, Vlastimil Babka wrote:
>>> On 07/24/2017 02:38 PM, Michal Hocko wrote:
>>>> On Thu 20-07-17 15:40:26, Vlastimil Babka wrote:
>>>>> In init_pages_in_zone() we currently use the generic set_page_owner() function
>>>>> to initialize page_owner info for early allocated pages. This means we
>>>>> needlessly do lookup_page_ext() twice for each page, and more importantly
>>>>> save_stack(), which has to unwind the stack and find the corresponding stack
>>>>> depot handle. Because the stack is always the same for the initialization,
>>>>> unwind it once in init_pages_in_zone() and reuse the handle. Also avoid the
>>>>> repeated lookup_page_ext().
>>>>
>>>> Yes this looks like an improvement but I have to admit that I do not
>>>> really get why we even do save_stack at all here. Those pages might
>>>> got allocated from anywhere so we could very well provide a statically
>>>> allocated "fake" stack trace, no?
>>>
>>> We could, but it's much simpler to do it this way than try to extend
>>> stack depot/stack saving to support creating such fakes. Would it be
>>> worth the effort?
>>
>> Ah, I've noticed we already do this for the dummy (prevent recursion)
>> stack and failure stack. So here you go. It will also make the fake
>> stack more obvious after "[PATCH 2/2] mm, page_owner: Skip unnecessary
>> stack_trace entries" is merged, which would otherwise remove
>> init_page_owner() from the stack.
>
> Yes this is what I've had in mind.
>
>> ----8<----
>> From 9804a5e62fc768e12b86fd4a3184e692c59ebfd1 Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka
>> Date: Thu, 31 Aug 2017 09:46:46 +0200
>> Subject: [PATCH] mm, page_owner: make init_pages_in_zone() faster-fix2
>>
>> Create statically allocated fake stack trace for early allocated pages, per
>> Michal Hocko.
>
> Yes this looks good to me. I am just wondering why we need 3 different
> fake stacks. I do not see any code that would special case them when
> dumping traces. Maybe this can be done on top?

It's so that the user can differentiate them in the output. That's why
the functions are noinline.

>> Signed-off-by: Vlastimil Babka
>
> Anyway
> Acked-by: Michal Hocko

Thanks!

Date: Wed, 6 Sep 2017 16:32:00 +0200
From: Michal Hocko
Subject: Re: [PATCH 1/4] mm, page_owner: make init_pages_in_zone() faster
Message-ID: <20170906143200.dq4or4paer7or5pc@dhcp22.suse.cz>
References: <20170720134029.25268-1-vbabka@suse.cz> <20170720134029.25268-2-vbabka@suse.cz> <20170724123843.GH25221@dhcp22.suse.cz> <483227ce-6786-f04b-72d1-dba18e06ccaa@suse.cz> <45813564-2342-fc8d-d31a-f4b68a724325@suse.cz> <20170906134908.xv7esjffv2xmpbq4@dhcp22.suse.cz>
To: Vlastimil Babka
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Andrew Morton, Mel Gorman, Yang Shi, Laura Abbott, Vinayak Menon, zhong jiang

On Wed 06-09-17 15:55:22, Vlastimil Babka wrote:
> On 09/06/2017 03:49 PM, Michal Hocko wrote:
[...]
> > Yes this looks good to me. I am just wondering why we need 3 different
> > fake stacks. I do not see any code that would special case them when
> > dumping traces. Maybe this can be done on top?
>
> It's so that the user can differentiate them in the output. That's why
> the functions are noinline.

Ble, I've missed the noinline part.

-- 
Michal Hocko
SUSE Labs