* [PATCH] mm: initialize deferred pages with interrupts enabled @ 2020-04-01 19:32 Pavel Tatashin 2020-04-01 19:57 ` Michal Hocko ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Pavel Tatashin @ 2020-04-01 19:32 UTC (permalink / raw) To: linux-kernel, akpm, mhocko, linux-mm, dan.j.williams, shile.zhang, daniel.m.jordan, pasha.tatashin, ktkhai, david, jmorris, sashal Initializing struct pages is a long task and keeping interrupts disabled for the duration of this operation introduces a number of problems. 1. jiffies are not updated for long period of time, and thus incorrect time is reported. See proposed solution and discussion here: lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com 2. It prevents farther improving deferred page initialization by allowing inter-node multi-threading. We are keeping interrupts disabled to solve a rather theoretical problem that was never observed in real world (See 3a2d7fa8a3d5). Lets keep interrupts enabled. In case we ever encounter a scenario where an interrupt thread wants to allocate large amount of memory this early in boot we can deal with that by growing zone (see deferred_grow_zone()) by the needed amount before starting deferred_init_memmap() threads. 
Before: [ 1.232459] node 0 initialised, 12058412 pages in 1ms After: [ 1.632580] node 0 initialised, 12051227 pages in 436ms Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> --- mm/page_alloc.c | 21 +++++++-------------- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3c4eb750a199..4498a13b372d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1792,6 +1792,13 @@ static int __init deferred_init_memmap(void *data) BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); pgdat->first_deferred_pfn = ULONG_MAX; + /* + * Once we unlock here, the zone cannot be grown anymore, thus if an + * interrupt thread must allocate this early in boot, zone must be + * pre-grown prior to start of deferred page initialization. + */ + pgdat_resize_unlock(pgdat, &flags); + /* Only the highest zone is deferred so find it */ for (zid = 0; zid < MAX_NR_ZONES; zid++) { zone = pgdat->node_zones + zid; @@ -1812,8 +1819,6 @@ static int __init deferred_init_memmap(void *data) while (spfn < epfn) nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); zone_empty: - pgdat_resize_unlock(pgdat, &flags); - /* Sanity check that the next zone really is unpopulated */ WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order) return false; pgdat_resize_lock(pgdat, &flags); - - /* - * If deferred pages have been initialized while we were waiting for - * the lock, return true, as the zone was grown. The caller will retry - * this zone. We won't return to this function since the caller also - * has this static branch. - */ - if (!static_branch_unlikely(&deferred_pages)) { - pgdat_resize_unlock(pgdat, &flags); - return true; - } - /* * If someone grew this zone while we were waiting for spinlock, return * true, as there might be enough pages already. -- 2.17.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 19:32 [PATCH] mm: initialize deferred pages with interrupts enabled Pavel Tatashin @ 2020-04-01 19:57 ` Michal Hocko 2020-04-01 20:27 ` Pavel Tatashin 2020-04-01 19:58 ` Michal Hocko 2020-04-01 20:00 ` Daniel Jordan 2 siblings, 1 reply; 9+ messages in thread From: Michal Hocko @ 2020-04-01 19:57 UTC (permalink / raw) To: Pavel Tatashin Cc: linux-kernel, akpm, linux-mm, dan.j.williams, shile.zhang, daniel.m.jordan, ktkhai, david, jmorris, sashal On Wed 01-04-20 15:32:38, Pavel Tatashin wrote: > Initializing struct pages is a long task and keeping interrupts disabled > for the duration of this operation introduces a number of problems. > > 1. jiffies are not updated for long period of time, and thus incorrect time > is reported. See proposed solution and discussion here: > lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com http://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com > 2. It prevents farther improving deferred page initialization by allowing > inter-node multi-threading. > > We are keeping interrupts disabled to solve a rather theoretical problem > that was never observed in real world (See 3a2d7fa8a3d5). > > Lets keep interrupts enabled. In case we ever encounter a scenario where > an interrupt thread wants to allocate large amount of memory this early in > boot we can deal with that by growing zone (see deferred_grow_zone()) by > the needed amount before starting deferred_init_memmap() threads. > > Before: > [ 1.232459] node 0 initialised, 12058412 pages in 1ms > > After: > [ 1.632580] node 0 initialised, 12051227 pages in 436ms > Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages") > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> I would much rather see pgdat_resize_lock completely out of both the allocator and deferred init path altogether but this can be done in a separate patch. 
This one looks slightly safer for stable backports. To be completely honest I would love to see the resize lock go away completely. That might need a deeper thought but I believe it is something that has never been done properly. Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > --- > mm/page_alloc.c | 21 +++++++-------------- > 1 file changed, 7 insertions(+), 14 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..4498a13b372d 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1792,6 +1792,13 @@ static int __init deferred_init_memmap(void *data) > BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); > pgdat->first_deferred_pfn = ULONG_MAX; > > + /* > + * Once we unlock here, the zone cannot be grown anymore, thus if an > + * interrupt thread must allocate this early in boot, zone must be > + * pre-grown prior to start of deferred page initialization. > + */ > + pgdat_resize_unlock(pgdat, &flags); > + > /* Only the highest zone is deferred so find it */ > for (zid = 0; zid < MAX_NR_ZONES; zid++) { > zone = pgdat->node_zones + zid; > @@ -1812,8 +1819,6 @@ static int __init deferred_init_memmap(void *data) > while (spfn < epfn) > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > zone_empty: > - pgdat_resize_unlock(pgdat, &flags); > - > /* Sanity check that the next zone really is unpopulated */ > WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); > > @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order) > return false; > > pgdat_resize_lock(pgdat, &flags); > - > - /* > - * If deferred pages have been initialized while we were waiting for > - * the lock, return true, as the zone was grown. The caller will retry > - * this zone. We won't return to this function since the caller also > - * has this static branch. 
> - */ > - if (!static_branch_unlikely(&deferred_pages)) { > - pgdat_resize_unlock(pgdat, &flags); > - return true; > - } > - > /* > * If someone grew this zone while we were waiting for spinlock, return > * true, as there might be enough pages already. > -- > 2.17.1 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 19:57 ` Michal Hocko @ 2020-04-01 20:27 ` Pavel Tatashin 2020-04-02 7:34 ` Michal Hocko 0 siblings, 1 reply; 9+ messages in thread From: Pavel Tatashin @ 2020-04-01 20:27 UTC (permalink / raw) To: Michal Hocko Cc: LKML, Andrew Morton, linux-mm, Dan Williams, Shile Zhang, Daniel Jordan, Kirill Tkhai, David Hildenbrand, James Morris, Sasha Levin On Wed, Apr 1, 2020 at 3:57 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Wed 01-04-20 15:32:38, Pavel Tatashin wrote: > > Initializing struct pages is a long task and keeping interrupts disabled > > for the duration of this operation introduces a number of problems. > > > > 1. jiffies are not updated for long period of time, and thus incorrect time > > is reported. See proposed solution and discussion here: > > lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com > > http://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com > > > 2. It prevents farther improving deferred page initialization by allowing > > inter-node multi-threading. > > > > We are keeping interrupts disabled to solve a rather theoretical problem > > that was never observed in real world (See 3a2d7fa8a3d5). > > > > Lets keep interrupts enabled. In case we ever encounter a scenario where > > an interrupt thread wants to allocate large amount of memory this early in > > boot we can deal with that by growing zone (see deferred_grow_zone()) by > > the needed amount before starting deferred_init_memmap() threads. 
> > > > Before: > > [ 1.232459] node 0 initialised, 12058412 pages in 1ms > > > > After: > > [ 1.632580] node 0 initialised, 12051227 pages in 436ms > > > > Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages") > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > I would much rather see pgdat_resize_lock completely out of both the > allocator and deferred init path altogether but this can be done in a > separate patch. This one looks slightly safer for stable backports. This is what I wanted to do, but after studying deferred_grow_zone(), I do not see a simple way to solve this. It is one thing to fail an allocation, and it is another thing to have a corruption because of race. > To be completely honest I would love to see the resize lock go away > completely. That might need a deeper thought but I believe it is > something that has never been done properly. > > Acked-by: Michal Hocko <mhocko@suse.com> Thank you, Pasha > > Thanks! > > > --- > > mm/page_alloc.c | 21 +++++++-------------- > > 1 file changed, 7 insertions(+), 14 deletions(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 3c4eb750a199..4498a13b372d 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1792,6 +1792,13 @@ static int __init deferred_init_memmap(void *data) > > BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); > > pgdat->first_deferred_pfn = ULONG_MAX; > > > > + /* > > + * Once we unlock here, the zone cannot be grown anymore, thus if an > > + * interrupt thread must allocate this early in boot, zone must be > > + * pre-grown prior to start of deferred page initialization. 
> > + */ > > + pgdat_resize_unlock(pgdat, &flags); > > + > > /* Only the highest zone is deferred so find it */ > > for (zid = 0; zid < MAX_NR_ZONES; zid++) { > > zone = pgdat->node_zones + zid; > > @@ -1812,8 +1819,6 @@ static int __init deferred_init_memmap(void *data) > > while (spfn < epfn) > > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > > zone_empty: > > - pgdat_resize_unlock(pgdat, &flags); > > - > > /* Sanity check that the next zone really is unpopulated */ > > WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); > > > > @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order) > > return false; > > > > pgdat_resize_lock(pgdat, &flags); > > - > > - /* > > - * If deferred pages have been initialized while we were waiting for > > - * the lock, return true, as the zone was grown. The caller will retry > > - * this zone. We won't return to this function since the caller also > > - * has this static branch. > > - */ > > - if (!static_branch_unlikely(&deferred_pages)) { > > - pgdat_resize_unlock(pgdat, &flags); > > - return true; > > - } > > - > > /* > > * If someone grew this zone while we were waiting for spinlock, return > > * true, as there might be enough pages already. > > -- > > 2.17.1 > > > > -- > Michal Hocko > SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 20:27 ` Pavel Tatashin @ 2020-04-02 7:34 ` Michal Hocko 0 siblings, 0 replies; 9+ messages in thread From: Michal Hocko @ 2020-04-02 7:34 UTC (permalink / raw) To: Pavel Tatashin Cc: LKML, Andrew Morton, linux-mm, Dan Williams, Shile Zhang, Daniel Jordan, Kirill Tkhai, David Hildenbrand, James Morris, Sasha Levin On Wed 01-04-20 16:27:33, Pavel Tatashin wrote: > On Wed, Apr 1, 2020 at 3:57 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Wed 01-04-20 15:32:38, Pavel Tatashin wrote: > > > Initializing struct pages is a long task and keeping interrupts disabled > > > for the duration of this operation introduces a number of problems. > > > > > > 1. jiffies are not updated for long period of time, and thus incorrect time > > > is reported. See proposed solution and discussion here: > > > lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com > > > > http://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com > > > > > 2. It prevents farther improving deferred page initialization by allowing > > > inter-node multi-threading. > > > > > > We are keeping interrupts disabled to solve a rather theoretical problem > > > that was never observed in real world (See 3a2d7fa8a3d5). > > > > > > Lets keep interrupts enabled. In case we ever encounter a scenario where > > > an interrupt thread wants to allocate large amount of memory this early in > > > boot we can deal with that by growing zone (see deferred_grow_zone()) by > > > the needed amount before starting deferred_init_memmap() threads. 
> > > > > > Before: > > > [ 1.232459] node 0 initialised, 12058412 pages in 1ms > > > > > > After: > > > [ 1.632580] node 0 initialised, 12051227 pages in 436ms > > > > > > > Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages") > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > > > I would much rather see pgdat_resize_lock completely out of both the > > allocator and deferred init path altogether but this can be done in a > > separate patch. This one looks slightly safer for stable backports. > > This is what I wanted to do, but after studying deferred_grow_zone(), > I do not see a simple way to solve this. It is one thing to fail an > allocation, and it is another thing to have a corruption because of > race. Let's discuss deferred_grow_zone after this all settles down. I still have to study it because I wasn't aware that this is actually a page allocator path relying on the resize lock. My recollection was that the resize lock is only about memory hotplug. Your patches flew by and I didn't have time to review them back then. So I have to admit I have seen the resize lock too simple. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 19:32 [PATCH] mm: initialize deferred pages with interrupts enabled Pavel Tatashin 2020-04-01 19:57 ` Michal Hocko @ 2020-04-01 19:58 ` Michal Hocko 2020-04-01 20:00 ` Daniel Jordan 2 siblings, 0 replies; 9+ messages in thread From: Michal Hocko @ 2020-04-01 19:58 UTC (permalink / raw) To: Pavel Tatashin Cc: linux-kernel, akpm, linux-mm, dan.j.williams, shile.zhang, daniel.m.jordan, ktkhai, david, jmorris, sashal, Vlastimil Babka btw. Cc Vlastimil On Wed 01-04-20 15:32:38, Pavel Tatashin wrote: > Initializing struct pages is a long task and keeping interrupts disabled > for the duration of this operation introduces a number of problems. > > 1. jiffies are not updated for long period of time, and thus incorrect time > is reported. See proposed solution and discussion here: > lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com > 2. It prevents farther improving deferred page initialization by allowing > inter-node multi-threading. > > We are keeping interrupts disabled to solve a rather theoretical problem > that was never observed in real world (See 3a2d7fa8a3d5). > > Lets keep interrupts enabled. In case we ever encounter a scenario where > an interrupt thread wants to allocate large amount of memory this early in > boot we can deal with that by growing zone (see deferred_grow_zone()) by > the needed amount before starting deferred_init_memmap() threads. 
> > Before: > [ 1.232459] node 0 initialised, 12058412 pages in 1ms > > After: > [ 1.632580] node 0 initialised, 12051227 pages in 436ms > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > --- > mm/page_alloc.c | 21 +++++++-------------- > 1 file changed, 7 insertions(+), 14 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..4498a13b372d 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1792,6 +1792,13 @@ static int __init deferred_init_memmap(void *data) > BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); > pgdat->first_deferred_pfn = ULONG_MAX; > > + /* > + * Once we unlock here, the zone cannot be grown anymore, thus if an > + * interrupt thread must allocate this early in boot, zone must be > + * pre-grown prior to start of deferred page initialization. > + */ > + pgdat_resize_unlock(pgdat, &flags); > + > /* Only the highest zone is deferred so find it */ > for (zid = 0; zid < MAX_NR_ZONES; zid++) { > zone = pgdat->node_zones + zid; > @@ -1812,8 +1819,6 @@ static int __init deferred_init_memmap(void *data) > while (spfn < epfn) > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > zone_empty: > - pgdat_resize_unlock(pgdat, &flags); > - > /* Sanity check that the next zone really is unpopulated */ > WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); > > @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order) > return false; > > pgdat_resize_lock(pgdat, &flags); > - > - /* > - * If deferred pages have been initialized while we were waiting for > - * the lock, return true, as the zone was grown. The caller will retry > - * this zone. We won't return to this function since the caller also > - * has this static branch. 
> - */ > - if (!static_branch_unlikely(&deferred_pages)) { > - pgdat_resize_unlock(pgdat, &flags); > - return true; > - } > - > /* > * If someone grew this zone while we were waiting for spinlock, return > * true, as there might be enough pages already. > -- > 2.17.1 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 19:32 [PATCH] mm: initialize deferred pages with interrupts enabled Pavel Tatashin 2020-04-01 19:57 ` Michal Hocko 2020-04-01 19:58 ` Michal Hocko @ 2020-04-01 20:00 ` Daniel Jordan 2020-04-01 20:08 ` Daniel Jordan 2 siblings, 1 reply; 9+ messages in thread From: Daniel Jordan @ 2020-04-01 20:00 UTC (permalink / raw) To: Pavel Tatashin Cc: linux-kernel, akpm, mhocko, linux-mm, dan.j.williams, shile.zhang, daniel.m.jordan, ktkhai, david, jmorris, sashal On Wed, Apr 01, 2020 at 03:32:38PM -0400, Pavel Tatashin wrote: > Initializing struct pages is a long task and keeping interrupts disabled > for the duration of this operation introduces a number of problems. > > 1. jiffies are not updated for long period of time, and thus incorrect time > is reported. See proposed solution and discussion here: > lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com > 2. It prevents farther improving deferred page initialization by allowing not allowing > inter-node multi-threading. intra-node ... > After: > [ 1.632580] node 0 initialised, 12051227 pages in 436ms Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages") Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Freezing jiffies for a while during boot sounds like stable to me, so Cc: <stable@vger.kernel.org> [4.17.x+] Can you please add a comment to mmzone.h above node_size_lock, something like * Must be held any time you expect node_start_pfn, * node_present_pages, node_spanned_pages or nr_zones to stay constant. + * Also synchronizes pgdat->first_deferred_pfn during deferred page + * init. ... 
spinlock_t node_size_lock; > @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order) > return false; > > pgdat_resize_lock(pgdat, &flags); > - > - /* > - * If deferred pages have been initialized while we were waiting for > - * the lock, return true, as the zone was grown. The caller will retry > - * this zone. We won't return to this function since the caller also > - * has this static branch. > - */ > - if (!static_branch_unlikely(&deferred_pages)) { > - pgdat_resize_unlock(pgdat, &flags); > - return true; > - } > - Huh, looks like this wasn't needed even before this change. The rest looks fine. Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 20:00 ` Daniel Jordan @ 2020-04-01 20:08 ` Daniel Jordan 2020-04-01 20:31 ` Pavel Tatashin 2020-04-02 7:36 ` Michal Hocko 0 siblings, 2 replies; 9+ messages in thread From: Daniel Jordan @ 2020-04-01 20:08 UTC (permalink / raw) To: Pavel Tatashin Cc: linux-kernel, akpm, mhocko, linux-mm, dan.j.williams, shile.zhang, daniel.m.jordan, ktkhai, david, jmorris, sashal On Wed, Apr 01, 2020 at 04:00:27PM -0400, Daniel Jordan wrote: > On Wed, Apr 01, 2020 at 03:32:38PM -0400, Pavel Tatashin wrote: > > Initializing struct pages is a long task and keeping interrupts disabled > > for the duration of this operation introduces a number of problems. > > > > 1. jiffies are not updated for long period of time, and thus incorrect time > > is reported. See proposed solution and discussion here: > > lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com > > 2. It prevents farther improving deferred page initialization by allowing > > not allowing > > inter-node multi-threading. > > intra-node > > ... > > After: > > [ 1.632580] node 0 initialised, 12051227 pages in 436ms > > Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages") > Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > Freezing jiffies for a while during boot sounds like stable to me, so > > Cc: <stable@vger.kernel.org> [4.17.x+] > > > Can you please add a comment to mmzone.h above node_size_lock, something like > > * Must be held any time you expect node_start_pfn, > * node_present_pages, node_spanned_pages or nr_zones to stay constant. > + * Also synchronizes pgdat->first_deferred_pfn during deferred page > + * init. > ... 
> spinlock_t node_size_lock; > > > @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order) > > return false; > > > > pgdat_resize_lock(pgdat, &flags); > > - > > - /* > > - * If deferred pages have been initialized while we were waiting for > > - * the lock, return true, as the zone was grown. The caller will retry > > - * this zone. We won't return to this function since the caller also > > - * has this static branch. > > - */ > > - if (!static_branch_unlikely(&deferred_pages)) { > > - pgdat_resize_unlock(pgdat, &flags); > > - return true; > > - } > > - > > Huh, looks like this wasn't needed even before this change. > > > The rest looks fine. > > Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> ...except for I forgot about the touch_nmi_watchdog() calls. I think you'd need something kind of like this before your patch. ---8<--- From: Daniel Jordan <daniel.m.jordan@oracle.com> Date: Fri, 27 Mar 2020 17:29:05 -0400 Subject: [PATCH] mm: call touch_nmi_watchdog() on max order boundaries in deferred init deferred_init_memmap() disables interrupts the entire time, so it calls touch_nmi_watchdog() periodically to avoid soft lockup splats. Soon it will run with interrupts enabled, at which point cond_resched() should be used instead. deferred_grow_zone() makes the same watchdog calls through code shared with deferred init but will continue to run with interrupts disabled, so it can't call cond_resched(). Pull the watchdog calls up to these two places to allow the first to be changed later, independently of the second. The frequency reduces from twice per pageblock (init and free) to once per max order block. 
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> --- mm/page_alloc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 212734c4f8b0..4cf18c534233 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1639,7 +1639,6 @@ static void __init deferred_free_pages(unsigned long pfn, } else if (!(pfn & nr_pgmask)) { deferred_free_range(pfn - nr_free, nr_free); nr_free = 1; - touch_nmi_watchdog(); } else { nr_free++; } @@ -1669,7 +1668,6 @@ static unsigned long __init deferred_init_pages(struct zone *zone, continue; } else if (!page || !(pfn & nr_pgmask)) { page = pfn_to_page(pfn); - touch_nmi_watchdog(); } else { page++; } @@ -1813,8 +1811,10 @@ static int __init deferred_init_memmap(void *data) * that we can avoid introducing any issues with the buddy * allocator. */ - while (spfn < epfn) + while (spfn < epfn) { nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); + touch_nmi_watchdog(); + } zone_empty: pgdat_resize_unlock(pgdat, &flags); @@ -1908,6 +1908,7 @@ deferred_grow_zone_locked(pg_data_t *pgdat, struct zone *zone, first_deferred_pfn = spfn; nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); + touch_nmi_watchdog(); /* We should only stop along section boundaries */ if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION) -- 2.25.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 20:08 ` Daniel Jordan @ 2020-04-01 20:31 ` Pavel Tatashin 2020-04-02 7:36 ` Michal Hocko 1 sibling, 0 replies; 9+ messages in thread From: Pavel Tatashin @ 2020-04-01 20:31 UTC (permalink / raw) To: Daniel Jordan Cc: LKML, Andrew Morton, Michal Hocko, linux-mm, Dan Williams, Shile Zhang, Kirill Tkhai, David Hildenbrand, James Morris, Sasha Levin On Wed, Apr 1, 2020 at 4:10 PM Daniel Jordan <daniel.m.jordan@oracle.com> wrote: > > On Wed, Apr 01, 2020 at 04:00:27PM -0400, Daniel Jordan wrote: > > On Wed, Apr 01, 2020 at 03:32:38PM -0400, Pavel Tatashin wrote: > > > Initializing struct pages is a long task and keeping interrupts disabled > > > for the duration of this operation introduces a number of problems. > > > > > > 1. jiffies are not updated for long period of time, and thus incorrect time > > > is reported. See proposed solution and discussion here: > > > lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com > > > 2. It prevents farther improving deferred page initialization by allowing > > > > not allowing > > > inter-node multi-threading. > > > > intra-node > > > > ... > > > After: > > > [ 1.632580] node 0 initialised, 12051227 pages in 436ms > > > > Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages") > > Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> > > > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > > > Freezing jiffies for a while during boot sounds like stable to me, so > > > > Cc: <stable@vger.kernel.org> [4.17.x+] > > > > > > Can you please add a comment to mmzone.h above node_size_lock, something like > > > > * Must be held any time you expect node_start_pfn, > > * node_present_pages, node_spanned_pages or nr_zones to stay constant. > > + * Also synchronizes pgdat->first_deferred_pfn during deferred page > > + * init. > > ... 
> > spinlock_t node_size_lock; > > > > > @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order) > > > return false; > > > > > > pgdat_resize_lock(pgdat, &flags); > > > - > > > - /* > > > - * If deferred pages have been initialized while we were waiting for > > > - * the lock, return true, as the zone was grown. The caller will retry > > > - * this zone. We won't return to this function since the caller also > > > - * has this static branch. > > > - */ > > > - if (!static_branch_unlikely(&deferred_pages)) { > > > - pgdat_resize_unlock(pgdat, &flags); > > > - return true; > > > - } > > > - > > > > Huh, looks like this wasn't needed even before this change. > > > > > > The rest looks fine. > > > > Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> > > ...except for I forgot about the touch_nmi_watchdog() calls. I think you'd > need something kind of like this before your patch. Thank you for review. You are right, I will add your patch, and modify my to change touch_nmi_watchdog() to cond_resched(). Pasha ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] mm: initialize deferred pages with interrupts enabled 2020-04-01 20:08 ` Daniel Jordan 2020-04-01 20:31 ` Pavel Tatashin @ 2020-04-02 7:36 ` Michal Hocko 1 sibling, 0 replies; 9+ messages in thread From: Michal Hocko @ 2020-04-02 7:36 UTC (permalink / raw) To: Daniel Jordan Cc: Pavel Tatashin, linux-kernel, akpm, linux-mm, dan.j.williams, shile.zhang, ktkhai, david, jmorris, sashal On Wed 01-04-20 16:08:55, Daniel Jordan wrote: [...] > From: Daniel Jordan <daniel.m.jordan@oracle.com> > Date: Fri, 27 Mar 2020 17:29:05 -0400 > Subject: [PATCH] mm: call touch_nmi_watchdog() on max order boundaries in > deferred init > > deferred_init_memmap() disables interrupts the entire time, so it calls > touch_nmi_watchdog() periodically to avoid soft lockup splats. Soon it > will run with interrupts enabled, at which point cond_resched() should > be used instead. > > deferred_grow_zone() makes the same watchdog calls through code shared > with deferred init but will continue to run with interrupts disabled, so > it can't call cond_resched(). > > Pull the watchdog calls up to these two places to allow the first to be > changed later, independently of the second. The frequency reduces from > twice per pageblock (init and free) to once per max order block. This makes sense but I am not really sure this is necessary for the stable backport. 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> > --- > mm/page_alloc.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 212734c4f8b0..4cf18c534233 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1639,7 +1639,6 @@ static void __init deferred_free_pages(unsigned long pfn, > } else if (!(pfn & nr_pgmask)) { > deferred_free_range(pfn - nr_free, nr_free); > nr_free = 1; > - touch_nmi_watchdog(); > } else { > nr_free++; > } > @@ -1669,7 +1668,6 @@ static unsigned long __init deferred_init_pages(struct zone *zone, > continue; > } else if (!page || !(pfn & nr_pgmask)) { > page = pfn_to_page(pfn); > - touch_nmi_watchdog(); > } else { > page++; > } > @@ -1813,8 +1811,10 @@ static int __init deferred_init_memmap(void *data) > * that we can avoid introducing any issues with the buddy > * allocator. > */ > - while (spfn < epfn) > + while (spfn < epfn) { > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > + touch_nmi_watchdog(); > + } > zone_empty: > pgdat_resize_unlock(pgdat, &flags); > > @@ -1908,6 +1908,7 @@ deferred_grow_zone_locked(pg_data_t *pgdat, struct zone *zone, > first_deferred_pfn = spfn; > > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > + touch_nmi_watchdog(); > > /* We should only stop along section boundaries */ > if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION) > -- > 2.25.0 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread