linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm: fix tick timer stall during deferred page init
Date: Wed, 1 Apr 2020 17:42:17 +0200	[thread overview]
Message-ID: <20200401154217.GQ22681@dhcp22.suse.cz> (raw)
In-Reply-To: <20200311123848.118638-1-shile.zhang@linux.alibaba.com>

I am sorry but I have completely missed this patch.

On Wed 11-03-20 20:38:48, Shile Zhang wrote:
> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdatinit' kthread will
> initialise the deferred pages with local interrupts disabled. It is
> introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while
> initializing deferred pages").
> 
> On machine with NCPUS <= 2, the 'pgdatinit' kthread could be bound to
> the boot CPU, which could caused the tick timer long time stall, system
> jiffies not be updated in time.
> 
> The dmesg shown that:
> 
>     [    0.197975] node 0 initialised, 32170688 pages in 1ms
> 
> Obviously, 1ms is unreasonable.
> 
> Now, fix it by restore in the pending interrupts for every 32*1204 pages
> (128MB) initialized, give the chance to update the systemd jiffies.
> The reasonable demsg shown likes:
> 
>     [    1.069306] node 0 initialised, 32203456 pages in 894ms
> 
> Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages").

I dislike this solution TBH. It effectivelly conserves the current code
and just works around the problem. Why do we hold the IRQ lock here in
the first place? This is an early init code and a very limited code is
running at this stage. Certainly nothing memory hotplug related which
should be the only path really interested in the resize lock AFAIR.

This needs a double checking but I strongly believe that the lock can be
simply dropped in this path.
 
> Co-developed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
> ---
>  mm/page_alloc.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..a3a47845e150 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1763,12 +1763,17 @@ deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
>  	return nr_pages;
>  }
>  
> +/*
> + * Release the pending interrupts for every TICK_PAGE_COUNT pages.
> + */
> +#define TICK_PAGE_COUNT	(32 * 1024)
> +
>  /* Initialise remaining memory on a node */
>  static int __init deferred_init_memmap(void *data)
>  {
>  	pg_data_t *pgdat = data;
>  	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> -	unsigned long spfn = 0, epfn = 0, nr_pages = 0;
> +	unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
>  	unsigned long first_init_pfn, flags;
>  	unsigned long start = jiffies;
>  	struct zone *zone;
> @@ -1779,6 +1784,7 @@ static int __init deferred_init_memmap(void *data)
>  	if (!cpumask_empty(cpumask))
>  		set_cpus_allowed_ptr(current, cpumask);
>  
> +again:
>  	pgdat_resize_lock(pgdat, &flags);
>  	first_init_pfn = pgdat->first_deferred_pfn;
>  	if (first_init_pfn == ULONG_MAX) {
> @@ -1790,7 +1796,6 @@ static int __init deferred_init_memmap(void *data)
>  	/* Sanity check boundaries */
>  	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
>  	BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
> -	pgdat->first_deferred_pfn = ULONG_MAX;
>  
>  	/* Only the highest zone is deferred so find it */
>  	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> @@ -1809,9 +1814,23 @@ static int __init deferred_init_memmap(void *data)
>  	 * that we can avoid introducing any issues with the buddy
>  	 * allocator.
>  	 */
> -	while (spfn < epfn)
> +	while (spfn < epfn) {
>  		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> +		/*
> +		 * Release the interrupts for every TICK_PAGE_COUNT pages
> +		 * (128MB) to give tick timer the chance to update the
> +		 * system jiffies.
> +		 */
> +		if ((nr_pages - prev_nr_pages) > TICK_PAGE_COUNT) {
> +			prev_nr_pages = nr_pages;
> +			pgdat->first_deferred_pfn = spfn;
> +			pgdat_resize_unlock(pgdat, &flags);
> +			goto again;
> +		}
> +	}
> +
>  zone_empty:
> +	pgdat->first_deferred_pfn = ULONG_MAX;
>  	pgdat_resize_unlock(pgdat, &flags);
>  
>  	/* Sanity check that the next zone really is unpopulated */
> -- 
> 2.24.0.rc2

-- 
Michal Hocko
SUSE Labs


  parent reply	other threads:[~2020-04-01 15:42 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-11 12:38 [PATCH v3] mm: fix tick timer stall during deferred page init Shile Zhang
2020-03-11 17:45 ` Pavel Tatashin
2020-03-11 20:16   ` Kirill Tkhai
2020-03-11 20:25     ` Pavel Tatashin
2020-03-19 19:05 ` Daniel Jordan
2020-03-26 18:58   ` Daniel Jordan
2020-03-26 19:36     ` Pavel Tatashin
2020-03-27  4:39       ` Shile Zhang
2020-03-28  0:17         ` Daniel Jordan
2020-04-01 14:26 ` David Hildenbrand
2020-04-01 15:42 ` Michal Hocko [this message]
2020-04-01 15:50   ` David Hildenbrand
2020-04-01 16:00     ` Michal Hocko
2020-04-01 16:09       ` Daniel Jordan
2020-04-01 16:12         ` Michal Hocko
2020-04-01 16:15           ` David Hildenbrand
2020-04-01 16:18           ` Daniel Jordan
2020-04-01 16:26             ` Michal Hocko
2020-04-01 16:41               ` Pavel Tatashin
2020-04-01 16:46                 ` Michal Hocko
2020-04-01 16:51                   ` Pavel Tatashin
2020-04-01 17:13                     ` Daniel Jordan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200401154217.GQ22681@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=shile.zhang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).