linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: Shile Zhang <shile.zhang@linux.alibaba.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] mm: fix tick_sched timer blocked by pgdat_resize_lock
Date: Fri, 10 Jan 2020 14:42:02 +0300	[thread overview]
Message-ID: <1ee6088c-9e72-8824-3a9a-fc099d196faf@virtuozzo.com> (raw)
In-Reply-To: <20200110093053.34777-1-shile.zhang@linux.alibaba.com>

On 10.01.2020 12:30, Shile Zhang wrote:
> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdat_resize_lock'
> will be called inside 'pgdatinit' kthread to initialise the deferred
> pages with local interrupts disabled. Which is introduced by
> commit 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred
> pages").
> 
> But 'pgdatinit' kthread is possible be pined on the boot CPU (CPU#0 by
> default), especially in small system with NRCPUS <= 2. In this case, the
> interrupts are disabled on boot CPU during memory initialising, which
> caused the tick_sched timer be blocked, leading to wall clock stuck.
> 
> Fixes: commit 3a2d7fa8a3d5 ("mm: disable interrupts while initializing
> deferred pages")
> 
> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
> ---
>  include/linux/memory_hotplug.h | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index ba0dca6aac6e..be69a6dc4fee 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -6,6 +6,8 @@
>  #include <linux/spinlock.h>
>  #include <linux/notifier.h>
>  #include <linux/bug.h>
> +#include <linux/sched.h>
> +#include <linux/smp.h>
>  
>  struct page;
>  struct zone;
> @@ -282,12 +284,22 @@ static inline bool movable_node_is_enabled(void)
>  static inline
>  void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
>  {
> -	spin_lock_irqsave(&pgdat->node_size_lock, *flags);
> +	/*
> +	 * Disable local interrupts on boot CPU will stop the tick_sched
> +	 * timer, which will block jiffies(wall clock) update.
> +	 */
> +	if (current->cpu != get_boot_cpu_id())
> +		spin_lock_irqsave(&pgdat->node_size_lock, *flags);
> +	else
> +		spin_lock(&pgdat->node_size_lock);
>  }
>  static inline
>  void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
>  {
> -	spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
> +	if (current->cpu != get_boot_cpu_id())
> +		spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
> +	else
> +		spin_unlock(&pgdat->node_size_lock);
>  }
>  static inline
>  void pgdat_resize_init(struct pglist_data *pgdat)

1)Linux kernel is *preemptible*. Kernel with CONFIG_PREEMPT_RT option even may preempt
*kernel* code in the middle of function. When you are executing a code containing
pgdat_resize_lock() and pgdat_resize_unlock(), the process may migrate to another cpu
between them.

bool cpu               another cpu
----------------------------------
pgdat_resize_lock()
  spin_lock()
  --> migrate to another cpu
                      pgdat_resize_unlock()
                      spin_unlock_irqrestore(<uninitialized flags>)

(Yes, in case of CONFIG_PREEMPT_RT, process is preemptible even after spin_lock() call).

This looks like a bad helpers, and we should not introduce such the design.

2)I think there is no the problem this patch solves. Do we really this statistics?
Can't we simple remove print message from deferred_init_memmap() and solve this?

Also, you may try to check that sched_clock() gives better results with interrupts
disabled (on x86 it uses rdtsc, when it's possible. But it also may fallback to
jiffies-based clock in some hardware cases, and they also won't go with interrupts
disabled).

Kirill

  reply	other threads:[~2020-01-10 11:42 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-10  8:25 [PATCH 0/1] try to fix tick_sched timer stuck issue Shile Zhang
2020-01-10  8:25 ` [PATCH 1/1] mm: fix tick_sched timer blocked by pgdat_resize_lock Shile Zhang
2020-01-10  9:30   ` [PATCH RESEND] " Shile Zhang
2020-01-10 11:42     ` Kirill Tkhai [this message]
2020-01-13  0:54       ` Shile Zhang
2020-01-13  8:11         ` Kirill Tkhai
2020-01-14  8:54           ` Shile Zhang
2020-01-15  9:45             ` Kirill Tkhai
2020-02-04  7:24               ` Shile Zhang
2020-02-10  5:45                 ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1ee6088c-9e72-8824-3a9a-fc099d196faf@virtuozzo.com \
    --to=ktkhai@virtuozzo.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=shile.zhang@linux.alibaba.com \
    --subject='Re: [PATCH RESEND] mm: fix tick_sched timer blocked by pgdat_resize_lock' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).