linux-kernel.vger.kernel.org archive mirror
From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Petr Holasek <pholasek@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Izik Eidus <izik.eidus@ravellosystems.com>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 11/11] ksm: stop hotremove lockdep warning
Date: Fri, 8 Feb 2013 19:45:10 +0100
Message-ID: <20130208194510.65fadd37@thinkpad.boeblingen.de.com>
In-Reply-To: <alpine.LNX.2.00.1301251808120.29196@eggly.anvils>

On Fri, 25 Jan 2013 18:10:18 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:

> Complaints are rare, but lockdep still does not understand the way
> ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
> holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
> to be a problem because notifier callbacks are made under down_read
> of blocking_notifier_head->rwsem (so first the mutex is taken while
> holding the rwsem, then later the rwsem is taken while still holding
> the mutex); but is not in fact a problem because mem_hotplug_mutex
> is held throughout the dance.
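The ordering described above can be replayed with a toy dependency
recorder. This is only a userspace sketch of the idea behind lockdep, not
its implementation: the lock class names and every toy_*() and replay_*()
helper are hypothetical, and the real checker tracks far more state.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the description. */
enum lock_class { NOTIFIER_RWSEM, KSM_THREAD_MUTEX, NR_CLASSES };

static bool held[NR_CLASSES];            /* locks currently held */
static bool dep[NR_CLASSES][NR_CLASSES]; /* dep[a][b]: b taken while a held */

static void toy_reset(void)
{
	memset(held, 0, sizeof(held));
	memset(dep, 0, sizeof(dep));
}

/* Record an acquisition and every "held -> new" ordering it implies. */
static void toy_acquire(enum lock_class c)
{
	for (int h = 0; h < NR_CLASSES; h++)
		if (held[h] && h != (int)c)
			dep[h][c] = true;
	held[c] = true;
}

static void toy_release(enum lock_class c)
{
	held[c] = false;
}

/* True when some pair of classes has been taken in both orders. */
static bool toy_inversion_seen(void)
{
	for (int a = 0; a < NR_CLASSES; a++)
		for (int b = a + 1; b < NR_CLASSES; b++)
			if (dep[a][b] && dep[b][a])
				return true;
	return false;
}

/* Old scheme: the mutex stays held from GOING_OFFLINE to OFFLINE. */
static bool replay_old_hotremove(void)
{
	toy_reset();
	toy_acquire(NOTIFIER_RWSEM);     /* MEM_GOING_OFFLINE notifier call */
	toy_acquire(KSM_THREAD_MUTEX);   /* records rwsem -> mutex */
	toy_release(NOTIFIER_RWSEM);     /* callback returns, mutex still held */

	toy_acquire(NOTIFIER_RWSEM);     /* MEM_OFFLINE: records mutex -> rwsem */
	toy_release(KSM_THREAD_MUTEX);
	toy_release(NOTIFIER_RWSEM);
	return toy_inversion_seen();
}

/* New scheme: the mutex is taken and dropped inside each callback. */
static bool replay_new_hotremove(void)
{
	toy_reset();
	for (int call = 0; call < 2; call++) {
		toy_acquire(NOTIFIER_RWSEM);
		toy_acquire(KSM_THREAD_MUTEX);  /* only rwsem -> mutex */
		toy_release(KSM_THREAD_MUTEX);
		toy_release(NOTIFIER_RWSEM);
	}
	return toy_inversion_seen();
}
```

In this model replay_old_hotremove() reports an inversion and
replay_new_hotremove() does not. The model has no notion of an outer lock
such as mem_hotplug_mutex making the first case harmless.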
> 
> There was an attempt to fix this with mutex_lock_nested(); but if that
> happened to fool lockdep two years ago, apparently it does so no
> longer.
> 
> I had hoped to eradicate this issue in extending KSM page migration
> not to need the ksm_thread_mutex.  But then realized that although
> the page migration itself is safe, we do still need to lock out ksmd
> and other users of get_ksm_page() while offlining memory - at some
> point between MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages
> themselves may vanish, and get_ksm_page()'s accesses to them become a
> violation.
> 
> So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE
> to MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and
> wait_while_offlining() checks, to achieve the same lockout without
> being caught by lockdep. This is less elegant for KSM, but it's more
> important to keep lockdep useful to other users - and I apologize for
> how long it took to fix.

Thanks a lot for the patch! I verified that it fixes the lockdep warning
that we got on memory hotremove.
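For readers following along outside the kernel, the flag-based lockout can
be approximated in userspace roughly as below. This is a sketch under
stated assumptions: it swaps in a pthread condition variable where the
patch uses wait_on_bit() and wake_up_bit(), and the names run_flags,
going_offline(), offline_finished() and scan_once() are invented for the
illustration.

```c
#include <pthread.h>

#define RUN_MERGE	1UL
#define RUN_OFFLINE	4UL	/* mirrors KSM_RUN_OFFLINE in the patch */

static pthread_mutex_t thread_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t offline_cleared = PTHREAD_COND_INITIALIZER;
static unsigned long run_flags = RUN_MERGE;
static unsigned long scans;

/*
 * Caller holds thread_mutex. pthread_cond_wait() drops the mutex while
 * sleeping and retakes it before returning, which plays the role of the
 * unlock / wait / lock sequence in the patch.
 */
static void wait_while_offlining(void)
{
	while (run_flags & RUN_OFFLINE)
		pthread_cond_wait(&offline_cleared, &thread_mutex);
}

/* Analogue of MEM_GOING_OFFLINE: set the flag, do not keep the mutex. */
static void going_offline(void)
{
	pthread_mutex_lock(&thread_mutex);
	run_flags |= RUN_OFFLINE;
	pthread_mutex_unlock(&thread_mutex);
}

/* Analogue of MEM_OFFLINE / MEM_CANCEL_OFFLINE: clear the flag, wake waiters. */
static void offline_finished(void)
{
	pthread_mutex_lock(&thread_mutex);
	run_flags &= ~RUN_OFFLINE;
	pthread_mutex_unlock(&thread_mutex);
	pthread_cond_broadcast(&offline_cleared);
}

static int is_offlining(void)
{
	int ret;

	pthread_mutex_lock(&thread_mutex);
	ret = (run_flags & RUN_OFFLINE) != 0;
	pthread_mutex_unlock(&thread_mutex);
	return ret;
}

/* One pass of a scanner thread; returns the number of scans done so far. */
static unsigned long scan_once(void)
{
	unsigned long n;

	pthread_mutex_lock(&thread_mutex);
	wait_while_offlining();
	if (run_flags & RUN_MERGE)
		scans++;
	n = scans;
	pthread_mutex_unlock(&thread_mutex);
	return n;
}
```

A thread calling scan_once() between going_offline() and
offline_finished() sleeps without holding the mutex, so no lock is held
across the two notifier calls.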

> 
> Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/ksm.c |   55 +++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> --- mmotm.orig/mm/ksm.c	2013-01-25 14:37:06.880206290 -0800
> +++ mmotm/mm/ksm.c	2013-01-25 14:38:53.984208836 -0800
> @@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
>  #define KSM_RUN_STOP	0
>  #define KSM_RUN_MERGE	1
>  #define KSM_RUN_UNMERGE	2
> -static unsigned int ksm_run = KSM_RUN_STOP;
> +#define KSM_RUN_OFFLINE	4
> +static unsigned long ksm_run = KSM_RUN_STOP;
> +static void wait_while_offlining(void);
> 
>  static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
>  static DEFINE_MUTEX(ksm_thread_mutex);
> @@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
> 
>  	while (!kthread_should_stop()) {
>  		mutex_lock(&ksm_thread_mutex);
> +		wait_while_offlining();
>  		if (ksmd_should_run())
>  			ksm_do_scan(ksm_thread_pages_to_scan);
>  		mutex_unlock(&ksm_thread_mutex);
> @@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
>  #endif /* CONFIG_MIGRATION */
> 
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> +static int just_wait(void *word)
> +{
> +	schedule();
> +	return 0;
> +}
> +
> +static void wait_while_offlining(void)
> +{
> +	while (ksm_run & KSM_RUN_OFFLINE) {
> +		mutex_unlock(&ksm_thread_mutex);
> +		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
> +				just_wait, TASK_UNINTERRUPTIBLE);
> +		mutex_lock(&ksm_thread_mutex);
> +	}
> +}
> +
>  static void ksm_check_stable_tree(unsigned long start_pfn,
>  				  unsigned long end_pfn)
>  {
> @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
>  	switch (action) {
>  	case MEM_GOING_OFFLINE:
>  		/*
> -		 * Keep it very simple for now: just lock out ksmd and
> -		 * MADV_UNMERGEABLE while any memory is going offline.
> -		 * mutex_lock_nested() is necessary because lockdep was alarmed
> -		 * that here we take ksm_thread_mutex inside notifier chain
> -		 * mutex, and later take notifier chain mutex inside
> -		 * ksm_thread_mutex to unlock it.   But that's safe because both
> -		 * are inside mem_hotplug_mutex.
> +		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
> +		 * and remove_all_stable_nodes() while memory is going offline:
> +		 * it is unsafe for them to touch the stable tree at this time.
> +		 * But unmerge_ksm_pages(), rmap lookups and other entry points
> +		 * which do not need the ksm_thread_mutex are all safe.
>  		 */
> -		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run |= KSM_RUN_OFFLINE;
> +		mutex_unlock(&ksm_thread_mutex);
>  		break;
> 
>  	case MEM_OFFLINE:
> @@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
>  		/* fallthrough */
> 
>  	case MEM_CANCEL_OFFLINE:
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run &= ~KSM_RUN_OFFLINE;
>  		mutex_unlock(&ksm_thread_mutex);
> +
> +		smp_mb();	/* wake_up_bit advises this */
> +		wake_up_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE));
>  		break;
>  	}
>  	return NOTIFY_OK;
>  }
> +#else
> +static void wait_while_offlining(void)
> +{
> +}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> 
>  #ifdef CONFIG_SYSFS
> @@ -2189,7 +2217,7 @@ KSM_ATTR(pages_to_scan);
>  static ssize_t run_show(struct kobject *kobj, struct kobj_attribute *attr,
>  			char *buf)
>  {
> -	return sprintf(buf, "%u\n", ksm_run);
> +	return sprintf(buf, "%lu\n", ksm_run);
>  }
> 
>  static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
> @@ -2212,6 +2240,7 @@ static ssize_t run_store(struct kobject
>  	 */
> 
>  	mutex_lock(&ksm_thread_mutex);
> +	wait_while_offlining();
>  	if (ksm_run != flags) {
>  		ksm_run = flags;
>  		if (flags & KSM_RUN_UNMERGE) {
> @@ -2254,6 +2283,7 @@ static ssize_t merge_across_nodes_store(
>  		return -EINVAL;
> 
>  	mutex_lock(&ksm_thread_mutex);
> +	wait_while_offlining();
>  	if (ksm_merge_across_nodes != knob) {
>  		if (ksm_pages_shared || remove_all_stable_nodes())
>  			err = -EBUSY;
> @@ -2366,10 +2396,7 @@ static int __init ksm_init(void)
>  #endif /* CONFIG_SYSFS */
> 
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -	/*
> -	 * Choose a high priority since the callback takes ksm_thread_mutex:
> -	 * later callbacks could only be taking locks which nest within that.
> -	 */
> +	/* There is no significance to this priority 100 */
>  	hotplug_memory_notifier(ksm_memory_callback, 100);
>  #endif
>  	return 0;
> 
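One detail in the patch that is easy to miss: wait_on_bit() and
wake_up_bit() are passed ilog2(KSM_RUN_OFFLINE), a bit number, not the
mask itself, and ksm_run becomes an unsigned long, presumably because the
kernel bit helpers work on unsigned long words. A small userspace sketch of
the mask-to-bit relationship, with mask_to_bit() and bit_is_set() as
hypothetical stand-ins for ilog2() and test_bit():

```c
#define RUN_MERGE	1UL
#define RUN_UNMERGE	2UL
#define RUN_OFFLINE	4UL	/* a single-bit mask, like KSM_RUN_OFFLINE */

/*
 * Stand-in for ilog2() on a power-of-two mask: index of the set bit.
 * Only meaningful for a non-zero mask.
 */
static unsigned int mask_to_bit(unsigned long mask)
{
	unsigned int bit = 0;

	while (mask >>= 1)
		bit++;
	return bit;
}

/* Stand-in for test_bit(): test one bit of an unsigned long word. */
static int bit_is_set(const unsigned long *word, unsigned int bit)
{
	return (int)((*word >> bit) & 1UL);
}
```

With these definitions, mask_to_bit(RUN_OFFLINE) is 2, so the waiter and
the waker both refer to bit 2 of the same word.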

