linux-kernel.vger.kernel.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	Michal Nazarewicz <mina86@mina86.com>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 1/1] mm: make start_isolate_page_range() fail if already isolated
Date: Fri, 2 Mar 2018 16:56:14 -0800
Message-ID: <20180302165614.edb17a020964e9ea2f1797ca@linux-foundation.org>
In-Reply-To: <3887b37d-2bc0-1eff-9aec-6a99cc0715fb@oracle.com>

On Fri, 2 Mar 2018 16:38:33 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> On 03/02/2018 04:06 PM, Andrew Morton wrote:
> > On Mon, 26 Feb 2018 11:10:54 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > 
> >> start_isolate_page_range() is used to set the migrate type of a
> >> set of page blocks to MIGRATE_ISOLATE while attempting to start
> >> a migration operation.  It assumes that only one thread is
> >> calling it for the specified range.  This routine is used by
> >> CMA, memory hotplug and gigantic huge pages.  Each of these users
> >> synchronizes access to the range within its subsystem.  However,
> >> two subsystems (CMA and gigantic huge pages for example) could
> >> attempt operations on the same range.  If this happens, page
> >> blocks may be incorrectly left marked as MIGRATE_ISOLATE and
> >> therefore not available for page allocation.
> >>
> >> Without 'locking code' there is no easy way to synchronize access
> >> to the range of page blocks passed to start_isolate_page_range.
> >> However, if two threads are working on the same set of page blocks
> >> one will stumble upon blocks set to MIGRATE_ISOLATE by the other.
> >> In such conditions, make the thread noticing MIGRATE_ISOLATE
> >> clean up as normal and return -EBUSY to the caller.
> >>
> >> This will allow start_isolate_page_range to serve as a
> >> synchronization mechanism and will allow for more general use
> >> of callers making use of these interfaces.  So, update comments
> >> in alloc_contig_range to reflect this new functionality.
> >>
> >> ...
> >>
> >> --- a/mm/page_isolation.c
> >> +++ b/mm/page_isolation.c
> >> @@ -28,6 +28,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
> >>  
> >>  	spin_lock_irqsave(&zone->lock, flags);
> >>  
> >> +	/*
> >> +	 * We assume we are the only ones trying to isolate this block.
> >> +	 * If MIGRATE_ISOLATE already set, return -EBUSY
> >> +	 */
> >> +	if (is_migrate_isolate_page(page))
> >> +		goto out;
> >> +
> >>  	pfn = page_to_pfn(page);
> >>  	arg.start_pfn = pfn;
> >>  	arg.nr_pages = pageblock_nr_pages;
> > 
> > Seems a bit ugly and I'm not sure that it's correct.  If the loop in
> > start_isolate_page_range() gets partway through a number of pages and
> > then hits the race, won't start_isolate_page_range() go and "undo" the
> > work being done by the thread which it is racing against?
> 
> I agree that it is a bit ugly.  However, when a thread hits the above
> condition it will only undo what it has done, because only one thread
> is able to set the migrate state to isolate (under the zone lock).

I don't get it.  That would make sense if start_isolate_page_range()
held zone->lock across the entire loop, but it doesn't do that.
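
For illustration only (not part of the patch): the rollback rule described
in the quoted reply can be modelled in userspace.  This is a minimal
sketch, assuming a hypothetical array isolated[] in place of the
per-pageblock migratetype, and hypothetical helpers set_isolate(),
unset_isolate() and isolate_range().  It is single-threaded, so it shows
only which blocks a failing caller would undo, not what happens when two
callers interleave between per-block lock acquisitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOCKS 8

/* Hypothetical stand-in for the per-pageblock MIGRATE_ISOLATE state. */
bool isolated[NR_BLOCKS];

/*
 * Models the patched per-block step: the kernel does the check and the
 * set under zone->lock, one block at a time.  Here there is no lock at
 * all; the function is just a test-and-set on the array.
 */
int set_isolate(size_t blk)
{
	assert(blk < NR_BLOCKS);
	if (isolated[blk])
		return -EBUSY;	/* somebody else already isolated it */
	isolated[blk] = true;
	return 0;
}

void unset_isolate(size_t blk)
{
	assert(blk < NR_BLOCKS);
	isolated[blk] = false;
}

/*
 * Isolate blocks [start, end).  On failure, undo only the blocks this
 * call managed to set, i.e. [start, failing block).
 */
int isolate_range(size_t start, size_t end)
{
	size_t blk;
	int ret;

	for (blk = start; blk < end; blk++) {
		ret = set_isolate(blk);
		if (ret)
			goto undo;
	}
	return 0;

undo:
	while (blk > start)
		unset_isolate(--blk);
	return ret;
}
```

With blocks 4..7 already isolated by one caller, a second call to
isolate_range(2, 6) sets blocks 2 and 3, fails on block 4, and clears
only 2 and 3 again.  Whether the same holds for every interleaving of
two real threads is exactly the question raised above.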

> The exact problem of one thread undoing what another thread has done
> is possible with the code today and is what this patch is attempting
> to address.
> 
> > Even if that can't happen, blundering through a whole bunch of pages
> > then saying whoops then undoing everything is unpleasing.
> > 
> > Should we be looking at preventing these races at a higher level?
> 
> I could not immediately come up with a good idea here.  The zone lock
> would be the obvious choice, but I don't think we want to hold it while
> examining each of the page blocks.  Perhaps a new lock or semaphore
> associated with the zone?  I'm open to suggestions.

Yes, I think it would need a new lock.  Hopefully a mutex.
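
As a rough userspace sketch of that idea (an assumption about its shape,
not a proposed implementation): a sleeping lock can be held across the
whole range walk, which a spinlock taken with interrupts disabled should
not be.  With the lock held for the full walk, the range can be checked
before anything is modified, so no undo pass is needed.  The names
block_isolated[], isolate_mutex, isolate_range_locked() and
unisolate_range_locked() are hypothetical, a pthread mutex stands in for
a kernel mutex, and the other reasons a real pageblock can fail to
isolate are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOCKS 8

/* Hypothetical stand-in for the per-pageblock MIGRATE_ISOLATE state. */
bool block_isolated[NR_BLOCKS];

/* Hypothetical lock serializing all range isolation in this "zone". */
pthread_mutex_t isolate_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Isolate blocks [start, end) atomically with respect to other callers.
 * Returns 0 on success, or -EBUSY without changing any state if any
 * block in the range is already isolated.
 */
int isolate_range_locked(size_t start, size_t end)
{
	size_t blk;
	int ret = 0;

	assert(start <= end && end <= NR_BLOCKS);

	pthread_mutex_lock(&isolate_mutex);

	/* First pass: look for conflicts before touching anything. */
	for (blk = start; blk < end; blk++) {
		if (block_isolated[blk]) {
			ret = -EBUSY;
			goto out;
		}
	}

	/* Second pass: the range is free, so claim all of it. */
	for (blk = start; blk < end; blk++)
		block_isolated[blk] = true;
out:
	pthread_mutex_unlock(&isolate_mutex);
	return ret;
}

/* Release blocks [start, end) previously claimed by the caller. */
void unisolate_range_locked(size_t start, size_t end)
{
	size_t blk;

	assert(start <= end && end <= NR_BLOCKS);

	pthread_mutex_lock(&isolate_mutex);
	for (blk = start; blk < end; blk++)
		block_isolated[blk] = false;
	pthread_mutex_unlock(&isolate_mutex);
}
```

The cost of this design is that the two passes walk the range twice and
that every caller in the zone serializes on one lock, even when the
ranges do not overlap.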


Thread overview: 11+ messages
2018-02-26 19:10 [PATCH 0/1] make start_isolate_page_range() thread safe Mike Kravetz
2018-02-26 19:10 ` [PATCH 1/1] mm: make start_isolate_page_range() fail if already isolated Mike Kravetz
2018-03-03  0:06   ` Andrew Morton
2018-03-03  0:38     ` Mike Kravetz
2018-03-03  0:56       ` Andrew Morton [this message]
2018-03-03  1:39         ` Mike Kravetz
2018-03-06  0:57           ` Mike Kravetz
2018-03-06 22:32             ` Andrew Morton
2018-03-09 22:47 ` [PATCH v2] " Mike Kravetz
2018-03-13 21:14   ` Andrew Morton
2018-03-13 21:27     ` Mike Kravetz
