linux-kernel.vger.kernel.org archive mirror
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Luiz Capitulino <lcapitulino@redhat.com>,
	Michal Nazarewicz <mina86@mina86.com>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2] mm: make start_isolate_page_range() fail if already isolated
Date: Tue, 13 Mar 2018 14:27:48 -0700	[thread overview]
Message-ID: <182227bf-3fa1-e8dd-1bd2-b2530f5bd0e8@oracle.com> (raw)
In-Reply-To: <20180313141454.f3ad61c301c299ab6f81aae0@linux-foundation.org>

On 03/13/2018 02:14 PM, Andrew Morton wrote:
> On Fri,  9 Mar 2018 14:47:31 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> start_isolate_page_range() is used to set the migrate type of a
>> set of pageblocks to MIGRATE_ISOLATE while attempting to start
>> a migration operation.  It assumes that only one thread is
>> calling it for the specified range.  This routine is used by
>> CMA, memory hotplug, and gigantic huge pages.  Each of these users
>> synchronizes access to the range within its own subsystem.  However,
>> two subsystems (CMA and gigantic huge pages for example) could
>> attempt operations on the same range.  If this happens, one thread
>> may 'undo' the work another thread is doing.  This can result in
>> pageblocks being incorrectly left marked as MIGRATE_ISOLATE and
>> therefore not available for page allocation.
>>
>> What is ideally needed is a way to synchronize access to a set
>> of pageblocks that are undergoing isolation and migration.  The
>> only thing we know about these pageblocks is that they are all
>> in the same zone.  A per-node mutex is too coarse as we want to
>> allow multiple operations on different ranges within the same zone
>> concurrently.  Instead, we will use the migration type of the
>> pageblocks themselves as a form of synchronization.
>>
>> start_isolate_page_range sets the migration type on a set of
>> pageblocks, going in order from the one associated with the smallest
>> pfn to the largest pfn.  The zone lock is acquired to check and
>> set the migration type.  When going through the list of pageblocks,
>> check if MIGRATE_ISOLATE is already set.  If so, another thread is
>> already working on this pageblock.  We know exactly which pageblocks
>> we set, so clean up by undoing those and return -EBUSY.
>>
>> This allows start_isolate_page_range to serve as a synchronization
>> mechanism and allows callers to make more general use of these
>> interfaces.  Update the comments in alloc_contig_range to reflect
>> this new functionality.
>>
>> ...
>>
>> + * There is no high level synchronization mechanism that prevents two threads
>> + * from trying to isolate overlapping ranges.  If this happens, one thread
>> + * will notice pageblocks in the overlapping range already set to isolate.
>> + * This happens in set_migratetype_isolate, and set_migratetype_isolate
>> + * returns an error.  We then clean up by restoring the migration type on
>> pageblocks we may have modified and return -EBUSY to the caller.  This
>> + * prevents two threads from simultaneously working on overlapping ranges.
>>   */
> 
> Well I can kinda visualize how this works, with two CPUs chewing away
> at two overlapping blocks of pfns, possibly with different starting
> pfns.  And I can't immediately see any holes in it, apart from possible
> memory ordering issues.  What guarantee is there that CPU1 will see
> CPU2's writes in the order in which CPU2 performed them?  And what
> guarantee is there that CPU1 will see CPU2's writes in a sequential
> manner?  If four of CPU2's writes get written back in a single atomic
> flush, what will CPU1 make of that?
> 

Each CPU holds the associated zone lock to modify or examine the migration
type of a pageblock.  And, it will only examine/update a single pageblock
per lock acquire/release cycle.

-- 
Mike Kravetz


Thread overview: 11+ messages
2018-02-26 19:10 [PATCH 0/1] make start_isolate_page_range() thread safe Mike Kravetz
2018-02-26 19:10 ` [PATCH 1/1] mm: make start_isolate_page_range() fail if already isolated Mike Kravetz
2018-03-03  0:06   ` Andrew Morton
2018-03-03  0:38     ` Mike Kravetz
2018-03-03  0:56       ` Andrew Morton
2018-03-03  1:39         ` Mike Kravetz
2018-03-06  0:57           ` Mike Kravetz
2018-03-06 22:32             ` Andrew Morton
2018-03-09 22:47 ` [PATCH v2] " Mike Kravetz
2018-03-13 21:14   ` Andrew Morton
2018-03-13 21:27     ` Mike Kravetz [this message]
