All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@csn.ul.ie>, Wu Fengguang <fengguang.wu@intel.com>,
	"Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/9] Hugepage migration (v2)
Date: Tue, 17 Aug 2010 17:18:17 +0900	[thread overview]
Message-ID: <20100817081817.GA28969@spritzera.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20100817023719.GC12736@spritzera.linux.bs1.fc.nec.co.jp>

On Tue, Aug 17, 2010 at 11:37:19AM +0900, Naoya Horiguchi wrote:
> On Mon, Aug 16, 2010 at 07:19:58AM -0500, Christoph Lameter wrote:
> > On Mon, 16 Aug 2010, Naoya Horiguchi wrote:
> > 
> > > In my understanding, in current code "other processors increasing refcount
> > > during migration" can happen both in non-hugepage direct I/O and in hugepage
> > > direct I/O in the similar way (i.e. get_user_pages_fast() from dio_refill_pages()).
> > > So I think there is no specific problem to hugepage.
> > > Or am I missing your point?
> > 
> > With a single page there is the check of the refcount during migration
> > after all the references have been removed (at that point the page is no
> > longer mapped by any process and direct iO can no longer be
> > initiated without a page fault.
> 
> The same checking mechanism works for hugeapge.

So, my previous comment below was not correct:

>>> This patch only handles migration under direct I/O.
>>> For the opposite (direct I/O under migration) it's not true.
>>> I wrote additional patches (later I'll reply to this email)
>>> for solving locking problem. Could you review them?

The hugepage migration patchset should work fine without the
additional page locking patch.
Please ignore the additional page locking patch-set
and review the hugepage migration patch-set only.
Sorry for confusion.

I explain below why the page lock in direct I/O is not needed to avoid
race with migration. This is true for both hugepage and non-huge page.

Race between page migration and direct I/O is in essense the one between
try_to_unmap() in unmap_and_move() and get_user_pages_fast() in dio_get_page().

When try_to_unmap() is called before get_user_pages_fast(),
all ptes pointing to the page to be migrated are replaced to migration
swap entries, so direct I/O code experiences page fault.
In the page fault, the kernel finds migration swap entry and waits the page lock
(which was held by migration code before try_to_unmap()) to be unlocked
in migration_entry_wait(), so direct I/O blocks until migration completes.

When get_user_pages_fast() is called before try_to_unmap(),
direct I/O code increments refcount on the target page.
Because this refcount is not associated to the mapping,
migration code will find remaining refcounts after try_to_unmap()
unmaps all mappings. Then refcount check decides migration to fail,
so direct I/O is continued safely.

Thanks,
Naoya Horiguchi


WARNING: multiple messages have this Message-ID (diff)
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mel@csn.ul.ie>, Wu Fengguang <fengguang.wu@intel.com>,
	Jun'ichi Nomura <j-nomura@ce.jp.nec.com>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/9] Hugepage migration (v2)
Date: Tue, 17 Aug 2010 17:18:17 +0900	[thread overview]
Message-ID: <20100817081817.GA28969@spritzera.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20100817023719.GC12736@spritzera.linux.bs1.fc.nec.co.jp>

On Tue, Aug 17, 2010 at 11:37:19AM +0900, Naoya Horiguchi wrote:
> On Mon, Aug 16, 2010 at 07:19:58AM -0500, Christoph Lameter wrote:
> > On Mon, 16 Aug 2010, Naoya Horiguchi wrote:
> > 
> > > In my understanding, in current code "other processors increasing refcount
> > > during migration" can happen both in non-hugepage direct I/O and in hugepage
> > > direct I/O in the similar way (i.e. get_user_pages_fast() from dio_refill_pages()).
> > > So I think there is no specific problem to hugepage.
> > > Or am I missing your point?
> > 
> > With a single page there is the check of the refcount during migration
> > after all the references have been removed (at that point the page is no
> > longer mapped by any process and direct iO can no longer be
> > initiated without a page fault.
> 
> The same checking mechanism works for hugeapge.

So, my previous comment below was not correct:

>>> This patch only handles migration under direct I/O.
>>> For the opposite (direct I/O under migration) it's not true.
>>> I wrote additional patches (later I'll reply to this email)
>>> for solving locking problem. Could you review them?

The hugepage migration patchset should work fine without the
additional page locking patch.
Please ignore the additional page locking patch-set
and review the hugepage migration patch-set only.
Sorry for confusion.

I explain below why the page lock in direct I/O is not needed to avoid
race with migration. This is true for both hugepage and non-huge page.

Race between page migration and direct I/O is in essense the one between
try_to_unmap() in unmap_and_move() and get_user_pages_fast() in dio_get_page().

When try_to_unmap() is called before get_user_pages_fast(),
all ptes pointing to the page to be migrated are replaced to migration
swap entries, so direct I/O code experiences page fault.
In the page fault, the kernel finds migration swap entry and waits the page lock
(which was held by migration code before try_to_unmap()) to be unlocked
in migration_entry_wait(), so direct I/O blocks until migration completes.

When get_user_pages_fast() is called before try_to_unmap(),
direct I/O code increments refcount on the target page.
Because this refcount is not associated to the mapping,
migration code will find remaining refcounts after try_to_unmap()
unmaps all mappings. Then refcount check decides migration to fail,
so direct I/O is continued safely.

Thanks,
Naoya Horiguchi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-08-17  8:20 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-10  9:27 [PATCH 0/9] Hugepage migration (v2) Naoya Horiguchi
2010-08-10  9:27 ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 1/9] HWPOISON, hugetlb: move PG_HWPoison bit check Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-18  0:18   ` Wu Fengguang
2010-08-18  0:18     ` Wu Fengguang
2010-08-19  7:55     ` Naoya Horiguchi
2010-08-19  7:55       ` Naoya Horiguchi
2010-08-19  9:28       ` Wu Fengguang
2010-08-19  9:28         ` Wu Fengguang
2010-08-23  9:24         ` Naoya Horiguchi
2010-08-23  9:24           ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 2/9] hugetlb: add allocate function for hugepage migration Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-17  6:51   ` David Rientjes
2010-08-17  6:51     ` David Rientjes
2010-08-18  3:02     ` Naoya Horiguchi
2010-08-18  3:02       ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 3/9] hugetlb: rename hugepage allocation functions Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 4/9] hugetlb: redefine hugepage copy functions Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 5/9] hugetlb: hugepage migration core Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 6/9] HWPOISON, hugetlb: soft offlining for hugepage Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 7/9] HWPOISON, hugetlb: fix unpoison " Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-10  9:27 ` [PATCH 8/9] page-types.c: fix name of unpoison interface Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-19  1:24   ` Wu Fengguang
2010-08-19  1:24     ` Wu Fengguang
2010-08-10  9:27 ` [PATCH 9/9] hugetlb: add corrupted hugepage counter Naoya Horiguchi
2010-08-10  9:27   ` Naoya Horiguchi
2010-08-19  1:57   ` Wu Fengguang
2010-08-19  1:57     ` Wu Fengguang
2010-08-24  3:01     ` Naoya Horiguchi
2010-08-24  3:01       ` Naoya Horiguchi
2010-08-24  3:08       ` Wu Fengguang
2010-08-24  3:08         ` Wu Fengguang
2010-08-11 13:09 ` [PATCH 0/9] Hugepage migration (v2) Christoph Lameter
2010-08-11 13:09   ` Christoph Lameter
2010-08-12  7:53   ` Naoya Horiguchi
2010-08-12  7:53     ` Naoya Horiguchi
2010-08-12  7:57     ` [RFC] [PATCH 1/4] hugetlb: prepare exclusion control functions for hugepage Naoya Horiguchi
2010-08-12  7:57       ` Naoya Horiguchi
2010-08-12  7:59     ` [RFC] [PATCH 2/4] dio: add page locking for direct I/O Naoya Horiguchi
2010-08-12  7:59       ` Naoya Horiguchi
2010-08-12 13:42       ` Jeff Moyer
2010-08-12 13:42         ` Jeff Moyer
2010-08-16  2:07         ` Naoya Horiguchi
2010-08-16  2:07           ` Naoya Horiguchi
2010-08-16  7:21           ` Andi Kleen
2010-08-16  7:21             ` Andi Kleen
2010-08-16 13:20           ` Jeff Moyer
2010-08-16 13:20             ` Jeff Moyer
2010-08-17  8:17             ` Naoya Horiguchi
2010-08-17  8:17               ` Naoya Horiguchi
2010-08-17 13:46               ` Jeff Moyer
2010-08-17 13:46                 ` Jeff Moyer
2010-08-17 14:21                 ` Andi Kleen
2010-08-17 14:21                   ` Andi Kleen
2010-08-17 16:41                   ` Christoph Lameter
2010-08-17 16:41                     ` Christoph Lameter
2010-08-12  8:00     ` [PATCH 3/4] HWPOISON: replace locking functions into hugepage variants Naoya Horiguchi
2010-08-12  8:00       ` Naoya Horiguchi
2010-08-12  8:00     ` [PATCH 4/4] correct locking functions of hugepage migration routine Naoya Horiguchi
2010-08-12  8:00       ` Naoya Horiguchi
2010-08-13 12:47     ` [PATCH 0/9] Hugepage migration (v2) Christoph Lameter
2010-08-13 12:47       ` Christoph Lameter
2010-08-16  9:19       ` Naoya Horiguchi
2010-08-16  9:19         ` Naoya Horiguchi
2010-08-16 12:19         ` Christoph Lameter
2010-08-16 12:19           ` Christoph Lameter
2010-08-17  2:37           ` Naoya Horiguchi
2010-08-17  2:37             ` Naoya Horiguchi
2010-08-17  8:18             ` Naoya Horiguchi [this message]
2010-08-17  8:18               ` Naoya Horiguchi
2010-08-17  9:40               ` Andi Kleen
2010-08-17  9:40                 ` Andi Kleen
2010-08-18  7:32                 ` Naoya Horiguchi
2010-08-18  7:32                   ` Naoya Horiguchi
2010-08-18  7:46                   ` Andi Kleen
2010-08-18  7:46                     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100817081817.GA28969@spritzera.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=cl@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.