From: Davidlohr Bueso <davidlohr.bueso@hp.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	Michal Hocko <mhocko@suse.cz>,
	"AneeshKumarK.V" <aneesh.kumar@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Eric B Munson <emunson@mgebm.net>,
	Anton Blanchard <anton@samba.org>
Subject: Re: [PATCH] hugepage: allow parallelization of the hugepage fault path
Date: Fri, 19 Jul 2013 14:24:15 -0700	[thread overview]
Message-ID: <1374269055.9305.19.camel@buesod1.americas.hpqcorp.net> (raw)
In-Reply-To: <20130719071432.GB19634@voom.fritz.box>

On Fri, 2013-07-19 at 17:14 +1000, David Gibson wrote:
> On Thu, Jul 18, 2013 at 05:42:35PM +0900, Joonsoo Kim wrote:
> > On Wed, Jul 17, 2013 at 12:50:25PM -0700, Davidlohr Bueso wrote:
> > > From: David Gibson <david@gibson.dropbear.id.au>
> > > 
> > > At present, the page fault path for hugepages is serialized by a
> > > single mutex. This is used to avoid spurious out-of-memory conditions
> > > when the hugepage pool is fully utilized (two processes or threads can
> > > race to instantiate the same mapping with the last hugepage from the
> > > pool, the race loser returning VM_FAULT_OOM).  This problem is
> > > specific to hugepages, because it is normal to want to use every
> > > single hugepage in the system - with normal pages we simply assume
> > > there will always be a few spare pages which can be used temporarily
> > > until the race is resolved.
> > > 
> > > Unfortunately this serialization also means that clearing of hugepages
> > > cannot be parallelized across multiple CPUs, which can lead to very
> > > long process startup times when using large numbers of hugepages.
> > > 
> > > This patch improves the situation by replacing the single mutex with a
> > > table of mutexes, selected based on a hash of the page being
> > > instantiated. For shared mappings, the hash key is derived from the
> > > address space and file offset being faulted; for private mappings, the
> > > mm and virtual address are used instead.
> > > 
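
(To make the hashing scheme above concrete, here is a rough sketch of the
idea - not the patch itself. The table size, the names and the hash mix
below are invented purely for illustration, assuming the usual mm/hugetlb.c
includes plus <linux/hash.h> and <linux/log2.h>:)

#define NUM_FAULT_MUTEXES	64	/* illustrative table size */
static struct mutex fault_mutex_table[NUM_FAULT_MUTEXES];

/*
 * Shared mappings hash on (address_space, file offset), private
 * mappings on (mm, virtual address), so concurrent faults on
 * different pages usually end up grabbing different mutexes.
 */
static u32 fault_mutex_hash(struct vm_area_struct *vma,
			    struct address_space *mapping,
			    pgoff_t idx, unsigned long address)
{
	unsigned long key;

	if (vma->vm_flags & VM_SHARED)
		key = (unsigned long)mapping ^ idx;
	else
		key = (unsigned long)vma->vm_mm ^ (address >> HPAGE_SHIFT);

	return hash_long(key, ilog2(NUM_FAULT_MUTEXES));
}

/*
 * The fault path would then do roughly:
 *
 *	hash = fault_mutex_hash(vma, mapping, idx, address);
 *	mutex_lock(&fault_mutex_table[hash]);
 *	... instantiate and clear the page ...
 *	mutex_unlock(&fault_mutex_table[hash]);
 */
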
> > 
> > Hello.
> > 
> > With this table of mutexes, we cannot protect the region tracking
> > structure. See the comment below.
> > 
> > /*
> >  * Region tracking -- allows tracking of reservations and instantiated pages
> >  *                    across the pages in a mapping.
> >  *
> >  * The region data structures are protected by a combination of the mmap_sem
> >  * and the hugetlb_instantiation_mutex.  To access or modify a region the caller
> >  * must either hold the mmap_sem for write, or the mmap_sem for read and
> >  * the hugetlb_instantiation mutex:
> >  *
> >  *      down_write(&mm->mmap_sem);
> >  * or
> >  *      down_read(&mm->mmap_sem);
> >  *      mutex_lock(&hugetlb_instantiation_mutex);
> >  */
> 
> Ugh.  Who the hell added that?  I guess you'll need to split off
> another mutex for that purpose; afaict there should be no interaction
> with the actual, intended purpose of the instantiation mutex.

This was added in commit 84afd99b. One way to go would be to add a
spinlock to protect changes to the regions (rough sketch below) - however,
reading that changelog, and based on David's previous explanation of the
instantiation mutex, I don't see why that locking requirement was added in
the first place. In fact, several places already modify regions without
holding the instantiation mutex, e.g. hugetlb_reserve_pages().

Am I missing something here?
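
Just to illustrate what I mean by protecting the regions with a spinlock,
a minimal sketch (region_add() is the existing helper in mm/hugetlb.c; the
lock itself and the wrapper name are made up, and region_chg(), which
allocates, would need more care than this):

/* would sit next to the region helpers in mm/hugetlb.c */
static DEFINE_SPINLOCK(region_lock);	/* hypothetical lock */

static long region_add_locked(struct list_head *head, long f, long t)
{
	long ret;

	spin_lock(&region_lock);
	ret = region_add(head, f, t);	/* merges entries, does not sleep */
	spin_unlock(&region_lock);
	return ret;
}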

Thanks,
Davidlohr

Thread overview: 41+ messages
2013-07-12 23:28 [PATCH] mm/hugetlb: per-vma instantiation mutexes Davidlohr Bueso
2013-07-13  0:54 ` Hugh Dickins
2013-07-15  3:16   ` Davidlohr Bueso
2013-07-15  7:24     ` David Gibson
2013-07-15 23:08       ` Andrew Morton
2013-07-16  0:12         ` Davidlohr Bueso
2013-07-16  8:00           ` David Gibson
2013-07-17 19:50         ` [PATCH] hugepage: allow parallelization of the hugepage fault path Davidlohr Bueso
2013-07-18  8:42           ` Joonsoo Kim
2013-07-19  7:14             ` David Gibson
2013-07-19 21:24               ` Davidlohr Bueso [this message]
2013-07-22  0:59                 ` Joonsoo Kim
2013-07-18  9:07           ` Joonsoo Kim
2013-07-19  0:19             ` Davidlohr Bueso
2013-07-19  0:35               ` Davidlohr Bueso
2013-07-23  7:04             ` Hush Bensen
2013-07-23  6:55           ` Hush Bensen
2013-07-16  1:51       ` [PATCH] mm/hugetlb: per-vma instantiation mutexes Rik van Riel
2013-07-16  5:34         ` Joonsoo Kim
2013-07-16 10:01           ` David Gibson
2013-07-18  6:50             ` Joonsoo Kim
2013-07-16  8:20         ` David Gibson
2013-07-15  4:18 ` Konstantin Khlebnikov
