linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Michel Lespinasse <walken@google.com>
Cc: Zhouping Liu <zliu@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	hughd@google.com, Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH] mm: thp: Acquire the anon_vma rwsem for lock during split
Date: Mon, 7 Jan 2013 15:08:09 +0000	[thread overview]
Message-ID: <20130107150809.GJ3885@suse.de> (raw)
In-Reply-To: <CANN689E8S5mmszQoeaYgL_SYe1piBDTWCk-Gy1kxcg6hPfUPwA@mail.gmail.com>

On Fri, Jan 04, 2013 at 05:32:17PM -0800, Michel Lespinasse wrote:
> On Fri, Jan 4, 2013 at 6:08 AM, Mel Gorman <mgorman@suse.de> wrote:
> > Despite the reason for these commits, NUMA balancing is not the direct
> > source of the problem. split_huge_page() expected the anon_vma lock to be
> > exclusive to serialise the whole split operation. Ordinarily it is expected
> > that the anon_vma lock would only be required when updating the avcs but
> > THP also uses it. The locking requirements for THP are complex and there
> > is some overlap but broadly speaking they include the following
> >
> > 1. mmap_sem for read or write prevents THPs being created underneath
> > 2. anon_vma is taken for write if collapsing a huge page
> > 3. mm->page_table_lock should be taken when checking if pmd_trans_huge as
> >    split_huge_page can run in parallel
> > 4. wait_split_huge_page uses anon_vma taken for write mode to serialise
> >    against other THP operations
> > 5. compound_lock is used to serialise between
> >    __split_huge_page_refcount() and gup
> >
> > split_huge_page takes anon_vma for read but that does not serialise against
> > parallel split_huge_page operations on the same page (rule 2). One process
> > could be modifying the ref counts while the other modifies the page tables
> > leading to counters not being reliable. This patch takes the anon_vma
> > lock for write to serialise against parallel split_huge_page and parallel
> > collapse operations as it is the most fine-grained lock available that
> > protects against both.
> 
> Your comment about this being the most fine-grained lock made me
> think, couldn't we use lock_page() on the THP page here ?
> 
> Now I don't necessarily want to push you that direction, because I
> haven't fully thought it trough and because what you propose brings us
> closer to what happened before anon_vma became an rwlock, which is
> more obviously safe. But I felt I should still mention it, since we're
> really only trying to protect from concurrent operations on the same
> THP page, so locking at just that granularity would seem desirable.
> 

I considered this too because anon_vma locking is really coarse. The
coarse nature is not the only issue as depending on the anon_vma lock is
yet another obstacle to using THP for file-backed pages. I also did not
think about it fully but it felt that trying to convert to the page lock
would be problematic. Take reclaim;

shrink_page_list (trylock_page so we hold the page lock)
  -> add_to_swap
    -> split_huge_page (lock_page)

BANG, we recursively lock. During collapse it's also problematic because we
use the anon_vma lock to protect the whole operation but still depend on
the page lock to protect against LRU modifications. We take the page
lock before isolating the page from the LRU otherwise we'd have to
disable IRQs each time and that would ugly.

I did not think converting to the page lock woul dbe impossible but it
it non-trivial and it's not better if it means we have to take the LRU
lock frequently in khugepaged when collapsing to a THP.

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2013-01-07 15:08 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1621091901.34838094.1356409676820.JavaMail.root@redhat.com>
2012-12-25  4:38 ` kernel BUG at mm/huge_memory.c:1798! Zhouping Liu
2012-12-25 12:05   ` Hillf Danton
2012-12-26  2:55     ` Zhouping Liu
2012-12-27  0:31     ` Alexander Beregalov
2012-12-27 12:12       ` Hillf Danton
2012-12-27 16:08     ` Alex Xu
2012-12-29  7:22       ` Hillf Danton
2012-12-26 11:22   ` BUG: unable to handle kernel NULL pointer dereference at 0000000000000500 Zhouping Liu
2012-12-26 12:01     ` Hillf Danton
2012-12-26 13:24       ` Zhouping Liu
2012-12-26 15:14         ` Hillf Danton
2012-12-26 14:59     ` Zlatko Calusic
2012-12-27 14:58     ` Zlatko Calusic
2012-12-27 23:55       ` David R. Piegdon
2012-12-28  0:09         ` Zlatko Calusic
2012-12-28  2:45       ` Zhouping Liu
2012-12-28  2:48         ` Zhouping Liu
2012-12-28  9:01         ` Zhouping Liu
2012-12-28 13:43           ` Zlatko Calusic
2012-12-28 12:57         ` Zlatko Calusic
2013-01-03 17:57   ` kernel BUG at mm/huge_memory.c:1798! Mel Gorman
2013-01-04 14:08     ` [PATCH] mm: thp: Acquire the anon_vma rwsem for lock during split Mel Gorman
2013-01-04 21:28       ` Hugh Dickins
2013-01-07 14:36         ` Mel Gorman
2013-01-07 14:39         ` [PATCH] mm: thp: Acquire the anon_vma rwsem for write " Mel Gorman
2013-01-05  1:32       ` [PATCH] mm: thp: Acquire the anon_vma rwsem for lock " Michel Lespinasse
2013-01-05 12:24         ` Simon Jeons
2013-01-07 15:09           ` Mel Gorman
2013-01-07 15:08         ` Mel Gorman [this message]
2013-01-05  5:51       ` Zhouping Liu
2013-01-07 14:38         ` Mel Gorman
2013-01-05 12:21       ` Simon Jeons
2013-01-04 16:58     ` kernel BUG at mm/huge_memory.c:1798! Zhouping Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130107150809.GJ3885@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=hughd@google.com \
    --cc=jweiner@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=riel@redhat.com \
    --cc=walken@google.com \
    --cc=zliu@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).