All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michel Lespinasse <walken@google.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <jweiner@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Shaohua Li <shaohua.li@intel.com>
Subject: Re: [RFC PATCH 0/3] page count lock for simpler put_page
Date: Fri, 12 Aug 2011 15:35:25 -0700	[thread overview]
Message-ID: <CANN689HQjNUDHWXn9PuvHxP0A-6_ypsW=jdt=UvnMr8M0xm-WA@mail.gmail.com> (raw)
In-Reply-To: <CAEwNFnA3o_9SJZEeSg5azO45-E3HhEcSnww0GarN5m6n-qL_Vw@mail.gmail.com>

On Tue, Aug 9, 2011 at 3:22 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Tue, Aug 9, 2011 at 8:04 PM, Michel Lespinasse <walken@google.com> wrote:
>> On Sun, Aug 7, 2011 at 7:25 AM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>> On Thu, Aug 04, 2011 at 11:39:19PM -0700, Michel Lespinasse wrote:
>>>> I can see that the page_cache_get_speculative comment in
>>>> include/linux/pagemap.h maps out one way to prevent the issue. If
>>>> thread T continually held an rcu read lock from the time it finds the
>>>> pointer to P until the time it calls get_page_unless_zero on that
>>>> page, AND there was a synchronize_rcu() call somewhere between the
>>>> time a THP page gets allocated and the time __split_huge_page_refcount
>>>> might first get called on that page, then things would be safe.
>>>> However, that does not seem to be true today: I could not find a
>>>> synchronize_rcu() call before __split_huge_page_refcount(), AND there
>>>> are also places (such as deactivate_page() for example) that call
>>>> get_page_unless_zero without being within an rcu read locked section
>>>> (or holding the zone lru lock to provide exclusion against
>>>> __split_huge_page_refcount).
>>
>> Going forward, I can see several possible solutions:
>> - Use my proposed page count lock in order to avoid the race. One
>> would have to convert all get_page_unless_zero() sites to use it. I
>> expect the cost would be low but still measurable.
>
> It's not necessary to apply it on *all* get_page_unless_zero sites.
> Because deactivate_page does it on file pages while THP handles only anon pages.
> So the race should not a problem.

But it doesn't matter what kind of page the get_page_unless_zero call
site hopes to get a reference on - if it doesn't already hold a
reference on the page (either directly as a reference, or if a known
mapping points to that page and the page table lock is taken or
interrupts are disabled in order to guarantee the mapping won't get
yanked), then the page can get yanked and a THP page could show up
there before the call site gets a reference.

>> - Protect all get_page_unless_zero call sites with rcu read lock or
>> lru lock (page_cache_get_speculative already has it, but there are
>> others to consider), and add a synchronize_rcu() before splitting huge
>> pages.
>
> I think it can't be a solution.
> If we don't have any lock for protect write-side, page_count could be
> unstable again while we peek page->count in
> __split_huge_page_refcount after calling synchronize_rcu.
> Do I miss something?

The tail page count would be unstable for at most one rcu grace period
after the page got allocated. This is guaranteed by making all
get_page_unless_zero call sites make sure they somehow determine the
page is not a THP tail page (for example because they found it in
radix tree) before calling get_page_unless_zero and having an rcu read
lock wrapping these two together. This is basically the protocol
described in the comment for page_cache_get_speculative() in pagemap.h

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-08-12 22:35 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-04 21:07 [RFC PATCH 0/3] page count lock for simpler put_page Michel Lespinasse
2011-08-04 21:07 ` [RFC PATCH 1/3] mm: Replace naked page->_count accesses with accessor functions Michel Lespinasse
2011-08-04 21:07 ` [RFC PATCH 2/3] mm: page count lock Michel Lespinasse
2011-08-07 14:00   ` Minchan Kim
2011-08-04 21:07 ` [RFC PATCH 3/3] mm: get_first_page_unless_zero() Michel Lespinasse
2011-08-07 14:13   ` Minchan Kim
2011-08-05  6:39 ` [RFC PATCH 0/3] page count lock for simpler put_page Michel Lespinasse
2011-08-07 14:25   ` Minchan Kim
2011-08-09 11:04     ` Michel Lespinasse
2011-08-09 22:22       ` Minchan Kim
2011-08-12 22:35         ` Michel Lespinasse [this message]
2011-08-13  4:07           ` Minchan Kim
2011-08-12 15:36       ` Andrea Arcangeli
2011-08-12 16:08         ` SPAM: " Paul E. McKenney
2011-08-12 16:43           ` Andrea Arcangeli
2011-08-12 17:27             ` Paul E. McKenney
2011-08-12 23:45               ` Michel Lespinasse
2011-08-13  1:57                 ` Paul E. McKenney
2011-08-13 23:56                   ` Andrea Arcangeli
2011-08-13  4:18             ` Minchan Kim
2011-08-12 16:57           ` Johannes Weiner
2011-08-12 17:08             ` Andrea Arcangeli
2011-08-12 17:52               ` Johannes Weiner
2011-08-12 18:13                 ` Paul E. McKenney
2011-08-12 19:05                   ` Johannes Weiner
2011-08-12 22:14                     ` Paul E. McKenney
2011-08-12 22:22                 ` Andrea Arcangeli
2011-08-12 18:03               ` Paul E. McKenney
2011-08-12 17:41             ` Paul E. McKenney
2011-08-12 17:56               ` Johannes Weiner
2011-08-12 23:02           ` Michel Lespinasse
2011-08-12 22:50         ` Michel Lespinasse
2011-08-13  4:11         ` Minchan Kim
2011-08-12 16:58   ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANN689HQjNUDHWXn9PuvHxP0A-6_ypsW=jdt=UvnMr8M0xm-WA@mail.gmail.com' \
    --to=walken@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=jweiner@redhat.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan.kim@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=shaohua.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.