linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan.kim@gmail.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Nick Piggin <npiggin@suse.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Avi Kivity <avi@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
	Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	akpm@linux-foundation.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	David Miller <davem@davemloft.net>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Mel Gorman <mel@csn.ul.ie>
Subject: mlock and pageout race?
Date: Fri, 09 Apr 2010 23:41:19 +0900	[thread overview]
Message-ID: <1270824079.2524.63.camel@barrios-desktop> (raw)
In-Reply-To: <20100409170529.80E9.A69D9226@jp.fujitsu.com>

Hi, Kosaki. 

I don't want to make noise due to off-topic.
So I open new thread. 

On Fri, 2010-04-09 at 17:17 +0900, KOSAKI Motohiro wrote:
> Hi Minchan,
> 
> > OFF-TOPIC:
> > 
> > I think you pointed out good thing, too. :)
> > 
> > You mean although application call mlock of any vma, few pages on the vma can
> > be swapout by race between mlock and reclaim?
> > 
> > Although it's not disaster, apparently it breaks API.
> > Man page
> > " mlock() and munlock()
> >   mlock()  locks pages in the address range starting at addr and
> > continuing for len bytes. All pages that contain a part of the
> > specified address range are guaranteed to be resident in RAM when the
> > call returns  successfully;  the pages are guaranteed to stay in RAM
> > until later unlocked."
> > 
> > Do you have a plan to solve such problem?
> > 
> > And how about adding simple comment about that race in page_referenced_one?
> > Could you send the patch?
> 
> I'm surprising this mail. you were pushing much patch in this area.
> I believed you know all stuff ;)

If I disappoint you, sorry for that. 
Still, there are many thing to study to me. :)

> 
> My answer is, it don't need to fix, because it's not bug. The point is
> that this one is race issue. not "pageout after mlock" issue.
> If pageout and mlock occur at the exactly same time, the human can't
> observe which event occur in first. it's not API violation.


If it might happen, it's obviously API violation, I think.

int main()
{
	mlock(any vma, CURRENT|FUTURE);
	system("cat /proc/self/smaps | grep "any vma");
	..
}
result : 

08884000-088a5000 rw-p 00000000 00:00 0          [any vma]
Size:                  4 kB
Rss:                   4 kB
...
Swap:                  4 kB
...

Apparently, user expected that "If I call mlock, there are whole pages
of the vma in DRAM". But the result make him embarrassed :(

side note : 
Of course, mlock's semantic is rather different with smaps's Swap. 
mlock's semantic just makes sure pages on DRAM after success of mlock
call. it's not relate smap's swap entry. 
Actually, smaps's swap entry cannot compare to mlock's semantic.
Some page isn't on swap device yet but on swap cache and whole PTEs of
page already have swap entry(ie, all unmapped). In such case, smap's
Swap entry represent it with swap page. But with semantic of mlock, it's
still on RAM so that it's okay.  

I looked the code more detail.
Fortunately, the situation you said "page_referenced() already can take
unstable VM_LOCKED value. So, In worst case we make false positive
pageout, but it's not disaster" cannot happen, I think. 

1) 
mlock_fixup				shrink_page_list

					lock_page
					try_to_unmap

vma->vm_flags = VM_LOCKED
pte_lock				
pte_present test
get_page
pte_unlock
					pte_lock
					VM_LOCKED test fail
					pte_unlock
					never pageout
So, no problem. 

2) 
mlock_fixup				shrink_page_list

					lock_page
					try_to_unmap
					pte_lock
					VM_LOCKED test pass
vma->vm_flags = VM_LOCKED		make pte to swap entry
pte_lock				pte_unlock
pte_present test fail
pte_unlock
					pageout
swapin by handle_mm_fault     

So, no problem. 

3)
mlock_fixup				shrink_page_list

					lock_page
					try_to_unmap
					pte_lock
					VM_LOCKED test pass
vma->vm_flags = VM_LOCKED		make pte to swap entry
pte_lock				pte_unlock
pte_present test fail
pte_unlock
cachehit in swapcache by handle_mm_fault 
					pageout
					is_page_cache_freeable fail
So, no problem, too. 

I can't think the race situation you mentioned.
When 'false positive pageout' happens?
Could you elaborate on it?


-- 
Kind regards,
Minchan Kim



  reply	other threads:[~2010-04-09 14:41 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-08 19:17 [PATCH 00/13] mm: preemptibility -v2 Peter Zijlstra
2010-04-08 19:17 ` [PATCH 01/13] powerpc: Add rcu_read_lock() to gup_fast() implementation Peter Zijlstra
2010-04-08 20:31   ` Rik van Riel
2010-04-09  3:11   ` Nick Piggin
2010-04-13  1:05   ` Benjamin Herrenschmidt
2010-04-13  3:43     ` Paul E. McKenney
2010-04-14 13:51       ` Peter Zijlstra
2010-04-15 14:28         ` Paul E. McKenney
2010-04-16  6:54           ` Benjamin Herrenschmidt
2010-04-16 13:43             ` Paul E. McKenney
2010-04-16 23:25               ` Benjamin Herrenschmidt
2010-04-16 13:51           ` Peter Zijlstra
2010-04-16 14:17             ` Paul E. McKenney
2010-04-16 14:23               ` Peter Zijlstra
2010-04-16 14:32                 ` Paul E. McKenney
2010-04-16 14:56                   ` Peter Zijlstra
2010-04-16 15:09                     ` Paul E. McKenney
2010-04-16 15:14                       ` Peter Zijlstra
2010-04-16 16:45                         ` Paul E. McKenney
2010-04-16 19:37                           ` Peter Zijlstra
2010-04-16 20:28                             ` Paul E. McKenney
2010-04-18  3:06                           ` James Bottomley
2010-04-18 13:55                             ` Paul E. McKenney
2010-04-18 18:55                               ` James Bottomley
2010-04-16  6:51       ` Benjamin Herrenschmidt
2010-04-16  8:18         ` Nick Piggin
2010-04-16  8:29           ` Benjamin Herrenschmidt
2010-04-16  9:22             ` Nick Piggin
2010-04-08 19:17 ` [PATCH 02/13] mm: Revalidate anon_vma in page_lock_anon_vma() Peter Zijlstra
2010-04-08 20:50   ` Rik van Riel
2010-04-08 21:20   ` Andrew Morton
2010-04-08 21:54     ` Peter Zijlstra
2010-04-09  2:19       ` KOSAKI Motohiro
2010-04-09  2:19   ` Minchan Kim
2010-04-09  3:16   ` Nick Piggin
2010-04-09  4:56     ` KAMEZAWA Hiroyuki
2010-04-09  6:34       ` KOSAKI Motohiro
2010-04-09  6:47         ` KAMEZAWA Hiroyuki
2010-04-09  7:29           ` KOSAKI Motohiro
2010-04-09  7:57             ` KAMEZAWA Hiroyuki
2010-04-09  8:03               ` KAMEZAWA Hiroyuki
2010-04-09  8:24                 ` KAMEZAWA Hiroyuki
2010-04-09  8:01             ` Minchan Kim
2010-04-09  8:17               ` KOSAKI Motohiro
2010-04-09 14:41                 ` Minchan Kim [this message]
2010-04-09  8:44             ` Peter Zijlstra
2010-05-24 19:32               ` Andrew Morton
2010-05-25  9:01                 ` Peter Zijlstra
2010-04-09 12:57   ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 03/13] x86: Remove last traces of quicklist usage Peter Zijlstra
2010-04-08 20:51   ` Rik van Riel
2010-04-08 19:17 ` [PATCH 04/13] mm: Move anon_vma ref out from under CONFIG_KSM Peter Zijlstra
2010-04-09 12:35   ` Rik van Riel
2010-04-08 19:17 ` [PATCH 05/13] mm: Make use of the anon_vma ref count Peter Zijlstra
2010-04-09  7:04   ` Christian Ehrhardt
2010-04-09  9:57     ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 06/13] mm: Preemptible mmu_gather Peter Zijlstra
2010-04-09  3:25   ` Nick Piggin
2010-04-09  8:18     ` Peter Zijlstra
2010-04-09 20:36     ` Peter Zijlstra
2010-04-19 19:16       ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 07/13] powerpc: " Peter Zijlstra
2010-04-09  4:07   ` Nick Piggin
2010-04-09  8:14     ` Peter Zijlstra
2010-04-09  8:46       ` Nick Piggin
2010-04-09  9:22         ` Peter Zijlstra
2010-04-13  2:06       ` Benjamin Herrenschmidt
2010-04-13  1:56     ` Benjamin Herrenschmidt
2010-04-13  1:23   ` Benjamin Herrenschmidt
2010-04-13 10:22     ` Peter Zijlstra
2010-04-14 13:34     ` Peter Zijlstra
2010-04-14 13:51     ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 08/13] sparc: " Peter Zijlstra
2010-04-08 19:17 ` [PATCH 09/13] mm, powerpc: Move the RCU page-table freeing into generic code Peter Zijlstra
2010-04-09  3:35   ` Nick Piggin
2010-04-09  8:08     ` Peter Zijlstra
2010-04-08 19:17 ` [PATCH 10/13] lockdep, mutex: Provide mutex_lock_nest_lock Peter Zijlstra
2010-04-09 15:36   ` Rik van Riel
2010-04-08 19:17 ` [PATCH 11/13] mutex: Provide mutex_is_contended Peter Zijlstra
2010-04-09 15:37   ` Rik van Riel
2010-04-08 19:17 ` [PATCH 12/13] mm: Convert i_mmap_lock and anon_vma->lock to mutexes Peter Zijlstra
2010-04-08 19:17 ` [PATCH 13/13] mm: Optimize page_lock_anon_vma Peter Zijlstra
2010-04-08 22:18   ` Paul E. McKenney
2010-04-09  8:35     ` Peter Zijlstra
2010-04-09 19:22       ` Paul E. McKenney
2010-04-08 20:29 ` [PATCH 00/13] mm: preemptibility -v2 David Miller
2010-04-08 20:35   ` Peter Zijlstra
2010-04-09  1:00   ` David Miller
2010-04-09  4:14 ` Nick Piggin
2010-04-09  8:35   ` Peter Zijlstra
2010-04-09  8:50     ` Nick Piggin
2010-04-09  8:58       ` Peter Zijlstra
2010-04-09  8:58 ` Martin Schwidefsky
2010-04-09  9:53   ` Peter Zijlstra
2010-04-09  9:03 ` David Howells
2010-04-09  9:22   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1270824079.2524.63.camel@barrios-desktop \
    --to=minchan.kim@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=avi@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=davem@davemloft.net \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).