From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Greg KH <greg@kroah.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	mtk.manpages@gmail.com, linux-man@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?
Date: Tue, 3 Feb 2009 11:55:40 +0900
Message-ID: <20090203115540.86a01273.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090203023147.GZ20323@random.random>

On Tue, 3 Feb 2009 03:31:47 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Tue, Feb 03, 2009 at 10:29:20AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Mon, 2 Feb 2009 23:08:56 +0100
> > Andrea Arcangeli <aarcange@redhat.com> wrote:
> > 
> > > Hi Greg!
> > > 
> > > > Thanks for the pointers, I'll go read the thread and follow up there.
> > > 
> > > If you also run into this final fix is attached below. Porting to
> > > mainline is a bit hard because of gup-fast... Perhaps we can use mmu
> > > notifiers to fix gup-fast... need to think more about it then I'll
> > > post something.
> > > 
> > > Please help testing the below on pre-gup-fast kernels, thanks!
> > > 
> > I commented on the FJ-Redhat path, but it was not forwarded for some unknown reason ;)
> > So I comment again.
> > 
> > 1. Why is TestSetLockPage() necessary?
> >    It does not seem necessary.
> 
> To avoid the VM to remove or add the page from/to swapcache and change
> page_count/mapcount from under us. This most certainly wasn't the
> reason of the slowdown (the slowdown were the false positives
> generated by pagevec pinning) and removing it was more intrusive than
> I wanted.

My point is:
  - If TestSetLockPage() fails, force_cow=1.
  - If the count/mapcount check fails, force_cow=1.

So taking the page lock here seems meaningless. If you consider the page lock
important, simply using lock_page() seems better.
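
The control flow described above can be sketched in userspace as follows.
This is only an illustration under stated assumptions, not the patch itself:
struct toy_page, toy_trylock_page(), toy_unlock_page() and fork_force_cow()
are hypothetical names, and the count != mapcount comparison is an assumed
stand-in for the patch's real check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only what this sketch needs. */
struct toy_page {
    bool locked;   /* models the page lock bit */
    int count;     /* total references to the page */
    int mapcount;  /* references that come from page tables */
};

/* Models a trylock: returns true if the caller took the lock. */
static bool toy_trylock_page(struct toy_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void toy_unlock_page(struct toy_page *p)
{
    p->locked = false;
}

/* Returns 1 if fork should copy the page now instead of write-protecting it. */
static int fork_force_cow(struct toy_page *p)
{
    int force_cow = 0;

    if (!toy_trylock_page(p))
        return 1;           /* lock contention alone already forces a copy */

    if (p->count != p->mapcount)
        force_cow = 1;      /* assumed check: an extra reference, e.g. a GUP pin */

    toy_unlock_page(p);
    return force_cow;
}
```

In this model a page that is merely locked by someone else is copied even
when its counts match, which is the behaviour being questioned.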

> 
> > 2. This patch doesn't cover HugeTLB.
> 
> There's no need to change hugetlb with my approach. I'm not touching
> the cow path, I'm addressing the real source of the problem (i.e. when
> fork pretends to mark the child pte readonly and pointing to the
> shared parent page, same as ksm: while the pte wrprotect + tlb flush
> stops the _cpu_ it can't stop any get_user_pages(write=1) user, hence
> we need to pre-cow the child page in fork instead of marking the child
> pte readonly to avoid the parent to lose writes if post-fork the
> parent cows and the child doesn't cow).
> 
Is there no need to patch copy_hugetlb_page_range()?
IMHO, HugeTLB pages can be write-protected at fork().

> > 3. Why only the "follow_page() successfully finds a page" case?
> >  Isn't it necessary to insert SetPageGUP() in the following path?
> > 
> >  - handle_mm_fault()
> >            => do_anonymous/swap/wp_page()
> >            or similar.
> 
> No need to change that either, all we need to know are the pages whose
> count vs mapcount has a discrepancy that could have been caused by
> get_user_pages. So only follow_page has to set it. More precisely
> FOLL_GET|FOLL_WRITE is the only path we care about there.
> 

Assume 3 threads in a process.
==
 Thread1 (DIO-Read)                        Thread2           Thread3
 get_user_pages()
 => handle_mm_fault().
    => map a page with no-write-protect.
                                            fork()
                                      (write-protect here)
                                                              Copy-On-Write
 endio.

pre-cow-at-fork will never happen because PageGUP is not set.
After the READ completes, this process will see a broken page.

Thanks,
-Kame

