linux-kernel.vger.kernel.org archive mirror
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Greg KH <greg@kroah.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	mtk.manpages@gmail.com, linux-man@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: open(2) says O_DIRECT works on 512 byte boundries?
Date: Tue, 3 Feb 2009 13:38:11 +0900
Message-ID: <20090203133811.47324d80.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090202220856.GY20323@random.random>

On Mon, 2 Feb 2009 23:08:56 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:

> Hi Greg!
> 
> > Thanks for the pointers, I'll go read the thread and follow up there.
> 
> If you also run into this, the final fix is attached below. Porting to
> mainline is a bit hard because of gup-fast... Perhaps we can use mmu
> notifiers to fix gup-fast... need to think more about it then I'll
> post something.
> 
> Please help testing the below on pre-gup-fast kernels, thanks!
> 
> From: Andrea Arcangeli <aarcange@redhat.com>
> Subject: fork-o_direct-race
> 
> Think of a thread writing constantly to the last 512 bytes of a page, while another
> thread reads and writes to/from the first 512 bytes of the page. We can lose
> O_DIRECT reads (or any other get_user_pages write=1 I/O not just bio/O_DIRECT),
> the very moment we mark any pte wrprotected because a third unrelated thread
> forks off a child.
> 
> This fixes it by never wprotecting anon ptes if there can be any direct I/O in
> flight to the page, and by instantiating a readonly pte and triggering a COW in
> the child. The only trouble here are O_DIRECT reads (writes to memory, read
> from disk). Checking the page_count under the PT lock guarantees no
> get_user_pages could be running under us because if somebody wants to write to
> the page, it has to break any cow first and that requires taking the PT lock in
> follow_page before increasing the page count. We are guaranteed mapcount is 1 if
> fork is writeprotecting the pte so the PT lock is enough to serialize against
> get_user_pages->get_page.
> 
> The COW triggered inside fork will run while the parent pte is readonly to
> provide as usual the per-page atomic copy from parent to child during fork.
> However timings will be altered by having to copy the pages that might be under
> O_DIRECT.
> 
> The pagevec code calls get_page while the page is sitting in the pagevec
> (before it becomes PageLRU) and doing so it can generate false positives, so to
> avoid slowing down fork all the time even for pages that could never possibly
> be under O_DIRECT write=1, the PG_gup bitflag is added. This eliminates
> most of the overhead of the fix in fork.
> 
> Patch doesn't break kABI despite introducing a new page flag.
> 
> Fixed version of original patch from Nick Piggin.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---

Sorry, one more ;)
==
		cond_resched();
		while (!(page = follow_page(vma, start, foll_flags))) {
			int ret;
			ret = __handle_mm_fault(mm, vma, start,
					foll_flags & FOLL_WRITE);
			/*
			 * The VM_FAULT_WRITE bit tells us that do_wp_page has
			 * broken COW when necessary, even if maybe_mkwrite
			 * decided not to set pte_write. We can thus safely do
			 * subsequent page lookups as if they were reads.
			 */
			if (ret & VM_FAULT_WRITE)
				foll_flags &= ~FOLL_WRITE;
==

From the above, FOLL_WRITE can be dropped from foll_flags before
follow_page() is retried, so PageGUP() will not be set in that case?

Thanks,
-Kame


