From: Andrew Morton <akpm@osdl.org>
To: Andrea Arcangeli <andrea@novell.com>
Cc: shaggy@austin.ibm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] zap_pte_range should not mark non-uptodate pages dirty
Date: Thu, 21 Oct 2004 16:42:45 -0700
Message-ID: <20041021164245.4abec5d2.akpm@osdl.org>
In-Reply-To: <20041021232059.GE8756@dualathlon.random>

Andrea Arcangeli <andrea@novell.com> wrote:
>
> On Thu, Oct 21, 2004 at 04:02:33PM -0700, Andrew Morton wrote:
> > Andrea Arcangeli <andrea@novell.com> wrote:
> > >
> > > On Thu, Oct 21, 2004 at 02:45:31PM -0700, Andrew Morton wrote:
> > > > Maybe we should revisit invalidate_inode_pages2(). It used to be an
> > > > invariant that "pages which are mapped into process address space are
> > > > always uptodate". We broke that (good) invariant and we're now seeing
> > > > some fallout. There may be more.
> > >
> > > such invariant doesn't exists since 2.4.10. There's no way to get mmaps
> > > reload data from disk without breaking such an invariant.
> >
> > There are at least two ways:
> >
> > a) Set a new page flag in invalidate, test+clear that at fault time
>
> What's the point of adding a new page flag when the invariant
> !PageUptodate && page_mapcount(page) already provides the information?

Step back and think about this. What earthly sense is there in permitting
userspace access to non uptodate pages? None. It's completely wrong and
the invariant was a good one. We broke it by introducing some kluge to
force new I/O when someone does a new fault against the page.

(A new PG_needs_rereading flag isn't sufficient btw - we'd also need
BH_Needs_Rereading and associated code. ug.)

> > b) shoot down all pte's mapping the locked page at invalidate time, mark the
> > page not uptodate.
>
> invalidate should run fast, I didn't enforce coherency or it'd hurt too
> much the O_DIRECT write if something is mapped, we only allow buffered
> read against O_DIRECT write to work coherently, the mmap coherency has
> never been provided to avoid having to search for vmas in the prio_tree
> for every single write to an inode.

I don't get it. invalidate has the pageframe. All it need to do is to
lock the page, examine mapcount and if it's non-zero, do the shootdown.
The only way in which we would be performing the shootdown a significant
number of times would be if someone was repeatedly faulting the thing back
in anyway, and in that case the physical I/O cost would dominate.

Where's the performance overhead??

Plus it makes the currently incorrect code correct for existing mmaps.

Plus it avoids the idiotic situation of having non uptodate pages
accessible to user processes.
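For readers following the thread, option (b) can be modelled as a userspace toy. This is only a sketch under loud assumptions: struct toy_page and every toy_* function are hypothetical names invented for illustration, not kernel APIs, and the single-threaded "lock" is a plain flag. It shows the sequence described above (lock the page, examine mapcount, shoot down if non-zero, mark not uptodate) and checks that the "mapped pages are always uptodate" invariant survives.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page frame; NOT the kernel's struct page. */
struct toy_page {
    bool locked;    /* models the page lock as a simple flag */
    bool uptodate;  /* models PageUptodate() */
    int  mapcount;  /* number of ptes mapping the page */
};

/* Hypothetical helper: drop every pte mapping the page.
 * In a real kernel this would be a reverse-mapping walk. */
static void toy_unmap_all(struct toy_page *page)
{
    page->mapcount = 0;
}

/* Option (b): shoot down mappings at invalidate time, then mark the
 * page not uptodate. Returns how many mappings were removed. */
static int toy_invalidate_page(struct toy_page *page)
{
    int shot = 0;

    assert(!page->locked);      /* toy model is single-threaded */
    page->locked = true;
    if (page->mapcount != 0) {  /* only pay for the shootdown if mapped */
        shot = page->mapcount;
        toy_unmap_all(page);
    }
    page->uptodate = false;
    page->locked = false;
    return shot;
}

/* Toy fault path: reread the page if it is stale, then map it.
 * Returns true when (simulated) I/O was needed. */
static bool toy_fault(struct toy_page *page)
{
    bool did_io = false;

    if (!page->uptodate) {
        page->uptodate = true;  /* stand-in for reading from disk */
        did_io = true;
    }
    page->mapcount++;
    return did_io;
}

/* The invariant under discussion: a mapped page is always uptodate. */
static bool toy_invariant_holds(const struct toy_page *page)
{
    return page->mapcount == 0 || page->uptodate;
}
```

In this model the shootdown only costs anything when the page is actually mapped, and a page that keeps getting invalidated while mapped must be faulted back in (with I/O) each time, which is the cost argument made in the message.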