From: Daniel McNeil <daniel@osdl.org>
To: Andrew Morton <akpm@osdl.org>
Cc: janetmor@us.ibm.com, Badari Pulavarty <pbadari@us.ibm.com>,
"linux-aio@kvack.org" <linux-aio@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Suparna Bhattacharya <suparna@in.ibm.com>
Subject: Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
Date: 05 Feb 2004 09:52:56 -0800
Message-ID: <1076003576.7182.77.camel@ibm-c.pdx.osdl.net>
In-Reply-To: <20040204213336.354d8103.akpm@osdl.org>
Andrew,
I am still thinking about your patch. I will run some tests today using
2.6.2-mm1 to see if the problem is fixed. With my original patch, my
8-proc machine ran overnight with 6 copies of the read_under test
running, without problems. Previously, the 8-proc machine would hit
uninitialized data within an hour.
The concern I have is that DIO needs filemap_write_and_wait() to
make sure all previously dirty pages have been written back to
disk before the DIO is issued.
If __block_write_full_page() can possibly clear PageWriteback
with buffer i/o still in flight (even for WB_SYNC_NONE) then
a subsequent filemap_write_and_wait() will miss that page.
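A minimal user-space sketch of this ordering requirement follows. It assumes a single file block and a worst-case I/O scheduler that services the DIO read before any write the wait did not cover. All model_* names and fields are hypothetical and are not kernel code. Only the idea that the wait half of filemap_write_and_wait() keys off PageWriteback comes from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code: one file block,
 * the cached page covering it, and at most one write in flight. */
struct model_block {
	int on_disk;		/* what an O_DIRECT read would return */
	bool write_pending;	/* a write was issued but has not completed */
	int pending_value;	/* data carried by that pending write */
	bool page_writeback;	/* models PageWriteback() on the cached page */
};

void model_complete_write(struct model_block *b)
{
	if (b->write_pending) {
		b->on_disk = b->pending_value;
		b->write_pending = false;
	}
}

/* The wait half of filemap_write_and_wait() can only sleep on pages
 * that still have PageWriteback set. */
void model_wait_on_writeback(struct model_block *b)
{
	if (b->page_writeback) {
		model_complete_write(b);	/* sleep until the write is done */
		b->page_writeback = false;	/* end_page_writeback() */
		assert(!b->write_pending);
	}
}

/* Flush-then-read, as DIO does. Worst case: the read is serviced
 * before any write that the wait did not cover. */
int model_flush_then_dio_read(struct model_block *b)
{
	model_wait_on_writeback(b);
	return b->on_disk;
}
```

If the page still carries the writeback flag, the read returns the new data. If the flag was cleared while the write was pending, the read returns whatever was on disk before. In this test that is 0, which stands in for uninitialized data.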
For example, I previously tried:
 	do {
 		get_bh(bh);
+		if (wbc->sync_mode != WB_SYNC_NONE)
+			wait_on_buffer(bh);
 		if (buffer_mapped(bh) && buffer_dirty(bh)) {
 			if (wbc->sync_mode != WB_SYNC_NONE) {
 				lock_buffer(bh);
and this still saw uninitialized data.
Also, if __block_write_full_page() can redirty a page, wouldn't this
allow filemap_write_and_wait() to return with a page still dirty that
DIO needs written back?
I'll work on updating the other patches.
Daniel
On Wed, 2004-02-04 at 21:33, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > I have found (finally) the problem causing DIO reads racing with
> > buffered writes to see uninitialized data on ext3 file systems
> > (which is what I have been testing on).
> >
> > The problem is caused by the changes to __block_write_full_page()
> > and a race with journaling:
> >
> > journal_commit_transaction() -> ll_rw_block() -> submit_bh()
> >
> > ll_rw_block() locks the buffer, clears buffer dirty and calls
> > submit_bh()
> >
> > A racing __block_write_full_page() (from ext3_ordered_writepage())
> > would see that buffer_dirty() is not set because the i/o
> > is still in flight, so it would not do a submit_bh().
> >
> > It would SetPageWriteback() and unlock_page() and then
> > see that no i/o was submitted and call end_page_writeback()
> > (with the i/o still in flight).
> >
> > This would allow the DIO code to issue the DIO read while buffer writes
> > are still in flight. The i/o can be reordered by i/o scheduling and
> > the DIO can complete BEFORE the writebacks complete. Thus the DIO
> > sees the old uninitialized data.
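The sequence above can be sketched as a hypothetical user-space model of the pre-patch loop. It is not kernel code, and the model_* names are invented for illustration. ll_rw_block() leaves the buffer locked, clean and under I/O. The old loop therefore finds nothing that is both mapped and dirty, submits nothing, and ends page writeback while the journal's write is still in flight. Lock contention on dirty buffers and I/O completion are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the pre-patch logic. The names
 * echo fs/buffer.c and JBD, but none of this is kernel code. */
struct model_bh {
	bool mapped;
	bool dirty;
	bool locked;
	bool io_inflight;	/* write submitted but not yet completed */
};

struct model_page {
	bool writeback;		/* models PageWriteback(page) */
	int nr;			/* number of buffers on the page */
	struct model_bh bh[4];
};

/* What ll_rw_block() does to a buffer during journal commit:
 * lock it, clear dirty, submit. Completion is not modeled. */
void model_ll_rw_block(struct model_bh *bh)
{
	bh->locked = true;
	bh->dirty = false;
	bh->io_inflight = true;
}

/* Old __block_write_full_page() loop: only buffers that are mapped
 * AND dirty are locked and submitted. Returns how many were submitted. */
int model_old_write_full_page(struct model_page *page)
{
	int submitted = 0;

	assert(!page->writeback);	/* BUG_ON(PageWriteback(page)) */
	page->writeback = true;		/* set_page_writeback() */

	for (int i = 0; i < page->nr; i++) {
		struct model_bh *bh = &page->bh[i];

		/* A buffer already under ll_rw_block() I/O is no longer
		 * dirty, so it is skipped without waiting on it. */
		if (bh->mapped && bh->dirty) {
			bh->locked = true;
			bh->dirty = false;
			bh->io_inflight = true;
			submitted++;
		}
	}
	if (submitted == 0)
		page->writeback = false;	/* end_page_writeback() */
	return submitted;
}
```

After model_ll_rw_block() on the only buffer, model_old_write_full_page() returns 0. The page is then no longer marked writeback, but the buffer's write is still in flight. That is the window in which the DIO read can be issued.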
>
> I suppose we should go for a general fix to the problem. I'm not 100%
> happy with it. It's similar to yours, except we only wait if
> wbc->sync_mode says it's a write-for-sync. Also we hold the buffer lock
> across all the tests.
>
> Fix a race which was identified by Daniel McNeil <daniel@osdl.org>
>
> If a buffer_head is under I/O due to JBD's ordered data writeout (which uses
> ll_rw_block()) then either filemap_fdatawrite() or filemap_fdatawait() needs
> to wait on the buffer's existing I/O.
>
> Presently neither will do so, because __block_write_full_page() will not
> actually submit any I/O and will hence not mark the page as being under
> writeback.
>
> The best-performing fix would be to somehow mark the page as being under
> writeback and defer waiting for the ll_rw_block-initiated I/O until
> filemap_fdatawait()-time. But this is hard, because in
> __block_write_full_page() we do not have control of the buffer_head's end_io
> handler. Possibly we could make JBD call into end_buffer_async_write(), but
> that gets nasty.
>
> This patch makes __block_write_full_page() wait for any buffer_head I/O to
> complete before inspecting the buffer_head state. It only does this in the
> case where __block_write_full_page() was called for a "data-integrity" write:
> (wbc->sync_mode != WB_SYNC_NONE).
>
> Probably it doesn't matter, because kjournald is currently submitting (or has
> already submitted) all dirty buffers anyway.
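The patched per-buffer logic can be sketched with a similar hypothetical user-space model. It is not kernel code, and the model_* names are invented. The model assumes that a buffer's lock is only ever held by an in-flight write. For a data-integrity write, lock_buffer() sleeps until the ll_rw_block() write has completed, and buffer_dirty() is tested only after that. The WB_SYNC_NONE path still uses a trylock and does not wait. get_bh()/put_bh(), page locking and the separate submission loop are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the patched loop body in
 * __block_write_full_page(); not kernel code. */
enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

struct model_bh {
	bool mapped;
	bool dirty;
	bool locked;
	bool io_inflight;	/* write submitted but not yet completed */
};

/* lock_buffer(): if the buffer is locked because a write (e.g. one
 * started by ll_rw_block()) is in flight, sleeping until it is
 * unlocked is modeled as that write completing. */
void model_lock_buffer(struct model_bh *bh)
{
	if (bh->locked) {
		assert(bh->io_inflight);	/* only I/O holds the lock here */
		bh->io_inflight = false;	/* the write completes... */
		bh->locked = false;		/* ...and unlocks the buffer */
	}
	bh->locked = true;
}

/* One pass of the patched loop body. Returns true if a write was
 * queued for this buffer; sets *redirty when the WB_SYNC_NONE path
 * would call __set_page_dirty_nobuffers(). */
bool model_write_one_buffer(struct model_bh *bh, enum model_sync_mode mode,
			    bool *redirty)
{
	if (!bh->mapped)
		return false;
	if (mode != MODEL_WB_SYNC_NONE) {
		model_lock_buffer(bh);		/* waits for in-flight I/O */
	} else if (bh->locked) {		/* test_set_buffer_locked() failed */
		if (bh->dirty)
			*redirty = true;
		return false;
	} else {
		bh->locked = true;		/* test_set_buffer_locked() won */
	}
	if (bh->dirty) {			/* test_clear_buffer_dirty() */
		bh->dirty = false;
		bh->io_inflight = true;		/* mark_buffer_async_write() */
		return true;			/* buffer stays locked until I/O ends */
	}
	bh->locked = false;			/* unlock_buffer() */
	return false;
}
```

For a buffer left locked and clean by ll_rw_block(), the WB_SYNC_ALL call returns only after that write has completed. The WB_SYNC_NONE call returns at once and leaves the write in flight.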
>
> ---
>
> fs/buffer.c | 29 +++++++++++++++--------------
> 1 files changed, 15 insertions(+), 14 deletions(-)
>
> diff -puN fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix fs/buffer.c
> --- 25/fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix 2004-02-04 20:38:30.000000000 -0800
> +++ 25-akpm/fs/buffer.c 2004-02-04 20:40:19.000000000 -0800
> @@ -1810,23 +1810,24 @@ static int __block_write_full_page(struc
> 
>  	do {
>  		get_bh(bh);
> -		if (buffer_mapped(bh) && buffer_dirty(bh)) {
> -			if (wbc->sync_mode != WB_SYNC_NONE) {
> -				lock_buffer(bh);
> -			} else {
> -				if (test_set_buffer_locked(bh)) {
> +		if (!buffer_mapped(bh))
> +			continue;
> +		if (wbc->sync_mode != WB_SYNC_NONE) {
> +			lock_buffer(bh);
> +		} else {
> +			if (test_set_buffer_locked(bh)) {
> +				if (buffer_dirty(bh))
>  					__set_page_dirty_nobuffers(page);
> -					continue;
> -				}
> -			}
> -			if (test_clear_buffer_dirty(bh)) {
> -				if (!buffer_uptodate(bh))
> -					buffer_error();
> -				mark_buffer_async_write(bh);
> -			} else {
> -				unlock_buffer(bh);
> +				continue;
>  			}
>  		}
> +		if (test_clear_buffer_dirty(bh)) {
> +			if (!buffer_uptodate(bh))
> +				buffer_error();
> +			mark_buffer_async_write(bh);
> +		} else {
> +			unlock_buffer(bh);
> +		}
>  	} while ((bh = bh->b_this_page) != head);
> 
>  	BUG_ON(PageWriteback(page));
> 
> _