From: Andrew Morton <akpm@osdl.org>
To: Daniel McNeil <daniel@osdl.org>
Cc: janetmor@us.ibm.com, pbadari@us.ibm.com, linux-aio@kvack.org,
	linux-kernel@vger.kernel.org, suparna@in.ibm.com
Subject: Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
Date: Wed, 4 Feb 2004 21:33:36 -0800
Message-ID: <20040204213336.354d8103.akpm@osdl.org>
In-Reply-To: <1075945198.7182.46.camel@ibm-c.pdx.osdl.net>

Daniel McNeil <daniel@osdl.org> wrote:
>
>  I have found (finally) the problem causing DIO reads racing with
>  buffered writes to see uninitialized data on ext3 file systems 
>  (which is what I have been testing on).
> 
>  The problem is caused by the changes to __block_write_full_page()
>  and a race with journaling:
> 
>  journal_commit_transaction() -> ll_rw_block() -> submit_bh()
>  	
>  ll_rw_block() locks the buffer, clears buffer dirty and calls
>  submit_bh()
> 
>  A racing __block_write_full_page() (from ext3_ordered_writepage())
> 
>  	would see that buffer_dirty() is not set because the i/o
>  	is still in flight, so it would not do a submit_bh()
> 
>  	It would SetPageWriteback() and unlock_page() and then
>  	see that no i/o was submitted and call end_page_writeback()
>  	(with the i/o still in flight).
> 
>  This would allow the DIO code to issue the DIO read while buffer writes
>  are still in flight.  The i/o can be reordered by i/o scheduling and
>  the DIO can complete BEFORE the writebacks complete.  Thus the DIO
>  sees the old uninitialized data.
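
For readers following the sequence above, here is a minimal user-space
sketch of it.  It is not kernel code: struct model_bh and every model_*
function are hypothetical names invented for this illustration, and the
buffer lock stands in for "I/O in flight".  Only the data-integrity
(sync) path of the old loop is modelled.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for a buffer_head; not kernel code. */
struct model_bh {
	bool mapped;
	bool dirty;
	bool locked;	/* held from submission until I/O completion */
};

/*
 * Models the WRITE path of ll_rw_block(): take the buffer lock, clear
 * the dirty bit and start the write.  The lock stays held while the
 * write is in flight.
 */
static void model_ll_rw_block(struct model_bh *bh)
{
	if (bh->locked)
		return;			/* someone else owns the buffer */
	bh->locked = true;
	if (bh->dirty)
		bh->dirty = false;	/* submit_bh() would happen here */
	else
		bh->locked = false;	/* nothing to write */
}

/*
 * Models the pre-patch loop body for a sync write.  Only buffers that
 * are mapped AND dirty are looked at, so a buffer whose dirty bit was
 * already cleared by the journal is skipped without taking its lock.
 * Returns the number of buffers submitted (0 or 1).
 */
static int model_old_sync_write(struct model_bh *bh)
{
	if (!(bh->mapped && bh->dirty))
		return 0;
	bh->locked = true;	/* lock_buffer(), waiting if needed */
	bh->dirty = false;	/* test_clear_buffer_dirty() */
	return 1;
}

/*
 * True when the page would have writeback ended with nothing submitted
 * while a buffer write is still in flight: the window a DIO read can
 * slip through.
 */
static bool model_old_ends_writeback_early(struct model_bh *bh)
{
	int submitted = model_old_sync_write(bh);

	return submitted == 0 && bh->locked;
}
```

Running model_ll_rw_block() on a mapped, dirty buffer and then calling
model_old_ends_writeback_early() on it reports the window; calling it on
a dirty buffer the journal has not touched does not.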

I suppose we should go for a general fix to the problem.  I'm not 100%
happy with it.  It's similar to yours, except we only wait if
wbc->sync_mode says it's a write-for-sync.  Also we hold the buffer lock
across all the tests.






Fix a race which was identified by Daniel McNeil <daniel@osdl.org>

If a buffer_head is under I/O due to JBD's ordered data writeout (which uses
ll_rw_block()) then either filemap_fdatawrite() or filemap_fdatawait() needs
to wait on the buffer's existing I/O.

Presently neither will do so, because __block_write_full_page() will not
actually submit any I/O and will hence not mark the page as being under
writeback.

The best-performing fix would be to somehow mark the page as being under
writeback and defer waiting for the ll_rw_block-initiated I/O until
filemap_fdatawait()-time.  But this is hard, because in
__block_write_full_page() we do not have control of the buffer_head's end_io
handler.  Possibly we could make JBD call into end_buffer_async_write(), but
that gets nasty.

This patch makes __block_write_full_page() wait for any buffer_head I/O to
complete before inspecting the buffer_head state.  It only does this in the
case where __block_write_full_page() was called for a "data-integrity" write:
(wbc->sync_mode != WB_SYNC_NONE).
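
As a rough illustration of the per-buffer decision described above, the
following user-space sketch tabulates what the patched loop does for each
combination of buffer state and sync mode.  The enum, the struct and
model_patched_step() are hypothetical names made up for this sketch; the
"dirty" argument means the dirty bit as seen once the buffer lock is held
(or, in the non-sync busy case, as seen without it).

```c
#include <stdbool.h>

/* Hypothetical outcome labels for one pass of the patched loop body. */
enum model_action {
	MODEL_SKIP,		/* unmapped, or busy and clean (non-sync) */
	MODEL_REDIRTY_PAGE,	/* busy and dirty: redirty page, move on */
	MODEL_SUBMIT,		/* dirty bit cleared, buffer marked for write */
	MODEL_UNLOCK,		/* clean under the lock: just unlock */
};

struct model_result {
	enum model_action action;
	bool waited;	/* lock_buffer() had to wait for in-flight I/O */
};

/*
 * sync   stands for wbc->sync_mode != WB_SYNC_NONE
 * locked stands for "another party holds the buffer lock", which in
 *        this model means buffer I/O is in flight
 */
static struct model_result model_patched_step(bool mapped, bool dirty,
					      bool locked, bool sync)
{
	struct model_result r = { MODEL_SKIP, false };

	if (!mapped)
		return r;
	if (sync) {
		/* lock_buffer() sleeps until the current holder unlocks */
		r.waited = locked;
	} else if (locked) {
		/* test_set_buffer_locked() failed: do not block */
		r.action = dirty ? MODEL_REDIRTY_PAGE : MODEL_SKIP;
		return r;
	}
	/* The lock is held from here on, as in the patch. */
	r.action = dirty ? MODEL_SUBMIT : MODEL_UNLOCK;
	return r;
}
```

The case of interest is a mapped, clean, locked buffer in sync mode: the
sketch reports waited == true, so the in-flight write has finished by the
time the buffer state is inspected.  The same buffer in non-sync mode is
skipped without waiting.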

Probably it doesn't matter, because kjournald is currently submitting (or has
already submitted) all dirty buffers anyway.



---

 fs/buffer.c |   29 +++++++++++++++--------------
 1 files changed, 15 insertions(+), 14 deletions(-)

diff -puN fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix fs/buffer.c
--- 25/fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix	2004-02-04 20:38:30.000000000 -0800
+++ 25-akpm/fs/buffer.c	2004-02-04 20:40:19.000000000 -0800
@@ -1810,23 +1810,24 @@ static int __block_write_full_page(struc
 
 	do {
 		get_bh(bh);
-		if (buffer_mapped(bh) && buffer_dirty(bh)) {
-			if (wbc->sync_mode != WB_SYNC_NONE) {
-				lock_buffer(bh);
-			} else {
-				if (test_set_buffer_locked(bh)) {
+		if (!buffer_mapped(bh))
+			continue;
+		if (wbc->sync_mode != WB_SYNC_NONE) {
+			lock_buffer(bh);
+		} else {
+			if (test_set_buffer_locked(bh)) {
+				if (buffer_dirty(bh))
 					__set_page_dirty_nobuffers(page);
-					continue;
-				}
-			}
-			if (test_clear_buffer_dirty(bh)) {
-				if (!buffer_uptodate(bh))
-					buffer_error();
-				mark_buffer_async_write(bh);
-			} else {
-				unlock_buffer(bh);
+				continue;
 			}
 		}
+		if (test_clear_buffer_dirty(bh)) {
+			if (!buffer_uptodate(bh))
+				buffer_error();
+			mark_buffer_async_write(bh);
+		} else {
+			unlock_buffer(bh);
+		}
 	} while ((bh = bh->b_this_page) != head);
 
 	BUG_ON(PageWriteback(page));

_

