linux-kernel.vger.kernel.org archive mirror
* Device-backed loop broken in 2.6.0-test2?
@ 2003-08-06 22:40 Thomas Themel
  2003-08-07  0:40 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Themel @ 2003-08-06 22:40 UTC (permalink / raw)
  To: linux-kernel

Hi,

it seems that device backed loopback is broken in the 2.6.0-test2 series.

I've noticed the error while testing cryptoloop, but it still appears
reliably when using plain loop without encryption.

I set up a loopback device on an IDE partition:

losetup /dev/loop0 /dev/hda6

and create an ext3 filesystem on it. Then, when I try to fill it with
data, it works for a while until errors of the form

Buffer I/O error on device loop0, logical block 377367
Buffer I/O error on device loop0, logical block 377380
Buffer I/O error on device loop0, logical block 377419
Buffer I/O error on device loop0, logical block 378937
Buffer I/O error on device loop0, logical block 378983
Buffer I/O error on device loop0, logical block 380008
Buffer I/O error on device loop0, logical block 380009

start to appear in the kernel log. The writes themselves do not report
any failure, however; the problem only manifests later, when the
filesystem breaks or data in files turns out to be corrupted.

ciao,
-- 
[*Thomas  Themel*] I read what some of you folks here write and all I can
[extended contact] say is that I hope you are inside the fireballs when the
[info provided in] freedom fighters take out the Great Satan.
[*message header*]	- Tim May on cypherpunks


* Re: Device-backed loop broken in 2.6.0-test2?
  2003-08-06 22:40 Device-backed loop broken in 2.6.0-test2? Thomas Themel
@ 2003-08-07  0:40 ` Andrew Morton
  2003-08-07  7:23   ` Thomas Themel
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Andrew Morton @ 2003-08-07  0:40 UTC (permalink / raw)
  To: Thomas Themel; +Cc: linux-kernel

Thomas Themel <themel@iwoars.net> wrote:
>
> it seems that device backed loopback is broken in the 2.6.0-test2 series.

doh.



We're currently setting PF_READAHEAD across a call into the page allocator. 
We end up calling writepage() with PF_READAHEAD set and the block layer
aborts the writes, resulting in corrupted data.

It only seems to bite with loop-on-blockdev for some reason.

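The patch below sets PF_READAHEAD only around the read_pages() call, and
adds a warning in __make_request() (drivers/block/ll_rw_blk.c) to catch
any more occurrences.
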
 drivers/block/ll_rw_blk.c |    8 +++++++-
 mm/readahead.c            |   22 +++++++++++-----------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff -puN mm/readahead.c~PF_READAHEAD-loop-fix mm/readahead.c
--- 25/mm/readahead.c~PF_READAHEAD-loop-fix	2003-08-06 16:59:29.000000000 -0700
+++ 25-akpm/mm/readahead.c	2003-08-06 16:59:29.000000000 -0700
@@ -202,9 +202,9 @@ out:
  *
  * Returns the number of pages which actually had IO started against them.
  */
-static inline int
+static int
 __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
-			unsigned long offset, unsigned long nr_to_read)
+	unsigned long offset, unsigned long nr_to_read, int pf_readahead)
 {
 	struct inode *inode = mapping->host;
 	struct page *page;
@@ -249,8 +249,11 @@ __do_page_cache_readahead(struct address
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	if (ret)
+	if (ret) {
+		current->flags |= pf_readahead;
 		read_pages(mapping, filp, &page_pool, ret);
+		current->flags &= ~pf_readahead;
+	}
 	BUG_ON(!list_empty(&page_pool));
 out:
 	return ret;
@@ -275,8 +278,8 @@ int force_page_cache_readahead(struct ad
 
 		if (this_chunk > nr_to_read)
 			this_chunk = nr_to_read;
-		err = __do_page_cache_readahead(mapping, filp,
-						offset, this_chunk);
+		err = __do_page_cache_readahead(mapping, filp, offset,
+						this_chunk, 0);
 		if (err < 0) {
 			ret = err;
 			break;
@@ -300,12 +303,9 @@ int do_page_cache_readahead(struct addre
 {
 	int ret = 0;
 
-	if (!bdi_read_congested(mapping->backing_dev_info)) {
-		current->flags |= PF_READAHEAD;
-		ret = __do_page_cache_readahead(mapping, filp,
-						offset, nr_to_read);
-		current->flags &= ~PF_READAHEAD;
-	}
+	if (!bdi_read_congested(mapping->backing_dev_info))
+		ret = __do_page_cache_readahead(mapping, filp, offset,
+						nr_to_read, PF_READAHEAD);
 	return ret;
 }
 
diff -puN drivers/block/ll_rw_blk.c~PF_READAHEAD-loop-fix drivers/block/ll_rw_blk.c
--- 25/drivers/block/ll_rw_blk.c~PF_READAHEAD-loop-fix	2003-08-06 16:59:29.000000000 -0700
+++ 25-akpm/drivers/block/ll_rw_blk.c	2003-08-06 17:40:27.000000000 -0700
@@ -1847,7 +1847,13 @@ static int __make_request(request_queue_
 
 	barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw);
 
-	ra = bio_flagged(bio, BIO_RW_AHEAD) || current->flags & PF_READAHEAD;
+	ra = bio_flagged(bio, BIO_RW_AHEAD);
+	if (current->flags & PF_READAHEAD) {
+		if (rw == WRITE)
+			WARN_ON(1);
+		else
+			ra = 1;
+	}
 
 again:
 	insert_here = NULL;

_
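
As a reading aid for the patch above, here is a toy user-space model of the problem it describes. It is only a sketch under stated assumptions: every `model_*` name is hypothetical, none of this is kernel API, and it assumes the block layer drops readahead-flagged requests on a busy queue rather than waiting. It models only the mm/readahead.c half of the patch (narrowing the window in which the flag is set), not the new check in __make_request().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are illustrative, not kernel API. */
#define MODEL_PF_READAHEAD 0x1u

enum model_rw { MODEL_READ, MODEL_WRITE };

struct model_task {
	unsigned int flags;   /* stands in for current->flags */
	int dropped_writes;   /* writes the modelled block layer aborted */
};

/*
 * Stand-in for the pre-patch block layer check: any I/O issued by a task
 * with the readahead flag set is treated as readahead, and readahead is
 * dropped (assumption) when the queue is congested.
 */
static bool model_make_request(struct model_task *t, enum model_rw rw,
			       bool congested)
{
	assert(t != NULL);
	bool ra = (t->flags & MODEL_PF_READAHEAD) != 0;

	if (ra && congested) {
		if (rw == MODEL_WRITE)
			t->dropped_writes++;
		return false;	/* request aborted */
	}
	return true;		/* request queued */
}

/* Page allocation under memory pressure writes back one dirty page. */
static void model_alloc_page(struct model_task *t, bool memory_pressure)
{
	if (memory_pressure)
		model_make_request(t, MODEL_WRITE, true);
}

/* Submitting the readahead itself is a read. */
static void model_read_pages(struct model_task *t)
{
	model_make_request(t, MODEL_READ, true);
}

/* Before the patch: the flag covers allocation and read submission. */
static int model_readahead_before(bool memory_pressure)
{
	struct model_task t = { 0, 0 };

	t.flags |= MODEL_PF_READAHEAD;
	model_alloc_page(&t, memory_pressure);
	model_read_pages(&t);
	t.flags &= ~MODEL_PF_READAHEAD;
	return t.dropped_writes;
}

/* After the patch: the flag covers only read submission. */
static int model_readahead_after(bool memory_pressure)
{
	struct model_task t = { 0, 0 };

	model_alloc_page(&t, memory_pressure);
	t.flags |= MODEL_PF_READAHEAD;
	model_read_pages(&t);
	t.flags &= ~MODEL_PF_READAHEAD;
	return t.dropped_writes;
}
```

In this model, `model_readahead_before(true)` loses the write triggered from the allocator, while `model_readahead_after(true)` does not, because the write is issued before the flag is set.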



* Re: Device-backed loop broken in 2.6.0-test2?
  2003-08-07  0:40 ` Andrew Morton
@ 2003-08-07  7:23   ` Thomas Themel
  2003-08-07 16:07   ` Valdis.Kletnieks
  2003-08-09 20:48   ` cryptoloop data corruption (was Re: Device-backed loop broken in 2.6.0-test2?) Thomas Themel
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Themel @ 2003-08-07  7:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Hi,
Andrew Morton (akpm@osdl.org) wrote on 2003-08-07:
> Thomas Themel <themel@iwoars.net> wrote:
> > it seems that device backed loopback is broken in the 2.6.0-test2 series.
> doh.

Patch applied, and it at least withstood the initial restoration of the
8 GB of data onto it, which I never managed with the unpatched version.

Thanks!

ciao,
-- 
[*Thomas  Themel*] Great Goddess Discordia, Holy Mother Eris, 
[extended contact] Joy of the Universe, Laughter of Space, Grant 
[info provided in] us Life, Light, Love and Liberty and make the 
[*message header*] bloody magick work!


* Re: Device-backed loop broken in 2.6.0-test2?
  2003-08-07  0:40 ` Andrew Morton
  2003-08-07  7:23   ` Thomas Themel
@ 2003-08-07 16:07   ` Valdis.Kletnieks
  2003-08-07 16:24     ` Valdis.Kletnieks
  2003-08-07 16:29     ` Andrew Morton
  2003-08-09 20:48   ` cryptoloop data corruption (was Re: Device-backed loop broken in 2.6.0-test2?) Thomas Themel
  2 siblings, 2 replies; 7+ messages in thread
From: Valdis.Kletnieks @ 2003-08-07 16:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Thomas Themel, linux-kernel

On Wed, 06 Aug 2003 17:40:43 PDT, Andrew Morton said:

> We're currently setting PF_READAHEAD across a call into the page allocator. 
> We end up calling writepage() with PF_READAHEAD set and the block layer
> aborts the writes, resulting in corrupted data.
> 
> It only seems to bite with loop-on-blockdev for some reason.

For what it's worth, I've been seeing these same symptoms with ext3 on an
LVM partition, so it's not *just* loop; it appears to affect any setup that
interposes a mapping layer under the filesystem.  Hmm.. wonder if this
explains the failures on RAID that somebody was reporting, too....

/Valdis (who is off to apply the patch that Andrew attached, which doesn't
appear to be in -mm5)...



* Re: Device-backed loop broken in 2.6.0-test2?
  2003-08-07 16:07   ` Valdis.Kletnieks
@ 2003-08-07 16:24     ` Valdis.Kletnieks
  2003-08-07 16:29     ` Andrew Morton
  1 sibling, 0 replies; 7+ messages in thread
From: Valdis.Kletnieks @ 2003-08-07 16:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Thomas Themel, linux-kernel

On Thu, 07 Aug 2003 12:07:32 EDT, Valdis.Kletnieks@vt.edu said:

> /Valdis (who is off to apply the patch that Andrew attached, which doesn't appear to
> be in -mm5)...

Passing curious.. the first 3 hunks of the patch aren't in -mm5, the last 2 (or
variants thereof) are.... of course I hit 'send' before checking past the first
3 hunks.. ;)

Are the first 3 superfluous, or did -mm5 get half a patch?



* Re: Device-backed loop broken in 2.6.0-test2?
  2003-08-07 16:07   ` Valdis.Kletnieks
  2003-08-07 16:24     ` Valdis.Kletnieks
@ 2003-08-07 16:29     ` Andrew Morton
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2003-08-07 16:29 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: themel, linux-kernel

Valdis.Kletnieks@vt.edu wrote:
>
> /Valdis (who is off to apply the patch that Andrew attached, which doesn't appear to
>  be in -mm5)...

mm5 fixed it differently.



* cryptoloop data corruption (was Re: Device-backed loop broken in 2.6.0-test2?)
  2003-08-07  0:40 ` Andrew Morton
  2003-08-07  7:23   ` Thomas Themel
  2003-08-07 16:07   ` Valdis.Kletnieks
@ 2003-08-09 20:48   ` Thomas Themel
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Themel @ 2003-08-09 20:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton (akpm@osdl.org) wrote on 2003-08-07:
> Thomas Themel <themel@iwoars.net> wrote:
> > it seems that device backed loopback is broken in the 2.6.0-test2 series.
> 
> doh.

Hm, it seems that this patch doesn't apply to 2.6.0-test3, so I assume
that the 'other fix' from -mm5 is included?

I still get data corruption on cryptoloop, but now it is a bit more
subtle... One bit is flipped in each byte whose offset is a multiple of
0x200, starting with the byte at 0x1000.

See this for a short example (xxd output of file before and after copy
to cryptoloop):

--- good.xxd	2003-08-09 22:33:21.000000000 +0200
+++ b0rk.xxd	2003-08-09 22:32:59.000000000 +0200
@@ -256,3 +256,3 @@
 0000ff0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
-0001000: ffff ffff ffff ffff ffff ffff ffff ffff  ................
+0001000: f7ff ffff ffff ffff ffff ffff ffff ffff  ................
 0001010: ffff ffff ffff ffff ffff ffff ffff ffff  ................
@@ -288,3 +288,3 @@
 00011f0: 0ae0 004b 0000 0000 0000 0960 0000 0000  ...K.......`....
-0001200: 0001 2c00 0000 0000 0025 8000 0000 ffff  ..,......%......
+0001200: 0801 2c00 0000 0000 0025 8000 0000 ffff  ..,......%......
 0001210: ffff ffff ffff ffff ffff ffff ffff ffff  ................
@@ -320,3 +320,3 @@
 00013f0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
-0001400: ffff ffff ffff ffff ffff ffff ffff ffff  ................
+0001400: f7ff ffff ffff ffff ffff ffff ffff ffff  ................
 0001410: ffff ffff ffff ffff ffff ffff ffff ffff  ................
@@ -352,3 +352,3 @@
 00015f0: ffff ffff ffff ffff ffff ffff ffff ffff  ................
-0001600: ffff ffff ffff ffff ffff ffff ffff ffff  ................

Any ideas what's causing this? The files are ext3 on an AES cryptoloop
backed by an IDE partition. 
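
To check a pattern like the one in the xxd excerpt above on whole files, a small comparison helper can be used. This is a minimal sketch, not a tool from this thread: `flip_summary` and `summarize_flips` are hypothetical names, and the caller is assumed to have read both copies of the file into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical summary of how two buffers differ. */
struct flip_summary {
	size_t differing_bytes;	/* bytes that differ at all */
	size_t off_stride;	/* differing bytes NOT at a multiple of stride */
	uint8_t xor_union;	/* OR of the XOR masks of all differing bytes */
};

/*
 * Compare two equally sized buffers byte by byte, print each difference,
 * and record whether every difference sits on a stride boundary
 * (0x200 for the pattern reported above).
 */
static struct flip_summary summarize_flips(const uint8_t *good,
					   const uint8_t *bad,
					   size_t len, size_t stride)
{
	struct flip_summary s = { 0, 0, 0 };

	assert(stride != 0);
	for (size_t i = 0; i < len; i++) {
		uint8_t x = (uint8_t)(good[i] ^ bad[i]);

		if (x == 0)
			continue;
		s.differing_bytes++;
		s.xor_union |= x;
		if (i % stride != 0)
			s.off_stride++;
		printf("offset 0x%zx: 0x%02x -> 0x%02x (xor 0x%02x)\n",
		       i, (unsigned)good[i], (unsigned)bad[i], (unsigned)x);
	}
	return s;
}
```

For the three changed lines visible in the excerpt (0xff to 0xf7 at 0x1000 and 0x1400, 0x00 to 0x08 at 0x1200), the XOR mask is 0x08 in each case. A result with `off_stride == 0` and `xor_union == 0x08` over a whole file would mean that the same single bit is flipped, and only at 0x200 boundaries.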

ciao,
-- 
[*Thomas  Themel*] US law prohibits boycotting Israel
[extended contact] 
[info provided in] <http://news.bbc.co.uk/2/hi/business/2403303.stm>
[*message header*] <http://www.bxa.doc.gov/AntiboycottCompliance/Default.htm>

