* raid10 make_request failure during iozone benchmark upon btrfs
@ 2012-07-02  2:34 Kerin Millar
  2012-07-02  2:52 ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Kerin Millar @ 2012-07-02  2:34 UTC (permalink / raw)
  To: linux-raid

Hello,

I'm running a 4-way RAID-10 array with the f2 layout scheme on a 3.5-rc5
kernel:

Personalities : [raid10] [raid6] [raid5] [raid4]
md0 : active raid10 sdb2[4] sdd2[3] sdc2[2] sda2[1]
       5860462592 blocks super 1.1 256K chunks 2 far-copies [4/4] [UUUU]

I am also using LVM, with md0 serving as the sole PV in a volume group
named vg0. The drives are brand new Hitachi Deskstar 5K3000 drives and
they are known to be in good health. XFS is my filesystem of choice but
I recently created a volume so that I could benchmark btrfs with iozone
(just out of curiosity). The volume arrangement is as follows:

# lvs -o lv_name,lv_attr,lv_size,seg_pe_ranges
   LV     Attr   LSize   PE Ranges
   public -wi-ao   3.00t /dev/md0:25600-812031
   rootfs -wi-ao 100.00g /dev/md0:0-25599
   test   -wi-ao   2.00g /dev/md0:812032-812543

The btrfs filesystem was created as follows:

# mkfs.btrfs /dev/vg0/test
...
fs created label (null) on /dev/vg0/test
         nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
Btrfs Btrfs v0.19

I'm not sure whether this is a bug in the raid10 code but I am
encountering a reproducible error while running iozone -a. It triggers
during the tests that read and write 2MiB with a 4KiB record length.
Here's the tail end of iozone's output:

2048  4  530020  473540  1660915  1655474  1427182  388846  1405465  558811  1394966  462500  520324

Error in file: Found '101010101010101' Expecting '6d6d6d6d6d6d6d6d' addr 7ff7c8700000
Error in file: Position 131072
Record # 32 Record size 4 kb
where 7ff7c8700000 loop 0

Note that the last two columns' worth of figures are missing, implying
that the failure occurs when iozone is running the fread/freread tests.

Here are the error messages from the kernel ring buffer:

[  919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
[  919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[  919.894060] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500672 256
[  919.894070] btrfs: bdev /dev/mapper/vg0-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[  919.894634] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653501184 256
[  919.894643] btrfs: bdev /dev/mapper/vg0-test errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[  919.895225] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653501696 256
[  919.895234] btrfs: bdev /dev/mapper/vg0-test errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[  919.895801] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653502208 256
[  919.895811] btrfs: bdev /dev/mapper/vg0-test errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
[  919.896390] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653502720 256
[  919.896399] btrfs: bdev /dev/mapper/vg0-test errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
[  919.896981] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653503232 256
[  919.896990] btrfs: bdev /dev/mapper/vg0-test errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[  920.029589] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653504256 256
[  920.029603] btrfs: bdev /dev/mapper/vg0-test errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[  920.030208] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653504768 256
[  920.030222] btrfs: bdev /dev/mapper/vg0-test errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
[  920.030788] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653505280 256
[  920.030802] btrfs: bdev /dev/mapper/vg0-test errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
[  920.031385] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653505792 256
[  920.031957] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653506304 256
[  920.032551] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653506816 256
[  920.033135] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653507328 256
[  920.161304] btrfs no csum found for inode 328 start 131072
[  920.180249] btrfs csum failed ino 328 off 131072 csum 2259312665 private 0

I have no intention of using btrfs for anything other than
experimentation. Still, my fear is that something could be amiss in
the guts of the raid10 code. I'd welcome any insights as to what is
happening here.

Cheers,

--Kerin


* Re: raid10 make_request failure during iozone benchmark upon btrfs
  2012-07-02  2:34 raid10 make_request failure during iozone benchmark upon btrfs Kerin Millar
@ 2012-07-02  2:52 ` NeilBrown
  2012-07-02  2:58   ` Kerin Millar
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2012-07-02  2:52 UTC (permalink / raw)
  To: Kerin Millar; +Cc: linux-raid


On Mon, 02 Jul 2012 03:34:16 +0100 Kerin Millar <kerframil@gmail.com> wrote:

> Hello,
> 
> I'm running a 4-way RAID-10 array with the f2 layout scheme on a 3.5-rc5

I thought I fixed this in 3.5-rc2.
Maybe there is another bug....

Could you please double check that you are running a kernel with

commit aba336bd1d46d6b0404b06f6915ed76150739057
Author: NeilBrown <neilb@suse.de>
Date:   Thu May 31 15:39:11 2012 +1000

    md: raid1/raid10: fix problem with merge_bvec_fn

in it?

Thanks,
NeilBrown


> kernel:
> 
> Personalities : [raid10] [raid6] [raid5] [raid4]
> md0 : active raid10 sdb2[4] sdd2[3] sdc2[2] sda2[1]
>        5860462592 blocks super 1.1 256K chunks 2 far-copies [4/4] [UUUU]
> 
> I am also using LVM, with md0 serving as the sole PV in a volume group
> named vg0. The drives are brand new Hitachi Deskstar 5K3000 drives and
> they are known to be in good health. XFS is my filesystem of choice but
> I recently created a volume so that I could benchmark btrfs with iozone
> (just out of curiosity). The volume arrangement is as follows:
> 
> # lvs -o lv_name,lv_attr,lv_size,seg_pe_ranges
>    LV     Attr   LSize   PE Ranges
>    public -wi-ao   3.00t /dev/md0:25600-812031
>    rootfs -wi-ao 100.00g /dev/md0:0-25599
>    test   -wi-ao   2.00g /dev/md0:812032-812543
> 
> The btrfs filesystem was created as follows:
> 
> # mkfs.btrfs /dev/vg0/test
> ...
> fs created label (null) on /dev/vg0/test
>          nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
> Btrfs Btrfs v0.19
> 
> I'm not sure whether this is a bug in the raid10 code but I am
> encountering a reproducible error while running iozone -a. It triggers
> during the tests that read and write 2MiB with a 4KiB record length.
> Here's the tail end of iozone's output:
> 
> 2048  4  530020  473540  1660915  1655474  1427182  388846  1405465  558811  1394966  462500  520324
> 
> Error in file: Found '101010101010101' Expecting '6d6d6d6d6d6d6d6d' addr 7ff7c8700000
> Error in file: Position 131072
> Record # 32 Record size 4 kb
> where 7ff7c8700000 loop 0
> 
> Note that the last two columns' worth of figures are missing, implying
> that the failure occurs when iozone is running the fread/freread tests.
> 
> Here are the error messages from the kernel ring buffer:
> 
> [  919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
> [  919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> [  919.894060] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500672 256
> [  919.894070] btrfs: bdev /dev/mapper/vg0-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> [  919.894634] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653501184 256
> [  919.894643] btrfs: bdev /dev/mapper/vg0-test errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
> [  919.895225] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653501696 256
> [  919.895234] btrfs: bdev /dev/mapper/vg0-test errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
> [  919.895801] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653502208 256
> [  919.895811] btrfs: bdev /dev/mapper/vg0-test errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
> [  919.896390] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653502720 256
> [  919.896399] btrfs: bdev /dev/mapper/vg0-test errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
> [  919.896981] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653503232 256
> [  919.896990] btrfs: bdev /dev/mapper/vg0-test errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
> [  920.029589] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653504256 256
> [  920.029603] btrfs: bdev /dev/mapper/vg0-test errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> [  920.030208] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653504768 256
> [  920.030222] btrfs: bdev /dev/mapper/vg0-test errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
> [  920.030788] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653505280 256
> [  920.030802] btrfs: bdev /dev/mapper/vg0-test errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
> [  920.031385] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653505792 256
> [  920.031957] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653506304 256
> [  920.032551] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653506816 256
> [  920.033135] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653507328 256
> [  920.161304] btrfs no csum found for inode 328 start 131072
> [  920.180249] btrfs csum failed ino 328 off 131072 csum 2259312665 private 0
> 
> I have no intention of using btrfs for anything other than
> experimentation. Still, my fear is that something could be amiss in
> the guts of the raid10 code. I'd welcome any insights as to what is
> happening here.
> 
> Cheers,
> 
> --Kerin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




* Re: raid10 make_request failure during iozone benchmark upon btrfs
  2012-07-02  2:52 ` NeilBrown
@ 2012-07-02  2:58   ` Kerin Millar
  2012-07-03  1:39     ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Kerin Millar @ 2012-07-02  2:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi Neil,

On 02/07/2012 03:52, NeilBrown wrote:
> On Mon, 02 Jul 2012 03:34:16 +0100 Kerin Millar<kerframil@gmail.com>  wrote:
>
>> >  Hello,
>> >
>> >  I'm running a 4-way RAID-10 array with the f2 layout scheme on a 3.5-rc5
> I thought I fixed this in 3.5-rc2.
> Maybe there is another bug....
>
> Could you please double check that you are running a kernel with
>
> commit aba336bd1d46d6b0404b06f6915ed76150739057
> Author: NeilBrown<neilb@suse.de>
> Date:   Thu May 31 15:39:11 2012 +1000
>
>      md: raid1/raid10: fix problem with merge_bvec_fn
>
> in it?

I am indeed. I searched the list beforehand and noticed the patch in
question. Not sure which -rc it landed in but I checked my source tree
and it's definitely in there.

Cheers,

--Kerin


* Re: raid10 make_request failure during iozone benchmark upon btrfs
  2012-07-02  2:58   ` Kerin Millar
@ 2012-07-03  1:39     ` NeilBrown
  2012-07-03  2:13       ` Kerin Millar
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2012-07-03  1:39 UTC (permalink / raw)
  To: Kerin Millar; +Cc: linux-raid


On Mon, 02 Jul 2012 03:58:57 +0100 Kerin Millar <kerframil@gmail.com> wrote:

> Hi Neil,
> 
> On 02/07/2012 03:52, NeilBrown wrote:
> > On Mon, 02 Jul 2012 03:34:16 +0100 Kerin Millar<kerframil@gmail.com>  wrote:
> >
> >> >  Hello,
> >> >
> >> >  I'm running a 4-way RAID-10 array with the f2 layout scheme on a 3.5-rc5
> > I thought I fixed this in 3.5-rc2.
> > Maybe there is another bug....
> >
> > Could you please double check that you are running a kernel with
> >
> > commit aba336bd1d46d6b0404b06f6915ed76150739057
> > Author: NeilBrown<neilb@suse.de>
> > Date:   Thu May 31 15:39:11 2012 +1000
> >
> >      md: raid1/raid10: fix problem with merge_bvec_fn
> >
> > in it?
> 
> I am indeed. I searched the list beforehand and noticed the patch in
> question. Not sure which -rc it landed in but I checked my source tree
> and it's definitely in there.
> 
> Cheers,
> 
> --Kerin

Thanks.
Looking at it again I see that it is definitely a different bug, that patch
wouldn't affect it.

But I cannot see what could possibly be causing the problem.
You have a 256K chunk size, so requests should be limited to 512 sectors
aligned at a 512-sector boundary.
However, all the requests that are causing errors are 512 sectors long but
aligned on a 256-sector boundary (which is not also a 512-sector boundary).
This is wrong.
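
For illustration, take the first failing request in the log (treating the
first number printed as the starting sector, and 256k as the request size,
i.e. 512 sectors):

    6653500160 mod 512 = 256    -> the bio starts 256 sectors into a chunk
    256 + 512 = 768 > 512       -> a 512-sector request from there has to
                                   spill into the next chunk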

It could be that btrfs is submitting bad requests, but I think it always uses
bio_add_page, and bio_add_page appears to do the right thing.
It could be that dm-linear is causing the problem, but it seems to correctly
ask the underlying device for alignment, and reports that alignment to
bio_add_page.
It could be that md/raid10 is the problem, but I cannot find any fault in
raid10_mergeable_bvec - it performs much the same tests that the
raid10 make_request function does.

So it is a mystery.

Is this failure repeatable?

If so, could you please insert
   WARN_ON_ONCE(1);
in drivers/md/raid10.c where it prints out the message: just after the
"bad_map:" label.

Also, in raid10_mergeable_bvec, insert 
   WARN_ON_ONCE(max < 0);
just before
		if (max < 0)
			/* bio_add cannot handle a negative return */
			max = 0;

and then see if either of those generate a warning, and post the full stack
trace  if they do.
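
Roughly, the two hunks would sit like this (a sketch only; the surrounding
context is abbreviated and only the WARN_ON_ONCE() calls are new):

	/* drivers/md/raid10.c: in make_request(), just after the label */
	bad_map:
		WARN_ON_ONCE(1);        /* new: emit one stack trace for the bad bio */
		/* ... existing printk of the "make_request bug: can't convert
		 * block across chunks or bigger than ..." message ... */

	/* drivers/md/raid10.c: in raid10_mergeable_bvec() */
		WARN_ON_ONCE(max < 0);  /* new: flag a negative merge limit */
		if (max < 0)
			/* bio_add cannot handle a negative return */
			max = 0;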

Thanks,
NeilBrown



* Re: raid10 make_request failure during iozone benchmark upon btrfs
  2012-07-03  1:39     ` NeilBrown
@ 2012-07-03  2:13       ` Kerin Millar
  2012-07-03  2:47         ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Kerin Millar @ 2012-07-03  2:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi,

On 03/07/2012 02:39, NeilBrown wrote:

[snip]

 >>> Could you please double check that you are running a kernel with
 >>>
 >>> commit aba336bd1d46d6b0404b06f6915ed76150739057
 >>> Author: NeilBrown<neilb@suse.de>
 >>> Date:   Thu May 31 15:39:11 2012 +1000
 >>>
 >>>       md: raid1/raid10: fix problem with merge_bvec_fn
 >>>
 >>> in it?
 >>
 >> I am indeed. I searched the list beforehand and noticed the patch in
 >> question. Not sure which -rc it landed in but I checked my source tree
 >> and it's definitely in there.
 >>
 >> Cheers,
 >>
 >> --Kerin
 >
 > Thanks.
 > Looking at it again I see that it is definitely a different bug, that patch
 > wouldn't affect it.
 >
 > But I cannot see what could possibly be causing the problem.
 > You have a 256K chunk size, so requests should be limited to 512 sectors
 > aligned at a 512-sector boundary.
 > However, all the requests that are causing errors are 512 sectors long but
 > aligned on a 256-sector boundary (which is not also a 512-sector boundary).
 > This is wrong.

I see.

 >
 > It could be that btrfs is submitting bad requests, but I think it always uses
 > bio_add_page, and bio_add_page appears to do the right thing.
 > It could be that dm-linear is causing the problem, but it seems to correctly
 > ask the underlying device for alignment, and reports that alignment to
 > bio_add_page.
 > It could be that md/raid10 is the problem, but I cannot find any fault in
 > raid10_mergeable_bvec - it performs much the same tests that the
 > raid10 make_request function does.
 >
 > So it is a mystery.
 >
 > Is this failure repeatable?

Yes, it's reproducible with 100% consistency. Furthermore, I tried to
use the btrfs volume as a store for the package manager, so as to try
with a 'realistic' workload. Many of these errors were triggered
immediately upon invoking the package manager. In case it matters, the
package manager is portage (in Gentoo Linux) and the directory structure
entails a shallow directory depth with a large number of distributed
small files. I haven't been able to reproduce with xfs, ext4 or reiserfs.

 >
 > If so, could you please insert
 >     WARN_ON_ONCE(1);
 > in drivers/md/raid10.c where it prints out the message: just after the
 > "bad_map:" label.
 >
 > Also, in raid10_mergeable_bvec, insert
 >     WARN_ON_ONCE(max<  0);
 > just before
 > 		if (max<  0)
 > 			/* bio_add cannot handle a negative return */
 > 			max = 0;
 >
 > and then see if either of those generate a warning, and post the full stack
 > trace  if they do.

OK. I ran iozone again on a fresh filesystem, mounted with the default
options. Here's the trace that appears, just before the first
make_request bug message:

WARNING: at drivers/md/raid10.c:1094 make_request+0xda5/0xe20()
Hardware name: ProLiant MicroServer
Modules linked in: btrfs zlib_deflate lzo_compress kvm_amd kvm sp5100_tco i2c_piix4
Pid: 1031, comm: btrfs-submit-1 Not tainted 3.5.0-rc5 #3
Call Trace:
[<ffffffff81031987>] ? warn_slowpath_common+0x67/0xa0
[<ffffffff81442b45>] ? make_request+0xda5/0xe20
[<ffffffff81460b34>] ? __split_and_process_bio+0x2d4/0x600
[<ffffffff81063429>] ? set_next_entity+0x29/0x60
[<ffffffff810652c3>] ? pick_next_task_fair+0x63/0x140
[<ffffffff81450b7f>] ? md_make_request+0xbf/0x1e0
[<ffffffff8123d12f>] ? generic_make_request+0xaf/0xe0
[<ffffffff8123d1c3>] ? submit_bio+0x63/0xe0
[<ffffffff81040abd>] ? try_to_del_timer_sync+0x7d/0x120
[<ffffffffa016839a>] ? run_scheduled_bios+0x23a/0x520 [btrfs]
[<ffffffffa0170e40>] ? worker_loop+0x120/0x520 [btrfs]
[<ffffffffa0170d20>] ? btrfs_queue_worker+0x2e0/0x2e0 [btrfs]
[<ffffffff810520c5>] ? kthread+0x85/0xa0
[<ffffffff815441f4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81052040>] ? kthread_freezable_should_stop+0x60/0x60
[<ffffffff815441f0>] ? gs_change+0xb/0xb

Cheers,

--Kerin


* Re: raid10 make_request failure during iozone benchmark upon btrfs
  2012-07-03  2:13       ` Kerin Millar
@ 2012-07-03  2:47         ` NeilBrown
  2012-07-03 15:08           ` Chris Mason
  2012-07-07 17:29           ` Kerin Millar
  0 siblings, 2 replies; 8+ messages in thread
From: NeilBrown @ 2012-07-03  2:47 UTC (permalink / raw)
  To: Kerin Millar; +Cc: linux-raid, linux-btrfs


On Tue, 03 Jul 2012 03:13:33 +0100 Kerin Millar <kerframil@gmail.com> wrote:

> Hi,
> 
> On 03/07/2012 02:39, NeilBrown wrote:
> 
> [snip]
> 
>  >>> Could you please double check that you are running a kernel with
>  >>>
>  >>> commit aba336bd1d46d6b0404b06f6915ed76150739057
>  >>> Author: NeilBrown<neilb@suse.de>
>  >>> Date:   Thu May 31 15:39:11 2012 +1000
>  >>>
>  >>>       md: raid1/raid10: fix problem with merge_bvec_fn
>  >>>
>  >>> in it?
>  >>
>  >> I am indeed. I searched the list beforehand and noticed the patch in
>  >> question. Not sure which -rc it landed in but I checked my source tree
>  >> and it's definitely in there.
>  >>
>  >> Cheers,
>  >>
>  >> --Kerin
>  >
>  > Thanks.
>  > Looking at it again I see that it is definitely a different bug, that patch
>  > wouldn't affect it.
>  >
>  > But I cannot see what could possibly be causing the problem.
>  > You have a 256K chunk size, so requests should be limited to 512 sectors
>  > aligned at a 512-sector boundary.
>  > However, all the requests that are causing errors are 512 sectors long but
>  > aligned on a 256-sector boundary (which is not also a 512-sector boundary).
>  > This is wrong.
> 
> I see.
> 
>  >
>  > It could be that btrfs is submitting bad requests, but I think it always uses
>  > bio_add_page, and bio_add_page appears to do the right thing.
>  > It could be that dm-linear is causing the problem, but it seems to correctly
>  > ask the underlying device for alignment, and reports that alignment to
>  > bio_add_page.
>  > It could be that md/raid10 is the problem, but I cannot find any fault in
>  > raid10_mergeable_bvec - it performs much the same tests that the
>  > raid10 make_request function does.
>  >
>  > So it is a mystery.
>  >
>  > Is this failure repeatable?
> 
> Yes, it's reproducible with 100% consistency. Furthermore, I tried to
> use the btrfs volume as a store for the package manager, so as to try
> with a 'realistic' workload. Many of these errors were triggered
> immediately upon invoking the package manager. In case it matters, the
> package manager is portage (in Gentoo Linux) and the directory structure
> entails a shallow directory depth with a large number of distributed
> small files. I haven't been able to reproduce with xfs, ext4 or reiserfs.
> 
>  >
>  > If so, could you please insert
>  >     WARN_ON_ONCE(1);
>  > in drivers/md/raid10.c where it prints out the message: just after the
>  > "bad_map:" label.
>  >
>  > Also, in raid10_mergeable_bvec, insert
>  >     WARN_ON_ONCE(max<  0);
>  > just before
>  > 		if (max<  0)
>  > 			/* bio_add cannot handle a negative return */
>  > 			max = 0;
>  >
>  > and then see if either of those generate a warning, and post the full stack
>  > trace  if they do.
> 
> OK. I ran iozone again on a fresh filesystem, mounted with the default
> options. Here's the trace that appears, just before the first
> make_request bug message:
> 
> WARNING: at drivers/md/raid10.c:1094 make_request+0xda5/0xe20()
> Hardware name: ProLiant MicroServer
> Modules linked in: btrfs zlib_deflate lzo_compress kvm_amd kvm sp5100_tco i2c_piix4
> Pid: 1031, comm: btrfs-submit-1 Not tainted 3.5.0-rc5 #3
> Call Trace:
> [<ffffffff81031987>] ? warn_slowpath_common+0x67/0xa0
> [<ffffffff81442b45>] ? make_request+0xda5/0xe20
> [<ffffffff81460b34>] ? __split_and_process_bio+0x2d4/0x600
> [<ffffffff81063429>] ? set_next_entity+0x29/0x60
> [<ffffffff810652c3>] ? pick_next_task_fair+0x63/0x140
> [<ffffffff81450b7f>] ? md_make_request+0xbf/0x1e0
> [<ffffffff8123d12f>] ? generic_make_request+0xaf/0xe0
> [<ffffffff8123d1c3>] ? submit_bio+0x63/0xe0
> [<ffffffff81040abd>] ? try_to_del_timer_sync+0x7d/0x120
> [<ffffffffa016839a>] ? run_scheduled_bios+0x23a/0x520 [btrfs]
> [<ffffffffa0170e40>] ? worker_loop+0x120/0x520 [btrfs]
> [<ffffffffa0170d20>] ? btrfs_queue_worker+0x2e0/0x2e0 [btrfs]
> [<ffffffff810520c5>] ? kthread+0x85/0xa0
> [<ffffffff815441f4>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff81052040>] ? kthread_freezable_should_stop+0x60/0x60
> [<ffffffff815441f0>] ? gs_change+0xb/0xb
> 
> Cheers,
> 
> --Kerin

Thanks.  Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)

The symptom is that iozone on btrfs on md/raid10 can result in

[  919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
[  919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0


i.e. RAID10 has a 256K chunk size, but is getting 256K requests which overlap
two chunks - the last half of one chunk and the first half of the next.
That isn't allowed and raid10_mergeable_bvec, called by bio_add_page, should
prevent it.

However btrfs_map_bio() sets ->bi_sector to a new value without verifying
that the resulting bio is still acceptable - which it isn't.
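
For illustration only, the kind of check that is missing - a hypothetical
helper, not actual btrfs or md code (field names as in a 3.5-era struct bio;
assumes chunk_sects is a power of two):

	/* would the remapped bio still sit entirely inside one chunk? */
	static bool bio_fits_in_chunk(struct bio *bio, unsigned int chunk_sects)
	{
		sector_t offset = bio->bi_sector & (chunk_sects - 1);

		return offset + (bio->bi_size >> 9) <= chunk_sects;
	}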

The core problem is that you cannot build a bio for one location, then use it
freely at another location.
md/raid1 handles this by checking each addition to a bio against all the
possible locations that it might read/write it.  Maybe btrfs could do the
same.
Alternatively, we could work with Kent Overstreet (of bcache fame) to remove the
restriction that the fs must make the bio compatible with the device -
instead requiring the device to split bios when needed, and making it easy to
do that (currently it is not easy).
And there are probably other alternatives.

Thanks,
NeilBrown



* Re: raid10 make_request failure during iozone benchmark upon btrfs
  2012-07-03  2:47         ` NeilBrown
@ 2012-07-03 15:08           ` Chris Mason
  2012-07-07 17:29           ` Kerin Millar
  1 sibling, 0 replies; 8+ messages in thread
From: Chris Mason @ 2012-07-03 15:08 UTC (permalink / raw)
  To: NeilBrown; +Cc: Kerin Millar, linux-raid, linux-btrfs

On Mon, Jul 02, 2012 at 08:47:27PM -0600, NeilBrown wrote:
> Thanks.  Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)
> 
> The symptom is that iozone on btrfs on md/raid10 can result in
> 
> [  919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
> [  919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> 
> 
> i.e. RAID10 has a 256K chunk size, but is getting 256K requests which overlap
> two chunks - the last half of one chunk and the first half of the next.
> That isn't allowed and raid10_mergeable_bvec, called by bio_add_page, should
> prevent it.
> 
> However btrfs_map_bio() sets ->bi_sector to a new value without verifying
> that the resulting bio is still acceptable - which it isn't.
> 
> The core problem is that you cannot build a bio for one location, then use it
> freely at another location.
> md/raid1 handles this by checking each addition to a bio against all the
> possible locations that it might read/write it.  Maybe btrfs could do the
> same.
> Alternatively, we could work with Kent Overstreet (of bcache fame) to remove the
> restriction that the fs must make the bio compatible with the device -
> instead requiring the device to split bios when needed, and making it easy to
> do that (currently it is not easy).
> And there are probably other alternatives.

In this case btrfs should really break the bio down to smaller chunks
and hand-feed the lower layers.  There are corners where we think the
device can handle a certain size and then later on figure out we were just
too optimistic.  So we should deal with it by breaking the bio up and
then lowering our max.
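
Something along these lines - a hypothetical sketch, not actual btrfs code
(assumes chunk_sects is a power of two and uses the kernel's min()):

	/*
	 * How many sectors of the remapped bio fit before the next chunk
	 * boundary; the remainder would be resubmitted as a second,
	 * smaller bio.
	 */
	static unsigned int sectors_to_boundary(sector_t sector,
						unsigned int bio_sects,
						unsigned int chunk_sects)
	{
		unsigned int room = chunk_sects - (sector & (chunk_sects - 1));

		return min(room, bio_sects);
	}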

-chris



* Re: raid10 make_request failure during iozone benchmark upon btrfs
  2012-07-03  2:47         ` NeilBrown
  2012-07-03 15:08           ` Chris Mason
@ 2012-07-07 17:29           ` Kerin Millar
  1 sibling, 0 replies; 8+ messages in thread
From: Kerin Millar @ 2012-07-07 17:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 03/07/2012 03:47, NeilBrown wrote:

[snip]

> Thanks.  Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)
>
> The symptom is that iozone on btrfs on md/raid10 can result in
>
> [  919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
> [  919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
>
>
> i.e. RAID10 has a 256K chunk size, but is getting 256K requests which overlap
> two chunks - the last half of one chunk and the first half of the next.
> That isn't allowed and raid10_mergeable_bvec, called by bio_add_page, should
> prevent it.
>
> However btrfs_map_bio() sets ->bi_sector to a new value without verifying
> that the resulting bio is still acceptable - which it isn't.
>
> The core problem is that you cannot build a bio for one location, then use it
> freely at another location.
> md/raid1 handles this by checking each addition to a bio against all the
> possible locations that it might read/write it.  Maybe btrfs could do the
> same.
> Alternatively, we could work with Kent Overstreet (of bcache fame) to remove the
> restriction that the fs must make the bio compatible with the device -
> instead requiring the device to split bios when needed, and making it easy to
> do that (currently it is not easy).
> And there are probably other alternatives.
>

Thanks very much for identifying the bug. I'm glad to find that the raid 
subsystem is not at fault. I'll give btrfs a spin at some point in the 
future and see whether anything has changed by then.

Cheers,

--Kerin

