* How to stress test raid6 on 122 disk array
@ 2016-08-04 17:43 Martin
  2016-08-04 19:05 ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 18+ messages in thread
From: Martin @ 2016-08-04 17:43 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi,

I would like to find rare raid6 bugs in btrfs. I have the following hardware:

* 2x 8 core CPU
* 128GB ram
* 70 FC disk array (56x 500GB + 14x 1TB SATA disks)
* 24 FC or 2x SAS disk array (1TB SAS disks)
* 16 FC disk array (1TB SATA disks)
* 12 SAS disk array (3TB SATA disks)

The test can run for a month or so.

I prefer CentOS/Fedora, but if someone writes a script that
configures and compiles a preferred kernel, then we can do the testing
on any preferred OS.

Can anyone give recommendations on how the setup should be configured
to most likely find rare raid6 bugs?

And does there exist a script that is good for testing this sort of thing?

Best regards,
Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-04 17:43 How to stress test raid6 on 122 disk array Martin
@ 2016-08-04 19:05 ` Austin S. Hemmelgarn
  2016-08-04 20:01   ` Chris Murphy
  0 siblings, 1 reply; 18+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-04 19:05 UTC (permalink / raw)
  To: Martin, Btrfs BTRFS

On 2016-08-04 13:43, Martin wrote:
> Hi,
>
> I would like to find rare raid6 bugs in btrfs. I have the following hardware:
>
> * 2x 8 core CPU
> * 128GB ram
> * 70 FC disk array (56x 500GB + 14x 1TB SATA disks)
> * 24 FC or 2x SAS disk array (1TB SAS disks)
> * 16 FC disk array (1TB SATA disks)
> * 12 SAS disk array (3TB SATA disks)
>
> The test can run for a month or so.
>
> I prefer CentOS/Fedora, but if someone writes a script that
> configures and compiles a preferred kernel, then we can do the testing
> on any preferred OS.
>
> Can anyone give recommendations on how the setup should be configured
> to most likely find rare raid6 bugs?
>
> And does there exist a script that is good for testing this sort of thing?
I'm glad to hear there are people interested in testing BTRFS for the 
purpose of finding bugs.  Sadly I can't provide much help in this 
respect (I do testing, but it's all regression testing these days).

Regarding OS, I'd avoid CentOS for testing something like BTRFS unless 
you specifically want to help their development team fix issues.  They 
have a large number of back-ported patches, and it's not all that 
practical for us to chase down bugs in such a situation, because it 
could just as easily be a bug introduced by the back-porting process or 
may be fixed in the mainline kernel anyway.  Fedora should be fine 
(they're good about staying up to date), but if possible you should 
probably use Rawhide instead of a regular release, as that will give you 
quite possibly one of the closest distribution kernels to a mainline 
Linux kernel available, and will make sure everything is as up to date 
as possible.

As far as testing, I don't know that there are any scripts for this type 
of thing; you may want to look into dbench, fio, iozone, and similar 
tools though, as well as xfstests (which is more about regression 
testing, but is still worth looking at).

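As a trivial example of the sort of thing fio can do, something like
this (untested as written; the device names, mount point, and job
parameters are just placeholders to adjust) will keep sustained mixed
I/O going on a test filesystem:

  mkfs.btrfs -f -d raid6 -m raid6 /dev/sd[b-g]
  mkdir -p /mnt/test && mount /dev/sdb /mnt/test
  fio --name=raid6-stress --directory=/mnt/test \
      --rw=randrw --bs=64k --size=8G --numjobs=16 \
      --ioengine=libaio --direct=1 --time_based --runtime=86400

Running several such jobs with different block sizes, and mixing in
periodic `btrfs scrub start` and `btrfs balance start` runs, will
exercise more code paths than any single benchmark.
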
Most of the big known issues with RAID6 in BTRFS at the moment involve 
device failures and array recovery, but most of them aren't well 
characterized and nobody's really sure why they're happening, so if you 
want to look for something specific, figuring out those issues would be 
a great place to start (even if they aren't rare bugs).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-04 19:05 ` Austin S. Hemmelgarn
@ 2016-08-04 20:01   ` Chris Murphy
  2016-08-04 20:51     ` Martin
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2016-08-04 20:01 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Martin, Btrfs BTRFS

On Thu, Aug 4, 2016 at 1:05 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

>Fedora should be fine (they're good about staying up to
> date), but if possible you should probably use Rawhide instead of a regular
> release, as that will give you quite possibly one of the closest
> distribution kernels to a mainline Linux kernel available, and will make
> sure everything is as up to date as possible.

Yes. It's possible to run on a release version (currently Fedora 23
and Fedora 24) and run a Rawhide kernel. This is what I often do.


> As far as testing, I don't know that there are any scripts for this type of
> thing, you may want to look into dbench, fio, iozone, and similar tools
> though, as well as xfstests (which is more about regression testing, but is
> still worth looking at).
>
> Most of the big known issues with RAID6 in BTRFS at the moment involve
> device failures and array recovery, but most of them aren't well
> characterized and nobody's really sure why they're happening, so if you want
> to look for something specific, figuring out those issues would be a great
> place to start (even if they aren't rare bugs).

Yeah it seems pretty reliable to do normal things with raid56 arrays.
The problem is when they're degraded, weird stuff seems to happen some
of the time. So it might be valid to have several raid56's that are
intentionally running in degraded mode with some tests that will
tolerate that and see when it breaks and why.

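A very rough sketch of the kind of loop I mean (completely untested,
the device names are placeholders, and it will destroy whatever is on
those disks):

  DEVS="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
  mkfs.btrfs -f -d raid6 -m raid6 $DEVS
  mkdir -p /mnt/test && mount /dev/sdb /mnt/test
  cp -a /usr /mnt/test/                 # put some data on it
  umount /mnt/test
  wipefs -a /dev/sdg                    # simulate losing one device
  mount -o degraded /dev/sdb /mnt/test
  # run a workload while degraded, then try to recover:
  btrfs device add /dev/sdh /mnt/test   # spare disk, placeholder name
  btrfs device delete missing /mnt/test
  btrfs scrub start -B /mnt/test
  btrfs device stats /mnt/test

Comparing checksums of the copied data before and after recovery would
also tell you whether anything got silently lost along the way.
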
There is also in the archives the bug where parity is being computed
wrongly when a data strip is wrong (corrupt), and Btrfs sees this,
reports the mismatch, fixes the mismatch, recomputes parity for some
reason, and the parity is then wrong. It'd be nice to know when else
this can happen, if it's possible parity is recomputed (and wrongly)
on a normal read, or a balance, or if it's really restricted to scrub.

Another test might be raid 1 or raid10 metadata vs raid56 for data.
That'd probably be more performance related, but there might be some
unexpected behaviors that crop up.

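E.g. something like (device names made up):

  mkfs.btrfs -f -m raid10 -d raid6 /dev/sd[b-m]
  mkfs.btrfs -f -m raid1  -d raid5 /dev/sd[n-s]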


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-04 20:01   ` Chris Murphy
@ 2016-08-04 20:51     ` Martin
  2016-08-04 21:12       ` Chris Murphy
  0 siblings, 1 reply; 18+ messages in thread
From: Martin @ 2016-08-04 20:51 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

Thanks for the benchmark tools and tips on where the issues might be.

Is Fedora 24 rawhide preferred over ArchLinux?

If I want to compile a mainline kernel, is there anything I need to tune?

When I do the tests, how do I log the info you would like to see, if I
find a bug?



On 4 August 2016 at 22:01, Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Aug 4, 2016 at 1:05 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>>Fedora should be fine (they're good about staying up to
>> date), but if possible you should probably use Rawhide instead of a regular
>> release, as that will give you quite possibly one of the closest
>> distribution kernels to a mainline Linux kernel available, and will make
>> sure everything is as up to date as possible.
>
> Yes. It's possible to run on a release version (currently Fedora 23
> and Fedora 24) and run a Rawhide kernel. This is what I often do.
>
>
>> As far as testing, I don't know that there are any scripts for this type of
>> thing, you may want to look into dbench, fio, iozone, and similar tools
>> though, as well as xfstests (which is more about regression testing, but is
>> still worth looking at).
>>
>> Most of the big known issues with RAID6 in BTRFS at the moment involve
>> device failures and array recovery, but most of them aren't well
>> characterized and nobody's really sure why they're happening, so if you want
>> to look for something specific, figuring out those issues would be a great
>> place to start (even if they aren't rare bugs).
>
> Yeah it seems pretty reliable to do normal things with raid56 arrays.
> The problem is when they're degraded, weird stuff seems to happen some
> of the time. So it might be valid to have several raid56's that are
> intentionally running in degraded mode with some tests that will
> tolerate that and see when it breaks and why.
>
> There is also in the archives the bug where parity is being computed
> wrongly when a data strip is wrong (corrupt), and Btrfs sees this,
> reports the mismatch, fixes the mismatch, recomputes parity for some
> reason, and the parity is then wrong. It'd be nice to know when else
> this can happen, if it's possible parity is recomputed (and wrongly)
> on a normal read, or a balance, or if it's really restricted to scrub.
>
> Another test might be raid 1 or raid10 metadata vs raid56 for data.
> That'd probably be more performance related, but there might be some
> unexpected behaviors that crop up.
>
>
>
> --
> Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-04 20:51     ` Martin
@ 2016-08-04 21:12       ` Chris Murphy
  2016-08-04 22:19         ` Martin
  2016-08-05 11:39         ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Murphy @ 2016-08-04 21:12 UTC (permalink / raw)
  To: Martin; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

On Thu, Aug 4, 2016 at 2:51 PM, Martin <rc6encrypted@gmail.com> wrote:
> Thanks for the benchmark tools and tips on where the issues might be.
>
> Is Fedora 24 rawhide preferred over ArchLinux?

I'm not sure what Arch does any differently to their kernels from
kernel.org kernels. But bugzilla.kernel.org offers a Mainline and
Fedora drop down for identifying the kernel source tree.

>
> If I want to compile a mainline kernel, is there anything I need to tune?

Fedora kernels do not have these options set.

# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set

The sanity and integrity tests are both compile time and mount time
options, i.e. they have to be compiled in for the mount option to do
anything. I can't recall any thread where a developer asked a user to
set any of these options for testing though.

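If you do build a mainline kernel and want them anyway, something
along these lines should turn them on (scripts/config ships in the
kernel source tree; untested here, adjust paths to taste):

  # run from inside the kernel source tree
  cp /boot/config-$(uname -r) .config
  scripts/config -e BTRFS_FS_CHECK_INTEGRITY \
                 -e BTRFS_FS_RUN_SANITY_TESTS \
                 -e BTRFS_DEBUG \
                 -e BTRFS_ASSERT
  make olddefconfig
  make -j$(nproc) && sudo make modules_install install

IIRC the integrity checker still has to be switched on per mount with
the check_int mount option after that.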

> When I do the tests, how do I log the info you would like to see, if I
> find a bug?

bugzilla.kernel.org for tracking, and then reference the URL for the
bug with a summary in an email to list is how I usually do it. The
main thing is going to be the exact reproduce steps. It's also better,
I think, to have complete dmesg (or journalctl -k) attached to the bug
report because not all problems are directly related to Btrfs, they
can have contributing factors elsewhere. And various MTAs, or more
commonly MUAs, have a tendency to wrap such wide text as found in
kernel or journald messages.

And then whatever Austin says.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-04 21:12       ` Chris Murphy
@ 2016-08-04 22:19         ` Martin
  2016-08-05 10:15           ` Erkki Seppala
  2016-08-05 11:39         ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 18+ messages in thread
From: Martin @ 2016-08-04 22:19 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS

Excellent. Thanks.

In order to automate it, would it be ok if I dd some zeroes directly
to the devices to corrupt them, or do I need to physically take the
disks out while running?

The smallest disk of the 122 is 500GB. Is it possible to have btrfs
see each disk as only e.g. 10GB? That way I can corrupt and resilver
more disks over a month.








On 4 August 2016 at 23:12, Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Aug 4, 2016 at 2:51 PM, Martin <rc6encrypted@gmail.com> wrote:
>> Thanks for the benchmark tools and tips on where the issues might be.
>>
>> Is Fedora 24 rawhide preferred over ArchLinux?
>
> I'm not sure what Arch does any differently to their kernels from
> kernel.org kernels. But bugzilla.kernel.org offers a Mainline and
> Fedora drop down for identifying the kernel source tree.
>
>>
>> If I want to compile a mainline kernel, is there anything I need to tune?
>
> Fedora kernels do not have these options set.
>
> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
> # CONFIG_BTRFS_DEBUG is not set
> # CONFIG_BTRFS_ASSERT is not set
>
> The sanity and integrity tests are both compile time and mount time
> options, i.e. it has to be compiled enabled for the mount option to do
> anything. I can't recall any thread where a developer asked a user to
> set any of these options for testing though.
>
>
>> When I do the tests, how do I log the info you would like to see, if I
>> find a bug?
>
> bugzilla.kernel.org for tracking, and then reference the URL for the
> bug with a summary in an email to list is how I usually do it. The
> main thing is going to be the exact reproduce steps. It's also better,
> I think, to have complete dmesg (or journalctl -k) attached to the bug
> report because not all problems are directly related to Btrfs, they
> can have contributing factors elsewhere. And various MTAs, or more
> commonly MUAs, have a tendency to wrap such wide text as found in
> kernel or journald messages.
>
> And then whatever Austin says.
>
>
>
> --
> Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-04 22:19         ` Martin
@ 2016-08-05 10:15           ` Erkki Seppala
  2016-08-15 12:19             ` Martin
  0 siblings, 1 reply; 18+ messages in thread
From: Erkki Seppala @ 2016-08-05 10:15 UTC (permalink / raw)
  To: linux-btrfs

Martin <rc6encrypted@gmail.com> writes:

> The smallest disk of the 122 is 500GB. Is it possible to have btrfs
> see each disk as only e.g. 10GB? That way I can corrupt and resilver
> more disks over a month.

Well, at least you can easily partition the devices for that to happen.

However, I would also suggest considering whether it would be a more
useful use of the resources to run many arrays in parallel: i.e. one
6-device raid6, one 20-device raid6, and then perhaps use the rest of
the devices for a very large btrfs filesystem. Or, if you have
partitioned the disks, the large btrfs volume can also be composed of
all 122 devices; in fact you could even run multiple 122-device raid6s
and run a different kind of testing on each. For performance testing
you might only exercise one of the file systems at a time, though.

-- 
  _____________________________________________________________________
     / __// /__ ____  __               http://www.modeemi.fi/~flux/\   \
    / /_ / // // /\ \/ /                                            \  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi                                  \/


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-04 21:12       ` Chris Murphy
  2016-08-04 22:19         ` Martin
@ 2016-08-05 11:39         ` Austin S. Hemmelgarn
  2016-08-15 12:19           ` Martin
  1 sibling, 1 reply; 18+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-05 11:39 UTC (permalink / raw)
  To: Chris Murphy, Martin; +Cc: Btrfs BTRFS

On 2016-08-04 17:12, Chris Murphy wrote:
> On Thu, Aug 4, 2016 at 2:51 PM, Martin <rc6encrypted@gmail.com> wrote:
>> Thanks for the benchmark tools and tips on where the issues might be.
>>
>> Is Fedora 24 rawhide preferred over ArchLinux?
>
> I'm not sure what Arch does any differently to their kernels from
> kernel.org kernels. But bugzilla.kernel.org offers a Mainline and
> Fedora drop down for identifying the kernel source tree.
IIRC, they're pretty close to mainline kernels.  I don't think they have 
any patches in the filesystem or block layer code at least, but I may be 
wrong, it's been a long time since I looked at an Arch kernel.
>
>>
>> If I want to compile a mainline kernel, is there anything I need to tune?
>
> Fedora kernels do not have these options set.
>
> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
> # CONFIG_BTRFS_DEBUG is not set
> # CONFIG_BTRFS_ASSERT is not set
>
> The sanity and integrity tests are both compile time and mount time
> options, i.e. it has to be compiled enabled for the mount option to do
> anything. I can't recall any thread where a developer asked a user to
> set any of these options for testing though.
FWIW, I actually have the integrity checking code built in on most 
kernels I build.  I don't often use it, but it has near zero overhead 
when not enabled, and it's helped me track down lower-level storage 
configuration issues on occasion.
>
>
>> When I do the tests, how do I log the info you would like to see, if I
>> find a bug?
>
> bugzilla.kernel.org for tracking, and then reference the URL for the
> bug with a summary in an email to list is how I usually do it. The
> main thing is going to be the exact reproduce steps. It's also better,
> I think, to have complete dmesg (or journalctl -k) attached to the bug
> report because not all problems are directly related to Btrfs, they
> can have contributing factors elsewhere. And various MTAs, or more
> commonly MUAs, have a tendency to wrap such wide text as found in
> kernel or journald messages.
Aside from kernel messages, the other general stuff you want to have is:
1. Kernel version and userspace tools version (`uname -a` and `btrfs 
--version`)
2. Any underlying storage configuration if it's not just a plain SSD/HDD 
or partitions (for example, usage of dm-crypt, LVM, mdadm, and similar 
things).
3. Output from `btrfs filesystem show` (this can be trimmed to the 
filesystem that's having the issue).
4. If you can still mount the filesystem, `btrfs filesystem df` output 
can be helpful.
5. If you can't mount the filesystem, output from `btrfs check` run 
without any options will usually be asked for.

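A quick way to grab most of that in one go would be something along
these lines (a rough sketch; adjust the mount point):

  {
    uname -a
    btrfs --version
    btrfs filesystem show
    btrfs filesystem df /mnt/test
    journalctl -k --no-pager
  } > btrfs-bug-report.txt 2>&1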

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-05 11:39         ` Austin S. Hemmelgarn
@ 2016-08-15 12:19           ` Martin
  2016-08-15 12:44             ` Austin S. Hemmelgarn
  2016-08-15 13:40             ` Chris Murphy
  0 siblings, 2 replies; 18+ messages in thread
From: Martin @ 2016-08-15 12:19 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Btrfs BTRFS

>> I'm not sure what Arch does any differently to their kernels from
>> kernel.org kernels. But bugzilla.kernel.org offers a Mainline and
>> Fedora drop down for identifying the kernel source tree.
>
> IIRC, they're pretty close to mainline kernels.  I don't think they have any
> patches in the filesystem or block layer code at least, but I may be wrong,
> it's been a long time since I looked at an Arch kernel.

Perhaps I should use Arch then, as Fedora rawhide kernel wouldn't boot
on my hw, so I am running the stock Fedora 24 kernel right now for the
tests...

>>> If I want to compile a mainline kernel, is there anything I need to
>>> tune?
>>
>>
>> Fedora kernels do not have these options set.
>>
>> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
>> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
>> # CONFIG_BTRFS_DEBUG is not set
>> # CONFIG_BTRFS_ASSERT is not set
>>
>> The sanity and integrity tests are both compile time and mount time
>> options, i.e. it has to be compiled enabled for the mount option to do
>> anything. I can't recall any thread where a developer asked a user to
>> set any of these options for testing though.

> FWIW, I actually have the integrity checking code built in on most kernels I
> build.  I don't often use it, but it has near zero overhead when not
> enabled, and it's helped me track down lower-level storage configuration
> issues on occasion.

I'll give that a shot tomorrow.

>>> When I do the tests, how do I log the info you would like to see, if I
>>> find a bug?
>>
>>
>> bugzilla.kernel.org for tracking, and then reference the URL for the
>> bug with a summary in an email to list is how I usually do it. The
>> main thing is going to be the exact reproduce steps. It's also better,
>> I think, to have complete dmesg (or journalctl -k) attached to the bug
>> report because not all problems are directly related to Btrfs, they
>> can have contributing factors elsewhere. And various MTAs, or more
>> commonly MUAs, have a tendency to wrap such wide text as found in
>> kernel or journald messages.
>
> Aside from kernel messages, the other general stuff you want to have is:
> 1. Kernel version and userspace tools version (`uname -a` and `btrfs
> --version`)
> 2. Any underlying storage configuration if it's not just plain a SSD/HDD or
> partitions (for example, usage of dm-crypt, LVM, mdadm, and similar things).
> 3. Output from `btrfs filesystem show` (this can be trimmed to the
> filesystem that's having the issue).
> 4. If you can still mount the filesystem, `btrfs filesystem df` output can
> be helpful.
> 5. If you can't mount the filesystem, output from `btrfs check` run without
> any options will usually be asked for.

I have now had the first crash; can you take a look and see if I have
provided the needed info?

https://bugzilla.kernel.org/show_bug.cgi?id=153141

How long should I keep the host untouched? Or is all the interesting info already provided?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-05 10:15           ` Erkki Seppala
@ 2016-08-15 12:19             ` Martin
  2016-08-15 12:38               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 18+ messages in thread
From: Martin @ 2016-08-15 12:19 UTC (permalink / raw)
  To: Erkki Seppala; +Cc: Btrfs BTRFS

>> The smallest disk of the 122 is 500GB. Is it possible to have btrfs
>> see each disk as only e.g. 10GB? That way I can corrupt and resilver
>> more disks over a month.
>
> Well, at least you can easily partition the devices for that to happen.

Can it be done with btrfs or should I do it with gdisk?

> However, I would also suggest considering whether it would be a more
> useful use of the resources to run many arrays in parallel: i.e. one
> 6-device raid6, one 20-device raid6, and then perhaps use the rest of
> the devices for a very large btrfs filesystem. Or, if you have
> partitioned the disks, the large btrfs volume can also be composed of
> all 122 devices; in fact you could even run multiple 122-device raid6s
> and run a different kind of testing on each. For performance testing
> you might only exercise one of the file systems at a time, though.

Very interesting idea, which leads me to the following question:

For the past weeks I have had all 122 disks in one raid6 filesystem,
and since I didn't enter any vdev (zfs term) size, I suspect only 2
of the 122 disks are parity.

If so, how can I make the filesystem so that for every 6 disks, 2 of them are parity?

Reading the mkfs.btrfs man page gives me the impression that it can't
be done, which I find hard to believe.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 12:19             ` Martin
@ 2016-08-15 12:38               ` Austin S. Hemmelgarn
  2016-08-15 13:39                 ` Martin
  0 siblings, 1 reply; 18+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-15 12:38 UTC (permalink / raw)
  To: Martin; +Cc: Erkki Seppala, Btrfs BTRFS

On 2016-08-15 08:19, Martin wrote:
>>> The smallest disk of the 122 is 500GB. Is it possible to have btrfs
>>> see each disk as only e.g. 10GB? That way I can corrupt and resilver
>>> more disks over a month.
>>
>> Well, at least you can easily partition the devices for that to happen.
>
> Can it be done with btrfs or should I do it with gdisk?
With gdisk.  BTRFS includes some volume management features, but it 
doesn't handle partitioning itself.
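For example, something like this would give you a small test partition
on each disk (a rough sketch I haven't run here; the device list is a
placeholder, and it wipes the existing partition tables):

  for dev in /dev/sd{b..z}; do
      sgdisk --zap-all "$dev"
      sgdisk -n 1:0:+10G "$dev"
  done

Then point mkfs.btrfs at the resulting /dev/sdX1 partitions instead of
the whole disks.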
>
>> However, I would also suggest considering whether it would be a more
>> useful use of the resources to run many arrays in parallel: i.e. one
>> 6-device raid6, one 20-device raid6, and then perhaps use the rest of
>> the devices for a very large btrfs filesystem. Or, if you have
>> partitioned the disks, the large btrfs volume can also be composed of
>> all 122 devices; in fact you could even run multiple 122-device raid6s
>> and run a different kind of testing on each. For performance testing
>> you might only exercise one of the file systems at a time, though.
>
> Very interesting idea, which leads me to the following question:
>
> For the past weeks I have had all 122 disks in one raid6 filesystem,
> and since I didn't enter any vdev (zfs term) size, I suspect only 2
> of the 122 disks are parity.
>
> If so, how can I make the filesystem so that for every 6 disks, 2 of them are parity?
>
> Reading the mkfs.btrfs man page gives me the impression that it can't
> be done, which I find hard to believe.
That really is the case, there's currently no way to do this with BTRFS. 
  You have to keep in mind that the raid5/6 code only went into the 
mainline kernel a few versions ago, and it's still pretty immature as 
far as kernel code goes.  I don't know when (if ever) such a feature 
might get put in, but it's definitely something to add to the list of 
things that would be nice to have.

For the moment, the only option to achieve something like this is to set 
up a bunch of separate 8 device filesystems, but I would be willing to 
bet that the way you have it configured right now is closer to what most 
people would be doing in a regular deployment, and therefore is probably 
more valuable for testing.

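If you do want to go that route, it's just several mkfs invocations
over subsets of the devices, e.g. (placeholder names):

  mkfs.btrfs -f -d raid6 -m raid6 /dev/sd[b-i]
  mkfs.btrfs -f -d raid6 -m raid6 /dev/sd[j-q]
  mkfs.btrfs -f -d raid6 -m raid6 /dev/sd[r-y]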

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 12:19           ` Martin
@ 2016-08-15 12:44             ` Austin S. Hemmelgarn
  2016-08-15 13:38               ` Martin
  2016-08-15 13:40             ` Chris Murphy
  1 sibling, 1 reply; 18+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-15 12:44 UTC (permalink / raw)
  To: Martin; +Cc: Chris Murphy, Btrfs BTRFS

On 2016-08-15 08:19, Martin wrote:
>>> I'm not sure what Arch does any differently to their kernels from
>>> kernel.org kernels. But bugzilla.kernel.org offers a Mainline and
>>> Fedora drop down for identifying the kernel source tree.
>>
>> IIRC, they're pretty close to mainline kernels.  I don't think they have any
>> patches in the filesystem or block layer code at least, but I may be wrong,
>> it's been a long time since I looked at an Arch kernel.
>
> Perhaps I should use Arch then, as Fedora rawhide kernel wouldn't boot
> on my hw, so I am running the stock Fedora 24 kernel right now for the
> tests...
>
>>> If I want to compile a mainline kernel, is there anything I need to
>>>> tune?
>>>
>>>
>>> Fedora kernels do not have these options set.
>>>
>>> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
>>> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
>>> # CONFIG_BTRFS_DEBUG is not set
>>> # CONFIG_BTRFS_ASSERT is not set
>>>
>>> The sanity and integrity tests are both compile time and mount time
>>> options, i.e. it has to be compiled enabled for the mount option to do
>>> anything. I can't recall any thread where a developer asked a user to
>>> set any of these options for testing though.
>
>> FWIW, I actually have the integrity checking code built in on most kernels I
>> build.  I don't often use it, but it has near zero overhead when not
>> enabled, and it's helped me track down lower-level storage configuration
>> issues on occasion.
>
> I'll give that a shot tomorrow.
>
>>>> When I do the tests, how do I log the info you would like to see, if I
>>>> find a bug?
>>>
>>>
>>> bugzilla.kernel.org for tracking, and then reference the URL for the
>>> bug with a summary in an email to list is how I usually do it. The
>>> main thing is going to be the exact reproduce steps. It's also better,
>>> I think, to have complete dmesg (or journalctl -k) attached to the bug
>>> report because not all problems are directly related to Btrfs, they
>>> can have contributing factors elsewhere. And various MTAs, or more
>>> commonly MUAs, have a tendency to wrap such wide text as found in
>>> kernel or journald messages.
>>
>> Aside from kernel messages, the other general stuff you want to have is:
>> 1. Kernel version and userspace tools version (`uname -a` and `btrfs
>> --version`)
>> 2. Any underlying storage configuration if it's not just plain a SSD/HDD or
>> partitions (for example, usage of dm-crypt, LVM, mdadm, and similar things).
>> 3. Output from `btrfs filesystem show` (this can be trimmed to the
>> filesystem that's having the issue).
>> 4. If you can still mount the filesystem, `btrfs filesystem df` output can
>> be helpful.
>> 5. If you can't mount the filesystem, output from `btrfs check` run without
>> any options will usually be asked for.
>
> I have now had the first crash, can you take a look if I have provided
> the needed info?
>
> https://bugzilla.kernel.org/show_bug.cgi?id=153141
>
> How long should I keep the host untouched? Or is all the interesting info already provided?
>
Looking at the kernel log itself, you've got a ton of write errors on 
/dev/sdap.  I would suggest checking that particular disk with smartctl, 
and possibly checking the other hardware involved (the storage 
controller and cabling).

I would kind of expect BTRFS to crash with that many write errors 
regardless of what profile is being used, but we really should get 
better about reporting errors to user space in a sane way (making people 
dig through kernel logs to figure out they're having issues like this is 
not particularly user friendly).

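For example (assuming smartmontools is installed; sdX is whatever the
disk really is on your system, and disks behind some FC/SAS
controllers may need an extra -d type option):

  smartctl -a /dev/sdX        # health attributes and error counters
  smartctl -t long /dev/sdX   # start a long self-test, check later with -a
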
^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 12:44             ` Austin S. Hemmelgarn
@ 2016-08-15 13:38               ` Martin
  2016-08-15 13:41                 ` Austin S. Hemmelgarn
  2016-08-15 13:43                 ` Chris Murphy
  0 siblings, 2 replies; 18+ messages in thread
From: Martin @ 2016-08-15 13:38 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Btrfs BTRFS

> Looking at the kernel log itself, you've got a ton of write errors on
> /dev/sdap.  I would suggest checking that particular disk with smartctl, and
> possibly checking the other hardware involved (the storage controller and
> cabling).
>
> I would kind of expect BTRFS to crash with that many write errors regardless
> of what profile is being used, but we really should get better about
> reporting errors to user space in a sane way (making people dig through
> kernel logs to figure out their having issues like this is not particularly
> user friendly).

Interesting!

Why does it speak of "device sdq" and /dev/sdap ?

[337411.703937] BTRFS error (device sdq): bdev /dev/sdap errs: wr
36973, rd 0, flush 1, corrupt 0, gen 0
[337411.704658] BTRFS warning (device sdq): lost page write due to IO
error on /dev/sdap

/dev/sdap doesn't exist.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 12:38               ` Austin S. Hemmelgarn
@ 2016-08-15 13:39                 ` Martin
  2016-08-15 13:47                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 18+ messages in thread
From: Martin @ 2016-08-15 13:39 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Erkki Seppala, Btrfs BTRFS

> That really is the case, there's currently no way to do this with BTRFS.
> You have to keep in mind that the raid5/6 code only went into the mainline
> kernel a few versions ago, and it's still pretty immature as far as kernel
> code goes.  I don't know when (if ever) such a feature might get put in, but
> it's definitely something to add to the list of things that would be nice to
> have.
>
> For the moment, the only option to achieve something like this is to set up
> a bunch of separate 8 device filesystems, but I would be willing to bet that
> the way you have it configured right now is closer to what most people would
> be doing in a regular deployment, and therefore is probably more valuable
> for testing.
>

I see.

Right now on our 500TB+ zfs filesystems we use raid6 with 6-disk
vdevs, which is common in the zfs world, and for btrfs I would do the
same when stable/possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 12:19           ` Martin
  2016-08-15 12:44             ` Austin S. Hemmelgarn
@ 2016-08-15 13:40             ` Chris Murphy
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2016-08-15 13:40 UTC (permalink / raw)
  To: Martin; +Cc: Austin S. Hemmelgarn, Chris Murphy, Btrfs BTRFS

On Mon, Aug 15, 2016 at 6:19 AM, Martin <rc6encrypted@gmail.com> wrote:

>
> I have now had the first crash, can you take a look if I have provided
> the needed info?
>
> https://bugzilla.kernel.org/show_bug.cgi?id=153141

[337406.626175] BTRFS warning (device sdq): lost page write due to IO
error on /dev/sdap

Anytime there's I/O related errors that you'd need to go back farther
in the log to find out what really happened. You can play around with
'journalctl --since' for this. It'll accept things like -1m or -2h for
"back one minute or back two hours" or also "today" "yesterday" or by
explicit date and time.

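E.g. something like (the output filename is arbitrary):

  journalctl -k --since=-2h --no-pager > pre-crash-kernel.log
  journalctl -k -b -1 --no-pager   # previous boot, if the box rebooted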


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 13:38               ` Martin
@ 2016-08-15 13:41                 ` Austin S. Hemmelgarn
  2016-08-15 13:43                 ` Chris Murphy
  1 sibling, 0 replies; 18+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-15 13:41 UTC (permalink / raw)
  To: Martin; +Cc: Chris Murphy, Btrfs BTRFS

On 2016-08-15 09:38, Martin wrote:
>> Looking at the kernel log itself, you've got a ton of write errors on
>> /dev/sdap.  I would suggest checking that particular disk with smartctl, and
>> possibly checking the other hardware involved (the storage controller and
>> cabling).
>>
>> I would kind of expect BTRFS to crash with that many write errors regardless
>> of what profile is being used, but we really should get better about
>> reporting errors to user space in a sane way (making people dig through
>> kernel logs to figure out they're having issues like this is not particularly
>> user friendly).
>
> Interesting!
>
> Why does it speak of "device sdq" and /dev/sdap ?
>
> [337411.703937] BTRFS error (device sdq): bdev /dev/sdap errs: wr
> 36973, rd 0, flush 1, corrupt 0, gen 0
> [337411.704658] BTRFS warning (device sdq): lost page write due to IO
> error on /dev/sdap
>
> /dev/sdap doesn't exist.
>
I'm not quite certain, something in the kernel might have been confused, 
but it's hard to be sure.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 13:38               ` Martin
  2016-08-15 13:41                 ` Austin S. Hemmelgarn
@ 2016-08-15 13:43                 ` Chris Murphy
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2016-08-15 13:43 UTC (permalink / raw)
  To: Martin; +Cc: Austin S. Hemmelgarn, Chris Murphy, Btrfs BTRFS

On Mon, Aug 15, 2016 at 7:38 AM, Martin <rc6encrypted@gmail.com> wrote:
>> Looking at the kernel log itself, you've got a ton of write errors on
>> /dev/sdap.  I would suggest checking that particular disk with smartctl, and
>> possibly checking the other hardware involved (the storage controller and
>> cabling).
>>
>> I would kind of expect BTRFS to crash with that many write errors regardless
>> of what profile is being used, but we really should get better about
>> reporting errors to user space in a sane way (making people dig through
>> kernel logs to figure out they're having issues like this is not particularly
>> user friendly).
>
> Interesting!
>
> Why does it speak of "device sdq" and /dev/sdap ?
>
> [337411.703937] BTRFS error (device sdq): bdev /dev/sdap errs: wr
> 36973, rd 0, flush 1, corrupt 0, gen 0
> [337411.704658] BTRFS warning (device sdq): lost page write due to IO
> error on /dev/sdap
>
> /dev/sdap doesn't exist.

OK well
journalctl -b | grep -A10 -B10 "sdap"

See in what other context it appears. And also 'btrfs fi show' and see
if it appears associated with this Btrfs volume.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: How to stress test raid6 on 122 disk array
  2016-08-15 13:39                 ` Martin
@ 2016-08-15 13:47                   ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 18+ messages in thread
From: Austin S. Hemmelgarn @ 2016-08-15 13:47 UTC (permalink / raw)
  To: Martin; +Cc: Erkki Seppala, Btrfs BTRFS

On 2016-08-15 09:39, Martin wrote:
>> That really is the case, there's currently no way to do this with BTRFS.
>> You have to keep in mind that the raid5/6 code only went into the mainline
>> kernel a few versions ago, and it's still pretty immature as far as kernel
>> code goes.  I don't know when (if ever) such a feature might get put in, but
>> it's definitely something to add to the list of things that would be nice to
>> have.
>>
>> For the moment, the only option to achieve something like this is to set up
>> a bunch of separate 8 device filesystems, but I would be willing to bet that
>> the way you have it configured right now is closer to what most people would
>> be doing in a regular deployment, and therefore is probably more valuable
>> for testing.
>>
>
> I see.
>
> Right now on our 500TB+ zfs filesystems we use raid6 with 6-disk
> vdevs, which is common in the zfs world, and for btrfs I would do the
> same when stable/possible.
>
A while back there was talk of implementing a system where you could 
specify any arbitrary number of replicas, stripes or parity (for 
example, if you had 16 devices, you could tell it to do two copies with 
double parity using full width stripes), and in theory, it would be 
possible there (parity level of 2 with a stripe width of 6 or 8 
depending on how it's implemented), but I don't think it's likely that 
that functionality will exist any time soon.  Implementing such a system 
would pretty much require re-writing most of the allocation code (which 
probably would be a good idea for other reasons now too), and that's not 
likely to happen given the amount of coding that went into the raid5/6 
support.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-08-15 13:48 UTC | newest]

Thread overview: 18+ messages
2016-08-04 17:43 How to stress test raid6 on 122 disk array Martin
2016-08-04 19:05 ` Austin S. Hemmelgarn
2016-08-04 20:01   ` Chris Murphy
2016-08-04 20:51     ` Martin
2016-08-04 21:12       ` Chris Murphy
2016-08-04 22:19         ` Martin
2016-08-05 10:15           ` Erkki Seppala
2016-08-15 12:19             ` Martin
2016-08-15 12:38               ` Austin S. Hemmelgarn
2016-08-15 13:39                 ` Martin
2016-08-15 13:47                   ` Austin S. Hemmelgarn
2016-08-05 11:39         ` Austin S. Hemmelgarn
2016-08-15 12:19           ` Martin
2016-08-15 12:44             ` Austin S. Hemmelgarn
2016-08-15 13:38               ` Martin
2016-08-15 13:41                 ` Austin S. Hemmelgarn
2016-08-15 13:43                 ` Chris Murphy
2016-08-15 13:40             ` Chris Murphy
