All of lore.kernel.org
* Poll: time to switch skinny-metadata on by default?
@ 2014-10-16 11:33 David Sterba
  2014-10-20 16:34 ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2014-10-16 11:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm, jbacik

Hi,

the core of the skinny-metadata feature was merged in 3.10 (Jun 2013)
and has reportedly been used by many people. No major bugs have been
reported lately, unless I missed them.

The obvious benefit is reduced metadata consumption; the cost is losing
backward compatibility with pre-3.10 kernels. I believe that trade-off
makes it acceptable to make the change now.

The feature can be turned off at mkfs time by '-O ^skinny-metadata' if
needed.
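To make the mkfs option concrete, here is a minimal sketch that uses a
file-backed image instead of a real device (the image path is just an
example; `btrfs inspect-internal dump-super` is the modern spelling of
the superblock dumper, which older btrfs-progs shipped as the standalone
`btrfs-show-super`):

```shell
# Create a small file-backed image so no spare block device is needed.
truncate -s 512M /tmp/btrfs-demo.img

# Make a filesystem with skinny-metadata explicitly disabled via
# '-O ^skinny-metadata' ('-f' overwrites any previous filesystem).
mkfs.btrfs -f -O ^skinny-metadata /tmp/btrfs-demo.img

# Inspect the superblock's incompatibility flags: with the feature off,
# SKINNY_METADATA should be absent from the decoded flag list.
btrfs inspect-internal dump-super /tmp/btrfs-demo.img | grep incompat_flags
```

`mkfs.btrfs -O list-all` prints the feature names a given btrfs-progs
build understands, which is the easiest way to check what the local
default actually is.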

I'd like to make it default with the 3.17 release of btrfs-progs.
Please let me know if you have objections.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-16 11:33 Poll: time to switch skinny-metadata on by default? David Sterba
@ 2014-10-20 16:34 ` David Sterba
  2014-10-21  9:29   ` Duncan
                     ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: David Sterba @ 2014-10-20 16:34 UTC (permalink / raw)
  To: dsterba, linux-btrfs, clm, jbacik

On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
> I'd like to make it default with the 3.17 release of btrfs-progs.
> Please let me know if you have objections.

For the record, 3.17 will not change the defaults. The timing of the
poll was very bad to get enough feedback before the release. Let's keep
it open for now.


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-20 16:34 ` David Sterba
@ 2014-10-21  9:29   ` Duncan
  2014-10-21 11:02     ` Austin S Hemmelgarn
  2014-10-21 16:40     ` Rich Freeman
  2014-10-25 12:24   ` Marc Joliet
  2014-10-27  4:39   ` Zygo Blaxell
  2 siblings, 2 replies; 28+ messages in thread
From: Duncan @ 2014-10-21  9:29 UTC (permalink / raw)
  To: linux-btrfs

David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted:

> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
>> I'd like to make it default with the 3.17 release of btrfs-progs.
>> Please let me know if you have objections.
> 
> For the record, 3.17 will not change the defaults. The timing of the
> poll was very bad to get enough feedback before the release. Let's keep
> it open for now.

FWIW my own results agree with yours: I've had no problem with
skinny-metadata here, and it has been my default for a couple of
backup-and-new-mkfs.btrfs generations now.

As you know there were some problems with it in the first kernel cycle or 
two after it was introduced as an option, and I waited awhile until they 
died down before trying it here, but as I said, no problems since I 
switched it on, and I've been running it awhile now.

So defaulting to skinny-metadata looks good from here. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-21  9:29   ` Duncan
@ 2014-10-21 11:02     ` Austin S Hemmelgarn
  2014-10-21 12:35       ` Konstantinos Skarlatos
  2014-10-21 16:40     ` Rich Freeman
  1 sibling, 1 reply; 28+ messages in thread
From: Austin S Hemmelgarn @ 2014-10-21 11:02 UTC (permalink / raw)
  To: Duncan, linux-btrfs


On 2014-10-21 05:29, Duncan wrote:
> David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted:
>
>> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
>>> I'd like to make it default with the 3.17 release of btrfs-progs.
>>> Please let me know if you have objections.
>>
>> For the record, 3.17 will not change the defaults. The timing of the
>> poll was very bad to get enough feedback before the release. Let's keep
>> it open for now.
>
> FWIW my own results agree with yours: I've had no problem with
> skinny-metadata here, and it has been my default for a couple of
> backup-and-new-mkfs.btrfs generations now.
>
> As you know there were some problems with it in the first kernel cycle or
> two after it was introduced as an option, and I waited awhile until they
> died down before trying it here, but as I said, no problems since I
> switched it on, and I've been running it awhile now.
>
> So defaulting to skinny-metadata looks good from here. =:^)
>
Same here, I've been using it on all my systems since I switched from 
3.15 to 3.16, and have had no issues whatsoever.




* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-21 11:02     ` Austin S Hemmelgarn
@ 2014-10-21 12:35       ` Konstantinos Skarlatos
  0 siblings, 0 replies; 28+ messages in thread
From: Konstantinos Skarlatos @ 2014-10-21 12:35 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: Duncan, linux-btrfs

On 21/10/2014 2:02 PM, Austin S Hemmelgarn wrote:
> On 2014-10-21 05:29, Duncan wrote:
>> David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted:
>>
>>> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
>>>> I'd like to make it default with the 3.17 release of btrfs-progs.
>>>> Please let me know if you have objections.
>>>
>>> For the record, 3.17 will not change the defaults. The timing of the
>>> poll was very bad to get enough feedback before the release. Let's keep
>>> it open for now.
>>
>> FWIW my own results agree with yours: I've had no problem with
>> skinny-metadata here, and it has been my default for a couple of
>> backup-and-new-mkfs.btrfs generations now.
>>
>> As you know there were some problems with it in the first kernel 
>> cycle or
>> two after it was introduced as an option, and I waited awhile until they
>> died down before trying it here, but as I said, no problems since I
>> switched it on, and I've been running it awhile now.
>>
>> So defaulting to skinny-metadata looks good from here. =:^)
>>
> Same here, I've been using it on all my systems since I switched from 
> 3.15 to 3.16, and have had no issues whatsoever.
>
I have been using skinny-metadata for years, and have only once had an 
issue with it. It was with scrub, and was fixed by Liu Bo [1], so I 
think skinny-metadata is mature enough to be a default.

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg34493.html

-- 
Konstantinos Skarlatos



* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-21  9:29   ` Duncan
  2014-10-21 11:02     ` Austin S Hemmelgarn
@ 2014-10-21 16:40     ` Rich Freeman
  2014-10-22  2:08       ` Duncan
  1 sibling, 1 reply; 28+ messages in thread
From: Rich Freeman @ 2014-10-21 16:40 UTC (permalink / raw)
  To: Duncan; +Cc: Btrfs BTRFS

On Tue, Oct 21, 2014 at 5:29 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted:
>
>> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
>>> I'd like to make it default with the 3.17 release of btrfs-progs.
>>> Please let me know if you have objections.
>>
>> For the record, 3.17 will not change the defaults. The timing of the
>> poll was very bad to get enough feedback before the release. Let's keep
>> it open for now.
>
> FWIW my own results agree with yours: I've had no problem with
> skinny-metadata here, and it has been my default for a couple of
> backup-and-new-mkfs.btrfs generations now.
>

How does one enable it for an existing filesystem?  Is it safe to just
run btrfstune -x?  Can this be done on a mounted filesystem?  Are
there any risks with converting?

--
Rich


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-21 16:40     ` Rich Freeman
@ 2014-10-22  2:08       ` Duncan
  2014-10-22 12:49         ` Dave
  2014-10-23 14:47         ` Tobias Geerinckx-Rice
  0 siblings, 2 replies; 28+ messages in thread
From: Duncan @ 2014-10-22  2:08 UTC (permalink / raw)
  To: linux-btrfs

Rich Freeman posted on Tue, 21 Oct 2014 12:40:01 -0400 as excerpted:

> On Tue, Oct 21, 2014 at 5:29 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>> David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted:
>>
>>> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
>>>> I'd like to make it default with the 3.17 release of btrfs-progs.
>>>> Please let me know if you have objections.
>>>
>>> For the record, 3.17 will not change the defaults. The timing of the
>>> poll was very bad to get enough feedback before the release. Let's
>>> keep it open for now.
>>
>> FWIW my own results agree with yours: I've had no problem with
>> skinny-metadata here, and it has been my default for a couple of
>> backup-and-new-mkfs.btrfs generations now.
>>
>>
> How does one enable it for an existing filesystem?  Is it safe to just
> run btrfstune -x?  Can this be done on a mounted filesystem?  Are there
> any risks with converting?

AFAIK, enabling skinny-metadata with btrfstune simply enables it for 
future metadata commits.  It doesn't change existing metadata.  However, 
with skinny-metadata enabled, doing a balance start -m will rewrite 
existing metadata, thus converting it to skinny if it wasn't skinny 
before.

Since the kernel has code for both "fat" metadata and skinny-metadata, 
they can exist side-by-side and the kernel will use whichever code is 
appropriate.  And since (afaik) a balance effects the conversion of 
existing metadata by simply rewriting it through the same 
metadata-writing paths it'd normally use, only now on the 
skinny-metadata side, there should be no additional risk there, either.
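Duncan's description amounts to the following sketch (device and
mountpoint names are examples; the balance step needs the filesystem
mounted and root privileges, and `btrfs inspect-internal dump-super` is
the modern name for what older progs shipped as `btrfs-show-super`):

```shell
# 1. Flip the skinny-metadata incompat flag.  btrfstune operates on the
#    device node (or an image file) and wants it unmounted.
btrfstune -x /dev/sdX1

# 2. Mount and rewrite all existing metadata block groups; with the
#    flag set, the rewritten extent items come out in skinny form.
mount /dev/sdX1 /mnt
btrfs balance start -m /mnt

# 3. Confirm the flag is now recorded in the superblock.
btrfs inspect-internal dump-super /dev/sdX1 | grep incompat_flags
```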

What narrow additional risk there is comes from the fact that the code 
paths are different.  While both paths have been well exercised by now, 
with no bugs related to those specific code paths in awhile, it's in 
theory narrowly possible that an individual installation's use-case and 
data happened to work just fine on the old "fat" metadata, but will 
trigger some exotic and as-yet-unseen bug when you switch to 
skinny-metadata and thus exercise the other code path.  I'd call the 
risk of that nonzero but extremely unlikely.  

IOW, if you're familiar with Douglas Adams' Hitchhiker's Guide series, 
it's almost the kind of probability that you'd need an improbability 
drive to hit. =:^)  

If not, compare it to winning the lottery or getting struck by 
lightning: yes, it does happen to people sometimes, but it's not 
something you should plan your life around, particularly if you aren't 
in the habit of playing golf and sticking a club in the air in the 
middle of a lightning storm! =:^)

And if you're averse to that sort of odds, why are you playing with the 
still-not-entirely-stable btrfs at this point, anyway?

As for the mounted filesystem question, since all it does is flip a 
switch so that new metadata writes use the skinny-metadata code path, it 
shouldn't be a problem.  However, I'd probably do it on an unmounted 
filesystem here, simply because there's no reason to tempt fate... unless 
your goal is to see what happens, of course. =:^)

Matter of fact, personally, since I tend to periodically backup, do a 
fresh mkfs.btrfs with the new features I want enabled, and restore, I've 
never actually used btrfstune for this myself, either.  But that's more a 
matter of that being the most convenient time to switch it over since I'm 
already doing the fresh mkfs anyway, than because I'm being overly 
cautious.  Still, for those with a similar btrfs rotation system already 
in place, why tempt fate, unless of course your whole /object/ is a 
deliberate test and tempting of fate?

BTW...

@ Dave Sterba: I'm running no-holes too, and haven't had problems with it 
either, tho it's obviously a bit newer and doesn't yet have the degree of 
testing that skinny-metadata has.  Any idea when that'll go default?  
It's probably best to stagger them, which probably means default no-holes 
for the 3.19 userspace release since default skinny-metadata is 
presumably going to be 3.18 now; does that sound about right?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-22  2:08       ` Duncan
@ 2014-10-22 12:49         ` Dave
  2014-10-23  2:41           ` Duncan
  2014-10-23 14:47         ` Tobias Geerinckx-Rice
  1 sibling, 1 reply; 28+ messages in thread
From: Dave @ 2014-10-22 12:49 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Tue, Oct 21, 2014 at 10:08 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> As for the mounted filesystem question, since all it does is flip a
> switch so that new metadata writes use the skinny-metadata code path, it
> shouldn't be a problem.

Nope.  Just tried it here:

# btrfs --version
Btrfs v3.16.1-42-g140eccb

# btrfstune -x /dev/dm-0
/dev/dm-0 is mounted
-- 
-=[dave]=-

Entropy isn't what it used to be.


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-22 12:49         ` Dave
@ 2014-10-23  2:41           ` Duncan
  2014-10-23 13:37             ` David Sterba
  0 siblings, 1 reply; 28+ messages in thread
From: Duncan @ 2014-10-23  2:41 UTC (permalink / raw)
  To: linux-btrfs

Dave posted on Wed, 22 Oct 2014 08:49:46 -0400 as excerpted:

> On Tue, Oct 21, 2014 at 10:08 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> As for the mounted filesystem question, since all it does is flip a
>> switch so that new metadata writes use the skinny-metadata code path,
>> it shouldn't be a problem.
> 
> Nope.  Just tried it here:
> 
> # btrfs --version
> Btrfs v3.16.1-42-g140eccb
> 
> # btrfstune -x /dev/dm-0
> /dev/dm-0 is mounted

Thanks.

So btrfstune refuses to set the skinny-metadata flag at all on mounted 
devices.  Nicely reduces risk, /and/ answers the question. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-23  2:41           ` Duncan
@ 2014-10-23 13:37             ` David Sterba
  0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2014-10-23 13:37 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Thu, Oct 23, 2014 at 02:41:47AM +0000, Duncan wrote:
> Dave posted on Wed, 22 Oct 2014 08:49:46 -0400 as excerpted:
> 
> > On Tue, Oct 21, 2014 at 10:08 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> >> As for the mounted filesystem question, since all it does is flip a
> >> switch so that new metadata writes use the skinny-metadata code path,
> >> it shouldn't be a problem.
> > 
> > Nope.  Just tried it here:
> > 
> > # btrfs --version
> > Btrfs v3.16.1-42-g140eccb
> > 
> > # btrfstune -x /dev/dm-0
> > /dev/dm-0 is mounted
> 
> Thanks.
> 
> So btrfstune refuses to set the skinny-metadata flag at all on mounted 
> devices.  Nicely reduces risk, /and/ answers the question. =:^)

btrfstune requires an unmounted device. On-line changes to features are
done via the sysfs interface, e.g. /sys/fs/btrfs/<UUID>/features, then
echo 1 > featurename. Right now only the extended refs (aka the hardlink
limit) can be turned on this way.
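As a sketch of that sysfs interface (the UUID is a placeholder for the
mounted filesystem's UUID, and the feature file name `extended_iref` is
an assumption based on later kernels, not something stated in this
thread):

```shell
# Placeholder: substitute the UUID shown by 'btrfs filesystem show'.
UUID=00000000-0000-0000-0000-000000000000

# Each mounted btrfs exposes its toggleable features here:
ls /sys/fs/btrfs/$UUID/features/

# Writing 1 enables a feature on-line (root required).  Per the mail
# above, only extended refs (the raised hardlink limit) could be
# enabled this way at the time.
echo 1 > /sys/fs/btrfs/$UUID/features/extended_iref
```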


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-22  2:08       ` Duncan
  2014-10-22 12:49         ` Dave
@ 2014-10-23 14:47         ` Tobias Geerinckx-Rice
  2014-10-24  1:33           ` Duncan
  1 sibling, 1 reply; 28+ messages in thread
From: Tobias Geerinckx-Rice @ 2014-10-23 14:47 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On 22 October 2014 04:08, Duncan <1i5t5.duncan@cox.net> wrote:

> Since the kernel has code for both "fat" metadata and skinny-metadata,
> they can exist side-by-side and the kernel will use whichever code is
> appropriate.

I understand that the fat extent code will probably never be removed
for compatibility reasons, but do wonder why it's still the default.
Caution?

Petr Janecek's balancing problem [1] and similar bugs aside: is there
a functional reason to prefer "fat" over skinny metadata for future
file systems?

Regards,

T G-R

[1] http://www.spinics.net/lists/linux-btrfs/msg38443.html


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-23 14:47         ` Tobias Geerinckx-Rice
@ 2014-10-24  1:33           ` Duncan
  0 siblings, 0 replies; 28+ messages in thread
From: Duncan @ 2014-10-24  1:33 UTC (permalink / raw)
  To: linux-btrfs

Tobias Geerinckx-Rice posted on Thu, 23 Oct 2014 16:47:19 +0200 as
excerpted:

> On 22 October 2014 04:08, Duncan <1i5t5.duncan@cox.net> wrote:
> 
>> Since the kernel has code for both "fat" metadata and skinny-metadata,
>> they can exist side-by-side and the kernel will use whichever code is
>> appropriate.
> 
> I understand that the fat extent code will probably never be removed for
> compatibility reasons, but do wonder why it's still the default.
> Caution?

Caution, backward kernel compatibility, and simply timing.

The skinny code is newer, and there were several skinny-metadata related 
bugs in the first couple of kernel cycles it was available, so not 
making it the immediate default was certainly wise.  Tho the new code 
has been reasonably stable for awhile now, and that's exactly why we're 
having this discussion: it hasn't been made the default yet, so is it 
time to do that, or not?

Additionally, some people want to keep the flexibility to mount with old 
kernels.  Consider a distro installation and rescue image (ISO or USB), 
for instance.  Those can be used for rescue purposes not only for the 
life of that distro release, but for some time afterward.  If the only 
rescue image you can find is a two year old image and it won't mount your 
btrfs because the on-device format has changed since then and your 
filesystem is the newer format, you're going to be one frustrated btrfs 
user!

> Petr Janecek's balancing problem [1] and similar bugs aside: is there a
> functional reason to prefer "fat" over skinny metadata for future file
> systems?

Other than keeping backward compatibility to work with old rescue images 
and the like, as discussed above, not that I'm aware of.  IOW, I know of 
no corner-case where fat metadata is now more efficient or more stable.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-20 16:34 ` David Sterba
  2014-10-21  9:29   ` Duncan
@ 2014-10-25 12:24   ` Marc Joliet
  2014-10-25 19:58     ` Marc Joliet
  2014-10-25 20:33     ` Chris Murphy
  2014-10-27  4:39   ` Zygo Blaxell
  2 siblings, 2 replies; 28+ messages in thread
From: Marc Joliet @ 2014-10-25 12:24 UTC (permalink / raw)
  To: linux-btrfs


Am Mon, 20 Oct 2014 18:34:03 +0200
schrieb David Sterba <dsterba@suse.cz>:

> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
> > I'd like to make it default with the 3.17 release of btrfs-progs.
> > Please let me know if you have objections.
> 
> For the record, 3.17 will not change the defaults. The timing of the
> poll was very bad to get enough feedback before the release. Let's keep
> it open for now.

Two points:

First of all: does grub2 support booting from a btrfs file system with
skinny-metadata, or is it irrelevant?

And secondly, I've gotten a BUG after trying to convert my external backup
partition to skinny-metadata (the same one from the bug report mentioned
previously in this thread, I believe). Below is a more detailed account.

To start with, my setup (as of *now*, not before the BUG):

  # btrfs filesystem show
  Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
  	Total devices 1 FS bytes used 41.42GiB
  	devid    1 size 107.79GiB used 53.06GiB path /dev/sdf1
  
  Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
  	Total devices 4 FS bytes used 514.54GiB
  	devid    1 size 298.09GiB used 259.03GiB path /dev/sda
  	devid    2 size 298.09GiB used 259.03GiB path /dev/sdb
  	devid    3 size 298.09GiB used 259.03GiB path /dev/sdc
  	devid    4 size 298.09GiB used 259.03GiB path /dev/sdd
  
  Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
  	Total devices 1 FS bytes used 169.31GiB
  	devid    1 size 976.56GiB used 175.06GiB path /dev/sdg2
  
  Btrfs v3.17

  # btrfs filesystem df /
  Data, single: total=48.00GiB, used=39.94GiB
  System, DUP: total=32.00MiB, used=12.00KiB
  Metadata, DUP: total=2.50GiB, used=1.48GiB
  GlobalReserve, single: total=508.00MiB, used=0.00B

  # btrfs filesystem df /home
  Data, RAID10: total=516.00GiB, used=513.38GiB
  System, RAID10: total=64.00MiB, used=96.00KiB
  Metadata, RAID10: total=2.00GiB, used=1.16GiB
  GlobalReserve, single: total=400.00MiB, used=0.00B

  # btrfs filesystem df /media/MARCEC_BACKUP
  Data, single: total=167.00GiB, used=166.53GiB
  System, DUP: total=32.00MiB, used=28.00KiB
  Metadata, DUP: total=4.00GiB, used=2.79GiB
  GlobalReserve, single: total=512.00MiB, used=1.33MiB

  # uname -a
  Linux marcec 3.16.6-gentoo #1 SMP PREEMPT Fri Oct 24 01:06:49 CEST 2014 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

  # btrfs --version
  Btrfs v3.17

Now, what I was trying to do - motivated by this thread - was convert /home
and /media/MARCEC_BACKUP to skinny-metadata using "btrfstune -x".  That in
itself worked fine, and MARCEC_BACKUP has since seen filesystem activity
(running rsync, creating and deleting snapshots).  *Then* I started a "btrfs
balance -m" on /home (which completed without errors) and then on
/media/MARCEC_BACKUP, which is when the BUG happened (dmesg output is
below).

The result in user-space was that "btrfs balance" SEGFAULTed.  "btrfs balance
status" showed the balance still running, so I tried to cancel it, which ended
up hanging (the btrfs program has yet to return to the shell).  For some
reason I then tried running "sync" (as root), which has also hung in the same
way.

I can still access files on MARCEC_BACKUP just fine, and the snapshots are
still there ("btrfs subvolume list" succeeds).

Is there anything else I can do, or any other information you might need?

------------ dmesg output (starting with the start of the balance) ------------

  [ 4651.448883] BTRFS info (device sdb): relocating block group 1492765376512 flags 66
  [ 4652.259501] BTRFS info (device sdb): found 2 extents
  [ 4652.987753] BTRFS info (device sdb): relocating block group 1491691634688 flags 68
  [ 4688.655390] BTRFS info (device sdb): found 13744 extents
  [ 4689.382109] BTRFS info (device sdb): relocating block group 1485249183744 flags 68
  [ 4753.879520] BTRFS info (device sdb): found 62519 extents
  [ 4791.123268] BTRFS info (device sdg2): relocating block group 2499670966272 flags 36
  [ 4830.811665] BTRFS info (device sdg2): found 1793 extents
  [ 4831.240909] BTRFS info (device sdg2): relocating block group 2499134095360 flags 36
  [ 5407.582370] BTRFS info (device sdg2): found 51182 extents
  [ 5407.959115] BTRFS info (device sdg2): relocating block group 2498597224448 flags 36
  [ 5724.487824] BTRFS info (device sdg2): found 51435 extents
  [ 5725.006401] BTRFS info (device sdg2): relocating block group 2473867608064 flags 34
  [ 5725.817513] BTRFS info (device sdg2): found 7 extents
  [ 5726.328413] BTRFS info (device sdg2): relocating block group 2469002215424 flags 36
  [ 5844.148295] ------------[ cut here ]------------
  [ 5844.148307] WARNING: CPU: 1 PID: 7270 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x48c/0x4c0()
  [ 5844.148308] Modules linked in: uas usb_storage joydev hid_logitech_dj bridge stp llc ipt_REJECT xt_tcpudp iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ip_tables x_tables snd_hda_codec_analog snd_hda_codec_generic dummy sg snd_hda_codec_hdmi sr_mod cdrom kvm_amd kvm radeon evdev i2c_algo_bit drm_kms_helper k8temp ttm drm backlight snd_ice1724 snd_ak4113 snd_pt2258 snd_i2c snd_ak4114 snd_ac97_codec snd_hda_intel ac97_bus snd_ice17xx_ak4xxx snd_hda_controller snd_ak4xxx_adda forcedeth snd_rawmidi xhci_hcd snd_hda_codec snd_seq_device snd_pcm snd_timer r8169 snd mii rtc_cmos ohci_pci asus_atk0110 i2c_nforce2 i2c_core ata_generic ehci_pci ohci_hcd ehci_hcd pata_amd pata_acpi
  [ 5844.148357] CPU: 1 PID: 7270 Comm: btrfs Not tainted 3.16.6-gentoo #1
  [ 5844.148359] Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 1701 10/30/2008
  [ 5844.148361]  0000000000000000 0000000000000009 ffffffff815675bc 0000000000000000
  [ 5844.148364]  ffffffff810460a6 ffff880052559ea0 ffff8800048a5800 00000246752f5000
  [ 5844.148366]  ffff880109252000 0000000000000001 ffffffff81237e3c 0000000000001000
  [ 5844.148369] Call Trace:
  [ 5844.148375]  [<ffffffff815675bc>] ? dump_stack+0x49/0x6a
  [ 5844.148378]  [<ffffffff810460a6>] ? warn_slowpath_common+0x86/0xb0
  [ 5844.148382]  [<ffffffff81237e3c>] ? btrfs_lookup_extent_info+0x48c/0x4c0
  [ 5844.148385]  [<ffffffff8123afde>] ? do_walk_down+0x13e/0x560
  [ 5844.148388]  [<ffffffff812387ea>] ? walk_down_proc+0x1da/0x2c0
  [ 5844.148391]  [<ffffffff8123b4b3>] ? walk_down_tree+0xb3/0xe0
  [ 5844.148394]  [<ffffffff8123f235>] ? btrfs_drop_subtree+0x195/0x210
  [ 5844.148397]  [<ffffffff8129fa2f>] ? do_relocation+0x36f/0x500
  [ 5844.148401]  [<ffffffff8129d985>] ? calcu_metadata_size.isra.43.constprop.57+0x95/0xb0
  [ 5844.148405]  [<ffffffff8127284f>] ? read_extent_buffer+0xaf/0x110
  [ 5844.148407]  [<ffffffff8129f50d>] ? remove_backref_node+0xad/0x140
  [ 5844.148410]  [<ffffffff812a007d>] ? relocate_tree_blocks+0x4bd/0x610
  [ 5844.148413]  [<ffffffff812a159b>] ? relocate_block_group+0x3cb/0x660
  [ 5844.148416]  [<ffffffff812a19e8>] ? btrfs_relocate_block_group+0x1b8/0x2e0
  [ 5844.148418]  [<ffffffff81276a46>] ? btrfs_relocate_chunk.isra.62+0x56/0x740
  [ 5844.148422]  [<ffffffff81288e50>] ? btrfs_set_lock_blocking_rw+0x60/0xa0
  [ 5844.148425]  [<ffffffff8127284f>] ? read_extent_buffer+0xaf/0x110
  [ 5844.148428]  [<ffffffff81230d65>] ? btrfs_previous_item+0x95/0x120
  [ 5844.148431]  [<ffffffff81268961>] ? btrfs_get_token_64+0x61/0xf0
  [ 5844.148433]  [<ffffffff8127182f>] ? release_extent_buffer+0x2f/0xd0
  [ 5844.148436]  [<ffffffff81279b68>] ? btrfs_balance+0x858/0xf20
  [ 5844.148440]  [<ffffffff81148585>] ? __sb_start_write+0x65/0x110
  [ 5844.148443]  [<ffffffff8128093e>] ? btrfs_ioctl_balance+0x19e/0x500
  [ 5844.148446]  [<ffffffff8128688f>] ? btrfs_ioctl+0xa8f/0x2940
  [ 5844.148450]  [<ffffffff8111d1e3>] ? handle_mm_fault+0x873/0xba0
  [ 5844.148453]  [<ffffffff8103889a>] ? __do_page_fault+0x2ba/0x570
  [ 5844.148457]  [<ffffffff81120359>] ? vma_link+0xd9/0xe0
  [ 5844.148460]  [<ffffffff8113bb9a>] ? kmem_cache_alloc+0x16a/0x170
  [ 5844.148463]  [<ffffffff81157c9e>] ? do_vfs_ioctl+0x7e/0x500
  [ 5844.148466]  [<ffffffff811581b9>] ? SyS_ioctl+0x99/0xb0
  [ 5844.148469]  [<ffffffff8156df82>] ? page_fault+0x22/0x30
  [ 5844.148473]  [<ffffffff8156c612>] ? system_call_fastpath+0x16/0x1b
  [ 5844.148475] ---[ end trace bf07dd9e2f7fb342 ]---
  [ 5844.148478] BTRFS error (device sdg2): Missing references.
  [ 5844.148496] ------------[ cut here ]------------
  [ 5844.148532] kernel BUG at fs/btrfs/extent-tree.c:7624!
  [ 5844.148565] invalid opcode: 0000 [#1] PREEMPT SMP 
  [ 5844.148600] Modules linked in: uas usb_storage joydev hid_logitech_dj bridge stp llc ipt_REJECT xt_tcpudp iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ip_tables x_tables snd_hda_codec_analog snd_hda_codec_generic dummy sg snd_hda_codec_hdmi sr_mod cdrom kvm_amd kvm radeon evdev i2c_algo_bit drm_kms_helper k8temp ttm drm backlight snd_ice1724 snd_ak4113 snd_pt2258 snd_i2c snd_ak4114 snd_ac97_codec snd_hda_intel ac97_bus snd_ice17xx_ak4xxx snd_hda_controller snd_ak4xxx_adda forcedeth snd_rawmidi xhci_hcd snd_hda_codec snd_seq_device snd_pcm snd_timer r8169 snd mii rtc_cmos ohci_pci asus_atk0110 i2c_nforce2 i2c_core ata_generic ehci_pci ohci_hcd ehci_hcd pata_amd pata_acpi
  [ 5844.149007] CPU: 1 PID: 7270 Comm: btrfs Tainted: G        W     3.16.6-gentoo #1
  [ 5844.149007] Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 1701 10/30/2008
  [ 5844.149007] task: ffff88003324f000 ti: ffff8800156d0000 task.ti: ffff8800156d0000
  [ 5844.149007] RIP: 0010:[<ffffffff8123b3ec>]  [<ffffffff8123b3ec>] do_walk_down+0x54c/0x560
  [ 5844.149007] RSP: 0018:ffff8800156d3778  EFLAGS: 00010292
  [ 5844.149007] RAX: 000000000000002e RBX: ffff88010c4ba0c0 RCX: 0000000000000006
  [ 5844.149007] RDX: 0000000000000007 RSI: 0000000000000046 RDI: ffff88011fc8d140
  [ 5844.149007] RBP: ffff880052559bd0 R08: 0000000000000400 R09: 00000000000003a5
  [ 5844.149007] R10: 0000000000000006 R11: 00000000000003a4 R12: ffff880037378a68
  [ 5844.149007] R13: 0000000000000002 R14: ffff8800048a5800 R15: 0000000000000002
  [ 5844.149007] FS:  00007f6eda85c8c0(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
  [ 5844.149007] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  [ 5844.149007] CR2: 000000000262ddf0 CR3: 0000000018ce4000 CR4: 00000000000007e0
  [ 5844.149007] Stack:
  [ 5844.149007]  ffff88010c4ba108 0000000000000000 02a9000000000000 ff00000000000000
  [ 5844.149007]  0000000000000001 0000000000000009 00000001178e5528 00000246752f5000
  [ 5844.149007]  ffff8800156d3854 000000000000b38a ffff880109252000 0000000000001000
  [ 5844.149007] Call Trace:
  [ 5844.149007]  [<ffffffff812387ea>] ? walk_down_proc+0x1da/0x2c0
  [ 5844.149007]  [<ffffffff8123b4b3>] ? walk_down_tree+0xb3/0xe0
  [ 5844.149007]  [<ffffffff8123f235>] ? btrfs_drop_subtree+0x195/0x210
  [ 5844.149007]  [<ffffffff8129fa2f>] ? do_relocation+0x36f/0x500
  [ 5844.149007]  [<ffffffff8129d985>] ? calcu_metadata_size.isra.43.constprop.57+0x95/0xb0
  [ 5844.149007]  [<ffffffff8127284f>] ? read_extent_buffer+0xaf/0x110
  [ 5844.149007]  [<ffffffff8129f50d>] ? remove_backref_node+0xad/0x140
  [ 5844.149007]  [<ffffffff812a007d>] ? relocate_tree_blocks+0x4bd/0x610
  [ 5844.149007]  [<ffffffff812a159b>] ? relocate_block_group+0x3cb/0x660
  [ 5844.149007]  [<ffffffff812a19e8>] ? btrfs_relocate_block_group+0x1b8/0x2e0
  [ 5844.149007]  [<ffffffff81276a46>] ? btrfs_relocate_chunk.isra.62+0x56/0x740
  [ 5844.149007]  [<ffffffff81288e50>] ? btrfs_set_lock_blocking_rw+0x60/0xa0
  [ 5844.149007]  [<ffffffff8127284f>] ? read_extent_buffer+0xaf/0x110
  [ 5844.149007]  [<ffffffff81230d65>] ? btrfs_previous_item+0x95/0x120
  [ 5844.149007]  [<ffffffff81268961>] ? btrfs_get_token_64+0x61/0xf0
  [ 5844.149007]  [<ffffffff8127182f>] ? release_extent_buffer+0x2f/0xd0
  [ 5844.149007]  [<ffffffff81279b68>] ? btrfs_balance+0x858/0xf20
  [ 5844.149007]  [<ffffffff81148585>] ? __sb_start_write+0x65/0x110
  [ 5844.149007]  [<ffffffff8128093e>] ? btrfs_ioctl_balance+0x19e/0x500
  [ 5844.149007]  [<ffffffff8128688f>] ? btrfs_ioctl+0xa8f/0x2940
  [ 5844.149007]  [<ffffffff8111d1e3>] ? handle_mm_fault+0x873/0xba0
  [ 5844.149007]  [<ffffffff8103889a>] ? __do_page_fault+0x2ba/0x570
  [ 5844.149007]  [<ffffffff81120359>] ? vma_link+0xd9/0xe0
  [ 5844.149007]  [<ffffffff8113bb9a>] ? kmem_cache_alloc+0x16a/0x170
  [ 5844.149007]  [<ffffffff81157c9e>] ? do_vfs_ioctl+0x7e/0x500
  [ 5844.149007]  [<ffffffff811581b9>] ? SyS_ioctl+0x99/0xb0
  [ 5844.149007]  [<ffffffff8156df82>] ? page_fault+0x22/0x30
  [ 5844.149007]  [<ffffffff8156c612>] ? system_call_fastpath+0x16/0x1b
  [ 5844.149007] Code: c8 0f 85 62 fe ff ff e9 75 fd ff ff b8 f4 ff ff ff e9 c1 fc ff ff 49 8b be f0 01 00 00 48 c7 c6 1b 90 74 81 31 c0 e8 84 7f fe ff <0f> 0b 0f 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 
  [ 5844.151353] RIP  [<ffffffff8123b3ec>] do_walk_down+0x54c/0x560
  [ 5844.151353]  RSP <ffff8800156d3778>
  [ 5844.172535] ---[ end trace bf07dd9e2f7fb343 ]---

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-25 12:24   ` Marc Joliet
@ 2014-10-25 19:58     ` Marc Joliet
  2014-10-27  1:30       ` Marc Joliet
  2014-10-25 20:33     ` Chris Murphy
  1 sibling, 1 reply; 28+ messages in thread
From: Marc Joliet @ 2014-10-25 19:58 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1934 bytes --]

Am Sat, 25 Oct 2014 14:24:58 +0200
schrieb Marc Joliet <marcec@gmx.de>:

> I can still access files on MARCEC_BACKUP just fine, and the snapshots are
> still there ("btrfs subvolume list" succeeds).

Just an update: that was true for a while, but at one point listing directories
and accessing the file system in general stopped working (all processes that
touched the FS hung/zombified). This necessitated a hard reboot, since "reboot"
and "halt" (so... "shutdown", really) didn't do anything other than spit out the
usual "the system is rebooting" message.

Interestingly enough, the file system was (apparently) fine after that (just as
Petr Janecek wrote), other than an invalid space cache file:

  [   65.477006] BTRFS info (device sdg2): The free space cache file (2466854731776) is invalid. skip it

That is, running my backup routine worked just as before, and I can access
files on the FS just fine.

Oh, and apparently the rebalance continued successfully?!

  [  342.540865] BTRFS info (device sdg2): continuing balance
  [  342.599991] BTRFS info (device sdg2): relocating block group 2502355320832 flags 34
  [  342.821608] BTRFS info (device sdg2): found 4 extents
  [  343.056915] BTRFS info (device sdg2): relocating block group 2501818449920 flags 36
  [  437.932405] BTRFS info (device sdg2): found 25086 extents
  [  438.727197] BTRFS info (device sdg2): relocating block group 2501281579008 flags 36
  [  557.319354] BTRFS info (device sdg2): found 83875 extents

  # btrfs balance status /media/MARCEC_BACKUP
  No balance found on '/media/MARCEC_BACKUP'

No SEGFAULT anywhere. All I can say right now is "huh". Although I'll try
starting a "balance -m" again tomorrow, because the continued balance only
took about 3-4 minutes (maybe it .

HTH
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-25 12:24   ` Marc Joliet
  2014-10-25 19:58     ` Marc Joliet
@ 2014-10-25 20:33     ` Chris Murphy
  2014-10-25 20:35       ` Chris Murphy
  1 sibling, 1 reply; 28+ messages in thread
From: Chris Murphy @ 2014-10-25 20:33 UTC (permalink / raw)
  To: linux-btrfs


On Oct 25, 2014, at 6:24 AM, Marc Joliet <marcec@gmx.de> wrote:
> 
> First of all: does grub2 support booting from a btrfs file system with
> skinny-metadata, or is it irrelevant?

Seems plausible if older kernels don't understand skinny-metadata, that GRUB2 won't either. So I just tested it with grub2-2.02-0.8.fc21 and it works. I'm surprised, actually.

The way I did this was creating a whole new fs with -Oskinny-metadata and using btrfs send receive to copy an existing system over. Kernel reports at boot time that the volume uses skinny extents.

Chris Murphy


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-25 20:33     ` Chris Murphy
@ 2014-10-25 20:35       ` Chris Murphy
  2014-10-27  1:24         ` Marc Joliet
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Murphy @ 2014-10-25 20:35 UTC (permalink / raw)
  To: linux-btrfs


On Oct 25, 2014, at 2:33 PM, Chris Murphy <lists@colorremedies.com> wrote:

> 
> On Oct 25, 2014, at 6:24 AM, Marc Joliet <marcec@gmx.de> wrote:
>> 
>> First of all: does grub2 support booting from a btrfs file system with
>> skinny-metadata, or is it irrelevant?
> 
> Seems plausible if older kernels don't understand skinny-metadata, that GRUB2 won't either. So I just tested it with grub2-2.02-0.8.fc21 and it works. I'm surprised, actually.

I don't understand the nature of the incompatibility with older kernels. Can they not mount a Btrfs volume even as ro? If so then I'd expect GRUB to have a problem, so I'm going to guess that maybe a 3.9 or older kernel could ro mount a Btrfs volume with skinny extents and the incompatibility is writing.
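FWIW, in the on-disk format the feature is tracked by a bit in the superblock's incompat_flags field (skinny-metadata is bit 0x100); compat_ro_flags is the separate set whose unknown bits still permit read-only mounts. A minimal sketch of reading that bit, using a scratch file instead of a real device (assumptions: on a real device the superblock starts at byte 65536, incompat_flags is a little-endian u64 at offset 188 within it, and the host is little-endian so od's interpretation matches):

```shell
# Build a fake 4 KiB "superblock" with incompat_flags = 0x100 (skinny-metadata).
# On a real device you would read 8 bytes at offset 65536 + 188 instead.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=4096 count=1 2>/dev/null
printf '\000\001' | dd of="$img" bs=1 seek=188 conv=notrunc 2>/dev/null

# Read the u64 at offset 188 and test the skinny-metadata bit (0x100).
flags=$(od -A n -t x8 -j 188 -N 8 "$img" | tr -d ' ')
echo "incompat_flags: 0x$flags"
if [ $(( 0x$flags & 0x100 )) -ne 0 ]; then
    echo "skinny-metadata is set"
fi
rm -f "$img"
```

btrfs-show-super (in the btrfs-progs of this era) prints the same incompat_flags field, which is a less fragile way to check on a real filesystem.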

Chris Murphy


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-25 20:35       ` Chris Murphy
@ 2014-10-27  1:24         ` Marc Joliet
  2014-10-27  7:50           ` Duncan
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Joliet @ 2014-10-27  1:24 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]

Am Sat, 25 Oct 2014 14:35:33 -0600
schrieb Chris Murphy <lists@colorremedies.com>:

> 
> On Oct 25, 2014, at 2:33 PM, Chris Murphy <lists@colorremedies.com> wrote:
> 
> > 
> > On Oct 25, 2014, at 6:24 AM, Marc Joliet <marcec@gmx.de> wrote:
> >> 
> >> First of all: does grub2 support booting from a btrfs file system with
> >> skinny-metadata, or is it irrelevant?
> > 
> > Seems plausible if older kernels don't understand skinny-metadata, that GRUB2 won't either. So I just tested it with grub2-2.02-0.8.fc21 and it works. I'm surprised, actually.
> 
> I don't understand the nature of the incompatibility with older kernels. Can they not mount a Btrfs volume even as ro? If so then I'd expect GRUB to have a problem, so I'm going to guess that maybe a 3.9 or older kernel could ro mount a Btrfs volume with skinny extents and the incompatibility is writing.

That sounds plausible, though I hope for a definitive answer. (FWIW, I
originally asked because I couldn't find any commits to grub2 related to skinny
metadata; the updates to the btrfs driver were fairly sparse.)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-25 19:58     ` Marc Joliet
@ 2014-10-27  1:30       ` Marc Joliet
  0 siblings, 0 replies; 28+ messages in thread
From: Marc Joliet @ 2014-10-27  1:30 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2468 bytes --]

Am Sat, 25 Oct 2014 21:58:08 +0200
schrieb Marc Joliet <marcec@gmx.de>:

> Am Sat, 25 Oct 2014 14:24:58 +0200
> schrieb Marc Joliet <marcec@gmx.de>:
> 
> > I can still access files on MARCEC_BACKUP just fine, and the snapshots are
> > still there ("btrfs subvolume list" succeeds).
> 
> Just an update: that was true for a while, but at one point listing directories
> and accessing the file system in general stopped working (all processes that
> touched the FS hung/zombified). This necessitated a hard reboot, since "reboot"
> and "halt" (so... "shutdown", really) didn't do anything other than spit out the
> usual "the system is rebooting" message.
> 
> Interestingly enough, the file system was (apparently) fine after that (just as
> Petr Janecek wrote), other than an invalid space cache file:
> 
>   [   65.477006] BTRFS info (device sdg2): The free space cache file (2466854731776) is invalid. skip it
> 
> That is, running my backup routine worked just as before, and I can access
> files on the FS just fine.
> 
> Oh, and apparently the rebalance continued successfully?!
> 
>   [  342.540865] BTRFS info (device sdg2): continuing balance
>   [  342.599991] BTRFS info (device sdg2): relocating block group 2502355320832 flags 34
>   [  342.821608] BTRFS info (device sdg2): found 4 extents
>   [  343.056915] BTRFS info (device sdg2): relocating block group 2501818449920 flags 36
>   [  437.932405] BTRFS info (device sdg2): found 25086 extents
>   [  438.727197] BTRFS info (device sdg2): relocating block group 2501281579008 flags 36
>   [  557.319354] BTRFS info (device sdg2): found 83875 extents
> 
>   # btrfs balance status /media/MARCEC_BACKUP
>   No balance found on '/media/MARCEC_BACKUP'
> 
> No SEGFAULT anywhere. All I can say right now is "huh". Although I'll try
> starting a "balance -m" again tomorrow, because the continued balance only
> took about 3-4 minutes (maybe it .

Maybe it exploded, I don't know (sorry, clearly I didn't delete the entirety of
that incomplete train of thought).

Anyway, I did run a full "balance -m" again, and this time it finished
successfully.  Make of that what you will, but it appears that the bug is
non-deterministic (makes me wonder if Petr Janecek or anybody else who hit the
bug ever got a balance to finish).

HTH
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-20 16:34 ` David Sterba
  2014-10-21  9:29   ` Duncan
  2014-10-25 12:24   ` Marc Joliet
@ 2014-10-27  4:39   ` Zygo Blaxell
  2014-10-27  7:16     ` Duncan
  2 siblings, 1 reply; 28+ messages in thread
From: Zygo Blaxell @ 2014-10-27  4:39 UTC (permalink / raw)
  To: dsterba, linux-btrfs, clm, jbacik

[-- Attachment #1: Type: text/plain, Size: 3084 bytes --]

On Mon, Oct 20, 2014 at 06:34:03PM +0200, David Sterba wrote:
> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
> > I'd like to make it default with the 3.17 release of btrfs-progs.
> > Please let me know if you have objections.
> 
> For the record, 3.17 will not change the defaults. The timing of the
> poll was very bad to get enough feedback before the release. Let's keep
> it open for now.

I don't have hard data, but I do have disturbing soft data:

	12 btrfs filesystems with various mixed workloads

	4 of those w/skinny metadata (converted with btrfstune -x)

	3 of those have processes or the entire filesystem hanging
	every few days, triggering watchdog reboots

I'm still trying to find the smoking gun, but it looks like there's a
problem that only shows up when skinny metadata is enabled (or possibly
one that only shows up when both skinny and non-skinny are mixed?).

One thing that may be significant is _when_ those 3 hanging filesystems
are hanging:  when using rsync to update local files.  These machines are
using the traditional rsync copy-then-rename method rather than --inplace
updates.  There's no problem copying data into an empty directory with
rsync, but as soon as I start updating existing data, some process (not
necessarily rsync) using the filesystem gets stuck within 36 hours,
and stays stuck for days.  If I don't run rsync on the skinny filesystems,
they'll run for a week or more without incident--and if I then start
running rsync again, they hang later the same day.
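The copy-then-rename update mentioned above is the part that differs from --inplace: the whole new file is written under a temporary name and then rename()d over the destination, so each update allocates fresh extents rather than overwriting existing blocks. A sketch of the pattern with illustrative file names (the temp name here is made up; rsync's actual temp names differ):

```shell
dir=$(mktemp -d)
echo "old contents" > "$dir/file.txt"

# 1. Write the complete new version under a temporary name in the
#    same directory (rsync uses dotted temp names for this).
echo "new contents" > "$dir/.file.txt.tmp"

# 2. Atomically rename() it over the target: readers see either the
#    old or the new file, never a torn write -- but the filesystem
#    must allocate new extents and free the old ones on every update,
#    unlike --inplace, which rewrites the existing blocks.
mv "$dir/.file.txt.tmp" "$dir/file.txt"

cat "$dir/file.txt"
rm -r "$dir"
```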

When I get kernel stacks they show ~50 processes stuck all over the
btrfs metadata manipulation code.  If someone wants to wade through
these I can collect them easily enough.

The 4th skinny-metadata machine--the one that doesn't hang often--is
the only one that isn't using rsync to receive files from elsewhere.
It's also the busiest filesystem (in iops/sec) with the largest variety
in its workload, so all things being equal it should be encountering
more random btrfs problems than the other three.

Some of my machines have multiple filesystems, some with skinny and
some without.  I've tried moving the rsync destination tree to the
non-skinny filesystems on those machines, and in those cases I was able
to complete several rsync updates without incident.  That seems to rule
out any system-level problem.

The 8 filesystems without skinny don't have the hang problem.  They have
had a variety of other issues, but not hangs alone.  Currently 3.17 +
stable-queue patches fixes all the problems I've encountered so far with
the non-skinny filesystems, so the skinny filesystems are now earning
most of my attention.

With this small sample size and data collection rate I admit I could
just have a spurious correlation.  The data also supports conclusions
such as "Western Digital hard drives cause hangs" or "filesystems
created in August 2014 cause hangs."  I'd encourage anyone with the
infrastructure set up to do a larger-scale test to see if this is--or
is not--reproducible.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-27  4:39   ` Zygo Blaxell
@ 2014-10-27  7:16     ` Duncan
  0 siblings, 0 replies; 28+ messages in thread
From: Duncan @ 2014-10-27  7:16 UTC (permalink / raw)
  To: linux-btrfs

Zygo Blaxell posted on Mon, 27 Oct 2014 00:39:25 -0400 as excerpted:

> One thing that may be significant is _when_ those 3 hanging filesystems
> are hanging:  when using rsync to update local files.  These machines
> are using the traditional rsync copy-then-rename method rather than
> --inplace updates.  There's no problem copying data into an empty
> directory with rsync, but as soon as I start updating existing data,
> some process (not necessarily rsync) using the filesystem gets
> stuck within 36 hours, and stays stuck for days.  If I don't run rsync
> on the skinny filesystems,
> they'll run for a week or more without incident--and if I then start
> running rsync again, they hang later the same day.

Limited counterpoint here:

My packages partition is btrfs with skinny-metadata (skinny extents in 
dmsg), and the main gentoo tree on it gets regularly rsynced against 
gentoo servers.  In fact, my sync script does that *AND* a git-pull on 
three overlays, in parallel with the rsync so all three git-pulls and the 
rsync are happening at once.

No problems with that here. =:^)
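The parallel part of such a sync script reduces to launching each job in the background and waiting on all of them. A generic sketch, with echo placeholders standing in for the real rsync and git invocations:

```shell
# Run each argument as a background job, then block until all of them
# finish -- the same shape as one rsync plus several git pulls at once.
run_parallel() {
    for cmd in "$@"; do
        sh -c "$cmd" &
    done
    wait
}

# Placeholder commands; a real script would use e.g.
#   "rsync -a rsync://<mirror>/gentoo-portage /usr/portage"
#   "git -C /var/overlays/<name> pull"
run_parallel "echo tree-synced" "echo overlay1-pulled" "echo overlay2-pulled"
```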

However, I suspect other factors in my setup avoid whatever's triggering 
it for Zygo.

* The filesystem is btrfs raid1 mode data/metadata.

* Only 24 GiB in size (show says 19.78 GiB used, df says 15.84 of 18 GiB 
data used, 969 MiB of 1.75 GiB metadata used).

* Relatively fast SSD, ssd auto-detected and added as a mount option.

* I set the skinny-metadata option (and extref and no-holes) at 
mkfs.btrfs time, while Zygo converted and presumably has both fat and 
skinny metadata.

FWIW I've been spared all the rsync-triggered issues people have reported 
over time.  I'm guessing I don't hit the same race conditions because 
with the small filesystem my overhead is lower, and with the ssd I simply 
don't have the same bottlenecks.  So I'd not expect to hit this problem 
here either, and the fact that I'm not hitting it doesn't prove much, 
except that with reasonably fast ssds and smaller filesystems, whatever 
race conditions people so commonly trigger with rsync elsewhere simply 
don't seem to happen here.

So as I said, limited counterpoint, but offered FWIW.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-27  1:24         ` Marc Joliet
@ 2014-10-27  7:50           ` Duncan
  0 siblings, 0 replies; 28+ messages in thread
From: Duncan @ 2014-10-27  7:50 UTC (permalink / raw)
  To: linux-btrfs

Marc Joliet posted on Mon, 27 Oct 2014 02:24:15 +0100 as excerpted:

> Am Sat, 25 Oct 2014 14:35:33 -0600 schrieb Chris Murphy
> <lists@colorremedies.com>:
> 
> 
>> On Oct 25, 2014, at 2:33 PM, Chris Murphy <lists@colorremedies.com>
>> wrote:
>> 
>> 
>> > On Oct 25, 2014, at 6:24 AM, Marc Joliet <marcec@gmx.de> wrote:
>> >> 
>> >> First of all: does grub2 support booting from a btrfs file system
>> >> with skinny-metadata, or is it irrelevant?
>> > 
>> > Seems plausible if older kernels don't understand skinny-metadata,
>> > that GRUB2 won't either. So I just tested it with grub2-2.02-0.8.fc21
>> > and it works. I'm surprised, actually.
>> 
>> I don't understand the nature of the incompatibility with older
>> kernels. Can they not mount a Btrfs volume even as ro? If so then I'd
>> expect GRUB to have a problem, so I'm going to guess that maybe a 3.9
>> or older kernel could ro mount a Btrfs volume with skinny extents and
>> the incompatibility is writing.
> 
> That sounds plausible, though I hope for a definitive answer. (FWIW, I
> originally asked because I couldn't find any commits to grub2 related to
> skinny metadata; the updates to the btrfs driver were fairly sparse.)

FWIW I have three /boot partitions, one on each of my main drives.  All 
three are gpt with a reserved BIOS partition that grub2 installs its 
monolithic grub2core into, but have dedicated /boot partitions as well, 
for the grub2 config and additional grub2 modules, kernels, etc.  The 
third one is reiserfs on spinning rust, but the other two are btrfs on 
ssd.

Last time I updated I thought I switched them to skinny-metadata, but 
just checking dmesg while mounting them now, the second one (first 
backup) is skinny-metadata, but my working /boot is still fat-metadata.

I did test the backup (with the skinny-metadata) after I did the mkfs and 
restore and it booted to grub2 and from grub2 to my main system just 
fine, so grub2 with skinny-metadata *CAN* work.

But because it's my backup, I don't update it with new kernels as 
frequently as I do my working /boot, nor do I boot from it that often.  
So while I can be sure grub2 /can/ work with skinny-metadata, I do not 
yet know at this point if it does so /reliably/.

And of course, to the extent that grub2 works differently on MBR and/or 
on GPT when it doesn't have a reserved BIOS partition to put the 
monolithic grub2core in, I haven't tested that.  Tho in theory that 
should install in slack-space if available and the filesystem shouldn't 
affect that at all.  But I know reiserfs used to screw up grub1 very  
occasionally (maybe .5-1% of new kernel installations; it did it I think 
twice in about 7 years, and I run git kernels so update them reasonably 
frequently) on my old MBR setup without much slack-space to spare, and 
I'd have to reinstall grub1.

So that's a qualified skinny-metadata shouldn't affect grub2, as I've 
booted using grub2 on a btrfs with skinny-metadata /boot.  But I've 
simply not tested it enough to know whether it's reliable over time as 
the filesystem updates and changes, or not.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-18 15:53         ` Josef Bacik
@ 2014-10-18 16:01           ` Wang Shilong
  0 siblings, 0 replies; 28+ messages in thread
From: Wang Shilong @ 2014-10-18 16:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Petr Janecek, linux-btrfs, dsterba

Sure, that is cool, let me know if I can give any help!
I have an idle VM that could run btrfs tests there.^_^


> Thanks I'll run this on Monday.
> 
> Josef
> 
> Wang Shilong <wangshilong1991@gmail.com> wrote:
> 
> 
> Hello Josef,
> 
> With skinny metadata and running your btrfs-next repo for-suse branch
> (which has the extent ref patch), I hit the following problem:
> 
> [  250.679705] BTRFS info (device sdb): relocating block group 35597058048 flags 36
> [  250.728815] BTRFS info (device sdb): relocating block group 35462840320 flags 36
> [  253.562133] Dropping a ref for a root that doesn't have a ref on the block
> [  253.562475] Dumping block entry [34793177088 8192], num_refs 3, metadata 0
> [  253.562795]   Ref root 0, parent 35532013568, owner 23988, offset 0, num_refs 18446744073709551615
> [  253.563126]   Ref root 0, parent 35560964096, owner 23988, offset 0, num_refs 1
> [  253.563505]   Ref root 0, parent 35654615040, owner 23988, offset 0, num_refs 1
> [  253.563837]   Ref root 0, parent 35678650368, owner 23988, offset 0, num_refs 1
> [  253.564162]   Root entry 5, num_refs 1
> [  253.564520]   Root entry 18446744073709551608, num_refs 18446744073709551615
> [  253.564860]   Ref action 4, root 5, ref_root 5, parent 0, owner 23988, offset 0, num_refs 1
> [  253.565205]    [<ffffffffa049d2f1>] process_leaf.isra.6+0x281/0x3e0 [btrfs]
> [  253.565225]    [<ffffffffa049de83>] build_ref_tree_for_root+0x433/0x460 [btrfs]
> [  253.565234]    [<ffffffffa049e1af>] btrfs_build_ref_tree+0x18f/0x1c0 [btrfs]
> [  253.565241]    [<ffffffffa0419ce8>] open_ctree+0x18b8/0x21a0 [btrfs]
> [  253.565247]    [<ffffffffa03ecb0e>] btrfs_mount+0x62e/0x8b0 [btrfs]
> [  253.565251]    [<ffffffff812324e9>] mount_fs+0x39/0x1b0
> [  253.565255]    [<ffffffff8125285b>] vfs_kern_mount+0x6b/0x150
> [  253.565257]    [<ffffffff8125565b>] do_mount+0x27b/0xc30
> [  253.565259]    [<ffffffff81256356>] SyS_mount+0x96/0xf0
> [  253.565260]    [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
> [  253.565263]    [<ffffffffffffffff>] 0xffffffffffffffff
> [  253.565272]   Ref action 1, root 18446744073709551608, ref_root 0, parent 35654615040, owner 23988, offset 0, num_refs 1
> [  253.565681]    [<ffffffffa049d564>] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
> [  253.565692]    [<ffffffffa03f946b>] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
> [  253.565697]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
> [  253.565702]    [<ffffffffa0401504>] btrfs_inc_ref+0x14/0x20 [btrfs]
> [  253.565707]    [<ffffffffa03f05ff>] update_ref_for_cow+0x15f/0x380 [btrfs]
> [  253.565711]    [<ffffffffa03f0a3d>] __btrfs_cow_block+0x21d/0x540 [btrfs]
> [  253.565716]    [<ffffffffa03f0f0c>] btrfs_cow_block+0x12c/0x290 [btrfs]
> [  253.565721]    [<ffffffffa046f59c>] do_relocation+0x49c/0x570 [btrfs]
> [  253.565728]    [<ffffffffa04723ce>] relocate_tree_blocks+0x60e/0x660 [btrfs]
> [  253.565735]    [<ffffffffa0473ce7>] relocate_block_group+0x407/0x690 [btrfs]
> [  253.565741]    [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
> [  253.565746]    [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
> [  253.565753]    [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
> [  253.565760]    [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
> [  253.565766]    [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
> [  253.565772]    [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
> [  253.565779]   Ref action 1, root 18446744073709551608, ref_root 0, parent 35560964096, owner 23988, offset 0, num_refs 1
> [  253.566143]    [<ffffffffa049d564>] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
> [  253.566152]    [<ffffffffa03f946b>] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
> [  253.566180]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
> [  253.566186]    [<ffffffffa0401504>] btrfs_inc_ref+0x14/0x20 [btrfs]
> [  253.566191]    [<ffffffffa03f071b>] update_ref_for_cow+0x27b/0x380 [btrfs]
> [  253.566195]    [<ffffffffa03f0a3d>] __btrfs_cow_block+0x21d/0x540 [btrfs]
> [  253.566199]    [<ffffffffa03f0f0c>] btrfs_cow_block+0x12c/0x290 [btrfs]
> [  253.566203]    [<ffffffffa046f59c>] do_relocation+0x49c/0x570 [btrfs]
> [  253.566210]    [<ffffffffa04723ce>] relocate_tree_blocks+0x60e/0x660 [btrfs]
> [  253.566216]    [<ffffffffa0473ce7>] relocate_block_group+0x407/0x690 [btrfs]
> [  253.566222]    [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
> [  253.566227]    [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
> [  253.566233]    [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
> [  253.566240]    [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
> [  253.566245]    [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
> [  253.566252]    [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
> [  253.566258]   Ref action 2, root 18446744073709551608, ref_root 5, parent 0, owner 23988, offset 0, num_refs 18446744073709551615
> [  253.566641]    [<ffffffffa049d710>] btrfs_ref_tree_mod+0x2c0/0x570 [btrfs]
> [  253.566651]    [<ffffffffa040404a>] btrfs_free_extent+0x7a/0x180 [btrfs]
> [  253.566657]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
> [  253.566662]    [<ffffffffa0401521>] btrfs_dec_ref+0x11/0x20 [btrfs]
> [  253.566668]    [<ffffffffa03f07a8>] update_ref_for_cow+0x308/0x380 [btrfs]
> 
> Below is my test script:
> 
> #!/bin/bash
> DEVICE=/dev/sdb
> TEST_MNT=/mnt
> SLEEP=3
> 
> function run_snapshots()
> {
>        i=1
>        while [ 1 ]
>        do
>                btrfs sub snapshot $TEST_MNT $TEST_MNT/snap_$i
>                a=$(($i%10))
>                if [ $a -eq 0 ]; then
>                        btrfs sub delete *
>                fi
>                ((i++))
>                sleep $SLEEP
>        done
> }
> 
> function run_compiling()
> {
>        while [ 1 ]
>        do
>                make -j4 -C $TEST_MNT/linux-btrfs
>                make -C $TEST_MNT/linux-btrfs clean
>        done
> }
> 
> function run_balance()
> {
>        while [ 1 ]
>        do
>                btrfs balance start $TEST_MNT
>                sleep $SLEEP
>        done
> }
> 
> run_snapshots &
> run_compiling &
> run_balance &
> 
> ---cut---
> 
> Mount options:
> /dev/sdb /mnt btrfs rw,relatime,space_cache 0 0
> 
> Here my /dev/sdb is 10G, and before compiling the kernel I run 'make allmodconfig'.
> The above tests may detect more problems; after running for a while the system
> seemed blocked, so I ran 'echo w > /proc/sysrq-trigger':
> 
> [ 1970.909512] SysRq : Show Blocked State
> [ 1970.910490]   task                        PC stack   pid father
> [ 1970.910564] kworker/u128:9  D ffff880208a89a30     0  3514      2 0x00000080
> [ 1970.910587] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
> [ 1970.910590]  ffff8800b3dab8e8 0000000000000046 ffff8800b3dabfd8 00000000001d59c0
> [ 1970.910594]  00000000001d59c0 ffff880208a89a30 ffff8801f71c3460 ffff8802303d6360
> [ 1970.910597]  ffff88023ff509a8 ffff8800b3dab978 0000000000000002 ffffffff8178eda0
> [ 1970.910600] Call Trace:
> [ 1970.910606]  [<ffffffff8178eda0>] ? bit_wait+0x50/0x50
> [ 1970.910609]  [<ffffffff8178e56d>] io_schedule+0x9d/0x130
> [ 1970.910612]  [<ffffffff8178edcc>] bit_wait_io+0x2c/0x50
> [ 1970.910614]  [<ffffffff8178eb3b>] __wait_on_bit_lock+0x4b/0xb0
> [ 1970.910619]  [<ffffffff811aa2ef>] __lock_page+0xbf/0xe0
> [ 1970.910623]  [<ffffffff810caa90>] ? autoremove_wake_function+0x40/0x40
> [ 1970.910642]  [<ffffffffa043d9d0>] extent_write_cache_pages.isra.30.constprop.52+0x410/0x440 [btrfs]
> [ 1970.910645]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
> [ 1970.910648]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> [ 1970.910661]  [<ffffffffa043f92c>] extent_writepages+0x5c/0x90 [btrfs]
> [ 1970.910672]  [<ffffffffa04216a0>] ? btrfs_submit_direct+0x6b0/0x6b0 [btrfs]
> [ 1970.910674]  [<ffffffff810b7174>] ? local_clock+0x24/0x30
> [ 1970.910685]  [<ffffffffa041f008>] btrfs_writepages+0x28/0x30 [btrfs]
> [ 1970.910688]  [<ffffffff811b8a21>] do_writepages+0x21/0x50
> [ 1970.910692]  [<ffffffff8125f920>] __writeback_single_inode+0x40/0x540
> [ 1970.910694]  [<ffffffff81260425>] writeback_sb_inodes+0x275/0x520
> [ 1970.910697]  [<ffffffff8126076f>] __writeback_inodes_wb+0x9f/0xd0
> [ 1970.910700]  [<ffffffff81260a53>] wb_writeback+0x2b3/0x550
> [ 1970.910702]  [<ffffffff811b7e90>] ? bdi_dirty_limit+0x40/0xe0
> [ 1970.910705]  [<ffffffff812610d8>] bdi_writeback_workfn+0x1f8/0x650
> [ 1970.910711]  [<ffffffff8109c684>] process_one_work+0x1c4/0x640
> [ 1970.910713]  [<ffffffff8109c624>] ? process_one_work+0x164/0x640
> [ 1970.910716]  [<ffffffff8109cc1b>] worker_thread+0x11b/0x490
> [ 1970.910718]  [<ffffffff8109cb00>] ? process_one_work+0x640/0x640
> [ 1970.910721]  [<ffffffff810a2f1f>] kthread+0xff/0x120
> [ 1970.910724]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> [ 1970.910727]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
> [ 1970.910730]  [<ffffffff8179537c>] ret_from_fork+0x7c/0xb0
> [ 1970.910732]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
> [ 1970.910737] kworker/u128:20 D ffff8801f71c3460     0  8244      2 0x00000080
> [ 1970.910752] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
> [ 1970.910754]  ffff88020aa2b640 0000000000000046 ffff88020aa2bfd8 00000000001d59c0
> [ 1970.910757]  00000000001d59c0 ffff8801f71c3460 ffff880225e93460 7fffffffffffffff
> [ 1970.910760]  ffff880035763520 ffff880035763518 ffff880225e93460 ffff880201c44000
> [ 1970.910763] Call Trace:
> [ 1970.910766]  [<ffffffff8178e209>] schedule+0x29/0x70
> [ 1970.910769]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
> [ 1970.910772]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
> [ 1970.910775]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
> [ 1970.910777]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
> [ 1970.910780]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
> [ 1970.910790]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
> [ 1970.910802]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
> [ 1970.910811]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
> [ 1970.910821]  [<ffffffffa042365b>] cow_file_range_inline+0x49b/0x5e0 [btrfs]
> [ 1970.910824]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> [ 1970.910833]  [<ffffffffa0423aa3>] cow_file_range+0x303/0x450 [btrfs]
> [ 1970.910836]  [<ffffffff817945b7>] ? _raw_spin_unlock+0x27/0x40
> [ 1970.910845]  [<ffffffffa0424a88>] run_delalloc_range+0x338/0x370 [btrfs]
> [ 1970.910857]  [<ffffffffa043c5e9>] ? find_lock_delalloc_range+0x1e9/0x210 [btrfs]
> [ 1970.910859]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
> [ 1970.910870]  [<ffffffffa043c72c>] writepage_delalloc.isra.34+0x11c/0x180 [btrfs]
> [ 1970.910880]  [<ffffffffa043d2fa>] __extent_writepage+0xca/0x390 [btrfs]
> [ 1970.910883]  [<ffffffff811b6f49>] ? clear_page_dirty_for_io+0xc9/0x110
> [ 1970.910893]  [<ffffffffa043d93a>] extent_write_cache_pages.isra.30.constprop.52+0x37a/0x440 [btrfs]
> [ 1970.910895]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> [ 1970.910898]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
> [ 1970.910900]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
> [ 1970.910909]  [<ffffffffa043f92c>] extent_writepages+0x5c/0x90 [btrfs]
> [ 1970.910918]  [<ffffffffa04216a0>] ? btrfs_submit_direct+0x6b0/0x6b0 [btrfs]
> [ 1970.910928]  [<ffffffffa041f008>] btrfs_writepages+0x28/0x30 [btrfs]
> [ 1970.910930]  [<ffffffff811b8a21>] do_writepages+0x21/0x50
> [ 1970.910933]  [<ffffffff811ac7dd>] __filemap_fdatawrite_range+0x5d/0x80
> [ 1970.910936]  [<ffffffff811ac8ac>] filemap_flush+0x1c/0x20
> [ 1970.910945]  [<ffffffffa042271a>] btrfs_run_delalloc_work+0x5a/0xa0 [btrfs]
> [ 1970.910956]  [<ffffffffa044ec1f>] normal_work_helper+0x13f/0x5c0 [btrfs]
> [ 1970.910966]  [<ffffffffa044f0f2>] btrfs_flush_delalloc_helper+0x12/0x20 [btrfs]
> [ 1970.910969]  [<ffffffff8109c684>] process_one_work+0x1c4/0x640
> [ 1970.910971]  [<ffffffff8109c624>] ? process_one_work+0x164/0x640
> [ 1970.910976]  [<ffffffff8109cc1b>] worker_thread+0x11b/0x490
> [ 1970.910978]  [<ffffffff8109cb00>] ? process_one_work+0x640/0x640
> [ 1970.910981]  [<ffffffff810a2f1f>] kthread+0xff/0x120
> [ 1970.910983]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> [ 1970.910986]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
> [ 1970.910988]  [<ffffffff8179537c>] ret_from_fork+0x7c/0xb0
> [ 1970.910991]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
> [ 1970.910997] btrfs           D ffff88022beab460     0 62979   2587 0x00000080
> [ 1970.911000]  ffff880054083870 0000000000000046 ffff880054083fd8 00000000001d59c0
> [ 1970.911004]  00000000001d59c0 ffff88022beab460 ffff88017f740000 7fffffffffffffff
> [ 1970.911007]  ffff8800546a9520 ffff8800546a9518 ffff88017f740000 ffff880201c44000
> [ 1970.911010] Call Trace:
> [ 1970.911012]  [<ffffffff8178e209>] schedule+0x29/0x70
> [ 1970.911015]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
> [ 1970.911018]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
> [ 1970.911021]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
> [ 1970.911023]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
> [ 1970.911026]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
>  96 [ 1970.911035]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
>  97 [ 1970.911047]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
>  98 [ 1970.911059]  [<ffffffffa041e0c3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
>  99 [ 1970.911073]  [<ffffffffa0473cfe>] relocate_block_group+0x41e/0x690 [btrfs]
> 100 [ 1970.911086]  [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
> 101 [ 1970.911100]  [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
> 102 [ 1970.911102]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
> 103 [ 1970.911105]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
> 104 [ 1970.911118]  [<ffffffffa0435568>] ? btrfs_get_token_64+0x68/0x100 [btrfs]
> 105 [ 1970.911132]  [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
> 106 [ 1970.911146]  [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
> 107 [ 1970.911159]  [<ffffffffa045112a>] ? btrfs_ioctl_balance+0x16a/0x530 [btrfs]
> 108 [ 1970.911172]  [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
> 109 [ 1970.911186]  [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
> 110 [ 1970.911189]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> 111 [ 1970.911191]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
> 112 [ 1970.911194]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
> 113 [ 1970.911197]  [<ffffffff810cfe7f>] ? up_read+0x1f/0x40
> 114 [ 1970.911200]  [<ffffffff81067a84>] ? __do_page_fault+0x254/0x5b0
> 115 [ 1970.911202]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
> 116 [ 1970.911206]  [<ffffffff81243830>] do_vfs_ioctl+0x300/0x520
> 117 [ 1970.911209]  [<ffffffff8124fc6d>] ? __fget_light+0x13d/0x160
> 118 [ 1970.911212]  [<ffffffff81243ad1>] SyS_ioctl+0x81/0xa0
> 119 [ 1970.911217]  [<ffffffff8114a49c>] ? __audit_syscall_entry+0x9c/0xf0
> 120 [ 1970.911220]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
> 121 [ 1970.911228] as              D ffff880225e93460     0  6423   6421 0x00000080
> 122 [ 1970.911231]  ffff880049657ad0 0000000000000046 ffff880049657fd8 00000000001d59c0
> 123 [ 1970.911235]  00000000001d59c0 ffff880225e93460 ffff880225631a30 7fffffffffffffff
> 124 [ 1970.911238]  ffff880035762b20 ffff880035762b18 ffff880225631a30 ffff880201c44000
> 125 [ 1970.911241] Call Trace:
> 126 [ 1970.911244]  [<ffffffff8178e209>] schedule+0x29/0x70
> 127 [ 1970.911247]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
> 128 [ 1970.911250]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
> 129 [ 1970.911252]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
> 130 [ 1970.911255]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
> 131 [ 1970.911258]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
> 132 [ 1970.911268]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
> 133 [ 1970.911280]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
> 134 [ 1970.911292]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
> 135 [ 1970.911304]  [<ffffffffa0423088>] btrfs_dirty_inode+0x78/0xe0 [btrfs]
> 136 [ 1970.911307]  [<ffffffff8124cf55>] ? touch_atime+0xf5/0x160
> 137 [ 1970.911319]  [<ffffffffa0423154>] btrfs_update_time+0x64/0xd0 [btrfs]
> 138 [ 1970.911321]  [<ffffffff8124cdb5>] update_time+0x25/0xd0
> 139 [ 1970.911323]  [<ffffffff8124cf79>] touch_atime+0x119/0x160
> 140 [ 1970.911327]  [<ffffffff811acf34>] generic_file_read_iter+0x5f4/0x660
> 141 [ 1970.911330]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
> 142 [ 1970.911332]  [<ffffffff81790ed6>] ? mutex_lock_nested+0x2d6/0x520
> 143 [ 1970.911335]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> 144 [ 1970.911338]  [<ffffffff8122d82b>] new_sync_read+0x8b/0xd0
> 145 [ 1970.911340]  [<ffffffff8122dfdb>] vfs_read+0x9b/0x180
> 146 [ 1970.911343]  [<ffffffff8122ecf8>] SyS_read+0x58/0xd0
> 147 [ 1970.911345]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
> 148 [ 1970.911347] as              D ffff88022bf01a30     0  6433   6431 0x00000080
> 149 [ 1970.911351]  ffff88016f3ffad0 0000000000000046 ffff88016f3fffd8 00000000001d59c0
> 150 [ 1970.911354]  00000000001d59c0 ffff88022bf01a30 ffff8800ba1a3460 7fffffffffffffff
> 151 [ 1970.911419]  ffff88017faa6820 ffff88017faa6818 ffff8800ba1a3460 ffff880201c44000
> 152 [ 1970.911423] Call Trace:
> 153 [ 1970.911426]  [<ffffffff8178e209>] schedule+0x29/0x70
> 154 [ 1970.911429]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
> 155 [ 1970.911432]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
> 156 [ 1970.911435]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
> 157 [ 1970.911438]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
> 158 [ 1970.911440]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
> 159 [ 1970.911452]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
> 160 [ 1970.911464]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
> 161 [ 1970.911476]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
> 162 [ 1970.911488]  [<ffffffffa0423088>] btrfs_dirty_inode+0x78/0xe0 [btrfs]
> 163 [ 1970.911490]  [<ffffffff8124cf55>] ? touch_atime+0xf5/0x160
> 164 [ 1970.911502]  [<ffffffffa0423154>] btrfs_update_time+0x64/0xd0 [btrfs]
> 165 [ 1970.911505]  [<ffffffff8124cdb5>] update_time+0x25/0xd0
> 166 [ 1970.911507]  [<ffffffff8124cf79>] touch_atime+0x119/0x160
> 167 [ 1970.911510]  [<ffffffff811acf34>] generic_file_read_iter+0x5f4/0x660
> 168 [ 1970.911513]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
> 169 [ 1970.911516]  [<ffffffff81790ed6>] ? mutex_lock_nested+0x2d6/0x520
> 170 [ 1970.911518]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
> 171 [ 1970.911521]  [<ffffffff8122d82b>] new_sync_read+0x8b/0xd0
> 172 [ 1970.911523]  [<ffffffff8122dfdb>] vfs_read+0x9b/0x180
> 173 [ 1970.911526]  [<ffffffff8122ecf8>] SyS_read+0x58/0xd0
> 174 [ 1970.911528]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
> 175 [ 1970.911530] ld              D ffff880225e93460     0  6435   6370 0x00000080
> 176 [ 1970.911534]  ffff880049623ad0 0000000000000046 ffff880049623fd8 00000000001d59c0
> 177 [ 1970.911537]  00000000001d59c0 ffff880225e93460 ffff8800364b4e90 7fffffffffffffff
> 178 [ 1970.911541]  ffff880035762820 ffff880035762818 ffff8800364b4e90 ffff880201c44000
> 179 [ 1970.911544] Call Trace:
> 180 [ 1970.911547]  [<ffffffff8178e209>] schedule+0x29/0x7
> 
> It is easy to reproduce this problem using my scripts…
> 
> 
>> On 10/18/2014 07:21 AM, Petr Janecek wrote:
>>> Hello,
>>> 
>>>>>  so far I haven't succeeded in running btrfs balance on a large
>>>>> skinny-metadata fs -- segfault, kernel bug, reproducible.  No such
>>>>> problems on a ^skinny-metadata fs (same disks, same data).  Tried both
>>>>> several times on 3.17.  More info in comments 10 and 14 in
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=64961
>>>> 
>>>> I can't reproduce this. How big is your home directory, and are you
>>>> still seeing corruptions after just rsyncing to a clean fs?  Thanks,
>>> 
>>>  as I wrote in comment 10, it has improved since a year ago when I
>>> reported it: I see no corruption at all, neither after rsync nor after
>>> a balance crash: btrfs check doesn't find anything wrong, and files look ok.
>>> The only problem is that after adding a disk the balance segfaults on a
>>> kernel bug and the fs gets stuck.  When I run balance again after a
>>> reboot, it makes only very small progress and crashes again the same
>>> way.
>>> 
>>>  There are some 2.5TB of data in 7.5M files on that fs.  And a couple
>>> dozen ro snapshots -- I'm testing 3.17 + revert of 9c3b306e1c9e right
>>> now, but it takes more than a day to copy the data and recreate all the
>>> snapshots.  But a test with ^skinny-metadata showed no problems, so I
>>> don't think I got bitten by that bug.
>>> 
>>>  I have a btrfs-image of one of the previous runs after a crashed balance.
>>> It's 15GB. I can place it somewhere with a fast link; are you interested?
>>> 
>>> 
>> 
>> Yup, send me the link and I'll pull it down.  Thanks,
>> 
>> Josef
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Best Regards,
> Wang Shilong
> 

Best Regards,
Wang Shilong


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-18 15:52       ` Wang Shilong
@ 2014-10-18 15:53         ` Josef Bacik
  2014-10-18 16:01           ` Wang Shilong
  0 siblings, 1 reply; 28+ messages in thread
From: Josef Bacik @ 2014-10-18 15:53 UTC (permalink / raw)
  To: Wang Shilong; +Cc: Petr Janecek, linux-btrfs, dsterba

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="big5", Size: 22089 bytes --]

Thanks, I'll run this on Monday.

Josef

Wang Shilong <wangshilong1991@gmail.com> wrote:


Hello Josef,

With skinny-metadata enabled and running your btrfs-next repo for-suse branch
(which has the extent ref patch), I hit the following problem:

[  250.679705] BTRFS info (device sdb): relocating block group 35597058048 flags 36
[  250.728815] BTRFS info (device sdb): relocating block group 35462840320 flags 36
[  253.562133] Dropping a ref for a root that doesn't have a ref on the block
[  253.562475] Dumping block entry [34793177088 8192], num_refs 3, metadata 0
[  253.562795]   Ref root 0, parent 35532013568, owner 23988, offset 0, num_refs 18446744073709551615
[  253.563126]   Ref root 0, parent 35560964096, owner 23988, offset 0, num_refs 1
[  253.563505]   Ref root 0, parent 35654615040, owner 23988, offset 0, num_refs 1
[  253.563837]   Ref root 0, parent 35678650368, owner 23988, offset 0, num_refs 1
[  253.564162]   Root entry 5, num_refs 1
[  253.564520]   Root entry 18446744073709551608, num_refs 18446744073709551615
[  253.564860]   Ref action 4, root 5, ref_root 5, parent 0, owner 23988, offset 0, num_refs 1
[  253.565205]    [<ffffffffa049d2f1>] process_leaf.isra.6+0x281/0x3e0 [btrfs]
[  253.565225]    [<ffffffffa049de83>] build_ref_tree_for_root+0x433/0x460 [btrfs]
[  253.565234]    [<ffffffffa049e1af>] btrfs_build_ref_tree+0x18f/0x1c0 [btrfs]
[  253.565241]    [<ffffffffa0419ce8>] open_ctree+0x18b8/0x21a0 [btrfs]
[  253.565247]    [<ffffffffa03ecb0e>] btrfs_mount+0x62e/0x8b0 [btrfs]
[  253.565251]    [<ffffffff812324e9>] mount_fs+0x39/0x1b0
[  253.565255]    [<ffffffff8125285b>] vfs_kern_mount+0x6b/0x150
[  253.565257]    [<ffffffff8125565b>] do_mount+0x27b/0xc30
[  253.565259]    [<ffffffff81256356>] SyS_mount+0x96/0xf0
[  253.565260]    [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
[  253.565263]    [<ffffffffffffffff>] 0xffffffffffffffff
[  253.565272]   Ref action 1, root 18446744073709551608, ref_root 0, parent 35654615040, owner 23988, offset 0, num_refs 1
[  253.565681]    [<ffffffffa049d564>] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
[  253.565692]    [<ffffffffa03f946b>] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
[  253.565697]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
[  253.565702]    [<ffffffffa0401504>] btrfs_inc_ref+0x14/0x20 [btrfs]
[  253.565707]    [<ffffffffa03f05ff>] update_ref_for_cow+0x15f/0x380 [btrfs]
[  253.565711]    [<ffffffffa03f0a3d>] __btrfs_cow_block+0x21d/0x540 [btrfs]
[  253.565716]    [<ffffffffa03f0f0c>] btrfs_cow_block+0x12c/0x290 [btrfs]
[  253.565721]    [<ffffffffa046f59c>] do_relocation+0x49c/0x570 [btrfs]
[  253.565728]    [<ffffffffa04723ce>] relocate_tree_blocks+0x60e/0x660 [btrfs]
[  253.565735]    [<ffffffffa0473ce7>] relocate_block_group+0x407/0x690 [btrfs]
[  253.565741]    [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
[  253.565746]    [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
[  253.565753]    [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
[  253.565760]    [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
[  253.565766]    [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
[  253.565772]    [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
[  253.565779]   Ref action 1, root 18446744073709551608, ref_root 0, parent 35560964096, owner 23988, offset 0, num_refs 1
[  253.566143]    [<ffffffffa049d564>] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
[  253.566152]    [<ffffffffa03f946b>] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
[  253.566180]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
[  253.566186]    [<ffffffffa0401504>] btrfs_inc_ref+0x14/0x20 [btrfs]
[  253.566191]    [<ffffffffa03f071b>] update_ref_for_cow+0x27b/0x380 [btrfs]
[  253.566195]    [<ffffffffa03f0a3d>] __btrfs_cow_block+0x21d/0x540 [btrfs]
[  253.566199]    [<ffffffffa03f0f0c>] btrfs_cow_block+0x12c/0x290 [btrfs]
[  253.566203]    [<ffffffffa046f59c>] do_relocation+0x49c/0x570 [btrfs]
[  253.566210]    [<ffffffffa04723ce>] relocate_tree_blocks+0x60e/0x660 [btrfs]
[  253.566216]    [<ffffffffa0473ce7>] relocate_block_group+0x407/0x690 [btrfs]
[  253.566222]    [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
[  253.566227]    [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
[  253.566233]    [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
[  253.566240]    [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
[  253.566245]    [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
[  253.566252]    [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
[  253.566258]   Ref action 2, root 18446744073709551608, ref_root 5, parent 0, owner 23988, offset 0, num_refs 18446744073709551615
[  253.566641]    [<ffffffffa049d710>] btrfs_ref_tree_mod+0x2c0/0x570 [btrfs]
[  253.566651]    [<ffffffffa040404a>] btrfs_free_extent+0x7a/0x180 [btrfs]
[  253.566657]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
[  253.566662]    [<ffffffffa0401521>] btrfs_dec_ref+0x11/0x20 [btrfs]
[  253.566668]    [<ffffffffa03f07a8>] update_ref_for_cow+0x308/0x380 [btrfs]
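[Editorial aside, my own illustration rather than part of the report: the improbably large values in the dump above are small negative numbers printed as unsigned 64-bit integers. num_refs 18446744073709551615 is (u64)-1, i.e. a reference count decremented below zero, which fits the "Dropping a ref for a root that doesn't have a ref" warning; root 18446744073709551608 is (u64)-8, which matches, if I read the btrfs headers correctly, BTRFS_TREE_RELOC_OBJECTID. A minimal sketch of the wraparound:]

```python
# Interpret the u64 values seen in the ref dump as signed numbers.
U64_MAX = 2**64 - 1  # 18446744073709551615

def u64(n):
    """Truncate a Python int to unsigned 64 bits, as C u64 arithmetic does."""
    return n & U64_MAX

def as_signed(n):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return n - 2**64 if n > 2**63 - 1 else n

# A refcount of 1 decremented twice wraps to (u64)-1:
print(u64(1 - 2))                       # 18446744073709551615
# The odd root objectid in the dump is just -8:
print(as_signed(18446744073709551608))  # -8
```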

Below is my test script:

#!/bin/bash
DEVICE=/dev/sdb
TEST_MNT=/mnt
SLEEP=3

# Take a snapshot every $SLEEP seconds; every 10th iteration, delete the
# accumulated snapshots.
function run_snapshots()
{
        i=1
        while true
        do
                btrfs sub snapshot $TEST_MNT $TEST_MNT/snap_$i
                a=$(($i%10))
                if [ $a -eq 0 ]; then
                        # Delete the snapshots by full path, so the result
                        # does not depend on the current working directory.
                        btrfs sub delete $TEST_MNT/snap_*
                fi
                ((i++))
                sleep $SLEEP
        done
}

# Keep the fs busy with a kernel build/clean cycle.
function run_compiling()
{
        while true
        do
                make -j4 -C $TEST_MNT/linux-btrfs
                make -C $TEST_MNT/linux-btrfs clean
        done
}

# Rebalance the fs in a loop.
function run_balance()
{
        while true
        do
                btrfs balance start $TEST_MNT
                sleep $SLEEP
        done
}

run_snapshots &
run_compiling &
run_balance &
wait

---cut---

Mount options:
/dev/sdb /mnt btrfs rw,relatime,space_cache 0 0

Here my /dev/sdb is 10G, and before compiling the kernel, run 'make allmodconfig'.
The above tests may detect more problems. After running for a while, the system
seemed blocked; 'echo w > /proc/sysrq-trigger' gives:

[ 1970.909512] SysRq : Show Blocked State
   2 [ 1970.910490]   task                        PC stack   pid father
   3 [ 1970.910564] kworker/u128:9  D ffff880208a89a30     0  3514      2 0x00000080
   4 [ 1970.910587] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
   5 [ 1970.910590]  ffff8800b3dab8e8 0000000000000046 ffff8800b3dabfd8 00000000001d59c0
   6 [ 1970.910594]  00000000001d59c0 ffff880208a89a30 ffff8801f71c3460 ffff8802303d6360
   7 [ 1970.910597]  ffff88023ff509a8 ffff8800b3dab978 0000000000000002 ffffffff8178eda0
   8 [ 1970.910600] Call Trace:
   9 [ 1970.910606]  [<ffffffff8178eda0>] ? bit_wait+0x50/0x50
  10 [ 1970.910609]  [<ffffffff8178e56d>] io_schedule+0x9d/0x130
  11 [ 1970.910612]  [<ffffffff8178edcc>] bit_wait_io+0x2c/0x50
  12 [ 1970.910614]  [<ffffffff8178eb3b>] __wait_on_bit_lock+0x4b/0xb0
  13 [ 1970.910619]  [<ffffffff811aa2ef>] __lock_page+0xbf/0xe0
  14 [ 1970.910623]  [<ffffffff810caa90>] ? autoremove_wake_function+0x40/0x40
  15 [ 1970.910642]  [<ffffffffa043d9d0>] extent_write_cache_pages.isra.30.constprop.52+0x410/0x440 [btrfs]
  16 [ 1970.910645]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
  17 [ 1970.910648]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  18 [ 1970.910661]  [<ffffffffa043f92c>] extent_writepages+0x5c/0x90 [btrfs]
  19 [ 1970.910672]  [<ffffffffa04216a0>] ? btrfs_submit_direct+0x6b0/0x6b0 [btrfs]
  20 [ 1970.910674]  [<ffffffff810b7174>] ? local_clock+0x24/0x30
  21 [ 1970.910685]  [<ffffffffa041f008>] btrfs_writepages+0x28/0x30 [btrfs]
  22 [ 1970.910688]  [<ffffffff811b8a21>] do_writepages+0x21/0x50
  23 [ 1970.910692]  [<ffffffff8125f920>] __writeback_single_inode+0x40/0x540
  24 [ 1970.910694]  [<ffffffff81260425>] writeback_sb_inodes+0x275/0x520
  25 [ 1970.910697]  [<ffffffff8126076f>] __writeback_inodes_wb+0x9f/0xd0
  26 [ 1970.910700]  [<ffffffff81260a53>] wb_writeback+0x2b3/0x550
  27 [ 1970.910702]  [<ffffffff811b7e90>] ? bdi_dirty_limit+0x40/0xe0
  28 [ 1970.910705]  [<ffffffff812610d8>] bdi_writeback_workfn+0x1f8/0x650
  29 [ 1970.910711]  [<ffffffff8109c684>] process_one_work+0x1c4/0x640
  30 [ 1970.910713]  [<ffffffff8109c624>] ? process_one_work+0x164/0x640
  31 [ 1970.910716]  [<ffffffff8109cc1b>] worker_thread+0x11b/0x490
  32 [ 1970.910718]  [<ffffffff8109cb00>] ? process_one_work+0x640/0x640
  33 [ 1970.910721]  [<ffffffff810a2f1f>] kthread+0xff/0x120
  34 [ 1970.910724]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  35 [ 1970.910727]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  36 [ 1970.910730]  [<ffffffff8179537c>] ret_from_fork+0x7c/0xb0
  37 [ 1970.910732]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  38 [ 1970.910737] kworker/u128:20 D ffff8801f71c3460     0  8244      2 0x00000080
  39 [ 1970.910752] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
  40 [ 1970.910754]  ffff88020aa2b640 0000000000000046 ffff88020aa2bfd8 00000000001d59c0
  41 [ 1970.910757]  00000000001d59c0 ffff8801f71c3460 ffff880225e93460 7fffffffffffffff
  42 [ 1970.910760]  ffff880035763520 ffff880035763518 ffff880225e93460 ffff880201c44000
  43 [ 1970.910763] Call Trace:
  44 [ 1970.910766]  [<ffffffff8178e209>] schedule+0x29/0x70
  45 [ 1970.910769]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
  46 [ 1970.910772]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
  47 [ 1970.910775]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
  48 [ 1970.910777]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
  49 [ 1970.910780]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
  50 [ 1970.910790]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
  51 [ 1970.910802]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
  52 [ 1970.910811]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
  53 [ 1970.910821]  [<ffffffffa042365b>] cow_file_range_inline+0x49b/0x5e0 [btrfs]
  54 [ 1970.910824]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  55 [ 1970.910833]  [<ffffffffa0423aa3>] cow_file_range+0x303/0x450 [btrfs]
  56 [ 1970.910836]  [<ffffffff817945b7>] ? _raw_spin_unlock+0x27/0x40
  57 [ 1970.910845]  [<ffffffffa0424a88>] run_delalloc_range+0x338/0x370 [btrfs]
  58 [ 1970.910857]  [<ffffffffa043c5e9>] ? find_lock_delalloc_range+0x1e9/0x210 [btrfs]
  59 [ 1970.910859]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
  60 [ 1970.910870]  [<ffffffffa043c72c>] writepage_delalloc.isra.34+0x11c/0x180 [btrfs]
  61 [ 1970.910880]  [<ffffffffa043d2fa>] __extent_writepage+0xca/0x390 [btrfs]
  62 [ 1970.910883]  [<ffffffff811b6f49>] ? clear_page_dirty_for_io+0xc9/0x110
[ 1970.910893]  [<ffffffffa043d93a>] extent_write_cache_pages.isra.30.constprop.52+0x37a/0x440 [btrfs]
  64 [ 1970.910895]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  65 [ 1970.910898]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
  66 [ 1970.910900]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
  67 [ 1970.910909]  [<ffffffffa043f92c>] extent_writepages+0x5c/0x90 [btrfs]
  68 [ 1970.910918]  [<ffffffffa04216a0>] ? btrfs_submit_direct+0x6b0/0x6b0 [btrfs]
  69 [ 1970.910928]  [<ffffffffa041f008>] btrfs_writepages+0x28/0x30 [btrfs]
  70 [ 1970.910930]  [<ffffffff811b8a21>] do_writepages+0x21/0x50
  71 [ 1970.910933]  [<ffffffff811ac7dd>] __filemap_fdatawrite_range+0x5d/0x80
  72 [ 1970.910936]  [<ffffffff811ac8ac>] filemap_flush+0x1c/0x20
  73 [ 1970.910945]  [<ffffffffa042271a>] btrfs_run_delalloc_work+0x5a/0xa0 [btrfs]
  74 [ 1970.910956]  [<ffffffffa044ec1f>] normal_work_helper+0x13f/0x5c0 [btrfs]
  75 [ 1970.910966]  [<ffffffffa044f0f2>] btrfs_flush_delalloc_helper+0x12/0x20 [btrfs]
  76 [ 1970.910969]  [<ffffffff8109c684>] process_one_work+0x1c4/0x640
  77 [ 1970.910971]  [<ffffffff8109c624>] ? process_one_work+0x164/0x640
  78 [ 1970.910976]  [<ffffffff8109cc1b>] worker_thread+0x11b/0x490
  79 [ 1970.910978]  [<ffffffff8109cb00>] ? process_one_work+0x640/0x640
  80 [ 1970.910981]  [<ffffffff810a2f1f>] kthread+0xff/0x120
  81 [ 1970.910983]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  82 [ 1970.910986]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  83 [ 1970.910988]  [<ffffffff8179537c>] ret_from_fork+0x7c/0xb0
  84 [ 1970.910991]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  85 [ 1970.910997] btrfs           D ffff88022beab460     0 62979   2587 0x00000080
  86 [ 1970.911000]  ffff880054083870 0000000000000046 ffff880054083fd8 00000000001d59c0
  87 [ 1970.911004]  00000000001d59c0 ffff88022beab460 ffff88017f740000 7fffffffffffffff
  88 [ 1970.911007]  ffff8800546a9520 ffff8800546a9518 ffff88017f740000 ffff880201c44000
  89 [ 1970.911010] Call Trace:
  90 [ 1970.911012]  [<ffffffff8178e209>] schedule+0x29/0x70
  91 [ 1970.911015]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
  92 [ 1970.911018]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
  93 [ 1970.911021]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
  94 [ 1970.911023]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
  95 [ 1970.911026]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
  96 [ 1970.911035]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
  97 [ 1970.911047]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
  98 [ 1970.911059]  [<ffffffffa041e0c3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
  99 [ 1970.911073]  [<ffffffffa0473cfe>] relocate_block_group+0x41e/0x690 [btrfs]
 100 [ 1970.911086]  [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
 101 [ 1970.911100]  [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
 102 [ 1970.911102]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
 103 [ 1970.911105]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
 104 [ 1970.911118]  [<ffffffffa0435568>] ? btrfs_get_token_64+0x68/0x100 [btrfs]
 105 [ 1970.911132]  [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
 106 [ 1970.911146]  [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
 107 [ 1970.911159]  [<ffffffffa045112a>] ? btrfs_ioctl_balance+0x16a/0x530 [btrfs]
 108 [ 1970.911172]  [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
 109 [ 1970.911186]  [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
 110 [ 1970.911189]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
 111 [ 1970.911191]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
 112 [ 1970.911194]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
 113 [ 1970.911197]  [<ffffffff810cfe7f>] ? up_read+0x1f/0x40
 114 [ 1970.911200]  [<ffffffff81067a84>] ? __do_page_fault+0x254/0x5b0
 115 [ 1970.911202]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
 116 [ 1970.911206]  [<ffffffff81243830>] do_vfs_ioctl+0x300/0x520
 117 [ 1970.911209]  [<ffffffff8124fc6d>] ? __fget_light+0x13d/0x160
 118 [ 1970.911212]  [<ffffffff81243ad1>] SyS_ioctl+0x81/0xa0
 119 [ 1970.911217]  [<ffffffff8114a49c>] ? __audit_syscall_entry+0x9c/0xf0
 120 [ 1970.911220]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
 121 [ 1970.911228] as              D ffff880225e93460     0  6423   6421 0x00000080
 122 [ 1970.911231]  ffff880049657ad0 0000000000000046 ffff880049657fd8 00000000001d59c0
 123 [ 1970.911235]  00000000001d59c0 ffff880225e93460 ffff880225631a30 7fffffffffffffff
 124 [ 1970.911238]  ffff880035762b20 ffff880035762b18 ffff880225631a30 ffff880201c44000
 125 [ 1970.911241] Call Trace:
 126 [ 1970.911244]  [<ffffffff8178e209>] schedule+0x29/0x70
 127 [ 1970.911247]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
 128 [ 1970.911250]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 129 [ 1970.911252]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
 130 [ 1970.911255]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
 131 [ 1970.911258]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
 132 [ 1970.911268]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
 133 [ 1970.911280]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
 134 [ 1970.911292]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
 135 [ 1970.911304]  [<ffffffffa0423088>] btrfs_dirty_inode+0x78/0xe0 [btrfs]
 136 [ 1970.911307]  [<ffffffff8124cf55>] ? touch_atime+0xf5/0x160
 137 [ 1970.911319]  [<ffffffffa0423154>] btrfs_update_time+0x64/0xd0 [btrfs]
 138 [ 1970.911321]  [<ffffffff8124cdb5>] update_time+0x25/0xd0
 139 [ 1970.911323]  [<ffffffff8124cf79>] touch_atime+0x119/0x160
 140 [ 1970.911327]  [<ffffffff811acf34>] generic_file_read_iter+0x5f4/0x660
 141 [ 1970.911330]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 142 [ 1970.911332]  [<ffffffff81790ed6>] ? mutex_lock_nested+0x2d6/0x520
 143 [ 1970.911335]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
 144 [ 1970.911338]  [<ffffffff8122d82b>] new_sync_read+0x8b/0xd0
 145 [ 1970.911340]  [<ffffffff8122dfdb>] vfs_read+0x9b/0x180
 146 [ 1970.911343]  [<ffffffff8122ecf8>] SyS_read+0x58/0xd0
 147 [ 1970.911345]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
 148 [ 1970.911347] as              D ffff88022bf01a30     0  6433   6431 0x00000080
 149 [ 1970.911351]  ffff88016f3ffad0 0000000000000046 ffff88016f3fffd8 00000000001d59c0
 150 [ 1970.911354]  00000000001d59c0 ffff88022bf01a30 ffff8800ba1a3460 7fffffffffffffff
 151 [ 1970.911419]  ffff88017faa6820 ffff88017faa6818 ffff8800ba1a3460 ffff880201c44000
 152 [ 1970.911423] Call Trace:
 153 [ 1970.911426]  [<ffffffff8178e209>] schedule+0x29/0x70
 154 [ 1970.911429]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
 155 [ 1970.911432]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 156 [ 1970.911435]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
 157 [ 1970.911438]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
 158 [ 1970.911440]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
 159 [ 1970.911452]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
 160 [ 1970.911464]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
 161 [ 1970.911476]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
 162 [ 1970.911488]  [<ffffffffa0423088>] btrfs_dirty_inode+0x78/0xe0 [btrfs]
 163 [ 1970.911490]  [<ffffffff8124cf55>] ? touch_atime+0xf5/0x160
 164 [ 1970.911502]  [<ffffffffa0423154>] btrfs_update_time+0x64/0xd0 [btrfs]
 165 [ 1970.911505]  [<ffffffff8124cdb5>] update_time+0x25/0xd0
 166 [ 1970.911507]  [<ffffffff8124cf79>] touch_atime+0x119/0x160
 167 [ 1970.911510]  [<ffffffff811acf34>] generic_file_read_iter+0x5f4/0x660
 168 [ 1970.911513]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 169 [ 1970.911516]  [<ffffffff81790ed6>] ? mutex_lock_nested+0x2d6/0x520
 170 [ 1970.911518]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
 171 [ 1970.911521]  [<ffffffff8122d82b>] new_sync_read+0x8b/0xd0
 172 [ 1970.911523]  [<ffffffff8122dfdb>] vfs_read+0x9b/0x180
 173 [ 1970.911526]  [<ffffffff8122ecf8>] SyS_read+0x58/0xd0
 174 [ 1970.911528]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
 175 [ 1970.911530] ld              D ffff880225e93460     0  6435   6370 0x00000080
 176 [ 1970.911534]  ffff880049623ad0 0000000000000046 ffff880049623fd8 00000000001d59c0
 177 [ 1970.911537]  00000000001d59c0 ffff880225e93460 ffff8800364b4e90 7fffffffffffffff
 178 [ 1970.911541]  ffff880035762820 ffff880035762818 ffff8800364b4e90 ffff880201c44000
 179 [ 1970.911544] Call Trace:
 180 [ 1970.911547]  [<ffffffff8178e209>] schedule+0x29/0x7

It is easy to reproduce this problem using my scripts...


> On 10/18/2014 07:21 AM, Petr Janecek wrote:
>> Hello,
>>
>>>>   so far I haven't succeeded in running btrfs balance on a large
>>>> skinny-metadata fs -- segfault, kernel bug, reproducible.  No such
>>>> problems on a ^skinny-metadata fs (same disks, same data).  Tried both
>>>> several times on 3.17.  More info in comments 10 and 14 in
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=64961
>>>
>>> I can't reproduce this. How big is your home directory, and are you
>>> still seeing corruptions after just rsyncing to a clean fs?  Thanks,
>>
>>   as I wrote in comment 10, it has improved since a year ago when I
>> reported it: I see no corruption at all, neither after rsync nor after
>> a balance crash: btrfs check doesn't find anything wrong, and files look ok.
>> The only problem is that after adding a disk the balance segfaults on a
>> kernel bug and the fs gets stuck.  When I run balance again after a
>> reboot, it makes only very small progress and crashes again the same
>> way.
>>
>>   There are some 2.5TB of data in 7.5M files on that fs.  And a couple
>> dozen ro snapshots -- I'm testing 3.17 + revert of 9c3b306e1c9e right
>> now, but it takes more than a day to copy the data and recreate all the
>> snapshots.  But a test with ^skinny-metadata showed no problems, so I
>> don't think I got bitten by that bug.
>>
>>   I have btrfs-image of one of previous runs after crashed balance.
>> It's 15GB. I can place it somewhere with fast link, are you interested?
>>
>>
>
> Yup, send me the link and I'll pull it down.  Thanks,
>
> Josef
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-18 14:04     ` Josef Bacik
@ 2014-10-18 15:52       ` Wang Shilong
  2014-10-18 15:53         ` Josef Bacik
  0 siblings, 1 reply; 28+ messages in thread
From: Wang Shilong @ 2014-10-18 15:52 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Petr Janecek, linux-btrfs, dsterba

Hello Josef,

With skinny-metadata enabled, running your btrfs-next repo's for-suse branch
(which has the extent ref patch), I hit the following problem:

[  250.679705] BTRFS info (device sdb): relocating block group 35597058048 flags 36                                                                                                                        
[  250.728815] BTRFS info (device sdb): relocating block group 35462840320 flags 36
[  253.562133] Dropping a ref for a root that doesn't have a ref on the block
[  253.562475] Dumping block entry [34793177088 8192], num_refs 3, metadata 0
[  253.562795]   Ref root 0, parent 35532013568, owner 23988, offset 0, num_refs 18446744073709551615
[  253.563126]   Ref root 0, parent 35560964096, owner 23988, offset 0, num_refs 1
[  253.563505]   Ref root 0, parent 35654615040, owner 23988, offset 0, num_refs 1
[  253.563837]   Ref root 0, parent 35678650368, owner 23988, offset 0, num_refs 1
[  253.564162]   Root entry 5, num_refs 1
[  253.564520]   Root entry 18446744073709551608, num_refs 18446744073709551615
[  253.564860]   Ref action 4, root 5, ref_root 5, parent 0, owner 23988, offset 0, num_refs 1
[  253.565205]    [<ffffffffa049d2f1>] process_leaf.isra.6+0x281/0x3e0 [btrfs]
[  253.565225]    [<ffffffffa049de83>] build_ref_tree_for_root+0x433/0x460 [btrfs]
[  253.565234]    [<ffffffffa049e1af>] btrfs_build_ref_tree+0x18f/0x1c0 [btrfs]
[  253.565241]    [<ffffffffa0419ce8>] open_ctree+0x18b8/0x21a0 [btrfs]
[  253.565247]    [<ffffffffa03ecb0e>] btrfs_mount+0x62e/0x8b0 [btrfs]
[  253.565251]    [<ffffffff812324e9>] mount_fs+0x39/0x1b0
[  253.565255]    [<ffffffff8125285b>] vfs_kern_mount+0x6b/0x150
[  253.565257]    [<ffffffff8125565b>] do_mount+0x27b/0xc30
[  253.565259]    [<ffffffff81256356>] SyS_mount+0x96/0xf0
[  253.565260]    [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
[  253.565263]    [<ffffffffffffffff>] 0xffffffffffffffff
[  253.565272]   Ref action 1, root 18446744073709551608, ref_root 0, parent 35654615040, owner 23988, offset 0, num_refs 1
[  253.565681]    [<ffffffffa049d564>] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
[  253.565692]    [<ffffffffa03f946b>] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
[  253.565697]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
[  253.565702]    [<ffffffffa0401504>] btrfs_inc_ref+0x14/0x20 [btrfs]
[  253.565707]    [<ffffffffa03f05ff>] update_ref_for_cow+0x15f/0x380 [btrfs]
[  253.565711]    [<ffffffffa03f0a3d>] __btrfs_cow_block+0x21d/0x540 [btrfs]
[  253.565716]    [<ffffffffa03f0f0c>] btrfs_cow_block+0x12c/0x290 [btrfs]
[  253.565721]    [<ffffffffa046f59c>] do_relocation+0x49c/0x570 [btrfs]
[  253.565728]    [<ffffffffa04723ce>] relocate_tree_blocks+0x60e/0x660 [btrfs]
[  253.565735]    [<ffffffffa0473ce7>] relocate_block_group+0x407/0x690 [btrfs]
[  253.565741]    [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
[  253.565746]    [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
[  253.565753]    [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
[  253.565760]    [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
[  253.565766]    [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
[  253.565772]    [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
[  253.565779]   Ref action 1, root 18446744073709551608, ref_root 0, parent 35560964096, owner 23988, offset 0, num_refs 1
[  253.566143]    [<ffffffffa049d564>] btrfs_ref_tree_mod+0x114/0x570 [btrfs]
[  253.566152]    [<ffffffffa03f946b>] btrfs_inc_extent_ref+0x6b/0x120 [btrfs]
[  253.566180]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
[  253.566186]    [<ffffffffa0401504>] btrfs_inc_ref+0x14/0x20 [btrfs]
[  253.566191]    [<ffffffffa03f071b>] update_ref_for_cow+0x27b/0x380 [btrfs]
[  253.566195]    [<ffffffffa03f0a3d>] __btrfs_cow_block+0x21d/0x540 [btrfs]
[  253.566199]    [<ffffffffa03f0f0c>] btrfs_cow_block+0x12c/0x290 [btrfs]
[  253.566203]    [<ffffffffa046f59c>] do_relocation+0x49c/0x570 [btrfs]
[  253.566210]    [<ffffffffa04723ce>] relocate_tree_blocks+0x60e/0x660 [btrfs]
[  253.566216]    [<ffffffffa0473ce7>] relocate_block_group+0x407/0x690 [btrfs]
[  253.566222]    [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
[  253.566227]    [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
[  253.566233]    [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
[  253.566240]    [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
[  253.566245]    [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
[  253.566252]    [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
[  253.566258]   Ref action 2, root 18446744073709551608, ref_root 5, parent 0, owner 23988, offset 0, num_refs 18446744073709551615
[  253.566641]    [<ffffffffa049d710>] btrfs_ref_tree_mod+0x2c0/0x570 [btrfs]
[  253.566651]    [<ffffffffa040404a>] btrfs_free_extent+0x7a/0x180 [btrfs]
[  253.566657]    [<ffffffffa03fb77c>] __btrfs_mod_ref+0x16c/0x2b0 [btrfs]
[  253.566662]    [<ffffffffa0401521>] btrfs_dec_ref+0x11/0x20 [btrfs]
[  253.566668]    [<ffffffffa03f07a8>] update_ref_for_cow+0x308/0x380 [btrfs]
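A side note on the strange numbers in the dump above: num_refs 18446744073709551615 is -1 reinterpreted as an unsigned 64-bit value, i.e. a reference count that was decremented past zero, and "Root entry 18446744073709551608" is -8 as a u64, which matches the kernel's BTRFS_TREE_RELOC_OBJECTID — consistent with the underflow happening while balance works on the relocation tree. A quick check of that reading:

```python
import ctypes

# num_refs is a u64; 18446744073709551615 is what -1 looks like when
# stored unsigned, so the count underflowed rather than being huge.
assert ctypes.c_uint64(-1).value == 18446744073709551615
assert 2**64 - 1 == 18446744073709551615

# "Root entry 18446744073709551608" is -8 as a u64, the value of
# BTRFS_TREE_RELOC_OBJECTID in the btrfs headers.
assert ctypes.c_uint64(-8).value == 18446744073709551608
```

So the dump is showing the ref-count bookkeeping for the relocation tree going negative, not an absurdly large real count.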

Below are my test scripts:

#!/bin/bash
DEVICE=/dev/sdb
TEST_MNT=/mnt
SLEEP=3

function run_snapshots()
{
	i=1
	while [ 1 ]
	do
		btrfs sub snapshot $TEST_MNT $TEST_MNT/snap_$i
		a=$(($i%10))
		if [ $a -eq 0 ]; then
			btrfs sub delete $TEST_MNT/snap_*
		fi
		((i++))
		sleep $SLEEP
	done
}

function run_compiling()
{
	while [ 1 ]
	do
		make -j4 -C $TEST_MNT/linux-btrfs
		make -C $TEST_MNT/linux-btrfs clean
	done
}

function run_balance()
{
	while [ 1 ]
	do
		btrfs balance start $TEST_MNT
		sleep $SLEEP
	done
}

run_snapshots &
run_compiling &
run_balance &

---cut---

Mount options:
/dev/sdb /mnt btrfs rw,relatime,space_cache 0 0

Here my /dev/sdb is 10G, and before compiling the kernel I ran ‘make allmodconfig’.
The above tests may detect more problems; after running for a while the system seemed
blocked, so I ran ‘echo w > /proc/sysrq-trigger’:

   1 [ 1970.909512] SysRq : Show Blocked State
   2 [ 1970.910490]   task                        PC stack   pid father
   3 [ 1970.910564] kworker/u128:9  D ffff880208a89a30     0  3514      2 0x00000080
   4 [ 1970.910587] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
   5 [ 1970.910590]  ffff8800b3dab8e8 0000000000000046 ffff8800b3dabfd8 00000000001d59c0
   6 [ 1970.910594]  00000000001d59c0 ffff880208a89a30 ffff8801f71c3460 ffff8802303d6360
   7 [ 1970.910597]  ffff88023ff509a8 ffff8800b3dab978 0000000000000002 ffffffff8178eda0
   8 [ 1970.910600] Call Trace:
   9 [ 1970.910606]  [<ffffffff8178eda0>] ? bit_wait+0x50/0x50
  10 [ 1970.910609]  [<ffffffff8178e56d>] io_schedule+0x9d/0x130
  11 [ 1970.910612]  [<ffffffff8178edcc>] bit_wait_io+0x2c/0x50
  12 [ 1970.910614]  [<ffffffff8178eb3b>] __wait_on_bit_lock+0x4b/0xb0
  13 [ 1970.910619]  [<ffffffff811aa2ef>] __lock_page+0xbf/0xe0
  14 [ 1970.910623]  [<ffffffff810caa90>] ? autoremove_wake_function+0x40/0x40
  15 [ 1970.910642]  [<ffffffffa043d9d0>] extent_write_cache_pages.isra.30.constprop.52+0x410/0x440 [btrfs]
  16 [ 1970.910645]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
  17 [ 1970.910648]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  18 [ 1970.910661]  [<ffffffffa043f92c>] extent_writepages+0x5c/0x90 [btrfs]
  19 [ 1970.910672]  [<ffffffffa04216a0>] ? btrfs_submit_direct+0x6b0/0x6b0 [btrfs]
  20 [ 1970.910674]  [<ffffffff810b7174>] ? local_clock+0x24/0x30
  21 [ 1970.910685]  [<ffffffffa041f008>] btrfs_writepages+0x28/0x30 [btrfs]
  22 [ 1970.910688]  [<ffffffff811b8a21>] do_writepages+0x21/0x50
  23 [ 1970.910692]  [<ffffffff8125f920>] __writeback_single_inode+0x40/0x540
  24 [ 1970.910694]  [<ffffffff81260425>] writeback_sb_inodes+0x275/0x520
  25 [ 1970.910697]  [<ffffffff8126076f>] __writeback_inodes_wb+0x9f/0xd0
  26 [ 1970.910700]  [<ffffffff81260a53>] wb_writeback+0x2b3/0x550
  27 [ 1970.910702]  [<ffffffff811b7e90>] ? bdi_dirty_limit+0x40/0xe0
  28 [ 1970.910705]  [<ffffffff812610d8>] bdi_writeback_workfn+0x1f8/0x650
  29 [ 1970.910711]  [<ffffffff8109c684>] process_one_work+0x1c4/0x640
  30 [ 1970.910713]  [<ffffffff8109c624>] ? process_one_work+0x164/0x640
  31 [ 1970.910716]  [<ffffffff8109cc1b>] worker_thread+0x11b/0x490
  32 [ 1970.910718]  [<ffffffff8109cb00>] ? process_one_work+0x640/0x640
  33 [ 1970.910721]  [<ffffffff810a2f1f>] kthread+0xff/0x120
  34 [ 1970.910724]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  35 [ 1970.910727]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  36 [ 1970.910730]  [<ffffffff8179537c>] ret_from_fork+0x7c/0xb0
  37 [ 1970.910732]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  38 [ 1970.910737] kworker/u128:20 D ffff8801f71c3460     0  8244      2 0x00000080
  39 [ 1970.910752] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
  40 [ 1970.910754]  ffff88020aa2b640 0000000000000046 ffff88020aa2bfd8 00000000001d59c0
  41 [ 1970.910757]  00000000001d59c0 ffff8801f71c3460 ffff880225e93460 7fffffffffffffff
  42 [ 1970.910760]  ffff880035763520 ffff880035763518 ffff880225e93460 ffff880201c44000
  43 [ 1970.910763] Call Trace:
  44 [ 1970.910766]  [<ffffffff8178e209>] schedule+0x29/0x70
  45 [ 1970.910769]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
  46 [ 1970.910772]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
  47 [ 1970.910775]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
  48 [ 1970.910777]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
  49 [ 1970.910780]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
  50 [ 1970.910790]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
  51 [ 1970.910802]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
  52 [ 1970.910811]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
  53 [ 1970.910821]  [<ffffffffa042365b>] cow_file_range_inline+0x49b/0x5e0 [btrfs]
  54 [ 1970.910824]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  55 [ 1970.910833]  [<ffffffffa0423aa3>] cow_file_range+0x303/0x450 [btrfs]
  56 [ 1970.910836]  [<ffffffff817945b7>] ? _raw_spin_unlock+0x27/0x40
  57 [ 1970.910845]  [<ffffffffa0424a88>] run_delalloc_range+0x338/0x370 [btrfs]
  58 [ 1970.910857]  [<ffffffffa043c5e9>] ? find_lock_delalloc_range+0x1e9/0x210 [btrfs]
  59 [ 1970.910859]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
  60 [ 1970.910870]  [<ffffffffa043c72c>] writepage_delalloc.isra.34+0x11c/0x180 [btrfs]
  61 [ 1970.910880]  [<ffffffffa043d2fa>] __extent_writepage+0xca/0x390 [btrfs]
  62 [ 1970.910883]  [<ffffffff811b6f49>] ? clear_page_dirty_for_io+0xc9/0x110
  63 [ 1970.910893]  [<ffffffffa043d93a>] extent_write_cache_pages.isra.30.constprop.52+0x37a/0x440 [btrfs]
  64 [ 1970.910895]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  65 [ 1970.910898]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
  66 [ 1970.910900]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
  67 [ 1970.910909]  [<ffffffffa043f92c>] extent_writepages+0x5c/0x90 [btrfs]
  68 [ 1970.910918]  [<ffffffffa04216a0>] ? btrfs_submit_direct+0x6b0/0x6b0 [btrfs]
  69 [ 1970.910928]  [<ffffffffa041f008>] btrfs_writepages+0x28/0x30 [btrfs]
  70 [ 1970.910930]  [<ffffffff811b8a21>] do_writepages+0x21/0x50
  71 [ 1970.910933]  [<ffffffff811ac7dd>] __filemap_fdatawrite_range+0x5d/0x80
  72 [ 1970.910936]  [<ffffffff811ac8ac>] filemap_flush+0x1c/0x20
  73 [ 1970.910945]  [<ffffffffa042271a>] btrfs_run_delalloc_work+0x5a/0xa0 [btrfs]
  74 [ 1970.910956]  [<ffffffffa044ec1f>] normal_work_helper+0x13f/0x5c0 [btrfs]
  75 [ 1970.910966]  [<ffffffffa044f0f2>] btrfs_flush_delalloc_helper+0x12/0x20 [btrfs]
  76 [ 1970.910969]  [<ffffffff8109c684>] process_one_work+0x1c4/0x640
  77 [ 1970.910971]  [<ffffffff8109c624>] ? process_one_work+0x164/0x640
  78 [ 1970.910976]  [<ffffffff8109cc1b>] worker_thread+0x11b/0x490
  79 [ 1970.910978]  [<ffffffff8109cb00>] ? process_one_work+0x640/0x640
  80 [ 1970.910981]  [<ffffffff810a2f1f>] kthread+0xff/0x120
  81 [ 1970.910983]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
  82 [ 1970.910986]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  83 [ 1970.910988]  [<ffffffff8179537c>] ret_from_fork+0x7c/0xb0
  84 [ 1970.910991]  [<ffffffff810a2e20>] ? kthread_create_on_node+0x250/0x250
  85 [ 1970.910997] btrfs           D ffff88022beab460     0 62979   2587 0x00000080
  86 [ 1970.911000]  ffff880054083870 0000000000000046 ffff880054083fd8 00000000001d59c0
  87 [ 1970.911004]  00000000001d59c0 ffff88022beab460 ffff88017f740000 7fffffffffffffff
  88 [ 1970.911007]  ffff8800546a9520 ffff8800546a9518 ffff88017f740000 ffff880201c44000
  89 [ 1970.911010] Call Trace:
  90 [ 1970.911012]  [<ffffffff8178e209>] schedule+0x29/0x70
  91 [ 1970.911015]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
  92 [ 1970.911018]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
  93 [ 1970.911021]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
  94 [ 1970.911023]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
  95 [ 1970.911026]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
  96 [ 1970.911035]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
  97 [ 1970.911047]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
  98 [ 1970.911059]  [<ffffffffa041e0c3>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
  99 [ 1970.911073]  [<ffffffffa0473cfe>] relocate_block_group+0x41e/0x690 [btrfs]
 100 [ 1970.911086]  [<ffffffffa0474148>] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs]
 101 [ 1970.911100]  [<ffffffffa04455a7>] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs]
 102 [ 1970.911102]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
 103 [ 1970.911105]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
 104 [ 1970.911118]  [<ffffffffa0435568>] ? btrfs_get_token_64+0x68/0x100 [btrfs]
 105 [ 1970.911132]  [<ffffffffa0448a8b>] __btrfs_balance+0x4eb/0x8d0 [btrfs]
 106 [ 1970.911146]  [<ffffffffa044928a>] btrfs_balance+0x41a/0x720 [btrfs]
 107 [ 1970.911159]  [<ffffffffa045112a>] ? btrfs_ioctl_balance+0x16a/0x530 [btrfs]
 108 [ 1970.911172]  [<ffffffffa045112a>] btrfs_ioctl_balance+0x16a/0x530 [btrfs]
 109 [ 1970.911186]  [<ffffffffa0456df8>] btrfs_ioctl+0x588/0x2cb0 [btrfs]
 110 [ 1970.911189]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
 111 [ 1970.911191]  [<ffffffff81024f39>] ? sched_clock+0x9/0x10
 112 [ 1970.911194]  [<ffffffff810b7175>] ? local_clock+0x25/0x30
 113 [ 1970.911197]  [<ffffffff810cfe7f>] ? up_read+0x1f/0x40
 114 [ 1970.911200]  [<ffffffff81067a84>] ? __do_page_fault+0x254/0x5b0
 115 [ 1970.911202]  [<ffffffff810d6a46>] ? __lock_acquire+0x396/0xbe0
 116 [ 1970.911206]  [<ffffffff81243830>] do_vfs_ioctl+0x300/0x520
 117 [ 1970.911209]  [<ffffffff8124fc6d>] ? __fget_light+0x13d/0x160
 118 [ 1970.911212]  [<ffffffff81243ad1>] SyS_ioctl+0x81/0xa0
 119 [ 1970.911217]  [<ffffffff8114a49c>] ? __audit_syscall_entry+0x9c/0xf0
 120 [ 1970.911220]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
 121 [ 1970.911228] as              D ffff880225e93460     0  6423   6421 0x00000080
 122 [ 1970.911231]  ffff880049657ad0 0000000000000046 ffff880049657fd8 00000000001d59c0
 123 [ 1970.911235]  00000000001d59c0 ffff880225e93460 ffff880225631a30 7fffffffffffffff
 124 [ 1970.911238]  ffff880035762b20 ffff880035762b18 ffff880225631a30 ffff880201c44000
 125 [ 1970.911241] Call Trace:
 126 [ 1970.911244]  [<ffffffff8178e209>] schedule+0x29/0x70
 127 [ 1970.911247]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
 128 [ 1970.911250]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 129 [ 1970.911252]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
 130 [ 1970.911255]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
 131 [ 1970.911258]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
 132 [ 1970.911268]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
 133 [ 1970.911280]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
 134 [ 1970.911292]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
 135 [ 1970.911304]  [<ffffffffa0423088>] btrfs_dirty_inode+0x78/0xe0 [btrfs]
 136 [ 1970.911307]  [<ffffffff8124cf55>] ? touch_atime+0xf5/0x160
 137 [ 1970.911319]  [<ffffffffa0423154>] btrfs_update_time+0x64/0xd0 [btrfs]
 138 [ 1970.911321]  [<ffffffff8124cdb5>] update_time+0x25/0xd0
 139 [ 1970.911323]  [<ffffffff8124cf79>] touch_atime+0x119/0x160
 140 [ 1970.911327]  [<ffffffff811acf34>] generic_file_read_iter+0x5f4/0x660
 141 [ 1970.911330]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 142 [ 1970.911332]  [<ffffffff81790ed6>] ? mutex_lock_nested+0x2d6/0x520
 143 [ 1970.911335]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
 144 [ 1970.911338]  [<ffffffff8122d82b>] new_sync_read+0x8b/0xd0
 145 [ 1970.911340]  [<ffffffff8122dfdb>] vfs_read+0x9b/0x180
 146 [ 1970.911343]  [<ffffffff8122ecf8>] SyS_read+0x58/0xd0
 147 [ 1970.911345]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
 148 [ 1970.911347] as              D ffff88022bf01a30     0  6433   6431 0x00000080
 149 [ 1970.911351]  ffff88016f3ffad0 0000000000000046 ffff88016f3fffd8 00000000001d59c0
 150 [ 1970.911354]  00000000001d59c0 ffff88022bf01a30 ffff8800ba1a3460 7fffffffffffffff
 151 [ 1970.911419]  ffff88017faa6820 ffff88017faa6818 ffff8800ba1a3460 ffff880201c44000
 152 [ 1970.911423] Call Trace:
 153 [ 1970.911426]  [<ffffffff8178e209>] schedule+0x29/0x70
 154 [ 1970.911429]  [<ffffffff81793621>] schedule_timeout+0x281/0x460
 155 [ 1970.911432]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 156 [ 1970.911435]  [<ffffffff817946ac>] ? _raw_spin_unlock_irq+0x2c/0x40
 157 [ 1970.911438]  [<ffffffff8178f89c>] wait_for_completion+0xfc/0x140
 158 [ 1970.911440]  [<ffffffff810b31c0>] ? wake_up_state+0x20/0x20
 159 [ 1970.911452]  [<ffffffffa0400ff7>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
 160 [ 1970.911464]  [<ffffffffa041d0c8>] __btrfs_end_transaction+0x208/0x390 [btrfs]
 161 [ 1970.911476]  [<ffffffffa041d260>] btrfs_end_transaction+0x10/0x20 [btrfs]
 162 [ 1970.911488]  [<ffffffffa0423088>] btrfs_dirty_inode+0x78/0xe0 [btrfs]
 163 [ 1970.911490]  [<ffffffff8124cf55>] ? touch_atime+0xf5/0x160
 164 [ 1970.911502]  [<ffffffffa0423154>] btrfs_update_time+0x64/0xd0 [btrfs]
 165 [ 1970.911505]  [<ffffffff8124cdb5>] update_time+0x25/0xd0
 166 [ 1970.911507]  [<ffffffff8124cf79>] touch_atime+0x119/0x160
 167 [ 1970.911510]  [<ffffffff811acf34>] generic_file_read_iter+0x5f4/0x660
 168 [ 1970.911513]  [<ffffffff810d47d5>] ? mark_held_locks+0x75/0xa0
 169 [ 1970.911516]  [<ffffffff81790ed6>] ? mutex_lock_nested+0x2d6/0x520
 170 [ 1970.911518]  [<ffffffff81024ec5>] ? native_sched_clock+0x35/0xa0
 171 [ 1970.911521]  [<ffffffff8122d82b>] new_sync_read+0x8b/0xd0
 172 [ 1970.911523]  [<ffffffff8122dfdb>] vfs_read+0x9b/0x180
 173 [ 1970.911526]  [<ffffffff8122ecf8>] SyS_read+0x58/0xd0
 174 [ 1970.911528]  [<ffffffff81795429>] system_call_fastpath+0x16/0x1b
 175 [ 1970.911530] ld              D ffff880225e93460     0  6435   6370 0x00000080
 176 [ 1970.911534]  ffff880049623ad0 0000000000000046 ffff880049623fd8 00000000001d59c0
 177 [ 1970.911537]  00000000001d59c0 ffff880225e93460 ffff8800364b4e90 7fffffffffffffff
 178 [ 1970.911541]  ffff880035762820 ffff880035762818 ffff8800364b4e90 ffff880201c44000
 179 [ 1970.911544] Call Trace:
 180 [ 1970.911547]  [<ffffffff8178e209>] schedule+0x29/0x7

It is easy to reproduce this problem using my scripts…


> On 10/18/2014 07:21 AM, Petr Janecek wrote:
>> Hello,
>> 
>>>>   so far I haven't succeeded running btrfs balance on a large
>>>> skinny-metadata fs -- segfault, kernel bug, reproducible.  No such
>>>> problems on ^skinny-metadata fs (same disks, same data).  Tried both
>>>> several times on 3.17.  More info in comments 10,14 in
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=64961
>>> 
>>> I can't reproduce this, how big is your home directory, and are you
>>> still seeing corruptions after just rsyncing to a clean fs?  Thanks,
>> 
>>   as I wrote in comment 10, it has improved since a year ago when I
>> reported it: I see no corruption at all, neither after rsync, nor after
>> balance crash: btrfs check doesn't find anything wrong, files look ok.
>> The only problem is that after adding a disk the balance segfaults on a
>> kernel bug and the fs gets stuck.  When I run balance again after
>> reboot, it makes only very little progress and crashes again the same
>> way.
>>
>>   There are some 2.5TB of data in 7.5M files on that fs.  And a couple
>> dozen ro snapshots -- I'm testing 3.17 + revert of 9c3b306e1c9e right
>> now, but it takes more than a day to copy the data and recreate all the
>> snapshots.  But a test with ^skinny-metadata showed no problems, so I
>> don't think I got bitten by that bug.
>> 
>>   I have btrfs-image of one of previous runs after crashed balance.
>> It's 15GB. I can place it somewhere with fast link, are you interested?
>> 
>> 
> 
> Yup, send me the link and I'll pull it down.  Thanks,
> 
> Josef
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-18 11:21   ` Petr Janecek
@ 2014-10-18 14:04     ` Josef Bacik
  2014-10-18 15:52       ` Wang Shilong
  0 siblings, 1 reply; 28+ messages in thread
From: Josef Bacik @ 2014-10-18 14:04 UTC (permalink / raw)
  To: Petr Janecek; +Cc: linux-btrfs, dsterba

On 10/18/2014 07:21 AM, Petr Janecek wrote:
> Hello,
>
>>>    so far I haven't succeeded running btrfs balance on a large
>>> skinny-metadata fs -- segfault, kernel bug, reproducible.  No such
>>> problems on ^skinny-metadata fs (same disks, same data).  Tried both
>>> several times on 3.17.  More info in comments 10,14 in
>>> https://bugzilla.kernel.org/show_bug.cgi?id=64961
>>
>> I can't reproduce this, how big is your home directory, and are you
>> still seeing corruptions after just rsyncing to a clean fs?  Thanks,
>
>    as I wrote in comment 10, it has improved since a year ago when I
> reported it: I see no corruption at all, neither after rsync, nor after
> balance crash: btrfs check doesn't find anything wrong, files look ok.
> The only problem is that after adding a disk the balance segfaults on a
> kernel bug and the fs gets stuck.  When I run balance again after
> reboot, it makes only very little progress and crashes again the same
> way.
>
>    There are some 2.5TB of data in 7.5M files on that fs.  And a couple
> dozen ro snapshots -- I'm testing 3.17 + revert of 9c3b306e1c9e right
> now, but it takes more than a day to copy the data and recreate all the
> snapshots.  But a test with ^skinny-metadata showed no problems, so I
> don't think I got bitten by that bug.
>
>    I have btrfs-image of one of previous runs after crashed balance.
> It's 15GB. I can place it somewhere with fast link, are you interested?
>
>

Yup, send me the link and I'll pull it down.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-17 18:25 ` Josef Bacik
@ 2014-10-18 11:21   ` Petr Janecek
  2014-10-18 14:04     ` Josef Bacik
  0 siblings, 1 reply; 28+ messages in thread
From: Petr Janecek @ 2014-10-18 11:21 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, dsterba

Hello,

> >   so far I haven't succeeded running btrfs balance on a large
> >skinny-metadata fs -- segfault, kernel bug, reproducible.  No such
> >problems on ^skinny-metadata fs (same disks, same data).  Tried both
> >several times on 3.17.  More info in comments 10,14 in
> >https://bugzilla.kernel.org/show_bug.cgi?id=64961
> 
> I can't reproduce this, how big is your home directory, and are you
> still seeing corruptions after just rsyncing to a clean fs?  Thanks,

  as I wrote in comment 10, it has improved since a year ago when I
reported it: I see no corruption at all, neither after rsync, nor after
balance crash: btrfs check doesn't find anything wrong, files look ok.
The only problem is that after adding a disk the balance segfaults on a
kernel bug and the fs gets stuck.  When I run balance again after
reboot, it makes only very little progress and crashes again the same
way.

  There are some 2.5TB of data in 7.5M files on that fs.  And a couple
dozen ro snapshots -- I'm testing 3.17 + revert of 9c3b306e1c9e right
now, but it takes more than a day to copy the data and recreate all the
snapshots.  But a test with ^skinny-metadata showed no problems, so I
don't think I got bitten by that bug.

  I have btrfs-image of one of previous runs after crashed balance.
It's 15GB. I can place it somewhere with fast link, are you interested?


Thanks,

Petr

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Poll: time to switch skinny-metadata on by default?
  2014-10-17 12:30 Petr Janecek
@ 2014-10-17 18:25 ` Josef Bacik
  2014-10-18 11:21   ` Petr Janecek
  0 siblings, 1 reply; 28+ messages in thread
From: Josef Bacik @ 2014-10-17 18:25 UTC (permalink / raw)
  To: Petr Janecek, linux-btrfs; +Cc: dsterba

On 10/17/2014 08:30 AM, Petr Janecek wrote:
> Hello,
>
>> the core of skinny-metadata feature has been merged in 3.10 (Jun 2013)
>> and has been reportedly used by many people. No major bugs were reported
>> lately unless I missed them.
>
>    so far I haven't succeeded running btrfs balance on a large
> skinny-metadata fs -- segfault, kernel bug, reproducible.  No such
> problems on ^skinny-metadata fs (same disks, same data).  Tried both
> several times on 3.17.  More info in comments 10,14 in
> https://bugzilla.kernel.org/show_bug.cgi?id=64961
>
>

I can't reproduce this, how big is your home directory, and are you 
still seeing corruptions after just rsyncing to a clean fs?  Thanks,

Josef


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Poll: time to switch skinny-metadata on by default?
@ 2014-10-17 12:30 Petr Janecek
  2014-10-17 18:25 ` Josef Bacik
  0 siblings, 1 reply; 28+ messages in thread
From: Petr Janecek @ 2014-10-17 12:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Hello,
 
> the core of skinny-metadata feature has been merged in 3.10 (Jun 2013)
> and has been reportedly used by many people. No major bugs were reported
> lately unless I missed them.

  so far I haven't succeeded running btrfs balance on a large
skinny-metadata fs -- segfault, kernel bug, reproducible.  No such
problems on ^skinny-metadata fs (same disks, same data).  Tried both
several times on 3.17.  More info in comments 10,14 in
https://bugzilla.kernel.org/show_bug.cgi?id=64961
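For context on what the skinny-metadata / ^skinny-metadata toggle actually changes: it sets an incompat feature bit (bit 8 of the superblock's incompat_flags), which is why kernels before 3.10 refuse to mount such a filesystem. A minimal sketch of decoding that field — the bit values are taken from the kernel's btrfs feature-flag definitions (BTRFS_FEATURE_INCOMPAT_*), and the helper names here are made up for illustration:

```python
# Incompat feature bits as defined in the kernel's btrfs headers;
# SKINNY_METADATA is bit 8.
INCOMPAT_BITS = {
    1 << 0: "mixed-backref",
    1 << 3: "compress-lzo",
    1 << 5: "big-metadata",
    1 << 6: "extended-iref",
    1 << 7: "raid56",
    1 << 8: "skinny-metadata",
    1 << 9: "no-holes",
}

def decode_incompat(flags):
    """Return the names of the feature bits set in an incompat_flags value."""
    return [name for bit, name in sorted(INCOMPAT_BITS.items()) if flags & bit]

def has_skinny_metadata(flags):
    # A kernel without support for this bit will refuse the mount.
    return bool(flags & (1 << 8))
```

A filesystem made with '-O ^skinny-metadata' simply leaves bit 8 clear; the rest of the difference is internal to how tree block backrefs are keyed in the extent tree.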


Regards,

Petr

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-10-27  7:51 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-16 11:33 Poll: time to switch skinny-metadata on by default? David Sterba
2014-10-20 16:34 ` David Sterba
2014-10-21  9:29   ` Duncan
2014-10-21 11:02     ` Austin S Hemmelgarn
2014-10-21 12:35       ` Konstantinos Skarlatos
2014-10-21 16:40     ` Rich Freeman
2014-10-22  2:08       ` Duncan
2014-10-22 12:49         ` Dave
2014-10-23  2:41           ` Duncan
2014-10-23 13:37             ` David Sterba
2014-10-23 14:47         ` Tobias Geerinckx-Rice
2014-10-24  1:33           ` Duncan
2014-10-25 12:24   ` Marc Joliet
2014-10-25 19:58     ` Marc Joliet
2014-10-27  1:30       ` Marc Joliet
2014-10-25 20:33     ` Chris Murphy
2014-10-25 20:35       ` Chris Murphy
2014-10-27  1:24         ` Marc Joliet
2014-10-27  7:50           ` Duncan
2014-10-27  4:39   ` Zygo Blaxell
2014-10-27  7:16     ` Duncan
2014-10-17 12:30 Petr Janecek
2014-10-17 18:25 ` Josef Bacik
2014-10-18 11:21   ` Petr Janecek
2014-10-18 14:04     ` Josef Bacik
2014-10-18 15:52       ` Wang Shilong
2014-10-18 15:53         ` Josef Bacik
2014-10-18 16:01           ` Wang Shilong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.