* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found] <CAJ3TwYQXqUZiKhYc5rciTmvGX1RLkHnkQb5SSYAJ7AD+kbudag@mail.gmail.com>
@ 2015-07-31  2:34 ` Qu Wenruo
  2015-07-31  4:10   ` John Ettedgui
       [not found]   ` <CAJ3TwYRN+1tJY+paz=qZT0_XP=r9CcTKbBgX_kZRFOWj8vSK=w@mail.gmail.com>
  0 siblings, 2 replies; 54+ messages in thread
From: Qu Wenruo @ 2015-07-31  2:34 UTC (permalink / raw)
  To: John Ettedgui, linux-btrfs, georgi-georgiev-btrfs



John Ettedgui wrote on 2015/07/29 18:55 +0000:
> Hello,
> I have the same issue and would like to add myself to this thread.
> My btrfs partition is about 10 TB on top of LVM2 and has been taking about a minute to mount for the past few months.
>
>> Qu Wenruo <quwenruo@cn.fujitsu.com> writes:
>>
>> Quite common, especially when it grows large.
>> But it would be much better to use ftrace to show which btrfs operation
>> takes the most time.
>
>
> I have got a trace file by running this command:
> trace-cmd record -e btrfs mount <PARTITION>
>
> Since it is fairly big for an email, I have gzipped it.
>
> Thanks!
> John
>
Hi John,
Thanks for the trace output.

But it seems that your root partition is also btrfs, so your systemd 
journal generates a lot of btrfs trace events of its own.

Would you mind re-collecting the ftrace output without the btrfs trace 
events caused by the logging system?

BTW, although I'm not quite familiar with ftrace, would you please 
consider collecting the ftrace output with the function_graph tracer?
That would help a lot in finding which operation takes the most time.
But it may trace too many things and be hard to read.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-07-31  2:34 ` mount btrfs takes 30 minutes, btrfs check runs out of memory Qu Wenruo
@ 2015-07-31  4:10   ` John Ettedgui
  2015-08-02  5:44     ` Georgi Georgiev
       [not found]   ` <CAJ3TwYRN+1tJY+paz=qZT0_XP=r9CcTKbBgX_kZRFOWj8vSK=w@mail.gmail.com>
  1 sibling, 1 reply; 54+ messages in thread
From: John Ettedgui @ 2015-07-31  4:10 UTC (permalink / raw)
  Cc: "linux-btrfs@vger.kernel.org", georgi-georgiev-btrfs

On Thu, Jul 30, 2015 at 7:34 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>
> Hi John,
> Thanks for the trace output.
You are welcome, thank you for looking at it!
>
> But it seems that, your root partition is also btrfs, causing a lot of btrfs
> trace from your systemd journal.
>
Oh yes, sorry about that.
I actually have 3 partitions using btrfs, the problematic one being the
only big one.
> Would you mind re-collecting the ftrace without such logging system caused
> btrfs trace?
Sure, how would I do that?
This is my first time using ftrace.
>
> BTW, although I'm not quite familiar with ftrace, would you please consider
> collect ftrace with function_graph tracer?
Sure, how would I do that one as well?
(I'll look these up in the meantime, I just want to make sure to not
give you something not useful again).
> That would help a lot to find which takes the most time.
> But it may trace too much things and maybe hard to read.
>
> Thanks,
> Qu

Great, thank you!


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]   ` <CAJ3TwYRN+1tJY+paz=qZT0_XP=r9CcTKbBgX_kZRFOWj8vSK=w@mail.gmail.com>
@ 2015-07-31  4:52     ` Qu Wenruo
       [not found]       ` <CAJ3TwYR5g-JhjmGnZUXqLXc7qV1_=AN5_6sj54JQODbtgG9Aag@mail.gmail.com>
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2015-07-31  4:52 UTC (permalink / raw)
  To: John Ettedgui, btrfs



John Ettedgui wrote on 2015/07/30 21:09 -0700:
> On Thu, Jul 30, 2015 at 7:34 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>> Hi John,
>> Thanks for the trace output.
> You are welcome, thank you for looking at it!
>>
>> But it seems that, your root partition is also btrfs, causing a lot of btrfs
>> trace from your systemd journal.
>>
> Oh yes sorry about that.
> I actually have 3 partition in btrfs, the problematic one being the
> only big one.
>> Would you mind re-collecting the ftrace without such logging system caused
>> btrfs trace?
> Sure, how would I do that?
> This is my first time using ftrace.

I'm not familiar with ftrace either, but your trace is already good 
enough; the only thing needed is to avoid running btrfs as the root 
partition (at least for /var/) while tracing.

My personal recommendation is to use a liveCD or rescue media to do the 
trace dump.

Another recommendation is to enable all btrfs trace points, and it seems 
that you already did that while collecting the trace.
>>
>> BTW, although I'm not quite familiar with ftrace, would you please consider
>> collect ftrace with function_graph tracer?
> Sure, how would I do that one as well?
> (I'll look these up in the meantime, I just want to make sure to not
> give you something not useful again).

This LWN article should help you, as I'm not so familiar with it either.

https://lwn.net/Articles/370423/
<The function_graph tracer> paragraph.

And the function to set as graph_function is btrfs_mount.

Thanks,
Qu

>> That would help a lot to find which takes the most time.
>> But it may trace too much things and maybe hard to read.
>>
>> Thanks,
>> Qu
>
> Great, thank you!
> John
>


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]       ` <CAJ3TwYR5g-JhjmGnZUXqLXc7qV1_=AN5_6sj54JQODbtgG9Aag@mail.gmail.com>
@ 2015-07-31  5:40         ` Qu Wenruo
  2015-07-31  5:45           ` John Ettedgui
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2015-07-31  5:40 UTC (permalink / raw)
  To: John Ettedgui; +Cc: btrfs



John Ettedgui wrote on 2015/07/30 22:15 -0700:
> On Thu, Jul 30, 2015 at 9:52 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> I'm not familiar with ftrace either, but your trace is good enough already,
>> the only thing needed is to avoid using btrfs as root partition(at least
>> /var/).
> I've stopped all journaling services for now, I hope that's enough.
>>
>> My personal recommendation is to use a liveCD or rescue media to do the
>> trace dump.
>>
> If not I'll have to do that, but this computer has no CD drive.
It seems that you're using Chromium while doing the dump. :)

If you have no CD drive, I'd recommend using the Archlinux installation 
iso to make a bootable USB stick and doing the dump from there
(just downloading and dd'ing it would do the trick),
as its kernel and tools are much newer than those of most distributions.

It would be better to provide two traces.
One is the function tracer one, with "btrfs:*" as set_event.
The other is the function_graph one, with "btrfs_mount" as 
set_graph_function.
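
For reference, the tracefs commands to produce both traces would look 
roughly like this (a sketch only, I have not verified this exact 
sequence; it needs root, and /dev/sdX and /mnt are placeholders):

```shell
#!/bin/sh
# Sketch only: collect the two requested btrfs mount traces via tracefs.
# Assumes tracefs is mounted at /sys/kernel/debug/tracing and the script
# runs as root; the device and mount point below are placeholders.
cd /sys/kernel/debug/tracing

# Trace 1: function tracer, with all btrfs trace events enabled.
echo function  > current_tracer
echo 'btrfs:*' > set_event
echo 1         > tracing_on
mount /dev/sdX /mnt
cat trace > /tmp/trace-function.txt
umount /mnt

# Trace 2: function_graph tracer rooted at btrfs_mount.
echo 0              > tracing_on
echo nop            > current_tracer
echo                > set_event
echo btrfs_mount    > set_graph_function
echo function_graph > current_tracer
echo 1              > tracing_on
mount /dev/sdX /mnt
cat trace > /tmp/trace-function_graph.txt
umount /mnt
```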

Thanks for your patience in helping to improve btrfs.
Although I may not be able to check the trace until next Monday... :(

Thanks,
Qu

>> Other recommendation is to enable all btrfs trace point, and it seems that
>> you have already done it while collecting the trace.
>>>>
>> This LWN article should help you, as I'm not so familiar with it either.
>>
>> https://lwn.net/Articles/370423/
>> <The function_graph tracer> paragraph.
>>
>> And the graph_function is btrfs_mount.
> That actually helped a lot!
> I've been trying to get it working since I sent the previous email,
> but never realized I needed to supply the function, and that's
> probably why it never worked (or used too much space before crashing)
>>
>> Thanks,
>> Qu
>>
> I hope this is better.
>
> John
>


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-07-31  5:40         ` Qu Wenruo
@ 2015-07-31  5:45           ` John Ettedgui
  2015-08-01  4:35             ` John Ettedgui
  0 siblings, 1 reply; 54+ messages in thread
From: John Ettedgui @ 2015-07-31  5:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Thu, Jul 30, 2015 at 10:40 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>

>
> It seems that you're using Chromium while doing the dump. :)
>
Oops, I did not think that would be an issue :/
> If no CD drive, I'll recommend to use Archlinux installation iso to make a
> bootable USB stick and do the dump.
> (just download and dd would do the trick)
> As its kernel and tools is much newer than most distribution.
Sure, that's the distribution I use anyway. :)
I should have a USB stick somewhere to try it.
>
> It's better to provide two trace.
> One is the function tracer one, with "btrfs:*" as set_event.
> The other is the function_graph one. with "btrfs_mount" as
> set_graph_function.
>
Oh I see, I will try that.
> Thanks for your patient to help improving btrfs.
Well, thank you for helping me out here!
> Although I may not be able to check the trace until next Monday... :(
>
Oh, that's fine; I can live with an extra minute per reboot for a few more days.
> Thanks,
> Qu
>
>
Thanks!
John


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-07-31  5:45           ` John Ettedgui
@ 2015-08-01  4:35             ` John Ettedgui
  2015-08-01 10:05               ` Russell Coker
  2015-08-04  1:39               ` Qu Wenruo
  0 siblings, 2 replies; 54+ messages in thread
From: John Ettedgui @ 2015-08-01  4:35 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Thu, Jul 30, 2015 at 10:45 PM, John Ettedgui <john.ettedgui@gmail.com> wrote:
> On Thu, Jul 30, 2015 at 10:40 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> It seems that you're using Chromium while doing the dump. :)
>> If no CD drive, I'll recommend to use Archlinux installation iso to make a
>> bootable USB stick and do the dump.
>> (just download and dd would do the trick)
>> As its kernel and tools is much newer than most distribution.
So I did not have any USB sticks large enough for this task (only 4 GB),
so I restarted into the emergency runlevel with only / mounted, read-only.
I hope that'll do.
>>
>> It's better to provide two trace.
>> One is the function tracer one, with "btrfs:*" as set_event.
>> The other is the function_graph one. with "btrfs_mount" as
>> set_graph_function.
So I got 2 new traces, and I am hoping that these are what you meant,
but I am still not sure.
Here are the commands I used, just in case:

trace-cmd record -o trace-function_graph.dat -p function_graph -g btrfs_mount mount MountPoint

and

trace-cmd record -o trace-function_graph.dat -p function -l 'btrfs_*' mount MountPoint
(using -e btrfs only led to a crash, but -l 'btrfs_*' passed, though I
am sure they have different purposes... I hope that's the correct one)

The first one was so big (2 GB) that I had to compress it with xz and
host it somewhere else; the ML would most likely not take it.
The other one is quite small, but I hosted it in the same place.
Here are the links:
https://mega.nz/#!8tgTjKyK!XJnWH05bsv9sJ3nANIxKsdkL20RePPS4cKgWSxit0eQ
https://mega.nz/#!xopkVA6L!z9xjo3us1Nv6wdOs05jNZdhNbiAP5yeLdneEp0huUzI

I hope that was it this time!
Thanks,
John


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-01  4:35             ` John Ettedgui
@ 2015-08-01 10:05               ` Russell Coker
  2015-08-04  1:39               ` Qu Wenruo
  1 sibling, 0 replies; 54+ messages in thread
From: Russell Coker @ 2015-08-01 10:05 UTC (permalink / raw)
  To: John Ettedgui, btrfs

On Sat, 1 Aug 2015 02:35:39 PM John Ettedgui wrote:
> >> It seems that you're using Chromium while doing the dump. :)
> >> If no CD drive, I'll recommend to use Archlinux installation iso to make
> >> a bootable USB stick and do the dump.
> >> (just download and dd would do the trick)
> >> As its kernel and tools is much newer than most distribution.
> 
> So I did not have any usb sticks large enough for this task (only 4Gb)
> so I restarted into emergency runlevel with only / mounted and as ro,
> I hope that'll do.

The Debian/Jessie Netinst image is about 120M and allows you to launch a 
shell.  If you want a newer kernel you could rebuild the Debian Netinst 
yourself.

Also a basic text-only Linux installation takes a lot less than 4G of storage.  
I have a couple of 1G USB sticks with Debian installed that I use to fix 
things.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-07-31  4:10   ` John Ettedgui
@ 2015-08-02  5:44     ` Georgi Georgiev
  0 siblings, 0 replies; 54+ messages in thread
From: Georgi Georgiev @ 2015-08-02  5:44 UTC (permalink / raw)
  To: John Ettedgui; +Cc: “linux-btrfs@vger.kernel.org”

Quoting John Ettedgui at 2015-07-30-21:10:27(-0700):
> On Thu, Jul 30, 2015 at 7:34 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> >
> > Hi John,
> > Thanks for the trace output.
> You are welcome, thank you for looking at it!
> >
> > But it seems that, your root partition is also btrfs, causing a lot of btrfs
> > trace from your systemd journal.
> >
> Oh yes sorry about that.
> I actually have 3 partition in btrfs, the problematic one being the
> only big one.
> > Would you mind re-collecting the ftrace without such logging system caused
> > btrfs trace?
> Sure, how would I do that?
> This is my first time using ftrace.
> >
> > BTW, although I'm not quite familiar with ftrace, would you please consider
> > collect ftrace with function_graph tracer?
> Sure, how would I do that one as well?

You can use set_ftrace_pid to trace only a single process (for example,
the mount command). There is a sample script I found in the ftrace
documentation that goes something like this:

# Run this from the tracefs directory
cd /sys/kernel/debug/tracing

# First disable tracing, to clear the trace buffer
echo nop            > current_tracer
echo 0              > tracing_on
echo 0              > tracing_enabled

# Then re-enable it after setting the filters
echo $$             > set_ftrace_pid
echo '*btrfs*'      > set_ftrace_filter
echo function_graph > current_tracer
echo 1              > tracing_enabled
echo 1              > tracing_on

# And finally *exec* the command to trace:
exec mount ....

I tried it, but the logs were way too large, and I was still fiddling
with the trace_options to set. If someone has good advice, we can try it
again.

-- 
Georgi


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-01  4:35             ` John Ettedgui
  2015-08-01 10:05               ` Russell Coker
@ 2015-08-04  1:39               ` Qu Wenruo
  2015-08-04  1:55                 ` John Ettedgui
  1 sibling, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2015-08-04  1:39 UTC (permalink / raw)
  To: John Ettedgui; +Cc: btrfs



John Ettedgui wrote on 2015/07/31 21:35 -0700:
> On Thu, Jul 30, 2015 at 10:45 PM, John Ettedgui <john.ettedgui@gmail.com> wrote:
>> On Thu, Jul 30, 2015 at 10:40 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>
>>> It seems that you're using Chromium while doing the dump. :)
>>> If no CD drive, I'll recommend to use Archlinux installation iso to make a
>>> bootable USB stick and do the dump.
>>> (just download and dd would do the trick)
>>> As its kernel and tools is much newer than most distribution.
> So I did not have any usb sticks large enough for this task (only 4Gb)
> so I restarted into emergency runlevel with only / mounted and as ro,
> I hope that'll do.
>>>
>>> It's better to provide two trace.
>>> One is the function tracer one, with "btrfs:*" as set_event.
>>> The other is the function_graph one. with "btrfs_mount" as
>>> set_graph_function.
> So I got 2 new traces, and I am hoping that these are what you meant,
> but I am still not sure.
> Here are the commands I used in case...:
>
> trace-cmd record -o
> trace-function_graph.dat -p function_graph -g btrfs_mount mount MountPoint
>
> and
>
> trace-function_graph.dat -p function -l 'btrfs_*' mount MountPoint
> (using -e btrfs only lead to a crash but -l 'btrfs_*' passed, though I
> am sure they have different purposes.. I hope that's the correct one)
>
> The first one was so big, 2Gb, I had to use xz to compress it and host
> it somewhere else, the ML would most likely not take it.
> The other one is quite small but I hosted it in the same place....
> Here are the links:
> https://mega.nz/#!8tgTjKyK!XJnWH05bsv9sJ3nANIxKsdkL20RePPS4cKgWSxit0eQ
> https://mega.nz/#!xopkVA6L!z9xjo3us1Nv6wdOs05jNZdhNbiAP5yeLdneEp0huUzI
>
> I hope that was it this time!
Oh, you were using trace-cmd; that's why the data is so huge.

I was originally hoping you would just copy the trace file, which is 
human-readable and not so huge.

But that's OK anyway.

I'll try to analyse it to find a clue if possible.

Thanks,
Qu
> Thanks,
> John
>


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04  1:39               ` Qu Wenruo
@ 2015-08-04  1:55                 ` John Ettedgui
  2015-08-04  2:31                   ` John Ettedgui
  2015-08-04  3:01                   ` Qu Wenruo
  0 siblings, 2 replies; 54+ messages in thread
From: John Ettedgui @ 2015-08-04  1:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Mon, Aug 3, 2015 at 6:39 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> Oh, you were using trace-cmd, that's why the data is so huge.
Oh, I thought it was just automating the work for me, but without any
sort of impact.
>
> I was originally hoping you just copy the trace file, which is human
> readable and not so huge.
If you mean something like the output of trace-cmd report, it was
actually bigger than the .dat files (about twice the size); that's why I
shared the .dat files instead.
If you want the reports instead, I'll gladly share them.
>
> But that's OK anyway.
>
> I'll try to analyse it to find a clue if possible.
>
> Thanks,
> Qu
Great thank you!

By the way, I just thought of a few things to mention.
This btrfs partition is an ext4-converted partition, and I hit the
same behavior as these guys under heavy load:
http://www.spinics.net/lists/linux-btrfs/msg44660.html
http://www.spinics.net/lists/linux-btrfs/msg44191.html
I don't think it's related to the crash, but maybe to the conversion?

Thanks Qu!
John


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04  1:55                 ` John Ettedgui
@ 2015-08-04  2:31                   ` John Ettedgui
  2015-08-04  3:01                   ` Qu Wenruo
  1 sibling, 0 replies; 54+ messages in thread
From: John Ettedgui @ 2015-08-04  2:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Mon, Aug 3, 2015 at 6:55 PM, John Ettedgui <john.ettedgui@gmail.com> wrote:
> On Mon, Aug 3, 2015 at 6:39 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> Oh, you were using trace-cmd, that's why the data is so huge.
> Oh, I thought it was just automating the work for me, but without any
> sort of impact.
>>
>> I was originally hoping you just copy the trace file, which is human
>> readable and not so huge.
> If you mean something like the ouput of trace-cmd report, it was
> actually bigger than the dat files (about twice the size) that's why I
> shared the dats instead.
> If you want the reports instead I'll gladly share them.
In case it helps, here are the reports instead of the .dat files:
https://mega.co.nz/#!FwpwHQyL!m0dQHSfQSNGzw9yUwJ6l0eb7Mzta0pOSAf1JHDZ1zfo
https://mega.co.nz/#!B1JgXLxZ!oI1bm0RyhqFbkCWnT95GNKohGozmvqxgJDSUtVdo77s

I guess once compressed, the size difference is meaningless.

Thanks,
John


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04  1:55                 ` John Ettedgui
  2015-08-04  2:31                   ` John Ettedgui
@ 2015-08-04  3:01                   ` Qu Wenruo
  2015-08-04  4:58                     ` John Ettedgui
  2015-08-04 14:38                     ` Chris Murphy
  1 sibling, 2 replies; 54+ messages in thread
From: Qu Wenruo @ 2015-08-04  3:01 UTC (permalink / raw)
  To: John Ettedgui; +Cc: btrfs



John Ettedgui wrote on 2015/08/03 18:55 -0700:
> On Mon, Aug 3, 2015 at 6:39 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> Oh, you were using trace-cmd, that's why the data is so huge.
> Oh, I thought it was just automating the work for me, but without any
> sort of impact.
>>
>> I was originally hoping you just copy the trace file, which is human
>> readable and not so huge.
> If you mean something like the ouput of trace-cmd report, it was
> actually bigger than the dat files (about twice the size) that's why I
> shared the dats instead.
> If you want the reports instead I'll gladly share them.
Nope, not the report, but /sys/kernel/debug/tracing/trace.

But that needs some manual operations, like setting the events and graph 
functions.
>>
>> But that's OK anyway.
>>
>> I'll try to analyse it to find a clue if possible.
>>
>> Thanks,
>> Qu
> Great thank you!
>
> By the way, I just thought of a few things to mention.
> This btrfs partition is an ext4 converted partition, and I hit the
> same behavior as these guys under heavy load:
> http://www.spinics.net/lists/linux-btrfs/msg44660.html
> http://www.spinics.net/lists/linux-btrfs/msg44191.html
> I don't think it's related to the crash, but maybe to the conversion?

Oh, converted...
That's too bad. :(

[[What's wrong with convert]]
Although btrfs is in theory flexible enough to fit itself into the free 
space of ext* and work fine, in practice an ext* layout is too 
fragmented by btrfs standards, not to mention that convert also enables 
mixed block groups.


[[Recommendations]]
I'd recommend deleting the ext*_img subvolume and rebalancing all chunks 
in the fs if you're sticking with the converted filesystem.

Although the best practice is to stay away from such a converted fs, 
either by using a pure, newly created btrfs, or by converting back to 
ext* before any balance.

[[But before that, just try something]]
But you have already provided some interesting facts. As the filesystem 
is highly fragmented, I'd like to recommend a little test
(BTW, I assume you don't use any special mount options)
to check whether it's the space cache causing the slow mount:

1) clear page cache
    # echo 3 > /proc/sys/vm/drop_caches
2) Do a normal mount
    Just as what you do as usual, with your normal mount options
    Record the mount time
3) umount it.
4) clear page cache
    # echo 3 > /proc/sys/vm/drop_caches
5) mount it with the "clear_cache" mount option
    It may take some time to clear the existing cache.
    This step only clears the space cache.
    Don't compare the mount time for this step!
6) umount it
7) clear page cache
    # echo 3 > /proc/sys/vm/drop_caches
8) mount with the "nospace_cache" mount option
    to see if there is an obvious change in mount time.

Hopefully it's the space cache causing the slow mount.
But don't expect too much; it's just a personal guess.
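
The whole sequence can be scripted roughly as follows (a sketch only, 
not tested here; /dev/sdX and /mnt/data are placeholders for your 
device and mount point, and it must be run as root):

```shell
#!/bin/sh
# Sketch of the space-cache mount test; device and mount point are
# placeholders, and the script must run as root.
DEV=/dev/sdX
MNT=/mnt/data

for OPTS in defaults clear_cache nospace_cache; do
    echo 3 > /proc/sys/vm/drop_caches       # steps 1/4/7: drop page cache
    echo "mounting with -o $OPTS"
    time mount -o "$OPTS" "$DEV" "$MNT"     # steps 2/5/8: timed mount
    umount "$MNT"                           # steps 3/6
done
# Note: the clear_cache timing is not comparable, since that mount also
# rebuilds the space cache.
```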

After the test, I'd recommend following the [[Recommendations]] if you 
just want a stable filesystem.

Thanks,
Qu

>
> Thanks Qu!
> John
>


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04  3:01                   ` Qu Wenruo
@ 2015-08-04  4:58                     ` John Ettedgui
  2015-08-04  6:47                       ` Duncan
  2015-08-04 11:28                       ` Austin S Hemmelgarn
  2015-08-04 14:38                     ` Chris Murphy
  1 sibling, 2 replies; 54+ messages in thread
From: John Ettedgui @ 2015-08-04  4:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs

On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Oh, converted...
> That's too bad. :(
>
> [[What's wrong with convert]]
> Although btrfs is flex enough in theory to fit itself into the free space of
> ext* and works fine,
> But in practice, ext* is too fragmental in the standard of btrfs, not to
> mention it also enables mixed-blockgroup.
>
Oh oh :/
>
> [[Recommendations]]
> I'd recommend to delete the ext*_img subvolume and rebalance all chunks in
> the fs if you're stick to the converted filesystem.
>
Already done (well, the rebalance crashed towards the end both times
with the read-only error, but someone on #btrfs looked at my partition
stats and said it was probably good enough)
> Although the best practice is staying away from such converted fs, either
> using pure, newly created btrfs, or convert back to ext* before any balance.
>
Unfortunately I don't have enough hard drive space to do a clean
btrfs, so my only way to use btrfs for that partition was a
conversion.
> [[But before that, just try something]]
> But you have already provided some interesting facts. As the filesystem is
> high fragmented, I'd like to recommend to do some little test:
> (BTW I assume you don't use some special mount options)
Current mount options in fstab:
btrfs   defaults,noatime,compress=lzo,space_cache,autodefrag    0       0
It's the same as my other btrfs partitions, apart from the fact that
they are on an SSD and way smaller.
> To test if it's the space cache causing the mount speed drop.
>
> 1) clear page cache
>    # echo 3 > /proc/sys/vm/drop_caches
> 2) Do a normal mount
>    Just as what you do as usual, with your normal mount options
>    Record the mount time
0.01s user 0.42s system 0% cpu 1:01.70 total
> 3) umount it.
not asked but might as well:
0.00s user 0.65s system 1% cpu 35.536 total
> 4) clear page cache
>    # echo 3 > /proc/sys/vm/drop_caches
> 5) mount it with "clear_cache" mount option
>    It may takes sometime to clear the existing cache.
>    It's just used to clear space cache.
>    Don't compare mount time!
Yes I know it's supposed to be slower :)
although... it was pretty much the same actually:
0.01s user 0.44s system 0% cpu 1:02.07 total
> 6) umount it

> 7) clear page cache
>    # echo 3 > /proc/sys/vm/drop_caches
Is it OK if that value never changed since step 1?
> 8) mount with "nospace_cache" mount option
>    To see if there is obvious mount time change.
>
0.00s user 0.44s system 0% cpu 1:01.86 total
> Hopes that's the space cache thing causing the slow mount.
> But don't expect it too much anyway, it's just one personal guess.
>
Unfortunately it is about the same :/
> After the test, I'd recommend to follow the [[Recommendations]] if you just
> want a stable filesystem.
>
I think I am already following these recommendations.

Thanks!


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04  4:58                     ` John Ettedgui
@ 2015-08-04  6:47                       ` Duncan
  2015-08-04 11:28                       ` Austin S Hemmelgarn
  1 sibling, 0 replies; 54+ messages in thread
From: Duncan @ 2015-08-04  6:47 UTC (permalink / raw)
  To: linux-btrfs

John Ettedgui posted on Mon, 03 Aug 2015 21:58:09 -0700 as excerpted:

> Current mount options in fstab:
> defaults,noatime,compress=lzo,space_cache,autodefrag 0 0



Just a few hints for a tidier fstab.  Feel free to ignore if you don't 
care, as the practical difference in mount options is nil.  =:^)

1) You should be able to delete that space_cache option.  Btrfs has 
defaulted to space_cache since at least 3.0 I think and probably way 
before that, and even when it wasn't the absolute default, you only had 
to enable it once, to have it on after that unless you turned it off 
again.

I know I've never specifically added space_cache to my mount options, 
yet /proc/mounts always has said it was there, and I've been on btrfs 
solidly since kernel 3.5 era, with tests before that (tho I do think I 
had to turn it on once, after which it stayed on for that filesystem, 
back in my earliest tests, which would have been late kernel 2.6 era).

2) Similarly you can omit defaults, since that's only a field placeholder 
in case you don't have any other options in that field.  As soon as you 
have your first non-default option holding the place of that field, you 
can omit defaults, since that's exactly what they are, defaults, 
regardless of whether the kernel is told to use them or not.

So all you really need there is noatime,compress=lzo,autodefrag.  FWIW, 
that's what I use as my normal mount options, too.

3) Actually, assuming you're running a half-way modern util-linux (which 
you should be if you're not on an old enterprise distro), you can omit 
the trailing 0 0 as well, since those fields are now optional and default 
to 0 if they aren't there.

See the fstab(5) manpage for more on the last two.
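
Put together, a trimmed entry would look something like this (the 
device and mount point are made-up placeholders):

```
/dev/sdX  /mnt/data  btrfs  noatime,compress=lzo,autodefrag
```

which mounts identically to the original line.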

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04  4:58                     ` John Ettedgui
  2015-08-04  6:47                       ` Duncan
@ 2015-08-04 11:28                       ` Austin S Hemmelgarn
  2015-08-04 17:36                         ` John Ettedgui
  1 sibling, 1 reply; 54+ messages in thread
From: Austin S Hemmelgarn @ 2015-08-04 11:28 UTC (permalink / raw)
  To: John Ettedgui, Qu Wenruo; +Cc: btrfs


On 2015-08-04 00:58, John Ettedgui wrote:
> On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> Although the best practice is staying away from such converted fs, either
>> using pure, newly created btrfs, or convert back to ext* before any balance.
>>
> Unfortunately I don't have enough hard drive space to do a clean
> btrfs, so my only way to use btrfs for that partition was a
> conversion.
If you could get your hands on a decent sized flash drive (32G or more), 
you could do an incremental conversion offline.  The steps would look 
something like this:

1. Boot the system into a LiveCD or something similar that doesn't need 
to run from your regular root partition (SystemRescueCD would be my 
personal recommendation, although if you go that way, make sure to boot 
the alternative kernel, as it's a lot newer than the standard ones).
2. Plug in the flash drive, format it as BTRFS.
3. Mount both your old partition and the flash drive somewhere.
4. Start copying files from the old partition to the flash drive.
5. When you hit ENOSPC on the flash drive, unmount the old partition, 
shrink it down to the minimum size possible, and create a new partition 
in the free space produced by doing so.
6. Add the new partition to the BTRFS filesystem on the flash drive.
7. Repeat steps 4-6 until you have copied everything.
8. Wipe the old partition, and add it to the BTRFS filesystem.
9. Run a full balance on the new BTRFS filesystem.
10. Delete the partition from step 5 that is closest to the old 
partition (via btrfs device delete), then resize the old partition to 
fill the space that the deleted partition took up.
11. Repeat steps 9-10 until the only remaining partitions in the new 
BTRFS filesystem are the old one and the flash drive.
12. Delete the flash drive from the BTRFS filesystem.

This takes some time and coordination, but it does work reliably as long 
as you are careful (I've done it before on multiple systems).
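
In rough commands, the core of the procedure looks something like this 
(a sketch only; /dev/sdb1 as the old partition, /dev/sdb2 as the 
temporary partition carved from the freed space, and /dev/sdc as the 
flash drive are all hypothetical names):

```shell
# Sketch of the incremental conversion; all device names are hypothetical.
mkfs.btrfs /dev/sdc                       # step 2: format the flash drive
mount /dev/sdb1 /mnt/old                  # step 3: mount both filesystems
mount /dev/sdc  /mnt/new
cp -a /mnt/old/some-dir /mnt/new/         # step 4: copy until ENOSPC
umount /mnt/old                           # step 5: then shrink /dev/sdb1 and
                                          #   create /dev/sdb2 in the gap
btrfs device add /dev/sdb2 /mnt/new       # step 6: grow the new filesystem
# ... repeat steps 4-6 until everything is copied ...
btrfs balance start /mnt/new              # step 9: rebalance across devices
btrfs device delete /dev/sdb2 /mnt/new    # step 10: free the temp partition
btrfs device delete /dev/sdc /mnt/new     # step 12: finally drop the flash drive
```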





* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04  3:01                   ` Qu Wenruo
  2015-08-04  4:58                     ` John Ettedgui
@ 2015-08-04 14:38                     ` Chris Murphy
  1 sibling, 0 replies; 54+ messages in thread
From: Chris Murphy @ 2015-08-04 14:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: btrfs, Hugo Mills

On Mon, Aug 3, 2015 at 9:01 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:

> Oh, converted...
> That's too bad. :(
>
> [[What's wrong with convert]]
> Although in theory btrfs is flexible enough to fit itself into the free space of
> ext* and work fine,
> in practice ext* is too fragmented by btrfs standards, not to
> mention that convert also enables mixed block groups.

There is an -f flag for mkfs to help users avoid accidents. Is there a
case to be made for btrfs-convert having either a -f flag, or an
interactive prompt along the lines of:

"Convert has limitations that could increase risk to data, please see
the wiki. Continue? y/n"
OR
"Convert has limitations, is not recommended for production usage,
please see the wiki. Continue? y/n"

It just seems users are jumping into convert without reading the wiki
warning. Is it a good idea to reduce problems for less experienced
users by actively discouraging btrfs-convert for production use?
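As a sketch of what the interactive variant could look like (hypothetical wrapper logic, not an existing btrfs-convert feature; the answer is read from stdin, and the wording mirrors the first suggested message):

```shell
#!/bin/sh
# Hypothetical confirmation gate for btrfs-convert; the tool itself has
# no such prompt at the time of writing.
confirm_convert() {
    printf '%s' 'Convert has limitations that could increase risk to data, please see the wiki. Continue? [y/N] '
    read -r ans
    case "$ans" in
        [yY]|[yY][eE][sS]) return 0 ;;
        *)                 return 1 ;;
    esac
}

# Simulate a user typing "y"; a real tool would read from the terminal.
if printf 'y\n' | confirm_convert; then
    echo 'user confirmed: would run btrfs-convert'
fi
```

Defaulting to "no" on anything but an explicit yes is the usual choice for destructive operations.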



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04 11:28                       ` Austin S Hemmelgarn
@ 2015-08-04 17:36                         ` John Ettedgui
  2015-08-05 11:30                           ` Austin S Hemmelgarn
  0 siblings, 1 reply; 54+ messages in thread
From: John Ettedgui @ 2015-08-04 17:36 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: Qu Wenruo, btrfs

On Tue, Aug 4, 2015 at 4:28 AM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2015-08-04 00:58, John Ettedgui wrote:
>>
>> On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>
>>> Although the best practice is staying away from such converted fs, either
>>> using pure, newly created btrfs, or convert back to ext* before any
>>> balance.
>>>
>> Unfortunately I don't have enough hard drive space to do a clean
>> btrfs, so my only way to use btrfs for that partition was a
>> conversion.
>
> If you could get your hands on a decent sized flash drive (32G or more), you
> could do an incremental conversion offline.  The steps would look something
> like this:
>
> 1. Boot the system into a LiveCD or something similar that doesn't need to
> run from your regular root partition (SystemRescueCD would be my personal
> recommendation, although if you go that way, make sure to boot the
> alternative kernel, as it's a lot newer then the standard ones).
> 2. Plug in the flash drive, format it as BTRFS.
> 3. Mount both your old partition and the flash drive somewhere.
> 4. Start copying files from the old partition to the flash drive.
> 5. When you hit ENOSPC on the flash drive, unmount the old partition, shrink
> it down to the minimum size possible, and create a new partition in the free
> space produced by doing so.
> 6. Add the new partition to the BTRFS filesystem on the flash drive.
> 7. Repeat steps 4-6 until you have copied everything.
> 8. Wipe the old partition, and add it to the BTRFS filesystem.
> 9. Run a full balance on the new BTRFS filesystem.
> 10. Delete the partition from step 5 that is closest to the old partition
> (via btrfs device delete), then resize the old partition to fill the space
> that the deleted partition took up.
> 11. Repeat steps 9-10 until the only remaining partitions in the new BTRFS
> filesystem are the old one and the flash drive.
> 12. Delete the flash drive from the BTRFS filesystem.
>
> This takes some time and coordination, but it does work reliably as long as
> you are careful (I've done it before on multiple systems).
>
>
I suppose I could do that even without the flash drive as I have some free
space anyway, but moving TBs of data with GBs of free space will take
days, plus the repartitioning. It'd probably be easier to start with a
1TB drive or something.
Is this currently my best bet, given that conversion is not as good as I thought?

I believe my other 2 partitions also come from conversion, though I
may have rebuilt them later from scratch.

Thank you!
John

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-04 17:36                         ` John Ettedgui
@ 2015-08-05 11:30                           ` Austin S Hemmelgarn
  2015-08-13 22:38                             ` Vincent Olivier
       [not found]                             ` <CAJ3TwYSW+SvbBrh1u_x+c3HTRx03qSR6BoH5cj_VzCXxZYv6EA@mail.gmail.com>
  0 siblings, 2 replies; 54+ messages in thread
From: Austin S Hemmelgarn @ 2015-08-05 11:30 UTC (permalink / raw)
  To: John Ettedgui; +Cc: Qu Wenruo, btrfs


On 2015-08-04 13:36, John Ettedgui wrote:
> On Tue, Aug 4, 2015 at 4:28 AM, Austin S Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2015-08-04 00:58, John Ettedgui wrote:
>>>
>>> On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>>
>>>> Although the best practice is staying away from such converted fs, either
>>>> using pure, newly created btrfs, or convert back to ext* before any
>>>> balance.
>>>>
>>> Unfortunately I don't have enough hard drive space to do a clean
>>> btrfs, so my only way to use btrfs for that partition was a
>>> conversion.
>>
>> If you could get your hands on a decent sized flash drive (32G or more), you
>> could do an incremental conversion offline.  The steps would look something
>> like this:
>>
>> 1. Boot the system into a LiveCD or something similar that doesn't need to
>> run from your regular root partition (SystemRescueCD would be my personal
>> recommendation, although if you go that way, make sure to boot the
>> alternative kernel, as it's a lot newer then the standard ones).
>> 2. Plug in the flash drive, format it as BTRFS.
>> 3. Mount both your old partition and the flash drive somewhere.
>> 4. Start copying files from the old partition to the flash drive.
>> 5. When you hit ENOSPC on the flash drive, unmount the old partition, shrink
>> it down to the minimum size possible, and create a new partition in the free
>> space produced by doing so.
>> 6. Add the new partition to the BTRFS filesystem on the flash drive.
>> 7. Repeat steps 4-6 until you have copied everything.
>> 8. Wipe the old partition, and add it to the BTRFS filesystem.
>> 9. Run a full balance on the new BTRFS filesystem.
>> 10. Delete the partition from step 5 that is closest to the old partition
>> (via btrfs device delete), then resize the old partition to fill the space
>> that the deleted partition took up.
>> 11. Repeat steps 9-10 until the only remaining partitions in the new BTRFS
>> filesystem are the old one and the flash drive.
>> 12. Delete the flash drive from the BTRFS filesystem.
>>
>> This takes some time and coordination, but it does work reliably as long as
>> you are careful (I've done it before on multiple systems).
>>
>>
> I suppose I could do that even without the flash as I have some free
> space anyway, but moving Tbs of data with Gbs of free space will take
> days, plus the repartitioning. It'd probably be easier to start with a
> 1Tb drive or something.
> Is this currently my best bet as conversion is not as good as I thought?
>
> I believe my other 2 partitions also come from conversion, though I
> may have rebuilt them later from scratch.
>
> Thank you!
> John
>
Yeah, you're probably better off getting a TB disk and starting with 
that.  In theory it is possible to automate the process, but I would 
advise against that if at all possible, it's a lot easier to recover 
from an error if you're doing it manually.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-05 11:30                           ` Austin S Hemmelgarn
@ 2015-08-13 22:38                             ` Vincent Olivier
  2015-08-13 23:19                               ` Chris Murphy
       [not found]                             ` <CAJ3TwYSW+SvbBrh1u_x+c3HTRx03qSR6BoH5cj_VzCXxZYv6EA@mail.gmail.com>
  1 sibling, 1 reply; 54+ messages in thread
From: Vincent Olivier @ 2015-08-13 22:38 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I think I might be having this problem too. 12 x 4TB RAID10 (original mkfs, not converted from ext or whatnot). Says it has ~6TiB left. CentOS 7. Dual Xeon CPU. 32GB RAM. ELRepo kernel 4.1.5. Fstab options: noatime,autodefrag,compress=zlib,space_cache,nossd,noauto,x-systemd.automount

Sometimes (not all the time) when I cd or ls the mount point, it does not return within 5 minutes (I never let it run longer than that before rebooting); after a reboot it takes 10-30s. As I'm writing this, it's already been more than 10 minutes.
 
I don't have the problem when I mount manually without the "noauto,x-systemd.automount" options.
 
Can anyone help?
 
Thanks.

Vincent


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-13 22:38                             ` Vincent Olivier
@ 2015-08-13 23:19                               ` Chris Murphy
  2015-08-14  0:30                                 ` Duncan
  2015-08-14  2:39                                 ` Vincent Olivier
  0 siblings, 2 replies; 54+ messages in thread
From: Chris Murphy @ 2015-08-13 23:19 UTC (permalink / raw)
  To: Btrfs BTRFS

On Thu, Aug 13, 2015 at 4:38 PM, Vincent Olivier <vincent@up4.com> wrote:
> Hi,
>
> I think I might be having this problem too. 12 x 4TB RAID10 (original makefs, not converted from ext or whatnot). Says it has ~6TiB left. Centos 7. Dual Xeon CPU. 32GB RAM. ELRepo Kernel 4.1.5. Fstab options: noatime,autodefrag,compress=zlib,space_cache,nossd,noauto,x-systemd.automount

Well, I think others have suggested that with ~3000 snapshots quite a few
things get very slow. But then you also have autodefrag, and I
forget how that interacts with many snapshots since the snapshot-aware
defrag code was removed.

I'd say file a bug with the full details of the hardware from the
ground up to the Btrfs file system. And include as an attachment,
dmesg with sysrq+t during this "hang". Usually t is asked for if
there's just slowness/delays, and w if there's already a kernel
message saying a task has been blocked for 120 seconds.
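The capture Chris describes can be scripted roughly as below. It needs root on the real system; here every command is only echoed through a `run` wrapper so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Echo-only sketch of collecting sysrq+t/w output for a bug report.
run() { echo "+ $*"; }

run sh -c 'echo t > /proc/sysrq-trigger'  # t: dump stacks of all tasks
run dmesg -T                              # capture the dump; attach it to the bug
# If the kernel already reported "blocked for more than 120 seconds":
run sh -c 'echo w > /proc/sysrq-trigger'  # w: dump only blocked tasks
```

On the real system, drop the `run` wrapper and redirect the dmesg output to a file for attachment.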


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-13 23:19                               ` Chris Murphy
@ 2015-08-14  0:30                                 ` Duncan
  2015-08-14  2:42                                   ` Vincent Olivier
  2015-08-14  2:39                                 ` Vincent Olivier
  1 sibling, 1 reply; 54+ messages in thread
From: Duncan @ 2015-08-14  0:30 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Thu, 13 Aug 2015 17:19:41 -0600 as excerpted:

> Well I think others have suggested 3000 snapshots and quite a few things
> will get very slow. But then also you have autodefrag and I forget the
> interaction of this with many snapshots since the snapshot aware defrag
> code was removed.

Autodefrag shouldn't have any snapshot-related interaction with mount time, 
now that snapshot-aware defrag is disabled.  The interaction between defrag 
(auto or not) and snapshots is additional data space usage: with 
snapshot-aware defrag disabled, defrag only works on the current copy, 
forcing it to COW the extents elsewhere while the old extents are not 
freed, as they're still referenced by the snapshots.  But it shouldn't 
affect mount time.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-13 23:19                               ` Chris Murphy
  2015-08-14  0:30                                 ` Duncan
@ 2015-08-14  2:39                                 ` Vincent Olivier
  1 sibling, 0 replies; 54+ messages in thread
From: Vincent Olivier @ 2015-08-14  2:39 UTC (permalink / raw)
  To: linux-btrfs

I have 2 snapshots a few days apart for incrementally backing up the volume but that's it.

I'll try without autodefrag tomorrow.

Vincent


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-14  0:30                                 ` Duncan
@ 2015-08-14  2:42                                   ` Vincent Olivier
  2015-08-18 17:36                                     ` Vincent Olivier
  0 siblings, 1 reply; 54+ messages in thread
From: Vincent Olivier @ 2015-08-14  2:42 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

I'll try without autodefrag anyways tomorrow just to make sure.

And then file a bug report too with however it decides to behave.

Vincent



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-08-14  2:42                                   ` Vincent Olivier
@ 2015-08-18 17:36                                     ` Vincent Olivier
  0 siblings, 0 replies; 54+ messages in thread
From: Vincent Olivier @ 2015-08-18 17:36 UTC (permalink / raw)
  To: Duncan, linux-btrfs

It appears that it might be related to label/UUID fstab boot mounting instead.

When I mount manually without the "noauto,x-systemd.automount" options, using the first device I get from "btrfs fi show" after a "btrfs device scan", I never get the problem.

Does this sound familiar? I thought I was safe with UUID mounts in fstab…

I can (temporarily) live with manually mounting this filesystem, but I would appreciate being able to mount it at boot time via fstab…

Thanks,

Vincent



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]                             ` <CAJ3TwYSW+SvbBrh1u_x+c3HTRx03qSR6BoH5cj_VzCXxZYv6EA@mail.gmail.com>
@ 2016-07-15  3:56                               ` Qu Wenruo
       [not found]                                 ` <CAJ3TwYRXwDVVfT0TRRiM9dEw-7TvY8qG=WvMYKczZOv6wkFWAQ@mail.gmail.com>
  2016-07-15 11:29                                 ` Christian Rohmann
  0 siblings, 2 replies; 54+ messages in thread
From: Qu Wenruo @ 2016-07-15  3:56 UTC (permalink / raw)
  To: John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs

Sorry for the late reply.

[Slow mount]
In fact we have also reproduced the same problem, and found the cause.

It's related to the size of extent tree.

If the extent tree is large enough, mount needs to do quite a lot of IO 
to read out all the block group items.
And such reads are small random reads (the default leaf size is just 16K); 
considering the per-GB cost, spinning rust is the normal choice for such a 
large fs, which makes small random reads even slower.


The good news is, we have patch to slightly speedup the mount, by 
avoiding reading out unrelated tree blocks.

In our test environment, it takes 15% less time to mount a fs filled 
with 16K files (2T of used space).

https://patchwork.kernel.org/patch/9021421/


And given that only the extent tree size is related to the 
problem, any method to reduce the extent tree size will help, including 
defrag and nodatacow.
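For reference, those two mitigations could be applied roughly as follows. This is an echo-only sketch: `/mnt/data` is a placeholder mount point, commands are printed rather than executed, and note that nodatacow as a mount option only affects data written after it takes effect.

```shell
#!/bin/sh
# Echo-only sketch of the two extent-tree-shrinking mitigations;
# /mnt/data is a placeholder.
run() { echo "+ $*"; }

run btrfs filesystem defragment -r /mnt/data  # merge small extents via defrag
run mount -o remount,nodatacow /mnt/data      # avoid CoW-driven fragmentation
```

Whether remount picks up nodatacow may depend on the kernel version; setting it in fstab and remounting cleanly is the safer route.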

[Btrfsck OOM]
Lu Fengqi is developing a low-memory-usage mode for btrfsck.
It's not merged into mainline btrfs-progs and not fully complete, but it 
shows quite positive results for large fs.

It may need some time to get stable, but IMHO it's going in the right 
direction.

Thanks,
Qu



At 07/12/2016 04:31 AM, John Ettedgui wrote:
> On Wed, Aug 5, 2015 at 4:30 AM Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:
>> Yeah, you're probably better off getting a TB disk and starting with
>> that. In theory it is possible to automate the process, but I would
>> advise against that if at all possible, it's a lot easier to recover
>> from an error if you're doing it manually.
>
> Hello,
>
> Has there been any progress on this issue?
>
> My btrfs partitions are now all cleanly made, not converted from ext4,
> and yet they still take up to 30s to mount. Interestingly they're all
> about the same size, but some take quite a bit longer than others... I guess
> the differences are FS-related.
>
>
> Thank you!
> John



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]                                 ` <CAJ3TwYRXwDVVfT0TRRiM9dEw-7TvY8qG=WvMYKczZOv6wkFWAQ@mail.gmail.com>
@ 2016-07-15  5:24                                   ` Qu Wenruo
  2016-07-15  6:56                                     ` Kai Krakow
       [not found]                                     ` <CAJ3TwYSTnQfj=qmBLtnmtXQKexMMD4x=9Gk3p3anf4uF+G26kw@mail.gmail.com>
  0 siblings, 2 replies; 54+ messages in thread
From: Qu Wenruo @ 2016-07-15  5:24 UTC (permalink / raw)
  To: John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs



At 07/15/2016 12:39 PM, John Ettedgui wrote:
On Thu, Jul 14, 2016 at 8:56 PM Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>     Sorry for the late reply.
>
> Oh it's all good, it's only been a few days.
>
>     [Slow mount]
>     In fact we also reproduce the same problem, and found the problem.
>
> Awesome!
>
>     It's related to the size of extent tree.
>
>     If the extent tree is large enough, mount needs to do quite a lot of IO
>     to read out all block group items.
>     And such read is random small read (default leaf size is just 16K), and
>     considering the per GB cost, spinning rust is the normal choice for such
>     large fs, which makes random small read even more slower.
>
>
>     The good news is, we have patch to slightly speedup the mount, by
>     avoiding reading out unrelated tree blocks.
>
>     In our test environment, it takes 15% less time to mount a fs filled
>     with 16K files(2T used space).
>
>     https://patchwork.kernel.org/patch/9021421/
>
>
> Great, I will try this and report on it.
>
>     And according to the facts that only extent size is related to the
>     problem, any method to reduce extent tree size will help, including
>     defrag, nodatacow.
>
> Would increasing the leaf size help as well?
It may help.
But we didn't test it, and since the leafsize can only be set at mkfs 
time, it's not an easy thing to try.

> nodatacow seems unsafe
Nodatacow is not that unsafe, as btrfs will still do data COW when it's 
needed, e.g. when rewriting data shared with another subvolume/snapshot.

That would be one of the most obvious method if you do a lot of rewrite.

> as for defrag, all my partitions are already on
> autodefrag, so I assume that should be good. Or is manual once in a
> while a good idea as well?
AFAIK autodefrag will only help if you're doing appending writes.

A manual defrag will help more, but since btrfs has problems defragmenting 
extents shared by different subvolumes, I doubt the effect if you have a 
lot of subvolumes/snapshots.


Another method is to disable compression.
With compression, the file extent size upper limit is 128K, while in the 
non-compressed case it's 128M.

So the same 1G file would need 8192 (8K) extents with 
compression, but only 8 extents without compression.
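As a quick sanity check of those numbers (taking 1G as 1 GiB), the arithmetic is:

```shell
#!/bin/sh
# Extent counts for a 1 GiB file, using the limits quoted above:
# compressed extents are capped at 128K, uncompressed ones at 128M.
gib=$((1 << 30))
echo "compressed:   $((gib / (128 * 1024))) extents"          # 8192, the "8K"
echo "uncompressed: $((gib / (128 * 1024 * 1024))) extents"   # 8
```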

>
> Is there a way to display the tree size? that would help knowing what
> worked and what didn't.

You can dump the whole extent tree to get the accurate size:

# btrfs-debug-tree -t 2 <your dev> > some_file

It may be quite long, so output redirection is highly recommended.
You can do it online (mounted), but if the fs is very large, it's 
recommended to do it offline (unmounted), or at least make sure there is 
not much write activity while it's mounted.

Check the first few lines, and you can already get the overall size:

------
btrfs-progs v4.6.1
extent tree key (EXTENT_TREE ROOT_ITEM 0)
node 30441472 level *1* items 41 free 452 generation 7 owner 2
------

If the level is high (7 is the highest possible value), it's almost 
certain that's the problem.

For the accurate space size, use the following script to get the number 
of extent tree blocks:

------
$ egrep -e "^node" -e "^leaf" some_file | wc -l
------

Then multiply it by the nodesize, and you get the accurate size of the extent tree.
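Putting the count and the multiplication together, the whole check looks like this. The three-line dump below is fabricated so the arithmetic is runnable anywhere; substitute your real `btrfs-debug-tree -t 2` output for `some_file`, and your filesystem's actual nodesize (the 16K here is an assumption of the default).

```shell
#!/bin/sh
# Fabricated stand-in for a `btrfs-debug-tree -t 2 <dev> > some_file` dump.
printf '%s\n' \
  'node 30441472 level 1 items 41 free 452 generation 7 owner 2' \
  'leaf 30457856 items 50 free space 100 generation 7 owner 2' \
  'leaf 30474240 items 48 free space 220 generation 7 owner 2' > some_file

nodesize=16384  # assumption: the default 16K nodesize; verify with dump-super
blocks=$(grep -c -E '^(node|leaf)' some_file)
echo "extent tree: $blocks blocks, $((blocks * nodesize)) bytes"
```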

Thanks,
Qu
>
>
>     [Btrfsck OOM]
>     Lu Fengqi is developing btrfsck low memory usage mode.
>     It's not merged into mainline btrfs progs and not fully completely, but
>     shows quite positive result for large fs.
>
>     It may needs sometime to get it stable, but IMHO it's going the right
>     direction.
>
> Well that is great news as well, thank you for sharing it!
>
>     Thanks,
>     Qu
>
>
> Thank you!
> John



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-15  5:24                                   ` Qu Wenruo
@ 2016-07-15  6:56                                     ` Kai Krakow
       [not found]                                     ` <CAJ3TwYSTnQfj=qmBLtnmtXQKexMMD4x=9Gk3p3anf4uF+G26kw@mail.gmail.com>
  1 sibling, 0 replies; 54+ messages in thread
From: Kai Krakow @ 2016-07-15  6:56 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 15 Jul 2016 13:24:45 +0800,
Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:

> > as for defrag, all my partitions are already on
> > autodefrag, so I assume that should be good. Or is manual once in a
> > while a good idea as well?  
> AFAIK autodefrag will only help if you're doing appending write.
> 
> Manual one will help more, but since btrfs has problem defraging
> extents shared by different subvolumes, I doubt the effect if you
> have a lot of subvolumes/snapshots.

"btrfs fi defrag" is said to only defrag metadata if you point it
at directories only, without recursion. That could maybe help this case
without unsharing the extents:

find /btrfs-subvol0 -type d -print0 | xargs -0 btrfs fi defrag

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-15  3:56                               ` Qu Wenruo
       [not found]                                 ` <CAJ3TwYRXwDVVfT0TRRiM9dEw-7TvY8qG=WvMYKczZOv6wkFWAQ@mail.gmail.com>
@ 2016-07-15 11:29                                 ` Christian Rohmann
  2016-07-16 23:53                                   ` Qu Wenruo
  1 sibling, 1 reply; 54+ messages in thread
From: Christian Rohmann @ 2016-07-15 11:29 UTC (permalink / raw)
  To: Qu Wenruo, John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs

Hey Qu, all

On 07/15/2016 05:56 AM, Qu Wenruo wrote:
> 
> The good news is, we have patch to slightly speedup the mount, by
> avoiding reading out unrelated tree blocks.
> 
> In our test environment, it takes 15% less time to mount a fs filled
> with 16K files(2T used space).
> 
> https://patchwork.kernel.org/patch/9021421/

I have a 30TB RAID6 filesystem with compression on and I've seen mount
times of up to 20 minutes (!).

I don't want to sound unfair, but while a 15% improvement is good, it's not
in the league where BTRFS needs to be.
Do I understand your comments correctly that further improvement would
require a change to the on-disk format?



Thanks and with regards

Christian

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-15 11:29                                 ` Christian Rohmann
@ 2016-07-16 23:53                                   ` Qu Wenruo
  2016-07-18 13:42                                     ` Josef Bacik
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2016-07-16 23:53 UTC (permalink / raw)
  To: Christian Rohmann, Qu Wenruo, John Ettedgui, Austin S Hemmelgarn
  Cc: btrfs, Chris Mason, David Sterba, Josef Bacik



On 07/15/2016 07:29 PM, Christian Rohmann wrote:
> Hey Qu, all
>
> On 07/15/2016 05:56 AM, Qu Wenruo wrote:
>>
>> The good news is, we have patch to slightly speedup the mount, by
>> avoiding reading out unrelated tree blocks.
>>
>> In our test environment, it takes 15% less time to mount a fs filled
>> with 16K files(2T used space).
>>
>> https://patchwork.kernel.org/patch/9021421/
>
> I have a 30TB RAID6 filesystem with compression on and I've seen mount
> times of up to 20 minutes (!).
>
> I don't want to sound unfair, but 15% improvement is good, but not in
> the league where BTRFS needs to be.
> Do I understand you comments correctly that further improvement would
> result in a change of the on-disk format?

Yes, that's the case.

The problem is that we put BLOCK_GROUP_ITEMs into the extent tree, along 
with tons of EXTENT_ITEMs/METADATA_ITEMs.

This makes searching for BLOCK_GROUP_ITEMs very, very slow if the extent 
tree is really big.

On the other hand, searching for CHUNK_ITEMs is very fast, because 
CHUNK_ITEMs are in their own tree.
(CHUNK_ITEM and BLOCK_GROUP_ITEM are 1:1 mapped.)

So to completely fix it, btrfs needs an on-disk format change to put 
BLOCK_GROUP_ITEMs into their own tree.

IMHO there may be some objection from other devs, though.

Anyway, I've added the three maintainers to Cc, and hope we can get a 
better idea of how to fix it.

Thanks,
Qu
>
>
>
> Thanks and with regards
>
> Christian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]                                       ` <CAJ3TwYTnMPVwkrZEU-=Q_Nq+9Bn0vM3z+EFC8RP=RTyaufSoqw@mail.gmail.com>
@ 2016-07-18  1:13                                         ` Qu Wenruo
       [not found]                                           ` <CAJ3TwYRpc_R-wVur0T6+Uy_aPVXTGpvp_ag1Ar9K2HoB0H1ySQ@mail.gmail.com>
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2016-07-18  1:13 UTC (permalink / raw)
  To: John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs



At 07/16/2016 07:17 PM, John Ettedgui wrote:
> On Thu, Jul 14, 2016 at 10:54 PM John Ettedgui <john.ettedgui@gmail.com
> <mailto:john.ettedgui@gmail.com>> wrote:
>
>     On Thu, Jul 14, 2016 at 10:26 PM Qu Wenruo <quwenruo@cn.fujitsu.com
>     <mailto:quwenruo@cn.fujitsu.com>> wrote:
>
>
>         > Would increasing the leaf size help as well?
>
>         > nodatacow seems unsafe
>
>
>         Nodatacow is not that unsafe, as btrfs will still do data cow if
>         it's
>         needed, like rewriting data of another subvolume/snapshot.
>
>     Alright.
>
>         That would be one of the most obvious method if you do a lot of
>         rewrite.
>
>         > as for defrag, all my partitions are already on
>         > autodefrag, so I assume that should be good. Or is manual once
>         in a
>         > while a good idea as well?
>         AFAIK autodefrag will only help if you're doing appending write.
>
>         Manual one will help more, but since btrfs has problem defraging
>         extents
>         shared by different subvolumes, I doubt the effect if you have a
>         lot of
>         subvolumes/snapshots.
>
>     I don't have any subvolume/snapshot for the big partitions, my usage
>     there is fairly simple. I'll have to add a regular defrag job then.
>
>
>         Another method is to disable compression.
>         For compression, file extent size up limit is 128K, while for
>         non-compress case, it's 128M.
>
>         So for the same 1G sized file, it would cause 8K extents using
>         compression, while only 8 extents without compression.
>
>     Now that might be something important, I do use LZO compression on
>     all of them.
>     Does this limit apply to only compressed files, or any file if the
>     fs is mounted using the compression option?
>     Would mounting these partitions without compression option and then
>     defragmenting them reverse the compression?
>
> I've tried this on the slowest-to-mount partition.
> I changed its mount option to compression=no, then ran defrag and balance.
> Not sure if the latter was needed, but I thought I'd try... as in the
> past, it worked fine up to dusage=99, but with 100% I get a crash, oh well.
> The result of defrag + nocompress (I don't know how much it actually
> decompressed, or whether it changed the limit Qu mentioned before) is about
> 26% less time spent mounting the partition, and it's no longer my slowest
> partition to mount!

Well, compression=no only affects writes done after mounting with that 
option.
And balance won't help to convert compressed extents to non-compressed ones.

But maybe the defrag will convert them to normal extents.

The best method to de-compress them is to read them out and rewrite 
them while mounted with the compression=no option.
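
A rough sketch of doing that rewrite in bulk (my own helper, not from this 
thread: the directory handling, temp-file naming, and cp/mv approach are 
all assumptions, it is not atomic, and it should only be run on files 
nothing else is writing to):

```shell
# Sketch: rewrite every regular file under a directory in place, so the
# rewritten data lands uncompressed (assumes the fs has already been
# remounted with -o compress=no).  Each file is copied to a temp name and
# moved back over the original, so don't run it against live data.
decompress_rewrite() {
    find "$1" -type f ! -name '*.rewrite.*' -exec sh -c '
        for f in "$@"; do
            tmp="$f.rewrite.$$"
            cp -p -- "$f" "$tmp" && mv -- "$tmp" "$f"
        done
    ' sh {} +
}
```

e.g. `decompress_rewrite /mnt/data` after remounting /mnt/data with 
compress=no.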

>
> I'll try just defragmenting another partition, but keeping the
> compression on, and see what difference I get there from the same changes.
>
> I've tried the patch, which applied fine to my kernel (4.6.4), but I
> don't see any difference in mount time; maybe I made a mistake, or my
> issue is not really the same?

It's quite possible that there is another problem causing the slow mount.

The best way to verify is to ftrace the btrfs mount.
Here is the script I used to test my patch:

------
#!/bin/bash

trace_dir=/sys/kernel/debug/tracing

init_trace () {
	echo 0 > $trace_dir/tracing_on
	echo > $trace_dir/trace
	echo function_graph > $trace_dir/current_tracer
	echo > $trace_dir/set_ftrace_filter

	echo open_ctree			>> $trace_dir/set_ftrace_filter
	echo btrfs_read_chunk_tree	>> $trace_dir/set_ftrace_filter
	echo btrfs_read_block_groups	>> $trace_dir/set_ftrace_filter

	# This will generate tons of trace, better to comment it out
	echo find_block_group		>> $trace_dir/set_ftrace_filter

	echo 1 > $trace_dir/tracing_on
}

end_trace () {
	cp $trace_dir/trace $(dirname $0)
	echo 0 > $trace_dir/tracing_on
	echo > $trace_dir/set_ftrace_filter
	echo > $trace_dir/trace
}

init_trace
echo start mounting
time mount /dev/sdb /mnt/test
echo mount done
end_trace
------

After executing the script, you'll find a file named "trace" in the same 
directory as the script.

The content will be like:
------
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
  1) $ 7670856 us  |  open_ctree [btrfs]();
  2) * 13533.45 us |    btrfs_read_chunk_tree [btrfs]();
  2) # 1320.981 us |    btrfs_init_space_info [btrfs]();
  2)               |    btrfs_read_block_groups [btrfs]() {
  2) * 10127.35 us |      find_block_group [btrfs]();
  2)   4.951 us    |      find_block_group [btrfs]();
  2) * 26225.17 us |      find_block_group [btrfs]();
......
  3) * 26450.28 us |      find_block_group [btrfs]();
  3) * 11590.29 us |      find_block_group [btrfs]();
  3) $ 7557210 us  |    } /* btrfs_read_block_groups [btrfs] */ <<<
------

Here you can see that open_ctree(), the main part of btrfs mount, 
takes about 7.67 seconds to execute, while btrfs_read_block_groups() 
takes 7.56 seconds, about 98.6% of the open_ctree() execution time.

If your results are much the same as mine, then it's the same problem.

And after applying my patch, please compare the execution time of 
btrfs_read_block_groups() to see if there is any obvious (>5%) change.
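
As a convenience, the two durations can be pulled out of the resulting 
"trace" file with a bit of awk (my own helper, not part of the script 
above; it assumes the function_graph layout shown, where each duration is 
the field immediately before "us"):

```shell
# Extract the open_ctree and btrfs_read_block_groups durations (in us)
# from a function_graph trace file and print their ratio.
ratio_from_trace() {
    awk '
        {
            d = ""
            for (i = 2; i <= NF; i++)    # duration is the field before "us"
                if ($i == "us") d = $(i - 1)
            if (d == "") next            # entry lines carry no duration
            if (index($0, "open_ctree"))              oc = d
            if (index($0, "btrfs_read_block_groups")) bg = d
        }
        END { if (oc + 0 > 0) printf "%.3f\n", bg / oc }
    ' "$1"
}
```

e.g. `ratio_from_trace trace` prints roughly 0.985 for the sample output 
above.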

Thanks,
Qu

>
> Thank you,
> John



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]                                           ` <CAJ3TwYRpc_R-wVur0T6+Uy_aPVXTGpvp_ag1Ar9K2HoB0H1ySQ@mail.gmail.com>
@ 2016-07-18  8:41                                             ` Qu Wenruo
       [not found]                                               ` <CAJ3TwYRH8JVkuv2Hu7FYb+BSwKGrq1spx079zwOF_FO1y=9NFA@mail.gmail.com>
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2016-07-18  8:41 UTC (permalink / raw)
  To: John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs



At 07/18/2016 04:20 PM, John Ettedgui wrote:
> On Sun, Jul 17, 2016 at 6:14 PM Qu Wenruo <quwenruo@cn.fujitsu.com
> <mailto:quwenruo@cn.fujitsu.com>> wrote:
>
>
>     Well, compression=no only affects any write after the mount option.
>     And balance won't help to convert compressed extent to
>     non-compressed one.
>
>     But maybe the defrag will convert them to normal extents.
>
>     The best method to de-compress them is, to read them out and rewrite
>     them with compression=no mount option.
>
> Right, I just don't have the extra storage for that right now, though I
> suppose I could do it little by little; manually that would take a
> really long time, so I went with the defrag route :)
>
>
>     >
>     > I'll try just defragmenting another partition but keeping the
>     > compression on and see what difference I get there the same changes.
>     >
>
> So following that, another partition got its mounting time reduced by
> about 70% by running a manual defrag (I kept compression on and used
> -clzo for this defragmentation).
> So maybe a manual defrag is really the best thing to do so far.

Seems to be the case.

For further investigation, it would be quite nice if you could upload the 
output of a "btrfs-debug-tree -t 2" dump of your fs, both before and 
after. It doesn't contain anything meaningful (no file names/relations, 
only extent allocation info), so it should be quite safe to upload.

I'm really surprised by the mount time reduction, especially considering 
that in the compression case the max extent size is limited to 128K; 
IMHO defrag shouldn't help much.
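
Incidentally, to get a feel for what is in such a dump before uploading 
it, a rough item count works (a sketch of mine; it only assumes the key 
type names appear literally in the btrfs-debug-tree output, as they do in 
the versions I've seen):

```shell
# Count EXTENT_ITEM / METADATA_ITEM / BLOCK_GROUP_ITEM keys in a saved
# "btrfs-debug-tree -t 2" dump, most frequent type first.
count_extent_items() {
    grep -oE 'EXTENT_ITEM|METADATA_ITEM|BLOCK_GROUP_ITEM' "$1" \
        | sort | uniq -c | sort -rn
}
```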


>
>     > I've tried the patch, which applied fine to my kernel (4.6.4) but I
>     > don't see any difference in mounting time, maybe I made a mistake
>     or my
>     > issue is not really the same?
>
>     Pretty possible that there is another problem causing the slow mount.
>
>     The best method to verify is to do a ftrace on the btrfs mount.
>     Here is the script I tested my patch:
>
>     ....
>     ....
>
> Thank you for the script, that makes it a lot easier for me!
>
>     And you can see open_ctree() function, the main part of btrfs mount,
>     takes about 7.67 seconds to execute, while btrfs_read_block_groups()
>     takes 7.56 seconds, about 98.6% of the open_ctree() executing time.
>
>     If your result are much the same as mine, then that's the same problem.
>
> They are similar, 99% is spent in btrfs_read_block_groups.
>
>     And after applying my patch, please try to compare the executing time of
>     btrfs_read_block_groups() to see if there is any obvious(>5%) change.
>
> Here's what I have for one partition:
>
> no patch:
> open_ctree: 16952419
> btrfs_read_block_groups: 16844453
> ratio: 0.9936312333950689
>
> patch:
> open_ctree: 16680173
> btrfs_read_block_groups: 16600532
> ratio: 0.9952254092328659
>
> ratio no patch/patch: 0.9983981761086407

OK, almost no improvement. So in your case, most BLOCK_GROUP_ITEMs are 
not at the tail of an extent tree leaf.
In our test environment, it seems that quite a few BLOCK_GROUP_ITEMs 
are at the tail of an extent tree leaf, which makes the improvement quite 
obvious.

Anyway, if we can change the on-disk format to introduce a dedicated 
block group item tree, then I assume the mount time would drop to 
less than 5 seconds.

Thanks,
Qu
>
> Thank you,
> John
>



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]                                               ` <CAJ3TwYRH8JVkuv2Hu7FYb+BSwKGrq1spx079zwOF_FO1y=9NFA@mail.gmail.com>
@ 2016-07-18  9:07                                                 ` Qu Wenruo
  2016-07-18 15:31                                                   ` Duncan
       [not found]                                                   ` <CAJ3TwYS6UTkWf=PNku3RG7hPrXMKz3yhk2WqCRLix4v_VwgrmA@mail.gmail.com>
  0 siblings, 2 replies; 54+ messages in thread
From: Qu Wenruo @ 2016-07-18  9:07 UTC (permalink / raw)
  To: John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs



At 07/18/2016 04:53 PM, John Ettedgui wrote:
> On Mon, Jul 18, 2016 at 1:42 AM Qu Wenruo <quwenruo@cn.fujitsu.com
> <mailto:quwenruo@cn.fujitsu.com>> wrote:
>
>
>     > So following that, another partition got its mounting time reduced by
>     > about 70% by running a manual defrag (I kept compression on and used
>     > -clzo for this defragmentation).
>     > So maybe a manual defrag is really the best thing to do so far.
>
>     Seems to be the case.
>
>     For further investigation, it would be quite nice for you to upload the
>     output of "btrfs-debug-tree -t 2" dump of your fs.
>     Both before and after, and it doesn't containing anything meaningful(no
>     file name/relation, only extent allocation info), so it's should be
>     quite safe to upload.
>
> What do you mean by before and after?
> Before defragmentation?

Yes, to compare the extent size and verify my assumption.

But I'm afraid you don't have any fs with that slow mount time any more.

>
>     Since I'm really surprised on the mount time reduce, especially
>     considering the fact that for compression case, max extent size is
>     limited to 128K, IMHO defrag won't help much.
>
> Is the 128K limit for the whole FS, or only for files that btrfs deemed
> worth compressing? If it's the latter, that could explain why defrag helped.

The latter. But the 128K limit is not on the compressed size, but on the 
raw size.

So no matter what the compressed size is, any extent whose uncompressed 
size is larger than 128K will be split.

The main reason I'm surprised about the mount time reduction is that, 
considering the sectorsize (4K for x86_64 and x86), the fragment count 
shouldn't increase too much.
The smallest extent size is determined by the sectorsize (4K for most 
arches).
The compressed extent upper limit is 128K; 4K -> 128K is only a factor 
of 32.
For the non-compressed case, the extent size upper limit is 128M:
32K (32768) times the sector size, or 1024 times the compressed extent 
limit.

So I'm quite surprised that defrag helps so much.
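
The arithmetic above, spelled out (using the 128K and 128M limits from 
this thread and a 4K sectorsize):

```shell
# Worst-case extent counts for a 1 GiB file under the two extent size
# limits discussed above.
file_size=$(( 1024 * 1024 * 1024 ))    # 1 GiB
comp_limit=$(( 128 * 1024 ))           # 128K max extent, compressed
nocomp_limit=$(( 128 * 1024 * 1024 ))  # 128M max extent, non-compressed
sector=4096                            # 4K sectorsize

echo "compressed:   $(( file_size / comp_limit )) extents"    # -> 8192
echo "uncompressed: $(( file_size / nocomp_limit )) extents"  # -> 8
echo "limits are $(( comp_limit / sector ))x and $(( nocomp_limit / sector ))x the sectorsize"
```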

>
>
>     >     And after applying my patch, please try to compare the
>     executing time of
>     >     btrfs_read_block_groups() to see if there is any obvious(>5%)
>     change.
>     >
>     > Here's what I have for one partition:
>     >
>     > no patch:
>     > open_ctree: 16952419
>     > btrfs_read_block_groups: 16844453
>     > ratio: 0.9936312333950689
>     >
>     > patch:
>     > open_ctree: 16680173
>     > btrfs_read_block_groups: 16600532
>     > ratio: 0.9952254092328659
>     >
>     > ratio no patch/patch: 0.9983981761086407
>
>     OK, almost no improvement. So in your case, most BLOCK_GROUP_ITEMS are
>     not at the tail of a extent tree leaf.
>     And in our test environment, it seems that quite some BLOCK_GROUPS_ITEMS
>     are at the tail of an extent tree leaf, and make the improvement quite
>     obvious.
>
>     But anyway, if we can change the on-disk format to introduce a specific
>     block group items tree, then I assume the mount time would reduce to
>     less than 5 seconds.
>
> Less than 5 seconds without regular defrag would be nice.
> It would be even nicer to be able to convert from one format to another
> without needing to do it at mkfs time, but I don't know how feasible that
> would be.

If it's possible, it may work just like METADATA_ITEM (the 
skinny_metadata feature), and in that case the time reduction will depend 
on how many BLOCK_GROUP_ITEMs are in the new tree.

Thanks,
Qu
>
> Another option would be to use something like bcache to have the extent
> tree on a SSD while the data stays on the HD. No idea how feasible that
> would be though...
>
> Thank you,
> John



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-16 23:53                                   ` Qu Wenruo
@ 2016-07-18 13:42                                     ` Josef Bacik
  2016-07-19  0:35                                       ` Qu Wenruo
  2016-07-25 13:01                                       ` David Sterba
  0 siblings, 2 replies; 54+ messages in thread
From: Josef Bacik @ 2016-07-18 13:42 UTC (permalink / raw)
  To: Qu Wenruo, Christian Rohmann, Qu Wenruo, John Ettedgui,
	Austin S Hemmelgarn
  Cc: btrfs, Chris Mason, David Sterba

On 07/16/2016 07:53 PM, Qu Wenruo wrote:
>
>
> On 07/15/2016 07:29 PM, Christian Rohmann wrote:
>> Hey Qu, all
>>
>> On 07/15/2016 05:56 AM, Qu Wenruo wrote:
>>>
>>> The good news is, we have patch to slightly speedup the mount, by
>>> avoiding reading out unrelated tree blocks.
>>>
>>> In our test environment, it takes 15% less time to mount a fs filled
>>> with 16K files(2T used space).
>>>
>>> https://patchwork.kernel.org/patch/9021421/
>>
>> I have a 30TB RAID6 filesystem with compression on and I've seen mount
>> times of up to 20 minutes (!).
>>
>> I don't want to sound unfair, but 15% improvement is good, but not in
>> the league where BTRFS needs to be.
>> Do I understand you comments correctly that further improvement would
>> result in a change of the on-disk format?
>
> Yes, that's the case.
>
> The problem is, we put BLOCK_GROUP_ITEM into extent tree, along with tons of
> EXTENT_ITEM/METADATA_ITEM.
>
> This makes search for BLOCK_GROUP_ITEM very very very slow if extent tree is
> really big.
>
> On the handle, we search CHUNK_ITEM very very fast, because CHUNK_ITEM are in
> their own tree.
> (CHUNK_ITEM and BLOCK_GROUP_ITEM are 1:1 mapped)
>
> So to completely fix it, btrfs needs on-disk format change to put
> BLOCK_GROUP_ITEM into their own tree.
>
> IMHO there maybe be some objection from other devs though.
>
> Anyway, I add the three maintainers to Cc, and hopes we can get a better idea to
> fix it.

Yeah I'm going to fix this when I do the per-block group extent tree thing.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-18  9:07                                                 ` Qu Wenruo
@ 2016-07-18 15:31                                                   ` Duncan
       [not found]                                                   ` <CAJ3TwYS6UTkWf=PNku3RG7hPrXMKz3yhk2WqCRLix4v_VwgrmA@mail.gmail.com>
  1 sibling, 0 replies; 54+ messages in thread
From: Duncan @ 2016-07-18 15:31 UTC (permalink / raw)
  To: linux-btrfs

Qu Wenruo posted on Mon, 18 Jul 2016 17:07:47 +0800 as excerpted:

>>     Since I'm really surprised on the mount time reduce, especially
>>     considering the fact that for compression case, max extent size is
>>     limited to 128K, IMHO defrag won't help much.
>>
>> Is the 128K limit for the whole FS or only for files that btrfs deemed
>> worth to compress? If it's the latter, that could explain why defrag
>> helped.
> 
> The latter. But the 128K is not for compressed size, but raw size.
> 
> So no matter the compressed size, any extent whose uncompressed size is
> larger than 128K will be split.
> 
> The main reason I'm surprised about the mount time reduce, is that
> considering the sectorsize (4K for x86_64 and x86), the fragments won't
> increase too much.
> The smallest extent size is determined by sectorsize(4K for most arch).
> Compressed extent up limit is 128K,  4K -> 128K is only 32 times. While
> for non-compress case, its extent size up limit is 128M.
> 32K times larger than sector size, or 1024 times larger than compressed
> extent size.
> 
> So I'm quite surprised that defrag helps so much.

[I'm only seeing your posts here, not his (yet?), so I'm only seeing what 
you quote from his posts and may be missing part of the story.  
Nevertheless...]

I think what he's referring to is that he's only running compress, not 
compress-force, and presumably not autodefrag.  So there's probably a lot 
of uncompressed files that were heavily fragmented and thus in many 
extents, that a defrag run helped consolidate, thus reducing mount time 
substantially.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-18 13:42                                     ` Josef Bacik
@ 2016-07-19  0:35                                       ` Qu Wenruo
  2016-07-25 13:01                                       ` David Sterba
  1 sibling, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2016-07-19  0:35 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, Christian Rohmann, John Ettedgui,
	Austin S Hemmelgarn
  Cc: btrfs, Chris Mason, David Sterba



At 07/18/2016 09:42 PM, Josef Bacik wrote:
> On 07/16/2016 07:53 PM, Qu Wenruo wrote:
>>
>>
>> On 07/15/2016 07:29 PM, Christian Rohmann wrote:
>>> Hey Qu, all
>>>
>>> On 07/15/2016 05:56 AM, Qu Wenruo wrote:
>>>>
>>>> The good news is, we have patch to slightly speedup the mount, by
>>>> avoiding reading out unrelated tree blocks.
>>>>
>>>> In our test environment, it takes 15% less time to mount a fs filled
>>>> with 16K files(2T used space).
>>>>
>>>> https://patchwork.kernel.org/patch/9021421/
>>>
>>> I have a 30TB RAID6 filesystem with compression on and I've seen mount
>>> times of up to 20 minutes (!).
>>>
>>> I don't want to sound unfair, but 15% improvement is good, but not in
>>> the league where BTRFS needs to be.
>>> Do I understand you comments correctly that further improvement would
>>> result in a change of the on-disk format?
>>
>> Yes, that's the case.
>>
>> The problem is, we put BLOCK_GROUP_ITEM into extent tree, along with
>> tons of
>> EXTENT_ITEM/METADATA_ITEM.
>>
>> This makes search for BLOCK_GROUP_ITEM very very very slow if extent
>> tree is
>> really big.
>>
>> On the handle, we search CHUNK_ITEM very very fast, because CHUNK_ITEM
>> are in
>> their own tree.
>> (CHUNK_ITEM and BLOCK_GROUP_ITEM are 1:1 mapped)
>>
>> So to completely fix it, btrfs needs on-disk format change to put
>> BLOCK_GROUP_ITEM into their own tree.
>>
>> IMHO there maybe be some objection from other devs though.
>>
>> Anyway, I add the three maintainers to Cc, and hopes we can get a
>> better idea to
>> fix it.
>
> Yeah I'm going to fix this when I do the per-block group extent tree
> thing.  Thanks,
>
> Josef

Awesome!

Can't wait to see the implementation.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]                                                   ` <CAJ3TwYS6UTkWf=PNku3RG7hPrXMKz3yhk2WqCRLix4v_VwgrmA@mail.gmail.com>
@ 2016-07-21  8:10                                                     ` Qu Wenruo
       [not found]                                                       ` <CAJ3TwYQ47SVpbO1Pb-TWjhaTCCpMFFmijwTgmV8=7+1_a6_3Ww@mail.gmail.com>
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2016-07-21  8:10 UTC (permalink / raw)
  To: John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs

Thanks for the info, pretty helpful.

After a simple analysis, the defrag did do a pretty good job.

-----------------------------------------------------------------------
           | Avg Extent size | Median Extent size | Data Extents      |
-----------------------------------------------------------------------
Predefrag  | 2.6M            | 512K               | 1043589           |
Postdefrag | 7.4M            | 80K                | 359823            |
-----------------------------------------------------------------------

Defrag reduced the number of extents to 34%!

Quite awesome.

That said, I still see quite a lot of small extents (in fact, there are 
more small extents after defrag), so I assume there is room for more 
improvement.

But since the mount time is only affected by the number of extents (data 
and meta, and the amount of meta is not affected by defrag), the 
improvement is already quite obvious.

Much more obvious than I expected.
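
For anyone reproducing this kind of analysis, avg/median figures like the 
ones above can be computed with a small helper (my own sketch, not how the 
numbers in this mail were produced; it expects one extent size in bytes 
per line, extracted beforehand from the debug-tree dump):

```shell
# Print count, average, and median of extent sizes read from a file
# containing one size (in bytes) per line.
extent_stats() {
    sort -n "$1" | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            med = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "count=%d avg=%.0f median=%.0f\n", NR, sum / NR, med
        }'
}
```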

Thanks,
Qu

At 07/20/2016 06:44 PM, John Ettedgui wrote:
> On Mon, Jul 18, 2016 at 2:07 AM Qu Wenruo <quwenruo@cn.fujitsu.com
> <mailto:quwenruo@cn.fujitsu.com>> wrote:
>
>     Yes, to compare the extent size and verify my assumption.
>
>     But I'm afraid you don't have any fs with that slow mount time any more.
>
>
> Here are 2 links for the information you requested, I've gzipped each
> file as it was quite big...
>
> https://mega.nz/#!QhQSHBhb!RwN3kDBK6ZOkq3e5UkNhzB0XnbfgZgql4c5fvjfDq1w
> <https://mega.nz/#%21QhQSHBhb%21RwN3kDBK6ZOkq3e5UkNhzB0XnbfgZgql4c5fvjfDq1w>
> https://mega.nz/#!M5gVAbLA!S_TxIls1_q6MqMVlCRK5XxTXifXPE76tdJWsf5XRxYE
> <https://mega.nz/#%21M5gVAbLA%21S_TxIls1_q6MqMVlCRK5XxTXifXPE76tdJWsf5XRxYE>
>
> I didn't look at their content, but just comparing their size, there is
> quite a difference there.
>
> Thank you,
> John



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
       [not found]                                                       ` <CAJ3TwYQ47SVpbO1Pb-TWjhaTCCpMFFmijwTgmV8=7+1_a6_3Ww@mail.gmail.com>
@ 2016-07-21  8:19                                                         ` Qu Wenruo
  2016-07-21 15:47                                                           ` Graham Cobb
  2018-02-13 10:21                                                           ` John Ettedgui
  0 siblings, 2 replies; 54+ messages in thread
From: Qu Wenruo @ 2016-07-21  8:19 UTC (permalink / raw)
  To: John Ettedgui, Austin S Hemmelgarn; +Cc: btrfs



At 07/21/2016 04:13 PM, John Ettedgui wrote:
> On Thu, Jul 21, 2016 at 1:10 AM Qu Wenruo <quwenruo@cn.fujitsu.com
> <mailto:quwenruo@cn.fujitsu.com>> wrote:
>
>     Thanks for the info, pretty helpful.
>
>     After a simple analysis, the defrag did do a pretty good job.
>
>     -----------------------------------------------------------------------
>                 | Avg Extent size | Median Extent size | Data Extents      |
>     -----------------------------------------------------------------------
>     Predefrag  | 2.6M            | 512K               | 1043589           |
>     Postdefrag | 7.4M            | 80K                | 359823            |
>
>     Defrag reduced the number of extents to 34%!
>
>     Quite awesome.
>
>     While I still see quite a lot small extents (In fact, small extents are
>     more after defrag), so I assume there can be more improvement.
>
>     But considering the mount time is only affected by number of extents
>     (data and meta, but amount of meta is not affect by defrag), so the
>     improvement is already quite obvious now.
>
>     Much more obvious than my expectation.
>
>     Thanks,
>     Qu
>
> I'm glad to be of help.
> Is there anything else you'd like me to try?
> I don't have any non-defragmented partitions anymore, but you already
> got that information so that should be ok.
>
> Thank you,
> John

No more.

The dump is already good enough for me to dig into for some time.

We don't usually get such a large extent tree dump from a real-world use 
case.

It will help us in several ways, from determining how fragmented a block 
group is to determining whether a defrag will help.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-21  8:19                                                         ` Qu Wenruo
@ 2016-07-21 15:47                                                           ` Graham Cobb
  2017-04-10  0:52                                                             ` Qu Wenruo
  2018-02-13 10:21                                                           ` John Ettedgui
  1 sibling, 1 reply; 54+ messages in thread
From: Graham Cobb @ 2016-07-21 15:47 UTC (permalink / raw)
  To: btrfs

On 21/07/16 09:19, Qu Wenruo wrote:
> We don't usually get such large extent tree dump from a real world use
> case.

Let us know if you want some more :-)

I have a heavily used single disk BTRFS filesystem with about 3.7TB in
use and about 9 million extents.  I am happy to provide an extent dump
if it is useful to you.  Particularly if you don't need me to actually
unmount it (i.e. you can live with some inconsistencies).



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-18 13:42                                     ` Josef Bacik
  2016-07-19  0:35                                       ` Qu Wenruo
@ 2016-07-25 13:01                                       ` David Sterba
  2016-07-25 13:38                                         ` Josef Bacik
  1 sibling, 1 reply; 54+ messages in thread
From: David Sterba @ 2016-07-25 13:01 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Qu Wenruo, Christian Rohmann, Qu Wenruo, John Ettedgui,
	Austin S Hemmelgarn, btrfs, Chris Mason

On Mon, Jul 18, 2016 at 09:42:50AM -0400, Josef Bacik wrote:
> >
> > This makes search for BLOCK_GROUP_ITEM very very very slow if extent tree is
> > really big.
> >
> > On the handle, we search CHUNK_ITEM very very fast, because CHUNK_ITEM are in
> > their own tree.
> > (CHUNK_ITEM and BLOCK_GROUP_ITEM are 1:1 mapped)
> >
> > So to completely fix it, btrfs needs on-disk format change to put
> > BLOCK_GROUP_ITEM into their own tree.
> >
> > IMHO there maybe be some objection from other devs though.
> >
> > Anyway, I add the three maintainers to Cc, and hopes we can get a better idea to
> > fix it.
> 
> Yeah I'm going to fix this when I do the per-block group extent tree thing.  Thanks,

Will it be capable of "per-subvolume-set" extent trees? IOW, a set of
subvolumes could share extents only among the members of the same
group. The use case is to start an isolated subvolume and allow snapshots
(and obviously forbid reflinks outside of the group).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-25 13:01                                       ` David Sterba
@ 2016-07-25 13:38                                         ` Josef Bacik
  0 siblings, 0 replies; 54+ messages in thread
From: Josef Bacik @ 2016-07-25 13:38 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Christian Rohmann, Qu Wenruo, John Ettedgui,
	Austin S Hemmelgarn, btrfs, Chris Mason

On 07/25/2016 09:01 AM, David Sterba wrote:
> On Mon, Jul 18, 2016 at 09:42:50AM -0400, Josef Bacik wrote:
>>>
>>> This makes search for BLOCK_GROUP_ITEM very very very slow if extent tree is
>>> really big.
>>>
>>> On the handle, we search CHUNK_ITEM very very fast, because CHUNK_ITEM are in
>>> their own tree.
>>> (CHUNK_ITEM and BLOCK_GROUP_ITEM are 1:1 mapped)
>>>
>>> So to completely fix it, btrfs needs on-disk format change to put
>>> BLOCK_GROUP_ITEM into their own tree.
>>>
>>> IMHO there maybe be some objection from other devs though.
>>>
>>> Anyway, I add the three maintainers to Cc, and hopes we can get a better idea to
>>> fix it.
>>
>> Yeah I'm going to fix this when I do the per-block group extent tree thing.  Thanks,
>
> Will it be capable of "per- subvolume set" extent trees? IOW, a set of
> subvolumes will could share extents only among the members of the same
> group. The usecase is to start an isolate subvolume and allow snapshots
> (and obviously forbid reflinks outside of the group).
>

I suppose.  The problem is that the btrfs_header doesn't have much room for 
verbose descriptions of which root owns it.  We have objectid, since it was 
always unique, but in the case of per-bg extent trees we can't use that 
anymore, so we have to abuse flags to say this is an extent root block, and 
then we know that btrfs_header_owner(eb) is really the offset of the root and 
not the objectid.  Doing something like having it per subvolume would mean 
having another flag that says this block belongs to a subvolume root, and 
then having btrfs_header_owner(eb) set to the new offset.

The point I'm making is we can do whatever we want here, but it'll be a little 
strange since we have to use flag bits to indicate what type of root the owner 
points to, so any future modifications will also be format changes.  At least 
once I get this work done we'll be able to more easily add new variations on the 
per-whatever setup.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-21 15:47                                                           ` Graham Cobb
@ 2017-04-10  0:52                                                             ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2017-04-10  0:52 UTC (permalink / raw)
  To: Graham Cobb, btrfs



At 07/21/2016 11:47 PM, Graham Cobb wrote:
> On 21/07/16 09:19, Qu Wenruo wrote:
>> We don't usually get such large extent tree dump from a real world use
>> case.
>
> Let us know if you want some more :-)
>
> I have a heavily used single disk BTRFS filesystem with about 3.7TB in
> use and about 9 million extents.  I am happy to provide an extent dump
> if it is useful to you.  Particularly if you don't need me to actually
> unmount it (i.e. you can live with some inconsistencies).

btrfs-debug-tree can dump the filesystem online, but I'm not sure whether 
the result is consistent.


BTW, regarding the original problem (slow mount and fsck OOM): we have made 
no progress on the slow mount other than using defrag to reduce the number 
of extents, but for the fsck OOM, lowmem mode can now handle all the trees.

So lowmem mode should no longer run out of memory, though it may take much 
longer if you have multiple snapshots.
It would be nice if you could try it, especially to see if the lowmem 
mode really lives up to its name.

Thanks,
Qu

>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2016-07-21  8:19                                                         ` Qu Wenruo
  2016-07-21 15:47                                                           ` Graham Cobb
@ 2018-02-13 10:21                                                           ` John Ettedgui
  2018-02-13 11:04                                                             ` Qu Wenruo
  1 sibling, 1 reply; 54+ messages in thread
From: John Ettedgui @ 2018-02-13 10:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Austin S Hemmelgarn, btrfs

On Thu, Jul 21, 2016 at 1:19 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>
> No more.
>
> The dump is already good enough for me to dig for some time.
>
> We don't usually get such large extent tree dump from a real world use case.
>
> It would help us in several ways, from determine how fragmented a block
> group is to determine if a defrag will help.
>
> Thanks,
> Qu
>
>


Hello there,

have you found anything good since then?
With a default system, the behavior is pretty much still the same,
though I have not recreated the partitions since.

Defrag helps, but I think balance helps even more.
clear_cache may help too, but I'm not really sure as I've not tried it
on its own.
I was actually able to get a 4TB partition on a 5400rpm HDD to mount
in around 500ms, quite a bit faster than even some GB-sized partitions
I have on SSDs! Alas, I then wrote some files to it and it's taking
over a second again, so no more magic there.

The workarounds do work, so it's still not a major issue, but they're
slow and sometimes I have to work around the "no space left on device"
error, which then takes even more time.

Thank you!
John

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 10:21                                                           ` John Ettedgui
@ 2018-02-13 11:04                                                             ` Qu Wenruo
  2018-02-13 11:25                                                               ` John Ettedgui
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2018-02-13 11:04 UTC (permalink / raw)
  To: John Ettedgui, Qu Wenruo; +Cc: Austin S Hemmelgarn, btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3368 bytes --]



On 2018年02月13日 18:21, John Ettedgui wrote:
> On Thu, Jul 21, 2016 at 1:19 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>> No more.
>>
>> The dump is already good enough for me to dig for some time.
>>
>> We don't usually get such large extent tree dump from a real world use case.
>>
>> It would help us in several ways, from determine how fragmented a block
>> group is to determine if a defrag will help.
>>
>> Thanks,
>> Qu
>>
>>
> 
> 
> Hello there,
> 
> have you found anything good since then?

Unfortunately, not much to really speed it up.

This reminds me of the old (and crazy) idea of skipping the block group
build for read-only mounts.
But that's not really helpful here.

> With a default system, the behavior is pretty much still the same,
> though I have not recreated the partitions since.
> 
> Defrag helps, but I think balance helps even more.
> clear_cache may help too, but I'm not really sure as I've not tried it
> on its own.
> I was actually able to get a 4TB partition on a 5400rpm HDD to mount
> in around 500ms, quite faster that even some Gb partitions I have on
> SSDs! Alas I wrote some files to it and it's taking over a second
> again, so no more magic there.

The problem is not how much space it takes, but how many extents there
are in the filesystem.

For a new fs filled with normal data, I'm pretty sure the data extents
will be at their maximum size (256M), putting very little or even no
pressure on the block group search.

> 
> The workarounds do work, so it's still not a major issue, but they're
> slow and sometimes I have to workaround the "no space left on device"
> which then takes even more time.

And since I moved to SUSE, some mail/info was lost in the process.

Despite that, I have several more assumptions about this problem:

1) Metadata usage bumped by inline files
   If there are a lot of small files (<2K by default), and your metadata
   usage is quite high (generally speaking, the meta:data ratio should be
   way below 1:8), that may be the cause.

   If so, try mounting the fs with the "max_inline=0" mount option and
   then rewrite such small files.

2) SSD write amplification along with dynamic remapping
   To be honest, I'm not really buying this idea, since mount doesn't
   do anything related to writes.
   But running fstrim won't hurt anyway.

3) Rewrite the existing files (extreme defrag)
   In fact, defrag doesn't work well if there are subvolumes/snapshots
   /reflinks involved.
   The most stupid and mindless way is to write a small script that finds
   all regular files, reads them out, and rewrites them back.

   This should act much better than a traditional defrag, although it's
   time-consuming and makes snapshots completely meaningless.
   (and since you're already hitting ENOSPC, I don't think the idea will
    really work for you)
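A minimal sketch of such a rewrite script in Python (hypothetical and untuned: it reads each file fully into memory, so huge files would need a chunked variant, and it should only be run on backed-up data):

```python
import os

def rewrite_file(path):
    """Read a regular file fully and write the same bytes back,
    forcing the filesystem to allocate fresh (hopefully larger) extents."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def rewrite_tree(root):
    """Walk root and rewrite every regular file; return how many."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a plain file.
            if os.path.isfile(path) and not os.path.islink(path):
                rewrite_file(path)
                count += 1
    return count
```

As noted above, this defeats any snapshot/reflink sharing and reads every byte, so it is a last resort.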

And since you're already hitting ENOSPC, either it's caused by unbalanced
meta/data usage or you're really hitting the limit; I would recommend
enlarging the fs or deleting some files to see if it helps.

Thanks,
Qu

> 
> Thank you!
> John
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 11:04                                                             ` Qu Wenruo
@ 2018-02-13 11:25                                                               ` John Ettedgui
  2018-02-13 11:40                                                                 ` Qu Wenruo
  0 siblings, 1 reply; 54+ messages in thread
From: John Ettedgui @ 2018-02-13 11:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, Austin S Hemmelgarn, btrfs

On Tue, Feb 13, 2018 at 3:04 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2018年02月13日 18:21, John Ettedgui wrote:
>> Hello there,
>>
>> have you found anything good since then?
>
> Unfortunately, not really much to speed it up.
Oh :/
>
> This reminds me of the old (and crazy) idea to skip block group build
> for RO mount.
> But not really helpful for it.
>
>> With a default system, the behavior is pretty much still the same,
>> though I have not recreated the partitions since.
>>
>> Defrag helps, but I think balance helps even more.
>> clear_cache may help too, but I'm not really sure as I've not tried it
>> on its own.
>> I was actually able to get a 4TB partition on a 5400rpm HDD to mount
>> in around 500ms, quite faster that even some Gb partitions I have on
>> SSDs! Alas I wrote some files to it and it's taking over a second
>> again, so no more magic there.
>
> The problem is not about how much space it takes, but how many extents
> are here in the filesystem.
>
> For new fs filled with normal data, I'm pretty sure data extents will be
> as large as its maximum size (256M), causing very little or even no
> pressure to block group search.
>
What do you mean by "new fs"? Was there any change that would improve
the behavior if I were to recreate the FS?
Last time we talked, I believe the max extent size was 128M for
non-compressed files, so maybe there's been some good change.
>>
>> The workarounds do work, so it's still not a major issue, but they're
>> slow and sometimes I have to workaround the "no space left on device"
>> which then takes even more time.
>
> And since I went to SUSE, some mail/info is lost during the procedure.
I still have all the mails if you want them. No dump left, though.
>
> Despite that, I have several more assumption to this problem:
>
> 1) Metadata usage bumped by inline files
What are inline files? Should I think of this like inline in C, in that
small files are stored directly in the tree?
>    If there are a lot of small files (<2K as default),
Of the slow-to-mount partitions:
2 partitions have fewer than a dozen files smaller than 2K.
1 has about 5 thousand and the last one 15 thousand.
Are the latter considered a lot?
> and your metadata
>    usage is quite high (generally speaking, it meta:data ratio should be
>    way below 1:8), that may be the cause.
>
The ratio is about 1:900 on average, so that should be OK, I guess.
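(For reference, a rough way to compute that ratio from `btrfs filesystem df` output; the sample figures below are invented, not real numbers from these partitions:)

```python
import re

# Invented sample output in the `btrfs filesystem df` format.
SAMPLE = """\
Data, single: total=3.50TiB, used=3.40TiB
System, DUP: total=32.00MiB, used=448.00KiB
Metadata, DUP: total=8.00GiB, used=3.90GiB
"""

UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

def used_bytes(df_output, kind):
    """Pull the used= figure for one chunk type out of the df text."""
    m = re.search(r"^" + kind + r",.*used=([\d.]+)([KMGT]iB)",
                  df_output, re.MULTILINE)
    value, unit = m.groups()
    return float(value) * UNITS[unit]

ratio = used_bytes(SAMPLE, "Data") / used_bytes(SAMPLE, "Metadata")
# With these invented figures the meta:data ratio is about 1:893.
```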
>    If so, try mount the fs with "max_inline=0" mount option and then
>    try to rewrite such small files.
>
Should I try that?
> 2) SSD write amplification along with dynamic remapping
>    To be honest, I'm not really buying this idea, since mount doesn't
>    have anything related to write.
>    But running fstrim won't harm anyway.
>
Oh, I am not complaining about slow SSD mounting. I was just amazed
that a partition on a slow HDD mounted faster.
Without any specific work, my SSD partitions tend to mount in around 1
second or so.
Of course I'd be happy to worry about them once all the partitions on
HDDs mount in a handful of ms :)

> 3) Rewrite the existing files (extreme defrag)
>    In fact, defrag doesn't work well if there are subvolumes/snapshots
I have no subvolume or snapshot so that's not a problem.
>    /reflink involved.
>    The most stupid and mindless way, is to write a small script and find
>    all regular files, read them out and rewrite it back.
>
That's fairly straightforward to do, though it should be quite slow so
I'd hope not to have to do that too often.
>    This should acts much better than traditional defrag, although it's
>    time-consuming and makes snapshot completely meaningless.
>    (and since you're already hitting ENOSPC, I don't think the idea is
>     really working for you)
>
> And since you're already hitting ENOSPC, either it's caused by
> unbalanced meta/data usage, or it's really going hit the limit, I would
> recommend to enlarge the fs or delete some files to see if it helps.
>
Yup, I usually either slowly ramp up the {d,m}usage to get past it, or,
when that does not work, I free some space and then balance will finish.
Or did you mean to free some space to see about mount speed?
> Thanks,
> Qu
>

Thank you for the quick reply!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 11:25                                                               ` John Ettedgui
@ 2018-02-13 11:40                                                                 ` Qu Wenruo
  2018-02-13 12:06                                                                   ` John Ettedgui
  2018-02-13 12:26                                                                   ` Holger Hoffstätte
  0 siblings, 2 replies; 54+ messages in thread
From: Qu Wenruo @ 2018-02-13 11:40 UTC (permalink / raw)
  To: John Ettedgui; +Cc: Qu Wenruo, Austin S Hemmelgarn, btrfs





On 2018年02月13日 19:25, John Ettedgui wrote:
> On Tue, Feb 13, 2018 at 3:04 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2018年02月13日 18:21, John Ettedgui wrote:
>>> Hello there,
>>>
>>> have you found anything good since then?
>>
>> Unfortunately, not really much to speed it up.
> Oh :/
>>
>> This reminds me of the old (and crazy) idea to skip block group build
>> for RO mount.
>> But not really helpful for it.
>>
>>> With a default system, the behavior is pretty much still the same,
>>> though I have not recreated the partitions since.
>>>
>>> Defrag helps, but I think balance helps even more.
>>> clear_cache may help too, but I'm not really sure as I've not tried it
>>> on its own.
>>> I was actually able to get a 4TB partition on a 5400rpm HDD to mount
>>> in around 500ms, quite faster that even some Gb partitions I have on
>>> SSDs! Alas I wrote some files to it and it's taking over a second
>>> again, so no more magic there.
>>
>> The problem is not about how much space it takes, but how many extents
>> are here in the filesystem.
>>
>> For new fs filled with normal data, I'm pretty sure data extents will be
>> as large as its maximum size (256M), causing very little or even no
>> pressure to block group search.
>>
> What do you mean by "new fs",

I mean the 4TB partition on that 5400rpm HDD.

> was there any change that would improve
> the behavior if I were to recreate the FS?

If you backed up your fs, recreated a new, empty btrfs on your
original SSD, and then copied all the data back, I believe it would be
much faster to mount.

> Last time we talked I believe max extent was 128M for non-compressed
> files, so maybe there's been some good change.

My fault, 128M is correct.

>>>
>>> The workarounds do work, so it's still not a major issue, but they're
>>> slow and sometimes I have to workaround the "no space left on device"
>>> which then takes even more time.
>>
>> And since I went to SUSE, some mail/info is lost during the procedure.
> I still have all mails, if you want it. No dump left though.
>>
>> Despite that, I have several more assumption to this problem:
>>
>> 1) Metadata usage bumped by inline files
> What are inline files? Should I view this as inline in C, in that the
> small files are stored in the tree directly?

Exactly.

>>    If there are a lot of small files (<2K as default),
> Of the slow to mount partitions:
> 2 partitions have less than a dozen files smaller than 2K.
> 1 has about 5 thousand and the last one 15 thousand.
> Are the latter considered a lot?

If using the default 16K nodesize, 8 small files take one leaf.
And 15K small files means about 2K tree extents.

Not that much in my opinion; it can't even fill half of a metadata chunk.
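The arithmetic behind that estimate, as a sketch (assuming ~2K of leaf space per inline file, which is the rough worst case for <2K files):

```python
import math

NODESIZE = 16 * 1024           # default btrfs nodesize
INLINE_FOOTPRINT = 2 * 1024    # rough leaf space for one <2K inline file

files_per_leaf = NODESIZE // INLINE_FOOTPRINT        # -> 8
leaves_for_15k = math.ceil(15_000 / files_per_leaf)  # -> 1875, "about 2K"
metadata_bytes = leaves_for_15k * NODESIZE           # ~30 MB of leaves
half_chunk = 256 * 1024 * 1024 // 2                  # half a 256M chunk
# metadata_bytes stays well under half_chunk, hence "not that much".
```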

>> and your metadata
>>    usage is quite high (generally speaking, it meta:data ratio should be
>>    way below 1:8), that may be the cause.
>>
> The ratio is about 1:900 on average so that should be ok I guess.

Yep, that should be fine.
So metadata is not to blame.

Then it's purely fragmented data extents.

>>    If so, try mount the fs with "max_inline=0" mount option and then
>>    try to rewrite such small files.
>>
> Should I try that?

No need; it won't make much difference.

>> 2) SSD write amplification along with dynamic remapping
>>    To be honest, I'm not really buying this idea, since mount doesn't
>>    have anything related to write.
>>    But running fstrim won't harm anyway.
>>
> Oh I am not complaining about slow SSDs mounting. I was just amazed
> that a partition on a slow HDD mounted faster.
> Without any specific work, my SSDs partitions tend to mount around 1 sec or so.
> Of course I'd be happy to worry about them once all the partitions on
> HDDs mount in a handful of ms :)
> 
>> 3) Rewrite the existing files (extreme defrag)
>>    In fact, defrag doesn't work well if there are subvolumes/snapshots
> I have no subvolume or snapshot so that's not a problem.
>>    /reflink involved.
>>    The most stupid and mindless way, is to write a small script and find
>>    all regular files, read them out and rewrite it back.
>>
> That's fairly straightforward to do, though it should be quite slow so
> I'd hope not to have to do that too often.

Then it could be tried on the most frequently updated files first.

And since you don't use snapshots, locating such files and running
"chattr +C" on them would make them nodatacow, reducing future fragments.

>>    This should acts much better than traditional defrag, although it's
>>    time-consuming and makes snapshot completely meaningless.
>>    (and since you're already hitting ENOSPC, I don't think the idea is
>>     really working for you)
>>
>> And since you're already hitting ENOSPC, either it's caused by
>> unbalanced meta/data usage, or it's really going hit the limit, I would
>> recommend to enlarge the fs or delete some files to see if it helps.
>>
> Yup, I usually either slowly ramp up the {d,m}usage to pass it, or
> when that does not work I free some space, then balance will finish.
> Or did you mean to free some space to see about mount speed?

Kind of; just do such freeing in advance, and try to make sure btrfs
always has some unallocated space in reserve.

And finally, use the latest kernel if possible.
IIRC old kernels don't have empty block group auto-removal, which means
the user needs to balance manually to free some space.

Thanks,
Qu

>> Thanks,
>> Qu
>>
> 
> Thank you for the quick reply!
> 



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 11:40                                                                 ` Qu Wenruo
@ 2018-02-13 12:06                                                                   ` John Ettedgui
  2018-02-13 12:46                                                                     ` Qu Wenruo
  2018-02-13 12:26                                                                   ` Holger Hoffstätte
  1 sibling, 1 reply; 54+ messages in thread
From: John Ettedgui @ 2018-02-13 12:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Austin S Hemmelgarn, btrfs

On Tue, Feb 13, 2018 at 3:40 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2018年02月13日 19:25, John Ettedgui wrote:
>> On Tue, Feb 13, 2018 at 3:04 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>>
>>> The problem is not about how much space it takes, but how many extents
>>> are here in the filesystem.
>>>
>>> For new fs filled with normal data, I'm pretty sure data extents will be
>>> as large as its maximum size (256M), causing very little or even no
>>> pressure to block group search.
>>>
>> What do you mean by "new fs",
>
> I mean the 4TB partition on that 5400rpm HDD.
>
>> was there any change that would improve
>> the behavior if I were to recreate the FS?
>
> If you backed up your fs, and recreate a new, empty btrfs on your
> original SSD, then copying all data back, I believe it would be much
> faster to mount.
>
Alright, I'll have to wait until I get some more drives for that, but I
look forward to trying it.

>> Last time we talked I believe max extent was 128M for non-compressed
>> files, so maybe there's been some good change.
>
> My fault, 128M is correct.
>
>>> And since I went to SUSE, some mail/info is lost during the procedure.
>> I still have all mails, if you want it. No dump left though.
>>>
>>> Despite that, I have several more assumption to this problem:
>>>
>>> 1) Metadata usage bumped by inline files
>> What are inline files? Should I view this as inline in C, in that the
>> small files are stored in the tree directly?
>
> Exactly.
>
>>>    If there are a lot of small files (<2K as default),
>> Of the slow to mount partitions:
>> 2 partitions have less than a dozen files smaller than 2K.
>> 1 has about 5 thousand and the last one 15 thousand.
>> Are the latter considered a lot?
>
> If using default 16K nodesize, 8 small files takes one leaf.
> And 15K small failes means about 2K tree extents.
>
> Not that much in my opinion, can't even fill half of a metadata chunk.
>
>>> and your metadata
>>>    usage is quite high (generally speaking, it meta:data ratio should be
>>>    way below 1:8), that may be the cause.
>>>
>> The ratio is about 1:900 on average so that should be ok I guess.
>
> Yep, that should be fine.
> So not metadata to blame.
>
> Then purely fragmented data extents.
>
>>>    If so, try mount the fs with "max_inline=0" mount option and then
>>>    try to rewrite such small files.
>>>
>> Should I try that?
>
> No need, it won't cause much difference.

Alright!

>>> 2) SSD write amplification along with dynamic remapping
>>>    To be honest, I'm not really buying this idea, since mount doesn't
>>>    have anything related to write.
>>>    But running fstrim won't harm anyway.
>>>
>> Oh I am not complaining about slow SSDs mounting. I was just amazed
>> that a partition on a slow HDD mounted faster.
>> Without any specific work, my SSDs partitions tend to mount around 1 sec or so.
>> Of course I'd be happy to worry about them once all the partitions on
>> HDDs mount in a handful of ms :)
>>
>>> 3) Rewrite the existing files (extreme defrag)
>>>    In fact, defrag doesn't work well if there are subvolumes/snapshots
>> I have no subvolume or snapshot so that's not a problem.
>>>    /reflink involved.
>>>    The most stupid and mindless way, is to write a small script and find
>>>    all regular files, read them out and rewrite it back.
>>>
>> That's fairly straightforward to do, though it should be quite slow so
>> I'd hope not to have to do that too often.
>
> Then it could be tried on the most frequently updated files then.

That's an interesting idea.
More than 3/4 of the data is just storage, so that should be very ok.

>
> And since you don't use snapshot, locate such files and then "chattr +C"
> would make them nodatacow, reducing later fragments.

I don't understand, why would that reduce later fragments?

>
>>>    This should acts much better than traditional defrag, although it's
>>>    time-consuming and makes snapshot completely meaningless.
>>>    (and since you're already hitting ENOSPC, I don't think the idea is
>>>     really working for you)
>>>
>>> And since you're already hitting ENOSPC, either it's caused by
>>> unbalanced meta/data usage, or it's really going hit the limit, I would
>>> recommend to enlarge the fs or delete some files to see if it helps.
>>>
>> Yup, I usually either slowly ramp up the {d,m}usage to pass it, or
>> when that does not work I free some space, then balance will finish.
>> Or did you mean to free some space to see about mount speed?
>
> Kind of, just do such freeing in advance, and try to make btrfs always
> have unallocated space in case.
>

I actually have very little free space on those partitions, usually
under 90GB, so maybe that's part of my problem.

> And finally, use latest kernel if possible.
> IIRC old kernel doesn't have empty block group auto remove, which makes
> user need to manually balance to free some space.
>
> Thanks,
> Qu
>

I am on 4.15 so no problem there.

So manual defrag and new FS to try.

Thank you!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 11:40                                                                 ` Qu Wenruo
  2018-02-13 12:06                                                                   ` John Ettedgui
@ 2018-02-13 12:26                                                                   ` Holger Hoffstätte
  2018-02-13 12:54                                                                     ` Qu Wenruo
  1 sibling, 1 reply; 54+ messages in thread
From: Holger Hoffstätte @ 2018-02-13 12:26 UTC (permalink / raw)
  To: Qu Wenruo, John Ettedgui; +Cc: Qu Wenruo, Austin S Hemmelgarn, btrfs



On 02/13/18 12:40, Qu Wenruo wrote:
>>> The problem is not about how much space it takes, but how many extents
>>> are here in the filesystem.

I have no idea why btrfs' mount even needs to touch all block groups to
get going (which seems to be the root of the problem), but here's a
not so crazy idea for more "mechanical sympathy". Feel free to mock
me if this is terribly wrong or not possible. ;)

Mounting of even large filesystems (with many extents) seems to be fine
on SSDs, but not so fine on rotational storage. We've heard that from
several people with large (multi-TB) filesystems, and obviously it's
even more terrible on 5400RPM drives because their seeks are sooo sloow.

If the problem is that the bgs are touched/iterated in "tree order",
would it then not be possible to sort the block groups in physical order
before trying to load whatever mount needs to load? That way the entire
process would involve less seeking (no backward seeks for one) and the
drive could very likely get more done during a rotation before stepping
further.
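A toy illustration of the suggested sort (the numbers are invented; real block group items would first be mapped from logical to physical offsets via the chunk tree):

```python
# Each tuple: (tree_key_order, physical_offset_on_disk), in the order a
# tree walk would visit them.  Physical layout need not match key order.
block_groups = [
    (1, 7_000_000_000),
    (2, 1_000_000_000),
    (3, 5_000_000_000),
    (4, 2_000_000_000),
]

def seek_distance(order):
    """Total absolute head movement when reading bgs in the given order."""
    offsets = [phys for _key, phys in order]
    return sum(abs(b - a) for a, b in zip(offsets, offsets[1:]))

tree_order_cost = seek_distance(block_groups)
physical_order = sorted(block_groups, key=lambda bg: bg[1])
physical_cost = seek_distance(physical_order)
# Reading in physical order turns the walk into one forward sweep.
```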

cheers,
Holger



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 12:06                                                                   ` John Ettedgui
@ 2018-02-13 12:46                                                                     ` Qu Wenruo
  2018-02-13 12:52                                                                       ` John Ettedgui
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2018-02-13 12:46 UTC (permalink / raw)
  To: John Ettedgui; +Cc: Austin S Hemmelgarn, btrfs





On 2018年02月13日 20:06, John Ettedgui wrote:
>>>>
>>> That's fairly straightforward to do, though it should be quite slow so
>>> I'd hope not to have to do that too often.
>>
>> Then it could be tried on the most frequently updated files then.
> 
> That's an interesting idea.
> More than 3/4 of the data is just storage, so that should be very ok.

BTW, how was the initial data created?

If the initial data was all written once and doesn't get modified later,
then the problem may not be fragmentation.

> 
>>
>> And since you don't use snapshot, locate such files and then "chattr +C"
>> would make them nodatacow, reducing later fragments.
> 
> I don't understand, why would that reduce later fragments?

A later overwrite will not create a new extent, but will overwrite the
existing extents in place, rather than CoWing them and creating new
extents (fragments).

An extending write will still create a new extent, but that's
unavoidable anyway.
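A toy model of that difference (purely illustrative, not real btrfs extent accounting):

```python
def overwrite(extents, cow):
    """Model one overwrite of an already-written region of a file."""
    if cow:
        extents.append("new-extent")  # CoW: fresh extent, old one relocated
    # nodatacow: the existing extent is rewritten in place, nothing added
    return extents

cow_file, nocow_file = ["extent0"], ["extent0"]
for _ in range(100):  # 100 small overwrites of the same region
    overwrite(cow_file, cow=True)
    overwrite(nocow_file, cow=False)
# cow_file now lists 101 extents (fragments); nocow_file still lists 1.
```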

Thanks,
Qu

> 
>>
>>>>    This should acts much better than traditional defrag, although it's
>>>>    time-consuming and makes snapshot completely meaningless.
>>>>    (and since you're already hitting ENOSPC, I don't think the idea is
>>>>     really working for you)
>>>>
>>>> And since you're already hitting ENOSPC, either it's caused by
>>>> unbalanced meta/data usage, or it's really going hit the limit, I would
>>>> recommend to enlarge the fs or delete some files to see if it helps.
>>>>
>>> Yup, I usually either slowly ramp up the {d,m}usage to pass it, or
>>> when that does not work I free some space, then balance will finish.
>>> Or did you mean to free some space to see about mount speed?
>>
>> Kind of, just do such freeing in advance, and try to make btrfs always
>> have unallocated space in case.
>>
> 
> I actually have very little free space on those partitions, usually
> under 90Gb, maybe that's part of my problem.
> 
>> And finally, use latest kernel if possible.
>> IIRC old kernel doesn't have empty block group auto remove, which makes
>> user need to manually balance to free some space.
>>
>> Thanks,
>> Qu
>>
> 
> I am on 4.15 so no problem there.
> 
> So manual defrag and new FS to try.
> 
> Thank you!
> 



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 12:46                                                                     ` Qu Wenruo
@ 2018-02-13 12:52                                                                       ` John Ettedgui
  0 siblings, 0 replies; 54+ messages in thread
From: John Ettedgui @ 2018-02-13 12:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Austin S Hemmelgarn, btrfs

On Tue, Feb 13, 2018 at 4:46 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2018年02月13日 20:06, John Ettedgui wrote:
>>>>>
>>>> That's fairly straightforward to do, though it should be quite slow so
>>>> I'd hope not to have to do that too often.
>>>
>>> Then it could be tried on the most frequently updated files then.
>>
>> That's an interesting idea.
>> More than 3/4 of the data is just storage, so that should be very ok.
>
> BTW, how the initial data is created?
>
> If the initial data is all written once and doesn't get modified later,
> then the problem may not be fragments.
>
Mostly at once when I recreated the FS a few years ago, and then adding
to it slowly.
Though I do try to somewhat balance the free space on all partitions of
similar drives, so it may be a bit further from its original condition.

>>
>>>
>>> And since you don't use snapshot, locate such files and then "chattr +C"
>>> would make them nodatacow, reducing later fragments.
>>
>> I don't understand, why would that reduce later fragments?
>
> Later overwrite will not create new extent, but overwrite existing extents.
> Other than CoW and cause new extents (fragments)
>
> Although expand write will still cause new extent, but that's
> unavoidable anyway.
>
That's why I didn't understand.
Fair enough!

Thank you!
John

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 12:26                                                                   ` Holger Hoffstätte
@ 2018-02-13 12:54                                                                     ` Qu Wenruo
  2018-02-13 16:24                                                                       ` Holger Hoffstätte
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2018-02-13 12:54 UTC (permalink / raw)
  To: Holger Hoffstätte, John Ettedgui
  Cc: Qu Wenruo, Austin S Hemmelgarn, btrfs





On 2018年02月13日 20:26, Holger Hoffstätte wrote:
> On 02/13/18 12:40, Qu Wenruo wrote:
>>>> The problem is not about how much space it takes, but how many extents
>>>> are here in the filesystem.
> 
> I have no idea why btrfs' mount even needs to touch all block groups to
> get going (which seems to be the root of the problem), but here's a
> not so crazy idea for more "mechanical sympathy". Feel free to mock
> me if this is terribly wrong or not possible. ;)
> 
> Mounting of even large filesystems (with many extents) seems to be fine
> on SSDS, but not so fine on rotational storage. We've heard that from
> several people with large (multi-TB) filesystems, and obviously it's
> even more terrible on 5400RPM drives because their seeks are sooo sloow.
> 
> If the problem is that the bgs are touched/iterated in "tree order",
> would it then not be possible to sort the block groups in physical order
> before trying to load whatever mount needs to load?

This is in fact a good idea.
Make block group into its own tree.

But it will take a lot of work, since we would be modifying the on-disk
format.

In that case, a leaf of the default leaf size (16K) can store 678 block
group items.
And that many block groups can cover data ranging from 169G (at 256M
metadata chunk size) to 1.6T (at the 10G maximum data chunk size).

And even at tens of terabytes, a level-2 tree should handle it without
problems, and searching it should be quite fast.

The only problem is, I'm not sure there will be enough developer
interest in this idea, and it may have hidden problems.

Thanks,
Qu

> That way the entire
> process would involve less seeking (no backward seeks for one) and the
> drive could very likely get more done during a rotation before stepping
> further.
> 
> cheers,
> Holger
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 12:54                                                                     ` Qu Wenruo
@ 2018-02-13 16:24                                                                       ` Holger Hoffstätte
  2018-02-14  0:43                                                                         ` Qu Wenruo
  0 siblings, 1 reply; 54+ messages in thread
From: Holger Hoffstätte @ 2018-02-13 16:24 UTC (permalink / raw)
  To: Qu Wenruo, John Ettedgui; +Cc: Qu Wenruo, Austin S Hemmelgarn, btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2970 bytes --]

On 02/13/18 13:54, Qu Wenruo wrote:
> On 2018年02月13日 20:26, Holger Hoffstätte wrote:
>> On 02/13/18 12:40, Qu Wenruo wrote:
>>>>> The problem is not about how much space it takes, but how many extents
>>>>> are here in the filesystem.
>>
>> I have no idea why btrfs' mount even needs to touch all block groups to
>> get going (which seems to be the root of the problem), but here's a
>> not so crazy idea for more "mechanical sympathy". Feel free to mock
>> me if this is terribly wrong or not possible. ;)
>>
>> Mounting of even large filesystems (with many extents) seems to be fine
>> on SSDS, but not so fine on rotational storage. We've heard that from
>> several people with large (multi-TB) filesystems, and obviously it's
>> even more terrible on 5400RPM drives because their seeks are sooo sloow.
>>
>> If the problem is that the bgs are touched/iterated in "tree order",
>> would it then not be possible to sort the block groups in physical order
>> before trying to load whatever mount needs to load?
> 
> This is in fact a good idea.
> Make block group into its own tree.

Well, that's not what I was thinking about at all..yet. :)
(keep in mind I'm not really that familiar with the internals).

Out of curiosity I ran a bit of perf on my own mount process, which is
fast (~700 ms) despite being a ~1.1TB fs, mixture of lots of large and
small files. Unfortunately it's also very fresh since I recreated it just
this weekend, so everything is neatly packed together and fast.

In contrast a friend's fs is ~800 GB, but has 11 GB metadata and is pretty
old and fragmented (but running an up-to-date kernel). His fs mounts in ~5s.

My perf run shows that the only interesting part responsible for mount time
is the nested loop in btrfs_read_block_groups calling find_first_block_group
(which got inlined & is not in the perf callgraph) over and over again,
accounting for 75% of time spent.

I now understand your comment why the real solution to this problem
is to move bgs into their own tree, and agree: both kitchens and databases
have figured out a long time ago that the key to fast scan and lookup
performance is to not put different things in the same storage container;
in the case of analytical DBMS this is columnar storage. :)

But what I originally meant was something much simpler and more
brute-force-ish. I see that btrfs_read_block_groups adds readahead
(is that actually effective?) but what I was looking for was the equivalent
of a DBMS' sequential scan. Right now finding (and loading) a bg seems to
involve a nested loop of tree lookups. It seems easier to rip through the
entire tree in nice 8MB chunks and discard what you don't need instead
of seeking around trying to find all the right bits in scattered order.

Could we alleviate cold mounts by starting more readaheads in
btrfs_read_block_groups, so that the extent tree is scanned more linearly?

cheers,
Holger
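Holger's perf run above might be reproduced along these lines; a sketch under assumptions: the device and mountpoint are illustrative, and the commands only run when perf, root privileges, the device, and the mountpoint all actually exist:

```shell
dev=/dev/sdb1        # illustrative btrfs device, not from the report
mnt=/mnt/test        # illustrative mountpoint

if command -v perf >/dev/null && [ "$(id -u)" = 0 ] \
        && [ -b "$dev" ] && [ -d "$mnt" ]; then
    # Record kernel call graphs while the mount runs, then list the
    # hottest symbols (on a cold, fragmented filesystem one would
    # expect btrfs_read_block_groups near the top).
    perf record -g -o /tmp/mount.data -- mount "$dev" "$mnt"
    perf report -i /tmp/mount.data --stdio --sort symbol | head -n 15
    result="recorded"
else
    result="skipped (needs root, perf, $dev and $mnt)"
fi
echo "$result"
```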


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2018-02-13 16:24                                                                       ` Holger Hoffstätte
@ 2018-02-14  0:43                                                                         ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2018-02-14  0:43 UTC (permalink / raw)
  To: Holger Hoffstätte, John Ettedgui
  Cc: Qu Wenruo, Austin S Hemmelgarn, btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3470 bytes --]



On 2018年02月14日 00:24, Holger Hoffstätte wrote:
> On 02/13/18 13:54, Qu Wenruo wrote:
>> On 2018年02月13日 20:26, Holger Hoffstätte wrote:
>>> On 02/13/18 12:40, Qu Wenruo wrote:
>>>>>> The problem is not about how much space it takes, but how many extents
>>>>>> are here in the filesystem.
>>>
>>> I have no idea why btrfs' mount even needs to touch all block groups to
>>> get going (which seems to be the root of the problem), but here's a
>>> not so crazy idea for more "mechanical sympathy". Feel free to mock
>>> me if this is terribly wrong or not possible. ;)
>>>
>>> Mounting of even large filesystems (with many extents) seems to be fine
>>> on SSDS, but not so fine on rotational storage. We've heard that from
>>> several people with large (multi-TB) filesystems, and obviously it's
>>> even more terrible on 5400RPM drives because their seeks are sooo sloow.
>>>
>>> If the problem is that the bgs are touched/iterated in "tree order",
>>> would it then not be possible to sort the block groups in physical order
>>> before trying to load whatever mount needs to load?
>>
>> This is in fact a good idea.
>> Make block group into its own tree.
> 
> Well, that's not what I was thinking about at all..yet. :)
> (keep in mind I'm not really that familiar with the internals).
> 
> Out of curiosity I ran a bit of perf on my own mount process, which is
> fast (~700 ms) despite being a ~1.1TB fs, mixture of lots of large and
> small files. Unfortunately it's also very fresh since I recreated it just
> this weekend, so everything is neatly packed together and fast.
> 
> In contrast a friend's fs is ~800 GB, but has 11 GB metadata and is pretty
> old and fragmented (but running an up-to-date kernel). His fs mounts in ~5s.
> 
> My perf run shows that the only interesting part responsible for mount time
> is the nested loop in btrfs_read_block_groups calling find_first_block_group
> (which got inlined & is not in the perf callgraph) over and over again,
> accounting for 75% of time spent.
> 
> I now understand your comment why the real solution to this problem
> is to move bgs into their own tree, and agree: both kitchens and databases
> have figured out a long time ago that the key to fast scan and lookup
> performance is to not put different things in the same storage container;
> in the case of analytical DBMS this is columnar storage. :)
> 
> But what I originally meant was something much simpler and more
> brute-force-ish. I see that btrfs_read_block_groups adds readahead
> (is that actually effective?) but what I was looking for was the equivalent
> of a DBMS' sequential scan. Right now finding (and loading) a bg seems to
> involve a nested loop of tree lookups. It seems easier to rip through the
> entire tree in nice 8MB chunks and discard what you don't need instead
> of seeking around trying to find all the right bits in scattered order.

The problem is that the tree containing the block groups (the extent
tree) is very, very large.

It's a tree shared by all subvolumes. And since tree nodes and leaves
can be scattered around the whole disk, it's pretty hard to do batch
readahead.

> 
> Could we alleviate cold mounts by starting more readaheads in
> btrfs_read_block_groups, so that the extent tree is scanned more linearly?

Since the extent tree is not laid out linearly, readahead won't be as effective as we'd hope.

Thanks,
Qu

> 
> cheers,
> Holger
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]


* Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
  2015-07-29  5:46 Georgi Georgiev
@ 2015-07-29  6:19 ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2015-07-29  6:19 UTC (permalink / raw)
  To: Georgi Georgiev, linux-btrfs

Hi,

Georgi Georgiev wrote on 2015/07/29 14:46 +0900:
> Using BTRFS on a very large filesystem, and as we put more and more data
> on it, the time it takes to mount it grew to, presently, about 30 minutes.
> Is there something wrong with the filesystem? Is there a way to bring
> this time down?
>
> ...
>
> Here is a snippet from dmesg, showing how long it takes to mount (the
> EXT4-fs line is the filesystem mounted next in the boot sequence):
>
>    $ dmesg | grep -A1 btrfs
>    [   12.215764] TECH PREVIEW: btrfs may not be fully supported.
>    [   12.215766] Please review provided documentation for limitations.
>    --
>    [   12.220266] btrfs: use zlib compression
>    [   12.220815] btrfs: disk space caching is enabled
>    [   22.427258] btrfs: bdev /dev/mapper/datavg-backuplv errs: wr 0, rd 0, flush 0, corrupt 0, gen 0
>    [ 2022.397318] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
>
Quite common, especially when it grows large.
But it would be much better to use ftrace to show which btrfs operation
takes the most time.

We have some guesses about this, from reading the space cache to reading
the chunk info, but we don't know which takes most of the time.
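A function_graph trace of the mount might be collected along these lines; a sketch under assumptions (device and mountpoint are illustrative, taken from the report; requires root, trace-cmd, and an ftrace-enabled kernel):

```shell
dev=/dev/mapper/datavg-backuplv   # illustrative device from the report
mnt=/mnt/backup                    # illustrative mountpoint

if command -v trace-cmd >/dev/null && [ "$(id -u)" = 0 ] \
        && [ -b "$dev" ] && [ -d "$mnt" ]; then
    # Trace only btrfs_* functions, with per-call timing, during mount.
    trace-cmd record -p function_graph -l 'btrfs_*' mount "$dev" "$mnt"
    trace-cmd report | head -n 20
    result="traced"
else
    result="skipped (needs root, trace-cmd, $dev and $mnt)"
fi
echo "$result"
```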
> The btrfs filesystem is quite large:
>
>    $ sudo btrfs filesystem usage /dev/mapper/datavg-backuplv
>    Overall:
>        Device size:                  82.58TiB
>        Device allocated:             82.58TiB
>        Device unallocated:              0.00B
>        Device missing:                  0.00B
>        Used:                         62.01TiB
>        Free (estimated):             17.76TiB      (min: 17.76TiB)
>        Data ratio:                       1.00
>        Metadata ratio:                   2.00
>        Global reserve:                  0.00B      (used: 0.00B)
>
>    Data,single: Size:79.28TiB, Used:61.52TiB
>       /dev/mapper/datavg-backuplv    79.28TiB
>
>    Metadata,single: Size:8.00MiB, Used:0.00B
>       /dev/mapper/datavg-backuplv     8.00MiB
>
>    Metadata,DUP: Size:1.65TiB, Used:252.68GiB
>       /dev/mapper/datavg-backuplv     3.30TiB
>
>    System,single: Size:4.00MiB, Used:0.00B
>       /dev/mapper/datavg-backuplv     4.00MiB
>
>    System,DUP: Size:40.00MiB, Used:8.66MiB
>       /dev/mapper/datavg-backuplv    80.00MiB
>
>    Unallocated:
>       /dev/mapper/datavg-backuplv       0.00B
Wow, nearly 100T, that's really huge now.
>
> Other info about the filesystem is that it has a rather large number of
> files and subvolumes and read only snapshots, which started from about
> zero in March, and grew over to the current state of 3000 snapshots and
> no idea how many files (filesystem usage is quite stable at the moment).
>
> I also noticed that while the machine is rebooted on a weekly basis, the
> time it takes to come up after a reboot has been growing. This is likely
> correlated to how long it takes to mount the filesystem, and maybe
> correlated to how much data there is on the filesystem.
>
> Reboot time used to be normally about 3 minutes, then it jumped to 8
> minutes on March 21 and the following weeks it went like this:
> 8 minutes, 11 minutes, 15 minutes...
> 19, 19, 19, 19, 23, 21, 22
> 32, 33, 36, 42, 46, 37, 30
>
> This is on CentOS 6.6, and while I understand that the version of btrfs
> is definitely oldish, even trying to mount the filesystem on a much more
> recent kernel (3.14.43) there is no improvement. Switching the regular
> OS kernel from the CentOS one (2.6.32-504.12.2.el6.x86_64) to something
> more recent is also feasible.
>
> I wanted to check the system for problems, so I tried an offline "btrfs
> check" using the latest btrfs-progs (version 4.1.2, freshly compiled from
> source), but "btrfs check" ran out of memory after about 30 minutes.
>
> The only output I get is this (timestamps added by me):
>
>    2015-07-28 18:14:45 $ sudo btrfs check /dev/datavg/backuplv
>    2015-07-28 18:33:05 checking extents
>
> And at 19:04:55 btrfs was killed by OOM: (abbreviated log below,
> full excerpt as an attachment).
Not surprised at all.
For the extent/chunk tree check, it reads all the chunk and extent 
items, stores the needed info in memory, and then does the cross 
reference check.

The btrfsck process really takes a lot of memory.
Maybe 1/10 or more of the metadata space.
In your case, your metadata is about 250GB, so maybe 25GB of memory is 
used to hold the needed info.

That's a known issue, but we don't have a good idea (or a developer) to 
reduce the memory usage yet.

Maybe we can change the behavior to do chunk-by-chunk extent cross 
checking to reduce memory usage, but not now...

Thanks,
Qu
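As a back-of-envelope check against the numbers in this thread (the 1/10 figure is Qu's rough guess above, not a measured constant; the metadata figure is from the quoted "btrfs filesystem usage" output, and the rss figure is from the OOM log quoted below):

```shell
# Metadata used: "Metadata,DUP: Size:1.65TiB, Used:252.68GiB".
# Qu's rule of thumb: btrfs check may need ~1/10 of that in RAM.
est=$(awk 'BEGIN { printf "%.1f", 252.68 / 10 }')
echo "estimated btrfs check memory: ${est} GiB"

# The OOM log shows pid 16295 (btrfs) with rss = 16118611 pages of 4 KiB.
actual=$(awk 'BEGIN { printf "%.1f", 16118611 * 4096 / (1024*1024*1024) }')
echo "actual rss at OOM time: ${actual} GiB"
```

So the process actually grew to roughly 61.5 GiB before being killed, well beyond the 1/10 rule of thumb, consistent with Qu's "or more" hedge.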
>
>    2015-07-28T19:04:55.224855+09:00 localhost kernel: [11689.692680] htop invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
>    ...
>    2015-07-28T19:04:55.225855+09:00 localhost kernel: [11689.801354] 631 total pagecache pages
>    2015-07-28T19:04:55.225857+09:00 localhost kernel: [11689.801829] 0 pages in swap cache
>    2015-07-28T19:04:55.225859+09:00 localhost kernel: [11689.802305] Swap cache stats: add 0, delete 0, find 0/0
>    2015-07-28T19:04:55.225861+09:00 localhost kernel: [11689.802781] Free swap  = 0kB
>    2015-07-28T19:04:55.225863+09:00 localhost kernel: [11689.803341] Total swap = 0kB
>    2015-07-28T19:04:55.225864+09:00 localhost kernel: [11689.946223] 16777215 pages RAM
>    2015-07-28T19:04:55.225867+09:00 localhost kernel: [11689.946724] 295175 pages reserved
>    2015-07-28T19:04:55.225869+09:00 localhost kernel: [11689.947223] 5173 pages shared
>    2015-07-28T19:04:55.225871+09:00 localhost kernel: [11689.947721] 16369184 pages non-shared
>    2015-07-28T19:04:55.225874+09:00 localhost kernel: [11689.948222] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
>    ...
>    2015-07-28T19:04:55.225970+09:00 localhost kernel: [11689.994240] [16291]     0 16291    47166      177  18       0             0 sudo
>    2015-07-28T19:04:55.225972+09:00 localhost kernel: [11689.995232] [16292]  1000 16292      981       20   3       0             0 tai64n
>    2015-07-28T19:04:55.225974+09:00 localhost kernel: [11689.996241] [16293]     0 16293    47166      177  22       0             0 sudo
>    2015-07-28T19:04:55.225978+09:00 localhost kernel: [11689.997230] [16294]  1000 16294     1018       21   1       0             0 tai64nlocal
>    2015-07-28T19:04:55.225993+09:00 localhost kernel: [11689.998227] [16295]     0 16295 16122385 16118611   7       0             0 btrfs
>    2015-07-28T19:04:55.225995+09:00 localhost kernel: [11689.999210] [16296]     0 16296    25228       25   5       0             0 tee
>    2015-07-28T19:04:55.225997+09:00 localhost kernel: [11690.000201] [16297]  1000 16297    27133      162   1       0             0 bash
>    ...
>    2015-07-28T19:04:55.226030+09:00 localhost kernel: [11690.008288] Out of memory: Kill process 16295 (btrfs) score 949 or sacrifice child
>    2015-07-28T19:04:55.226031+09:00 localhost kernel: [11690.009300] Killed process 16295, UID 0, (btrfs) total-vm:64489540kB, anon-rss:64474408kB, file-rss:36kB
>
> Thanks in advance for any advice,
>


* mount btrfs takes 30 minutes, btrfs check runs out of memory
@ 2015-07-29  5:46 Georgi Georgiev
  2015-07-29  6:19 ` Qu Wenruo
  0 siblings, 1 reply; 54+ messages in thread
From: Georgi Georgiev @ 2015-07-29  5:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Georgi Georgiev

[-- Attachment #1: Type: text/plain, Size: 5910 bytes --]

Using BTRFS on a very large filesystem, and as we put more and more data
on it, the time it takes to mount it grew to, presently, about 30 minutes.
Is there something wrong with the filesystem? Is there a way to bring
this time down?

...

Here is a snippet from dmesg, showing how long it takes to mount (the
EXT4-fs line is the filesystem mounted next in the boot sequence):

  $ dmesg | grep -A1 btrfs
  [   12.215764] TECH PREVIEW: btrfs may not be fully supported.
  [   12.215766] Please review provided documentation for limitations.
  --
  [   12.220266] btrfs: use zlib compression
  [   12.220815] btrfs: disk space caching is enabled
  [   22.427258] btrfs: bdev /dev/mapper/datavg-backuplv errs: wr 0, rd 0, flush 0, corrupt 0, gen 0
  [ 2022.397318] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: 

The btrfs filesystem is quite large:

  $ sudo btrfs filesystem usage /dev/mapper/datavg-backuplv
  Overall:
      Device size:                  82.58TiB
      Device allocated:             82.58TiB
      Device unallocated:              0.00B
      Device missing:                  0.00B
      Used:                         62.01TiB
      Free (estimated):             17.76TiB      (min: 17.76TiB)
      Data ratio:                       1.00
      Metadata ratio:                   2.00
      Global reserve:                  0.00B      (used: 0.00B)
  
  Data,single: Size:79.28TiB, Used:61.52TiB
     /dev/mapper/datavg-backuplv    79.28TiB
  
  Metadata,single: Size:8.00MiB, Used:0.00B
     /dev/mapper/datavg-backuplv     8.00MiB
  
  Metadata,DUP: Size:1.65TiB, Used:252.68GiB
     /dev/mapper/datavg-backuplv     3.30TiB
  
  System,single: Size:4.00MiB, Used:0.00B
     /dev/mapper/datavg-backuplv     4.00MiB
  
  System,DUP: Size:40.00MiB, Used:8.66MiB
     /dev/mapper/datavg-backuplv    80.00MiB
  
  Unallocated:
     /dev/mapper/datavg-backuplv       0.00B

Other info about the filesystem: it has a rather large number of files,
subvolumes, and read-only snapshots, which started from about zero in
March and grew to the current state of 3000 snapshots and an unknown
number of files (filesystem usage is quite stable at the moment).

I also noticed that while the machine is rebooted on a weekly basis, the
time it takes to come up after a reboot has been growing. This is likely
correlated to how long it takes to mount the filesystem, and maybe
correlated to how much data there is on the filesystem.

Reboot time used to be normally about 3 minutes, then it jumped to 8
minutes on March 21 and the following weeks it went like this:
8 minutes, 11 minutes, 15 minutes...
19, 19, 19, 19, 23, 21, 22
32, 33, 36, 42, 46, 37, 30

This is on CentOS 6.6, and while I understand that the version of btrfs
is definitely oldish, even trying to mount the filesystem on a much more
recent kernel (3.14.43) there is no improvement. Switching the regular
OS kernel from the CentOS one (2.6.32-504.12.2.el6.x86_64) to something
more recent is also feasible.

I wanted to check the system for problems, so I tried an offline "btrfs
check" using the latest btrfs-progs (version 4.1.2, freshly compiled from
source), but "btrfs check" ran out of memory after about 30 minutes.

The only output I get is this (timestamps added by me):

  2015-07-28 18:14:45 $ sudo btrfs check /dev/datavg/backuplv
  2015-07-28 18:33:05 checking extents

And at 19:04:55 btrfs was killed by OOM: (abbreviated log below,
full excerpt as an attachment).

  2015-07-28T19:04:55.224855+09:00 localhost kernel: [11689.692680] htop invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
  ...
  2015-07-28T19:04:55.225855+09:00 localhost kernel: [11689.801354] 631 total pagecache pages
  2015-07-28T19:04:55.225857+09:00 localhost kernel: [11689.801829] 0 pages in swap cache
  2015-07-28T19:04:55.225859+09:00 localhost kernel: [11689.802305] Swap cache stats: add 0, delete 0, find 0/0
  2015-07-28T19:04:55.225861+09:00 localhost kernel: [11689.802781] Free swap  = 0kB
  2015-07-28T19:04:55.225863+09:00 localhost kernel: [11689.803341] Total swap = 0kB
  2015-07-28T19:04:55.225864+09:00 localhost kernel: [11689.946223] 16777215 pages RAM
  2015-07-28T19:04:55.225867+09:00 localhost kernel: [11689.946724] 295175 pages reserved
  2015-07-28T19:04:55.225869+09:00 localhost kernel: [11689.947223] 5173 pages shared
  2015-07-28T19:04:55.225871+09:00 localhost kernel: [11689.947721] 16369184 pages non-shared
  2015-07-28T19:04:55.225874+09:00 localhost kernel: [11689.948222] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
  ...
  2015-07-28T19:04:55.225970+09:00 localhost kernel: [11689.994240] [16291]     0 16291    47166      177  18       0             0 sudo
  2015-07-28T19:04:55.225972+09:00 localhost kernel: [11689.995232] [16292]  1000 16292      981       20   3       0             0 tai64n
  2015-07-28T19:04:55.225974+09:00 localhost kernel: [11689.996241] [16293]     0 16293    47166      177  22       0             0 sudo
  2015-07-28T19:04:55.225978+09:00 localhost kernel: [11689.997230] [16294]  1000 16294     1018       21   1       0             0 tai64nlocal
  2015-07-28T19:04:55.225993+09:00 localhost kernel: [11689.998227] [16295]     0 16295 16122385 16118611   7       0             0 btrfs
  2015-07-28T19:04:55.225995+09:00 localhost kernel: [11689.999210] [16296]     0 16296    25228       25   5       0             0 tee
  2015-07-28T19:04:55.225997+09:00 localhost kernel: [11690.000201] [16297]  1000 16297    27133      162   1       0             0 bash
  ...
  2015-07-28T19:04:55.226030+09:00 localhost kernel: [11690.008288] Out of memory: Kill process 16295 (btrfs) score 949 or sacrifice child
  2015-07-28T19:04:55.226031+09:00 localhost kernel: [11690.009300] Killed process 16295, UID 0, (btrfs) total-vm:64489540kB, anon-rss:64474408kB, file-rss:36kB

Thanks in advance for any advice,
-- 
Georgi

[-- Attachment #2: oom.log --]
[-- Type: text/plain, Size: 28271 bytes --]

2015-07-28T19:04:55.224855+09:00 localhost kernel: [11689.692680] htop invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
2015-07-28T19:04:55.225076+09:00 localhost kernel: [11689.693636] htop cpuset=/ mems_allowed=0-1
2015-07-28T19:04:55.225269+09:00 localhost kernel: [11689.694114] Pid: 16323, comm: htop Tainted: G           ---------------  T 2.6.32-504.12.2.el6.x86_64 #1
2015-07-28T19:04:55.225274+09:00 localhost kernel: [11689.695062] Call Trace:
2015-07-28T19:04:55.225278+09:00 localhost kernel: [11689.695551]  [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
2015-07-28T19:04:55.225281+09:00 localhost kernel: [11689.696045]  [<ffffffff81127300>] ? dump_header+0x90/0x1b0
2015-07-28T19:04:55.225283+09:00 localhost kernel: [11689.696534]  [<ffffffff8122eb5c>] ? security_real_capable_noaudit+0x3c/0x70
2015-07-28T19:04:55.225285+09:00 localhost kernel: [11689.697021]  [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
2015-07-28T19:04:55.225288+09:00 localhost kernel: [11689.697507]  [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
2015-07-28T19:04:55.225290+09:00 localhost kernel: [11689.697991]  [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
2015-07-28T19:04:55.225292+09:00 localhost kernel: [11689.698479]  [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
2015-07-28T19:04:55.225295+09:00 localhost kernel: [11689.698967]  [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
2015-07-28T19:04:55.225297+09:00 localhost kernel: [11689.699451]  [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
2015-07-28T19:04:55.225300+09:00 localhost kernel: [11689.699929]  [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
2015-07-28T19:04:55.225302+09:00 localhost kernel: [11689.700413]  [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
2015-07-28T19:04:55.225305+09:00 localhost kernel: [11689.700896]  [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
2015-07-28T19:04:55.225307+09:00 localhost kernel: [11689.701377]  [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
2015-07-28T19:04:55.225310+09:00 localhost kernel: [11689.701862]  [<ffffffff811b07e0>] ? mntput_no_expire+0x30/0x110
2015-07-28T19:04:55.225314+09:00 localhost kernel: [11689.702348]  [<ffffffff8118b18f>] ? __dentry_open+0x23f/0x360
2015-07-28T19:04:55.225316+09:00 localhost kernel: [11689.702827]  [<ffffffff8122e6ff>] ? security_inode_permission+0x1f/0x30
2015-07-28T19:04:55.225318+09:00 localhost kernel: [11689.703308]  [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
2015-07-28T19:04:55.225321+09:00 localhost kernel: [11689.703873]  [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
2015-07-28T19:04:55.225323+09:00 localhost kernel: [11689.704357]  [<ffffffff8129901b>] ? strncpy_from_user+0x5b/0x90
2015-07-28T19:04:55.225325+09:00 localhost kernel: [11689.704842]  [<ffffffff8153003e>] ? do_page_fault+0x3e/0xa0
2015-07-28T19:04:55.225328+09:00 localhost kernel: [11689.705329]  [<ffffffff8152d3f5>] ? page_fault+0x25/0x30
2015-07-28T19:04:55.225331+09:00 localhost kernel: [11689.705807] Mem-Info:
2015-07-28T19:04:55.225333+09:00 localhost kernel: [11689.706280] Node 0 DMA per-cpu:
2015-07-28T19:04:55.225336+09:00 localhost kernel: [11689.706756] CPU    0: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225338+09:00 localhost kernel: [11689.707233] CPU    1: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225341+09:00 localhost kernel: [11689.707709] CPU    2: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225344+09:00 localhost kernel: [11689.708190] CPU    3: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225347+09:00 localhost kernel: [11689.708667] CPU    4: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225349+09:00 localhost kernel: [11689.709144] CPU    5: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225352+09:00 localhost kernel: [11689.709622] CPU    6: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225354+09:00 localhost kernel: [11689.710100] CPU    7: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225357+09:00 localhost kernel: [11689.710577] CPU    8: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225359+09:00 localhost kernel: [11689.711052] CPU    9: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225361+09:00 localhost kernel: [11689.711531] CPU   10: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225364+09:00 localhost kernel: [11689.712005] CPU   11: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225366+09:00 localhost kernel: [11689.712485] CPU   12: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225368+09:00 localhost kernel: [11689.712959] CPU   13: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225371+09:00 localhost kernel: [11689.713440] CPU   14: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225373+09:00 localhost kernel: [11689.713912] CPU   15: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225376+09:00 localhost kernel: [11689.714392] CPU   16: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225379+09:00 localhost kernel: [11689.714868] CPU   17: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225381+09:00 localhost kernel: [11689.715350] CPU   18: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225383+09:00 localhost kernel: [11689.715830] CPU   19: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225386+09:00 localhost kernel: [11689.716310] CPU   20: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225388+09:00 localhost kernel: [11689.716788] CPU   21: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225390+09:00 localhost kernel: [11689.717267] CPU   22: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225393+09:00 localhost kernel: [11689.717838] CPU   23: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225396+09:00 localhost kernel: [11689.718314] CPU   24: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225398+09:00 localhost kernel: [11689.718787] CPU   25: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225401+09:00 localhost kernel: [11689.719264] CPU   26: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225403+09:00 localhost kernel: [11689.719741] CPU   27: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225406+09:00 localhost kernel: [11689.720217] CPU   28: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225408+09:00 localhost kernel: [11689.720698] CPU   29: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225411+09:00 localhost kernel: [11689.721177] CPU   30: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225413+09:00 localhost kernel: [11689.721655] CPU   31: hi:    0, btch:   1 usd:   0
2015-07-28T19:04:55.225416+09:00 localhost kernel: [11689.722136] Node 0 DMA32 per-cpu:
2015-07-28T19:04:55.225418+09:00 localhost kernel: [11689.722616] CPU    0: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225421+09:00 localhost kernel: [11689.723095] CPU    1: hi:  186, btch:  31 usd:  30
2015-07-28T19:04:55.225422+09:00 localhost kernel: [11689.734923] CPU    2: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225425+09:00 localhost kernel: [11689.735406] CPU    3: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225427+09:00 localhost kernel: [11689.735880] CPU    4: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225429+09:00 localhost kernel: [11689.736359] CPU    5: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225431+09:00 localhost kernel: [11689.736834] CPU    6: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225433+09:00 localhost kernel: [11689.737312] CPU    7: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225436+09:00 localhost kernel: [11689.737793] CPU    8: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225438+09:00 localhost kernel: [11689.738275] CPU    9: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225440+09:00 localhost kernel: [11689.738755] CPU   10: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225443+09:00 localhost kernel: [11689.739233] CPU   11: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225445+09:00 localhost kernel: [11689.739713] CPU   12: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225448+09:00 localhost kernel: [11689.740192] CPU   13: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225451+09:00 localhost kernel: [11689.740670] CPU   14: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225454+09:00 localhost kernel: [11689.741264] CPU   15: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225469+09:00 localhost kernel: [11689.741744] CPU   16: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225471+09:00 localhost kernel: [11689.742224] CPU   17: hi:  186, btch:  31 usd:  31
2015-07-28T19:04:55.225473+09:00 localhost kernel: [11689.742701] CPU   18: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225476+09:00 localhost kernel: [11689.743178] CPU   19: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225478+09:00 localhost kernel: [11689.743654] CPU   20: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225480+09:00 localhost kernel: [11689.744131] CPU   21: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225483+09:00 localhost kernel: [11689.744610] CPU   22: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225485+09:00 localhost kernel: [11689.745088] CPU   23: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225488+09:00 localhost kernel: [11689.745572] CPU   24: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225490+09:00 localhost kernel: [11689.746144] CPU   25: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225493+09:00 localhost kernel: [11689.746618] CPU   26: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225495+09:00 localhost kernel: [11689.747089] CPU   27: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225498+09:00 localhost kernel: [11689.747570] CPU   28: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225501+09:00 localhost kernel: [11689.748051] CPU   29: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225503+09:00 localhost kernel: [11689.748531] CPU   30: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225505+09:00 localhost kernel: [11689.749007] CPU   31: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225507+09:00 localhost kernel: [11689.749489] Node 0 Normal per-cpu:
2015-07-28T19:04:55.225509+09:00 localhost kernel: [11689.749965] CPU    0: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225510+09:00 localhost kernel: [11689.750444] CPU    1: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225512+09:00 localhost kernel: [11689.750921] CPU    2: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225514+09:00 localhost kernel: [11689.751405] CPU    3: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225517+09:00 localhost kernel: [11689.751886] CPU    4: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225532+09:00 localhost kernel: [11689.752368] CPU    5: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225534+09:00 localhost kernel: [11689.752849] CPU    6: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225536+09:00 localhost kernel: [11689.753329] CPU    7: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225538+09:00 localhost kernel: [11689.753808] CPU    8: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225540+09:00 localhost kernel: [11689.754291] CPU    9: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225543+09:00 localhost kernel: [11689.754770] CPU   10: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225545+09:00 localhost kernel: [11689.755250] CPU   11: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225548+09:00 localhost kernel: [11689.755731] CPU   12: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225550+09:00 localhost kernel: [11689.756214] CPU   13: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225552+09:00 localhost kernel: [11689.756692] CPU   14: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225554+09:00 localhost kernel: [11689.757174] CPU   15: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225556+09:00 localhost kernel: [11689.757652] CPU   16: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225558+09:00 localhost kernel: [11689.758129] CPU   17: hi:  186, btch:  31 usd:   5
2015-07-28T19:04:55.225561+09:00 localhost kernel: [11689.758607] CPU   18: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225563+09:00 localhost kernel: [11689.759081] CPU   19: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225570+09:00 localhost kernel: [11689.759559] CPU   20: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225573+09:00 localhost kernel: [11689.760038] CPU   21: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225575+09:00 localhost kernel: [11689.760608] CPU   22: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225578+09:00 localhost kernel: [11689.761082] CPU   23: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225580+09:00 localhost kernel: [11689.761562] CPU   24: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225582+09:00 localhost kernel: [11689.762040] CPU   25: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225584+09:00 localhost kernel: [11689.762519] CPU   26: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225585+09:00 localhost kernel: [11689.762998] CPU   27: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225587+09:00 localhost kernel: [11689.763481] CPU   28: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225589+09:00 localhost kernel: [11689.763961] CPU   29: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225591+09:00 localhost kernel: [11689.764442] CPU   30: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225593+09:00 localhost kernel: [11689.764921] CPU   31: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225595+09:00 localhost kernel: [11689.765402] Node 1 Normal per-cpu:
2015-07-28T19:04:55.225597+09:00 localhost kernel: [11689.765882] CPU    0: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225599+09:00 localhost kernel: [11689.766365] CPU    1: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225601+09:00 localhost kernel: [11689.766839] CPU    2: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225603+09:00 localhost kernel: [11689.767316] CPU    3: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225605+09:00 localhost kernel: [11689.767791] CPU    4: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225607+09:00 localhost kernel: [11689.768268] CPU    5: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225609+09:00 localhost kernel: [11689.768746] CPU    6: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225611+09:00 localhost kernel: [11689.769228] CPU    7: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225614+09:00 localhost kernel: [11689.769708] CPU    8: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225616+09:00 localhost kernel: [11689.770187] CPU    9: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225618+09:00 localhost kernel: [11689.770665] CPU   10: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225620+09:00 localhost kernel: [11689.771145] CPU   11: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225622+09:00 localhost kernel: [11689.771624] CPU   12: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225625+09:00 localhost kernel: [11689.772105] CPU   13: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225774+09:00 localhost kernel: [11689.772585] CPU   14: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225777+09:00 localhost kernel: [11689.773063] CPU   15: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225780+09:00 localhost kernel: [11689.773545] CPU   16: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225782+09:00 localhost kernel: [11689.774023] CPU   17: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225784+09:00 localhost kernel: [11689.774506] CPU   18: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225786+09:00 localhost kernel: [11689.775078] CPU   19: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225789+09:00 localhost kernel: [11689.775557] CPU   20: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225792+09:00 localhost kernel: [11689.776031] CPU   21: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225794+09:00 localhost kernel: [11689.776511] CPU   22: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225796+09:00 localhost kernel: [11689.776986] CPU   23: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225797+09:00 localhost kernel: [11689.777469] CPU   24: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225799+09:00 localhost kernel: [11689.777948] CPU   25: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225802+09:00 localhost kernel: [11689.778429] CPU   26: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225804+09:00 localhost kernel: [11689.778906] CPU   27: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225806+09:00 localhost kernel: [11689.779386] CPU   28: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225808+09:00 localhost kernel: [11689.779861] CPU   29: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225810+09:00 localhost kernel: [11689.780343] CPU   30: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225813+09:00 localhost kernel: [11689.780823] CPU   31: hi:  186, btch:  31 usd:   0
2015-07-28T19:04:55.225815+09:00 localhost kernel: [11689.781309] active_anon:16209682 inactive_anon:26 isolated_anon:0
2015-07-28T19:04:55.225817+09:00 localhost kernel: [11689.781309]  active_file:124 inactive_file:123 isolated_file:0
2015-07-28T19:04:55.225819+09:00 localhost kernel: [11689.781310]  unevictable:0 dirty:3 writeback:0 unstable:0
2015-07-28T19:04:55.225822+09:00 localhost kernel: [11689.781311]  free:98591 slab_reclaimable:4276 slab_unreclaimable:15061
2015-07-28T19:04:55.225824+09:00 localhost kernel: [11689.781311]  mapped:202 shmem:132 pagetables:32940 bounce:0
2015-07-28T19:04:55.225827+09:00 localhost kernel: [11689.783710] Node 0 DMA free:15740kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
2015-07-28T19:04:55.225830+09:00 localhost kernel: [11689.786576] lowmem_reserve[]: 0 2955 32245 32245
2015-07-28T19:04:55.225832+09:00 localhost kernel: [11689.787078] Node 0 DMA32 free:129148kB min:11992kB low:14988kB high:17988kB active_anon:2270248kB inactive_anon:0kB active_file:76kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3026080kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:1268kB slab_unreclaimable:1140kB kernel_stack:0kB pagetables:4288kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:124 all_unreclaimable? yes
2015-07-28T19:04:55.225835+09:00 localhost kernel: [11689.790032] lowmem_reserve[]: 0 0 29290 29290
2015-07-28T19:04:55.225837+09:00 localhost kernel: [11689.790536] Node 0 Normal free:118632kB min:118892kB low:148612kB high:178336kB active_anon:29973164kB inactive_anon:40kB active_file:0kB inactive_file:108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:232kB slab_reclaimable:8184kB slab_unreclaimable:35756kB kernel_stack:4992kB pagetables:61180kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:168 all_unreclaimable? yes
2015-07-28T19:04:55.225839+09:00 localhost kernel: [11689.793404] lowmem_reserve[]: 0 0 0 0
2015-07-28T19:04:55.225841+09:00 localhost kernel: [11689.793904] Node 1 Normal free:130844kB min:131192kB low:163988kB high:196788kB active_anon:32595316kB inactive_anon:64kB active_file:420kB inactive_file:484kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:0kB dirty:12kB writeback:0kB mapped:804kB shmem:296kB slab_reclaimable:7652kB slab_unreclaimable:23348kB kernel_stack:376kB pagetables:66292kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1426 all_unreclaimable? yes
2015-07-28T19:04:55.225844+09:00 localhost kernel: [11689.796784] lowmem_reserve[]: 0 0 0 0
2015-07-28T19:04:55.225846+09:00 localhost kernel: [11689.797286] Node 0 DMA: 3*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15740kB
2015-07-28T19:04:55.225848+09:00 localhost kernel: [11689.798302] Node 0 DMA32: 348*4kB 323*8kB 294*16kB 230*32kB 189*64kB 145*128kB 92*256kB 47*512kB 26*1024kB 4*2048kB 0*4096kB = 129128kB
2015-07-28T19:04:55.225850+09:00 localhost kernel: [11689.799321] Node 0 Normal: 30009*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 120044kB
2015-07-28T19:04:55.225852+09:00 localhost kernel: [11689.800336] Node 1 Normal: 32983*4kB 11*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132036kB
2015-07-28T19:04:55.225855+09:00 localhost kernel: [11689.801354] 631 total pagecache pages
2015-07-28T19:04:55.225857+09:00 localhost kernel: [11689.801829] 0 pages in swap cache
2015-07-28T19:04:55.225859+09:00 localhost kernel: [11689.802305] Swap cache stats: add 0, delete 0, find 0/0
2015-07-28T19:04:55.225861+09:00 localhost kernel: [11689.802781] Free swap  = 0kB
2015-07-28T19:04:55.225863+09:00 localhost kernel: [11689.803341] Total swap = 0kB
2015-07-28T19:04:55.225864+09:00 localhost kernel: [11689.946223] 16777215 pages RAM
2015-07-28T19:04:55.225867+09:00 localhost kernel: [11689.946724] 295175 pages reserved
2015-07-28T19:04:55.225869+09:00 localhost kernel: [11689.947223] 5173 pages shared
2015-07-28T19:04:55.225871+09:00 localhost kernel: [11689.947721] 16369184 pages non-shared
2015-07-28T19:04:55.225874+09:00 localhost kernel: [11689.948222] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
2015-07-28T19:04:55.225876+09:00 localhost kernel: [11689.949309] [ 1327]     0  1327     2874      317   1     -17         -1000 udevd
2015-07-28T19:04:55.225879+09:00 localhost kernel: [11689.950308] [ 3227]     0  3227    25814       77   0       0             0 lvmetad
2015-07-28T19:04:55.225881+09:00 localhost kernel: [11689.951302] [ 8574]     0  8574     6899       61   5     -17         -1000 auditd
2015-07-28T19:04:55.225883+09:00 localhost kernel: [11689.952296] [ 8594]     0  8594   125317      287   3       0             0 rsyslogd
2015-07-28T19:04:55.225886+09:00 localhost kernel: [11689.953289] [ 8718]     0  8718    40367      243   0       0             0 pbx_exchange
2015-07-28T19:04:55.225888+09:00 localhost kernel: [11689.954284] [ 8730]    81  8730     5358       63   1       0             0 dbus-daemon
2015-07-28T19:04:55.225890+09:00 localhost kernel: [11689.955281] [ 8768]     0  8768    63314    14236  16       0             0 snmpd
2015-07-28T19:04:55.225892+09:00 localhost kernel: [11689.956277] [ 8785]     0  8785    16554      178   0     -17         -1000 sshd
2015-07-28T19:04:55.225894+09:00 localhost kernel: [11689.957265] [ 8796]     0  8796     5429       59   1       0             0 xinetd
2015-07-28T19:04:55.225897+09:00 localhost kernel: [11689.958259] [ 8823]    38  8823     6627      147   0       0             0 ntpd
2015-07-28T19:04:55.225899+09:00 localhost kernel: [11689.959254] [ 8902]     0  8902    20214      226  21       0             0 master
2015-07-28T19:04:55.225901+09:00 localhost kernel: [11689.960325] [ 8912]     0  8912    29216      156  16       0             0 crond
2015-07-28T19:04:55.225903+09:00 localhost kernel: [11689.961316] [ 8914]    89  8914    20277      238   1       0             0 qmgr
2015-07-28T19:04:55.225906+09:00 localhost kernel: [11689.962315] [ 8935]     0  8935     5276       45   1       0             0 atd
2015-07-28T19:04:55.225908+09:00 localhost kernel: [11689.963310] [ 9141]     0  9141   257570     5334   0       0             0 dsm_sa_datamgrd
2015-07-28T19:04:55.225910+09:00 localhost kernel: [11689.964305] [ 9334]     0  9334    73207      203  17       0             0 dsm_sa_eventmgr
2015-07-28T19:04:55.225913+09:00 localhost kernel: [11689.977183] [ 9347]     0  9347   125807     2198  20       0             0 dsm_sa_snmpd
2015-07-28T19:04:55.225915+09:00 localhost kernel: [11689.978202] [ 9381]     0  9381    33145      113   3       0             0 dsm_om_connsvcd
2015-07-28T19:04:55.225929+09:00 localhost kernel: [11689.979228] [ 9382]     0  9382   889850    61373  19       0             0 dsm_om_connsvcd
2015-07-28T19:04:55.225931+09:00 localhost kernel: [11689.980218] [ 9414]     0  9414   189407     5176   0       0             0 dsm_sa_datamgrd
2015-07-28T19:04:55.225934+09:00 localhost kernel: [11689.981215] [ 9435]     0  9435   159830     1217   1       0             0 dsm_om_shrsvcd
2015-07-28T19:04:55.225936+09:00 localhost kernel: [11689.982216] [10192]     0 10192     1016       20   9       0             0 mingetty
2015-07-28T19:04:55.225938+09:00 localhost kernel: [11689.983201] [10194]     0 10194     1016       21  19       0             0 mingetty
2015-07-28T19:04:55.225940+09:00 localhost kernel: [11689.984202] [10196]     0 10196     1016       21  19       0             0 mingetty
2015-07-28T19:04:55.225942+09:00 localhost kernel: [11689.985201] [10200]     0 10200     1016       21  27       0             0 mingetty
2015-07-28T19:04:55.225944+09:00 localhost kernel: [11689.986180] [10202]     0 10202     1016       22  25       0             0 mingetty
2015-07-28T19:04:55.225946+09:00 localhost kernel: [11689.987176] [13176]  1000 13176     7468     1112  21       0             0 tmux
2015-07-28T19:04:55.225949+09:00 localhost kernel: [11689.988269] [13177]  1000 13177    27187      201   1       0             0 bash
2015-07-28T19:04:55.225951+09:00 localhost kernel: [11689.989268] [13242]     0 13242     1016       21   0       0             0 mingetty
2015-07-28T19:04:55.225961+09:00 localhost kernel: [11689.990262] [15161]     0 15161     2663      109   5     -17         -1000 udevd
2015-07-28T19:04:55.225964+09:00 localhost kernel: [11689.991245] [15179]     0 15179     2661      104   2     -17         -1000 udevd
2015-07-28T19:04:55.225966+09:00 localhost kernel: [11689.992266] [15471]  1000 15471    27133      168   0       0             0 bash
2015-07-28T19:04:55.225968+09:00 localhost kernel: [11689.993246] [15577]    89 15577    20234      218   1       0             0 pickup
2015-07-28T19:04:55.225970+09:00 localhost kernel: [11689.994240] [16291]     0 16291    47166      177  18       0             0 sudo
2015-07-28T19:04:55.225972+09:00 localhost kernel: [11689.995232] [16292]  1000 16292      981       20   3       0             0 tai64n
2015-07-28T19:04:55.225974+09:00 localhost kernel: [11689.996241] [16293]     0 16293    47166      177  22       0             0 sudo
2015-07-28T19:04:55.225978+09:00 localhost kernel: [11689.997230] [16294]  1000 16294     1018       21   1       0             0 tai64nlocal
2015-07-28T19:04:55.225993+09:00 localhost kernel: [11689.998227] [16295]     0 16295 16122385 16118611   7       0             0 btrfs
2015-07-28T19:04:55.225995+09:00 localhost kernel: [11689.999210] [16296]     0 16296    25228       25   5       0             0 tee
2015-07-28T19:04:55.225997+09:00 localhost kernel: [11690.000201] [16297]  1000 16297    27133      162   1       0             0 bash
2015-07-28T19:04:55.225999+09:00 localhost kernel: [11690.001179] [16322]     0 16322    47166      178  19       0             0 sudo
2015-07-28T19:04:55.226015+09:00 localhost kernel: [11690.002167] [16323]     0 16323    28411      433   1       0             0 htop
2015-07-28T19:04:55.226020+09:00 localhost kernel: [11690.003270] [16329]  1000 16329    25240       38   0       0             0 iostat
2015-07-28T19:04:55.226022+09:00 localhost kernel: [11690.004244] [16436]     0 16436    24490      233   0       0             0 sshd
2015-07-28T19:04:55.226024+09:00 localhost kernel: [11690.005229] [16454]  1000 16454    24490      237   2       0             0 sshd
2015-07-28T19:04:55.226026+09:00 localhost kernel: [11690.006230] [16455]  1000 16455    27142      178  16       0             0 bash
2015-07-28T19:04:55.226028+09:00 localhost kernel: [11690.007272] [16481]  1000 16481     5925       82  18       0             0 tmux
2015-07-28T19:04:55.226030+09:00 localhost kernel: [11690.008288] Out of memory: Kill process 16295 (btrfs) score 949 or sacrifice child
2015-07-28T19:04:55.226031+09:00 localhost kernel: [11690.009300] Killed process 16295, UID 0, (btrfs) total-vm:64489540kB, anon-rss:64474408kB, file-rss:36kB
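[Editorial note] The process table above counts `total_vm` and `rss` in 4kB pages, while the final kill message reports the same process in kB; the two are consistent for the `btrfs` process (pid 16295), confirming that `btrfs check` alone had consumed roughly 61GiB of anonymous memory. A small sketch of the conversion, using the figures quoted from the log:

```python
# The OOM process table lists memory in 4 kB pages; the kill message
# reports kB. Converting shows the btrfs check process's footprint.
PAGE_KB = 4

total_vm_pages = 16122385  # btrfs (pid 16295), "total_vm" column above
rss_pages      = 16118611  # btrfs (pid 16295), "rss" column above

print(total_vm_pages * PAGE_KB)  # 64489540 kB, as in "total-vm:64489540kB"
print(rss_pages * PAGE_KB)       # 64474444 kB = anon-rss (64474408) + file-rss (36)
```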



Thread overview: 54+ messages
     [not found] <CAJ3TwYQXqUZiKhYc5rciTmvGX1RLkHnkQb5SSYAJ7AD+kbudag@mail.gmail.com>
2015-07-31  2:34 ` mount btrfs takes 30 minutes, btrfs check runs out of memory Qu Wenruo
2015-07-31  4:10   ` John Ettedgui
2015-08-02  5:44     ` Georgi Georgiev
     [not found]   ` <CAJ3TwYRN+1tJY+paz=qZT0_XP=r9CcTKbBgX_kZRFOWj8vSK=w@mail.gmail.com>
2015-07-31  4:52     ` Qu Wenruo
     [not found]       ` <CAJ3TwYR5g-JhjmGnZUXqLXc7qV1_=AN5_6sj54JQODbtgG9Aag@mail.gmail.com>
2015-07-31  5:40         ` Qu Wenruo
2015-07-31  5:45           ` John Ettedgui
2015-08-01  4:35             ` John Ettedgui
2015-08-01 10:05               ` Russell Coker
2015-08-04  1:39               ` Qu Wenruo
2015-08-04  1:55                 ` John Ettedgui
2015-08-04  2:31                   ` John Ettedgui
2015-08-04  3:01                   ` Qu Wenruo
2015-08-04  4:58                     ` John Ettedgui
2015-08-04  6:47                       ` Duncan
2015-08-04 11:28                       ` Austin S Hemmelgarn
2015-08-04 17:36                         ` John Ettedgui
2015-08-05 11:30                           ` Austin S Hemmelgarn
2015-08-13 22:38                             ` Vincent Olivier
2015-08-13 23:19                               ` Chris Murphy
2015-08-14  0:30                                 ` Duncan
2015-08-14  2:42                                   ` Vincent Olivier
2015-08-18 17:36                                     ` Vincent Olivier
2015-08-14  2:39                                 ` Vincent Olivier
     [not found]                             ` <CAJ3TwYSW+SvbBrh1u_x+c3HTRx03qSR6BoH5cj_VzCXxZYv6EA@mail.gmail.com>
2016-07-15  3:56                               ` Qu Wenruo
     [not found]                                 ` <CAJ3TwYRXwDVVfT0TRRiM9dEw-7TvY8qG=WvMYKczZOv6wkFWAQ@mail.gmail.com>
2016-07-15  5:24                                   ` Qu Wenruo
2016-07-15  6:56                                     ` Kai Krakow
     [not found]                                     ` <CAJ3TwYSTnQfj=qmBLtnmtXQKexMMD4x=9Gk3p3anf4uF+G26kw@mail.gmail.com>
     [not found]                                       ` <CAJ3TwYTnMPVwkrZEU-=Q_Nq+9Bn0vM3z+EFC8RP=RTyaufSoqw@mail.gmail.com>
2016-07-18  1:13                                         ` Qu Wenruo
     [not found]                                           ` <CAJ3TwYRpc_R-wVur0T6+Uy_aPVXTGpvp_ag1Ar9K2HoB0H1ySQ@mail.gmail.com>
2016-07-18  8:41                                             ` Qu Wenruo
     [not found]                                               ` <CAJ3TwYRH8JVkuv2Hu7FYb+BSwKGrq1spx079zwOF_FO1y=9NFA@mail.gmail.com>
2016-07-18  9:07                                                 ` Qu Wenruo
2016-07-18 15:31                                                   ` Duncan
     [not found]                                                   ` <CAJ3TwYS6UTkWf=PNku3RG7hPrXMKz3yhk2WqCRLix4v_VwgrmA@mail.gmail.com>
2016-07-21  8:10                                                     ` Qu Wenruo
     [not found]                                                       ` <CAJ3TwYQ47SVpbO1Pb-TWjhaTCCpMFFmijwTgmV8=7+1_a6_3Ww@mail.gmail.com>
2016-07-21  8:19                                                         ` Qu Wenruo
2016-07-21 15:47                                                           ` Graham Cobb
2017-04-10  0:52                                                             ` Qu Wenruo
2018-02-13 10:21                                                           ` John Ettedgui
2018-02-13 11:04                                                             ` Qu Wenruo
2018-02-13 11:25                                                               ` John Ettedgui
2018-02-13 11:40                                                                 ` Qu Wenruo
2018-02-13 12:06                                                                   ` John Ettedgui
2018-02-13 12:46                                                                     ` Qu Wenruo
2018-02-13 12:52                                                                       ` John Ettedgui
2018-02-13 12:26                                                                   ` Holger Hoffstätte
2018-02-13 12:54                                                                     ` Qu Wenruo
2018-02-13 16:24                                                                       ` Holger Hoffstätte
2018-02-14  0:43                                                                         ` Qu Wenruo
2016-07-15 11:29                                 ` Christian Rohmann
2016-07-16 23:53                                   ` Qu Wenruo
2016-07-18 13:42                                     ` Josef Bacik
2016-07-19  0:35                                       ` Qu Wenruo
2016-07-25 13:01                                       ` David Sterba
2016-07-25 13:38                                         ` Josef Bacik
2015-08-04 14:38                     ` Chris Murphy
2015-07-29  5:46 Georgi Georgiev
2015-07-29  6:19 ` Qu Wenruo
