* [bcachefs] time of mounting filesystem with high number of dirs
@ 2016-09-07 20:09 Marcin
  2016-09-07 21:12 ` Kent Overstreet
  0 siblings, 1 reply; 16+ messages in thread
From: Marcin @ 2016-09-07 20:09 UTC (permalink / raw)
  To: linux-bcache

Hi!
I'm aware that performance doesn't have high priority yet; it's something 
for the TODO list.
I created a bcachefs on a ~10GB partition, copied some files, and then I 
created a huge number of directories and files. The problem is the total 
time needed to mount the filesystem.
# time mount -t bcache /dev/sde1 /mnt/test/

real    12m20.880s
user    0m0.000s
sys     1m18.270s

I looked at iostat: mounting needs to read 10083588 "kB_read" from disk. 
The device has a size of 10485760kB, so it looks like it reads almost as 
much data as the partition size. Maybe the mount time can be lowered? Maybe 
there could be fewer reads, or the reads could be more sequential?
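(For reference, one way to capture such totals - a sketch, with the device name
only as an example; iostat's kB_read column is cumulative, so the difference
between the two readings is what the mount read:)

# iostat -dk sde                             # note kB_read before mounting
# time mount -t bcache /dev/sde1 /mnt/test/
# iostat -dk sde                             # kB_read again; the difference is the mount's reads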

Additional information:
# time find /mnt/test/ -type d |wc -l
10564259

real    10m30.305s
user    1m6.080s
sys     3m43.770s

# time find /mnt/test/ -type f |wc -l
9145093

real    6m28.812s
user    1m3.940s
sys     3m46.210s

RAM: 1GB
CPU: core2duo - 1.86GHz
hdd: old 15k SAS disk
hdd controller: HP P400 with 512MB battery-backed cache

Marcin


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-07 20:09 [bcachefs] time of mounting filesystem with high number of dirs Marcin
@ 2016-09-07 21:12 ` Kent Overstreet
  2016-09-09  1:56   ` Kent Overstreet
  0 siblings, 1 reply; 16+ messages in thread
From: Kent Overstreet @ 2016-09-07 21:12 UTC (permalink / raw)
  To: Marcin; +Cc: linux-bcache

On Wed, Sep 07, 2016 at 10:09:58PM +0200, Marcin wrote:
> Hi!
> I'm aware that performance doesn't have high priority yet; it's something for
> the TODO list.
> I created a bcachefs on a ~10GB partition, copied some files, and then I
> created a huge number of directories and files. The problem is the total time
> needed to mount the filesystem.
> # time mount -t bcache /dev/sde1 /mnt/test/
> 
> real    12m20.880s
> user    0m0.000s
> sys     1m18.270s

Oh damn, guess it's time to start working on mount time... I knew this was going
to be an issue sooner or later, but 12 minutes is impressive :)

> I looked at iostat: mounting needs to read 10083588 "kB_read" from disk.
> The device has a size of 10485760kB, so it looks like it reads almost as much
> data as the partition size. Maybe the mount time can be lowered? Maybe there
> could be fewer reads, or the reads could be more sequential?

So, right now we're checking i_nlinks on every mount - mainly the dirents
implementation predates the transactional machinery we have now. That's almost
definitely what's taking so long, but I'll send you a patch to confirm later.

It shouldn't take that much work to make the relevant filesystem code
transactional, I'll bump that up on the todo list...


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-07 21:12 ` Kent Overstreet
@ 2016-09-09  1:56   ` Kent Overstreet
  2016-09-09  2:07     ` Christopher James Halse Rogers
  2016-09-09  7:52     ` Marcin Mirosław
  0 siblings, 2 replies; 16+ messages in thread
From: Kent Overstreet @ 2016-09-09  1:56 UTC (permalink / raw)
  To: Marcin; +Cc: linux-bcache

On Wed, Sep 07, 2016 at 01:12:12PM -0800, Kent Overstreet wrote:
> So, right now we're checking i_nlinks on every mount - mainly the dirents
> implementation predates the transactional machinery we have now. That's almost
> definitely what's taking so long, but I'll send you a patch to confirm later.

I just pushed a patch to add printks for the various stages of recovery: use
mount -o verbose_recovery to enable.

How many files does this filesystem have? (df -i will tell you).
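(A minimal sketch of the commands being asked for here, assuming the new patch is
applied; device and mountpoint are just examples:)

# mount -t bcache -o verbose_recovery /dev/sde1 /mnt/test/
# dmesg | tail                   # the per-stage recovery printks land here
# df -i /mnt/test/               # inode count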

As another data point, on my laptop mounting takes half a second - smallish
filesystem though, 47 gb of data and 711k inodes (and it's on an SSD). My
expectation is that mount times with the current code will be good enough as
long as you're using SSDs (or tiering, where tier 0 is SSD) - but I could use
more data points.

Also, increasing the btree node size may help, if you're not already using
maximum-size btree nodes (256k). I may re-add prefetching to metadata scans too;
that should help a good bit on rotating disks...
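(At format time that would look something like this - a sketch using the
--btree_node option shown later in this thread, and only applicable if the bucket
size allows 256k nodes:)

# bcache format --btree_node=256k /dev/sde1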

Mounting taking 12 minutes (and the amount of IO you were seeing) implies to me
that metadata isn't being cached as well as it should be though, which is odd
considering that, outside of journal replay, we aren't doing random access - all
the metadata access is in-order scans. So yeah, definitely want that timing
information...


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-09  1:56   ` Kent Overstreet
@ 2016-09-09  2:07     ` Christopher James Halse Rogers
  2016-09-09  7:52     ` Marcin Mirosław
  1 sibling, 0 replies; 16+ messages in thread
From: Christopher James Halse Rogers @ 2016-09-09  2:07 UTC (permalink / raw)
  To: linux-bcache



On Fri, Sep 9, 2016 at 11:56 AM, Kent Overstreet 
<kent.overstreet@gmail.com> wrote:
> On Wed, Sep 07, 2016 at 01:12:12PM -0800, Kent Overstreet wrote:
>>  So, right now we're checking i_nlinks on every mount - mainly the 
>> dirents
>>  implementation predates the transactional machinery we have now. 
>> That's almost
>>  definitely what's taking so long, but I'll send you a patch to 
>> confirm later.
> 
> I just pushed a patch to add printks for the various stages of 
> recovery: use
> mount -o verbose_recovery to enable.
> 
> How many files does this filesystem have? (df -i will tell you).
> 
> As another data point, on my laptop mounting takes half a second - 
> smallish
> filesystem though, 47 gb of data and 711k inodes (and it's on an 
> SSD). My
> expectation is that mount times with the current code will be good 
> enough as
> long as you're using SSDs (or tiering, where tier 0 is SSD) - but I 
> could use
> more data points.

FWIW, I've got a tier 0 SSD in front of two 3TB HDDs, 1.8M inodes and 
150GB used, and that takes 380ms to mount if systemd-analyze is to be 
trusted.


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-09  1:56   ` Kent Overstreet
  2016-09-09  2:07     ` Christopher James Halse Rogers
@ 2016-09-09  7:52     ` Marcin Mirosław
  2016-09-09  9:00       ` Kent Overstreet
  1 sibling, 1 reply; 16+ messages in thread
From: Marcin Mirosław @ 2016-09-09  7:52 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 09.09.2016 at 03:56, Kent Overstreet wrote:
Hi!

> On Wed, Sep 07, 2016 at 01:12:12PM -0800, Kent Overstreet wrote:
>> So, right now we're checking i_nlinks on every mount - mainly the dirents
>> implementation predates the transactional machinery we have now. That's almost
>> definitely what's taking so long, but I'll send you a patch to confirm later.
> 
> I just pushed a patch to add printks for the various stages of recovery: use
> mount -o verbose_recovery to enable.
> 
> How many files does this filesystem have? (df -i will tell you).

>> # time find /mnt/test/ -type d |wc -l
>> 10564259

>> real    10m30.305s
>> user    1m6.080s
>> sys     3m43.770s

>> # time find /mnt/test/ -type f |wc -l
>> 9145093

>> real    6m28.812s
>> user    1m3.940s
>> sys     3m46.210s




> As another data point, on my laptop mounting takes half a second - smallish
> filesystem though, 47 gb of data and 711k inodes (and it's on an SSD). My
> expectation is that mount times with the current code will be good enough as
> long as you're using SSDs (or tiering, where tier 0 is SSD) - but I could use
> more data points.
> 
> Also, increasing the btree node size may help, if you're not already using
> maximum-size btree nodes (256k). I may re-add prefetching to metadata scans
> too; that should help a good bit on rotating disks...

I'm using the defaults from bcache format; the knobs don't have descriptions
about when I should change some options or when I shouldn't touch them.
On this particular filesystem, btree_node_size=128k according to sysfs.


> Mounting taking 12 minutes (and the amount of IO you were seeing) implies to me
> that metadata isn't being cached as well as it should be though, which is odd
> considering that, outside of journal replay, we aren't doing random access - all
> the metadata access is in-order scans. So yeah, definitely want that timing
> information...
As I mentioned in the email, the box has 1GB of RAM; maybe this is the bottleneck?

Timing from dmesg:

[  375.537762] bcache (sde1): starting mark and sweep:
[  376.220196] bcache (sde1): mark and sweep done
[  376.220489] bcache (sde1): starting journal replay:
[  376.220493] bcache (sde1): journal replay done, 0 keys in 1 entries,
seq 133015
[  376.220496] bcache (sde1): journal replay done
[  376.220498] bcache (sde1): starting fs gc:
[  575.205355] bcache (sde1): fs gc done
[  575.205362] bcache (sde1): starting fsck:
[  822.522269] bcache (sde1): fsck done



Marcin


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-09  7:52     ` Marcin Mirosław
@ 2016-09-09  9:00       ` Kent Overstreet
  2016-09-12 12:59         ` Marcin
  2016-10-18 12:14         ` [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem Marcin Mirosław
  0 siblings, 2 replies; 16+ messages in thread
From: Kent Overstreet @ 2016-09-09  9:00 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Fri, Sep 09, 2016 at 09:52:56AM +0200, Marcin Mirosław wrote:
> I'm using the defaults from bcache format; the knobs don't have descriptions
> about when I should change some options or when I shouldn't touch them.
> On this particular filesystem, btree_node_size=128k according to sysfs.

Yeah, documentation needs work. Next time you format maybe try 256k, I'd like to
know if that helps.

> > Mounting taking 12 minutes (and the amount of IO you were seeing) implies to me
> > that metadata isn't being cached as well as it should be though, which is odd
> > considering that, outside of journal replay, we aren't doing random access - all
> > the metadata access is in-order scans. So yeah, definitely want that timing
> > information...
> As I mentioned in the email, the box has 1GB of RAM; maybe this is the bottleneck?

Yeah, but with fsck off we'll be down to one pass over the dirents btree, so it
won't matter then.

> Timing from dmesg:
> 
> [  375.537762] bcache (sde1): starting mark and sweep:
> [  376.220196] bcache (sde1): mark and sweep done
> [  376.220489] bcache (sde1): starting journal replay:
> [  376.220493] bcache (sde1): journal replay done, 0 keys in 1 entries,
> seq 133015
> [  376.220496] bcache (sde1): journal replay done
> [  376.220498] bcache (sde1): starting fs gc:
> [  575.205355] bcache (sde1): fs gc done
> [  575.205362] bcache (sde1): starting fsck:
> [  822.522269] bcache (sde1): fsck done

Initial mark and sweep (walking the extents btree) is fast - that's really good
to know.

So there's no actual need to run the fsck on every mount - I just left it that
way out of an abundance of caution and because on SSD it's cheap. I just added a
mount option to skip the fsck - use mount -o nofsck. That'll cut another few
minutes off your mount time.
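(Concretely, something like the following, with the device and mountpoint from the
earlier example:)

# mount -t bcache -o nofsck /dev/sde1 /mnt/test/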

> >> # time find /mnt/test/ -type d |wc -l
> >> 10564259
> 
> >> real    10m30.305s
> >> user    1m6.080s
> >> sys     3m43.770s
> 
> >> # time find /mnt/test/ -type f |wc -l
> >> 9145093
> 
> >> real    6m28.812s
> >> user    1m3.940s
> >> sys     3m46.210s

Do you know around how long those find operations take on ext4 with similar
hardware/filesystem contents? I hope we don't just suck at walking directories.


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-09  9:00       ` Kent Overstreet
@ 2016-09-12 12:59         ` Marcin
  2016-09-13  2:35           ` Kent Overstreet
  2016-10-18 12:14         ` [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem Marcin Mirosław
  1 sibling, 1 reply; 16+ messages in thread
From: Marcin @ 2016-09-12 12:59 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 2016-09-09 11:00, Kent Overstreet wrote:
Hi!

> On Fri, Sep 09, 2016 at 09:52:56AM +0200, Marcin Mirosław wrote:
>> I'm using the defaults from bcache format; the knobs don't have descriptions
>> about when I should change some options or when I shouldn't touch them.
>> On this particular filesystem, btree_node_size=128k according to sysfs.
> 
> Yeah, documentation needs work. Next time you format maybe try 256k, 
> I'd like to
> know if that helps.
> 
>> > Mounting taking 12 minutes (and the amount of IO you were seeing) implies to me
>> > that metadata isn't being cached as well as it should be though, which is odd
>> > considering that, outside of journal replay, we aren't doing random access - all
>> > the metadata access is in-order scans. So yeah, definitely want that timing
>> > information...
>> As I mentioned in the email, the box has 1GB of RAM; maybe this is the bottleneck?
> 
> Yeah, but with fsck off we'll be down to one pass over the dirents 
> btree, so it
> won't matter then.


>> Timing from dmesg:
>> 
>> [  375.537762] bcache (sde1): starting mark and sweep:
>> [  376.220196] bcache (sde1): mark and sweep done
>> [  376.220489] bcache (sde1): starting journal replay:
>> [  376.220493] bcache (sde1): journal replay done, 0 keys in 1 
>> entries,
>> seq 133015
>> [  376.220496] bcache (sde1): journal replay done
>> [  376.220498] bcache (sde1): starting fs gc:
>> [  575.205355] bcache (sde1): fs gc done
>> [  575.205362] bcache (sde1): starting fsck:
>> [  822.522269] bcache (sde1): fsck done
> 
> Initial mark and sweep (walking the extents btree) is fast - that's 
> really good
> to know.
> 
> So there's no actual need to run the fsck on every mount - I just left 
> it that
> way out of an abundance of caution and because on SSD it's cheap.  I 
> just added a
> mount option to skip the fsck - use mount -o nofsck. That'll cut 
> another few
> minutes off your mount time.

<zfs mode on> Why do I ever need fsck?;) <zfs mode off>
Maybe, near the final version of bcachefs, fsck should only be run after an
unclean shutdown?
HDDs won't die out in the next year or two; are you focusing especially on
SSD support in bcachefs?

>> >> # time find /mnt/test/ -type d |wc -l
>> >> 10564259
>> 
>> >> real    10m30.305s
>> >> user    1m6.080s
>> >> sys     3m43.770s
>> 
>> >> # time find /mnt/test/ -type f |wc -l
>> >> 9145093
>> 
>> >> real    6m28.812s
>> >> user    1m3.940s
>> >> sys     3m46.210s
> 
> Do you know around how long those find operations take on ext4 with 
> similar
> hardware/filesystem contents? I hope we don't just suck at walking 
> directories.


ext4 with the default 4kB sector size needs at least one hour (I didn't
wait for the end of the test). I think that such a comparison with ext4, or
testing with other btree_node_size values, needs a simple bash script. I'll
wait with that until the OOM fixes are available in bcache-dev. I've often
had allocation failures when I played with bcachefs, ext4 and millions of
directories.
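(Roughly the kind of script I have in mind - just a sketch; the device, mountpoint
and directory counts are examples, and the format commands would likely need
adjusting per filesystem:)

#!/bin/bash
# Compare mount time and directory-walk time for bcachefs vs ext4.
set -e
DEV=/dev/sde1     # test device (example)
MNT=/mnt/test     # mountpoint (example)

make_dirs() {
    # build a large directory tree; eatmydata suppresses fsync
    for x in {0..31}; do
        eatmydata mkdir -p "$MNT/a/$x"/{0..255}/{0..255}
    done
}

run_test() {
    # $1 = fs type for mount, remaining args = format/mkfs command
    fstype=$1; shift
    "$@" "$DEV"
    mount -t "$fstype" "$DEV" "$MNT"
    make_dirs
    umount "$MNT"
    echo "== $fstype: mount time =="
    time mount -t "$fstype" "$DEV" "$MNT"
    echo "== $fstype: find time =="
    time find "$MNT" -type d | wc -l
    umount "$MNT"
}

run_test bcache bcache format
run_test ext4   mkfs.ext4 -F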

I noticed that bcachefs needs a lot less space for keeping info about
inodes. Is metadata compressed? If so, then I should do the comparison of
filesystems with and without compression.

An additional question:
Should https://github.com/koverstreet/linux-bcache/issues be used?

Thanks,
Marcin


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-12 12:59         ` Marcin
@ 2016-09-13  2:35           ` Kent Overstreet
  2016-10-05 12:51             ` Marcin Mirosław
  0 siblings, 1 reply; 16+ messages in thread
From: Kent Overstreet @ 2016-09-13  2:35 UTC (permalink / raw)
  To: Marcin; +Cc: linux-bcache

On Mon, Sep 12, 2016 at 02:59:35PM +0200, Marcin wrote:
> <zfs mode on> Why do I ever need fsck?;) <zfs mode off>

hah :)

> Maybe, near the final version of bcachefs, fsck should only be run after an
> unclean shutdown?

It's not about unclean shutdown at all, bcache/bcachefs has always been written
to not care about clean vs. unclean shutdown. We don't even have any way of
telling whether the last shutdown was clean or unclean because we really don't
care.

But in the final release we will definitely make it run much less often - right
now, the concern is bugs: anything that fsck finds would be the result of a bug,
and if we ever do have that kind of inconsistency I want to know about it sooner
rather than later.

> HDDs won't die out in the next year or two; are you focusing especially on
> SSD support in bcachefs?

I'm definitely paying more attention to SSD performance than HDD, but I do want
to make it perform well on HDDs too.

> >> >> # time find /mnt/test/ -type d |wc -l
> >> >> 10564259
> >> 
> >> >> real    10m30.305s
> >> >> user    1m6.080s
> >> >> sys     3m43.770s
> >> 
> >> >> # time find /mnt/test/ -type f |wc -l
> >> >> 9145093
> >> 
> >> >> real    6m28.812s
> >> >> user    1m3.940s
> >> >> sys     3m46.210s
> > 
> > Do you know around how long those find operations take on ext4 with 
> > similar
> > hardware/filesystem contents? I hope we don't just suck at walking 
> > directories.
> 
> 
> ext4 with the default 4kB sector size needs at least one hour (I didn't
> wait for the end of the test). I think that such a comparison with ext4, or
> testing with other btree_node_size values, needs a simple bash script. I'll
> wait with that until the OOM fixes are available in bcache-dev. I've often
> had allocation failures when I played with bcachefs, ext4 and millions of
> directories.

Oh wow, I guess we're not doing so bad after all :)

Sorry I forgot to reply to your email about the OOMs - those messages are
actually nothing to worry about, we have a mempool we use if that allocation
fails (I'll change it to not print out that message now, just got sidetracked).

> I noticed that bcachefs needs a lot less space for keeping info about
> inodes. Is metadata compressed? If so, then I should do the comparison of
> filesystems with and without compression.

There is a sort of metadata compression (packed bkeys), but it's not something
you can or would want to turn off. That's only for keys though, not values (i.e.
inodes).

For inodes, the reason we're taking less space is that since we're storing
inodes in a btree, they don't have to be fixed size (or aligned to a power of
two) - which means we don't have to size them for everything we might ever want
to stick in an inode, like ext4 does; we can have just the essential fields in
struct bch_inode and add optional fields later if we need to.

> An additional question:
> Should https://github.com/koverstreet/linux-bcache/issues be used?

Yeah... I'm not a huge fan of github's issue tracker but I'm not going to run
one myself, and we do need to start using one.


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-09-13  2:35           ` Kent Overstreet
@ 2016-10-05 12:51             ` Marcin Mirosław
  2016-10-06 13:01               ` Kent Overstreet
  0 siblings, 1 reply; 16+ messages in thread
From: Marcin Mirosław @ 2016-10-05 12:51 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 13.09.2016 at 04:35, Kent Overstreet wrote:
[...]
Hi!
I'm picking up the thread that mentioned the allocation problem.

> Sorry I forgot to reply to your email about the OOMs - those messages are
> actually nothing to worry about, we have a mempool we use if that allocation
> fails (I'll change it to not print out that message now, just got sidetracked).

Today I tried to mount a freshly formatted filesystem (with lz4 enabled at
format time). The mount failed with this message in dmesg:


[16950.860251] mount: page allocation failure: order:8,
mode:0x24040c0(GFP_KERNEL|__GFP_COMP)
[16950.860257] CPU: 3 PID: 22020 Comm: mount Tainted: P           O
4.7.0-bcache+ #9
[16950.860259] Hardware name: .   .  /IP35 Pro XE(Intel P35-ICH9R), BIOS
6.00 PG 09/09/2008
[16950.860262]  0000000000000286 0000000095952302 ffff88012d84fa48
ffffffff812cb37d
[16950.860266]  0000000000000000 0000000000000008 ffff88012d84fad8
ffffffff8113207c
[16950.860270]  024040c095952302 0000000000000040 0000000000000008
ffff88014158d1c0
[16950.860274] Call Trace:
[16950.860281]  [<ffffffff812cb37d>] dump_stack+0x4f/0x72
[16950.860285]  [<ffffffff8113207c>] warn_alloc_failed+0xfc/0x160
[16950.860289]  [<ffffffff81131c71>] ?
__alloc_pages_direct_compact+0x51/0x120
[16950.860291]  [<ffffffff811325cf>] __alloc_pages_nodemask+0x4ef/0xed0
[16950.860295]  [<ffffffff812e096a>] ? find_next_zero_bit+0x1a/0x20
[16950.860298]  [<ffffffff8113328d>] alloc_kmem_pages+0x1d/0x90
[16950.860301]  [<ffffffff8114e3b9>] kmalloc_order_trace+0x29/0xf0
[16950.860306]  [<ffffffff8146b05a>] bch_journal_alloc+0x1aa/0x210
[16950.860309]  [<ffffffff81475e28>] bch_cache_set_alloc+0x928/0xae0
[16950.860312]  [<ffffffff814796c4>] bch_register_cache_set+0x1a4/0x2e0
[16950.860315]  [<ffffffff8117d71c>] ? __kmalloc+0x22c/0x240
[16950.860319]  [<ffffffff8145aafa>] ? bch_mount+0x1ca/0x500
[16950.860321]  [<ffffffff8145ab62>] bch_mount+0x232/0x500
[16950.860323]  [<ffffffff8114c67a>] ? pcpu_alloc+0x37a/0x630
[16950.860327]  [<ffffffff81192d33>] mount_fs+0x33/0x160
[16950.860329]  [<ffffffff8114c950>] ? __alloc_percpu+0x10/0x20
[16950.860333]  [<ffffffff811aeac2>] vfs_kern_mount+0x62/0x100
[16950.860335]  [<ffffffff811b1127>] do_mount+0x247/0xda0
[16950.860338]  [<ffffffff8117fe3c>] ? __kmalloc_track_caller+0x2c/0x240
[16950.860342]  [<ffffffff811476fd>] ? memdup_user+0x3d/0x70
[16950.860345]  [<ffffffff811b1fb0>] SyS_mount+0x90/0xd0
[16950.860349]  [<ffffffff815d0c1f>] entry_SYSCALL_64_fastpath+0x17/0x93
[16950.860351] Mem-Info:
[16950.860356] active_anon:223041 inactive_anon:132461 isolated_anon:0
                active_file:297709 inactive_file:198481 isolated_file:0
                unevictable:1907 dirty:6 writeback:0 unstable:0
                slab_reclaimable:49896 slab_unreclaimable:48418
                mapped:70677 shmem:3674 pagetables:6279 bounce:0
                free:99893 free_pcp:12 free_cma:0
[16950.860365] DMA free:15896kB min:28kB low:40kB high:52kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB
managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[16950.860367] lowmem_reserve[]: 0 3224 4419 4419
[16950.860378] DMA32 free:289904kB min:6196kB low:9496kB high:12796kB
active_anon:682936kB inactive_anon:382700kB active_file:893856kB
inactive_file:551472kB unevictable:6536kB isolated(anon):0kB
isolated(file):0kB present:3390336kB managed:3314508kB mlocked:6536kB
dirty:20kB writeback:0kB mapped:207052kB shmem:10704kB
slab_reclaimable:129808kB slab_unreclaimable:130716kB
kernel_stack:5712kB pagetables:18720kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[16950.860380] lowmem_reserve[]: 0 0 1194 1194
[16950.860390] Normal free:93564kB min:2296kB low:3516kB high:4736kB
active_anon:209228kB inactive_anon:147144kB active_file:296980kB
inactive_file:242452kB unevictable:1092kB isolated(anon):0kB
isolated(file):0kB present:1310720kB managed:1223572kB mlocked:1092kB
dirty:4kB writeback:0kB mapped:75656kB shmem:3992kB
slab_reclaimable:69776kB slab_unreclaimable:62948kB kernel_stack:2656kB
pagetables:6396kB unstable:0kB bounce:0kB free_pcp:160kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[16950.860392] lowmem_reserve[]: 0 0 0 0
[16950.860397] DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB
(U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
[16950.860418] DMA32: 13608*4kB (UMEH) 5636*8kB (UMEH) 8677*16kB (UMEH)
1376*32kB (UMEH) 72*64kB (UMH) 22*128kB (MH) 1*256kB (H) 0*512kB
0*1024kB 0*2048kB 0*4096kB = 290064kB
[16950.860436] Normal: 4269*4kB (UMEH) 1945*8kB (UMEH) 2056*16kB (UMEH)
760*32kB (UMH) 32*64kB (MH) 9*128kB (MH) 0*256kB 1*512kB (H) 0*1024kB
0*2048kB 0*4096kB = 93564kB
[16950.860457] Node 0 hugepages_total=30 hugepages_free=30
hugepages_surp=0 hugepages_size=2048kB
[16950.860459] 511914 total pagecache pages
[16950.860462] 10528 pages in swap cache
[16950.860464] Swap cache stats: add 329149, delete 318621, find
120080/177708
[16950.860466] Free swap  = 752636kB
[16950.860468] Total swap = 1048572kB
[16950.860469] 1179261 pages RAM
[16950.860471] 0 pages HighMem/MovableOnly
[16950.860472] 40765 pages reserved
[16950.860584] mount: page allocation failure: order:6,
mode:0x24000c0(GFP_KERNEL)
[16950.860588] CPU: 3 PID: 22020 Comm: mount Tainted: P           O
4.7.0-bcache+ #9
[16950.860591] Hardware name: .   .  /IP35 Pro XE(Intel P35-ICH9R), BIOS
6.00 PG 09/09/2008
[16950.860593]  0000000000000286 0000000095952302 ffff88012d84faa0
ffffffff812cb37d
[16950.860597]  0000000000000000 0000000000000006 ffff88012d84fb30
ffffffff8113207c
[16950.860601]  024000c095952302 0000000000000040 0000000000000006
ffff88014158d1c0
[16950.860607] Call Trace:
[16950.860611]  [<ffffffff812cb37d>] dump_stack+0x4f/0x72
[16950.860613]  [<ffffffff8113207c>] warn_alloc_failed+0xfc/0x160
[16950.860616]  [<ffffffff81131c71>] ?
__alloc_pages_direct_compact+0x51/0x120
[16950.860619]  [<ffffffff811325cf>] __alloc_pages_nodemask+0x4ef/0xed0
[16950.860623]  [<ffffffff81132fc2>] __get_free_pages+0x12/0x40
[16950.860625]  [<ffffffff8146b075>] bch_journal_alloc+0x1c5/0x210
[16950.860627]  [<ffffffff81475e28>] bch_cache_set_alloc+0x928/0xae0
[16950.860630]  [<ffffffff814796c4>] bch_register_cache_set+0x1a4/0x2e0
[16950.860633]  [<ffffffff8117d71c>] ? __kmalloc+0x22c/0x240
[16950.860636]  [<ffffffff8145aafa>] ? bch_mount+0x1ca/0x500
[16950.860639]  [<ffffffff8145ab62>] bch_mount+0x232/0x500
[16950.860641]  [<ffffffff8114c67a>] ? pcpu_alloc+0x37a/0x630
[16950.860644]  [<ffffffff81192d33>] mount_fs+0x33/0x160
[16950.860646]  [<ffffffff8114c950>] ? __alloc_percpu+0x10/0x20
[16950.860649]  [<ffffffff811aeac2>] vfs_kern_mount+0x62/0x100
[16950.860651]  [<ffffffff811b1127>] do_mount+0x247/0xda0
[16950.860655]  [<ffffffff8117fe3c>] ? __kmalloc_track_caller+0x2c/0x240
[16950.860658]  [<ffffffff811476fd>] ? memdup_user+0x3d/0x70
[16950.860661]  [<ffffffff811b1fb0>] SyS_mount+0x90/0xd0
[16950.860665]  [<ffffffff815d0c1f>] entry_SYSCALL_64_fastpath+0x17/0x93
[16950.860667] Mem-Info:
[16950.860672] active_anon:222994 inactive_anon:132492 isolated_anon:0
                active_file:297709 inactive_file:198481 isolated_file:0
                unevictable:1907 dirty:6 writeback:0 unstable:0
                slab_reclaimable:49896 slab_unreclaimable:48418
                mapped:70677 shmem:3674 pagetables:6294 bounce:0
                free:99616 free_pcp:0 free_cma:0
[16950.860680] DMA free:15896kB min:28kB low:40kB high:52kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB
managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[16950.860682] lowmem_reserve[]: 0 3224 4419 4419
[16950.860692] DMA32 free:289956kB min:6196kB low:9496kB high:12796kB
active_anon:682936kB inactive_anon:382700kB active_file:893856kB
inactive_file:551472kB unevictable:6536kB isolated(anon):0kB
isolated(file):0kB present:3390336kB managed:3314508kB mlocked:6536kB
dirty:20kB writeback:0kB mapped:207052kB shmem:10704kB
slab_reclaimable:129808kB slab_unreclaimable:130716kB
kernel_stack:5712kB pagetables:18720kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[16950.860694] lowmem_reserve[]: 0 0 1194 1194
[16950.860704] Normal free:92612kB min:2296kB low:3516kB high:4736kB
active_anon:209040kB inactive_anon:147268kB active_file:296980kB
inactive_file:242452kB unevictable:1092kB isolated(anon):0kB
isolated(file):0kB present:1310720kB managed:1223572kB mlocked:1092kB
dirty:4kB writeback:0kB mapped:75656kB shmem:3992kB
slab_reclaimable:69776kB slab_unreclaimable:62948kB kernel_stack:2672kB
pagetables:6456kB unstable:0kB bounce:0kB free_pcp:116kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[16950.860706] lowmem_reserve[]: 0 0 0 0
[16950.860712] DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB
(U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
[16950.860729] DMA32: 13608*4kB (UMEH) 5636*8kB (UMEH) 8677*16kB (UMEH)
1376*32kB (UMEH) 72*64kB (UMH) 22*128kB (MH) 1*256kB (H) 0*512kB
0*1024kB 0*2048kB 0*4096kB = 290064kB
[16950.860746] Normal: 4088*4kB (UMEH) 1945*8kB (UMEH) 2056*16kB (UMEH)
760*32kB (UMH) 32*64kB (MH) 9*128kB (MH) 0*256kB 1*512kB (H) 0*1024kB
0*2048kB 0*4096kB = 92840kB
[16950.860764] Node 0 hugepages_total=30 hugepages_free=30
hugepages_surp=0 hugepages_size=2048kB
[16950.860766] 511914 total pagecache pages
[16950.860768] 10528 pages in swap cache
[16950.860771] Swap cache stats: add 329149, delete 318621, find
120086/177714
[16950.860773] Free swap  = 752636kB
[16950.860774] Total swap = 1048572kB
[16950.860776] 1179261 pages RAM
[16950.860777] 0 pages HighMem/MovableOnly
[16950.860780] 40765 pages reserved
[16950.861354] bcache: bch_open_as_blockdevs() register_cache_set err
cannot allocate memory




Marcin


* Re: [bcachefs] time of mounting filesystem with high number of dirs
  2016-10-05 12:51             ` Marcin Mirosław
@ 2016-10-06 13:01               ` Kent Overstreet
  0 siblings, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2016-10-06 13:01 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Wed, Oct 05, 2016 at 02:51:55PM +0200, Marcin Mirosław wrote:
> On 13.09.2016 at 04:35, Kent Overstreet wrote:
> [...]
> Hi!
> I'm picking up the thread that mentioned the allocation problem.
> 
> > Sorry I forgot to reply to your email about the OOMs - those messages are
> > actually nothing to worry about, we have a mempool we use if that allocation
> > fails (I'll change it to not print out that message now, just got sidetracked).
> 
> Today I tried to mount a freshly formatted filesystem (with lz4 enabled at
> format time). The mount failed with this message in dmesg:

Hey - this is another one I've got a fix for in the on disk format changes
branch :) I've got a patch that splits out the gzip and lz4 compression
workspaces, and allocates the gzip one with vmalloc, and adds feature bits to
the superblock so it doesn't have to allocate them at all if compression isn't
in use.

Hoping to get that stuff out soon...


* Re: [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem
  2016-09-09  9:00       ` Kent Overstreet
  2016-09-12 12:59         ` Marcin
@ 2016-10-18 12:14         ` Marcin Mirosław
  2016-10-18 12:45           ` Kent Overstreet
  1 sibling, 1 reply; 16+ messages in thread
From: Marcin Mirosław @ 2016-10-18 12:14 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 09.09.2016 at 11:00, Kent Overstreet wrote:
> On Fri, Sep 09, 2016 at 09:52:56AM +0200, Marcin Mirosław wrote:
>> I'm using the defaults from bcache format; the knobs don't have descriptions
>> about when I should change some options or when I shouldn't touch them.
>> On this particular filesystem, btree_node_size=128k according to sysfs.
> 
> Yeah, documentation needs work. Next time you format maybe try 256k, I'd like to
> know if that helps.

Hi!

# bcache format --help
bcache format - create a new bcache filesystem on one or more devices
Usage: bcache format [OPTION]... <devices>

Options:
  -b, --block=size
      --btree_node=size       Btree node size, default 256k
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ it's not true

# bcache format  /dev/mapper/system10-bcache
/dev/mapper/system10-bcache contains a bcache filesystem
Proceed anyway? (y,n) y
External UUID:                  1a064a62-fb61-42c8-8f0e-68961ad37d4c
Internal UUID:                  c2802bef-fbc4-414a-9fb0-e071943582c8
Label:
Version:                        6
Block_size:                     512
Btree node size:                128.0K
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


I see another problem, which I noticed because of the long mount time.
I'm creating many dirs:
# for x in {0..31}; do eatmydata \
mkdir -p /mnt/test/a/${x}/{0..255}/{0..255}; done

# find /mnt/test|wc -l
2105378

df -h shows:
/dev/mapper/system10-bcache          9,8G  421M  9,4G   5% /mnt/test

Next I remove all those dirs, umount, and mount again:
[ 6172.131784] bcache (dm-12): starting mark and sweep:
[ 6189.113714] bcache (dm-12): mark and sweep done
[ 6189.113979] bcache (dm-12): starting journal replay:
[ 6189.114201] bcache (dm-12): journal replay done, 129 keys in 88
entries, seq 28579
[ 6189.114214] bcache (dm-12): journal replay done
[ 6189.114214] bcache (dm-12): starting fs gc:
[ 6189.118244] bcache (dm-12): fs gc done
[ 6189.118246] bcache (dm-12): starting fsck:
[ 6189.119220] bcache (dm-12): fsck done

So mount time is still long, even with an empty filesystem.
df shows:
/dev/mapper/system10-bcache  9,8G  421M  9,4G   5% /mnt/test

# find /mnt/test|wc -l
1

It looks like creating and removing dirs doesn't clean up some internal
structures.

Marcin


* Re: [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem
  2016-10-18 12:14         ` [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem Marcin Mirosław
@ 2016-10-18 12:45           ` Kent Overstreet
  2016-10-18 12:51             ` Marcin Mirosław
  0 siblings, 1 reply; 16+ messages in thread
From: Kent Overstreet @ 2016-10-18 12:45 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Tue, Oct 18, 2016 at 02:14:47PM +0200, Marcin Mirosław wrote:
> On 09.09.2016 at 11:00, Kent Overstreet wrote:
> > On Fri, Sep 09, 2016 at 09:52:56AM +0200, Marcin Mirosław wrote:
> >> I'm using the defaults from bcache format; the knobs don't have descriptions
> >> about when I should change some options or when I shouldn't touch them.
> >> On this particular filesystem, btree_node_size=128k according to sysfs.
> > 
> > Yeah, documentation needs work. Next time you format maybe try 256k, I'd like to
> > know if that helps.
> 
> Hi!
> 
> # bcache format --help
> bcache format - create a new bcache filesystem on one or more devices
> Usage: bcache format [OPTION]... <devices>
> 
> Options:
>   -b, --block=size
>       --btree_node=size       Btree node size, default 256k
>                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ it's not true

It is if your bucket size is big enough - btree node size can't be bigger than
bucket size.
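(So to actually get 256k nodes, the format would need a larger bucket size as well -
something like the line below, assuming bcache format takes a --bucket=size option
alongside --btree_node; I'm not certain of the exact flag name:)

# bcache format --bucket=256k --btree_node=256k /dev/mapper/system10-bcache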

> # bcache format  /dev/mapper/system10-bcache
> /dev/mapper/system10-bcache contains a bcache filesystem
> Proceed anyway? (y,n) y
> External UUID:                  1a064a62-fb61-42c8-8f0e-68961ad37d4c
> Internal UUID:                  c2802bef-fbc4-414a-9fb0-e071943582c8
> Label:
> Version:                        6
> Block_size:                     512
> Btree node size:                128.0K
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> 
> I see another problem, which I noticed because of the long mount time.
> I'm creating many dirs:
> # for x in {0..31}; do eatmydata \
> mkdir -p /mnt/test/a/${x}/{0..255}/{0..255}; done
> 
> # find /mnt/test|wc -l
> 2105378
> 
> df -h shows:
> /dev/mapper/system10-bcache          9,8G  421M  9,4G   5% /mnt/test
> 
> Next I remove all those dirs, umount, and mount again:
> [ 6172.131784] bcache (dm-12): starting mark and sweep:
> [ 6189.113714] bcache (dm-12): mark and sweep done
> [ 6189.113979] bcache (dm-12): starting journal replay:
> [ 6189.114201] bcache (dm-12): journal replay done, 129 keys in 88
> entries, seq 28579
> [ 6189.114214] bcache (dm-12): journal replay done
> [ 6189.114214] bcache (dm-12): starting fs gc:
> [ 6189.118244] bcache (dm-12): fs gc done
> [ 6189.118246] bcache (dm-12): starting fsck:
> [ 6189.119220] bcache (dm-12): fsck done
> 
> So mount time is still long, even with an empty filesystem.
> df shows:
> /dev/mapper/system10-bcache  9,8G  421M  9,4G   5% /mnt/test
> 
> # find /mnt/test|wc -l
> 1
> 
> It looks like creating and removing dirs doesn't clean up some internal
> structures.

The issue is that right now btree node coalescing is only run as a batch pass
when mark and sweep GC runs (it has nothing to do with GC, it just runs at the
same time in the current code). At some point we need to come up with a good way
of triggering it as needed.

Try triggering a gc, and then check mount time:

echo 1 > /sys/fs/bcache/<uuid>/internal/trigger_gc


* Re: [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem
  2016-10-18 12:45           ` Kent Overstreet
@ 2016-10-18 12:51             ` Marcin Mirosław
  2016-10-18 13:04               ` Kent Overstreet
  2016-10-18 13:19               ` Kent Overstreet
  0 siblings, 2 replies; 16+ messages in thread
From: Marcin Mirosław @ 2016-10-18 12:51 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 18.10.2016 at 14:45, Kent Overstreet wrote:
> The issue is that right now btree node coalescing is only run as a batch pass
> when mark and sweep GC runs (it has nothing to do with GC, it just runs at the
> same time in the current code). At some point we need to come up with a good way
> of triggering it as needed.
> 
> Try triggering a gc, and then check mount time:
> 
> echo 1 > /sys/fs/bcache/<uuid>/internal/trigger_gc

No change:
[ 8417.101640] bcache (dm-12): starting mark and sweep:
[ 8433.795575] bcache (dm-12): mark and sweep done
[ 8433.795842] bcache (dm-12): starting journal replay:
[ 8433.796064] bcache (dm-12): journal replay done, 129 keys in 90
entries, seq 28581
[ 8433.796075] bcache (dm-12): journal replay done
[ 8433.796076] bcache (dm-12): starting fs gc:
[ 8433.799493] bcache (dm-12): fs gc done
[ 8433.799495] bcache (dm-12): starting fsck:
[ 8433.800613] bcache (dm-12): fsck done


/dev/mapper/system10-bcache  9,8G  421M  9,4G   5% /mnt/test


* Re: [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem
  2016-10-18 12:51             ` Marcin Mirosław
@ 2016-10-18 13:04               ` Kent Overstreet
  2016-10-18 13:13                 ` Marcin Mirosław
  2016-10-18 13:19               ` Kent Overstreet
  1 sibling, 1 reply; 16+ messages in thread
From: Kent Overstreet @ 2016-10-18 13:04 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Tue, Oct 18, 2016 at 02:51:11PM +0200, Marcin Mirosław wrote:
> On 18.10.2016 at 14:45, Kent Overstreet wrote:
> > The issue is that right now btree node coalescing is only run as a batch pass
> > when mark and sweep GC runs (it has nothing to do with GC, it just runs at the
> > same time in the current code). At some point we need to come up with a good way
> > of triggering it as needed.
> > 
> > Try triggering a gc, and then check mount time:
> > 
> > echo 1 > /sys/fs/bcache/<uuid>/internal/trigger_gc
> 
> No change:
> [ 8417.101640] bcache (dm-12): starting mark and sweep:
> [ 8433.795575] bcache (dm-12): mark and sweep done
> [ 8433.795842] bcache (dm-12): starting journal replay:
> [ 8433.796064] bcache (dm-12): journal replay done, 129 keys in 90
> entries, seq 28581
> [ 8433.796075] bcache (dm-12): journal replay done
> [ 8433.796076] bcache (dm-12): starting fs gc:
> [ 8433.799493] bcache (dm-12): fs gc done
> [ 8433.799495] bcache (dm-12): starting fsck:
> [ 8433.800613] bcache (dm-12): fsck done
> 
> 
> /dev/mapper/system10-bcache  9,8G  421M  9,4G   5% /mnt/test

Bleh.

Can you check how many nodes are in each btree, post coalescing?

grep -c ^l /sys/kernel/debug/bcache/<uuid>/*-formats

Coalescing will also just skip running if the allocator doesn't have enough new
nodes ready to go, rather than block on the allocator thread with locks held -
you can try running it a few times, and see if it's making any progress
shrinking the btrees.


* Re: [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem
  2016-10-18 13:04               ` Kent Overstreet
@ 2016-10-18 13:13                 ` Marcin Mirosław
  0 siblings, 0 replies; 16+ messages in thread
From: Marcin Mirosław @ 2016-10-18 13:13 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcache

On 18.10.2016 at 15:04, Kent Overstreet wrote:
> On Tue, Oct 18, 2016 at 02:51:11PM +0200, Marcin Mirosław wrote:
>> On 18.10.2016 at 14:45, Kent Overstreet wrote:
>>> The issue is that right now btree node coalescing is only run as a batch pass
>>> when mark and sweep GC runs (it has nothing to do with GC, it just runs at the
>>> same time in the current code). At some point we need to come up with a good way
>>> of triggering it as needed.
>>>
>>> Try triggering a gc, and then check mount time:
>>>
>>> echo 1 > /sys/fs/bcache/<uuid>/internal/trigger_gc
>>
>> No change:
>> [ 8417.101640] bcache (dm-12): starting mark and sweep:
>> [ 8433.795575] bcache (dm-12): mark and sweep done
>> [ 8433.795842] bcache (dm-12): starting journal replay:
>> [ 8433.796064] bcache (dm-12): journal replay done, 129 keys in 90
>> entries, seq 28581
>> [ 8433.796075] bcache (dm-12): journal replay done
>> [ 8433.796076] bcache (dm-12): starting fs gc:
>> [ 8433.799493] bcache (dm-12): fs gc done
>> [ 8433.799495] bcache (dm-12): starting fsck:
>> [ 8433.800613] bcache (dm-12): fsck done
>>
>>
>> /dev/mapper/system10-bcache  9,8G  421M  9,4G   5% /mnt/test
> 
> Bleh.
> 
> Can you check how many nodes are in each btree, post coalescing?
> 
> grep -c ^l /sys/kernel/debug/bcache/<uuid>/*-formats
> 
> Coalescing will also just skip running if the allocator doesn't have enough new
> nodes ready to go, rather than block on the allocator thread with locks held -
> you can try running it a few times, and see if it's making any progress
> shrinking the btrees.

No change after a few runs; I get constant values:
/sys/kernel/debug/bcache/61fc20db-d626-4f0d-bb33-31a4d998f6df/dirents-formats:1044
/sys/kernel/debug/bcache/61fc20db-d626-4f0d-bb33-31a4d998f6df/extents-formats:1
/sys/kernel/debug/bcache/61fc20db-d626-4f0d-bb33-31a4d998f6df/inodes-formats:2320
/sys/kernel/debug/bcache/61fc20db-d626-4f0d-bb33-31a4d998f6df/xattrs-formats:1


* Re: [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem
  2016-10-18 12:51             ` Marcin Mirosław
  2016-10-18 13:04               ` Kent Overstreet
@ 2016-10-18 13:19               ` Kent Overstreet
  1 sibling, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2016-10-18 13:19 UTC (permalink / raw)
  To: Marcin Mirosław; +Cc: linux-bcache

On Tue, Oct 18, 2016 at 02:51:11PM +0200, Marcin Mirosław wrote:
> On 18.10.2016 at 14:45, Kent Overstreet wrote:
> > The issue is that right now btree node coalescing is only run as a batch pass
> > when mark and sweep GC runs (it has nothing to do with GC, it just runs at the
> > same time in the current code). At some point we need to come up with a good way
> > of triggering it as needed.
> > 
> > Try triggering a gc, and then check mount time:
> > 
> > echo 1 > /sys/fs/bcache/<uuid>/internal/trigger_gc
> 
> No change:
> [ 8417.101640] bcache (dm-12): starting mark and sweep:
> [ 8433.795575] bcache (dm-12): mark and sweep done
> [ 8433.795842] bcache (dm-12): starting journal replay:
> [ 8433.796064] bcache (dm-12): journal replay done, 129 keys in 90
> entries, seq 28581
> [ 8433.796075] bcache (dm-12): journal replay done
> [ 8433.796076] bcache (dm-12): starting fs gc:
> [ 8433.799493] bcache (dm-12): fs gc done
> [ 8433.799495] bcache (dm-12): starting fsck:
> [ 8433.800613] bcache (dm-12): fsck done
> 
> 
> /dev/mapper/system10-bcache  9,8G  421M  9,4G   5% /mnt/test

well that's odd.

I'll work on getting some more information out of coalescing tomorrow.


Thread overview: 16+ messages
2016-09-07 20:09 [bcachefs] time of mounting filesystem with high number of dirs Marcin
2016-09-07 21:12 ` Kent Overstreet
2016-09-09  1:56   ` Kent Overstreet
2016-09-09  2:07     ` Christopher James Halse Rogers
2016-09-09  7:52     ` Marcin Mirosław
2016-09-09  9:00       ` Kent Overstreet
2016-09-12 12:59         ` Marcin
2016-09-13  2:35           ` Kent Overstreet
2016-10-05 12:51             ` Marcin Mirosław
2016-10-06 13:01               ` Kent Overstreet
2016-10-18 12:14         ` [bcachefs] time of mounting filesystem with high number of dirs aka ageing filesystem Marcin Mirosław
2016-10-18 12:45           ` Kent Overstreet
2016-10-18 12:51             ` Marcin Mirosław
2016-10-18 13:04               ` Kent Overstreet
2016-10-18 13:13                 ` Marcin Mirosław
2016-10-18 13:19               ` Kent Overstreet
