From: Leszek Dubiel <leszek@dubiel.pl>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
Date: Fri, 3 Jan 2020 21:59:46 +0100 [thread overview]
Message-ID: <3e13e4ad-0143-2a92-503b-7bf2fae9a527@dubiel.pl> (raw)
In-Reply-To: <CAJCQCtSpFyfHAWVth5PuvjJtiHPfN52WzspOdsvLrJxMdbcirw@mail.gmail.com>
On 03.01.2020 at 20:02, Chris Murphy wrote:
> On Fri, Jan 3, 2020 at 7:39 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
>>
>> ** number of files by given size
>>
>> root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e
>> '$Data::Dumper::Sortkeys = 1; while (<>) { chomp; my ($byt, $nam) =
>> split /\t/, $_, -1; if (index("$las/", $nam) == 0) { $dir++; } else {
>> $filtot++; for $p (1 .. 99) { if ($byt < 10 ** $p) { $fil{"num of files
>> size <10^$p"}++; last; } } }; $las = $nam; }; print "\ndirectories:
>> $dir\ntotal num of files: $filtot\n", "\nnumber of files grouped by
>> size: \n", Dumper(\%fil) '
>>
>> directories: 1314246
>> total num of files: 10123960
>>
>> number of files grouped by size:
>> $VAR1 = {
>>           'num of files size <10^1' => 3325886,
>>           'num of files size <10^2' => 3709276,
>>           'num of files size <10^3' => 789852,
>>           'num of files size <10^4' => 1085927,
>>           'num of files size <10^5' => 650571,
>>           'num of files size <10^6' => 438717,
>>           'num of files size <10^7' => 116757,
>>           'num of files size <10^8' => 6638,
>>           'num of files size <10^9' => 323,
>>           'num of files size <10^10' => 13
>>         };
>
> Is that really ~7.8 million files at or less than 1KiB?? (totalling
> the first three)
Yes. This is the workflow management system at my company (a bathroom
mirror manufacturer).
The system was built in 2004. Back then ReiserFS was great because it
had "tail packing" and stored small pieces of data together with the
metadata, so disks could read many pieces of data in one read request.
No other filesystem came close to ReiserFS in speed with lots of small
files. That's why I have been testing btrfs for a few years now.
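(Side note for the archive: the Perl one-liner above is hard to read
once line-wrapped, so here is roughly the same counting as a Python
sketch. It omits the original's directory-skipping logic and assumes
the same tab-separated "bytes<TAB>path" listing in the `disk_usage`
file from the quoted command.)

```python
# Rough Python equivalent of the quoted Perl one-liner (sketch only:
# directory-skipping logic omitted; input format assumed to be
# "bytes<TAB>path" lines as produced for the disk_usage file).
from collections import Counter

def size_histogram(lines):
    """Count files per power-of-ten size bucket."""
    buckets = Counter()
    for line in lines:
        byt, _sep, _name = line.rstrip("\n").partition("\t")
        size = int(byt)
        # A size s falls in bucket p where s < 10**p; the number of
        # decimal digits of s is exactly that p (0 counts as 1 digit).
        p = len(str(size))
        buckets[f"num of files size <10^{p}"] += 1
    return buckets

if __name__ == "__main__":
    with open("disk_usage") as fh:  # file name taken from the quoted command
        for key, count in sorted(size_histogram(fh).items()):
            print(key, count)
```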
> Compression may not do much with such small files, and also I'm not
> sure which algorithm would do the best job. They all probably want a
> lot more than 1KiB to become efficient.
>
> But nodesize 64KiB might be a big deal...worth testing.
Yes -- I will have to read about nodesize.
Thank you for the hint.
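Noting for later (a sketch, not something I have run yet): nodesize is
fixed at mkfs time, so trying 64KiB nodes would mean recreating the
filesystem, roughly:

```shell
# nodesize can only be chosen at mkfs time (mkfs.btrfs -n / --nodesize);
# /dev/sdX below is a placeholder, NOT one of my live disks.
mkfs.btrfs --nodesize 65536 /dev/sdX
# verify the new value:
btrfs inspect-internal dump-super /dev/sdX | grep nodesize
```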
Current data:
root@wawel:~# btrfs inspect dump-super /dev/sdb2
superblock: bytenr=65536, device=/dev/sdb2
---------------------------------------------------------
csum_type 0 (crc32c)
csum_size 4
csum 0x0bd7280d [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid 44803366-3981-4ebb-853b-6c991380c8a6
metadata_uuid 44803366-3981-4ebb-853b-6c991380c8a6
label
generation 553943
root 17155128295424
sys_array_size 129
chunk_root_generation 553648
root_level 1
chunk_root 10136287444992
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 23967879376896
bytes_used 5844982415360
sectorsize 4096
nodesize 16384 ---------------<<<
leafsize (deprecated) 16384
stripesize 4096
root_dir 6
num_devices 3
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x163
( MIXED_BACKREF |
DEFAULT_SUBVOL |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA )
cache_generation 553943
uuid_tree_generation 594
dev_item.uuid 5f74e436-f8f9-43ba-95fc-44cdb2bc1838
dev_item.fsid 44803366-3981-4ebb-853b-6c991380c8a6 [match]
dev_item.type 0
dev_item.total_bytes 5992192409600
dev_item.bytes_used 2946381119488
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 3
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0
Thread overview: 32+ messages
2019-12-25 22:35 very slow "btrfs dev delete" 3x6Tb, 7Tb of data Leszek Dubiel
2019-12-26 5:08 ` Qu Wenruo
2019-12-26 13:17 ` Leszek Dubiel
2019-12-26 13:44 ` Remi Gauvin
2019-12-26 14:05 ` Leszek Dubiel
2019-12-26 14:21 ` Remi Gauvin
2019-12-26 15:42 ` Leszek Dubiel
2019-12-26 22:40 ` Chris Murphy
2019-12-26 22:58 ` Leszek Dubiel
2019-12-28 17:04 ` Leszek Dubiel
2019-12-28 20:23 ` Zygo Blaxell
2020-01-02 18:37 ` Leszek Dubiel
2020-01-02 21:57 ` Chris Murphy
2020-01-02 22:39 ` Leszek Dubiel
2020-01-02 23:22 ` Chris Murphy
2020-01-03 9:08 ` Leszek Dubiel
2020-01-03 19:15 ` Chris Murphy
2020-01-03 14:39 ` Leszek Dubiel
2020-01-03 19:02 ` Chris Murphy
2020-01-03 20:59 ` Leszek Dubiel [this message]
2020-01-04 5:38 ` Zygo Blaxell
2020-01-07 18:44 ` write amplification, was: " Chris Murphy
2020-01-07 19:26 ` Holger Hoffstätte
2020-01-07 23:32 ` Zygo Blaxell
2020-01-07 23:53 ` Chris Murphy
2020-01-08 1:41 ` Zygo Blaxell
2020-01-08 2:54 ` Chris Murphy
2020-01-06 11:14 ` Leszek Dubiel
2020-01-07 0:21 ` Chris Murphy
2020-01-07 7:09 ` Leszek Dubiel
2019-12-26 22:15 ` Chris Murphy
2019-12-26 22:48 ` Leszek Dubiel