From: Leszek Dubiel <leszek@dubiel.pl>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
Date: Fri, 3 Jan 2020 21:59:46 +0100
Message-ID: <3e13e4ad-0143-2a92-503b-7bf2fae9a527@dubiel.pl>
In-Reply-To: <CAJCQCtSpFyfHAWVth5PuvjJtiHPfN52WzspOdsvLrJxMdbcirw@mail.gmail.com>



On 03.01.2020 at 20:02, Chris Murphy wrote:
 > On Fri, Jan 3, 2020 at 7:39 AM Leszek Dubiel <leszek@dubiel.pl> wrote:
 >>
 >> ** number of files by given size
 >>
 >> root@wawel:/mnt/root/orion# cat disk_usage | perl -MData::Dumper -e '
 >>     $Data::Dumper::Sortkeys = 1;
 >>     while (<>) {
 >>         chomp;
 >>         my ($byt, $nam) = split /\t/, $_, -1;
 >>         if (index("$las/", $nam) == 0) { $dir++; }
 >>         else { $filtot++; for $p (1 .. 99) {
 >>             if ($byt < 10 ** $p) { $fil{"num of files size <10^$p"}++; last; } } }
 >>         $las = $nam;
 >>     }
 >>     print "\ndirectories: $dir\ntotal num of files: $filtot\n",
 >>         "\nnumber of files grouped by size: \n", Dumper(\%fil)'
 >>
 >> directories: 1314246
 >> total num of files: 10123960
 >>
 >> number of files grouped by size:
 >> $VAR1 = {
 >>            'num of files size <10^1' => 3325886,
 >>            'num of files size <10^2' => 3709276,
 >>            'num of files size <10^3' => 789852,
 >>            'num of files size <10^4' => 1085927,
 >>            'num of files size <10^5' => 650571,
 >>            'num of files size <10^6' => 438717,
 >>            'num of files size <10^7' => 116757,
 >>            'num of files size <10^8' => 6638,
 >>            'num of files size <10^9' => 323,
 >>            'num of files size <10^10' => 13
 >>          };
 >
 > Is that really ~7.8 million files at or less than 1KiB?? (totalling
 > the first three)

Yes. This is the workflow-management system at my company (a bathroom
mirror manufacturer).
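
Summing the first three buckets confirms it:

    perl -e 'print 3325886 + 3709276 + 789852'    # 7825014

So yes, about 7.8 million files of 1 KB or less.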

The system was built in 2004. Back then ReiserFS was a great fit because
its "tail packing" stored small pieces of data together with the
metadata, so the disk could fetch many of them in a single read request.
No other filesystem came close to ReiserFS in speed with lots of small
files. That's why I have been testing Btrfs for a few years now.
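
Btrfs has a comparable mechanism: files smaller than the max_inline
mount option (2048 bytes by default) are stored inline in the metadata
b-tree instead of in a separate data extent, which gives much the same
effect as tail packing. A rough sketch of how one might verify that on a
scratch device -- the device path /dev/sdX and mount point /mnt/test are
placeholders, and this wipes the device:

    # WARNING: destroys /dev/sdX (placeholder device)
    mkfs.btrfs -f /dev/sdX
    mount -o max_inline=2048 /dev/sdX /mnt/test
    printf 'tiny payload' > /mnt/test/small    # well under max_inline
    sync; umount /mnt/test
    # An inlined file shows an EXTENT_DATA item with "inline extent data":
    btrfs inspect-internal dump-tree -t 5 /dev/sdX | grep -B2 'inline extent'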


 > Compression may not do much with such small files, and also I'm not
 > sure which algorithm would do the best job. They all probably want a
 > lot more than 1KiB to become efficient.
 >
 > But nodesize 64KiB might be a big deal...worth testing.

Yes -- I will have to read up on nodesize.
Thank you for the hint.
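
For what it's worth, nodesize is fixed at mkfs time and cannot be
changed on an existing filesystem, so testing 64 KiB nodes means
recreating the filesystem. A minimal sketch (device path is a
placeholder):

    # --nodesize sets the metadata node size; 64 KiB is the maximum.
    mkfs.btrfs -f --nodesize 65536 /dev/sdX
    btrfs inspect dump-super /dev/sdX | grep nodesize    # expect 65536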


Current data:

root@wawel:~# btrfs inspect dump-super /dev/sdb2
superblock: bytenr=65536, device=/dev/sdb2
---------------------------------------------------------
csum_type        0 (crc32c)
csum_size        4
csum            0x0bd7280d [match]
bytenr            65536
flags            0x1
             ( WRITTEN )
magic            _BHRfS_M [match]
fsid            44803366-3981-4ebb-853b-6c991380c8a6
metadata_uuid        44803366-3981-4ebb-853b-6c991380c8a6
label
generation        553943
root            17155128295424
sys_array_size        129
chunk_root_generation    553648
root_level        1
chunk_root        10136287444992
chunk_root_level    1
log_root        0
log_root_transid    0
log_root_level        0
total_bytes        23967879376896
bytes_used        5844982415360
sectorsize        4096
nodesize        16384 ---------------<<<
leafsize (deprecated)        16384
stripesize        4096
root_dir        6
num_devices        3
compat_flags        0x0
compat_ro_flags        0x0
incompat_flags        0x163
             ( MIXED_BACKREF |
               DEFAULT_SUBVOL |
               BIG_METADATA |
               EXTENDED_IREF |
               SKINNY_METADATA )
cache_generation    553943
uuid_tree_generation    594
dev_item.uuid        5f74e436-f8f9-43ba-95fc-44cdb2bc1838
dev_item.fsid        44803366-3981-4ebb-853b-6c991380c8a6 [match]
dev_item.type        0
dev_item.total_bytes    5992192409600
dev_item.bytes_used    2946381119488
dev_item.io_align    4096
dev_item.io_width    4096
dev_item.sector_size    4096
dev_item.devid        3
dev_item.dev_group    0
dev_item.seek_speed    0
dev_item.bandwidth    0
dev_item.generation    0






