From: Leszek Dubiel <leszek@dubiel.pl>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Subject: Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
Date: Fri, 3 Jan 2020 10:08:45 +0100
Message-ID: <283b1c8a-9923-4612-0bbf-acb2a731e726@dubiel.pl>
In-Reply-To: <CAJCQCtSr9j8AzLRfguHb8+9n_snxmpXkw0V+LiuDnqqvLVAxKQ@mail.gmail.com>



On 03.01.2020 at 00:22, Chris Murphy wrote:
 > On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote:
 >
 > > This system could have a few million (!) small files.
 > > On reiserfs it takes about 40 minutes to do "find /".
 > > Rsync runs for 6 hours to back up the data.
 >
 >
 > There is a mount option, max_inline=<bytes>, for which the man page
 > gives (default: min(2048, page size)).
 >
 > I've never used it, so in theory the max_inline byte size is 2KiB.
 > However, I have seen substantially larger inline extents than 2KiB
 > when using a nodesize larger than 16KiB at mkfs time.
 >
 > I've wondered whether it makes any difference for the "many small
 > files" case to do more aggressive inlining of extents.
 >
 > I've seen that with a 16KiB leaf size, small files that could be
 > inlined are often instead put into a data block group, taking up a
 > minimum 4KiB block size (on x86_64 anyway). I'm not sure why, but I
 > suspect there just isn't enough room in that leaf to always use inline
 > extents, and yet there is enough room to just reference it as a data
 > block group extent. When using a larger node size, a larger percentage
 > of small files ended up using inline extents. I'd expect this to be
 > quite a bit more efficient, because it eliminates a time-expensive (on
 > HDD anyway) seek.

I will try that option when I create new Btrfs filesystems.
Then I'll report back on the efficiency.
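
If I read the btrfs(5) and mkfs.btrfs(8) man pages right, it would be
roughly the following (the device name, label and the exact
nodesize/max_inline values are just placeholders of mine, not tested yet):

    # nodesize is fixed at mkfs time...
    mkfs.btrfs --nodesize 32768 --label orion3 /dev/sdX1
    # ...while max_inline (bytes) and compression are mount options
    mount -o max_inline=4096,compress=zstd:1 /dev/sdX1 /mnt/orion3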





 > Another optimization, using compress=zstd:1, which is the lowest
 > compression setting. That'll increase the chance a file can use inline
 > extents, in particular with a larger nodesize.
 >
 > And still another optimization, at the expense of much more
 > complexity, is LVM cache with an SSD. You'd have to pick a suitable
 > policy for the workload, but I expect that if the iostat utilizations
 > you see are often near max in normal operation, you'll see improved
 > performance. SSDs can handle far higher IOPS than HDDs. But a lot of
 > this optimization stuff is use-case specific. I'm not even sure what
 > your mean small file size is.
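
The LVM cache idea is interesting too. If I were to try it, I understand
the setup would be roughly like this (hypothetical VG and device names;
my current disks are plain partitions, so it would mean rebuilding on top
of LVM first):

    # add a spare SSD to the volume group that holds the data LV
    pvcreate /dev/nvme0n1
    vgextend vg0 /dev/nvme0n1
    # create a cache pool on the SSD and attach it to the data LV
    lvcreate --type cache-pool -L 200G -n cpool vg0 /dev/nvme0n1
    lvconvert --type cache --cachepool vg0/cpool --cachemode writethrough vg0/data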



There are 11 million files:

root@gamma:/mnt/sdb1# find orion2 > listor2
root@gamma:/mnt/sdb1# ls -lt listor2
-rw-r--r-- 1 root root 988973729 Jan  3 03:09 listor2
root@gamma:/mnt/sdb1# wc -l listor2
11329331 listor2


And df on reiserfs shows:

root@orion:~# df  -h -BM
Filesystem       1M-bl    used      avail Use% Mounted on
/dev/md0        71522M  10353M   61169M  15% /
/dev/md1       905967M 731199M  174768M  81% /root

10353 + 731199 = 741552 MB used in total,

so the average file size is about 741552 MB * 1000000 B/MB / 11000000 files
≈ 67413 bytes per file.
This estimate is rough, because df counts in blocks...

I will measure more precisely with du --apparent-size.
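
For example, something like this should give the exact count and mean
(assuming GNU find; same directory as in the listing above):

    find orion2 -type f -printf '%s\n' \
        | awk '{ s += $1; n++ } END { printf "%d files, %.0f bytes on average\n", n, s / n }'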





 >> # iotop -d30
 >>
 >> Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
 >> Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
 >>    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN IO> COMMAND
 >>   4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs
 >
 > Not so bad for many small-file reads and writes with HDD. I've seen
 > this myself with a single spindle when doing small-file reads and
 > writes.


So it is the small files that slow things down in my case.
OK! Thank you for the expertise.



PS. This morning:

root@wawel:~# btrfs bala stat /
Balance on '/' is running
1227 out of about 1231 chunks balanced (5390 considered),   0% left

So during the night it balanced 600GB + 600GB = 1.2TB of
data from the single profile to raid1 in about 12 hours. That is:

(600 + 600) GB * 1000 MB/GB / (12 hours * 3600 sec/hour)
       = 1200000 / 43200
             ≈ 28 MB/sec




root@wawel:~# btrfs dev usag /
/dev/sda2, ID: 2
    Device size:             5.45TiB
    Device slack:              0.00B
    Data,RAID1:              2.62TiB
    Metadata,RAID1:         22.00GiB
    Unallocated:             2.81TiB

/dev/sdb2, ID: 3
    Device size:             5.45TiB
    Device slack:              0.00B
    Data,RAID1:              2.62TiB
    Metadata,RAID1:         21.00GiB
    System,RAID1:           32.00MiB
    Unallocated:             2.81TiB

/dev/sdc3, ID: 4
    Device size:            10.90TiB
    Device slack:            3.50KiB
    Data,RAID1:              5.24TiB
    Metadata,RAID1:         33.00GiB
    System,RAID1:           32.00MiB
    Unallocated:             5.62TiB





root@wawel:~# iostat 10  -x
Linux 4.19.0-6-amd64 (wawel)     03.01.2020     _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,00    0,00    0,00    0,00    0,00  100,00

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s  wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdb              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdc              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0,04    0,00    0,08    0,00    0,00   99,89

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s  wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdb              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00
sdc              0,00    0,00      0,00      0,00     0,00    0,00   0,00   0,00    0,00    0,00   0,00     0,00     0,00   0,00   0,00






