From: Chris Murphy <lists@colorremedies.com>
To: Leszek Dubiel <leszek@dubiel.pl>
Cc: Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Subject: Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data
Date: Thu, 2 Jan 2020 16:22:37 -0700
Message-ID: <CAJCQCtSr9j8AzLRfguHb8+9n_snxmpXkw0V+LiuDnqqvLVAxKQ@mail.gmail.com>
In-Reply-To: <5e6e2ff8-89be-45db-49d3-802de42663ed@dubiel.pl>

On Thu, Jan 2, 2020 at 3:39 PM Leszek Dubiel <leszek@dubiel.pl> wrote:

>  > Almost no reads, all writes, but slow. And rather high write
>  > requests per second, almost double for sdc. And sdc is near its max
>  > utilization, so it might be near its iops limit?
>  >
>  > ~210 rareq-sz = 210KiB is the average read request size for sda
>  > and sdb
>  >
>  > Default mkfs and default mount options? Or other and if so what other?
>  >
>  > Many small files on this file system? Or possibly large files with a
>  > lot of fragmentation?
>
> Default mkfs and default mount options.
>
> This system could have a few million (!) of small files.
> On reiserfs it takes about 40 minutes, to do "find /".
> Rsync runs for 6 hours to backup data.

There is a mount option, max_inline=<bytes>, which the man page says
defaults to min(2048, page size).

I've never set it, so in theory the max inline size here is the
default 2KiB. However, I have seen substantially larger inline extents
than 2KiB when using a nodesize larger than 16KiB at mkfs time.
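
These are illustrative values only, not a recommendation (and /dev/sdX
is a stand-in for the real device):

  # larger nodesize, set at mkfs time (example value)
  mkfs.btrfs -n 32k /dev/sdX

  # allow inline extents up to 4KiB (example value)
  mount -o max_inline=4096 /dev/sdX /mnt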

I've wondered whether it makes any difference for the "many small
files" case to do more aggressive inlining of extents.

I've seen that with the 16KiB leaf size, small files that could be
inlined are often put into a data block group instead, taking up a
minimum 4KiB block (on x86_64 anyway). I'm not sure why, but I suspect
there just isn't enough room in that leaf to always use inline
extents, and yet there is enough room to reference the data as a
regular data block group extent. When using a larger nodesize, a
larger percentage of small files ended up using inline extents. I'd
expect this to be quite a bit more efficient, because it eliminates a
time-expensive seek (on HDD anyway).
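
As a quick way to check whether a given small file got an inline
extent, filefrag should work; an inline extent shows up with an
"inline" flag in the per-extent detail:

  # -v prints per-extent detail, including an "inline" flag
  filefrag -v /mnt/path/to/smallfile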

Another optimization is compress=zstd:1, the lowest zstd compression
level. That will increase the chance a file can use an inline extent,
in particular with a larger nodesize.
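
Something like this, combining both ideas (again, example values):

  mount -o compress=zstd:1,max_inline=4096 /dev/sdX /mnt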

And still another optimization, at the expense of much more
complexity, is LVM cache with an SSD. You'd have to pick a suitable
cache policy for the workload, but if the iostat utilization you're
seeing is often near max in normal operation, I expect you'll see
improved performance. SSDs can handle far higher iops than HDDs. But a
lot of this optimization is use case specific; I'm not even sure
what your mean small file size is.
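
Roughly, the lvmcache setup would look like this (vg, data, and
/dev/sdY are made-up names for your existing volume group, the slow
LV, and the SSD; an untested sketch, see lvmcache(7) for the details):

  # add the SSD to the existing volume group
  vgextend vg /dev/sdY

  # carve out cache data and cache metadata LVs on the SSD
  lvcreate -L 100G -n cache0 vg /dev/sdY
  lvcreate -L 1G -n cache0meta vg /dev/sdY

  # turn them into a cache pool and attach it to the slow LV
  lvconvert --type cache-pool --poolmetadata vg/cache0meta vg/cache0
  lvconvert --type cache --cachepool vg/cache0 vg/data

And for a rough mean file size (GNU find):

  find /path -type f -printf '%s\n' | awk '{ s += $1; n++ } END { if (n) print s/n }'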

> # iotop -d30
>
> Total DISK READ:        34.12 M/s | Total DISK WRITE: 40.36 M/s
> Current DISK READ:      34.12 M/s | Current DISK WRITE:      79.22 M/s
>    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
>   4596 be/4 root       34.12 M/s   37.79 M/s  0.00 % 91.77 % btrfs

Not so bad for many small file reads and writes on HDD. I've seen
this myself on a single spindle when doing small file reads and
writes.


-- 
Chris Murphy
