linux-btrfs.vger.kernel.org archive mirror
From: Marc MERLIN <marc@merlins.org>
To: Martin <m_btrfs@ml1.co.uk>, Xavier Nicollet <nicollet@jeru.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: How to debug very very slow file delete? (btrfs on md-raid5)
Date: Tue, 25 Mar 2014 09:41:42 -0700
Message-ID: <20140325164142.GN12833@merlins.org>
In-Reply-To: <20140325135756.GA14382@jeru.org> <lgrrtf$el3$1@ger.gmane.org>

On Tue, Mar 25, 2014 at 12:13:50PM +0000, Martin wrote:
> On 25/03/14 01:49, Marc MERLIN wrote:
> > I had a tree with several hundred thousand files (fewer than 1 million)
> > on top of md raid5.
> > 
> > It took 18H to rm it in 3 tries:

I ran another test after sending the original email:
gargamel:/mnt/dshelf2/backup/polgara# time du -sh 20140312-feisty/; time find 20140312-feisty/ | wc -l
17G     20140312-feisty/
real    245m19.491s
user    0m2.108s
sys     1m0.508s

728507 <- number of files
real    11m41.853s <- 11 min to stat them again, when ideally they should all be in cache
user    0m1.040s
sys     0m4.360s

4 hours to stat 700K files. That's bad...
Even 11 minutes to stat them again just to count them looks bad.
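Dividing the elapsed times by the file count makes these numbers more concrete. This is a minimal Python sketch; `parse_time` and `ms_per_file` are hypothetical helpers that only understand the `XmY.YYYs` format bash's `time` printed above. Roughly 20 ms per file is in the range of one or two random reads on rotating disks, which would suggest that nearly every stat went to disk (an interpretation, not something measured here).

```python
import re

NFILES = 728507  # from `find ... | wc -l` above

def parse_time(s: str) -> float:
    """Parse a bash `time` value such as '11m41.853s' into seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", s)
    if m is None:
        raise ValueError(f"unrecognised time format: {s!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

def ms_per_file(elapsed: str, nfiles: int = NFILES) -> float:
    """Average wall-clock milliseconds spent per file."""
    return parse_time(elapsed) * 1000.0 / nfiles

if __name__ == "__main__":
    print(f"du -sh (first pass):  {ms_per_file('245m19.491s'):.2f} ms/file")
    print(f"find | wc -l (rerun): {ms_per_file('11m41.853s'):.2f} ms/file")
```

With the figures above this gives about 20 ms per file for the du run and just under 1 ms per file for the find rerun.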

> > I checked that btrfs scrub is not running.
> > What else can I check from here?
> 
> "noatime" set?

I have relatime
gargamel:/mnt/dshelf2/backup/polgara# df .
Filesystem           1K-blocks       Used  Available Use% Mounted on
/dev/mapper/dshelf2 7814041600 3026472436 4760588292  39% /mnt/dshelf2/backup

gargamel:/mnt/dshelf2/backup/polgara# grep /mnt/dshelf2/backup /proc/mounts
/dev/mapper/dshelf2 /mnt/dshelf2/backup btrfs rw,relatime,compress=lzo,space_cache 0 0
 
> What's your cpu hardware wait time?
 
Sorry, not sure how to get that.
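Presumably "CPU wait time" means iowait, the share of time the CPUs sat idle while I/O was outstanding. iostat already prints it on its avg-cpu line (8.61% in the output further down), and `vmstat 1` shows it in the "wa" column. The sketch below computes the same figure from two samples of the aggregate `cpu` line in /proc/stat; it assumes the field order documented in proc(5) with at least the eight fields up to `steal`, which Linux 3.14 provides. The function names are hypothetical.

```python
import os
import time

# Aggregate "cpu" line field order, per proc(5). guest/guest_nice follow
# but are already included in user/nice, so they are not summed here.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line: str) -> dict:
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))

def iowait_percent(before: dict, after: dict) -> float:
    """Share of CPU time spent idle waiting on I/O between two samples."""
    total = sum(after[f] - before[f] for f in FIELDS)
    if total == 0:
        return 0.0
    return 100.0 * (after["iowait"] - before["iowait"]) / total

def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1.0)
    second = parse_cpu_line(read_cpu_line())
    print(f"iowait over the last second: {iowait_percent(first, second):.2f}%")
```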
 
> And is not *the 512kByte raid chunk* going to give you horrendous write
> amplification?! For example, rm updates a few bytes in one 4kByte
> metadata block and the system has to then do a read-modify-write on
> 512kBytes...

That's probably not great, but
1) rm -rf should batch a lot of writes together before they start
hitting the block layer, so with the caching layer in between I'm not
sure that is much of a problem.

2) This does not explain 4 hours just to run du with relatime, which
shouldn't generate any writes, correct?
iostat seems to confirm:

gargamel:~# iostat /dev/md8 1 20
Linux 3.14.0-rc5-amd64-i915-preempt-20140216c (gargamel.svh.merlins.org)        03/25/2014      _x86_64_        (4 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle  
          75.19    0.00   10.13    8.61    0.00    6.08
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
md8              98.00       392.00         0.00        392          0
md8              96.00       384.00         0.00        384          0
md8              83.00       332.00         0.00        332          0
md8             153.00       612.00         0.00        612          0
md8              82.00       328.00         0.00        328          0
md8              55.00       220.00         0.00        220          0
md8              69.00       276.00         0.00        276          0
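In iostat, tps counts transfers (I/O requests) per second, so dividing kB_read/s by tps gives the average request size. A small Python sketch over the samples above (the helper name is hypothetical):

```python
# (tps, kB_read/s) pairs copied from the iostat samples above.
SAMPLES = [
    (98.00, 392.00), (96.00, 384.00), (83.00, 332.00), (153.00, 612.00),
    (82.00, 328.00), (55.00, 220.00), (69.00, 276.00),
]

def avg_request_kb(tps: float, kb_per_s: float) -> float:
    """Average size of one transfer (I/O request) in kB."""
    return kb_per_s / tps if tps else 0.0

if __name__ == "__main__":
    for tps, kbps in SAMPLES:
        print(f"{tps:6.1f} tps {kbps:7.1f} kB/s -> "
              f"{avg_request_kb(tps, kbps):.1f} kB/request")
```

Every sample works out to exactly 4 kB per request at roughly 100 requests per second. That looks like small, probably random, single-block reads, i.e. du being seek-bound rather than bandwidth-bound, though iostat alone can't confirm the access pattern.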

> Also, the 64MByte chunk bit-intent map will add a lot of head seeks to
> anything you do on that raid. (The map would be better on a separate SSD
> or other separate drive.)

That's true for writing, but not reading, right?
 
> So... That sort of setup is fine for archived data that is effectively
> read-only. You'll see poor performance for small writes/changes.

So I agree with you that the write case can be improved, especially
since I also have a layer of dmcrypt in the middle:
gargamel:/mnt/dshelf2/backup/polgara# cryptsetup luksDump /dev/md8
LUKS header information for /dev/md8
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha1
Payload offset: 8192

(I used cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain64)
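The payload offset in luksDump is counted in 512-byte sectors, so 8192 means the encrypted data starts 4 MiB into /dev/md8, a multiple of the 512 KiB chunk Martin mentions. Whether it is also aligned to a full RAID5 stripe depends on the number of member disks, which this thread does not state; `n_disks` below is a hypothetical input, and `payload_alignment` is a hypothetical helper, not part of cryptsetup or mdadm.

```python
SECTOR_BYTES = 512  # luksDump's "Payload offset" is counted in 512-byte sectors

def payload_alignment(payload_sectors: int, chunk_kib: int, n_disks: int) -> dict:
    """Check a LUKS payload offset against md RAID5 chunk and stripe size."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 member disks")
    offset = payload_sectors * SECTOR_BYTES
    chunk = chunk_kib * 1024
    stripe = chunk * (n_disks - 1)  # RAID5: one chunk per stripe holds parity
    return {
        "offset_bytes": offset,
        "chunk_aligned": offset % chunk == 0,
        "stripe_aligned": offset % stripe == 0,
    }

if __name__ == "__main__":
    # 8192 sectors and the 512 KiB chunk come from this thread;
    # the disk counts are hypothetical, since the thread doesn't give them.
    for disks in (4, 5):
        print(disks, payload_alignment(8192, 512, disks))
```

With 5 disks (a 2 MiB data stripe) the 4 MiB offset is stripe-aligned; with 4 disks (a 1.5 MiB data stripe) it is chunk-aligned but not stripe-aligned.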

I'm still not convinced that lots of small file I/O doesn't get collated in
memory and written to disk in bigger blocks, but maybe it doesn't.

If I were to recreate this array entirely, what parameters would you use
for the RAID creation and for cryptsetup?

More generally, before I go through all that trouble (it will likely
take a week of copying data back and forth), I'd like to first debug why
my reads are so slow.

Thanks,
Marc

On Tue, Mar 25, 2014 at 02:57:57PM +0100, Xavier Nicollet wrote:
> On 25 March 2014 at 12:13, Martin wrote:
> > On 25/03/14 01:49, Marc MERLIN wrote:
> > > It took 18H to rm it in 3 tries:
> 
> > And is not *the 512kByte raid chunk* going to give you horrendous write
> > amplification?! For example, rm updates a few bytes in one 4kByte
> > metadata block and the system has to then do a read-modify-write on
> > 512kBytes...
> 
> This may be a naive question, but would it be possible to have a syscall
> or something similar to do a fast "rm -rf" or du?

Well, that wouldn't hurt either, even if it wouldn't address my underlying problem.
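For context on that question: a userspace rm -rf has to walk the tree bottom-up and issue one unlink per file and one rmdir per directory, so the ~728K files above mean at least ~728K unlink calls, each of which updates filesystem metadata. A kernel-side batch call would presumably save syscall overhead but not those metadata updates. The sketch below is a minimal Python illustration of that walk, not coreutils' actual rm implementation; `rm_rf` is a hypothetical name.

```python
import os
import tempfile

def rm_rf(top: str) -> int:
    """Recursively delete `top`; return how many unlink/rmdir calls were made."""
    calls = 0
    # Bottom-up walk: a directory can only be removed once it is empty.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
            calls += 1
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                os.unlink(path)  # symlink to a directory: drop the link only
            else:
                os.rmdir(path)
            calls += 1
    os.rmdir(top)
    return calls + 1

if __name__ == "__main__":
    base = tempfile.mkdtemp()
    tree = os.path.join(base, "tree")
    os.makedirs(os.path.join(tree, "a", "b"))
    for rel in ("f1", os.path.join("a", "f2"), os.path.join("a", "b", "f3")):
        open(os.path.join(tree, rel), "w").close()
    print(rm_rf(tree), "unlink/rmdir calls")  # 3 unlinks + 3 rmdirs
    os.rmdir(base)
```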

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
