From: Marc MERLIN <marc@merlins.org>
To: Martin <m_btrfs@ml1.co.uk>, Xavier Nicollet <nicollet@jeru.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: How to debug very very slow file delete? (btrfs on md-raid5)
Date: Tue, 25 Mar 2014 09:41:42 -0700 [thread overview]
Message-ID: <20140325164142.GN12833@merlins.org> (raw)
In-Reply-To: <20140325135756.GA14382@jeru.org> <lgrrtf$el3$1@ger.gmane.org>
On Tue, Mar 25, 2014 at 12:13:50PM +0000, Martin wrote:
> On 25/03/14 01:49, Marc MERLIN wrote:
> > I had a tree with some amount of thousand files (less than 1 million)
> > on top of md raid5.
> >
> > It took 18H to rm it in 3 tries:
I ran another test after typing the original Email:
gargamel:/mnt/dshelf2/backup/polgara# time du -sh 20140312-feisty/; time find 20140 312-feisty/ | wc -l
17G 20140312-feisty/
real 245m19.491s
user 0m2.108s
sys 1m0.508s
728507 <- number of files
real 11m41.853s <- 11mn to restat them when they should all be in cache ideally
user 0m1.040s
sys 0m4.360s
4 hours to stat 700K files. That's bad...
Even 11mn to restat them just to count them looks bad too.
> > I checked that btrfs scrub is not running.
> > What else can I check from here?
>
> "noatime" set?
I have relatime
gargamel:/mnt/dshelf2/backup/polgara# df .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/dshelf2 7814041600 3026472436 4760588292 39% /mnt/dshelf2/backup
gargamel:/mnt/dshelf2/backup/polgara# grep /mnt/dshelf2/backup /proc/mounts
/dev/mapper/dshelf2 /mnt/dshelf2/backup btrfs rw,relatime,compress=lzo,space_cache 0 0
> What's your cpu hardware wait time?
Sorry, not sure how to get that.
> And is not *the 512kByte raid chunk* going to give you horrendous write
> amplification?! For example, rm updates a few bytes in one 4kByte
> metadata block and the system has to then do a read-modify-write on
> 512kBytes...
That's probably not great, but
1) rm -rf should bunch a lot of writes together before they start
hitting the block layer for writes, so I'm not sure that is too much a
problem with the caching layer in between
2) this does not explain 4H to just run du with relatime, which
shouldn't generate any writing, correct?
iostat seems to confirm:
gargamel:~# iostat /dev/md8 1 20
Linux 3.14.0-rc5-amd64-i915-preempt-20140216c (gargamel.svh.merlins.org) 03/25/2014 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
75.19 0.00 10.13 8.61 0.00 6.08
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
md8 98.00 392.00 0.00 392 0
md8 96.00 384.00 0.00 384 0
md8 83.00 332.00 0.00 332 0
md8 153.00 612.00 0.00 612 0
md8 82.00 328.00 0.00 328 0
md8 55.00 220.00 0.00 220 0
md8 69.00 276.00 0.00 276 0
> Also, the 64MByte chunk bit-intent map will add a lot of head seeks to
> anything you do on that raid. (The map would be better on a separate SSD
> or other separate drive.)
That's true for writing, but not reading, right?
> So... That sort of setup is fine for archived data that is effectively
> read-only. You'll see poor performance for small writes/changes.
So I agree with you that the write case can be improved, especially since I also have a layer
of dmcrypt in the middle
gargamel:/mnt/dshelf2/backup/polgara# cryptsetup luksDump /dev/md8
LUKS header information for /dev/md8
Cipher name: aes
Cipher mode: xts-plain64
Hash spec: sha1
Payload offset: 8192
(I used cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain64)
I'm still not convinced that a lot of file IO don't get all collated in memory
before hitting disk in bigger blocks, but maybe not.
If I were to recreate this array entirely, what would you use for the raid creation
and cryptsetup?
More generally, before I go through all that trouble (it will likely
take 1 week of data copying back and forth), I'd like to debug why my reads are
so slow first.
Thanks,
Marc
On Tue, Mar 25, 2014 at 02:57:57PM +0100, Xavier Nicollet wrote:
> Le 25 mars 2014 à 12:13, Martin a écrit:
> > On 25/03/14 01:49, Marc MERLIN wrote:
> > > It took 18H to rm it in 3 tries:
>
> > And is not *the 512kByte raid chunk* going to give you horrendous write
> > amplification?! For example, rm updates a few bytes in one 4kByte
> > metadata block and the system has to then do a read-modify-write on
> > 512kBytes...
>
> My question would be naive, but would it be possible to have a syscall or something to do
> a fast "rm -rf" or du ?
Well, that wouldn't hurt either, even if it wouldn't address my underlying problem.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
next prev parent reply other threads:[~2014-03-25 16:41 UTC|newest]
Thread overview: 124+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-07 16:05 btrfs on 3.14rc5 stuck on "btrfs_tree_read_lock sync" Marc MERLIN
2014-04-07 16:10 ` Josef Bacik
2014-04-07 18:51 ` Marc MERLIN
2014-04-07 19:32 ` Chris Mason
2014-04-07 20:00 ` Marc MERLIN
2014-04-09 17:38 ` Marc MERLIN
2014-03-25 1:49 ` How to debug very very slow file delete? Marc MERLIN
2014-03-25 12:13 ` How to debug very very slow file delete? (btrfs on md-raid5) Martin
2014-03-25 13:57 ` Xavier Nicollet
2014-03-25 16:41 ` Marc MERLIN [this message]
2014-04-10 17:07 ` How to debug very very slow file delete? (btrfs on md-raid5 with many files, 70GB metadata) Marc MERLIN
2014-04-11 14:15 ` How to debug very very slow file delete? (btrfs on md-raid5) Chris Samuel
2014-04-11 17:23 ` Marc MERLIN
2014-04-11 18:00 ` Duncan
2014-04-11 19:15 ` Roman Mamedov
2014-04-12 20:25 ` very slow btrfs filesystem: any data needed before I wipe it? Marc MERLIN
2014-04-13 4:02 ` Duncan
2014-04-14 1:43 ` Marc MERLIN
2014-04-14 10:28 ` Duncan
2014-04-16 22:35 ` Marc MERLIN
2014-04-13 14:57 ` Marc MERLIN
2014-04-13 16:59 ` what does your btrfsck look like? Marc MERLIN
2014-04-14 2:15 ` How to debug very very slow file delete? Liu Bo
2014-04-14 2:21 ` Liu Bo
2014-06-09 23:40 ` btrfs balance crash BUG ON fs/btrfs/relocation.c:1062 or RIP build_backref_tree+0x9fc/0xcc4 Marc MERLIN
2014-06-10 0:32 ` Russell Coker
2014-06-10 4:58 ` Marc MERLIN
2014-06-14 16:21 ` Marc MERLIN
2014-06-17 18:29 ` Josef Bacik
2014-06-17 18:55 ` Marc MERLIN
2014-06-18 15:26 ` Josef Bacik
2014-06-18 20:21 ` Marc MERLIN
2014-06-19 16:12 ` Josef Bacik
2014-06-19 22:25 ` Marc MERLIN
2014-06-19 22:50 ` Josef Bacik
2014-06-20 0:53 ` Marc MERLIN
2014-06-20 15:40 ` Josef Bacik
2014-06-25 19:40 ` Marc MERLIN
2014-06-25 21:05 ` Josef Bacik
2015-05-05 21:02 ` 3.19.6: __btrfs_free_extent:5987: errno=-2 No such entry, did btrfs check --repair break it? Marc MERLIN
2015-05-06 11:04 ` Duncan
2015-05-06 17:25 ` Chris Murphy
2015-05-07 3:15 ` Duncan
2015-05-06 17:49 ` Marc MERLIN
-- strict thread matches above, loose matches on Subject: below --
2014-09-03 17:42 kernel BUG at fs/btrfs/extent-tree.c:7727! with 3.17-rc3 Tomasz Chmielewski
2014-09-03 12:04 ` kernel BUG at fs/btrfs/relocation.c:1065 in 3.14.16 to 3.17-rc3 Olivier Bonvalet
2014-09-29 14:13 ` Liu Bo
[not found] ` <20140824000720.GN3875@merlins.org>
[not found] ` <20140926214821.GX13219@merlins.org>
[not found] ` <20150502141102.GB1809@merlins.org>
[not found] ` <20150501210013.GH13624@merlins.org>
2015-04-29 23:21 ` 3.19.3, btrfs send/receive error: failed to clone extents Marc MERLIN
2015-05-02 16:30 ` 3.19.3: check tree block failed + WARNING: device 0 not present on scrub Marc MERLIN
2015-05-02 16:50 ` Christian Dysthe
2015-05-02 17:05 ` Marc MERLIN
2015-05-02 17:20 ` Christian Dysthe
2015-05-02 17:29 ` Marc MERLIN
2015-05-02 18:56 ` Christian Dysthe
2015-05-05 6:32 ` Marc MERLIN
2015-05-05 19:56 ` 3.19.6: __btrfs_free_extent:5987: errno=-2 No such entry Marc MERLIN
2014-09-08 18:04 ` kernel BUG at fs/btrfs/extent-tree.c:7727! with 3.17-rc3 Tomasz Chmielewski
2014-10-04 1:19 ` Tomasz Chmielewski
2014-04-02 8:29 [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand Qu Wenruo
2014-04-02 8:29 ` [PATCH 01/27] btrfs-progs: Introduce asciidoc based man page and btrfs man page Qu Wenruo
2014-04-02 8:29 ` [PATCH 02/27] btrfs-progs: Convert man page for btrfs-subvolume Qu Wenruo
2014-04-02 8:29 ` [PATCH 03/27] btrfs-progs: Convert man page for filesystem subcommand Qu Wenruo
2014-04-02 8:29 ` [PATCH 04/27] btrfs-progs: Convert man page for btrfs-balance Qu Wenruo
2014-04-02 8:29 ` [PATCH 05/27] btrfs-progs: Convert man page for btrfs-device subcommand Qu Wenruo
2014-04-02 8:29 ` [PATCH 06/27] btrfs-progs: Convert man page for btrfs-scrub Qu Wenruo
2014-04-02 8:29 ` [PATCH 07/27] btrfs-progs: Convert man page for btrfs-check Qu Wenruo
2014-04-02 8:29 ` [PATCH 08/27] btrfs-progs: Convert man page for btrfs-rescue Qu Wenruo
2014-04-02 8:29 ` [PATCH 09/27] btrfs-progs: Convert man page for btrfs-inspect-internal Qu Wenruo
2014-04-02 8:29 ` [PATCH 10/27] btrfs-progs: Convert man page for btrfs-send Qu Wenruo
2014-04-02 8:29 ` [PATCH 11/27] btrfs-progs: Convert man page for btrfs-receive Qu Wenruo
2014-04-02 8:29 ` [PATCH 12/27] btrfs-progs: Convert man page for btrfs-quota Qu Wenruo
2014-04-02 8:29 ` [PATCH 13/27] btrfs-progs: Convert and enhance the man page of btrfs-qgroup Qu Wenruo
2014-04-02 8:29 ` [PATCH 14/27] btrfs-progs: Convert man page for btrfs-replace Qu Wenruo
2014-04-04 20:29 ` Marc MERLIN
2014-04-08 1:20 ` Qu Wenruo
2014-04-02 8:29 ` [PATCH 15/27] btrfs-progs: Convert man page for btrfs-dedup Qu Wenruo
2014-04-02 8:29 ` [PATCH 16/27] btrfs-progs: Convert man page for btrfsck Qu Wenruo
2014-04-02 8:29 ` [PATCH 17/27] btrfs-progs: Convert man page for btrfs-convert Qu Wenruo
2014-04-02 8:29 ` [PATCH 18/27] btrfs-progs: Convert man page for btrfs-debug-tree Qu Wenruo
2014-04-02 8:29 ` [PATCH 19/27] btrfs-progs: Convert man page for btrfs-find-root Qu Wenruo
2014-04-02 8:29 ` [PATCH 20/27] btrfs-progs: Convert man page for btrfs-image Qu Wenruo
2014-04-02 8:29 ` [PATCH 21/27] btrfs-progs: Convert man page for btrfs-map-logical Qu Wenruo
2014-04-02 8:29 ` [PATCH 22/27] btrfs-progs: Convert man page for btrfs-show-super Qu Wenruo
2014-04-02 8:29 ` [PATCH 23/27] btrfs-progs: Convert man page for btrfstune Qu Wenruo
2014-04-02 8:29 ` [PATCH 24/27] btrfs-progs: Convert man page for btrfs-zero-log Qu Wenruo
2014-04-04 18:46 ` Marc MERLIN
2014-04-05 22:00 ` cwillu
2014-04-05 22:02 ` Marc MERLIN
2014-04-05 22:03 ` Hugo Mills
2014-04-05 22:21 ` Marc MERLIN
2014-04-05 22:05 ` Marc MERLIN
2014-04-05 22:02 ` Hugo Mills
2014-04-08 1:42 ` Qu Wenruo
2014-04-11 5:54 ` Marc MERLIN
2014-04-02 8:29 ` [PATCH 25/27] btrfs-progs: Convert man page for fsck.btrfs Qu Wenruo
2014-04-02 8:29 ` [PATCH 26/27] btrfs-progs: Convert man page for mkfs.btrfs Qu Wenruo
2014-04-02 8:29 ` [PATCH 27/27] btrfs-progs: Switch to the new asciidoc Documentation Qu Wenruo
2014-04-02 13:24 ` [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand Chris Mason
2014-04-02 14:47 ` Marc MERLIN
2014-04-03 20:33 ` Zach Brown
2014-04-02 17:29 ` David Sterba
2014-04-16 17:12 ` David Sterba
2014-04-16 17:16 ` [PATCH] btrfs-progs: doc: link btrfsck to btrfs-check David Sterba
2014-04-17 0:47 ` Qu Wenruo
2014-04-18 14:48 ` David Sterba
2014-04-30 12:14 ` WorMzy Tykashi
2014-05-05 14:57 ` David Sterba
2014-05-08 1:40 ` Qu Wenruo
2014-05-12 14:09 ` David Sterba
2014-06-03 9:38 ` WorMzy Tykashi
2014-06-03 12:19 ` David Sterba
2014-05-17 17:43 ` [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand Hugo Mills
2014-05-17 18:22 ` Hugo Mills
2014-05-18 7:04 ` Qu Wenruo
2014-05-18 12:05 ` Hugo Mills
2014-05-18 16:02 ` Brendan Hide
2014-05-19 0:35 ` Qu Wenruo
2014-05-18 6:51 ` Qu Wenruo
2014-05-18 10:10 ` Hugo Mills
2014-05-19 13:02 ` Chris Mason
2014-05-19 14:01 ` David Sterba
2014-05-19 14:33 ` David Sterba
2014-05-20 0:34 ` Qu Wenruo
2014-05-20 11:08 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140325164142.GN12833@merlins.org \
--to=marc@merlins.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=m_btrfs@ml1.co.uk \
--cc=nicollet@jeru.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).