All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc MERLIN <marc@merlins.org>
To: Martin <m_btrfs@ml1.co.uk>, Xavier Nicollet <nicollet@jeru.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: How to debug very very slow file delete? (btrfs on md-raid5)
Date: Tue, 25 Mar 2014 09:41:42 -0700	[thread overview]
Message-ID: <20140325164142.GN12833@merlins.org> (raw)
In-Reply-To: <20140325135756.GA14382@jeru.org> <lgrrtf$el3$1@ger.gmane.org>

On Tue, Mar 25, 2014 at 12:13:50PM +0000, Martin wrote:
> On 25/03/14 01:49, Marc MERLIN wrote:
> > I had a tree with some amount of thousand files (less than 1 million)
> > on top of md raid5.
> > 
> > It took 18H to rm it in 3 tries:

I ran another test after typing the original Email:
gargamel:/mnt/dshelf2/backup/polgara# time du -sh 20140312-feisty/; time find 20140 312-feisty/ | wc -l
17G     20140312-feisty/
real    245m19.491s
user    0m2.108s
sys     1m0.508s

728507 <- number of files
real    11m41.853s <- 11mn to restat them when they should all be in cache ideally
user    0m1.040s
sys     0m4.360s

4 hours to stat 700K files. That's bad...
Even 11mn to restat them just to count them looks bad too.

> > I checked that btrfs scrub is not running.
> > What else can I check from here?
> 
> "noatime" set?

I have relatime
gargamel:/mnt/dshelf2/backup/polgara# df .
Filesystem           1K-blocks       Used  Available Use% Mounted on
/dev/mapper/dshelf2 7814041600 3026472436 4760588292  39% /mnt/dshelf2/backup

gargamel:/mnt/dshelf2/backup/polgara# grep /mnt/dshelf2/backup /proc/mounts
/dev/mapper/dshelf2 /mnt/dshelf2/backup btrfs rw,relatime,compress=lzo,space_cache 0 0
 
> What's your cpu hardware wait time?
 
Sorry, not sure how to get that.
 
> And is not *the 512kByte raid chunk* going to give you horrendous write
> amplification?! For example, rm updates a few bytes in one 4kByte
> metadata block and the system has to then do a read-modify-write on
> 512kBytes...

That's probably not great, but
1) rm -rf should bunch a lot of writes together before they start
hitting the block layer for writes, so I'm not sure that is too much a
problem with the caching layer in between

2) this does not explain 4H to just run du with relatime, which
shouldn't generate any writing, correct?
iostat seems to confirm:

gargamel:~# iostat /dev/md8 1 20
Linux 3.14.0-rc5-amd64-i915-preempt-20140216c (gargamel.svh.merlins.org)        03/25/2014      _x86_64_        (4 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle  
          75.19    0.00   10.13    8.61    0.00    6.08
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
md8              98.00       392.00         0.00        392          0
md8              96.00       384.00         0.00        384          0
md8              83.00       332.00         0.00        332          0
md8             153.00       612.00         0.00        612          0
md8              82.00       328.00         0.00        328          0
md8              55.00       220.00         0.00        220          0
md8              69.00       276.00         0.00        276          0

> Also, the 64MByte chunk bit-intent map will add a lot of head seeks to
> anything you do on that raid. (The map would be better on a separate SSD
> or other separate drive.)

That's true for writing, but not reading, right?
 
> So... That sort of setup is fine for archived data that is effectively
> read-only. You'll see poor performance for small writes/changes.

So I agree with you that the write case can be improved, especially since I also have a layer
of dmcrypt in the middle
gargamel:/mnt/dshelf2/backup/polgara# cryptsetup luksDump /dev/md8
LUKS header information for /dev/md8
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha1
Payload offset: 8192

(I used cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain64)

I'm still not convinced that a lot of file IO don't get all collated in memory 
before hitting disk in bigger blocks, but maybe not.

If I were to recreate this array entirely, what would you use for the raid creation
and cryptsetup?

More generally, before I go through all that trouble (it will likely
take 1 week of data copying back and forth), I'd like to debug why my reads are
so slow first.

Thanks,
Marc

On Tue, Mar 25, 2014 at 02:57:57PM +0100, Xavier Nicollet wrote:
> Le 25 mars 2014 à 12:13, Martin a écrit:
> > On 25/03/14 01:49, Marc MERLIN wrote:
> > > It took 18H to rm it in 3 tries:
> 
> > And is not *the 512kByte raid chunk* going to give you horrendous write
> > amplification?! For example, rm updates a few bytes in one 4kByte
> > metadata block and the system has to then do a read-modify-write on
> > 512kBytes...
> 
> My question would be naive, but would it be possible to have a syscall or something to do 
> a fast "rm -rf" or du ?

Well, that wouldn't hurt either, even if it wouldn't address my underlying problem.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

  parent reply	other threads:[~2014-03-25 16:41 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-07 16:05 btrfs on 3.14rc5 stuck on "btrfs_tree_read_lock sync" Marc MERLIN
2014-04-07 16:10 ` Josef Bacik
2014-04-07 18:51   ` Marc MERLIN
2014-04-07 19:32     ` Chris Mason
2014-04-07 20:00       ` Marc MERLIN
2014-04-09 17:38         ` Marc MERLIN
2014-03-25  1:49           ` How to debug very very slow file delete? Marc MERLIN
2014-03-25 12:13             ` How to debug very very slow file delete? (btrfs on md-raid5) Martin
2014-03-25 13:57               ` Xavier Nicollet
2014-03-25 16:41               ` Marc MERLIN [this message]
2014-04-10 17:07                 ` How to debug very very slow file delete? (btrfs on md-raid5 with many files, 70GB metadata) Marc MERLIN
2014-04-11 14:15                 ` How to debug very very slow file delete? (btrfs on md-raid5) Chris Samuel
2014-04-11 17:23                   ` Marc MERLIN
2014-04-11 18:00                     ` Duncan
2014-04-11 19:15                     ` Roman Mamedov
2014-04-12 20:25             ` very slow btrfs filesystem: any data needed before I wipe it? Marc MERLIN
2014-04-13  4:02               ` Duncan
2014-04-14  1:43                 ` Marc MERLIN
2014-04-14 10:28                   ` Duncan
2014-04-16 22:35                     ` Marc MERLIN
2014-04-13 14:57               ` Marc MERLIN
2014-04-13 16:59                 ` what does your btrfsck look like? Marc MERLIN
2014-04-14  2:15             ` How to debug very very slow file delete? Liu Bo
2014-04-14  2:21               ` Liu Bo
2014-06-09 23:40         ` btrfs balance crash BUG ON fs/btrfs/relocation.c:1062 or RIP build_backref_tree+0x9fc/0xcc4 Marc MERLIN
2014-06-10  0:32           ` Russell Coker
2014-06-10  4:58             ` Marc MERLIN
2014-06-14 16:21           ` Marc MERLIN
2014-06-17 18:29           ` Josef Bacik
2014-06-17 18:55             ` Marc MERLIN
2014-06-18 15:26               ` Josef Bacik
2014-06-18 20:21                 ` Marc MERLIN
2014-06-19 16:12                   ` Josef Bacik
2014-06-19 22:25                     ` Marc MERLIN
2014-06-19 22:50                       ` Josef Bacik
2014-06-20  0:53                         ` Marc MERLIN
2014-06-20 15:40                           ` Josef Bacik
2014-06-25 19:40                             ` Marc MERLIN
2014-06-25 21:05                               ` Josef Bacik
2015-05-05 21:02           ` 3.19.6: __btrfs_free_extent:5987: errno=-2 No such entry, did btrfs check --repair break it? Marc MERLIN
2015-05-06 11:04             ` Duncan
2015-05-06 17:25               ` Chris Murphy
2015-05-07  3:15                 ` Duncan
2015-05-06 17:49               ` Marc MERLIN
  -- strict thread matches above, loose matches on Subject: below --
2014-09-03 17:42 kernel BUG at fs/btrfs/extent-tree.c:7727! with 3.17-rc3 Tomasz Chmielewski
2014-09-03 12:04 ` kernel BUG at fs/btrfs/relocation.c:1065 in 3.14.16 to 3.17-rc3 Olivier Bonvalet
2014-09-29 14:13   ` Liu Bo
     [not found]   ` <20140824000720.GN3875@merlins.org>
     [not found]     ` <20140926214821.GX13219@merlins.org>
     [not found]       ` <20150502141102.GB1809@merlins.org>
     [not found]         ` <20150501210013.GH13624@merlins.org>
2015-04-29 23:21           ` 3.19.3, btrfs send/receive error: failed to clone extents Marc MERLIN
2015-05-02 16:30             ` 3.19.3: check tree block failed + WARNING: device 0 not present on scrub Marc MERLIN
2015-05-02 16:50               ` Christian Dysthe
2015-05-02 17:05                 ` Marc MERLIN
2015-05-02 17:20                   ` Christian Dysthe
2015-05-02 17:29                     ` Marc MERLIN
2015-05-02 18:56                       ` Christian Dysthe
2015-05-05  6:32               ` Marc MERLIN
2015-05-05 19:56                 ` 3.19.6: __btrfs_free_extent:5987: errno=-2 No such entry Marc MERLIN
2014-09-08 18:04 ` kernel BUG at fs/btrfs/extent-tree.c:7727! with 3.17-rc3 Tomasz Chmielewski
2014-10-04  1:19   ` Tomasz Chmielewski
2014-04-02  8:29 [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand Qu Wenruo
2014-04-02  8:29 ` [PATCH 01/27] btrfs-progs: Introduce asciidoc based man page and btrfs man page Qu Wenruo
2014-04-02  8:29 ` [PATCH 02/27] btrfs-progs: Convert man page for btrfs-subvolume Qu Wenruo
2014-04-02  8:29 ` [PATCH 03/27] btrfs-progs: Convert man page for filesystem subcommand Qu Wenruo
2014-04-02  8:29 ` [PATCH 04/27] btrfs-progs: Convert man page for btrfs-balance Qu Wenruo
2014-04-02  8:29 ` [PATCH 05/27] btrfs-progs: Convert man page for btrfs-device subcommand Qu Wenruo
2014-04-02  8:29 ` [PATCH 06/27] btrfs-progs: Convert man page for btrfs-scrub Qu Wenruo
2014-04-02  8:29 ` [PATCH 07/27] btrfs-progs: Convert man page for btrfs-check Qu Wenruo
2014-04-02  8:29 ` [PATCH 08/27] btrfs-progs: Convert man page for btrfs-rescue Qu Wenruo
2014-04-02  8:29 ` [PATCH 09/27] btrfs-progs: Convert man page for btrfs-inspect-internal Qu Wenruo
2014-04-02  8:29 ` [PATCH 10/27] btrfs-progs: Convert man page for btrfs-send Qu Wenruo
2014-04-02  8:29 ` [PATCH 11/27] btrfs-progs: Convert man page for btrfs-receive Qu Wenruo
2014-04-02  8:29 ` [PATCH 12/27] btrfs-progs: Convert man page for btrfs-quota Qu Wenruo
2014-04-02  8:29 ` [PATCH 13/27] btrfs-progs: Convert and enhance the man page of btrfs-qgroup Qu Wenruo
2014-04-02  8:29 ` [PATCH 14/27] btrfs-progs: Convert man page for btrfs-replace Qu Wenruo
2014-04-04 20:29   ` Marc MERLIN
2014-04-08  1:20     ` Qu Wenruo
2014-04-02  8:29 ` [PATCH 15/27] btrfs-progs: Convert man page for btrfs-dedup Qu Wenruo
2014-04-02  8:29 ` [PATCH 16/27] btrfs-progs: Convert man page for btrfsck Qu Wenruo
2014-04-02  8:29 ` [PATCH 17/27] btrfs-progs: Convert man page for btrfs-convert Qu Wenruo
2014-04-02  8:29 ` [PATCH 18/27] btrfs-progs: Convert man page for btrfs-debug-tree Qu Wenruo
2014-04-02  8:29 ` [PATCH 19/27] btrfs-progs: Convert man page for btrfs-find-root Qu Wenruo
2014-04-02  8:29 ` [PATCH 20/27] btrfs-progs: Convert man page for btrfs-image Qu Wenruo
2014-04-02  8:29 ` [PATCH 21/27] btrfs-progs: Convert man page for btrfs-map-logical Qu Wenruo
2014-04-02  8:29 ` [PATCH 22/27] btrfs-progs: Convert man page for btrfs-show-super Qu Wenruo
2014-04-02  8:29 ` [PATCH 23/27] btrfs-progs: Convert man page for btrfstune Qu Wenruo
2014-04-02  8:29 ` [PATCH 24/27] btrfs-progs: Convert man page for btrfs-zero-log Qu Wenruo
2014-04-04 18:46   ` Marc MERLIN
2014-04-05 22:00     ` cwillu
2014-04-05 22:02       ` Marc MERLIN
2014-04-05 22:03         ` Hugo Mills
2014-04-05 22:21           ` Marc MERLIN
2014-04-05 22:05         ` Marc MERLIN
2014-04-05 22:02       ` Hugo Mills
2014-04-08  1:42     ` Qu Wenruo
2014-04-11  5:54       ` Marc MERLIN
2014-04-02  8:29 ` [PATCH 25/27] btrfs-progs: Convert man page for fsck.btrfs Qu Wenruo
2014-04-02  8:29 ` [PATCH 26/27] btrfs-progs: Convert man page for mkfs.btrfs Qu Wenruo
2014-04-02  8:29 ` [PATCH 27/27] btrfs-progs: Switch to the new asciidoc Documentation Qu Wenruo
2014-04-02 13:24 ` [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand Chris Mason
2014-04-02 14:47   ` Marc MERLIN
2014-04-03 20:33   ` Zach Brown
2014-04-02 17:29 ` David Sterba
2014-04-16 17:12 ` David Sterba
2014-04-16 17:16   ` [PATCH] btrfs-progs: doc: link btrfsck to btrfs-check David Sterba
2014-04-17  0:47     ` Qu Wenruo
2014-04-18 14:48       ` David Sterba
2014-04-30 12:14         ` WorMzy Tykashi
2014-05-05 14:57           ` David Sterba
2014-05-08  1:40         ` Qu Wenruo
2014-05-12 14:09           ` David Sterba
2014-06-03  9:38             ` WorMzy Tykashi
2014-06-03 12:19               ` David Sterba
2014-05-17 17:43   ` [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand Hugo Mills
2014-05-17 18:22     ` Hugo Mills
2014-05-18  7:04       ` Qu Wenruo
2014-05-18 12:05         ` Hugo Mills
2014-05-18 16:02           ` Brendan Hide
2014-05-19  0:35           ` Qu Wenruo
2014-05-18  6:51     ` Qu Wenruo
2014-05-18 10:10       ` Hugo Mills
2014-05-19 13:02     ` Chris Mason
2014-05-19 14:01     ` David Sterba
2014-05-19 14:33       ` David Sterba
2014-05-20  0:34         ` Qu Wenruo
2014-05-20 11:08           ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140325164142.GN12833@merlins.org \
    --to=marc@merlins.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=m_btrfs@ml1.co.uk \
    --cc=nicollet@jeru.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.