linux-fsdevel.vger.kernel.org archive mirror
From: Josh Triplett <josh@joshtriplett.org>
To: Andreas Dilger <adilger@dilger.ca>
Cc: David Howells <dhowells@redhat.com>,
	Theodore Ts'o <tytso@mit.edu>,
	"Darrick J. Wong" <djwong@kernel.org>, Chris Mason <clm@fb.com>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	xfs <linux-xfs@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-cachefs@redhat.com,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	NeilBrown <neilb@suse.com>
Subject: Re: How capacious and well-indexed are ext4, xfs and btrfs directories?
Date: Sat, 22 May 2021 22:51:02 -0700
Message-ID: <YKntRtEUoxTEFBOM@localhost>
In-Reply-To: <6E4DE257-4220-4B5B-B3D0-B67C7BC69BB5@dilger.ca>

On Thu, May 20, 2021 at 11:13:28PM -0600, Andreas Dilger wrote:
> On May 17, 2021, at 9:06 AM, David Howells <dhowells@redhat.com> wrote:
> > With filesystems like ext4, xfs and btrfs, what are the limits on directory
> > capacity, and how well are they indexed?
> > 
> > The reason I ask is that inside of cachefiles, I insert fanout directories
> > inside index directories to divide up the space for ext2 to cope with the
> > limits on directory sizes and that it did linear searches (IIRC).
> > 
> > For some applications, I need to be able to cache over 1M entries (render
> > farm) and even a kernel tree has over 100k.
> > 
> > What I'd like to do is remove the fanout directories, so that for each logical
> > "volume"[*] I have a single directory with all the files in it.  But that
> > means sticking massive amounts of entries into a single directory and hoping
> > it (a) isn't too slow and (b) doesn't hit the capacity limit.
> 
> Ext4 can comfortably handle ~12M entries in a single directory, if the
> filenames are not too long (e.g. 32 bytes or so).  With the "large_dir"
> feature (since 4.13, but not enabled by default) a single directory can
> hold around 4B entries, basically all the inodes of a filesystem.

ext4 definitely seems to be able to handle it. I've seen bottlenecks in
other parts of the storage stack, though.

With a normal NVMe drive, a dm-crypt volume containing ext4, and discard
enabled (on both ext4 and dm-crypt), I've seen rm -r of a directory with
a few million entries (each pointing to a ~4-8k file) take the better
part of an hour, almost all of it system time in iowait. It also makes
any other concurrent disk writes hang, even a simple "touch x". Turning off
discard speeds it up by several orders of magnitude.
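
To tell whether a given system is in the same configuration, one option is
to look for the "discard" mount option on ext4 filesystems. The following
is a minimal sketch, not part of the original report: the function name and
the SAMPLE text are hypothetical, and the input is assumed to be in
/proc/mounts format. It only covers the ext4 side; on the dm-crypt side,
discard passthrough would show up separately (as allow_discards in the
device-mapper table).

```python
def ext4_mounts_with_discard(mounts_text):
    """Return mount points of ext4 filesystems mounted with 'discard'.

    mounts_text is expected to look like /proc/mounts:
    device, mount point, fs type, comma-separated options, two numbers.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Compare whole option names so "nodiscard" does not match.
        if fs_type == "ext4" and "discard" in options.split(","):
            hits.append(mount_point)
    return hits


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE = """\
/dev/mapper/cryptroot / ext4 rw,relatime,discard 0 0
/dev/nvme0n1p1 /boot ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

print(ext4_mounts_with_discard(SAMPLE))
```

On a live system the same function can be fed the contents of /proc/mounts
instead of SAMPLE.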

(I don't know if this is a known issue or not, so here are the details
just in case it isn't. Also, if this is already fixed in a newer kernel,
my apologies for the outdated report.)

$ uname -a
Linux s 5.10.0-6-amd64 #1 SMP Debian 5.10.28-1 (2021-04-09) x86_64 GNU/Linux

Reproducer (doesn't take *as* long but still long enough to demonstrate
the issue):
$ mkdir testdir
$ time python3 -c 'for i in range(1000000): open(f"testdir/{i}", "wb").write(b"test data")'
$ time rm -r testdir
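
The same reproducer can be written as a small script that times both phases
separately. This is a sketch under assumptions, not the exact commands used
above: the function name is hypothetical, and shutil.rmtree stands in for
rm -r, so the absolute timings may differ from the shell version.

```python
import os
import shutil
import time


def time_create_and_remove(directory, count):
    """Create `count` small files in `directory`, then remove the tree.

    Returns (create_seconds, remove_seconds) measured with a monotonic clock.
    """
    os.mkdir(directory)

    start = time.monotonic()
    for i in range(count):
        # Same payload as the one-liner: a 9-byte file per entry.
        with open(os.path.join(directory, str(i)), "wb") as f:
            f.write(b"test data")
    create_seconds = time.monotonic() - start

    start = time.monotonic()
    # Unlinks every entry and then the directory itself, like rm -r.
    shutil.rmtree(directory)
    remove_seconds = time.monotonic() - start

    return create_seconds, remove_seconds
```

Calling time_create_and_remove("testdir", 1000000) matches the directory
name and entry count of the shell reproducer.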

dmesg details:

INFO: task rm:379934 blocked for more than 120 seconds.
      Not tainted 5.10.0-6-amd64 #1 Debian 5.10.28-1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:rm              state:D stack:    0 pid:379934 ppid:379461 flags:0x00004000
Call Trace:
 __schedule+0x282/0x870
 schedule+0x46/0xb0
 wait_transaction_locked+0x8a/0xd0 [jbd2]
 ? add_wait_queue_exclusive+0x70/0x70
 add_transaction_credits+0xd6/0x2a0 [jbd2]
 start_this_handle+0xfb/0x520 [jbd2]
 ? jbd2__journal_start+0x8d/0x1e0 [jbd2]
 ? kmem_cache_alloc+0xed/0x1f0
 jbd2__journal_start+0xf7/0x1e0 [jbd2]
 __ext4_journal_start_sb+0xf3/0x110 [ext4]
 ext4_evict_inode+0x24c/0x630 [ext4]
 evict+0xd1/0x1a0
 do_unlinkat+0x1db/0x2f0
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f088f0c3b87
RSP: 002b:00007ffc8d3a27a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
RAX: ffffffffffffffda RBX: 000055ffee46de70 RCX: 00007f088f0c3b87
RDX: 0000000000000000 RSI: 000055ffee46df78 RDI: 0000000000000004
RBP: 000055ffece9daa0 R08: 0000000000000100 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc8d3a2980 R14: 00007ffc8d3a2980 R15: 0000000000000002
INFO: task touch:379982 blocked for more than 120 seconds.
      Not tainted 5.10.0-6-amd64 #1 Debian 5.10.28-1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:touch           state:D stack:    0 pid:379982 ppid:379969 flags:0x00000000
Call Trace:
 __schedule+0x282/0x870
 schedule+0x46/0xb0
 wait_transaction_locked+0x8a/0xd0 [jbd2]
 ? add_wait_queue_exclusive+0x70/0x70
 add_transaction_credits+0xd6/0x2a0 [jbd2]
 ? xas_load+0x5/0x70
 ? find_get_entry+0xd1/0x170
 start_this_handle+0xfb/0x520 [jbd2]
 ? jbd2__journal_start+0x8d/0x1e0 [jbd2]
 ? kmem_cache_alloc+0xed/0x1f0
 jbd2__journal_start+0xf7/0x1e0 [jbd2]
 __ext4_journal_start_sb+0xf3/0x110 [ext4]
 __ext4_new_inode+0x721/0x1670 [ext4]
 ext4_create+0x106/0x1b0 [ext4]
 path_openat+0xde1/0x1080
 do_filp_open+0x88/0x130
 ? getname_flags.part.0+0x29/0x1a0
 ? __check_object_size+0x136/0x150
 do_sys_openat2+0x97/0x150
 __x64_sys_openat+0x54/0x90
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fb2afb8fbe7
RSP: 002b:00007ffee3e287b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007ffee3e28a68 RCX: 00007fb2afb8fbe7
RDX: 0000000000000941 RSI: 00007ffee3e2a340 RDI: 00000000ffffff9c
RBP: 00007ffee3e2a340 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007ffee3e2a340 R14: 0000000000000000 R15: 0000000000000000



