From: Andreas Dilger <adilger@dilger.ca>
To: Jan Kara <jack@suse.cz>
Cc: Ted Tso <tytso@mit.edu>, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 3/4] ext4: Speedup ext4 orphan inode handling
Date: Thu, 17 Jun 2021 01:44:13 -0600
Message-ID: <A8AFE573-798C-4E07-AD66-A369B3B1CC51@dilger.ca>
In-Reply-To: <20210616105655.5129-4-jack@suse.cz>


On Jun 16, 2021, at 4:56 AM, Jan Kara <jack@suse.cz> wrote:
> 
> Ext4 orphan inode handling is a bottleneck for workloads which heavily
> truncate / unlink small files since it contends on the global
> s_orphan_mutex lock (and generally it's difficult to improve scalability
> of the ondisk linked list of orphaned inodes).
> 
> This patch implements a new way of handling orphan inodes. Instead of
> linking an orphaned inode into a linked list, we store its inode number in
> a new special file which we call "orphan file". Currently we still
> protect the orphan file with a spinlock for simplicity but even in this
> setting we can substantially reduce the length of the critical section
> and thus speedup some workloads.

Is it a single spinlock for the whole file?  Did you consider using
a per-page lock or grouplock?  With a page in the orphan file for each
CPU core, it would basically be lockless.
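
To make the per-block idea concrete, here is a rough userspace sketch
(not code from the patch). It assumes each orphan block carries its own
atomic count of free entries and that a task starts its search at the
block matching its CPU number. NBLOCKS, ENTRIES_PER_BLOCK and the
orphan_* helpers are hypothetical names for illustration only; real
kernel code would use the kernel's atomic_t helpers and also has to
pick the actual slot within the block.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS           4	/* hypothetical: one orphan block per CPU */
#define ENTRIES_PER_BLOCK 2	/* kept tiny so exhaustion is easy to see */

/* Hypothetical per-block state: only a count of free entries. */
struct orphan_block {
	atomic_int free_entries;
};

static struct orphan_block blocks[NBLOCKS];

static void orphan_blocks_init(void)
{
	for (size_t i = 0; i < NBLOCKS; i++)
		atomic_init(&blocks[i].free_entries, ENTRIES_PER_BLOCK);
}

/* Take one free entry from a block with a compare-and-swap loop. */
static bool block_claim(struct orphan_block *b)
{
	int old = atomic_load(&b->free_entries);

	while (old > 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&b->free_entries,
						 &old, old - 1))
			return true;
	}
	return false;
}

/*
 * Start at the block for this CPU and spill over to the neighbours.
 * Returns the index of the block that was used, or -1 if all are full.
 */
static int orphan_claim(unsigned int cpu)
{
	for (unsigned int n = 0; n < NBLOCKS; n++) {
		unsigned int i = (cpu + n) % NBLOCKS;

		if (block_claim(&blocks[i]))
			return (int)i;
	}
	return -1;
}

/* Give an entry back to the block it was claimed from. */
static void orphan_release(int i)
{
	assert(i >= 0 && i < NBLOCKS);
	atomic_fetch_add(&blocks[i].free_entries, 1);
}
```

As long as each CPU finds space in "its" block, no two CPUs touch the
same counter, so the common case has no shared lock at all. Contention
only shows up once a block fills and callers spill into a neighbour.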

> Note that the change is backwards compatible when the filesystem is
> clean - the existence of the orphan file is a compat feature, we set
> another ro-compat feature indicating orphan file needs scanning for
> orphaned inodes when mounting filesystem read-write. This ro-compat
> feature gets cleared on unmount / remount read-only.
> 
> Some performance data from 80 CPU Xeon Server with 512 GB of RAM,
> filesystem located on SSD, average of 5 runs:
> 
> stress-orphan (microbenchmark truncating files byte-by-byte from N
> processes in parallel)
> 
> Threads Time            Time
>        Vanilla         Patched
>  1       1.057200        0.945600
>  2       1.680400        1.331800
>  4       2.547000        1.995000
>  8       7.049400        6.424200
> 16      14.827800       14.937600
> 32      40.948200       33.038200
> 64      87.787400       60.823600
> 128     206.504000      122.941400
> 
> So we can see significant wins all over the board.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> 
> +static int ext4_orphan_file_add(handle_t *handle, struct inode *inode)
> +{
> 	spin_lock(&oi->of_lock);
> +	for (i = 0; i < oi->of_blocks && !oi->of_binfo[i].ob_free_entries; i++);
> +	if (i == oi->of_blocks) {
> +		spin_unlock(&oi->of_lock);
> +		/*
> +		 * For now we don't grow or shrink orphan file. We just use
> +		 * whatever was allocated at mke2fs time. The additional
> +		 * credits we would have to reserve for each orphan inode
> +		 * operation just don't seem worth it.
> +		 */
> +		return -ENOSPC;
> +	}
> +	oi->of_binfo[i].ob_free_entries--;
> +	spin_unlock(&oi->of_lock);

How do we know how large to make the orphan file at mkfs time?  What if it
becomes full during use?  It seems like reserving a fixed number of blocks
will invariably be incorrect for the actual workload on the filesystem.

> @@ -49,6 +95,16 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
> 	ASSERT((S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> 		  S_ISLNK(inode->i_mode)) || inode->i_nlink == 0);
> 
> +	if (sbi->s_orphan_info.of_blocks) {
> +		err = ext4_orphan_file_add(handle, inode);
> +		/*
> +		 * Fallback to normal orphan list of orphan file is
> +		 * out of space
> +		 */
> +		if (err != -ENOSPC)
> +			return err;
> +	}

Could this schedule a task on a workqueue to allocate a few more blocks?
The worker could easily reserve more credits for this action, without
making the common case more expensive.  Even if the new blocks aren't used
during the current mount, they would be available for the next mount (which
presumably would also need additional blocks).

Whether or not it is worth the complexity to make this fully dynamic, it
would at least auto-tune for the workload placed on this filesystem, and
would not initially be worse than the old singly linked list.
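
A rough userspace sketch of the control flow I have in mind (not code
from the patch): the fast path never grows the file itself, it only
flags that a grow is wanted and falls back to the old list, and a
deferred worker adds blocks later. struct orphan_file, GROW_STEP,
ENTRIES_PER_BLOCK and the function names are hypothetical; in the
kernel the flag would be replaced by queueing a work item, and the
worker would start its own transaction with enough credits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define GROW_STEP         2	/* hypothetical: blocks added per grow */
#define ENTRIES_PER_BLOCK 4	/* hypothetical: entries in one block */

/* Hypothetical in-memory summary of the orphan file. */
struct orphan_file {
	unsigned int blocks;
	unsigned int free_entries;
	bool grow_pending;	/* stands in for a queued work item */
};

/* Fast path: take an entry, or request a grow and report -ENOSPC. */
static int orphan_file_add(struct orphan_file *of)
{
	if (of->free_entries == 0) {
		of->grow_pending = true;
		return -ENOSPC;
	}
	of->free_entries--;
	return 0;
}

/* Deferred path: runs outside the caller's transaction. */
static void orphan_grow_worker(struct orphan_file *of)
{
	if (!of->grow_pending)
		return;
	of->blocks += GROW_STEP;
	of->free_entries += GROW_STEP * ENTRIES_PER_BLOCK;
	of->grow_pending = false;
	assert(of->free_entries <= of->blocks * ENTRIES_PER_BLOCK);
}

/*
 * Mirrors the fallback in ext4_orphan_add(): returns 0 if the inode went
 * into the orphan file, 1 if it fell back to the list (counted here in
 * *list_len), or a negative errno for any other failure.
 */
static int orphan_add(struct orphan_file *of, unsigned int *list_len)
{
	int err = orphan_file_add(of);

	if (err != -ENOSPC)
		return err;
	(*list_len)++;
	return 1;
}
```

The caller that hits -ENOSPC pays no more than it does today, since it
still uses the linked list, while later callers find free entries once
the worker has run.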

Cheers, Andreas






