From: "Theodore Ts'o" <tytso@mit.edu>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: linux-ext4@vger.kernel.org
Subject: Parallel fsck current status
Date: Fri, 11 Jun 2021 10:40:18 -0400
Message-ID: <YMN10sXgoTR/IPxr@mit.edu>

Parallel FSCK Project current status 
Written by harshads@ and further updated by tytso@

Background
==========

Ext4 fsck has traditionally been a single-threaded program. On large
(and especially fragmented) disks, this single-threaded fsck takes a
long time to complete.

Fortunately, there has been some upstream work on parallelizing fsck
[1]. However, as you can see, that patchset is very long (around 50
patches) and it didn’t completely make it into e2fsck. Ted added
threading support to e2fsprogs [3], which added the following
features:

* The patchset made libext2fs thread-aware
* The patchset added parallel bitmap loading

However, the upstream changes added by Ted only parallelize bitmap
loading; file system checking is still single-threaded. Reviewing and
merging a massive patchset is extremely hard, which is why Ted
suggested on the mailing list [4] that we first add support for
multithreading to libext2fs. This will allow us to add unit tests for
parallelizing libext2fs independently of parallel e2fsck. Once that
goes in, we can rebase the rest of the patches on top of the libext2fs
changes.

Saranya spent some effort cleaning up Wang Shilong's patches, and
there is a working version of those patches, based on a recent
version of e2fsprogs (just before fast_commit support was
integrated), at [2]. However, when we looked more closely at that
patchset, we found a fundamental issue: the changes to e2fsck that
enable multithreaded access to the internal data structures of the
libext2fs library make the patches extremely fragile, since they
expose the internal data abstractions of libext2fs to e2fsck.


Problem Definition
==================

The top-level object holding critical information in e2fsprogs is
called ext2_filsys. Every application that links against libext2fs
allocates, updates, and frees this struct using the libext2fs API [5].
To make any libext2fs application thread-aware, we first need to add
the ability in libext2fs to clone this structure so that multiple
threads can make progress in parallel. Once all the threads finish,
we’ll need the ability to merge these structures back. In other
words, we’ll need to add the following APIs to libext2fs:

/* Clone fs object into dest based on flags */
errcode_t ext2fs_clone_fs(ext2_filsys fs, ext2_filsys *dest, int flags);

/* Try to free the FS object. If this object is a clone, merge it with the parent. */
errcode_t ext2fs_free_fs(ext2_filsys fs);
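
To make the intended clone/merge life cycle concrete, here is a
minimal, self-contained sketch in C. It does not use libext2fs: the
toy_filsys type, the toy_* functions, the TOY_CLONE_BLOCK_MAP flag,
the plain int return codes (instead of errcode_t), and the choice to
merge by OR-ing a block bitmap into the parent are all hypothetical
stand-ins chosen for illustration, not the proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ext2_filsys; NOT the real libext2fs type. */
#define TOY_NBLOCKS 64
#define TOY_CLONE_BLOCK_MAP 0x1 /* hypothetical flag: copy the bitmap */

struct toy_filsys {
	struct toy_filsys *parent;                /* NULL for the original */
	unsigned char block_map[TOY_NBLOCKS / 8]; /* one bit per block */
};
typedef struct toy_filsys *toy_filsys;

/* Clone fs into *dest; the bitmap is copied only if the flag asks for it. */
int toy_clone_fs(toy_filsys fs, toy_filsys *dest, int flags)
{
	toy_filsys child;

	assert(fs != NULL && dest != NULL);
	child = calloc(1, sizeof(*child));
	if (!child)
		return -1;
	child->parent = fs;
	if (flags & TOY_CLONE_BLOCK_MAP)
		memcpy(child->block_map, fs->block_map,
		       sizeof(child->block_map));
	*dest = child;
	return 0;
}

/* Mark a block as in use in this (possibly cloned) object only. */
void toy_mark_block(toy_filsys fs, unsigned int blk)
{
	assert(fs != NULL && blk < TOY_NBLOCKS);
	fs->block_map[blk / 8] |= (unsigned char)(1u << (blk % 8));
}

/* Return 1 if the block is marked in this object, 0 otherwise. */
int toy_test_block(toy_filsys fs, unsigned int blk)
{
	assert(fs != NULL && blk < TOY_NBLOCKS);
	return (fs->block_map[blk / 8] >> (blk % 8)) & 1;
}

/* Free fs. If it is a clone, first merge its bitmap into the parent. */
int toy_free_fs(toy_filsys fs)
{
	size_t i;

	assert(fs != NULL);
	if (fs->parent) {
		/* Assumed merge rule: a block marked by any clone stays marked. */
		for (i = 0; i < sizeof(fs->block_map); i++)
			fs->parent->block_map[i] |= fs->block_map[i];
	}
	free(fs);
	return 0;
}

/* Usage: two clones mark different blocks, then merge back on free. */
int toy_demo(void)
{
	toy_filsys parent = calloc(1, sizeof(*parent));
	toy_filsys a = NULL, b = NULL;
	int ok;

	if (!parent)
		return 0;
	if (toy_clone_fs(parent, &a, TOY_CLONE_BLOCK_MAP) != 0) {
		toy_free_fs(parent);
		return 0;
	}
	if (toy_clone_fs(parent, &b, TOY_CLONE_BLOCK_MAP) != 0) {
		toy_free_fs(a);
		toy_free_fs(parent);
		return 0;
	}
	toy_mark_block(a, 3);  /* work a first thread might do */
	toy_mark_block(b, 40); /* work a second thread might do */
	toy_free_fs(a);        /* merges a into parent */
	toy_free_fs(b);        /* merges b into parent */
	ok = toy_test_block(parent, 3) && toy_test_block(parent, 40);
	toy_free_fs(parent);
	return ok;
}
```

In a real multithreaded checker, each clone would be handed to its own
worker thread, and the merge step in the free path would have to be
serialized (for example, under a lock on the parent). The sketch
leaves that out to keep the clone/merge contract itself visible.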


Saranya was working on this project; the commit [6] is a work in
progress to implement this design. We can either take that code and
modify it, or start from scratch and use that code as a reference.

Outcome and Future Direction
============================

At the end of this project, we’ll have an upstream-ready patchset.
Once these changes are in, the next step would be to drop some
patches from Wang’s original e2fsck patchset [1] and rebase the rest
of the series on top of this new patchset.



REFERENCES
==========

[1] Wang Shilong’s original parallel e2fsck patchset:
	http://patchwork.ozlabs.org/project/linux-ext4/list/?series=169193

[2] Wang Shilong's patches rebased and cleaned up versus a relatively
recent version of e2fsprogs:
	https://github.com/tytso/e2fsprogs/tree/pfsck
	git fetch https://github.com/tytso/e2fsprogs.git pfsck

[3] Patches sent by Ted that add parallel bitmap support:
	https://www.spinics.net/lists/linux-ext4/msg75716.html

[4] Ted’s suggested next steps:
	http://patchwork.ozlabs.org/project/linux-ext4/patch/20201118153947.3394530-11-saranyamohan@google.com/#2584340

[5] libext2fs API
	https://github.com/tytso/e2fsprogs/blob/master/lib/ext2fs/ext2fs.h

[6] Saranya’s WIP commit that adds clonefs support:
	https://github.com/srnym/e2fsprogs/commit/3007ba6c47a5caf2e2346d4eb2e05f1333663c2f
