archive mirror
 help / color / mirror / Atom feed
From: Josh Triplett <>
To: "Theodore Y. Ts'o" <>
Cc: "Darrick J. Wong" <>,
	Linus Torvalds <>,
	Andreas Dilger <>,
	Jan Kara <>,
	Linux Kernel Mailing List <>,
Subject: Re: ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps
Date: Wed, 7 Oct 2020 13:14:24 -0700	[thread overview]
Message-ID: <20201007201424.GB15049@localhost> (raw)
In-Reply-To: <>

On Wed, Oct 07, 2020 at 10:32:11AM -0400, Theodore Y. Ts'o wrote:
> On Wed, Oct 07, 2020 at 01:03:04AM -0700, Josh Triplett wrote:
> > > But can we *please* take your custom tool out back and shoot it in the
> > > head?
> > 
> > Nope. As mentioned, this isn't about creating ext4 filesystem images,
> > and it isn't even remotely similar to mke2fs.
> Can you please tell us what your tool is for, then?  Why are you doing
> this?  Why are you inflicting this on us?

That sounds like a conversation that would have been a lot more
interesting and enjoyable if it hadn't started with "can we shoot it in
the head", and continued with the notion that anything other than
e2fsprogs making something to be mounted by mount(2) and handled by
fs/ext4 is being "inflicted", and if the goal didn't still seem to be
"how do we make it go away so that only e2fsprogs and the kernel ever
touch ext4". I started this thread because I'd written some userspace
code, a new version of the kernel made that userspace code stop working,
so I wanted to report that the moment I'd discovered that, along with a
potential way to address it with as little disruption to ext4 as

I'm not looking to be an alternative userspace, or an alternative
anything; that implies serving more-or-less the same function
differently. I have no interest in supplanting mke2fs or any other part
of e2fsprogs; I'm using those for many of the purposes they already

The short version is that I needed a library to rapidly turn
dynamically-obtained data into a set of disk blocks to be served
on-the-fly as a software-defined disk, and then mounted on the other
side of that interface by the Linux kernel. Turns out that's *many
orders of magnitude* faster than any kind of network filesystem like
NFS. It's slightly similar to a vvfat for ext4. The less blocks it can
generate and account for and cache, the faster it can run, and
microseconds matter.

ext4 has *incredible* compatibility guarantees. I'd already understood
the whole compat/ro_compat mechanism when I read through the on-disk
format documentation and the code. RO_COMPAT_SHARED_BLOCKS *seemed* like
the right semantic description of "don't ever try to write to this
filesystem because there are deduplicated blocks", and
RO_COMPAT_READONLY seemed like the right semantic description for "don't
ever try to write this, it's permanently read-only for unspecified

If those aren't the right way to express that, I could potentially
adapt. I had a similar such conversation on linux-ext4 already (about
inline data with 128-bit inodes), which led to me choosing to abandon
128-byte inodes rather than try to get ext4 to support what I wanted
with them, because I didn't want to be disruptive to ext4 for a niche
use case. In the particular case that motivated this thread, what I was
doing already worked in previous kernels, and it seemed reasonable to
ask for it to continue to work in new kernels, while preserving the
newly added checks in the new kernels.

If the response here had been more along the lines of "could we create
and use a *different* compat flag for shared metadata and have
RO_COMPAT_SHARED_BLOCKS only mean shared data blocks", I'd be fine with

If at some point I'm looking to make ext4 support more than it already
does (e.g. a way to omit bitmaps entirely, or a way to express
contiguous files with smaller extent maps, or other enhancements for
read-only filesystems), I'd be coming with architectural discussions
first, patches second, and at no point would I have the expectation that
ext4 folks need to do extra work on my behalf. I'm happy to do the work.
The *only* thing I'm asking, here, is "don't break things that worked".
And after this particular item, I'd be happy to narrow that to "don't
break things that e2fsck was previously happy with".

- Josh Triplett

  reply	other threads:[~2020-10-07 20:14 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-04 23:17 Linux 5.9-rc8 Linus Torvalds
2020-10-05  8:14 ` ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps Josh Triplett
2020-10-05  9:46   ` Jan Kara
2020-10-05 10:16     ` Josh Triplett
2020-10-05 16:19       ` Jan Kara
2020-10-05 16:20   ` Jan Kara
2020-10-05 17:36   ` Darrick J. Wong
2020-10-06  0:04     ` Theodore Y. Ts'o
2020-10-06  0:32     ` Josh Triplett
2020-10-06  2:51       ` Darrick J. Wong
2020-10-06  3:18         ` Theodore Y. Ts'o
2020-10-06  5:03           ` Josh Triplett
2020-10-06  6:03             ` Josh Triplett
2020-10-06 13:35             ` Theodore Y. Ts'o
2020-10-07  8:03               ` Josh Triplett
2020-10-07 14:32                 ` Theodore Y. Ts'o
2020-10-07 20:14                   ` Josh Triplett [this message]
2020-10-08  2:10                     ` Theodore Y. Ts'o
2020-10-08 17:54                       ` Darrick J. Wong
2020-10-08 22:38                         ` Josh Triplett
2020-10-09  2:54                           ` Darrick J. Wong
2020-10-09 19:08                             ` Josh Triplett
2020-10-08 22:22                       ` Josh Triplett
2020-10-09 14:37                         ` Theodore Y. Ts'o
2020-10-09 20:30                           ` Josh Triplett
2021-01-10 18:41                           ` Malicious fs images was " Pavel Machek
2021-01-11 18:51                             ` Darrick J. Wong
2021-01-11 19:39                               ` Eric Biggers
2021-01-12 21:43                             ` Theodore Ts'o
2021-01-12 22:28                               ` Pavel Machek
2021-01-13  5:09                                 ` Theodore Ts'o
2020-10-08  2:57                     ` Andreas Dilger
2020-10-08 19:12                       ` Josh Triplett
2020-10-08 19:25                         ` Andreas Dilger
2020-10-08 22:28                           ` Josh Triplett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201007201424.GB15049@localhost \ \ \ \ \ \ \ \ \
    --subject='Re: ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox