All of lore.kernel.org
 help / color / mirror / Atom feed
* [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
@ 2023-08-30 14:07 Christoph Hellwig
  2023-09-05 23:06 ` Dave Chinner
  2023-09-06 22:32 ` Guenter Roeck
  0 siblings, 2 replies; 97+ messages in thread
From: Christoph Hellwig @ 2023-08-30 14:07 UTC (permalink / raw)
  To: ksummit, linux-fsdevel

Hi all,

we have a lot of on-disk file system drivers in Linux, which I consider
a good thing as it allows a lot of interoperability.  At the same time
maintaining them is a burden, and there is a lot expectation on how
they are maintained.

Part 1: untrusted file systems

There has been a lot of syzbot fuzzing using generated file system
images, which I again consider a very good thing as syzbot is good
a finding bugs.  Unfortunately it also finds a lot of bugs that no
one is interested in fixing.   The reason for that is that file system
maintainers only consider a tiny subset of the file system drivers,
and for some of them a subset of the format options to be trusted vs
untrusted input.  It thus is not just a waste of time for syzbot itself,
but even more so for the maintainers to report fuzzing bugs in other
implementations.

What can we do to only mark certain file systems (and format options)
as trusted on untrusted input and remove a lot of the current tension
and make everyone work more efficiently?  Note that this isn't even
getting into really trusted on-disk formats, which is a security
discussion on it's own, but just into formats where the maintainers
are interested in dealing with fuzzed images.

Part 2: unmaintained file systems

A lot of our file system drivers are either de facto or formally
unmaintained.  If we want to move the kernel forward by finishing
API transitions (new mount API, buffer_head removal for the I/O path,
->writepage removal, etc) these file systems need to change as well
and need some kind of testing.  The easiest way forward would be
to remove everything that is not fully maintained, but that would
remove a lot of useful features.

E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
It collects odd fixes because it is really useful for interoperating
with MacOS and it would be a pity to remove it.  At the same time
it is impossible to test changes to hfsplus sanely as there is no
mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
one that was ported from the open source Darwin code drops, and
I managed to get xfstests to run on hfsplus with them, but this
old version doesn't compile on any modern Linux distribution and
new versions of the code aren't trivially portable to Linux.

Do we have volunteers with old enough distros that we can list as
testers for this code?  Do we have any other way to proceed?

If we don't, are we just going to untested API changes to these
code bases, or keep the old APIs around forever?

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-08-30 14:07 [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems Christoph Hellwig
@ 2023-09-05 23:06 ` Dave Chinner
  2023-09-05 23:23   ` Matthew Wilcox
  2023-09-08  8:55   ` Christoph Hellwig
  2023-09-06 22:32 ` Guenter Roeck
  1 sibling, 2 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-05 23:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: ksummit, linux-fsdevel

On Wed, Aug 30, 2023 at 04:07:39PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> we have a lot of on-disk file system drivers in Linux, which I consider
> a good thing as it allows a lot of interoperability.  At the same time
> maintaining them is a burden, and there is a lot expectation on how
> they are maintained.
> 
> Part 1: untrusted file systems
> 
> There has been a lot of syzbot fuzzing using generated file system
> images, which I again consider a very good thing as syzbot is good
> a finding bugs.  Unfortunately it also finds a lot of bugs that no
> one is interested in fixing.   The reason for that is that file system
> maintainers only consider a tiny subset of the file system drivers,
> and for some of them a subset of the format options to be trusted vs
> untrusted input.  It thus is not just a waste of time for syzbot itself,
> but even more so for the maintainers to report fuzzing bugs in other
> implementations.
> 
> What can we do to only mark certain file systems (and format options)
> as trusted on untrusted input and remove a lot of the current tension
> and make everyone work more efficiently?  Note that this isn't even
> getting into really trusted on-disk formats, which is a security
> discussion on it's own, but just into formats where the maintainers
> are interested in dealing with fuzzed images.

I think this completely misses the point of contention of the larger
syzbot vs filesystem discussion: the assertion that "testing via
syzbot means the subsystem is secure" where "secure" means "can be
used safely for operations that involve trust model violations".

Fundamentally, syzbot does nothing to actually validate the
filesystem is "secure". Fuzzing can only find existing bugs by
simulating an attacker, but it does nothing to address the
underlying issues that allow that attack channel to exist.

All "syzbot doesn't find bugs" means is that -random bit
manipulation- of the filesystem's metadata *hasn't found issues*.

Even though the XFS V5 format is pretty robust against random bit
manipulation, it's certainly not invulnerable and cannot detect
coordinated, multiple object corruptions (cross linked blocks,
cycles in trees, etc) without a full filesystem scan. These sorts of
corruptions are almost never going to be exercised by random bit
manipulation fuzzers like syzbot, but they are exactly the sort of
thing a malicious attacker with some knowledge of how the filesystem
works would look at....

Let's also address the elephant in the room: malicious attackers
don't need to to exploit flaws in the filesystem metadata structure
to trojan an unsuspecting user.

i.e. We cannot detect changes to metadata that are within valid
bounds and may be security sensitive - things like UIDs and GIDs,
inode permissions, inode flags, link counts, symbolic links, etc. We
also can't determine if the file data is unchanged, so it's easy to
trojan the contents of an executable file on a filesystem image.

IOWs, all the attacker needs to do is trojan an installer script on
an application or device driver disk/image, and the user will run it
as root themselves....

There are whole classes of malicious modifications that syzbot
doesn't exercise and we cannot detect nor defend against at the
filesystem level without changing the trust model the filesystem
operates under. And if we change the trust model, we are now talking
about on-disk format changes and using robust crypto for all the
data and metadata in the filesystem. At which point, we may as well
require a full disk encryption layer via dm-crypt....

If we say "filesystem is secure against untrusted input" then that
is what users will expect us to provide. It will also means that
every bug that syzbot might find will result in a high priority CVE,
because any issue arising from untrusted input is a now a major
system security issue.

As such, I just don't see how "tested with syzbot" equates with
"safe for untrusted use cases" whilst also reducing the impact of
the problems that syzbot finds and reports...

> Part 2: unmaintained file systems
> 
> A lot of our file system drivers are either de facto or formally
> unmaintained.  If we want to move the kernel forward by finishing
> API transitions (new mount API, buffer_head removal for the I/O path,
> ->writepage removal, etc) these file systems need to change as well
> and need some kind of testing.  The easiest way forward would be
> to remove everything that is not fully maintained, but that would
> remove a lot of useful features.

Linus has explicitly NACKed that approach.

https://lore.kernel.org/linux-fsdevel/CAHk-=wg7DSNsHY6tWc=WLeqDBYtXges_12fFk1c+-No+fZ0xYQ@mail.gmail.com/

Which is a problem, because historically we've taken code into
the kernel without requiring a maintainer, or the people who
maintained the code have moved on, yet we don't have a policy for
removing code that is slowly bit-rotting to uselessness.

> E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> It collects odd fixes because it is really useful for interoperating
> with MacOS and it would be a pity to remove it.  At the same time
> it is impossible to test changes to hfsplus sanely as there is no
> mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> one that was ported from the open source Darwin code drops, and
> I managed to get xfstests to run on hfsplus with them, but this
> old version doesn't compile on any modern Linux distribution and
> new versions of the code aren't trivially portable to Linux.
> 
> Do we have volunteers with old enough distros that we can list as
> testers for this code?  Do we have any other way to proceed?
>
> If we don't, are we just going to untested API changes to these
> code bases, or keep the old APIs around forever?

We do slowly remove device drivers and platforms as the hardware,
developers and users disappear. We do also just change driver APIs
in device drivers for hardware that no-one is actually able to test.
The assumption is that if it gets broken during API changes,
someone who needs it to work will fix it and send patches.

That seems to be the historical model for removing unused/obsolete
code from the kernel, so why should we treat unmaintained/obsolete
filesystems any differently?  i.e. Just change the API, mark it
CONFIG_BROKEN until someone comes along and starts fixing it...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-05 23:06 ` Dave Chinner
@ 2023-09-05 23:23   ` Matthew Wilcox
  2023-09-06  2:09     ` Dave Chinner
                       ` (2 more replies)
  2023-09-08  8:55   ` Christoph Hellwig
  1 sibling, 3 replies; 97+ messages in thread
From: Matthew Wilcox @ 2023-09-05 23:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> > Part 2: unmaintained file systems
> > 
> > A lot of our file system drivers are either de facto or formally
> > unmaintained.  If we want to move the kernel forward by finishing
> > API transitions (new mount API, buffer_head removal for the I/O path,
> > ->writepage removal, etc) these file systems need to change as well
> > and need some kind of testing.  The easiest way forward would be
> > to remove everything that is not fully maintained, but that would
> > remove a lot of useful features.
> 
> Linus has explicitly NACKed that approach.
> 
> https://lore.kernel.org/linux-fsdevel/CAHk-=wg7DSNsHY6tWc=WLeqDBYtXges_12fFk1c+-No+fZ0xYQ@mail.gmail.com/
> 
> Which is a problem, because historically we've taken code into
> the kernel without requiring a maintainer, or the people who
> maintained the code have moved on, yet we don't have a policy for
> removing code that is slowly bit-rotting to uselessness.
> 
> > E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> > It collects odd fixes because it is really useful for interoperating
> > with MacOS and it would be a pity to remove it.  At the same time
> > it is impossible to test changes to hfsplus sanely as there is no
> > mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> > one that was ported from the open source Darwin code drops, and
> > I managed to get xfstests to run on hfsplus with them, but this
> > old version doesn't compile on any modern Linux distribution and
> > new versions of the code aren't trivially portable to Linux.
> > 
> > Do we have volunteers with old enough distros that we can list as
> > testers for this code?  Do we have any other way to proceed?
> >
> > If we don't, are we just going to untested API changes to these
> > code bases, or keep the old APIs around forever?
> 
> We do slowly remove device drivers and platforms as the hardware,
> developers and users disappear. We do also just change driver APIs
> in device drivers for hardware that no-one is actually able to test.
> The assumption is that if it gets broken during API changes,
> someone who needs it to work will fix it and send patches.
> 
> That seems to be the historical model for removing unused/obsolete
> code from the kernel, so why should we treat unmaintained/obsolete
> filesystems any differently?  i.e. Just change the API, mark it
> CONFIG_BROKEN until someone comes along and starts fixing it...

Umm.  If I change ->write_begin and ->write_end to take a folio,
convert only the filesystems I can test via Luis' kdevops and mark the
rest as CONFIG_BROKEN, I can guarantee you that Linus will reject that
pull request.

I really feel we're between a rock and a hard place with our unmaintained
filesystems.  They have users who care passionately, but not the ability
to maintain them.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-05 23:23   ` Matthew Wilcox
@ 2023-09-06  2:09     ` Dave Chinner
  2023-09-06 15:06       ` Christian Brauner
  2023-09-07  0:46     ` Bagas Sanjaya
  2023-09-09 12:50     ` James Bottomley
  2 siblings, 1 reply; 97+ messages in thread
From: Dave Chinner @ 2023-09-06  2:09 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 12:23:22AM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> > > Part 2: unmaintained file systems
> > > 
> > > A lot of our file system drivers are either de facto or formally
> > > unmaintained.  If we want to move the kernel forward by finishing
> > > API transitions (new mount API, buffer_head removal for the I/O path,
> > > ->writepage removal, etc) these file systems need to change as well
> > > and need some kind of testing.  The easiest way forward would be
> > > to remove everything that is not fully maintained, but that would
> > > remove a lot of useful features.
> > 
> > Linus has explicitly NACKed that approach.
> > 
> > https://lore.kernel.org/linux-fsdevel/CAHk-=wg7DSNsHY6tWc=WLeqDBYtXges_12fFk1c+-No+fZ0xYQ@mail.gmail.com/
> > 
> > Which is a problem, because historically we've taken code into
> > the kernel without requiring a maintainer, or the people who
> > maintained the code have moved on, yet we don't have a policy for
> > removing code that is slowly bit-rotting to uselessness.
> > 
> > > E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> > > It collects odd fixes because it is really useful for interoperating
> > > with MacOS and it would be a pity to remove it.  At the same time
> > > it is impossible to test changes to hfsplus sanely as there is no
> > > mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> > > one that was ported from the open source Darwin code drops, and
> > > I managed to get xfstests to run on hfsplus with them, but this
> > > old version doesn't compile on any modern Linux distribution and
> > > new versions of the code aren't trivially portable to Linux.
> > > 
> > > Do we have volunteers with old enough distros that we can list as
> > > testers for this code?  Do we have any other way to proceed?
> > >
> > > If we don't, are we just going to untested API changes to these
> > > code bases, or keep the old APIs around forever?
> > 
> > We do slowly remove device drivers and platforms as the hardware,
> > developers and users disappear. We do also just change driver APIs
> > in device drivers for hardware that no-one is actually able to test.
> > The assumption is that if it gets broken during API changes,
> > someone who needs it to work will fix it and send patches.
> > 
> > That seems to be the historical model for removing unused/obsolete
> > code from the kernel, so why should we treat unmaintained/obsolete
> > filesystems any differently?  i.e. Just change the API, mark it
> > CONFIG_BROKEN until someone comes along and starts fixing it...
> 
> Umm.  If I change ->write_begin and ->write_end to take a folio,
> convert only the filesystems I can test via Luis' kdevops and mark the
> rest as CONFIG_BROKEN, I can guarantee you that Linus will reject that
> pull request.

No, that's not what I was suggesting. I suggest that we -change all
the API users when we need to, but in doing so we also need to 
formalise the fact we do not know if the filesystems nobody can/will
maintain function correctly or not.

Reflect that with CONFIG_BROKEN or some other mechanism that
forces people to acknowledge that the filesystem implementation is
not fit for purpose before they attempt to use it. e.g.
write some code that emits a log warning about the filesystem being
unmaintained at mount time and should not be used in situations
where stability, security or data integrity guarantees are required.

> I really feel we're between a rock and a hard place with our unmaintained
> filesystems.  They have users who care passionately, but not the ability
> to maintain them.

Well, yes. IMO, it is even worse to maintain the lie that these
unmaintained filesystems actually work correctly. Just because it's
part of the kernel it doesn't mean it is functional or that users
should be able to trust that it will not lose their data...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06  2:09     ` Dave Chinner
@ 2023-09-06 15:06       ` Christian Brauner
  2023-09-06 15:59         ` Christian Brauner
                           ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Christian Brauner @ 2023-09-06 15:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Matthew Wilcox, Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 12:09:37PM +1000, Dave Chinner wrote:
> On Wed, Sep 06, 2023 at 12:23:22AM +0100, Matthew Wilcox wrote:
> > On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> > > > Part 2: unmaintained file systems
> > > > 
> > > > A lot of our file system drivers are either de facto or formally
> > > > unmaintained.  If we want to move the kernel forward by finishing
> > > > API transitions (new mount API, buffer_head removal for the I/O path,
> > > > ->writepage removal, etc) these file systems need to change as well
> > > > and need some kind of testing.  The easiest way forward would be
> > > > to remove everything that is not fully maintained, but that would
> > > > remove a lot of useful features.
> > > 
> > > Linus has explicitly NACKed that approach.
> > > 
> > > https://lore.kernel.org/linux-fsdevel/CAHk-=wg7DSNsHY6tWc=WLeqDBYtXges_12fFk1c+-No+fZ0xYQ@mail.gmail.com/
> > > 
> > > Which is a problem, because historically we've taken code into
> > > the kernel without requiring a maintainer, or the people who
> > > maintained the code have moved on, yet we don't have a policy for
> > > removing code that is slowly bit-rotting to uselessness.
> > > 
> > > > E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> > > > It collects odd fixes because it is really useful for interoperating
> > > > with MacOS and it would be a pity to remove it.  At the same time
> > > > it is impossible to test changes to hfsplus sanely as there is no
> > > > mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> > > > one that was ported from the open source Darwin code drops, and
> > > > I managed to get xfstests to run on hfsplus with them, but this
> > > > old version doesn't compile on any modern Linux distribution and
> > > > new versions of the code aren't trivially portable to Linux.
> > > > 
> > > > Do we have volunteers with old enough distros that we can list as
> > > > testers for this code?  Do we have any other way to proceed?
> > > >
> > > > If we don't, are we just going to untested API changes to these
> > > > code bases, or keep the old APIs around forever?
> > > 
> > > We do slowly remove device drivers and platforms as the hardware,
> > > developers and users disappear. We do also just change driver APIs
> > > in device drivers for hardware that no-one is actually able to test.
> > > The assumption is that if it gets broken during API changes,
> > > someone who needs it to work will fix it and send patches.
> > > 
> > > That seems to be the historical model for removing unused/obsolete
> > > code from the kernel, so why should we treat unmaintained/obsolete
> > > filesystems any differently?  i.e. Just change the API, mark it
> > > CONFIG_BROKEN until someone comes along and starts fixing it...
> > 
> > Umm.  If I change ->write_begin and ->write_end to take a folio,
> > convert only the filesystems I can test via Luis' kdevops and mark the
> > rest as CONFIG_BROKEN, I can guarantee you that Linus will reject that
> > pull request.
> 
> No, that's not what I was suggesting. I suggest that we -change all
> the API users when we need to, but in doing so we also need to 
> formalise the fact we do not know if the filesystems nobody can/will
> maintain function correctly or not.
> 
> Reflect that with CONFIG_BROKEN or some other mechanism that
> forces people to acknowledge that the filesystem implementation is
> not fit for purpose before they attempt to use it. e.g.
> write some code that emits a log warning about the filesystem being
> unmaintained at mount time and should not be used in situations
> where stability, security or data integrity guarantees are required.

In addition to this e need to involve low-level userspace. We already
started this a while ago.

util-linux has already implemented X-mount.auto-fstypes which we
requested. For example, X-mount.auto-fstypes="ext4,xfs" accepts only
ext4 and xfs, and X-mount.auto-fstypes="novfat,reiserfs" accepts all
filesystems except vfat and reiserfs.

https://github.com/util-linux/util-linux/commit/1592425a0a1472db3168cd9247f001d7c5dd84b6

IOW,
        mount -t X-mount.auto-fstypes="ext4,xfs,btrfs,erofs" /dev/bla /mnt
would only mount these for filesystems and refuse the rest.

Of course, that's optional so if userspace only uses
        mount /dev/bla /mnt
then libmount will currently happily mount anything that's on /dev/bla.

So adding another RFE to libmount to add support for a global allowlist
or denylist of filesystems and refuse to mount anything else might also
be a good thing. Actually, might go and do this now.

So that we can slowly move userspace towards a smaller set of
filesystems and then distros can start turning off more and more
filesystems.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 15:06       ` Christian Brauner
@ 2023-09-06 15:59         ` Christian Brauner
  2023-09-06 19:09         ` Geert Uytterhoeven
  2023-09-08  8:34         ` Christoph Hellwig
  2 siblings, 0 replies; 97+ messages in thread
From: Christian Brauner @ 2023-09-06 15:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Matthew Wilcox, Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 05:06:29PM +0200, Christian Brauner wrote:
> On Wed, Sep 06, 2023 at 12:09:37PM +1000, Dave Chinner wrote:
> > On Wed, Sep 06, 2023 at 12:23:22AM +0100, Matthew Wilcox wrote:
> > > On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> > > > > Part 2: unmaintained file systems
> > > > > 
> > > > > A lot of our file system drivers are either de facto or formally
> > > > > unmaintained.  If we want to move the kernel forward by finishing
> > > > > API transitions (new mount API, buffer_head removal for the I/O path,
> > > > > ->writepage removal, etc) these file systems need to change as well
> > > > > and need some kind of testing.  The easiest way forward would be
> > > > > to remove everything that is not fully maintained, but that would
> > > > > remove a lot of useful features.
> > > > 
> > > > Linus has explicitly NACKed that approach.
> > > > 
> > > > https://lore.kernel.org/linux-fsdevel/CAHk-=wg7DSNsHY6tWc=WLeqDBYtXges_12fFk1c+-No+fZ0xYQ@mail.gmail.com/
> > > > 
> > > > Which is a problem, because historically we've taken code into
> > > > the kernel without requiring a maintainer, or the people who
> > > > maintained the code have moved on, yet we don't have a policy for
> > > > removing code that is slowly bit-rotting to uselessness.
> > > > 
> > > > > E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> > > > > It collects odd fixes because it is really useful for interoperating
> > > > > with MacOS and it would be a pity to remove it.  At the same time
> > > > > it is impossible to test changes to hfsplus sanely as there is no
> > > > > mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> > > > > one that was ported from the open source Darwin code drops, and
> > > > > I managed to get xfstests to run on hfsplus with them, but this
> > > > > old version doesn't compile on any modern Linux distribution and
> > > > > new versions of the code aren't trivially portable to Linux.
> > > > > 
> > > > > Do we have volunteers with old enough distros that we can list as
> > > > > testers for this code?  Do we have any other way to proceed?
> > > > >
> > > > > If we don't, are we just going to untested API changes to these
> > > > > code bases, or keep the old APIs around forever?
> > > > 
> > > > We do slowly remove device drivers and platforms as the hardware,
> > > > developers and users disappear. We do also just change driver APIs
> > > > in device drivers for hardware that no-one is actually able to test.
> > > > The assumption is that if it gets broken during API changes,
> > > > someone who needs it to work will fix it and send patches.
> > > > 
> > > > That seems to be the historical model for removing unused/obsolete
> > > > code from the kernel, so why should we treat unmaintained/obsolete
> > > > filesystems any differently?  i.e. Just change the API, mark it
> > > > CONFIG_BROKEN until someone comes along and starts fixing it...
> > > 
> > > Umm.  If I change ->write_begin and ->write_end to take a folio,
> > > convert only the filesystems I can test via Luis' kdevops and mark the
> > > rest as CONFIG_BROKEN, I can guarantee you that Linus will reject that
> > > pull request.
> > 
> > No, that's not what I was suggesting. I suggest that we -change all
> > the API users when we need to, but in doing so we also need to 
> > formalise the fact we do not know if the filesystems nobody can/will
> > maintain function correctly or not.
> > 
> > Reflect that with CONFIG_BROKEN or some other mechanism that
> > forces people to acknowledge that the filesystem implementation is
> > not fit for purpose before they attempt to use it. e.g.
> > write some code that emits a log warning about the filesystem being
> > unmaintained at mount time and should not be used in situations
> > where stability, security or data integrity guarantees are required.
> 
> In addition to this e need to involve low-level userspace. We already
> started this a while ago.
> 
> util-linux has already implemented X-mount.auto-fstypes which we
> requested. For example, X-mount.auto-fstypes="ext4,xfs" accepts only
> ext4 and xfs, and X-mount.auto-fstypes="novfat,reiserfs" accepts all
> filesystems except vfat and reiserfs.
> 
> https://github.com/util-linux/util-linux/commit/1592425a0a1472db3168cd9247f001d7c5dd84b6
> 
> IOW,
>         mount -t X-mount.auto-fstypes="ext4,xfs,btrfs,erofs" /dev/bla /mnt
> would only mount these for filesystems and refuse the rest.
> 
> Of course, that's optional so if userspace only uses
>         mount /dev/bla /mnt
> then libmount will currently happily mount anything that's on /dev/bla.
> 
> So adding another RFE to libmount to add support for a global allowlist
> or denylist of filesystems and refuse to mount anything else might also
> be a good thing. Actually, might go and do this now.

https://github.com/util-linux/util-linux/issues/2478

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 15:06       ` Christian Brauner
  2023-09-06 15:59         ` Christian Brauner
@ 2023-09-06 19:09         ` Geert Uytterhoeven
  2023-09-08  8:34         ` Christoph Hellwig
  2 siblings, 0 replies; 97+ messages in thread
From: Geert Uytterhoeven @ 2023-09-06 19:09 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Dave Chinner, Matthew Wilcox, Christoph Hellwig, ksummit, linux-fsdevel

Hi Christian,

On Wed, Sep 6, 2023 at 5:06 PM Christian Brauner <brauner@kernel.org> wrote:
> util-linux has already implemented X-mount.auto-fstypes which we
> requested. For example, X-mount.auto-fstypes="ext4,xfs" accepts only
> ext4 and xfs, and X-mount.auto-fstypes="novfat,reiserfs" accepts all

I hope that should be achieved using "novfat,noreiserfs"?

And let's hope we don't get any future file system named no<something> ;-)

> filesystems except vfat and reiserfs.
>
> https://github.com/util-linux/util-linux/commit/1592425a0a1472db3168cd9247f001d7c5dd84b6
>
> IOW,
>         mount -t X-mount.auto-fstypes="ext4,xfs,btrfs,erofs" /dev/bla /mnt
> would only mount these for filesystems and refuse the rest.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-08-30 14:07 [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems Christoph Hellwig
  2023-09-05 23:06 ` Dave Chinner
@ 2023-09-06 22:32 ` Guenter Roeck
  2023-09-06 22:54   ` Dave Chinner
  2023-09-07  0:48   ` Bagas Sanjaya
  1 sibling, 2 replies; 97+ messages in thread
From: Guenter Roeck @ 2023-09-06 22:32 UTC (permalink / raw)
  To: Christoph Hellwig, ksummit, linux-fsdevel

On 8/30/23 07:07, Christoph Hellwig wrote:
> Hi all,
> 
> we have a lot of on-disk file system drivers in Linux, which I consider
> a good thing as it allows a lot of interoperability.  At the same time
> maintaining them is a burden, and there is a lot expectation on how
> they are maintained.
> 
> Part 1: untrusted file systems
> 
> There has been a lot of syzbot fuzzing using generated file system
> images, which I again consider a very good thing as syzbot is good
> a finding bugs.  Unfortunately it also finds a lot of bugs that no
> one is interested in fixing.   The reason for that is that file system
> maintainers only consider a tiny subset of the file system drivers,
> and for some of them a subset of the format options to be trusted vs
> untrusted input.  It thus is not just a waste of time for syzbot itself,
> but even more so for the maintainers to report fuzzing bugs in other
> implementations.
> 
> What can we do to only mark certain file systems (and format options)
> as trusted on untrusted input and remove a lot of the current tension
> and make everyone work more efficiently?  Note that this isn't even
> getting into really trusted on-disk formats, which is a security
> discussion on it's own, but just into formats where the maintainers
> are interested in dealing with fuzzed images.
> 
> Part 2: unmaintained file systems
> 
> A lot of our file system drivers are either de facto or formally
> unmaintained.  If we want to move the kernel forward by finishing
> API transitions (new mount API, buffer_head removal for the I/O path,
> ->writepage removal, etc) these file systems need to change as well
> and need some kind of testing.  The easiest way forward would be
> to remove everything that is not fully maintained, but that would
> remove a lot of useful features.
> 
> E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> It collects odd fixes because it is really useful for interoperating
> with MacOS and it would be a pity to remove it.  At the same time
> it is impossible to test changes to hfsplus sanely as there is no
> mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> one that was ported from the open source Darwin code drops, and
> I managed to get xfstests to run on hfsplus with them, but this
> old version doesn't compile on any modern Linux distribution and
> new versions of the code aren't trivially portable to Linux.
> 
> Do we have volunteers with old enough distros that we can list as
> testers for this code?  Do we have any other way to proceed?
> 
> If we don't, are we just going to untested API changes to these
> code bases, or keep the old APIs around forever?
> 

In this context, it might be worthwhile trying to determine if and when
to call a file system broken.

Case in point: After this e-mail, I tried playing with a few file systems.
The most interesting exercise was with ntfsv3.
Create it, mount it, copy a few files onto it, remove some of them, repeat.
A script doing that only takes a few seconds to corrupt the file system.
Trying to unmount it with the current upstream typically results in
a backtrace and/or crash.

Does that warrant marking it as BROKEN ? If not, what does ?

Guenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 22:32 ` Guenter Roeck
@ 2023-09-06 22:54   ` Dave Chinner
  2023-09-07  0:53     ` Bagas Sanjaya
                       ` (2 more replies)
  2023-09-07  0:48   ` Bagas Sanjaya
  1 sibling, 3 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-06 22:54 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 03:32:28PM -0700, Guenter Roeck wrote:
> On 8/30/23 07:07, Christoph Hellwig wrote:
> > Hi all,
> > 
> > we have a lot of on-disk file system drivers in Linux, which I consider
> > a good thing as it allows a lot of interoperability.  At the same time
> > maintaining them is a burden, and there is a lot expectation on how
> > they are maintained.
> > 
> > Part 1: untrusted file systems
> > 
> > There has been a lot of syzbot fuzzing using generated file system
> > images, which I again consider a very good thing as syzbot is good
> > a finding bugs.  Unfortunately it also finds a lot of bugs that no
> > one is interested in fixing.   The reason for that is that file system
> > maintainers only consider a tiny subset of the file system drivers,
> > and for some of them a subset of the format options to be trusted vs
> > untrusted input.  It thus is not just a waste of time for syzbot itself,
> > but even more so for the maintainers to report fuzzing bugs in other
> > implementations.
> > 
> > What can we do to only mark certain file systems (and format options)
> > as trusted on untrusted input and remove a lot of the current tension
> > and make everyone work more efficiently?  Note that this isn't even
> > getting into really trusted on-disk formats, which is a security
> > discussion on it's own, but just into formats where the maintainers
> > are interested in dealing with fuzzed images.
> > 
> > Part 2: unmaintained file systems
> > 
> > A lot of our file system drivers are either de facto or formally
> > unmaintained.  If we want to move the kernel forward by finishing
> > API transitions (new mount API, buffer_head removal for the I/O path,
> > ->writepage removal, etc) these file systems need to change as well
> > and need some kind of testing.  The easiest way forward would be
> > to remove everything that is not fully maintained, but that would
> > remove a lot of useful features.
> > 
> > E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> > It collects odd fixes because it is really useful for interoperating
> > with MacOS and it would be a pity to remove it.  At the same time
> > it is impossible to test changes to hfsplus sanely as there is no
> > mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> > one that was ported from the open source Darwin code drops, and
> > I managed to get xfstests to run on hfsplus with them, but this
> > old version doesn't compile on any modern Linux distribution and
> > new versions of the code aren't trivially portable to Linux.
> > 
> > Do we have volunteers with old enough distros that we can list as
> > testers for this code?  Do we have any other way to proceed?
> > 
> > If we don't, are we just going to untested API changes to these
> > code bases, or keep the old APIs around forever?
> > 
> 
> In this context, it might be worthwhile trying to determine if and when
> to call a file system broken.
> 
> Case in point: After this e-mail, I tried playing with a few file systems.
> The most interesting exercise was with ntfsv3.
> Create it, mount it, copy a few files onto it, remove some of them, repeat.
> A script doing that only takes a few seconds to corrupt the file system.
> Trying to unmount it with the current upstream typically results in
> a backtrace and/or crash.
> 
> Does that warrant marking it as BROKEN ? If not, what does ?

There's a bigger policy question around that.

I think that if we are going to have filesystems be "community
maintained" because they have no explicit maintainer, we need some
kind of standard policy to be applied.

I'd argue that the filesystem needs, at minimum, a working mkfs and
fsck implementation, and that it is supported by fstests so anyone
changing core infrastructure can simply run fstests against the
filesystem to smoke test the infrastructure changes they are making.

I'd suggest that syzbot coverage of such filesystems is not desired,
because nobody is going to be fixing problems related to on-disk
format verification. All we really care about is that a user can
read and write to the filesystem without trashing anything.

I'd also suggest that we mark filesystem support state via fstype
flags rather than config options. That way we aren't reliant on
distros setting config options correctly to include/indicate the
state of the filesystem implementation. We could also use similar
flags for indicating deprecation and obsolete state (i.e. pending
removal) and have code in the high level mount path issue the
relevant warnings.

This method of marking would also allow us to document and implement
a formal policy for removal of unmaintained and/or obsolete
filesystems without having to be dependent on distros juggling
config variables to allow users to continue using deprecated, broken
and/or obsolete filesystem implementations right up to the point
where they are removed from the kernel.

And let's not forget: removing a filesystem from the kernel is not
removing end user support for extracting data from old filesystems.
We have VMs for that - we can run pretty much any kernel ever built
inside a VM, so users that need to extract data from a really old
filesystem we no longer support in a modern kernel can simply boot
up an old distro that did support it and extract the data that way.

We need to get away from the idea that we have to support old
filesystems forever because someone, somewhere might have an old
disk on the shelf with that filesystem on it and they might plug it
in one day. If that day ever happens, they can go to the effort of
booting an era-relevant distro in a VM to extract that data. It
makes no sense to put an ongoing burden on current development to
support this sort of rare, niche use case....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-05 23:23   ` Matthew Wilcox
  2023-09-06  2:09     ` Dave Chinner
@ 2023-09-07  0:46     ` Bagas Sanjaya
  2023-09-09 12:50     ` James Bottomley
  2 siblings, 0 replies; 97+ messages in thread
From: Bagas Sanjaya @ 2023-09-07  0:46 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Chinner
  Cc: Christoph Hellwig, ksummit, linux-fsdevel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 736 bytes --]

[disclaimer: I'm no expert here, just my opinion]

On Wed, Sep 06, 2023 at 12:23:22AM +0100, Matthew Wilcox wrote:
> I really feel we're between a rock and a hard place with our unmaintained
> filesystems.  They have users who care passionately, but not the ability
> to maintain them.

In OTW: these fses are in limbo state, which induces another question:
how to get users of these into developers (and possibly maintainers)
to get out of this unfortunate situation? Do we have to keep
deprecated APIs they use indefinitely for the sake of servicing them without
any transition plan to replacement APIs? Do akpm have to step in for that to
happen?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 22:32 ` Guenter Roeck
  2023-09-06 22:54   ` Dave Chinner
@ 2023-09-07  0:48   ` Bagas Sanjaya
  2023-09-07  3:07     ` Guenter Roeck
  1 sibling, 1 reply; 97+ messages in thread
From: Bagas Sanjaya @ 2023-09-07  0:48 UTC (permalink / raw)
  To: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]

On Wed, Sep 06, 2023 at 03:32:28PM -0700, Guenter Roeck wrote:
> On 8/30/23 07:07, Christoph Hellwig wrote:
> > Hi all,
> > 
> > we have a lot of on-disk file system drivers in Linux, which I consider
> > a good thing as it allows a lot of interoperability.  At the same time
> > maintaining them is a burden, and there is a lot expectation on how
> > they are maintained.
> > 
> > Part 1: untrusted file systems
> > 
> > There has been a lot of syzbot fuzzing using generated file system
> > images, which I again consider a very good thing as syzbot is good
> > a finding bugs.  Unfortunately it also finds a lot of bugs that no
> > one is interested in fixing.   The reason for that is that file system
> > maintainers only consider a tiny subset of the file system drivers,
> > and for some of them a subset of the format options to be trusted vs
> > untrusted input.  It thus is not just a waste of time for syzbot itself,
> > but even more so for the maintainers to report fuzzing bugs in other
> > implementations.
> > 
> > What can we do to only mark certain file systems (and format options)
> > as trusted on untrusted input and remove a lot of the current tension
> > and make everyone work more efficiently?  Note that this isn't even
> > getting into really trusted on-disk formats, which is a security
> > discussion on it's own, but just into formats where the maintainers
> > are interested in dealing with fuzzed images.
> > 
> > Part 2: unmaintained file systems
> > 
> > A lot of our file system drivers are either de facto or formally
> > unmaintained.  If we want to move the kernel forward by finishing
> > API transitions (new mount API, buffer_head removal for the I/O path,
> > ->writepage removal, etc) these file systems need to change as well
> > and need some kind of testing.  The easiest way forward would be
> > to remove everything that is not fully maintained, but that would
> > remove a lot of useful features.
> > 
> > E.g. the hfsplus driver is unmaintained despite collecting odd fixes.
> > It collects odd fixes because it is really useful for interoperating
> > with MacOS and it would be a pity to remove it.  At the same time
> > it is impossible to test changes to hfsplus sanely as there is no
> > mkfs.hfsplus or fsck.hfsplus available for Linux.  We used to have
> > one that was ported from the open source Darwin code drops, and
> > I managed to get xfstests to run on hfsplus with them, but this
> > old version doesn't compile on any modern Linux distribution and
> > new versions of the code aren't trivially portable to Linux.
> > 
> > Do we have volunteers with old enough distros that we can list as
> > testers for this code?  Do we have any other way to proceed?
> > 
> > If we don't, are we just going to untested API changes to these
> > code bases, or keep the old APIs around forever?
> > 
> 
> In this context, it might be worthwhile trying to determine if and when
> to call a file system broken.
> 
> Case in point: After this e-mail, I tried playing with a few file systems.
> The most interesting exercise was with ntfsv3.
> Create it, mount it, copy a few files onto it, remove some of them, repeat.
> A script doing that only takes a few seconds to corrupt the file system.
> Trying to unmount it with the current upstream typically results in
> a backtrace and/or crash.

Did you forget to take the checksum after copying and verifying it
when remounting the fs?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 22:54   ` Dave Chinner
@ 2023-09-07  0:53     ` Bagas Sanjaya
  2023-09-07  3:14       ` Dave Chinner
  2023-09-07  1:53     ` Steven Rostedt
  2023-09-08  8:38     ` Christoph Hellwig
  2 siblings, 1 reply; 97+ messages in thread
From: Bagas Sanjaya @ 2023-09-07  0:53 UTC (permalink / raw)
  To: Dave Chinner, Guenter Roeck; +Cc: Christoph Hellwig, ksummit, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

On Thu, Sep 07, 2023 at 08:54:38AM +1000, Dave Chinner wrote:
> There's a bigger policy question around that.
> 
> I think that if we are going to have filesystems be "community
> maintained" because they have no explicit maintainer, we need some
> kind of standard policy to be applied.
> 
> I'd argue that the filesystem needs, at minimum, a working mkfs and
> fsck implementation, and that it is supported by fstests so anyone
> changing core infrastructure can simply run fstests against the
> filesystem to smoke test the infrastructure changes they are making.

OK.

> 
> I'd suggest that syzbot coverage of such filesystems is not desired,
> because nobody is going to be fixing problems related to on-disk
> format verification. All we really care about is that a user can
> read and write to the filesystem without trashing anything.
> 
> I'd also suggest that we mark filesystem support state via fstype
> flags rather than config options. That way we aren't reliant on
> distros setting config options correctly to include/indicate the
> state of the filesystem implementation. We could also use similar
> flags for indicating deprecation and obsolete state (i.e. pending
> removal) and have code in the high level mount path issue the
> relevant warnings.

Something like xfs v4 format?

> 
> This method of marking would also allow us to document and implement
> a formal policy for removal of unmaintained and/or obsolete
> filesystems without having to be dependent on distros juggling
> config variables to allow users to continue using deprecated, broken
> and/or obsolete filesystem implementations right up to the point
> where they are removed from the kernel.
> 
> And let's not forget: removing a filesystem from the kernel is not
> removing end user support for extracting data from old filesystems.
> We have VMs for that - we can run pretty much any kernel ever built
> inside a VM, so users that need to extract data from a really old
> filesystem we no longer support in a modern kernel can simply boot
> up an old distro that did support it and extract the data that way.
> 
> We need to get away from the idea that we have to support old
> filesystems forever because someone, somewhere might have an old
> disk on the shelf with that filesystem on it and they might plug it
> in one day. If that day ever happens, they can go to the effort of
> booting an era-relevant distro in a VM to extract that data. It
> makes no sense to put an ongoing burden on current development to
> support this sort of rare, niche use case....

This reminds me of me going to a random internet cafe when kids played
popular online games (think of Point Blank), with the computers were
running Windows XP which was almost (and already) EOL, yet these
games still supported it (kudos to game developers).

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 22:54   ` Dave Chinner
  2023-09-07  0:53     ` Bagas Sanjaya
@ 2023-09-07  1:53     ` Steven Rostedt
  2023-09-07  2:22       ` Dave Chinner
                         ` (2 more replies)
  2023-09-08  8:38     ` Christoph Hellwig
  2 siblings, 3 replies; 97+ messages in thread
From: Steven Rostedt @ 2023-09-07  1:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Thu, 7 Sep 2023 08:54:38 +1000
Dave Chinner <david@fromorbit.com> wrote:

> And let's not forget: removing a filesystem from the kernel is not
> removing end user support for extracting data from old filesystems.
> We have VMs for that - we can run pretty much any kernel ever built
> inside a VM, so users that need to extract data from a really old
> filesystem we no longer support in a modern kernel can simply boot
> up an old distro that did support it and extract the data that way.

Of course there's the case of trying to recreate a OS that can run on a
very old kernel. Just building an old kernel is difficult today because
today's compilers will refuse to build them (I've hit issues in bisections
because of that!)

You could argue that you could just install an old OS into the VM, but that
too requires access to that old OS.

Anyway, what about just having read-only be the minimum for supporting a
file system? We can say "sorry, due to no one maintaining this file system,
we will no longer allow write access." But I'm guessing that just
supporting reading an old file system is much easier than modifying one
(wasn't that what we did with NTFS for the longest time?)

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  1:53     ` Steven Rostedt
@ 2023-09-07  2:22       ` Dave Chinner
  2023-09-07  2:51         ` Steven Rostedt
  2023-09-07  9:48       ` Dan Carpenter
  2023-09-08  8:39       ` Christoph Hellwig
  2 siblings, 1 reply; 97+ messages in thread
From: Dave Chinner @ 2023-09-07  2:22 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 09:53:27PM -0400, Steven Rostedt wrote:
> On Thu, 7 Sep 2023 08:54:38 +1000
> Dave Chinner <david@fromorbit.com> wrote:
> 
> > And let's not forget: removing a filesystem from the kernel is not
> > removing end user support for extracting data from old filesystems.
> > We have VMs for that - we can run pretty much any kernel ever built
> > inside a VM, so users that need to extract data from a really old
> > filesystem we no longer support in a modern kernel can simply boot
> > up an old distro that did support it and extract the data that way.
> 
> Of course there's the case of trying to recreate a OS that can run on a
> very old kernel. Just building an old kernel is difficult today because
> today's compilers will refuse to build them (I've hit issues in bisections
> because of that!)
> 
> You could argue that you could just install an old OS into the VM, but that
> too requires access to that old OS.

Well, yes - why would anyone even bother trying to build an ancient
kernel when all they need to do is download an iso and point the VM
at it?

> Anyway, what about just having read-only be the minimum for supporting a
> file system? We can say "sorry, due to no one maintaining this file system,
> we will no longer allow write access." But I'm guessing that just
> supporting reading an old file system is much easier than modifying one
> (wasn't that what we did with NTFS for the longest time?)

"Read only" doesn't mean the filesytsem implementation is in any way
secure, robust or trustworthy - the kernel is still parsing
untrusted data in ring 0 using unmaintained, bit-rotted, untested
code....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  2:22       ` Dave Chinner
@ 2023-09-07  2:51         ` Steven Rostedt
  2023-09-07  3:26           ` Matthew Wilcox
  2023-09-07  3:38           ` Dave Chinner
  0 siblings, 2 replies; 97+ messages in thread
From: Steven Rostedt @ 2023-09-07  2:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Thu, 7 Sep 2023 12:22:43 +1000
Dave Chinner <david@fromorbit.com> wrote:

> > Anyway, what about just having read-only be the minimum for supporting a
> > file system? We can say "sorry, due to no one maintaining this file system,
> > we will no longer allow write access." But I'm guessing that just
> > supporting reading an old file system is much easier than modifying one
> > (wasn't that what we did with NTFS for the longest time?)  
> 
> "Read only" doesn't mean the filesytsem implementation is in any way
> secure, robust or trustworthy - the kernel is still parsing
> untrusted data in ring 0 using unmaintained, bit-rotted, untested
> code....

It's just a way to still easily retrieve it, than going through and looking
for those old ISOs that still might exist on the interwebs. I wouldn't
recommend anyone actually having that code enabled on a system that doesn't
need access to one of those file systems.

I guess the point I'm making is, what's the burden in keeping it around in
the read-only state? It shouldn't require any updates for new features,
which is the complaint I believe Willy was having.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  0:48   ` Bagas Sanjaya
@ 2023-09-07  3:07     ` Guenter Roeck
  0 siblings, 0 replies; 97+ messages in thread
From: Guenter Roeck @ 2023-09-07  3:07 UTC (permalink / raw)
  To: Bagas Sanjaya, Christoph Hellwig, ksummit, linux-fsdevel

On 9/6/23 17:48, Bagas Sanjaya wrote:
[ ... ]
>> Case in point: After this e-mail, I tried playing with a few file systems.
>> The most interesting exercise was with ntfsv3.
>> Create it, mount it, copy a few files onto it, remove some of them, repeat.
>> A script doing that only takes a few seconds to corrupt the file system.
>> Trying to unmount it with the current upstream typically results in
>> a backtrace and/or crash.
> 
> Did you forget to take the checksum after copying and verifying it
> when remounting the fs?
> 
Sorry, I don't understand what you mean. I didn't try to remount.
The file system images in my tests are pristine, as created with mkfs, and
are marked read-only to prevent corruption. They are also md5 checksum
protected and regenerated before the test if there is a checksum mismatch.
For ntfs, the file system was created with

truncate -s 64M myfilesystem
mkfs.ntfs -F -H 1 -S 16 -p 16 myfilesystem

My tests run under qemu, and always use the -snapshot option.

The "test", if you want to call it that, is a simple

mount "${fstestdev}" /mnt
cp -a /bin /usr /sbin /etc /lib* /opt /var /mnt
rm -rf /mnt/bin
cp -a /bin /usr /sbin /etc /lib* /opt /var /mnt
umount /mnt

This is with a buildroot generated root file system. "cp -a" is a recursive
copy which copies symlinks.

If the file system is ntfs3, the rm command typically fails, complaining
that /mnt/bin is not empty. The umount command typically results in at
least a traceback, and often a crash. Repeating the cp; rm; cp sequence
multiple times quite reliably results in a file system corruption.

The resulting (corrupted or not) file system is discarded after the qemu
session.

Guenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  0:53     ` Bagas Sanjaya
@ 2023-09-07  3:14       ` Dave Chinner
  0 siblings, 0 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-07  3:14 UTC (permalink / raw)
  To: Bagas Sanjaya; +Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Thu, Sep 07, 2023 at 07:53:36AM +0700, Bagas Sanjaya wrote:
> On Thu, Sep 07, 2023 at 08:54:38AM +1000, Dave Chinner wrote:
> > There's a bigger policy question around that.
> > 
> > I think that if we are going to have filesystems be "community
> > maintained" because they have no explicit maintainer, we need some
> > kind of standard policy to be applied.
> > 
> > I'd argue that the filesystem needs, at minimum, a working mkfs and
> > fsck implementation, and that it is supported by fstests so anyone
> > changing core infrastructure can simply run fstests against the
> > filesystem to smoke test the infrastructure changes they are making.
> 
> OK.
> 
> > 
> > I'd suggest that syzbot coverage of such filesystems is not desired,
> > because nobody is going to be fixing problems related to on-disk
> > format verification. All we really care about is that a user can
> > read and write to the filesystem without trashing anything.
> > 
> > I'd also suggest that we mark filesystem support state via fstype
> > flags rather than config options. That way we aren't reliant on
> > distros setting config options correctly to include/indicate the
> > state of the filesystem implementation. We could also use similar
> > flags for indicating deprecation and obsolete state (i.e. pending
> > removal) and have code in the high level mount path issue the
> > relevant warnings.
> 
> Something like xfs v4 format?

Kind of, but that's a story of obsolescence, not a lack of
maintenance.  For those that don't know the back story, it's below.
For those that do, skip the bit between the '----' lines.

----

We deprecated the v4 XFS on-disk format back in 2020 because it was
superceded by the v5 format that was merged in 2013 (a decade ago).
Since then we have not been adding features to the v4 format because
the v5 format fixes a heap of problems with that old format that
can't otherwise be fixed without changing the on-disk v4 format to
something like the V5 format.

Now throw in the fact that the v4 format is not y2038 compliant.
It's got a hard "end of life" date without putting resources and
effort into an on-disk format change. We aren't going to do that
largely because the V4 format is a development dead end.

Because the v4 format has a hard end of life date, we needed to
have a deprecation plan for the format that was sympathetic to
enterprise distro feature removal policies. 

Given that there's usually a 10 year support life from first release
in an enterprise kernel, and typically a 2-3 year lead in cycle,
we're looking at need to have filesystem feature removal occur 10-15
years before the hard end of support date. Further, feature removal
policies required us to mark the feature deprecated for an entire
major before we can remove it in the subsequent release. This means
we needed to make a decision about the V4 format in 2020, a full 18
years before the hard end of life actually occurs.

[ How many people reading this are thinking about what impact a
decision made has on people using that functionality in 10 years
time? This is something filesystem developers have to do all the
time, because the current on-disk format is almost certainly going
to be in use in 10 years time....]

So we deprecated the v4 format upstream, and the enterprise kernels
inherited that in 2020 before the major release went out the door.
That means we can remove support in the next major release, and the
upstream deprecation schedule reflects this - we're turning off v4
support by default in 2025...

We don't want to carry v4 support code forever, so we have a
removal date defined as well. All upstream support for the v4 format
will stop in 2030, and we will remove the relevant code at that
point in time.

Long story short, we recognised that we have obsolete functionality
that we cannot support forever, and we defined and documented the
long term EOL process for to removing support of the obsolete
functionality from the filesystem.

This, however, does not mean the V4 code is unmaintained or
untested; while it is supported it will be tested, though at a
lesser priority than the v5 format code we want everyone to be
using. THe V4 and V5 formats and code share huge amounts of
commonality, so even when we are testing V5 formats we are
exercising almost all the same code that the V4 format uses....

-----

It should be clear at this point that we cannot equate a well
planned removal of obsolescent functionality from a maintained
filesystem with the current situation of kernel being full of
unmaintained filesystem code.  The two cases are clearly at opposite
ends of spectrum.

However, we should have similar policies to deal with both
situations. If the filesystem is unmaintained and/or obsolete, we
should have a defined policy and process that leads to it being
either "community maintained" for some period of time and/or
deprecated and removed from the code base.

Look at reiserfs.

The maintainer stepped back and said "I'm happy just to remove the
code from the kernel". It has been unmaintained since.

Rather than pulling the rug out from under any remaining users,
we've marked the filesystem as deprecated (and now obsolete) and
documented a planned removal schedule.

This is the sort of process we need to discuss, formalise and
document - how we go about removing kernel features that are no
longer maintained but may still have a small number of long tail
users with the minimum of disruption to everyone....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  2:51         ` Steven Rostedt
@ 2023-09-07  3:26           ` Matthew Wilcox
  2023-09-07  8:04             ` Thorsten Leemhuis
  2023-09-07  3:38           ` Dave Chinner
  1 sibling, 1 reply; 97+ messages in thread
From: Matthew Wilcox @ 2023-09-07  3:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dave Chinner, Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 10:51:39PM -0400, Steven Rostedt wrote:
> I guess the point I'm making is, what's the burden in keeping it around in
> the read-only state? It shouldn't require any updates for new features,
> which is the complaint I believe Willy was having.

Old filesystems depend on old core functionality like bufferheads.

We want to remove bufferheads.

Who has the responsibility for updating those old filesystmes to use
iomap instead of bufferheads?

Who has the responsibility for testing those filesystems still work
after the update?

Who has the responsibility for looking at a syzbot bug report that comes
in twelve months after the conversion is done and deciding whether the
conversion was the problem, or whether it's some other patch that
happened before or after?


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  2:51         ` Steven Rostedt
  2023-09-07  3:26           ` Matthew Wilcox
@ 2023-09-07  3:38           ` Dave Chinner
  2023-09-07 11:18             ` Steven Rostedt
  1 sibling, 1 reply; 97+ messages in thread
From: Dave Chinner @ 2023-09-07  3:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 10:51:39PM -0400, Steven Rostedt wrote:
> On Thu, 7 Sep 2023 12:22:43 +1000
> Dave Chinner <david@fromorbit.com> wrote:
> 
> > > Anyway, what about just having read-only be the minimum for supporting a
> > > file system? We can say "sorry, due to no one maintaining this file system,
> > > we will no longer allow write access." But I'm guessing that just
> > > supporting reading an old file system is much easier than modifying one
> > > (wasn't that what we did with NTFS for the longest time?)  
> > 
> > "Read only" doesn't mean the filesytsem implementation is in any way
> > secure, robust or trustworthy - the kernel is still parsing
> > untrusted data in ring 0 using unmaintained, bit-rotted, untested
> > code....
> 
> It's just a way to still easily retrieve it, than going through and looking
> for those old ISOs that still might exist on the interwebs. I wouldn't
> recommend anyone actually having that code enabled on a system that doesn't
> need access to one of those file systems.

In which cae, we should not support it in the kernel!

If all a user needs is a read-only implementation for data recovery,
then it should be done in userspace or with a FUSE back end. Just
because it is a "filesystem" does not mean it needs to be
implemented in the kernel.

> I guess the point I'm making is, what's the burden in keeping it around in
> the read-only state? It shouldn't require any updates for new features,
> which is the complaint I believe Willy was having.

Keeping stuff around as "read-only" doesn't reduce the maintainence
burden; it actually makes it harder because now you can't use the
kernel filesystem code to create the necessary initial conditions
needed to test the filesystem is actually reading things correctly.

That is, testing a "read-only" filesystem implementation requires
you to have some external mechanism to create filesystem images in
the first place. With a read-write implementation, the filesystem
implementation itself can create the structures that then get
tested....

Hence, IMO, gutting a filesystem implementation to just support
read-only behaviour "to prolong it's support life" actually makes
things worse from a maintenance and testing persepective, not
better....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  3:26           ` Matthew Wilcox
@ 2023-09-07  8:04             ` Thorsten Leemhuis
  2023-09-07 10:29               ` Christian Brauner
  0 siblings, 1 reply; 97+ messages in thread
From: Thorsten Leemhuis @ 2023-09-07  8:04 UTC (permalink / raw)
  To: Matthew Wilcox, Steven Rostedt
  Cc: Dave Chinner, Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

[disclaimer: while I agree with many things Christoph, Dave, and Willy
said in this thread, I at the same time feel that someone needs to take
a stance for our "no regressions rule" here and act as its advocate. I
mean, Linus calls it our "#1 rule"; but sure, at the same time it's of
course of similar or higher importance that the Kernel does not loose or
damage any data users entrusted it, as the Kernel otherwise might be "a
pointless piece of code that you might as well throw away"[1].]

On 07.09.23 05:26, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 10:51:39PM -0400, Steven Rostedt wrote:
>> I guess the point I'm making is, what's the burden in keeping it around in
>> the read-only state? It shouldn't require any updates for new features,
>> which is the complaint I believe Willy was having.
> 
> Old filesystems depend on old core functionality like bufferheads.
> 
> We want to remove bufferheads.
> 
> Who has the responsibility for updating those old filesystmes to use
> iomap instead of bufferheads?
>
> Who has the responsibility for testing those filesystems still work
> after the update?
>
> Who has the responsibility for looking at a syzbot bug report that comes
> in twelve months after the conversion is done and deciding whether the
> conversion was the problem, or whether it's some other patch that
> happened before or after?

Isn't the answer to those question the usual one: if you want to change
an in-kernel API, you have to switch all in-kernel users (or mark them
as broken and remove them later, if they apparently are not used anymore
in the wild), and deal with the fallout if a reliable bisection later
says that a regression is caused by a chance of yours?

The only thing slightly special is the testing story, as those for
things like drivers it is a whole lot simpler: developers there can get
away with only little or no testing, as the risk of data loss or damage
is extremely small.

But well, changes to arch/ or mm/ code can lead to data damage or loss
on rare or unsupported environments as well. All those CI systems out
there that test the kernel in various environments help to catch quite a
few of those problems before regular users run into them.

So why can't that work similarly for unmaintained file systems? We could
even establish the rule that Linus should only apply patches to some
parts of the kernel if the test suite for unmaintained file systems
succeeded without regressions. And only accept new file system code if a
test suite that is easy to integrate in CI systems exists (e.g.
something smaller and faster than what the ext4 and xfs developers run
regularly, but smaller and faster should likely be good enough here).

Ciao, Thorsten

[1] that's something Linus once said in the context of a regression, but
I think it fits here

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  1:53     ` Steven Rostedt
  2023-09-07  2:22       ` Dave Chinner
@ 2023-09-07  9:48       ` Dan Carpenter
  2023-09-07 11:04         ` Segher Boessenkool
  2023-09-08  8:39       ` Christoph Hellwig
  2 siblings, 1 reply; 97+ messages in thread
From: Dan Carpenter @ 2023-09-07  9:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dave Chinner, Guenter Roeck, Christoph Hellwig, ksummit,
	linux-fsdevel, gcc-patches

On Wed, Sep 06, 2023 at 09:53:27PM -0400, Steven Rostedt wrote:
> On Thu, 7 Sep 2023 08:54:38 +1000
> Dave Chinner <david@fromorbit.com> wrote:
> 
> > And let's not forget: removing a filesystem from the kernel is not
> > removing end user support for extracting data from old filesystems.
> > We have VMs for that - we can run pretty much any kernel ever built
> > inside a VM, so users that need to extract data from a really old
> > filesystem we no longer support in a modern kernel can simply boot
> > up an old distro that did support it and extract the data that way.
> 
> Of course there's the case of trying to recreate a OS that can run on a
> very old kernel. Just building an old kernel is difficult today because
> today's compilers will refuse to build them (I've hit issues in bisections
> because of that!)

Yeah.  I can't run Smatch on obsolete kernels because I can't build the
tools/ directory etc.  For example, it would be interesting to look at
really ancient kernels to see how buggy they are.  I started to hunt
down all the Makefile which add a -Werror but there are a lot and
eventually I got bored and gave up.

Someone should patch GCC so there it checks an environment variable to
ignore -Werror.  Somethine like this?

diff --git a/gcc/opts.cc b/gcc/opts.cc
index ac81d4e42944..2de69300d4fe 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -2598,6 +2598,17 @@ print_help (struct gcc_options *opts, unsigned int lang_mask,
 			 lang_mask);
 }
 
+static bool
+ignore_w_error(void)
+{
+  char *str;
+
+  str = getenv("IGNORE_WERROR");
+  if (str && strcmp(str, "1") == 0)
+    return true;
+  return false;
+}
+
 /* Handle target- and language-independent options.  Return zero to
    generate an "unknown option" message.  Only options that need
    extra handling need to be listed here; if you simply want
@@ -2773,11 +2784,15 @@ common_handle_option (struct gcc_options *opts,
       break;
 
     case OPT_Werror:
+      if (ignore_w_error())
+	break;
       dc->warning_as_error_requested = value;
       break;
 
     case OPT_Werror_:
-      if (lang_mask == CL_DRIVER)
+     if (ignore_w_error())
+	break;
+     if (lang_mask == CL_DRIVER)
 	break;
 
       enable_warning_as_error (arg, value, lang_mask, handlers,

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  8:04             ` Thorsten Leemhuis
@ 2023-09-07 10:29               ` Christian Brauner
  2023-09-07 11:18                 ` Thorsten Leemhuis
  0 siblings, 1 reply; 97+ messages in thread
From: Christian Brauner @ 2023-09-07 10:29 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Matthew Wilcox, Steven Rostedt, Dave Chinner, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

> So why can't that work similarly for unmaintained file systems? We could
> even establish the rule that Linus should only apply patches to some
> parts of the kernel if the test suite for unmaintained file systems
> succeeded without regressions. And only accept new file system code if a

Reading this mail scared me. The list of reiserfs bugs alone is crazy.
And syzbot keeps piling them on. It can't even succeed an xfstests run
without splatting all over the place last I checked. And there's no
maintainer for it. We'll pick up patches if we get sent them but none of
the vfs maintainers and reviewers has the bandwith to take care of
rotting filesystems and their various ailments.

Yes, we should have a discussion under what circumstances we can remove
a filesystem. I think that's absolutely what we should do and we should
nudge userspace to stop compiling known orphaned filesystems. If most
distros have stopped compiling support for a filesystem then I think
that's a good indication that we can at least start to talk about
how to remove it. And we should probably tell distros that a filesystem
is orphaned and unmaintained more aggressively.

But even if we decide or it is decided for us that we have to keep such
old filesystems in tree forever then the contract with userspaces must
be that such filesystems are zombies. They should however not become an
even bigger burden or obstacle to improve actively maintained
filesystems or the vfs than they are already.

I think it's also worth clarifying something:
Right now, everyone who does fs wide changes does their absolute best to
account for every filesytem that's in the tree. And for people not
familiar or even refusing to care about any other filesystems the
maintainers and reviewers will remind them about consequences for other
filesystems as far as they have that knowledge. And that's already a
major task.

For every single fs/ wide change we try to make absolutely sure that if
it regresses anything - even the deadest-of-dead filesystems - it will
be fixed as soon as we get a report. That's what we did for the
superblock rework this cycle, the posix acl rework last cycles, the
timestamp patches, the freezing patches.

But it is very scary to think that we might be put even more under the
yoke of dead filesystems. They put enough of a burden on us by not just
having to keep the filesystems itself around but quite often legacy
infrastructure and hacks in various places.

The burden of unmaintained filesystems is very very real. fs/ wide
changes are very costly in development time.

> test suite that is easy to integrate in CI systems exists (e.g.
> something smaller and faster than what the ext4 and xfs developers run
> regularly, but smaller and faster should likely be good enough here).

The big question of course is who is going to do that? We have a large
number of filesystems. And only a subset of them is integrated or even
integratable with xfstests. And xfstests is the standard for fs testing.

So either a filesystem is integrated with xfstests and we can test it or
it isn't and we can't. And if a legacy filesystem becomes integrated
then someone needs to do the work to determine what the baseline of
tests is that need to pass and then fix all bugs to get to a clean
baseline run.

That'll be a fulltime job for quite a while I would expect.

Imho, mounting an unmaintained filesystem that isn't integrated with
xfstests is a gamble with your data.

(And what really I would rather see happen before that is that we get
stuff like vfs.git to be auto-integrated with xfstests runs/CI at some
point.)

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  9:48       ` Dan Carpenter
@ 2023-09-07 11:04         ` Segher Boessenkool
  2023-09-07 11:22           ` Steven Rostedt
  2023-09-07 11:23           ` Dan Carpenter
  0 siblings, 2 replies; 97+ messages in thread
From: Segher Boessenkool @ 2023-09-07 11:04 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Steven Rostedt, Dave Chinner, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel, gcc-patches

On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> I started to hunt
> down all the Makefile which add a -Werror but there are a lot and
> eventually I got bored and gave up.

I have a patch stack for that, since 2014 or so.  I build Linux with
unreleased GCC versions all the time, so pretty much any new warning is
fatal if you unwisely use -Werror.

> Someone should patch GCC so there it checks an environment variable to
> ignore -Werror.  Somethine like this?

No.  You should patch your program, instead.  One easy way is to add a
-Wno-error at the end of your command lines.  Or even just -w if you
want or need a bigger hammer.

Or nicer, put it all in Kconfig, like powerpc already has for example.
There is a CONFIG_WERROR as well, so maybe use that in all places?

> +static bool
> +ignore_w_error(void)
> +{
> +  char *str;
> +
> +  str = getenv("IGNORE_WERROR");
> +  if (str && strcmp(str, "1") == 0)

space before (

>      case OPT_Werror:
> +      if (ignore_w_error())
> +	break;
>        dc->warning_as_error_requested = value;
>        break;
>  
>      case OPT_Werror_:
> -      if (lang_mask == CL_DRIVER)
> +     if (ignore_w_error())
> +	break;
> +     if (lang_mask == CL_DRIVER)
>  	break;

The new indentation is messed up.  And please don't move the existing
early-out to later, it make more sense earlier, the way it was.


Segher

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  3:38           ` Dave Chinner
@ 2023-09-07 11:18             ` Steven Rostedt
  2023-09-13 16:43               ` Eric Sandeen
  0 siblings, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2023-09-07 11:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Thu, 7 Sep 2023 13:38:40 +1000
Dave Chinner <david@fromorbit.com> wrote:

> Hence, IMO, gutting a filesystem implementation to just support
> read-only behaviour "to prolong it's support life" actually makes
> things worse from a maintenance and testing persepective, not
> better....

From your other email about 10 years support, you could first set a fs to
read-only, and then after so long (I'm not sure 10 years is really
necessary), then remove it.

That is, make it the stage before removal. If no one complains about it
being read-only after several years, then it's highly likely that no one is
using it. If someone does complain, you can tell them to either maintain
it, or start moving all their data to another fs.

For testing, you could even have an #ifdef that needs to be manually
changed (not a config option) to make it writable.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 10:29               ` Christian Brauner
@ 2023-09-07 11:18                 ` Thorsten Leemhuis
  2023-09-07 12:04                   ` Matthew Wilcox
  2023-09-07 12:57                   ` Guenter Roeck
  0 siblings, 2 replies; 97+ messages in thread
From: Thorsten Leemhuis @ 2023-09-07 11:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Matthew Wilcox, Steven Rostedt, Dave Chinner, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On 07.09.23 12:29, Christian Brauner wrote:
>> So why can't that work similarly for unmaintained file systems? We could
>> even establish the rule that Linus should only apply patches to some
>> parts of the kernel if the test suite for unmaintained file systems
>> succeeded without regressions. And only accept new file system code if a
> 
> Reading this mail scared me.

Sorry about that, I can fully understand that. It's just that some
statements in this thread sounded a whole lot like "filesystems want to
opt-out of the no regression rule" to me. That's why I at some point
thought I had to speak up.

> The list of reiserfs bugs alone is crazy.

Well, we regularly remove drivers or even support for whole archs
without getting into conflict with the "no regressions" rule, so I'd say
that should be possible for file systems as well.

And I think for reiserfs we are on track with that.

But what about hfsplus? From hch's initial mail of this thread it sounds
like that is something users would miss. So removing it without a very
strong need[1] seems wrong to me. That's why I got involved in this
discussion.

[1] e.g. data loss or damage (as mentioned in my earlier mail) or
substantial security problems (forgot to mentioned them in my earlier mail)

> I think it's also worth clarifying something:
> Right now, everyone who does fs wide changes does their absolute best to
> account for every filesytem that's in the tree. And for people not
> familiar or even refusing to care about any other filesystems the
> maintainers and reviewers will remind them about consequences for other
> filesystems as far as they have that knowledge. And that's already a
> major task.
>
> For every single fs/ wide change we try to make absolutely sure that if
> it regresses anything - even the deadest-of-dead filesystems - it will
> be fixed as soon as we get a report. [...]

I know. Big thx to everyone doing the work here!

> But it is very scary to think that we might be put even more under the
> yoke of dead filesystems.

That is not my intent. I just want to ensure the "no regressions" rule
is not forgotten in this discussion.

>> test suite that is easy to integrate in CI systems exists (e.g.
>> something smaller and faster than what the ext4 and xfs developers run
>> regularly, but smaller and faster should likely be good enough here).
> 
> The big question of course is who is going to do that?
> [...]
> That'll be a fulltime job for quite a while I would expect.

Yeah, I know. :-/

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 11:04         ` Segher Boessenkool
@ 2023-09-07 11:22           ` Steven Rostedt
  2023-09-07 12:24             ` Segher Boessenkool
  2023-09-07 11:23           ` Dan Carpenter
  1 sibling, 1 reply; 97+ messages in thread
From: Steven Rostedt @ 2023-09-07 11:22 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Dan Carpenter, Dave Chinner, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel, gcc-patches

On Thu, 7 Sep 2023 06:04:09 -0500
Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> > I started to hunt
> > down all the Makefile which add a -Werror but there are a lot and
> > eventually I got bored and gave up.  
> 
> I have a patch stack for that, since 2014 or so.  I build Linux with
> unreleased GCC versions all the time, so pretty much any new warning is
> fatal if you unwisely use -Werror.
> 
> > Someone should patch GCC so there it checks an environment variable to
> > ignore -Werror.  Somethine like this?  
> 
> No.  You should patch your program, instead.  One easy way is to add a
> -Wno-error at the end of your command lines.  Or even just -w if you
> want or need a bigger hammer.

That's not really possible when bisecting a kernel bug into older kernels.
The build system is highly complex and requires hundreds of changes to do
what you suggested. As it is for a bisection that takes a minimum of 13
iterations, your approach just isn't feasible.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 11:04         ` Segher Boessenkool
  2023-09-07 11:22           ` Steven Rostedt
@ 2023-09-07 11:23           ` Dan Carpenter
  2023-09-07 12:30             ` Segher Boessenkool
  1 sibling, 1 reply; 97+ messages in thread
From: Dan Carpenter @ 2023-09-07 11:23 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Steven Rostedt, Dave Chinner, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel, gcc-patches

On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote:
> On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> > I started to hunt
> > down all the Makefile which add a -Werror but there are a lot and
> > eventually I got bored and gave up.
> 
> I have a patch stack for that, since 2014 or so.  I build Linux with
> unreleased GCC versions all the time, so pretty much any new warning is
> fatal if you unwisely use -Werror.
> 
> > Someone should patch GCC so there it checks an environment variable to
> > ignore -Werror.  Somethine like this?
> 
> No.  You should patch your program, instead.

There are 2930 Makefiles in the kernel source.

> One easy way is to add a
> -Wno-error at the end of your command lines.  Or even just -w if you
> want or need a bigger hammer.

I tried that.  Some of the Makefiles check an environemnt variable as
well if you want to turn off -Werror.  It's not a complete solution at
all.  I have no idea what a complete solution looks like because I gave
up.

> 
> Or nicer, put it all in Kconfig, like powerpc already has for example.
> There is a CONFIG_WERROR as well, so maybe use that in all places?

That's a good idea but I'm trying to compile old kernels and not the
current kernel.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 11:18                 ` Thorsten Leemhuis
@ 2023-09-07 12:04                   ` Matthew Wilcox
  2023-09-07 12:57                   ` Guenter Roeck
  1 sibling, 0 replies; 97+ messages in thread
From: Matthew Wilcox @ 2023-09-07 12:04 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Christian Brauner, Steven Rostedt, Dave Chinner, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On Thu, Sep 07, 2023 at 01:18:27PM +0200, Thorsten Leemhuis wrote:
> On 07.09.23 12:29, Christian Brauner wrote:
> >> So why can't that work similarly for unmaintained file systems? We could
> >> even establish the rule that Linus should only apply patches to some
> >> parts of the kernel if the test suite for unmaintained file systems
> >> succeeded without regressions. And only accept new file system code if a
> > 
> > Reading this mail scared me.
> 
> Sorry about that, I can fully understand that. It's just that some
> statements in this thread sounded a whole lot like "filesystems want to
> opt-out of the no regression rule" to me. That's why I at some point
> thought I had to speak up.

It's the very opposite of that.  We're all highly conscious of not eating
user data.  Which means that filesystem development often grinds to a
halt while we investigatee bugs.  This is why syzbot is so freaking
dangerous.  It's essentially an automated assault on fs developers.
Worse, Google released syzkaller to the public and now we have random
arseholes running it who have "made proprietary changes to it", and have
no idea how to decide if a report from it is in any way useful.

> But what about hfsplus? From hch's initial mail of this thread it sounds
> like that is something users would miss. So removing it without a very
> strong need[1] seems wrong to me. That's why I got involved in this
> discussion.
> 
> [1] e.g. data loss or damage (as mentioned in my earlier mail) or
> substantial security problems (forgot to mentioned them in my earlier mail)

That's the entire problem!  A seemingly innocent change can easily
lose HFS+ data and we wouldn't find out for years because there's no
test-suite.  A properly tested filesystem looks like this:

https://lore.kernel.org/linux-ext4/20230903120001.qjv5uva2zaqthgk2@zlang-mailbox/

I inadvertently introduced a bug in ext4 with 1kB block size; it's
picked up in less than a week, and within a week of the initial report,
it's diagnosed and fixed.

If that same bug had been introduced to HFS+, how long would it have
taken for anyone to find the bug?  How much longer would it have taken
to track down and fix?


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 11:22           ` Steven Rostedt
@ 2023-09-07 12:24             ` Segher Boessenkool
  0 siblings, 0 replies; 97+ messages in thread
From: Segher Boessenkool @ 2023-09-07 12:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dan Carpenter, Dave Chinner, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel, gcc-patches

On Thu, Sep 07, 2023 at 07:22:45AM -0400, Steven Rostedt wrote:
> On Thu, 7 Sep 2023 06:04:09 -0500
> Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> > No.  You should patch your program, instead.  One easy way is to add a
> > -Wno-error at the end of your command lines.  Or even just -w if you
> > want or need a bigger hammer.
> 
> That's not really possible when bisecting a kernel bug into older kernels.
> The build system is highly complex and requires hundreds of changes to do
> what you suggested. As it is for a bisection that takes a minimum of 13
> iterations, your approach just isn't feasible.

Isn't this exactly what KCFLAGS is for?

But, I meant to edit the build system.  It isn't so hard to bisect with
patch stacks on top.  Just a bit annoying.


Segher

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 11:23           ` Dan Carpenter
@ 2023-09-07 12:30             ` Segher Boessenkool
  2023-09-12  9:50               ` Richard Biener
  0 siblings, 1 reply; 97+ messages in thread
From: Segher Boessenkool @ 2023-09-07 12:30 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Steven Rostedt, Dave Chinner, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel, gcc-patches

On Thu, Sep 07, 2023 at 02:23:00PM +0300, Dan Carpenter wrote:
> On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote:
> > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> > > I started to hunt
> > > down all the Makefile which add a -Werror but there are a lot and
> > > eventually I got bored and gave up.
> > 
> > I have a patch stack for that, since 2014 or so.  I build Linux with
> > unreleased GCC versions all the time, so pretty much any new warning is
> > fatal if you unwisely use -Werror.
> > 
> > > Someone should patch GCC so there it checks an environment variable to
> > > ignore -Werror.  Somethine like this?
> > 
> > No.  You should patch your program, instead.
> 
> There are 2930 Makefiles in the kernel source.

Yes.  And you need patches to about thirty.  Or a bit more, if you want
to do it more cleanly.  This isn't a guess.

> > One easy way is to add a
> > -Wno-error at the end of your command lines.  Or even just -w if you
> > want or need a bigger hammer.
> 
> I tried that.  Some of the Makefiles check an environemnt variable as
> well if you want to turn off -Werror.  It's not a complete solution at
> all.  I have no idea what a complete solution looks like because I gave
> up.

A solution can not involve changing the compiler.  That is just saying
the kernel doesn't know how to fix its own problems, so let's give the
compiler some more unnecessary problems.

> > Or nicer, put it all in Kconfig, like powerpc already has for example.
> > There is a CONFIG_WERROR as well, so maybe use that in all places?
> 
> That's a good idea but I'm trying to compile old kernels and not the
> current kernel.

You can patch older kernels, too, you know :-)

If you need to not make any changes to your source code for some crazy
reason (political perhaps?), just use a shell script or shell function
instead of invoking the compiler driver directly?


Segher

Segher

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 11:18                 ` Thorsten Leemhuis
  2023-09-07 12:04                   ` Matthew Wilcox
@ 2023-09-07 12:57                   ` Guenter Roeck
  2023-09-07 13:56                     ` Christian Brauner
  2023-09-08  8:44                     ` Christoph Hellwig
  1 sibling, 2 replies; 97+ messages in thread
From: Guenter Roeck @ 2023-09-07 12:57 UTC (permalink / raw)
  To: Thorsten Leemhuis, Christian Brauner
  Cc: Matthew Wilcox, Steven Rostedt, Dave Chinner, Christoph Hellwig,
	ksummit, linux-fsdevel

On 9/7/23 04:18, Thorsten Leemhuis wrote:
> On 07.09.23 12:29, Christian Brauner wrote:
>>> So why can't that work similarly for unmaintained file systems? We could
>>> even establish the rule that Linus should only apply patches to some
>>> parts of the kernel if the test suite for unmaintained file systems
>>> succeeded without regressions. And only accept new file system code if a
>>
>> Reading this mail scared me.
> 
> Sorry about that, I can fully understand that. It's just that some
> statements in this thread sounded a whole lot like "filesystems want to
> opt-out of the no regression rule" to me. That's why I at some point
> thought I had to speak up.
> 
>> The list of reiserfs bugs alone is crazy.
> 
> Well, we regularly remove drivers or even support for whole archs
> without getting into conflict with the "no regressions" rule, so I'd say
> that should be possible for file systems as well.
> 
> And I think for reiserfs we are on track with that.
> 
> But what about hfsplus? From hch's initial mail of this thread it sounds
> like that is something users would miss. So removing it without a very
> strong need[1] seems wrong to me. That's why I got involved in this
> discussion.
> 

The original mail also suggested that there would be essentially no means
to create a hfsplus file system in Linux. That would mean it would, for all
practical purposes, be untestable.

However:

$ sudo apt-get install hfsprogs
$ truncate -s 64M filesystem.hfsplus
$ mkfs.hfsplus filesystem.hfsplus
Initialized filesystem.hfsplus as a 64 MB HFS Plus volume
$ file filesystem.hfsplus
filesystem.hfsplus: Macintosh HFS Extended version 4 data last mounted by: '10.0', created: Thu Sep  7 05:41:21 2023, last modified: Thu Sep  7 12:41:21 2023, last checked: Thu Sep  7 12:41:7

So I am not really sure I understand what the problem actually is.

No a side note, the crash I observed with ntfs3 was introduced by
commit a4f64a300a29 ("ntfs3: free the sbi in ->kill_sb").

Guenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 12:57                   ` Guenter Roeck
@ 2023-09-07 13:56                     ` Christian Brauner
  2023-09-08  8:44                     ` Christoph Hellwig
  1 sibling, 0 replies; 97+ messages in thread
From: Christian Brauner @ 2023-09-07 13:56 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Thorsten Leemhuis, Matthew Wilcox, Steven Rostedt, Dave Chinner,
	Christoph Hellwig, ksummit, linux-fsdevel

On Thu, Sep 07, 2023 at 05:57:47AM -0700, Guenter Roeck wrote:
> On 9/7/23 04:18, Thorsten Leemhuis wrote:
> > On 07.09.23 12:29, Christian Brauner wrote:
> > > > So why can't that work similarly for unmaintained file systems? We could
> > > > even establish the rule that Linus should only apply patches to some
> > > > parts of the kernel if the test suite for unmaintained file systems
> > > > succeeded without regressions. And only accept new file system code if a
> > > 
> > > Reading this mail scared me.
> > 
> > Sorry about that, I can fully understand that. It's just that some
> > statements in this thread sounded a whole lot like "filesystems want to
> > opt-out of the no regression rule" to me. That's why I at some point
> > thought I had to speak up.
> > 
> > > The list of reiserfs bugs alone is crazy.
> > 
> > Well, we regularly remove drivers or even support for whole archs
> > without getting into conflict with the "no regressions" rule, so I'd say
> > that should be possible for file systems as well.
> > 
> > And I think for reiserfs we are on track with that.
> > 
> > But what about hfsplus? From hch's initial mail of this thread it sounds
> > like that is something users would miss. So removing it without a very
> > strong need[1] seems wrong to me. That's why I got involved in this
> > discussion.
> > 
> 
> The original mail also suggested that there would be essentially no means
> to create a hfsplus file system in Linux. That would mean it would, for all
> practical purposes, be untestable.
> 
> However:
> 
> $ sudo apt-get install hfsprogs
> $ truncate -s 64M filesystem.hfsplus
> $ mkfs.hfsplus filesystem.hfsplus
> Initialized filesystem.hfsplus as a 64 MB HFS Plus volume
> $ file filesystem.hfsplus
> filesystem.hfsplus: Macintosh HFS Extended version 4 data last mounted by: '10.0', created: Thu Sep  7 05:41:21 2023, last modified: Thu Sep  7 12:41:21 2023, last checked: Thu Sep  7 12:41:7
> 
> So I am not really sure I understand what the problem actually is.
> 
> No a side note, the crash I observed with ntfs3 was introduced by
> commit a4f64a300a29 ("ntfs3: free the sbi in ->kill_sb").

I just gave you a fix to test for your report.

(ntfs3 was holding on to inodes past ->put_super(). Anyway, not relevant
for this discussion here.)

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 15:06       ` Christian Brauner
  2023-09-06 15:59         ` Christian Brauner
  2023-09-06 19:09         ` Geert Uytterhoeven
@ 2023-09-08  8:34         ` Christoph Hellwig
  2 siblings, 0 replies; 97+ messages in thread
From: Christoph Hellwig @ 2023-09-08  8:34 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Dave Chinner, Matthew Wilcox, Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 05:06:29PM +0200, Christian Brauner wrote:
> So adding another RFE to libmount to add support for a global allowlist
> or denylist of filesystems and refuse to mount anything else might also
> be a good thing. Actually, might go and do this now.
> 
> So that we can slowly move userspace towards a smaller set of
> filesystems and then distros can start turning off more and more
> filesystems.

A global list is good, maintaining it in util-linux is stupid.  This
needs to be in the kernel as that's where we have all the data.  IMHO
a flag in struct file_system_type thast gets exposed in
/proc/filesystems and maybe even a flag to the new mount API to tell
"this is an automount" and refuse it it the trusted flag is not set
will work much better.  That way we can also easily upgrade/downgrade
the status of a file system as needed.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-06 22:54   ` Dave Chinner
  2023-09-07  0:53     ` Bagas Sanjaya
  2023-09-07  1:53     ` Steven Rostedt
@ 2023-09-08  8:38     ` Christoph Hellwig
  2023-09-08 23:21       ` Dave Chinner
  2 siblings, 1 reply; 97+ messages in thread
From: Christoph Hellwig @ 2023-09-08  8:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Thu, Sep 07, 2023 at 08:54:38AM +1000, Dave Chinner wrote:
> There's a bigger policy question around that.
> 
> I think that if we are going to have filesystems be "community
> maintained" because they have no explicit maintainer, we need some
> kind of standard policy to be applied.
> 
> I'd argue that the filesystem needs, at minimum, a working mkfs and
> fsck implementation, and that it is supported by fstests so anyone
> changing core infrastructure can simply run fstests against the
> filesystem to smoke test the infrastructure changes they are making.

Yes, that's what I tried to imply above.  We could relax fsck a bit
(even if that is playing fast and lose), but without mkfs there is
no way anyone can verify anything

> 
> I'd suggest that syzbot coverage of such filesystems is not desired,
> because nobody is going to be fixing problems related to on-disk
> format verification. All we really care about is that a user can
> read and write to the filesystem without trashing anything.

Agreed.

> I'd also suggest that we mark filesystem support state via fstype
> flags rather than config options. That way we aren't reliant on
> distros setting config options correctly to include/indicate the
> state of the filesystem implementation. We could also use similar
> flags for indicating deprecation and obsolete state (i.e. pending
> removal) and have code in the high level mount path issue the
> relevant warnings.

Agreed.

> This method of marking would also allow us to document and implement
> a formal policy for removal of unmaintained and/or obsolete
> filesystems without having to be dependent on distros juggling
> config variables to allow users to continue using deprecated, broken
> and/or obsolete filesystem implementations right up to the point
> where they are removed from the kernel.

I'd love to get there, but that might be a harder sell.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07  1:53     ` Steven Rostedt
  2023-09-07  2:22       ` Dave Chinner
  2023-09-07  9:48       ` Dan Carpenter
@ 2023-09-08  8:39       ` Christoph Hellwig
  2 siblings, 0 replies; 97+ messages in thread
From: Christoph Hellwig @ 2023-09-08  8:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dave Chinner, Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 09:53:27PM -0400, Steven Rostedt wrote:
> Anyway, what about just having read-only be the minimum for supporting a
> file system? We can say "sorry, due to no one maintaining this file system,
> we will no longer allow write access." But I'm guessing that just
> supporting reading an old file system is much easier than modifying one
> (wasn't that what we did with NTFS for the longest time?)

read-only is just as annoying, because all our normal test infrastruture
doesn't work for that at all.  So you'd need not only a test harness
for that, but also a lot of publically shared images and/or a tool
to generate filled images.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 12:57                   ` Guenter Roeck
  2023-09-07 13:56                     ` Christian Brauner
@ 2023-09-08  8:44                     ` Christoph Hellwig
  1 sibling, 0 replies; 97+ messages in thread
From: Christoph Hellwig @ 2023-09-08  8:44 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Thorsten Leemhuis, Christian Brauner, Matthew Wilcox,
	Steven Rostedt, Dave Chinner, Christoph Hellwig, ksummit,
	linux-fsdevel

On Thu, Sep 07, 2023 at 05:57:47AM -0700, Guenter Roeck wrote:
> $ sudo apt-get install hfsprogs

Oh, looks like ohn Paul Adrian Glaubitz actually resurrected it after
a 7 year hiatus when it was dropped entirely.  That's good news to at
least keep hfsplus on life support.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-05 23:06 ` Dave Chinner
  2023-09-05 23:23   ` Matthew Wilcox
@ 2023-09-08  8:55   ` Christoph Hellwig
  2023-09-08 22:47     ` Dave Chinner
  1 sibling, 1 reply; 97+ messages in thread
From: Christoph Hellwig @ 2023-09-08  8:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> I think this completely misses the point of contention of the larger
> syzbot vs filesystem discussion: the assertion that "testing via
> syzbot means the subsystem is secure" where "secure" means "can be
> used safely for operations that involve trust model violations".
> 
> Fundamentally, syzbot does nothing to actually validate the
> filesystem is "secure". Fuzzing can only find existing bugs by
> simulating an attacker, but it does nothing to address the
> underlying issues that allow that attack channel to exist.

I don't think anyone makes that assertation.  Instead the assumptions
is something that is handling untrusted input should be available to
surive fuzzing by syzbot, and that's an assumption I agree with.  That
doesn't imply anything surving syzbot is secure, but it if doesn't
survive syzbot it surely can't deal with untrusted input.

> > unmaintained.  If we want to move the kernel forward by finishing
> > API transitions (new mount API, buffer_head removal for the I/O path,
> > ->writepage removal, etc) these file systems need to change as well
> > and need some kind of testing.  The easiest way forward would be
> > to remove everything that is not fully maintained, but that would
> > remove a lot of useful features.
> 
> Linus has explicitly NACKed that approach.
> 
> https://lore.kernel.org/linux-fsdevel/CAHk-=wg7DSNsHY6tWc=WLeqDBYtXges_12fFk1c+-No+fZ0xYQ@mail.gmail.com/

.. and that is why I'm bring this up in a place where we can have
a proper procedural discussion instead of snarky remarks.  This is
a fundamental problem we;ll need to sort out.

> Which is a problem, because historically we've taken code into
> the kernel without requiring a maintainer, or the people who
> maintained the code have moved on, yet we don't have a policy for
> removing code that is slowly bit-rotting to uselessness.

... and we keep merging crap that goes against all established normal
requirements when people things it's new and shiny and cool :(


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-08  8:55   ` Christoph Hellwig
@ 2023-09-08 22:47     ` Dave Chinner
  0 siblings, 0 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-08 22:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: ksummit, linux-fsdevel

On Fri, Sep 08, 2023 at 01:55:11AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> > I think this completely misses the point of contention of the larger
> > syzbot vs filesystem discussion: the assertion that "testing via
> > syzbot means the subsystem is secure" where "secure" means "can be
> > used safely for operations that involve trust model violations".
> > 
> > Fundamentally, syzbot does nothing to actually validate the
> > filesystem is "secure". Fuzzing can only find existing bugs by
> > simulating an attacker, but it does nothing to address the
> > underlying issues that allow that attack channel to exist.
> 
> I don't think anyone makes that assertation.  Instead the assumptions
> is something that is handling untrusted input should be available to
> surive fuzzing by syzbot, and that's an assumption I agree with.  That
> doesn't imply anything surving syzbot is secure, but it if doesn't
> survive syzbot it surely can't deal with untrusted input.

Sure, but as an experienced filesystem developer who, 15 years ago,
architected and implemented a metadata verification mechanism that
effectively defeats *random bit mutation metadata fuzzing*, I am
making sure that everyone is aware that "syzbot doesn't find
problems" is not the same thing as "filesystem is safe to handle
untrusted input".

Sure, syzbot being unable to find problems is a good start, but I
know *many* ways to screw over the XFS kernel implementation by
mutating the metadata in nasty ways that we *can't actually protect
against* at runtime, and that syzbot is *never* going to stumble
across by a random walk through all the possible bit mutations that
can occur in a filesystem's metadata.

I stress this again: syzbot not finding problems does not, in any
way, imply that a filesytem implementation is safe to parse
untrusted filesystem images in a ring 0 context. Anyone who says
that "syzbot doesn't find problems, so it's good to go with
untrusted input" is completely ignoring the long standing and well
known practical limitations of the fuzzing techniques being used by
tools like syzbot...

> > > unmaintained.  If we want to move the kernel forward by finishing
> > > API transitions (new mount API, buffer_head removal for the I/O path,
> > > ->writepage removal, etc) these file systems need to change as well
> > > and need some kind of testing.  The easiest way forward would be
> > > to remove everything that is not fully maintained, but that would
> > > remove a lot of useful features.
> > 
> > Linus has explicitly NACKed that approach.
> > 
> > https://lore.kernel.org/linux-fsdevel/CAHk-=wg7DSNsHY6tWc=WLeqDBYtXges_12fFk1c+-No+fZ0xYQ@mail.gmail.com/
> 
> .. and that is why I'm bring this up in a place where we can have
> a proper procedural discussion instead of snarky remarks.  This is
> a fundamental problem we;ll need to sort out.

I agree, which is why I'm trying to make sure that everyone has the
same understanding of the situation. Allowing filesystems to parse
untrusted data in ring 0 context comes down how which filesystem
developers actually trust their code and on-disk format verification
enough to allow it to be exposed willingly to untrusted input.

Make no mistake about it: I'm not willing to take that risk with
XFS. I'm not willing to take responsibility for deciding that we
should expose XFS to untrusted code - I *know* that it isn't safe,
and it would be gross negligence for me to present the code that I
help maintain and develop any other way.

> > Which is a problem, because historically we've taken code into
> > the kernel without requiring a maintainer, or the people who
> > maintained the code have moved on, yet we don't have a policy for
> > removing code that is slowly bit-rotting to uselessness.
> 
> ... and we keep merging crap that goes against all established normal
> requirements when people things it's new and shiny and cool :(

Well, yes, but that's a separate (though somewhat related)
discussion.

The observation I'd make from your comment is that the Linux
project, as a whole, has no clearly defined feature life-cycle
process. For the purpose of this discussion, we're concerned about
the end-of-life process for removing ancient, obsolete and/or broken
code in a sane, timely manner that we are completely lacking.

A project that has been going for 30 years, and is likely to be
going for another 30 years, needs to have a well defined EOL
process. Not just for filesystems, but for everything: syscalls,
drivers, platforms, sysfs interfaces, etc.

The current process of "send an email, and if anyone shouts don't
remove it" means that as long as there's a single user left, we
can't get rid of the junk that is causing us problems right now.

That's a terrible policy. As long as a single person has something
on their shelf they want to have keep working, we're supposed to
keep it working. In the cases where the developer time to keep the
feature working outweighs the number of users, the cost/benefit
ratio is so so far on the "cost" side it is not funny.  And when it
comes to filesystems, the risk/benefit analysis is pegged as hard as
it can be against the "risk" side.

IOWs, there's a wider scope here than just "how do we manage all
these obsolete, buggy, legacy filesystems?". I points to the fact
that the Linux project itself doesn't really know how to remove old
code and features that have become a burden to ongoing
development....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-08  8:38     ` Christoph Hellwig
@ 2023-09-08 23:21       ` Dave Chinner
  0 siblings, 0 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-08 23:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Guenter Roeck, ksummit, linux-fsdevel

On Fri, Sep 08, 2023 at 01:38:27AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 07, 2023 at 08:54:38AM +1000, Dave Chinner wrote:
> > There's a bigger policy question around that.
> > 
> > I think that if we are going to have filesystems be "community
> > maintained" because they have no explicit maintainer, we need some
> > kind of standard policy to be applied.
> > 
> > I'd argue that the filesystem needs, at minimum, a working mkfs and
> > fsck implementation, and that it is supported by fstests so anyone
> > changing core infrastructure can simply run fstests against the
> > filesystem to smoke test the infrastructure changes they are making.
> 
> Yes, that's what I tried to imply above.  We could relax fsck a bit
> (even if that is playing fast and lose), but without mkfs there is
> no way anyone can verify anything
> 
> > 
> > I'd suggest that syzbot coverage of such filesystems is not desired,
> > because nobody is going to be fixing problems related to on-disk
> > format verification. All we really care about is that a user can
> > read and write to the filesystem without trashing anything.
> 
> Agreed.
> 
> > I'd also suggest that we mark filesystem support state via fstype
> > flags rather than config options. That way we aren't reliant on
> > distros setting config options correctly to include/indicate the
> > state of the filesystem implementation. We could also use similar
> > flags for indicating deprecation and obsolete state (i.e. pending
> > removal) and have code in the high level mount path issue the
> > relevant warnings.
> 
> Agreed.
> 
> > This method of marking would also allow us to document and implement
> > a formal policy for removal of unmaintained and/or obsolete
> > filesystems without having to be dependent on distros juggling
> > config variables to allow users to continue using deprecated, broken
> > and/or obsolete filesystem implementations right up to the point
> > where they are removed from the kernel.
> 
> I'd love to get there, but that might be a harder sell.

Yet that is exactly what we need. We need a well defined life-cycle
policy for features like filesystems. Just as much as we need a
clear, well defined process for removing obsolete filesystems, we
need a well defined policy for merging new filesystems.

The lack of well defined policies leads to arguments, arbitary
roadblocks being dropped again and again in front of merges, and it
does not prevent things like "dumping" from occurring. i.e. the
filesystem is merged, and then the "maintainer" immediately goes
AWOL and this new filesystem becomes an instant burden on the rest
of the fs development community to the point where fs developers
already immediately disregard any issue on a kernel that has used
that filesystem.

Without defined policies and processes to avoid repeating the same
mistakes and arguments and disagreements over and over for each new
filesystem someone wants to merge or remove, we aren't going to pull
ourselves out of the hole we've dug. This isn't the wild west here;
this is a room full of professional engineers. Defining new
processes and policies to make things easier, take less resources,
cause less friction, make operations more efficient, etc is part of
what we are supposed to do. Not everything can be solved with code;
the lack of defined processes for making major changes is the
biggest single issue leading to the problems we have right now....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-05 23:23   ` Matthew Wilcox
  2023-09-06  2:09     ` Dave Chinner
  2023-09-07  0:46     ` Bagas Sanjaya
@ 2023-09-09 12:50     ` James Bottomley
  2023-09-09 15:44       ` Matthew Wilcox
  2023-09-09 22:42       ` Kent Overstreet
  2 siblings, 2 replies; 97+ messages in thread
From: James Bottomley @ 2023-09-09 12:50 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Chinner; +Cc: Christoph Hellwig, ksummit, linux-fsdevel

On Wed, 2023-09-06 at 00:23 +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
[...]
> > > E.g. the hfsplus driver is unmaintained despite collecting odd
> > > fixes. It collects odd fixes because it is really useful for
> > > interoperating with MacOS and it would be a pity to remove it. 
> > > At the same time it is impossible to test changes to hfsplus
> > > sanely as there is no mkfs.hfsplus or fsck.hfsplus available for
> > > Linux.  We used to have one that was ported from the open source
> > > Darwin code drops, and I managed to get xfstests to run on
> > > hfsplus with them, but this old version doesn't compile on any
> > > modern Linux distribution and new versions of the code aren't
> > > trivially portable to Linux.
> > > 
> > > Do we have volunteers with old enough distros that we can list as
> > > testers for this code?  Do we have any other way to proceed?
> > > 
> > > If we don't, are we just going to untested API changes to these
> > > code bases, or keep the old APIs around forever?
> > 
> > We do slowly remove device drivers and platforms as the hardware,
> > developers and users disappear. We do also just change driver APIs
> > in device drivers for hardware that no-one is actually able to
> > test. The assumption is that if it gets broken during API changes,
> > someone who needs it to work will fix it and send patches.
> > 
> > That seems to be the historical model for removing unused/obsolete
> > code from the kernel, so why should we treat unmaintained/obsolete
> > filesystems any differently?  i.e. Just change the API, mark it
> > CONFIG_BROKEN until someone comes along and starts fixing it...
> 
> Umm.  If I change ->write_begin and ->write_end to take a folio,
> convert only the filesystems I can test via Luis' kdevops and mark
> the rest as CONFIG_BROKEN, I can guarantee you that Linus will reject
> that pull request.

I think really everyone in this debate needs to recognize two things:

   1. There are older systems out there that have an active group of
      maintainers and which depend on some of these older filesystems
   2. Data image archives will ipso facto be in older formats and
      preserving access to them is a historical necessity.

So the problem of what to do with older, less well maintained,
filesystems isn't one that can be solved by simply deleting them and we
have to figure out a way to move forward supporting them (obviously for
some value of the word "support"). 

By the way, people who think virtualization is the answer to this
should remember that virtual hardware is evolving just as fast as
physical hardware.

> I really feel we're between a rock and a hard place with our
> unmaintained filesystems.  They have users who care passionately, but
> not the ability to maintain them.

So why is everybody making this a hard either or? The volunteer
communities that grow around older things like filesystems are going to
be enthusiastic, but not really acquainted with the technical
intricacies of the modern VFS and mm. Requiring that they cope with all
the new stuff like iomap and folios is building an unbridgeable chasm
they're never going to cross. Give them an easier way and they might
get there.

So why can't we figure out that easier way? What's wrong with trying to
figure out if we can do some sort of helper or library set that assists
supporting and porting older filesystems. If we can do that it will not
only make the job of an old fs maintainer a lot easier, but it might
just provide the stepping stones we need to encourage more people climb
up into the modern VFS world.

I'd like to propose that we add to this topic discussion of mechanisms
by which we assist people taking on older filesystems to fit into the
modern world.

James


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-09 12:50     ` James Bottomley
@ 2023-09-09 15:44       ` Matthew Wilcox
  2023-09-10 19:51         ` James Bottomley
  2023-09-09 22:42       ` Kent Overstreet
  1 sibling, 1 reply; 97+ messages in thread
From: Matthew Wilcox @ 2023-09-09 15:44 UTC (permalink / raw)
  To: James Bottomley; +Cc: Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> On Wed, 2023-09-06 at 00:23 +0100, Matthew Wilcox wrote:
> > On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> [...]
> > > > E.g. the hfsplus driver is unmaintained despite collecting odd
> > > > fixes. It collects odd fixes because it is really useful for
> > > > interoperating with MacOS and it would be a pity to remove it. 
> > > > At the same time it is impossible to test changes to hfsplus
> > > > sanely as there is no mkfs.hfsplus or fsck.hfsplus available for
> > > > Linux.  We used to have one that was ported from the open source
> > > > Darwin code drops, and I managed to get xfstests to run on
> > > > hfsplus with them, but this old version doesn't compile on any
> > > > modern Linux distribution and new versions of the code aren't
> > > > trivially portable to Linux.
> > > > 
> > > > Do we have volunteers with old enough distros that we can list as
> > > > testers for this code?  Do we have any other way to proceed?
> > > > 
> > > > If we don't, are we just going to untested API changes to these
> > > > code bases, or keep the old APIs around forever?
> > > 
> > > We do slowly remove device drivers and platforms as the hardware,
> > > developers and users disappear. We do also just change driver APIs
> > > in device drivers for hardware that no-one is actually able to
> > > test. The assumption is that if it gets broken during API changes,
> > > someone who needs it to work will fix it and send patches.
> > > 
> > > That seems to be the historical model for removing unused/obsolete
> > > code from the kernel, so why should we treat unmaintained/obsolete
> > > filesystems any differently?  i.e. Just change the API, mark it
> > > CONFIG_BROKEN until someone comes along and starts fixing it...
> > 
> > Umm.  If I change ->write_begin and ->write_end to take a folio,
> > convert only the filesystems I can test via Luis' kdevops and mark
> > the rest as CONFIG_BROKEN, I can guarantee you that Linus will reject
> > that pull request.
> 
> I think really everyone in this debate needs to recognize two things:
> 
>    1. There are older systems out there that have an active group of
>       maintainers and which depend on some of these older filesystems
>    2. Data image archives will ipso facto be in older formats and
>       preserving access to them is a historical necessity.

I don't understand why you think people don't recognise those things.

> So the problem of what to do with older, less well maintained,
> filesystems isn't one that can be solved by simply deleting them and we
> have to figure out a way to move forward supporting them (obviously for
> some value of the word "support"). 
> 
> By the way, people who think virtualization is the answer to this
> should remember that virtual hardware is evolving just as fast as
> physical hardware.

I think that's a red herring.  Of course there are advances in virtual
hardware for those who need the best performance.  But there's also
qemu's ability to provide to you a 1981-vintage PC (or more likely a
2000-era PC).  That's not going away.

> > I really feel we're between a rock and a hard place with our
> > unmaintained filesystems.  They have users who care passionately, but
> > not the ability to maintain them.
> 
> So why is everybody making this a hard either or? The volunteer
> communities that grow around older things like filesystems are going to
> be enthusiastic, but not really acquainted with the technical
> intricacies of the modern VFS and mm. Requiring that they cope with all
> the new stuff like iomap and folios is building an unbridgeable chasm
> they're never going to cross. Give them an easier way and they might
> get there.

Spoken like someone who has been paying no attention at all to what's
going on in filesystems.  The newer APIs are easier to use.  The problem
is understanding what the hell the old filesystems are doing with the
old APIs.

Nobody's interested.  That's the problem.  The number of filesystem
developers we have is shrinking.  There hasn't been an HFS maintainer
since 2011, and it wasn't a problem until syzbot decreed that every
filesystem bug is a security bug.  And now, who'd want to be a fs
maintainer with the automated harassment?

Burnout amongst fs maintainers is a real problem.  I have no idea how
to solve it.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-09 12:50     ` James Bottomley
  2023-09-09 15:44       ` Matthew Wilcox
@ 2023-09-09 22:42       ` Kent Overstreet
  2023-09-10  8:19         ` Geert Uytterhoeven
  2023-09-11  1:05         ` Dave Chinner
  1 sibling, 2 replies; 97+ messages in thread
From: Kent Overstreet @ 2023-09-09 22:42 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> So why can't we figure out that easier way? What's wrong with trying to
> figure out if we can do some sort of helper or library set that assists
> supporting and porting older filesystems. If we can do that it will not
> only make the job of an old fs maintainer a lot easier, but it might
> just provide the stepping stones we need to encourage more people climb
> up into the modern VFS world.

What if we could run our existing filesystem code in userspace?

bcachefs has a shim layer (like xfs, but more extensive) to run nearly
the entire filesystem - about 90% by loc - in userspace.

Right now this is used for e.g. userspace fsck, but one of my goals is
to have the entire filesystem available as a FUSE filesystem. I'd been
planning on doing the fuse port as a straight fuse implementation, but
OTOH if we attempted a sh vfs iops/aops/etc. -> fuse shim, then we would
have pretty much everything we need to run any existing fs (e.g.
reiserfs) as a fuse filesystem.

It'd be a nontrivial project with some open questions (e.g. do we have
to lift all of bufferheads to userspace?) but it seems worth
investigating.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-09 22:42       ` Kent Overstreet
@ 2023-09-10  8:19         ` Geert Uytterhoeven
  2023-09-10  8:37           ` Bernd Schubert
  2023-09-10 16:35           ` Kent Overstreet
  2023-09-11  1:05         ` Dave Chinner
  1 sibling, 2 replies; 97+ messages in thread
From: Geert Uytterhoeven @ 2023-09-10  8:19 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: James Bottomley, Matthew Wilcox, Dave Chinner, Christoph Hellwig,
	ksummit, linux-fsdevel

Hi Kent,

On Sun, Sep 10, 2023 at 12:42 AM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
> On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > So why can't we figure out that easier way? What's wrong with trying to
> > figure out if we can do some sort of helper or library set that assists
> > supporting and porting older filesystems. If we can do that it will not
> > only make the job of an old fs maintainer a lot easier, but it might
> > just provide the stepping stones we need to encourage more people climb
> > up into the modern VFS world.
>
> What if we could run our existing filesystem code in userspace?
>
> bcachefs has a shim layer (like xfs, but more extensive) to run nearly
> the entire filesystem - about 90% by loc - in userspace.
>
> Right now this is used for e.g. userspace fsck, but one of my goals is
> to have the entire filesystem available as a FUSE filesystem. I'd been
> planning on doing the fuse port as a straight fuse implementation, but
> OTOH if we attempted a sh vfs iops/aops/etc. -> fuse shim, then we would
> have pretty much everything we need to run any existing fs (e.g.
> reiserfs) as a fuse filesystem.
>
> It'd be a nontrivial project with some open questions (e.g. do we have
> to lift all of bufferheads to userspace?) but it seems worth
> investigating.

  1. https://xkcd.com/1200/ (not an exact match, but you should get the idea),
  2. Once a file system is removed from the kernel, would the user space
     implementation be maintained better?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-10  8:19         ` Geert Uytterhoeven
@ 2023-09-10  8:37           ` Bernd Schubert
  2023-09-10 16:35           ` Kent Overstreet
  1 sibling, 0 replies; 97+ messages in thread
From: Bernd Schubert @ 2023-09-10  8:37 UTC (permalink / raw)
  To: Geert Uytterhoeven, Kent Overstreet
  Cc: James Bottomley, Matthew Wilcox, Dave Chinner, Christoph Hellwig,
	ksummit, linux-fsdevel



On 9/10/23 10:19, Geert Uytterhoeven wrote:
> Hi Kent,
> 
> On Sun, Sep 10, 2023 at 12:42 AM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
>> On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
>>> So why can't we figure out that easier way? What's wrong with trying to
>>> figure out if we can do some sort of helper or library set that assists
>>> supporting and porting older filesystems. If we can do that it will not
>>> only make the job of an old fs maintainer a lot easier, but it might
>>> just provide the stepping stones we need to encourage more people climb
>>> up into the modern VFS world.
>>
>> What if we could run our existing filesystem code in userspace?
>>
>> bcachefs has a shim layer (like xfs, but more extensive) to run nearly
>> the entire filesystem - about 90% by loc - in userspace.
>>
>> Right now this is used for e.g. userspace fsck, but one of my goals is
>> to have the entire filesystem available as a FUSE filesystem. I'd been
>> planning on doing the fuse port as a straight fuse implementation, but
>> OTOH if we attempted a sh vfs iops/aops/etc. -> fuse shim, then we would
>> have pretty much everything we need to run any existing fs (e.g.
>> reiserfs) as a fuse filesystem.
>>
>> It'd be a nontrivial project with some open questions (e.g. do we have
>> to lift all of bufferheads to userspace?) but it seems worth
>> investigating.
> 
>    1. https://xkcd.com/1200/ (not an exact match, but you should get the idea),
>    2. Once a file system is removed from the kernel, would the user space
>       implementation be maintained better?

Unlikely that it would be maintained any better, more the other way around.
But then, effects on the entire system wouldn't be that severe anymore.
Moving deprecated file systems to fuse had been a short discussion
at LSFMM.




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-10  8:19         ` Geert Uytterhoeven
  2023-09-10  8:37           ` Bernd Schubert
@ 2023-09-10 16:35           ` Kent Overstreet
  2023-09-10 17:26             ` Geert Uytterhoeven
  1 sibling, 1 reply; 97+ messages in thread
From: Kent Overstreet @ 2023-09-10 16:35 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: James Bottomley, Matthew Wilcox, Dave Chinner, Christoph Hellwig,
	ksummit, linux-fsdevel

On Sun, Sep 10, 2023 at 10:19:30AM +0200, Geert Uytterhoeven wrote:
> Hi Kent,
> 
> On Sun, Sep 10, 2023 at 12:42 AM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> > On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > > So why can't we figure out that easier way? What's wrong with trying to
> > > figure out if we can do some sort of helper or library set that assists
> > > supporting and porting older filesystems. If we can do that it will not
> > > only make the job of an old fs maintainer a lot easier, but it might
> > > just provide the stepping stones we need to encourage more people climb
> > > up into the modern VFS world.
> >
> > What if we could run our existing filesystem code in userspace?
> >
> > bcachefs has a shim layer (like xfs, but more extensive) to run nearly
> > the entire filesystem - about 90% by loc - in userspace.
> >
> > Right now this is used for e.g. userspace fsck, but one of my goals is
> > to have the entire filesystem available as a FUSE filesystem. I'd been
> > planning on doing the fuse port as a straight fuse implementation, but
> > OTOH if we attempted a sh vfs iops/aops/etc. -> fuse shim, then we would
> > have pretty much everything we need to run any existing fs (e.g.
> > reiserfs) as a fuse filesystem.
> >
> > It'd be a nontrivial project with some open questions (e.g. do we have
> > to lift all of bufferheads to userspace?) but it seems worth
> > investigating.
> 
>   1. https://xkcd.com/1200/ (not an exact match, but you should get the idea),
>   2. Once a file system is removed from the kernel, would the user space
>      implementation be maintained better?

This would be for the filesystems that aren't getting maintained and
tested, to eliminate accidental breakage from in-kernel refactoring and
changing of APIs.

Getting that code out of the kernel would also greatly help with
security concerns.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-10 16:35           ` Kent Overstreet
@ 2023-09-10 17:26             ` Geert Uytterhoeven
  2023-09-10 17:35               ` Kent Overstreet
  0 siblings, 1 reply; 97+ messages in thread
From: Geert Uytterhoeven @ 2023-09-10 17:26 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: James Bottomley, Matthew Wilcox, Dave Chinner, Christoph Hellwig,
	ksummit, linux-fsdevel

On Sun, Sep 10, 2023 at 6:35 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
> On Sun, Sep 10, 2023 at 10:19:30AM +0200, Geert Uytterhoeven wrote:
> > On Sun, Sep 10, 2023 at 12:42 AM Kent Overstreet
> > <kent.overstreet@linux.dev> wrote:
> > > On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > > > So why can't we figure out that easier way? What's wrong with trying to
> > > > figure out if we can do some sort of helper or library set that assists
> > > > supporting and porting older filesystems. If we can do that it will not
> > > > only make the job of an old fs maintainer a lot easier, but it might
> > > > just provide the stepping stones we need to encourage more people climb
> > > > up into the modern VFS world.
> > >
> > > What if we could run our existing filesystem code in userspace?
> > >
> > > bcachefs has a shim layer (like xfs, but more extensive) to run nearly
> > > the entire filesystem - about 90% by loc - in userspace.
> > >
> > > Right now this is used for e.g. userspace fsck, but one of my goals is
> > > to have the entire filesystem available as a FUSE filesystem. I'd been
> > > planning on doing the fuse port as a straight fuse implementation, but
> > > OTOH if we attempted a sh vfs iops/aops/etc. -> fuse shim, then we would
> > > have pretty much everything we need to run any existing fs (e.g.
> > > reiserfs) as a fuse filesystem.
> > >
> > > It'd be a nontrivial project with some open questions (e.g. do we have
> > > to lift all of bufferheads to userspace?) but it seems worth
> > > investigating.
> >
> >   1. https://xkcd.com/1200/ (not an exact match, but you should get the idea),
> >   2. Once a file system is removed from the kernel, would the user space
> >      implementation be maintained better?
>
> This would be for the filesystems that aren't getting maintained and
> tested, to eliminate accidental breakage from in-kernel refactoring and
> changing of APIs.
>
> Getting that code out of the kernel would also greatly help with
> security concerns.

OK, xkcd 1200 it is...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-10 17:26             ` Geert Uytterhoeven
@ 2023-09-10 17:35               ` Kent Overstreet
  0 siblings, 0 replies; 97+ messages in thread
From: Kent Overstreet @ 2023-09-10 17:35 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: James Bottomley, Matthew Wilcox, Dave Chinner, Christoph Hellwig,
	ksummit, linux-fsdevel

On Sun, Sep 10, 2023 at 07:26:26PM +0200, Geert Uytterhoeven wrote:
> On Sun, Sep 10, 2023 at 6:35 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> > On Sun, Sep 10, 2023 at 10:19:30AM +0200, Geert Uytterhoeven wrote:
> > > On Sun, Sep 10, 2023 at 12:42 AM Kent Overstreet
> > > <kent.overstreet@linux.dev> wrote:
> > > > On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > > > > So why can't we figure out that easier way? What's wrong with trying to
> > > > > figure out if we can do some sort of helper or library set that assists
> > > > > supporting and porting older filesystems. If we can do that it will not
> > > > > only make the job of an old fs maintainer a lot easier, but it might
> > > > > just provide the stepping stones we need to encourage more people climb
> > > > > up into the modern VFS world.
> > > >
> > > > What if we could run our existing filesystem code in userspace?
> > > >
> > > > bcachefs has a shim layer (like xfs, but more extensive) to run nearly
> > > > the entire filesystem - about 90% by loc - in userspace.
> > > >
> > > > Right now this is used for e.g. userspace fsck, but one of my goals is
> > > > to have the entire filesystem available as a FUSE filesystem. I'd been
> > > > planning on doing the fuse port as a straight fuse implementation, but
> > > > OTOH if we attempted a sh vfs iops/aops/etc. -> fuse shim, then we would
> > > > have pretty much everything we need to run any existing fs (e.g.
> > > > reiserfs) as a fuse filesystem.
> > > >
> > > > It'd be a nontrivial project with some open questions (e.g. do we have
> > > > to lift all of bufferheads to userspace?) but it seems worth
> > > > investigating.
> > >
> > >   1. https://xkcd.com/1200/ (not an exact match, but you should get the idea),
> > >   2. Once a file system is removed from the kernel, would the user space
> > >      implementation be maintained better?
> >
> > This would be for the filesystems that aren't getting maintained and
> > tested, to eliminate accidental breakage from in-kernel refactoring and
> > changing of APIs.
> >
> > Getting that code out of the kernel would also greatly help with
> > security concerns.
> 
> OK, xkcd 1200 it is...

A fuse filesystem process can be restricted to only having access to the
device the filesystem is on.

Not so if it's running in the kernel...

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-09 15:44       ` Matthew Wilcox
@ 2023-09-10 19:51         ` James Bottomley
  2023-09-10 20:19           ` Kent Overstreet
                             ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: James Bottomley @ 2023-09-10 19:51 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On Sat, 2023-09-09 at 16:44 +0100, Matthew Wilcox wrote:
> On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > On Wed, 2023-09-06 at 00:23 +0100, Matthew Wilcox wrote:
> > > On Wed, Sep 06, 2023 at 09:06:21AM +1000, Dave Chinner wrote:
> > [...]
> > > > > E.g. the hfsplus driver is unmaintained despite collecting
> > > > > odd fixes. It collects odd fixes because it is really useful
> > > > > for interoperating with MacOS and it would be a pity to
> > > > > remove it.  At the same time it is impossible to test changes
> > > > > to hfsplus sanely as there is no mkfs.hfsplus or fsck.hfsplus
> > > > > available for Linux.  We used to have one that was ported
> > > > > from the open source Darwin code drops, and I managed to get
> > > > > xfstests to run on hfsplus with them, but this old version
> > > > > doesn't compile on any modern Linux distribution and new
> > > > > versions of the code aren't trivially portable to Linux.
> > > > > 
> > > > > Do we have volunteers with old enough distros that we can
> > > > > list as testers for this code?  Do we have any other way to
> > > > > proceed?
> > > > > 
> > > > > If we don't, are we just going to untested API changes to
> > > > > these code bases, or keep the old APIs around forever?
> > > > 
> > > > We do slowly remove device drivers and platforms as the
> > > > hardware, developers and users disappear. We do also just
> > > > change driver APIs in device drivers for hardware that no-one
> > > > is actually able to test. The assumption is that if it gets
> > > > broken during API changes, someone who needs it to work will
> > > > fix it and send patches.
> > > > 
> > > > That seems to be the historical model for removing
> > > > unused/obsolete code from the kernel, so why should we treat
> > > > unmaintained/obsolete filesystems any differently?  i.e. Just
> > > > change the API, mark it CONFIG_BROKEN until someone comes along
> > > > and starts fixing it...
> > > 
> > > Umm.  If I change ->write_begin and ->write_end to take a folio,
> > > convert only the filesystems I can test via Luis' kdevops and
> > > mark the rest as CONFIG_BROKEN, I can guarantee you that Linus
> > > will reject that pull request.
> > 
> > I think really everyone in this debate needs to recognize two
> > things:
> > 
> >    1. There are older systems out there that have an active group
> > of
> >       maintainers and which depend on some of these older
> > filesystems
> >    2. Data image archives will ipso facto be in older formats and
> >       preserving access to them is a historical necessity.
> 
> I don't understand why you think people don't recognise those things.

Well, people recognize them as somebody else's problem, yes, like
virtualization below.

> > So the problem of what to do with older, less well maintained,
> > filesystems isn't one that can be solved by simply deleting them
> > and we have to figure out a way to move forward supporting them
> > (obviously for some value of the word "support"). 
> > 
> > By the way, people who think virtualization is the answer to this
> > should remember that virtual hardware is evolving just as fast as
> > physical hardware.
> 
> I think that's a red herring.  Of course there are advances in
> virtual hardware for those who need the best performance.  But
> there's also qemu's ability to provide to you a 1981-vintage PC (or
> more likely a 2000-era PC).  That's not going away.

So Red Hat dropping support for the pc type (alias i440fx)

https://bugzilla.redhat.com/show_bug.cgi?id=1946898

And the QEMU deprecation schedule

https://www.qemu.org/docs/master/about/deprecated.html

showing it as deprecated after 7.0 are wrong?  That's not to say
virtualization can't help at all; it can certainly lengthen the time
horizon, it's just not a panacea.

> > > I really feel we're between a rock and a hard place with our
> > > unmaintained filesystems.  They have users who care passionately,
> > > but not the ability to maintain them.
> > 
> > So why is everybody making this a hard either or? The volunteer
> > communities that grow around older things like filesystems are
> > going to be enthusiastic, but not really acquainted with the
> > technical intricacies of the modern VFS and mm. Requiring that they
> > cope with all the new stuff like iomap and folios is building an
> > unbridgeable chasm they're never going to cross. Give them an
> > easier way and they might get there.
> 
> Spoken like someone who has been paying no attention at all to what's
> going on in filesystems.

Well, that didn't take long;  one useful way to reduce stress on
everyone is actually to reduce the temperature of the discourse.

>   The newer APIs are easier to use.  The problem is understanding
> what the hell the old filesystems are doing with the old APIs.

OK, so we definitely have some filesystems that were experimental at
the time and pushed the boundaries, but not every (or even the
majority) of the older filesystems fall into this category.

> Nobody's interested.  That's the problem.  The number of filesystem
> developers we have is shrinking.  

What I actually heard was that there's communities of interested users,
they just don't get over the hump of becoming developers.  Fine, I get
it that a significant number of users will never become developers, but
that doesn't relieve us of the responsibility for lowering the barriers
for the small number that have the capacity.

> There hasn't been an HFS maintainer since 2011, and it wasn't a
> problem until syzbot decreed that every filesystem bug is a security
> bug.  And now, who'd want to be a fs maintainer with the automated
> harassment?

OK, so now we've strayed into the causes of maintainer burnout.  Syzbot
is undoubtedly a stressor, but one way of coping with a stressor is to
put it into perspective: Syzbot is really a latter day coverity and
everyone was much happier when developers ignored coverity reports and
they went into a dedicated pile that was looked over by a team of
people trying to sort the serious issues from the wrong but not
exploitable ones.  I'd also have to say that anyone who allows older
filesystems into customer facing infrastructure is really signing up
themselves for the risk they're running, so I'd personally be happy if
older fs teams simply ignored all the syzbot reports.

> Burnout amongst fs maintainers is a real problem.  I have no idea how
> to solve it.

I already suggested we should share coping strategies:

https://lore.kernel.org/ksummit/ab9cfd857e32635f626a906410ad95877a22f0db.camel@HansenPartnership.com/

The sources of stress aren't really going to decrease, but how people
react to them could change.  Syzbot (and bugs in general) are a case in
point.  We used not to treat seriously untriaged bug reports, but now
lots of people feel they can't ignore any fuzzer report.  We've tipped
to far into "everything's a crisis" mode and we really need to come
back and think that not every bug is actually exploitable or even
important.  We should go back to  requiring an idea how important the
report is before immediately acting on it.  Perhaps we should also go
back to seeing if we can prize some resources out of the major
moneymakers in the cloud space.  After all, a bug that could cause a
cloud exploit might not be even exploitable on a personal laptop that
has no untrusted users.  So if we left it to the monied cloud farms 
to figure out how to get us a triage of the report and concentrated on
fixing say only the obvious personal laptop exploits, that might be a
way of pushing off some of the stressors.

James



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-10 19:51         ` James Bottomley
@ 2023-09-10 20:19           ` Kent Overstreet
  2023-09-10 21:15           ` Guenter Roeck
  2023-09-11  3:10           ` Theodore Ts'o
  2 siblings, 0 replies; 97+ messages in thread
From: Kent Overstreet @ 2023-09-10 20:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On Sun, Sep 10, 2023 at 03:51:42PM -0400, James Bottomley wrote:
> OK, so now we've strayed into the causes of maintainer burnout.  Syzbot
> is undoubtedly a stressor, but one way of coping with a stressor is to
> put it into perspective: Syzbot is really a latter day coverity and
> everyone was much happier when developers ignored coverity reports and
> they went into a dedicated pile that was looked over by a team of
> people trying to sort the serious issues from the wrong but not
> exploitable ones.  I'd also have to say that anyone who allows older
> filesystems into customer facing infrastructure is really signing up
> themselves for the risk they're running, so I'd personally be happy if
> older fs teams simply ignored all the syzbot reports.

The problem with syzbot, and fuzzing in general is that reports come out
at random which makes it impossible to pick a think and work on it, i.e.
focus on the task at hand.

To be able to work productively, it's critical that we be able to find
out if our code is broken /when we're still working on it/, which means
getting quick testing feedback. Failing that, if we have to go back and
fix up old code, we really want to be able to look at a file/module/some
reasonably sized chunk, load it up into our brains, fix up what's wrong,
and move onto the next thing.

Syzbot is the absolute worst for developer productivity.

I've been talking about code coverage analysis as a partial replacement
for fuzz testing because you can look at the report for a file, figure
out what tests are missing, and do the work all at once. We'll never
catch all the bugs fuzz testing will find that way, but anything that
reduces our reliance on it would be a good thing.

The real long term solution is, of course, to start rewriting stuff in
Rust.

> > Burnout amongst fs maintainers is a real problem.  I have no idea how
> > to solve it.
> 
> I already suggested we should share coping strategies:
> 
> https://lore.kernel.org/ksummit/ab9cfd857e32635f626a906410ad95877a22f0db.camel@HansenPartnership.com/
> 
> The sources of stress aren't really going to decrease, but how people
> react to them could change.  Syzbot (and bugs in general) are a case in
> point.  We used not to treat seriously untriaged bug reports, but now
> lots of people feel they can't ignore any fuzzer report.  We've tipped
> to far into "everything's a crisis" mode and we really need to come
> back and think that not every bug is actually exploitable or even
> important.

Yeah, burnout is a symptom of too many impossible to meet priorities;
the solution is to figure out what our priorities actually are.

As the codebases I've written have grown, my own mentality has
shifted... when I was younger, every bug was something that had to be
fixed. Now I have to think more in terms of "how much time am I spending
fixing bugs, which bugs am I fixing, and how do I balance that against
my long term priorities".

in particular, the stuff that shows up in a dashboard may be the
/easiest/ to work on - it's in a nice todo list! - but if I spent all my
time on that I wouldn't get to the bugs and issues users are reporting.

Of course, if users are reporting too many bugs that means test coverage
is missing or the automated tests aren't being looked at enough, so it's
a real balancing act.

The other big change in my thinking has been going from trying to fix
every bug when I first see it (and at times going through real heroics
to do so) - to now trying to focus more on making debugging easy; if I
can't figure out a bug right away I'll often add more assertions/debug
output and wait for it to pop next time. That kind of thing has a real
long term impact; the thing I strive for is a codebase where when
something goes wrong it tells you /everything/ about what went wrong.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-10 19:51         ` James Bottomley
  2023-09-10 20:19           ` Kent Overstreet
@ 2023-09-10 21:15           ` Guenter Roeck
  2023-09-11  3:10           ` Theodore Ts'o
  2 siblings, 0 replies; 97+ messages in thread
From: Guenter Roeck @ 2023-09-10 21:15 UTC (permalink / raw)
  To: James Bottomley, Matthew Wilcox
  Cc: Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On 9/10/23 12:51, James Bottomley wrote:
[ ... ]
>> I think that's a red herring.  Of course there are advances in
>> virtual hardware for those who need the best performance.  But
>> there's also qemu's ability to provide to you a 1981-vintage PC (or
>> more likely a 2000-era PC).  That's not going away.
> 
> So Red Hat dropping support for the pc type (alias i440fx)
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1946898
> 
> And the QEMU deprecation schedule
> 
> https://www.qemu.org/docs/master/about/deprecated.html
> 
> showing it as deprecated after 7.0 are wrong?  That's not to say
> virtualization can't help at all; it can certainly lengthen the time
> horizon, it's just not a panacea.

deprecated.html says:

pc-i440fx-1.4 up to pc-i440fx-1.7 (since 7.0)
           ^^^                 ^^^
These old machine types are quite neglected nowadays and thus might have
various pitfalls with regards to live migration. Use a newer machine type
instead.

Unless the qemu documentation is severely misleading, that does not
include pc-i440fx-{2-8}.{0-12}, and there is no indication that the
machine type "pc" (current alias of pc-i440fx-8.1) has been deprecated.

Guenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-09 22:42       ` Kent Overstreet
  2023-09-10  8:19         ` Geert Uytterhoeven
@ 2023-09-11  1:05         ` Dave Chinner
  2023-09-11  1:29           ` Kent Overstreet
  2023-09-26  5:24           ` Eric W. Biederman
  1 sibling, 2 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-11  1:05 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: James Bottomley, Matthew Wilcox, Christoph Hellwig, ksummit,
	linux-fsdevel

On Sat, Sep 09, 2023 at 06:42:30PM -0400, Kent Overstreet wrote:
> On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > So why can't we figure out that easier way? What's wrong with trying to
> > figure out if we can do some sort of helper or library set that assists
> > supporting and porting older filesystems. If we can do that it will not
> > only make the job of an old fs maintainer a lot easier, but it might
> > just provide the stepping stones we need to encourage more people climb
> > up into the modern VFS world.
> 
> What if we could run our existing filesystem code in userspace?

You mean like lklfuse already enables?

https://github.com/lkl/linux

Looks like the upstream repo is currently based on 6.1, so there's
already a mechanism to use relatively recent kernel filesystem
implementations as a FUSE filesystem without needed to support a
userspace code base....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11  1:05         ` Dave Chinner
@ 2023-09-11  1:29           ` Kent Overstreet
  2023-09-11  2:07             ` Dave Chinner
  2023-09-26  5:24           ` Eric W. Biederman
  1 sibling, 1 reply; 97+ messages in thread
From: Kent Overstreet @ 2023-09-11  1:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: James Bottomley, Matthew Wilcox, Christoph Hellwig, ksummit,
	linux-fsdevel

On Mon, Sep 11, 2023 at 11:05:09AM +1000, Dave Chinner wrote:
> On Sat, Sep 09, 2023 at 06:42:30PM -0400, Kent Overstreet wrote:
> > On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > > So why can't we figure out that easier way? What's wrong with trying to
> > > figure out if we can do some sort of helper or library set that assists
> > > supporting and porting older filesystems. If we can do that it will not
> > > only make the job of an old fs maintainer a lot easier, but it might
> > > just provide the stepping stones we need to encourage more people climb
> > > up into the modern VFS world.
> > 
> > What if we could run our existing filesystem code in userspace?
> 
> You mean like lklfuse already enables?

I'm not seeing that it does?

I just had a look at the code, and I don't see anything there related to
the VFS - AFAIK, a VFS -> fuse layer doesn't exist yet.

And that looks a lot heavier than what we'd ideally want, i.e. a _lot_
more kernel code would be getting pulled in. The entire block layer,
probably the scheduler as well.

What I've got in bcachefs-tools is a much thinner mapping from e.g.
kthreads -> pthreads, block layer -> aio, etc.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11  1:29           ` Kent Overstreet
@ 2023-09-11  2:07             ` Dave Chinner
  2023-09-11 13:35               ` David Disseldorp
  0 siblings, 1 reply; 97+ messages in thread
From: Dave Chinner @ 2023-09-11  2:07 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: James Bottomley, Matthew Wilcox, Christoph Hellwig, ksummit,
	linux-fsdevel

On Sun, Sep 10, 2023 at 09:29:14PM -0400, Kent Overstreet wrote:
> On Mon, Sep 11, 2023 at 11:05:09AM +1000, Dave Chinner wrote:
> > On Sat, Sep 09, 2023 at 06:42:30PM -0400, Kent Overstreet wrote:
> > > On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
> > > > So why can't we figure out that easier way? What's wrong with trying to
> > > > figure out if we can do some sort of helper or library set that assists
> > > > supporting and porting older filesystems. If we can do that it will not
> > > > only make the job of an old fs maintainer a lot easier, but it might
> > > > just provide the stepping stones we need to encourage more people climb
> > > > up into the modern VFS world.
> > > 
> > > What if we could run our existing filesystem code in userspace?
> > 
> > You mean like lklfuse already enables?
> 
> I'm not seeing that it does?
> 
> I just had a look at the code, and I don't see anything there related to
> the VFS - AFAIK, a VFS -> fuse layer doesn't exist yet.

Just to repeat what I said on #xfs here...

It doesn't try to cut in half way through the VFS -> filesystem
path. It just redirects the fuse operations to "lkl syscalls" and so
runs the entire kernel VFS->filesystem path.

https://github.com/lkl/linux/blob/master/tools/lkl/lklfuse.c

> And that looks a lot heavier than what we'd ideally want, i.e. a _lot_
> more kernel code would be getting pulled in. The entire block layer,
> probably the scheduler as well.

Yes, but arguing that "performance sucks" misses the entire point of
this discussion: that for the untrusted user mounts of untrusted
filesystem images we already have a viable method for moving the
dangerous processing out into userspace that requires almost *zero
additional work* from anyone.

As long as the performance of the lklfuse implementation doesn't
totally suck, nobody will really care that much that isn't quite as
fast as a native implementation. PLuggable drives (e.g. via USB) are
already going to be much slower than a host installed drive, so I
don't think performance is even really a consideration for these
sorts of use cases....

> What I've got in bcachefs-tools is a much thinner mapping from e.g.
> kthreads -> pthreads, block layer -> aio, etc.

Right, and we've got that in userspace for XFS, too. If we really
cared that much about XFS-FUSE, I'd be converting userspace to use
ublk w/ io_uring on top of a port of the kernel XFS buffer cache as
the basis for a performant fuse implementation. However, there's a
massive amount of userspace work needed to get a native XFS FUSE
implementation up and running (even ignoring performance), so it's
just not a viable short-term - or even medium-term - solution to the
current problems.

Indeed, if you do a fuse->fs ops wrapper, I'd argue that lklfuse is
the place to do it so that there is a single code base that supports
all kernel filesystems without requiring anyone to support a
separate userspace code base. Requiring every filesystem to do their
own FUSE ports and then support them doesn't reduce the overall
maintenance overhead burden on filesystem developers....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-10 19:51         ` James Bottomley
  2023-09-10 20:19           ` Kent Overstreet
  2023-09-10 21:15           ` Guenter Roeck
@ 2023-09-11  3:10           ` Theodore Ts'o
  2023-09-11 19:03             ` James Bottomley
  2023-09-12 16:52             ` H. Peter Anvin
  2 siblings, 2 replies; 97+ messages in thread
From: Theodore Ts'o @ 2023-09-11  3:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On Sun, Sep 10, 2023 at 03:51:42PM -0400, James Bottomley wrote:
> On Sat, 2023-09-09 at 16:44 +0100, Matthew Wilcox wrote:
> > There hasn't been an HFS maintainer since 2011, and it wasn't a
> > problem until syzbot decreed that every filesystem bug is a security
> > bug.  And now, who'd want to be a fs maintainer with the automated
> > harassment?

The problem is that peopel are *believing* syzbot.  If we treat it as
noise, we can ignore it.  There is nothing that says we have to
*believe* syzbot's "decrees" over what is a security bug, and what
isn't.

Before doing a security assessment, you need to have a agreed-upon
threat model.  Another security aphorism, almost as well known this
one, is that security has to be designed in from the start --- and
historically, the storage device on which the file system operates is
part of the trusted computing base.  So trying to change the security
model to one that states that one must assume that the storage device
is under the complete and arbitrary control of the attacker is just
foolhardy.

There are also plenty of circumstances where this threat model is
simply not applicable.  For example, if the server is a secure data
center, and/or where USB ports are expoxy shut, and/or the automounter
is disabled, or not even installed, then this particular threat is
simply not in play.

> OK, so now we've strayed into the causes of maintainer burnout.  Syzbot
> is undoubtedly a stressor, but one way of coping with a stressor is to
> put it into perspective: Syzbot is really a latter day coverity and
> everyone was much happier when developers ignored coverity reports and
> they went into a dedicated pile that was looked over by a team of
> people trying to sort the serious issues from the wrong but not
> exploitable ones.  I'd also have to say that anyone who allows older
> filesystems into customer facing infrastructure is really signing up
> themselves for the risk they're running, so I'd personally be happy if
> older fs teams simply ignored all the syzbot reports.

Exactly.  So to the first approximation, if the syzbot doesn't have a
reliable reproducer --- ignore it.  If it involves a corrupted file
system, don't consider it a security bug.  Remember, we didn't sign up
for claiming that the file system should be proof against malicious
file system image.

I might take a look at it to see if we can improve the quality of the
implementation, but I don't treat it with any kind of urgency.  It's
more of something I do for fun, when I have a free moment or two.  And
when I have higher priority issues, syzkaller issues simply get
dropped and ignored.

The gamification which makes this difficult is when you get the
monthly syzbot reports, and you see the number of open syzkaller
issues climb.  It also doesn't help when you compare the number of
syzkaller issues for your file system with another file system.  For
me, one of the ways that I try to evade the manpulation is to remember
that the numbers are completely incomparable.

For example, if a file system is being used as the root file system,
and there some device driver or networking subsystem is getting
pounded, leading to kernel memory corruptions before the userspace
core dumps, this can generate the syzbot report which is "charged"
against the file system, when in fact it's not actually a file system
bug at all.  Or if the file system hasn't cooperated with Google's
intern project to disable metadata checksum verifications, the better
to trigger more file system corruption-triggered syzbot reports, this
can depress one file system's syzbot numbers over another.

So the bottom line is that the number of syzbot is ultimately fairly
meaningless as a comparison betweentwo different kernel subsystems,
despite the syzbot team's best attempts to manipulate you into feeling
bad about your code, and feeling obligated to Do Something about
bringing down the number of syzbot reports.

This is a "dark pattern", and you should realize this, and not let
yourself get suckered into falling for this mind game.

> The sources of stress aren't really going to decrease, but how people
> react to them could change.  Syzbot (and bugs in general) are a case in
> point.  We used not to treat seriously untriaged bug reports, but now
> lots of people feel they can't ignore any fuzzer report.  We've tipped
> to far into "everything's a crisis" mode and we really need to come
> back and think that not every bug is actually exploitable or even
> important.

Exactly.  A large number of unaddressed syzbot number is not a "kernel
security disaster" unless you let yourself get tricked into believing
that it is.  Again, it's all about threat models, and the syzbot robot
very cleverly hides any discussion over the threat model, and whether
it is valid, and whether it is one that you care about --- or whether
your employer should care.

> Perhaps we should also go
> back to seeing if we can prize some resources out of the major
> moneymakers in the cloud space.  After all, a bug that could cause a
> cloud exploit might not be even exploitable on a personal laptop that
> has no untrusted users.

Actually, I'd say this is backwards.  Many of these issues, and I'd
argue all that involve an maliciously corrupted file system, are not
actually an issue in the cloud space, because we *already* assume that
the attacker may have root.  After all, anyone can pay their $5
CPU/hour, and get an Amazon or Google or Azure VM, and then run
arbitrary workloads as root.

As near as I can tell **no** **one** is crazy enough to assume that
native containers are a security boundary.  For that reason, when a
cloud customer is using Docker, or Kubernetes, they are running it on
a VM which is dedicated to that customer.  Kubernetes jobs running on
behalf of say, Tesla Motors do not run on the same VM as the one
running Kuberentes jobs for Ford Motor Company, so even if an attacker
mounts a malicious file system iamge, they can't use that to break
security and get access to proprietary data belonging to a competitor.

The primary risk for maliciously corrupted file systems is because
GNOME automounts file systems by default, and so many a laptop is
subject to vulnerabilities if someone plugs in an untrusted USB key on
their personal laptop.  But this risk can be addressed simply by
uninstalling the automounter, and a future release of e2fsprogs will
include this patch:

https://lore.kernel.org/all/20230824235936.GA17891@frogsfrogsfrogs/

... which will install a udev rule that will fix this bad design
problem, at least for ext4 file systems.  Of course, a distro could
decide to take remove the udev rule, but at that point, I'd argue that
liability attaches to the distribution for disabling this security
mitigation, and it's no longer the file system developer's
responsibility.

						- Ted

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11  2:07             ` Dave Chinner
@ 2023-09-11 13:35               ` David Disseldorp
  2023-09-11 17:45                 ` Bart Van Assche
  2023-09-11 23:05                 ` Dave Chinner
  0 siblings, 2 replies; 97+ messages in thread
From: David Disseldorp @ 2023-09-11 13:35 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Kent Overstreet, James Bottomley, Matthew Wilcox,
	Christoph Hellwig, ksummit, linux-fsdevel, Hajime Tazaki,
	Octavian Purdila

Hi Dave,

On Mon, 11 Sep 2023 12:07:07 +1000, Dave Chinner wrote:

> On Sun, Sep 10, 2023 at 09:29:14PM -0400, Kent Overstreet wrote:
> > On Mon, Sep 11, 2023 at 11:05:09AM +1000, Dave Chinner wrote:  
> > > On Sat, Sep 09, 2023 at 06:42:30PM -0400, Kent Overstreet wrote:  
> > > > On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:  
> > > > > So why can't we figure out that easier way? What's wrong with trying to
> > > > > figure out if we can do some sort of helper or library set that assists
> > > > > supporting and porting older filesystems. If we can do that it will not
> > > > > only make the job of an old fs maintainer a lot easier, but it might
> > > > > just provide the stepping stones we need to encourage more people climb
> > > > > up into the modern VFS world.  
> > > > 
> > > > What if we could run our existing filesystem code in userspace?  
> > > 
> > > You mean like lklfuse already enables?  
> > 
> > I'm not seeing that it does?
> > 
> > I just had a look at the code, and I don't see anything there related to
> > the VFS - AFAIK, a VFS -> fuse layer doesn't exist yet.  
> 
> Just to repeat what I said on #xfs here...
> 
> It doesn't try to cut in half way through the VFS -> filesystem
> path. It just redirects the fuse operations to "lkl syscalls" and so
> runs the entire kernel VFS->filesystem path.
> 
> https://github.com/lkl/linux/blob/master/tools/lkl/lklfuse.c
> 
> > And that looks a lot heavier than what we'd ideally want, i.e. a _lot_
> > more kernel code would be getting pulled in. The entire block layer,
> > probably the scheduler as well.  

The LKL block layer may also become useful for legacy storage support in
future, e.g. SCSI protocol obsolescence.

> Yes, but arguing that "performance sucks" misses the entire point of
> this discussion: that for the untrusted user mounts of untrusted
> filesystem images we already have a viable method for moving the
> dangerous processing out into userspace that requires almost *zero
> additional work* from anyone.

Indeed. Hajime and Octavian (cc'ed) have also made serious efforts to
get the LKL codebase in shape for mainline:
https://lore.kernel.org/linux-um/cover.1611103406.git.thehajime@gmail.com/

> As long as the performance of the lklfuse implementation doesn't
> totally suck, nobody will really care that much that isn't quite as
> fast as a native implementation. PLuggable drives (e.g. via USB) are
> already going to be much slower than a host installed drive, so I
> don't think performance is even really a consideration for these
> sorts of use cases....
> 
> > What I've got in bcachefs-tools is a much thinner mapping from e.g.
> > kthreads -> pthreads, block layer -> aio, etc.  
> 
> Right, and we've got that in userspace for XFS, too. If we really
> cared that much about XFS-FUSE, I'd be converting userspace to use
> ublk w/ io_uring on top of a port of the kernel XFS buffer cache as
> the basis for a performant fuse implementation. However, there's a
> massive amount of userspace work needed to get a native XFS FUSE
> implementation up and running (even ignoring performance), so it's
> just not a viable short-term - or even medium-term - solution to the
> current problems.
> 
> Indeed, if you do a fuse->fs ops wrapper, I'd argue that lklfuse is
> the place to do it so that there is a single code base that supports
> all kernel filesystems without requiring anyone to support a
> separate userspace code base. Requiring every filesystem to do their
> own FUSE ports and then support them doesn't reduce the overall
> maintenance overhead burden on filesystem developers....

LKL is still implemented as a non-mmu architecture. The only fs specific
downstream change that lklfuse depends on is non-mmu xfs_buf support:
https://lore.kernel.org/linux-xfs/1447800381-20167-1-git-send-email-octavian.purdila@intel.com/

Does your lklfuse enthusiasm here imply that you'd be willing to
reconsider Octavian's earlier proposal for XFS non-mmu support?

Cheers, David

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11 13:35               ` David Disseldorp
@ 2023-09-11 17:45                 ` Bart Van Assche
  2023-09-11 19:11                   ` David Disseldorp
  2023-09-11 23:05                 ` Dave Chinner
  1 sibling, 1 reply; 97+ messages in thread
From: Bart Van Assche @ 2023-09-11 17:45 UTC (permalink / raw)
  To: David Disseldorp, Dave Chinner
  Cc: Kent Overstreet, James Bottomley, Matthew Wilcox,
	Christoph Hellwig, ksummit, linux-fsdevel, Hajime Tazaki,
	Octavian Purdila

On 9/11/23 06:35, David Disseldorp wrote:
> The LKL block layer may also become useful for legacy storage support in
> future, e.g. SCSI protocol obsolescence.

There are probably more Linux devices using SCSI than NVMe. There are 
several billion Android phones in use. Modern Android phones use UFS 
storage. UFS is based on SCSI. There are already UFS devices available 
that support more than 300K IOPS and there are plans for improving 
performance further. Moving the SCSI stack to user space would have a
very significant negative performance impact on Android devices.

Bart.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11  3:10           ` Theodore Ts'o
@ 2023-09-11 19:03             ` James Bottomley
  2023-09-12  0:23               ` Dave Chinner
  2023-09-12 16:52             ` H. Peter Anvin
  1 sibling, 1 reply; 97+ messages in thread
From: James Bottomley @ 2023-09-11 19:03 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On Sun, 2023-09-10 at 23:10 -0400, Theodore Ts'o wrote:
> On Sun, Sep 10, 2023 at 03:51:42PM -0400, James Bottomley wrote:
[...]
> > Perhaps we should also go back to seeing if we can prize some
> > resources out of the major moneymakers in the cloud space.  After
> > all, a bug that could cause a cloud exploit might not be even
> > exploitable on a personal laptop that has no untrusted users.
> 
> Actually, I'd say this is backwards.  Many of these issues, and I'd
> argue all that involve an maliciously corrupted file system, are not
> actually an issue in the cloud space, because we *already* assume
> that the attacker may have root.  After all, anyone can pay their $5
> CPU/hour, and get an Amazon or Google or Azure VM, and then run
> arbitrary workloads as root.

Well, that was just one example.  Another way cloud companies could
potentially help is their various AI projects: I seem to get daily
requests from AI people for me to tell them just how AI could help
Linux.  When I suggest bug report triage and classification would be my
number one thing, they all back off faster than a mouse crashing a cat
convention with claims like "That's too hard a problem" and also that
in spite of ChatGPT getting its facts wrong and spewing rubbish for
student essays, it wouldn't survive the embarrassment of being
ridiculed by kernel developers for misclassifying bug reports.

I'm not sure peer pressure works on the AI community, but surely if
enough of us asked, they might one day overcome their fear of trying it
...

James


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11 17:45                 ` Bart Van Assche
@ 2023-09-11 19:11                   ` David Disseldorp
  0 siblings, 0 replies; 97+ messages in thread
From: David Disseldorp @ 2023-09-11 19:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Dave Chinner, Kent Overstreet, James Bottomley, Matthew Wilcox,
	Christoph Hellwig, ksummit, linux-fsdevel, Hajime Tazaki,
	Octavian Purdila

On Mon, 11 Sep 2023 10:45:31 -0700, Bart Van Assche wrote:

> On 9/11/23 06:35, David Disseldorp wrote:
> > The LKL block layer may also become useful for legacy storage support in
> > future, e.g. SCSI protocol obsolescence.  
> 
> There are probably more Linux devices using SCSI than NVMe. There are 
> several billion Android phones in use. Modern Android phones use UFS 
> storage. UFS is based on SCSI. There are already UFS devices available 
> that support more than 300K IOPS and there are plans for improving 
> performance further. Moving the SCSI stack to user space would have a
> very significant negative performance impact on Android devices.

I could imagine cases where support for SBC <= X and SPC <= Y is
deprecated or removed. SG_IO would probably be more applicable for
legacy device support in user-space, but I think it still serves as a
reasonable example for how LKL could also be useful.

Cheers, David

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11 13:35               ` David Disseldorp
  2023-09-11 17:45                 ` Bart Van Assche
@ 2023-09-11 23:05                 ` Dave Chinner
  1 sibling, 0 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-11 23:05 UTC (permalink / raw)
  To: David Disseldorp
  Cc: Kent Overstreet, James Bottomley, Matthew Wilcox,
	Christoph Hellwig, ksummit, linux-fsdevel, Hajime Tazaki,
	Octavian Purdila

On Mon, Sep 11, 2023 at 03:35:15PM +0200, David Disseldorp wrote:
> On Mon, 11 Sep 2023 12:07:07 +1000, Dave Chinner wrote:
> > On Sun, Sep 10, 2023 at 09:29:14PM -0400, Kent Overstreet wrote:
> > > What I've got in bcachefs-tools is a much thinner mapping from e.g.
> > > kthreads -> pthreads, block layer -> aio, etc.  
> > 
> > Right, and we've got that in userspace for XFS, too. If we really
> > cared that much about XFS-FUSE, I'd be converting userspace to use
> > ublk w/ io_uring on top of a port of the kernel XFS buffer cache as
> > the basis for a performant fuse implementation. However, there's a
> > massive amount of userspace work needed to get a native XFS FUSE
> > implementation up and running (even ignoring performance), so it's
> > just not a viable short-term - or even medium-term - solution to the
> > current problems.
> > 
> > Indeed, if you do a fuse->fs ops wrapper, I'd argue that lklfuse is
> > the place to do it so that there is a single code base that supports
> > all kernel filesystems without requiring anyone to support a
> > separate userspace code base. Requiring every filesystem to do their
> > own FUSE ports and then support them doesn't reduce the overall
> > maintenance overhead burden on filesystem developers....
> 
> LKL is still implemented as a non-mmu architecture. The only fs specific
> downstream change that lklfuse depends on is non-mmu xfs_buf support:
> https://lore.kernel.org/linux-xfs/1447800381-20167-1-git-send-email-octavian.purdila@intel.com/

That was proposed in 2015.

> Does your lklfuse enthusiasm here imply that you'd be willing to
> reconsider Octavian's earlier proposal for XFS non-mmu support?

8 years a long time, circumstances change and we should always be
open to changing our minds when presented with new circumstances
and/or evidence.

Context: back in 2015 I was in the middle of a significant revamp of
the kernel and userspace code - that was when the shared libxfs
codebase was new and being actively developed, along with a
significant rework of all the userspace shims.  One of the things
that I was looking at the time was pulling everything into userspace
via libxfs that was needed for a native XFS-FUSE implementation.
That project never got that far - maintainer burnout happened before
that ever became a reality.

In that context, lklfuse didn't really make a whole lot of sense for
providing userspace XFS support via fuse because a native FUSE
solution would be much better in most regards (especially
performance). Things have changed a whole lot since then. We have
less fs developers, we have a antagonistic, uncooperative testing
"community", we have more code and releases to support, etc.

If we go back to what I said earlier about the minimum requirements
for a "community supported filesystem", it was about needing three
things:

- mkfs and fsck coverage
- fstests support
- syzbot doesn't get run on it

Now reconsider lklfuse from this perspective. We have #1 for most
filesystems, #2 is pretty trivial, and #3 is basically "syzbot +
lklfuse > /dev/null"...

IOWs, we can largely validate that lklfuse doesn't eat your data
with relatively little extra effort. We can provide userspace with a
viable, supported mechanism for unprivileged mounts of untrusted
filesystem images that can't lead to kernel compromise. And,
largely, we retain control of the quality of the lklfuse
implementation because it's running the kernel code that we already
maintain and support.

Times change, circumstances change, and if we aren't willing to
change our minds because we need to solve the new challenges
presented to us then we should not be in decision making
positions....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11 19:03             ` James Bottomley
@ 2023-09-12  0:23               ` Dave Chinner
  0 siblings, 0 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-12  0:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Theodore Ts'o, Matthew Wilcox, Christoph Hellwig, ksummit,
	linux-fsdevel

On Mon, Sep 11, 2023 at 03:03:45PM -0400, James Bottomley wrote:
> On Sun, 2023-09-10 at 23:10 -0400, Theodore Ts'o wrote:
> > On Sun, Sep 10, 2023 at 03:51:42PM -0400, James Bottomley wrote:
> [...]
> > > Perhaps we should also go back to seeing if we can prize some
> > > resources out of the major moneymakers in the cloud space.  After
> > > all, a bug that could cause a cloud exploit might not be even
> > > exploitable on a personal laptop that has no untrusted users.
> > 
> > Actually, I'd say this is backwards.  Many of these issues, and I'd
> > argue all that involve an maliciously corrupted file system, are not
> > actually an issue in the cloud space, because we *already* assume
> > that the attacker may have root.  After all, anyone can pay their $5
> > CPU/hour, and get an Amazon or Google or Azure VM, and then run
> > arbitrary workloads as root.
> 
> Well, that was just one example.  Another way cloud companies could
> potentially help is their various AI projects: I seem to get daily
> requests from AI people for me to tell them just how AI could help
> Linux.  When I suggest bug report triage and classification would be my
> number one thing, they all back off faster than a mouse crashing a cat
> convention with claims like "That's too hard a problem" and also that
> in spite of ChatGPT getting its facts wrong and spewing rubbish for
> student essays, it wouldn't survive the embarrassment of being
> ridiculed by kernel developers for misclassifying bug reports.

No fucking way.

Just because you can do something it doesn't make it right or
ethical.  It is not ethical to experiment on human subjects without
their consent.  When someone asks the maintainer of a bot to stop
doing something because it is causing harm to people, then ethics
dictate that the bot should be *stopped immediately* regardless of
whatever other benefits it might have.

This is one of the major problems with syzbot: we can't get it
turned off even though it is clearly doing harm to people.  We
didn't consent to being subject to the constant flood of issues that
it throws our way, and despite repeated requests for it to be
changed or stopped to reduce the harm it is doing the owners of the
bot refuse to change anything. If anything, they double down and
make things worse for the people they send bug reports to (e.g. by
adding explicit writes to the block device under mounted mounted
filesystems).

In this context, the bot and it's owners need to be considered rogue
actors. The owners of the bot just don't seem to care about the harm
it is doing and largely refuse to do anything to reduce that harm.

Suggesting that the solution to the harm a rogue testing bot is
causing people in the community is that we should to subject those
same people to *additional AI-based bug reporting experiments
without their consent* is beyond my comprehension.

> I'm not sure peer pressure works on the AI community, but surely if
> enough of us asked, they might one day overcome their fear of trying it
> ...

Fear isn't an issue here. Anyone with even a moderate concern about
ethics understands that you do not experiment on people without
their explicit consent  (*cough* UoM and hypocrite commits *cough*).
Subjecting mailing lists to experimental AI generated bug reports
without explicit opt-in consent from the people who receive those
bug reports is really a total non-starter.

Testing bots aren't going away any time soon, but new bots -
especially experimental ones - really need to be opt-in. We most
certainly do not need a repeat of the uncooperative, hostile "we've
turned it on and you can't opt out" model that syzbot uses...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 12:30             ` Segher Boessenkool
@ 2023-09-12  9:50               ` Richard Biener
  2023-10-23  5:19                 ` Eric Gallager
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Biener @ 2023-09-12  9:50 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Dan Carpenter, Steven Rostedt, Dave Chinner, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel, gcc-patches

On Thu, Sep 7, 2023 at 2:32 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Thu, Sep 07, 2023 at 02:23:00PM +0300, Dan Carpenter wrote:
> > On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote:
> > > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> > > > I started to hunt
> > > > down all the Makefile which add a -Werror but there are a lot and
> > > > eventually I got bored and gave up.
> > >
> > > I have a patch stack for that, since 2014 or so.  I build Linux with
> > > unreleased GCC versions all the time, so pretty much any new warning is
> > > fatal if you unwisely use -Werror.
> > >
> > > > Someone should patch GCC so there it checks an environment variable to
> > > > ignore -Werror.  Somethine like this?
> > >
> > > No.  You should patch your program, instead.
> >
> > There are 2930 Makefiles in the kernel source.
>
> Yes.  And you need patches to about thirty.  Or a bit more, if you want
> to do it more cleanly.  This isn't a guess.
>
> > > One easy way is to add a
> > > -Wno-error at the end of your command lines.  Or even just -w if you
> > > want or need a bigger hammer.
> >
> > I tried that.  Some of the Makefiles check an environemnt variable as
> > well if you want to turn off -Werror.  It's not a complete solution at
> > all.  I have no idea what a complete solution looks like because I gave
> > up.
>
> A solution can not involve changing the compiler.  That is just saying
> the kernel doesn't know how to fix its own problems, so let's give the
> compiler some more unnecessary problems.

You can change the compiler by replacing it with a script that appends
-Wno-error
for example.

> > > Or nicer, put it all in Kconfig, like powerpc already has for example.
> > > There is a CONFIG_WERROR as well, so maybe use that in all places?
> >
> > That's a good idea but I'm trying to compile old kernels and not the
> > current kernel.
>
> You can patch older kernels, too, you know :-)
>
> If you need to not make any changes to your source code for some crazy
> reason (political perhaps?), just use a shell script or shell function
> instead of invoking the compiler driver directly?
>
>
> Segher
>
> Segher

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11  3:10           ` Theodore Ts'o
  2023-09-11 19:03             ` James Bottomley
@ 2023-09-12 16:52             ` H. Peter Anvin
  1 sibling, 0 replies; 97+ messages in thread
From: H. Peter Anvin @ 2023-09-12 16:52 UTC (permalink / raw)
  To: Theodore Ts'o, James Bottomley
  Cc: Matthew Wilcox, Dave Chinner, Christoph Hellwig, ksummit, linux-fsdevel

On September 10, 2023 8:10:15 PM PDT, Theodore Ts'o <tytso@mit.edu> wrote:
>On Sun, Sep 10, 2023 at 03:51:42PM -0400, James Bottomley wrote:
>> On Sat, 2023-09-09 at 16:44 +0100, Matthew Wilcox wrote:
>> > There hasn't been an HFS maintainer since 2011, and it wasn't a
>> > problem until syzbot decreed that every filesystem bug is a security
>> > bug.  And now, who'd want to be a fs maintainer with the automated
>> > harassment?
>
>The problem is that peopel are *believing* syzbot.  If we treat it as
>noise, we can ignore it.  There is nothing that says we have to
>*believe* syzbot's "decrees" over what is a security bug, and what
>isn't.
>
>Before doing a security assessment, you need to have a agreed-upon
>threat model.  Another security aphorism, almost as well known this
>one, is that security has to be designed in from the start --- and
>historically, the storage device on which the file system operates is
>part of the trusted computing base.  So trying to change the security
>model to one that states that one must assume that the storage device
>is under the complete and arbitrary control of the attacker is just
>foolhardy.
>
>There are also plenty of circumstances where this threat model is
>simply not applicable.  For example, if the server is a secure data
>center, and/or where USB ports are expoxy shut, and/or the automounter
>is disabled, or not even installed, then this particular threat is
>simply not in play.
>
>> OK, so now we've strayed into the causes of maintainer burnout.  Syzbot
>> is undoubtedly a stressor, but one way of coping with a stressor is to
>> put it into perspective: Syzbot is really a latter day coverity and
>> everyone was much happier when developers ignored coverity reports and
>> they went into a dedicated pile that was looked over by a team of
>> people trying to sort the serious issues from the wrong but not
>> exploitable ones.  I'd also have to say that anyone who allows older
>> filesystems into customer facing infrastructure is really signing up
>> themselves for the risk they're running, so I'd personally be happy if
>> older fs teams simply ignored all the syzbot reports.
>
>Exactly.  So to the first approximation, if the syzbot doesn't have a
>reliable reproducer --- ignore it.  If it involves a corrupted file
>system, don't consider it a security bug.  Remember, we didn't sign up
>for claiming that the file system should be proof against malicious
>file system image.
>
>I might take a look at it to see if we can improve the quality of the
>implementation, but I don't treat it with any kind of urgency.  It's
>more of something I do for fun, when I have a free moment or two.  And
>when I have higher priority issues, syzkaller issues simply get
>dropped and ignored.
>
>The gamification which makes this difficult is when you get the
>monthly syzbot reports, and you see the number of open syzkaller
>issues climb.  It also doesn't help when you compare the number of
>syzkaller issues for your file system with another file system.  For
>me, one of the ways that I try to evade the manpulation is to remember
>that the numbers are completely incomparable.
>
>For example, if a file system is being used as the root file system,
>and there some device driver or networking subsystem is getting
>pounded, leading to kernel memory corruptions before the userspace
>core dumps, this can generate the syzbot report which is "charged"
>against the file system, when in fact it's not actually a file system
>bug at all.  Or if the file system hasn't cooperated with Google's
>intern project to disable metadata checksum verifications, the better
>to trigger more file system corruption-triggered syzbot reports, this
>can depress one file system's syzbot numbers over another.
>
>So the bottom line is that the number of syzbot is ultimately fairly
>meaningless as a comparison betweentwo different kernel subsystems,
>despite the syzbot team's best attempts to manipulate you into feeling
>bad about your code, and feeling obligated to Do Something about
>bringing down the number of syzbot reports.
>
>This is a "dark pattern", and you should realize this, and not let
>yourself get suckered into falling for this mind game.
>
>> The sources of stress aren't really going to decrease, but how people
>> react to them could change.  Syzbot (and bugs in general) are a case in
>> point.  We used not to treat seriously untriaged bug reports, but now
>> lots of people feel they can't ignore any fuzzer report.  We've tipped
>> to far into "everything's a crisis" mode and we really need to come
>> back and think that not every bug is actually exploitable or even
>> important.
>
>Exactly.  A large number of unaddressed syzbot number is not a "kernel
>security disaster" unless you let yourself get tricked into believing
>that it is.  Again, it's all about threat models, and the syzbot robot
>very cleverly hides any discussion over the threat model, and whether
>it is valid, and whether it is one that you care about --- or whether
>your employer should care.
>
>> Perhaps we should also go
>> back to seeing if we can prize some resources out of the major
>> moneymakers in the cloud space.  After all, a bug that could cause a
>> cloud exploit might not be even exploitable on a personal laptop that
>> has no untrusted users.
>
>Actually, I'd say this is backwards.  Many of these issues, and I'd
>argue all that involve an maliciously corrupted file system, are not
>actually an issue in the cloud space, because we *already* assume that
>the attacker may have root.  After all, anyone can pay their $5
>CPU/hour, and get an Amazon or Google or Azure VM, and then run
>arbitrary workloads as root.
>
>As near as I can tell **no** **one** is crazy enough to assume that
>native containers are a security boundary.  For that reason, when a
>cloud customer is using Docker, or Kubernetes, they are running it on
>a VM which is dedicated to that customer.  Kubernetes jobs running on
>behalf of say, Tesla Motors do not run on the same VM as the one
>running Kuberentes jobs for Ford Motor Company, so even if an attacker
>mounts a malicious file system iamge, they can't use that to break
>security and get access to proprietary data belonging to a competitor.
>
>The primary risk for maliciously corrupted file systems is because
>GNOME automounts file systems by default, and so many a laptop is
>subject to vulnerabilities if someone plugs in an untrusted USB key on
>their personal laptop.  But this risk can be addressed simply by
>uninstalling the automounter, and a future release of e2fsprogs will
>include this patch:
>
>https://lore.kernel.org/all/20230824235936.GA17891@frogsfrogsfrogs/
>
>... which will install a udev rule that will fix this bad design
>problem, at least for ext4 file systems.  Of course, a distro could
>decide to take remove the udev rule, but at that point, I'd argue that
>liability attaches to the distribution for disabling this security
>mitigation, and it's no longer the file system developer's
>responsibility.
>
>						- Ted
>

The noisy wheel gets the grease, and bots, especially ones with no kind of data organization, can be very noisy indeed. So even a useful tool can interfere with prioritization, and in particular encourages reactive rather than proactive scheduling of tasks.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-07 11:18             ` Steven Rostedt
@ 2023-09-13 16:43               ` Eric Sandeen
  2023-09-13 16:58                 ` Guenter Roeck
  2023-09-13 17:03                 ` Linus Torvalds
  0 siblings, 2 replies; 97+ messages in thread
From: Eric Sandeen @ 2023-09-13 16:43 UTC (permalink / raw)
  To: Steven Rostedt, Dave Chinner
  Cc: Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On 9/7/23 6:18 AM, Steven Rostedt wrote:
> On Thu, 7 Sep 2023 13:38:40 +1000
> Dave Chinner <david@fromorbit.com> wrote:
> 
>> Hence, IMO, gutting a filesystem implementation to just support
>> read-only behaviour "to prolong it's support life" actually makes
>> things worse from a maintenance and testing persepective, not
>> better....
> 
> From your other email about 10 years support, you could first set a fs to
> read-only, and then after so long (I'm not sure 10 years is really
> necessary), then remove it.
> 
> That is, make it the stage before removal. If no one complains about it
> being read-only after several years, then it's highly likely that no one is
> using it. If someone does complain, you can tell them to either maintain
> it, or start moving all their data to another fs.
> 
> For testing, you could even have an #ifdef that needs to be manually
> changed (not a config option) to make it writable.

This still sounds to me like /more/ work for developers and testers that
may interact with the almost-dead filesystems, not less...

I agree w/ Dave here that moving almost-dead filesystems to RO-only
doesn't help solve the problem.

(and back to syzbot, it doesn't care one bit if $FOO-fs is readonly in
the kernel, it can still happily break the fs and the kernel along with it.)

Forcing readonly might make users squawk or speak up on the way to
possible deprecation, but then what? I don't think it reduces the
maintenance burden in any real way.

Isn't it more typical to mark something as on its way to deprecation in
Kconfig and/or a printk?

-Eric


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-13 16:43               ` Eric Sandeen
@ 2023-09-13 16:58                 ` Guenter Roeck
  2023-09-13 17:03                 ` Linus Torvalds
  1 sibling, 0 replies; 97+ messages in thread
From: Guenter Roeck @ 2023-09-13 16:58 UTC (permalink / raw)
  To: Eric Sandeen, Steven Rostedt, Dave Chinner
  Cc: Christoph Hellwig, ksummit, linux-fsdevel

On 9/13/23 09:43, Eric Sandeen wrote:
> On 9/7/23 6:18 AM, Steven Rostedt wrote:
>> On Thu, 7 Sep 2023 13:38:40 +1000
>> Dave Chinner <david@fromorbit.com> wrote:
>>
>>> Hence, IMO, gutting a filesystem implementation to just support
>>> read-only behaviour "to prolong it's support life" actually makes
>>> things worse from a maintenance and testing persepective, not
>>> better....
>>
>>  From your other email about 10 years support, you could first set a fs to
>> read-only, and then after so long (I'm not sure 10 years is really
>> necessary), then remove it.
>>
>> That is, make it the stage before removal. If no one complains about it
>> being read-only after several years, then it's highly likely that no one is
>> using it. If someone does complain, you can tell them to either maintain
>> it, or start moving all their data to another fs.
>>
>> For testing, you could even have an #ifdef that needs to be manually
>> changed (not a config option) to make it writable.
> 
> This still sounds to me like /more/ work for developers and testers that
> may interact with the almost-dead filesystems, not less...
> 
> I agree w/ Dave here that moving almost-dead filesystems to RO-only
> doesn't help solve the problem.
> 
> (and back to syzbot, it doesn't care one bit if $FOO-fs is readonly in
> the kernel, it can still happily break the fs and the kernel along with it.)
> 
> Forcing readonly might make users squawk or speak up on the way to
> possible deprecation, but then what? I don't think it reduces the
> maintenance burden in any real way.
> 
> Isn't it more typical to mark something as on its way to deprecation in
> Kconfig and/or a printk?
> 

I think that commit eb103a51640e ("reiserfs: Deprecate reiserfs") is a perfect
and excellent example for how to do this.

Guenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-13 16:43               ` Eric Sandeen
  2023-09-13 16:58                 ` Guenter Roeck
@ 2023-09-13 17:03                 ` Linus Torvalds
  2023-09-15 22:48                   ` Dave Chinner
  2023-09-25  9:38                   ` Christoph Hellwig
  1 sibling, 2 replies; 97+ messages in thread
From: Linus Torvalds @ 2023-09-13 17:03 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Steven Rostedt, Dave Chinner, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Wed, 13 Sept 2023 at 09:52, Eric Sandeen <sandeen@sandeen.net> wrote:
>
> Isn't it more typical to mark something as on its way to deprecation in
> Kconfig and/or a printk?

I haven't actually heard a good reason to really stop supporting
these. Using some kind of user-space library is ridiculous. It's *way*
more effort than just keeping them in the kernel. So anybody who says
"just move them to user space" is just making things up.

The reasons I have heard are:

 - security

Yes, don't enable them, and if you enable them, don't auto-mount them
on hot-pkug devices. Simple. People in this thread have already
pointed to the user-space support for it happening.

 - syzbot issues.

Ignore them for affs & co.

 - "they use the buffer cache".

Waah, waah, waah. The buffer cache is *trivial*. If you don't like the
buffer cache, don't use it. It's that simple.

But not liking the buffer cache and claiming that's a reason to not
support a filesystem is just complete BS.

              Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-13 17:03                 ` Linus Torvalds
@ 2023-09-15 22:48                   ` Dave Chinner
  2023-09-16 19:44                     ` Steven Rostedt
  2023-09-16 21:50                     ` James Bottomley
  2023-09-25  9:38                   ` Christoph Hellwig
  1 sibling, 2 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-15 22:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> On Wed, 13 Sept 2023 at 09:52, Eric Sandeen <sandeen@sandeen.net> wrote:
> >
> > Isn't it more typical to mark something as on its way to deprecation in
> > Kconfig and/or a printk?
> 
> I haven't actually heard a good reason to really stop supporting
> these. Using some kind of user-space library is ridiculous. It's *way*
> more effort than just keeping them in the kernel. So anybody who says
> "just move them to user space" is just making things up.

No filesystem developer thinks that doing a whole lot of work to
move unmaintained, untestable fs code from the kernel to an
unmaintained, untestable (and likely broken) fs library in userspace
is a viable solution to any of the problems being discussed.

There's a whole lot more to this discussion than "what to do with
old, unmaintained filesystem code".

> The reasons I have heard are:
> 
>  - security
> 
> Yes, don't enable them, and if you enable them, don't auto-mount them
> on hot-pkug devices. Simple. People in this thread have already
> pointed to the user-space support for it happening.

This is just a band-aid. It does nothing to prevent kernel
compromise and is simply blame-shifting the inevitable kernel
compromise to the user because they had to explicitly mount the
filesystem. It's a "security theatre" solution at best.

Indeed, it does not address the frequently requested container use
cases where untrusted users (i.e. root in a container) need to mount
filesystem images.  This is a longstanding feature request we really
need to solve and ignoring it for the purposes of knocking down a
strawman really doesn't help us in any way.

Put simply, what we really need is a trust model mechanism that
allows all the kernel supported filesystems to be mounted by
untrusted users without any risk that the kernel could be
compromised by such an operation.

That's where lklfuse comes into the picture: it allows running the
kernel filesystem parsing code in an isolated userspace sandbox and
only communicates with the kernel and applications via the FUSE
interface.

IOWs, we get *privilege separation* with this lklfuse mechanism for
almost zero extra work on all sides of the fence. The dangerous
stuff occurs in the sandboxed user process so the risk of kernel
compromise is greatly minimised and the user and their applications
can still access it like a normal kernel filesystem.

And because it uses the kernel filesystem implementations, we don't
have a separate codebase that we have to maintain - we get
up-to-date filesystem implementations in userspace for free...

To go back to your original concern about avoiding the removal of
unmaintained filesystems, once we get a robust trust model mechanism
like this in place we we can force them to be mounted through the
supported privilege separation mechanism. Then they can't compromise
the kernel, and the vast majority of the "untested, unmaintained
code that parses untrusted data in kernel space" concerns go away
entirely.

IOWs, if we deal with the trust model issues in a robust manner,
there is much less need for drastic action to protect the kernel and
users from compromise via untestable, unmaintained filesystem code.
Your argument for keeping them around indefinitely *gets stronger*
by addresing the security problems they can expose properly. Hence
arguing against improving the filesystem trust model architecture is
actually providing an argument against your stated goal of keeping
those old filesystems around for ever....

At this point, the only concern that remains is the burden keeping
these old filesystems compiling properly as we we change internal
APIs in future. That's another thing that has been brought up in
this discussion, but....

>  - "they use the buffer cache".
> 
> Waah, waah, waah.

.... you dismiss those concerns in the same way a 6 year old school
yard bully taunts his suffering victims.

Regardless of the merits of the observation you've made, the tone
and content of this response is *completely unacceptable*.  Please
keep to technical arguments, Linus, because this sort of response
has no merit what-so-ever. All it does is shut down the technical
discussion because no-one wants to be the target of this sort of
ugly abuse just for participating in a technical discussion.

Given the number of top level maintainers that signed off on the CoC
that are present in this forum, I had an expectation that this is a
forum where bad behaviour is not tolerated at all.  So I've waited a
couple of days to see if anyone in a project leadership position is
going to say something about this comment.....

<silence>

The deafening silence of tacit acceptance is far more damning than
the high pitched squeal of Linus's childish taunts.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-15 22:48                   ` Dave Chinner
@ 2023-09-16 19:44                     ` Steven Rostedt
  2023-09-16 21:50                     ` James Bottomley
  1 sibling, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2023-09-16 19:44 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Eric Sandeen, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Sat, 16 Sep 2023 08:48:02 +1000
Dave Chinner <david@fromorbit.com> wrote:

> >  - "they use the buffer cache".
> > 
> > Waah, waah, waah.  
> 
> .... you dismiss those concerns in the same way a 6 year old school
> yard bully taunts his suffering victims.
> 
> Regardless of the merits of the observation you've made, the tone
> and content of this response is *completely unacceptable*.  Please
> keep to technical arguments, Linus, because this sort of response
> has no merit what-so-ever. All it does is shut down the technical
> discussion because no-one wants to be the target of this sort of
> ugly abuse just for participating in a technical discussion.
> 
> Given the number of top level maintainers that signed off on the CoC
> that are present in this forum, I had an expectation that this is a
> forum where bad behaviour is not tolerated at all.  So I've waited a
> couple of days to see if anyone in a project leadership position is
> going to say something about this comment.....
> 
> <silence>
> 
> The deafening silence of tacit acceptance is far more damning than
> the high pitched squeal of Linus's childish taunts.

Being one of those that signed off on the CoC, I honestly didn't see this
until you pointed it out. As I'm not a file system maintainer I have been
mostly just skimming the emails in this thread. I had this one marked as
read, but I only really read the first half of it.

Even though I didn't see it, I will admit that even if I had, I would not
have said anything because I'm so use to it and I'm somewhat blind to it
until someone points it out to me.

I'm not on the CoC committee, but I am on the TAB, and I will officially
state that comment was not appropriate.

Linus, please let's keep this technical.

-- Steve

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-15 22:48                   ` Dave Chinner
  2023-09-16 19:44                     ` Steven Rostedt
@ 2023-09-16 21:50                     ` James Bottomley
  2023-09-17  1:40                       ` NeilBrown
                                         ` (2 more replies)
  1 sibling, 3 replies; 97+ messages in thread
From: James Bottomley @ 2023-09-16 21:50 UTC (permalink / raw)
  To: Dave Chinner, Linus Torvalds
  Cc: Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Sat, 2023-09-16 at 08:48 +1000, Dave Chinner wrote:
> On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
[...]
> >  - "they use the buffer cache".
> > 
> > Waah, waah, waah.
> 
> .... you dismiss those concerns in the same way a 6 year old school
> yard bully taunts his suffering victims.
> 
> Regardless of the merits of the observation you've made, the tone
> and content of this response is *completely unacceptable*.  Please
> keep to technical arguments, Linus, because this sort of response
> has no merit what-so-ever. All it does is shut down the technical
> discussion because no-one wants to be the target of this sort of
> ugly abuse just for participating in a technical discussion.
> 
> Given the number of top level maintainers that signed off on the CoC
> that are present in this forum, I had an expectation that this is a
> forum where bad behaviour is not tolerated at all.  So I've waited a
> couple of days to see if anyone in a project leadership position is
> going to say something about this comment.....
> 
> <silence>
> 
> The deafening silence of tacit acceptance is far more damning than
> the high pitched squeal of Linus's childish taunts.

Well, let's face it: it's a pretty low level taunt and it wasn't aimed
at you (or indeed anyone on the thread that I could find) and it was
backed by technical argument in the next sentence.  We all have a
tendency to let off steam about stuff in general not at people in
particular as you did here:

https://lore.kernel.org/ksummit/ZP+vcgAOyfqWPcXT@dread.disaster.area/

But I didn't take it as anything more than a rant about AI in general
and syzbot in particular and certainly I didn't assume it was aimed at
me or anyone else.

If everyone reached for the code of conduct when someone had a non-
specific rant using colourful phraseology, we'd be knee deep in
complaints, which is why we tend to be more circumspect when it
happens.

James



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-16 21:50                     ` James Bottomley
@ 2023-09-17  1:40                       ` NeilBrown
  2023-09-17 17:30                         ` Linus Torvalds
  2023-09-18 14:54                       ` Bill O'Donnell
  2023-09-19  2:44                       ` Dave Chinner
  2 siblings, 1 reply; 97+ messages in thread
From: NeilBrown @ 2023-09-17  1:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Dave Chinner, Linus Torvalds, Eric Sandeen, Steven Rostedt,
	Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Sun, 17 Sep 2023, James Bottomley wrote:
> On Sat, 2023-09-16 at 08:48 +1000, Dave Chinner wrote:
> > On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> [...]
> > >  - "they use the buffer cache".
> > > 
> > > Waah, waah, waah.
> > 
> > .... you dismiss those concerns in the same way a 6 year old school
> > yard bully taunts his suffering victims.
> > 
> > Regardless of the merits of the observation you've made, the tone
> > and content of this response is *completely unacceptable*.  Please
> > keep to technical arguments, Linus, because this sort of response
> > has no merit what-so-ever. All it does is shut down the technical
> > discussion because no-one wants to be the target of this sort of
> > ugly abuse just for participating in a technical discussion.
> > 
> > Given the number of top level maintainers that signed off on the CoC
> > that are present in this forum, I had an expectation that this is a
> > forum where bad behaviour is not tolerated at all.  So I've waited a
> > couple of days to see if anyone in a project leadership position is
> > going to say something about this comment.....
> > 
> > <silence>
> > 
> > The deafening silence of tacit acceptance is far more damning than
> > the high pitched squeal of Linus's childish taunts.
> 
> Well, let's face it: it's a pretty low level taunt and it wasn't aimed
> at you (or indeed anyone on the thread that I could find) and it was
> backed by technical argument in the next sentence.  We all have a
> tendency to let off steam about stuff in general not at people in
> particular as you did here:

It may have been low-level, but from such a high-profile individual it
carries considerable weight.  It carries real risk of appearing to give
permission for inappropriate childishness - or worse.

I'm not sure the technical argument was particularly coherent.  I think
there is a broad desire to deprecate and remove the buffer cache.  I
think the "technical argument" comes down to "don't remove it, just
ignore it if you don't like it" which hardly seems like a good idea for
overall project management.
A much better (in my opinion) technical argument is that the buffer
cache (which isn't all that much code) can simply be copied into each
filesystem which really cannot be modified to use whatever the current
preferred abstraction is.

NeilBrown

> 
> https://lore.kernel.org/ksummit/ZP+vcgAOyfqWPcXT@dread.disaster.area/
> 
> But I didn't take it as anything more than a rant about AI in general
> and syzbot in particular and certainly I didn't assume it was aimed at
> me or anyone else.
> 
> If everyone reached for the code of conduct when someone had a non-
> specific rant using colourful phraseology, we'd be knee deep in
> complaints, which is why we tend to be more circumspect when it
> happens.
> 
> James
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-17  1:40                       ` NeilBrown
@ 2023-09-17 17:30                         ` Linus Torvalds
  2023-09-17 18:09                           ` Linus Torvalds
                                             ` (2 more replies)
  0 siblings, 3 replies; 97+ messages in thread
From: Linus Torvalds @ 2023-09-17 17:30 UTC (permalink / raw)
  To: NeilBrown
  Cc: James Bottomley, Dave Chinner, Eric Sandeen, Steven Rostedt,
	Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Sat, 16 Sept 2023 at 18:40, NeilBrown <neilb@suse.de> wrote:
>
> I'm not sure the technical argument was particularly coherent.  I think
> there is a broad desire to deprecate and remove the buffer cache.

There really isn't.

There may be _whining_ about the buffer cache, but it's completely
misplaced, and has zero technical background.

The buffer cache is perfectly fine, and as mentioned, it is very
simple. It has absolutely no downsides for what it is.

Sure, it's old.

The whole getblk/bread/bwrite/brelse thing goes all the way back to
original unix, and in fact if you go and read the Lions' book, you'll
see that even in Unix v6, you have comments about some of it being a
relic:

    "B_RELOC is a relic" (p 68)
     http://www.lemis.com/grog/Documentation/Lions/book.pdf

and while obviously the Linux version of it is a different
re-implementation (based on reading _another_ classic book about Unix
-  Maurice Bach's "The Design of the UNIX Operating System"), the
basic notions aren't all that different. The detaisl are different,
the names have been changed later ("sb_bread()" instead of "bread()"),
and it has some extra code to try to do the "pack into a page so that
we can also mmap the result", but in the end it's the exact same
thing.

And because it's old, it's kind of limited. I wouldn't expect a modern
filesystem to use the buffer cache.

IOW, the buffer cache is simple and stupid. But it's literally
designed for simple and stupid old filesystems.

And simple and stupid old filesystems are often designed for it.

Simple and stupid is not *wrong*. In fact, it's often exactly what you want.

Being simple and stupid, it's a physically indexed cache. That's all
kinds of slow and inefficient, since you have to first look up the
physical location of a data file to even find the cached copy of the
data.

It's not fancy.

It's not clever.

But the whole "broad desire to deprecate and remove" is complete and utter BS.

The thing is, the buffer cache is completely pain free, and nobody
sane would ever remove it. That's a FACT. Do these two operations

      wc fs/buffer.c fs/mpage.c
      git grep -l 'struct.buffer_head'

and ponder.

And here's a clue-bat for anybody who can't do the "ponder" part
above: the buffer cache is _small_, it's _simple_, and it has
basically absolutely no effect on anything except for the filesystems
that use it.

And the filesystems that use it are old, and simple, but they are many
(this one is from "grep -w sb_bread", in case people care - I didn't
do any kind of fancier analysis):

      adfs, affs, befs, bfs, efs, exfat, ext2, ext4, f2fs, fat,
      freevxfs, hfs, hpfs, isofs, jfs, minix, nilfs2, ntfs, ntfs3, omfs,
      qnx4, qnx6, reiserfs, romfs, sysv, udf, ufs

And the other part of that "pondering" above, is to look at what the
impact of the buffer cache is *outside* those filesystems that use it.

And here's another clue-bat: it's effectively zero.  There's a couple
of lines in the VM. There's a couple of small helpers functions in
fs/direct-io.c. That's pretty much it.

In other words, the buffer cache is

 - simple

 - self-contained

 - supports 20+ legacy filesystems

so the whole "let's deprecate and remove it" is literally crazy
ranting and whining and completely mis-placed.

And yes, *within* the context of a filesystem or two, the whole "try
to avoid the buffer cache" can be a real thing.

Looking at the list of filesystems above, I would not be surprised if
one or two of them were to have a long-term plan to not use the buffer
cache.

But that in no way changes the actual picture.

Was this enough technical information for people?

And can we now all just admit that anybody who says "remove the buffer
cache" is so uninformed about what they are speaking of that we can
just ignore said whining?

                    Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-17 17:30                         ` Linus Torvalds
@ 2023-09-17 18:09                           ` Linus Torvalds
  2023-09-17 18:57                           ` Theodore Ts'o
  2023-09-19  1:15                           ` Dave Chinner
  2 siblings, 0 replies; 97+ messages in thread
From: Linus Torvalds @ 2023-09-17 18:09 UTC (permalink / raw)
  To: NeilBrown
  Cc: James Bottomley, Dave Chinner, Eric Sandeen, Steven Rostedt,
	Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Sun, 17 Sept 2023 at 10:30, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Sure, it's old.

.. and since it happens to be exactly 32 years today since I released
0.01, I decided to go back and look.

Obviously fs/buffer.c existed back then too, but admittedly it was
even smaller and simpler back then.

"It's small" clearly means something different today than it did 32 years ago.

Today:

   $ wc -l fs/buffer.c
   3152 fs/buffer.c

Back then:

   $ wc -l fs/buffer.c
   254 fs/buffer.c

So things have certainly changed. LOL.

              Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-17 17:30                         ` Linus Torvalds
  2023-09-17 18:09                           ` Linus Torvalds
@ 2023-09-17 18:57                           ` Theodore Ts'o
  2023-09-17 19:45                             ` Linus Torvalds
  2023-09-19  1:15                           ` Dave Chinner
  2 siblings, 1 reply; 97+ messages in thread
From: Theodore Ts'o @ 2023-09-17 18:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: NeilBrown, James Bottomley, Dave Chinner, Eric Sandeen,
	Steven Rostedt, Guenter Roeck, Christoph Hellwig, ksummit,
	linux-fsdevel

On Sun, Sep 17, 2023 at 10:30:55AM -0700, Linus Torvalds wrote:
> And yes, *within* the context of a filesystem or two, the whole "try
> to avoid the buffer cache" can be a real thing.

Ext4 uses buffer_heads, and wasn't on your list because we don't use
sb_bread().  And we are thinking about getting rid of buffer heads,
mostly because (a) we want to have more control over which metadata
blocks gets cached and which doesn't, and (b) the buffer cache doesn't
have a callback function to inform the file system if the writeback
fails, so that the file system can try to work around the issue, or at
the very least, mark the file system as corrupted and to signal it via
fsnotify.

Attempts to fix (b) via enhancements buffer cache where shot down by
the linux-fsdevel bike-shedding cabal, because "the buffer cache is
deprecated", and at the time, I decided it wasn't worth pushing it,
since (a) was also a consideration, and I expect we can also (c)
reduce the memory overhead since there are large parts of struct
buffer_head that ext4 doesn't need.


There was *one* one technical argument raised by people who want to
get rid of buffer heads, which is that the change from set_bh_page()
to folio_set_bh() introduced a bug which broke bh_offset() in a way
that only showed up if you were using bh_offset() and the file system
block size was less than the page size.

Eh, it was a bug, and we caught it quickly enough once someone
actually tried to run xfstests on the commit, and it bisected pretty
quickly.  (Unfortunately, the change went in via the mm tree, and so
it wasn't noticed by the ext4 file system developers; but
fortunatelly, Zorro flagged it, and once that showed up, I
investigated it.)  As far as I'm concerned, that's working as
intended, and these sorts of things happen.  So personally, I don't
consider this an argument for nuking the buffer cache.

I *do* view it as only one of many concerns when we do make these
tree-wide changes, such as the folio migration.  Yes, these these
tree-wide can introduce regressions, such as breaking bh_offset() for
a week or so before the regression tests catch it, and another week
before the fix makes its way to Linus's tree.  That's the system
working as designed.

But that's not the only concern; the other problem with these
tree-wide changes is that it tends to break automatic backports of bug
fixes to the LTS kernels, which now require manual handling by the
file system developers (or we could leave the LTS kernels with the
bugs unfixed, but that tends to make customers cranky :-).

Anyway, it's perhaps natural that the people who make these sorts of
tree-wide changes may get cranky when they need to modify, or at least
regression test, 20+ legacy file systems, and it does kind of suck
that many of these legacy file systems can't easily be tested by
xfstests because we don't even have a mkfs program for them.  (OTOH,
we recently merged ntfs3 w/o a working, or at least open-source, mkfs
program, so this isn't *just* the fault of legacy file systems.)

So sure, they may wish that we could make the job of landing these
sorts of tree-wide changes to make their job easier.  But we don't do
tree-wide changes all that often, and so it's a mistake to try to
optimize for this non-common case.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-17 18:57                           ` Theodore Ts'o
@ 2023-09-17 19:45                             ` Linus Torvalds
  2023-09-18 11:14                               ` Jan Kara
  0 siblings, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2023-09-17 19:45 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: NeilBrown, James Bottomley, Dave Chinner, Eric Sandeen,
	Steven Rostedt, Guenter Roeck, Christoph Hellwig, ksummit,
	linux-fsdevel

On Sun, 17 Sept 2023 at 11:58, Theodore Ts'o <tytso@mit.edu> wrote:
>
> Ext4 uses buffer_heads, and wasn't on your list because we don't use
> sb_bread().

Heh. Look closer at my list. ext4 actually was on my list, and it
turns out that's just because 'sb_bread()' is still mentioned in a
comment.

I did say that my list wasn't really the result of any exhaustive
analysis, but I picked up ext4 by luck.

And yes, ext4 was also one of the reasons I then mentioned that within
the contexts of individual filesystems, it may make sense to deprecate
the use of buffer heads.

Because yes, buffer heads _are_ old and overly simplistic. And I don't
really disagree with people who don't want to extend on them any more.
There are better models.

I think buffer heads are great for one thing, and really one thing
only: legacy use cases.

So I don't think it should be a shock to anybody that most of the
listed filesystems are random old legacy cases (or related to such -
exfat).

But "old" does not mean "bad". And legacy in many ways is worth
cherishing. It needs to become a whole lot more painful than buffer
heads have ever been to be a real issue.

It is in fact somewhat telling that of that list of odds and ends
there was *one* filesystem that was mentioned in this thread that is
actively being deprecated (and happens to use buffer heads).

And that filesystem has been explicitly not maintained, and is being
deprecated partly exactly because it is the opposite of cherished. So
the pain isn't worth it.

All largely for some rather obvious non-technical reasons.

So while reiserfs was mentioned as some kind of "good model for
deprecation", let's be *real* here. The reason nobody wants to have
anything to do with reiserfs is that Hans Reiser murdered his wife.

And I really *really* hope nobody takes that to heart as a good model
for filesystem deprecation.

                Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-17 19:45                             ` Linus Torvalds
@ 2023-09-18 11:14                               ` Jan Kara
  2023-09-18 17:26                                 ` Linus Torvalds
  2023-09-27 22:23                                 ` Dave Kleikamp
  0 siblings, 2 replies; 97+ messages in thread
From: Jan Kara @ 2023-09-18 11:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Ts'o, NeilBrown, James Bottomley, Dave Chinner,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Sun 17-09-23 12:45:30, Linus Torvalds wrote:
> Because yes, buffer heads _are_ old and overly simplistic. And I don't
> really disagree with people who don't want to extend on them any more.
> There are better models.
> 
> I think buffer heads are great for one thing, and really one thing
> only: legacy use cases.
> 
> So I don't think it should be a shock to anybody that most of the
> listed filesystems are random old legacy cases (or related to such -
> exfat).
> 
> But "old" does not mean "bad". And legacy in many ways is worth
> cherishing. It needs to become a whole lot more painful than buffer
> heads have ever been to be a real issue.

I agree. On the other hand each filesystem we carry imposes some
maintenance burden (due to tree wide changes that are happening) and the
question I have for some of them is: Do these filesystems actually bring
any value? I.e., are there still any users left? Sadly that's quite
difficult to answer so people do bother only if the pain is significant
enough like in the case of reiserfs. But I suspect we could get rid of a
few more without a real user complaining (e.g. Shaggy said he'd be happy to
deprecate JFS and he's not aware of any users).

> It is in fact somewhat telling that of that list of odds and ends
> there was *one* filesystem that was mentioned in this thread that is
> actively being deprecated (and happens to use buffer heads).
> 
> And that filesystem has been explicitly not maintained, and is being
> deprecated partly exactly because it is the opposite of cherished. So
> the pain isn't worth it.
> 
> All largely for some rather obvious non-technical reasons.
> 
> So while reiserfs was mentioned as some kind of "good model for
> deprecation", let's be *real* here. The reason nobody wants to have
> anything to do with reiserfs is that Hans Reiser murdered his wife.

Well, I agree that's (consciously or unconsciously) the non-technical part
of the reason. But there's also a technical one. Since Hans' vision how
things should be didn't always match how the rest of the VFS was designed,
reiserfs has accumulated quite some special behavior which is very easy to
break. And because reiserfs testing is non-existent, entrusting your data
to reiserfs is more and more a Russian roulette kind of gamble. So the two
existing reiserfs users that contacted me after announcing reiserfs
deprecation both rather opted for migrating to some other filesystem.
 
> And I really *really* hope nobody takes that to heart as a good model
> for filesystem deprecation.

LOL :).

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-16 21:50                     ` James Bottomley
  2023-09-17  1:40                       ` NeilBrown
@ 2023-09-18 14:54                       ` Bill O'Donnell
  2023-09-19  2:44                       ` Dave Chinner
  2 siblings, 0 replies; 97+ messages in thread
From: Bill O'Donnell @ 2023-09-18 14:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Dave Chinner, Linus Torvalds, Eric Sandeen, Steven Rostedt,
	Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Sat, Sep 16, 2023 at 05:50:50PM -0400, James Bottomley wrote:
> On Sat, 2023-09-16 at 08:48 +1000, Dave Chinner wrote:
> > On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> [...]
> > >  - "they use the buffer cache".
> > > 
> > > Waah, waah, waah.
> > 
> > .... you dismiss those concerns in the same way a 6 year old school
> > yard bully taunts his suffering victims.
> > 
> > Regardless of the merits of the observation you've made, the tone
> > and content of this response is *completely unacceptable*.  Please
> > keep to technical arguments, Linus, because this sort of response
> > has no merit what-so-ever. All it does is shut down the technical
> > discussion because no-one wants to be the target of this sort of
> > ugly abuse just for participating in a technical discussion.
> > 
> > Given the number of top level maintainers that signed off on the CoC
> > that are present in this forum, I had an expectation that this is a
> > forum where bad behaviour is not tolerated at all.  So I've waited a
> > couple of days to see if anyone in a project leadership position is
> > going to say something about this comment.....
> > 
> > <silence>
> > 
> > The deafening silence of tacit acceptance is far more damning than
> > the high pitched squeal of Linus's childish taunts.
> 
> Well, let's face it: it's a pretty low level taunt and it wasn't aimed
> at you (or indeed anyone on the thread that I could find) and it was
> backed by technical argument in the next sentence.  We all have a
> tendency to let off steam about stuff in general not at people in
> particular as you did here:
> 
> https://lore.kernel.org/ksummit/ZP+vcgAOyfqWPcXT@dread.disaster.area/
> 
> But I didn't take it as anything more than a rant about AI in general
> and syzbot in particular and certainly I didn't assume it was aimed at
> me or anyone else.
> 
> If everyone reached for the code of conduct when someone had a non-
> specific rant using colourful phraseology, we'd be knee deep in
> complaints, which is why we tend to be more circumspect when it
> happens.

It's the kind of response that intimidates some into not participating.
Thanks-
Bill

> 
> James
> 
> 


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 11:14                               ` Jan Kara
@ 2023-09-18 17:26                                 ` Linus Torvalds
  2023-09-18 19:32                                   ` Jiri Kosina
  2023-09-19  4:56                                   ` Dave Chinner
  2023-09-27 22:23                                 ` Dave Kleikamp
  1 sibling, 2 replies; 97+ messages in thread
From: Linus Torvalds @ 2023-09-18 17:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Theodore Ts'o, NeilBrown, James Bottomley, Dave Chinner,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Mon, 18 Sept 2023 at 04:14, Jan Kara <jack@suse.cz> wrote:
>
> I agree. On the other hand each filesystem we carry imposes some
> maintenance burden (due to tree wide changes that are happening) and the
> question I have for some of them is: Do these filesystems actually bring
> any value?

I wouldn't be shocked if we could probably remove half of the
filesystems I listed, and nobody would even notice.

But at the same time, the actual upside to removing them is pretty
much zero. I do agree with you that reiserfs had issues - other than
the authorship - that made people much more inclined to remove it.

I'm looking at something like sysv, for example - the ancient old
14-byte filename thing. Does it have a single user? I really couldn't
tell. But at the same time, looking at the actual changes to it, they
fall into three categories:

 - trivial tree-wide changes - things like spelling fixes, or the SPDX
updates, or some "use common helpers"

 - VFS API updates, which are very straightforward (because sysvfs is
in no way doing anything odd)

 - some actual updates by Al Viro, who I doubt uses it, but I think
actually likes it and has some odd connection to it

anyway, I went back five years, and didn't see a single thing that
looked like "that was wasted time and effort".  There's a total of 44
patches over five years, so I'm looking at that filesystem and getting
a very strong feeling of "I think the minimal effort to maintain it
has been worth it".

Even without a single user, there's a history there, and it would be
sad to leave it behind. Exactly because it's _so_ little effort to
just keep.

Now, some of the other filesystems have gotten much more work done to
them - but it's because people have actively worked on them. rmk
actually did several adfs patch-series of cleanups etc back in 2019,
for example. Other than that, adfs seems to actually have gotten less
attention than sysvfs did, but I think that is probably because it
lacked the "Al Viro likes it" factor.

And something like befs - which has no knight in shining armor that
cares at all - has just a very small handful of one-liner patches for
VFS API changes.

So even the completely unloved ones just aren't a *burden*.

Reiserfs does stand out, as you say. There's a fair amount of actual
bug fixes and stuff there, because it's much more complicated, and
there were presumably a lot more complicated uses of it too due to the
history of it being an actual default distro filesystem for a while.

And that's kind of the other side of the picture: usage matters.
Something like affs or minixfs might still have a couple of users, but
those uses would basically be people who likely use Linux to interact
with some legacy machine they maintain..  So the usage they see would
mainly be very simple operations.

And that matters for two reasons:

 (a) we probably don't have to worry about bugs - security or
otherwise - as much. These are not generally "general-purpose"
filesystems. They are used for data transfer etc.

 (b) if they ever turn painful, we might be able to limit the pain further.

For example, mmap() is a very important operation in the general case,
and it actually causes a lot of potential problems from a filesystem
standpoint. It's one of the main sources of what little complexity
there is in the buffer head handling, for example.

But mmap() is *not* important for a filesystem that is used just for
data transport. I bet that FAT is still widely used, for example, and
while exFAT is probably making inroads, I suspect most of us have used
a USB stick with a FAT filesystem on it in the not too distant past.
Yet I doubt we'd have ever even noticed if 'mmap' didn't work on FAT.
Because all you really want for data transport is basic read/write
support.

And the reason I mention mmap is that it actually has some complexity
associated with it. If you support mmap, you have to have a read_folio
function, which in turn is why we have mpage_readpage(), which in turn
ends up being a noticeable part of the buffer cache code - any minor
complexity of the buffer cache does not tend to be about the
individual bh's themselves, but about the 'b_this_page' traversal, and
how buffers can be reached not just with sb_bread() and friends, but
are reachable from the VM through the page they are in.

IOW, *if* the buffer cache ever ends up being a big pain point, I
suspect that we'd still not want to remove ir, but it might be that we
could go "Hmm. Let's remove all the mmap support for the filesystems
that still use the buffer cache for data pages, because that causes
problems".

I think, for example, that ext4 - which obviously needs to continue to
support mmap, and which does use buffer heads in other parts - does
*not* use the buffer cache for actual data pages, only for metadata. I
might be wrong.

Anyway, based on the *current* situation, I don't actually see the
buffer cache even _remotely_ painful enough that we'd do even that
thing. It's not a small undertaking to get rid of the whole
b_this_page stuff and the complexity that comes from the page being
reachable through the VM layer (ie writepages etc). So it would be a
*lot* more work to rip that code out than it is to just support it.

         Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 17:26                                 ` Linus Torvalds
@ 2023-09-18 19:32                                   ` Jiri Kosina
  2023-09-18 19:59                                     ` Linus Torvalds
  2023-09-18 20:33                                     ` H. Peter Anvin
  2023-09-19  4:56                                   ` Dave Chinner
  1 sibling, 2 replies; 97+ messages in thread
From: Jiri Kosina @ 2023-09-18 19:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Theodore Ts'o, NeilBrown, James Bottomley,
	Dave Chinner, Eric Sandeen, Steven Rostedt, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On Mon, 18 Sep 2023, Linus Torvalds wrote:

> But mmap() is *not* important for a filesystem that is used just for 
> data transport. I bet that FAT is still widely used, for example, and 
> while exFAT is probably making inroads, I suspect most of us have used a 
> USB stick with a FAT filesystem on it in the not too distant past. Yet I 
> doubt we'd have ever even noticed if 'mmap' didn't work on FAT. Because 
> all you really want for data transport is basic read/write support.

I am afraid this is not reflecting reality.

I am pretty sure that "give me that document on a USB stick, and I'll take 
a look" leads to using things like libreoffice (or any other editor liked 
by general public) to open the file directly on the FAT USB stick. And 
that's pretty much guaranteed to use mmap().

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 19:32                                   ` Jiri Kosina
@ 2023-09-18 19:59                                     ` Linus Torvalds
  2023-09-18 20:50                                       ` Theodore Ts'o
  2023-09-18 20:33                                     ` H. Peter Anvin
  1 sibling, 1 reply; 97+ messages in thread
From: Linus Torvalds @ 2023-09-18 19:59 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jan Kara, Theodore Ts'o, NeilBrown, James Bottomley,
	Dave Chinner, Eric Sandeen, Steven Rostedt, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On Mon, 18 Sept 2023 at 12:32, Jiri Kosina <jikos@kernel.org> wrote:
>
> I am afraid this is not reflecting reality.
>
> I am pretty sure that "give me that document on a USB stick, and I'll take
> a look" leads to using things like libreoffice (or any other editor liked
> by general public) to open the file directly on the FAT USB stick. And
> that's pretty much guaranteed to use mmap().

Ugh. I would have hoped that anybody will fall back to read/write -
because we definitely have filesystems that don't support mmap.

But I guess they are so specialized as to not ever trigger that kind
of problem (eg /proc - nobody is putting office documents there ;)

A cache-incoherent MAP_PRIVATE only mmap (ie one that doesn't react to
'write()' changing the data) is easy to do, but yeah, it would still
be a lot more work than just "keep things as-is".

           Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 19:32                                   ` Jiri Kosina
  2023-09-18 19:59                                     ` Linus Torvalds
@ 2023-09-18 20:33                                     ` H. Peter Anvin
  1 sibling, 0 replies; 97+ messages in thread
From: H. Peter Anvin @ 2023-09-18 20:33 UTC (permalink / raw)
  To: Jiri Kosina, Linus Torvalds
  Cc: Jan Kara, Theodore Ts'o, NeilBrown, James Bottomley,
	Dave Chinner, Eric Sandeen, Steven Rostedt, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On September 18, 2023 12:32:05 PM PDT, Jiri Kosina <jikos@kernel.org> wrote:
>On Mon, 18 Sep 2023, Linus Torvalds wrote:
>
>> But mmap() is *not* important for a filesystem that is used just for 
>> data transport. I bet that FAT is still widely used, for example, and 
>> while exFAT is probably making inroads, I suspect most of us have used a 
>> USB stick with a FAT filesystem on it in the not too distant past. Yet I 
>> doubt we'd have ever even noticed if 'mmap' didn't work on FAT. Because 
>> all you really want for data transport is basic read/write support.
>
>I am afraid this is not reflecting reality.
>
>I am pretty sure that "give me that document on a USB stick, and I'll take 
>a look" leads to using things like libreoffice (or any other editor liked 
>by general public) to open the file directly on the FAT USB stick. And 
>that's pretty much guaranteed to use mmap().
>

I mean... fopen() on Linux optionally uses mmap()... and it used to do so unconditionally, even. mmap() is a good match for stdio (at least the input side), so it is a reasonable thing to do.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 19:59                                     ` Linus Torvalds
@ 2023-09-18 20:50                                       ` Theodore Ts'o
  2023-09-18 22:48                                         ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Theodore Ts'o @ 2023-09-18 20:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jiri Kosina, Jan Kara, NeilBrown, James Bottomley, Dave Chinner,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Mon, Sep 18, 2023 at 12:59:15PM -0700, Linus Torvalds wrote:
> On Mon, 18 Sept 2023 at 12:32, Jiri Kosina <jikos@kernel.org> wrote:
> >
> > I am afraid this is not reflecting reality.
> >
> > I am pretty sure that "give me that document on a USB stick, and I'll take
> > a look" leads to using things like libreoffice (or any other editor liked
> > by general public) to open the file directly on the FAT USB stick. And
> > that's pretty much guaranteed to use mmap().
> 
> Ugh. I would have hoped that anybody will fall back to read/write -
> because we definitely have filesystems that don't support mmap.

Fortunately, I most of the "simple" file systems appear to support
mmap, via generic_file_mmap:

% git grep generic_file_mmap | grep ^fs | awk -F/ '{print $2}' | uniq | xargs echo

9p adfs affs afs bfs ecryptfs exfat ext2 fat fuse hfs hfsplus hostfs
hpfs jfs minix nfs ntfs ntfs3 omfs ramfs reiserfs smb sysv ubifs ufs
vboxsf

						- Ted

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 20:50                                       ` Theodore Ts'o
@ 2023-09-18 22:48                                         ` Linus Torvalds
  0 siblings, 0 replies; 97+ messages in thread
From: Linus Torvalds @ 2023-09-18 22:48 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jiri Kosina, Jan Kara, NeilBrown, James Bottomley, Dave Chinner,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Mon, 18 Sept 2023 at 13:51, Theodore Ts'o <tytso@mit.edu> wrote:
>
> Fortunately, I most of the "simple" file systems appear to support
> mmap, via generic_file_mmap:

Yes, but that is in fact exactly the path that causes the most
complexity for the buffer cache: it needs that "readpage" function
that in turn then uses mpage_readpage() and friends to create the
buffers all in the same page.

And then - in order for normal read/write to not have any buffer
aliases, and be coherent - they too need to deal with that "group of
buffers in the same page" situation too.

It's not a *big* amount of complexity, but it's absolutely the most
complicated part of the buffer cache by far, in how it makes buffer
heads not independent of each other, and how it makes some of the
buffer cache depend on the page lock etc.

So the mmap side is what ties buffers heads together with the pages
(now folios), and it's not pretty. we have a number of loops like

        struct buffer_head *bh = head;
        do {
                .. work on bh ..
                bh = bh->b_this_page;
        } while (bh != head);

together with rules for marking buffers and pages dirty / uptodate /
whatever hand-in-hand.

Anyway, all of this is very old, and all of it is quite stable. We had
mmap support thanks to these games even before the page cache existed.

So it's not _pretty_, but it works, and if we can't just say "we don't
need to support mmap", we're almost certainly stuck with it (at least
if we want mappings that stay coherent with IO).

               Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-17 17:30                         ` Linus Torvalds
  2023-09-17 18:09                           ` Linus Torvalds
  2023-09-17 18:57                           ` Theodore Ts'o
@ 2023-09-19  1:15                           ` Dave Chinner
  2023-09-19  5:17                             ` Matthew Wilcox
  2 siblings, 1 reply; 97+ messages in thread
From: Dave Chinner @ 2023-09-19  1:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: NeilBrown, James Bottomley, Eric Sandeen, Steven Rostedt,
	Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel

On Sun, Sep 17, 2023 at 10:30:55AM -0700, Linus Torvalds wrote:
> On Sat, 16 Sept 2023 at 18:40, NeilBrown <neilb@suse.de> wrote:
> >
> > I'm not sure the technical argument was particularly coherent.  I think
> > there is a broad desire to deprecate and remove the buffer cache.

....

> In other words, the buffer cache is
> 
>  - simple
> 
>  - self-contained
> 
>  - supports 20+ legacy filesystems
> 
> so the whole "let's deprecate and remove it" is literally crazy
> ranting and whining and completely mis-placed.

But that isn't what this thread is about. This is a strawman that
you're spending a lot of time and effort to stand up and then knock down.

Let's start from a well known problem we currently face: the
per-inode page cache struggles to scale to the bandwidth
capabilities of modern storage. We've known about this for well over
a decade in high performance IO circles, but now we are hitting it
with cheap consumer level storage. These per-inode bandwidth
scalability problems is one of the driving reasons behind the
conversion to folios and the introduction of high order folios into
the page cache.

One of the problems being raised in the high-order folio context is
that *bufferheads* and high-order folios don't really go together
well.  The pointer chasing model per-block bufferhead iteration
requires to update state and retrieve mapping information just does
not scale to marshalling millions of objects a second through the
page cache.

The best solution is to not use bufferheads at all for file data.
That's the direction the page cache IO stack is moving; we are
already there with iomap and hence XFS. With the recent introduction
of high order folios into the buffered write path, single file write
throughput on a pcie4.0 ssd went from ~2.5GB/s consuming 5 CPUs in
mapping lock contention to saturating the device at over 7GB/s
whilst also providing a 70% reduction in total CPU usage. This
result is came about simply by reduce reducing mapping lock traffic
by a couple of orders of magnitude across the write syscall, IO
submission, IO completion and memory reclaim paths....

This was easy to do with iomap based filesystems because they don't
carry per-block filesystem structures for every folio cached in page
cache - we carry a single object per folio that holds the 2 bits of
per-filesystem block state we need for each block the folio maps.
Compare that to a bufferhead - it uses 56 bytes of memory per
fielsystem block that is cached.

Hence in modern systems with hundreds of GB to TB of RAM and IO
rates measured in the multiple GB/s, this is a substantial cost in
terms of page cache efficiency and resource usage when using
bufferheads in the data path.  The benefits to moving from
bufferheads for data IO to iomap for data IO are significant.

However, that's not an easy conversion. There's a lot of work to
validate the intergrity of the IO path whilst making such a change.
It's complex and requires a fair bit of expertise in how the IO path
works, filesystem locking models, internal fs block mapping and
allocation routines, etc. And some filesystems flush data through
the buffer cache or track data writes though their journals via
bufferheads, so actually removing bufferheads for them is not an
easy task.

So we have to consider that maybe it is less work to make high-order
folios work with bufferheads. And that's where we start to get into
the maintenance problems with old filesysetms using bufferheads -
how do we ensure that the changes for high-order folio support in
bufferheads does not break the way one of these old filesystems
that use bufferheads?

That comes down to a simple question: if we can't actually test all
these old filesystems, how do we even know that they work correctly
right now?  Given that we are supposed to be providing some level of
quality assurance to users of these filesystems, are they going to
bve happy with running untested code that nobody really knows if it
works properly or not?

The buffer cache and the fact legacy filesystems use it is the least
of our worries - the problems are with the complex APIs,
architecture and interactions at the intersection point of shared
page cache and filesystem state. The discussion is a reflection on
how difficult it is to change a large, complex code base where
significant portions of it are untestable.

Regardless of which way we end up deciding to move forwards there is
*lots* of work that needs to be done and significant burdens remain
on the people who need to API changes to do get where we need to be.
We want to try to minimise that burden so we can make progress as
fast as possible.

Getting rid of unmaintained, untestable code is low hanging fruit.
Nobody is talking about getting rid of the buffer cache; we can
ensure that the buffer cache continues to work fairly easily; it's
all the other complex code in the filesystems that is the problem.

What we are actually talking about how to manage code which is
unmaintained, possibly broken and which nobody can and/or will fix.
Nobody benefits from the kernel carrying code we can't easily
maintain, test or fix, so working out how to deal with this problem
efficiently is a key part of the decisions that need to be made.

Hence to reduce this whole complex situation to a statement "the
buffer cache is simple and people suggesting we deprecate and remove
it" is a pretty significant misrepresentation the situation we find
ourselves in.

> Was this enough technical information for people?
> 
> And can we now all just admit that anybody who says "remove the buffer
> cache" is so uninformed about what they are speaking of that we can
> just ignore said whining?

Wow. Just wow.

After being called out for abusive behaviour, you immediately call
everyone who disagrees with you "uninformed" and suggest we should
"just ignore said whining"?

Which bit of "this is unacceptable behaviour" didn't you understand,
Linus?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-16 21:50                     ` James Bottomley
  2023-09-17  1:40                       ` NeilBrown
  2023-09-18 14:54                       ` Bill O'Donnell
@ 2023-09-19  2:44                       ` Dave Chinner
  2023-09-19 16:57                         ` James Bottomley
  2 siblings, 1 reply; 97+ messages in thread
From: Dave Chinner @ 2023-09-19  2:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linus Torvalds, Eric Sandeen, Steven Rostedt, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On Sat, Sep 16, 2023 at 05:50:50PM -0400, James Bottomley wrote:
> On Sat, 2023-09-16 at 08:48 +1000, Dave Chinner wrote:
> > On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> [...]
> > >  - "they use the buffer cache".
> > > 
> > > Waah, waah, waah.
> > 
> > .... you dismiss those concerns in the same way a 6 year old school
> > yard bully taunts his suffering victims.
> > 
> > Regardless of the merits of the observation you've made, the tone
> > and content of this response is *completely unacceptable*.  Please
> > keep to technical arguments, Linus, because this sort of response
> > has no merit what-so-ever. All it does is shut down the technical
> > discussion because no-one wants to be the target of this sort of
> > ugly abuse just for participating in a technical discussion.
> > 
> > Given the number of top level maintainers that signed off on the CoC
> > that are present in this forum, I had an expectation that this is a
> > forum where bad behaviour is not tolerated at all.  So I've waited a
> > couple of days to see if anyone in a project leadership position is
> > going to say something about this comment.....
> > 
> > <silence>
> > 
> > The deafening silence of tacit acceptance is far more damning than
> > the high pitched squeal of Linus's childish taunts.
> 
> Well, let's face it: it's a pretty low level taunt and it wasn't aimed
> at you (or indeed anyone on the thread that I could find) and it was
> backed by technical argument in the next sentence.  We all have a
> tendency to let off steam about stuff in general not at people in
> particular as you did here:
> 
> https://lore.kernel.org/ksummit/ZP+vcgAOyfqWPcXT@dread.disaster.area/

There's a massive difference between someone saying no to a wild
proposal with the backing of solid ethical arguments against
experimentation on non-consenting human subjects vs someone calling
anyone who might disagree with them a bunch of cry-babies.

You do yourself a real disservice by claiming these two comments are
in any way equivalent.

> But I didn't take it as anything more than a rant about AI in general
> and syzbot in particular and certainly I didn't assume it was aimed at
> me or anyone else.

I wasn't ranting about AI at all. If you think that was what I was
talking about then you have, once again, completely missed the
point.

I was talking about the *ethics of our current situation* and how
that should dictate the behaviour of community members and bots that
they run for the benefit of the community. If a bot is causing harm
to the community, then ethics dictates that there is only one
reasonable course of action that can be taken...

This has *nothing to do with AI* and everything to do with how the
community polices hostile actors in the community. If 3rd party
run infrastructure is causing direct harm to developers and they
aren't allowed to opt out, then what do we do about it?

> If everyone reached for the code of conduct when someone had a non-
> specific rant using colourful phraseology, we'd be knee deep in
> complaints, which is why we tend to be more circumspect when it
> happens.

I disagree entirely, and I think this a really bad precedent to try
to set. Maybe you see it as "Fred has a colourful way with words",
but that doesn't change the fact the person receiving that comment
might see the same comment very, very differently.

I don't think anyone can dispute the fact that top level kernel
maintainers are repeat offenders when it comes to being nasty,
obnoxious and/or abusive. Just because kernel maintainers have
normalised this behaviour between themselves, it does not make it OK
to treat anyone else this way.

Maintainers need to be held to a higher standard than the rest of
the community - the project leaders need to set the example of how
everyone else should behave, work and act - and right now I am _very
disappointed_ by where this thread has ended up.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 17:26                                 ` Linus Torvalds
  2023-09-18 19:32                                   ` Jiri Kosina
@ 2023-09-19  4:56                                   ` Dave Chinner
  2023-09-25  9:43                                     ` Christoph Hellwig
  1 sibling, 1 reply; 97+ messages in thread
From: Dave Chinner @ 2023-09-19  4:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Theodore Ts'o, NeilBrown, James Bottomley,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Mon, Sep 18, 2023 at 10:26:24AM -0700, Linus Torvalds wrote:
> On Mon, 18 Sept 2023 at 04:14, Jan Kara <jack@suse.cz> wrote:
> >
> > I agree. On the other hand each filesystem we carry imposes some
> > maintenance burden (due to tree wide changes that are happening) and the
> > question I have for some of them is: Do these filesystems actually bring
> > any value?
> 
> I wouldn't be shocked if we could probably remove half of the
> filesystems I listed, and nobody would even notice.

That's the best argument for removing all these old filesystems from
the kernel that anyone has made so far.

As it is, I'm really failing to see how it can be argued
successfully that we can remove ia64 support because it has no users
and is a maintenance burden on kernel developers, but that same
argument doesn't appear to hold any weight when applied to a
filesystem.

What makes filesystems so special we can't end-of-life them like
other kernel code?

[....]

> And that's kind of the other side of the picture: usage matters.
> Something like affs or minixfs might still have a couple of users, but
> those uses would basically be people who likely use Linux to interact
> with some legacy machine they maintain..  So the usage they see would
> mainly be very simple operations.

Having a "couple of occasional users" really does not justify the
ongoing overhead of maintaining those filesystems in working order
as everything else around them in the kernel changes. Removing the
code from the kernel does not deny users access to their data; they
just have to use a different method to access it (e.g. an old
kernel/distro in a vm).

> And that matters for two reasons:
> 
>  (a) we probably don't have to worry about bugs - security or
> otherwise - as much. These are not generally "general-purpose"
> filesystems. They are used for data transfer etc.

By the same argument they could use an old kernel in a VM and not
worry about the security implications of all the unfixed bugs that
might be in that old kernel/distro.

>  (b) if they ever turn painful, we might be able to limit the pain further.

The context that started this whole discussion is that maintenance
of old filesystems is becoming painful after years of largely
being able to ignore them.

.....

> Anyway, based on the *current* situation, I don't actually see the
> buffer cache even _remotely_ painful enough that we'd do even that
> thing. It's not a small undertaking to get rid of the whole
> b_this_page stuff and the complexity that comes from the page being
> reachable through the VM layer (ie writepages etc). So it would be a
> *lot* more work to rip that code out than it is to just support it.

As I keep saying, the issues are not purely constrained to the
buffer cache. It's all the VFS interfaces and structures. It's all
the ops methods that need to be changed. It's all the block layer
interfaces filesystem use. It's all the page and folio interfaces,
and how the filesystems (ab)use them. And so on - it all adds up.

If we're not going to be allowed to remove old filesystems, then how
do we go about avoiding the effort required to keep those old
filesystems up to date with the infrastructure modifications we need
to make for the benefit of millions of users that use modern
filesystems and modern hardware?

Do we just fork all the code and how two versions of things like
bufferheads until all the maintained filesystems have been migrated
away from them? Or something else? 

These are the same type of questions Christoph posed in his OP, yet
this discussion is still not at the point where people have
recognised that these are the problems we need to discuss and
solve....

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-19  1:15                           ` Dave Chinner
@ 2023-09-19  5:17                             ` Matthew Wilcox
  2023-09-19 16:34                               ` Theodore Ts'o
  2023-09-19 22:57                               ` Dave Chinner
  0 siblings, 2 replies; 97+ messages in thread
From: Matthew Wilcox @ 2023-09-19  5:17 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, NeilBrown, James Bottomley, Eric Sandeen,
	Steven Rostedt, Guenter Roeck, Christoph Hellwig, ksummit,
	linux-fsdevel

On Tue, Sep 19, 2023 at 11:15:54AM +1000, Dave Chinner wrote:
> This was easy to do with iomap based filesystems because they don't
> carry per-block filesystem structures for every folio cached in page
> cache - we carry a single object per folio that holds the 2 bits of
> per-filesystem block state we need for each block the folio maps.
> Compare that to a bufferhead - it uses 56 bytes of memory per
> fielsystem block that is cached.

56?1  What kind of config do you have?  It's 104 bytes on Debian:
buffer_head          936   1092    104   39    1 : tunables    0    0    0 : slabdata     28     28      0

Maybe you were looking at a 32-bit system; most of the elements are
word-sized (pointers, size_t or long)

> So we have to consider that maybe it is less work to make high-order
> folios work with bufferheads. And that's where we start to get into
> the maintenance problems with old filesysetms using bufferheads -
> how do we ensure that the changes for high-order folio support in
> bufferheads does not break the way one of these old filesystems
> that use bufferheads?

I don't think we can do it.  Regardless of the question you're proposing
here, the model where we complete a BIO, then walk every buffer_head
attached to the folio to determine if we can now mark the folio as being
(uptodate / not-under-writeback) just doesn't scale when you attach more
than tens of BHs to the folio.  It's one bit per BH rather than having
a summary bitmap like iomap has.

I have been thinking about spitting the BH into two pieces, something
like this:

struct buffer_head_head {
	spinlock_t b_lock;
	struct buffer_head *buffers;
	unsigned long state[];
};

and remove BH_Uptodate and BH_Dirty in favour of setting bits in state
like iomap does.

But, as you say, there are a lot of filesystems that would need to be
audited and probably modified.

Frustratingly, it looks like buffer_heads were intended to be used as
extents; each one has a b_size of its own.  But there's a ridiculous
amount of code that assumes that all BHs attached to a folio have the
same b_size as each other.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-19  5:17                             ` Matthew Wilcox
@ 2023-09-19 16:34                               ` Theodore Ts'o
  2023-09-19 16:45                                 ` Matthew Wilcox
  2023-09-19 22:57                               ` Dave Chinner
  1 sibling, 1 reply; 97+ messages in thread
From: Theodore Ts'o @ 2023-09-19 16:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dave Chinner, Linus Torvalds, NeilBrown, James Bottomley,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Tue, Sep 19, 2023 at 06:17:21AM +0100, Matthew Wilcox wrote:
> Frustratingly, it looks like buffer_heads were intended to be used as
> extents; each one has a b_size of its own.  But there's a ridiculous
> amount of code that assumes that all BHs attached to a folio have the
> same b_size as each other.

The primary reason why we need a per-bh b_size is for the benefit of
non-iomap O_DIRECT code paths.  If that goes away, then we can
simplify this significantly, since we flush the buffer cache whenever
we change the blocksize used in the buffer cache; the O_DIRECT bh's
aren't part of the buffer cache, which is when you might have bh's with
a b_size of 8200k (when doing a 8200k O_DIRECT read or write).

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-19 16:34                               ` Theodore Ts'o
@ 2023-09-19 16:45                                 ` Matthew Wilcox
  2023-09-19 17:15                                   ` Linus Torvalds
  0 siblings, 1 reply; 97+ messages in thread
From: Matthew Wilcox @ 2023-09-19 16:45 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Dave Chinner, Linus Torvalds, NeilBrown, James Bottomley,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Tue, Sep 19, 2023 at 12:34:17PM -0400, Theodore Ts'o wrote:
> On Tue, Sep 19, 2023 at 06:17:21AM +0100, Matthew Wilcox wrote:
> > Frustratingly, it looks like buffer_heads were intended to be used as
> > extents; each one has a b_size of its own.  But there's a ridiculous
> > amount of code that assumes that all BHs attached to a folio have the
> > same b_size as each other.
> 
> The primary reason why we need a per-bh b_size is for the benefit of
> non-iomap O_DIRECT code paths.  If that goes away, then we can
> simplify this significantly, since we flush the buffer cache whenever
> we change the blocksize used in the buffer cache; the O_DIRECT bh's
> aren't part of the buffer cache, which is when you might have bh's with
> a b_size of 8200k (when doing a 8200k O_DIRECT read or write).

I must have not explained myself very well.

What I was trying to say was that if the buffer cache actually supported
it, large folios and buffer_heads wouldn't perform horribly together,
unless you had a badly fragmented file.

eg you could allocate a 256kB folio, then ask the filesystem to
create buffer_heads for it, and maybe it would come back with a list
of four buffer_heads, the first representing the extent from 0-32kB,
the second 32kB-164kB, the third 164kB-252kB and the fourth 252kB-256kB.
Wherever there were physical discontiguities in the file.

Then there would be only four buffer_heads to scan in order to determine
whether the entire folio was uptodate/dirty/written-back/etc.  It's still
O(n^2) but don't underestimate the power of reducing N to a small number.

Possibly we'd want to change buffer_heads a little to support tracking
dirtiness on a finer granularity than per-extent (just as Ritesh
recently did to iomap).  But there is a path to happiness here that
doesn't involve switching everything to iomap.  If I try to do it, I
know I'll break everything while doing it ...

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-19  2:44                       ` Dave Chinner
@ 2023-09-19 16:57                         ` James Bottomley
  0 siblings, 0 replies; 97+ messages in thread
From: James Bottomley @ 2023-09-19 16:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Eric Sandeen, Steven Rostedt, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On Tue, 2023-09-19 at 12:44 +1000, Dave Chinner wrote:
> On Sat, Sep 16, 2023 at 05:50:50PM -0400, James Bottomley wrote:
> > On Sat, 2023-09-16 at 08:48 +1000, Dave Chinner wrote:
> > > On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> > [...]
> > > >  - "they use the buffer cache".
> > > > 
> > > > Waah, waah, waah.
> > > 
> > > .... you dismiss those concerns in the same way a 6 year old
> > > school yard bully taunts his suffering victims.
> > > 
> > > Regardless of the merits of the observation you've made, the tone
> > > and content of this response is *completely unacceptable*. 
> > > Please keep to technical arguments, Linus, because this sort of
> > > response has no merit what-so-ever. All it does is shut down the
> > > technical discussion because no-one wants to be the target of
> > > this sort of ugly abuse just for participating in a technical
> > > discussion.
> > > 
> > > Given the number of top level maintainers that signed off on the
> > > CoC that are present in this forum, I had an expectation that
> > > this is a forum where bad behaviour is not tolerated at all.  So
> > > I've waited a couple of days to see if anyone in a project
> > > leadership position is going to say something about this
> > > comment.....
> > > 
> > > <silence>
> > > 
> > > The deafening silence of tacit acceptance is far more damning
> > > than the high pitched squeal of Linus's childish taunts.
> > 
> > Well, let's face it: it's a pretty low level taunt and it wasn't
> > aimed at you (or indeed anyone on the thread that I could find) and
> > it was backed by technical argument in the next sentence.  We all
> > have a tendency to let off steam about stuff in general not at
> > people in particular as you did here:
> > 
> > https://lore.kernel.org/ksummit/ZP+vcgAOyfqWPcXT@dread.disaster.area/
> 
> There's a massive difference between someone saying no to a wild
> proposal with the backing of solid ethical arguments against
> experimentation on non-consenting human subjects vs someone calling
> anyone who might disagree with them a bunch of cry-babies.
> 
> You do yourself a real disservice by claiming these two comments are
> in any way equivalent.

Well, let's see shall we.  The detrimental impact of an email often
results from the first sentence which is what most people see and react
to especially on the modern display devices like phones.  Pretty much
as you reacted to the first sentence from Linus above.  Your first
sentence in the email I quoted above replying to my idea was:

> No fucking way.

Absent further context that's a textbook stress inducing personal
attack.  Now, I've been on the receiving end of things like this for a
long time, so I simply deployed the usual stress reduction techniques,
read the rest of your email, deleted the knee jerk response and waited
to see if anyone else had a different opinion.

However, the key point is that your email induced the same stress
reaction in me that Linus' statement apparently did in you, so
absolutely I see an equivalence.

> > But I didn't take it as anything more than a rant about AI in
> > general and syzbot in particular and certainly I didn't assume it
> > was aimed at me or anyone else.
> 
> I wasn't ranting about AI at all. If you think that was what I was
> talking about then you have, once again, completely missed the
> point.
> 
> I was talking about the *ethics of our current situation* and how
> that should dictate the behaviour of community members and bots that
> they run for the benefit of the community. If a bot is causing harm
> to the community, then ethics dictates that there is only one
> reasonable course of action that can be taken...
> 
> This has *nothing to do with AI* and everything to do with how the
> community polices hostile actors in the community. If 3rd party
> run infrastructure is causing direct harm to developers and they
> aren't allowed to opt out, then what do we do about it?

My point was basically trying to improve the current situation by
getting the AI processes producing the reports to make them more useful
and eliminate a significant portion of the outbound flow, but I get
that some people are beyond that and would go for amputation rather
than attempted cure.

> > If everyone reached for the code of conduct when someone had a non-
> > specific rant using colourful phraseology, we'd be knee deep in
> > complaints, which is why we tend to be more circumspect when it
> > happens.
> 
> I disagree entirely, and I think this a really bad precedent to try
> to set. Maybe you see it as "Fred has a colourful way with words",
> but that doesn't change the fact the person receiving that comment
> might see the same comment very, very differently.

Which was precisely my point about your email above.  At its most basic
level the standard you hold yourself to has to be the same or better
than the standard you hold others to.

James

> I don't think anyone can dispute the fact that top level kernel
> maintainers are repeat offenders when it comes to being nasty,
> obnoxious and/or abusive. Just because kernel maintainers have
> normalised this behaviour between themselves, it does not make it OK
> to treat anyone else this way.
> 
> Maintainers need to be held to a higher standard than the rest of
> the community - the project leaders need to set the example of how
> everyone else should behave, work and act - and right now I am _very
> disappointed_ by where this thread has ended up.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-19 16:45                                 ` Matthew Wilcox
@ 2023-09-19 17:15                                   ` Linus Torvalds
  0 siblings, 0 replies; 97+ messages in thread
From: Linus Torvalds @ 2023-09-19 17:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, Dave Chinner, NeilBrown, James Bottomley,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On Tue, 19 Sept 2023 at 09:45, Matthew Wilcox <willy@infradead.org> wrote:
>
> What I was trying to say was that if the buffer cache actually supported
> it, large folios and buffer_heads wouldn't perform horribly together,
> unless you had a badly fragmented file.

I think it would work in theory... I don't see a _practical_ example
of a filesystem that would use it, but maybe you had something
specific in mind?

> eg you could allocate a 256kB folio, then ask the filesystem to
> create buffer_heads for it, and maybe it would come back with a list
> of four buffer_heads, the first representing the extent from 0-32kB,
> the second 32kB-164kB, the third 164kB-252kB and the fourth 252kB-256kB.
> Wherever there were physical discontiguities in the file.

That *is* technically something that the buffer cache supports, but I
don't think it has ever been done.

So while it's technically possible, it's never been tested, so it
would almost certainly show some (potentially serious) issues.

And we obviously don't have the helper functions to create such a list
of buffer heads (ie all the existing "grow buffers" just take one size
and create a uniform set of buffers in the page/folio).

                 Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-19  5:17                             ` Matthew Wilcox
  2023-09-19 16:34                               ` Theodore Ts'o
@ 2023-09-19 22:57                               ` Dave Chinner
  1 sibling, 0 replies; 97+ messages in thread
From: Dave Chinner @ 2023-09-19 22:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, NeilBrown, James Bottomley, Eric Sandeen,
	Steven Rostedt, Guenter Roeck, Christoph Hellwig, ksummit,
	linux-fsdevel

On Tue, Sep 19, 2023 at 06:17:21AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 11:15:54AM +1000, Dave Chinner wrote:
> > This was easy to do with iomap based filesystems because they don't
> > carry per-block filesystem structures for every folio cached in page
> > cache - we carry a single object per folio that holds the 2 bits of
> > per-filesystem block state we need for each block the folio maps.
> > Compare that to a bufferhead - it uses 56 bytes of memory per
> > fielsystem block that is cached.
> 
> 56?1  What kind of config do you have?  It's 104 bytes on Debian:
> buffer_head          936   1092    104   39    1 : tunables    0    0    0 : slabdata     28     28      0
> 
> Maybe you were looking at a 32-bit system; most of the elements are
> word-sized (pointers, size_t or long)

Perhaps so, it's been years since I actually paid attention to the
exact size of a bufferhead (XFS completely moved away from them back
in 2018). Regardless, underestimating the size of the bufferhead
doesn't materially change the reasons iomap is a better choice for
filesystems running on modern storage hardware...

> > So we have to consider that maybe it is less work to make high-order
> > folios work with bufferheads. And that's where we start to get into
> > the maintenance problems with old filesysetms using bufferheads -
> > how do we ensure that the changes for high-order folio support in
> > bufferheads does not break the way one of these old filesystems
> > that use bufferheads?
> 
> I don't think we can do it.  Regardless of the question you're proposing
> here, the model where we complete a BIO, then walk every buffer_head
> attached to the folio to determine if we can now mark the folio as being
> (uptodate / not-under-writeback) just doesn't scale when you attach more
> than tens of BHs to the folio.  It's one bit per BH rather than having
> a summary bitmap like iomap has.

*nod*

I said as much earlier in the email:

"The pointer chasing model per-block bufferhead iteration requires
to update state and retrieve mapping information just does not scale
to marshalling millions of objects a second through the page cache."


> I have been thinking about spitting the BH into two pieces, something
> like this:
> 
> struct buffer_head_head {
> 	spinlock_t b_lock;
> 	struct buffer_head *buffers;
> 	unsigned long state[];
> };
> 
> and remove BH_Uptodate and BH_Dirty in favour of setting bits in state
> like iomap does.

Yes, that woudl make it similar to the way iomap works, but I think
that then creates more problems in that bufferhead state is used for
per-block locking and blocking waits. I don't really want to think
about much more how complex stuff like __block_write_full_folio()
becomes with this model...

> But, as you say, there are a lot of filesystems that would need to be
> audited and probably modified.

Yes, this is the common problem all these "modernise old API" ideas
end up at - this is the primary issue that needs to be sorted out,
and we're no closer to that now than when the thread started.

We can deal with this problem for filesystems that we can test. For
stuff we can't test and verify, then we really have to start
considering the larger picture around shipping unverified code to
users.

Go read this article on LWN about new EU laws for software
development that aren't that far off being passed into law:

https://lwn.net/Articles/944300/

And it's clear that there are also current policy discussions going
through the US federal government that are, most likely, going to
end up in a similar place with respect to secure development
practices for critical software infrastructure like the Linux
kernel.

Now combine that with this one about the problem of bogus CVEs
(which could have been written about syzbot and filesystems!):

https://lwn.net/Articles/944209/

And it's pretty clear that the current issues with unmaintained code
will only get worse from here. All it will take is a CVE to be
issued on one of these unmaintained filesystems, and the safest
thing for us to do will be to remove the code to remove all
potential liability for it.

The basic message is that we aren't going to be able to ignore code
that we can't substantially verify for much longer.  We simply won't
have a choice about the code we ship: if is not testable and
verified to the best of our abilities then nobody will risk
shipping it regardless of whether they have users or not.

That's the model the cybersecurity-industrial complex is pushing us
towards whether we like it or not. If this is the future in which we
develop software, then this has substantial impact on any discussion
about how to manage old unmaintained, untestable code in any project
we work on, not just the Linux kernel...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-13 17:03                 ` Linus Torvalds
  2023-09-15 22:48                   ` Dave Chinner
@ 2023-09-25  9:38                   ` Christoph Hellwig
  2023-09-25 14:14                     ` Dan Carpenter
  2023-09-25 16:50                     ` Linus Torvalds
  1 sibling, 2 replies; 97+ messages in thread
From: Christoph Hellwig @ 2023-09-25  9:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Sandeen, Steven Rostedt, Dave Chinner, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> I haven't actually heard a good reason to really stop supporting
> these. Using some kind of user-space library is ridiculous. It's *way*
> more effort than just keeping them in the kernel. So anybody who says
> "just move them to user space" is just making things up.
> 
> The reasons I have heard are:
> 
>  - security
> 
> Yes, don't enable them, and if you enable them, don't auto-mount them
> on hot-pkug devices. Simple. People in this thread have already
> pointed to the user-space support for it happening.

Which honetly doesn't work, as the status will change per kernel
version.  If we are serius about it we need proper in-kernel flagging.

>  - syzbot issues.
> 
> Ignore them for affs & co.

And still get spammed?  Again, we need some good common way to stop
them even bothering instead of wasting their and our resources.

> 
>  - "they use the buffer cache".
> 
> Waah, waah, waah. The buffer cache is *trivial*. If you don't like the
> buffer cache, don't use it. It's that simple.

FYI, IFF this is a response to my original mail and not some of the
weirder ideas floating around on the lists, I've never said remove the
buffer cache, quite to the contrary.  What is problematic, and what I
want to go away is the buffer_head based helpers for while I/O.  Which
unlike using buffer_heads for the actual metadata buffer_cache has
very useful replacements already.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-19  4:56                                   ` Dave Chinner
@ 2023-09-25  9:43                                     ` Christoph Hellwig
  0 siblings, 0 replies; 97+ messages in thread
From: Christoph Hellwig @ 2023-09-25  9:43 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Jan Kara, Theodore Ts'o, NeilBrown,
	James Bottomley, Eric Sandeen, Steven Rostedt, Guenter Roeck,
	Christoph Hellwig, ksummit, linux-fsdevel

On Tue, Sep 19, 2023 at 02:56:45PM +1000, Dave Chinner wrote:
> That's the best argument for removing all these old filesystems from
> the kernel that anyone has made so far.
> 
> As it is, I'm really failing to see how it can be argued
> successfully that we can remove ia64 support because it has no users
> and is a maintenance burden on kernel developers, but that same
> argument doesn't appear to hold any weight when applied to a
> filesystem.
> 
> What makes filesystems so special we can't end-of-life them like
> other kernel code?

Yepp.  And I don't want to remove them against major objections.  If
we even have a single user that actually signs up to do basic QA
I think it's fair game to keep it.  Similar to how we deal with most
drivers (except for some subsystems like net that seemed to be a lot
more aggressive in their removal schedules).


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-25  9:38                   ` Christoph Hellwig
@ 2023-09-25 14:14                     ` Dan Carpenter
  2023-09-25 16:50                     ` Linus Torvalds
  1 sibling, 0 replies; 97+ messages in thread
From: Dan Carpenter @ 2023-09-25 14:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Eric Sandeen, Steven Rostedt, Dave Chinner,
	Guenter Roeck, ksummit, linux-fsdevel

On Mon, Sep 25, 2023 at 02:38:39AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> >  - syzbot issues.
> > 
> > Ignore them for affs & co.
> 
> And still get spammed?  Again, we need some good common way to stop
> them even bothering instead of wasting their and our resources.

A couple people have suggested adding a pr_warn() in mount.  But another
idea is we could add a taint flag.  That's how we used to ignore bugs in
binary out of tree drivers.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-25  9:38                   ` Christoph Hellwig
  2023-09-25 14:14                     ` Dan Carpenter
@ 2023-09-25 16:50                     ` Linus Torvalds
  1 sibling, 0 replies; 97+ messages in thread
From: Linus Torvalds @ 2023-09-25 16:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Sandeen, Steven Rostedt, Dave Chinner, Guenter Roeck,
	ksummit, linux-fsdevel

On Mon, 25 Sept 2023 at 02:38, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Sep 13, 2023 at 10:03:55AM -0700, Linus Torvalds wrote:
> >
> > Yes, don't enable them, and if you enable them, don't auto-mount them
> > on hot-pkug devices. Simple. People in this thread have already
> > pointed to the user-space support for it happening.
>
> Which honetly doesn't work, as the status will change per kernel
> version.  If we are serius about it we need proper in-kernel flagging.

That would be good, I agree.

The obvious place to do it would be in /proc/filesystems, which is
very under-utilized right now. But I assume we have tools that parse
it and adding fields to it would break.

The alternative might be to add "hints" to the mount options, and just
have the kernel then react to them.

IOW, the same way we have "mount read-only" - which is not just a
semantic flag - the kernel also obviously *requires* read-only mediums
to be mounted that way, we could have some kind of "mount a
non-trusted medium", and the kernel could say "this filesystem can not
do that" on a per-filesystem basis.

                 Linus

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-11  1:05         ` Dave Chinner
  2023-09-11  1:29           ` Kent Overstreet
@ 2023-09-26  5:24           ` Eric W. Biederman
  1 sibling, 0 replies; 97+ messages in thread
From: Eric W. Biederman @ 2023-09-26  5:24 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Kent Overstreet, James Bottomley, Matthew Wilcox,
	Christoph Hellwig, ksummit, linux-fsdevel

Dave Chinner <david@fromorbit.com> writes:

> On Sat, Sep 09, 2023 at 06:42:30PM -0400, Kent Overstreet wrote:
>> On Sat, Sep 09, 2023 at 08:50:39AM -0400, James Bottomley wrote:
>> > So why can't we figure out that easier way? What's wrong with trying to
>> > figure out if we can do some sort of helper or library set that assists
>> > supporting and porting older filesystems. If we can do that it will not
>> > only make the job of an old fs maintainer a lot easier, but it might
>> > just provide the stepping stones we need to encourage more people climb
>> > up into the modern VFS world.
>> 
>> What if we could run our existing filesystem code in userspace?
>
> You mean like lklfuse already enables?
>
> https://github.com/lkl/linux
>
> Looks like the upstream repo is currently based on 6.1, so there's
> already a mechanism to use relatively recent kernel filesystem
> implementations as a FUSE filesystem without needed to support a
> userspace code base....

At a practical level I think it might be better to start with
https://libguestfs.org/.

The libguestfs code already has fuse support and already ships in common
linux distros.

If I read the documentation correctly libguestfs already has a mode
where it runs an existing kernel under qemu to access any filesystem
the kernel running in qemu supports.


Unless I am misunderstanding something all that needs to happen with
libguestfs is for someone to do the work to get userspace to mount
external untrusted filesystems with it (by default), and for
unprivileged containers to use it to mount filesystems the container
would like to use.


Be it libguestfs or lklfuse I think the real challenge is for someone to
do all of the work so that whatever solution is chosen it is there in
common situations (aka usb sticks and containers), the filesystems
developers know it is there, and the security folks know it is there.


For the long tail of rare filesystems simply having something that is
the recommended way of using the filesystem and works without friction
is the real challenge to get to.

Eric

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-18 11:14                               ` Jan Kara
  2023-09-18 17:26                                 ` Linus Torvalds
@ 2023-09-27 22:23                                 ` Dave Kleikamp
  1 sibling, 0 replies; 97+ messages in thread
From: Dave Kleikamp @ 2023-09-27 22:23 UTC (permalink / raw)
  To: Jan Kara, Linus Torvalds
  Cc: Theodore Ts'o, NeilBrown, James Bottomley, Dave Chinner,
	Eric Sandeen, Steven Rostedt, Guenter Roeck, Christoph Hellwig,
	ksummit, linux-fsdevel

On 9/18/23 6:14AM, Jan Kara wrote:

> I agree. On the other hand each filesystem we carry imposes some
> maintenance burden (due to tree wide changes that are happening) and the
> question I have for some of them is: Do these filesystems actually bring
> any value? I.e., are there still any users left? Sadly that's quite
> difficult to answer so people do bother only if the pain is significant
> enough like in the case of reiserfs. But I suspect we could get rid of a
> few more without a real user complaining (e.g. Shaggy said he'd be happy to
> deprecate JFS and he's not aware of any users).

Sorry for the late response, but just catching up on this discussion.

When I did make that statement, I did get a response that there actually 
are some users out there that want JFS to stay.

Every few weeks I do try to catch up JFS things. I can try to do so 
weekly. (We'll see how that goes.) I'm not promising and major revisions 
though.

Shaggy


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
  2023-09-12  9:50               ` Richard Biener
@ 2023-10-23  5:19                 ` Eric Gallager
  0 siblings, 0 replies; 97+ messages in thread
From: Eric Gallager @ 2023-10-23  5:19 UTC (permalink / raw)
  To: Richard Biener
  Cc: Segher Boessenkool, Dan Carpenter, Steven Rostedt, Dave Chinner,
	Guenter Roeck, Christoph Hellwig, ksummit, linux-fsdevel,
	gcc-patches

On Tue, Sep 12, 2023 at 5:53 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Sep 7, 2023 at 2:32 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Thu, Sep 07, 2023 at 02:23:00PM +0300, Dan Carpenter wrote:
> > > On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote:
> > > > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> > > > > I started to hunt
> > > > > down all the Makefile which add a -Werror but there are a lot and
> > > > > eventually I got bored and gave up.
> > > >
> > > > I have a patch stack for that, since 2014 or so.  I build Linux with
> > > > unreleased GCC versions all the time, so pretty much any new warning is
> > > > fatal if you unwisely use -Werror.
> > > >
> > > > > Someone should patch GCC so there it checks an environment variable to
> > > > > ignore -Werror.  Somethine like this?
> > > >
> > > > No.  You should patch your program, instead.
> > >
> > > There are 2930 Makefiles in the kernel source.
> >
> > Yes.  And you need patches to about thirty.  Or a bit more, if you want
> > to do it more cleanly.  This isn't a guess.
> >
> > > > One easy way is to add a
> > > > -Wno-error at the end of your command lines.  Or even just -w if you
> > > > want or need a bigger hammer.
> > >
> > > I tried that.  Some of the Makefiles check an environemnt variable as
> > > well if you want to turn off -Werror.  It's not a complete solution at
> > > all.  I have no idea what a complete solution looks like because I gave
> > > up.
> >
> > A solution can not involve changing the compiler.  That is just saying
> > the kernel doesn't know how to fix its own problems, so let's give the
> > compiler some more unnecessary problems.
>
> You can change the compiler by replacing it with a script that appends
> -Wno-error
> for example.

I personally would find the original proposal of an IGNORE_WERROR
environment variable much simpler than any of the alternative proposed
solutions, especially for complicated build systems where I can't tell
where the "-Werror" is getting inserted from. Often times I'm not
actually the developer of the package I'm trying to compile, so saying
"fix your code" in such a case doesn't make sense, since it's not
actually my code to fix in the first place. It would be much easier
for end-users in such a situation to just set an environment variable,
rather than asking them to try to become developers themselves, which
is what some of these alternative proposals (such as "write your own
script!") seem to be asking.

>
> > > > Or nicer, put it all in Kconfig, like powerpc already has for example.
> > > > There is a CONFIG_WERROR as well, so maybe use that in all places?
> > >
> > > That's a good idea but I'm trying to compile old kernels and not the
> > > current kernel.
> >
> > You can patch older kernels, too, you know :-)
> >
> > If you need to not make any changes to your source code for some crazy
> > reason (political perhaps?), just use a shell script or shell function
> > instead of invoking the compiler driver directly?
> >
> >
> > Segher
> >
> > Segher

^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2023-10-23  5:19 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-30 14:07 [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems Christoph Hellwig
2023-09-05 23:06 ` Dave Chinner
2023-09-05 23:23   ` Matthew Wilcox
2023-09-06  2:09     ` Dave Chinner
2023-09-06 15:06       ` Christian Brauner
2023-09-06 15:59         ` Christian Brauner
2023-09-06 19:09         ` Geert Uytterhoeven
2023-09-08  8:34         ` Christoph Hellwig
2023-09-07  0:46     ` Bagas Sanjaya
2023-09-09 12:50     ` James Bottomley
2023-09-09 15:44       ` Matthew Wilcox
2023-09-10 19:51         ` James Bottomley
2023-09-10 20:19           ` Kent Overstreet
2023-09-10 21:15           ` Guenter Roeck
2023-09-11  3:10           ` Theodore Ts'o
2023-09-11 19:03             ` James Bottomley
2023-09-12  0:23               ` Dave Chinner
2023-09-12 16:52             ` H. Peter Anvin
2023-09-09 22:42       ` Kent Overstreet
2023-09-10  8:19         ` Geert Uytterhoeven
2023-09-10  8:37           ` Bernd Schubert
2023-09-10 16:35           ` Kent Overstreet
2023-09-10 17:26             ` Geert Uytterhoeven
2023-09-10 17:35               ` Kent Overstreet
2023-09-11  1:05         ` Dave Chinner
2023-09-11  1:29           ` Kent Overstreet
2023-09-11  2:07             ` Dave Chinner
2023-09-11 13:35               ` David Disseldorp
2023-09-11 17:45                 ` Bart Van Assche
2023-09-11 19:11                   ` David Disseldorp
2023-09-11 23:05                 ` Dave Chinner
2023-09-26  5:24           ` Eric W. Biederman
2023-09-08  8:55   ` Christoph Hellwig
2023-09-08 22:47     ` Dave Chinner
2023-09-06 22:32 ` Guenter Roeck
2023-09-06 22:54   ` Dave Chinner
2023-09-07  0:53     ` Bagas Sanjaya
2023-09-07  3:14       ` Dave Chinner
2023-09-07  1:53     ` Steven Rostedt
2023-09-07  2:22       ` Dave Chinner
2023-09-07  2:51         ` Steven Rostedt
2023-09-07  3:26           ` Matthew Wilcox
2023-09-07  8:04             ` Thorsten Leemhuis
2023-09-07 10:29               ` Christian Brauner
2023-09-07 11:18                 ` Thorsten Leemhuis
2023-09-07 12:04                   ` Matthew Wilcox
2023-09-07 12:57                   ` Guenter Roeck
2023-09-07 13:56                     ` Christian Brauner
2023-09-08  8:44                     ` Christoph Hellwig
2023-09-07  3:38           ` Dave Chinner
2023-09-07 11:18             ` Steven Rostedt
2023-09-13 16:43               ` Eric Sandeen
2023-09-13 16:58                 ` Guenter Roeck
2023-09-13 17:03                 ` Linus Torvalds
2023-09-15 22:48                   ` Dave Chinner
2023-09-16 19:44                     ` Steven Rostedt
2023-09-16 21:50                     ` James Bottomley
2023-09-17  1:40                       ` NeilBrown
2023-09-17 17:30                         ` Linus Torvalds
2023-09-17 18:09                           ` Linus Torvalds
2023-09-17 18:57                           ` Theodore Ts'o
2023-09-17 19:45                             ` Linus Torvalds
2023-09-18 11:14                               ` Jan Kara
2023-09-18 17:26                                 ` Linus Torvalds
2023-09-18 19:32                                   ` Jiri Kosina
2023-09-18 19:59                                     ` Linus Torvalds
2023-09-18 20:50                                       ` Theodore Ts'o
2023-09-18 22:48                                         ` Linus Torvalds
2023-09-18 20:33                                     ` H. Peter Anvin
2023-09-19  4:56                                   ` Dave Chinner
2023-09-25  9:43                                     ` Christoph Hellwig
2023-09-27 22:23                                 ` Dave Kleikamp
2023-09-19  1:15                           ` Dave Chinner
2023-09-19  5:17                             ` Matthew Wilcox
2023-09-19 16:34                               ` Theodore Ts'o
2023-09-19 16:45                                 ` Matthew Wilcox
2023-09-19 17:15                                   ` Linus Torvalds
2023-09-19 22:57                               ` Dave Chinner
2023-09-18 14:54                       ` Bill O'Donnell
2023-09-19  2:44                       ` Dave Chinner
2023-09-19 16:57                         ` James Bottomley
2023-09-25  9:38                   ` Christoph Hellwig
2023-09-25 14:14                     ` Dan Carpenter
2023-09-25 16:50                     ` Linus Torvalds
2023-09-07  9:48       ` Dan Carpenter
2023-09-07 11:04         ` Segher Boessenkool
2023-09-07 11:22           ` Steven Rostedt
2023-09-07 12:24             ` Segher Boessenkool
2023-09-07 11:23           ` Dan Carpenter
2023-09-07 12:30             ` Segher Boessenkool
2023-09-12  9:50               ` Richard Biener
2023-10-23  5:19                 ` Eric Gallager
2023-09-08  8:39       ` Christoph Hellwig
2023-09-08  8:38     ` Christoph Hellwig
2023-09-08 23:21       ` Dave Chinner
2023-09-07  0:48   ` Bagas Sanjaya
2023-09-07  3:07     ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.