From: Amir Goldstein <amir73il@gmail.com>
To: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Linux MM <linux-mm@kvack.org>, Jan Kara <jack@suse.com>,
	 Matthew Bobrowski <repnop@google.com>,
	Khazhismel Kumykov <khazhy@google.com>,
	kernel@collabora.com
Subject: Re: [PATCH 0/2] shmem: Notify user space when file system is full
Date: Tue, 11 Jan 2022 09:50:42 +0200
Message-ID: <CAOQ4uxiD1k+7F7gDpmS1nBFVfDz2evy+Ep=9XCOKRuDF7sAEJQ@mail.gmail.com>
In-Reply-To: <87fspv9gr2.fsf@collabora.com>

On Tue, Jan 11, 2022 at 3:57 AM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> writes:
>
> > Two things bother me about this proposal.
> > One is that it makes more sense IMO to report ENOSPC events
> > from vfs code.
>
> Hi Amir,
>
> I reimplemented this with FS_WB_ERROR in the branch below. It reports
> writeback errors on mapping_set_error, as suggested.
>
>   https://gitlab.collabora.com/krisman/linux/-/tree/wb-error
>
> It is a WIP, and I'm not proposing it yet, because I'm thinking about the
> ENOSPC case a bit more...
>
> > Why should the requirement to monitor ENOSPC conditions be specific to tmpfs?
> > Especially, as I mentioned, there are already wrappers in place to report
> > writeback errors on an inode (mapping_set_error), where the fsnotify hook
> > can fit nicely.
>
> mapping_set_error would trigger the ENOSPC event only when it happens on
> an actual writeback error (i.e. BLK_STS_NOSPC), which is not the main
> case I'm solving here.  In fact, most of the time, -ENOSPC will happen
> before any IO is submitted, for instance, if an inode could not be
> allocated during .create() or a block can't be allocated in
> .write_begin(). In this case, it isn't really a writeback error
> (semantically), and it is not registered as such by any file system.
>

I see.
But the question remains: what is so special about shmem that
your use case requires fsnotify events to handle ENOSPC?

Many systems are deployed on thin-provisioned storage these days,
and monitoring the state of the storage to alert the administrator
before it gets full (be it filesystem inodes, blocks, or thinp space)
is crucial.

Since the ENOSPC event that you are proposing is asynchronous
anyway, what is the problem with polling statfs() and /proc/meminfo?
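
Something as simple as this untested sketch would already cover the
accounting-based cases (the mount point and the 5% threshold are
made up for illustration):

#include <stdio.h>
#include <unistd.h>
#include <sys/statfs.h>

int main(void)
{
	struct statfs st;

	for (;;) {
		/* alert when free blocks or free inodes drop below 5% */
		if (statfs("/mnt/shmem", &st) == 0 &&
		    (st.f_bfree * 100 < st.f_blocks * 5 ||
		     st.f_ffree * 100 < st.f_files * 5))
			fprintf(stderr, "low on free blocks or inodes\n");
		sleep(5);
	}
	return 0;
}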

I guess one difference is that it is harder to predict the page allocation
failure that causes ENOSPC in shmem, but IIUC, your patch does not report
an fsevent in that case, only on inode/block accounting errors.
Or maybe I did not understand it correctly?

In a sense, a shmem ENOSPC error that is not caused by an accounting
error is similar to an EIO error on legacy storage, or on thin-provisioned
storage that the end user cannot monitor. Right?

> We can treat those under the same FAN_WB_ERROR mask, but that is a bit
> weird.  Maybe we should have a mask specifically for ENOSPC, or an
> "IO error" mask?

If we go forward with this, I do like FAN_IO_ERROR for both
writeback errors and general vfs errors, because what matters
is the action the event requires from the listener.

A listener that subscribes to FAN_FS_ERROR would likely react by
running fsck or the like.
A listener that subscribes to FAN_IO_ERROR would likely react by
checking the storage usage and/or the application logs.
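
For reference, subscribing to FAN_FS_ERROR (already in v5.16) looks
roughly like the untested sketch below; a hypothetical FAN_IO_ERROR
listener would differ only in the mask bit and in how it reacts
(the /mnt path is made up):

#include <err.h>
#include <fcntl.h>
#include <sys/fanotify.h>

int main(void)
{
	int fd;

	/* FAN_FS_ERROR requires an FID-reporting notification group */
	fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_FID, 0);
	if (fd < 0)
		err(1, "fanotify_init");

	/* watch every object on the filesystem backing /mnt */
	if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
			  FAN_FS_ERROR, AT_FDCWD, "/mnt"))
		err(1, "fanotify_mark");

	/* ... read(2) events here and react (fsck, check usage, ...) */
	return 0;
}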

>
> The patchset is quite trivial, but it has to touch many places in the VFS
> (say, to catch ENOSPC on create, fallocate, write, writev, etc.). Please
> see the branch above for what that would look like.
>
> Is that a viable solution?  Are you ok with reporting those cases under
> the same writeback mask?
>

I am not the one making the calls in vfs ;)
If I were, I would have asked you to explain your use case better:
why do you need this for shmem, and why would anyone need it for any
other fs?

Why can't the same result be achieved by polling existing APIs?
I think you will need to present a very strong case for the wide vfs change.
If your case is strong enough only for shmem, then perhaps we can
accept that the filesystem (and not the vfs) is responsible for reporting
FAN_IO_ERROR, and that there are no clear semantics about when a
user can expect FAN_IO_ERROR from any given filesystem, but that
seems strange.

> Btw, I'm thinking it could be tidy to transform many of those VFS calls,
> from
>
>         if (!error)
>                 fsnotify_modify(file);
>         else if (error == -ENOSPC)
>                 fsnotify_nospace(file);
>
> Into unconditionally calling the fsnotify_modify hook, which sends
> the correct event depending on the operation result:
>
>         void fsnotify_modify(int error, struct file *file)
>         {
>                 if (likely(!error)) {
>                         fsnotify_file(file, FS_MODIFY);
>                 } else if (error == -ENOSPC) {
>                         fsnotify_wb_error(file);
>                 }
>         }
>
> (same for fsnotify_mkdir, fsnotify_open, etc).
>
> Would you be ok with that?
>

If we were to go down that path, I would prefer the latter,
because it forces modifying all call sites and keeps the
logic in one place. And apropos logic: if we do that,
we should not handle only ENOSPC - IMO at least EIO
should get the same treatment.
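
Roughly like this (only a sketch; fsnotify_io_error() is a
hypothetical helper, not an existing hook):

static inline void fsnotify_modify(int error, struct file *file)
{
	if (likely(!error))
		fsnotify_file(file, FS_MODIFY);
	else if (error == -ENOSPC || error == -EIO)
		fsnotify_io_error(file);	/* hypothetical helper */
}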

>
> > I understand that you wanted to differentiate errors caused by memory
> > pressure, but I don't understand why it makes sense for a filesystem monitor
> > to get different feedback than the writing application.
>
> Maybe the differentiation of those two cases could be done as part of
> the file system specific payload that I wanted to write for
> FAN_FS_ERROR?  If so, it would be easier for the ENOSPC event trigger to
> be implemented at the filesystem level.
>

Certainly. If we restrict the scope of those events to shmem,
we do not need a new event type. It's enough to use FAN_FS_ERROR
with a specific payload, as you wanted.
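
For reference, FAN_FS_ERROR already carries an error info record
(this is the struct from include/uapi/linux/fanotify.h), so a
shmem-specific cause could ride along in an additional record of
the same shape:

struct fanotify_event_info_error {
	struct fanotify_event_info_header hdr;	/* FAN_EVENT_INFO_TYPE_ERROR */
	__s32 error;		/* errno of the first error */
	__u32 error_count;	/* errors seen since the last event was read */
};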
But I still need convincing why this shmem ENOSPC is such a special case.

Thanks,
Amir.



Thread overview: 11+ messages
2021-11-16 22:07 [PATCH 0/2] shmem: Notify user space when file system is full Gabriel Krisman Bertazi
2021-11-16 22:07 ` [PATCH 1/2] shmem: Differentiate cause of blk account error due to lack of space Gabriel Krisman Bertazi
2021-11-16 22:07 ` [PATCH 2/2] shmem: Trigger FS_ERROR notification when file system is full Gabriel Krisman Bertazi
2021-11-17  9:00 ` [PATCH 0/2] shmem: Notify user space " Amir Goldstein
2022-01-11  1:57   ` Gabriel Krisman Bertazi
2022-01-11  7:50     ` Amir Goldstein [this message]
2022-01-12  3:19       ` Gabriel Krisman Bertazi
2022-01-12  5:59         ` Amir Goldstein
2022-01-14 20:17           ` Gabriel Krisman Bertazi
2022-01-14 22:16             ` Khazhy Kumykov
2022-01-15 11:30               ` Amir Goldstein
