linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: David Howells <dhowells@redhat.com>
Cc: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/32] VFS: Introduce filesystem context [ver #8]
Date: Mon, 18 Jun 2018 16:33:41 -0500	[thread overview]
Message-ID: <8736xjdbka.fsf@xmission.com> (raw)
In-Reply-To: <3949.1529353850@warthog.procyon.org.uk> (David Howells's message of "Mon, 18 Jun 2018 21:30:50 +0100")

David Howells <dhowells@redhat.com> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> I have read through these patches and I noticed a significant issue.
>> 
>> Today in mount_bdev we do something that looks like:
>> 
>> mount_bdev(...)
>> {
>> 	s = sget(..., bdev);
>> 	if (s->s_root) {
>>         	/* Noop */
>>         } else {
>>         	err = fill_super(s, ...);
>>                 if (err) {
>>                 	deactivate_locked_super(s);
>>                         return ERR_PTR(err);
>>                 }
>>                 s->s_flags |= SB_ATTIVE;
>>                 bdev->bd_super = s;
>>         }
>>         return dget(s->s_root);
>> }
>> 
>> The key point is that we don't process the mount options at all if
>> a super block already exists in the kernel.  Similar to what
>> your fscontext changes are doing (after parsing the options).
>
> Actually, no, that's not the case.
>
> The fscontext code *requires* you to parse the parameters *before* any attempt
> to access the superblock is made.  Note that this will actually be a problem
> for, say, ext4 which passes a text string stored in the superblock through the
> parser *before* parsing the mount syscall data.  Fun.
>
> I'm intending to deal with that particular case by having ext4 create multiple
> private contexts, one filled in from the user data, and then a second one
> filled in from the superblock string.  These can then be validated one against
> the other before the super_block struct is published.
>
> And if the super_block struct already exists, the user's specified parameters
> can be validated against that.

I did not say parse.  I said process.   My meaning is that today
on a second mount of a filesystem like ext2.  But really most of them we
ignore those options.  

>> Your fscontext changes do not improve upon this area of the mount api at
>> all and that concerns me.  This is an area where people can and already
>> do shoot themselves in their feet.
>
> This *will* be dealt with, but I wanted to get the core changes upstream
> before tackling all the filesystems.  The legacy wrapper is just that and
> should be got rid of when all the filesystems have been converted.

This is an area where you are making things explicit, where before
there was no way to talk about this.   For filesystems that are in
legacy mode I realize we might miss this corner case, but we should
still think about it and handle it.  The proc filesystem has the same
behavior and it is one you are converting.

>> ...
>>
>> Creating a new mount and finding an old mount are the same operation in
>> the kernel today.  This is fundamentally confusing.  In the new api
>> could we please separate these two operations?
>>
>> Perhaps someting like:
>> x create
>> x find
>> 
>> With the "x create" case failing if the filesystem already exists,
>> still allowing "x find"?  And with the "x find" case failing if
>> the superblock is not already created in the kernel.
>
> No.  What you're suggesting introduces a userspace-userspace and a
> userspace-kernel race - unless you're willing to let userspace lock against
> superblock creation by other parties.
>
> Further, some filesystems *have* to be parameterised before you can do the
> check for the superblock.  Network filesystems, for example, where you have to
> set the network parameters upfront and the key to the superblock might not be
> known until you've queried the server.

I am not talking about skipping the parameterization.  I am talking
about actually acting on those options.  Parsing and validating them
ahead of time is not my concern.  When we make the super block
honor those options is my concern.

Further I am not suggesting something that has a meaningful race.
I am suggesting some that is the equivalent of the O_EXCL logic.
I am proposing that "x create" fail if the superblock already exists
in the kernel.  I am proposing that "x find" will fail if the
superblock does not already exist.

In the worst case you have to iterate a time or two when another
user is racing with you to create the super block.  But this
gives you very valuable information.  Knowledge of if the superblock
is honoring all of your specified mount options or not.

It removes an existing nasty race today where people think they mount a
filesystem like "proc" with one set of options and those options are
ignored because an internal kernel mount already exists.

This is at the level of the fscontext API.

I don't care what filesystems that have not been updated to fscontext
do. I just want to avoid the nasty nasty confusion that is possible
with the existing API.

My motivation is I am in the middle of closing a regression in option
parsing in proc that caused a security option to get ignored.

I would be happy even with a result value of "x create" that told
reported if the superbloc "created" or "found".  Instead of having two
different options.

But I want to be able to say to userspace very clearly.  If this
superblock already exists.  You need to go through the
remount/reconfigure code path to change it's options.


>> That should make it clear to a userspace program what is going on
>> and give it a chance to mount a filesystem anyway.
>
> That said, I'm willing to provide a "fail if already extant" option if we
> think that's actually likely to be of use.  However, you'd still have to
> provide parameters before the check can be made.

Yes that is what I am asking for, though very much not optionally.  It
is a rare case and it succeeds in the wrong way today.  Letting people
think they have set a mount option when they have not.

>> In a similar vein could we please clarify the rules for changing mount
>> options for an existing superblock are in the new api?
>
> You mean remount/reconfigure?  Note that we have to provide backward
> compatibility with every single filesystem.

Yes.  I am thinking explicitly of reconfigure.  The old remount
can do whatever is backwards compatible.

>> Today mount assumes that it has to provide all of the existing options to
>> reconfigure a mount.  What people want to do and what most filesystems
>> support is just specifying the options that need to be changed.  Can we
>> please make this the rule of how this are expected to work for fscontext?
>> That only changing mount options need to be specified before: "x
>> reconfigure"
>
> Fine by me - but it must *also* support every option being specified if that
> is what mount currently does.

Yes.  That is what ext4 and xfs support today.

I just want it clear that at least if you use the new reconfigure
interface that you can expect options you have not specified to remain
the same instead of reverting to a default state.

> I don't really want to supply extra parsers if I can avoid it.  Miklós, for
> example wanted a different, incompatible interface, so you'd do:
>
> 	write(fd, "o +foo");
> 	write(fd, "o -bar");
> 	write(fd, "x reconfig");
>
> sort of thing to enable or disable options... but this assumes that options
> are binary and requires a separate parser to the one that does the initial
> configuration - and you still have to support the old remount data parse.
>
> I'm okay with specifying that you should just specify the options you want to
> change and that the normal way to 'disable' something is to prefix it with
> "no".

Yes.  That is what I am asking for.  Just so we don't need to specify
the options that are staying the same.

> I guess I could pass a flag through to indicate that this came from
> sys_mount(MS_REMOUNT) rather than the new method.

I don't think that would be needed.  As some of the most common
filesystems implement this semantic already for sys_mount(MS_REMOUNT).

What I think we might need is to cache the option string during the
transition for unconverted filesystems so that we can give then a full
set of options.  So that if we have an unconverted filesystem like
devpts that assumes a missing option should be set to it's default value
we won't break it.

Eric

  reply	other threads:[~2018-06-18 21:34 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-25  0:05 [PATCH 00/32] VFS: Introduce filesystem context [ver #8] David Howells
2018-05-25  0:05 ` [PATCH 01/32] VFS: Suppress MS_* flag defs within the kernel unless explicitly enabled " David Howells
2018-05-25  0:05 ` [PATCH 02/32] vfs: Provide documentation for new mount API " David Howells
2018-05-25  0:05 ` [PATCH 03/32] VFS: Introduce the basic header for the new mount API's filesystem context " David Howells
2018-05-31 23:11   ` Al Viro
2018-05-31 23:13   ` Al Viro
2018-05-25  0:05 ` [PATCH 04/32] VFS: Add LSM hooks for the new mount API " David Howells
2018-05-25  0:05 ` [PATCH 05/32] selinux: Implement the new mount API LSM hooks " David Howells
2018-05-25  0:06 ` [PATCH 06/32] smack: Implement filesystem context security " David Howells
2018-05-25  0:06 ` [PATCH 07/32] apparmor: Implement security hooks for the new mount API " David Howells
2018-05-25  0:06 ` [PATCH 08/32] tomoyo: " David Howells
2018-05-25  0:06 ` [PATCH 09/32] VFS: Require specification of size of mount data for internal mounts " David Howells
2018-05-25  0:06 ` [PATCH 10/32] VFS: Implement a filesystem superblock creation/configuration context " David Howells
2018-06-07 19:50   ` Miklos Szeredi
2018-07-03 18:33   ` Eric Biggers
2018-07-03 21:53   ` David Howells
2018-07-03 21:58     ` Al Viro
2018-07-03 22:06     ` David Howells
2018-05-25  0:06 ` [PATCH 11/32] VFS: Remove unused code after filesystem context changes " David Howells
2018-05-25  0:06 ` [PATCH 12/32] procfs: Move proc_fill_super() to fs/proc/root.c " David Howells
2018-05-25  0:06 ` [PATCH 13/32] proc: Add fs_context support to procfs " David Howells
2018-05-25  0:06 ` [PATCH 14/32] ipc: Convert mqueue fs to fs_context " David Howells
2018-05-25  0:07 ` [PATCH 15/32] cpuset: Use " David Howells
2018-05-25  0:07 ` [PATCH 16/32] kernfs, sysfs, cgroup, intel_rdt: Support " David Howells
2018-06-21 18:47   ` [16/32] " Andrei Vagin
2018-06-22 12:52   ` David Howells
2018-06-22 15:30     ` Andrei Vagin
2018-06-22 16:57       ` Andrei Vagin
2018-06-23 23:34       ` David Howells
2018-05-25  0:07 ` [PATCH 17/32] hugetlbfs: Convert to " David Howells
2018-05-25  0:07 ` [PATCH 18/32] VFS: Remove kern_mount_data() " David Howells
2018-05-25  0:07 ` [PATCH 19/32] VFS: Implement fsopen() to prepare for a mount " David Howells
2018-05-31 21:25   ` Al Viro
2018-05-25  0:07 ` [PATCH 20/32] vfs: Make close() unmount the attached mount if so flagged " David Howells
2018-05-31 19:19   ` Al Viro
2018-05-31 19:26     ` Al Viro
2018-06-01  1:52     ` Al Viro
2018-06-01  3:18       ` Al Viro
2018-06-01  5:16         ` Al Viro
2018-05-25  0:07 ` [PATCH 21/32] VFS: Implement fsmount() to effect a pre-configured mount " David Howells
2018-06-04 15:05   ` Arnd Bergmann
2018-06-04 15:24   ` David Howells
2018-05-25  0:07 ` [PATCH 22/32] vfs: Provide an fspick() system call " David Howells
2018-05-25  0:07 ` [PATCH 23/32] VFS: Implement logging through fs_context " David Howells
2018-05-25  1:48   ` Joe Perches
2018-05-25  0:07 ` [PATCH 24/32] vfs: Add some logging to the core users of the fs_context log " David Howells
2018-05-25  0:08 ` [PATCH 25/32] afs: Add fs_context support " David Howells
2018-05-25  0:08 ` [PATCH 26/32] afs: Use fs_context to pass parameters over automount " David Howells
2018-06-07  1:58   ` Goldwyn Rodrigues
2018-06-07 20:45   ` David Howells
2018-05-25  0:08 ` [PATCH 27/32] vfs: Use a 'struct fd_cookie *' type for light fd handling " David Howells
2018-05-25  0:08 ` [PATCH 28/32] vfs: Store the fd_cookie in nameidata, not the dfd int " David Howells
2018-05-25  0:08 ` [PATCH 29/32] vfs: Don't mix FMODE_* flags with O_* flags " David Howells
2018-05-25  0:08 ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) " David Howells
2018-06-01  6:26   ` Christoph Hellwig
2018-06-01  6:39     ` Al Viro
2018-06-01  8:27     ` David Howells
2018-06-02  3:09       ` Al Viro
2018-06-02  3:42         ` Al Viro
2018-06-02  4:04           ` Al Viro
2018-06-02 15:45           ` David Howells
2018-06-02 17:49             ` Al Viro
2018-06-03  0:55               ` [PATCH][RFC] open_tree(2) (was Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]) Al Viro
2018-06-04 10:34                 ` Miklos Szeredi
2018-06-04 15:52                   ` Al Viro
2018-06-04 15:59                     ` Al Viro
2018-06-04 19:27                     ` Miklos Szeredi
2018-06-04 15:27                 ` David Howells
2018-06-04 17:16                 ` Matthew Wilcox
2018-06-04 17:35                   ` Al Viro
2018-06-04 19:38                     ` Miklos Szeredi
2018-06-01  8:02   ` [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8] Amir Goldstein
2018-06-01  8:42   ` David Howells
2018-05-25  0:08 ` [PATCH 31/32] [RFC] fs: Add a move_mount() system call " David Howells
2018-05-31 21:20   ` Al Viro
2018-05-25  0:08 ` [PATCH 32/32] [RFC] fsinfo: Add a system call to allow querying of filesystem information " David Howells
2018-06-04 13:10   ` Arnd Bergmann
2018-06-04 15:01   ` David Howells
2018-06-04 16:00     ` Arnd Bergmann
2018-06-04 19:03     ` David Howells
2018-06-04 20:45       ` Arnd Bergmann
2018-05-31 20:56 ` Test program for move_mount() David Howells
2018-05-31 20:57 ` fsinfo test program David Howells
2018-06-15  4:18 ` [PATCH 00/32] VFS: Introduce filesystem context [ver #8] Eric W. Biederman
2018-06-18 20:30 ` David Howells
2018-06-18 21:33   ` Eric W. Biederman [this message]
2018-06-18 23:33   ` Theodore Y. Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8736xjdbka.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=dhowells@redhat.com \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).