All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] readlink()-related oddities
@ 2015-11-19 23:26 Al Viro
  2015-11-20  2:13 ` Linus Torvalds
  2015-11-20  9:59 ` David Howells
  0 siblings, 2 replies; 11+ messages in thread
From: Al Viro @ 2015-11-19 23:26 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-kernel, Linus Torvalds, linux-security-module, miklos, dhowells

	I'd been looking through ->readlink() callers, and there are
several areas where behaviour looks wrong.

1) atime updates, according to POSIX, should happen in case of success.
For example, giving readlink(2) an unmapped buffer should _not_ touch
atime.  Neither should calling readlink(2) in case if ->readlink() method
returns e.g. -EIO or -ENOMEM.  We do atime update in those cases.  Looks
like a bug and unless there's a good reason to keep that behaviour, I'd
rather have it do what POSIX says.

The question of which ->readlink() callers should bother with atime is
interesting as well:
	* readlink(2) and nfsd_readlink() are touching atime of the
symlink in question.  Fair enough.
	* ecryptfs_readlink_lower() calls ->readlink() on the underlying
filesystem inode and does *not* do anything to atime there.  Also makes sense -
one of the callers (->getattr()) certainly does not want timestamps modified,
the other two will have atime of ecryptfs symlink touched, which will propagate
to underlying fs.
	* overlayfs copyup does not touch atime of the underlying object.
Also fine - the whole reason we are doing the copyup is that the layer it
was in is not to be modified in any way.
	* overlayfs ->readlink(), OTOH, *does* attempt to touch atime
of the symlink we end up reading.  Looks bogus, especially since
symlink _traversal_ does no such thing.  Am I missing something subtle here?
Miklos?

2) LSM, with its usual stellar consistency, has a hook stuck into
readlink(2), but not into nfsd_readlink().  Could the esteemed Linux S&M folks
decide if that's right?

3) normally, readlink(2) fails for non-symlinks.  Moreover, according to
POSIX it should do so (with -EINVAL).  There is a pathological case when
it succeeds for a directory, though.  Namely, one of the kinds of AFS
"mountpoints".  stat(2) reports those as directories, stepping into them
leads to automounting a directory there (why do we have ->open() for
them, BTW?)  How the hell is userland supposed to guess to call readlink(2)
on those suckers to get the information of what'll get automounted there if
we step upon them?  And could we please get rid of that kludge?  David?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-19 23:26 [RFC] readlink()-related oddities Al Viro
@ 2015-11-20  2:13 ` Linus Torvalds
  2015-11-20  2:57   ` Al Viro
  2015-11-20 10:00   ` David Howells
  2015-11-20  9:59 ` David Howells
  1 sibling, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2015-11-20  2:13 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linux Kernel Mailing List, LSM List,
	Miklos Szeredi, David Howells

On Thu, Nov 19, 2015 at 3:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> 1) atime updates, according to POSIX, should happen in case of success.
> For example, giving readlink(2) an unmapped buffer should _not_ touch
> atime.  Neither should calling readlink(2) in case if ->readlink() method
> returns e.g. -EIO or -ENOMEM.  We do atime update in those cases.  Looks
> like a bug and unless there's a good reason to keep that behaviour, I'd
> rather have it do what POSIX says.

I really don't think anybody cares, but I also don't think anybody
cares about the current behavior, so we can certainly fix it to match
POSIX wording, and just move it a bit lower after checking the return
value from ->realink().

> 3) normally, readlink(2) fails for non-symlinks.  Moreover, according to
> POSIX it should do so (with -EINVAL).

I don't think POSIX is necessarily relevant here.

We have had magic file behavior outside the scope of POSIX before, and
we will have it in the future. It makes perfect sense to use
readlink() for management tools for automounting, even if the normal
oepration is to treat the thing as a directory.

Not everything is within the domain of POSIX.

               Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  2:13 ` Linus Torvalds
@ 2015-11-20  2:57   ` Al Viro
  2015-11-20  3:09     ` Linus Torvalds
  2015-11-20 10:00   ` David Howells
  1 sibling, 1 reply; 11+ messages in thread
From: Al Viro @ 2015-11-20  2:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Linux Kernel Mailing List, LSM List,
	Miklos Szeredi, David Howells

On Thu, Nov 19, 2015 at 06:13:53PM -0800, Linus Torvalds wrote:

> > 3) normally, readlink(2) fails for non-symlinks.  Moreover, according to
> > POSIX it should do so (with -EINVAL).
> 
> I don't think POSIX is necessarily relevant here.
> 
> We have had magic file behavior outside the scope of POSIX before, and
> we will have it in the future. It makes perfect sense to use
> readlink() for management tools for automounting, even if the normal
> oepration is to treat the thing as a directory.
> 
> Not everything is within the domain of POSIX.

How would those tools know that this particular pathname _is_ a magical
symlink?  Sure, if you see a symlink with body that starts with % or #,
you could figure out that it's not a regular one and go parse the body,
but for stat(2) it looks like a directory.  Do those tools call readlink()
on every directory they spot on AFS volume?  David?

And what's the story with magical ->open() for those?  How could one get
to ->open() on those sucker and avoid triggering the automount instead?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  2:57   ` Al Viro
@ 2015-11-20  3:09     ` Linus Torvalds
  2015-11-20  3:16       ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2015-11-20  3:09 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linux Kernel Mailing List, LSM List,
	Miklos Szeredi, David Howells

On Thu, Nov 19, 2015 at 6:57 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> How would those tools know that this particular pathname _is_ a magical
> symlink?

Like maybe just the AFS management tools? By design you'd only run
them on the mountpoint in question.

Not everything has to be "generic". Sometimes its' good enough to just
have the ability to get the work done.

Now, if it turns out that others also want to do this, maybe somebody
decides "let's add flag -V to 'ls', which forces a 'readlink()' on all
the targets, whether links or not, and shows the information".

I could imagine other special files having "a single line of
information about the file" that they'd expose with readlink(). Who
knows?

So there is *potential* for just making it generic, but that doesn't
mean that it necessarily has to act that way.

                Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  3:09     ` Linus Torvalds
@ 2015-11-20  3:16       ` Linus Torvalds
  2015-11-20  3:24         ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2015-11-20  3:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linux Kernel Mailing List, LSM List,
	Miklos Szeredi, David Howells

On Thu, Nov 19, 2015 at 7:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So there is *potential* for just making it generic, but that doesn't
> mean that it necessarily has to act that way.

.. it's not necessarily just readlink() either. I still think it might
be a perfectly fine idea to allow non-directories to act as
directories in some case (by exposing "readdir" and "lookup").

But readdir() really doesn't sound horrible either. how about unix
domain sockets (or named pipes) giving their link information when you
do readdir() on them?

Quite frankly, I think allowing those kinds of unified interfaces is
better than the current situation where you have to use a
"getpeername()" system call etc. If it's a filesystem object, why not
allow filesystem operations to work on it?

We expose some things in /proc as symlinks things that actually would
work better as non-symlinks, exactly *because* we want to expose not
just the end result of what they point to, but also a *description* of
what they point to. So we have those odd "pseudo-symlinks" in /proc
that don't actually really do a pathname walk on the symlink content
they expose, but still *look* like symlinks just because readdir() is
such a useful thing to have.

No wonder other users have wanted to use it.

                    Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  3:16       ` Linus Torvalds
@ 2015-11-20  3:24         ` Al Viro
  0 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2015-11-20  3:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Linux Kernel Mailing List, LSM List,
	Miklos Szeredi, David Howells

On Thu, Nov 19, 2015 at 07:16:32PM -0800, Linus Torvalds wrote:

> .. it's not necessarily just readlink() either. I still think it might
> be a perfectly fine idea to allow non-directories to act as
> directories in some case (by exposing "readdir" and "lookup").

As soon as we expose ->lookup(), we are in for serious PITA with locking.
Please, don't.  The situation is convoluted enough as it is; playing with
parallel lookups is going to be interesting in itself and I'd rather not
mix it with attempts to accomodate for hybrid objects (e.g. something
with children and more than one parent, etc.), especially since I'm not
sure the latter can be done without major pain.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-19 23:26 [RFC] readlink()-related oddities Al Viro
  2015-11-20  2:13 ` Linus Torvalds
@ 2015-11-20  9:59 ` David Howells
  2015-11-20 16:08   ` Al Viro
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: David Howells @ 2015-11-20  9:59 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, linux-fsdevel, linux-kernel, Linus Torvalds,
	linux-security-module, miklos

Al Viro <viro@ZenIV.linux.org.uk> wrote:

> 3) normally, readlink(2) fails for non-symlinks.  Moreover, according to
> POSIX it should do so (with -EINVAL).  There is a pathological case when
> it succeeds for a directory, though.  Namely, one of the kinds of AFS
> "mountpoints".

All AFS mountpoints are magic symlinks that are specially interpreted by the
client as far as I'm aware.  I'm not sure why the designers didn't just select
a different file type for them, but they didn't.

Unfortunately, it means that iget has to read the contents of the symlinks :-/

> stat(2) reports those as directories, stepping into them leads to
> automounting a directory there (why do we have ->open() for them, BTW?).

I think I put that in to make sure the open() syscall returned EREMOTE rather
than another error if you tried to open it.  It can probably be removed
because with the d_automount code you can't ever get there I think - unless
you can pass AT_NO_AUTOMOUNT to openat().

> How the hell is userland supposed to guess to call readlink(2) on those
> suckers to get the information of what'll get automounted there if we step
> upon them?

There's an AFS userspace command that could be used to query a mountpoint that
was going to use it.  However, I suspect readlink() will now always trigger
the automount.  This is one of the things OpenAFS uses pioctl() for - but
since I'm not allowed to add that to the kernel, I have to find some other way
of doing it.

> And could we please get rid of that kludge?  David?

Sure.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  2:13 ` Linus Torvalds
  2015-11-20  2:57   ` Al Viro
@ 2015-11-20 10:00   ` David Howells
  1 sibling, 0 replies; 11+ messages in thread
From: David Howells @ 2015-11-20 10:00 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, Linus Torvalds, linux-fsdevel,
	Linux Kernel Mailing List, LSM List, Miklos Szeredi

Al Viro <viro@ZenIV.linux.org.uk> wrote:

> How would those tools know that this particular pathname _is_ a magical
> symlink?  Sure, if you see a symlink with body that starts with % or #,
> you could figure out that it's not a regular one and go parse the body,
> but for stat(2) it looks like a directory.  Do those tools call readlink()
> on every directory they spot on AFS volume?  David?

It has to be a directory so that you can mount on it.  If you look in /afs on
an OpenAFS client filesystem it appears as a symlink to somewhere under /afs/
because they can't do the in-kernel mounting (it's GPL-only on Linux).

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  9:59 ` David Howells
@ 2015-11-20 16:08   ` Al Viro
  2015-11-20 16:26   ` David Howells
  2015-11-20 22:33   ` Al Viro
  2 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2015-11-20 16:08 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, linux-kernel, Linus Torvalds,
	linux-security-module, miklos

On Fri, Nov 20, 2015 at 09:59:05AM +0000, David Howells wrote:
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > 3) normally, readlink(2) fails for non-symlinks.  Moreover, according to
> > POSIX it should do so (with -EINVAL).  There is a pathological case when
> > it succeeds for a directory, though.  Namely, one of the kinds of AFS
> > "mountpoints".
> 
> All AFS mountpoints are magic symlinks that are specially interpreted by the
> client as far as I'm aware.  I'm not sure why the designers didn't just select
> a different file type for them, but they didn't.

All of them?  I see two kinds there - one is magical symlink (recognized
by contents in afs_iget()), another is this autocell thing, the latter
having no ->readlink().  Both serve as automount points, don't they?

> > stat(2) reports those as directories, stepping into them leads to
> > automounting a directory there (why do we have ->open() for them, BTW?).
> 
> I think I put that in to make sure the open() syscall returned EREMOTE rather
> than another error if you tried to open it.  It can probably be removed
> because with the d_automount code you can't ever get there I think - unless
> you can pass AT_NO_AUTOMOUNT to openat().

Just how would openat() get the AT_... flags?  Only statat(2) accepts
AT_NO_AUTOMOUNT, sorry.
 
> > How the hell is userland supposed to guess to call readlink(2) on those
> > suckers to get the information of what'll get automounted there if we step
> > upon them?
> 
> There's an AFS userspace command that could be used to query a mountpoint that
> was going to use it.  However, I suspect readlink() will now always trigger
> the automount.  This is one of the things OpenAFS uses pioctl() for - but
> since I'm not allowed to add that to the kernel, I have to find some other way
> of doing it.

Well, pioctl() is a piec^H^Hle of shit interface; let's figure out what we'd
actually want to implement and do that.

One obvious thing is "here's a pathname, tell me what gets automounted here"
(with interesting question of what to do if the automount is being triggered
right now).  Another thing is locating those guys; if we had a separate file
type for them (i.e. could recognize them by st_mode _and_ d_type), we would
be fine (the usual tree-walkers would be able to spot such places and query
them for prospective automount targets), but without that... a syscall for
everything in a tree just to list those suckers?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  9:59 ` David Howells
  2015-11-20 16:08   ` Al Viro
@ 2015-11-20 16:26   ` David Howells
  2015-11-20 22:33   ` Al Viro
  2 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2015-11-20 16:26 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, linux-fsdevel, linux-kernel, Linus Torvalds,
	linux-security-module, miklos

Al Viro <viro@ZenIV.linux.org.uk> wrote:

> All of them?  I see two kinds there - one is magical symlink (recognized
> by contents in afs_iget()), another is this autocell thing, the latter
> having no ->readlink().  Both serve as automount points, don't they?

The "autocell" thing is where you don't have an AFS file of that name and
lookup of that non-existent file as an attempt to mount a destination volume
encoded by the filename.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC] readlink()-related oddities
  2015-11-20  9:59 ` David Howells
  2015-11-20 16:08   ` Al Viro
  2015-11-20 16:26   ` David Howells
@ 2015-11-20 22:33   ` Al Viro
  2 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2015-11-20 22:33 UTC (permalink / raw)
  To: David Howells
  Cc: linux-fsdevel, linux-kernel, Linus Torvalds,
	linux-security-module, miklos

On Fri, Nov 20, 2015 at 09:59:05AM +0000, David Howells wrote:

> There's an AFS userspace command that could be used to query a mountpoint that
> was going to use it.  However, I suspect readlink() will now always trigger
> the automount.

It won't, actually.  All we are passing to user_path_at_empty() is
LOOKUP_EMPTY, so for the final component we'll have
        if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
                           LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
            path->dentry->d_inode)
                return -EISDIR;
in follow_automount() trigger and follow_managed() will turn that -EISDIR
into 0.  IOW, readlink(2) does work on those, same as stat() (since Sep 2011).

Sigh...  OK, let's leave it for now; ->open() for those guys is completely
bogus, AFAICS, but that's local bogo^Wbusiness.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-11-20 22:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-19 23:26 [RFC] readlink()-related oddities Al Viro
2015-11-20  2:13 ` Linus Torvalds
2015-11-20  2:57   ` Al Viro
2015-11-20  3:09     ` Linus Torvalds
2015-11-20  3:16       ` Linus Torvalds
2015-11-20  3:24         ` Al Viro
2015-11-20 10:00   ` David Howells
2015-11-20  9:59 ` David Howells
2015-11-20 16:08   ` Al Viro
2015-11-20 16:26   ` David Howells
2015-11-20 22:33   ` Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.