linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
Date: Thu, 23 Apr 2015 17:45:44 +1000	[thread overview]
Message-ID: <20150423174544.48ae3122@notabene.brown> (raw)
In-Reply-To: <20150422210553.GX889@ZenIV.linux.org.uk>

[-- Attachment #1: Type: text/plain, Size: 5113 bytes --]

On Wed, 22 Apr 2015 22:05:53 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Wed, Apr 22, 2015 at 09:12:38PM +0100, Al Viro wrote:
> > On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> > > And one more: may_follow_link() is now potentially oopsable.  Look: suppose
> > > we've reached the link in RCU mode, just as it got unlinked.  link->dentry
> > > has become negative and may_follow_link() steps into
> > >         /* Allowed if owner and follower match. */
> > >         inode = link->dentry->d_inode;
> > >         if (uid_eq(current_cred()->fsuid, inode->i_uid))
> > >                 return 0;
> > > Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
> > > follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> > > needed?

Blind copy-paste error I suspect.

> > > 
> > > FWIW, I would rather fetch ->d_inode *and* checked ->seq proir to calling
> > > get_link(), and passed inode to it as an explicit argument.  And passed it
> > > to may_follow_link() as well...
> > 
> > Hrm...  You know, something really weird is going on here.  Where are
> > you setting nd->seq?  I don't see anything in follow_link() doing that.

follow_link calls  link_path_walk -> walk_component -> lookup_fast which sets
nd->seq.  Is that not enough?  I guess not when nd_jump_link is called.  Is
that what I missed?


> > And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
> > at that point it matches the dentry of symlink, not that of nd->path
> > (== parent directory).  Neil, could you tell me which kernel you'd been
> > testing (ideally - commit ID is a public git tree), what config and what
> > tests had those been?

The 'devel' branch of git://neil.brown.name/md, which is based on 4.0-rc7.

Tests have been fairly causal so far, making sure I can follow symlinks on a
small variety of filesystems, and running a particular test which repeatedly
stats a large number of non-existent files with paths that go through a
symlink.


> 
> FWIW, there's a wart that had been annoying me for quite a while, and it
> might be related to dealing with that properly.  Namely, walk_component()
> calling conventions.  We have walk_component(nd, &path, follow), which can
> 	* return -E..., and leave us with pathwalk terminated; path contains
> junk, and so does nd->path.
> 	* return 0, update nd->path, nd->inode and nd->seq.  The contents
> of path is in undefined state - it might be unchanged, it might be equal to
> nd->path (and not pinned down, RCU mode or not).  In any case, callers do
> not touch it afterwards.  That's the normal case.
> 	* return 1, update nd->seq, leave nd->path and nd->inode unchanged and
> set path pointing to our symlink.  nd->seq matches path, not nd->path.
> 
> In all cases the original contents of path is ignored - it's purely 'out'
> parameter, but compiler can't establish that on its own; it _might_ be
> left untouched.  In all cases when its contents survives we don't look at
> it afterwards, but proving that requires a non-trivial analysis.
> 
> And in case when we return 1 (== symlink to be followed), we bugger nd->seq.
> It's left as we need it for unlazy_walk() (and after unlazy_walk() we don't
> care about it at all), so currently everything works, but if we want to
> stay in RCU mode for symlink traversal, we _will_ need ->d_seq of parent
> directory.
> 
> I wonder if the right way to solve that would be to drop the path argument
> entirely and store the bugger in nameidata.  As in
> 	union {
> 		struct qstr last;
> 		struct path link;
> 	};
> 	...
> 	union {
> 		int last_type;
> 		unsigned link_seq;
> 	};
> in struct nameidata.  We never need both at the same time; after
> walk_component() (or its analogue in do_last()) we don't need the component
> name anymore.  That way walk_component() would not trash nd->seq when
> finding a symlink...
> 
> It would also shrink the stack footprint a bit - local struct path next
> in link_path_walk() would be gone, along with the same thing in path_lookupat()
> and friends.  Not a lot of win (4 pointers total), but it might be enough
> to excuse putting ->d_seq of root in there, along with ->link.dentry->d_inode,
> to avoid rechecking its ->d_seq.  As the matter of fact, we do have an
> odd number of 32bit fields in there, so ->d_seq of root would fit nicely...
> 
> Comments?

One thing that is clear to me is that I don't really understand all the
handling of 'seq' numbers, making me unable to comment usefully.
I'll try to go through the current code and you current patch with that issue
in mind and make sure I do understand it.  Then I'll try to comment.

Meanwhile, I like your suggestion in separate email to legitimize the whole
symlink stack when unlazy_walk is needed.  One reason that I didn't even try
that originally is that it was not practical to walk the stack to find
everything that needed legitimizing.  Now that you have that in an explicit
stack it is at least work looking into.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

  reply	other threads:[~2015-04-23  7:45 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
2015-04-20 19:08   ` Andreas Dilger
2015-04-20 19:22     ` Al Viro
2015-04-20 20:35       ` Al Viro
2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
2015-04-20 21:04   ` Linus Torvalds
2015-04-20 21:32     ` Al Viro
2015-04-20 21:39       ` Linus Torvalds
2015-04-20 21:51         ` Al Viro
2015-04-20 21:41       ` Linus Torvalds
2015-04-20 21:42         ` Linus Torvalds
2015-04-20 21:59           ` Al Viro
2015-04-20 21:52         ` Al Viro
2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-21 15:04   ` Christoph Hellwig
2015-04-21 15:12     ` Richard Weinberger
2015-04-21 15:45       ` Al Viro
2015-04-21 16:46         ` Boaz Harrosh
2015-04-21 21:20         ` Al Viro
2015-04-22 18:07           ` Al Viro
2015-04-22 20:12             ` Al Viro
2015-04-22 21:05               ` Al Viro
2015-04-23  7:45                 ` NeilBrown [this message]
2015-04-23 18:07                   ` Al Viro
2015-04-24  6:35                     ` NeilBrown
2015-04-24 13:42                       ` Al Viro
2015-05-04  5:11                         ` Al Viro
2015-05-04  7:30                           ` NeilBrown
2015-04-23  5:01           ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150423174544.48ae3122@notabene.brown \
    --to=neilb@suse.de \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).