All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: NeilBrown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
Date: Mon, 4 May 2015 06:11:29 +0100	[thread overview]
Message-ID: <20150504051129.GA3246@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20150424134203.GC889@ZenIV.linux.org.uk>

On Fri, Apr 24, 2015 at 02:42:03PM +0100, Al Viro wrote:

> That avoids this spin_lock() on each absolute symlink at the price of extra
> 32 bits in struct nameidata.  It looks like doing on-demand reallocation
> of nd->stack is the right way to go anyway, so the pressure on nameidata size
> is going to be weaker and that might be the right way to go...

OK, on-demand reallocation is done.  What I have right now is
	* flat MAXSYMLINKS 40, no matter what kind of nesting there might
be.
	* purely iterative link_path_walk().
	* no damn nameidata on stack for generic_readlink()
	* stack footprint of the entire thing independent from the nesting
depth, and about on par with "no symlinks at all" case in mainline.
	* some massage towards RCU follow_link done (in the end of queue),
but quite a bit more work remains.

What I've got so far is in vfs.git#link_path_walk; I'm not too happy about
posting a 70-chunk mailbomb, but it really will need review and testing.
It survives xfstests and LTP with no regressions, but it will need
serious profiling, etc., along with RTFS.  I tried to keep it in reasonably
small pieces, but there's a lot of them ;-/

FWIW, I've a bit more reorganization plotted out, but it's not far from
where we need to be for RCU follow_link.  Some notes:
	* I don't believe we want to pass flags to ->follow_link() - it's
much simpler to give the damn thing NULL for dentry in RCU case.  In *all*
cases where we might have a change to get the symlink body without blocking
we can do that by inode alone.  We obviously want to pass dentry and inode
separately (and in case of fast symlinks we don't hit the filesystem at
all), but that's it - flags isn't needed.
	* terminate_walk() should do bulk put_link().  So should the
failure cases of complete_walk().  _Success_ of complete_walk() should
be careful about legitimizing links - it *can* be called with one link
on stack, and be followed by access to link body.  Yes, really - do_last()
in O_CREAT case.
	* do_last(), lookup_last() and mountpoint_last() ought to
have put_link() done when called on non-empty stack (thus turning the loops
into something like
                while ((err = lookup_last(nd)) > 0) {
                        err = trailing_symlink(nd);
                        if (err)
                                break;
                }
_After_ the point where they don't need to look at the last component of
name, obviously.
	* I think we should leave terminate_walk() to callers in failure
cases of walk_component() and handle_dots(), as well as get_link().  Makes
life simpler in callers, actually.  I'll play with that a bit more.
	* it might make sense to add the second flag to walk_component(),
in addition to LOOKUP_FOLLOW, meaning "do put_link() once you are done looking
at the name".  In principle, it promises simpler logics with unlazy_walk(),
but that's one area I'm still not entirely sure about.  Will need to
experiment a bit...
	* nd->seq clobbering will need to be dealt with, as discussed upthread.
	* I _really_ hate your "let's use the LSB of struct page * to tell
if we need to kunmap()" approach.  It's too damn ugly to live.  _And_ it's
trivial to avoid - all we need is to have (non-lazy) page_follow_link_light()
and page_symlink() to remove __GFP_HIGHMEM from inode->i_mapping before
ever asking to allocate pages there.  That'll suffice, and it makes sense
regardless of RCU work - that kmap/kunmap with potential for minutes in
between (while waiting for stuck NFS server in the middle of symlink traversal)
is simply wrong.


  reply	other threads:[~2015-05-04  5:12 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
2015-04-20 19:08   ` Andreas Dilger
2015-04-20 19:22     ` Al Viro
2015-04-20 20:35       ` Al Viro
2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
2015-04-20 21:04   ` Linus Torvalds
2015-04-20 21:32     ` Al Viro
2015-04-20 21:39       ` Linus Torvalds
2015-04-20 21:51         ` Al Viro
2015-04-20 21:41       ` Linus Torvalds
2015-04-20 21:42         ` Linus Torvalds
2015-04-20 21:59           ` Al Viro
2015-04-20 21:52         ` Al Viro
2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-21 15:04   ` Christoph Hellwig
2015-04-21 15:12     ` Richard Weinberger
2015-04-21 15:45       ` Al Viro
2015-04-21 16:46         ` Boaz Harrosh
2015-04-21 21:20         ` Al Viro
2015-04-22 18:07           ` Al Viro
2015-04-22 20:12             ` Al Viro
2015-04-22 21:05               ` Al Viro
2015-04-23  7:45                 ` NeilBrown
2015-04-23 18:07                   ` Al Viro
2015-04-24  6:35                     ` NeilBrown
2015-04-24 13:42                       ` Al Viro
2015-05-04  5:11                         ` Al Viro [this message]
2015-05-04  7:30                           ` NeilBrown
2015-04-23  5:01           ` Al Viro
2015-04-21 14:51 ` [PATCH] logfs: fix a pagecache leak for symlinks Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150504051129.GA3246@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.