All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
@ 2015-04-20 18:12 Al Viro
  2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
                   ` (25 more replies)
  0 siblings, 26 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: NeilBrown, linux-kernel, linux-fsdevel

	The patchset below gets rid of recursion in pathname resolution.
As the result, stack footprint of fs/namei.c stuff doesn't depend upon
the symlink nesting anymore and is much less than the worst cases in
mainline - as the matter of fact, it's around the footprint mainline gets
with just one or two levels of nesting.

	We could reduce it further (see below), but I'm not sure it's worth
doing - it's not much extra complexity, but we only squeeze out ~250 bytes
that way, with the worst-case footprints (those are triggered by rename())
are around 1.4Kb (instead of about twice as much in mainline).  OTOH,
that extra complexity would've allowed us to get rid of the nesting limit
entirely (i.e. we get ELOOP when we encounter 40 symlinks, no matter in
which manner they are nested).  That might be worth considering...

	Performance of the result seems to be about on par with the mainline
and so far it has survived everything I'd hit it with.  More testing is
obviously needed.

	Patches 2/24..6/24 are from Neil's RCU follow_link patchset; the
rest of his patchset is, of course, derailed by the massage done here,
but AFAICS we should be able to port it on top of this one with reasonably
little PITA.

	There are several fs-visible interface changes:
1) nameidata is _not_ passed to ->follow_link() and ->put_link().  We use
the same kind of approach as with pt_regs.
2) instead of "store the link body (with nd_set_link()), return an opaque
pointer to be given to ->put_link() to undo whatever we'd done to pin the
link body" ->follow_link() now does the opposite - it *returns* the symlink
body and stores whatever opaque pointer ->put_link() will need.  Stores
it with nd_pin_link(cookie).  Rules:
	* ERR_PTR() returned => fail lookup with that error.
	* NULL returned => we'd done the move ourselves, no body to traverse.
	* pointer to string => symlink body for traversal.  ->put_link()
will be called if non-NULL pointer had been stored by nd_pin_link().
3) ->put_link() does *not* have access to symlink body anymore.  The cases
that used to rely on seeing it (kfree_put_link(), mostly) are trivially
switched to new rules - just do nd_pin_link(body); return body; and we
are fine.

	I've converted all in-tree instances of ->follow_link()/->put_link()
(see 23/24) and it's actually less headache than with the mainline calling
conventions.

	The main reason the series is that long is that I'm trying to keep
all steps small - it *is* a serious rewrite and I want to do it in easily
verified steps.

	Overall it's fairly straightforward - instead of getting a new
stack frame for each level of nesting, we add an explicit array just for
the stuff that needs to be preserved across the recursive call - there's
not much of it (how far did we get in pathname + stuff needed for ->put_link(),
i.e. cookie and struct path of symlink itself).  That array is *not*
embedded into nameidata - only a pointer to it.  That allows to reduce the
size of struct nameidata, which is especially sensitive in rename(), where
we have two of those.  Old nd->saved_names[] is gone - not needed anymore.
	Array is held in caller (or caller of caller) stack frame.  We could
go for "just keep a couple of elements there, if we need more we'll allocate
from slab", but that doesn't buy us a lot unless we remove the limit on
nesting at the same time.  I hadn't done that in this series; we can always
do that later, it would be a fairly isolated modification.

	For RCU follow_link porting we'll probably need to replace struct
path of symlink with union of struct path and (dentry,inode) pair.  Should
be doable without blowing the stack footprint.

	The scariest remaining thing about fs/namei.c stack footprint is that
we get ->d_automount() called at pretty much maximal depth; it's about half of
what it used to be in mainline (1.3Kb instead of 2.7Kb), but there's a lot
of work done by that thing (and by finish_automount()).  Something like
NFS referral point encountered by rename(2) while looking for parent
directories can get really nasty...

	Folks, please review; the patches (on top of vfs.git#for-next) go in followups, the entire branch is in vfs.git#link_path_walk.

Shortlog:
Al Viro (19):
      lustre: rip the private symlink nesting limit out
      namei: expand nested_symlink() in its only caller
      namei.c: separate the parts of follow_link() that find the link body
      namei: fold follow_link() into link_path_walk()
      link_path_walk: handle get_link() returning ERR_PTR() immediately
      link_path_walk: don't bother with walk_component() after jumping link
      link_path_walk: turn inner loop into explicit goto
      link_path_walk: massage a bit more
      link_path_walk: get rid of duplication
      link_path_walk: final preparations to killing recursion
      link_path_walk: kill the recursion
      link_path_walk: split "return from recursive call" path
      link_path_walk: cleanup - turn goto start; into continue;
      namei: fold may_follow_link() into follow_link()
      namei: introduce nameidata->stack
      namei: regularize use of put_link() and follow_link(), trim arguments
      namei: trim the arguments of get_link()
      new ->follow_link() and ->put_link() calling conventions
      uninline walk_component()

NeilBrown (5):
      VFS: replace {, total_}link_count in task_struct with pointer to nameidata
      ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link
      VFS: replace nameidata arg to ->put_link with a char*.
      SECURITY: remove nameidata arg from inode_follow_link.
      VFS: remove nameidata args from ->follow_link

Diffstat:
 Documentation/filesystems/Locking             |   4 +-
 Documentation/filesystems/porting             |  10 +
 Documentation/filesystems/vfs.txt             |   4 +-
 drivers/staging/lustre/lustre/llite/symlink.c |  26 +-
 fs/9p/vfs_inode.c                             |  15 +-
 fs/9p/vfs_inode_dotl.c                        |   9 +-
 fs/autofs4/symlink.c                          |   5 +-
 fs/befs/linuxvfs.c                            |  46 ++--
 fs/ceph/inode.c                               |   6 +-
 fs/cifs/cifsfs.h                              |   2 +-
 fs/cifs/link.c                                |  29 +-
 fs/configfs/symlink.c                         |  28 +-
 fs/debugfs/file.c                             |   5 +-
 fs/ecryptfs/inode.c                           |  11 +-
 fs/exofs/symlink.c                            |   7 +-
 fs/ext2/symlink.c                             |   6 +-
 fs/ext3/symlink.c                             |   6 +-
 fs/ext4/symlink.c                             |   6 +-
 fs/freevxfs/vxfs_immed.c                      |  11 +-
 fs/fuse/dir.c                                 |  19 +-
 fs/gfs2/inode.c                               |  11 +-
 fs/hostfs/hostfs_kern.c                       |  16 +-
 fs/hppfs/hppfs.c                              |   9 +-
 fs/jffs2/symlink.c                            |   9 +-
 fs/jfs/symlink.c                              |   6 +-
 fs/kernfs/symlink.c                           |  23 +-
 fs/libfs.c                                    |   7 +-
 fs/namei.c                                    | 378 +++++++++++++++-----------
 fs/nfs/symlink.c                              |  18 +-
 fs/ntfs/namei.c                               |   1 -
 fs/overlayfs/inode.c                          |  36 +--
 fs/proc/base.c                                |   4 +-
 fs/proc/inode.c                               |   8 +-
 fs/proc/namespaces.c                          |   4 +-
 fs/proc/self.c                                |  24 +-
 fs/proc/thread_self.c                         |  22 +-
 fs/sysv/symlink.c                             |   5 +-
 fs/ubifs/file.c                               |   7 +-
 fs/ufs/symlink.c                              |   6 +-
 fs/xfs/xfs_iops.c                             |  12 +-
 include/linux/fs.h                            |  11 +-
 include/linux/namei.h                         |   7 +-
 include/linux/sched.h                         |   3 +-
 include/linux/security.h                      |   9 +-
 mm/shmem.c                                    |  28 +-
 security/capability.c                         |   3 +-
 security/security.c                           |   4 +-
 security/selinux/hooks.c                      |   2 +-
 48 files changed, 457 insertions(+), 471 deletions(-)

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2015-05-04  7:30 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
2015-04-20 19:08   ` Andreas Dilger
2015-04-20 19:22     ` Al Viro
2015-04-20 20:35       ` Al Viro
2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
2015-04-20 21:04   ` Linus Torvalds
2015-04-20 21:32     ` Al Viro
2015-04-20 21:39       ` Linus Torvalds
2015-04-20 21:51         ` Al Viro
2015-04-20 21:41       ` Linus Torvalds
2015-04-20 21:42         ` Linus Torvalds
2015-04-20 21:59           ` Al Viro
2015-04-20 21:52         ` Al Viro
2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-21 15:04   ` Christoph Hellwig
2015-04-21 15:12     ` Richard Weinberger
2015-04-21 15:45       ` Al Viro
2015-04-21 16:46         ` Boaz Harrosh
2015-04-21 21:20         ` Al Viro
2015-04-22 18:07           ` Al Viro
2015-04-22 20:12             ` Al Viro
2015-04-22 21:05               ` Al Viro
2015-04-23  7:45                 ` NeilBrown
2015-04-23 18:07                   ` Al Viro
2015-04-24  6:35                     ` NeilBrown
2015-04-24 13:42                       ` Al Viro
2015-05-04  5:11                         ` Al Viro
2015-05-04  7:30                           ` NeilBrown
2015-04-23  5:01           ` Al Viro
2015-04-21 14:51 ` [PATCH] logfs: fix a pagecache leak for symlinks Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.