From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753940AbbDTSM2 (ORCPT ); Mon, 20 Apr 2015 14:12:28 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:34710 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752466AbbDTSM0 (ORCPT ); Mon, 20 Apr 2015 14:12:26 -0400 Date: Mon, 20 Apr 2015 19:12:22 +0100 From: Al Viro To: Linus Torvalds Cc: NeilBrown , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Message-ID: <20150420181222.GK889@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The patchset below gets rid of recursion in pathname resolution. As the result, stack footprint of fs/namei.c stuff doesn't depend upon the symlink nesting anymore and is much less than the worst cases in mainline - as the matter of fact, it's around the footprint mainline gets with just one or two levels of nesting. We could reduce it further (see below), but I'm not sure it's worth doing - it's not much extra complexity, but we only squeeze out ~250 bytes that way, with the worst-case footprints (those are triggered by rename()) are around 1.4Kb (instead of about twice as much in mainline). OTOH, that extra complexity would've allowed us to get rid of the nesting limit entirely (i.e. we get ELOOP when we encounter 40 symlinks, no matter in which manner they are nested). That might be worth considering... Performance of the result seems to be about on par with the mainline and so far it has survived everything I'd hit it with. More testing is obviously needed. Patches 2/24..6/24 are from Neil's RCU follow_link patchset; the rest of his patchset is, of course, derailed by the massage done here, but AFAICS we should be able to port it on top of this one with reasonably little PITA. There are several fs-visible interface changes: 1) nameidata is _not_ passed to ->follow_link() and ->put_link(). We use the same kind of approach as with pt_regs. 2) instead of "store the link body (with nd_set_link()), return an opaque pointer to be given to ->put_link() to undo whatever we'd done to pin the link body" ->follow_link() now does the opposite - it *returns* the symlink body and stores whatever opaque pointer ->put_link() will need. Stores it with nd_pin_link(cookie). Rules: * ERR_PTR() returned => fail lookup with that error. * NULL returned => we'd done the move ourselves, no body to traverse. * pointer to string => symlink body for traversal. ->put_link() will be called if non-NULL pointer had been stored by nd_pin_link(). 3) ->put_link() does *not* have access to symlink body anymore. The cases that used to rely on seeing it (kfree_put_link(), mostly) are trivially switched to new rules - just do nd_pin_link(body); return body; and we are fine. I've converted all in-tree instances of ->follow_link()/->put_link() (see 23/24) and it's actually less headache than with the mainline calling conventions. The main reason the series is that long is that I'm trying to keep all steps small - it *is* a serious rewrite and I want to do it in easily verified steps. Overall it's fairly straightforward - instead of getting a new stack frame for each level of nesting, we add an explicit array just for the stuff that needs to be preserved across the recursive call - there's not much of it (how far did we get in pathname + stuff needed for ->put_link(), i.e. cookie and struct path of symlink itself). That array is *not* embedded into nameidata - only a pointer to it. That allows to reduce the size of struct nameidata, which is especially sensitive in rename(), where we have two of those. Old nd->saved_names[] is gone - not needed anymore. Array is held in caller (or caller of caller) stack frame. We could go for "just keep a couple of elements there, if we need more we'll allocate from slab", but that doesn't buy us a lot unless we remove the limit on nesting at the same time. I hadn't done that in this series; we can always do that later, it would be a fairly isolated modification. For RCU follow_link porting we'll probably need to replace struct path of symlink with union of struct path and (dentry,inode) pair. Should be doable without blowing the stack footprint. The scariest remaining thing about fs/namei.c stack footprint is that we get ->d_automount() called at pretty much maximal depth; it's about half of what it used to be in mainline (1.3Kb instead of 2.7Kb), but there's a lot of work done by that thing (and by finish_automount()). Something like NFS referral point encountered by rename(2) while looking for parent directories can get really nasty... Folks, please review; the patches (on top of vfs.git#for-next) go in followups, the entire branch is in vfs.git#link_path_walk. Shortlog: Al Viro (19): lustre: rip the private symlink nesting limit out namei: expand nested_symlink() in its only caller namei.c: separate the parts of follow_link() that find the link body namei: fold follow_link() into link_path_walk() link_path_walk: handle get_link() returning ERR_PTR() immediately link_path_walk: don't bother with walk_component() after jumping link link_path_walk: turn inner loop into explicit goto link_path_walk: massage a bit more link_path_walk: get rid of duplication link_path_walk: final preparations to killing recursion link_path_walk: kill the recursion link_path_walk: split "return from recursive call" path link_path_walk: cleanup - turn goto start; into continue; namei: fold may_follow_link() into follow_link() namei: introduce nameidata->stack namei: regularize use of put_link() and follow_link(), trim arguments namei: trim the arguments of get_link() new ->follow_link() and ->put_link() calling conventions uninline walk_component() NeilBrown (5): VFS: replace {, total_}link_count in task_struct with pointer to nameidata ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link VFS: replace nameidata arg to ->put_link with a char*. SECURITY: remove nameidata arg from inode_follow_link. VFS: remove nameidata args from ->follow_link Diffstat: Documentation/filesystems/Locking | 4 +- Documentation/filesystems/porting | 10 + Documentation/filesystems/vfs.txt | 4 +- drivers/staging/lustre/lustre/llite/symlink.c | 26 +- fs/9p/vfs_inode.c | 15 +- fs/9p/vfs_inode_dotl.c | 9 +- fs/autofs4/symlink.c | 5 +- fs/befs/linuxvfs.c | 46 ++-- fs/ceph/inode.c | 6 +- fs/cifs/cifsfs.h | 2 +- fs/cifs/link.c | 29 +- fs/configfs/symlink.c | 28 +- fs/debugfs/file.c | 5 +- fs/ecryptfs/inode.c | 11 +- fs/exofs/symlink.c | 7 +- fs/ext2/symlink.c | 6 +- fs/ext3/symlink.c | 6 +- fs/ext4/symlink.c | 6 +- fs/freevxfs/vxfs_immed.c | 11 +- fs/fuse/dir.c | 19 +- fs/gfs2/inode.c | 11 +- fs/hostfs/hostfs_kern.c | 16 +- fs/hppfs/hppfs.c | 9 +- fs/jffs2/symlink.c | 9 +- fs/jfs/symlink.c | 6 +- fs/kernfs/symlink.c | 23 +- fs/libfs.c | 7 +- fs/namei.c | 378 +++++++++++++++----------- fs/nfs/symlink.c | 18 +- fs/ntfs/namei.c | 1 - fs/overlayfs/inode.c | 36 +-- fs/proc/base.c | 4 +- fs/proc/inode.c | 8 +- fs/proc/namespaces.c | 4 +- fs/proc/self.c | 24 +- fs/proc/thread_self.c | 22 +- fs/sysv/symlink.c | 5 +- fs/ubifs/file.c | 7 +- fs/ufs/symlink.c | 6 +- fs/xfs/xfs_iops.c | 12 +- include/linux/fs.h | 11 +- include/linux/namei.h | 7 +- include/linux/sched.h | 3 +- include/linux/security.h | 9 +- mm/shmem.c | 28 +- security/capability.c | 3 +- security/security.c | 4 +- security/selinux/hooks.c | 2 +- 48 files changed, 457 insertions(+), 471 deletions(-)