From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751412AbaDMVwv (ORCPT ); Sun, 13 Apr 2014 17:52:51 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:44517 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750758AbaDMVwt (ORCPT ); Sun, 13 Apr 2014 17:52:49 -0400 Date: Sun, 13 Apr 2014 22:52:42 +0100 From: Al Viro To: "Eric W. Biederman" Cc: Linus Torvalds , "Serge E. Hallyn" , Linux-Fsdevel , Kernel Mailing List , Andy Lutomirski , Rob Landley , Miklos Szeredi , Christoph Hellwig , Karel Zak , "J. Bruce Fields" , Fengguang Wu , tytso@mit.edu Subject: Re: [RFC][PATCH] vfs: In mntput run deactivate_super on a shallow stack. Message-ID: <20140413215242.GP18016@ZenIV.linux.org.uk> References: <87sipmbe8x.fsf@x220.int.ebiederm.org> <20140409175322.GZ18016@ZenIV.linux.org.uk> <20140409182830.GA18016@ZenIV.linux.org.uk> <87txa286fu.fsf@x220.int.ebiederm.org> <87fvlm860e.fsf_-_@x220.int.ebiederm.org> <20140409232423.GB18016@ZenIV.linux.org.uk> <87lhva5h4k.fsf@x220.int.ebiederm.org> <20140413053956.GM18016@ZenIV.linux.org.uk> <87zjjp3e7w.fsf@x220.int.ebiederm.org> <87ppkl1xb7.fsf@x220.int.ebiederm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87ppkl1xb7.fsf@x220.int.ebiederm.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Apr 13, 2014 at 12:51:56AM -0700, Eric W. Biederman wrote: > btrfs_init_test_fs, and of course I was silly when I thought the module > ref count would be useful for something before init_module succeeds. > > Still I suspect I was on the right track. We do have the get_fs_type, > get_filesystem and put_filesystem. Which ought to be enough to allow > us to convert unregister_filesystem into an appropriate barrier. Umm... I don't think so - register_filesystem and unregister_filesystem are only about the fs being visible in fs type list. You do *not* have to register the sucker at all in order to be able to do kern_mount(). So the call of unregister_filesystem() is not guaranteed to happen at all - as the matter of fact, I think we ought to stop registering any mount_pseudo()-based ones (pipefs, etc.) and the only reason for _not_ doing that is that some scripts might be poking in /proc/filesystems. We definitely do not want that for anything *new* that isn't mountable from userland... BTW, could somebody explain this: in fs/ext4/super.c we have #define IS_EXT2_SB(sb) ((sb)->s_bdev->bd_holder == &ext2_fs_type) What's wrong with simple ((sb)->s_type == &ext2_fs_type)? What am I missing there? Ted? FWIW, I have found one bug of similar sort, but it's already present with the current semantics: init_hugetlb_fs() has for_each_hstate(h) { char buf[50]; unsigned ps_kb = 1U << (h->order + PAGE_SHIFT - 10); snprintf(buf, sizeof(buf), "pagesize=%uK", ps_kb); hugetlbfs_vfsmount[i] = kern_mount_data(&hugetlbfs_fs_type, buf); if (IS_ERR(hugetlbfs_vfsmount[i])) { pr_err("hugetlb: Cannot mount internal hugetlbfs for " "page size %uK", ps_kb); error = PTR_ERR(hugetlbfs_vfsmount[i]); hugetlbfs_vfsmount[i] = NULL; } i++; } /* Non default hstates are optional */ if (!IS_ERR_OR_NULL(hugetlbfs_vfsmount[default_hstate_idx])) return 0; followed by kmem_cache_destroy(hugetlbfs_inode_cachep); If some of those kern_mount_data() succeed, but not the one we really want to have, we'll end up with kmem_cache_destroy() on non-empy cache... Granted, it's very unlikely to happen, but it's still a bug. BTW, that IS_ERR_OR_NULL() there looks like cargo-culting - it's ERR_PTR() is explicitly turned into NULL a few lines above... Another thing: does anybody seriously expect the system to survive with every pipe(2) failing (and oopsing, actually)? If not, we probably should just make init_pipe_fs() panic on failure... The same goes for sock_init(), IMO...