Date: Fri, 15 Jul 2016 06:09:59 +0100
From: Al Viro
To: Liu Shuo
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    Zhang Yanmin, He Bo, Liu Shuo, Radim Krčmář
Subject: Re: [PATCH] KVM: release anon file in failure path of vm creation
Message-ID: <20160715050959.GH14480@ZenIV.linux.org.uk>
In-Reply-To: <20160715031841.GA20887@shuo-desktop.sh.intel.com>

On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote:
> If there is no such thread (who operates the descriptor based on
> guessing), i can think the changing is safe at the point. As the fd has
> not been delivered to userspace. Am i right?

Expecting nice behaviour from userland code is something best avoided,
really.  All jokes aside, this other thread doesn't have to be
malicious - just being buggy would suffice.  Besides, you never know
if something like userns won't be dumped into the kernel, making your
ioctl accessible to genuinely malicious code.

The only sane approach is to treat descriptor tables as shared data
structures and postpone the insertion of a struct file reference into
the descriptor table until you are past all failure exits, including
the ones related to copying to userland - e.g. pipe(2) creates a pipe,
sets up two struct file associated with it, reserves two descriptors,
copies them into the userland array and only if everything has
succeeded proceeds to fd_install().

In your case passing the descriptor to userland is not an issue (the
return value of ioctl(2) goes via a register and that can't fail), so
the last failure exit is the one after a failed attempt to create the
debugfs stuff.  We have to reserve the descriptor before that (it's
used as a part of the debugfs directory name), so anon_inode_getfd() is
not an option - it combines reserving the descriptor with fd_install().
Such situations are exactly the reason why anon_inode_getfile() is
there; anon_inode_getfd() is usable only when it is the very last thing
we do before returning the descriptor to userland.

FWIW, the original code was not unreasonable - it simply treated the
debugfs stuff as optional and ignored those failures.  That way
anon_inode_getfd() is fine - there are no failure exits after it.  If
we want to fail when debugfs has been enabled and we have failed to
populate it, we need to use the real primitives behind
anon_inode_getfd(), though.
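
As a rough illustration of the ordering described above, here is a
userspace toy model, not kernel code.  Every toy_* name is made up for
this sketch; the helpers only stand in for the kernel primitives
get_unused_fd_flags(), anon_inode_getfile(), put_unused_fd(), fput()
and fd_install().  The late_step_fails flag is a hypothetical stand-in
for the debugfs setup that needs the descriptor number and may fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_FDS 8

/* Toy stand-in for struct file: nothing but a reference count. */
struct toy_file {
	int refcount;
};

/* Toy stand-in for a descriptor table shared by all threads. */
struct toy_fdtable {
	bool reserved[TOY_MAX_FDS];		/* slot number is taken */
	struct toy_file *files[TOY_MAX_FDS];	/* what lookups can see */
};

int toy_live_files;	/* files allocated and not yet freed */

/* Reserve the lowest free slot; nothing is visible to lookups yet. */
static int toy_reserve_fd(struct toy_fdtable *t)
{
	for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
		if (!t->reserved[fd]) {
			t->reserved[fd] = true;
			return fd;
		}
	}
	return -1;
}

/* Give back a reserved slot that has never been installed. */
static void toy_unreserve_fd(struct toy_fdtable *t, int fd)
{
	assert(t->reserved[fd] && t->files[fd] == NULL);
	t->reserved[fd] = false;
}

/* Publish the file; from here on other threads can reach it. */
static void toy_install(struct toy_fdtable *t, int fd, struct toy_file *f)
{
	assert(t->reserved[fd] && t->files[fd] == NULL);
	t->files[fd] = f;
}

/* What another thread gets for a (possibly guessed) descriptor. */
struct toy_file *toy_lookup(const struct toy_fdtable *t, int fd)
{
	if (fd < 0 || fd >= TOY_MAX_FDS)
		return NULL;
	return t->files[fd];
}

static struct toy_file *toy_file_new(void)
{
	struct toy_file *f = calloc(1, sizeof(*f));

	if (f) {
		f->refcount = 1;
		toy_live_files++;
	}
	return f;
}

static void toy_file_put(struct toy_file *f)
{
	if (--f->refcount == 0) {
		free(f);
		toy_live_files--;
	}
}

/* Returns the new descriptor, or -1 on any failure. */
int toy_create_vm(struct toy_fdtable *t, bool late_step_fails)
{
	struct toy_file *f;
	int fd = toy_reserve_fd(t);

	if (fd < 0)
		return -1;

	f = toy_file_new();
	if (!f) {
		toy_unreserve_fd(t, fd);
		return -1;
	}

	/* Late step that already needs the descriptor number. */
	if (late_step_fails) {
		/* Never installed, so nobody else could have seen it. */
		toy_unreserve_fd(t, fd);
		toy_file_put(f);
		return -1;
	}

	/* Past the last failure exit: only now publish the file. */
	toy_install(t, fd, f);
	return fd;
}
```

In this model a toy_lookup() on the reserved slot returns NULL at every
point before toy_install(), so a buggy or hostile thread guessing the
number finds nothing, and the failure path has nothing to take back out
of the table.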