From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932153AbcGOHED (ORCPT ); Fri, 15 Jul 2016 03:04:03 -0400 Received: from mga01.intel.com ([192.55.52.88]:42564 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752533AbcGOHD7 (ORCPT ); Fri, 15 Jul 2016 03:03:59 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.28,367,1464678000"; d="scan'208";a="995725898" Date: Fri, 15 Jul 2016 15:04:08 +0800 From: Liu Shuo To: Al Viro Cc: Paolo Bonzini , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Zhang Yanmin , He Bo , Liu Shuo , Radim =?utf-8?B?S3LEjW3DocWZ?= Subject: Re: [PATCH] KVM: release anon file in failure path of vm creation Message-ID: <20160715070408.GA18395@shuo-desktop.sh.intel.com> References: <1468316323-23835-1-git-send-email-shuo.a.liu@intel.com> <28049c2a-1b49-1909-52ce-105859e14e33@redhat.com> <20160714164647.GD14480@ZenIV.linux.org.uk> <20160715022204.GA16729@shuo-desktop.sh.intel.com> <20160715022603.GG14480@ZenIV.linux.org.uk> <20160715031841.GA20887@shuo-desktop.sh.intel.com> <20160715050959.GH14480@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <20160715050959.GH14480@ZenIV.linux.org.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 15.Jul'16 at 6:09:59 +0100, Al Viro wrote: >On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote: > >> If there is no such thread (who operates the descriptor based on >> guessing), i can think the changing is safe at the point. As the fd has >> not been delivered to userspace. Am i right? > > >Expecting nice behaviour from userland code is something best avoided, really. > Got it! :) > >All jokes aside, this other thread doesn't have to be malicious - just being >buggy would suffice. Besides, you never know if something like userns won't >be dumped into the kernel, making your ioctl accessible to genuinely >malicious code. > >The only sane approach is to treat descriptor tables as shared data structures >and postpone the insertion of struct file reference into descriptor table >until you are past all failure exits. Including the ones related to copying >to userland - e.g. pipe(2) creates a pipe, sets up two struct file associated >with it, reserves two descriptors, copies them into userland array and only if >everything has succeeded proceeds to fd_install(). In your case passing the >descriptor to userland is not an issue (return value of ioctl(2) goes via >register and that can't fail), so the last failure exit is that after failed >attempt to create debugfs stuff. We have to reserve the descriptor before >that (it's used as a part of debugfs directory name), so anon_inode_getfd() >is not an option - it combines reserving descriptor with fd_install(). >Such situations are exactly the reason why anon_inode_getfile() is there; >anon_inode_getfd() is usable only when it is the very last thing we do >before returning the descriptor to userland. Really appreciate your exhaustive explanations. Thanks. > >FWIW, original code was not unreasonable - it simply treated debugfs stuff as >optional and ignored those failures. That way anon_inode_getfd() is fine - >there's no failure exits after it. If we want to fail when debugfs had >been enabled and we'd failed to populate it, we need to use the real primitives >behind anon_inode_getfd(), though. So, my firstly immature idea returns back. Is the fd used by debugfs name very useful? Do we really need it here? Alternative approach is putting the fd into the debugfs, then the code will be clean here, as we can put anon_inode_getfd at very last of the ioctl.