From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50434) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bFC7g-0006zx-1K for qemu-devel@nongnu.org; Mon, 20 Jun 2016 23:13:09 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bFC7c-0007Xx-RF for qemu-devel@nongnu.org; Mon, 20 Jun 2016 23:13:07 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57069) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bFC7c-0007Xq-J0 for qemu-devel@nongnu.org; Mon, 20 Jun 2016 23:13:04 -0400 Date: Mon, 20 Jun 2016 21:13:06 -0600 From: Alex Williamson Message-ID: <20160620211306.66a6b249@t450s.home> In-Reply-To: References: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com> <20160527100655.60db8206@t450s.home> <30d1cd95-7f67-29cf-c55e-0565364d89ff@cn.fujitsu.com> <41b0c187-ade0-182e-46b5-afd3e99f1e36@cn.fujitsu.com> <20160620103226.0ff61b21@ul30vt.home> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume notification handler for aer resume List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Zhou Jie Cc: fan.chen@easystack.cn, mst@redhat.com, qemu-devel@nongnu.org, caoj.fnst@cn.fujitsu.com, Chen Fan , izumi.taku@jp.fujitsu.com On Tue, 21 Jun 2016 10:16:25 +0800 Zhou Jie wrote: > Hi, Alex > > > I was really hoping to hear your opinion, or at least some further > > discussion of pros and cons rather than simply parroting back my idea. > I understand. > > > My current thinking is that a resume notifier to userspace is poorly > > defined, it's not clear what the user can and cannot do between an > > error notification and the resume notification. > Yes, do nothing between that time is better. > > > One approach to solve > > that might be that the kernel internally handles the resume > > notifications. Maybe that means blocking the ioctl (interruptible > > timeout) until the internal resume occurs, or maybe that means > > returning -EAGAIN. > I don't think it is a good idea. > The kernel give the error and resume notifications, it's enough. > It's up to user to how to use them. Well that's exactly why it's poorly defined. What does a resume notification signal a user that they're allowed to do? What can they not do between error and resume notification. Clearly you had issues attempting to perform a reset during this time period since it was racing with the kernel reset, so is a user allowed to do a hot reset between error and resume? Where do we define it? Do we prevent it if they try? Why? What about the reset ioctl? How and why is that different from a hot reset? (hint, they can be the same) Do we define that resets are not allowed between error and resume, but other operations like read/write or interrupt setup ioctls are allowed? Why? Clearly we can't do anything that manipulates the device between error and resume since it might be lost or ineffective, but where do we define it and do we need to actively enforce those rules? I'm arguing that it's poorly defined, so "it's up to the user how to use them" doesn't not give me any additional confidence in that approach. We can't trust the user to be polite, we can't even trust the user not to be malicious. > > Probably implementations of each need to be worked > > through to determine which is better. We don't want to add complexity > > to the kernel simply to make things easier for userspace, but we also > > don't want a poorly specified interface that is difficult for > > userspace to use correctly. Thanks, > In qemu, the aer recovery process: > 1. Detect support for resume notification > If host vfio driver does not support for resume notification, > directly fail to boot up VM as with aer enabled. > 2. Immediately notify the VM on error detected. > 3. Disable the device. > Unmap the config space and bar region. > 4. Delay the guest directed bus reset. > 5. Wait for resume notification. > If we don't get the resume notification from the host after > some timeout, we would abort the guest directed bus reset > altogether and unplug of the device to prevent it from further > interacting with the VM. > 6. After get the resume notification reset bus and enable the device. > > I think we only make sure the disabled device > will not interact with the VM. Should interrupt irqfds then also be disabled so they trap into QEMU and we can prevent that interaction? Also, QEMU can be polite, but as above, QEMU is just one user, the API is open to anyone and QEMU might be exploited to not be so polite. So if there are points where the user can interfere with the kernel or exploit the knowledge that the device is going through a reset, the kernel can't rely on a friendly user. Thanks, Alex