From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42566) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bFLYR-0005rx-SW for qemu-devel@nongnu.org; Tue, 21 Jun 2016 09:17:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bFLYN-00028H-NB for qemu-devel@nongnu.org; Tue, 21 Jun 2016 09:17:23 -0400 Received: from m199-177.yeah.net ([123.58.177.199]:15560) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bFLYM-000270-MJ for qemu-devel@nongnu.org; Tue, 21 Jun 2016 09:17:19 -0400 References: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com> <20160527100655.60db8206@t450s.home> <30d1cd95-7f67-29cf-c55e-0565364d89ff@cn.fujitsu.com> <41b0c187-ade0-182e-46b5-afd3e99f1e36@cn.fujitsu.com> <20160620103226.0ff61b21@ul30vt.home> <20160620211306.66a6b249@t450s.home> From: Chen Fan Message-ID: <576935FC.1080503@easystack.cn> Date: Tue, 21 Jun 2016 20:41:32 +0800 MIME-Version: 1.0 In-Reply-To: <20160620211306.66a6b249@t450s.home> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume notification handler for aer resume List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson , Zhou Jie Cc: mst@redhat.com, qemu-devel@nongnu.org, caoj.fnst@cn.fujitsu.com, Chen Fan , izumi.taku@jp.fujitsu.com On 2016=E5=B9=B406=E6=9C=8821=E6=97=A5 11:13, Alex Williamson wrote: > On Tue, 21 Jun 2016 10:16:25 +0800 > Zhou Jie wrote: > >> Hi, Alex >> >>> I was really hoping to hear your opinion, or at least some further >>> discussion of pros and cons rather than simply parroting back my idea= . >> I understand. >> >>> My current thinking is that a resume notifier to userspace is poorly >>> defined, it's not clear what the user can and cannot do between an >>> error notification and the resume notification. >> Yes, do nothing between that time is better. >> >>> One approach to solve >>> that might be that the kernel internally handles the resume >>> notifications. Maybe that means blocking the ioctl (interruptible >>> timeout) until the internal resume occurs, or maybe that means >>> returning -EAGAIN. >> I don't think it is a good idea. >> The kernel give the error and resume notifications, it's enough. >> It's up to user to how to use them. > Well that's exactly why it's poorly defined. What does a resume > notification signal a user that they're allowed to do? What can they > not do between error and resume notification. Clearly you had issues > attempting to perform a reset during this time period since it was > racing with the kernel reset, so is a user allowed to do a hot reset > between error and resume? Where do we define it? Do we prevent it if > they try? Why? What about the reset ioctl? How and why is that > different from a hot reset? (hint, they can be the same) Do we define > that resets are not allowed between error and resume, but other > operations like read/write or interrupt setup ioctls are allowed? Why? > Clearly we can't do anything that manipulates the device between error > and resume since it might be lost or ineffective, but where do we > define it and do we need to actively enforce those rules? I'm arguing > that it's poorly defined, so "it's up to the user how to use them" > doesn't not give me any additional confidence in that approach. We > can't trust the user to be polite, we can't even trust the user not to > be malicious. Hi Alex, on kernel side, I think if we don't trust the user behaviors, we=20 should disable the access of vfio-pci interface once vfio-pci driver got the=20 error_detected, we should disable all access to vfio fd regardless whether the vfio-pci was assigned to a VM, we also can return a EAGAIN error if user try to access it during the reset period until the host reset finished. on qemu side, when we got a error_detect, we pass through the aer error to guest directly, ignore all access to vfio-pci during this=20 time, when qemu need to do a hot reset, we can retry to get the info from the get info ioctl until we got the info that vfio-pci has been reset=20 finished, then do the hot_reset ioctl if need, the kernel should ensure the ioctl=20 become //// accessible after host reset completed. Thanks, Chen > =20 >>> Probably implementations of each need to be worked >>> through to determine which is better. We don't want to add complexit= y >>> to the kernel simply to make things easier for userspace, but we also >>> don't want a poorly specified interface that is difficult for >>> userspace to use correctly. Thanks, >> In qemu, the aer recovery process: >> 1. Detect support for resume notification >> If host vfio driver does not support for resume notification, >> directly fail to boot up VM as with aer enabled. >> 2. Immediately notify the VM on error detected. >> 3. Disable the device. >> Unmap the config space and bar region. >> 4. Delay the guest directed bus reset. >> 5. Wait for resume notification. >> If we don't get the resume notification from the host after >> some timeout, we would abort the guest directed bus reset >> altogether and unplug of the device to prevent it from further >> interacting with the VM. >> 6. After get the resume notification reset bus and enable the devi= ce. >> >> I think we only make sure the disabled device >> will not interact with the VM. > Should interrupt irqfds then also be disabled so they trap into QEMU > and we can prevent that interaction? Also, QEMU can be polite, but as > above, QEMU is just one user, the API is open to anyone and QEMU might > be exploited to not be so polite. So if there are points where the > user can interfere with the kernel or exploit the knowledge that the > device is going through a reset, the kernel can't rely on a friendly > user. Thanks, > > Alex > --=20 Sincerely, Chen Fan