From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50434)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bFC7g-0006zx-1K
	for qemu-devel@nongnu.org; Mon, 20 Jun 2016 23:13:09 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bFC7c-0007Xx-RF
	for qemu-devel@nongnu.org; Mon, 20 Jun 2016 23:13:07 -0400
Received: from mx1.redhat.com ([209.132.183.28]:57069)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bFC7c-0007Xq-J0
	for qemu-devel@nongnu.org; Mon, 20 Jun 2016 23:13:04 -0400
Date: Mon, 20 Jun 2016 21:13:06 -0600
From: Alex Williamson <alex.williamson@redhat.com>
Message-ID: <20160620211306.66a6b249@t450s.home>
In-Reply-To: <c12c77e8-e664-9b09-5380-7dd9e09ec4e2@cn.fujitsu.com>
References: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com>
	<20160527100655.60db8206@t450s.home>
	<30d1cd95-7f67-29cf-c55e-0565364d89ff@cn.fujitsu.com>
	<41b0c187-ade0-182e-46b5-afd3e99f1e36@cn.fujitsu.com>
	<20160620103226.0ff61b21@ul30vt.home>
	<c12c77e8-e664-9b09-5380-7dd9e09ec4e2@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume
 notification handler for aer resume
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Cc: fan.chen@easystack.cn, mst@redhat.com, qemu-devel@nongnu.org, caoj.fnst@cn.fujitsu.com, Chen Fan <chen.fan.fnst@cn.fujitsu.com>, izumi.taku@jp.fujitsu.com

On Tue, 21 Jun 2016 10:16:25 +0800
Zhou Jie <zhoujie2011@cn.fujitsu.com> wrote:

> Hi, Alex
> 
> > I was really hoping to hear your opinion, or at least some further
> > discussion of pros and cons rather than simply parroting back my idea.  
> I understand.
> 
> > My current thinking is that a resume notifier to userspace is poorly
> > defined, it's not clear what the user can and cannot do between an
> > error notification and the resume notification.  
> Yes, do nothing between that time is better.
> 
> > One approach to solve
> > that might be that the kernel internally handles the resume
> > notifications.  Maybe that means blocking the ioctl (interruptible
> > timeout) until the internal resume occurs, or maybe that means
> > returning -EAGAIN.  
> I don't think it is a good idea.
> The kernel give the error and resume notifications, it's enough.
> It's up to user to how to use them.

Well that's exactly why it's poorly defined.  What does a resume
notification signal a user that they're allowed to do?  What can they
not do between error and resume notification.  Clearly you had issues
attempting to perform a reset during this time period since it was
racing with the kernel reset, so is a user allowed to do a hot reset
between error and resume?  Where do we define it?  Do we prevent it if
they try?  Why?  What about the reset ioctl?  How and why is that
different from a hot reset?  (hint, they can be the same)  Do we define
that resets are not allowed between error and resume, but other
operations like read/write or interrupt setup ioctls are allowed? Why?
Clearly we can't do anything that manipulates the device between error
and resume since it might be lost or ineffective, but where do we
define it and do we need to actively enforce those rules?  I'm arguing
that it's poorly defined, so "it's up to the user how to use them"
doesn't not give me any additional confidence in that approach.  We
can't trust the user to be polite, we can't even trust the user not to
be malicious.
 
> > Probably implementations of each need to be worked
> > through to determine which is better.  We don't want to add complexity
> > to the kernel simply to make things easier for userspace, but we also
> > don't want a poorly specified interface that is difficult for
> > userspace to use correctly.  Thanks,  
> In qemu, the aer recovery process:
>    1. Detect support for resume notification
>       If host vfio driver does not support for resume notification,
>       directly fail to boot up VM as with aer enabled.
>    2. Immediately notify the VM on error detected.
>    3. Disable the device.
>       Unmap the config space and bar region.
>    4. Delay the guest directed bus reset.
>    5. Wait for resume notification.
>       If we don't get the resume notification from the host after
>       some timeout, we would abort the guest directed bus reset
>       altogether and unplug of the device to prevent it from further
>       interacting with the VM.
>    6. After get the resume notification reset bus and enable the device.
> 
> I think we only make sure the disabled device
>   will not interact with the VM.

Should interrupt irqfds then also be disabled so they trap into QEMU
and we can prevent that interaction?  Also, QEMU can be polite, but as
above, QEMU is just one user, the API is open to anyone and QEMU might
be exploited to not be so polite.  So if there are points where the
user can interfere with the kernel or exploit the knowledge that the
device is going through a reset, the kernel can't rely on a friendly
user.  Thanks,

Alex