From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42566)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <fan.chen@easystack.cn>) id 1bFLYR-0005rx-SW
	for qemu-devel@nongnu.org; Tue, 21 Jun 2016 09:17:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <fan.chen@easystack.cn>) id 1bFLYN-00028H-NB
	for qemu-devel@nongnu.org; Tue, 21 Jun 2016 09:17:23 -0400
Received: from m199-177.yeah.net ([123.58.177.199]:15560)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <fan.chen@easystack.cn>) id 1bFLYM-000270-MJ
	for qemu-devel@nongnu.org; Tue, 21 Jun 2016 09:17:19 -0400
References: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com>
	<20160527100655.60db8206@t450s.home>
	<30d1cd95-7f67-29cf-c55e-0565364d89ff@cn.fujitsu.com>
	<41b0c187-ade0-182e-46b5-afd3e99f1e36@cn.fujitsu.com>
	<20160620103226.0ff61b21@ul30vt.home>
	<c12c77e8-e664-9b09-5380-7dd9e09ec4e2@cn.fujitsu.com>
	<20160620211306.66a6b249@t450s.home>
From: Chen Fan <fan.chen@easystack.cn>
Message-ID: <576935FC.1080503@easystack.cn>
Date: Tue, 21 Jun 2016 20:41:32 +0800
MIME-Version: 1.0
In-Reply-To: <20160620211306.66a6b249@t450s.home>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume
 notification handler for aer resume
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alex Williamson <alex.williamson@redhat.com>, Zhou Jie <zhoujie2011@cn.fujitsu.com>
Cc: mst@redhat.com, qemu-devel@nongnu.org, caoj.fnst@cn.fujitsu.com, Chen Fan <chen.fan.fnst@cn.fujitsu.com>, izumi.taku@jp.fujitsu.com

On 2016-06-21 11:13, Alex Williamson wrote:
> On Tue, 21 Jun 2016 10:16:25 +0800
> Zhou Jie <zhoujie2011@cn.fujitsu.com> wrote:
>
>> Hi, Alex
>>
>>> I was really hoping to hear your opinion, or at least some further
>>> discussion of pros and cons rather than simply parroting back my idea.
>> I understand.
>>
>>> My current thinking is that a resume notifier to userspace is poorly
>>> defined, it's not clear what the user can and cannot do between an
>>> error notification and the resume notification.
>> Yes, doing nothing during that time is better.
>>
>>> One approach to solve
>>> that might be that the kernel internally handles the resume
>>> notifications.  Maybe that means blocking the ioctl (interruptible
>>> timeout) until the internal resume occurs, or maybe that means
>>> returning -EAGAIN.
>> I don't think it is a good idea.
>> The kernel gives the error and resume notifications; that is enough.
>> It's up to the user how to use them.
> Well that's exactly why it's poorly defined.  What does a resume
> notification signal a user that they're allowed to do?  What can they
> not do between error and resume notification?  Clearly you had issues
> attempting to perform a reset during this time period since it was
> racing with the kernel reset, so is a user allowed to do a hot reset
> between error and resume?  Where do we define it?  Do we prevent it if
> they try?  Why?  What about the reset ioctl?  How and why is that
> different from a hot reset?  (hint, they can be the same)  Do we define
> that resets are not allowed between error and resume, but other
> operations like read/write or interrupt setup ioctls are allowed? Why?
> Clearly we can't do anything that manipulates the device between error
> and resume since it might be lost or ineffective, but where do we
> define it and do we need to actively enforce those rules?  I'm arguing
> that it's poorly defined, so "it's up to the user how to use them"
> doesn't give me any additional confidence in that approach.  We
> can't trust the user to be polite, we can't even trust the user not to
> be malicious.
Hi Alex,

On the kernel side, I think that if we don't trust the user's behavior, we
should disable access to the vfio-pci interface as soon as the vfio-pci
driver gets error_detected. We should disable all access to the vfio fd,
regardless of whether the vfio-pci device is assigned to a VM. We could
also return an EAGAIN error if the user tries to access it during the
reset period, until the host reset has finished.

On the QEMU side, when we get error_detected, we pass the AER error
through to the guest directly and ignore all access to vfio-pci during
this time. When QEMU needs to do a hot reset, it can retry the get info
ioctl until that reports the vfio-pci reset has finished, and then issue
the hot_reset ioctl if needed. The kernel should ensure the ioctl becomes
accessible again once the host reset has completed.
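
To make the proposed flow concrete, here is a rough sketch as a toy Python
model. It is not kernel or QEMU code: FakeVfioDevice, guest_hot_reset and
the busy_polls counter are made-up names for illustration, and treating the
get info call as the thing that returns EAGAIN during the reset is only one
possible reading of the idea.

```python
import errno


class FakeVfioDevice:
    """Toy stand-in for a vfio-pci device fd (hypothetical, not the VFIO API)."""

    def __init__(self, busy_polls):
        # Number of user accesses that will still find the host reset running.
        self.busy_polls = busy_polls
        self.in_recovery = False
        self.hot_resets = 0

    def error_detected(self):
        # Kernel side: block all user access once the AER error is seen.
        self.in_recovery = True

    def _gate(self):
        # Every user-facing call goes through this check.
        if not self.in_recovery:
            return
        if self.busy_polls > 0:
            self.busy_polls -= 1
            raise OSError(errno.EAGAIN, "host reset in progress")
        # Simulated host reset has finished; access is allowed again.
        self.in_recovery = False

    def get_info(self):
        self._gate()
        return {"reset_done": True}

    def hot_reset(self):
        self._gate()
        self.hot_resets += 1


def guest_hot_reset(dev, max_tries=10):
    """User side: poll get_info until EAGAIN stops, then do the hot reset."""
    for _ in range(max_tries):
        try:
            dev.get_info()
        except OSError as exc:
            if exc.errno != errno.EAGAIN:
                raise
            continue  # host reset still running, try again
        dev.hot_reset()
        return True
    return False  # give up; the caller could unplug the device instead
```

The point of the sketch is that the kernel enforces the rule in one place
(the gate), so a user that ignores the rule simply keeps getting EAGAIN.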

Thanks,
Chen


>
>>> Probably implementations of each need to be worked
>>> through to determine which is better.  We don't want to add complexity
>>> to the kernel simply to make things easier for userspace, but we also
>>> don't want a poorly specified interface that is difficult for
>>> userspace to use correctly.  Thanks,
>> In qemu, the aer recovery process:
>>     1. Detect support for resume notification
>>        If the host vfio driver does not support resume notification,
>>        fail to boot the VM outright when AER is enabled.
>>     2. Immediately notify the VM on error detected.
>>     3. Disable the device.
>>        Unmap the config space and bar region.
>>     4. Delay the guest directed bus reset.
>>     5. Wait for resume notification.
>>        If we don't get the resume notification from the host after
>>        some timeout, we would abort the guest directed bus reset
>>        altogether and unplug the device to prevent it from further
>>        interacting with the VM.
>>     6. After getting the resume notification, reset the bus and enable
>>        the device.
>>
>> I think we only need to make sure the disabled device
>>    will not interact with the VM.
> Should interrupt irqfds then also be disabled so they trap into QEMU
> and we can prevent that interaction?  Also, QEMU can be polite, but as
> above, QEMU is just one user, the API is open to anyone and QEMU might
> be exploited to not be so polite.  So if there are points where the
> user can interfere with the kernel or exploit the knowledge that the
> device is going through a reset, the kernel can't rely on a friendly
> user.  Thanks,
>
> Alex
>

-- 
Sincerely,
Chen Fan