From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
<izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Wed, 7 Dec 2016 10:49:39 +0800 [thread overview]
Message-ID: <584778C3.8050302@cn.fujitsu.com> (raw)
In-Reply-To: <20161206083556.23be6ee5@t450s.home>
On 12/06/2016 11:35 PM, Alex Williamson wrote:
> On Tue, 6 Dec 2016 18:46:04 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>
>> On 12/06/2016 12:59 PM, Alex Williamson wrote:
>>> On Tue, 6 Dec 2016 05:55:28 +0200
>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>
>>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
>>>>> If you're going to take the lead for these AER patches, I would
>>>>> certainly suggest that understanding the reasoning behind the bus reset
>>>>> behavior is a central aspect to this series. This effort has dragged
>>>>> out for nearly two years and I apologize, but I don't really have a lot
>>>>> of patience for rehashing some of these issues if you're not going to
>>>>> read the previous discussions or consult with your colleagues to
>>>>> understand how we got to this point. If you want to challenge some of
>>>>> the design points, that's great, it could use some new eyes, but please
>>>>> understand how we got here first.
>>>>
>>>> Well I'm guessing Cao jin here isn't the only one not
>>>> willing to plough through all historical versions of the patchset
>>>> just to figure out the motivation for some code.
>>>>
>>>> Including a summary of a high level architecture couldn't hurt.
>>>>
>>>> Any chance of writing such? Alternatively, we can try to build it as
>>>> part of this thread. Shouldn't be hard as it seems somewhat
>>>> straight-forward on the surface:
>>>>
>>>> - detect link error on the host, don't reset link as we would normally do
>>>
>>> This is actually a new approach that I'm not sure I agree with. By
>>> skipping the host directed link reset, vfio is taking responsibility
>>> for doing this, but then we just assume the user will do it. I have
>>> issues with this.
>>>
>>> The previous approach was to use the error detected notifier to block
>>> access to the device, allowing the host to perform the link reset. A
>>> subsequent notification in the AER process released the user access
>>> which allowed the user AER process to proceed. This did result in both
>>> a host directed and a guest directed link reset, but other than
>>> coordinating the blocking of the user process during host reset, that
>>> hasn't been brought up as an issue previously.
>>>
>>
>> Tests on previous versions didn't bring up issues as I find, I think
>> that is because we didn't test it completely. As I know, before August
>> of this year, we didn't have cable connected to NIC, let alone
>> connecting NIC to gateway.
>
> Lack of testing has been a significant issue throughout the development
> of this series.
>
>> Even if I fixed the guest oops issue in igb driver that Alex found in
>> v9, v9 still cannot work in my test. And in my test, disable link
>> reset(in host) in aer core for vfio-pci is the most significant step to
>> get my test passed.
>
> But is it the correct step? I'm not convinced. Why did blocking guest
> access not work? How do you plan to manage vfio taking the
> responsibility to perform a bus reset when you don't know whether QEMU
> is the user of the device or whether the user supports AER recovery?
>
>>>> - report link error to guest
>>>> - detect link reset request from guest
>>>> - reset link on host
>>>>
>>>> Since link reset will reset all devices behind it, for this to work we
>>>> need same set of devices behind the link in host and guest. Enforcing
>>>> this would be nice to have.
>>>
>>> This is a pretty significant complication and I think it's a
>>> requirement. This is the heart of why we have an AER vfio-pci device
>>> option and why we require that QEMU should fail to initialize the
>>> device if AER is enabled in an incompatible configuration. If a user
>>> cannot be sure that AER is enabled on a device, it's pointless to
>>> bother implementing it, though honestly I question the value of it in
>>> the VM altogether given configuration requirements (ie. are users
>>> going to accept the reason that all the ports of a device need to be
>>> assigned to a single guest for guest-based AER recovery when they were
>>> previously able to assign each port to a separate VM?).
>>>
>>>> - as link now might end up in bad state, reset
>>>> it when device is unassigned
>>>
>>> This is also a new aspect for the approach here, previously we allowed
>>> the host directed link reset so we could assume the device was
>>> sufficiently recovered. In the proposal here, the AER core skips any
>>> devices bound to vfio-pci, but vfio can't guarantee that we can do a
>>> bus reset on them. PCI bus isolation is not accounted for in DMA
>>> isolation, which is the basis for IOMMU groups. A bus can host
>>> multiple IOMMU groups, each of which may have a different user. Only
>>> in a very specific configuration can vfio do a bus reset.
>>>
>>>> Any details I missed?
>>>
>>> AIUI, the critical feature is that the guest needs to be able to reset
>>> the device link, all the other design elements play out from the
>>> logical expansion of that feature. It means that a guest bus reset
>>> needs to translate to a host bus reset, which means that all of the
>>> affected host devices need to be accounted for and those that are
>>> assigned to the guest need to be affected in the guest in the same
>>> way. QEMU must enforce this configuration or else a user cannot know
>>> the result of a AER fault, ie. will it cause a VMSTOP condition or is
>>> the fault forwarded to the guest. The required configuration
>>> restrictions are quite involved, therefore we can't simply require this
>>> of all configurations, so a vfio-pci device option is added. The
>>> coordination of the host directed reset with the guest directed reset
>>> was only a complications discovered within the last few revisions of
>>> the series. As noted above, the previous solution to this was to
>>> attempt to block access to the device while the host reset proceeds.
>>>
>>> Clearly it's a little disconcerting if we throw all of that away and
>>> simply assume that an FLR is sufficient to reset the device when it
>>> seems like link issues might be a nontrivial source of AER faults. If
>>> FLR is sufficient, then why does the core AER handling code in the
>>> kernel not simply do this? Thanks,
>>>
>>
>> I agree with the points here. Now I understand why translate a guest
>> link reset to host link reset[*], and FLR shouldn't be equivalent to
>> link reset, but the PITY is, we can't trigger a real fatal error to see
>> if a FLR is sufficient to reset the device.
>
> In this case, it's not a matter of finding a scenario where it works.
> Using FLR to recover from a fatal error would need to be supported by
> verbiage in a specification. I'm not interested in cases where it
> happens to work on single device for a certain type of error. The link
> reset is the bare metal mechanism for recovering from a fatal error and
> it should be our mechanism as well unless there's spec wording to
> support another approach. Thanks,
>
Ok, I understand your points here, will drag that dropped patch back and
test.
--
Sincerely,
Cao jin
next prev parent reply other threads:[~2016-12-07 2:45 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-27 11:34 [PATCH] vfio/pci: Support error recovery Cao jin
2016-11-28 3:00 ` Michael S. Tsirkin
2016-11-28 9:32 ` Cao jin
2016-11-30 1:46 ` Michael S. Tsirkin
2016-12-01 13:38 ` Cao jin
2016-12-01 4:04 ` Alex Williamson
2016-12-01 4:51 ` Michael S. Tsirkin
2016-12-01 13:40 ` Cao jin
2016-12-06 3:46 ` Michael S. Tsirkin
2016-12-06 6:47 ` Cao jin
2016-12-01 13:40 ` Cao jin
2016-12-01 14:55 ` Alex Williamson
2016-12-04 12:16 ` Cao jin
2016-12-04 15:30 ` Alex Williamson
2016-12-05 5:52 ` Cao jin
2016-12-05 16:17 ` Alex Williamson
2016-12-06 3:55 ` Michael S. Tsirkin
2016-12-06 4:59 ` Alex Williamson
2016-12-06 10:46 ` Cao jin
2016-12-06 15:35 ` Alex Williamson
2016-12-07 2:49 ` Cao jin [this message]
2016-12-08 14:46 ` Cao jin
2016-12-08 16:30 ` Michael S. Tsirkin
2016-12-09 3:40 ` Cao jin
2016-12-09 3:40 ` Cao jin
2016-12-06 6:11 ` Cao jin
2016-12-06 15:25 ` Alex Williamson
2016-12-07 2:58 ` Cao jin
2016-12-12 13:49 ` Cao jin
2016-12-12 19:12 ` Alex Williamson
2016-12-12 22:29 ` Michael S. Tsirkin
2016-12-12 22:43 ` Alex Williamson
2016-12-13 3:15 ` Michael S. Tsirkin
2016-12-13 3:39 ` Alex Williamson
2016-12-13 16:12 ` Michael S. Tsirkin
2016-12-13 16:27 ` Alex Williamson
2016-12-14 1:58 ` Michael S. Tsirkin
2016-12-14 3:00 ` Alex Williamson
2016-12-14 22:20 ` Michael S. Tsirkin
2016-12-14 22:47 ` Alex Williamson
2016-12-14 23:00 ` Michael S. Tsirkin
2016-12-14 23:32 ` Alex Williamson
2016-12-14 10:24 ` Cao jin
2016-12-14 22:16 ` Alex Williamson
2016-12-14 22:25 ` Michael S. Tsirkin
2016-12-14 22:49 ` Alex Williamson
2016-12-15 13:56 ` Cao jin
2016-12-15 14:50 ` Michael S. Tsirkin
2016-12-15 22:01 ` Alex Williamson
2016-12-16 10:15 ` Cao jin
2016-12-16 10:15 ` Cao jin
2016-12-15 17:02 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=584778C3.8050302@cn.fujitsu.com \
--to=caoj.fnst@cn.fujitsu.com \
--cc=alex.williamson@redhat.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).