From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
<izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Fri, 9 Dec 2016 11:40:50 +0800 [thread overview]
Message-ID: <584A27C2.6030407@cn.fujitsu.com> (raw)
In-Reply-To: <58497263.7080500@cn.fujitsu.com>
On 12/08/2016 10:46 PM, Cao jin wrote:
>
>
> On 12/06/2016 11:35 PM, Alex Williamson wrote:
>> On Tue, 6 Dec 2016 18:46:04 +0800
>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>
>>> On 12/06/2016 12:59 PM, Alex Williamson wrote:
>>>> On Tue, 6 Dec 2016 05:55:28 +0200
>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>
>>>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
>>>>>> If you're going to take the lead for these AER patches, I would
>>>>>> certainly suggest that understanding the reasoning behind the bus reset
>>>>>> behavior is a central aspect to this series. This effort has dragged
>>>>>> out for nearly two years and I apologize, but I don't really have a lot
>>>>>> of patience for rehashing some of these issues if you're not going to
>>>>>> read the previous discussions or consult with your colleagues to
>>>>>> understand how we got to this point. If you want to challenge some of
>>>>>> the design points, that's great, it could use some new eyes, but please
>>>>>> understand how we got here first.
>>>>>
>>>>> Well I'm guessing Cao jin here isn't the only one not
>>>>> willing to plough through all historical versions of the patchset
>>>>> just to figure out the motivation for some code.
>>>>>
>>>>> Including a summary of a high level architecture couldn't hurt.
>>>>>
>>>>> Any chance of writing such? Alternatively, we can try to build it as
>>>>> part of this thread. Shouldn't be hard as it seems somewhat
>>>>> straight-forward on the surface:
>>>>>
>>>>> - detect link error on the host, don't reset link as we would normally do
>>>>
>>>> This is actually a new approach that I'm not sure I agree with. By
>>>> skipping the host directed link reset, vfio is taking responsibility
>>>> for doing this, but then we just assume the user will do it. I have
>>>> issues with this.
>>>>
>>>> The previous approach was to use the error detected notifier to block
>>>> access to the device, allowing the host to perform the link reset. A
>>>> subsequent notification in the AER process released the user access
>>>> which allowed the user AER process to proceed. This did result in both
>>>> a host directed and a guest directed link reset, but other than
>>>> coordinating the blocking of the user process during host reset, that
>>>> hasn't been brought up as an issue previously.
>>>>
>>>
>>> Tests on previous versions didn't bring up issues as I find, I think
>>> that is because we didn't test it completely. As I know, before August
>>> of this year, we didn't have cable connected to NIC, let alone
>>> connecting NIC to gateway.
>>
>> Lack of testing has been a significant issue throughout the development
>> of this series.
>>
>>> Even if I fixed the guest oops issue in igb driver that Alex found in
>>> v9, v9 still cannot work in my test. And in my test, disable link
>>> reset(in host) in aer core for vfio-pci is the most significant step to
>>> get my test passed.
>>
>> But is it the correct step? I'm not convinced. Why did blocking guest
>> access not work? How do you plan to manage vfio taking the
>> responsibility to perform a bus reset when you don't know whether QEMU
>> is the user of the device or whether the user supports AER recovery?
>>
>
> Maybe currently we don't have enough proof to prove the correctness, but
> I think I did find some facts to prove that link reset in host is a big
> trouble, and can answer part of questions above.
>
> 1st, some thoughts:
> In pci-error-recovery.txt and do_recovery() of kernel tree, we can see,
> a recovery consists of several steps(callbacks), link reset is one of
> them, and except link reset, the others are seems kind of device
> specific. In our case, both host & guest will do recovery, I think the
> host recovery actually is some kind of fake recovery, see vfio-pci
> driver's error_detected & resume callback, they don't do anything
> special, mainly signal error to user, but the link reset in host "fake
> reset" does some serious work, in other words, I think host does the
> recovery incompletely, so I was thinking, why not just drop incompletely
> host recovery(drop link reset) for vfio-pci, and let the guest take care
> of the whole serious recovery. This is part of the reason of why my
> version looks like this. But yes, I admit the issue Alex mentioned,
> vfio can't guarantee that user will do a bus reset, this is an issue I
> will keep looking for a solution.
>
> 2nd, some facts and analyzation from test:
> In host, the relationship between time and behviour in each component
> roughly looks as following:
>
> + HW + host kernel + qemu + guest kernel +
> | |(error recovery)| | |
> | | | | |
> | | vfio-pci's | | |
> | | error_detected | | |
> | | + | | |
> | | | | | |
> | | | error notify | | |
> | | | via eventfd | | |
> | | +---------------> +----------+ | |
> | | | +vfio_err_ | | |
> | | | |notifier_ | | |
> | +---- +<---+link reset | |handler | | |
> | | HW | | | | | | |
> | | | | | | | |
> | |r | | vfio-pci's | |pass aer | | |
> | |e.. | | resume | |to guest | | |
> | |s. | | (blocking end) | | | | |
> | |e | | | | *2* | | |
> | |t | | | +----+-----+ | |
> | |i | | | | | |
> | |n | | | +--------> +----------+ |
> | |g | | | | | guest | |
> | | | | | | | recovery | |
> | | | | | | | process | |
> | | | | | | |(include | |
> | | | | | | |register | |
> | | *1* | | | | |access) | |
> | | | | | | | | |
> | | | | | | | *3* | |
> | | | | +----------+ |
> Time | | | | |
> | | | | |
> | | | | |
> | | | | |
> v | | | |
>
> Now let me try to answer: Why did blocking guest access not work?
> Some important factor:
> 1. host recovery doesn't do anything special except error notifying, so
> it may be executed very fast.
> 2. Hardware resetting time is not sure, from pcie spec 6.6.1, guessing
> it need many ms, pretty long?
>
> some facts found in v9(block config write, not read, during host
> recovery) test:
> 1. reading uncor error register in vfio_err_notifier_handler sometimes
> returns correct value, sometimes return invalid value(All F's)
>
I forget another facts:
2. the igb bug[*] our patch v9 triggered also is the same kind things.
error_detected() of igb in guest read config space, and return invalid
all F's, then igb NULLed its hardware address, then oops shows.
I think it is also a concurrent issue as described below.
[*] http://patchwork.ozlabs.org/patch/692171
> So, I am thinking, if host blocking on host end early, and *2*, *3* is
> parallel with *1*, the way used in v9 to blocking guest access, may not
> work.
>
--
Sincerely,
Cao jin
next prev parent reply other threads:[~2016-12-09 7:47 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-27 11:34 [PATCH] vfio/pci: Support error recovery Cao jin
2016-11-28 3:00 ` Michael S. Tsirkin
2016-11-28 9:32 ` Cao jin
2016-11-30 1:46 ` Michael S. Tsirkin
2016-12-01 13:38 ` Cao jin
2016-12-01 4:04 ` Alex Williamson
2016-12-01 4:51 ` Michael S. Tsirkin
2016-12-01 13:40 ` Cao jin
2016-12-06 3:46 ` Michael S. Tsirkin
2016-12-06 6:47 ` Cao jin
2016-12-01 13:40 ` Cao jin
2016-12-01 14:55 ` Alex Williamson
2016-12-04 12:16 ` Cao jin
2016-12-04 15:30 ` Alex Williamson
2016-12-05 5:52 ` Cao jin
2016-12-05 16:17 ` Alex Williamson
2016-12-06 3:55 ` Michael S. Tsirkin
2016-12-06 4:59 ` Alex Williamson
2016-12-06 10:46 ` Cao jin
2016-12-06 15:35 ` Alex Williamson
2016-12-07 2:49 ` Cao jin
2016-12-08 14:46 ` Cao jin
2016-12-08 16:30 ` Michael S. Tsirkin
2016-12-09 3:40 ` Cao jin
2016-12-09 3:40 ` Cao jin [this message]
2016-12-06 6:11 ` Cao jin
2016-12-06 15:25 ` Alex Williamson
2016-12-07 2:58 ` Cao jin
2016-12-12 13:49 ` Cao jin
2016-12-12 19:12 ` Alex Williamson
2016-12-12 22:29 ` Michael S. Tsirkin
2016-12-12 22:43 ` Alex Williamson
2016-12-13 3:15 ` Michael S. Tsirkin
2016-12-13 3:39 ` Alex Williamson
2016-12-13 16:12 ` Michael S. Tsirkin
2016-12-13 16:27 ` Alex Williamson
2016-12-14 1:58 ` Michael S. Tsirkin
2016-12-14 3:00 ` Alex Williamson
2016-12-14 22:20 ` Michael S. Tsirkin
2016-12-14 22:47 ` Alex Williamson
2016-12-14 23:00 ` Michael S. Tsirkin
2016-12-14 23:32 ` Alex Williamson
2016-12-14 10:24 ` Cao jin
2016-12-14 22:16 ` Alex Williamson
2016-12-14 22:25 ` Michael S. Tsirkin
2016-12-14 22:49 ` Alex Williamson
2016-12-15 13:56 ` Cao jin
2016-12-15 14:50 ` Michael S. Tsirkin
2016-12-15 22:01 ` Alex Williamson
2016-12-16 10:15 ` Cao jin
2016-12-16 10:15 ` Cao jin
2016-12-15 17:02 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=584A27C2.6030407@cn.fujitsu.com \
--to=caoj.fnst@cn.fujitsu.com \
--cc=alex.williamson@redhat.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).