linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	<linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	<izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Fri, 9 Dec 2016 11:40:50 +0800	[thread overview]
Message-ID: <584A27C2.6030407@cn.fujitsu.com> (raw)
In-Reply-To: <58497263.7080500@cn.fujitsu.com>



On 12/08/2016 10:46 PM, Cao jin wrote:
> 
> 
> On 12/06/2016 11:35 PM, Alex Williamson wrote:
>> On Tue, 6 Dec 2016 18:46:04 +0800
>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>
>>> On 12/06/2016 12:59 PM, Alex Williamson wrote:
>>>> On Tue, 6 Dec 2016 05:55:28 +0200
>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>   
>>>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:  
>>>>>> If you're going to take the lead for these AER patches, I would
>>>>>> certainly suggest that understanding the reasoning behind the bus reset
>>>>>> behavior is a central aspect to this series.  This effort has dragged
>>>>>> out for nearly two years and I apologize, but I don't really have a lot
>>>>>> of patience for rehashing some of these issues if you're not going to
>>>>>> read the previous discussions or consult with your colleagues to
>>>>>> understand how we got to this point.  If you want to challenge some of
>>>>>> the design points, that's great, it could use some new eyes, but please
>>>>>> understand how we got here first.    
>>>>>
>>>>> Well I'm guessing Cao jin here isn't the only one not
>>>>> willing to plough through all historical versions of the patchset
>>>>> just to figure out the motivation for some code.
>>>>>
>>>>> Including a summary of a high level architecture couldn't hurt.
>>>>>
>>>>> Any chance of writing such?  Alternatively, we can try to build it as
>>>>> part of this thread.  Shouldn't be hard as it seems somewhat
>>>>> straight-forward on the surface:
>>>>>
>>>>> - detect link error on the host, don't reset link as we would normally do  
>>>>
>>>> This is actually a new approach that I'm not sure I agree with.  By
>>>> skipping the host directed link reset, vfio is taking responsibility
>>>> for doing this, but then we just assume the user will do it.  I have
>>>> issues with this.
>>>>
>>>> The previous approach was to use the error detected notifier to block
>>>> access to the device, allowing the host to perform the link reset.  A
>>>> subsequent notification in the AER process released the user access
>>>> which allowed the user AER process to proceed.  This did result in both
>>>> a host directed and a guest directed link reset, but other than
>>>> coordinating the blocking of the user process during host reset, that
>>>> hasn't been brought up as an issue previously.
>>>>   
>>>
>>> Tests on previous versions didn't bring up issues as I find, I think
>>> that is because we didn't test it completely. As I know, before August
>>> of this year, we didn't have cable connected to NIC, let alone
>>> connecting NIC to gateway.
>>
>> Lack of testing has been a significant issue throughout the development
>> of this series.
>>
>>> Even if I fixed the guest oops issue in igb driver that Alex found in
>>> v9, v9 still cannot work in my test. And in my test, disable link
>>> reset(in host) in aer core for vfio-pci is the most significant step to
>>> get my test passed.
>>
>> But is it the correct step?  I'm not convinced.  Why did blocking guest
>> access not work?  How do you plan to manage vfio taking the
>> responsibility to perform a bus reset when you don't know whether QEMU
>> is the user of the device or whether the user supports AER recovery?
>>  
> 
> Maybe currently we don't have enough proof to prove the correctness, but
> I think I did find some facts to prove that link reset in host is a big
> trouble, and can answer part of questions above.
> 
> 1st, some thoughts:
> In pci-error-recovery.txt and do_recovery() of kernel tree, we can see,
> a recovery consists of several steps(callbacks), link reset is one of
> them, and except link reset, the others are seems kind of device
> specific. In our case, both host & guest will do recovery, I think the
> host recovery actually is some kind of fake recovery, see vfio-pci
> driver's error_detected & resume callback, they don't do anything
> special, mainly signal error to user, but the link reset in host "fake
> reset" does some serious work, in other words, I think host does the
> recovery incompletely, so I was thinking, why not just drop incompletely
> host recovery(drop link reset) for vfio-pci, and let the guest take care
> of the whole serious recovery.  This is part of the reason of why my
> version looks like this.  But yes, I admit the issue Alex mentioned,
> vfio can't guarantee that user will do a bus reset, this is an issue I
> will keep looking for a solution.
> 
> 2nd, some facts and analyzation from test:
> In host, the relationship between time and behviour in each component
> roughly looks as following:
> 
>      +   HW    +  host kernel   +     qemu      + guest kernel  +
>      |         |(error recovery)|               |               |
>      |         |                |               |               |
>      |         | vfio-pci's     |               |               |
>      |         | error_detected |               |               |
>      |         | +              |               |               |
>      |         | |              |               |               |
>      |         | | error notify |               |               |
>      |         | | via eventfd  |               |               |
>      |         | +---------------> +----------+ |               |
>      |         |                |  +vfio_err_ | |               |
>      |         |                |  |notifier_ | |               |
>      | +---- +<---+link reset   |  |handler   | |               |
>      | | HW  | |                |  |          | |               |       
>   | |     | |                |             | |               |
>      | |r    | | vfio-pci's     |  |pass aer  | |               |
>      | |e..  | | resume         |  |to guest  | |               |
>      | |s.   | | (blocking end) |  |          | |               |
>      | |e    | |                |  |   *2*    | |               |
>      | |t    | |                |  +----+-----+ |               |
>      | |i    | |                |       |       |               |
>      | |n    | |                |       +--------> +----------+ |
>      | |g    | |                |               |  | guest    | |
>      | |     | |                |               |  | recovery | |
>      | |     | |                |               |  | process  | |
>      | |     | |                |               |  |(include  | |
>      | |     | |                |               |  |register  | |
>      | | *1* | |                |               |  |access)   | |
>      | |     | |                |               |  |          | |
>      | |     | |                |               |  |   *3*    | |
>      |         |                |               |  +----------+ |
> Time |         |                |               |               |
>      |         |                |               |               |
>      |         |                |               |               |
>      |         |                |               |               |
>      v         |                |               |               |
> 
> Now let me try to answer: Why did blocking guest access not work?
> Some important factor:
> 1. host recovery doesn't do anything special except error notifying, so
> it may be executed very fast.
> 2. Hardware resetting time is not sure, from pcie spec 6.6.1, guessing
> it need many ms, pretty long?
> 
> some facts found in v9(block config write, not read, during host
> recovery) test:
> 1. reading uncor error register in vfio_err_notifier_handler sometimes
> returns correct value, sometimes return invalid value(All F's)
> 

I forget another facts:
2. the igb bug[*] our patch v9 triggered also is the same kind things.
error_detected() of igb in guest read config space, and return invalid
all F's, then igb NULLed its hardware address, then oops shows.

I think it is also a concurrent issue as described below.

[*] http://patchwork.ozlabs.org/patch/692171

> So, I am thinking, if host blocking on host end early, and *2*, *3* is
> parallel with *1*, the way used in v9 to blocking guest access, may not
> work.
> 

-- 
Sincerely,
Cao jin

  parent reply	other threads:[~2016-12-09  7:47 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-27 11:34 [PATCH] vfio/pci: Support error recovery Cao jin
2016-11-28  3:00 ` Michael S. Tsirkin
2016-11-28  9:32   ` Cao jin
2016-11-30  1:46     ` Michael S. Tsirkin
2016-12-01 13:38       ` Cao jin
2016-12-01  4:04 ` Alex Williamson
2016-12-01  4:51   ` Michael S. Tsirkin
2016-12-01 13:40     ` Cao jin
2016-12-06  3:46       ` Michael S. Tsirkin
2016-12-06  6:47         ` Cao jin
2016-12-01 13:40   ` Cao jin
2016-12-01 14:55     ` Alex Williamson
2016-12-04 12:16       ` Cao jin
2016-12-04 15:30         ` Alex Williamson
2016-12-05  5:52           ` Cao jin
2016-12-05 16:17             ` Alex Williamson
2016-12-06  3:55               ` Michael S. Tsirkin
2016-12-06  4:59                 ` Alex Williamson
2016-12-06 10:46                   ` Cao jin
2016-12-06 15:35                     ` Alex Williamson
2016-12-07  2:49                       ` Cao jin
2016-12-08 14:46                       ` Cao jin
2016-12-08 16:30                         ` Michael S. Tsirkin
2016-12-09  3:40                           ` Cao jin
2016-12-09  3:40                         ` Cao jin [this message]
2016-12-06  6:11               ` Cao jin
2016-12-06 15:25                 ` Alex Williamson
2016-12-07  2:58                   ` Cao jin
2016-12-12 13:49 ` Cao jin
2016-12-12 19:12   ` Alex Williamson
2016-12-12 22:29     ` Michael S. Tsirkin
2016-12-12 22:43       ` Alex Williamson
2016-12-13  3:15         ` Michael S. Tsirkin
2016-12-13  3:39           ` Alex Williamson
2016-12-13 16:12             ` Michael S. Tsirkin
2016-12-13 16:27               ` Alex Williamson
2016-12-14  1:58                 ` Michael S. Tsirkin
2016-12-14  3:00                   ` Alex Williamson
2016-12-14 22:20                     ` Michael S. Tsirkin
2016-12-14 22:47                       ` Alex Williamson
2016-12-14 23:00                         ` Michael S. Tsirkin
2016-12-14 23:32                           ` Alex Williamson
2016-12-14 10:24     ` Cao jin
2016-12-14 22:16       ` Alex Williamson
2016-12-14 22:25         ` Michael S. Tsirkin
2016-12-14 22:49           ` Alex Williamson
2016-12-15 13:56         ` Cao jin
2016-12-15 14:50           ` Michael S. Tsirkin
2016-12-15 22:01             ` Alex Williamson
2016-12-16 10:15               ` Cao jin
2016-12-16 10:15             ` Cao jin
2016-12-15 17:02           ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=584A27C2.6030407@cn.fujitsu.com \
    --to=caoj.fnst@cn.fujitsu.com \
    --cc=alex.williamson@redhat.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).