From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>, <mst@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
<izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Wed, 14 Dec 2016 18:24:23 +0800
Message-ID: <58511DD7.8040508@cn.fujitsu.com>
In-Reply-To: <20161212121216.1c385d65@t450s.home>
Sorry for the late reply.
After reading all your comments, I think I will try solution 1.
On 12/13/2016 03:12 AM, Alex Williamson wrote:
> On Mon, 12 Dec 2016 21:49:01 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>
>> Hi,
>> I have come up with 2 solutions (high-level designs); please see whether
>> they are acceptable, and if so, which one. I also have some questions.
>>
>> 1. Block guest access during host recovery
>>
>> Add a new field, error_recovering, to struct vfio_pci_device to
>> indicate host recovery status. The AER driver in the host will still
>> do the link reset.
>>
>> - Set error_recovering in the vfio-pci driver's error_detected
>> callback; it is used to block all kinds of user access (config
>> space, MMIO).
>> - To solve the race between device reset and user access, check the
>> device state[*] in the vfio-pci driver's resume callback to see
>> whether the device reset is done. If it is, clear error_recovering;
>> otherwise, start a timer that checks the device state periodically
>> until the reset is done. (What if the device reset doesn't finish
>> for a long time?)
>> - In QEMU, translate a guest link reset into a host link reset.
>> A question here: we already have a link reset in the host, so is a
>> second link reset necessary? Why?
>>
>> [*] How to check the device state: read a certain config space
>> register and check whether the returned value is valid (not all F's).
>
> Isn't this exactly the path we were on previously?
Yes, it is basically the previous path, plus the optimization.
> There might be an
> optimization that we could skip back-to-back resets, but how can you
> necessarily infer that the resets are for the same thing? If the user
> accesses the device between resets, can you still guarantee the guest
> directed reset is unnecessary? If time passes between resets, do you
> know they're for the same event? How much time can pass between the
> host and guest reset to know they're for the same event? In the
> process of error handling, which is more important, speed or
> correctness?
>
I think the vfio driver itself won't know what each reset is for, and I
don't quite understand why vfio should care about this question. Is this
a new question in the design?
But I agree that user access between the 2 resets may be a problem for
guest recovery; a misbehaving user could do things we cannot anticipate.
Correctness is more important.
If I understand you correctly, to summarize: host recovery only does a
link reset, which is incomplete, so we had better do a complete guest
recovery for correctness.
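For illustration, the two mechanisms solution 1 relies on (a flag that gates user access while host recovery runs, and a device-state check that treats an all-F's config read as "still in reset") can be sketched as a small user-space simulation. This is a hedged sketch, not vfio-pci code: struct vdev_sim, sim_config_read32() and the other sim_* helpers are hypothetical names, the config read is simulated, and the periodic timer is replaced by a bounded polling loop, which is only one possible answer to the open question of a reset that never finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_pci_device, reduced to the
 * error_recovering flag proposed in solution 1. */
struct vdev_sim {
	bool error_recovering;  /* set in error_detected, cleared when the device answers again */
	int reads_until_ready;  /* simulation only: reads that still return all F's */
};

/* Simulated 32-bit config read of the vendor/device ID register.
 * While the link is still in reset, the read completes with all F's. */
static uint32_t sim_config_read32(struct vdev_sim *v)
{
	if (v->reads_until_ready > 0) {
		v->reads_until_ready--;
		return 0xFFFFFFFFu;
	}
	return 0x10C98086u; /* arbitrary example ID */
}

/* Device-state check from footnote [*]: anything but all F's is valid. */
static bool device_responds(struct vdev_sim *v)
{
	return sim_config_read32(v) != 0xFFFFFFFFu;
}

/* Gate checked by every user access path (config space, MMIO).
 * Returns 0 if access may proceed, -1 while recovery is in progress
 * (kernel code would return an errno such as -EAGAIN instead). */
static int user_access_allowed(const struct vdev_sim *v)
{
	return v->error_recovering ? -1 : 0;
}

/* error_detected: start blocking user access. */
static void sim_error_detected(struct vdev_sim *v)
{
	v->error_recovering = true;
}

/* resume: poll at most max_polls times. A kernel implementation would
 * re-arm a timer between polls rather than loop. Returns true once the
 * device answers and access is unblocked, false if it is still silent. */
static bool sim_resume(struct vdev_sim *v, int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (device_responds(v)) {
			v->error_recovering = false;
			return true;
		}
	}
	return false; /* still unresponsive: access stays blocked */
}
```

For example, with reads_until_ready = 3, sim_resume(&v, 2) returns false and leaves access blocked, while a later sim_resume(&v, 5) succeeds on its second poll. What the driver should do when polling gives up is exactly the question left open in the proposal.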
>> 2. Skip the link reset in the host kernel's AER driver for vfio-pci,
>> and let the user decide how to do a serious recovery.
>>
>> Add a new field "user_driver" to struct pci_dev, used to skip the link
>> reset for vfio-pci; add a new field "link_reset" to struct
>> vfio_pci_device to indicate whether the link has been reset during
>> recovery.
>>
>> - Set user_driver in vfio_pci_probe() to skip the link reset for
>> vfio-pci in the host.
>> - (Using a flag) block user access (config, MMIO) during host recovery
>> (not sure if this step is necessary).
>> - In QEMU, translate a guest link reset into a host link reset.
>> - In the vfio-pci driver, set link_reset after VFIO_DEVICE_PCI_HOT_RESET
>> is executed.
>> - In the vfio-pci driver's resume callback, start a timer that checks
>> the "link_reset" field periodically. If it is set within a reasonable
>> time, clear it and delete the timer; otherwise, the vfio-pci driver
>> does the link reset itself.
>
> What happens in the case of a multifunction device where each function
> is part of a separate IOMMU group and one function is hot-removed from
> the user? We can't do a link reset on that function since the other
> function is still in use. We have no choice but release a device in an
> unknown state back to the host.
By "hot-removed from the user", do you mean, for example, that all
functions are assigned to the VM, and then someone suddenly does
something like the following
$ echo 0000:06:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
$ echo 0000:06:00.0 > /sys/bus/pci/drivers/igb/bind
to return the device to its host driver, or doesn't bind it to any host
driver and leaves it in a driver-less state?
> As previously discussed, we don't
> expect that any sort of function-level FLR will necessarily reset the
> device to the same state. I also don't really like vfio-pci taking
> over error handling capabilities from the PCI-core. That's redundant
> code and extra maintenance overhead.
>
I understand the concern, so I suppose solution 1 is preferred.
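To make the trade-off concrete, the resume-time fallback from solution 2 and the multifunction objection above can be folded into one decision function. This is only a hypothetical sketch: the enum, struct sim_function and the sim_* names are invented for illustration, time is passed in as plain milliseconds, and "owner" stands in for whichever user (or host driver) currently holds each function behind the link. It shows that the "vfio-pci does the link reset itself" branch is only available when every function behind the link belongs to the same user.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of one check by the hypothetical resume-time timer in solution 2. */
enum recovery_action {
	RECOVERY_DONE,        /* user issued VFIO_DEVICE_PCI_HOT_RESET in time */
	RECOVERY_WAIT,        /* deadline not reached yet: re-arm the timer */
	RECOVERY_LINK_RESET,  /* deadline passed: vfio-pci resets the link itself */
	RECOVERY_UNKNOWN      /* deadline passed but the link is shared: no safe reset */
};

/* One function behind the link that would be reset. */
struct sim_function {
	int owner; /* hypothetical ID of the user or host driver holding it */
};

/* A link reset hits every function below the link, so it is only safe
 * when all of them are held by the same user. */
static bool sim_link_reset_allowed(const struct sim_function *fns, size_t n,
				   int owner)
{
	assert(fns != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (fns[i].owner != owner)
			return false;
	return true;
}

static enum recovery_action sim_check_link_reset(bool link_reset,
						  long elapsed_ms,
						  long deadline_ms,
						  const struct sim_function *fns,
						  size_t n, int owner)
{
	if (link_reset)
		return RECOVERY_DONE;
	if (elapsed_ms < deadline_ms)
		return RECOVERY_WAIT;
	return sim_link_reset_allowed(fns, n, owner) ? RECOVERY_LINK_RESET
						     : RECOVERY_UNKNOWN;
}
```

The RECOVERY_UNKNOWN case is the situation described above: when another user still holds a function on the same link, the fallback has no safe reset to perform, and the released function goes back to the host in an unknown state.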
--
Sincerely,
Cao jin
>> A quick question:
>> I don't know how devices are divided into IOMMU groups. Is it possible
>> for functions in a multi-function device to be split into different groups?
>
> Yes, if a multifunction device supports ACS or if we have quirks to
> expose that the functions do not perform internal peer-to-peer, then
> they may be in separate IOMMU groups, depending on the rest of the PCI
> topology. See:
>
> http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
>
> Thanks,
> Alex
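Whether the functions of a given device actually landed in separate IOMMU groups can be checked from user space: when the host has an IOMMU enabled, each PCI device directory in sysfs has an iommu_group symlink pointing at /sys/kernel/iommu_groups/<N>. The sketch below reads that link with readlink(2) and extracts <N>. The helper names iommu_group_from_target() and pci_iommu_group() are hypothetical, and the BDFs mentioned afterwards are only examples.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Parse the group number from a link target such as
 * "../../../kernel/iommu_groups/13". Returns -1 if it does not parse. */
static long iommu_group_from_target(const char *target)
{
	assert(target != NULL);
	const char *slash = strrchr(target, '/');
	const char *num = slash ? slash + 1 : target;
	char *end;

	if (*num == '\0')
		return -1;
	errno = 0;
	long group = strtol(num, &end, 10);
	if (errno != 0 || *end != '\0' || group < 0)
		return -1;
	return group;
}

/* Look up the IOMMU group of a PCI device given its BDF, e.g. "0000:06:00.0".
 * Returns -1 if there is no iommu_group link (no IOMMU, unknown BDF, ...). */
static long pci_iommu_group(const char *bdf)
{
	char path[256], target[256];
	ssize_t len;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/iommu_group", bdf);
	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0'; /* readlink() does not NUL-terminate */
	return iommu_group_from_target(target);
}
```

For example, comparing pci_iommu_group("0000:06:00.0") with pci_iommu_group("0000:06:00.1") shows whether the two functions of that device share a group. The same information is visible from the shell with readlink /sys/bus/pci/devices/0000:06:00.0/iommu_group.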