From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>, <mst@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	<izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Wed, 14 Dec 2016 18:24:23 +0800
Message-ID: <58511DD7.8040508@cn.fujitsu.com>
In-Reply-To: <20161212121216.1c385d65@t450s.home>

Sorry for the late reply.
After reading all your comments, I think I will try solution 1.

On 12/13/2016 03:12 AM, Alex Williamson wrote:
> On Mon, 12 Dec 2016 21:49:01 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> Hi,
>> I have come up with 2 solutions (high-level design); please see whether
>> they are acceptable, or which one is acceptable. I also have some questions.
>>
>> 1. block guest access during host recovery
>>
>>    add a new field error_recovering in struct vfio_pci_device to
>>    indicate the host recovery status. The AER driver in the host will
>>    still do the link reset
>>
>>    - set error_recovering in vfio-pci driver's error_detected, used to
>>      block all kinds of user access (config space, mmio)
>>    - to handle the race between device reset and user access, check
>>      the device state[*] in vfio-pci driver's resume: if the device
>>      reset is done, clear "error_recovering"; otherwise start a timer
>>      and check the device state periodically until the device reset
>>      is done. (what if the device reset doesn't end for a long time?)
>>    - In qemu, translate guest link reset to host link reset.
>>      A question here: we already have link reset in host, is a second
>>      link reset necessary? why?
>>  
>>    [*] how to check device state: read a certain config space
>>        register and check whether the return value is valid (not all F's)
> 
> Isn't this exactly the path we were on previously?

Yes, it is basically the previous path, plus the optimization.
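
To make the flow of solution 1 concrete, here is a minimal userspace
sketch, not kernel code. Everything in it is hypothetical and exists only
for illustration: the struct fake_vfio_pci_device model, the model_*
helpers, the fake_hw stand-in for the device, and the choice of -EAGAIN
for a blocked access. The one hardware fact it relies on is that a vendor
ID read of 0xFFFF means the device is not answering config reads.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for hardware: answers all F's until "reset" ends. */
struct fake_hw {
	int reads_until_ready;
};

uint16_t fake_read_vendor_id(void *hw)
{
	struct fake_hw *h = hw;

	if (h->reads_until_ready > 0) {
		h->reads_until_ready--;
		return 0xFFFF;	/* device not responding yet */
	}
	return 0x8086;		/* any valid vendor ID would do */
}

/* Hypothetical model; NOT the real struct vfio_pci_device. */
struct fake_vfio_pci_device {
	bool error_recovering;			/* host recovery in progress */
	uint16_t (*read_vendor_id)(void *hw);	/* models a config space read */
	void *hw;
};

/* [*] A read returning all F's is taken to mean the reset is not done. */
bool device_reset_done(struct fake_vfio_pci_device *vdev)
{
	assert(vdev && vdev->read_vendor_id);
	return vdev->read_vendor_id(vdev->hw) != 0xFFFF;
}

/* Models error_detected: start blocking user access. */
void model_error_detected(struct fake_vfio_pci_device *vdev)
{
	vdev->error_recovering = true;
}

/*
 * Models resume (and each later timer tick): clear the flag only once the
 * device answers again. Returns false if the caller should poll again.
 */
bool model_resume(struct fake_vfio_pci_device *vdev)
{
	if (!device_reset_done(vdev))
		return false;
	vdev->error_recovering = false;
	return true;
}

/* Models a user config/mmio access; the error code is an assumption. */
int model_user_access(const struct fake_vfio_pci_device *vdev)
{
	return vdev->error_recovering ? -EAGAIN : 0;
}
```

A caller would invoke model_error_detected() when the error is reported,
then call model_resume() repeatedly (the timer in the description above)
until it returns true. The sketch does not bound the number of polls, so
the open question of a reset that never finishes remains.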

> There might be an
> optimization that we could skip back-to-back resets, but how can you
> necessarily infer that the resets are for the same thing? If the user
> accesses the device between resets, can you still guarantee the guest
> directed reset is unnecessary?  If time passes between resets, do you
> know they're for the same event?  How much time can pass between the
> host and guest reset to know they're for the same event?  In the
> process of error handling, which is more important, speed or
> correctness?
>  

I think the vfio driver itself won't know what each reset is for, and I
don't quite understand why vfio should care about this question. Is this
a new question in the design?

But I agree that user access between the 2 resets may cause trouble for
guest recovery; a misbehaving user could do things beyond our
imagination.  Correctness is more important.

If I understand you correctly, let me summarize: host recovery only does
a link reset, which is incomplete, so we had better do a complete guest
recovery for correctness.

>> 2. skip link reset in aer driver of host kernel, for vfio-pci.
>>    Let user decide how to do serious recovery
>>
>>    add new field "user_driver" in struct pci_dev, used to skip link
>>    reset for vfio-pci; add new field "link_reset" in struct
>>    vfio_pci_device to indicate link has been reset or not during
>>    recovery
>>
>>    - set user_driver in vfio_pci_probe(), to skip link reset for
>>      vfio-pci in host.
>>    - (using a flag) block user access (config, mmio) during host
>>      recovery (not sure if this step is necessary)
>>    - In qemu, translate guest link reset to host link reset.
>>    - In vfio-pci driver, set link_reset after VFIO_DEVICE_PCI_HOT_RESET
>>      is executed
>>    - In vfio-pci driver's resume, start a timer and check the
>>      "link_reset" field periodically: if it is set within a reasonable
>>      time, clear it and delete the timer; otherwise the vfio-pci
>>      driver will do the link reset itself!
> 
> What happens in the case of a multifunction device where each function
> is part of a separate IOMMU group and one function is hot-removed from
> the user? We can't do a link reset on that function since the other
> function is still in use.  We have no choice but release a device in an
> unknown state back to the host.

By hot-remove from the user, do you mean, for example, that all functions
are assigned to a VM and then someone suddenly does something like the
following

$ echo 0000:06:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind

$ echo 0000:06:00.0 > /sys/bus/pci/drivers/igb/bind

to return the device to the host driver, or does not bind it to a host
driver and leaves it in a driver-less state?

>  As previously discussed, we don't
> expect that any sort of function-level FLR will necessarily reset the
> device to the same state.  I also don't really like vfio-pci taking
> over error handling capabilities from the PCI-core.  That's redundant
> code and extra maintenance overhead.
>  

I understand the concern, so I suppose solution 1 is preferred.

-- 
Sincerely,
Cao jin

>> A quick question:
>> I don't know how devices are divided into iommu groups; is it possible
>> for functions in a multi-function device to be split into different groups?
> 
> Yes, if a multifunction device supports ACS or if we have quirks to
> expose that the functions do not perform internal peer-to-peer, then
> they may be in separate IOMMU groups, depending on the rest of the PCI
> topology.  See:
> 
> http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
> 
> Thanks,
> Alex
> 
