All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: Alex Williamson <alex.williamson@redhat.com>, <mst@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	<izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Wed, 14 Dec 2016 18:24:23 +0800	[thread overview]
Message-ID: <58511DD7.8040508@cn.fujitsu.com> (raw)
In-Reply-To: <20161212121216.1c385d65@t450s.home>

Sorry for late.
after reading all your comments, I think I will try the solution 1.

On 12/13/2016 03:12 AM, Alex Williamson wrote:
> On Mon, 12 Dec 2016 21:49:01 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
>> Hi,
>> I have 2 solutions(high level design) came to me, please see if they are
>> acceptable, or which one is acceptable. Also have some questions.
>>
>> 1. block guest access during host recovery
>>
>>    add new field error_recovering in struct vfio_pci_device to
>>    indicate host recovery status. aer driver in host will still do
>>    reset link
>>
>>    - set error_recovering in vfio-pci driver's error_detected, used to
>>      block all kinds of user access(config space, mmio)
>>    - in order to solve concurrent issue of device resetting & user
>>      access, check device state[*] in vfio-pci driver's resume, see if
>>      device reset is done, if it is, then clear"error_recovering", or
>>      else new a timer, check device state periodically until device
>>      reset is done. (what if device reset don't end for a long time?)
>>    - In qemu, translate guest link reset to host link reset.
>>      A question here: we already have link reset in host, is a second
>>      link reset necessary? why?
>>  
>>    [*] how to check device state: reading certain config space
>>        register, check return value is valid or not(All F's)
> 
> Isn't this exactly the path we were on previously?

Yes, it is basically the previous path, plus the optimization.

> There might be an
> optimization that we could skip back-to-back resets, but how can you
> necessarily infer that the resets are for the same thing? If the user
> accesses the device between resets, can you still guarantee the guest
> directed reset is unnecessary?  If time passes between resets, do you
> know they're for the same event?  How much time can pass between the
> host and guest reset to know they're for the same event?  In the
> process of error handling, which is more important, speed or
> correctness?
>  

I think vfio driver itself won't know what each reset comes for, and I
don't quite understand why should vfio care this question, is this a new
question in the design?

But I think it make sense that the user access during 2 resets maybe a
trouble for guest recovery, misbehaved user could be out of our
imagination.  Correctness is more important.

If I understand you right, let me make a summary: host recovery just
does link reset, which is incomplete, so we'd better do a complete guest
recovery for correctness.

>> 2. skip link reset in aer driver of host kernel, for vfio-pci.
>>    Let user decide how to do serious recovery
>>
>>    add new field "user_driver" in struct pci_dev, used to skip link
>>    reset for vfio-pci; add new field "link_reset" in struct
>>    vfio_pci_device to indicate link has been reset or not during
>>    recovery
>>
>>    - set user_driver in vfio_pci_probe(), to skip link reset for
>>      vfio-pci in host.
>>    - (use a flag)block user access(config, mmio) during host recovery
>>      (not sure if this step is necessary)
>>    - In qemu, translate guest link reset to host link reset.
>>    - In vfio-pci driver, set link_reset after VFIO_DEVICE_PCI_HOT_RESET
>>      is executed
>>    - In vfio-pci driver's resume, new a timer, check "link_reset" field
>>      periodically, if it is set in reasonable time, then clear it and
>>      delete timer, or else, vfio-pci driver will does the link reset!
> 
> What happens in the case of a multifunction device where each function
> is part of a separate IOMMU group and one function is hot-removed from
> the user? We can't do a link reset on that function since the other
> function is still in use.  We have no choice but release a device in an
> unknown state back to the host.

hot-remove from user, do you mean, for example, all functions assigned
to VM, then suddenly a person does something like following

$ echo 0000:06:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind

$ echo 0000:06:00.0 > /sys/bus/pci/drivers/igb/bind

to return device to host driver, or don't bind it to host driver, let it
in driver-less state???

>  As previously discussed, we don't
> expect that any sort of function-level FLR will necessarily reset the
> device to the same state.  I also don't really like vfio-pci taking
> over error handling capabilities from the PCI-core.  That's redundant
> code and extra maintenance overhead.
>  

I understand the concern, so I suppose solution 1 is preferred.

-- 
Sincerely,
Cao jin

>> A quick question:
>> I don't know how devices is divided into iommu groups, is it possible
>> for functions in a multi-function device to be split into different groups?
> 
> Yes, if a multifunction device supports ACS or if we have quirks to
> expose that the functions do not perform internal peer-to-peer, then
> they may be in separate IOMMU groups, depending on the rest of the PCI
> topology.  See:
> 
> http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
> 
> Thanks,
> Alex
> 
> 
> .
> 

  parent reply	other threads:[~2016-12-14 10:22 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-27 11:34 [PATCH] vfio/pci: Support error recovery Cao jin
2016-11-28  3:00 ` Michael S. Tsirkin
2016-11-28  9:32   ` Cao jin
2016-11-30  1:46     ` Michael S. Tsirkin
2016-12-01 13:38       ` Cao jin
2016-12-01  4:04 ` Alex Williamson
2016-12-01  4:51   ` Michael S. Tsirkin
2016-12-01 13:40     ` Cao jin
2016-12-06  3:46       ` Michael S. Tsirkin
2016-12-06  6:47         ` Cao jin
2016-12-01 13:40   ` Cao jin
2016-12-01 14:55     ` Alex Williamson
2016-12-04 12:16       ` Cao jin
2016-12-04 15:30         ` Alex Williamson
2016-12-05  5:52           ` Cao jin
2016-12-05 16:17             ` Alex Williamson
2016-12-06  3:55               ` Michael S. Tsirkin
2016-12-06  4:59                 ` Alex Williamson
2016-12-06 10:46                   ` Cao jin
2016-12-06 15:35                     ` Alex Williamson
2016-12-07  2:49                       ` Cao jin
2016-12-08 14:46                       ` Cao jin
2016-12-08 16:30                         ` Michael S. Tsirkin
2016-12-09  3:40                           ` Cao jin
2016-12-09  3:40                         ` Cao jin
2016-12-06  6:11               ` Cao jin
2016-12-06 15:25                 ` Alex Williamson
2016-12-07  2:58                   ` Cao jin
2016-12-12 13:49 ` Cao jin
2016-12-12 19:12   ` Alex Williamson
2016-12-12 22:29     ` Michael S. Tsirkin
2016-12-12 22:43       ` Alex Williamson
2016-12-13  3:15         ` Michael S. Tsirkin
2016-12-13  3:39           ` Alex Williamson
2016-12-13 16:12             ` Michael S. Tsirkin
2016-12-13 16:27               ` Alex Williamson
2016-12-14  1:58                 ` Michael S. Tsirkin
2016-12-14  3:00                   ` Alex Williamson
2016-12-14 22:20                     ` Michael S. Tsirkin
2016-12-14 22:47                       ` Alex Williamson
2016-12-14 23:00                         ` Michael S. Tsirkin
2016-12-14 23:32                           ` Alex Williamson
2016-12-14 10:24     ` Cao jin [this message]
2016-12-14 22:16       ` Alex Williamson
2016-12-14 22:25         ` Michael S. Tsirkin
2016-12-14 22:49           ` Alex Williamson
2016-12-15 13:56         ` Cao jin
2016-12-15 14:50           ` Michael S. Tsirkin
2016-12-15 22:01             ` Alex Williamson
2016-12-16 10:15               ` Cao jin
2016-12-16 10:15             ` Cao jin
2016-12-15 17:02           ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58511DD7.8040508@cn.fujitsu.com \
    --to=caoj.fnst@cn.fujitsu.com \
    --cc=alex.williamson@redhat.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.