From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932273AbcLLWnP (ORCPT <rfc822;w@1wt.eu>);
        Mon, 12 Dec 2016 17:43:15 -0500
Received: from mx1.redhat.com ([209.132.183.28]:40172 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751697AbcLLWnO (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 12 Dec 2016 17:43:14 -0500
Date: Mon, 12 Dec 2016 15:43:13 -0700
From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>, linux-kernel@vger.kernel.org,
        kvm@vger.kernel.org, izumi.taku@jp.fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery
Message-ID: <20161212154313.2ffdf4ab@t450s.home>
In-Reply-To: <20161213002810-mutt-send-email-mst@kernel.org>
References: <1480246457-10368-1-git-send-email-caoj.fnst@cn.fujitsu.com>
        <584EAACD.9070800@cn.fujitsu.com>
        <20161212121216.1c385d65@t450s.home>
        <20161213002810-mutt-send-email-mst@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Mon, 12 Dec 2016 22:43:14 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 13 Dec 2016 00:29:42 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Dec 12, 2016 at 12:12:16PM -0700, Alex Williamson wrote:
> > On Mon, 12 Dec 2016 21:49:01 +0800
> > Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> >   
> > > Hi,
> > > I have 2 solutions(high level design) came to me, please see if they are
> > > acceptable, or which one is acceptable. Also have some questions.
> > > 
> > > 1. block guest access during host recovery
> > > 
> > >    add new field error_recovering in struct vfio_pci_device to
> > >    indicate host recovery status. aer driver in host will still do
> > >    reset link
> > > 
> > >    - set error_recovering in vfio-pci driver's error_detected, used to
> > >      block all kinds of user access(config space, mmio)
> > >    - in order to solve concurrent issue of device resetting & user
> > >      access, check device state[*] in vfio-pci driver's resume, see if
> > >      device reset is done, if it is, then clear"error_recovering", or
> > >      else new a timer, check device state periodically until device
> > >      reset is done. (what if device reset don't end for a long time?)
> > >    - In qemu, translate guest link reset to host link reset.
> > >      A question here: we already have link reset in host, is a second
> > >      link reset necessary? why?
> > >  
> > >    [*] how to check device state: reading certain config space
> > >        register, check return value is valid or not(All F's)  
> > 
> > Isn't this exactly the path we were on previously?  There might be an
> > optimization that we could skip back-to-back resets, but how can you
> > necessarily infer that the resets are for the same thing?  If the user
> > accesses the device between resets, can you still guarantee the guest
> > directed reset is unnecessary?  If time passes between resets, do you
> > know they're for the same event?  How much time can pass between the
> > host and guest reset to know they're for the same event?  In the
> > process of error handling, which is more important, speed or
> > correctness?
> >    
> > > 2. skip link reset in aer driver of host kernel, for vfio-pci.
> > >    Let user decide how to do serious recovery
> > > 
> > >    add new field "user_driver" in struct pci_dev, used to skip link
> > >    reset for vfio-pci; add new field "link_reset" in struct
> > >    vfio_pci_device to indicate link has been reset or not during
> > >    recovery
> > > 
> > >    - set user_driver in vfio_pci_probe(), to skip link reset for
> > >      vfio-pci in host.
> > >    - (use a flag)block user access(config, mmio) during host recovery
> > >      (not sure if this step is necessary)
> > >    - In qemu, translate guest link reset to host link reset.
> > >    - In vfio-pci driver, set link_reset after VFIO_DEVICE_PCI_HOT_RESET
> > >      is executed
> > >    - In vfio-pci driver's resume, new a timer, check "link_reset" field
> > >      periodically, if it is set in reasonable time, then clear it and
> > >      delete timer, or else, vfio-pci driver will does the link reset!  
> > 
> > What happens in the case of a multifunction device where each function
> > is part of a separate IOMMU group and one function is hot-removed from
> > the user?  
> 
> So just don't do it then. Topology must match between host and guest,
> except maybe for the case of devices with host driver  (e.g. PF)
> which we might be able to synchronize against.

We're talking about host-kernel-level handling here.  The host kernel
cannot defer the link reset to the user on the assumption that the
user is handling the devices in one very specific way.  The moment we do
that, we've lost.  Thanks,

Alex