From: Jason Gunthorpe <jgg@nvidia.com>
To: Yishai Hadas <yishaih@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	bhelgaas@google.com, saeedm@nvidia.com,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org,
	netdev@vger.kernel.org, kuba@kernel.org, leonro@nvidia.com,
	kwankhede@nvidia.com, mgurtovoy@nvidia.com, maorg@nvidia.com
Subject: Re: [PATCH V1 mlx5-next 12/13] vfio/pci: Add infrastructure to let vfio_pci_core drivers trap device RESET
Date: Mon, 18 Oct 2021 09:02:34 -0300
Message-ID: <20211018120234.GN2744544@nvidia.com>
In-Reply-To: <d91f729b-d547-406f-353f-04627d4e555c@nvidia.com>

On Sun, Oct 17, 2021 at 05:29:39PM +0300, Yishai Hadas wrote:
> On 10/16/2021 12:12 AM, Alex Williamson wrote:
> > On Fri, 15 Oct 2021 17:03:28 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > 
> > > On Fri, Oct 15, 2021 at 01:52:37PM -0600, Alex Williamson wrote:
> > > > On Wed, 13 Oct 2021 12:47:06 +0300
> > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > > Add infrastructure to let vfio_pci_core drivers trap device RESET.
> > > > > 
> > > > > The motivation for this is to let the underlay driver be aware that
> > > > > reset was done and set its internal state accordingly.
> > > > I think the intention of the uAPI here is that the migration error
> > > > state is exited specifically via the reset ioctl.  Maybe that should be
> > > > made more clear, but variant drivers can already wrap the core ioctl
> > > > for the purpose of determining that mechanism of reset has occurred.
> > > It is not just recovering the error state.
> > > 
> > > Any transition to reset changes the firmware state. E.g., if userspace
> > > uses one of the other emulation paths to trigger the reset after
> > > putting the device off running then the driver state and FW state
> > > become desynchronized.
> > > 
> > > So all the reset paths need to be synchronized somehow, either
> > > blocked while in non-running states or aligning the SW state with the
> > > new post-reset FW state.
> > This only catches the two flavors of FLR and the RESET ioctl itself, so
> > we've got gaps relative to "all the reset paths" anyway.  I'm also
> > concerned about adding arbitrary callbacks for every case that it gets
> > too cumbersome to write a wrapper for the existing callbacks.
> > 
> > However, why is this a vfio thing when we have the
> > pci_error_handlers.reset_done callback?  At best this ought to be
> > redundant to that.  Thanks,
> > 
> > Alex
> > 
> Alex,
> 
> How about the patch below instead?
> 
> This will centralize the 'reset_done' notifications for drivers in one place
> (i.e. pci_error_handlers.reset_done) and may close the gap that you pointed
> out.
> 
> I just followed the logic in vfio_pci_aer_err_detected() from a usage and
> locking point of view.
> 
> Do we really need to take the &vdev->igate mutex as was done there?
> 
> The next patch from the series in mlx5 will stay as in V1; it may just
> set its ops and be called upon PCI 'reset_done'.
> 
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index e581a327f90d..20bf37c00fb6 100644
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1925,6 +1925,27 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>         return PCI_ERS_RESULT_CAN_RECOVER;
>  }
> 
> +static void vfio_pci_aer_err_reset_done(struct pci_dev *pdev)
> +{
> +       struct vfio_pci_core_device *vdev;
> +       struct vfio_device *device;
> +
> +       device = vfio_device_get_from_dev(&pdev->dev);
> +       if (device == NULL)
> +               return;

Do not add new vfio_device_get_from_dev() calls; this should get the
device from pci_get_drvdata() instead.
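
A minimal userspace sketch of the drvdata approach, assuming the driver
stored its vfio_pci_core_device pointer with pci_set_drvdata() at probe
time. The struct layouts, the reset_seen field, and sketch_reset_done()
are stand-ins for illustration, not the kernel definitions:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel type; the real struct is far larger. */
struct pci_dev {
	void *driver_data;
};

/* Mock of the kernel helper: store the driver's private pointer. */
static void pci_set_drvdata(struct pci_dev *pdev, void *data)
{
	pdev->driver_data = data;
}

/* Mock of the kernel helper: fetch the pointer stored at probe time. */
static void *pci_get_drvdata(struct pci_dev *pdev)
{
	return pdev->driver_data;
}

/* Reduced stand-in; reset_seen is a hypothetical field for illustration. */
struct vfio_pci_core_device {
	struct pci_dev *pdev;
	int reset_seen;
};

/*
 * reset_done-style callback: no vfio_device_get_from_dev() lookup and no
 * matching vfio_device_put(); the state comes straight from drvdata.
 */
static void sketch_reset_done(struct pci_dev *pdev)
{
	struct vfio_pci_core_device *vdev = pci_get_drvdata(pdev);

	if (!vdev)
		return;
	vdev->reset_seen = 1;
}
```

Because nothing is looked up, there is no reference to drop on the way
out of the callback.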

> +
> +       vdev = container_of(device, struct vfio_pci_core_device, vdev);
> +
> +       mutex_lock(&vdev->igate);
> +       if (vdev->ops && vdev->ops->reset_done)
> +               vdev->ops->reset_done(vdev);
> +       mutex_unlock(&vdev->igate);
> +
> +       vfio_device_put(device);
> +
> +       return;
> +}
> +
>  int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
>  {
>         struct vfio_device *device;
> @@ -1947,6 +1968,7 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_sriov_configure);
> 
>  const struct pci_error_handlers vfio_pci_core_err_handlers = {
>         .error_detected = vfio_pci_aer_err_detected,
> +       .reset_done = vfio_pci_aer_err_reset_done,
>  };
>  EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);

Most likely mlx5vf should just implement a pci_error_handlers struct
and install vfio_pci_aer_err_detected in it.
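
A rough userspace sketch of that shape, assuming
vfio_pci_aer_err_detected() were made callable from the variant driver
(it is static in the quoted file, so it would need to be exported). The
types below are reduced stand-ins, and the mlx5vf_sketch_* names and the
needs_state_resync flag are hypothetical:

```c
#include <stddef.h>

/* Userspace stand-ins for kernel types; not the real definitions. */
typedef unsigned int pci_channel_state_t;
typedef unsigned int pci_ers_result_t;
#define PCI_ERS_RESULT_CAN_RECOVER 2u

struct pci_dev {
	void *driver_data;
};

/* Only the two callbacks discussed here; the kernel struct has more. */
struct pci_error_handlers {
	pci_ers_result_t (*error_detected)(struct pci_dev *pdev,
					   pci_channel_state_t state);
	void (*reset_done)(struct pci_dev *pdev);
};

static void *pci_get_drvdata(struct pci_dev *pdev)
{
	return pdev->driver_data;
}

/* Mock of the core handler; the real one also signals the error eventfd. */
static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
						  pci_channel_state_t state)
{
	(void)pdev;
	(void)state;
	return PCI_ERS_RESULT_CAN_RECOVER;
}

/* Hypothetical variant-driver state. */
struct mlx5vf_sketch_device {
	struct pci_dev *pdev;
	int needs_state_resync;
};

/* Driver-specific reset_done: note that SW state must be realigned. */
static void mlx5vf_sketch_reset_done(struct pci_dev *pdev)
{
	struct mlx5vf_sketch_device *mvdev = pci_get_drvdata(pdev);

	if (!mvdev)
		return;
	mvdev->needs_state_resync = 1;
}

/* The driver's own handlers: core error_detected plus its reset_done. */
static const struct pci_error_handlers mlx5vf_sketch_err_handlers = {
	.error_detected = vfio_pci_aer_err_detected,
	.reset_done = mlx5vf_sketch_reset_done,
};
```

With this layout vfio_pci_core needs no new ops callback; the variant
driver would point its pci_driver at its own handlers struct.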

Jason
