All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Yishai Hadas <yishaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	bhelgaas@google.com, saeedm@nvidia.com,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org,
	netdev@vger.kernel.org, kuba@kernel.org, leonro@nvidia.com,
	kwankhede@nvidia.com, mgurtovoy@nvidia.com, maorg@nvidia.com,
	Cornelia Huck <cohuck@redhat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices
Date: Tue, 26 Oct 2021 15:51:21 +0100	[thread overview]
Message-ID: <YXgV6ehhsSlydiEl@work-vm> (raw)
In-Reply-To: <20211026082920.1f302a45.alex.williamson@redhat.com>

* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Mon, 25 Oct 2021 19:47:29 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Alex Williamson (alex.williamson@redhat.com) wrote:
> > > On Mon, 25 Oct 2021 17:34:01 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Alex Williamson (alex.williamson@redhat.com) wrote:  
> > > > > [Cc +dgilbert, +cohuck]
> > > > > 
> > > > > On Wed, 20 Oct 2021 11:28:04 +0300
> > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > >     

<snip>

> > > In a way.  We're essentially recognizing that we cannot stop a single
> > > device in isolation of others that might participate in peer-to-peer
> > > DMA with that device, so we need to make a pass to quiesce each device
> > > before we can ask the device to fully stop.  This new device state bit
> > > is meant to be that quiescent point, devices can accept incoming DMA
> > > but should cease to generate any.  Once all device are quiesced then we
> > > can safely stop them.  
> > 
> > It may need some further refinement; for example in that quiesed state
> > do counters still tick? will a NIC still respond to packets that don't
> > get forwarded to the host?
> 
> I'd think no, but I imagine it's largely device specific to what extent
> a device can be fully halted yet minimally handle incoming DMA.

That's what worries me; we're adding a new state here as we understand
more about trying to implement a device; but it seems that we need to
nail something down as to what the state means.

> > Note I still think you need a way to know when you have actually reached
> > these states; setting a bit in a register is asking nicely for a device
> > to go into a state - has it got there?
> 
> It's more than asking nicely, we define the device_state bits as
> synchronous, the device needs to enter the state before returning from
> the write operation or return an errno.

I don't see how it can be synchronous in practice; can it really wait to
complete if it has to take many cycles to finish off an inflight DMA
before it transitions?

> > > > Now, you could be a *little* more sloppy; you could allow a device carry
> > > > on doing stuff purely with it's own internal state up until the point
> > > > it needs to serialise; but that would have to be strictly internal state
> > > > only - if it can change any other devices state (or issue an interrupt,
> > > > change RAM etc) then you get into ordering issues on the serialisation
> > > > of multiple devices.  
> > > 
> > > Yep, that's the proposal that doesn't require a uAPI change, we loosen
> > > the definition of stopped to mean the device can no longer generate DMA
> > > or interrupts and all internal processing outside or responding to
> > > incoming DMA should halt (essentially the same as the new quiescent
> > > state above).  Once all devices are in this state, there should be no
> > > incoming DMA and we can safely collect per device migration data.  If
> > > state changes occur beyond the point in time where userspace has
> > > initiated the collection of migration data, drivers have options for
> > > generating errors when userspace consumes that data.  
> > 
> > How do you know that last device has actually gone into that state?
> 
> Each device cannot, the burden is on the user to make sure all devices
> are stopped before proceeding to read migration data.

Yeh this really ties to the previous question; if it's synchronous
you're OK.

> > Also be careful; it feels much more delicate where something might
> > accidentally start a transaction.
> 
> This sounds like a discussion of theoretically broken drivers.  Like
> the above device_state, drivers still have a synchronization point when
> the user reads the pending_bytes field to initiate retrieving the
> device state.  If the implementation requires the device to be fully
> stopped to snapshot the device state to provide to the user, this is
> where that would happen.  Thanks,

Yes, but I worry that some ways of definining it are harder to get right
in drivers, so less likely to be theoretical.

Dave

> Alex
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


  reply	other threads:[~2021-10-26 14:51 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-19 10:58 [PATCH V2 mlx5-next 00/14] Add mlx5 live migration driver Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 01/14] PCI/IOV: Add pci_iov_vf_id() to get VF index Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 02/14] net/mlx5: Reuse exported virtfn index function call Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 03/14] net/mlx5: Disable SRIOV before PF removal Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 04/14] PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 05/14] net/mlx5: Expose APIs to get/put the mlx5 core device Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 06/14] vdpa/mlx5: Use mlx5_vf_get_core_dev() to get PF device Yishai Hadas
2021-10-19 11:16   ` Max Gurtovoy
2021-10-20  8:58     ` Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 07/14] vfio: Fix VFIO_DEVICE_STATE_SET_ERROR macro Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 08/14] vfio: Add a macro for VFIO_DEVICE_STATE_ERROR Yishai Hadas
2021-10-19 15:48   ` Alex Williamson
2021-10-19 15:50     ` Alex Williamson
2021-10-20  7:35       ` Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 09/14] vfio/pci_core: Make the region->release() function optional Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 10/14] net/mlx5: Introduce migration bits and structures Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 11/14] vfio/mlx5: Expose migration commands over mlx5 device Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices Yishai Hadas
2021-10-19 18:43   ` Alex Williamson
2021-10-19 19:23     ` Jason Gunthorpe
2021-10-19 20:58       ` Alex Williamson
2021-10-19 23:04         ` Jason Gunthorpe
2021-10-20  8:28           ` Yishai Hadas
2021-10-20 16:52             ` Alex Williamson
2021-10-20 18:59               ` Jason Gunthorpe
2021-10-20 21:07                 ` Alex Williamson
2021-10-21  9:34                   ` Cornelia Huck
2021-10-21 21:47                     ` Alex Williamson
2021-10-25 12:29                       ` Jason Gunthorpe
2021-10-25 14:28                         ` Alex Williamson
2021-10-25 14:56                           ` Jason Gunthorpe
2021-10-26 14:42                             ` Alex Williamson
2021-10-26 15:18                               ` Jason Gunthorpe
2021-10-26 19:50                                 ` Alex Williamson
2021-10-26 23:43                                   ` Jason Gunthorpe
2021-10-27 19:05                                     ` Alex Williamson
2021-10-27 19:23                                       ` Jason Gunthorpe
2021-10-28 15:08                                         ` Cornelia Huck
2021-10-29  0:26                                           ` Jason Gunthorpe
2021-10-29  7:35                                             ` Yishai Hadas
2021-10-28 15:30                                         ` Alex Williamson
2021-10-28 23:47                                           ` Jason Gunthorpe
2021-10-29  6:57                                             ` Cornelia Huck
2021-10-29  7:48                                               ` Yishai Hadas
2021-10-29 10:32                                             ` Shameerali Kolothum Thodi
2021-10-29 12:15                                               ` Jason Gunthorpe
2021-10-29 22:06                                             ` Alex Williamson
2021-11-01 17:25                                               ` Jason Gunthorpe
2021-11-02 11:19                                                 ` Shameerali Kolothum Thodi
2021-11-02 14:56                                                 ` Alex Williamson
2021-11-02 15:54                                                   ` Jason Gunthorpe
2021-11-02 16:22                                                     ` Alex Williamson
2021-11-02 16:36                                                       ` Jason Gunthorpe
2021-11-02 20:15                                                         ` Alex Williamson
2021-11-03 12:09                                                           ` Jason Gunthorpe
2021-11-03 15:44                                                             ` Alex Williamson
2021-11-03 16:10                                                               ` Jason Gunthorpe
2021-11-03 18:04                                                                 ` Alex Williamson
2021-11-04 11:19                                                                   ` Cornelia Huck
2021-11-05 16:53                                                                     ` Cornelia Huck
2021-11-16 16:59                                                                       ` Cornelia Huck
2021-11-05 13:24                                                                   ` Jason Gunthorpe
2021-11-05 15:31                                                                     ` Alex Williamson
2021-11-15 23:29                                                                       ` Jason Gunthorpe
2021-11-16 17:57                                                                         ` Alex Williamson
2021-11-16 19:25                                                                           ` Jason Gunthorpe
2021-11-16 21:10                                                                             ` Alex Williamson
2021-11-17  1:48                                                                               ` Jason Gunthorpe
2021-11-18 18:15                                                                                 ` Alex Williamson
2021-11-22 19:18                                                                                   ` Jason Gunthorpe
2021-11-08  8:53                                 ` Tian, Kevin
2021-11-08 12:35                                   ` Jason Gunthorpe
2021-11-09  0:58                                     ` Tian, Kevin
2021-11-09 12:45                                       ` Jason Gunthorpe
2021-10-25 16:34               ` Dr. David Alan Gilbert
2021-10-25 17:55                 ` Alex Williamson
2021-10-25 18:47                   ` Dr. David Alan Gilbert
2021-10-25 19:15                     ` Jason Gunthorpe
2021-10-26  8:40                       ` Dr. David Alan Gilbert
2021-10-26 12:13                         ` Jason Gunthorpe
2021-10-26 14:52                           ` Alex Williamson
2021-10-26 15:56                             ` Jason Gunthorpe
2021-10-26 14:29                     ` Alex Williamson
2021-10-26 14:51                       ` Dr. David Alan Gilbert [this message]
2021-10-26 15:25                         ` Jason Gunthorpe
2021-10-20  8:01     ` Yishai Hadas
2021-10-20 16:25       ` Jason Gunthorpe
2021-10-21 10:46         ` Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 13/14] vfio/pci: Expose vfio_pci_aer_err_detected() Yishai Hadas
2021-10-19 10:58 ` [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler Yishai Hadas
2021-10-19 18:55   ` Alex Williamson
2021-10-19 19:10     ` Jason Gunthorpe
2021-10-20  8:46       ` Yishai Hadas
2021-10-20 16:46         ` Jason Gunthorpe
2021-10-20 17:45           ` Alex Williamson
2021-10-20 18:57             ` Jason Gunthorpe
2021-10-20 21:38               ` Alex Williamson
2021-10-21 10:39             ` Yishai Hadas
2021-11-17 16:42 ` vfio migration discussions (was: [PATCH V2 mlx5-next 00/14] Add mlx5 live migration driver) Cornelia Huck
2021-11-17 17:47   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXgV6ehhsSlydiEl@work-vm \
    --to=dgilbert@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=cohuck@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=leonro@nvidia.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=maorg@nvidia.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.