From: "Rao, Lei" <lei.rao@intel.com>
To: Max Gurtovoy <mgurtovoy@nvidia.com>,
	Christoph Hellwig <hch@lst.de>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: <kbusch@kernel.org>, <axboe@fb.com>, <kch@nvidia.com>,
	<sagi@grimberg.me>, <alex.williamson@redhat.com>,
	<cohuck@redhat.com>, <yishaih@nvidia.com>,
	<shameerali.kolothum.thodi@huawei.com>, <kevin.tian@intel.com>,
	<mjrosato@linux.ibm.com>, <linux-kernel@vger.kernel.org>,
	<linux-nvme@lists.infradead.org>, <kvm@vger.kernel.org>,
	<eddie.dong@intel.com>, <yadong.li@intel.com>,
	<yi.l.liu@intel.com>, <Konrad.wilk@oracle.com>,
	<stephen@eideticom.com>, <hang.yuan@intel.com>,
	Oren Duer <oren@nvidia.com>
Subject: Re: [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device
Date: Mon, 12 Dec 2022 09:20:09 +0800
Message-ID: <0bda7899-6790-4b24-f622-5f9c8951c239@intel.com>
In-Reply-To: <d4aeda5c-d7bb-4427-5157-fb7530dfd1fb@nvidia.com>



On 12/11/2022 10:51 PM, Max Gurtovoy wrote:
> 
> On 12/11/2022 3:21 PM, Rao, Lei wrote:
>>
>>
>> On 12/11/2022 8:05 PM, Max Gurtovoy wrote:
>>>
>>> On 12/6/2022 5:01 PM, Christoph Hellwig wrote:
>>>> On Tue, Dec 06, 2022 at 10:48:22AM -0400, Jason Gunthorpe wrote:
>>>>> Sadly, in Linux we don't have an SR-IOV VF lifecycle model that is
>>>>> any use.
>>>> Beware: the secondary function might just as well be a physical
>>>> function. In fact, one of the major customers for "smart" multifunction
>>>> NVMe devices prefers multi-PF devices over SR-IOV VFs (and all the
>>>> symmetric dual-ported devices are multi-PF as well).
>>>>
>>>> So this isn't really about a VF lifecycle, but about how to manage
>>>> live migration, especially on the receive / restore side.  And
>>>> restoring the entire controller state is extremely invasive and can't
>>>> be done on a controller that is live in any classic form.  In fact, a
>>>> lot of the state is subsystem-wide, so without some kind of
>>>> virtualization of the subsystem it is impossible to actually restore
>>>> the state.
>>>
>>> Oh, great!
>>>
>>> I read this subsystem virtualization proposal of yours after I sent my proposal for subsystem virtualization in the patch 1/5 thread.
>>> I guess this means that this is the right way to go.
>>> Let's continue brainstorming this idea. I think this can be the way to migrate NVMe controllers in a standard way.
>>>
>>>>
>>>> To cycle back to the hardware that is posted here, I'm really confused
>>>> how it actually has any chance to work, and no one has even tried
>>>> to explain how it is supposed to work.
>>>
>>> I guess in a vendor-specific implementation you can assume some of the things that we are now discussing for making this a standard.
>>
>> Yes, as I wrote in the cover letter, this is a reference implementation to
>> start a discussion and help drive standardization efforts, but the series
>> works well for the Intel IPU NVMe device. As Jason said, there are two use
>> cases: shared medium and local medium. I think live migration of a local
>> medium is complicated due to the large amount of user data that needs to
>> be migrated, and I don't have a good idea for dealing with that situation
>> yet. But for the Intel IPU NVMe device, each VF can connect to remote
>> storage via the NVMe-oF protocol to achieve storage offloading. This is
>> the shared medium case. Here we don't need to migrate the user data,
>> which significantly simplifies the work of live migration.
> 
> I don't think that medium migration should be part of the spec. We can specify that it is out of scope.
> 
> The whole idea of live migration is to have a short downtime, and I don't think we can guarantee a short downtime if we need to copy a few terabytes through the network.
> If the media copy takes a few seconds, there is no need for a live migration with a few milliseconds of downtime; just do a regular migration.
> 
>>
>> The series tries to solve the problem of live migration with a shared
>> medium. It still lacks dirty page tracking and P2P support; we are also
>> developing these features.
>>
>> About the NVMe device state: as described in my document, the VF state
>> includes the VF CSR registers, the state of every I/O queue pair, and the
>> admin queue state. During the implementation, I found that the device
>> state data is small per VF, so I decided to use the admin queue of the
>> primary controller to send the live migration commands that save and
>> restore the VF state, as mlx5 does.
> 
> I think and hope we all agree that the AdminQ of the controlling NVMe function will be used to migrate the controlled NVMe function.

Fully agree.
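
To make this concrete, here is a rough sketch of what such a command could
look like on the controlling function's admin queue. The opcodes, field
names, and layout below are illustrative placeholders only, not the values
from our patches or from any spec:

	#include <stdint.h>

	/*
	 * Hypothetical vendor-specific admin opcodes for migrating a
	 * controlled (secondary) function through the controlling
	 * function's admin queue.  Illustrative values only.
	 */
	#define NVME_ADMIN_VF_SAVE_STATE	0xc1
	#define NVME_ADMIN_VF_LOAD_STATE	0xc2

	/* Standard 64-byte NVMe submission queue entry layout. */
	struct nvme_mig_cmd {
		uint8_t		opcode;		/* save or load state */
		uint8_t		flags;
		uint16_t	command_id;
		uint32_t	nsid;		/* unused: 0 */
		uint32_t	cdw2[2];
		uint64_t	metadata;
		uint64_t	prp1;		/* host buffer for the VF state */
		uint64_t	prp2;
		uint32_t	vf_index;	/* cdw10: which controlled function */
		uint32_t	buf_len;	/* cdw11: state buffer size in bytes */
		uint32_t	cdw12[4];
	};

The save command would DMA the VF's CSR, queue-pair, and admin-queue state
into the host buffer, and the load command would replay the same buffer on
the destination.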

> 
> Which document are you referring to?

The fifth patch includes the definitions of these commands and describes
how the firmware handles them. That is the document I was referring to.
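
As a sketch of the flow it describes, the save side would roughly look like
the following; the function names are placeholders, not the actual symbols
from the patch:

	/*
	 * Rough save-side sequence, reusing the hypothetical command
	 * layout sketched earlier.  submit_pf_admin_cmd() stands in for
	 * whatever helper issues a command on the controlling function's
	 * admin queue and waits for its completion.
	 */
	int submit_pf_admin_cmd(struct nvme_mig_cmd *cmd, void *buf);

	int vf_migration_save(uint32_t vf_index, void *buf, uint32_t buf_len)
	{
		struct nvme_mig_cmd cmd = {
			.opcode   = NVME_ADMIN_VF_SAVE_STATE,
			.vf_index = vf_index,
			.buf_len  = buf_len,
			/* prp1/prp2 would map 'buf' for device DMA */
		};

		/* Quiesce the VF first so its state stops changing, then
		 * issue the save command; on completion 'buf' holds the
		 * serialized VF state that the destination replays with
		 * NVME_ADMIN_VF_LOAD_STATE. */
		return submit_pf_admin_cmd(&cmd, buf);
	}

The restore path would be symmetric: the destination PF receives the buffer
and issues the load command before the VF is resumed.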


