From: "Rao, Lei" <lei.rao@intel.com>
To: Max Gurtovoy <mgurtovoy@nvidia.com>,
Christoph Hellwig <hch@lst.de>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: <kbusch@kernel.org>, <axboe@fb.com>, <kch@nvidia.com>,
<sagi@grimberg.me>, <alex.williamson@redhat.com>,
<cohuck@redhat.com>, <yishaih@nvidia.com>,
<shameerali.kolothum.thodi@huawei.com>, <kevin.tian@intel.com>,
<mjrosato@linux.ibm.com>, <linux-kernel@vger.kernel.org>,
<linux-nvme@lists.infradead.org>, <kvm@vger.kernel.org>,
<eddie.dong@intel.com>, <yadong.li@intel.com>,
<yi.l.liu@intel.com>, <Konrad.wilk@oracle.com>,
<stephen@eideticom.com>, <hang.yuan@intel.com>,
Oren Duer <oren@nvidia.com>
Subject: Re: [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device
Date: Mon, 12 Dec 2022 09:20:09 +0800
Message-ID: <0bda7899-6790-4b24-f622-5f9c8951c239@intel.com>
In-Reply-To: <d4aeda5c-d7bb-4427-5157-fb7530dfd1fb@nvidia.com>

On 12/11/2022 10:51 PM, Max Gurtovoy wrote:
>
> On 12/11/2022 3:21 PM, Rao, Lei wrote:
>>
>>
>> On 12/11/2022 8:05 PM, Max Gurtovoy wrote:
>>>
>>> On 12/6/2022 5:01 PM, Christoph Hellwig wrote:
>>>> On Tue, Dec 06, 2022 at 10:48:22AM -0400, Jason Gunthorpe wrote:
>>>>> Sadly in Linux we don't have a SRIOV VF lifecycle model that is any
>>>>> use.
>>>> Beware: the secondary function might just as well be a physical function.
>>>> In fact, one of the major customers for "smart" multi-function
>>>> NVMe devices prefers multi-PF devices over SR-IOV VFs (and all the
>>>> symmetric dual-ported devices are multi-PF as well).
>>>>
>>>> So this isn't really about a VF lifecycle, but about how to manage live
>>>> migration, especially on the receive/restore side. And restoring
>>>> the entire controller state is extremely invasive and can't be done
>>>> on a controller that is live in any classic sense. In fact, a lot
>>>> of the state is subsystem-wide, so without some kind of virtualization
>>>> of the subsystem it is impossible to actually restore the state.
>>>
>>> Ohh, great!
>>>
>>> I read this subsystem virtualization proposal of yours after I sent my proposal for subsystem virtualization in the patch 1/5 thread.
>>> I guess this means it is the right way to go.
>>> Let's continue brainstorming this idea. I think this can be the way to migrate NVMe controllers in a standard fashion.
>>>
>>>>
>>>> To cycle back to the hardware that is posted here, I'm really confused
>>>> about how it actually has any chance to work, and no one has even tried
>>>> to explain how it is supposed to work.
>>>
>>> I guess a vendor-specific implementation can assume some of the things we are now discussing how to standardize.
>>
>> Yes, as I wrote in the cover letter, this is a reference implementation to
>> start a discussion and help drive standardization efforts, but this series
>> works well for the Intel IPU NVMe. As Jason said, there are two use cases:
>> shared medium and local medium. I think live migration of a local medium
>> is complicated due to the large amount of user data that needs to be
>> migrated, and I don't have a good way to deal with that situation. But on
>> the Intel IPU NVMe, each VF can connect to remote storage via the NVMF
>> protocol to achieve storage offloading. This is the shared-medium case: we
>> don't need to migrate the user data, which significantly simplifies the
>> work of live migration.
>
> I don't think that medium migration should be part of the spec. We can specify that it's out of scope.
>
> The whole idea of live migration is to have short downtime, and I don't think we can guarantee short downtime if we need to copy a few terabytes over the network.
> If the media copy takes a few seconds, there is no point in a live migration with a few milliseconds of downtime; just do a regular migration.
>
>>
>> The series tries to solve the problem of live migration with a shared
>> medium. It still lacks dirty-page tracking and P2P support; we are also
>> developing these features.
>>
>> About the NVMe device state: as described in my document, the VF state
>> includes the VF CSR registers, the state of every I/O queue pair, and the
>> admin queue state. During the implementation, I found that the device state
>> data per VF is small, so I decided to use the admin queue of the primary
>> controller to send the live migration commands that save and restore the VF
>> state, as MLX5 does.
>
> I think and hope we all agree that the AdminQ of the controlling NVMe function will be used to migrate the controlled NVMe function.
Fully agree.
>
> Which document are you referring to?
The fifth patch documents the definition of these commands and how the firmware
handles them. That is the document I was referring to.