netdev.vger.kernel.org archive mirror
* [RFC PATCH vfio 0/7] pds vfio driver
@ 2022-12-07  1:06 Brett Creeley
  2022-12-07  1:06 ` [RFC PATCH vfio 1/7] vfio/pds: Initial support for pds_vfio VFIO driver Brett Creeley
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:06 UTC
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

This is a first-draft patchset for a new vendor-specific VFIO driver for
use with the AMD/Pensando Distributed Services Card (DSC). This driver
(pds_vfio) is a client of the newly introduced pds_core driver.

Reference to the pds_core patchset:
https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/

AMD/Pensando already supports an NVMe VF device (1dd8:1006) in the
DSC. This patchset adds the new pds_vfio driver in order to support
NVMe VF live migration.

This driver uses the pds_core device and auxiliary_bus as the VFIO
control path to the DSC. The pds_core device creates an auxiliary_bus
device for each live-migratable VF. Each device is named by its feature
plus the VF PCI BDF so that the auxiliary_bus driver implemented by
pds_vfio can find its related VF PCI driver instance. Once this auxiliary
bus connection is configured, the pds_vfio driver can send admin queue
commands to the device and receive events from pds_core.
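
As a rough illustration of the naming scheme, the userspace sketch below
builds an auxiliary device name from a VF's bus, slot, and function. The
macros mirror the kernel's PCI_DEVFN() and PCI_DEVID(); the "pds_core.LM"
prefix is an assumption inferred from the "pds_core.LM.2305" label in the
diagram, and every sketch_* name is hypothetical rather than part of the
driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins that mirror the kernel's PCI_DEVFN() and PCI_DEVID(). */
#define SKETCH_PCI_DEVFN(slot, func) \
	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SKETCH_PCI_DEVID(bus, devfn) \
	((((unsigned int)(bus) & 0xff) << 8) | ((devfn) & 0xff))

/* Assumed prefix, inferred from "pds_core.LM.2305" in the diagram. */
#define SKETCH_LM_DEV_NAME	"pds_core.LM"

/* Write "<prefix>.<id>" into buf and return snprintf()'s result. */
int sketch_lm_aux_name(char *buf, size_t len, unsigned int bus,
		       unsigned int slot, unsigned int func)
{
	unsigned int id = SKETCH_PCI_DEVID(bus, SKETCH_PCI_DEVFN(slot, func));

	return snprintf(buf, len, "%s.%u", SKETCH_LM_DEV_NAME, id);
}
```

For the VF at 09:00.1 the id works out to (0x09 << 8) | 0x01 = 0x0901,
which is 2305 in decimal and matches the diagram.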

The ASCII diagram below shows roughly how a pds_vfio instance fits
together. The driver plugs into the VFIO subsystem to give the VF device
VFIO and live migration support.

                               .------.  .--------------------------.
                               | QEMU |--|  VM     .-------------.  |
                               '------'  |         | nvme driver |  |
                                  |      |         .-------------.  |
                                  |      |         |  SR-IOV VF  |  |
                                  |      |         '-------------'  |
                                  |      '---------------||---------'
                               .--------------.          ||
                               |/dev/<vfio_fd>|          ||
                               '--------------'          ||
Host Userspace                         |                 ||
===================================================      ||
Host Kernel                            |                 ||
                                       |                 ||
           pds_core.LM.2305 <--+   .--------.            ||
                   |           |   |vfio-pci|            ||
                   |           |   '--------'            ||
                   |           |       |                 ||
         .------------.       .-------------.            ||
         |  pds_core  |       |   pds_vfio  |            ||
         '------------'       '-------------'            ||
               ||                   ||                   ||
             09:00.0              09:00.1                ||
== PCI ==================================================||=====
               ||                   ||                   ||
          .----------.         .----------.              ||
    ,-----|    PF    |---------|    VF    |-------------------,
    |     '----------'         '----------'  |      nvme      |
    |                     DSC                |  data/control  |
    |                                        |      path      |
    -----------------------------------------------------------


The pds_vfio driver is targeted to reside in drivers/vfio/pci/pds.
It makes use of and introduces new files in the common include/linux/pds
include directory.

Brett Creeley (7):
  pds_vfio: Initial support for pds_vfio VFIO driver
  pds_vfio: Add support to register as PDS client
  pds_vfio: Add VFIO live migration support
  vfio: Commonize combine_ranges for use in other VFIO drivers
  pds_vfio: Add support for dirty page tracking
  pds_vfio: Add support for firmware recovery
  pds_vfio: Add documentation files

 .../ethernet/pensando/pds_vfio.rst            |  88 +++
 drivers/vfio/pci/Kconfig                      |   2 +
 drivers/vfio/pci/mlx5/cmd.c                   |  48 +-
 drivers/vfio/pci/pds/Kconfig                  |  10 +
 drivers/vfio/pci/pds/Makefile                 |  12 +
 drivers/vfio/pci/pds/aux_drv.c                | 216 +++++++
 drivers/vfio/pci/pds/aux_drv.h                |  30 +
 drivers/vfio/pci/pds/cmds.c                   | 486 ++++++++++++++++
 drivers/vfio/pci/pds/cmds.h                   |  44 ++
 drivers/vfio/pci/pds/dirty.c                  | 541 ++++++++++++++++++
 drivers/vfio/pci/pds/dirty.h                  |  49 ++
 drivers/vfio/pci/pds/lm.c                     | 484 ++++++++++++++++
 drivers/vfio/pci/pds/lm.h                     |  53 ++
 drivers/vfio/pci/pds/pci_drv.c                | 134 +++++
 drivers/vfio/pci/pds/pci_drv.h                |   9 +
 drivers/vfio/pci/pds/vfio_dev.c               | 238 ++++++++
 drivers/vfio/pci/pds/vfio_dev.h               |  42 ++
 drivers/vfio/vfio_main.c                      |  48 ++
 include/linux/pds/pds_core_if.h               |   1 +
 include/linux/pds/pds_lm.h                    | 356 ++++++++++++
 include/linux/vfio.h                          |   3 +
 21 files changed, 2847 insertions(+), 47 deletions(-)
 create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst
 create mode 100644 drivers/vfio/pci/pds/Kconfig
 create mode 100644 drivers/vfio/pci/pds/Makefile
 create mode 100644 drivers/vfio/pci/pds/aux_drv.c
 create mode 100644 drivers/vfio/pci/pds/aux_drv.h
 create mode 100644 drivers/vfio/pci/pds/cmds.c
 create mode 100644 drivers/vfio/pci/pds/cmds.h
 create mode 100644 drivers/vfio/pci/pds/dirty.c
 create mode 100644 drivers/vfio/pci/pds/dirty.h
 create mode 100644 drivers/vfio/pci/pds/lm.c
 create mode 100644 drivers/vfio/pci/pds/lm.h
 create mode 100644 drivers/vfio/pci/pds/pci_drv.c
 create mode 100644 drivers/vfio/pci/pds/pci_drv.h
 create mode 100644 drivers/vfio/pci/pds/vfio_dev.c
 create mode 100644 drivers/vfio/pci/pds/vfio_dev.h
 create mode 100644 include/linux/pds/pds_lm.h

--
2.17.1



* [RFC PATCH vfio 1/7] vfio/pds: Initial support for pds_vfio VFIO driver
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
@ 2022-12-07  1:06 ` Brett Creeley
  2022-12-07  1:07 ` [RFC PATCH vfio 2/7] vfio/pds: Add support to register as PDS client Brett Creeley
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:06 UTC
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

This is the initial framework for the new pds_vfio device driver. It
does the bare minimum: it registers the PCI device 1dd8:1006 and
configures it as a VFIO PCI device.

With this change, the VF device can be bound to the pds_vfio driver on
the host and presented to the VM as an NVMe VF.
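
For readers unfamiliar with PCI ID tables, the userspace sketch below
approximates how an entry like the one in this patch is compared against
a device: vendor and device must match unless wildcarded, and the class
code is compared under class_mask. It is a simplified model of the PCI
core's matching, not the kernel code itself; subvendor/subdevice are left
out because the entry wildcards them, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PCI_ANY_ID		(~0u)
#define SKETCH_CLASS_STORAGE_EXPRESS	0x010802u	/* NVMe class code */

struct sketch_pci_id {
	uint32_t vendor;
	uint32_t device;
	uint32_t class;
	uint32_t class_mask;
};

/* Return 1 if the device's IDs satisfy the table entry, 0 otherwise. */
int sketch_pci_match(const struct sketch_pci_id *id, uint32_t vendor,
		     uint32_t device, uint32_t class)
{
	if (id->vendor != SKETCH_PCI_ANY_ID && id->vendor != vendor)
		return 0;
	if (id->device != SKETCH_PCI_ANY_ID && id->device != device)
		return 0;
	/* Only the class bits selected by class_mask take part. */
	return ((id->class ^ class) & id->class_mask) == 0;
}

/* Entry modeled on this patch: 1dd8:1006, full 24-bit class match. */
const struct sketch_pci_id sketch_nvme_vf_id = {
	.vendor = 0x1dd8,
	.device = 0x1006,
	.class = SKETCH_CLASS_STORAGE_EXPRESS,
	.class_mask = 0xffffff,
};
```

The table entry in the patch is additionally marked with
PCI_ID_F_VFIO_DRIVER_OVERRIDE, which this sketch does not model.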

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 drivers/vfio/pci/pds/Makefile   |   8 +++
 drivers/vfio/pci/pds/pci_drv.c  | 100 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/pds/vfio_dev.c |  74 +++++++++++++++++++++++
 drivers/vfio/pci/pds/vfio_dev.h |  23 ++++++++
 include/linux/pds/pds_core_if.h |   1 +
 5 files changed, 206 insertions(+)
 create mode 100644 drivers/vfio/pci/pds/Makefile
 create mode 100644 drivers/vfio/pci/pds/pci_drv.c
 create mode 100644 drivers/vfio/pci/pds/vfio_dev.c
 create mode 100644 drivers/vfio/pci/pds/vfio_dev.h

diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile
new file mode 100644
index 000000000000..cd012648a655
--- /dev/null
+++ b/drivers/vfio/pci/pds/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_PDS_VFIO_PCI) += pds_vfio.o
+
+pds_vfio-y := \
+	pci_drv.o	\
+	vfio_dev.o
+
+
diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c
new file mode 100644
index 000000000000..9a601194201d
--- /dev/null
+++ b/drivers/vfio/pci/pds/pci_drv.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+#include <linux/vfio.h>
+
+#include <linux/pds/pds_core_if.h>
+
+#include "vfio_dev.h"
+
+#define PDS_VFIO_DRV_NAME		"pds_vfio"
+#define PDS_VFIO_DRV_DESCRIPTION	"Pensando VFIO Device Driver"
+#define PCI_VENDOR_ID_PENSANDO		0x1dd8
+
+static int
+pds_vfio_pci_probe(struct pci_dev *pdev,
+		   const struct pci_device_id *id)
+{
+	struct pds_vfio_pci_device *pds_vfio;
+	int err;
+
+	pds_vfio = vfio_alloc_device(pds_vfio_pci_device, vfio_coredev.vdev,
+				     &pdev->dev,  pds_vfio_ops_info());
+	if (IS_ERR(pds_vfio))
+		return PTR_ERR(pds_vfio);
+
+	dev_set_drvdata(&pdev->dev, &pds_vfio->vfio_coredev);
+	pds_vfio->pdev = pdev;
+
+	err = vfio_pci_core_register_device(&pds_vfio->vfio_coredev);
+	if (err)
+		goto out_put_vdev;
+
+	return 0;
+
+out_put_vdev:
+	vfio_put_device(&pds_vfio->vfio_coredev.vdev);
+	return err;
+}
+
+static void
+pds_vfio_pci_remove(struct pci_dev *pdev)
+{
+	struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev);
+
+	vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev);
+	vfio_put_device(&pds_vfio->vfio_coredev.vdev);
+}
+
+static const struct pci_device_id
+pds_vfio_pci_table[] = {
+	{
+		.class = PCI_CLASS_STORAGE_EXPRESS,
+		.class_mask = 0xffffff,
+		.vendor = PCI_VENDOR_ID_PENSANDO,
+		.device = PCI_DEVICE_ID_PENSANDO_NVME_VF,
+		.subvendor = PCI_ANY_ID,
+		.subdevice = PCI_ANY_ID,
+		.override_only = PCI_ID_F_VFIO_DRIVER_OVERRIDE,
+	},
+	{ 0, }
+};
+MODULE_DEVICE_TABLE(pci, pds_vfio_pci_table);
+
+static struct pci_driver
+pds_vfio_pci_driver = {
+	.name = PDS_VFIO_DRV_NAME,
+	.id_table = pds_vfio_pci_table,
+	.probe = pds_vfio_pci_probe,
+	.remove = pds_vfio_pci_remove,
+	.driver_managed_dma = true,
+};
+
+static void __exit
+pds_vfio_pci_cleanup(void)
+{
+	pci_unregister_driver(&pds_vfio_pci_driver);
+}
+module_exit(pds_vfio_pci_cleanup);
+
+static int __init
+pds_vfio_pci_init(void)
+{
+	int err;
+
+	err = pci_register_driver(&pds_vfio_pci_driver);
+	if (err) {
+		pr_err("pci driver register failed: %pe\n", ERR_PTR(err));
+		return err;
+	}
+
+	return 0;
+}
+module_init(pds_vfio_pci_init);
+
+MODULE_DESCRIPTION(PDS_VFIO_DRV_DESCRIPTION);
+MODULE_AUTHOR("Pensando Systems, Inc");
+MODULE_LICENSE("GPL");
diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c
new file mode 100644
index 000000000000..f8f4006c0915
--- /dev/null
+++ b/drivers/vfio/pci/pds/vfio_dev.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+
+#include "vfio_dev.h"
+
+struct pds_vfio_pci_device *
+pds_vfio_pci_drvdata(struct pci_dev *pdev)
+{
+	struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
+
+	return container_of(core_device, struct pds_vfio_pci_device,
+			    vfio_coredev);
+}
+
+static int
+pds_vfio_init_device(struct vfio_device *vdev)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+	struct pci_dev *pdev = to_pci_dev(vdev->dev);
+	int err;
+
+	err = vfio_pci_core_init_dev(vdev);
+	if (err)
+		return err;
+
+	pds_vfio->vf_id = pci_iov_vf_id(pdev);
+	pds_vfio->pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
+
+	return 0;
+}
+
+static int
+pds_vfio_open_device(struct vfio_device *vdev)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+	int err;
+
+	err = vfio_pci_core_enable(&pds_vfio->vfio_coredev);
+	if (err)
+		return err;
+
+	vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev);
+
+	return 0;
+}
+
+static const struct vfio_device_ops
+pds_vfio_ops = {
+	.name = "pds-vfio",
+	.init = pds_vfio_init_device,
+	.release = vfio_pci_core_release_dev,
+	.open_device = pds_vfio_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = vfio_pci_core_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = vfio_pci_core_read,
+	.write = vfio_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+};
+
+const struct vfio_device_ops *
+pds_vfio_ops_info(void)
+{
+	return &pds_vfio_ops;
+}
diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h
new file mode 100644
index 000000000000..289479a08dce
--- /dev/null
+++ b/drivers/vfio/pci/pds/vfio_dev.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _VFIO_DEV_H_
+#define _VFIO_DEV_H_
+
+#include <linux/pci.h>
+#include <linux/vfio_pci_core.h>
+
+struct pds_vfio_pci_device {
+	struct vfio_pci_core_device vfio_coredev;
+	struct pci_dev *pdev;
+
+	int vf_id;
+	int pci_id;
+};
+
+const struct vfio_device_ops *
+pds_vfio_ops_info(void);
+struct pds_vfio_pci_device *
+pds_vfio_pci_drvdata(struct pci_dev *pdev);
+
+#endif /* _VFIO_DEV_H_ */
diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
index 6e92697657e4..4362b94a7666 100644
--- a/include/linux/pds/pds_core_if.h
+++ b/include/linux/pds/pds_core_if.h
@@ -9,6 +9,7 @@
 #define PCI_VENDOR_ID_PENSANDO			0x1dd8
 #define PCI_DEVICE_ID_PENSANDO_CORE_PF		0x100c
 #define PCI_DEVICE_ID_PENSANDO_VDPA_VF          0x100b
+#define PCI_DEVICE_ID_PENSANDO_NVME_VF		0x1006
 
 #define PDS_CORE_BARS_MAX			4
 #define PDS_CORE_PCI_BAR_DBELL			1
-- 
2.17.1



* [RFC PATCH vfio 2/7] vfio/pds: Add support to register as PDS client
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
  2022-12-07  1:06 ` [RFC PATCH vfio 1/7] vfio/pds: Initial support for pds_vfio VFIO driver Brett Creeley
@ 2022-12-07  1:07 ` Brett Creeley
  2022-12-07  1:07 ` [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support Brett Creeley
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:07 UTC
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

The pds_core driver creates an auxiliary device for each PCI device
supported by pds_vfio. To communicate with the device, the pds_vfio
driver needs to register as an auxiliary driver for that auxiliary
device. Once the auxiliary device is probed, the pds_vfio driver can
send admin queue commands and receive events from the device by way of
pds_core.
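
To locate the matching VF, the auxiliary probe in this patch splits the
auxiliary device id back into a PCI bus number and devfn. The userspace
sketch below shows that decoding, with the shifts and masks mirroring the
kernel's PCI_BUS_NUM(), PCI_SLOT(), and PCI_FUNC(). The struct and
function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_bdf {
	uint8_t bus;
	uint8_t devfn;
	uint8_t slot;
	uint8_t func;
};

/* Split a 16-bit id of the form (bus << 8) | devfn into its parts. */
struct sketch_bdf sketch_decode_id(uint16_t id)
{
	struct sketch_bdf bdf;

	bdf.bus = (id >> 8) & 0xff;		/* as PCI_BUS_NUM() */
	bdf.devfn = id & 0xff;
	bdf.slot = (bdf.devfn >> 3) & 0x1f;	/* as PCI_SLOT() */
	bdf.func = bdf.devfn & 0x07;		/* as PCI_FUNC() */

	return bdf;
}
```

For example, id 2305 (0x0901) decodes to bus 0x09, slot 0, function 1.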

Use the following commands to enable a VF and tell pds_core to
create its corresponding auxiliary device:

echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
devlink dev param set pci/$PF_BDF name enable_migration value true cmode runtime

This functionality is needed to support live migration commands, which
are added later in the series.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 drivers/vfio/pci/pds/Makefile   |   2 +
 drivers/vfio/pci/pds/aux_drv.c  | 155 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/pds/aux_drv.h  |  29 ++++++
 drivers/vfio/pci/pds/cmds.c     |  30 +++++++
 drivers/vfio/pci/pds/cmds.h     |  14 +++
 drivers/vfio/pci/pds/pci_drv.c  |  17 ++++
 drivers/vfio/pci/pds/pci_drv.h  |   9 ++
 drivers/vfio/pci/pds/vfio_dev.c |   8 ++
 drivers/vfio/pci/pds/vfio_dev.h |   1 +
 include/linux/pds/pds_lm.h      |  12 +++
 10 files changed, 277 insertions(+)
 create mode 100644 drivers/vfio/pci/pds/aux_drv.c
 create mode 100644 drivers/vfio/pci/pds/aux_drv.h
 create mode 100644 drivers/vfio/pci/pds/cmds.c
 create mode 100644 drivers/vfio/pci/pds/cmds.h
 create mode 100644 drivers/vfio/pci/pds/pci_drv.h
 create mode 100644 include/linux/pds/pds_lm.h

diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile
index cd012648a655..1d927498b7bb 100644
--- a/drivers/vfio/pci/pds/Makefile
+++ b/drivers/vfio/pci/pds/Makefile
@@ -2,6 +2,8 @@
 obj-$(CONFIG_PDS_VFIO_PCI) += pds_vfio.o
 
 pds_vfio-y := \
+	aux_drv.o	\
+	cmds.o		\
 	pci_drv.o	\
 	vfio_dev.o
 
diff --git a/drivers/vfio/pci/pds/aux_drv.c b/drivers/vfio/pci/pds/aux_drv.c
new file mode 100644
index 000000000000..ef8f67ff4152
--- /dev/null
+++ b/drivers/vfio/pci/pds/aux_drv.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/auxiliary_bus.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+#include <linux/pds/pds_lm.h>
+
+#include "aux_drv.h"
+#include "vfio_dev.h"
+#include "pci_drv.h"
+#include "cmds.h"
+
+static const
+struct auxiliary_device_id pds_vfio_aux_id_table[] = {
+	{ .name = PDS_LM_DEV_NAME, },
+	{},
+};
+
+static void
+pds_vfio_aux_notify_handler(struct pds_auxiliary_dev *padev,
+			    union pds_core_notifyq_comp *event)
+{
+	struct device *dev = &padev->aux_dev.dev;
+	u16 ecode = le16_to_cpu(event->ecode);
+
+	dev_dbg(dev, "%s: event code %d\n", __func__, ecode);
+}
+
+static int
+pds_vfio_aux_probe(struct auxiliary_device *aux_dev,
+		   const struct auxiliary_device_id *id)
+
+{
+	struct pds_auxiliary_dev *padev =
+		container_of(aux_dev, struct pds_auxiliary_dev, aux_dev);
+	struct device *dev = &aux_dev->dev;
+	struct pds_vfio_aux *vfio_aux;
+	struct pci_dev *pdev;
+	struct pci_bus *bus;
+	int busnr;
+	u16 devfn;
+	int err;
+
+	vfio_aux = kzalloc(sizeof(*vfio_aux), GFP_KERNEL);
+	if (!vfio_aux)
+		return -ENOMEM;
+
+	vfio_aux->padev = padev;
+	auxiliary_set_drvdata(aux_dev, vfio_aux);
+
+	/* Find our VF PCI device */
+	busnr = PCI_BUS_NUM(padev->id);
+	devfn = padev->id & 0xff;
+	bus = pci_find_bus(0, busnr);
+	pdev = pci_get_slot(bus, devfn);
+
+	vfio_aux->pds_vfio = pci_get_drvdata(pdev);
+	if (!vfio_aux->pds_vfio) {
+		dev_dbg(&pdev->dev, "PCI device not probed yet, defer until PCI device is probed by pds_vfio driver\n");
+		err = -EPROBE_DEFER;
+		goto err_pci_device_not_probed;
+	}
+
+	pdev = vfio_aux->pds_vfio->pdev;
+	if (!pds_vfio_is_vfio_pci_driver(pdev)) {
+		dev_err(&pdev->dev, "PCI driver is not pds_vfio_pci_driver\n");
+		err = -EINVAL;
+		goto err_invalid_driver;
+	}
+
+	dev_dbg(dev, "%s: id %#04x busnr %#x devfn %#x bus %p pds_vfio %p\n",
+		__func__, padev->id, busnr, devfn, bus, vfio_aux->pds_vfio);
+
+	vfio_aux->pds_vfio->vfio_aux = vfio_aux;
+
+	vfio_aux->padrv.event_handler = pds_vfio_aux_notify_handler;
+	err = pds_vfio_register_client_cmd(vfio_aux->pds_vfio);
+	if (err) {
+		dev_err(dev, "failed to register as client: %pe\n",
+			ERR_PTR(err));
+		goto err_register_client;
+	}
+
+	return 0;
+
+err_register_client:
+	auxiliary_set_drvdata(aux_dev, NULL);
+err_invalid_driver:
+err_pci_device_not_probed:
+	kfree(vfio_aux);
+
+	return err;
+}
+
+static void
+pds_vfio_aux_remove(struct auxiliary_device *aux_dev)
+{
+	struct pds_vfio_aux *vfio_aux = auxiliary_get_drvdata(aux_dev);
+	struct pds_vfio_pci_device *pds_vfio = vfio_aux->pds_vfio;
+
+	if (pds_vfio) {
+		pds_vfio_unregister_client_cmd(pds_vfio);
+		vfio_aux->pds_vfio->vfio_aux = NULL;
+		pci_dev_put(pds_vfio->pdev);
+	}
+
+	kfree(vfio_aux);
+	auxiliary_set_drvdata(aux_dev, NULL);
+}
+
+static struct auxiliary_driver
+pds_vfio_aux_driver = {
+	.name = PDS_DEV_TYPE_LM_STR,
+	.probe = pds_vfio_aux_probe,
+	.remove = pds_vfio_aux_remove,
+	.id_table = pds_vfio_aux_id_table,
+};
+
+struct auxiliary_driver *
+pds_vfio_aux_driver_info(void)
+{
+	return &pds_vfio_aux_driver;
+}
+
+static int
+pds_vfio_aux_match_id(struct device *dev, const void *data)
+{
+	dev_dbg(dev, "%s: %s\n", __func__, (char *)data);
+	return !strcmp(dev_name(dev), data);
+}
+
+struct pds_vfio_aux *
+pds_vfio_aux_get_drvdata(int vf_pci_id)
+{
+	struct auxiliary_device *aux_dev;
+	char name[32];
+
+	snprintf(name, sizeof(name), "%s.%d", PDS_LM_DEV_NAME, vf_pci_id);
+	aux_dev = auxiliary_find_device(NULL, name, pds_vfio_aux_match_id);
+	if (!aux_dev)
+		return NULL;
+
+	return auxiliary_get_drvdata(aux_dev);
+}
+
+void
+pds_vfio_put_aux_dev(struct pds_vfio_aux *vfio_aux)
+{
+	put_device(&vfio_aux->padev->aux_dev.dev);
+}
diff --git a/drivers/vfio/pci/pds/aux_drv.h b/drivers/vfio/pci/pds/aux_drv.h
new file mode 100644
index 000000000000..0f05a968bb00
--- /dev/null
+++ b/drivers/vfio/pci/pds/aux_drv.h
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _AUX_DRV_H_
+#define _AUX_DRV_H_
+
+#include <linux/auxiliary_bus.h>
+
+#include <linux/pds/pds_intr.h>
+#include <linux/pds/pds_common.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+
+struct pds_vfio_pci_device;
+
+struct pds_vfio_aux {
+	struct pds_auxiliary_dev *padev;
+	struct pds_auxiliary_drv padrv;
+	struct pds_vfio_pci_device *pds_vfio;
+};
+
+struct auxiliary_driver *
+pds_vfio_aux_driver_info(void);
+struct pds_vfio_aux *
+pds_vfio_aux_get_drvdata(int vf_pci_id);
+void
+pds_vfio_put_aux_dev(struct pds_vfio_aux *vfio_aux);
+
+#endif /* _AUX_DRV_H_ */
diff --git a/drivers/vfio/pci/pds/cmds.c b/drivers/vfio/pci/pds/cmds.c
new file mode 100644
index 000000000000..5a3fadcd38d8
--- /dev/null
+++ b/drivers/vfio/pci/pds/cmds.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/io.h>
+#include <linux/types.h>
+
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+
+#include "vfio_dev.h"
+#include "aux_drv.h"
+#include "cmds.h"
+
+int
+pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_vfio_aux *vfio_aux = pds_vfio->vfio_aux;
+	struct pds_auxiliary_dev *padev = vfio_aux->padev;
+
+	return padev->ops->register_client(padev, &vfio_aux->padrv);
+}
+
+void
+pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_auxiliary_dev *padev = pds_vfio->vfio_aux->padev;
+
+	padev->ops->unregister_client(padev);
+}
diff --git a/drivers/vfio/pci/pds/cmds.h b/drivers/vfio/pci/pds/cmds.h
new file mode 100644
index 000000000000..7fe2d1efd894
--- /dev/null
+++ b/drivers/vfio/pci/pds/cmds.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _CMDS_H_
+#define _CMDS_H_
+
+struct pds_vfio_pci_device;
+
+int
+pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio);
+void
+pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio);
+
+#endif /* _CMDS_H_ */
diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c
index 9a601194201d..f6483494031d 100644
--- a/drivers/vfio/pci/pds/pci_drv.c
+++ b/drivers/vfio/pci/pds/pci_drv.c
@@ -9,6 +9,8 @@
 #include <linux/pds/pds_core_if.h>
 
 #include "vfio_dev.h"
+#include "aux_drv.h"
+#include "pci_drv.h"
 
 #define PDS_VFIO_DRV_NAME		"pds_vfio"
 #define PDS_VFIO_DRV_DESCRIPTION	"Pensando VFIO Device Driver"
@@ -73,9 +75,17 @@ pds_vfio_pci_driver = {
 	.driver_managed_dma = true,
 };
 
+bool
+pds_vfio_is_vfio_pci_driver(struct pci_dev *pdev)
+{
+	return (to_pci_driver(pdev->dev.driver) == &pds_vfio_pci_driver);
+}
+
 static void __exit
 pds_vfio_pci_cleanup(void)
 {
+	auxiliary_driver_unregister(pds_vfio_aux_driver_info());
+
 	pci_unregister_driver(&pds_vfio_pci_driver);
 }
 module_exit(pds_vfio_pci_cleanup);
@@ -91,6 +101,13 @@ pds_vfio_pci_init(void)
 		return err;
 	}
 
+	err = auxiliary_driver_register(pds_vfio_aux_driver_info());
+	if (err) {
+		pr_err("aux driver register failed: %pe\n", ERR_PTR(err));
+		pci_unregister_driver(&pds_vfio_pci_driver);
+		return err;
+	}
+
 	return 0;
 }
 module_init(pds_vfio_pci_init);
diff --git a/drivers/vfio/pci/pds/pci_drv.h b/drivers/vfio/pci/pds/pci_drv.h
new file mode 100644
index 000000000000..e174cb1afd73
--- /dev/null
+++ b/drivers/vfio/pci/pds/pci_drv.h
@@ -0,0 +1,9 @@
+#ifndef _PCI_DRV_H
+#define _PCI_DRV_H
+
+#include <linux/pci.h>
+
+bool
+pds_vfio_is_vfio_pci_driver(struct pci_dev *pdev);
+
+#endif /* _PCI_DRV_H */
diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c
index f8f4006c0915..30c3bb47a2be 100644
--- a/drivers/vfio/pci/pds/vfio_dev.c
+++ b/drivers/vfio/pci/pds/vfio_dev.c
@@ -5,6 +5,7 @@
 #include <linux/vfio_pci_core.h>
 
 #include "vfio_dev.h"
+#include "aux_drv.h"
 
 struct pds_vfio_pci_device *
 pds_vfio_pci_drvdata(struct pci_dev *pdev)
@@ -22,6 +23,7 @@ pds_vfio_init_device(struct vfio_device *vdev)
 		container_of(vdev, struct pds_vfio_pci_device,
 			     vfio_coredev.vdev);
 	struct pci_dev *pdev = to_pci_dev(vdev->dev);
+	struct pds_vfio_aux *vfio_aux;
 	int err;
 
 	err = vfio_pci_core_init_dev(vdev);
@@ -30,6 +32,12 @@ pds_vfio_init_device(struct vfio_device *vdev)
 
 	pds_vfio->vf_id = pci_iov_vf_id(pdev);
 	pds_vfio->pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
+	vfio_aux = pds_vfio_aux_get_drvdata(pds_vfio->pci_id);
+	if (vfio_aux) {
+		vfio_aux->pds_vfio = pds_vfio;
+		pds_vfio->vfio_aux = vfio_aux;
+		pds_vfio_put_aux_dev(vfio_aux);
+	}
 
 	return 0;
 }
diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h
index 289479a08dce..b16668693e1f 100644
--- a/drivers/vfio/pci/pds/vfio_dev.h
+++ b/drivers/vfio/pci/pds/vfio_dev.h
@@ -10,6 +10,7 @@
 struct pds_vfio_pci_device {
 	struct vfio_pci_core_device vfio_coredev;
 	struct pci_dev *pdev;
+	struct pds_vfio_aux *vfio_aux;
 
 	int vf_id;
 	int pci_id;
diff --git a/include/linux/pds/pds_lm.h b/include/linux/pds/pds_lm.h
new file mode 100644
index 000000000000..fdaf2bf71d35
--- /dev/null
+++ b/include/linux/pds/pds_lm.h
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _PDS_LM_H_
+#define _PDS_LM_H_
+
+#include "pds_common.h"
+
+#define PDS_DEV_TYPE_LM_STR	"LM"
+#define PDS_LM_DEV_NAME		PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_LM_STR
+
+#endif /* _PDS_LM_H_ */
-- 
2.17.1



* [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
  2022-12-07  1:06 ` [RFC PATCH vfio 1/7] vfio/pds: Initial support for pds_vfio VFIO driver Brett Creeley
  2022-12-07  1:07 ` [RFC PATCH vfio 2/7] vfio/pds: Add support to register as PDS client Brett Creeley
@ 2022-12-07  1:07 ` Brett Creeley
  2022-12-07 17:09   ` Jason Gunthorpe
  2022-12-07  1:07 ` [RFC PATCH vfio 4/7] vfio: Commonize combine_ranges for use in other VFIO drivers Brett Creeley
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:07 UTC
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

Add live migration support via the VFIO subsystem. The migration
implementation aligns with the definition from uapi/vfio.h and uses the
auxiliary bus connection between pds_core/pds_vfio to communicate with
the device.

The ability to suspend, resume, and transfer VF device state data is
included along with the required admin queue command structures and
implementations.

PDS_LM_CMD_SUSPEND and PDS_LM_CMD_SUSPEND_STATUS are added to support
the VF device suspend operation.
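
The suspend path issues PDS_LM_CMD_SUSPEND and then polls
PDS_LM_CMD_SUSPEND_STATUS until the device stops returning -EAGAIN or a
timeout expires. The userspace sketch below shows that bounded-poll
pattern in isolation. It is not the driver code: the status callback
stands in for the admin queue command, the timing uses POSIX clocks
rather than jiffies, and all sketch_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

typedef int (*sketch_status_fn)(void *ctx);

static long long sketch_now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll status() until it returns something other than -EAGAIN. */
int sketch_poll_until_done(sketch_status_fn status, void *ctx,
			   long long timeout_ms, long interval_ms)
{
	const struct timespec pause = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	long long deadline = sketch_now_ms() + timeout_ms;
	int err;

	do {
		err = status(ctx);
		if (err != -EAGAIN)
			return err;	/* done: success or a hard error */
		nanosleep(&pause, NULL);
	} while (sketch_now_ms() < deadline);

	return -ETIMEDOUT;		/* still busy at the deadline */
}

/* Demo status source: reports busy for *ctx calls, then success. */
int sketch_busy_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

One choice in this sketch is to report a timeout only when the last
status was still -EAGAIN, so a completion that arrives just as the
deadline passes is returned as-is.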

PDS_LM_CMD_RESUME is added to support the VF device resume operation.

PDS_LM_CMD_STATUS is added to determine the exact size of the VF
device state data.

PDS_LM_CMD_SAVE is added to get the VF device state data.

PDS_LM_CMD_RESTORE is added to restore the VF device with the
previously saved data from PDS_LM_CMD_SAVE.

PDS_LM_CMD_HOST_VF_STATUS is added to notify the device whether a
migration is in progress from the host's perspective.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 drivers/vfio/pci/pds/Makefile   |   1 +
 drivers/vfio/pci/pds/aux_drv.c  |   1 +
 drivers/vfio/pci/pds/cmds.c     | 312 ++++++++++++++++++++
 drivers/vfio/pci/pds/cmds.h     |  15 +
 drivers/vfio/pci/pds/lm.c       | 486 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/pds/lm.h       |  43 +++
 drivers/vfio/pci/pds/pci_drv.c  |  15 +
 drivers/vfio/pci/pds/vfio_dev.c | 119 +++++++-
 drivers/vfio/pci/pds/vfio_dev.h |  12 +
 include/linux/pds/pds_lm.h      | 206 ++++++++++++++
 10 files changed, 1209 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vfio/pci/pds/lm.c
 create mode 100644 drivers/vfio/pci/pds/lm.h

diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile
index 1d927498b7bb..3d3be3593a02 100644
--- a/drivers/vfio/pci/pds/Makefile
+++ b/drivers/vfio/pci/pds/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_PDS_VFIO_PCI) += pds_vfio.o
 pds_vfio-y := \
 	aux_drv.o	\
 	cmds.o		\
+	lm.o		\
 	pci_drv.o	\
 	vfio_dev.o
 
diff --git a/drivers/vfio/pci/pds/aux_drv.c b/drivers/vfio/pci/pds/aux_drv.c
index ef8f67ff4152..b4da741d7956 100644
--- a/drivers/vfio/pci/pds/aux_drv.c
+++ b/drivers/vfio/pci/pds/aux_drv.c
@@ -76,6 +76,7 @@ pds_vfio_aux_probe(struct auxiliary_device *aux_dev,
 	dev_dbg(dev, "%s: id %#04x busnr %#x devfn %#x bus %p pds_vfio %p\n",
 		__func__, padev->id, busnr, devfn, bus, vfio_aux->pds_vfio);
 
+	vfio_aux->pds_vfio->coredev = aux_dev->dev.parent;
 	vfio_aux->pds_vfio->vfio_aux = vfio_aux;
 
 	vfio_aux->padrv.event_handler = pds_vfio_aux_notify_handler;
diff --git a/drivers/vfio/pci/pds/cmds.c b/drivers/vfio/pci/pds/cmds.c
index 5a3fadcd38d8..11823d824ccc 100644
--- a/drivers/vfio/pci/pds/cmds.c
+++ b/drivers/vfio/pci/pds/cmds.c
@@ -3,9 +3,11 @@
 
 #include <linux/io.h>
 #include <linux/types.h>
+#include <linux/delay.h>
 
 #include <linux/pds/pds_core_if.h>
 #include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_lm.h>
 #include <linux/pds/pds_auxbus.h>
 
 #include "vfio_dev.h"
@@ -28,3 +30,313 @@ pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio)
 
 	padev->ops->unregister_client(padev);
 }
+
+#define SUSPEND_TIMEOUT_S		5
+#define SUSPEND_CHECK_INTERVAL_MS	1
+
+static int
+pds_vfio_suspend_wait_device_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_lm_suspend_status_cmd cmd = {
+		.opcode = PDS_LM_CMD_SUSPEND_STATUS,
+		.vf_id	= cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_lm_comp comp = { 0 };
+	struct pds_auxiliary_dev *padev;
+	unsigned long time_limit;
+	unsigned long time_start;
+	unsigned long time_done;
+	int err;
+
+	padev = pds_vfio->vfio_aux->padev;
+
+	time_start = jiffies;
+	time_limit = time_start + HZ * SUSPEND_TIMEOUT_S;
+	do {
+		err = padev->ops->adminq_cmd(padev,
+					     (union pds_core_adminq_cmd *)&cmd,
+					     sizeof(cmd),
+					     (union pds_core_adminq_comp *)&comp,
+					      PDS_AQ_FLAG_FASTPOLL);
+		if (err != -EAGAIN)
+			break;
+
+		msleep(SUSPEND_CHECK_INTERVAL_MS);
+	} while (time_before(jiffies, time_limit));
+
+	time_done = jiffies;
+	dev_dbg(&pdev->dev, "%s: vf%u: Suspend comp received in %d msecs\n",
+		__func__, pds_vfio->vf_id,
+		jiffies_to_msecs(time_done - time_start));
+
+	/* Check the results */
+	if (time_after_eq(time_done, time_limit)) {
+		dev_err(&pdev->dev, "%s: vf%u: Suspend comp timeout\n", __func__,
+			pds_vfio->vf_id);
+		err = -ETIMEDOUT;
+	}
+
+	return err;
+}
+
+int
+pds_vfio_suspend_device_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_lm_suspend_cmd cmd = {
+		.opcode = PDS_LM_CMD_SUSPEND,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pds_lm_suspend_comp comp = {0};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_auxiliary_dev *padev;
+	int err;
+
+	dev_dbg(&pdev->dev, "vf%u: Suspend device\n", pds_vfio->vf_id);
+
+	padev = pds_vfio->vfio_aux->padev;
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     PDS_AQ_FLAG_FASTPOLL);
+	if (err) {
+		dev_err(&pdev->dev, "vf%u: Suspend failed: %pe\n",
+			pds_vfio->vf_id, ERR_PTR(err));
+		return err;
+	}
+
+	return pds_vfio_suspend_wait_device_cmd(pds_vfio);
+}
+
+int
+pds_vfio_resume_device_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_lm_resume_cmd cmd = {
+		.opcode = PDS_LM_CMD_RESUME,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pds_auxiliary_dev *padev;
+	struct pds_lm_comp comp = {0};
+
+	dev_dbg(&pds_vfio->pdev->dev, "vf%u: Resume device\n", pds_vfio->vf_id);
+
+	padev = pds_vfio->vfio_aux->padev;
+	return padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+}
+
+int
+pds_vfio_get_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, u64 *size)
+{
+	struct pds_lm_status_cmd cmd = {
+		.opcode = PDS_LM_CMD_STATUS,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pds_lm_status_comp comp = {0};
+	struct pds_auxiliary_dev *padev;
+	int err = 0;
+
+	dev_dbg(&pds_vfio->pdev->dev, "vf%u: Get migration status\n",
+		pds_vfio->vf_id);
+
+	padev = pds_vfio->vfio_aux->padev;
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err)
+		return err;
+
+	*size = le64_to_cpu(comp.size);
+	return 0;
+}
+
+static int
+pds_vfio_dma_map_lm_file(struct device *dev, enum dma_data_direction dir,
+			 struct pds_vfio_lm_file *lm_file)
+{
+	struct pds_lm_sg_elem *sgl, *sge;
+	struct scatterlist *sg;
+	int err = 0;
+	int i;
+
+	if (!lm_file)
+		return -EINVAL;
+
+	/* dma map file pages */
+	err = dma_map_sgtable(dev, &lm_file->sg_table, dir, 0);
+	if (err)
+		goto err_dma_map_sg;
+
+	lm_file->num_sge = lm_file->sg_table.nents;
+
+	/* alloc sgl */
+	sgl = dma_alloc_coherent(dev, lm_file->num_sge *
+				 sizeof(struct pds_lm_sg_elem),
+				 &lm_file->sgl_addr, GFP_KERNEL);
+	if (!sgl) {
+		err = -ENOMEM;
+		goto err_alloc_sgl;
+	}
+
+	lm_file->sgl = sgl;
+
+	/* fill sgl */
+	sge = sgl;
+	for_each_sgtable_dma_sg(&lm_file->sg_table, sg, i) {
+		sge->addr = cpu_to_le64(sg_dma_address(sg));
+		sge->len  = cpu_to_le32(sg_dma_len(sg));
+		dev_dbg(dev, "addr = %llx, len = %u\n", le64_to_cpu(sge->addr), le32_to_cpu(sge->len));
+		sge++;
+	}
+
+	return 0;
+
+err_alloc_sgl:
+	dma_unmap_sgtable(dev, &lm_file->sg_table, dir, 0);
+err_dma_map_sg:
+	return err;
+}
+
+static void
+pds_vfio_dma_unmap_lm_file(struct device *dev, enum dma_data_direction dir,
+			   struct pds_vfio_lm_file *lm_file)
+{
+	if (!lm_file)
+		return;
+
+	/* free sgl */
+	if (lm_file->sgl) {
+		dma_free_coherent(dev, lm_file->num_sge *
+				  sizeof(struct pds_lm_sg_elem),
+				  lm_file->sgl, lm_file->sgl_addr);
+		lm_file->sgl = NULL;
+		lm_file->sgl_addr = DMA_MAPPING_ERROR;
+		lm_file->num_sge = 0;
+	}
+
+	/* dma unmap file pages */
+	dma_unmap_sgtable(dev, &lm_file->sg_table, dir, 0);
+}
+
+int
+pds_vfio_get_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_lm_save_cmd cmd = {
+		.opcode = PDS_LM_CMD_SAVE,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_vfio_lm_file *lm_file;
+	struct pds_auxiliary_dev *padev;
+	struct pds_lm_comp comp = {0};
+	int err;
+
+	dev_dbg(&pdev->dev, "vf%u: Get migration state\n", pds_vfio->vf_id);
+
+	lm_file = pds_vfio->save_file;
+
+	padev = pds_vfio->vfio_aux->padev;
+	err = pds_vfio_dma_map_lm_file(pds_vfio->coredev, DMA_FROM_DEVICE, lm_file);
+	if (err) {
+		dev_err(&pdev->dev, "failed to map save migration file: %pe\n",
+			ERR_PTR(err));
+		return err;
+	}
+
+	cmd.sgl_addr = cpu_to_le64(lm_file->sgl_addr);
+	cmd.num_sge = cpu_to_le32(lm_file->num_sge);
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err)
+		dev_err(&pdev->dev, "failed to get migration state: %pe\n",
+			ERR_PTR(err));
+
+	pds_vfio_dma_unmap_lm_file(pds_vfio->coredev, DMA_FROM_DEVICE, lm_file);
+
+	return err;
+}
+
+int
+pds_vfio_set_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_lm_restore_cmd cmd = {
+		.opcode = PDS_LM_CMD_RESTORE,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_vfio_lm_file *lm_file;
+	struct pds_auxiliary_dev *padev;
+	struct pds_lm_comp comp = {0};
+	int err;
+
+	dev_dbg(&pdev->dev, "vf%u: Set migration state\n", pds_vfio->vf_id);
+
+	lm_file = pds_vfio->restore_file;
+
+	padev = pds_vfio->vfio_aux->padev;
+	err = pds_vfio_dma_map_lm_file(pds_vfio->coredev, DMA_TO_DEVICE, lm_file);
+	if (err) {
+		dev_err(&pdev->dev, "failed to map restore migration file: %pe\n",
+			ERR_PTR(err));
+		return err;
+	}
+
+	cmd.sgl_addr = cpu_to_le64(lm_file->sgl_addr);
+	cmd.num_sge = cpu_to_le32(lm_file->num_sge);
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err)
+		dev_err(&pdev->dev, "failed to set migration state: %pe\n",
+			ERR_PTR(err));
+
+	pds_vfio_dma_unmap_lm_file(pds_vfio->coredev, DMA_TO_DEVICE, lm_file);
+
+	return err;
+}
+
+void
+pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio,
+				    enum pds_lm_host_vf_status vf_status)
+{
+	struct pds_auxiliary_dev *padev = pds_vfio->vfio_aux->padev;
+	struct pds_lm_host_vf_status_cmd cmd = {
+		.opcode = PDS_LM_CMD_HOST_VF_STATUS,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+		.status = vf_status,
+	};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_lm_comp comp = {0};
+	int err;
+
+	dev_dbg(&pdev->dev, "vf%u: Set host VF LM status: %u\n",
+		pds_vfio->vf_id, cmd.status);
+	if (vf_status != PDS_LM_STA_IN_PROGRESS &&
+	    vf_status != PDS_LM_STA_NONE) {
+		dev_warn(&pdev->dev, "Invalid host VF migration status, %d\n",
+			 vf_status);
+		return;
+	}
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err)
+		dev_warn(&pdev->dev, "failed to send host VF migration status: %pe\n",
+			 ERR_PTR(err));
+}
diff --git a/drivers/vfio/pci/pds/cmds.h b/drivers/vfio/pci/pds/cmds.h
index 7fe2d1efd894..5f9ac45ee5a3 100644
--- a/drivers/vfio/pci/pds/cmds.h
+++ b/drivers/vfio/pci/pds/cmds.h
@@ -4,11 +4,26 @@
 #ifndef _CMDS_H_
 #define _CMDS_H_
 
+#include <linux/pds/pds_lm.h>
+
 struct pds_vfio_pci_device;
 
 int
 pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio);
 void
 pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio);
+int
+pds_vfio_suspend_device_cmd(struct pds_vfio_pci_device *pds_vfio);
+int
+pds_vfio_resume_device_cmd(struct pds_vfio_pci_device *pds_vfio);
+int
+pds_vfio_get_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, u64 *size);
+int
+pds_vfio_get_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio);
+int
+pds_vfio_set_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio);
+void
+pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio,
+				    enum pds_lm_host_vf_status vf_status);
 
 #endif /* _CMDS_H_ */
diff --git a/drivers/vfio/pci/pds/lm.c b/drivers/vfio/pci/pds/lm.c
new file mode 100644
index 000000000000..200a23f405fa
--- /dev/null
+++ b/drivers/vfio/pci/pds/lm.c
@@ -0,0 +1,486 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/highmem.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+
+#include "cmds.h"
+#include "vfio_dev.h"
+
+#define PDS_VFIO_LM_FILENAME	"pds_vfio_lm"
+
+const char *
+pds_vfio_lm_state(enum vfio_device_mig_state state)
+{
+	switch (state) {
+	case VFIO_DEVICE_STATE_ERROR:
+		return "VFIO_DEVICE_STATE_ERROR";
+	case VFIO_DEVICE_STATE_STOP:
+		return "VFIO_DEVICE_STATE_STOP";
+	case VFIO_DEVICE_STATE_RUNNING:
+		return "VFIO_DEVICE_STATE_RUNNING";
+	case VFIO_DEVICE_STATE_STOP_COPY:
+		return "VFIO_DEVICE_STATE_STOP_COPY";
+	case VFIO_DEVICE_STATE_RESUMING:
+		return "VFIO_DEVICE_STATE_RESUMING";
+	case VFIO_DEVICE_STATE_RUNNING_P2P:
+		return "VFIO_DEVICE_STATE_RUNNING_P2P";
+	default:
+		return "VFIO_DEVICE_STATE_INVALID";
+	}
+
+	return "VFIO_DEVICE_STATE_INVALID";
+}
+
+static struct pds_vfio_lm_file *
+pds_vfio_get_lm_file(const char *name, const struct file_operations *fops,
+		     int flags, u64 size)
+{
+	struct pds_vfio_lm_file *lm_file = NULL;
+	unsigned long long npages;
+	unsigned long long i;
+	struct page **pages;
+	int err = 0;
+
+	if (!size)
+		return NULL;
+
+	/* Alloc file structure */
+	lm_file = kzalloc(sizeof(*lm_file), GFP_KERNEL);
+	if (!lm_file)
+		return NULL;
+
+	/* Create file */
+	lm_file->filep = anon_inode_getfile(name, fops, lm_file, flags);
+	if (IS_ERR(lm_file->filep))
+		goto err_get_file;
+
+	stream_open(lm_file->filep->f_inode, lm_file->filep);
+	mutex_init(&lm_file->lock);
+
+	lm_file->size = size;
+
+	/* Allocate memory for file pages */
+	npages = DIV_ROUND_UP_ULL(lm_file->size, PAGE_SIZE);
+
+	pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
+	if (!pages)
+		goto err_alloc_pages;
+
+	for (i = 0; i < npages; i++) {
+		pages[i] = alloc_page(GFP_KERNEL);
+		if (!pages[i])
+			goto err_alloc_page;
+	}
+
+	lm_file->pages = pages;
+	lm_file->npages = npages;
+	lm_file->alloc_size = npages * PAGE_SIZE;
+
+	/* Create scatterlist of file pages to use for DMA mapping later */
+	err = sg_alloc_table_from_pages(&lm_file->sg_table, pages, npages,
+					0, size, GFP_KERNEL);
+	if (err)
+		goto err_alloc_sg_table;
+
+	return lm_file;
+
+err_alloc_sg_table:
+err_alloc_page:
+	/* free allocated pages */
+	for (i = 0; i < npages && pages[i]; i++)
+		__free_page(pages[i]);
+	kfree(pages);
+err_alloc_pages:
+	fput(lm_file->filep);
+	mutex_destroy(&lm_file->lock);
+err_get_file:
+	kfree(lm_file);
+
+	return NULL;
+}
+
+static void
+pds_vfio_put_lm_file(struct pds_vfio_lm_file *lm_file)
+{
+	unsigned long long i;
+
+	mutex_lock(&lm_file->lock);
+
+	lm_file->size = 0;
+	lm_file->alloc_size = 0;
+
+	/* Free scatter list of file pages */
+	sg_free_table(&lm_file->sg_table);
+
+	/* Free allocated file pages */
+	for (i = 0; i < lm_file->npages && lm_file->pages[i]; i++)
+		__free_page(lm_file->pages[i]);
+	kfree(lm_file->pages);
+	lm_file->pages = NULL;
+
+	/* Delete file */
+	fput(lm_file->filep);
+	lm_file->filep = NULL;
+
+	mutex_unlock(&lm_file->lock);
+
+	mutex_destroy(&lm_file->lock);
+
+	/* Free file structure */
+	kfree(lm_file);
+}
+
+void
+pds_vfio_put_save_file(struct pds_vfio_pci_device *pds_vfio)
+{
+	if (!pds_vfio->save_file)
+		return;
+
+	pds_vfio_put_lm_file(pds_vfio->save_file);
+	pds_vfio->save_file = NULL;
+}
+
+void
+pds_vfio_put_restore_file(struct pds_vfio_pci_device *pds_vfio)
+{
+	if (!pds_vfio->restore_file)
+		return;
+
+	pds_vfio_put_lm_file(pds_vfio->restore_file);
+	pds_vfio->restore_file = NULL;
+}
+
+static struct page *
+pds_vfio_get_file_page(struct pds_vfio_lm_file *lm_file,
+		       unsigned long offset)
+{
+	unsigned long cur_offset = 0;
+	struct scatterlist *sg;
+	unsigned int i;
+
+	/* All accesses are sequential */
+	if (offset < lm_file->last_offset || !lm_file->last_offset_sg) {
+		lm_file->last_offset = 0;
+		lm_file->last_offset_sg = lm_file->sg_table.sgl;
+		lm_file->sg_last_entry = 0;
+	}
+
+	cur_offset = lm_file->last_offset;
+
+	for_each_sg(lm_file->last_offset_sg, sg,
+		    lm_file->sg_table.orig_nents - lm_file->sg_last_entry,
+		    i) {
+		if (offset < sg->length + cur_offset) {
+			lm_file->last_offset_sg = sg;
+			lm_file->sg_last_entry += i;
+			lm_file->last_offset = cur_offset;
+			return nth_page(sg_page(sg),
+					(offset - cur_offset) / PAGE_SIZE);
+		}
+		cur_offset += sg->length;
+	}
+
+	return NULL;
+}
+
+static int
+pds_vfio_release_file(struct inode *inode, struct file *filp)
+{
+	struct pds_vfio_lm_file *lm_file = filp->private_data;
+
+	lm_file->size = 0;
+
+	return 0;
+}
+
+static ssize_t
+pds_vfio_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos)
+{
+	struct pds_vfio_lm_file *lm_file = filp->private_data;
+	ssize_t done = 0;
+
+	if (pos)
+		return -ESPIPE;
+	pos = &filp->f_pos;
+
+	mutex_lock(&lm_file->lock);
+	if (*pos > lm_file->size) {
+		done = -EINVAL;
+		goto out_unlock;
+	}
+
+	len = min_t(size_t, lm_file->size - *pos, len);
+	while (len) {
+		size_t page_offset;
+		struct page *page;
+		size_t page_len;
+		u8 *from_buff;
+		int err;
+
+		page_offset = (*pos) % PAGE_SIZE;
+		page = pds_vfio_get_file_page(lm_file, *pos - page_offset);
+		if (!page) {
+			if (done == 0)
+				done = -EINVAL;
+			goto out_unlock;
+		}
+
+		page_len = min_t(size_t, len, PAGE_SIZE - page_offset);
+		from_buff = kmap_local_page(page);
+		err = copy_to_user(buf, from_buff + page_offset, page_len);
+		kunmap_local(from_buff);
+		if (err) {
+			done = -EFAULT;
+			goto out_unlock;
+		}
+		*pos += page_len;
+		len -= page_len;
+		done += page_len;
+		buf += page_len;
+	}
+
+out_unlock:
+	mutex_unlock(&lm_file->lock);
+	return done;
+}
+
+static const struct file_operations
+pds_vfio_save_fops = {
+	.owner = THIS_MODULE,
+	.read = pds_vfio_save_read,
+	.release = pds_vfio_release_file,
+	.llseek = no_llseek,
+};
+
+static int
+pds_vfio_get_save_file(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_vfio_lm_file *lm_file;
+	int err = 0;
+	u64 size;
+
+	/* Get live migration state size in this state */
+	err = pds_vfio_get_lm_status_cmd(pds_vfio, &size);
+	if (err) {
+		dev_err(&pdev->dev, "failed to get save status: %pe\n",
+			ERR_PTR(err));
+		goto err_get_lm_status;
+	}
+
+	dev_dbg(&pdev->dev, "save status, size = %llu\n", size);
+
+	if (!size) {
+		err = -EIO;
+		dev_err(&pdev->dev, "invalid state size\n");
+		goto err_get_lm_status;
+	}
+
+	lm_file = pds_vfio_get_lm_file(PDS_VFIO_LM_FILENAME,
+				       &pds_vfio_save_fops,
+				       O_RDONLY, size);
+	if (!lm_file) {
+		err = -ENOENT;
+		dev_err(&pdev->dev, "failed to create save file\n");
+		goto err_get_lm_file;
+	}
+
+	dev_dbg(&pdev->dev, "size = %llu, alloc_size = %llu, npages = %llu\n",
+		lm_file->size, lm_file->alloc_size, lm_file->npages);
+
+	pds_vfio->save_file = lm_file;
+
+	return 0;
+
+err_get_lm_file:
+err_get_lm_status:
+	return err;
+}
+
+static ssize_t
+pds_vfio_restore_write(struct file *filp, const char __user *buf, size_t len, loff_t *pos)
+{
+	struct pds_vfio_lm_file *lm_file = filp->private_data;
+	loff_t requested_length;
+	ssize_t done = 0;
+
+	if (pos)
+		return -ESPIPE;
+
+	pos = &filp->f_pos;
+
+	if (*pos < 0 ||
+	    check_add_overflow((loff_t)len, *pos, &requested_length))
+		return -EINVAL;
+
+	mutex_lock(&lm_file->lock);
+
+	while (len) {
+		size_t page_offset;
+		struct page *page;
+		size_t page_len;
+		u8 *to_buff;
+		int err;
+
+		page_offset = (*pos) % PAGE_SIZE;
+		page = pds_vfio_get_file_page(lm_file, *pos - page_offset);
+		if (!page) {
+			if (done == 0)
+				done = -EINVAL;
+			goto out_unlock;
+		}
+
+		page_len = min_t(size_t, len, PAGE_SIZE - page_offset);
+		to_buff = kmap_local_page(page);
+		err = copy_from_user(to_buff + page_offset, buf, page_len);
+		kunmap_local(to_buff);
+		if (err) {
+			done = -EFAULT;
+			goto out_unlock;
+		}
+		*pos += page_len;
+		len -= page_len;
+		done += page_len;
+		buf += page_len;
+		lm_file->size += page_len;
+	}
+out_unlock:
+	mutex_unlock(&lm_file->lock);
+	return done;
+}
+
+static const struct file_operations
+pds_vfio_restore_fops = {
+	.owner = THIS_MODULE,
+	.write = pds_vfio_restore_write,
+	.release = pds_vfio_release_file,
+	.llseek = no_llseek,
+};
+
+static int
+pds_vfio_get_restore_file(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_vfio_lm_file *lm_file;
+	int err = 0;
+	u64 size;
+
+	size = sizeof(union pds_lm_dev_state);
+
+	dev_dbg(&pdev->dev, "restore status, size = %llu\n", size);
+
+	if (!size) {
+		err = -EIO;
+		dev_err(&pdev->dev, "invalid state size\n");
+		goto err_get_lm_status;
+	}
+
+	lm_file = pds_vfio_get_lm_file(PDS_VFIO_LM_FILENAME,
+				       &pds_vfio_restore_fops,
+				       O_WRONLY, size);
+	if (!lm_file) {
+		err = -ENOENT;
+		dev_err(&pdev->dev, "failed to create restore file\n");
+		goto err_get_lm_file;
+	}
+	pds_vfio->restore_file = lm_file;
+
+	return 0;
+
+err_get_lm_file:
+err_get_lm_status:
+	return err;
+}
+
+struct file *
+pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio,
+				  enum vfio_device_mig_state next)
+{
+	enum vfio_device_mig_state cur = pds_vfio->state;
+	struct device *dev = &pds_vfio->pdev->dev;
+	unsigned long lm_action_start;
+	int err = 0;
+
+	dev_dbg(dev, "%s => %s\n",
+		pds_vfio_lm_state(cur), pds_vfio_lm_state(next));
+
+	lm_action_start = jiffies;
+	if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_STOP_COPY) {
+		/* Device is already stopped
+		 * create save device data file & get device state from firmware
+		 */
+		err = pds_vfio_get_save_file(pds_vfio);
+		if (err)
+			return ERR_PTR(err);
+
+		/* Get device state */
+		err = pds_vfio_get_lm_state_cmd(pds_vfio);
+		if (err) {
+			pds_vfio_put_save_file(pds_vfio);
+			return ERR_PTR(err);
+		}
+
+		return pds_vfio->save_file->filep;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_STOP_COPY && next == VFIO_DEVICE_STATE_STOP) {
+		/* Device is already stopped
+		 * delete the save device state file
+		 */
+		pds_vfio_put_save_file(pds_vfio);
+		pds_vfio_send_host_vf_lm_status_cmd(pds_vfio,
+						    PDS_LM_STA_NONE);
+		return NULL;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RESUMING) {
+		/* create resume device data file */
+		err = pds_vfio_get_restore_file(pds_vfio);
+		if (err)
+			return ERR_PTR(err);
+
+		return pds_vfio->restore_file->filep;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_RESUMING && next == VFIO_DEVICE_STATE_STOP) {
+		/* Set device state */
+		err = pds_vfio_set_lm_state_cmd(pds_vfio);
+		if (err)
+			return ERR_PTR(err);
+
+		/* delete resume device data file */
+		pds_vfio_put_restore_file(pds_vfio);
+		return NULL;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_RUNNING && next == VFIO_DEVICE_STATE_STOP) {
+		/* Device should be stopped
+		 * no interrupts, dma or change in internal state
+		 */
+		err = pds_vfio_suspend_device_cmd(pds_vfio);
+		if (err)
+			return ERR_PTR(err);
+
+		return NULL;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RUNNING) {
+		/* Device should be functional
+		 * interrupts, dma, mmio and changes to internal state are allowed
+		 */
+		err = pds_vfio_resume_device_cmd(pds_vfio);
+		if (err)
+			return ERR_PTR(err);
+
+		pds_vfio_send_host_vf_lm_status_cmd(pds_vfio,
+						    PDS_LM_STA_NONE);
+		return NULL;
+	}
+
+	return ERR_PTR(-EINVAL);
+}
diff --git a/drivers/vfio/pci/pds/lm.h b/drivers/vfio/pci/pds/lm.h
new file mode 100644
index 000000000000..3dd97b807db6
--- /dev/null
+++ b/drivers/vfio/pci/pds/lm.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _LM_H_
+#define _LM_H_
+
+#include <linux/fs.h>
+#include <linux/mutex.h>
+#include <linux/scatterlist.h>
+#include <linux/types.h>
+#include <linux/vfio.h>
+
+#include <linux/pds/pds_lm.h>
+
+struct pds_vfio_lm_file {
+	struct file *filep;
+	struct mutex lock;	/* protect live migration data file */
+	u64 size;		/* Size with valid data */
+	u64 alloc_size;		/* Total allocated size. Always >= size */
+	struct page **pages;	/* Backing pages for file */
+	unsigned long long npages;
+	struct sg_table sg_table;	/* SG table for backing pages */
+	struct pds_lm_sg_elem *sgl;	/* DMA mapping */
+	dma_addr_t sgl_addr;
+	u16 num_sge;
+	struct scatterlist *last_offset_sg;	/* Iterator */
+	unsigned int sg_last_entry;
+	unsigned long last_offset;
+};
+
+struct pds_vfio_pci_device;
+
+struct file *
+pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio,
+				  enum vfio_device_mig_state next);
+const char *
+pds_vfio_lm_state(enum vfio_device_mig_state state);
+void
+pds_vfio_put_save_file(struct pds_vfio_pci_device *pds_vfio);
+void
+pds_vfio_put_restore_file(struct pds_vfio_pci_device *pds_vfio);
+
+#endif /* _LM_H_ */
diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c
index f6483494031d..081a51a0124a 100644
--- a/drivers/vfio/pci/pds/pci_drv.c
+++ b/drivers/vfio/pci/pds/pci_drv.c
@@ -66,12 +66,27 @@ pds_vfio_pci_table[] = {
 };
 MODULE_DEVICE_TABLE(pci, pds_vfio_pci_table);
 
+static void
+pds_vfio_pci_aer_reset_done(struct pci_dev *pdev)
+{
+	struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev);
+
+	pds_vfio_reset(pds_vfio);
+}
+
+static const struct
+pci_error_handlers pds_vfio_pci_err_handlers = {
+	.reset_done = pds_vfio_pci_aer_reset_done,
+	.error_detected = vfio_pci_core_aer_err_detected,
+};
+
 static struct pci_driver
 pds_vfio_pci_driver = {
 	.name = PDS_VFIO_DRV_NAME,
 	.id_table = pds_vfio_pci_table,
 	.probe = pds_vfio_pci_probe,
 	.remove = pds_vfio_pci_remove,
+	.err_handler = &pds_vfio_pci_err_handlers,
 	.driver_managed_dma = true,
 };
 
diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c
index 30c3bb47a2be..af8ce96033eb 100644
--- a/drivers/vfio/pci/pds/vfio_dev.c
+++ b/drivers/vfio/pci/pds/vfio_dev.c
@@ -4,6 +4,7 @@
 #include <linux/vfio.h>
 #include <linux/vfio_pci_core.h>
 
+#include "lm.h"
 #include "vfio_dev.h"
 #include "aux_drv.h"
 
@@ -16,6 +17,97 @@ pds_vfio_pci_drvdata(struct pci_dev *pdev)
 			    vfio_coredev);
 }
 
+static void
+pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio)
+{
+again:
+	spin_lock(&pds_vfio->reset_lock);
+	if (pds_vfio->deferred_reset) {
+		pds_vfio->deferred_reset = false;
+		spin_unlock(&pds_vfio->reset_lock);
+		if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) {
+			pds_vfio->state = VFIO_DEVICE_STATE_RUNNING;
+			pds_vfio_put_restore_file(pds_vfio);
+			pds_vfio_put_save_file(pds_vfio);
+		}
+		goto again;
+	}
+	mutex_unlock(&pds_vfio->state_mutex);
+	spin_unlock(&pds_vfio->reset_lock);
+}
+
+void
+pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio)
+{
+	spin_lock(&pds_vfio->reset_lock);
+	pds_vfio->deferred_reset = true;
+	if (!mutex_trylock(&pds_vfio->state_mutex)) {
+		spin_unlock(&pds_vfio->reset_lock);
+		return;
+	}
+	spin_unlock(&pds_vfio->reset_lock);
+	pds_vfio_state_mutex_unlock(pds_vfio);
+}
+
+static struct file *
+pds_vfio_set_device_state(struct vfio_device *vdev,
+			  enum vfio_device_mig_state new_state)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+	struct file *res = NULL;
+
+	if (!pds_vfio->vfio_aux)
+		return ERR_PTR(-ENODEV);
+
+	mutex_lock(&pds_vfio->state_mutex);
+	while (new_state != pds_vfio->state) {
+		enum vfio_device_mig_state next_state;
+
+		int err = vfio_mig_get_next_state(vdev, pds_vfio->state,
+						  new_state, &next_state);
+		if (err) {
+			res = ERR_PTR(err);
+			break;
+		}
+
+		res = pds_vfio_step_device_state_locked(pds_vfio, next_state);
+		if (IS_ERR(res))
+			break;
+
+		pds_vfio->state = next_state;
+
+		if (WARN_ON(res && new_state != pds_vfio->state)) {
+			res = ERR_PTR(-EINVAL);
+			break;
+		}
+	}
+	pds_vfio_state_mutex_unlock(pds_vfio);
+
+	return res;
+}
+
+static int
+pds_vfio_get_device_state(struct vfio_device *vdev,
+			  enum vfio_device_mig_state *current_state)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+
+	mutex_lock(&pds_vfio->state_mutex);
+	*current_state = pds_vfio->state;
+	pds_vfio_state_mutex_unlock(pds_vfio);
+	return 0;
+}
+
+static const struct vfio_migration_ops
+pds_vfio_lm_ops = {
+	.migration_set_state = pds_vfio_set_device_state,
+	.migration_get_state = pds_vfio_get_device_state
+};
+
 static int
 pds_vfio_init_device(struct vfio_device *vdev)
 {
@@ -35,10 +127,19 @@ pds_vfio_init_device(struct vfio_device *vdev)
 	vfio_aux = pds_vfio_aux_get_drvdata(pds_vfio->pci_id);
 	if (vfio_aux) {
 		vfio_aux->pds_vfio = pds_vfio;
+		pds_vfio->coredev = vfio_aux->padev->aux_dev.dev.parent;
 		pds_vfio->vfio_aux = vfio_aux;
 		pds_vfio_put_aux_dev(vfio_aux);
 	}
 
+	vdev->migration_flags = VFIO_MIGRATION_STOP_COPY;
+	vdev->mig_ops = &pds_vfio_lm_ops;
+
+	dev_dbg(&pdev->dev, "%s: PF %#04x VF %#04x (%d) vf_id %d domain %d vfio_aux %p pds_vfio %p\n",
+		__func__, pci_dev_id(pdev->physfn),
+		pds_vfio->pci_id, pds_vfio->pci_id, pds_vfio->vf_id,
+		pci_domain_nr(pdev->bus), pds_vfio->vfio_aux, pds_vfio);
+
 	return 0;
 }
 
@@ -54,18 +155,34 @@ pds_vfio_open_device(struct vfio_device *vdev)
 	if (err)
 		return err;
 
+	mutex_init(&pds_vfio->state_mutex);
+	dev_dbg(&pds_vfio->pdev->dev, "%s: %s => VFIO_DEVICE_STATE_RUNNING\n",
+		__func__, pds_vfio_lm_state(pds_vfio->state));
+	pds_vfio->state = VFIO_DEVICE_STATE_RUNNING;
+
 	vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev);
 
 	return 0;
 }
 
+static void
+pds_vfio_close_device(struct vfio_device *vdev)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+
+	mutex_destroy(&pds_vfio->state_mutex);
+	vfio_pci_core_close_device(vdev);
+}
+
 static const struct vfio_device_ops
 pds_vfio_ops = {
 	.name = "pds-vfio",
 	.init = pds_vfio_init_device,
 	.release = vfio_pci_core_release_dev,
 	.open_device = pds_vfio_open_device,
-	.close_device = vfio_pci_core_close_device,
+	.close_device = pds_vfio_close_device,
 	.ioctl = vfio_pci_core_ioctl,
 	.device_feature = vfio_pci_core_ioctl_feature,
 	.read = vfio_pci_core_read,
diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h
index b16668693e1f..a09570eec6fa 100644
--- a/drivers/vfio/pci/pds/vfio_dev.h
+++ b/drivers/vfio/pci/pds/vfio_dev.h
@@ -7,10 +7,20 @@
 #include <linux/pci.h>
 #include <linux/vfio_pci_core.h>
 
+#include "lm.h"
+
 struct pds_vfio_pci_device {
 	struct vfio_pci_core_device vfio_coredev;
 	struct pci_dev *pdev;
 	struct pds_vfio_aux *vfio_aux;
+	struct device *coredev;
+
+	struct pds_vfio_lm_file *save_file;
+	struct pds_vfio_lm_file *restore_file;
+	struct mutex state_mutex; /* protect migration state */
+	enum vfio_device_mig_state state;
+	spinlock_t reset_lock; /* protect reset_done flow */
+	u8 deferred_reset;
 
 	int vf_id;
 	int pci_id;
@@ -20,5 +30,7 @@ const struct vfio_device_ops *
 pds_vfio_ops_info(void);
 struct pds_vfio_pci_device *
 pds_vfio_pci_drvdata(struct pci_dev *pdev);
+void
+pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio);
 
 #endif /* _VFIO_DEV_H_ */
diff --git a/include/linux/pds/pds_lm.h b/include/linux/pds/pds_lm.h
index fdaf2bf71d35..28ebd62f7583 100644
--- a/include/linux/pds/pds_lm.h
+++ b/include/linux/pds/pds_lm.h
@@ -8,5 +8,211 @@
 
 #define PDS_DEV_TYPE_LM_STR	"LM"
 #define PDS_LM_DEV_NAME		PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_LM_STR
+#define PDS_LM_DEVICE_STATE_LENGTH		65536
+#define PDS_LM_CHECK_DEVICE_STATE_LENGTH(X)					\
+			PDS_CORE_SIZE_CHECK(union, PDS_LM_DEVICE_STATE_LENGTH, X)
+
+/*
+ * enum pds_lm_cmd_opcode - Live Migration Device commands
+ */
+enum pds_lm_cmd_opcode {
+	PDS_LM_CMD_HOST_VF_STATUS  = 1,
+
+	/* Device state commands */
+	PDS_LM_CMD_STATUS          = 16,
+	PDS_LM_CMD_SUSPEND         = 18,
+	PDS_LM_CMD_SUSPEND_STATUS  = 19,
+	PDS_LM_CMD_RESUME          = 20,
+	PDS_LM_CMD_SAVE            = 21,
+	PDS_LM_CMD_RESTORE         = 22,
+};
+
+/**
+ * struct pds_lm_cmd - generic command
+ * @opcode:	Opcode
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ * @rsvd2:	Structure padding to 60 Bytes
+ */
+struct pds_lm_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	u8     rsvd2[56];
+};
+
+/**
+ * struct pds_lm_comp - generic command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:	Structure padding to 16 Bytes
+ */
+struct pds_lm_comp {
+	u8 status;
+	u8 rsvd[15];
+};
+
+/**
+ * struct pds_lm_status_cmd - STATUS command
+ * @opcode:	Opcode
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ */
+struct pds_lm_status_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+};
+
+/**
+ * struct pds_lm_status_comp - STATUS command completion
+ * @status:		Status of the command (enum pds_core_status_code)
+ * @rsvd:		Word boundary padding
+ * @comp_index:		Index in the desc ring for which this is the completion
+ * @size:		Size of the device state
+ * @rsvd2:		Word boundary padding
+ * @color:		Color bit
+ */
+struct pds_lm_status_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	union {
+		__le64 size;
+		u8     rsvd2[11];
+	} __packed;
+	u8     color;
+};
+
+/**
+ * struct pds_lm_suspend_cmd - SUSPEND command
+ * @opcode:	Opcode PDS_LM_CMD_SUSPEND
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ */
+struct pds_lm_suspend_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+};
+
+/**
+ * struct pds_lm_suspend_comp - SUSPEND command completion
+ * @status:		Status of the command (enum pds_core_status_code)
+ * @rsvd:		Word boundary padding
+ * @comp_index:		Index in the desc ring for which this is the completion
+ * @state_size:		Size of the device state computed post suspend.
+ * @rsvd2:		Word boundary padding
+ * @color:		Color bit
+ */
+struct pds_lm_suspend_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	union {
+		__le64 state_size;
+		u8     rsvd2[11];
+	} __packed;
+	u8     color;
+};
+
+/**
+ * struct pds_lm_suspend_status_cmd - SUSPEND status command
+ * @opcode:	Opcode PDS_LM_CMD_SUSPEND_STATUS
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ */
+struct pds_lm_suspend_status_cmd {
+	u8 opcode;
+	u8 rsvd;
+	__le16 vf_id;
+};
+
+/**
+ * struct pds_lm_resume_cmd - RESUME command
+ * @opcode:	Opcode PDS_LM_CMD_RESUME
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ */
+struct pds_lm_resume_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+};
+
+/**
+ * struct pds_lm_sg_elem - Scatter-gather (SG) descriptor element
+ * @addr:	DMA address of SG element data buffer
+ * @len:	Length of SG element data buffer, in bytes
+ * @rsvd:	Word boundary padding
+ */
+struct pds_lm_sg_elem {
+	__le64 addr;
+	__le32 len;
+	__le16 rsvd[2];
+};
+
+/**
+ * struct pds_lm_save_cmd - SAVE command
+ * @opcode:	Opcode PDS_LM_CMD_SAVE
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ * @rsvd2:	Word boundary padding
+ * @sgl_addr:	IOVA address of the SGL to dma the device state
+ * @num_sge:	Total number of SG elements
+ */
+struct pds_lm_save_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	u8     rsvd2[4];
+	__le64 sgl_addr;
+	__le32 num_sge;
+} __packed;
+
+/**
+ * struct pds_lm_restore_cmd - RESTORE command
+ * @opcode:	Opcode PDS_LM_CMD_RESTORE
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ * @rsvd2:	Word boundary padding
+ * @sgl_addr:	IOVA address of the SGL to dma the device state
+ * @num_sge:	Total number of SG elements
+ */
+struct pds_lm_restore_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	u8     rsvd2[4];
+	__le64 sgl_addr;
+	__le32 num_sge;
+} __packed;
+
+/**
+ * union pds_lm_dev_state - device state information
+ * @words:	Device state words
+ */
+union pds_lm_dev_state {
+	__le32 words[PDS_LM_DEVICE_STATE_LENGTH / sizeof(__le32)];
+};
+
+enum pds_lm_host_vf_status {
+	PDS_LM_STA_NONE = 0,
+	PDS_LM_STA_IN_PROGRESS,
+	PDS_LM_STA_MAX,
+};
+
+/**
+ * struct pds_lm_host_vf_status_cmd - HOST_VF_STATUS command
+ * @opcode:	Opcode PDS_LM_CMD_HOST_VF_STATUS
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ * @status:	Current LM status of host VF driver (enum pds_lm_host_vf_status)
+ */
+struct pds_lm_host_vf_status_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	u8     status;
+};
 
 #endif /* _PDS_LM_H_ */
-- 
2.17.1



* [RFC PATCH vfio 4/7] vfio: Commonize combine_ranges for use in other VFIO drivers
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
                   ` (2 preceding siblings ...)
  2022-12-07  1:07 ` [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support Brett Creeley
@ 2022-12-07  1:07 ` Brett Creeley
  2022-12-07  1:07 ` [RFC PATCH vfio 5/7] vfio/pds: Add support for dirty page tracking Brett Creeley
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:07 UTC (permalink / raw)
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

Currently only the Mellanox mlx5 driver uses the combine_ranges
function. The new pds_vfio driver also needs it, so move it to
vfio_main.c as vfio_combine_iova_ranges and export it for other
vendor drivers to use.
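
For readers unfamiliar with the helper, the following is a minimal
userspace sketch of its smallest-gap strategy. It is not the kernel
code: it uses a sorted array instead of an interval tree, and the names
struct iova_range and combine_ranges_sketch are hypothetical. It assumes
the ranges are sorted by start, do not overlap, and that req is at
least 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an interval tree node: inclusive [start, last]. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/*
 * Merge neighbouring ranges with the smallest gap until at most req remain.
 * Assumes r[] is sorted by start, non-overlapping, and req >= 1.
 * Returns the new number of ranges.
 */
static size_t combine_ranges_sketch(struct iova_range *r, size_t cur,
				    size_t req)
{
	while (cur > req) {
		uint64_t min_gap = UINT64_MAX;
		size_t best = 1;
		size_t i;

		/* Find the adjacent pair separated by the smallest gap. */
		for (i = 1; i < cur; i++) {
			uint64_t gap = r[i].start - r[i - 1].last;

			if (gap < min_gap) {
				min_gap = gap;
				best = i;
			}
		}

		/* Extend the left neighbour, then drop the right one. */
		r[best - 1].last = r[best].last;
		for (i = best; i + 1 < cur; i++)
			r[i] = r[i + 1];
		cur--;
	}

	return cur;
}
```

Merging always across the smallest gap keeps the amount of untracked
address space that gets pulled into a combined range as small as
possible for a given target count.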

Cc: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 drivers/vfio/pci/mlx5/cmd.c | 48 +------------------------------------
 drivers/vfio/vfio_main.c    | 48 +++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h        |  3 +++
 3 files changed, 52 insertions(+), 47 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index c604b70437a5..bf620d339d82 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -425,52 +425,6 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	return err;
 }
 
-static void combine_ranges(struct rb_root_cached *root, u32 cur_nodes,
-			   u32 req_nodes)
-{
-	struct interval_tree_node *prev, *curr, *comb_start, *comb_end;
-	unsigned long min_gap;
-	unsigned long curr_gap;
-
-	/* Special shortcut when a single range is required */
-	if (req_nodes == 1) {
-		unsigned long last;
-
-		curr = comb_start = interval_tree_iter_first(root, 0, ULONG_MAX);
-		while (curr) {
-			last = curr->last;
-			prev = curr;
-			curr = interval_tree_iter_next(curr, 0, ULONG_MAX);
-			if (prev != comb_start)
-				interval_tree_remove(prev, root);
-		}
-		comb_start->last = last;
-		return;
-	}
-
-	/* Combine ranges which have the smallest gap */
-	while (cur_nodes > req_nodes) {
-		prev = NULL;
-		min_gap = ULONG_MAX;
-		curr = interval_tree_iter_first(root, 0, ULONG_MAX);
-		while (curr) {
-			if (prev) {
-				curr_gap = curr->start - prev->last;
-				if (curr_gap < min_gap) {
-					min_gap = curr_gap;
-					comb_start = prev;
-					comb_end = curr;
-				}
-			}
-			prev = curr;
-			curr = interval_tree_iter_next(curr, 0, ULONG_MAX);
-		}
-		comb_start->last = comb_end->last;
-		interval_tree_remove(comb_end, root);
-		cur_nodes--;
-	}
-}
-
 static int mlx5vf_create_tracker(struct mlx5_core_dev *mdev,
 				 struct mlx5vf_pci_core_device *mvdev,
 				 struct rb_root_cached *ranges, u32 nnodes)
@@ -493,7 +447,7 @@ static int mlx5vf_create_tracker(struct mlx5_core_dev *mdev,
 	int i;
 
 	if (num_ranges > max_num_range) {
-		combine_ranges(ranges, nnodes, max_num_range);
+		vfio_combine_iova_ranges(ranges, nnodes, max_num_range);
 		num_ranges = max_num_range;
 	}
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6e8804fe0095..67edace75785 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1293,6 +1293,54 @@ static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
 	return 0;
 }
 
+void vfio_combine_iova_ranges(struct rb_root_cached *root, u32 cur_nodes,
+			      u32 req_nodes)
+{
+	struct interval_tree_node *prev, *curr, *comb_start, *comb_end;
+	unsigned long min_gap;
+	unsigned long curr_gap;
+
+	/* Special shortcut when a single range is required */
+	if (req_nodes == 1) {
+		unsigned long last;
+
+		comb_start = interval_tree_iter_first(root, 0, ULONG_MAX);
+		curr = comb_start;
+		while (curr) {
+			last = curr->last;
+			prev = curr;
+			curr = interval_tree_iter_next(curr, 0, ULONG_MAX);
+			if (prev != comb_start)
+				interval_tree_remove(prev, root);
+		}
+		comb_start->last = last;
+		return;
+	}
+
+	/* Combine ranges which have the smallest gap */
+	while (cur_nodes > req_nodes) {
+		prev = NULL;
+		min_gap = ULONG_MAX;
+		curr = interval_tree_iter_first(root, 0, ULONG_MAX);
+		while (curr) {
+			if (prev) {
+				curr_gap = curr->start - prev->last;
+				if (curr_gap < min_gap) {
+					min_gap = curr_gap;
+					comb_start = prev;
+					comb_end = curr;
+				}
+			}
+			prev = curr;
+			curr = interval_tree_iter_next(curr, 0, ULONG_MAX);
+		}
+		comb_start->last = comb_end->last;
+		interval_tree_remove(comb_end, root);
+		cur_nodes--;
+	}
+}
+EXPORT_SYMBOL_GPL(vfio_combine_iova_ranges);
+
 /* Ranges should fit into a single kernel page */
 #define LOG_MAX_RANGES \
 	(PAGE_SIZE / sizeof(struct vfio_device_feature_dma_logging_range))
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fdd393f70b19..90d4fb3f155c 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -196,6 +196,9 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state new_fsm,
 			    enum vfio_device_mig_state *next_fsm);
 
+void vfio_combine_iova_ranges(struct rb_root_cached *root, u32 cur_nodes,
+			      u32 req_nodes);
+
 /*
  * External user API
  */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH vfio 5/7] vfio/pds: Add support for dirty page tracking
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
                   ` (3 preceding siblings ...)
  2022-12-07  1:07 ` [RFC PATCH vfio 4/7] vfio: Commonize combine_ranges for use in other VFIO drivers Brett Creeley
@ 2022-12-07  1:07 ` Brett Creeley
  2022-12-07  1:07 ` [RFC PATCH vfio 6/7] vfio/pds: Add support for firmware recovery Brett Creeley
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:07 UTC (permalink / raw)
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

In order to support dirty page tracking, the driver has to implement
the VFIO subsystem's vfio_log_ops. This includes log_start, log_stop,
and log_read_and_clear.

All of the tracker resources are allocated and dirty tracking on the
device is started during log_start. The resources are cleaned up and
dirty tracking on the device is stopped during log_stop. The dirty
pages are determined and reported during log_read_and_clear.
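
As a rough illustration of how the tracker resources scale with the
tracked region, the sketch below mirrors the sizing arithmetic used in
log_start: one bitmap bit per page, and one SG element per host page of
bitmap. The names tracker_sizes, tracker_sizes_for and SKETCH_PAGE_SIZE
are hypothetical, a 4 KiB host page is assumed, and the bitmap size is
assumed to be rounded up to whole 64-bit words.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed host PAGE_SIZE */

/* Hypothetical summary of the per-bitmap resources for one region. */
struct tracker_sizes {
	uint32_t page_count;	/* pages tracked, one bit each */
	uint32_t bitmap_bytes;	/* bytes per bitmap (seq or ack) */
	uint32_t num_sge;	/* SG elements needed to describe one bitmap */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Compute sizes for the inclusive IOVA range [start, last]. */
static struct tracker_sizes tracker_sizes_for(uint64_t start, uint64_t last,
					      uint64_t page_size)
{
	struct tracker_sizes s;
	uint64_t region_size = last - start + 1;

	s.page_count = (uint32_t)div_round_up(region_size, page_size);
	/* One bit per page, rounded up to whole 64-bit words. */
	s.bitmap_bytes = (uint32_t)(div_round_up(s.page_count, 64) * 8);
	/* Each SG element covers one host page of bitmap bits. */
	s.num_sge = (uint32_t)div_round_up(s.page_count,
					   SKETCH_PAGE_SIZE * 8u);
	return s;
}
```

Under these assumptions a 4 GiB region tracked at 4 KiB granularity
needs 1048576 bits, i.e. 128 KiB per bitmap and 32 SG elements, and the
driver keeps two such bitmaps (seq and ack).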

Admin queue commands are used to support these callbacks. All of the
admin queue command structures and implementations are included as
part of this patch.

PDS_LM_CMD_DIRTY_STATUS is added to query the current status of
dirty tracking on the device. This includes whether tracking is enabled
(i.e. the number of regions currently tracked from the device's
perspective) and the maximum number of regions the device supports.

PDS_LM_CMD_DIRTY_ENABLE is added to enable dirty tracking on the
specified number of regions and their iova ranges.

PDS_LM_CMD_DIRTY_DISABLE is added to disable dirty tracking for all
regions on the device.

PDS_LM_CMD_DIRTY_READ_SEQ and PDS_LM_CMD_DIRTY_WRITE_ACK are added to
support reading and acknowledging the currently dirtied pages.
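
The read/acknowledge step can be pictured with the userspace sketch
below. It follows the approach taken in this patch, where a bit that
differs between the seq bitmap read from the device and the host's ack
bitmap marks a page dirtied since the last acknowledgement. The function
name collect_dirty_iovas and the output array are hypothetical, and the
bitmap words are assumed to already be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare seq and ack word by word, record the IOVA of every page whose
 * bit differs, and update ack so it can be written back to the device.
 * At most out_cap IOVAs are stored in out[]; the return value is the
 * total number of dirty pages found.
 */
static size_t collect_dirty_iovas(const uint64_t *seq, uint64_t *ack,
				  size_t nwords, uint64_t region_start,
				  uint64_t page_size,
				  uint64_t *out, size_t out_cap)
{
	size_t found = 0;
	size_t i;

	for (i = 0; i < nwords; i++) {
		/* Differing bits are pages dirtied since the last ack. */
		uint64_t diff = seq[i] ^ ack[i];
		unsigned int bit;

		/* Remember what was seen; this is what gets acknowledged. */
		ack[i] = seq[i];

		for (bit = 0; bit < 64; bit++) {
			if (!(diff & (UINT64_C(1) << bit)))
				continue;
			if (found < out_cap)
				out[found] = region_start +
					     ((uint64_t)i * 64 + bit) * page_size;
			found++;
		}
	}

	return found;
}
```

Because ack is updated to match seq, running the comparison a second
time without new device writes reports no dirty pages.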

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 drivers/vfio/pci/pds/Makefile   |   1 +
 drivers/vfio/pci/pds/cmds.c     | 143 +++++++++
 drivers/vfio/pci/pds/cmds.h     |  15 +
 drivers/vfio/pci/pds/dirty.c    | 541 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/pds/dirty.h    |  48 +++
 drivers/vfio/pci/pds/lm.h       |  10 +
 drivers/vfio/pci/pds/vfio_dev.c |   9 +
 drivers/vfio/pci/pds/vfio_dev.h |   2 +
 include/linux/pds/pds_lm.h      | 173 ++++++++++
 9 files changed, 942 insertions(+)
 create mode 100644 drivers/vfio/pci/pds/dirty.c
 create mode 100644 drivers/vfio/pci/pds/dirty.h

diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile
index 3d3be3593a02..086fe31ad4a5 100644
--- a/drivers/vfio/pci/pds/Makefile
+++ b/drivers/vfio/pci/pds/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_PDS_VFIO_PCI) += pds_vfio.o
 pds_vfio-y := \
 	aux_drv.o	\
 	cmds.o		\
+	dirty.o		\
 	lm.o		\
 	pci_drv.o	\
 	vfio_dev.o
diff --git a/drivers/vfio/pci/pds/cmds.c b/drivers/vfio/pci/pds/cmds.c
index 11823d824ccc..3ebd46fd6b57 100644
--- a/drivers/vfio/pci/pds/cmds.c
+++ b/drivers/vfio/pci/pds/cmds.c
@@ -340,3 +340,146 @@ pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio,
 		dev_warn(&pdev->dev, "failed to send host VF migration status: %pe\n",
 			 ERR_PTR(err));
 }
+
+int
+pds_vfio_dirty_status_cmd(struct pds_vfio_pci_device *pds_vfio,
+			  u64 regions_dma, u8 *max_regions,
+			  u8 *num_regions)
+{
+	struct pds_auxiliary_dev *padev = pds_vfio->vfio_aux->padev;
+	struct pds_lm_dirty_status_cmd cmd = {
+		.opcode = PDS_LM_CMD_DIRTY_STATUS,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pds_lm_dirty_status_comp comp = {0};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	int err;
+
+	dev_dbg(&pdev->dev, "vf%u: Dirty status\n", pds_vfio->vf_id);
+
+	cmd.regions_dma = cpu_to_le64(regions_dma);
+	cmd.max_regions = *max_regions;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(&pdev->dev, "failed to get dirty status: %pe\n",
+			ERR_PTR(err));
+		return err;
+	}
+
+	/* only support seq_ack approach for now */
+	if (!(le32_to_cpu(comp.bmp_type_mask) &
+	      BIT(PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK))) {
+		dev_err(&pdev->dev, "Dirty bitmap tracking SEQ_ACK not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	*num_regions = comp.num_regions;
+	*max_regions = comp.max_regions;
+
+	dev_dbg(&pdev->dev, "Page Tracking Status command successful, max_regions: %d, num_regions: %d, bmp_type: %s\n",
+		*max_regions, *num_regions, "PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK");
+
+	return 0;
+}
+
+int
+pds_vfio_dirty_enable_cmd(struct pds_vfio_pci_device *pds_vfio,
+			  u64 regions_dma, u8 num_regions)
+{
+	struct pds_auxiliary_dev *padev = pds_vfio->vfio_aux->padev;
+	struct pds_lm_dirty_enable_cmd cmd = {
+		.opcode = PDS_LM_CMD_DIRTY_ENABLE,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pds_lm_dirty_status_comp comp = {0};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	int err;
+
+	cmd.regions_dma = cpu_to_le64(regions_dma);
+	cmd.bmp_type = PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK;
+	cmd.num_regions = num_regions;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(&pdev->dev, "failed dirty tracking enable: %pe\n",
+			ERR_PTR(err));
+		return err;
+	}
+
+	return 0;
+}
+
+int
+pds_vfio_dirty_disable_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+	struct pds_auxiliary_dev *padev = pds_vfio->vfio_aux->padev;
+	struct pds_lm_dirty_disable_cmd cmd = {
+		.opcode = PDS_LM_CMD_DIRTY_DISABLE,
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pds_lm_dirty_status_comp comp = {0};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	int err;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err || comp.num_regions != 0) {
+		/* in case num_regions is still non-zero after disable */
+		err = err ? err : -EIO;
+		dev_err(&pdev->dev, "failed dirty tracking disable: %pe, num_regions %d\n",
+			ERR_PTR(err), comp.num_regions);
+		return err;
+	}
+
+	return 0;
+}
+
+int
+pds_vfio_dirty_seq_ack_cmd(struct pds_vfio_pci_device *pds_vfio,
+			   u64 sgl_dma, u16 num_sge, u32 offset,
+			   u32 total_len, bool read_seq)
+{
+	const char *cmd_type_str = read_seq ? "read_seq" : "write_ack";
+	struct pds_auxiliary_dev *padev = pds_vfio->vfio_aux->padev;
+	struct pds_lm_dirty_seq_ack_cmd cmd = {
+		.vf_id = cpu_to_le16(pds_vfio->vf_id),
+	};
+	struct pci_dev *pdev = pds_vfio->pdev;
+	struct pds_lm_comp comp = {0};
+	int err;
+
+	if (read_seq)
+		cmd.opcode = PDS_LM_CMD_DIRTY_READ_SEQ;
+	else
+		cmd.opcode = PDS_LM_CMD_DIRTY_WRITE_ACK;
+
+	cmd.sgl_addr = cpu_to_le64(sgl_dma);
+	cmd.num_sge = cpu_to_le16(num_sge);
+	cmd.len_bytes = cpu_to_le32(total_len);
+	cmd.off_bytes = cpu_to_le32(offset);
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(&pdev->dev, "failed cmd Page Tracking %s: %pe\n",
+			cmd_type_str, ERR_PTR(err));
+		return err;
+	}
+
+	return 0;
+}
diff --git a/drivers/vfio/pci/pds/cmds.h b/drivers/vfio/pci/pds/cmds.h
index 5f9ac45ee5a3..d4d53fcbb739 100644
--- a/drivers/vfio/pci/pds/cmds.h
+++ b/drivers/vfio/pci/pds/cmds.h
@@ -4,6 +4,8 @@
 #ifndef _CMDS_H_
 #define _CMDS_H_
 
+#include <linux/types.h>
+
 #include <linux/pds/pds_lm.h>
 
 struct pds_vfio_pci_device;
@@ -25,5 +27,18 @@ pds_vfio_set_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio);
 void
 pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio,
 				    enum pds_lm_host_vf_status vf_status);
+int
+pds_vfio_dirty_status_cmd(struct pds_vfio_pci_device *pds_vfio,
+			  u64 regions_dma, u8 *max_regions,
+			  u8 *num_regions);
+int
+pds_vfio_dirty_enable_cmd(struct pds_vfio_pci_device *pds_vfio,
+			  u64 regions_dma, u8 num_regions);
+int
+pds_vfio_dirty_disable_cmd(struct pds_vfio_pci_device *pds_vfio);
+int
+pds_vfio_dirty_seq_ack_cmd(struct pds_vfio_pci_device *pds_vfio,
+			   u64 sgl_dma, u16 num_sge, u32 offset,
+			   u32 total_len, bool read_seq);
 
 #endif /* _CMDS_H_ */
diff --git a/drivers/vfio/pci/pds/dirty.c b/drivers/vfio/pci/pds/dirty.c
new file mode 100644
index 000000000000..70d5be2ea108
--- /dev/null
+++ b/drivers/vfio/pci/pds/dirty.c
@@ -0,0 +1,541 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/io.h>
+#include <linux/interval_tree.h>
+#include <linux/vfio.h>
+
+#include <linux/pds/pds_intr.h>
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+
+#include "cmds.h"
+#include "dirty.h"
+#include "vfio_dev.h"
+
+#define READ_SEQ	true
+#define WRITE_ACK	false
+
+bool
+pds_vfio_dirty_is_enabled(struct pds_vfio_pci_device *pds_vfio)
+{
+	return pds_vfio->dirty.is_enabled;
+}
+
+void
+pds_vfio_dirty_set_enabled(struct pds_vfio_pci_device *pds_vfio)
+{
+	pds_vfio->dirty.is_enabled = true;
+}
+
+void
+pds_vfio_dirty_set_disabled(struct pds_vfio_pci_device *pds_vfio)
+{
+	pds_vfio->dirty.is_enabled = false;
+}
+
+static void
+pds_vfio_print_guest_region_info(struct pds_vfio_pci_device *pds_vfio,
+				 u8 max_regions)
+{
+	int len = max_regions * sizeof(struct pds_lm_dirty_region_info);
+	struct pds_lm_dirty_region_info *region_info;
+	struct pci_dev *pdev = pds_vfio->pdev;
+	dma_addr_t regions_dma;
+	u8 num_regions;
+	int err;
+
+	region_info = kcalloc(max_regions,
+			      sizeof(struct pds_lm_dirty_region_info),
+			      GFP_KERNEL);
+	if (!region_info)
+		return;
+
+	regions_dma = dma_map_single(pds_vfio->coredev, region_info, len,
+				     DMA_FROM_DEVICE);
+	if (dma_mapping_error(pds_vfio->coredev, regions_dma)) {
+		kfree(region_info);
+		return;
+	}
+
+	err = pds_vfio_dirty_status_cmd(pds_vfio, regions_dma,
+					&max_regions, &num_regions);
+	dma_unmap_single(pds_vfio->coredev, regions_dma, len, DMA_FROM_DEVICE);
+
+	if (!err) {
+		int i;
+
+		for (i = 0; i < num_regions; i++)
+			dev_dbg(&pdev->dev, "region_info[%d]: dma_base 0x%llx page_count %u page_size_log2 %u\n",
+				i, le64_to_cpu(region_info[i].dma_base),
+				le32_to_cpu(region_info[i].page_count),
+				region_info[i].page_size_log2);
+	}
+
+	kfree(region_info);
+}
+
+static int
+pds_vfio_dirty_alloc_bitmaps(struct pds_vfio_dirty *dirty,
+			     u32 nbits)
+{
+	unsigned long *host_seq_bmp, *host_ack_bmp;
+
+	host_seq_bmp = bitmap_zalloc(nbits, GFP_KERNEL);
+	if (!host_seq_bmp)
+		return -ENOMEM;
+
+	host_ack_bmp = bitmap_zalloc(nbits, GFP_KERNEL);
+	if (!host_ack_bmp) {
+		bitmap_free(host_seq_bmp);
+		return -ENOMEM;
+	}
+
+	dirty->host_seq.bmp = host_seq_bmp;
+	dirty->host_ack.bmp = host_ack_bmp;
+
+	return 0;
+}
+
+static void
+pds_vfio_dirty_free_bitmaps(struct pds_vfio_dirty *dirty)
+{
+	if (dirty->host_seq.bmp)
+		bitmap_free(dirty->host_seq.bmp);
+	if (dirty->host_ack.bmp)
+		bitmap_free(dirty->host_ack.bmp);
+
+	dirty->host_seq.bmp = NULL;
+	dirty->host_ack.bmp = NULL;
+}
+
+static void
+__pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio,
+			  struct pds_vfio_bmp_info *bmp_info)
+{
+	dma_free_coherent(pds_vfio->coredev,
+			  bmp_info->num_sge * sizeof(*bmp_info->sgl),
+			  bmp_info->sgl, bmp_info->sgl_addr);
+
+	bmp_info->num_sge = 0;
+	bmp_info->sgl = NULL;
+	bmp_info->sgl_addr = 0;
+}
+
+static void
+pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio)
+{
+	if (pds_vfio->dirty.host_seq.sgl)
+		__pds_vfio_dirty_free_sgl(pds_vfio,
+					  &pds_vfio->dirty.host_seq);
+	if (pds_vfio->dirty.host_ack.sgl)
+		__pds_vfio_dirty_free_sgl(pds_vfio,
+					  &pds_vfio->dirty.host_ack);
+}
+
+static int
+__pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
+			   struct pds_vfio_bmp_info *bmp_info,
+			   u32 page_count)
+{
+	struct pds_lm_sg_elem *sgl;
+	dma_addr_t sgl_addr;
+	u32 max_sge;
+
+	max_sge = DIV_ROUND_UP(page_count, PAGE_SIZE * 8);
+
+	sgl = dma_alloc_coherent(pds_vfio->coredev,
+				 max_sge * sizeof(*sgl), &sgl_addr,
+				 GFP_KERNEL);
+	if (!sgl)
+		return -ENOMEM;
+
+	bmp_info->sgl = sgl;
+	bmp_info->num_sge = max_sge;
+	bmp_info->sgl_addr = sgl_addr;
+
+	return 0;
+}
+
+static int
+pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
+			 u32 page_count)
+{
+	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+	int err;
+
+	err = __pds_vfio_dirty_alloc_sgl(pds_vfio,
+					 &dirty->host_seq,
+					 page_count);
+	if (err)
+		return err;
+
+	err = __pds_vfio_dirty_alloc_sgl(pds_vfio,
+					 &dirty->host_ack,
+					 page_count);
+	if (err) {
+		__pds_vfio_dirty_free_sgl(pds_vfio, &dirty->host_seq);
+		return err;
+	}
+
+	return 0;
+}
+
+static int
+pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,
+		      struct rb_root_cached *ranges, u32 nnodes,
+		      u64 *page_size)
+{
+	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+	u64 region_start, region_size, region_page_size;
+	struct pds_lm_dirty_region_info *region_info;
+	struct interval_tree_node *node = NULL;
+	struct pci_dev *pdev = pds_vfio->pdev;
+	u8 max_regions = 0, num_regions;
+	dma_addr_t regions_dma = 0;
+	u32 num_ranges = nnodes;
+	u32 page_count;
+	u16 len;
+	int err;
+
+	dev_dbg(&pdev->dev, "vf%u: Start dirty page tracking\n", pds_vfio->vf_id);
+
+	if (pds_vfio_dirty_is_enabled(pds_vfio))
+		return -EINVAL;
+
+	pds_vfio_dirty_set_enabled(pds_vfio);
+
+	/* find if dirty tracking is disabled, i.e. num_regions == 0 */
+	err = pds_vfio_dirty_status_cmd(pds_vfio, 0, &max_regions, &num_regions);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to get dirty status, err %pe\n",
+			ERR_PTR(err));
+		goto err_out;
+	} else if (num_regions) {
+		dev_err(&pdev->dev, "Dirty tracking already enabled for %d regions\n",
+			num_regions);
+		err = -EEXIST;
+		goto err_out;
+	} else if (!max_regions) {
+		dev_err(&pdev->dev, "Device doesn't support dirty tracking, max_regions %d\n",
+			max_regions);
+		err = -EOPNOTSUPP;
+		goto err_out;
+	}
+
+	/* Only support 1 region for now. If there are any large gaps in the
+	 * VM's address regions, then this would be a waste of memory as we are
+	 * generating 2 bitmaps (ack/seq) from the min address to the max
+	 * address of the VM's address regions. In the future, if we support
+	 * more than one region in the device/driver we can split the bitmaps
+	 * on the largest address region gaps. We can do this split up to the
+	 * max_regions times returned from the dirty_status command.
+	 */
+	max_regions = 1;
+	if (num_ranges > max_regions) {
+		vfio_combine_iova_ranges(ranges, nnodes, max_regions);
+		num_ranges = max_regions;
+	}
+
+	node = interval_tree_iter_first(ranges, 0, ULONG_MAX);
+	if (!node) {
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	region_size = node->last - node->start + 1;
+	region_start = node->start;
+	region_page_size = *page_size;
+
+	len = sizeof(*region_info);
+	region_info = kzalloc(len, GFP_KERNEL);
+	if (!region_info) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	page_count = DIV_ROUND_UP(region_size, region_page_size);
+
+	region_info->dma_base = cpu_to_le64(region_start);
+	region_info->page_count = cpu_to_le32(page_count);
+	region_info->page_size_log2 = ilog2(region_page_size);
+
+	regions_dma = dma_map_single(pds_vfio->coredev, (void *)region_info, len,
+				     DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(pds_vfio->coredev, regions_dma)) {
+		err = -ENOMEM;
+		kfree(region_info);
+		goto err_out;
+	}
+
+	err = pds_vfio_dirty_enable_cmd(pds_vfio, regions_dma, max_regions);
+	dma_unmap_single(pds_vfio->coredev, regions_dma, len, DMA_BIDIRECTIONAL);
+	/* page_count might be adjusted by the device,
+	 * update it before freeing region_info DMA
+	 */
+	page_count = le32_to_cpu(region_info->page_count);
+
+	dev_dbg(&pdev->dev, "region_info: regions_dma 0x%llx dma_base 0x%llx page_count %u page_size_log2 %u\n",
+		regions_dma, region_start, page_count, (u8)ilog2(region_page_size));
+
+	kfree(region_info);
+	if (err)
+		goto err_out;
+
+	err = pds_vfio_dirty_alloc_bitmaps(dirty, page_count);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to alloc dirty bitmaps: %pe\n",
+			ERR_PTR(err));
+		goto err_out;
+	}
+
+	err = pds_vfio_dirty_alloc_sgl(pds_vfio, page_count);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to alloc dirty sg lists: %pe\n",
+			ERR_PTR(err));
+		goto err_free_bitmaps;
+	}
+
+	dirty->region_start = region_start;
+	dirty->region_size = region_size;
+	dirty->region_page_size = region_page_size;
+
+	pds_vfio_print_guest_region_info(pds_vfio, max_regions);
+
+	return 0;
+
+err_free_bitmaps:
+	pds_vfio_dirty_free_bitmaps(dirty);
+err_out:
+	pds_vfio_dirty_set_disabled(pds_vfio);
+	return err;
+}
+
+static int
+pds_vfio_dirty_disable(struct pds_vfio_pci_device *pds_vfio)
+{
+	int err;
+
+	if (!pds_vfio_dirty_is_enabled(pds_vfio))
+		return 0;
+
+	pds_vfio_dirty_set_disabled(pds_vfio);
+	err = pds_vfio_dirty_disable_cmd(pds_vfio);
+	pds_vfio_dirty_free_sgl(pds_vfio);
+	pds_vfio_dirty_free_bitmaps(&pds_vfio->dirty);
+
+	return err;
+}
+
+static int
+pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio,
+		       struct pds_vfio_bmp_info *bmp_info,
+		       u32 offset, u32 bmp_bytes,
+		       bool read_seq)
+{
+	const char *bmp_type_str = read_seq ? "read_seq" : "write_ack";
+	struct pci_dev *pdev = pds_vfio->pdev;
+	int bytes_remaining;
+	dma_addr_t bmp_dma;
+	u8 dma_direction;
+	u16 num_sge = 0;
+	int err, i;
+	u64 *bmp;
+
+	bmp = (u64 *)((u64)bmp_info->bmp + offset);
+
+	dma_direction = read_seq ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+	bmp_dma = dma_map_single(pds_vfio->coredev, bmp, bmp_bytes,
+				 dma_direction);
+	if (dma_mapping_error(pds_vfio->coredev, bmp_dma))
+		return -EINVAL;
+
+	bytes_remaining = bmp_bytes;
+
+	for (i = 0; i < bmp_info->num_sge && bytes_remaining > 0; i++) {
+		struct pds_lm_sg_elem *sg_elem = &bmp_info->sgl[i];
+		u32 len = (bytes_remaining > PAGE_SIZE) ?
+			PAGE_SIZE : bytes_remaining;
+
+		sg_elem->addr = cpu_to_le64(bmp_dma + i * PAGE_SIZE);
+		sg_elem->len = cpu_to_le32(len);
+
+		bytes_remaining -= len;
+		++num_sge;
+	}
+
+	err = pds_vfio_dirty_seq_ack_cmd(pds_vfio, bmp_info->sgl_addr,
+					 num_sge, offset, bmp_bytes, read_seq);
+	if (err)
+		dev_err(&pdev->dev, "Dirty bitmap %s failed offset %u bmp_bytes %u num_sge %u DMA 0x%llx: %pe\n",
+			bmp_type_str, offset, bmp_bytes, num_sge, bmp_info->sgl_addr, ERR_PTR(err));
+
+	dma_unmap_single(pds_vfio->coredev, bmp_dma, bmp_bytes, dma_direction);
+
+	return err;
+}
+
+static int
+pds_vfio_dirty_write_ack(struct pds_vfio_pci_device *pds_vfio, u32 offset,
+			 u32 len)
+{
+	return pds_vfio_dirty_seq_ack(pds_vfio,
+				      &pds_vfio->dirty.host_ack, offset,
+				      len, WRITE_ACK);
+}
+
+static int
+pds_vfio_dirty_read_seq(struct pds_vfio_pci_device *pds_vfio, u32 offset,
+			u32 len)
+{
+	return pds_vfio_dirty_seq_ack(pds_vfio,
+				      &pds_vfio->dirty.host_seq, offset,
+				      len, READ_SEQ);
+}
+
+static int
+pds_vfio_dirty_process_bitmaps(struct pds_vfio_pci_device *pds_vfio,
+			       struct iova_bitmap *dirty_bitmap, u32 bmp_offset,
+			       u32 len_bytes)
+{
+	u64 page_size = pds_vfio->dirty.region_page_size;
+	u64 region_start = pds_vfio->dirty.region_start;
+	u32 bmp_offset_bit;
+	int dword_count, i;
+	__le64 *seq, *ack;
+
+	dword_count = len_bytes / sizeof(u64);
+	seq = (__le64 *)((u64)pds_vfio->dirty.host_seq.bmp + bmp_offset);
+	ack = (__le64 *)((u64)pds_vfio->dirty.host_ack.bmp + bmp_offset);
+	bmp_offset_bit = bmp_offset * 8;
+
+	for (i = 0; i < dword_count; i++) {
+		u64 xor = le64_to_cpu(seq[i]) ^ le64_to_cpu(ack[i]);
+		u8 bit_i;
+
+		/* prepare for next write_ack call */
+		ack[i] = seq[i];
+
+#define BITS_PER_U64	(sizeof(u64) * BITS_PER_BYTE)
+		for (bit_i = 0; bit_i < BITS_PER_U64; ++bit_i) {
+			if (xor & BIT(bit_i)) {
+				u64 abs_bit_i = bmp_offset_bit + i *
+					BITS_PER_U64 + bit_i;
+				u64 addr = abs_bit_i * page_size + region_start;
+
+				iova_bitmap_set(dirty_bitmap, addr, page_size);
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int
+pds_vfio_dirty_sync(struct pds_vfio_pci_device *pds_vfio,
+		    struct iova_bitmap *dirty_bitmap,
+		    unsigned long iova, unsigned long length)
+{
+	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+	struct pci_dev *pdev = pds_vfio->pdev;
+	u64 bmp_offset, bmp_bytes;
+	u64 bitmap_size, pages;
+	int err;
+
+	dev_dbg(&pdev->dev, "vf%u: Get dirty page bitmap\n", pds_vfio->vf_id);
+
+	if (!pds_vfio_dirty_is_enabled(pds_vfio)) {
+		dev_err(&pdev->dev, "vf%u: Sync failed, dirty tracking is disabled\n",
+			pds_vfio->vf_id);
+		return -EINVAL;
+	}
+
+	pages = DIV_ROUND_UP(length, pds_vfio->dirty.region_page_size);
+	bitmap_size = round_up(pages, sizeof(u64) * BITS_PER_BYTE) /
+		BITS_PER_BYTE;
+
+	dev_dbg(&pdev->dev, "vf%u: iova 0x%lx length %lu page_size %llu pages %llu bitmap_size %llu\n",
+		pds_vfio->vf_id, iova, length,
+		pds_vfio->dirty.region_page_size, pages, bitmap_size);
+
+	if (!length ||
+	    ((dirty->region_start + iova + length) >
+	     (dirty->region_start + dirty->region_size))) {
+		dev_err(&pdev->dev, "Invalid iova 0x%lx and/or length 0x%lx to sync\n",
+			iova, length);
+		return -EINVAL;
+	}
+
+	/* bitmap is modified in 64 bit chunks */
+	bmp_bytes = ALIGN(DIV_ROUND_UP(length / dirty->region_page_size,
+				       sizeof(u64)), sizeof(u64));
+	if (bmp_bytes != bitmap_size) {
+		dev_err(&pdev->dev, "Calculated bitmap bytes %llu not equal to bitmap size %llu\n",
+			bmp_bytes, bitmap_size);
+		return -EINVAL;
+	}
+
+	bmp_offset = DIV_ROUND_UP(iova / dirty->region_page_size, sizeof(u64));
+
+	dev_dbg(&pdev->dev, "Syncing dirty bitmap, iova 0x%lx length 0x%lx, bmp_offset %llu bmp_bytes %llu\n",
+		iova, length, bmp_offset, bmp_bytes);
+
+	err = pds_vfio_dirty_read_seq(pds_vfio, bmp_offset, bmp_bytes);
+	if (err)
+		return err;
+
+	err = pds_vfio_dirty_process_bitmaps(pds_vfio, dirty_bitmap,
+					     bmp_offset, bmp_bytes);
+	if (err)
+		return err;
+
+	err = pds_vfio_dirty_write_ack(pds_vfio, bmp_offset, bmp_bytes);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+int
+pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova,
+			    unsigned long length,
+			    struct iova_bitmap *dirty)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+
+	return pds_vfio_dirty_sync(pds_vfio, dirty, iova, length);
+}
+
+int
+pds_vfio_dma_logging_start(struct vfio_device *vdev,
+			   struct rb_root_cached *ranges, u32 nnodes,
+			   u64 *page_size)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+	int err;
+
+	err = pds_vfio_dirty_enable(pds_vfio, ranges, nnodes, page_size);
+	if (err)
+		return err;
+
+	pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_IN_PROGRESS);
+
+	return 0;
+}
+
+int
+pds_vfio_dma_logging_stop(struct vfio_device *vdev)
+{
+	struct pds_vfio_pci_device *pds_vfio =
+		container_of(vdev, struct pds_vfio_pci_device,
+			     vfio_coredev.vdev);
+
+	return pds_vfio_dirty_disable(pds_vfio);
+}
diff --git a/drivers/vfio/pci/pds/dirty.h b/drivers/vfio/pci/pds/dirty.h
new file mode 100644
index 000000000000..934b8600dfc1
--- /dev/null
+++ b/drivers/vfio/pci/pds/dirty.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _DIRTY_H_
+#define _DIRTY_H_
+
+#include <linux/types.h>
+#include <linux/iova_bitmap.h>
+
+#include <linux/pds/pds_lm.h>
+
+struct pds_vfio_bmp_info {
+	unsigned long *bmp;
+	u32 bmp_bytes;
+	struct pds_lm_sg_elem *sgl;
+	dma_addr_t sgl_addr;
+	u16 num_sge;
+};
+
+struct pds_vfio_dirty {
+	struct pds_vfio_bmp_info host_seq;
+	struct pds_vfio_bmp_info host_ack;
+	u64 region_size;
+	u64 region_start;
+	u64 region_page_size;
+	bool is_enabled;
+};
+
+struct pds_vfio_pci_device;
+
+bool
+pds_vfio_dirty_is_enabled(struct pds_vfio_pci_device *pds_vfio);
+void
+pds_vfio_dirty_set_enabled(struct pds_vfio_pci_device *pds_vfio);
+void
+pds_vfio_dirty_set_disabled(struct pds_vfio_pci_device *pds_vfio);
+
+int
+pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova,
+			    unsigned long length,
+			    struct iova_bitmap *dirty);
+int
+pds_vfio_dma_logging_start(struct vfio_device *vdev,
+			   struct rb_root_cached *ranges, u32 nnodes,
+			   u64 *page_size);
+int
+pds_vfio_dma_logging_stop(struct vfio_device *vdev);
+#endif /* _DIRTY_H_ */
diff --git a/drivers/vfio/pci/pds/lm.h b/drivers/vfio/pci/pds/lm.h
index 3dd97b807db6..882342c571fc 100644
--- a/drivers/vfio/pci/pds/lm.h
+++ b/drivers/vfio/pci/pds/lm.h
@@ -33,6 +33,16 @@ struct pds_vfio_pci_device;
 struct file *
 pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio,
 				  enum vfio_device_mig_state next);
+int
+pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova,
+			    unsigned long length,
+			    struct iova_bitmap *dirty);
+int
+pds_vfio_dma_logging_start(struct vfio_device *vdev,
+			   struct rb_root_cached *ranges, u32 nnodes,
+			   u64 *page_size);
+int
+pds_vfio_dma_logging_stop(struct vfio_device *vdev);
 const char *
 pds_vfio_lm_state(enum vfio_device_mig_state state);
 void
diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c
index af8ce96033eb..1f09e3be408c 100644
--- a/drivers/vfio/pci/pds/vfio_dev.c
+++ b/drivers/vfio/pci/pds/vfio_dev.c
@@ -5,6 +5,7 @@
 #include <linux/vfio_pci_core.h>
 
 #include "lm.h"
+#include "dirty.h"
 #include "vfio_dev.h"
 #include "aux_drv.h"
 
@@ -108,6 +109,13 @@ pds_vfio_lm_ops = {
 	.migration_get_state = pds_vfio_get_device_state
 };
 
+static const struct vfio_log_ops
+pds_vfio_log_ops = {
+	.log_start = pds_vfio_dma_logging_start,
+	.log_stop = pds_vfio_dma_logging_stop,
+	.log_read_and_clear = pds_vfio_dma_logging_report,
+};
+
 static int
 pds_vfio_init_device(struct vfio_device *vdev)
 {
@@ -134,6 +142,7 @@ pds_vfio_init_device(struct vfio_device *vdev)
 
 	vdev->migration_flags = VFIO_MIGRATION_STOP_COPY;
 	vdev->mig_ops = &pds_vfio_lm_ops;
+	vdev->log_ops = &pds_vfio_log_ops;
 
 	dev_dbg(&pdev->dev, "%s: PF %#04x VF %#04x (%d) vf_id %d domain %d vfio_aux %p pds_vfio %p\n",
 		__func__, pci_dev_id(pdev->physfn),
diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h
index a09570eec6fa..42bfea448c10 100644
--- a/drivers/vfio/pci/pds/vfio_dev.h
+++ b/drivers/vfio/pci/pds/vfio_dev.h
@@ -7,6 +7,7 @@
 #include <linux/pci.h>
 #include <linux/vfio_pci_core.h>
 
+#include "dirty.h"
 #include "lm.h"
 
 struct pds_vfio_pci_device {
@@ -17,6 +18,7 @@ struct pds_vfio_pci_device {
 
 	struct pds_vfio_lm_file *save_file;
 	struct pds_vfio_lm_file *restore_file;
+	struct pds_vfio_dirty dirty;
 	struct mutex state_mutex; /* protect migration state */
 	enum vfio_device_mig_state state;
 	spinlock_t reset_lock; /* protect reset_done flow */
diff --git a/include/linux/pds/pds_lm.h b/include/linux/pds/pds_lm.h
index 28ebd62f7583..c7e83932f2fb 100644
--- a/include/linux/pds/pds_lm.h
+++ b/include/linux/pds/pds_lm.h
@@ -25,6 +25,13 @@ enum pds_lm_cmd_opcode {
 	PDS_LM_CMD_RESUME          = 20,
 	PDS_LM_CMD_SAVE            = 21,
 	PDS_LM_CMD_RESTORE         = 22,
+
+	/* Dirty page tracking commands */
+	PDS_LM_CMD_DIRTY_STATUS    = 32,
+	PDS_LM_CMD_DIRTY_ENABLE    = 33,
+	PDS_LM_CMD_DIRTY_DISABLE   = 34,
+	PDS_LM_CMD_DIRTY_READ_SEQ  = 35,
+	PDS_LM_CMD_DIRTY_WRITE_ACK = 36,
 };
 
 /**
@@ -215,4 +222,170 @@ struct pds_lm_host_vf_status_cmd {
 	u8     status;
 };
 
+/**
+ * struct pds_lm_dirty_region_info - Memory region info for STATUS and ENABLE
+ * @dma_base:		Base address of the DMA-contiguous memory region
+ * @page_count:		Number of pages in the memory region
+ * @page_size_log2:	Log2 page size in the memory region
+ * @rsvd:		Word boundary padding
+ */
+struct pds_lm_dirty_region_info {
+	__le64 dma_base;
+	__le32 page_count;
+	u8     page_size_log2;
+	u8     rsvd[3];
+};
+
+/**
+ * struct pds_lm_dirty_status_cmd - DIRTY_STATUS command
+ * @opcode:		Opcode PDS_LM_CMD_DIRTY_STATUS
+ * @rsvd:		Word boundary padding
+ * @vf_id:		VF id
+ * @max_regions:	Capacity of the region info buffer
+ * @rsvd2:		Word boundary padding
+ * @regions_dma:	DMA address of the region info buffer
+ *
+ * The minimum of max_regions (from the command) and num_regions (from the
+ * completion) of struct pds_lm_dirty_region_info will be written to
+ * regions_dma.
+ *
+ * The max_regions may be zero, in which case regions_dma is ignored.  In that
+ * case, the completion will only report the maximum number of regions
+ * supported by the device, and the number of regions currently enabled.
+ */
+struct pds_lm_dirty_status_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	u8     max_regions;
+	u8     rsvd2[3];
+	__le64 regions_dma;
+} __packed;
+
+/**
+ * enum pds_lm_dirty_bmp_type - Type of dirty page bitmap
+ * @PDS_LM_DIRTY_BMP_TYPE_NONE: No bitmap / disabled
+ * @PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK: Seq/Ack bitmap representation
+ */
+enum pds_lm_dirty_bmp_type {
+	PDS_LM_DIRTY_BMP_TYPE_NONE     = 0,
+	PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK  = 1,
+};
+
+/**
+ * struct pds_lm_dirty_status_comp - STATUS command completion
+ * @status:		Status of the command (enum pds_core_status_code)
+ * @rsvd:		Word boundary padding
+ * @comp_index:		Index in the desc ring for which this is the completion
+ * @max_regions:	Maximum number of regions supported by the device
+ * @num_regions:	Number of regions currently enabled
+ * @bmp_type:		Type of dirty bitmap representation
+ * @rsvd2:		Word boundary padding
+ * @bmp_type_mask:	Mask of supported bitmap types, bit index per type
+ * @rsvd3:		Word boundary padding
+ * @color:		Color bit
+ *
+ * This completion descriptor is used for STATUS, ENABLE, and DISABLE.
+ */
+struct pds_lm_dirty_status_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	u8     max_regions;
+	u8     num_regions;
+	u8     bmp_type;
+	u8     rsvd2;
+	__le32 bmp_type_mask;
+	u8     rsvd3[3];
+	u8     color;
+};
+
+/**
+ * struct pds_lm_dirty_enable_cmd - DIRTY_ENABLE command
+ * @opcode:		Opcode PDS_LM_CMD_DIRTY_ENABLE
+ * @rsvd:		Word boundary padding
+ * @vf_id:		VF id
+ * @bmp_type:		Type of dirty bitmap representation
+ * @num_regions:	Number of entries in the region info buffer
+ * @rsvd2:		Word boundary padding
+ * @regions_dma:	DMA address of the region info buffer
+ *
+ * The num_regions must be nonzero, and less than or equal to the maximum
+ * number of regions supported by the device.
+ *
+ * The memory regions should not overlap.
+ *
+ * The information should be initialized by the driver.  The device may modify
+ * the information on successful completion, such as by size-aligning the
+ * number of pages in a region.
+ *
+ * The modified number of pages will be greater than or equal to the page count
+ * given in the enable command, and at least as coarsely aligned as the given
+ * value.  For example, the count might be aligned to a multiple of 64, but
+ * if the value is already a multiple of 128 or higher, it will not change.
+ * If the driver requires its own minimum alignment of the number of pages, the
+ * driver should account for that already in the region info of this command.
+ *
+ * This command uses struct pds_lm_dirty_status_comp for its completion.
+ */
+struct pds_lm_dirty_enable_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	u8     bmp_type;
+	u8     num_regions;
+	u8     rsvd2[2];
+	__le64 regions_dma;
+} __packed;
+
+/**
+ * struct pds_lm_dirty_disable_cmd - DIRTY_DISABLE command
+ * @opcode:	Opcode PDS_LM_CMD_DIRTY_DISABLE
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ *
+ * Dirty page tracking will be disabled.  This may be called in any state, as
+ * long as dirty page tracking is supported by the device, to ensure that dirty
+ * page tracking is disabled.
+ *
+ * This command uses struct pds_lm_dirty_status_comp for its completion.  On
+ * success, num_regions will be zero.
+ */
+struct pds_lm_dirty_disable_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+};
+
+/**
+ * struct pds_lm_dirty_seq_ack_cmd - DIRTY_READ_SEQ or _WRITE_ACK command
+ * @opcode:	Opcode PDS_LM_CMD_DIRTY_[READ_SEQ|WRITE_ACK]
+ * @rsvd:	Word boundary padding
+ * @vf_id:	VF id
+ * @off_bytes:	Byte offset in the bitmap
+ * @len_bytes:	Number of bytes to transfer
+ * @num_sge:	Number of DMA scatter gather elements
+ * @rsvd2:	Word boundary padding
+ * @sgl_addr:	DMA address of scatter gather list
+ *
+ * Read bytes from the SEQ bitmap, or write bytes into the ACK bitmap.
+ *
+ * This command treats the entire bitmap as a byte buffer.  It does not
+ * distinguish between guest memory regions.  The driver should refer to the
+ * number of pages in each region, according to PDS_LM_CMD_DIRTY_STATUS, to
+ * determine the region boundaries in the bitmap.  Each region will be
+ * represented by exactly as many bits as the page count for that region,
+ * immediately following the last bit of the previous region.
+ */
+struct pds_lm_dirty_seq_ack_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	__le32 off_bytes;
+	__le32 len_bytes;
+	__le16 num_sge;
+	u8     rsvd2[2];
+	__le64 sgl_addr;
+} __packed;
+
 #endif /* _PDS_LM_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH vfio 6/7] vfio/pds: Add support for firmware recovery
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
                   ` (4 preceding siblings ...)
  2022-12-07  1:07 ` [RFC PATCH vfio 5/7] vfio/pds: Add support for dirty page tracking Brett Creeley
@ 2022-12-07  1:07 ` Brett Creeley
  2022-12-07  1:07 ` [RFC PATCH vfio 7/7] vfio/pds: Add Kconfig and documentation Brett Creeley
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:07 UTC (permalink / raw)
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

The device firmware may crash because of a configuration and/or other
issue and then recover. If a live migration is in progress when the
firmware crashes, the migration will fail. However, the VF PCI device
should still be functional after crash recovery, and subsequent
migrations should go through as expected.

When the pds_core device notices that the firmware has crashed, it
sends an event to all of its client drivers over the auxiliary bus.
When the pds_vfio driver receives this event while a migration is in
progress, it requests a deferred reset that takes effect on the next
migration state transition. That state transition will report failure,
as will any subsequent state transition requests from the VMM/VFIO.
Based on uapi/vfio.h, the only way out of VFIO_DEVICE_STATE_ERROR is by
issuing VFIO_DEVICE_RESET. Once this reset is done, the migration state
will be reset to VFIO_DEVICE_STATE_RUNNING and migration can be
performed again.

If the event is received while no migration is in progress (i.e.
the VM is in normal operating mode), then no actions are taken
and the migration state remains VFIO_DEVICE_STATE_RUNNING.
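
As an illustration only, the flow described above can be modelled in
user space roughly as below. This is a simplified sketch and not the
driver code: the mig_state enum, struct model_dev and the model_*()
helpers are hypothetical stand-ins, locking is omitted, and a state
change is a single assignment instead of a walk through
vfio_mig_get_next_state().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VFIO_DEVICE_STATE_* values. */
enum mig_state { MIG_ERROR, MIG_STOP, MIG_RUNNING, MIG_STOP_COPY, MIG_RESUMING };

/* Hypothetical, reduced view of the per-VF driver state. */
struct model_dev {
	enum mig_state state;
	bool dirty_tracking;
	bool deferred_reset;
	enum mig_state deferred_reset_state;
};

/* Apply a pending deferred reset (done when the state lock is dropped). */
static void model_apply_deferred(struct model_dev *d)
{
	if (!d->deferred_reset)
		return;
	d->deferred_reset = false;
	if (d->state == MIG_ERROR)
		d->state = d->deferred_reset_state;
	else if (d->deferred_reset_state == MIG_ERROR)
		d->state = MIG_ERROR;
	d->deferred_reset_state = MIG_RUNNING;
}

/* Firmware recovery event: only request a deferred reset to ERROR when a
 * migration is in progress or dirty page tracking is enabled.
 */
static void model_recovery(struct model_dev *d)
{
	bool migrating = d->state != MIG_RUNNING && d->state != MIG_ERROR;
	bool tracking = d->state == MIG_RUNNING && d->dirty_tracking;

	if (migrating || tracking) {
		d->deferred_reset = true;
		d->deferred_reset_state = MIG_ERROR;
	}
}

/* User initiated state transition; fails with -EIO once in ERROR. */
static int model_set_state(struct model_dev *d, enum mig_state new_state)
{
	/* ERROR is never a valid target for a user request. */
	assert(new_state != MIG_ERROR);

	if (d->state != MIG_ERROR && d->state != new_state)
		d->state = new_state;
	model_apply_deferred(d);
	return d->state == MIG_ERROR ? -EIO : 0;
}

/* VFIO_DEVICE_RESET: the only way back from ERROR to RUNNING. */
static void model_reset(struct model_dev *d)
{
	d->deferred_reset = true;
	d->deferred_reset_state = MIG_RUNNING;
	model_apply_deferred(d);
}
```

In this model, a recovery event received in MIG_STOP makes the next
model_set_state() call return -EIO and leaves the state at MIG_ERROR
until model_reset() is called, while a recovery event received in
MIG_RUNNING without dirty tracking changes nothing.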

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 drivers/vfio/pci/pds/aux_drv.c  | 61 +++++++++++++++++++++++++++++++++
 drivers/vfio/pci/pds/aux_drv.h  |  1 +
 drivers/vfio/pci/pds/vfio_dev.c | 34 ++++++++++++++++--
 drivers/vfio/pci/pds/vfio_dev.h |  4 +++
 4 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/pds/aux_drv.c b/drivers/vfio/pci/pds/aux_drv.c
index b4da741d7956..e20d8448a978 100644
--- a/drivers/vfio/pci/pds/aux_drv.c
+++ b/drivers/vfio/pci/pds/aux_drv.c
@@ -21,6 +21,46 @@ struct auxiliary_device_id pds_vfio_aux_id_table[] = {
 	{},
 };
 
+static void
+pds_vfio_recovery_work(struct work_struct *work)
+{
+	struct pds_vfio_aux *vfio_aux =
+		container_of(work, struct pds_vfio_aux, work);
+	struct pds_vfio_pci_device *pds_vfio = vfio_aux->pds_vfio;
+	bool deferred_reset_needed = false;
+
+	/* Documentation states that the kernel migration driver must not
+	 * generate asynchronous device state transitions outside of
+	 * manipulation by the user or the VFIO_DEVICE_RESET ioctl.
+	 *
+	 * Since recovery is an asynchronous event received from the device,
+	 * initiate a deferred reset. Only issue the deferred reset if a
+	 * migration is in progress, which will cause the next step of the
+	 * migration to fail. Also issue it if the device is in a state that
+	 * will be set to VFIO_DEVICE_STATE_RUNNING on the next action (i.e. VM
+	 * is shut down and device is in VFIO_DEVICE_STATE_STOP), as that will
+	 * clear the VFIO_DEVICE_STATE_ERROR when the VM starts back up.
+	 */
+	mutex_lock(&pds_vfio->state_mutex);
+	if ((pds_vfio->state != VFIO_DEVICE_STATE_RUNNING &&
+	     pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
+	    (pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
+	     pds_vfio_dirty_is_enabled(pds_vfio)))
+		deferred_reset_needed = true;
+	mutex_unlock(&pds_vfio->state_mutex);
+
+	/* On the next user initiated state transition, the device will
+	 * transition to the VFIO_DEVICE_STATE_ERROR. At this point it's the user's
+	 * responsibility to reset the device.
+	 *
+	 * If a VFIO_DEVICE_RESET is requested post recovery and before the next
+	 * state transition, then the deferred reset state will be set to
+	 * VFIO_DEVICE_STATE_RUNNING.
+	 */
+	if (deferred_reset_needed)
+		pds_vfio_deferred_reset(pds_vfio, VFIO_DEVICE_STATE_ERROR);
+}
+
 static void
 pds_vfio_aux_notify_handler(struct pds_auxiliary_dev *padev,
 			    union pds_core_notifyq_comp *event)
@@ -29,6 +69,23 @@ pds_vfio_aux_notify_handler(struct pds_auxiliary_dev *padev,
 	u16 ecode = le16_to_cpu(event->ecode);
 
 	dev_dbg(dev, "%s: event code %d\n", __func__, ecode);
+
+	/* We don't need to do anything for RESET state==0 as there is no notify
+	 * or feedback mechanism available, and it is possible that we won't
+	 * even see a state==0 event.
+	 *
+	 * Any requests from VFIO while state==0 will fail, which will return
+	 * error and may cause migration to fail.
+	 */
+	if (ecode == PDS_EVENT_RESET) {
+		dev_info(dev, "%s: PDS_EVENT_RESET event received, state==%d\n",
+			 __func__, event->reset.state);
+		if (event->reset.state == 1) {
+			struct pds_vfio_aux *vfio_aux = auxiliary_get_drvdata(&padev->aux_dev);
+
+			schedule_work(&vfio_aux->work);
+		}
+	}
 }
 
 static int
@@ -87,6 +144,8 @@ pds_vfio_aux_probe(struct auxiliary_device *aux_dev,
 		goto err_register_client;
 	}
 
+	INIT_WORK(&vfio_aux->work, pds_vfio_recovery_work);
+
 	return 0;
 
 err_register_client:
@@ -104,6 +163,8 @@ pds_vfio_aux_remove(struct auxiliary_device *aux_dev)
 	struct pds_vfio_aux *vfio_aux = auxiliary_get_drvdata(aux_dev);
 	struct pds_vfio_pci_device *pds_vfio = vfio_aux->pds_vfio;
 
+	cancel_work_sync(&vfio_aux->work);
+
 	if (pds_vfio) {
 		pds_vfio_unregister_client_cmd(pds_vfio);
 		vfio_aux->pds_vfio->vfio_aux = NULL;
diff --git a/drivers/vfio/pci/pds/aux_drv.h b/drivers/vfio/pci/pds/aux_drv.h
index 0f05a968bb00..422a42b3ce14 100644
--- a/drivers/vfio/pci/pds/aux_drv.h
+++ b/drivers/vfio/pci/pds/aux_drv.h
@@ -17,6 +17,7 @@ struct pds_vfio_aux {
 	struct pds_auxiliary_dev *padev;
 	struct pds_auxiliary_drv padrv;
 	struct pds_vfio_pci_device *pds_vfio;
+	struct work_struct work;
 };
 
 struct auxiliary_driver *
diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c
index 1f09e3be408c..117757be9113 100644
--- a/drivers/vfio/pci/pds/vfio_dev.c
+++ b/drivers/vfio/pci/pds/vfio_dev.c
@@ -26,10 +26,17 @@ pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio)
 	if (pds_vfio->deferred_reset) {
 		pds_vfio->deferred_reset = false;
 		if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) {
-			pds_vfio->state = VFIO_DEVICE_STATE_RUNNING;
+			dev_dbg(&pds_vfio->pdev->dev, "Transitioning from VFIO_DEVICE_STATE_ERROR to %s\n",
+				pds_vfio_lm_state(pds_vfio->deferred_reset_state));
+			pds_vfio->state = pds_vfio->deferred_reset_state;
 			pds_vfio_put_restore_file(pds_vfio);
 			pds_vfio_put_save_file(pds_vfio);
+		} else if (pds_vfio->deferred_reset_state == VFIO_DEVICE_STATE_ERROR) {
+			dev_dbg(&pds_vfio->pdev->dev, "Transitioning from %s to VFIO_DEVICE_STATE_ERROR based on deferred_reset request\n",
+				pds_vfio_lm_state(pds_vfio->state));
+			pds_vfio->state = VFIO_DEVICE_STATE_ERROR;
 		}
+		pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
 		spin_unlock(&pds_vfio->reset_lock);
 		goto again;
 	}
@@ -42,6 +49,7 @@ pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio)
 {
 	spin_lock(&pds_vfio->reset_lock);
 	pds_vfio->deferred_reset = true;
+	pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
 	if (!mutex_trylock(&pds_vfio->state_mutex)) {
 		spin_unlock(&pds_vfio->reset_lock);
 		return;
@@ -50,6 +58,18 @@ pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio)
 	pds_vfio_state_mutex_unlock(pds_vfio);
 }
 
+void
+pds_vfio_deferred_reset(struct pds_vfio_pci_device *pds_vfio,
+			enum vfio_device_mig_state reset_state)
+{
+	dev_info(&pds_vfio->pdev->dev, "Requesting deferred_reset to state %s\n",
+		 pds_vfio_lm_state(reset_state));
+	spin_lock(&pds_vfio->reset_lock);
+	pds_vfio->deferred_reset = true;
+	pds_vfio->deferred_reset_state = reset_state;
+	spin_unlock(&pds_vfio->reset_lock);
+}
+
 static struct file *
 pds_vfio_set_device_state(struct vfio_device *vdev,
 			  enum vfio_device_mig_state new_state)
@@ -63,7 +83,13 @@ pds_vfio_set_device_state(struct vfio_device *vdev,
 		return ERR_PTR(-ENODEV);
 
 	mutex_lock(&pds_vfio->state_mutex);
-	while (new_state != pds_vfio->state) {
+	/* only way to transition out of VFIO_DEVICE_STATE_ERROR is via
+	 * VFIO_DEVICE_RESET, so prevent the state machine from running since
+	 * vfio_mig_get_next_state() will throw a WARN_ON() when transitioning
+	 * from VFIO_DEVICE_STATE_ERROR to any other state
+	 */
+	while (pds_vfio->state != VFIO_DEVICE_STATE_ERROR &&
+	       new_state != pds_vfio->state) {
 		enum vfio_device_mig_state next_state;
 
 		int err = vfio_mig_get_next_state(vdev, pds_vfio->state,
@@ -85,6 +111,9 @@ pds_vfio_set_device_state(struct vfio_device *vdev,
 		}
 	}
 	pds_vfio_state_mutex_unlock(pds_vfio);
+	/* still waiting on a deferred_reset */
+	if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR)
+		res = ERR_PTR(-EIO);
 
 	return res;
 }
@@ -168,6 +197,7 @@ pds_vfio_open_device(struct vfio_device *vdev)
 	dev_dbg(&pds_vfio->pdev->dev, "%s: %s => VFIO_DEVICE_STATE_RUNNING\n",
 		__func__, pds_vfio_lm_state(pds_vfio->state));
 	pds_vfio->state = VFIO_DEVICE_STATE_RUNNING;
+	pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
 
 	vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev);
 
diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h
index 42bfea448c10..212bb687cf9b 100644
--- a/drivers/vfio/pci/pds/vfio_dev.h
+++ b/drivers/vfio/pci/pds/vfio_dev.h
@@ -23,6 +23,7 @@ struct pds_vfio_pci_device {
 	enum vfio_device_mig_state state;
 	spinlock_t reset_lock; /* protect reset_done flow */
 	u8 deferred_reset;
+	enum vfio_device_mig_state deferred_reset_state;
 
 	int vf_id;
 	int pci_id;
@@ -34,5 +35,8 @@ struct pds_vfio_pci_device *
 pds_vfio_pci_drvdata(struct pci_dev *pdev);
 void
 pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio);
+void
+pds_vfio_deferred_reset(struct pds_vfio_pci_device *pds_vfio,
+			enum vfio_device_mig_state reset_state);
 
 #endif /* _VFIO_DEV_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH vfio 7/7] vfio/pds: Add Kconfig and documentation
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
                   ` (5 preceding siblings ...)
  2022-12-07  1:07 ` [RFC PATCH vfio 6/7] vfio/pds: Add support for firmware recovery Brett Creeley
@ 2022-12-07  1:07 ` Brett Creeley
  2022-12-07  7:43 ` [RFC PATCH vfio 0/7] pds vfio driver Christoph Hellwig
  2022-12-11 12:54 ` Max Gurtovoy
  8 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07  1:07 UTC (permalink / raw)
  To: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Brett Creeley

Add Kconfig entries and pds_vfio.rst. Also, add an entry in the
MAINTAINERS file for this new driver.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 .../ethernet/pensando/pds_vfio.rst            | 88 +++++++++++++++++++
 MAINTAINERS                                   |  7 ++
 drivers/vfio/pci/Kconfig                      |  2 +
 drivers/vfio/pci/pds/Kconfig                  | 17 ++++
 4 files changed, 114 insertions(+)
 create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst
 create mode 100644 drivers/vfio/pci/pds/Kconfig

diff --git a/Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst b/Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst
new file mode 100644
index 000000000000..adc144a4a7b8
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst
@@ -0,0 +1,88 @@
+.. SPDX-License-Identifier: GPL-2.0+
+.. note: can be edited and viewed with /usr/bin/formiko-vim
+
+==========================================================
+PCI VFIO driver for the Pensando(R) DSC adapter family
+==========================================================
+
+Pensando Linux VFIO PCI Device Driver
+Copyright(c) 2022 Pensando Systems, Inc
+
+Overview
+========
+
+The ``pds_vfio`` driver is both a PCI and auxiliary bus driver. The
+PCI driver supports Live Migration capable NVMe Virtual Function (VF)
+devices and the auxiliary driver is used to communicate with the
+``pds_core`` driver and hardware.
+
+Using the device
+================
+
+The pds_vfio device is enabled via multiple configuration steps and
+depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
+Function devices.
+
+Shown below are the steps to bind the driver to a VF and also to the
+associated auxiliary device created by the ``pds_core`` driver. This
+example assumes the pds_core and pds_vfio modules are already
+loaded.
+
+.. code-block:: bash
+  :name: example-setup-script
+
+  #!/bin/bash
+
+  PF_BUS="0000:60"
+  PF_BDF="0000:60:00.0"
+  VF_BDF="0000:60:00.1"
+
+  # Enable live migration VF auxiliary device(s)
+  devlink dev param set pci/$PF_BDF name enable_migration value true cmode runtime
+
+  # Prevent nvme driver from probing the NVMe VF device
+  echo 0 > /sys/class/pci_bus/$PF_BUS/device/$PF_BDF/sriov_drivers_autoprobe
+
+  # Create single VF for NVMe Live Migration via VFIO
+  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
+
+  # Allow the VF to be bound to the pds_vfio driver
+  echo "pds_vfio" > /sys/class/pci_bus/$PF_BUS/device/$VF_BDF/driver_override
+
+  # Bind the VF to the pds_vfio driver
+  echo "$VF_BDF" > /sys/bus/pci/drivers/pds_vfio/bind
+
+After performing the steps above the pds_vfio driver's PCI probe should
+have been called, the pds_vfio driver's auxiliary probe should have
+been called, and a file in /dev/vfio/<iommu_group> should have been created.
+There will also be an entry in /sys/bus/auxiliary/devices/pds_core.LM.<nn>
+for the VF's auxiliary device and the associated driver registered by the
+pds_vfio module will be at /sys/bus/auxiliary/drivers/pds_vfio.LM.
+
+
+Enabling the driver
+===================
+
+The driver is enabled via the standard kernel configuration system,
+using the make command::
+
+  make oldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+  -> Device Drivers
+    -> VFIO Non-Privileged userspace driver framework
+      -> VFIO support for PDS PCI devices
+
+Support
+=======
+
+For general Linux networking support, please use the netdev mailing
+list, which is monitored by Pensando personnel::
+
+  netdev@vger.kernel.org
+
+For more specific support needs, please use the Pensando driver support
+email::
+
+  drivers@pensando.io
diff --git a/MAINTAINERS b/MAINTAINERS
index 2723cbdf8fd7..202f93dfce34 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21617,6 +21617,13 @@ S:	Maintained
 P:	Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
 F:	drivers/vfio/pci/*/
 
+VFIO PDS PCI DRIVER
+M:	Brett Creeley <brett.creeley@amd.com>
+L:	kvm@vger.kernel.org
+S:	Maintained
+F:	Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst
+F:	drivers/vfio/pci/pds/
+
 VFIO PLATFORM DRIVER
 M:	Eric Auger <eric.auger@redhat.com>
 L:	kvm@vger.kernel.org
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index f9d0c908e738..2c3831dd60ef 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -59,4 +59,6 @@ source "drivers/vfio/pci/mlx5/Kconfig"
 
 source "drivers/vfio/pci/hisilicon/Kconfig"
 
+source "drivers/vfio/pci/pds/Kconfig"
+
 endif
diff --git a/drivers/vfio/pci/pds/Kconfig b/drivers/vfio/pci/pds/Kconfig
new file mode 100644
index 000000000000..d9bc9734c3cf
--- /dev/null
+++ b/drivers/vfio/pci/pds/Kconfig
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+config PDS_VFIO_PCI
+	tristate "VFIO support for PDS PCI devices"
+	depends on PDS_CORE
+	depends on VFIO_PCI_CORE
+	help
+	  This provides generic PCI support for PDS devices using the VFIO
+	  framework.
+
+	  More specific information on this driver can be
+	  found in
+	  <file:Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst>.
+
+	  To compile this driver as a module, choose M here. The module
+	  will be called pds_vfio.
+
+	  If you don't know what to do here, say N.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH vfio 0/7] pds vfio driver
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
                   ` (6 preceding siblings ...)
  2022-12-07  1:07 ` [RFC PATCH vfio 7/7] vfio/pds: Add Kconfig and documentation Brett Creeley
@ 2022-12-07  7:43 ` Christoph Hellwig
  2022-12-11 12:54 ` Max Gurtovoy
  8 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2022-12-07  7:43 UTC (permalink / raw)
  To: Brett Creeley
  Cc: kvm, netdev, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian, shannon.nelson, drivers

On Tue, Dec 06, 2022 at 05:06:58PM -0800, Brett Creeley wrote:
> AMD/Pensando already supports a NVMe VF device (1dd8:1006) in the
> Distributed Services Card (DSC). This patchset adds the new pds_vfio
> driver in order to support NVMe VF live migration.

If you want NVMe live migration, please work with the nvme technical
working group to standardize it.  We will not add support for a
gazillion incompatible and probably broken concepts of this.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support
  2022-12-07  1:07 ` [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support Brett Creeley
@ 2022-12-07 17:09   ` Jason Gunthorpe
  2022-12-07 21:32     ` Brett Creeley
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2022-12-07 17:09 UTC (permalink / raw)
  To: Brett Creeley
  Cc: kvm, netdev, alex.williamson, cohuck, yishaih,
	shameerali.kolothum.thodi, kevin.tian, shannon.nelson, drivers

On Tue, Dec 06, 2022 at 05:07:01PM -0800, Brett Creeley wrote:

> +struct file *
> +pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio,
> +				  enum vfio_device_mig_state next)
> +{
> +	enum vfio_device_mig_state cur = pds_vfio->state;
> +	struct device *dev = &pds_vfio->pdev->dev;
> +	unsigned long lm_action_start;
> +	int err = 0;
> +
> +	dev_dbg(dev, "%s => %s\n",
> +		pds_vfio_lm_state(cur), pds_vfio_lm_state(next));
> +
> +	lm_action_start = jiffies;
> +	if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_STOP_COPY) {
> +		/* Device is already stopped
> +		 * create save device data file & get device state from firmware
> +		 */
> +		err = pds_vfio_get_save_file(pds_vfio);
> +		if (err)
> +			return ERR_PTR(err);
> +
> +		/* Get device state */
> +		err = pds_vfio_get_lm_state_cmd(pds_vfio);
> +		if (err) {
> +			pds_vfio_put_save_file(pds_vfio);
> +			return ERR_PTR(err);
> +		}
> +
> +		return pds_vfio->save_file->filep;
> +	}
> +
> +	if (cur == VFIO_DEVICE_STATE_STOP_COPY && next == VFIO_DEVICE_STATE_STOP) {
> +		/* Device is already stopped
> +		 * delete the save device state file
> +		 */
> +		pds_vfio_put_save_file(pds_vfio);
> +		pds_vfio_send_host_vf_lm_status_cmd(pds_vfio,
> +						    PDS_LM_STA_NONE);
> +		return NULL;
> +	}
> +
> +	if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RESUMING) {
> +		/* create resume device data file */
> +		err = pds_vfio_get_restore_file(pds_vfio);
> +		if (err)
> +			return ERR_PTR(err);
> +
> +		return pds_vfio->restore_file->filep;
> +	}
> +
> +	if (cur == VFIO_DEVICE_STATE_RESUMING && next == VFIO_DEVICE_STATE_STOP) {
> +		/* Set device state */
> +		err = pds_vfio_set_lm_state_cmd(pds_vfio);
> +		if (err)
> +			return ERR_PTR(err);
> +
> +		/* delete resume device data file */
> +		pds_vfio_put_restore_file(pds_vfio);
> +		return NULL;
> +	}
> +
> +	if (cur == VFIO_DEVICE_STATE_RUNNING && next == VFIO_DEVICE_STATE_STOP) {
> +		/* Device should be stopped
> +		 * no interrupts, dma or change in internal state
> +		 */
> +		err = pds_vfio_suspend_device_cmd(pds_vfio);
> +		if (err)
> +			return ERR_PTR(err);
> +
> +		return NULL;
> +	}
> +
> +	if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RUNNING) {
> +		/* Device should be functional
> +		 * interrupts, dma, mmio or changes to internal state is allowed
> +		 */
> +		err = pds_vfio_resume_device_cmd(pds_vfio);
> +		if (err)
> +			return ERR_PTR(err);
> +
> +		pds_vfio_send_host_vf_lm_status_cmd(pds_vfio,
> +						    PDS_LM_STA_NONE);
> +		return NULL;
> +	}

Please implement the P2P states in your device. After long discussions
we really want to see all VFIO migration implementations support
this.

It is still not clear what qemu will do when it sees devices that do
not support P2P, but it will not be nice.

Also, since you are obviously using and testing the related qemu
series, please participate in the review of that in the qemu list, or
at least offer your support with testing.

While HCH is objecting to this driver even existing I won't comment on
specific details. Though it is interesting that this approach doesn't
change NVMe at all, so it does seem less objectionable to me than the
Intel RFC.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support
  2022-12-07 17:09   ` Jason Gunthorpe
@ 2022-12-07 21:32     ` Brett Creeley
  2022-12-07 23:29       ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Brett Creeley @ 2022-12-07 21:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Brett Creeley
  Cc: kvm, netdev, alex.williamson, cohuck, yishaih,
	shameerali.kolothum.thodi, kevin.tian, shannon.nelson, drivers

On 12/7/2022 9:09 AM, Jason Gunthorpe wrote:
> On Tue, Dec 06, 2022 at 05:07:01PM -0800, Brett Creeley wrote:
> 
>> +struct file *
>> +pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio,
>> +                               enum vfio_device_mig_state next)
>> +{
>> +     enum vfio_device_mig_state cur = pds_vfio->state;
>> +     struct device *dev = &pds_vfio->pdev->dev;
>> +     unsigned long lm_action_start;
>> +     int err = 0;
>> +
>> +     dev_dbg(dev, "%s => %s\n",
>> +             pds_vfio_lm_state(cur), pds_vfio_lm_state(next));
>> +
>> +     lm_action_start = jiffies;
>> +     if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_STOP_COPY) {
>> +             /* Device is already stopped
>> +              * create save device data file & get device state from firmware
>> +              */
>> +             err = pds_vfio_get_save_file(pds_vfio);
>> +             if (err)
>> +                     return ERR_PTR(err);
>> +
>> +             /* Get device state */
>> +             err = pds_vfio_get_lm_state_cmd(pds_vfio);
>> +             if (err) {
>> +                     pds_vfio_put_save_file(pds_vfio);
>> +                     return ERR_PTR(err);
>> +             }
>> +
>> +             return pds_vfio->save_file->filep;
>> +     }
>> +
>> +     if (cur == VFIO_DEVICE_STATE_STOP_COPY && next == VFIO_DEVICE_STATE_STOP) {
>> +             /* Device is already stopped
>> +              * delete the save device state file
>> +              */
>> +             pds_vfio_put_save_file(pds_vfio);
>> +             pds_vfio_send_host_vf_lm_status_cmd(pds_vfio,
>> +                                                 PDS_LM_STA_NONE);
>> +             return NULL;
>> +     }
>> +
>> +     if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RESUMING) {
>> +             /* create resume device data file */
>> +             err = pds_vfio_get_restore_file(pds_vfio);
>> +             if (err)
>> +                     return ERR_PTR(err);
>> +
>> +             return pds_vfio->restore_file->filep;
>> +     }
>> +
>> +     if (cur == VFIO_DEVICE_STATE_RESUMING && next == VFIO_DEVICE_STATE_STOP) {
>> +             /* Set device state */
>> +             err = pds_vfio_set_lm_state_cmd(pds_vfio);
>> +             if (err)
>> +                     return ERR_PTR(err);
>> +
>> +             /* delete resume device data file */
>> +             pds_vfio_put_restore_file(pds_vfio);
>> +             return NULL;
>> +     }
>> +
>> +     if (cur == VFIO_DEVICE_STATE_RUNNING && next == VFIO_DEVICE_STATE_STOP) {
>> +             /* Device should be stopped
>> +              * no interrupts, dma or change in internal state
>> +              */
>> +             err = pds_vfio_suspend_device_cmd(pds_vfio);
>> +             if (err)
>> +                     return ERR_PTR(err);
>> +
>> +             return NULL;
>> +     }
>> +
>> +     if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RUNNING) {
>> +             /* Device should be functional
>> +              * interrupts, dma, mmio or changes to internal state is allowed
>> +              */
>> +             err = pds_vfio_resume_device_cmd(pds_vfio);
>> +             if (err)
>> +                     return ERR_PTR(err);
>> +
>> +             pds_vfio_send_host_vf_lm_status_cmd(pds_vfio,
>> +                                                 PDS_LM_STA_NONE);
>> +             return NULL;
>> +     }
> 
> Please implement the P2P states in your device. After long discussions
> we really want to see all VFIO migration implementations support
> this.
> 
> It is still not clear what qemu will do when it sees devices that do
> not support P2P, but it will not be nice.

Does that mean VFIO_MIGRATION_P2P is going to be required going forward 
or do we just need to handle the P2P transitions? Can you point me to 
where this is being discussed?

> 
> Also, since you are obviously using and testing the related qemu
> series, please participate in the review of that in the qemu list, or
> at least offer your support with testing.

ACK.

> 
> While HCH is objecting to this driver even existing I won't comment on
> specific details. Though it is interesting that this approach doesn't change
> NVMe at all so it does seem less objectionable to me than the Intel
> RFC.

That's understandable and thanks for the initial feedback.

Yes, no NVMe changes required.

> 
> Jason
> 

* Re: [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support
  2022-12-07 21:32     ` Brett Creeley
@ 2022-12-07 23:29       ` Jason Gunthorpe
  2022-12-07 23:34         ` Brett Creeley
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2022-12-07 23:29 UTC (permalink / raw)
  To: Brett Creeley
  Cc: Brett Creeley, kvm, netdev, alex.williamson, cohuck, yishaih,
	shameerali.kolothum.thodi, kevin.tian, shannon.nelson, drivers

On Wed, Dec 07, 2022 at 01:32:34PM -0800, Brett Creeley wrote:

> > Please implement the P2P states in your device. After long discussions
> > we really want to see all VFIO migration implementations support
> > this.
> > 
> > It is still not clear what qemu will do when it sees devices that do
> > not support P2P, but it will not be nice.
> 
> Does that mean VFIO_MIGRATION_P2P is going to be required going forward or
> do we just need to handle the P2P transitions? Can you point me to where
> this is being discussed?

It means the device has to support a state where it is not issuing any
outgoing DMA but continuing to process incoming DMA.

This is mandatory to properly support multiple VFIO devices in the
same VM, which is why we want to see all devices implementing it. If
the devices don't support it we may assume it means the device is
broken and qemu will have to actively block P2P at the IOMMU.
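
As an illustration of the requirement above, here is a minimal userspace sketch of the quiesce ordering. It is a toy model, not kernel code: the `model_*` names are hypothetical, the three states only loosely mirror the VFIO STOP, RUNNING_P2P and RUNNING states, and treating STOP as "no incoming traffic serviced" is a simplifying assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. All names here are hypothetical, not VFIO uAPI. */
enum model_state { MODEL_STOP, MODEL_RUNNING_P2P, MODEL_RUNNING };

struct model_dev {
	enum model_state state;
	bool initiates_dma;    /* device may issue outgoing DMA */
	bool accepts_incoming; /* device services DMA/MMIO aimed at it */
};

void model_init(struct model_dev *d)
{
	d->state = MODEL_RUNNING;
	d->initiates_dma = true;
	d->accepts_incoming = true;
}

/* Apply one direct arc. Returns 0 on success, -1 if no such arc exists. */
int model_step(struct model_dev *d, enum model_state next)
{
	enum model_state cur = d->state;
	bool ok = (cur == MODEL_RUNNING && next == MODEL_RUNNING_P2P) ||
		  (cur == MODEL_RUNNING_P2P && next == MODEL_RUNNING) ||
		  (cur == MODEL_RUNNING_P2P && next == MODEL_STOP) ||
		  (cur == MODEL_STOP && next == MODEL_RUNNING_P2P);

	if (!ok)
		return -1;

	d->state = next;
	/* Only a fully running device may issue outgoing DMA. */
	d->initiates_dma = (next == MODEL_RUNNING);
	/* In the P2P state the device still handles incoming traffic. */
	d->accepts_incoming = (next != MODEL_STOP);
	return 0;
}

/*
 * Quiesce several devices in two passes: first silence every initiator,
 * then stop each device. No device is stopped while a peer can still
 * send DMA to it.
 */
int model_stop_all(struct model_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (model_step(&devs[i], MODEL_RUNNING_P2P))
			return -1;
	for (i = 0; i < n; i++)
		if (model_step(&devs[i], MODEL_STOP))
			return -1;
	return 0;
}

/* Small usage example expressed as assertions. */
void model_selfcheck(void)
{
	struct model_dev devs[2];

	model_init(&devs[0]);
	model_init(&devs[1]);

	/* With P2P states there is no direct RUNNING -> STOP arc in this model. */
	assert(model_step(&devs[0], MODEL_STOP) == -1);

	assert(model_step(&devs[0], MODEL_RUNNING_P2P) == 0);
	assert(!devs[0].initiates_dma && devs[0].accepts_incoming);
	assert(model_step(&devs[0], MODEL_RUNNING) == 0);

	assert(model_stop_all(devs, 2) == 0);
	assert(devs[0].state == MODEL_STOP && devs[1].state == MODEL_STOP);
}
```

The two-pass loop in `model_stop_all()` is the point of the sketch: with more than one device in the VM, every device must stop initiating DMA before any device stops accepting it.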

There were lots of long threads around Dec 2021, if I recall; lore
could probably find them. Somewhere around here, based on the search:

https://lore.kernel.org/kvm/20220215155602.GB1046125@nvidia.com/

Jason

* Re: [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support
  2022-12-07 23:29       ` Jason Gunthorpe
@ 2022-12-07 23:34         ` Brett Creeley
  0 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-07 23:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, netdev, alex.williamson, cohuck, yishaih,
	shameerali.kolothum.thodi, kevin.tian, shannon.nelson, drivers


On 12/7/2022 3:29 PM, Jason Gunthorpe wrote:
> On Wed, Dec 07, 2022 at 01:32:34PM -0800, Brett Creeley wrote:
> 
>>> Please implement the P2P states in your device. After long discussions
>>> we really want to see all VFIO migration implementations support
>>> this.
>>>
>>> It is still not clear what qemu will do when it sees devices that do
>>> not support P2P, but it will not be nice.
>>
>> Does that mean VFIO_MIGRATION_P2P is going to be required going forward or
>> do we just need to handle the P2P transitions? Can you point me to where
>> this is being discussed?
> 
> It means the device has to support a state where it is not issuing any
> outgoing DMA but continuing to process incoming DMA.
> 
> This is mandatory to properly support multiple VFIO devices in the
> same VM, which is why we want to see all devices implementing it. If
> the devices don't support it we may assume it means the device is
> broken and qemu will have to actively block P2P at the IOMMU.
> 
> There were lots of long threads around Dec 2021, if I recall; lore
> could probably find them. Somewhere around here, based on the search:
> 
> https://lore.kernel.org/kvm/20220215155602.GB1046125@nvidia.com/

Thanks for the info and link.

Brett

> 
> Jason

* Re: [RFC PATCH vfio 0/7] pds vfio driver
  2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
                   ` (7 preceding siblings ...)
  2022-12-07  7:43 ` [RFC PATCH vfio 0/7] pds vfio driver Christoph Hellwig
@ 2022-12-11 12:54 ` Max Gurtovoy
  2022-12-12  1:16   ` Brett Creeley
  8 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2022-12-11 12:54 UTC (permalink / raw)
  To: Brett Creeley, kvm, netdev, alex.williamson, cohuck, jgg,
	yishaih, shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Oren Duer


On 12/7/2022 3:06 AM, Brett Creeley wrote:
> This is a first draft patchset for a new vendor specific VFIO driver for
> use with the AMD/Pensando Distributed Services Card (DSC). This driver
> (pds_vfio) is a client of the newly introduced pds_core driver.
>
> Reference to the pds_core patchset:
> https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
>
> AMD/Pensando already supports a NVMe VF device (1dd8:1006) in the
> Distributed Services Card (DSC). This patchset adds the new pds_vfio
> driver in order to support NVMe VF live migration.
>
> This driver will use the pds_core device and auxiliary_bus as the VFIO
> control path to the DSC. The pds_core device creates auxiliary_bus devices
> for each live migratable VF. The devices are named by their feature plus
> the VF PCI BDF so the auxiliary_bus driver implemented by pds_vfio can find
> its related VF PCI driver instance. Once this auxiliary bus connection
> is configured, the pds_vfio driver can send admin queue commands to the
> device and receive events from pds_core.
>
> An ASCII diagram of a VFIO instance looks something like this and can
> be used with the VFIO subsystem to provide devices VFIO and live
> migration support.
>
>                                 .------.  .--------------------------.
>                                 | QEMU |--|  VM     .-------------.  |
>                                 '......'  |         | nvme driver |  |
>                                    |      |         .-------------.  |
>                                    |      |         |  SR-IOV VF  |  |
>                                    |      |         '-------------'  |
>                                    |      '---------------||---------'
>                                 .--------------.          ||
>                                 |/dev/<vfio_fd>|          ||
>                                 '--------------'          ||
> Host Userspace                         |                 ||
> ===================================================      ||
> Host Kernel                            |                 ||
>                                         |                 ||
>             pds_core.LM.2305 <--+   .--------.            ||
>                     |           |   |vfio-pci|            ||
>                     |           |   '--------'            ||
>                     |           |       |                 ||
>           .------------.       .-------------.            ||
>           |  pds_core  |       |   pds_vfio  |            ||
>           '------------'       '-------------'            ||
>                 ||                   ||                   ||
>               09:00.0              09:00.1                ||
> == PCI ==================================================||=====
>                 ||                   ||                   ||
>            .----------.         .----------.              ||
>      ,-----|    PF    |---------|    VF    |-------------------,
>      |     '----------'         '----------'  |      nvme      |
>      |                     DSC                |  data/control  |
>      |                                        |      path      |
>      -----------------------------------------------------------

Hi Brett,

What is the class code of the pds_core device?

I see that pds_vfio class_code is PCI_CLASS_STORAGE_EXPRESS.

>
>
> The pds_vfio driver is targeted to reside in drivers/vfio/pci/pds.
> It makes use of and introduces new files in the common include/linux/pds
> include directory.
>
> Brett Creeley (7):
>    pds_vfio: Initial support for pds_vfio VFIO driver
>    pds_vfio: Add support to register as PDS client
>    pds_vfio: Add VFIO live migration support
>    vfio: Commonize combine_ranges for use in other VFIO drivers
>    pds_vfio: Add support for dirty page tracking
>    pds_vfio: Add support for firmware recovery
>    pds_vfio: Add documentation files
>
>   .../ethernet/pensando/pds_vfio.rst            |  88 +++
>   drivers/vfio/pci/Kconfig                      |   2 +
>   drivers/vfio/pci/mlx5/cmd.c                   |  48 +-
>   drivers/vfio/pci/pds/Kconfig                  |  10 +
>   drivers/vfio/pci/pds/Makefile                 |  12 +
>   drivers/vfio/pci/pds/aux_drv.c                | 216 +++++++
>   drivers/vfio/pci/pds/aux_drv.h                |  30 +
>   drivers/vfio/pci/pds/cmds.c                   | 486 ++++++++++++++++
>   drivers/vfio/pci/pds/cmds.h                   |  44 ++
>   drivers/vfio/pci/pds/dirty.c                  | 541 ++++++++++++++++++
>   drivers/vfio/pci/pds/dirty.h                  |  49 ++
>   drivers/vfio/pci/pds/lm.c                     | 484 ++++++++++++++++
>   drivers/vfio/pci/pds/lm.h                     |  53 ++
>   drivers/vfio/pci/pds/pci_drv.c                | 134 +++++
>   drivers/vfio/pci/pds/pci_drv.h                |   9 +
>   drivers/vfio/pci/pds/vfio_dev.c               | 238 ++++++++
>   drivers/vfio/pci/pds/vfio_dev.h               |  42 ++
>   drivers/vfio/vfio_main.c                      |  48 ++
>   include/linux/pds/pds_core_if.h               |   1 +
>   include/linux/pds/pds_lm.h                    | 356 ++++++++++++
>   include/linux/vfio.h                          |   3 +
>   21 files changed, 2847 insertions(+), 47 deletions(-)
>   create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vfio.rst
>   create mode 100644 drivers/vfio/pci/pds/Kconfig
>   create mode 100644 drivers/vfio/pci/pds/Makefile
>   create mode 100644 drivers/vfio/pci/pds/aux_drv.c
>   create mode 100644 drivers/vfio/pci/pds/aux_drv.h
>   create mode 100644 drivers/vfio/pci/pds/cmds.c
>   create mode 100644 drivers/vfio/pci/pds/cmds.h
>   create mode 100644 drivers/vfio/pci/pds/dirty.c
>   create mode 100644 drivers/vfio/pci/pds/dirty.h
>   create mode 100644 drivers/vfio/pci/pds/lm.c
>   create mode 100644 drivers/vfio/pci/pds/lm.h
>   create mode 100644 drivers/vfio/pci/pds/pci_drv.c
>   create mode 100644 drivers/vfio/pci/pds/pci_drv.h
>   create mode 100644 drivers/vfio/pci/pds/vfio_dev.c
>   create mode 100644 drivers/vfio/pci/pds/vfio_dev.h
>   create mode 100644 include/linux/pds/pds_lm.h
>
> --
> 2.17.1
>

* Re: [RFC PATCH vfio 0/7] pds vfio driver
  2022-12-11 12:54 ` Max Gurtovoy
@ 2022-12-12  1:16   ` Brett Creeley
  2022-12-12 17:46     ` Brett Creeley
  0 siblings, 1 reply; 16+ messages in thread
From: Brett Creeley @ 2022-12-12  1:16 UTC (permalink / raw)
  To: Max Gurtovoy, Brett Creeley, kvm, netdev, alex.williamson,
	cohuck, jgg, yishaih, shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Oren Duer


On 12/11/2022 4:54 AM, Max Gurtovoy wrote:
> On 12/7/2022 3:06 AM, Brett Creeley wrote:
>> This is a first draft patchset for a new vendor specific VFIO driver for
>> use with the AMD/Pensando Distributed Services Card (DSC). This driver
>> (pds_vfio) is a client of the newly introduced pds_core driver.
>>
>> Reference to the pds_core patchset:
>> https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
>>
>> AMD/Pensando already supports a NVMe VF device (1dd8:1006) in the
>> Distributed Services Card (DSC). This patchset adds the new pds_vfio
>> driver in order to support NVMe VF live migration.
>>
>> This driver will use the pds_core device and auxiliary_bus as the VFIO
>> control path to the DSC. The pds_core device creates auxiliary_bus 
>> devices
>> for each live migratable VF. The devices are named by their feature plus
>> the VF PCI BDF so the auxiliary_bus driver implemented by pds_vfio can 
>> find
>> its related VF PCI driver instance. Once this auxiliary bus connection
>> is configured, the pds_vfio driver can send admin queue commands to the
>> device and receive events from pds_core.
>>
>> An ASCII diagram of a VFIO instance looks something like this and can
>> be used with the VFIO subsystem to provide devices VFIO and live
>> migration support.
>>
>>                                 .------.  .--------------------------.
>>                                 | QEMU |--|  VM     .-------------.  |
>>                                 '......'  |         | nvme driver |  |
>>                                    |      |         .-------------.  |
>>                                    |      |         |  SR-IOV VF  |  |
>>                                    |      |         '-------------'  |
>>                                    |      '---------------||---------'
>>                                 .--------------.          ||
>>                                 |/dev/<vfio_fd>|          ||
>>                                 '--------------'          ||
>> Host Userspace                         |                 ||
>> ===================================================      ||
>> Host Kernel                            |                 ||
>>                                         |                 ||
>>             pds_core.LM.2305 <--+   .--------.            ||
>>                     |           |   |vfio-pci|            ||
>>                     |           |   '--------'            ||
>>                     |           |       |                 ||
>>           .------------.       .-------------.            ||
>>           |  pds_core  |       |   pds_vfio  |            ||
>>           '------------'       '-------------'            ||
>>                 ||                   ||                   ||
>>               09:00.0              09:00.1                ||
>> == PCI ==================================================||=====
>>                 ||                   ||                   ||
>>            .----------.         .----------.              ||
>>      ,-----|    PF    |---------|    VF    |-------------------,
>>      |     '----------'         '----------'  |      nvme      |
>>      |                     DSC                |  data/control  |
>>      |                                        |      path      |
>>      -----------------------------------------------------------
> 
> Hi Brett,
> 
> What is the class code of the pds_core device?
> 
> I see that pds_vfio class_code is PCI_CLASS_STORAGE_EXPRESS.

The pds_core driver has the following as its only pci_device_id
entry:

PCI_VDEVICE(PENSANDO, PCI_DEVICE_ID_PENSANDO_CORE_PF)

* Re: [RFC PATCH vfio 0/7] pds vfio driver
  2022-12-12  1:16   ` Brett Creeley
@ 2022-12-12 17:46     ` Brett Creeley
  0 siblings, 0 replies; 16+ messages in thread
From: Brett Creeley @ 2022-12-12 17:46 UTC (permalink / raw)
  To: Max Gurtovoy, Brett Creeley, kvm, netdev, alex.williamson,
	cohuck, jgg, yishaih, shameerali.kolothum.thodi, kevin.tian
  Cc: shannon.nelson, drivers, Oren Duer


On 12/11/2022 5:16 PM, Brett Creeley wrote:
> 
> On 12/11/2022 4:54 AM, Max Gurtovoy wrote:
>> On 12/7/2022 3:06 AM, Brett Creeley wrote:
>>> This is a first draft patchset for a new vendor specific VFIO driver for
>>> use with the AMD/Pensando Distributed Services Card (DSC). This driver
>>> (pds_vfio) is a client of the newly introduced pds_core driver.
>>>
>>> Reference to the pds_core patchset:
>>> https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
>>>
>>> AMD/Pensando already supports a NVMe VF device (1dd8:1006) in the
>>> Distributed Services Card (DSC). This patchset adds the new pds_vfio
>>> driver in order to support NVMe VF live migration.
>>>
>>> This driver will use the pds_core device and auxiliary_bus as the VFIO
>>> control path to the DSC. The pds_core device creates auxiliary_bus 
>>> devices
>>> for each live migratable VF. The devices are named by their feature plus
>>> the VF PCI BDF so the auxiliary_bus driver implemented by pds_vfio 
>>> can find
>>> its related VF PCI driver instance. Once this auxiliary bus connection
>>> is configured, the pds_vfio driver can send admin queue commands to the
>>> device and receive events from pds_core.
>>>
>>> An ASCII diagram of a VFIO instance looks something like this and can
>>> be used with the VFIO subsystem to provide devices VFIO and live
>>> migration support.
>>>
>>>                                 .------.  .--------------------------.
>>>                                 | QEMU |--|  VM     .-------------.  |
>>>                                 '......'  |         | nvme driver |  |
>>>                                    |      |         .-------------.  |
>>>                                    |      |         |  SR-IOV VF  |  |
>>>                                    |      |         '-------------'  |
>>>                                    |      '---------------||---------'
>>>                                 .--------------.          ||
>>>                                 |/dev/<vfio_fd>|          ||
>>>                                 '--------------'          ||
>>> Host Userspace                         |                 ||
>>> ===================================================      ||
>>> Host Kernel                            |                 ||
>>>                                         |                 ||
>>>             pds_core.LM.2305 <--+   .--------.            ||
>>>                     |           |   |vfio-pci|            ||
>>>                     |           |   '--------'            ||
>>>                     |           |       |                 ||
>>>           .------------.       .-------------.            ||
>>>           |  pds_core  |       |   pds_vfio  |            ||
>>>           '------------'       '-------------'            ||
>>>                 ||                   ||                   ||
>>>               09:00.0              09:00.1                ||
>>> == PCI ==================================================||=====
>>>                 ||                   ||                   ||
>>>            .----------.         .----------.              ||
>>>      ,-----|    PF    |---------|    VF    |-------------------,
>>>      |     '----------'         '----------'  |      nvme      |
>>>      |                     DSC                |  data/control  |
>>>      |                                        |      path      |
>>>      -----------------------------------------------------------
>>
>> Hi Brett,
>>
>> What is the class code of the pds_core device?
>>
>> I see that pds_vfio class_code is PCI_CLASS_STORAGE_EXPRESS.
> 
> The pds_core driver has the following as its only pci_device_id
> entry:
> 
> PCI_VDEVICE(PENSANDO, PCI_DEVICE_ID_PENSANDO_CORE_PF)

The PCI class code for this device is 0x12 (Processing accelerator).
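
For readers comparing the two class codes mentioned in this exchange, here is a small sketch of how a 24-bit PCI class code splits into base class, subclass and programming interface. The helper names are hypothetical. It assumes that 0x12 above refers to the base class byte; the thread does not state a subclass or programming interface for the pds_core device, so none is filled in here. The value 0x010802 is the NVMe class code that PCI_CLASS_STORAGE_EXPRESS stands for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of a 24-bit class code. */
struct pci_class_fields {
	uint8_t base;    /* bits 23:16 */
	uint8_t sub;     /* bits 15:8  */
	uint8_t prog_if; /* bits 7:0   */
};

struct pci_class_fields pci_class_split(uint32_t class_code)
{
	struct pci_class_fields f;

	f.base = (class_code >> 16) & 0xff;
	f.sub = (class_code >> 8) & 0xff;
	f.prog_if = class_code & 0xff;
	return f;
}

/* Only the base classes relevant to this thread are named. */
const char *pci_base_class_name(uint8_t base)
{
	switch (base) {
	case 0x01:
		return "Mass storage controller";
	case 0x12:
		return "Processing accelerator";
	default:
		return "other";
	}
}

/* Usage example expressed as assertions. */
void pci_class_selfcheck(void)
{
	/* NVMe VF: mass storage / non-volatile memory / NVM Express. */
	struct pci_class_fields nvme = pci_class_split(0x010802);

	assert(nvme.base == 0x01);
	assert(nvme.sub == 0x08);
	assert(nvme.prog_if == 0x02);

	/* pds_core PF, assuming 0x12 is the base class byte. */
	assert(pci_base_class_name(0x12)[0] == 'P');
}
```

Under that reading, the PF driven by pds_core and the NVMe VF driven by pds_vfio sit in different base classes, which is what the question was probing.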

Thanks,

Brett

end of thread, other threads:[~2022-12-12 17:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-07  1:06 [RFC PATCH vfio 0/7] pds vfio driver Brett Creeley
2022-12-07  1:06 ` [RFC PATCH vfio 1/7] vfio/pds: Initial support for pds_vfio VFIO driver Brett Creeley
2022-12-07  1:07 ` [RFC PATCH vfio 2/7] vfio/pds: Add support to register as PDS client Brett Creeley
2022-12-07  1:07 ` [RFC PATCH vfio 3/7] vfio/pds: Add VFIO live migration support Brett Creeley
2022-12-07 17:09   ` Jason Gunthorpe
2022-12-07 21:32     ` Brett Creeley
2022-12-07 23:29       ` Jason Gunthorpe
2022-12-07 23:34         ` Brett Creeley
2022-12-07  1:07 ` [RFC PATCH vfio 4/7] vfio: Commonize combine_ranges for use in other VFIO drivers Brett Creeley
2022-12-07  1:07 ` [RFC PATCH vfio 5/7] vfio/pds: Add support for dirty page tracking Brett Creeley
2022-12-07  1:07 ` [RFC PATCH vfio 6/7] vfio/pds: Add support for firmware recovery Brett Creeley
2022-12-07  1:07 ` [RFC PATCH vfio 7/7] vfio/pds: Add Kconfig and documentation Brett Creeley
2022-12-07  7:43 ` [RFC PATCH vfio 0/7] pds vfio driver Christoph Hellwig
2022-12-11 12:54 ` Max Gurtovoy
2022-12-12  1:16   ` Brett Creeley
2022-12-12 17:46     ` Brett Creeley
