linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver
@ 2022-03-02 17:28 Shameer Kolothum
  2022-03-02 17:28 ` [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux Shameer Kolothum
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:28 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

Hi,

This series attempts to add vfio live migration support for HiSilicon
ACC VF devices based on the new v2 migration protocol definition and
mlx5 v9 series discussed here[0].

v6 --> v7
 -Renamed MIG_PRECOPY ioctl name and struct name. Updated ioctl descriptions
  regarding ioctl validity (patch #7).
- Adressed comments from Jason and Alex on PRE_COPY read() and ioctl() fns
  (patch #9).
- Moved only VF PCI ids to pci_ids.h(patch #3).

This is sanity tested on a HiSilicon platform using the Qemu branch
provided here[1].

Please take a look and let me know your feedback.

Thanks,
Shameer
[0] https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@nvidia.com/
[1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2

v5 --> v6
 -Report PRE_COPY support and use that for early compatibility check
  between src and dst devices.
 -For generic PRE_COPY support, included patch #7 from Jason(Thanks!).
 -Addressed comments from Alex(Thanks!).
 -Added the QM state register update to QM driver(patch #8) since that
  is being used in migration driver to decide whether the device is
  ready to save the state.

RFCv4 --> v5
  - Dropped RFC tag as v2 migration APIs are more stable now.
  - Addressed review comments from Jason and Alex (Thanks!).

v3 --> RFCv4
-Based on migration v2 protocol and mlx5 v7 series.
-Added RFC tag again as migration v2 protocol is still under discussion.
-Added new patch #6 to retrieve the PF QM data.
-PRE_COPY compatibility check is now done after the migration data
 transfer. This is not ideal and needs discussion.

RFC v2 --> v3
 -Dropped RFC tag as the vfio_pci_core subsystem framework is now
  part of 5.15-rc1.
 -Added override methods for vfio_device_ops read/write/mmap calls
  to limit the access within the functional register space.
 -Patches 1 to 3 are code refactoring to move the common ACC QM
  definitions and header around.

RFCv1 --> RFCv2

 -Adds a new vendor-specific vfio_pci driver(hisi-acc-vfio-pci)
  for HiSilicon ACC VF devices based on the new vfio-pci-core
  framework proposal.

 -Since HiSilicon ACC VF device MMIO space contains both the
  functional register space and migration control register space,
  override the vfio_device_ops ioctl method to report only the
  functional space to VMs.

 -For a successful migration, we still need access to VF dev
  functional register space mainly to read the status registers.
  But accessing these while the Guest vCPUs are running may leave
  a security hole. To avoid any potential security issues, we
  map/unmap the MMIO regions on a need basis and is safe to do so.
  (Please see hisi_acc_vf_ioremap/unmap() fns in patch #4).
 
 -Dropped debugfs support for now.
 -Uses common QM functions for mailbox access(patch #3).

Jason Gunthorpe (1):
  vfio: Extend the device migration protocol with PRE_COPY

Longfang Liu (3):
  crypto: hisilicon/qm: Move few definitions to common header
  crypto: hisilicon/qm: Set the VF QM state register
  hisi_acc_vfio_pci: Add support for VFIO live migration

Shameer Kolothum (6):
  crypto: hisilicon/qm: Move the QM header to include/linux
  hisi_acc_qm: Move VF PCI device IDs to common header
  hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region
  hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver
  hisi_acc_vfio_pci: Use its own PCI reset_done error handler

 drivers/crypto/hisilicon/hpre/hpre.h          |    2 +-
 drivers/crypto/hisilicon/hpre/hpre_main.c     |   19 +-
 drivers/crypto/hisilicon/qm.c                 |   42 +-
 drivers/crypto/hisilicon/sec2/sec.h           |    2 +-
 drivers/crypto/hisilicon/sec2/sec_main.c      |   21 +-
 drivers/crypto/hisilicon/sgl.c                |    2 +-
 drivers/crypto/hisilicon/zip/zip.h            |    2 +-
 drivers/crypto/hisilicon/zip/zip_main.c       |   17 +-
 drivers/vfio/pci/Kconfig                      |    2 +
 drivers/vfio/pci/Makefile                     |    2 +
 drivers/vfio/pci/hisilicon/Kconfig            |   16 +
 drivers/vfio/pci/hisilicon/Makefile           |    4 +
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 1369 +++++++++++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  114 ++
 drivers/vfio/vfio.c                           |   71 +-
 .../qm.h => include/linux/hisi_acc_qm.h       |   49 +
 include/linux/pci_ids.h                       |    3 +
 include/uapi/linux/vfio.h                     |  113 +-
 18 files changed, 1792 insertions(+), 58 deletions(-)
 create mode 100644 drivers/vfio/pci/hisilicon/Kconfig
 create mode 100644 drivers/vfio/pci/hisilicon/Makefile
 create mode 100644 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
 create mode 100644 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
 rename drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h (87%)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
@ 2022-03-02 17:28 ` Shameer Kolothum
  2022-03-02 18:48   ` John Garry
  2022-03-02 17:28 ` [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to common header Shameer Kolothum
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:28 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

Since we are going to introduce VFIO PCI HiSilicon ACC
driver for live migration in subsequent patches, move
the ACC QM header file to a common include dir.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/crypto/hisilicon/hpre/hpre.h                         | 2 +-
 drivers/crypto/hisilicon/qm.c                                | 2 +-
 drivers/crypto/hisilicon/sec2/sec.h                          | 2 +-
 drivers/crypto/hisilicon/sgl.c                               | 2 +-
 drivers/crypto/hisilicon/zip/zip.h                           | 2 +-
 drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h | 0
 6 files changed, 5 insertions(+), 5 deletions(-)
 rename drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h (100%)

diff --git a/drivers/crypto/hisilicon/hpre/hpre.h b/drivers/crypto/hisilicon/hpre/hpre.h
index e0b4a1982ee9..9a0558ed82f9 100644
--- a/drivers/crypto/hisilicon/hpre/hpre.h
+++ b/drivers/crypto/hisilicon/hpre/hpre.h
@@ -4,7 +4,7 @@
 #define __HISI_HPRE_H
 
 #include <linux/list.h>
-#include "../qm.h"
+#include <linux/hisi_acc_qm.h>
 
 #define HPRE_SQE_SIZE			sizeof(struct hpre_sqe)
 #define HPRE_PF_DEF_Q_NUM		64
diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index c5b84a5ea350..ed23e1d3fa27 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -15,7 +15,7 @@
 #include <linux/uacce.h>
 #include <linux/uaccess.h>
 #include <uapi/misc/uacce/hisi_qm.h>
-#include "qm.h"
+#include <linux/hisi_acc_qm.h>
 
 /* eq/aeq irq enable */
 #define QM_VF_AEQ_INT_SOURCE		0x0
diff --git a/drivers/crypto/hisilicon/sec2/sec.h b/drivers/crypto/hisilicon/sec2/sec.h
index d97cf02b1df7..c2e9b01187a7 100644
--- a/drivers/crypto/hisilicon/sec2/sec.h
+++ b/drivers/crypto/hisilicon/sec2/sec.h
@@ -4,7 +4,7 @@
 #ifndef __HISI_SEC_V2_H
 #define __HISI_SEC_V2_H
 
-#include "../qm.h"
+#include <linux/hisi_acc_qm.h>
 #include "sec_crypto.h"
 
 /* Algorithm resource per hardware SEC queue */
diff --git a/drivers/crypto/hisilicon/sgl.c b/drivers/crypto/hisilicon/sgl.c
index 057273769f26..534687401135 100644
--- a/drivers/crypto/hisilicon/sgl.c
+++ b/drivers/crypto/hisilicon/sgl.c
@@ -3,7 +3,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/module.h>
 #include <linux/slab.h>
-#include "qm.h"
+#include <linux/hisi_acc_qm.h>
 
 #define HISI_ACC_SGL_SGE_NR_MIN		1
 #define HISI_ACC_SGL_NR_MAX		256
diff --git a/drivers/crypto/hisilicon/zip/zip.h b/drivers/crypto/hisilicon/zip/zip.h
index 517fdbdff3ea..3dfd3bac5a33 100644
--- a/drivers/crypto/hisilicon/zip/zip.h
+++ b/drivers/crypto/hisilicon/zip/zip.h
@@ -7,7 +7,7 @@
 #define pr_fmt(fmt)	"hisi_zip: " fmt
 
 #include <linux/list.h>
-#include "../qm.h"
+#include <linux/hisi_acc_qm.h>
 
 enum hisi_zip_error_type {
 	/* negative compression */
diff --git a/drivers/crypto/hisilicon/qm.h b/include/linux/hisi_acc_qm.h
similarity index 100%
rename from drivers/crypto/hisilicon/qm.h
rename to include/linux/hisi_acc_qm.h
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to common header
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
  2022-03-02 17:28 ` [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux Shameer Kolothum
@ 2022-03-02 17:28 ` Shameer Kolothum
  2022-03-02 18:55   ` John Garry
  2022-03-02 17:28 ` [PATCH v7 03/10] hisi_acc_qm: Move VF PCI device IDs " Shameer Kolothum
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:28 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

From: Longfang Liu <liulongfang@huawei.com>

Move Doorbell and Mailbox definitions to common header
file. Also export QM mailbox functions.

This will be useful when we introduce VFIO PCI HiSilicon
ACC live migration driver.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/crypto/hisilicon/qm.c | 32 +++++------------------------
 include/linux/hisi_acc_qm.h   | 38 +++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index ed23e1d3fa27..8c29f9fba573 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -33,23 +33,6 @@
 #define QM_ABNORMAL_EVENT_IRQ_VECTOR	3
 
 /* mailbox */
-#define QM_MB_CMD_SQC			0x0
-#define QM_MB_CMD_CQC			0x1
-#define QM_MB_CMD_EQC			0x2
-#define QM_MB_CMD_AEQC			0x3
-#define QM_MB_CMD_SQC_BT		0x4
-#define QM_MB_CMD_CQC_BT		0x5
-#define QM_MB_CMD_SQC_VFT_V2		0x6
-#define QM_MB_CMD_STOP_QP		0x8
-#define QM_MB_CMD_SRC			0xc
-#define QM_MB_CMD_DST			0xd
-
-#define QM_MB_CMD_SEND_BASE		0x300
-#define QM_MB_EVENT_SHIFT		8
-#define QM_MB_BUSY_SHIFT		13
-#define QM_MB_OP_SHIFT			14
-#define QM_MB_CMD_DATA_ADDR_L		0x304
-#define QM_MB_CMD_DATA_ADDR_H		0x308
 #define QM_MB_PING_ALL_VFS		0xffff
 #define QM_MB_CMD_DATA_SHIFT		32
 #define QM_MB_CMD_DATA_MASK		GENMASK(31, 0)
@@ -103,19 +86,12 @@
 #define QM_DB_CMD_SHIFT_V1		16
 #define QM_DB_INDEX_SHIFT_V1		32
 #define QM_DB_PRIORITY_SHIFT_V1		48
-#define QM_DOORBELL_SQ_CQ_BASE_V2	0x1000
-#define QM_DOORBELL_EQ_AEQ_BASE_V2	0x2000
 #define QM_QUE_ISO_CFG_V		0x0030
 #define QM_PAGE_SIZE			0x0034
 #define QM_QUE_ISO_EN			0x100154
 #define QM_CAPBILITY			0x100158
 #define QM_QP_NUN_MASK			GENMASK(10, 0)
 #define QM_QP_DB_INTERVAL		0x10000
-#define QM_QP_MAX_NUM_SHIFT		11
-#define QM_DB_CMD_SHIFT_V2		12
-#define QM_DB_RAND_SHIFT_V2		16
-#define QM_DB_INDEX_SHIFT_V2		32
-#define QM_DB_PRIORITY_SHIFT_V2		48
 
 #define QM_MEM_START_INIT		0x100040
 #define QM_MEM_INIT_DONE		0x100044
@@ -693,7 +669,7 @@ static void qm_mb_pre_init(struct qm_mailbox *mailbox, u8 cmd,
 }
 
 /* return 0 mailbox ready, -ETIMEDOUT hardware timeout */
-static int qm_wait_mb_ready(struct hisi_qm *qm)
+int qm_wait_mb_ready(struct hisi_qm *qm)
 {
 	u32 val;
 
@@ -701,6 +677,7 @@ static int qm_wait_mb_ready(struct hisi_qm *qm)
 					  val, !((val >> QM_MB_BUSY_SHIFT) &
 					  0x1), POLL_PERIOD, POLL_TIMEOUT);
 }
+EXPORT_SYMBOL_GPL(qm_wait_mb_ready);
 
 /* 128 bit should be written to hardware at one time to trigger a mailbox */
 static void qm_mb_write(struct hisi_qm *qm, const void *src)
@@ -745,8 +722,8 @@ static int qm_mb_nolock(struct hisi_qm *qm, struct qm_mailbox *mailbox)
 	return -EBUSY;
 }
 
-static int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
-		 bool op)
+int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
+	  bool op)
 {
 	struct qm_mailbox mailbox;
 	int ret;
@@ -762,6 +739,7 @@ static int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(qm_mb);
 
 static void qm_db_v1(struct hisi_qm *qm, u16 qn, u8 cmd, u16 index, u8 priority)
 {
diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h
index 3068093229a5..8befb59c6fb3 100644
--- a/include/linux/hisi_acc_qm.h
+++ b/include/linux/hisi_acc_qm.h
@@ -34,6 +34,40 @@
 #define QM_WUSER_M_CFG_ENABLE		0x1000a8
 #define WUSER_M_CFG_ENABLE		0xffffffff
 
+/* mailbox */
+#define QM_MB_CMD_SQC                   0x0
+#define QM_MB_CMD_CQC                   0x1
+#define QM_MB_CMD_EQC                   0x2
+#define QM_MB_CMD_AEQC                  0x3
+#define QM_MB_CMD_SQC_BT                0x4
+#define QM_MB_CMD_CQC_BT                0x5
+#define QM_MB_CMD_SQC_VFT_V2            0x6
+#define QM_MB_CMD_STOP_QP               0x8
+#define QM_MB_CMD_SRC                   0xc
+#define QM_MB_CMD_DST                   0xd
+
+#define QM_MB_CMD_SEND_BASE		0x300
+#define QM_MB_EVENT_SHIFT               8
+#define QM_MB_BUSY_SHIFT		13
+#define QM_MB_OP_SHIFT			14
+#define QM_MB_CMD_DATA_ADDR_L		0x304
+#define QM_MB_CMD_DATA_ADDR_H		0x308
+#define QM_MB_MAX_WAIT_CNT		6000
+
+/* doorbell */
+#define QM_DOORBELL_CMD_SQ              0
+#define QM_DOORBELL_CMD_CQ              1
+#define QM_DOORBELL_CMD_EQ              2
+#define QM_DOORBELL_CMD_AEQ             3
+
+#define QM_DOORBELL_SQ_CQ_BASE_V2	0x1000
+#define QM_DOORBELL_EQ_AEQ_BASE_V2	0x2000
+#define QM_QP_MAX_NUM_SHIFT             11
+#define QM_DB_CMD_SHIFT_V2		12
+#define QM_DB_RAND_SHIFT_V2		16
+#define QM_DB_INDEX_SHIFT_V2		32
+#define QM_DB_PRIORITY_SHIFT_V2		48
+
 /* qm cache */
 #define QM_CACHE_CTL			0x100050
 #define SQC_CACHE_ENABLE		BIT(0)
@@ -414,6 +448,10 @@ pci_ers_result_t hisi_qm_dev_slot_reset(struct pci_dev *pdev);
 void hisi_qm_reset_prepare(struct pci_dev *pdev);
 void hisi_qm_reset_done(struct pci_dev *pdev);
 
+int qm_wait_mb_ready(struct hisi_qm *qm);
+int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
+	  bool op);
+
 struct hisi_acc_sgl_pool;
 struct hisi_acc_hw_sgl *hisi_acc_sg_buf_map_to_hw_sgl(struct device *dev,
 	struct scatterlist *sgl, struct hisi_acc_sgl_pool *pool,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 03/10] hisi_acc_qm: Move VF PCI device IDs to common header
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
  2022-03-02 17:28 ` [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux Shameer Kolothum
  2022-03-02 17:28 ` [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to common header Shameer Kolothum
@ 2022-03-02 17:28 ` Shameer Kolothum
  2022-03-02 17:28 ` [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices Shameer Kolothum
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:28 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

Move the PCI Device IDs of HiSilicon ACC VF devices to a common header
and also use a uniform naming convention.

This will be useful when we introduce the vfio PCI HiSilicon ACC live
migration driver in subsequent patches.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/crypto/hisilicon/hpre/hpre_main.c | 13 ++++++-------
 drivers/crypto/hisilicon/sec2/sec_main.c  | 15 +++++++--------
 drivers/crypto/hisilicon/zip/zip_main.c   | 11 +++++------
 include/linux/pci_ids.h                   |  3 +++
 4 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/crypto/hisilicon/hpre/hpre_main.c b/drivers/crypto/hisilicon/hpre/hpre_main.c
index ebfab3e14499..3589d8879b5e 100644
--- a/drivers/crypto/hisilicon/hpre/hpre_main.c
+++ b/drivers/crypto/hisilicon/hpre/hpre_main.c
@@ -68,8 +68,7 @@
 #define HPRE_REG_RD_INTVRL_US		10
 #define HPRE_REG_RD_TMOUT_US		1000
 #define HPRE_DBGFS_VAL_MAX_LEN		20
-#define HPRE_PCI_DEVICE_ID		0xa258
-#define HPRE_PCI_VF_DEVICE_ID		0xa259
+#define PCI_DEVICE_ID_HUAWEI_HPRE_PF	0xa258
 #define HPRE_QM_USR_CFG_MASK		GENMASK(31, 1)
 #define HPRE_QM_AXI_CFG_MASK		GENMASK(15, 0)
 #define HPRE_QM_VFG_AX_MASK		GENMASK(7, 0)
@@ -111,8 +110,8 @@
 static const char hpre_name[] = "hisi_hpre";
 static struct dentry *hpre_debugfs_root;
 static const struct pci_device_id hpre_dev_ids[] = {
-	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, HPRE_PCI_DEVICE_ID) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, HPRE_PCI_VF_DEVICE_ID) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_HPRE_PF) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_HPRE_VF) },
 	{ 0, }
 };
 
@@ -242,7 +241,7 @@ MODULE_PARM_DESC(uacce_mode, UACCE_MODE_DESC);
 
 static int pf_q_num_set(const char *val, const struct kernel_param *kp)
 {
-	return q_num_set(val, kp, HPRE_PCI_DEVICE_ID);
+	return q_num_set(val, kp, PCI_DEVICE_ID_HUAWEI_HPRE_PF);
 }
 
 static const struct kernel_param_ops hpre_pf_q_num_ops = {
@@ -921,7 +920,7 @@ static int hpre_debugfs_init(struct hisi_qm *qm)
 	qm->debug.sqe_mask_len = HPRE_SQE_MASK_LEN;
 	hisi_qm_debug_init(qm);
 
-	if (qm->pdev->device == HPRE_PCI_DEVICE_ID) {
+	if (qm->pdev->device == PCI_DEVICE_ID_HUAWEI_HPRE_PF) {
 		ret = hpre_ctrl_debug_init(qm);
 		if (ret)
 			goto failed_to_create;
@@ -958,7 +957,7 @@ static int hpre_qm_init(struct hisi_qm *qm, struct pci_dev *pdev)
 	qm->sqe_size = HPRE_SQE_SIZE;
 	qm->dev_name = hpre_name;
 
-	qm->fun_type = (pdev->device == HPRE_PCI_DEVICE_ID) ?
+	qm->fun_type = (pdev->device == PCI_DEVICE_ID_HUAWEI_HPRE_PF) ?
 			QM_HW_PF : QM_HW_VF;
 	if (qm->fun_type == QM_HW_PF) {
 		qm->qp_base = HPRE_PF_DEF_Q_BASE;
diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c
index 26d3ab1d308b..311a8747b5bf 100644
--- a/drivers/crypto/hisilicon/sec2/sec_main.c
+++ b/drivers/crypto/hisilicon/sec2/sec_main.c
@@ -20,8 +20,7 @@
 
 #define SEC_VF_NUM			63
 #define SEC_QUEUE_NUM_V1		4096
-#define SEC_PF_PCI_DEVICE_ID		0xa255
-#define SEC_VF_PCI_DEVICE_ID		0xa256
+#define PCI_DEVICE_ID_HUAWEI_SEC_PF	0xa255
 
 #define SEC_BD_ERR_CHK_EN0		0xEFFFFFFF
 #define SEC_BD_ERR_CHK_EN1		0x7ffff7fd
@@ -225,7 +224,7 @@ static const struct debugfs_reg32 sec_dfx_regs[] = {
 
 static int sec_pf_q_num_set(const char *val, const struct kernel_param *kp)
 {
-	return q_num_set(val, kp, SEC_PF_PCI_DEVICE_ID);
+	return q_num_set(val, kp, PCI_DEVICE_ID_HUAWEI_SEC_PF);
 }
 
 static const struct kernel_param_ops sec_pf_q_num_ops = {
@@ -313,8 +312,8 @@ module_param_cb(uacce_mode, &sec_uacce_mode_ops, &uacce_mode, 0444);
 MODULE_PARM_DESC(uacce_mode, UACCE_MODE_DESC);
 
 static const struct pci_device_id sec_dev_ids[] = {
-	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, SEC_PF_PCI_DEVICE_ID) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, SEC_VF_PCI_DEVICE_ID) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_SEC_PF) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_SEC_VF) },
 	{ 0, }
 };
 MODULE_DEVICE_TABLE(pci, sec_dev_ids);
@@ -717,7 +716,7 @@ static int sec_core_debug_init(struct hisi_qm *qm)
 	regset->base = qm->io_base;
 	regset->dev = dev;
 
-	if (qm->pdev->device == SEC_PF_PCI_DEVICE_ID)
+	if (qm->pdev->device == PCI_DEVICE_ID_HUAWEI_SEC_PF)
 		debugfs_create_file("regs", 0444, tmp_d, regset, &sec_regs_fops);
 
 	for (i = 0; i < ARRAY_SIZE(sec_dfx_labels); i++) {
@@ -735,7 +734,7 @@ static int sec_debug_init(struct hisi_qm *qm)
 	struct sec_dev *sec = container_of(qm, struct sec_dev, qm);
 	int i;
 
-	if (qm->pdev->device == SEC_PF_PCI_DEVICE_ID) {
+	if (qm->pdev->device == PCI_DEVICE_ID_HUAWEI_SEC_PF) {
 		for (i = SEC_CLEAR_ENABLE; i < SEC_DEBUG_FILE_NUM; i++) {
 			spin_lock_init(&sec->debug.files[i].lock);
 			sec->debug.files[i].index = i;
@@ -877,7 +876,7 @@ static int sec_qm_init(struct hisi_qm *qm, struct pci_dev *pdev)
 	qm->sqe_size = SEC_SQE_SIZE;
 	qm->dev_name = sec_name;
 
-	qm->fun_type = (pdev->device == SEC_PF_PCI_DEVICE_ID) ?
+	qm->fun_type = (pdev->device == PCI_DEVICE_ID_HUAWEI_SEC_PF) ?
 			QM_HW_PF : QM_HW_VF;
 	if (qm->fun_type == QM_HW_PF) {
 		qm->qp_base = SEC_PF_DEF_Q_BASE;
diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
index 678f8b58ec42..66decfe07282 100644
--- a/drivers/crypto/hisilicon/zip/zip_main.c
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -15,8 +15,7 @@
 #include <linux/uacce.h>
 #include "zip.h"
 
-#define PCI_DEVICE_ID_ZIP_PF		0xa250
-#define PCI_DEVICE_ID_ZIP_VF		0xa251
+#define PCI_DEVICE_ID_HUAWEI_ZIP_PF	0xa250
 
 #define HZIP_QUEUE_NUM_V1		4096
 
@@ -246,7 +245,7 @@ MODULE_PARM_DESC(uacce_mode, UACCE_MODE_DESC);
 
 static int pf_q_num_set(const char *val, const struct kernel_param *kp)
 {
-	return q_num_set(val, kp, PCI_DEVICE_ID_ZIP_PF);
+	return q_num_set(val, kp, PCI_DEVICE_ID_HUAWEI_ZIP_PF);
 }
 
 static const struct kernel_param_ops pf_q_num_ops = {
@@ -268,8 +267,8 @@ module_param_cb(vfs_num, &vfs_num_ops, &vfs_num, 0444);
 MODULE_PARM_DESC(vfs_num, "Number of VFs to enable(1-63), 0(default)");
 
 static const struct pci_device_id hisi_zip_dev_ids[] = {
-	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_PF) },
-	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_VF) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_ZIP_PF) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_ZIP_VF) },
 	{ 0, }
 };
 MODULE_DEVICE_TABLE(pci, hisi_zip_dev_ids);
@@ -838,7 +837,7 @@ static int hisi_zip_qm_init(struct hisi_qm *qm, struct pci_dev *pdev)
 	qm->sqe_size = HZIP_SQE_SIZE;
 	qm->dev_name = hisi_zip_name;
 
-	qm->fun_type = (pdev->device == PCI_DEVICE_ID_ZIP_PF) ?
+	qm->fun_type = (pdev->device == PCI_DEVICE_ID_HUAWEI_ZIP_PF) ?
 			QM_HW_PF : QM_HW_VF;
 	if (qm->fun_type == QM_HW_PF) {
 		qm->qp_base = HZIP_PF_DEF_Q_BASE;
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index aad54c666407..31dee2b65a62 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2529,6 +2529,9 @@
 #define PCI_DEVICE_ID_KORENIX_JETCARDF3	0x17ff
 
 #define PCI_VENDOR_ID_HUAWEI		0x19e5
+#define PCI_DEVICE_ID_HUAWEI_ZIP_VF	0xa251
+#define PCI_DEVICE_ID_HUAWEI_SEC_VF	0xa256
+#define PCI_DEVICE_ID_HUAWEI_HPRE_VF	0xa259
 
 #define PCI_VENDOR_ID_NETRONOME		0x19ee
 #define PCI_DEVICE_ID_NETRONOME_NFP4000	0x4000
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
                   ` (2 preceding siblings ...)
  2022-03-02 17:28 ` [PATCH v7 03/10] hisi_acc_qm: Move VF PCI device IDs " Shameer Kolothum
@ 2022-03-02 17:28 ` Shameer Kolothum
  2022-03-02 19:02   ` John Garry
  2022-03-02 17:28 ` [PATCH v7 05/10] hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region Shameer Kolothum
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:28 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

Add a vendor-specific vfio_pci driver for HiSilicon ACC devices.
This will be extended in subsequent patches to add support for
VFIO live migration feature.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/pci/Kconfig                      |   2 +
 drivers/vfio/pci/Makefile                     |   2 +
 drivers/vfio/pci/hisilicon/Kconfig            |   9 ++
 drivers/vfio/pci/hisilicon/Makefile           |   4 +
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 100 ++++++++++++++++++
 5 files changed, 117 insertions(+)
 create mode 100644 drivers/vfio/pci/hisilicon/Kconfig
 create mode 100644 drivers/vfio/pci/hisilicon/Makefile
 create mode 100644 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 187b9c259944..4da1914425e1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -46,4 +46,6 @@ endif
 
 source "drivers/vfio/pci/mlx5/Kconfig"
 
+source "drivers/vfio/pci/hisilicon/Kconfig"
+
 endif
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index ed9d6f2e0555..7052ebd893e0 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -9,3 +9,5 @@ vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
 
 obj-$(CONFIG_MLX5_VFIO_PCI)           += mlx5/
+
+obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
diff --git a/drivers/vfio/pci/hisilicon/Kconfig b/drivers/vfio/pci/hisilicon/Kconfig
new file mode 100644
index 000000000000..d5acaf74a878
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/Kconfig
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config HISI_ACC_VFIO_PCI
+	tristate "VFIO PCI support for HiSilicon ACC devices"
+	depends on (ARM64 && VFIO_PCI_CORE) || (COMPILE_TEST && 64BIT)
+	help
+	  This provides generic PCI support for HiSilicon ACC devices
+	  using the VFIO framework.
+
+	  If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/hisilicon/Makefile b/drivers/vfio/pci/hisilicon/Makefile
new file mode 100644
index 000000000000..c66b3783f2f9
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisi-acc-vfio-pci.o
+hisi-acc-vfio-pci-y := hisi_acc_vfio_pci.o
+
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
new file mode 100644
index 000000000000..8129c3457b3b
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, HiSilicon Ltd.
+ */
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/hisi_acc_qm.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+
+static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	int ret;
+
+	ret = vfio_pci_core_enable(vdev);
+	if (ret)
+		return ret;
+
+	vfio_pci_core_finish_enable(vdev);
+
+	return 0;
+}
+
+static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
+	.name = "hisi-acc-vfio-pci",
+	.open_device = hisi_acc_vfio_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = vfio_pci_core_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = vfio_pci_core_read,
+	.write = vfio_pci_core_write,
+	.mmap = vfio_pci_core_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+};
+
+static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct vfio_pci_core_device *vdev;
+	int ret;
+
+	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+	if (!vdev)
+		return -ENOMEM;
+
+	vfio_pci_core_init_device(vdev, pdev, &hisi_acc_vfio_pci_ops);
+
+	ret = vfio_pci_core_register_device(vdev);
+	if (ret)
+		goto out_free;
+
+	dev_set_drvdata(&pdev->dev, vdev);
+
+	return 0;
+
+out_free:
+	vfio_pci_core_uninit_device(vdev);
+	kfree(vdev);
+	return ret;
+}
+
+static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
+{
+	struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
+
+	vfio_pci_core_unregister_device(vdev);
+	vfio_pci_core_uninit_device(vdev);
+	kfree(vdev);
+}
+
+static const struct pci_device_id hisi_acc_vfio_pci_table[] = {
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_SEC_VF) },
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_HPRE_VF) },
+	{ PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_ZIP_VF) },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(pci, hisi_acc_vfio_pci_table);
+
+static struct pci_driver hisi_acc_vfio_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = hisi_acc_vfio_pci_table,
+	.probe = hisi_acc_vfio_pci_probe,
+	.remove = hisi_acc_vfio_pci_remove,
+	.err_handler = &vfio_pci_core_err_handlers,
+};
+
+module_pci_driver(hisi_acc_vfio_pci_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Liu Longfang <liulongfang@huawei.com>");
+MODULE_AUTHOR("Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>");
+MODULE_DESCRIPTION("HiSilicon VFIO PCI - Generic VFIO PCI driver for HiSilicon ACC device family");
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 05/10] hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
                   ` (3 preceding siblings ...)
  2022-03-02 17:28 ` [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices Shameer Kolothum
@ 2022-03-02 17:28 ` Shameer Kolothum
  2022-03-02 17:28 ` [PATCH v7 06/10] hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver Shameer Kolothum
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:28 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

HiSilicon ACC VF device BAR2 region consists of both functional
register space and migration control register space. From a
security point of view, it's not advisable to export the migration
control region to Guest.

Hence, introduce a separate struct vfio_device_ops for migration
support which will override the ioctl/read/write/mmap methods to
hide the migration region and limit the access only to the
functional register space.

This will be used in subsequent patches when we add migration
support to the driver.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 126 ++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 8129c3457b3b..582ee4fa4109 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -13,6 +13,119 @@
 #include <linux/vfio.h>
 #include <linux/vfio_pci_core.h>
 
+static int hisi_acc_pci_rw_access_check(struct vfio_device *core_vdev,
+					size_t count, loff_t *ppos,
+					size_t *new_count)
+{
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	struct vfio_pci_core_device *vdev =
+		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+
+	if (index == VFIO_PCI_BAR2_REGION_INDEX) {
+		loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+		resource_size_t end = pci_resource_len(vdev->pdev, index) / 2;
+
+		/* Check if access is for migration control region */
+		if (pos >= end)
+			return -EINVAL;
+
+		*new_count = min(count, (size_t)(end - pos));
+	}
+
+	return 0;
+}
+
+static int hisi_acc_vfio_pci_mmap(struct vfio_device *core_vdev,
+				  struct vm_area_struct *vma)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	unsigned int index;
+
+	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+	if (index == VFIO_PCI_BAR2_REGION_INDEX) {
+		u64 req_len, pgoff, req_start;
+		resource_size_t end = pci_resource_len(vdev->pdev, index) / 2;
+
+		req_len = vma->vm_end - vma->vm_start;
+		pgoff = vma->vm_pgoff &
+			((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+		req_start = pgoff << PAGE_SHIFT;
+
+		if (req_start + req_len > end)
+			return -EINVAL;
+	}
+
+	return vfio_pci_core_mmap(core_vdev, vma);
+}
+
+static ssize_t hisi_acc_vfio_pci_write(struct vfio_device *core_vdev,
+				       const char __user *buf, size_t count,
+				       loff_t *ppos)
+{
+	size_t new_count = count;
+	int ret;
+
+	ret = hisi_acc_pci_rw_access_check(core_vdev, count, ppos, &new_count);
+	if (ret)
+		return ret;
+
+	return vfio_pci_core_write(core_vdev, buf, new_count, ppos);
+}
+
+static ssize_t hisi_acc_vfio_pci_read(struct vfio_device *core_vdev,
+				      char __user *buf, size_t count,
+				      loff_t *ppos)
+{
+	size_t new_count = count;
+	int ret;
+
+	ret = hisi_acc_pci_rw_access_check(core_vdev, count, ppos, &new_count);
+	if (ret)
+		return ret;
+
+	return vfio_pci_core_read(core_vdev, buf, new_count, ppos);
+}
+
+static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+				    unsigned long arg)
+{
+	if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+		struct vfio_pci_core_device *vdev =
+			container_of(core_vdev, struct vfio_pci_core_device, vdev);
+		struct pci_dev *pdev = vdev->pdev;
+		struct vfio_region_info info;
+		unsigned long minsz;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		if (info.index == VFIO_PCI_BAR2_REGION_INDEX) {
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+
+			/*
+			 * ACC VF dev BAR2 region consists of both functional
+			 * register space and migration control register space.
+			 * Report only the functional region to Guest.
+			 */
+			info.size = pci_resource_len(pdev, info.index) / 2;
+
+			info.flags = VFIO_REGION_INFO_FLAG_READ |
+					VFIO_REGION_INFO_FLAG_WRITE |
+					VFIO_REGION_INFO_FLAG_MMAP;
+
+			return copy_to_user((void __user *)arg, &info, minsz) ?
+					    -EFAULT : 0;
+		}
+	}
+	return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+}
+
 static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
@@ -28,6 +141,19 @@ static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
 	return 0;
 }
 
+static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
+	.name = "hisi-acc-vfio-pci-migration",
+	.open_device = hisi_acc_vfio_pci_open_device,
+	.close_device = vfio_pci_core_close_device,
+	.ioctl = hisi_acc_vfio_pci_ioctl,
+	.device_feature = vfio_pci_core_ioctl_feature,
+	.read = hisi_acc_vfio_pci_read,
+	.write = hisi_acc_vfio_pci_write,
+	.mmap = hisi_acc_vfio_pci_mmap,
+	.request = vfio_pci_core_request,
+	.match = vfio_pci_core_match,
+};
+
 static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
 	.name = "hisi-acc-vfio-pci",
 	.open_device = hisi_acc_vfio_pci_open_device,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 06/10] hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
                   ` (4 preceding siblings ...)
  2022-03-02 17:28 ` [PATCH v7 05/10] hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region Shameer Kolothum
@ 2022-03-02 17:28 ` Shameer Kolothum
  2022-03-02 17:29 ` [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY Shameer Kolothum
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:28 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

struct pci_driver pointer is an input into the pci_iov_get_pf_drvdata().
Introduce helpers to retrieve the ACC PF dev struct pci_driver pointers
as we use this in ACC vfio migration driver.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/crypto/hisilicon/hpre/hpre_main.c | 6 ++++++
 drivers/crypto/hisilicon/sec2/sec_main.c  | 6 ++++++
 drivers/crypto/hisilicon/zip/zip_main.c   | 6 ++++++
 include/linux/hisi_acc_qm.h               | 5 +++++
 4 files changed, 23 insertions(+)

diff --git a/drivers/crypto/hisilicon/hpre/hpre_main.c b/drivers/crypto/hisilicon/hpre/hpre_main.c
index 3589d8879b5e..36ab30e9e654 100644
--- a/drivers/crypto/hisilicon/hpre/hpre_main.c
+++ b/drivers/crypto/hisilicon/hpre/hpre_main.c
@@ -1190,6 +1190,12 @@ static struct pci_driver hpre_pci_driver = {
 	.driver.pm		= &hpre_pm_ops,
 };
 
+struct pci_driver *hisi_hpre_get_pf_driver(void)
+{
+	return &hpre_pci_driver;
+}
+EXPORT_SYMBOL_GPL(hisi_hpre_get_pf_driver);
+
 static void hpre_register_debugfs(void)
 {
 	if (!debugfs_initialized())
diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c
index 311a8747b5bf..421a405ca337 100644
--- a/drivers/crypto/hisilicon/sec2/sec_main.c
+++ b/drivers/crypto/hisilicon/sec2/sec_main.c
@@ -1088,6 +1088,12 @@ static struct pci_driver sec_pci_driver = {
 	.driver.pm = &sec_pm_ops,
 };
 
+struct pci_driver *hisi_sec_get_pf_driver(void)
+{
+	return &sec_pci_driver;
+}
+EXPORT_SYMBOL_GPL(hisi_sec_get_pf_driver);
+
 static void sec_register_debugfs(void)
 {
 	if (!debugfs_initialized())
diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
index 66decfe07282..4534e1e107d1 100644
--- a/drivers/crypto/hisilicon/zip/zip_main.c
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -1012,6 +1012,12 @@ static struct pci_driver hisi_zip_pci_driver = {
 	.driver.pm		= &hisi_zip_pm_ops,
 };
 
+struct pci_driver *hisi_zip_get_pf_driver(void)
+{
+	return &hisi_zip_pci_driver;
+}
+EXPORT_SYMBOL_GPL(hisi_zip_get_pf_driver);
+
 static void hisi_zip_register_debugfs(void)
 {
 	if (!debugfs_initialized())
diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h
index 8befb59c6fb3..70706c1fb7b6 100644
--- a/include/linux/hisi_acc_qm.h
+++ b/include/linux/hisi_acc_qm.h
@@ -476,4 +476,9 @@ void hisi_qm_pm_init(struct hisi_qm *qm);
 int hisi_qm_get_dfx_access(struct hisi_qm *qm);
 void hisi_qm_put_dfx_access(struct hisi_qm *qm);
 void hisi_qm_regs_dump(struct seq_file *s, struct debugfs_regset32 *regset);
+
+/* Used by VFIO ACC live migration driver */
+struct pci_driver *hisi_sec_get_pf_driver(void);
+struct pci_driver *hisi_hpre_get_pf_driver(void);
+struct pci_driver *hisi_zip_get_pf_driver(void);
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
                   ` (5 preceding siblings ...)
  2022-03-02 17:28 ` [PATCH v7 06/10] hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver Shameer Kolothum
@ 2022-03-02 17:29 ` Shameer Kolothum
  2022-03-02 20:31   ` Alex Williamson
  2022-03-02 17:29 ` [PATCH v7 08/10] crypto: hisilicon/qm: Set the VF QM state register Shameer Kolothum
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:29 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

From: Jason Gunthorpe <jgg@nvidia.com>

The optional PRE_COPY states open the saving data transfer FD before
reaching STOP_COPY and allows the device to dirty track internal state
changes with the general idea to reduce the volume of data transferred
in the STOP_COPY stage.

While in PRE_COPY the device remains RUNNING, but the saving FD is open.

Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P,
which halts P2P transfers while continuing the saving FD.

PRE_COPY, with P2P support, requires the driver to implement 7 new arcs
and exists as an optional FSM branch between RUNNING and STOP_COPY:
    RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY

A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to
query the progress of the precopy operation in the driver with the idea it
will judge to move to STOP_COPY at least once the initial data set is
transferred, and possibly after the dirty size has shrunk appropriately.

This ioctl is valid only in VFIO_DEVICE_STATE_PRE_COPY state and kernel
driver should return -EINVAL from any other migration state.

Compared to the v1 clarification, STOP_COPY -> PRE_COPY is made optional
and to be defined in future. While making the whole PRE_COPY feature
optional eliminates the concern from mlx5, this is still a complicated arc
to implement and seems prudent to leave it closed until a proper use case
is developed. We also split the pending_bytes report into the initial and
sustaining values, and define the protocol to get an event via poll() for
new dirty data during PRE_COPY.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
[Shameer: Renamed ioctl and updated ioctl usage description]
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/vfio.c       |  71 +++++++++++++++++++++++-
 include/uapi/linux/vfio.h | 113 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 179 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index bdb5205bb358..a14b86913593 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1577,7 +1577,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state new_fsm,
 			    enum vfio_device_mig_state *next_fsm)
 {
-	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
+	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_PRE_COPY_P2P + 1 };
 	/*
 	 * The coding in this table requires the driver to implement
 	 * FSM arcs:
@@ -1596,25 +1596,59 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 	 *         RUNNING -> STOP
 	 *         STOP -> RUNNING
 	 *
+	 * If precopy is supported then the driver must support these additional
+	 * FSM arcs:
+	 *         RUNNING -> PRE_COPY
+	 *         PRE_COPY -> RUNNING
+	 *         PRE_COPY -> STOP_COPY
+	 * However, if precopy and P2P are supported together then the driver
+	 * must support these additional arcs beyond the P2P arcs above:
+	 *         PRE_COPY -> RUNNING
+	 *         PRE_COPY -> PRE_COPY_P2P
+	 *         PRE_COPY_P2P -> PRE_COPY
+	 *         PRE_COPY_P2P -> RUNNING_P2P
+	 *         PRE_COPY_P2P -> STOP_COPY
+	 *         RUNNING -> PRE_COPY
+	 *         RUNNING_P2P -> PRE_COPY_P2P
+	 *
 	 * If all optional features are supported then the coding will step
 	 * through multiple states for these combination transitions:
+	 *         PRE_COPY -> PRE_COPY_P2P -> STOP_COPY
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP
+	 *         PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP -> RESUMING
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> RUNNING
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> STOP
+	 *         PRE_COPY_P2P -> RUNNING_P2P -> STOP -> RESUMING
 	 *         RESUMING -> STOP -> RUNNING_P2P
+	 *         RESUMING -> STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
+	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         RESUMING -> STOP -> STOP_COPY
+	 *         RUNNING -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         RUNNING -> RUNNING_P2P -> STOP
 	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
 	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
+	 *         RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         RUNNING_P2P -> STOP -> RESUMING
 	 *         RUNNING_P2P -> STOP -> STOP_COPY
+	 *         STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 *         STOP -> RUNNING_P2P -> RUNNING
+	 *         STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 *         STOP_COPY -> STOP -> RESUMING
 	 *         STOP_COPY -> STOP -> RUNNING_P2P
 	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
+	 *
+	 *  The following transitions are blocked:
+	 *         STOP_COPY -> PRE_COPY
+	 *         STOP_COPY -> PRE_COPY_P2P
 	 */
 	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
@@ -1623,14 +1657,38 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RUNNING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
+		[VFIO_DEVICE_STATE_PRE_COPY] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
 		[VFIO_DEVICE_STATE_STOP_COPY] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
@@ -1639,6 +1697,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RESUMING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
@@ -1647,6 +1707,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
@@ -1655,6 +1717,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_ERROR] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
@@ -1665,6 +1729,11 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY] =
+			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_MIGRATION_STOP_COPY |
+						   VFIO_MIGRATION_P2P |
+						   VFIO_MIGRATION_PRE_COPY,
 		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING_P2P] =
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fea86061b44e..440dbfaabdb3 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -819,12 +819,20 @@ struct vfio_device_feature {
  * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
  * is supported in addition to the STOP_COPY states.
  *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that
+ * PRE_COPY is supported in addition to the STOP_COPY states.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY
+ * means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported
+ * in addition to the STOP_COPY states.
+ *
  * Other combinations of flags have behavior to be defined in the future.
  */
 struct vfio_device_feature_migration {
 	__aligned_u64 flags;
 #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
 #define VFIO_MIGRATION_P2P		(1 << 1)
+#define VFIO_MIGRATION_PRE_COPY		(1 << 2)
 };
 #define VFIO_DEVICE_FEATURE_MIGRATION 1
 
@@ -875,8 +883,13 @@ struct vfio_device_feature_mig_state {
  *  RESUMING - The device is stopped and is loading a new internal state
  *  ERROR - The device has failed and must be reset
  *
- * And 1 optional state to support VFIO_MIGRATION_P2P:
+ * And optional states to support VFIO_MIGRATION_P2P:
  *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ * And VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY - The device is running normally but tracking internal state
+ *             changes
+ * And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY:
+ *  PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA
  *
  * The FSM takes actions on the arcs between FSM states. The driver implements
  * the following behavior for the FSM arcs:
@@ -908,20 +921,48 @@ struct vfio_device_feature_mig_state {
  *
  *   To abort a RESUMING session the device must be reset.
  *
+ * PRE_COPY -> RUNNING
  * RUNNING_P2P -> RUNNING
  *   While in RUNNING the device is fully operational, the device may generate
  *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
  *   and the device may advance its internal state.
  *
+ *   The PRE_COPY arc will terminate a data transfer session.
+ *
+ * PRE_COPY_P2P -> RUNNING_P2P
  * RUNNING -> RUNNING_P2P
  * STOP -> RUNNING_P2P
  *   While in RUNNING_P2P the device is partially running in the P2P quiescent
  *   state defined below.
  *
+ *   The PRE_COPY arc will terminate a data transfer session.
+ *
+ * RUNNING -> PRE_COPY
+ * RUNNING_P2P -> PRE_COPY_P2P
  * STOP -> STOP_COPY
- *   This arc begin the process of saving the device state and will return a
- *   new data_fd.
+ *   PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
+ *   which share a data transfer session. Moving between these states alters
+ *   what is streamed in session, but does not terminate or otherwise effect
+ *   the associated fd.
+ *
+ *   These arcs begin the process of saving the device state and will return a
+ *   new data_fd. The migration driver may perform actions such as enabling
+ *   dirty logging of device state when entering PRE_COPY or PER_COPY_P2P.
+ *
+ *   Each arc does not change the device operation, the device remains
+ *   RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
+ *   in PRE_COPY_P2P -> STOP_COPY.
+ *
+ * PRE_COPY -> PRE_COPY_P2P
+ *   Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above.
+ *   However, while in the PRE_COPY_P2P state, the device is partially running
+ *   in the P2P quiescent state defined below, like RUNNING_P2P.
  *
+ * PRE_COPY_P2P -> PRE_COPY
+ *   This arc allows returning the device to a full RUNNING behavior while
+ *   continuing all the behaviors of PRE_COPY.
+ *
+ * PRE_COPY_P2P -> STOP_COPY
  *   While in the STOP_COPY state the device has the same behavior as STOP
  *   with the addition that the data transfers session continues to stream the
  *   migration state. End of stream on the FD indicates the entire device
@@ -939,6 +980,13 @@ struct vfio_device_feature_mig_state {
  *   device state for this arc if required to prepare the device to receive the
  *   migration data.
  *
+ * STOP_COPY -> PRE_COPY
+ * STOP_COPY -> PRE_COPY_P2P
+ *   These arcs are not permitted and return error if requested. Future
+ *   revisions of this API may define behaviors for these arcs, in this case
+ *   support will be discoverable by a new flag in
+ *   VFIO_DEVICE_FEATURE_MIGRATION.
+ *
  * any -> ERROR
  *   ERROR cannot be specified as a device state, however any transition request
  *   can be failed with an errno return and may then move the device_state into
@@ -950,7 +998,7 @@ struct vfio_device_feature_mig_state {
  * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
  * state for the device for the purposes of managing multiple devices within a
  * user context where peer-to-peer DMA between devices may be active. The
- * RUNNING_P2P states must prevent the device from initiating
+ * RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating
  * any new P2P DMA transactions. If the device can identify P2P transactions
  * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
  * driver must complete any such outstanding operations prior to completing the
@@ -963,6 +1011,8 @@ struct vfio_device_feature_mig_state {
  * above FSM arcs. As there are multiple paths through the FSM arcs the path
  * should be selected based on the following rules:
  *   - Select the shortest path.
+ *   - The path cannot have saving group states as interior arcs, only
+ *     starting/end states.
  * Refer to vfio_mig_get_next_state() for the result of the algorithm.
  *
  * The automatic transit through the FSM arcs that make up the combination
@@ -976,6 +1026,9 @@ struct vfio_device_feature_mig_state {
  * support them. The user can discover if these states are supported by using
  * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
  * avoid knowing about these optional states if the kernel driver supports them.
+ *
+ * Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY
+ * is not present.
  */
 enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_ERROR = 0,
@@ -984,8 +1037,60 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_STOP_COPY = 3,
 	VFIO_DEVICE_STATE_RESUMING = 4,
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+	VFIO_DEVICE_STATE_PRE_COPY = 6,
+	VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
+};
+
+/**
+ * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
+ *
+ * This ioctl is used on the migration data FD in the precopy phase of the
+ * migration data transfer. It returns an estimate of the current data sizes
+ * remaining to be transferred. It allows the user to judge when it is
+ * appropriate to leave PRE_COPY for STOP_COPY.
+ *
+ * This ioctl is valid only in VFIO_DEVICE_STATE_PRE_COPY state and kernel
+ * driver should return -EINVAL from any other migration state.
+ *
+ * initial_bytes reflects the estimated remaining size of any initial mandatory
+ * precopy data transfer. When initial_bytes returns as zero then the initial
+ * phase of the precopy data is completed. Generally initial_bytes should start
+ * out as approximately the entire device state.
+ *
+ * dirty_bytes reflects an estimate for how much more data needs to be
+ * transferred to complete the migration. Generally it should start as zero
+ * and increase as internal state is dirtied.
+ *
+ * Drivers should attempt to return estimates so that initial_bytes +
+ * dirty_bytes matches the amount of data an immediate transition to STOP_COPY
+ * will require to be streamed.
+ *
+ * Drivers have a lot of flexibility in when and what they transfer during the
+ * PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO.
+ *
+ * During pre-copy the migration data FD has a temporary "end of stream" that is
+ * reached when both initial_bytes and dirty_byte are zero. For instance, this
+ * may indicate that the device is idle and not currently dirtying any internal
+ * state. When read() is done on this temporary end of stream the kernel driver
+ * should return ENOMSG from read(). Userspace can wait for more data (which may
+ * never come) by using poll.
+ *
+ * Once in STOP_COPY the migration data FD has a permanent end of stream
+ * signaled in the usual way by read() always returning 0 and poll always
+ * returning readable. ENOMSG may not be returned in STOP_COPY. Support
+ * for this ioctl is optional.
+ *
+ * Return: 0 on success, -1 and errno set on failure.
+ */
+struct vfio_precopy_info {
+	__u32 argsz;
+	__u32 flags;
+	__aligned_u64 initial_bytes;
+	__aligned_u64 dirty_bytes;
 };
 
+#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 08/10] crypto: hisilicon/qm: Set the VF QM state register
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
                   ` (6 preceding siblings ...)
  2022-03-02 17:29 ` [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY Shameer Kolothum
@ 2022-03-02 17:29 ` Shameer Kolothum
  2022-03-02 17:29 ` [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration Shameer Kolothum
  2022-03-02 17:29 ` [PATCH v7 10/10] hisi_acc_vfio_pci: Use its own PCI reset_done error handler Shameer Kolothum
  9 siblings, 0 replies; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:29 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

From: Longfang Liu <liulongfang@huawei.com>

We use VF QM state register to record the status of the QM configuration
state. This will be used in the ACC migration driver to determine whether
we can safely save and restore the QM data.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/crypto/hisilicon/qm.c | 8 ++++++++
 include/linux/hisi_acc_qm.h   | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index 8c29f9fba573..5a0ac6cb6eeb 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -3492,6 +3492,12 @@ static void hisi_qm_pci_uninit(struct hisi_qm *qm)
 	pci_disable_device(pdev);
 }
 
+static void hisi_qm_set_state(struct hisi_qm *qm, u8 state)
+{
+	if (qm->ver > QM_HW_V2 && qm->fun_type == QM_HW_VF)
+		writel(state, qm->io_base + QM_VF_STATE);
+}
+
 /**
  * hisi_qm_uninit() - Uninitialize qm.
  * @qm: The qm needed uninit.
@@ -3520,6 +3526,7 @@ void hisi_qm_uninit(struct hisi_qm *qm)
 		dma_free_coherent(dev, qm->qdma.size,
 				  qm->qdma.va, qm->qdma.dma);
 	}
+	hisi_qm_set_state(qm, QM_NOT_READY);
 	up_write(&qm->qps_lock);
 
 	qm_irq_unregister(qm);
@@ -3745,6 +3752,7 @@ int hisi_qm_start(struct hisi_qm *qm)
 	if (!ret)
 		atomic_set(&qm->status.flags, QM_START);
 
+	hisi_qm_set_state(qm, QM_READY);
 err_unlock:
 	up_write(&qm->qps_lock);
 	return ret;
diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h
index 70706c1fb7b6..cae3e02ce23e 100644
--- a/include/linux/hisi_acc_qm.h
+++ b/include/linux/hisi_acc_qm.h
@@ -67,6 +67,7 @@
 #define QM_DB_RAND_SHIFT_V2		16
 #define QM_DB_INDEX_SHIFT_V2		32
 #define QM_DB_PRIORITY_SHIFT_V2		48
+#define QM_VF_STATE			0x60
 
 /* qm cache */
 #define QM_CACHE_CTL			0x100050
@@ -162,6 +163,11 @@ enum qm_debug_file {
 	DEBUG_FILE_NUM,
 };
 
+enum qm_vf_state {
+	QM_READY = 0,
+	QM_NOT_READY,
+};
+
 struct qm_dfx {
 	atomic64_t err_irq_cnt;
 	atomic64_t aeq_irq_cnt;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
                   ` (7 preceding siblings ...)
  2022-03-02 17:29 ` [PATCH v7 08/10] crypto: hisilicon/qm: Set the VF QM state register Shameer Kolothum
@ 2022-03-02 17:29 ` Shameer Kolothum
  2022-03-03  0:21   ` Jason Gunthorpe
  2022-03-02 17:29 ` [PATCH v7 10/10] hisi_acc_vfio_pci: Use its own PCI reset_done error handler Shameer Kolothum
  9 siblings, 1 reply; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:29 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

From: Longfang Liu <liulongfang@huawei.com>

VMs assigned with HiSilicon ACC VF devices can now perform live migration
if the VF devices are bind to the hisi_acc_vfio_pci driver.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 drivers/vfio/pci/hisilicon/Kconfig            |    7 +
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 1128 ++++++++++++++++-
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  112 ++
 3 files changed, 1229 insertions(+), 18 deletions(-)
 create mode 100644 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h

diff --git a/drivers/vfio/pci/hisilicon/Kconfig b/drivers/vfio/pci/hisilicon/Kconfig
index d5acaf74a878..02811364a7a7 100644
--- a/drivers/vfio/pci/hisilicon/Kconfig
+++ b/drivers/vfio/pci/hisilicon/Kconfig
@@ -2,6 +2,13 @@
 config HISI_ACC_VFIO_PCI
 	tristate "VFIO PCI support for HiSilicon ACC devices"
 	depends on (ARM64 && VFIO_PCI_CORE) || (COMPILE_TEST && 64BIT)
+	depends on PCI && PCI_MSI
+	depends on UACCE || UACCE=n
+	depends on ACPI
+	select CRYPTO_DEV_HISI_QM
+	select CRYPTO_DEV_HISI_HPRE
+	select CRYPTO_DEV_HISI_SEC2
+	select CRYPTO_DEV_HISI_ZIP
 	help
 	  This provides generic PCI support for HiSilicon ACC devices
 	  using the VFIO framework.
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 582ee4fa4109..f733aa0b1a5b 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -12,6 +12,1039 @@
 #include <linux/pci.h>
 #include <linux/vfio.h>
 #include <linux/vfio_pci_core.h>
+#include <linux/anon_inodes.h>
+
+#include "hisi_acc_vfio_pci.h"
+
+/* return 0 on VM acc device ready, -ETIMEDOUT hardware timeout */
+static int qm_wait_dev_not_ready(struct hisi_qm *qm)
+{
+	u32 val;
+
+	return readl_relaxed_poll_timeout(qm->io_base + QM_VF_STATE,
+				val, !(val & 0x1), MB_POLL_PERIOD_US,
+				MB_POLL_TIMEOUT_US);
+}
+
+/*
+ * Each state Reg is checked 100 times,
+ * with a delay of 100 microseconds after each check
+ */
+static u32 acc_check_reg_state(struct hisi_qm *qm, u32 regs)
+{
+	int check_times = 0;
+	u32 state;
+
+	state = readl(qm->io_base + regs);
+	while (state && check_times < ERROR_CHECK_TIMEOUT) {
+		udelay(CHECK_DELAY_TIME);
+		state = readl(qm->io_base + regs);
+		check_times++;
+	}
+
+	return state;
+}
+
+/* Check the PF's RAS state and Function INT state */
+static int qm_check_int_state(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+	struct hisi_qm *vfqm = &hisi_acc_vdev->vf_qm;
+	struct hisi_qm *qm = hisi_acc_vdev->pf_qm;
+	struct pci_dev *vf_pdev = hisi_acc_vdev->vf_dev;
+	struct device *dev = &qm->pdev->dev;
+	u32 state;
+
+	/* Check RAS state */
+	state = acc_check_reg_state(qm, QM_ABNORMAL_INT_STATUS);
+	if (state) {
+		dev_err(dev, "failed to check QM RAS state!\n");
+		return -EBUSY;
+	}
+
+	/* Check Function Communication state between PF and VF */
+	state = acc_check_reg_state(vfqm, QM_IFC_INT_STATUS);
+	if (state) {
+		dev_err(dev, "failed to check QM IFC INT state!\n");
+		return -EBUSY;
+	}
+	state = acc_check_reg_state(vfqm, QM_IFC_INT_SET_V);
+	if (state) {
+		dev_err(dev, "failed to check QM IFC INT SET state!\n");
+		return -EBUSY;
+	}
+
+	/* Check submodule task state */
+	switch (vf_pdev->device) {
+	case PCI_DEVICE_ID_HUAWEI_SEC_VF:
+		state = acc_check_reg_state(qm, SEC_CORE_INT_STATUS);
+		if (state) {
+			dev_err(dev, "failed to check QM SEC Core INT state!\n");
+			return -EBUSY;
+		}
+		return 0;
+	case PCI_DEVICE_ID_HUAWEI_HPRE_VF:
+		state = acc_check_reg_state(qm, HPRE_HAC_INT_STATUS);
+		if (state) {
+			dev_err(dev, "failed to check QM HPRE HAC INT state!\n");
+			return -EBUSY;
+		}
+		return 0;
+	case PCI_DEVICE_ID_HUAWEI_ZIP_VF:
+		state = acc_check_reg_state(qm, HZIP_CORE_INT_STATUS);
+		if (state) {
+			dev_err(dev, "failed to check QM ZIP Core INT state!\n");
+			return -EBUSY;
+		}
+		return 0;
+	default:
+		dev_err(dev, "failed to detect acc module type!\n");
+		return -EINVAL;
+	}
+}
+
+static int qm_read_reg(struct hisi_qm *qm, u32 reg_addr,
+		       u32 *data, u8 nums)
+{
+	int i;
+
+	if (nums < 1 || nums > QM_REGS_MAX_LEN)
+		return -EINVAL;
+
+	for (i = 0; i < nums; i++) {
+		data[i] = readl(qm->io_base + reg_addr);
+		reg_addr += QM_REG_ADDR_OFFSET;
+	}
+
+	return 0;
+}
+
+static int qm_write_reg(struct hisi_qm *qm, u32 reg,
+			u32 *data, u8 nums)
+{
+	int i;
+
+	if (nums < 1 || nums > QM_REGS_MAX_LEN)
+		return -EINVAL;
+
+	for (i = 0; i < nums; i++)
+		writel(data[i], qm->io_base + reg + i * QM_REG_ADDR_OFFSET);
+
+	return 0;
+}
+
+static int qm_get_vft(struct hisi_qm *qm, u32 *base)
+{
+	u64 sqc_vft;
+	u32 qp_num;
+	int ret;
+
+	ret = qm_mb(qm, QM_MB_CMD_SQC_VFT_V2, 0, 0, 1);
+	if (ret)
+		return ret;
+
+	sqc_vft = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
+		  ((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) <<
+		  QM_XQC_ADDR_OFFSET);
+	*base = QM_SQC_VFT_BASE_MASK_V2 & (sqc_vft >> QM_SQC_VFT_BASE_SHIFT_V2);
+	qp_num = (QM_SQC_VFT_NUM_MASK_V2 &
+		  (sqc_vft >> QM_SQC_VFT_NUM_SHIFT_V2)) + 1;
+
+	return qp_num;
+}
+
+static int qm_get_sqc(struct hisi_qm *qm, u64 *addr)
+{
+	int ret;
+
+	ret = qm_mb(qm, QM_MB_CMD_SQC_BT, 0, 0, 1);
+	if (ret)
+		return ret;
+
+	*addr = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
+		  ((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) <<
+		  QM_XQC_ADDR_OFFSET);
+
+	return 0;
+}
+
+static int qm_get_cqc(struct hisi_qm *qm, u64 *addr)
+{
+	int ret;
+
+	ret = qm_mb(qm, QM_MB_CMD_CQC_BT, 0, 0, 1);
+	if (ret)
+		return ret;
+
+	*addr = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
+		  ((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) <<
+		  QM_XQC_ADDR_OFFSET);
+
+	return 0;
+}
+
+static int qm_rw_regs_read(struct hisi_qm *qm, struct acc_vf_data *vf_data)
+{
+	struct device *dev = &qm->pdev->dev;
+	int ret;
+
+	ret = qm_read_reg(qm, QM_VF_AEQ_INT_MASK, &vf_data->aeq_int_mask, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_VF_AEQ_INT_MASK\n");
+		return ret;
+	}
+
+	ret = qm_read_reg(qm, QM_VF_EQ_INT_MASK, &vf_data->eq_int_mask, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_VF_EQ_INT_MASK\n");
+		return ret;
+	}
+
+	ret = qm_read_reg(qm, QM_IFC_INT_SOURCE_V,
+			  &vf_data->ifc_int_source, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_IFC_INT_SOURCE_V\n");
+		return ret;
+	}
+
+	ret = qm_read_reg(qm, QM_IFC_INT_MASK, &vf_data->ifc_int_mask, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_IFC_INT_MASK\n");
+		return ret;
+	}
+
+	ret = qm_read_reg(qm, QM_IFC_INT_SET_V, &vf_data->ifc_int_set, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_IFC_INT_SET_V\n");
+		return ret;
+	}
+
+	ret = qm_read_reg(qm, QM_PAGE_SIZE, &vf_data->page_size, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_PAGE_SIZE\n");
+		return ret;
+	}
+
+	/* QM_EQC_DW has 7 regs */
+	ret = qm_read_reg(qm, QM_EQC_DW0, vf_data->qm_eqc_dw, 7);
+	if (ret) {
+		dev_err(dev, "failed to read QM_EQC_DW\n");
+		return ret;
+	}
+
+	/* QM_AEQC_DW has 7 regs */
+	ret = qm_read_reg(qm, QM_AEQC_DW0, vf_data->qm_aeqc_dw, 7);
+	if (ret) {
+		dev_err(dev, "failed to read QM_AEQC_DW\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+static int qm_rw_regs_write(struct hisi_qm *qm, struct acc_vf_data *vf_data)
+{
+	struct device *dev = &qm->pdev->dev;
+	int ret;
+
+	/* check VF state */
+	if (unlikely(qm_wait_mb_ready(qm))) {
+		dev_err(&qm->pdev->dev, "QM device is not ready to write\n");
+		return -EBUSY;
+	}
+
+	ret = qm_write_reg(qm, QM_VF_AEQ_INT_MASK, &vf_data->aeq_int_mask, 1);
+	if (ret) {
+		dev_err(dev, "failed to write QM_VF_AEQ_INT_MASK\n");
+		return ret;
+	}
+
+	ret = qm_write_reg(qm, QM_VF_EQ_INT_MASK, &vf_data->eq_int_mask, 1);
+	if (ret) {
+		dev_err(dev, "failed to write QM_VF_EQ_INT_MASK\n");
+		return ret;
+	}
+
+	ret = qm_write_reg(qm, QM_IFC_INT_SOURCE_V,
+			   &vf_data->ifc_int_source, 1);
+	if (ret) {
+		dev_err(dev, "failed to write QM_IFC_INT_SOURCE_V\n");
+		return ret;
+	}
+
+	ret = qm_write_reg(qm, QM_IFC_INT_MASK, &vf_data->ifc_int_mask, 1);
+	if (ret) {
+		dev_err(dev, "failed to write QM_IFC_INT_MASK\n");
+		return ret;
+	}
+
+	ret = qm_write_reg(qm, QM_IFC_INT_SET_V, &vf_data->ifc_int_set, 1);
+	if (ret) {
+		dev_err(dev, "failed to write QM_IFC_INT_SET_V\n");
+		return ret;
+	}
+
+	ret = qm_write_reg(qm, QM_QUE_ISO_CFG_V, &vf_data->que_iso_cfg, 1);
+	if (ret) {
+		dev_err(dev, "failed to write QM_QUE_ISO_CFG_V\n");
+		return ret;
+	}
+
+	ret = qm_write_reg(qm, QM_PAGE_SIZE, &vf_data->page_size, 1);
+	if (ret) {
+		dev_err(dev, "failed to write QM_PAGE_SIZE\n");
+		return ret;
+	}
+
+	/* QM_EQC_DW has 7 regs */
+	ret = qm_write_reg(qm, QM_EQC_DW0, vf_data->qm_eqc_dw, 7);
+	if (ret) {
+		dev_err(dev, "failed to write QM_EQC_DW\n");
+		return ret;
+	}
+
+	/* QM_AEQC_DW has 7 regs */
+	ret = qm_write_reg(qm, QM_AEQC_DW0, vf_data->qm_aeqc_dw, 7);
+	if (ret) {
+		dev_err(dev, "failed to write QM_AEQC_DW\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+static void qm_db(struct hisi_qm *qm, u16 qn, u8 cmd,
+		  u16 index, u8 priority)
+{
+	u64 doorbell;
+	u64 dbase;
+	u16 randata = 0;
+
+	if (cmd == QM_DOORBELL_CMD_SQ || cmd == QM_DOORBELL_CMD_CQ)
+		dbase = QM_DOORBELL_SQ_CQ_BASE_V2;
+	else
+		dbase = QM_DOORBELL_EQ_AEQ_BASE_V2;
+
+	doorbell = qn | ((u64)cmd << QM_DB_CMD_SHIFT_V2) |
+		   ((u64)randata << QM_DB_RAND_SHIFT_V2) |
+		   ((u64)index << QM_DB_INDEX_SHIFT_V2)	 |
+		   ((u64)priority << QM_DB_PRIORITY_SHIFT_V2);
+
+	writeq(doorbell, qm->io_base + dbase);
+}
+
+static int pf_qm_get_qp_num(struct hisi_qm *qm, int vf_id, u32 *rbase)
+{
+	unsigned int val;
+	u64 sqc_vft;
+	u32 qp_num;
+	int ret;
+
+	ret = readl_relaxed_poll_timeout(qm->io_base + QM_VFT_CFG_RDY, val,
+					 val & BIT(0), MB_POLL_PERIOD_US,
+					 MB_POLL_TIMEOUT_US);
+	if (ret)
+		return ret;
+
+	writel(0x1, qm->io_base + QM_VFT_CFG_OP_WR);
+	/* 0 mean SQC VFT */
+	writel(0x0, qm->io_base + QM_VFT_CFG_TYPE);
+	writel(vf_id, qm->io_base + QM_VFT_CFG);
+
+	writel(0x0, qm->io_base + QM_VFT_CFG_RDY);
+	writel(0x1, qm->io_base + QM_VFT_CFG_OP_ENABLE);
+
+	ret = readl_relaxed_poll_timeout(qm->io_base + QM_VFT_CFG_RDY, val,
+					 val & BIT(0), MB_POLL_PERIOD_US,
+					 MB_POLL_TIMEOUT_US);
+	if (ret)
+		return ret;
+
+	sqc_vft = readl(qm->io_base + QM_VFT_CFG_DATA_L) |
+		  ((u64)readl(qm->io_base + QM_VFT_CFG_DATA_H) <<
+		  QM_XQC_ADDR_OFFSET);
+	*rbase = QM_SQC_VFT_BASE_MASK_V2 &
+		  (sqc_vft >> QM_SQC_VFT_BASE_SHIFT_V2);
+	qp_num = (QM_SQC_VFT_NUM_MASK_V2 &
+		  (sqc_vft >> QM_SQC_VFT_NUM_SHIFT_V2)) + 1;
+
+	return qp_num;
+}
+
+static void qm_dev_cmd_init(struct hisi_qm *qm)
+{
+	/* Clear VF communication status registers. */
+	writel(0x1, qm->io_base + QM_IFC_INT_SOURCE_V);
+
+	/* Enable pf and vf communication. */
+	writel(0x0, qm->io_base + QM_IFC_INT_MASK);
+}
+
+static int vf_qm_cache_wb(struct hisi_qm *qm)
+{
+	unsigned int val;
+
+	writel(0x1, qm->io_base + QM_CACHE_WB_START);
+	if (readl_relaxed_poll_timeout(qm->io_base + QM_CACHE_WB_DONE,
+				       val, val & BIT(0), MB_POLL_PERIOD_US,
+				       MB_POLL_TIMEOUT_US)) {
+		dev_err(&qm->pdev->dev, "vf QM writeback sqc cache fail\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void vf_qm_fun_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+			    struct hisi_qm *qm)
+{
+	int i;
+
+	for (i = 0; i < qm->qp_num; i++)
+		qm_db(qm, i, QM_DOORBELL_CMD_SQ, 0, 1);
+}
+
+static int vf_qm_func_stop(struct hisi_qm *qm)
+{
+	return qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
+}
+
+static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+			     struct hisi_acc_vf_migration_file *migf)
+{
+	struct acc_vf_data *vf_data = &migf->vf_data;
+	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+	struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm;
+	struct device *dev = &vf_qm->pdev->dev;
+	u32 que_iso_state;
+	int ret;
+
+	if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev->match_done)
+		return 0;
+
+	/* vf acc dev type check */
+	if (vf_data->dev_id != hisi_acc_vdev->vf_dev->device) {
+		dev_err(dev, "failed to match VF devices\n");
+		return -EINVAL;
+	}
+
+	/* vf qp num check */
+	ret = qm_get_vft(vf_qm, &vf_qm->qp_base);
+	if (ret <= 0) {
+		dev_err(dev, "failed to get vft qp nums\n");
+		return -EINVAL;
+	}
+
+	if (ret != vf_data->qp_num) {
+		dev_err(dev, "failed to match VF qp num\n");
+		return -EINVAL;
+	}
+
+	vf_qm->qp_num = ret;
+
+	/* vf isolation state check */
+	ret = qm_read_reg(pf_qm, QM_QUE_ISO_CFG_V, &que_iso_state, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_QUE_ISO_CFG_V\n");
+		return ret;
+	}
+
+	if (vf_data->que_iso_cfg != que_iso_state) {
+		dev_err(dev, "failed to match isolation state\n");
+		return ret;
+	}
+
+	hisi_acc_vdev->match_done = 1;
+	return 0;
+}
+
+static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+				struct hisi_acc_vf_migration_file *migf)
+{
+	struct acc_vf_data *vf_data = &migf->vf_data;
+	struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm;
+	struct device *dev = &pf_qm->pdev->dev;
+	int vf_id = hisi_acc_vdev->vf_id;
+	int ret;
+
+	/* save device id */
+	vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
+
+	/* vf qp num save from PF */
+	ret = pf_qm_get_qp_num(pf_qm, vf_id, &vf_data->qp_base);
+	if (ret <= 0) {
+		dev_err(dev, "failed to get vft qp nums!\n");
+		return -EINVAL;
+	}
+
+	vf_data->qp_num = ret;
+
+	/* VF isolation state save from PF */
+	ret = qm_read_reg(pf_qm, QM_QUE_ISO_CFG_V, &vf_data->que_iso_cfg, 1);
+	if (ret) {
+		dev_err(dev, "failed to read QM_QUE_ISO_CFG_V!\n");
+		return ret;
+	}
+
+	migf->total_length = QM_MATCH_SIZE;
+
+	return 0;
+}
+
+static void hisi_acc_vf_disable_fd(struct hisi_acc_vf_migration_file *migf)
+{
+	mutex_lock(&migf->lock);
+	migf->disabled = true;
+	migf->total_length = 0;
+	migf->filp->f_pos = 0;
+	mutex_unlock(&migf->lock);
+}
+
+static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+	struct hisi_acc_vf_migration_file *resuming_migf = &hisi_acc_vdev->resuming_migf;
+	struct hisi_acc_vf_migration_file *saving_migf = &hisi_acc_vdev->saving_migf;
+
+	if (resuming_migf->filp) {
+		hisi_acc_vf_disable_fd(resuming_migf);
+		fput(resuming_migf->filp);
+	}
+
+	if (saving_migf->filp) {
+		hisi_acc_vf_disable_fd(saving_migf);
+		fput(saving_migf->filp);
+	}
+}
+
+static int hisi_acc_vf_load_state(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+				  struct hisi_acc_vf_migration_file *migf)
+{
+	struct hisi_qm *qm = &hisi_acc_vdev->vf_qm;
+	struct device *dev = &qm->pdev->dev;
+	struct acc_vf_data *vf_data = &migf->vf_data;
+	int ret;
+
+	/* Return if only match data was transferred */
+	if (migf->total_length == QM_MATCH_SIZE) {
+		hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
+		return 0;
+	}
+
+	if (migf->total_length < sizeof(struct acc_vf_data))
+		return -EINVAL;
+
+	qm->eqe_dma = vf_data->eqe_dma;
+	qm->aeqe_dma = vf_data->aeqe_dma;
+	qm->sqc_dma = vf_data->sqc_dma;
+	qm->cqc_dma = vf_data->cqc_dma;
+
+	qm->qp_base = vf_data->qp_base;
+	qm->qp_num = vf_data->qp_num;
+
+	ret = qm_rw_regs_write(qm, vf_data);
+	if (ret) {
+		dev_err(dev, "Set VF regs failed\n");
+		return ret;
+	}
+
+	ret = qm_mb(qm, QM_MB_CMD_SQC_BT, qm->sqc_dma, 0, 0);
+	if (ret) {
+		dev_err(dev, "Set sqc failed\n");
+		return ret;
+	}
+
+	ret = qm_mb(qm, QM_MB_CMD_CQC_BT, qm->cqc_dma, 0, 0);
+	if (ret) {
+		dev_err(dev, "Set cqc failed\n");
+		return ret;
+	}
+
+	qm_dev_cmd_init(qm);
+	return 0;
+}
+
+static int hisi_acc_vf_release_file(struct inode *inode, struct file *filp)
+{
+	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+
+	hisi_acc_vf_disable_fd(migf);
+	mutex_destroy(&migf->lock);
+	kfree(migf);
+	return 0;
+}
+
+static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *buf,
+					size_t len, loff_t *pos)
+{
+	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
+			struct hisi_acc_vf_core_device, resuming_migf);
+	loff_t requested_length;
+	u8 *data = (u8 *)&migf->vf_data;
+	ssize_t done = 0;
+	int ret;
+
+	if (pos)
+		return -ESPIPE;
+	pos = &filp->f_pos;
+
+	if (*pos < 0 ||
+	    check_add_overflow((loff_t)len, *pos, &requested_length))
+		return -EINVAL;
+
+	if (requested_length > sizeof(struct acc_vf_data))
+		return -ENOMEM;
+
+	mutex_lock(&migf->lock);
+	if (migf->disabled) {
+		done = -ENODEV;
+		goto out_unlock;
+	}
+
+	ret = copy_from_user(data + *pos, buf, len);
+	if (ret) {
+		done = -EFAULT;
+		goto out_unlock;
+	}
+	*pos += len;
+	done = len;
+	migf->total_length += len;
+
+	ret = vf_qm_check_match(hisi_acc_vdev, migf);
+	if (ret)
+		done = -EFAULT;
+
+out_unlock:
+	mutex_unlock(&migf->lock);
+	return done;
+}
+
+static const struct file_operations hisi_acc_vf_resume_fops = {
+	.owner = THIS_MODULE,
+	.write = hisi_acc_vf_resume_write,
+	.release = hisi_acc_vf_release_file,
+	.llseek = no_llseek,
+};
+
+static int hisi_acc_vf_pci_resume(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+				  struct hisi_acc_vf_migration_file *migf)
+{
+	migf->filp = anon_inode_getfile("hisi_acc_vf_mig", &hisi_acc_vf_resume_fops, migf,
+					O_WRONLY);
+	if (IS_ERR(migf->filp)) {
+		int err = PTR_ERR(migf->filp);
+
+		return err;
+	}
+
+	stream_open(migf->filp->f_inode, migf->filp);
+	mutex_init(&migf->lock);
+	return 0;
+}
+
+static int hisi_acc_vf_stop_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+				 struct hisi_acc_vf_migration_file *migf)
+{
+	struct acc_vf_data *vf_data = &migf->vf_data;
+	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+	struct device *dev = &vf_qm->pdev->dev;
+	int ret;
+
+	if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
+		/* Update state and return. */
+		hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
+		return 0;
+	}
+
+	ret = vf_qm_cache_wb(vf_qm);
+	if (ret) {
+		dev_err(dev, "failed to writeback QM Cache!\n");
+		return ret;
+	}
+
+	mutex_lock(&migf->lock);
+	ret = qm_rw_regs_read(vf_qm, vf_data);
+	if (ret) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Every reg is 32 bit, the dma address is 64 bit. */
+	vf_data->eqe_dma = vf_data->qm_eqc_dw[2];
+	vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
+	vf_data->eqe_dma |= vf_data->qm_eqc_dw[1];
+	vf_data->aeqe_dma = vf_data->qm_aeqc_dw[2];
+	vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
+	vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[1];
+
+	/* Through SQC_BT/CQC_BT to get sqc and cqc address */
+	ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
+	if (ret) {
+		dev_err(dev, "failed to read SQC addr!\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = qm_get_cqc(vf_qm, &vf_data->cqc_dma);
+	if (ret) {
+		dev_err(dev, "failed to read CQC addr!\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	migf->total_length = sizeof(struct acc_vf_data);
+out:
+	mutex_unlock(&migf->lock);
+	return ret;
+}
+
+static int hisi_acc_vf_stop_device(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+	struct device *dev = &hisi_acc_vdev->vf_dev->dev;
+	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+	int ret;
+
+	ret = vf_qm_func_stop(vf_qm);
+	if (ret) {
+		dev_err(dev, "failed to stop QM VF function!\n");
+		return ret;
+	}
+
+	ret = qm_check_int_state(hisi_acc_vdev);
+	if (ret) {
+		dev_err(dev, "failed to check QM INT state!\n");
+		return ret;
+	}
+	return 0;
+}
+
+static void hisi_acc_vf_start_device(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+
+	if (hisi_acc_vdev->vf_qm_state != QM_READY)
+		return;
+
+	vf_qm_fun_reset(hisi_acc_vdev, vf_qm);
+}
+
+static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
+				       unsigned int cmd, unsigned long arg)
+{
+	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
+			struct hisi_acc_vf_core_device, saving_migf);
+	loff_t *pos = &filp->f_pos;
+	struct vfio_precopy_info info;
+	unsigned long minsz;
+	int ret;
+
+	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
+		return -ENOTTY;
+
+	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
+
+	if (copy_from_user(&info, (void __user *)arg, minsz))
+		return -EFAULT;
+	if (info.argsz < minsz)
+		return -EINVAL;
+
+	mutex_lock(&hisi_acc_vdev->state_mutex);
+	if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
+		mutex_unlock(&hisi_acc_vdev->state_mutex);
+		return -EINVAL;
+	}
+
+	mutex_lock(&migf->lock);
+
+	if (migf->disabled) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	if (*pos > migf->total_length) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	info.dirty_bytes = 0;
+	info.initial_bytes = migf->total_length - *pos;
+
+	ret = copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+out:
+	mutex_unlock(&migf->lock);
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
+	return ret;
+}
+
+static ssize_t hisi_acc_vf_save_read(struct file *filp, char __user *buf, size_t len,
+				     loff_t *pos)
+{
+	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	ssize_t done = 0;
+	int ret;
+
+	if (pos)
+		return -ESPIPE;
+	pos = &filp->f_pos;
+
+	mutex_lock(&migf->lock);
+	if (*pos > migf->total_length) {
+		done = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (migf->disabled) {
+		done = -ENODEV;
+		goto out_unlock;
+	}
+
+	/*
+	 * Check whether dev has reached temporary end of stream in PRE_COPY state
+	 * and return ENOMSG on any further read in this state.
+	 */
+	if (migf->total_length == QM_MATCH_SIZE && *pos == migf->total_length)
+		return -ENOMSG;
+
+	len = min_t(size_t, migf->total_length - *pos, len);
+	if (len) {
+		u8 *data = (u8 *)&migf->vf_data;
+
+		ret = copy_to_user(buf, data + *pos, len);
+		if (ret) {
+			done = -EFAULT;
+			goto out_unlock;
+		}
+		*pos += len;
+		done = len;
+	}
+out_unlock:
+	mutex_unlock(&migf->lock);
+	return done;
+}
+
+static const struct file_operations hisi_acc_vf_save_fops = {
+	.owner = THIS_MODULE,
+	.read = hisi_acc_vf_save_read,
+	.unlocked_ioctl = hisi_acc_vf_save_unl_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
+	.release = hisi_acc_vf_release_file,
+	.llseek = no_llseek,
+};
+
+static int hisi_acc_vf_pre_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+				struct hisi_acc_vf_migration_file *migf)
+{
+	int ret;
+
+	migf->filp = anon_inode_getfile("hisi_acc_vf_mig", &hisi_acc_vf_save_fops, migf,
+					O_RDONLY);
+	if (IS_ERR(migf->filp)) {
+		int err = PTR_ERR(migf->filp);
+
+		return err;
+	}
+
+	stream_open(migf->filp->f_inode, migf->filp);
+	mutex_init(&migf->lock);
+
+	ret = vf_qm_get_match_data(hisi_acc_vdev, migf);
+	if (ret) {
+		fput(migf->filp);
+		return ret;
+	}
+
+	return 0;
+}
+
+static struct file *
+hisi_acc_vf_set_device_state(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+			     u32 new)
+{
+	u32 cur = hisi_acc_vdev->mig_state;
+	int ret;
+
+	if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_PRE_COPY) {
+		struct hisi_acc_vf_migration_file *migf = &hisi_acc_vdev->saving_migf;
+
+		ret = hisi_acc_vf_pre_copy(hisi_acc_vdev, migf);
+		if (ret)
+			return ERR_PTR(ret);
+		get_file(migf->filp);
+		return migf->filp;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_STOP_COPY) {
+		struct hisi_acc_vf_migration_file *migf = &hisi_acc_vdev->saving_migf;
+
+		ret = hisi_acc_vf_stop_device(hisi_acc_vdev);
+		if (ret)
+			return ERR_PTR(ret);
+
+		ret = hisi_acc_vf_stop_copy(hisi_acc_vdev, migf);
+		if (ret)
+			return ERR_PTR(ret);
+		get_file(migf->filp);
+		return migf->filp;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_STOP) {
+		ret = hisi_acc_vf_stop_device(hisi_acc_vdev);
+		if (ret)
+			return ERR_PTR(ret);
+		return NULL;
+	}
+
+	if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP)) {
+		hisi_acc_vf_disable_fds(hisi_acc_vdev);
+		return NULL;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) {
+		struct hisi_acc_vf_migration_file *migf = &hisi_acc_vdev->resuming_migf;
+
+		ret = hisi_acc_vf_pci_resume(hisi_acc_vdev, migf);
+		if (ret)
+			return ERR_PTR(ret);
+		get_file(migf->filp);
+		return migf->filp;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
+		struct hisi_acc_vf_migration_file *migf = &hisi_acc_vdev->resuming_migf;
+
+		ret = hisi_acc_vf_load_state(hisi_acc_vdev, migf);
+		if (ret)
+			return ERR_PTR(ret);
+		hisi_acc_vf_disable_fds(hisi_acc_vdev);
+		return NULL;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_RUNNING) {
+		hisi_acc_vf_disable_fds(hisi_acc_vdev);
+		return NULL;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_STOP  && new == VFIO_DEVICE_STATE_RUNNING) {
+		hisi_acc_vf_start_device(hisi_acc_vdev);
+		return NULL;
+	}
+
+	/*
+	 * vfio_mig_get_next_state() does not use arcs other than the above
+	 */
+
+	WARN_ON(true);
+	return ERR_PTR(-EINVAL);
+}
+
+static struct file *
+hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
+				   enum vfio_device_mig_state new_state)
+{
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
+			struct hisi_acc_vf_core_device, core_device.vdev);
+	enum vfio_device_mig_state next_state;
+	struct file *res = NULL;
+	int ret;
+
+	mutex_lock(&hisi_acc_vdev->state_mutex);
+	while (new_state != hisi_acc_vdev->mig_state) {
+		ret = vfio_mig_get_next_state(vdev,
+					      hisi_acc_vdev->mig_state,
+					      new_state, &next_state);
+		if (ret) {
+			res = ERR_PTR(-EINVAL);
+			break;
+		}
+
+		res = hisi_acc_vf_set_device_state(hisi_acc_vdev, next_state);
+		if (IS_ERR(res))
+			break;
+		hisi_acc_vdev->mig_state = next_state;
+		if (WARN_ON(res && new_state != hisi_acc_vdev->mig_state)) {
+			fput(res);
+			res = ERR_PTR(-EINVAL);
+			break;
+		}
+	}
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
+	return res;
+}
+
+static int
+hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
+				   enum vfio_device_mig_state *curr_state)
+{
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
+			struct hisi_acc_vf_core_device, core_device.vdev);
+
+	mutex_lock(&hisi_acc_vdev->state_mutex);
+	*curr_state = hisi_acc_vdev->mig_state;
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
+	return 0;
+}
+
+static int hisi_acc_vf_qm_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+	struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
+	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+	struct pci_dev *vf_dev = vdev->pdev;
+
+	/*
+	 * ACC VF dev BAR2 region consists of both functional register space
+	 * and migration control register space. For migration to work, we
+	 * need access to both. Hence, we map the entire BAR2 region here.
+	 * But from a security point of view, we restrict access to the
+	 * migration control space from Guest(Please see mmap/ioctl/read/write
+	 * override functions).
+	 *
+	 * Also the HiSilicon ACC VF devices supported by this driver on
+	 * HiSilicon hardware platforms are integrated end point devices
+	 * and has no capability to perform PCIe P2P.
+	 */
+
+	vf_qm->io_base =
+		ioremap(pci_resource_start(vf_dev, VFIO_PCI_BAR2_REGION_INDEX),
+			pci_resource_len(vf_dev, VFIO_PCI_BAR2_REGION_INDEX));
+	if (!vf_qm->io_base)
+		return -EIO;
+
+	vf_qm->fun_type = QM_HW_VF;
+	vf_qm->pdev = vf_dev;
+	mutex_init(&vf_qm->mailbox_lock);
+
+	return 0;
+}
+
+static struct hisi_qm *hisi_acc_get_pf_qm(struct pci_dev *pdev)
+{
+	struct hisi_qm	*pf_qm;
+	struct pci_driver *pf_driver;
+
+	if (!pdev->is_virtfn)
+		return NULL;
+
+	switch (pdev->device) {
+	case PCI_DEVICE_ID_HUAWEI_SEC_VF:
+		pf_driver = hisi_sec_get_pf_driver();
+		break;
+	case PCI_DEVICE_ID_HUAWEI_HPRE_VF:
+		pf_driver = hisi_hpre_get_pf_driver();
+		break;
+	case PCI_DEVICE_ID_HUAWEI_ZIP_VF:
+		pf_driver = hisi_zip_get_pf_driver();
+		break;
+	default:
+		return NULL;
+	}
+
+	if (!pf_driver)
+		return NULL;
+
+	pf_qm = pci_iov_get_pf_drvdata(pdev, pf_driver);
+
+	return !IS_ERR(pf_qm) ? pf_qm : NULL;
+}
 
 static int hisi_acc_pci_rw_access_check(struct vfio_device *core_vdev,
 					size_t count, loff_t *ppos,
@@ -128,23 +1161,42 @@ static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int
 
 static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(core_vdev, struct vfio_pci_core_device, vdev);
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
+			struct hisi_acc_vf_core_device, core_device.vdev);
+	struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
 	int ret;
 
 	ret = vfio_pci_core_enable(vdev);
 	if (ret)
 		return ret;
 
-	vfio_pci_core_finish_enable(vdev);
+	if (core_vdev->ops->migration_set_state) {
+		ret = hisi_acc_vf_qm_init(hisi_acc_vdev);
+		if (ret) {
+			vfio_pci_core_disable(vdev);
+			return ret;
+		}
+		hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+	}
 
+	vfio_pci_core_finish_enable(vdev);
 	return 0;
 }
 
+static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
+{
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
+			struct hisi_acc_vf_core_device, core_device.vdev);
+	struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+
+	iounmap(vf_qm->io_base);
+	vfio_pci_core_close_device(core_vdev);
+}
+
 static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
 	.name = "hisi-acc-vfio-pci-migration",
 	.open_device = hisi_acc_vfio_pci_open_device,
-	.close_device = vfio_pci_core_close_device,
+	.close_device = hisi_acc_vfio_pci_close_device,
 	.ioctl = hisi_acc_vfio_pci_ioctl,
 	.device_feature = vfio_pci_core_ioctl_feature,
 	.read = hisi_acc_vfio_pci_read,
@@ -152,6 +1204,8 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
 	.mmap = hisi_acc_vfio_pci_mmap,
 	.request = vfio_pci_core_request,
 	.match = vfio_pci_core_match,
+	.migration_set_state = hisi_acc_vfio_pci_set_device_state,
+	.migration_get_state = hisi_acc_vfio_pci_get_device_state,
 };
 
 static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -167,38 +1221,76 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
 	.match = vfio_pci_core_match,
 };
 
+static int
+hisi_acc_vfio_pci_migrn_init(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+			     struct pci_dev *pdev, struct hisi_qm *pf_qm)
+{
+	int vf_id;
+
+	vf_id = pci_iov_vf_id(pdev);
+	if (vf_id < 0)
+		return vf_id;
+
+	hisi_acc_vdev->vf_id = vf_id + 1;
+	/*
+	 * We set _PRE_COPY here for an early check on compatibility between
+	 * src and dst devices.
+	 */
+	hisi_acc_vdev->core_device.vdev.migration_flags =
+			(VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY);
+	hisi_acc_vdev->pf_qm = pf_qm;
+	hisi_acc_vdev->vf_dev = pdev;
+	mutex_init(&hisi_acc_vdev->state_mutex);
+
+	return 0;
+}
+
 static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
-	struct vfio_pci_core_device *vdev;
+	struct hisi_acc_vf_core_device *hisi_acc_vdev;
+	struct hisi_qm *pf_qm;
 	int ret;
 
-	vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
-	if (!vdev)
+	hisi_acc_vdev = kzalloc(sizeof(*hisi_acc_vdev), GFP_KERNEL);
+	if (!hisi_acc_vdev)
 		return -ENOMEM;
 
-	vfio_pci_core_init_device(vdev, pdev, &hisi_acc_vfio_pci_ops);
+	pf_qm = hisi_acc_get_pf_qm(pdev);
+	if (pf_qm && pf_qm->ver >= QM_HW_V3) {
+		ret = hisi_acc_vfio_pci_migrn_init(hisi_acc_vdev, pdev, pf_qm);
+		if (!ret) {
+			vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
+						  &hisi_acc_vfio_pci_migrn_ops);
+		} else {
+			pci_warn(pdev, "migration support failed, continue with generic interface\n");
+			vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
+						  &hisi_acc_vfio_pci_ops);
+		}
+	} else {
+		vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
+					  &hisi_acc_vfio_pci_ops);
+	}
 
-	ret = vfio_pci_core_register_device(vdev);
+	ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
 	if (ret)
 		goto out_free;
 
-	dev_set_drvdata(&pdev->dev, vdev);
-
+	dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
 	return 0;
 
 out_free:
-	vfio_pci_core_uninit_device(vdev);
-	kfree(vdev);
+	vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
+	kfree(hisi_acc_vdev);
 	return ret;
 }
 
 static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
 {
-	struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
 
-	vfio_pci_core_unregister_device(vdev);
-	vfio_pci_core_uninit_device(vdev);
-	kfree(vdev);
+	vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
+	vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
+	kfree(hisi_acc_vdev);
 }
 
 static const struct pci_device_id hisi_acc_vfio_pci_table[] = {
@@ -223,4 +1315,4 @@ module_pci_driver(hisi_acc_vfio_pci_driver);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Liu Longfang <liulongfang@huawei.com>");
 MODULE_AUTHOR("Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>");
-MODULE_DESCRIPTION("HiSilicon VFIO PCI - Generic VFIO PCI driver for HiSilicon ACC device family");
+MODULE_DESCRIPTION("HiSilicon VFIO PCI - VFIO PCI driver with live migration support for HiSilicon ACC device family");
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
new file mode 100644
index 000000000000..51bc7e92a776
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2021 HiSilicon Ltd. */
+
+#ifndef HISI_ACC_VFIO_PCI_H
+#define HISI_ACC_VFIO_PCI_H
+
+#include <linux/hisi_acc_qm.h>
+
+#define MB_POLL_PERIOD_US		10
+#define MB_POLL_TIMEOUT_US		1000
+#define QM_CACHE_WB_START		0x204
+#define QM_CACHE_WB_DONE		0x208
+#define QM_MB_CMD_PAUSE_QM		0xe
+#define QM_ABNORMAL_INT_STATUS		0x100008
+#define QM_IFC_INT_STATUS		0x0028
+#define SEC_CORE_INT_STATUS		0x301008
+#define HPRE_HAC_INT_STATUS		0x301800
+#define HZIP_CORE_INT_STATUS		0x3010AC
+#define QM_QUE_ISO_CFG			0x301154
+
+#define QM_VFT_CFG_RDY			0x10006c
+#define QM_VFT_CFG_OP_WR		0x100058
+#define QM_VFT_CFG_TYPE			0x10005c
+#define QM_VFT_CFG			0x100060
+#define QM_VFT_CFG_OP_ENABLE		0x100054
+#define QM_VFT_CFG_DATA_L		0x100064
+#define QM_VFT_CFG_DATA_H		0x100068
+
+#define ERROR_CHECK_TIMEOUT		100
+#define CHECK_DELAY_TIME		100
+
+#define QM_SQC_VFT_BASE_SHIFT_V2	28
+#define QM_SQC_VFT_BASE_MASK_V2		GENMASK(15, 0)
+#define QM_SQC_VFT_NUM_SHIFT_V2		45
+#define QM_SQC_VFT_NUM_MASK_V2		GENMASK(9, 0)
+
+/* RW regs */
+#define QM_REGS_MAX_LEN		7
+#define QM_REG_ADDR_OFFSET	0x0004
+
+#define QM_XQC_ADDR_OFFSET	32U
+#define QM_VF_AEQ_INT_MASK	0x0004
+#define QM_VF_EQ_INT_MASK	0x000c
+#define QM_IFC_INT_SOURCE_V	0x0020
+#define QM_IFC_INT_MASK		0x0024
+#define QM_IFC_INT_SET_V	0x002c
+#define QM_QUE_ISO_CFG_V	0x0030
+#define QM_PAGE_SIZE		0x0034
+
+#define QM_EQC_DW0		0X8000
+#define QM_AEQC_DW0		0X8020
+
+struct acc_vf_data {
+#define QM_MATCH_SIZE 32L
+	/* QM match information */
+	u32 qp_num;
+	u32 dev_id;
+	u32 que_iso_cfg;
+	u32 qp_base;
+	/* QM reserved match information */
+	u32 qm_rsv_state[4];
+
+	/* QM RW regs */
+	u32 aeq_int_mask;
+	u32 eq_int_mask;
+	u32 ifc_int_source;
+	u32 ifc_int_mask;
+	u32 ifc_int_set;
+	u32 page_size;
+
+	/* QM_EQC_DW has 7 regs */
+	u32 qm_eqc_dw[7];
+
+	/* QM_AEQC_DW has 7 regs */
+	u32 qm_aeqc_dw[7];
+
+	/* QM reserved 5 regs */
+	u32 qm_rsv_regs[5];
+
+	/* qm memory init information */
+	u64 eqe_dma;
+	u64 aeqe_dma;
+	u64 sqc_dma;
+	u64 cqc_dma;
+};
+
+struct hisi_acc_vf_migration_file {
+	struct file *filp;
+	struct mutex lock;
+	bool disabled;
+
+	struct acc_vf_data vf_data;
+	size_t total_length;
+};
+
+struct hisi_acc_vf_core_device {
+	struct vfio_pci_core_device core_device;
+	u8 match_done:1;
+	/* for migration state */
+	struct mutex state_mutex;
+	enum vfio_device_mig_state mig_state;
+	struct pci_dev *pf_dev;
+	struct pci_dev *vf_dev;
+	struct hisi_qm *pf_qm;
+	struct hisi_qm vf_qm;
+	u32 vf_qm_state;
+	int vf_id;
+
+	struct hisi_acc_vf_migration_file resuming_migf;
+	struct hisi_acc_vf_migration_file saving_migf;
+};
+#endif /* HISI_ACC_VFIO_PCI_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 10/10] hisi_acc_vfio_pci: Use its own PCI reset_done error handler
  2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
                   ` (8 preceding siblings ...)
  2022-03-02 17:29 ` [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration Shameer Kolothum
@ 2022-03-02 17:29 ` Shameer Kolothum
  9 siblings, 0 replies; 32+ messages in thread
From: Shameer Kolothum @ 2022-03-02 17:29 UTC (permalink / raw)
  To: kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

Register private handler for pci_error_handlers.reset_done and update
state accordingly.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 61 +++++++++++++++++--
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  4 +-
 2 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index f733aa0b1a5b..906690e57020 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -515,6 +515,27 @@ static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vde
 	}
 }
 
+/*
+ * This function is called in all state_mutex unlock cases to
+ * handle a 'deferred_reset' if exists.
+ */
+static void
+hisi_acc_vf_state_mutex_unlock(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+again:
+	spin_lock(&hisi_acc_vdev->reset_lock);
+	if (hisi_acc_vdev->deferred_reset) {
+		hisi_acc_vdev->deferred_reset = false;
+		spin_unlock(&hisi_acc_vdev->reset_lock);
+		hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
+		hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+		hisi_acc_vf_disable_fds(hisi_acc_vdev);
+		goto again;
+	}
+	mutex_unlock(&hisi_acc_vdev->state_mutex);
+	spin_unlock(&hisi_acc_vdev->reset_lock);
+}
+
 static int hisi_acc_vf_load_state(struct hisi_acc_vf_core_device *hisi_acc_vdev,
 				  struct hisi_acc_vf_migration_file *migf)
 {
@@ -750,7 +771,7 @@ static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
 
 	mutex_lock(&hisi_acc_vdev->state_mutex);
 	if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
-		mutex_unlock(&hisi_acc_vdev->state_mutex);
+		hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
 		return -EINVAL;
 	}
 
@@ -772,7 +793,7 @@ static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
 	ret = copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
 out:
 	mutex_unlock(&migf->lock);
-	mutex_unlock(&hisi_acc_vdev->state_mutex);
+	hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
 	return ret;
 }
 
@@ -967,7 +988,7 @@ hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
 			break;
 		}
 	}
-	mutex_unlock(&hisi_acc_vdev->state_mutex);
+	hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
 	return res;
 }
 
@@ -980,10 +1001,35 @@ hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
 
 	mutex_lock(&hisi_acc_vdev->state_mutex);
 	*curr_state = hisi_acc_vdev->mig_state;
-	mutex_unlock(&hisi_acc_vdev->state_mutex);
+	hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
 	return 0;
 }
 
+static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
+{
+	struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
+
+	if (hisi_acc_vdev->core_device.vdev.migration_flags !=
+				VFIO_MIGRATION_STOP_COPY)
+		return;
+
+	/*
+	 * As the higher VFIO layers are holding locks across reset and using
+	 * those same locks with the mm_lock we need to prevent ABBA deadlock
+	 * with the state_mutex and mm_lock.
+	 * In case the state_mutex was taken already we defer the cleanup work
+	 * to the unlock flow of the other running context.
+	 */
+	spin_lock(&hisi_acc_vdev->reset_lock);
+	hisi_acc_vdev->deferred_reset = true;
+	if (!mutex_trylock(&hisi_acc_vdev->state_mutex)) {
+		spin_unlock(&hisi_acc_vdev->reset_lock);
+		return;
+	}
+	spin_unlock(&hisi_acc_vdev->reset_lock);
+	hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
+}
+
 static int hisi_acc_vf_qm_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
 {
 	struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
@@ -1302,12 +1348,17 @@ static const struct pci_device_id hisi_acc_vfio_pci_table[] = {
 
 MODULE_DEVICE_TABLE(pci, hisi_acc_vfio_pci_table);
 
+static const struct pci_error_handlers hisi_acc_vf_err_handlers = {
+	.reset_done = hisi_acc_vf_pci_aer_reset_done,
+	.error_detected = vfio_pci_core_aer_err_detected,
+};
+
 static struct pci_driver hisi_acc_vfio_pci_driver = {
 	.name = KBUILD_MODNAME,
 	.id_table = hisi_acc_vfio_pci_table,
 	.probe = hisi_acc_vfio_pci_probe,
 	.remove = hisi_acc_vfio_pci_remove,
-	.err_handler = &vfio_pci_core_err_handlers,
+	.err_handler = &hisi_acc_vf_err_handlers,
 };
 
 module_pci_driver(hisi_acc_vfio_pci_driver);
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
index 51bc7e92a776..6c18f7c74f34 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -96,6 +96,7 @@ struct hisi_acc_vf_migration_file {
 struct hisi_acc_vf_core_device {
 	struct vfio_pci_core_device core_device;
 	u8 match_done:1;
+	u8 deferred_reset:1;
 	/* for migration state */
 	struct mutex state_mutex;
 	enum vfio_device_mig_state mig_state;
@@ -105,7 +106,8 @@ struct hisi_acc_vf_core_device {
 	struct hisi_qm vf_qm;
 	u32 vf_qm_state;
 	int vf_id;
-
+	/* for reset handler */
+	spinlock_t reset_lock;
 	struct hisi_acc_vf_migration_file resuming_migf;
 	struct hisi_acc_vf_migration_file saving_migf;
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux
  2022-03-02 17:28 ` [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux Shameer Kolothum
@ 2022-03-02 18:48   ` John Garry
  2022-03-02 20:39     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2022-03-02 18:48 UTC (permalink / raw)
  To: Shameer Kolothum, kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

On 02/03/2022 17:28, Shameer Kolothum wrote:
> Since we are going to introduce VFIO PCI HiSilicon ACC
> driver for live migration in subsequent patches, move
> the ACC QM header file to a common include dir.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   drivers/crypto/hisilicon/hpre/hpre.h                         | 2 +-
>   drivers/crypto/hisilicon/qm.c                                | 2 +-
>   drivers/crypto/hisilicon/sec2/sec.h                          | 2 +-
>   drivers/crypto/hisilicon/sgl.c                               | 2 +-
>   drivers/crypto/hisilicon/zip/zip.h                           | 2 +-
>   drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h | 0

include/linux/crypto seems a better location. I'm not sure if someone 
suggested a location already, though.

>   6 files changed, 5 insertions(+), 5 deletions(-)
>   rename drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h (100%)
> 
> diff --git a/drivers/crypto/hisilicon/hpre/hpre.h b/drivers/crypto/hisilicon/hpre/hpre.h
> index e0b4a1982ee9..9a0558ed82f9 100644
> --- a/drivers/crypto/hisilicon/hpre/hpre.h
> +++ b/drivers/crypto/hisilicon/hpre/hpre.h
> @@ -4,7 +4,7 @@
>   #define __HISI_HPRE_H
>   
>   #include <linux/list.h>
> -#include "../qm.h"
> +#include <linux/hisi_acc_qm.h>
>   
>   #define HPRE_SQE_SIZE			sizeof(struct hpre_sqe)
>   #define HPRE_PF_DEF_Q_NUM		64
> diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
> index c5b84a5ea350..ed23e1d3fa27 100644
> --- a/drivers/crypto/hisilicon/qm.c
> +++ b/drivers/crypto/hisilicon/qm.c
> @@ -15,7 +15,7 @@
>   #include <linux/uacce.h>
>   #include <linux/uaccess.h>
>   #include <uapi/misc/uacce/hisi_qm.h>
> -#include "qm.h"
> +#include <linux/hisi_acc_qm.h>
>   
>   /* eq/aeq irq enable */
>   #define QM_VF_AEQ_INT_SOURCE		0x0
> diff --git a/drivers/crypto/hisilicon/sec2/sec.h b/drivers/crypto/hisilicon/sec2/sec.h
> index d97cf02b1df7..c2e9b01187a7 100644
> --- a/drivers/crypto/hisilicon/sec2/sec.h
> +++ b/drivers/crypto/hisilicon/sec2/sec.h
> @@ -4,7 +4,7 @@
>   #ifndef __HISI_SEC_V2_H
>   #define __HISI_SEC_V2_H
>   
> -#include "../qm.h"
> +#include <linux/hisi_acc_qm.h>
>   #include "sec_crypto.h"
>   
>   /* Algorithm resource per hardware SEC queue */
> diff --git a/drivers/crypto/hisilicon/sgl.c b/drivers/crypto/hisilicon/sgl.c
> index 057273769f26..534687401135 100644
> --- a/drivers/crypto/hisilicon/sgl.c
> +++ b/drivers/crypto/hisilicon/sgl.c
> @@ -3,7 +3,7 @@
>   #include <linux/dma-mapping.h>
>   #include <linux/module.h>
>   #include <linux/slab.h>
> -#include "qm.h"
> +#include <linux/hisi_acc_qm.h>

alphabetic ordering (ignoring previous point)?

>   
>   #define HISI_ACC_SGL_SGE_NR_MIN		1
>   #define HISI_ACC_SGL_NR_MAX		256
> diff --git a/drivers/crypto/hisilicon/zip/zip.h b/drivers/crypto/hisilicon/zip/zip.h
> index 517fdbdff3ea..3dfd3bac5a33 100644
> --- a/drivers/crypto/hisilicon/zip/zip.h
> +++ b/drivers/crypto/hisilicon/zip/zip.h
> @@ -7,7 +7,7 @@
>   #define pr_fmt(fmt)	"hisi_zip: " fmt
>   
>   #include <linux/list.h>
> -#include "../qm.h"
> +#include <linux/hisi_acc_qm.h>
>   
>   enum hisi_zip_error_type {
>   	/* negative compression */
> diff --git a/drivers/crypto/hisilicon/qm.h b/include/linux/hisi_acc_qm.h
> similarity index 100%
> rename from drivers/crypto/hisilicon/qm.h
> rename to include/linux/hisi_acc_qm.h


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to common header
  2022-03-02 17:28 ` [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to common header Shameer Kolothum
@ 2022-03-02 18:55   ` John Garry
  2022-03-02 20:41     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2022-03-02 18:55 UTC (permalink / raw)
  To: Shameer Kolothum, kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

On 02/03/2022 17:28, Shameer Kolothum wrote:
> From: Longfang Liu <liulongfang@huawei.com>
> 
> Move Doorbell and Mailbox definitions to common header
> file. Also export QM mailbox functions.
> 
> This will be useful when we introduce VFIO PCI HiSilicon
> ACC live migration driver.
> 
> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   drivers/crypto/hisilicon/qm.c | 32 +++++------------------------
>   include/linux/hisi_acc_qm.h   | 38 +++++++++++++++++++++++++++++++++++
>   2 files changed, 43 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
> index ed23e1d3fa27..8c29f9fba573 100644
> --- a/drivers/crypto/hisilicon/qm.c
> +++ b/drivers/crypto/hisilicon/qm.c
> @@ -33,23 +33,6 @@
>   #define QM_ABNORMAL_EVENT_IRQ_VECTOR	3
>   
>   /* mailbox */
> -#define QM_MB_CMD_SQC			0x0
> -#define QM_MB_CMD_CQC			0x1
> -#define QM_MB_CMD_EQC			0x2
> -#define QM_MB_CMD_AEQC			0x3
> -#define QM_MB_CMD_SQC_BT		0x4
> -#define QM_MB_CMD_CQC_BT		0x5
> -#define QM_MB_CMD_SQC_VFT_V2		0x6
> -#define QM_MB_CMD_STOP_QP		0x8
> -#define QM_MB_CMD_SRC			0xc
> -#define QM_MB_CMD_DST			0xd
> -
> -#define QM_MB_CMD_SEND_BASE		0x300
> -#define QM_MB_EVENT_SHIFT		8
> -#define QM_MB_BUSY_SHIFT		13
> -#define QM_MB_OP_SHIFT			14
> -#define QM_MB_CMD_DATA_ADDR_L		0x304
> -#define QM_MB_CMD_DATA_ADDR_H		0x308
>   #define QM_MB_PING_ALL_VFS		0xffff
>   #define QM_MB_CMD_DATA_SHIFT		32
>   #define QM_MB_CMD_DATA_MASK		GENMASK(31, 0)
> @@ -103,19 +86,12 @@
>   #define QM_DB_CMD_SHIFT_V1		16
>   #define QM_DB_INDEX_SHIFT_V1		32
>   #define QM_DB_PRIORITY_SHIFT_V1		48
> -#define QM_DOORBELL_SQ_CQ_BASE_V2	0x1000
> -#define QM_DOORBELL_EQ_AEQ_BASE_V2	0x2000
>   #define QM_QUE_ISO_CFG_V		0x0030
>   #define QM_PAGE_SIZE			0x0034
>   #define QM_QUE_ISO_EN			0x100154
>   #define QM_CAPBILITY			0x100158
>   #define QM_QP_NUN_MASK			GENMASK(10, 0)
>   #define QM_QP_DB_INTERVAL		0x10000
> -#define QM_QP_MAX_NUM_SHIFT		11
> -#define QM_DB_CMD_SHIFT_V2		12
> -#define QM_DB_RAND_SHIFT_V2		16
> -#define QM_DB_INDEX_SHIFT_V2		32
> -#define QM_DB_PRIORITY_SHIFT_V2		48
>   
>   #define QM_MEM_START_INIT		0x100040
>   #define QM_MEM_INIT_DONE		0x100044
> @@ -693,7 +669,7 @@ static void qm_mb_pre_init(struct qm_mailbox *mailbox, u8 cmd,
>   }
>   
>   /* return 0 mailbox ready, -ETIMEDOUT hardware timeout */
> -static int qm_wait_mb_ready(struct hisi_qm *qm)
> +int qm_wait_mb_ready(struct hisi_qm *qm)
>   {
>   	u32 val;
>   
> @@ -701,6 +677,7 @@ static int qm_wait_mb_ready(struct hisi_qm *qm)
>   					  val, !((val >> QM_MB_BUSY_SHIFT) &
>   					  0x1), POLL_PERIOD, POLL_TIMEOUT);
>   }
> +EXPORT_SYMBOL_GPL(qm_wait_mb_ready);

Since these will be public they require a more distinctive name, like 
hisi_qm_wait_mb_ready or hisi_acc_qm_wait_mb_ready

>   
>   /* 128 bit should be written to hardware at one time to trigger a mailbox */
>   static void qm_mb_write(struct hisi_qm *qm, const void *src)
> @@ -745,8 +722,8 @@ static int qm_mb_nolock(struct hisi_qm *qm, struct qm_mailbox *mailbox)
>   	return -EBUSY;
>   }
>   
> -static int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
> -		 bool op)
> +int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
> +	  bool op)
>   {
>   	struct qm_mailbox mailbox;
>   	int ret;
> @@ -762,6 +739,7 @@ static int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
>   
>   	return ret;
>   }
> +EXPORT_SYMBOL_GPL(qm_mb);
>   
>   static void qm_db_v1(struct hisi_qm *qm, u16 qn, u8 cmd, u16 index, u8 priority)
>   {
> diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h
> index 3068093229a5..8befb59c6fb3 100644
> --- a/include/linux/hisi_acc_qm.h
> +++ b/include/linux/hisi_acc_qm.h
> @@ -34,6 +34,40 @@
>   #define QM_WUSER_M_CFG_ENABLE		0x1000a8
>   #define WUSER_M_CFG_ENABLE		0xffffffff
>   
> +/* mailbox */
> +#define QM_MB_CMD_SQC                   0x0
> +#define QM_MB_CMD_CQC                   0x1
> +#define QM_MB_CMD_EQC                   0x2
> +#define QM_MB_CMD_AEQC                  0x3
> +#define QM_MB_CMD_SQC_BT                0x4
> +#define QM_MB_CMD_CQC_BT                0x5
> +#define QM_MB_CMD_SQC_VFT_V2            0x6
> +#define QM_MB_CMD_STOP_QP               0x8
> +#define QM_MB_CMD_SRC                   0xc
> +#define QM_MB_CMD_DST                   0xd
> +
> +#define QM_MB_CMD_SEND_BASE		0x300
> +#define QM_MB_EVENT_SHIFT               8
> +#define QM_MB_BUSY_SHIFT		13
> +#define QM_MB_OP_SHIFT			14
> +#define QM_MB_CMD_DATA_ADDR_L		0x304
> +#define QM_MB_CMD_DATA_ADDR_H		0x308
> +#define QM_MB_MAX_WAIT_CNT		6000
> +
> +/* doorbell */
> +#define QM_DOORBELL_CMD_SQ              0
> +#define QM_DOORBELL_CMD_CQ              1
> +#define QM_DOORBELL_CMD_EQ              2
> +#define QM_DOORBELL_CMD_AEQ             3
> +
> +#define QM_DOORBELL_SQ_CQ_BASE_V2	0x1000
> +#define QM_DOORBELL_EQ_AEQ_BASE_V2	0x2000
> +#define QM_QP_MAX_NUM_SHIFT             11
> +#define QM_DB_CMD_SHIFT_V2		12
> +#define QM_DB_RAND_SHIFT_V2		16
> +#define QM_DB_INDEX_SHIFT_V2		32
> +#define QM_DB_PRIORITY_SHIFT_V2		48
> +
>   /* qm cache */
>   #define QM_CACHE_CTL			0x100050
>   #define SQC_CACHE_ENABLE		BIT(0)
> @@ -414,6 +448,10 @@ pci_ers_result_t hisi_qm_dev_slot_reset(struct pci_dev *pdev);
>   void hisi_qm_reset_prepare(struct pci_dev *pdev);
>   void hisi_qm_reset_done(struct pci_dev *pdev);
>   
> +int qm_wait_mb_ready(struct hisi_qm *qm);
> +int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
> +	  bool op);

As above, please notice how everything else has a "hisi" prefix

> +
>   struct hisi_acc_sgl_pool;
>   struct hisi_acc_hw_sgl *hisi_acc_sg_buf_map_to_hw_sgl(struct device *dev,
>   	struct scatterlist *sgl, struct hisi_acc_sgl_pool *pool,


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  2022-03-02 17:28 ` [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices Shameer Kolothum
@ 2022-03-02 19:02   ` John Garry
  2022-03-02 20:32     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2022-03-02 19:02 UTC (permalink / raw)
  To: Shameer Kolothum, kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	linuxarm, liulongfang, prime.zeng, jonathan.cameron, wangzhou1

On 02/03/2022 17:28, Shameer Kolothum wrote:
> +config HISI_ACC_VFIO_PCI
> +	tristate "VFIO PCI support for HiSilicon ACC devices"
> +	depends on (ARM64 && VFIO_PCI_CORE) || (COMPILE_TEST && 64BIT)

This means that we will have HISI_ACC_VFIO_PCI=y for COMPILE_TEST=y and 
64BIT=y and VFIO_PCI_CORE=n, but ...

> +	help
> +	  This provides generic PCI support for HiSilicon ACC devices
> +	  using the VFIO framework.
> +
> +	  If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/hisilicon/Makefile b/drivers/vfio/pci/hisilicon/Makefile
> new file mode 100644
> index 000000000000..c66b3783f2f9
> --- /dev/null
> +++ b/drivers/vfio/pci/hisilicon/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisi-acc-vfio-pci.o
> +hisi-acc-vfio-pci-y := hisi_acc_vfio_pci.o
> +
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> new file mode 100644
> index 000000000000..8129c3457b3b
> --- /dev/null
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -0,0 +1,100 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2021, HiSilicon Ltd.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/eventfd.h>
> +#include <linux/file.h>
> +#include <linux/hisi_acc_qm.h>
> +#include <linux/interrupt.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/vfio.h>
> +#include <linux/vfio_pci_core.h>
> +
> +static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> +{
> +	struct vfio_pci_core_device *vdev =
> +		container_of(core_vdev, struct vfio_pci_core_device, vdev);
> +	int ret;
> +
> +	ret = vfio_pci_core_enable(vdev);
> +	if (ret)
> +		return ret;
> +
> +	vfio_pci_core_finish_enable(vdev);

... there does not seem to be a stub for vfio_pci_core_finish_enable(), 
so I don't think that we compile under the conditions described. I think 
that HISI_ACC_VFIO_PCI should always depends on HISI_ACC_VFIO_PCI

> +
> +	return 0;
> +}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-02 17:29 ` [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY Shameer Kolothum
@ 2022-03-02 20:31   ` Alex Williamson
  2022-03-03  0:05     ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2022-03-02 20:31 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, jgg, cohuck,
	mgurtovoy, yishaih, linuxarm, liulongfang, prime.zeng,
	jonathan.cameron, wangzhou1

On Wed, 2 Mar 2022 17:29:00 +0000
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> The optional PRE_COPY states open the saving data transfer FD before
> reaching STOP_COPY and allows the device to dirty track internal state
> changes with the general idea to reduce the volume of data transferred
> in the STOP_COPY stage.
> 
> While in PRE_COPY the device remains RUNNING, but the saving FD is open.
> 
> Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P,
> which halts P2P transfers while continuing the saving FD.
> 
> PRE_COPY, with P2P support, requires the driver to implement 7 new arcs
> and exists as an optional FSM branch between RUNNING and STOP_COPY:
>     RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY
> 
> A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to
> query the progress of the precopy operation in the driver with the idea it
> will judge to move to STOP_COPY at least once the initial data set is
> transferred, and possibly after the dirty size has shrunk appropriately.
> 
> This ioctl is valid only in VFIO_DEVICE_STATE_PRE_COPY state and kernel
> driver should return -EINVAL from any other migration state.

Nit, PRE_COPY and PRE_COPY_P2P
 
> Compared to the v1 clarification, STOP_COPY -> PRE_COPY is made optional
> and to be defined in future.

It's not really optional, it's explicitly unsupported in this extension.

> While making the whole PRE_COPY feature
> optional eliminates the concern from mlx5, this is still a complicated arc
> to implement and seems prudent to leave it closed until a proper use case
> is developed.

This seems like a leftover from the RFC that could be dropped.

> We also split the pending_bytes report into the initial and
> sustaining values, and define the protocol to get an event via poll() for
> new dirty data during PRE_COPY.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> [Shameer: Renamed ioctl and updated ioctl usage description]
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  drivers/vfio/vfio.c       |  71 +++++++++++++++++++++++-
>  include/uapi/linux/vfio.h | 113 ++++++++++++++++++++++++++++++++++++--
>  2 files changed, 179 insertions(+), 5 deletions(-)
> 
...
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index fea86061b44e..440dbfaabdb3 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -819,12 +819,20 @@ struct vfio_device_feature {
>   * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
>   * is supported in addition to the STOP_COPY states.
>   *
> + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that
> + * PRE_COPY is supported in addition to the STOP_COPY states.
> + *
> + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY
> + * means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported
> + * in addition to the STOP_COPY states.
> + *
>   * Other combinations of flags have behavior to be defined in the future.
>   */
>  struct vfio_device_feature_migration {
>  	__aligned_u64 flags;
>  #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
>  #define VFIO_MIGRATION_P2P		(1 << 1)
> +#define VFIO_MIGRATION_PRE_COPY		(1 << 2)
>  };
>  #define VFIO_DEVICE_FEATURE_MIGRATION 1
>  
> @@ -875,8 +883,13 @@ struct vfio_device_feature_mig_state {
>   *  RESUMING - The device is stopped and is loading a new internal state
>   *  ERROR - The device has failed and must be reset
>   *
> - * And 1 optional state to support VFIO_MIGRATION_P2P:
> + * And optional states to support VFIO_MIGRATION_P2P:
>   *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
> + * And VFIO_MIGRATION_PRE_COPY:
> + *  PRE_COPY - The device is running normally but tracking internal state
> + *             changes
> + * And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY:
> + *  PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA
>   *
>   * The FSM takes actions on the arcs between FSM states. The driver implements
>   * the following behavior for the FSM arcs:
> @@ -908,20 +921,48 @@ struct vfio_device_feature_mig_state {
>   *
>   *   To abort a RESUMING session the device must be reset.
>   *
> + * PRE_COPY -> RUNNING
>   * RUNNING_P2P -> RUNNING
>   *   While in RUNNING the device is fully operational, the device may generate
>   *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
>   *   and the device may advance its internal state.
>   *
> + *   The PRE_COPY arc will terminate a data transfer session.
> + *
> + * PRE_COPY_P2P -> RUNNING_P2P
>   * RUNNING -> RUNNING_P2P
>   * STOP -> RUNNING_P2P
>   *   While in RUNNING_P2P the device is partially running in the P2P quiescent
>   *   state defined below.
>   *
> + *   The PRE_COPY arc will terminate a data transfer session.
> + *
> + * RUNNING -> PRE_COPY
> + * RUNNING_P2P -> PRE_COPY_P2P
>   * STOP -> STOP_COPY
> - *   This arc begin the process of saving the device state and will return a
> - *   new data_fd.
> + *   PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
> + *   which share a data transfer session. Moving between these states alters
> + *   what is streamed in session, but does not terminate or otherwise effect
> + *   the associated fd.
> + *
> + *   These arcs begin the process of saving the device state and will return a
> + *   new data_fd. The migration driver may perform actions such as enabling
> + *   dirty logging of device state when entering PRE_COPY or PER_COPY_P2P.
> + *
> + *   Each arc does not change the device operation, the device remains
> + *   RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
> + *   in PRE_COPY_P2P -> STOP_COPY.
> + *
> + * PRE_COPY -> PRE_COPY_P2P
> + *   Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above.
> + *   However, while in the PRE_COPY_P2P state, the device is partially running
> + *   in the P2P quiescent state defined below, like RUNNING_P2P.
>   *
> + * PRE_COPY_P2P -> PRE_COPY
> + *   This arc allows returning the device to a full RUNNING behavior while
> + *   continuing all the behaviors of PRE_COPY.
> + *
> + * PRE_COPY_P2P -> STOP_COPY
>   *   While in the STOP_COPY state the device has the same behavior as STOP
>   *   with the addition that the data transfers session continues to stream the
>   *   migration state. End of stream on the FD indicates the entire device
> @@ -939,6 +980,13 @@ struct vfio_device_feature_mig_state {
>   *   device state for this arc if required to prepare the device to receive the
>   *   migration data.
>   *
> + * STOP_COPY -> PRE_COPY
> + * STOP_COPY -> PRE_COPY_P2P
> + *   These arcs are not permitted and return error if requested. Future
> + *   revisions of this API may define behaviors for these arcs, in this case
> + *   support will be discoverable by a new flag in
> + *   VFIO_DEVICE_FEATURE_MIGRATION.
> + *
>   * any -> ERROR
>   *   ERROR cannot be specified as a device state, however any transition request
>   *   can be failed with an errno return and may then move the device_state into
> @@ -950,7 +998,7 @@ struct vfio_device_feature_mig_state {
>   * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
>   * state for the device for the purposes of managing multiple devices within a
>   * user context where peer-to-peer DMA between devices may be active. The
> - * RUNNING_P2P states must prevent the device from initiating
> + * RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating
>   * any new P2P DMA transactions. If the device can identify P2P transactions
>   * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
>   * driver must complete any such outstanding operations prior to completing the
> @@ -963,6 +1011,8 @@ struct vfio_device_feature_mig_state {
>   * above FSM arcs. As there are multiple paths through the FSM arcs the path
>   * should be selected based on the following rules:
>   *   - Select the shortest path.
> + *   - The path cannot have saving group states as interior arcs, only
> + *     starting/end states.
>   * Refer to vfio_mig_get_next_state() for the result of the algorithm.
>   *
>   * The automatic transit through the FSM arcs that make up the combination
> @@ -976,6 +1026,9 @@ struct vfio_device_feature_mig_state {
>   * support them. The user can discover if these states are supported by using
>   * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
>   * avoid knowing about these optional states if the kernel driver supports them.
> + *
> + * Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY
> + * is not present.
>   */
>  enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_ERROR = 0,
> @@ -984,8 +1037,60 @@ enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_STOP_COPY = 3,
>  	VFIO_DEVICE_STATE_RESUMING = 4,
>  	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
> +	VFIO_DEVICE_STATE_PRE_COPY = 6,
> +	VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
> +};
> +
> +/**
> + * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
> + *
> + * This ioctl is used on the migration data FD in the precopy phase of the
> + * migration data transfer. It returns an estimate of the current data sizes
> + * remaining to be transferred. It allows the user to judge when it is
> + * appropriate to leave PRE_COPY for STOP_COPY.
> + *
> + * This ioctl is valid only in VFIO_DEVICE_STATE_PRE_COPY state and kernel

PRE_COPY and PRE_COPY_P2P states, maybe we can generally refer to these
as "PRE_COPY states".

> + * driver should return -EINVAL from any other migration state.
> + *
> + * initial_bytes reflects the estimated remaining size of any initial mandatory
> + * precopy data transfer. When initial_bytes returns as zero then the initial
> + * phase of the precopy data is completed. Generally initial_bytes should start
> + * out as approximately the entire device state.

What is "mandatory" intended to mean here?  The user isn't required to
collect any data from the device in the PRE_COPY states.

> + *
> + * dirty_bytes reflects an estimate for how much more data needs to be
> + * transferred to complete the migration. Generally it should start as zero
> + * and increase as internal state is dirtied.

Maybe let's try to combine the descriptions, I'll give it a shot:

"The vfio_precopy_info data structure returned by this ioctl provides
 estimates of data available from the device during the PRE_COPY states.
 This estimate is split into two categories, initial_bytes and
 dirty_bytes.

 The initial_bytes field indicates the amount of static data available
 from the device.  This field should have a non-zero initial value and
 decrease as migration data is read from the device.

 The dirty_bytes field tracks device state changes relative to data
 previously retrieved.  This field starts at zero and may increase as
 the internal device state is modified or decrease as that modified
 state is read from the device.

 Userspace may use the combination of these fields to estimate the
 potential data size available during the PRE_COPY phases, as well as
 trends relative to the rate the device is dirtying it's internal
 state, but these fields are not required to have any bearing relative
 to the data size available during the STOP_COPY phase."

> + *
> + * Drivers should attempt to return estimates so that initial_bytes +
> + * dirty_bytes matches the amount of data an immediate transition to STOP_COPY
> + * will require to be streamed.

I think previous discussions have proven this false, we expect trailing
data that is only available in STOP_COPY, we cannot bound the size of
that data, and dirty_bytes is not intended to expose data that cannot
be retrieved during the PRE_COPY phases.  Thanks,

Alex

> + *
> + * Drivers have a lot of flexibility in when and what they transfer during the
> + * PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO.
> + *
> + * During pre-copy the migration data FD has a temporary "end of stream" that is
> + * reached when both initial_bytes and dirty_byte are zero. For instance, this
> + * may indicate that the device is idle and not currently dirtying any internal
> + * state. When read() is done on this temporary end of stream the kernel driver
> + * should return ENOMSG from read(). Userspace can wait for more data (which may
> + * never come) by using poll.
> + *
> + * Once in STOP_COPY the migration data FD has a permanent end of stream
> + * signaled in the usual way by read() always returning 0 and poll always
> + * returning readable. ENOMSG may not be returned in STOP_COPY. Support
> + * for this ioctl is optional.
> + *
> + * Return: 0 on success, -1 and errno set on failure.
> + */
> +struct vfio_precopy_info {
> +	__u32 argsz;
> +	__u32 flags;
> +	__aligned_u64 initial_bytes;
> +	__aligned_u64 dirty_bytes;
>  };
>  
> +#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  2022-03-02 19:02   ` John Garry
@ 2022-03-02 20:32     ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 32+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-03-02 20:32 UTC (permalink / raw)
  To: John Garry, kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)



> -----Original Message-----
> From: John Garry
> Sent: 02 March 2022 19:03
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org
> Cc: linux-pci@vger.kernel.org; alex.williamson@redhat.com; jgg@nvidia.com;
> cohuck@redhat.com; mgurtovoy@nvidia.com; yishaih@nvidia.com; Linuxarm
> <linuxarm@huawei.com>; liulongfang <liulongfang@huawei.com>; Zengtao (B)
> <prime.zeng@hisilicon.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for
> HiSilicon ACC devices
> 
> On 02/03/2022 17:28, Shameer Kolothum wrote:
> > +config HISI_ACC_VFIO_PCI
> > +	tristate "VFIO PCI support for HiSilicon ACC devices"
> > +	depends on (ARM64 && VFIO_PCI_CORE) || (COMPILE_TEST && 64BIT)
> 
> This means that we will have HISI_ACC_VFIO_PCI=y for COMPILE_TEST=y and
> 64BIT=y and VFIO_PCI_CORE=n, but ...
> 
> > +	help
> > +	  This provides generic PCI support for HiSilicon ACC devices
> > +	  using the VFIO framework.
> > +
> > +	  If you don't know what to do here, say N.
> > diff --git a/drivers/vfio/pci/hisilicon/Makefile
> b/drivers/vfio/pci/hisilicon/Makefile
> > new file mode 100644
> > index 000000000000..c66b3783f2f9
> > --- /dev/null
> > +++ b/drivers/vfio/pci/hisilicon/Makefile
> > @@ -0,0 +1,4 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisi-acc-vfio-pci.o
> > +hisi-acc-vfio-pci-y := hisi_acc_vfio_pci.o
> > +
> > diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > new file mode 100644
> > index 000000000000..8129c3457b3b
> > --- /dev/null
> > +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > @@ -0,0 +1,100 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (c) 2021, HiSilicon Ltd.
> > + */
> > +
> > +#include <linux/device.h>
> > +#include <linux/eventfd.h>
> > +#include <linux/file.h>
> > +#include <linux/hisi_acc_qm.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/vfio.h>
> > +#include <linux/vfio_pci_core.h>
> > +
> > +static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> > +{
> > +	struct vfio_pci_core_device *vdev =
> > +		container_of(core_vdev, struct vfio_pci_core_device, vdev);
> > +	int ret;
> > +
> > +	ret = vfio_pci_core_enable(vdev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	vfio_pci_core_finish_enable(vdev);
> 
> ... there does not seem to be a stub for vfio_pci_core_finish_enable(),
> so I don't think that we compile under the conditions described. I think
> that HISI_ACC_VFIO_PCI should always depends on HISI_ACC_VFIO_PCI
> 

That looks right. I do remember running a compile test with x86_64. May
be VFIO_PCI_CORE was somehow selected.

I will make the "depends on VFIO_PCI_CORE" common.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux
  2022-03-02 18:48   ` John Garry
@ 2022-03-02 20:39     ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 32+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-03-02 20:39 UTC (permalink / raw)
  To: John Garry, kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)



> -----Original Message-----
> From: John Garry
> Sent: 02 March 2022 18:48
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org
> Cc: linux-pci@vger.kernel.org; alex.williamson@redhat.com; jgg@nvidia.com;
> cohuck@redhat.com; mgurtovoy@nvidia.com; yishaih@nvidia.com; Linuxarm
> <linuxarm@huawei.com>; liulongfang <liulongfang@huawei.com>; Zengtao (B)
> <prime.zeng@hisilicon.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to
> include/linux
> 
> On 02/03/2022 17:28, Shameer Kolothum wrote:
> > Since we are going to introduce VFIO PCI HiSilicon ACC
> > driver for live migration in subsequent patches, move
> > the ACC QM header file to a common include dir.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >   drivers/crypto/hisilicon/hpre/hpre.h                         | 2 +-
> >   drivers/crypto/hisilicon/qm.c                                | 2 +-
> >   drivers/crypto/hisilicon/sec2/sec.h                          | 2 +-
> >   drivers/crypto/hisilicon/sgl.c                               | 2 +-
> >   drivers/crypto/hisilicon/zip/zip.h                           | 2 +-
> >   drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h | 0
> 
> include/linux/crypto seems a better location. I'm not sure if someone
> suggested a location already, though.

Hmm...Not sure we need to create "crypto" just for this one.
> 
> >   6 files changed, 5 insertions(+), 5 deletions(-)
> >   rename drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h
> (100%)
> >
> > diff --git a/drivers/crypto/hisilicon/hpre/hpre.h
> b/drivers/crypto/hisilicon/hpre/hpre.h
> > index e0b4a1982ee9..9a0558ed82f9 100644
> > --- a/drivers/crypto/hisilicon/hpre/hpre.h
> > +++ b/drivers/crypto/hisilicon/hpre/hpre.h
> > @@ -4,7 +4,7 @@
> >   #define __HISI_HPRE_H
> >
> >   #include <linux/list.h>
> > -#include "../qm.h"
> > +#include <linux/hisi_acc_qm.h>
> >
> >   #define HPRE_SQE_SIZE			sizeof(struct hpre_sqe)
> >   #define HPRE_PF_DEF_Q_NUM		64
> > diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
> > index c5b84a5ea350..ed23e1d3fa27 100644
> > --- a/drivers/crypto/hisilicon/qm.c
> > +++ b/drivers/crypto/hisilicon/qm.c
> > @@ -15,7 +15,7 @@
> >   #include <linux/uacce.h>
> >   #include <linux/uaccess.h>
> >   #include <uapi/misc/uacce/hisi_qm.h>
> > -#include "qm.h"
> > +#include <linux/hisi_acc_qm.h>
> >
> >   /* eq/aeq irq enable */
> >   #define QM_VF_AEQ_INT_SOURCE		0x0
> > diff --git a/drivers/crypto/hisilicon/sec2/sec.h
> b/drivers/crypto/hisilicon/sec2/sec.h
> > index d97cf02b1df7..c2e9b01187a7 100644
> > --- a/drivers/crypto/hisilicon/sec2/sec.h
> > +++ b/drivers/crypto/hisilicon/sec2/sec.h
> > @@ -4,7 +4,7 @@
> >   #ifndef __HISI_SEC_V2_H
> >   #define __HISI_SEC_V2_H
> >
> > -#include "../qm.h"
> > +#include <linux/hisi_acc_qm.h>
> >   #include "sec_crypto.h"
> >
> >   /* Algorithm resource per hardware SEC queue */
> > diff --git a/drivers/crypto/hisilicon/sgl.c b/drivers/crypto/hisilicon/sgl.c
> > index 057273769f26..534687401135 100644
> > --- a/drivers/crypto/hisilicon/sgl.c
> > +++ b/drivers/crypto/hisilicon/sgl.c
> > @@ -3,7 +3,7 @@
> >   #include <linux/dma-mapping.h>
> >   #include <linux/module.h>
> >   #include <linux/slab.h>
> > -#include "qm.h"
> > +#include <linux/hisi_acc_qm.h>
> 
> alphabetic ordering (ignoring previous point)?

Sure.

Thanks,
Shameer
> >
> >   #define HISI_ACC_SGL_SGE_NR_MIN		1
> >   #define HISI_ACC_SGL_NR_MAX		256
> > diff --git a/drivers/crypto/hisilicon/zip/zip.h
> b/drivers/crypto/hisilicon/zip/zip.h
> > index 517fdbdff3ea..3dfd3bac5a33 100644
> > --- a/drivers/crypto/hisilicon/zip/zip.h
> > +++ b/drivers/crypto/hisilicon/zip/zip.h
> > @@ -7,7 +7,7 @@
> >   #define pr_fmt(fmt)	"hisi_zip: " fmt
> >
> >   #include <linux/list.h>
> > -#include "../qm.h"
> > +#include <linux/hisi_acc_qm.h>
> >
> >   enum hisi_zip_error_type {
> >   	/* negative compression */
> > diff --git a/drivers/crypto/hisilicon/qm.h b/include/linux/hisi_acc_qm.h
> > similarity index 100%
> > rename from drivers/crypto/hisilicon/qm.h
> > rename to include/linux/hisi_acc_qm.h


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to common header
  2022-03-02 18:55   ` John Garry
@ 2022-03-02 20:41     ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 32+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-03-02 20:41 UTC (permalink / raw)
  To: John Garry, kvm, linux-kernel, linux-crypto
  Cc: linux-pci, alex.williamson, jgg, cohuck, mgurtovoy, yishaih,
	Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)



> -----Original Message-----
> From: John Garry
> Sent: 02 March 2022 18:55
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org
> Cc: linux-pci@vger.kernel.org; alex.williamson@redhat.com; jgg@nvidia.com;
> cohuck@redhat.com; mgurtovoy@nvidia.com; yishaih@nvidia.com; Linuxarm
> <linuxarm@huawei.com>; liulongfang <liulongfang@huawei.com>; Zengtao (B)
> <prime.zeng@hisilicon.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to
> common header
> 
> On 02/03/2022 17:28, Shameer Kolothum wrote:
> > From: Longfang Liu <liulongfang@huawei.com>
> >
> > Move Doorbell and Mailbox definitions to common header
> > file. Also export QM mailbox functions.
> >
> > This will be useful when we introduce VFIO PCI HiSilicon
> > ACC live migration driver.
> >
> > Signed-off-by: Longfang Liu <liulongfang@huawei.com>
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com>
> > ---
> >   drivers/crypto/hisilicon/qm.c | 32 +++++------------------------
> >   include/linux/hisi_acc_qm.h   | 38
> +++++++++++++++++++++++++++++++++++
> >   2 files changed, 43 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
> > index ed23e1d3fa27..8c29f9fba573 100644
> > --- a/drivers/crypto/hisilicon/qm.c
> > +++ b/drivers/crypto/hisilicon/qm.c
> > @@ -33,23 +33,6 @@
> >   #define QM_ABNORMAL_EVENT_IRQ_VECTOR	3
> >
> >   /* mailbox */
> > -#define QM_MB_CMD_SQC			0x0
> > -#define QM_MB_CMD_CQC			0x1
> > -#define QM_MB_CMD_EQC			0x2
> > -#define QM_MB_CMD_AEQC			0x3
> > -#define QM_MB_CMD_SQC_BT		0x4
> > -#define QM_MB_CMD_CQC_BT		0x5
> > -#define QM_MB_CMD_SQC_VFT_V2		0x6
> > -#define QM_MB_CMD_STOP_QP		0x8
> > -#define QM_MB_CMD_SRC			0xc
> > -#define QM_MB_CMD_DST			0xd
> > -
> > -#define QM_MB_CMD_SEND_BASE		0x300
> > -#define QM_MB_EVENT_SHIFT		8
> > -#define QM_MB_BUSY_SHIFT		13
> > -#define QM_MB_OP_SHIFT			14
> > -#define QM_MB_CMD_DATA_ADDR_L		0x304
> > -#define QM_MB_CMD_DATA_ADDR_H		0x308
> >   #define QM_MB_PING_ALL_VFS		0xffff
> >   #define QM_MB_CMD_DATA_SHIFT		32
> >   #define QM_MB_CMD_DATA_MASK		GENMASK(31, 0)
> > @@ -103,19 +86,12 @@
> >   #define QM_DB_CMD_SHIFT_V1		16
> >   #define QM_DB_INDEX_SHIFT_V1		32
> >   #define QM_DB_PRIORITY_SHIFT_V1		48
> > -#define QM_DOORBELL_SQ_CQ_BASE_V2	0x1000
> > -#define QM_DOORBELL_EQ_AEQ_BASE_V2	0x2000
> >   #define QM_QUE_ISO_CFG_V		0x0030
> >   #define QM_PAGE_SIZE			0x0034
> >   #define QM_QUE_ISO_EN			0x100154
> >   #define QM_CAPBILITY			0x100158
> >   #define QM_QP_NUN_MASK			GENMASK(10, 0)
> >   #define QM_QP_DB_INTERVAL		0x10000
> > -#define QM_QP_MAX_NUM_SHIFT		11
> > -#define QM_DB_CMD_SHIFT_V2		12
> > -#define QM_DB_RAND_SHIFT_V2		16
> > -#define QM_DB_INDEX_SHIFT_V2		32
> > -#define QM_DB_PRIORITY_SHIFT_V2		48
> >
> >   #define QM_MEM_START_INIT		0x100040
> >   #define QM_MEM_INIT_DONE		0x100044
> > @@ -693,7 +669,7 @@ static void qm_mb_pre_init(struct qm_mailbox
> *mailbox, u8 cmd,
> >   }
> >
> >   /* return 0 mailbox ready, -ETIMEDOUT hardware timeout */
> > -static int qm_wait_mb_ready(struct hisi_qm *qm)
> > +int qm_wait_mb_ready(struct hisi_qm *qm)
> >   {
> >   	u32 val;
> >
> > @@ -701,6 +677,7 @@ static int qm_wait_mb_ready(struct hisi_qm *qm)
> >   					  val, !((val >> QM_MB_BUSY_SHIFT) &
> >   					  0x1), POLL_PERIOD, POLL_TIMEOUT);
> >   }
> > +EXPORT_SYMBOL_GPL(qm_wait_mb_ready);
> 
> Since these will be public they require a more distinctive name, like
> hisi_qm_wait_mb_ready or hisi_acc_qm_wait_mb_ready
> 
> >
> >   /* 128 bit should be written to hardware at one time to trigger a mailbox
> */
> >   static void qm_mb_write(struct hisi_qm *qm, const void *src)
> > @@ -745,8 +722,8 @@ static int qm_mb_nolock(struct hisi_qm *qm, struct
> qm_mailbox *mailbox)
> >   	return -EBUSY;
> >   }
> >
> > -static int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16
> queue,
> > -		 bool op)
> > +int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
> > +	  bool op)
> >   {
> >   	struct qm_mailbox mailbox;
> >   	int ret;
> > @@ -762,6 +739,7 @@ static int qm_mb(struct hisi_qm *qm, u8 cmd,
> dma_addr_t dma_addr, u16 queue,
> >
> >   	return ret;
> >   }
> > +EXPORT_SYMBOL_GPL(qm_mb);
> >
> >   static void qm_db_v1(struct hisi_qm *qm, u16 qn, u8 cmd, u16 index, u8
> priority)
> >   {
> > diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h
> > index 3068093229a5..8befb59c6fb3 100644
> > --- a/include/linux/hisi_acc_qm.h
> > +++ b/include/linux/hisi_acc_qm.h
> > @@ -34,6 +34,40 @@
> >   #define QM_WUSER_M_CFG_ENABLE		0x1000a8
> >   #define WUSER_M_CFG_ENABLE		0xffffffff
> >
> > +/* mailbox */
> > +#define QM_MB_CMD_SQC                   0x0
> > +#define QM_MB_CMD_CQC                   0x1
> > +#define QM_MB_CMD_EQC                   0x2
> > +#define QM_MB_CMD_AEQC                  0x3
> > +#define QM_MB_CMD_SQC_BT                0x4
> > +#define QM_MB_CMD_CQC_BT                0x5
> > +#define QM_MB_CMD_SQC_VFT_V2            0x6
> > +#define QM_MB_CMD_STOP_QP               0x8
> > +#define QM_MB_CMD_SRC                   0xc
> > +#define QM_MB_CMD_DST                   0xd
> > +
> > +#define QM_MB_CMD_SEND_BASE		0x300
> > +#define QM_MB_EVENT_SHIFT               8
> > +#define QM_MB_BUSY_SHIFT		13
> > +#define QM_MB_OP_SHIFT			14
> > +#define QM_MB_CMD_DATA_ADDR_L		0x304
> > +#define QM_MB_CMD_DATA_ADDR_H		0x308
> > +#define QM_MB_MAX_WAIT_CNT		6000
> > +
> > +/* doorbell */
> > +#define QM_DOORBELL_CMD_SQ              0
> > +#define QM_DOORBELL_CMD_CQ              1
> > +#define QM_DOORBELL_CMD_EQ              2
> > +#define QM_DOORBELL_CMD_AEQ             3
> > +
> > +#define QM_DOORBELL_SQ_CQ_BASE_V2	0x1000
> > +#define QM_DOORBELL_EQ_AEQ_BASE_V2	0x2000
> > +#define QM_QP_MAX_NUM_SHIFT             11
> > +#define QM_DB_CMD_SHIFT_V2		12
> > +#define QM_DB_RAND_SHIFT_V2		16
> > +#define QM_DB_INDEX_SHIFT_V2		32
> > +#define QM_DB_PRIORITY_SHIFT_V2		48
> > +
> >   /* qm cache */
> >   #define QM_CACHE_CTL			0x100050
> >   #define SQC_CACHE_ENABLE		BIT(0)
> > @@ -414,6 +448,10 @@ pci_ers_result_t hisi_qm_dev_slot_reset(struct
> pci_dev *pdev);
> >   void hisi_qm_reset_prepare(struct pci_dev *pdev);
> >   void hisi_qm_reset_done(struct pci_dev *pdev);
> >
> > +int qm_wait_mb_ready(struct hisi_qm *qm);
> > +int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
> > +	  bool op);
> 
> As above, please notice how everything else has a "hisi" prefix

Make sense. Will do.

Thanks,
Shameer

> 
> > +
> >   struct hisi_acc_sgl_pool;
> >   struct hisi_acc_hw_sgl *hisi_acc_sg_buf_map_to_hw_sgl(struct device
> *dev,
> >   	struct scatterlist *sgl, struct hisi_acc_sgl_pool *pool,


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-02 20:31   ` Alex Williamson
@ 2022-03-03  0:05     ` Jason Gunthorpe
  2022-03-03  3:47       ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2022-03-03  0:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Shameer Kolothum, kvm, linux-kernel, linux-crypto, linux-pci,
	cohuck, mgurtovoy, yishaih, linuxarm, liulongfang, prime.zeng,
	jonathan.cameron, wangzhou1

On Wed, Mar 02, 2022 at 01:31:59PM -0700, Alex Williamson wrote:
> > + * initial_bytes reflects the estimated remaining size of any initial mandatory
> > + * precopy data transfer. When initial_bytes returns as zero then the initial
> > + * phase of the precopy data is completed. Generally initial_bytes should start
> > + * out as approximately the entire device state.
> 
> What is "mandatory" intended to mean here?  The user isn't required to
> collect any data from the device in the PRE_COPY states.

If the data is split into initial,dirty,trailer then mandatory means
that first chunk.

> "The vfio_precopy_info data structure returned by this ioctl provides
>  estimates of data available from the device during the PRE_COPY states.
>  This estimate is split into two categories, initial_bytes and
>  dirty_bytes.
> 
>  The initial_bytes field indicates the amount of static data available
>  from the device.  This field should have a non-zero initial value and
>  decrease as migration data is read from the device.

static isn't great either, how about just say 'minimum data available'

>  Userspace may use the combination of these fields to estimate the
>  potential data size available during the PRE_COPY phases, as well as
>  trends relative to the rate the device is dirtying it's internal
>  state, but these fields are not required to have any bearing relative
>  to the data size available during the STOP_COPY phase."

That last is too strong. I would just drop starting at but.

The message to communicate is the device should allow dirty_bytes to
reach 0 during the PRE_COPY phases if everything is is idle. Which
tells alot about how to calculate it.

It is all better otherwise

> > + * Drivers should attempt to return estimates so that initial_bytes +
> > + * dirty_bytes matches the amount of data an immediate transition to STOP_COPY
> > + * will require to be streamed.
>
> I think previous discussions have proven this false, we expect trailing
> data that is only available in STOP_COPY, we cannot bound the size of
> that data, and dirty_bytes is not intended to expose data that cannot
> be retrieved during the PRE_COPY phases.  Thanks,

It was written assuming the stop_copy trailer is small.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
  2022-03-02 17:29 ` [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration Shameer Kolothum
@ 2022-03-03  0:21   ` Jason Gunthorpe
  2022-03-03 12:57     ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2022-03-03  0:21 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, alex.williamson,
	cohuck, mgurtovoy, yishaih, linuxarm, liulongfang, prime.zeng,
	jonathan.cameron, wangzhou1

On Wed, Mar 02, 2022 at 05:29:02PM +0000, Shameer Kolothum wrote:
> +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> +				       unsigned int cmd, unsigned long arg)
> +{
> +	struct hisi_acc_vf_migration_file *migf = filp->private_data;
> +	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
> +			struct hisi_acc_vf_core_device, saving_migf);
> +	loff_t *pos = &filp->f_pos;
> +	struct vfio_precopy_info info;
> +	unsigned long minsz;
> +	int ret;
> +
> +	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
> +		return -ENOTTY;
> +
> +	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
> +
> +	if (copy_from_user(&info, (void __user *)arg, minsz))
> +		return -EFAULT;
> +	if (info.argsz < minsz)
> +		return -EINVAL;
> +
> +	mutex_lock(&hisi_acc_vdev->state_mutex);
> +	if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
> +		mutex_unlock(&hisi_acc_vdev->state_mutex);
> +		return -EINVAL;
> +	}

IMHO it is easier just to check the total_length and not grab this
other lock

> +struct acc_vf_data {
> +#define QM_MATCH_SIZE 32L

This should be

#define QM_MATCH_SIZE offsetofend(struct acc_vf_data, qm_rsv_state)

> +	/* QM match information */

You should probably put an 8 byte random magic number here just to
make the compatibility more unique.

> +	u32 qp_num;
> +	u32 dev_id;
> +	u32 que_iso_cfg;
> +	u32 qp_base;
> +	/* QM reserved match information */
> +	u32 qm_rsv_state[4];
> +
> +	/* QM RW regs */
> +	u32 aeq_int_mask;
> +	u32 eq_int_mask;
> +	u32 ifc_int_source;
> +	u32 ifc_int_mask;
> +	u32 ifc_int_set;
> +	u32 page_size;
> +
> +	/* QM_EQC_DW has 7 regs */
> +	u32 qm_eqc_dw[7];
> +
> +	/* QM_AEQC_DW has 7 regs */
> +	u32 qm_aeqc_dw[7];
> +
> +	/* QM reserved 5 regs */
> +	u32 qm_rsv_regs[5];
> +
> +	/* qm memory init information */
> +	u64 eqe_dma;

Am I counting wrong or is there a padding before this? 7+7+5 is not a multiple
of 2. Be explicit about padding in a structure like this.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-03  0:05     ` Jason Gunthorpe
@ 2022-03-03  3:47       ` Alex Williamson
  2022-03-03 13:01         ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2022-03-03  3:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameer Kolothum, kvm, linux-kernel, linux-crypto, linux-pci,
	cohuck, mgurtovoy, yishaih, linuxarm, liulongfang, prime.zeng,
	jonathan.cameron, wangzhou1

On Wed, 2 Mar 2022 20:05:28 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Mar 02, 2022 at 01:31:59PM -0700, Alex Williamson wrote:
> > > + * initial_bytes reflects the estimated remaining size of any initial mandatory
> > > + * precopy data transfer. When initial_bytes returns as zero then the initial
> > > + * phase of the precopy data is completed. Generally initial_bytes should start
> > > + * out as approximately the entire device state.  
> > 
> > What is "mandatory" intended to mean here?  The user isn't required to
> > collect any data from the device in the PRE_COPY states.  
> 
> If the data is split into initial,dirty,trailer then mandatory means
> that first chunk.

But there's no requirement to read anything in PRE_COPY, so initial
becomes indistinguishable from trailer and dirty doesn't exist.

> > "The vfio_precopy_info data structure returned by this ioctl provides
> >  estimates of data available from the device during the PRE_COPY states.
> >  This estimate is split into two categories, initial_bytes and
> >  dirty_bytes.
> > 
> >  The initial_bytes field indicates the amount of static data available
> >  from the device.  This field should have a non-zero initial value and
> >  decrease as migration data is read from the device.  
> 
> static isn't great either, how about just say 'minimum data available'

'initial precopy data-set'?

> >  Userspace may use the combination of these fields to estimate the
> >  potential data size available during the PRE_COPY phases, as well as
> >  trends relative to the rate the device is dirtying it's internal
> >  state, but these fields are not required to have any bearing relative
> >  to the data size available during the STOP_COPY phase."  
> 
> That last is too strong. I would just drop starting at but.
> 
> The message to communicate is the device should allow dirty_bytes to
> reach 0 during the PRE_COPY phases if everything is is idle. Which
> tells alot about how to calculate it.
> 
> It is all better otherwise
> 
> > > + * Drivers should attempt to return estimates so that initial_bytes +
> > > + * dirty_bytes matches the amount of data an immediate transition to STOP_COPY
> > > + * will require to be streamed.  
> >
> > I think previous discussions have proven this false, we expect trailing
> > data that is only available in STOP_COPY, we cannot bound the size of
> > that data, and dirty_bytes is not intended to expose data that cannot
> > be retrieved during the PRE_COPY phases.  Thanks,  
> 
> It was written assuming the stop_copy trailer is small.

We have no basis to make that assertion.  We've agreed that precopy can
be used for nothing more than a compatibility test, so we could have a
vGPU with a massive framebuffer and no ability to provide dirty
tracking implement precopy only to include the entire framebuffer in
the trailing STOP_COPY data set.  Per my understanding and the fact
that we cannot enforce any heuristics regarding the size of the tailer
relative to the pre-copy data set, I think the above strongly phrased
sentence is necessary to understand the limitations of what this ioctl
is meant to convey.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
  2022-03-03  0:21   ` Jason Gunthorpe
@ 2022-03-03 12:57     ` Shameerali Kolothum Thodi
  2022-03-03 13:04       ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-03-03 12:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, alex.williamson,
	cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)



> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 03 March 2022 00:22
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org; linux-pci@vger.kernel.org;
> alex.williamson@redhat.com; cohuck@redhat.com; mgurtovoy@nvidia.com;
> yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
> 
> On Wed, Mar 02, 2022 at 05:29:02PM +0000, Shameer Kolothum wrote:
> > +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> > +				       unsigned int cmd, unsigned long arg)
> > +{
> > +	struct hisi_acc_vf_migration_file *migf = filp->private_data;
> > +	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
> > +			struct hisi_acc_vf_core_device, saving_migf);
> > +	loff_t *pos = &filp->f_pos;
> > +	struct vfio_precopy_info info;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
> > +		return -ENOTTY;
> > +
> > +	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
> > +
> > +	if (copy_from_user(&info, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +	if (info.argsz < minsz)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&hisi_acc_vdev->state_mutex);
> > +	if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
> > +		mutex_unlock(&hisi_acc_vdev->state_mutex);
> > +		return -EINVAL;
> > +	}
> 
> IMHO it is easier just to check the total_length and not grab this
> other lock

The problem with checking the total_length here is that it is possible that
in STOP_COPY the dev is not ready and there are no more data to be transferred 
and the total_length remains at QM_MATCH_SIZE.

This just reminded me that the -ENOMSG setting logic in save_read() is
wrong now as it uses only the total_length to determine the PRE_COPY state. 

I think either we need to get the curr state info at both places or in STOP_COPY,
if there are no additional data, set the total_length = 0 and handle it in save_read().

Looks like setting the total_length = 0 in STOP_COPY is a better solution(If there are
no other issues with that) as it will avoid grabbing the state_mutex as you
mentioned above.

> > +struct acc_vf_data {
> > +#define QM_MATCH_SIZE 32L
> 
> This should be
> 
> #define QM_MATCH_SIZE offsetofend(struct acc_vf_data, qm_rsv_state)

Ok.

> > +	/* QM match information */
> 
> You should probably put an 8 byte random magic number here just to
> make the compatibility more unique.

Ok. Will add one.

> > +	u32 qp_num;
> > +	u32 dev_id;
> > +	u32 que_iso_cfg;
> > +	u32 qp_base;
> > +	/* QM reserved match information */
> > +	u32 qm_rsv_state[4];
> > +
> > +	/* QM RW regs */
> > +	u32 aeq_int_mask;
> > +	u32 eq_int_mask;
> > +	u32 ifc_int_source;
> > +	u32 ifc_int_mask;
> > +	u32 ifc_int_set;
> > +	u32 page_size;
> > +
> > +	/* QM_EQC_DW has 7 regs */
> > +	u32 qm_eqc_dw[7];
> > +
> > +	/* QM_AEQC_DW has 7 regs */
> > +	u32 qm_aeqc_dw[7];
> > +
> > +	/* QM reserved 5 regs */
> > +	u32 qm_rsv_regs[5];
> > +
> > +	/* qm memory init information */
> > +	u64 eqe_dma;
> 
> Am I counting wrong or is there a padding before this? 7+7+5 is not a multiple
> of 2. Be explicit about padding in a structure like this.

That's right. It needs padding before 'eqe_dma'.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-03  3:47       ` Alex Williamson
@ 2022-03-03 13:01         ` Jason Gunthorpe
  2022-03-03 15:20           ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2022-03-03 13:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Shameer Kolothum, kvm, linux-kernel, linux-crypto, linux-pci,
	cohuck, mgurtovoy, yishaih, linuxarm, liulongfang, prime.zeng,
	jonathan.cameron, wangzhou1

On Wed, Mar 02, 2022 at 08:47:52PM -0700, Alex Williamson wrote:
> On Wed, 2 Mar 2022 20:05:28 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, Mar 02, 2022 at 01:31:59PM -0700, Alex Williamson wrote:
> > > > + * initial_bytes reflects the estimated remaining size of any initial mandatory
> > > > + * precopy data transfer. When initial_bytes returns as zero then the initial
> > > > + * phase of the precopy data is completed. Generally initial_bytes should start
> > > > + * out as approximately the entire device state.  
> > > 
> > > What is "mandatory" intended to mean here?  The user isn't required to
> > > collect any data from the device in the PRE_COPY states.  
> > 
> > If the data is split into initial,dirty,trailer then mandatory means
> > that first chunk.
> 
> But there's no requirement to read anything in PRE_COPY, so initial
> becomes indistinguishable from trailer and dirty doesn't exist.

It is still mandatory to read that data out, it doesn't matter if it
is read during PRE_COPY or STOP_COPY.

> > > "The vfio_precopy_info data structure returned by this ioctl provides
> > >  estimates of data available from the device during the PRE_COPY states.
> > >  This estimate is split into two categories, initial_bytes and
> > >  dirty_bytes.
> > > 
> > >  The initial_bytes field indicates the amount of static data available
> > >  from the device.  This field should have a non-zero initial value and
> > >  decrease as migration data is read from the device.  
> > 
> > static isn't great either, how about just say 'minimum data available'
> 
> 'initial precopy data-set'?

Sure

> We have no basis to make that assertion.  We've agreed that precopy can
> be used for nothing more than a compatibility test, so we could have a
> vGPU with a massive framebuffer and no ability to provide dirty
> tracking implement precopy only to include the entire framebuffer in
> the trailing STOP_COPY data set.  Per my understanding and the fact
> that we cannot enforce any heuristics regarding the size of the tailer
> relative to the pre-copy data set, I think the above strongly phrased
> sentence is necessary to understand the limitations of what this ioctl
> is meant to convey.  Thanks,

This is why abusing precopy for compatability is not a great idea. It
is OK for acc because its total state is tiny, but I would not agree
to a vGPU driver being merged working like you describe. It distorts
the entire purpose of PRE_COPY and this whole estimation mechanism.

The ioctl is intended to convey when to switch to STOP_COPY, and the
driver should provide a semantic where the closer the reported length
is to 0 then the faster the STOP_COPY will go.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
  2022-03-03 12:57     ` Shameerali Kolothum Thodi
@ 2022-03-03 13:04       ` Jason Gunthorpe
  2022-03-03 13:17         ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2022-03-03 13:04 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, alex.williamson,
	cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)

On Thu, Mar 03, 2022 at 12:57:29PM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> > Sent: 03 March 2022 00:22
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > linux-crypto@vger.kernel.org; linux-pci@vger.kernel.org;
> > alex.williamson@redhat.com; cohuck@redhat.com; mgurtovoy@nvidia.com;
> > yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> > <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> > Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> > <wangzhou1@hisilicon.com>
> > Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> > migration
> > 
> > On Wed, Mar 02, 2022 at 05:29:02PM +0000, Shameer Kolothum wrote:
> > > +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> > > +				       unsigned int cmd, unsigned long arg)
> > > +{
> > > +	struct hisi_acc_vf_migration_file *migf = filp->private_data;
> > > +	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
> > > +			struct hisi_acc_vf_core_device, saving_migf);
> > > +	loff_t *pos = &filp->f_pos;
> > > +	struct vfio_precopy_info info;
> > > +	unsigned long minsz;
> > > +	int ret;
> > > +
> > > +	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
> > > +		return -ENOTTY;
> > > +
> > > +	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
> > > +
> > > +	if (copy_from_user(&info, (void __user *)arg, minsz))
> > > +		return -EFAULT;
> > > +	if (info.argsz < minsz)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&hisi_acc_vdev->state_mutex);
> > > +	if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
> > > +		mutex_unlock(&hisi_acc_vdev->state_mutex);
> > > +		return -EINVAL;
> > > +	}
> > 
> > IMHO it is easier just to check the total_length and not grab this
> > other lock
> 
> The problem with checking the total_length here is that it is possible that
> in STOP_COPY the dev is not ready and there are no more data to be transferred 
> and the total_length remains at QM_MATCH_SIZE.

Tthere is a scenario that transfers only QM_MATCH_SIZE in stop_copy?
This doesn't seem like a good idea, I think you should transfer a
positive indication 'this device is not ready' instead of truncating
the stream. A truncated stream should not be a valid stream.

ie always transfer the whole struct.

> Looks like setting the total_length = 0 in STOP_COPY is a better solution(If there are
> no other issues with that) as it will avoid grabbing the state_mutex as you
> mentioned above.

That seems really weird, I wouldn't recommend doing that..
 
Kaspm

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
  2022-03-03 13:04       ` Jason Gunthorpe
@ 2022-03-03 13:17         ` Shameerali Kolothum Thodi
  2022-03-03 13:19           ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-03-03 13:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, alex.williamson,
	cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)



> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 03 March 2022 13:04
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org; linux-pci@vger.kernel.org;
> alex.williamson@redhat.com; cohuck@redhat.com; mgurtovoy@nvidia.com;
> yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
> 
> On Thu, Mar 03, 2022 at 12:57:29PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > > From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> > > Sent: 03 March 2022 00:22
> > > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > linux-crypto@vger.kernel.org; linux-pci@vger.kernel.org;
> > > alex.williamson@redhat.com; cohuck@redhat.com;
> mgurtovoy@nvidia.com;
> > > yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> > > <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> > > Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> > > <wangzhou1@hisilicon.com>
> > > Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> > > migration
> > >
> > > On Wed, Mar 02, 2022 at 05:29:02PM +0000, Shameer Kolothum wrote:
> > > > +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> > > > +				       unsigned int cmd, unsigned long arg)
> > > > +{
> > > > +	struct hisi_acc_vf_migration_file *migf = filp->private_data;
> > > > +	struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
> > > > +			struct hisi_acc_vf_core_device, saving_migf);
> > > > +	loff_t *pos = &filp->f_pos;
> > > > +	struct vfio_precopy_info info;
> > > > +	unsigned long minsz;
> > > > +	int ret;
> > > > +
> > > > +	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
> > > > +		return -ENOTTY;
> > > > +
> > > > +	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
> > > > +
> > > > +	if (copy_from_user(&info, (void __user *)arg, minsz))
> > > > +		return -EFAULT;
> > > > +	if (info.argsz < minsz)
> > > > +		return -EINVAL;
> > > > +
> > > > +	mutex_lock(&hisi_acc_vdev->state_mutex);
> > > > +	if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
> > > > +		mutex_unlock(&hisi_acc_vdev->state_mutex);
> > > > +		return -EINVAL;
> > > > +	}
> > >
> > > IMHO it is easier just to check the total_length and not grab this
> > > other lock
> >
> > The problem with checking the total_length here is that it is possible that
> > in STOP_COPY the dev is not ready and there are no more data to be
> transferred
> > and the total_length remains at QM_MATCH_SIZE.
> 
> Tthere is a scenario that transfers only QM_MATCH_SIZE in stop_copy?
> This doesn't seem like a good idea, I think you should transfer a
> positive indication 'this device is not ready' instead of truncating
> the stream. A truncated stream should not be a valid stream.
> 
> ie always transfer the whole struct.

We could add a 'qm_state' and return the whole struct. But the rest
of the struct is basically invalid if qm_state = QM_NOT_REDAY.

> 
> > Looks like setting the total_length = 0 in STOP_COPY is a better
> > solution(If there are no other issues with that) as it will avoid
> > grabbing the state_mutex as you mentioned above.
> 
> That seems really weird, I wouldn't recommend doing that..

Does that mean we don't support a zero data transfer in STOP_COPY?
The concern is if we always transfer the whole struct, we end up reading
and writing the whole thing even if most of the data is invalid.

Thanks,
Shameer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
  2022-03-03 13:17         ` Shameerali Kolothum Thodi
@ 2022-03-03 13:19           ` Jason Gunthorpe
  2022-03-03 13:27             ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2022-03-03 13:19 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, alex.williamson,
	cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)

On Thu, Mar 03, 2022 at 01:17:05PM +0000, Shameerali Kolothum Thodi wrote:
> > Tthere is a scenario that transfers only QM_MATCH_SIZE in stop_copy?
> > This doesn't seem like a good idea, I think you should transfer a
> > positive indication 'this device is not ready' instead of truncating
> > the stream. A truncated stream should not be a valid stream.
> > 
> > ie always transfer the whole struct.
> 
> We could add a 'qm_state' and return the whole struct. But the rest
> of the struct is basically invalid if qm_state = QM_NOT_REDAY.

This seems like the right thing to do to me

> > > Looks like setting the total_length = 0 in STOP_COPY is a better
> > > solution(If there are no other issues with that) as it will avoid
> > > grabbing the state_mutex as you mentioned above.
> > 
> > That seems really weird, I wouldn't recommend doing that..
> 
> Does that mean we don't support a zero data transfer in STOP_COPY?

total_length should not go backwards

> The concern is if we always transfer the whole struct, we end up reading
> and writing the whole thing even if most of the data is invalid.

Well, you can't know if the device is not ready or not until the
reaching STOP_COPY, so you have to transfer something to avoid
truncating the data stream. The state here is tiny, is the extra
transfer a real worry?

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
  2022-03-03 13:19           ` Jason Gunthorpe
@ 2022-03-03 13:27             ` Shameerali Kolothum Thodi
  0 siblings, 0 replies; 32+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-03-03 13:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, alex.williamson,
	cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)



> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 03 March 2022 13:20
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org; linux-pci@vger.kernel.org;
> alex.williamson@redhat.com; cohuck@redhat.com; mgurtovoy@nvidia.com;
> yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
> 
> On Thu, Mar 03, 2022 at 01:17:05PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > Tthere is a scenario that transfers only QM_MATCH_SIZE in stop_copy?
> > > This doesn't seem like a good idea, I think you should transfer a
> > > positive indication 'this device is not ready' instead of truncating
> > > the stream. A truncated stream should not be a valid stream.
> > >
> > > ie always transfer the whole struct.
> >
> > We could add a 'qm_state' and return the whole struct. But the rest
> > of the struct is basically invalid if qm_state = QM_NOT_REDAY.
> 
> This seems like the right thing to do to me
> 
> > > > Looks like setting the total_length = 0 in STOP_COPY is a better
> > > > solution(If there are no other issues with that) as it will avoid
> > > > grabbing the state_mutex as you mentioned above.
> > >
> > > That seems really weird, I wouldn't recommend doing that..
> >
> > Does that mean we don't support a zero data transfer in STOP_COPY?
> 
> total_length should not go backwards
> 
> > The concern is if we always transfer the whole struct, we end up reading
> > and writing the whole thing even if most of the data is invalid.
> 
> Well, you can't know if the device is not ready or not until the
> reaching STOP_COPY, so you have to transfer something to avoid
> truncating the data stream. The state here is tiny, is the extra
> transfer a real worry?

Ok, don't think so. I will introduce the 'qm_state' and handle it as discussed
above then.

Thanks,
Shameer 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-03 13:01         ` Jason Gunthorpe
@ 2022-03-03 15:20           ` Alex Williamson
  2022-03-03 18:05             ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2022-03-03 15:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameer Kolothum, kvm, linux-kernel, linux-crypto, linux-pci,
	cohuck, mgurtovoy, yishaih, linuxarm, liulongfang, prime.zeng,
	jonathan.cameron, wangzhou1

On Thu, 3 Mar 2022 09:01:24 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Mar 02, 2022 at 08:47:52PM -0700, Alex Williamson wrote:
> > On Wed, 2 Mar 2022 20:05:28 -0400
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > On Wed, Mar 02, 2022 at 01:31:59PM -0700, Alex Williamson wrote:  
> > > > > + * initial_bytes reflects the estimated remaining size of any initial mandatory
> > > > > + * precopy data transfer. When initial_bytes returns as zero then the initial
> > > > > + * phase of the precopy data is completed. Generally initial_bytes should start
> > > > > + * out as approximately the entire device state.    
> > > > 
> > > > What is "mandatory" intended to mean here?  The user isn't required to
> > > > collect any data from the device in the PRE_COPY states.    
> > > 
> > > If the data is split into initial,dirty,trailer then mandatory means
> > > that first chunk.  
> > 
> > But there's no requirement to read anything in PRE_COPY, so initial
> > becomes indistinguishable from trailer and dirty doesn't exist.  
> 
> It is still mandatory to read that data out, it doesn't matter if it
> is read during PRE_COPY or STOP_COPY.

Not really, PRE_COPY -> RUNNING is a valid arc.
 
> > > > "The vfio_precopy_info data structure returned by this ioctl provides
> > > >  estimates of data available from the device during the PRE_COPY states.
> > > >  This estimate is split into two categories, initial_bytes and
> > > >  dirty_bytes.
> > > > 
> > > >  The initial_bytes field indicates the amount of static data available
> > > >  from the device.  This field should have a non-zero initial value and
> > > >  decrease as migration data is read from the device.    
> > > 
> > > static isn't great either, how about just say 'minimum data available'  
> > 
> > 'initial precopy data-set'?  
> 
> Sure
> 
> > We have no basis to make that assertion.  We've agreed that precopy can
> > be used for nothing more than a compatibility test, so we could have a
> > vGPU with a massive framebuffer and no ability to provide dirty
> > tracking implement precopy only to include the entire framebuffer in
> > the trailing STOP_COPY data set.  Per my understanding and the fact
> > that we cannot enforce any heuristics regarding the size of the tailer
> > relative to the pre-copy data set, I think the above strongly phrased
> > sentence is necessary to understand the limitations of what this ioctl
> > is meant to convey.  Thanks,  
> 
> This is why abusing precopy for compatability is not a great idea. It
> is OK for acc because its total state is tiny, but I would not agree
> to a vGPU driver being merged working like you describe. It distorts
> the entire purpose of PRE_COPY and this whole estimation mechanism.
> 
> The ioctl is intended to convey when to switch to STOP_COPY, and the
> driver should provide a semantic where the closer the reported length
> is to 0 then the faster the STOP_COPY will go.

If it's an abuse, then let's not do it.  It was never my impression or
intention that this was ok for acc only due to the minimal trailing
data size.  My statement was that use of PRE_COPY for compatibility
testing only had been a previously agreed valid use case of the
original migration interface.

Furthermore the acc driver was explicitly directed not to indicate any
degree of trailing data size in dirty_bytes, so while trailing data may
be small for acc, this interface is explicitly not intended to provide
any indication of trailing data size.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-03 15:20           ` Alex Williamson
@ 2022-03-03 18:05             ` Shameerali Kolothum Thodi
  2022-03-03 19:59               ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Shameerali Kolothum Thodi @ 2022-03-03 18:05 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe
  Cc: kvm, linux-kernel, linux-crypto, linux-pci, cohuck, mgurtovoy,
	yishaih, Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: 03 March 2022 15:21
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org; linux-pci@vger.kernel.org; cohuck@redhat.com;
> mgurtovoy@nvidia.com; yishaih@nvidia.com; Linuxarm
> <linuxarm@huawei.com>; liulongfang <liulongfang@huawei.com>; Zengtao (B)
> <prime.zeng@hisilicon.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with
> PRE_COPY
> 
> On Thu, 3 Mar 2022 09:01:24 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, Mar 02, 2022 at 08:47:52PM -0700, Alex Williamson wrote:
> > > On Wed, 2 Mar 2022 20:05:28 -0400
> > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > > On Wed, Mar 02, 2022 at 01:31:59PM -0700, Alex Williamson wrote:
> > > > > > + * initial_bytes reflects the estimated remaining size of any
> > > > > > + initial mandatory
> > > > > > + * precopy data transfer. When initial_bytes returns as zero
> > > > > > + then the initial
> > > > > > + * phase of the precopy data is completed. Generally initial_bytes
> should start
> > > > > > + * out as approximately the entire device state.
> > > > >
> > > > > What is "mandatory" intended to mean here?  The user isn't required
> to
> > > > > collect any data from the device in the PRE_COPY states.
> > > >
> > > > If the data is split into initial,dirty,trailer then mandatory
> > > > means that first chunk.
> > >
> > > But there's no requirement to read anything in PRE_COPY, so initial
> > > becomes indistinguishable from trailer and dirty doesn't exist.
> >
> > It is still mandatory to read that data out, it doesn't matter if it
> > is read during PRE_COPY or STOP_COPY.
> 
> Not really, PRE_COPY -> RUNNING is a valid arc.
> 
> > > > > "The vfio_precopy_info data structure returned by this ioctl
> > > > > provides  estimates of data available from the device during the
> PRE_COPY states.
> > > > >  This estimate is split into two categories, initial_bytes and
> > > > > dirty_bytes.
> > > > >
> > > > >  The initial_bytes field indicates the amount of static data
> > > > > available  from the device.  This field should have a non-zero initial
> value and
> > > > >  decrease as migration data is read from the device.
> > > >
> > > > static isn't great either, how about just say 'minimum data available'
> > >
> > > 'initial precopy data-set'?
> >
> > Sure
> >
> > > We have no basis to make that assertion.  We've agreed that precopy
> > > can be used for nothing more than a compatibility test, so we could
> > > have a vGPU with a massive framebuffer and no ability to provide
> > > dirty tracking implement precopy only to include the entire
> > > framebuffer in the trailing STOP_COPY data set.  Per my
> > > understanding and the fact that we cannot enforce any heuristics
> > > regarding the size of the tailer relative to the pre-copy data set,
> > > I think the above strongly phrased sentence is necessary to
> > > understand the limitations of what this ioctl is meant to convey.
> > > Thanks,
> >
> > This is why abusing precopy for compatability is not a great idea. It
> > is OK for acc because its total state is tiny, but I would not agree
> > to a vGPU driver being merged working like you describe. It distorts
> > the entire purpose of PRE_COPY and this whole estimation mechanism.
> >
> > The ioctl is intended to convey when to switch to STOP_COPY, and the
> > driver should provide a semantic where the closer the reported length
> > is to 0 then the faster the STOP_COPY will go.
> 
> If it's an abuse, then let's not do it.  It was never my impression or intention
> that this was ok for acc only due to the minimal trailing data size.  My
> statement was that use of PRE_COPY for compatibility testing only had been a
> previously agreed valid use case of the original migration interface.
> 
> Furthermore the acc driver was explicitly directed not to indicate any degree
> of trailing data size in dirty_bytes, so while trailing data may be small for acc,
> this interface is explicitly not intended to provide any indication of trailing
> data size.  Thanks,

Just to clarify, so the suggestion here is not to use PRE_COPY for compatibility
check at all and have a different proper infrastructure for that later as Jason
suggested?

If so, I will remove this patch from this series and go back to the old revision
where we only have STOP_COPY and do the compatibility check during the final
load data operation.

Please let me know.

Thanks,
Shameer


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-03 18:05             ` Shameerali Kolothum Thodi
@ 2022-03-03 19:59               ` Alex Williamson
  2022-03-03 23:49                 ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Williamson @ 2022-03-03 19:59 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Jason Gunthorpe, kvm, linux-kernel, linux-crypto, linux-pci,
	cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang, Zengtao (B),
	Jonathan Cameron, Wangzhou (B)

On Thu, 3 Mar 2022 18:05:53 +0000
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: 03 March 2022 15:21
> > To: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> > kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > linux-crypto@vger.kernel.org; linux-pci@vger.kernel.org; cohuck@redhat.com;
> > mgurtovoy@nvidia.com; yishaih@nvidia.com; Linuxarm
> > <linuxarm@huawei.com>; liulongfang <liulongfang@huawei.com>; Zengtao (B)
> > <prime.zeng@hisilicon.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>
> > Subject: Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with
> > PRE_COPY
> > 
> > On Thu, 3 Mar 2022 09:01:24 -0400
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > On Wed, Mar 02, 2022 at 08:47:52PM -0700, Alex Williamson wrote:  
> > > > On Wed, 2 Mar 2022 20:05:28 -0400
> > > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > >  
> > > > > On Wed, Mar 02, 2022 at 01:31:59PM -0700, Alex Williamson wrote:  
> > > > > > > + * initial_bytes reflects the estimated remaining size of any
> > > > > > > + initial mandatory
> > > > > > > + * precopy data transfer. When initial_bytes returns as zero
> > > > > > > + then the initial
> > > > > > > + * phase of the precopy data is completed. Generally initial_bytes  
> > should start  
> > > > > > > + * out as approximately the entire device state.  
> > > > > >
> > > > > > What is "mandatory" intended to mean here?  The user isn't required  
> > to  
> > > > > > collect any data from the device in the PRE_COPY states.  
> > > > >
> > > > > If the data is split into initial,dirty,trailer then mandatory
> > > > > means that first chunk.  
> > > >
> > > > But there's no requirement to read anything in PRE_COPY, so initial
> > > > becomes indistinguishable from trailer and dirty doesn't exist.  
> > >
> > > It is still mandatory to read that data out, it doesn't matter if it
> > > is read during PRE_COPY or STOP_COPY.  
> > 
> > Not really, PRE_COPY -> RUNNING is a valid arc.
> >   
> > > > > > "The vfio_precopy_info data structure returned by this ioctl
> > > > > > provides  estimates of data available from the device during the  
> > PRE_COPY states.  
> > > > > >  This estimate is split into two categories, initial_bytes and
> > > > > > dirty_bytes.
> > > > > >
> > > > > >  The initial_bytes field indicates the amount of static data
> > > > > > available  from the device.  This field should have a non-zero initial  
> > value and  
> > > > > >  decrease as migration data is read from the device.  
> > > > >
> > > > > static isn't great either, how about just say 'minimum data available'  
> > > >
> > > > 'initial precopy data-set'?  
> > >
> > > Sure
> > >  
> > > > We have no basis to make that assertion.  We've agreed that precopy
> > > > can be used for nothing more than a compatibility test, so we could
> > > > have a vGPU with a massive framebuffer and no ability to provide
> > > > dirty tracking implement precopy only to include the entire
> > > > framebuffer in the trailing STOP_COPY data set.  Per my
> > > > understanding and the fact that we cannot enforce any heuristics
> > > > regarding the size of the tailer relative to the pre-copy data set,
> > > > I think the above strongly phrased sentence is necessary to
> > > > understand the limitations of what this ioctl is meant to convey.
> > > > Thanks,  
> > >
> > > This is why abusing precopy for compatability is not a great idea. It
> > > is OK for acc because its total state is tiny, but I would not agree
> > > to a vGPU driver being merged working like you describe. It distorts
> > > the entire purpose of PRE_COPY and this whole estimation mechanism.
> > >
> > > The ioctl is intended to convey when to switch to STOP_COPY, and the
> > > driver should provide a semantic where the closer the reported length
> > > is to 0 then the faster the STOP_COPY will go.  
> > 
> > If it's an abuse, then let's not do it.  It was never my impression or intention
> > that this was ok for acc only due to the minimal trailing data size.  My
> > statement was that use of PRE_COPY for compatibility testing only had been a
> > previously agreed valid use case of the original migration interface.
> > 
> > Furthermore the acc driver was explicitly directed not to indicate any degree
> > of trailing data size in dirty_bytes, so while trailing data may be small for acc,
> > this interface is explicitly not intended to provide any indication of trailing
> > data size.  Thanks,  
> 
> Just to clarify, so the suggestion here is not to use PRE_COPY for compatibility
> check at all and have a different proper infrastructure for that later as Jason
> suggested?
> 
> If so, I will remove this patch from this series and go back to the old revision
> where we only have STOP_COPY and do the compatibility check during the final
> load data operation.

Hi Shameer,

I think NVIDIA has a company long weekend, so I'm not sure how quickly
we'll hear a rebuttal from Jason, but at this point I'd rather not move
forward with using PRE_COPY exclusively for compatibility testing if
that is seen as an abuse of the interface, regardless of the size of
the remaining STOP_COPY data.  It might be most expedient to respin
without PRE_COPY and we'll revisit methods to perform early
compatibility testing in the future.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-03 19:59               ` Alex Williamson
@ 2022-03-03 23:49                 ` Jason Gunthorpe
  2022-03-04 19:56                   ` Alex Williamson
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2022-03-03 23:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Shameerali Kolothum Thodi, kvm, linux-kernel, linux-crypto,
	linux-pci, cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang,
	Zengtao (B), Jonathan Cameron, Wangzhou (B)

On Thu, Mar 03, 2022 at 12:59:30PM -0700, Alex Williamson wrote:

> > > If it's an abuse, then let's not do it.  It was never my
> > > impression or intention

So maybe abuse is the wrong word, but I don't want to mess up this
interface, which is intended to support real pre-copy devices, just
because devices that don't actually implement true precopy might do
silly things.

The vGPU case you imagine will still work and qemu will switch to
STOP_COPY with a huge trailer and be slow. That is unavoidable and I
think it is fine.

> > > Furthermore the acc driver was explicitly directed not to indicate any degree
> > > of trailing data size in dirty_bytes, so while trailing data may be small for acc,
> > > this interface is explicitly not intended to provide any indication of trailing
> > > data size.  Thanks, 

Yes, trailing data is not what this is for. This is only to help
decide when to switch from PRE_COPY to STOP_COPY. If the device can
execute STOP_COPY in the right time is a completely different
discussion/interface.

> > Just to clarify, so the suggestion here is not to use PRE_COPY for compatibility
> > check at all and have a different proper infrastructure for that later as Jason
> > suggested?
> > 
> > If so, I will remove this patch from this series and go back to the old revision
> > where we only have STOP_COPY and do the compatibility check during the final
> > load data operation.
> 
> Hi Shameer,
> 
> I think NVIDIA has a company long weekend, so I'm not sure how quickly
> we'll hear a rebuttal from Jason, but at this point I'd rather not
> move

Yes, company long weekend.

> forward with using PRE_COPY exclusively for compatibility testing if
> that is seen as an abuse of the interface, regardless of the size of
> the remaining STOP_COPY data.  It might be most expedient to respin
> without PRE_COPY and we'll revisit methods to perform early
> compatibility testing in the future.  Thanks,

Shameerali has talked about wanting this compat check early from the
start, and done all the work to implement it. I think it is pretty
extreme to blow up his series over trailing_data.

To me acc is fine to use it this way until we get a better solution
for compatability. We all need this, but I expect it to be complicated
to define.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY
  2022-03-03 23:49                 ` Jason Gunthorpe
@ 2022-03-04 19:56                   ` Alex Williamson
  0 siblings, 0 replies; 32+ messages in thread
From: Alex Williamson @ 2022-03-04 19:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shameerali Kolothum Thodi, kvm, linux-kernel, linux-crypto,
	linux-pci, cohuck, mgurtovoy, yishaih, Linuxarm, liulongfang,
	Zengtao (B), Jonathan Cameron, Wangzhou (B)

On Thu, 3 Mar 2022 19:49:51 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Mar 03, 2022 at 12:59:30PM -0700, Alex Williamson wrote:
> 
> > > > If it's an abuse, then let's not do it.  It was never my
> > > > impression or intention  
> 
> So maybe abuse is the wrong word, but I don't want to mess up this
> interface, which is intended to support real pre-copy devices, just
> because devices that don't actually implement true precopy might do
> silly things.

Abuse... silly... either way, you're clearly not comfortable misusing
PRE_COPY for this purpose.  

> The vGPU case you imagine will still work and qemu will switch to
> STOP_COPY with a huge trailer and be slow. That is unavoidable and I
> think it is fine.

It's not really fine, but I think it will require some better defined
interfaces and userspace support to give a clear picture of how data is
partitioned.

> > > > Furthermore the acc driver was explicitly directed not to indicate any degree
> > > > of trailing data size in dirty_bytes, so while trailing data may be small for acc,
> > > > this interface is explicitly not intended to provide any indication of trailing
> > > > data size.  Thanks,   
> 
> Yes, trailing data is not what this is for. This is only to help
> decide when to switch from PRE_COPY to STOP_COPY. If the device can
> execute STOP_COPY in the right time is a completely different
> discussion/interface.
> 
> > > Just to clarify, so the suggestion here is not to use PRE_COPY for compatibility
> > > check at all and have a different proper infrastructure for that later as Jason
> > > suggested?
> > > 
> > > If so, I will remove this patch from this series and go back to the old revision
> > > where we only have STOP_COPY and do the compatibility check during the final
> > > load data operation.  
> > 
> > Hi Shameer,
> > 
> > I think NVIDIA has a company long weekend, so I'm not sure how quickly
> > we'll hear a rebuttal from Jason, but at this point I'd rather not
> > move  
> 
> Yes, company long weekend.
> 
> > forward with using PRE_COPY exclusively for compatibility testing if
> > that is seen as an abuse of the interface, regardless of the size of
> > the remaining STOP_COPY data.  It might be most expedient to respin
> > without PRE_COPY and we'll revisit methods to perform early
> > compatibility testing in the future.  Thanks,  
> 
> Shameerali has talked about wanting this compat check early from the
> start, and done all the work to implement it. I think it is pretty
> extreme to blow up his series over trailing_data.
> 
> To me acc is fine to use it this way until we get a better solution
> for compatability. We all need this, but I expect it to be complicated
> to define.

It was only in v7 that we made this switch to use PRE_COPY for this
purpose, I wouldn't call it blowing up his series to step back and
decide that was a poor choice and clearly v8 exists without this.  This
isn't the end of the discussion regarding early compatibility testing,
but I'm not going to rush a PRE_COPY interface to support that early
compatibility testing if we're not agreed that it's a valid use case,
and not just a marginally acceptable one due to the trailing data being
inconsequential.  Let's focus on v8 and we can talk about further
extensions later.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-03-04 20:01 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-02 17:28 [PATCH v7 00/10] vfio/hisilicon: add ACC live migration driver Shameer Kolothum
2022-03-02 17:28 ` [PATCH v7 01/10] crypto: hisilicon/qm: Move the QM header to include/linux Shameer Kolothum
2022-03-02 18:48   ` John Garry
2022-03-02 20:39     ` Shameerali Kolothum Thodi
2022-03-02 17:28 ` [PATCH v7 02/10] crypto: hisilicon/qm: Move few definitions to common header Shameer Kolothum
2022-03-02 18:55   ` John Garry
2022-03-02 20:41     ` Shameerali Kolothum Thodi
2022-03-02 17:28 ` [PATCH v7 03/10] hisi_acc_qm: Move VF PCI device IDs " Shameer Kolothum
2022-03-02 17:28 ` [PATCH v7 04/10] hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices Shameer Kolothum
2022-03-02 19:02   ` John Garry
2022-03-02 20:32     ` Shameerali Kolothum Thodi
2022-03-02 17:28 ` [PATCH v7 05/10] hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region Shameer Kolothum
2022-03-02 17:28 ` [PATCH v7 06/10] hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver Shameer Kolothum
2022-03-02 17:29 ` [PATCH v7 07/10] vfio: Extend the device migration protocol with PRE_COPY Shameer Kolothum
2022-03-02 20:31   ` Alex Williamson
2022-03-03  0:05     ` Jason Gunthorpe
2022-03-03  3:47       ` Alex Williamson
2022-03-03 13:01         ` Jason Gunthorpe
2022-03-03 15:20           ` Alex Williamson
2022-03-03 18:05             ` Shameerali Kolothum Thodi
2022-03-03 19:59               ` Alex Williamson
2022-03-03 23:49                 ` Jason Gunthorpe
2022-03-04 19:56                   ` Alex Williamson
2022-03-02 17:29 ` [PATCH v7 08/10] crypto: hisilicon/qm: Set the VF QM state register Shameer Kolothum
2022-03-02 17:29 ` [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration Shameer Kolothum
2022-03-03  0:21   ` Jason Gunthorpe
2022-03-03 12:57     ` Shameerali Kolothum Thodi
2022-03-03 13:04       ` Jason Gunthorpe
2022-03-03 13:17         ` Shameerali Kolothum Thodi
2022-03-03 13:19           ` Jason Gunthorpe
2022-03-03 13:27             ` Shameerali Kolothum Thodi
2022-03-02 17:29 ` [PATCH v7 10/10] hisi_acc_vfio_pci: Use its own PCI reset_done error handler Shameer Kolothum

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).