* [RFT][PATCH v1 0/6] Update vfio_pin/unpin_pages API
@ 2022-06-16 23:52 ` Nicolin Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-16 23:52 UTC (permalink / raw)
  To: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian
  Cc: jchrist, kvm, linux-doc, linux-kernel, linux-s390, intel-gvt-dev,
	intel-gfx, dri-devel

This is a preparatory series for the IOMMUFD v2 patches. It prepares for
replacing the vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
with the IOMMUFD versions.

There is a gap between these two versions: the vfio_iommu_type1 version
takes a non-contiguous PFN list as input and outputs another PFN list for
the pinned physical pages, while the IOMMUFD version only supports a
contiguous input, accepting the starting IO virtual address of a set of
pages to pin and outputting a physical page list.

The nature of the existing callers mostly aligns with the IOMMUFD version,
except for s390's vfio_ccw_cp code, where some additional changes are
needed along with this series. Overall, updating to "iova" and "phys_page"
does improve the caller side to some extent.
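
To illustrate the caller-facing change, here is a rough before/after of
the vfio_pin_pages() prototype as of patch 3 (patch 6 then replaces the
phys_pfn output with a physical page list):

	/* before: array of user/guest PFNs */
	int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
			   int npage, int prot, unsigned long *phys_pfn);

	/* after: starting IOVA of a contiguous range */
	int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
			   int npage, int prot, unsigned long *phys_pfn);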

Also fix a misuse of physical address vs. virtual address in s390's
crypto code, and update the input naming of the adjacent vfio_dma_rw().

This is on github:
https://github.com/nicolinc/iommufd/commits/vfio_pin_pages

Request for testing: I have only build-tested the s390 and i915 code, so
it would be great if people with the relevant environments could run some
sanity tests.

Thanks!

Nicolin Chen (6):
  vfio/ap: Pass in physical address of ind to ap_aqic()
  vfio/ccw: Only pass in contiguous pages
  vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
  vfio: Rename user_iova of vfio_dma_rw()
  vfio/ccw: Add kmap_local_page() for memcpy
  vfio: Replace phys_pfn with phys_page for vfio_pin_pages()

 .../driver-api/vfio-mediated-device.rst       |  6 +-
 arch/s390/include/asm/ap.h                    |  6 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c              | 44 ++++-----
 drivers/s390/cio/vfio_ccw_cp.c                | 90 ++++++++++++++-----
 drivers/s390/crypto/ap_queue.c                |  2 +-
 drivers/s390/crypto/vfio_ap_ops.c             | 21 ++---
 drivers/vfio/vfio.c                           | 38 ++++----
 drivers/vfio/vfio.h                           |  6 +-
 drivers/vfio/vfio_iommu_type1.c               | 34 +++----
 include/linux/vfio.h                          |  8 +-
 10 files changed, 142 insertions(+), 113 deletions(-)

-- 
2.17.1


* [RFT][PATCH v1 1/6] vfio/ap: Pass in physical address of ind to ap_aqic()
  2022-06-16 23:52 ` Nicolin Chen
@ 2022-06-16 23:52   ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-16 23:52 UTC (permalink / raw)
  To: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian
  Cc: jchrist, kvm, linux-doc, linux-kernel, linux-s390, intel-gvt-dev,
	intel-gfx, dri-devel

ap_aqic() is called by vfio_ap_irq_enable(), which passes in a virtual
address value cast from the physical address "h_nib". Inside ap_aqic(),
virt_to_phys() is then applied to it again.

Since ap_aqic() needs a physical address, just pass in the physical
address of ind directly, and rename "ind" to "pa_ind".
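
The call sites then look like this (sketch matching the diff below):

	/* ap_queue_enable_irq(): convert to a physical address at the caller */
	status = ap_aqic(aq->qid, qirqctrl, virt_to_phys(ind));

	/* vfio_ap_irq_enable(): h_nib is already a physical address */
	status = ap_aqic(q->apqn, aqic_gisa, h_nib);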

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 arch/s390/include/asm/ap.h        | 6 +++---
 drivers/s390/crypto/ap_queue.c    | 2 +-
 drivers/s390/crypto/vfio_ap_ops.c | 7 ++++---
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/ap.h b/arch/s390/include/asm/ap.h
index b515cfa62bd9..f508f5025e38 100644
--- a/arch/s390/include/asm/ap.h
+++ b/arch/s390/include/asm/ap.h
@@ -227,13 +227,13 @@ struct ap_qirq_ctrl {
  * ap_aqic(): Control interruption for a specific AP.
  * @qid: The AP queue number
  * @qirqctrl: struct ap_qirq_ctrl (64 bit value)
- * @ind: The notification indicator byte
+ * @pa_ind: Physical address of the notification indicator byte
  *
  * Returns AP queue status.
  */
 static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
 					     struct ap_qirq_ctrl qirqctrl,
-					     void *ind)
+					     phys_addr_t pa_ind)
 {
 	unsigned long reg0 = qid | (3UL << 24);  /* fc 3UL is AQIC */
 	union {
@@ -241,7 +241,7 @@ static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
 		struct ap_qirq_ctrl qirqctrl;
 		struct ap_queue_status status;
 	} reg1;
-	unsigned long reg2 = virt_to_phys(ind);
+	unsigned long reg2 = pa_ind;
 
 	reg1.qirqctrl = qirqctrl;
 
diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c
index c48b0db824e3..a32457b4cbb8 100644
--- a/drivers/s390/crypto/ap_queue.c
+++ b/drivers/s390/crypto/ap_queue.c
@@ -34,7 +34,7 @@ static int ap_queue_enable_irq(struct ap_queue *aq, void *ind)
 
 	qirqctrl.ir = 1;
 	qirqctrl.isc = AP_ISC;
-	status = ap_aqic(aq->qid, qirqctrl, ind);
+	status = ap_aqic(aq->qid, qirqctrl, virt_to_phys(ind));
 	switch (status.response_code) {
 	case AP_RESPONSE_NORMAL:
 	case AP_RESPONSE_OTHERWISE_CHANGED:
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a7d2a95796d3..bb869b28cebd 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -154,7 +154,7 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 	int retries = 5;
 
 	do {
-		status = ap_aqic(q->apqn, aqic_gisa, NULL);
+		status = ap_aqic(q->apqn, aqic_gisa, 0);
 		switch (status.response_code) {
 		case AP_RESPONSE_OTHERWISE_CHANGED:
 		case AP_RESPONSE_NORMAL:
@@ -245,7 +245,8 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
 	struct kvm_s390_gisa *gisa;
 	int nisc;
 	struct kvm *kvm;
-	unsigned long h_nib, g_pfn, h_pfn;
+	unsigned long g_pfn, h_pfn;
+	phys_addr_t h_nib;
 	int ret;
 
 	/* Verify that the notification indicator byte address is valid */
@@ -290,7 +291,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
 	aqic_gisa.ir = 1;
 	aqic_gisa.gisa = (uint64_t)gisa >> 4;
 
-	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
+	status = ap_aqic(q->apqn, aqic_gisa, h_nib);
 	switch (status.response_code) {
 	case AP_RESPONSE_NORMAL:
 		/* See if we did clear older IRQ configuration */
-- 
2.17.1


* [RFT][PATCH v1 2/6] vfio/ccw: Only pass in contiguous pages
  2022-06-16 23:52 ` Nicolin Chen
@ 2022-06-16 23:52   ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-16 23:52 UTC (permalink / raw)
  To: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian
  Cc: jchrist, kvm, linux-doc, linux-kernel, linux-s390, intel-gvt-dev,
	intel-gfx, dri-devel

This driver is the only caller of vfio_pin/unpin_pages() that might pass
in a non-contiguous PFN list, but in many cases it has a contiguous PFN
list to process. So letting the VFIO API handle a non-contiguous PFN list
is actually counterproductive.

Add a pair of simple loops that pass in contiguous PFNs only, so that
VFIO can use a more efficient implementation.
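
As a rough example (with made-up PFN values), a pa_iova_pfn list of
{ 0x100, 0x101, 0x102, 0x200, 0x201 } is now pinned with two calls, one
per contiguous range, instead of a single call with the whole list:

	vfio_pin_pages(vdev, &pa->pa_iova_pfn[0], 3,
		       IOMMU_READ | IOMMU_WRITE, &pa->pa_pfn[0]);
	vfio_pin_pages(vdev, &pa->pa_iova_pfn[3], 2,
		       IOMMU_READ | IOMMU_WRITE, &pa->pa_pfn[3]);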

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 68 +++++++++++++++++++++++++++-------
 1 file changed, 54 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 0c2be9421ab7..4659c2e35877 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -90,6 +90,37 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
 	return 0;
 }
 
+/*
+ * pfn_array_unpin() - Unpin user pages in memory
+ * @pa: pfn_array on which to perform the operation
+ * @vdev: the vfio device to perform the operation
+ * @pa_nr: number of user pages to unpin
+ *
+ * Only unpin if any pages were pinned to begin with, i.e. pa_nr > 0,
+ * otherwise only clear pa->pa_nr
+ */
+static void pfn_array_unpin(struct pfn_array *pa,
+			    struct vfio_device *vdev, int pa_nr)
+{
+	int unpinned = 0, npage = 1;
+
+	while (unpinned < pa_nr) {
+		unsigned long *first = &pa->pa_iova_pfn[unpinned];
+		unsigned long *last = &first[npage];
+
+		if (unpinned + npage < pa_nr && *first + npage == *last) {
+			npage++;
+			continue;
+		}
+
+		vfio_unpin_pages(vdev, first, npage);
+		unpinned += npage;
+		npage = 1;
+	}
+
+	pa->pa_nr = 0;
+}
+
 /*
  * pfn_array_pin() - Pin user pages in memory
  * @pa: pfn_array on which to perform the operation
@@ -101,34 +132,43 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
  */
 static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
 {
+	int pinned = 0, npage = 1;
 	int ret = 0;
 
-	ret = vfio_pin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr,
-			     IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
+	while (pinned < pa->pa_nr) {
+		unsigned long *first = &pa->pa_iova_pfn[pinned];
+		unsigned long *last = &first[npage];
 
-	if (ret < 0) {
-		goto err_out;
-	} else if (ret > 0 && ret != pa->pa_nr) {
-		vfio_unpin_pages(vdev, pa->pa_iova_pfn, ret);
-		ret = -EINVAL;
-		goto err_out;
+		if (pinned + npage < pa->pa_nr && *first + npage == *last) {
+			npage++;
+			continue;
+		}
+
+		ret = vfio_pin_pages(vdev, first, npage,
+				     IOMMU_READ | IOMMU_WRITE,
+				     &pa->pa_pfn[pinned]);
+		if (ret < 0) {
+			goto err_out;
+		} else if (ret > 0 && ret != npage) {
+			pinned += ret;
+			ret = -EINVAL;
+			goto err_out;
+		}
+		pinned += npage;
+		npage = 1;
 	}
 
 	return ret;
 
 err_out:
-	pa->pa_nr = 0;
-
+	pfn_array_unpin(pa, vdev, pinned);
 	return ret;
 }
 
 /* Unpin the pages before releasing the memory. */
 static void pfn_array_unpin_free(struct pfn_array *pa, struct vfio_device *vdev)
 {
-	/* Only unpin if any pages were pinned to begin with */
-	if (pa->pa_nr)
-		vfio_unpin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr);
-	pa->pa_nr = 0;
+	pfn_array_unpin(pa, vdev, pa->pa_nr);
 	kfree(pa->pa_iova_pfn);
 }
 
-- 
2.17.1


* [RFT][PATCH v1 3/6] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
  2022-06-16 23:52 ` Nicolin Chen
@ 2022-06-16 23:52   ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-16 23:52 UTC (permalink / raw)
  To: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian
  Cc: jchrist, kvm, linux-doc, linux-kernel, linux-s390, intel-gvt-dev,
	intel-gfx, dri-devel

So far, vfio_pin/unpin_pages() have accepted arrays of PFNs of user IOVAs.
Among the three callers, only one could possibly pass in a non-contiguous
PFN list, and it is now ensured to pass in contiguous PFNs as well.

Pass in the starting address alone, as "iova", to simplify things, so
callers no longer need to maintain a PFN list or to pin/unpin one page
at a time. This also allows VFIO to use more efficient implementations
of pin/unpin_pages.

For now, also update vfio_iommu_type1 to fit this new parameter, while
keeping its input naming intact (user_iova), since we don't want to spend
too much effort swapping its parameters and local variables at that level.
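
For callers that previously tracked guest PFNs, the conversion is just a
page shift, e.g. a simplified sketch of the kvmgt change below:

	dma_addr_t cur_iova = (dma_addr_t)(gfn + npage) << PAGE_SHIFT;

	ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
			     IOMMU_READ | IOMMU_WRITE, &pfn);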

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 .../driver-api/vfio-mediated-device.rst       |  4 +--
 drivers/gpu/drm/i915/gvt/kvmgt.c              | 25 +++++++-----------
 drivers/s390/cio/vfio_ccw_cp.c                |  4 +--
 drivers/s390/crypto/vfio_ap_ops.c             |  9 +++----
 drivers/vfio/vfio.c                           | 26 +++++++++----------
 drivers/vfio/vfio.h                           |  4 +--
 drivers/vfio/vfio_iommu_type1.c               | 17 ++++++------
 include/linux/vfio.h                          |  4 +--
 8 files changed, 41 insertions(+), 52 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index eded8719180f..d28f8bcbfbc6 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -262,10 +262,10 @@ Translation APIs for Mediated Devices
 The following APIs are provided for translating user pfn to host pfn in a VFIO
 driver::
 
-	int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+	int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
 				  int npage, int prot, unsigned long *phys_pfn);
 
-	int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+	int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
 				    int npage);
 
 These functions call back into the back-end IOMMU module by using the pin_pages
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index e2f6c56ab342..c9bdc3901f1e 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -231,19 +231,12 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
 static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		unsigned long size)
 {
+	unsigned long npage = roundup(size, PAGE_SIZE) / PAGE_SIZE;
 	struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
-	int total_pages;
-	int npage;
 	int ret;
 
-	total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
-
-	for (npage = 0; npage < total_pages; npage++) {
-		unsigned long cur_gfn = gfn + npage;
-
-		ret = vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
-		drm_WARN_ON(&i915->drm, ret != 1);
-	}
+	ret = vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT, npage);
+	drm_WARN_ON(&i915->drm, ret != npage);
 }
 
 /* Pin a normal or compound guest page for dma. */
@@ -261,14 +254,14 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 	 * on stack to hold pfns.
 	 */
 	for (npage = 0; npage < total_pages; npage++) {
-		unsigned long cur_gfn = gfn + npage;
+		unsigned long cur_iova = (gfn + npage) << PAGE_SHIFT;
 		unsigned long pfn;
 
-		ret = vfio_pin_pages(&vgpu->vfio_device, &cur_gfn, 1,
+		ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
 				     IOMMU_READ | IOMMU_WRITE, &pfn);
 		if (ret != 1) {
-			gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
-				     cur_gfn, ret);
+			gvt_vgpu_err("vfio_pin_pages failed for iova 0x%lx, ret %d\n",
+				     cur_iova, ret);
 			goto err;
 		}
 
@@ -312,7 +305,7 @@ static int gvt_dma_map_page(struct intel_vgpu *vgpu, unsigned long gfn,
 	if (dma_mapping_error(dev, *dma_addr)) {
 		gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret %d\n",
 			     page_to_pfn(page), ret);
-		gvt_unpin_guest_page(vgpu, gfn, size);
+		gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
 		return -ENOMEM;
 	}
 
@@ -325,7 +318,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu *vgpu, unsigned long gfn,
 	struct device *dev = vgpu->gvt->gt->i915->drm.dev;
 
 	dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
-	gvt_unpin_guest_page(vgpu, gfn, size);
+	gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
 }
 
 static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu *vgpu,
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 4659c2e35877..e2b01115b3ec 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -113,7 +113,7 @@ static void pfn_array_unpin(struct pfn_array *pa,
 			continue;
 		}
 
-		vfio_unpin_pages(vdev, first, npage);
+		vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
 		unpinned += npage;
 		npage = 1;
 	}
@@ -144,7 +144,7 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
 			continue;
 		}
 
-		ret = vfio_pin_pages(vdev, first, npage,
+		ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
 				     IOMMU_READ | IOMMU_WRITE,
 				     &pa->pa_pfn[pinned]);
 		if (ret < 0) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bb869b28cebd..8a2018ab3cf0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -124,7 +124,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
 		q->saved_isc = VFIO_AP_ISC_INVALID;
 	}
 	if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
-		vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
+		vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
 		q->saved_pfn = 0;
 	}
 }
@@ -258,7 +258,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
 		return status;
 	}
 
-	ret = vfio_pin_pages(&q->matrix_mdev->vdev, &g_pfn, 1,
+	ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
 			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
 	switch (ret) {
 	case 1:
@@ -301,7 +301,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
 		break;
 	case AP_RESPONSE_OTHERWISE_CHANGED:
 		/* We could not modify IRQ setings: clear new configuration */
-		vfio_unpin_pages(&q->matrix_mdev->vdev, &g_pfn, 1);
+		vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
 		kvm_s390_gisc_unregister(kvm, isc);
 		break;
 	default:
@@ -1248,9 +1248,8 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 
 	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
 		struct vfio_iommu_type1_dma_unmap *unmap = data;
-		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
 
-		vfio_unpin_pages(&matrix_mdev->vdev, &g_pfn, 1);
+		vfio_unpin_pages(&matrix_mdev->vdev, unmap->iova, 1);
 		return NOTIFY_OK;
 	}
 
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 61e71c1154be..861594dd461a 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1910,17 +1910,17 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
 
 /*
- * Pin a set of guest PFNs and return their associated host PFNs for local
+ * Pin contiguous guest pages and return their associated host pages for local
  * domain only.
  * @device [in]  : device
- * @user_pfn [in]: array of user/guest PFNs to be pinned.
- * @npage [in]   : count of elements in user_pfn array.  This count should not
+ * @iova [in]    : starting IOVA of user/guest pages to be pinned.
+ * @npage [in]   : count of pages to be pinned.  This count should not
  *		   be greater VFIO_PIN_PAGES_MAX_ENTRIES.
  * @prot [in]    : protection flags
  * @phys_pfn[out]: array of host PFNs
  * Return error or number of pages pinned.
  */
-int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
 		   int npage, int prot, unsigned long *phys_pfn)
 {
 	struct vfio_container *container;
@@ -1928,8 +1928,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
 	struct vfio_iommu_driver *driver;
 	int ret;
 
-	if (!user_pfn || !phys_pfn || !npage ||
-	    !vfio_assert_device_open(device))
+	if (!phys_pfn || !npage || !vfio_assert_device_open(device))
 		return -EINVAL;
 
 	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1943,7 +1942,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
 	driver = container->iommu_driver;
 	if (likely(driver && driver->ops->pin_pages))
 		ret = driver->ops->pin_pages(container->iommu_data,
-					     group->iommu_group, user_pfn,
+					     group->iommu_group, iova,
 					     npage, prot, phys_pfn);
 	else
 		ret = -ENOTTY;
@@ -1953,22 +1952,21 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
 EXPORT_SYMBOL(vfio_pin_pages);
 
 /*
- * Unpin set of host PFNs for local domain only.
+ * Unpin contiguous host pages for local domain only.
  * @device [in]  : device
- * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
- *		   PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * @npage [in]   : count of elements in user_pfn array.  This count should not
+ * @iova [in]    : starting address of user/guest pages to be unpinned.
+ * @npage [in]   : count of pages to be unpinned.  This count should not
  *                 be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
  * Return error or number of pages unpinned.
  */
-int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
 		     int npage)
 {
 	struct vfio_container *container;
 	struct vfio_iommu_driver *driver;
 	int ret;
 
-	if (!user_pfn || !npage || !vfio_assert_device_open(device))
+	if (!npage || !vfio_assert_device_open(device))
 		return -EINVAL;
 
 	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1978,7 +1976,7 @@ int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
 	container = device->group->container;
 	driver = container->iommu_driver;
 	if (likely(driver && driver->ops->unpin_pages))
-		ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
+		ret = driver->ops->unpin_pages(container->iommu_data, iova,
 					       npage);
 	else
 		ret = -ENOTTY;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a67130221151..6bd5304ee0b7 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -50,11 +50,11 @@ struct vfio_iommu_driver_ops {
 					struct iommu_group *group);
 	int		(*pin_pages)(void *iommu_data,
 				     struct iommu_group *group,
-				     unsigned long *user_pfn,
+				     dma_addr_t user_iova,
 				     int npage, int prot,
 				     unsigned long *phys_pfn);
 	int		(*unpin_pages)(void *iommu_data,
-				       unsigned long *user_pfn, int npage);
+				       dma_addr_t user_iova, int npage);
 	int		(*register_notifier)(void *iommu_data,
 					     unsigned long *events,
 					     struct notifier_block *nb);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..d027ed8441a9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -828,7 +828,7 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
 
 static int vfio_iommu_type1_pin_pages(void *iommu_data,
 				      struct iommu_group *iommu_group,
-				      unsigned long *user_pfn,
+				      dma_addr_t user_iova,
 				      int npage, int prot,
 				      unsigned long *phys_pfn)
 {
@@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	bool do_accounting;
 	dma_addr_t iova;
 
-	if (!iommu || !user_pfn || !phys_pfn)
+	if (!iommu || !phys_pfn)
 		return -EINVAL;
 
 	/* Supported for v2 version only */
@@ -856,7 +856,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 again:
 	if (iommu->vaddr_invalid_count) {
 		for (i = 0; i < npage; i++) {
-			iova = user_pfn[i] << PAGE_SHIFT;
+			iova = user_iova + PAGE_SIZE * i;
 			ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
 			if (ret < 0)
 				goto pin_done;
@@ -881,7 +881,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	for (i = 0; i < npage; i++) {
 		struct vfio_pfn *vpfn;
 
-		iova = user_pfn[i] << PAGE_SHIFT;
+		iova = user_iova + PAGE_SIZE * i;
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
 		if (!dma) {
 			ret = -EINVAL;
@@ -938,7 +938,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	for (j = 0; j < i; j++) {
 		dma_addr_t iova;
 
-		iova = user_pfn[j] << PAGE_SHIFT;
+		iova = user_iova + PAGE_SIZE * j;
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
 		vfio_unpin_page_external(dma, iova, do_accounting);
 		phys_pfn[j] = 0;
@@ -949,14 +949,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 }
 
 static int vfio_iommu_type1_unpin_pages(void *iommu_data,
-					unsigned long *user_pfn,
+					dma_addr_t user_iova,
 					int npage)
 {
 	struct vfio_iommu *iommu = iommu_data;
 	bool do_accounting;
 	int i;
 
-	if (!iommu || !user_pfn || npage <= 0)
+	if (!iommu || npage <= 0)
 		return -EINVAL;
 
 	/* Supported for v2 version only */
@@ -967,10 +967,9 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
 
 	do_accounting = list_empty(&iommu->domain_list);
 	for (i = 0; i < npage; i++) {
+		dma_addr_t iova = user_iova + PAGE_SIZE * i;
 		struct vfio_dma *dma;
-		dma_addr_t iova;
 
-		iova = user_pfn[i] << PAGE_SHIFT;
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
 		if (!dma)
 			break;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index aa888cc51757..edbad5d3d50a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -147,9 +147,9 @@ extern bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
 
 #define VFIO_PIN_PAGES_MAX_ENTRIES	(PAGE_SIZE/sizeof(unsigned long))
 
-extern int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+extern int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
 			  int npage, int prot, unsigned long *phys_pfn);
-extern int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+extern int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
 			    int npage);
 extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
 		       void *data, size_t len, bool write);
-- 
2.17.1


* [RFT][PATCH v1 4/6] vfio: Rename user_iova of vfio_dma_rw()
  2022-06-16 23:52 ` Nicolin Chen
@ 2022-06-16 23:52   ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-16 23:52 UTC (permalink / raw)
  To: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian
  Cc: jchrist, kvm, linux-doc, linux-kernel, linux-s390, intel-gvt-dev,
	intel-gfx, dri-devel

Following the updated vfio_pin/unpin_pages(), use the simpler "iova".

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/vfio/vfio.c  | 6 +++---
 include/linux/vfio.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 861594dd461a..e8dbb0122e20 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1996,13 +1996,13 @@ EXPORT_SYMBOL(vfio_unpin_pages);
  * not a real device DMA, it is not necessary to pin the user space memory.
  *
  * @device [in]		: VFIO device
- * @user_iova [in]	: base IOVA of a user space buffer
+ * @iova [in]		: base IOVA of a user space buffer
  * @data [in]		: pointer to kernel buffer
  * @len [in]		: kernel buffer length
  * @write		: indicate read or write
  * Return error code on failure or 0 on success.
  */
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
 		size_t len, bool write)
 {
 	struct vfio_container *container;
@@ -2018,7 +2018,7 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,
 
 	if (likely(driver && driver->ops->dma_rw))
 		ret = driver->ops->dma_rw(container->iommu_data,
-					  user_iova, data, len, write);
+					  iova, data, len, write);
 	else
 		ret = -ENOTTY;
 	return ret;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index edbad5d3d50a..99c3bf52c4da 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -151,7 +151,7 @@ extern int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
 			  int npage, int prot, unsigned long *phys_pfn);
 extern int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
 			    int npage);
-extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
+extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
 		       void *data, size_t len, bool write);
 
 /* each type has independent events */
-- 
2.17.1


* [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-16 23:52 ` Nicolin Chen
@ 2022-06-16 23:52   ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-16 23:52 UTC (permalink / raw)
  To: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian
  Cc: jchrist, kvm, linux-doc, linux-kernel, linux-s390, intel-gvt-dev,
	intel-gfx, dri-devel

The pinned PFN list returned from vfio_pin_pages() is simply converted
to addresses without protection, so direct access via memcpy() will
crash on S390 if the PFN is an IO PFN. Instead, the pages should be
touched using kmap_local_page().

Add kmap_local_page() before doing memcpy on "from".

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index e2b01115b3ec..12cbe66721af 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -11,6 +11,7 @@
 #include <linux/ratelimit.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
+#include <linux/highmem.h>
 #include <linux/iommu.h>
 #include <linux/vfio.h>
 #include <asm/idals.h>
@@ -235,7 +236,6 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
 			   unsigned long n)
 {
 	struct pfn_array pa = {0};
-	u64 from;
 	int i, ret;
 	unsigned long l, m;
 
@@ -251,7 +251,9 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
 
 	l = n;
 	for (i = 0; i < pa.pa_nr; i++) {
-		from = pa.pa_pfn[i] << PAGE_SHIFT;
+		struct page *page = pfn_to_page(pa.pa_pfn[i]);
+		void *from = kmap_local_page(page);
+
 		m = PAGE_SIZE;
 		if (i == 0) {
 			from += iova & (PAGE_SIZE - 1);
@@ -259,7 +261,8 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
 		}
 
 		m = min(l, m);
-		memcpy(to + (n - l), (void *)from, m);
+		memcpy(to + (n - l), from, m);
+		kunmap_local(from);
 
 		l -= m;
 		if (l == 0)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-16 23:52 ` Nicolin Chen
@ 2022-06-16 23:52   ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-16 23:52 UTC (permalink / raw)
  To: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian
  Cc: jchrist, kvm, linux-doc, linux-kernel, linux-s390, intel-gvt-dev,
	intel-gfx, dri-devel

Most of the callers of vfio_pin_pages() want "struct page *", and the
low-level mm code that pins the pages returns a list of "struct page *"
too, so there is no gain in converting "struct page *" to a PFN in
between.

Replace the output parameter phys_pfn list with a phys_page list to
simplify the callers. This also allows us to replace the
vfio_iommu_type1 implementation with a more efficient one later.

For now, update vfio_iommu_type1 to fit the new parameter as well.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 .../driver-api/vfio-mediated-device.rst       |  2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c              | 19 ++++++-------------
 drivers/s390/cio/vfio_ccw_cp.c                | 19 +++++++++----------
 drivers/s390/crypto/vfio_ap_ops.c             |  7 ++++---
 drivers/vfio/vfio.c                           |  8 ++++----
 drivers/vfio/vfio.h                           |  2 +-
 drivers/vfio/vfio_iommu_type1.c               | 19 +++++++++++--------
 include/linux/vfio.h                          |  2 +-
 8 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index d28f8bcbfbc6..070e51bb0bb6 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -263,7 +263,7 @@ The following APIs are provided for translating user pfn to host pfn in a VFIO
 driver::
 
 	int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
-				  int npage, int prot, unsigned long *phys_pfn);
+				  int npage, int prot, struct page **phys_page);
 
 	int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
 				    int npage);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index c9bdc3901f1e..669432999676 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -243,7 +243,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		unsigned long size, struct page **page)
 {
-	unsigned long base_pfn = 0;
+	struct page *base_page = NULL;
 	int total_pages;
 	int npage;
 	int ret;
@@ -255,26 +255,19 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 	 */
 	for (npage = 0; npage < total_pages; npage++) {
 		unsigned long cur_iova = (gfn + npage) << PAGE_SHIFT;
-		unsigned long pfn;
+		struct page *cur_page;
 
 		ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
-				     IOMMU_READ | IOMMU_WRITE, &pfn);
+				     IOMMU_READ | IOMMU_WRITE, &cur_page);
 		if (ret != 1) {
 			gvt_vgpu_err("vfio_pin_pages failed for iova 0x%lx, ret %d\n",
 				     cur_iova, ret);
 			goto err;
 		}
 
-		if (!pfn_valid(pfn)) {
-			gvt_vgpu_err("pfn 0x%lx is not mem backed\n", pfn);
-			npage++;
-			ret = -EFAULT;
-			goto err;
-		}
-
 		if (npage == 0)
-			base_pfn = pfn;
-		else if (base_pfn + npage != pfn) {
+			base_page = cur_page;
+		else if (base_page + npage != cur_page) {
 			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
 			npage++;
@@ -282,7 +275,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		}
 	}
 
-	*page = pfn_to_page(base_pfn);
+	*page = base_page;
 	return 0;
 err:
 	gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 12cbe66721af..92be288dff74 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -24,8 +24,8 @@ struct pfn_array {
 	unsigned long		pa_iova;
 	/* Array that stores PFNs of the pages need to pin. */
 	unsigned long		*pa_iova_pfn;
-	/* Array that receives PFNs of the pages pinned. */
-	unsigned long		*pa_pfn;
+	/* Array that receives the pinned pages. */
+	struct page		**pa_page;
 	/* Number of pages pinned from @pa_iova. */
 	int			pa_nr;
 };
@@ -73,19 +73,19 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
 
 	pa->pa_iova_pfn = kcalloc(pa->pa_nr,
 				  sizeof(*pa->pa_iova_pfn) +
-				  sizeof(*pa->pa_pfn),
+				  sizeof(*pa->pa_page),
 				  GFP_KERNEL);
 	if (unlikely(!pa->pa_iova_pfn)) {
 		pa->pa_nr = 0;
 		return -ENOMEM;
 	}
-	pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
+	pa->pa_page = (struct page **)pa->pa_iova_pfn + pa->pa_nr;
 
 	pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
-	pa->pa_pfn[0] = -1ULL;
+	pa->pa_page[0] = NULL;
 	for (i = 1; i < pa->pa_nr; i++) {
 		pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
-		pa->pa_pfn[i] = -1ULL;
+		pa->pa_page[i] = NULL;
 	}
 
 	return 0;
@@ -147,7 +147,7 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
 
 		ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
 				     IOMMU_READ | IOMMU_WRITE,
-				     &pa->pa_pfn[pinned]);
+				     &pa->pa_page[pinned]);
 		if (ret < 0) {
 			goto err_out;
 		} else if (ret > 0 && ret != npage) {
@@ -200,7 +200,7 @@ static inline void pfn_array_idal_create_words(
 	 */
 
 	for (i = 0; i < pa->pa_nr; i++)
-		idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;
+		idaws[i] = page_to_phys(pa->pa_page[i]);
 
 	/* Adjust the first IDAW, since it may not start on a page boundary */
 	idaws[0] += pa->pa_iova & (PAGE_SIZE - 1);
@@ -251,8 +251,7 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
 
 	l = n;
 	for (i = 0; i < pa.pa_nr; i++) {
-		struct page *page = pfn_to_page(pa.pa_pfn[i]);
-		void *from = kmap_local_page(page);
+		void *from = kmap_local_page(pa.pa_page[i]);
 
 		m = PAGE_SIZE;
 		if (i == 0) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8a2018ab3cf0..e73bdb57bc90 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -243,9 +243,10 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
 	struct ap_qirq_ctrl aqic_gisa = {};
 	struct ap_queue_status status = {};
 	struct kvm_s390_gisa *gisa;
+	struct page *h_page;
 	int nisc;
 	struct kvm *kvm;
-	unsigned long g_pfn, h_pfn;
+	unsigned long g_pfn;
 	phys_addr_t h_nib;
 	int ret;
 
@@ -259,7 +260,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
 	}
 
 	ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
-			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
+			     IOMMU_READ | IOMMU_WRITE, &h_page);
 	switch (ret) {
 	case 1:
 		break;
@@ -275,7 +276,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
 	kvm = q->matrix_mdev->kvm;
 	gisa = kvm->arch.gisa_int.origin;
 
-	h_nib = (h_pfn << PAGE_SHIFT) | (nib & ~PAGE_MASK);
+	h_nib = page_to_phys(h_page) | (nib & ~PAGE_MASK);
 	aqic_gisa.gisc = isc;
 
 	nisc = kvm_s390_gisc_register(kvm, isc);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index e8dbb0122e20..7eee8048e231 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1917,18 +1917,18 @@ EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
  * @npage [in]   : count of pages to be pinned.  This count should not
  *		   be greater VFIO_PIN_PAGES_MAX_ENTRIES.
  * @prot [in]    : protection flags
- * @phys_pfn[out]: array of host PFNs
+ * @phys_page[out]: array of host pages
  * Return error or number of pages pinned.
  */
 int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
-		   int npage, int prot, unsigned long *phys_pfn)
+		   int npage, int prot, struct page **phys_page)
 {
 	struct vfio_container *container;
 	struct vfio_group *group = device->group;
 	struct vfio_iommu_driver *driver;
 	int ret;
 
-	if (!phys_pfn || !npage || !vfio_assert_device_open(device))
+	if (!phys_page || !npage || !vfio_assert_device_open(device))
 		return -EINVAL;
 
 	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1943,7 +1943,7 @@ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
 	if (likely(driver && driver->ops->pin_pages))
 		ret = driver->ops->pin_pages(container->iommu_data,
 					     group->iommu_group, iova,
-					     npage, prot, phys_pfn);
+					     npage, prot, phys_page);
 	else
 		ret = -ENOTTY;
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 6bd5304ee0b7..758a0a91a066 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -52,7 +52,7 @@ struct vfio_iommu_driver_ops {
 				     struct iommu_group *group,
 				     dma_addr_t user_iova,
 				     int npage, int prot,
-				     unsigned long *phys_pfn);
+				     struct page **phys_page);
 	int		(*unpin_pages)(void *iommu_data,
 				       dma_addr_t user_iova, int npage);
 	int		(*register_notifier)(void *iommu_data,
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d027ed8441a9..841b1803e313 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -830,7 +830,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 				      struct iommu_group *iommu_group,
 				      dma_addr_t user_iova,
 				      int npage, int prot,
-				      unsigned long *phys_pfn)
+				      struct page **phys_page)
 {
 	struct vfio_iommu *iommu = iommu_data;
 	struct vfio_iommu_group *group;
@@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	bool do_accounting;
 	dma_addr_t iova;
 
-	if (!iommu || !phys_pfn)
+	if (!iommu || !phys_page)
 		return -EINVAL;
 
 	/* Supported for v2 version only */
@@ -879,6 +879,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	do_accounting = list_empty(&iommu->domain_list);
 
 	for (i = 0; i < npage; i++) {
+		unsigned long phys_pfn;
 		struct vfio_pfn *vpfn;
 
 		iova = user_iova + PAGE_SIZE * i;
@@ -895,23 +896,25 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 		vpfn = vfio_iova_get_vfio_pfn(dma, iova);
 		if (vpfn) {
-			phys_pfn[i] = vpfn->pfn;
+			phys_page[i] = pfn_to_page(vpfn->pfn);
 			continue;
 		}
 
 		remote_vaddr = dma->vaddr + (iova - dma->iova);
-		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
+		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn,
 					     do_accounting);
 		if (ret)
 			goto pin_unwind;
 
-		ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
+		ret = vfio_add_to_pfn_list(dma, iova, phys_pfn);
 		if (ret) {
-			if (put_pfn(phys_pfn[i], dma->prot) && do_accounting)
+			if (put_pfn(phys_pfn, dma->prot) && do_accounting)
 				vfio_lock_acct(dma, -1, true);
 			goto pin_unwind;
 		}
 
+		phys_page[i] = pfn_to_page(phys_pfn);
+
 		if (iommu->dirty_page_tracking) {
 			unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
 
@@ -934,14 +937,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	goto pin_done;
 
 pin_unwind:
-	phys_pfn[i] = 0;
+	phys_page[i] = NULL;
 	for (j = 0; j < i; j++) {
 		dma_addr_t iova;
 
 		iova = user_iova + PAGE_SIZE * j;
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
 		vfio_unpin_page_external(dma, iova, do_accounting);
-		phys_pfn[j] = 0;
+		phys_page[j] = NULL;
 	}
 pin_done:
 	mutex_unlock(&iommu->lock);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 99c3bf52c4da..7bc18802bf39 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -148,7 +148,7 @@ extern bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
 #define VFIO_PIN_PAGES_MAX_ENTRIES	(PAGE_SIZE/sizeof(unsigned long))
 
 extern int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
-			  int npage, int prot, unsigned long *phys_pfn);
+			  int npage, int prot, struct page **phys_page);
 extern int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
 			    int npage);
 extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 3/6] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
  2022-06-16 23:52   ` Nicolin Chen
@ 2022-06-17  8:42     ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2022-06-17  8:42 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Thu, Jun 16, 2022 at 04:52:09PM -0700, Nicolin Chen wrote:
> +	ret = vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT, npage);
> +	drm_WARN_ON(&i915->drm, ret != npage);

The shifting of gfn seems to happen both here and in the callers.

Also this is the only caller that does anything with the vfio_unpin_pages
return value.  Given that you touch the API here we might as well
not return any value, and turn the debug checks that can return errors
into WARN_ON_ONCE calls in the vfio/iommu_type1 code.
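
Untested sketch of what I mean (the container/driver lookup is assumed
to stay the same as in vfio_pin_pages() today):

	void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
			      int npage)
	{
		struct vfio_container *container;
		struct vfio_iommu_driver *driver;

		if (WARN_ON_ONCE(npage <= 0 ||
				 npage > VFIO_PIN_PAGES_MAX_ENTRIES))
			return;
		if (WARN_ON_ONCE(!vfio_assert_device_open(device)))
			return;

		container = device->group->container;
		driver = container->iommu_driver;
		if (WARN_ON_ONCE(!driver || !driver->ops->unpin_pages))
			return;

		driver->ops->unpin_pages(container->iommu_data, iova, npage);
	}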

> +extern int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
>  			  int npage, int prot, unsigned long *phys_pfn);
> -extern int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> +extern int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
>  			    int npage);

This will clash with the extern removal patch that Alex has sent.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-16 23:52   ` Nicolin Chen
@ 2022-06-17  8:44     ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2022-06-17  8:44 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Thu, Jun 16, 2022 at 04:52:11PM -0700, Nicolin Chen wrote:
> The pinned PFN list returned from vfio_pin_pages() is simply converted
> to addresses without protection, so direct access via memcpy() will
> crash on S390 if the PFN is an IO PFN. Instead, the pages should be
> touched using kmap_local_page().

I don't see how this helps.  kmap_local_page only works for either
pages in the kernel direct map or highmem, but not for memory that needs
to be ioremapped.  And there is no highmem on s390.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-16 23:52   ` Nicolin Chen
@ 2022-06-17  8:54     ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2022-06-17  8:54 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

There is a bunch of code and comments in the iommu type1 code that
suggest we can pin memory that is not page backed.

>  int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> +		   int npage, int prot, struct page **phys_page)

I don't think phys_page makes much sense as an argument name.
I'd just call this pages.

> +			phys_page[i] = pfn_to_page(vpfn->pfn);

Please store the actual page pointer in the vfio_pfn structure.
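
I.e. roughly this, as a sketch (the remaining fields stay as they are
today):

	struct vfio_pfn {
		struct rb_node		node;
		dma_addr_t		iova;	/* Device address */
		struct page		*page;	/* Pinned host page, replacing the bare pfn */
		/* ... other fields unchanged ... */
	};

with the pfn_to_page() done once at pin time, so the lookup above can
just copy vpfn->page.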

>  		remote_vaddr = dma->vaddr + (iova - dma->iova);
> -		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
> +		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn,
>  					     do_accounting);

Please just return the actual page from vaddr_get_pfns through
vfio_pin_pages_remote and vfio_pin_page_external, maybe even as a prep
patch as that is a fair amount of churn.
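
Roughly these prototypes, as a sketch of the direction (the pagep
naming is just an example):

	static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
				  long npages, int prot, struct page **pages);
	static int vfio_pin_page_external(struct vfio_dma *dma,
					  unsigned long vaddr,
					  struct page **pagep,
					  bool do_accounting);

so the page pointer flows through instead of a pfn that gets converted
back and forth.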

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 3/6] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
  2022-06-17  8:42     ` [Intel-gfx] " Christoph Hellwig
@ 2022-06-17 21:57       ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-17 21:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Fri, Jun 17, 2022 at 01:42:42AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 16, 2022 at 04:52:09PM -0700, Nicolin Chen wrote:
> > +	ret = vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT, npage);
> > +	drm_WARN_ON(&i915->drm, ret != npage);
> 
> The shifting of gfn seems to happen both here and in the callers.
> 
> Also this is the only caller that does anything with the vfio_unpin_pages
> return value.  Given that you touch the API here we might as well
> not return any value, and turn the debug checks that can return errors
> into WARN_ON_ONCE calls in the vfio/iommu_type1 code.

Thanks for the suggestion. I will do that.

> > +extern int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> >  			  int npage, int prot, unsigned long *phys_pfn);
> > -extern int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> > +extern int vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
> >  			    int npage);
> 
> This will clash with the extern removal patch that Alex has sent.

And I will rebase on top of Alex's change.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-17  8:44     ` [Intel-gfx] " Christoph Hellwig
@ 2022-06-17 21:58       ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-17 21:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Fri, Jun 17, 2022 at 01:44:30AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 16, 2022 at 04:52:11PM -0700, Nicolin Chen wrote:
> > The pinned PFN list returned from vfio_pin_pages() is simply converted
> > to addresses without protection, so direct access via memcpy() will
> > crash on S390 if the PFN is an IO PFN. Instead, the pages should be
> > touched using kmap_local_page().
> 
> I don't see how this helps.  kmap_local_page only works for either
> pages in the kernel direct map or highmem, but not for memory that needs
> to be ioremapped.  And there is no highmem on s390.

Oh.. I will discuss this with Jason once he's back, to see if we should
simply drop this change. Thanks!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-17  8:54     ` [Intel-gfx] " Christoph Hellwig
@ 2022-06-17 22:06       ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-17 22:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> There is a bunch of code and comments in the iommu type1 code that
> suggest we can pin memory that is not page backed.

Would you mind explaining the use case for pinning memory that
isn't page backed? And do we have such a use case so far?

> >  int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> > +		   int npage, int prot, struct page **phys_page)
> 
> I don't think phys_page makes much sense as an argument name.
> I'd just call this pages.

OK.

> > +			phys_page[i] = pfn_to_page(vpfn->pfn);
> 
> Please store the actual page pointer in the vfio_pfn structure.

OK.

> >  		remote_vaddr = dma->vaddr + (iova - dma->iova);
> > -		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
> > +		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn,
> >  					     do_accounting);
> 
> Please just return the actual page from vaddr_get_pfns through
> vfio_pin_pages_remote and vfio_pin_page_external, maybe even as a prep
> patch as that is a fair amount of churn.

I can do that. I tried once, but there were just too many changes
inside the type1 code that felt like a chain reaction. If we plan to
eventually replace it with the IOMMUFD implementation, these changes in
type1 might not be necessary, I thought.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-17 22:06       ` Nicolin Chen
@ 2022-06-19  6:18         ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2022-06-19  6:18 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Christoph Hellwig, kwankhede, corbet, hca, gor, agordeev,
	borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, jgg, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Fri, Jun 17, 2022 at 03:06:25PM -0700, Nicolin Chen wrote:
> On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> > There is a bunch of code and comments in the iommu type1 code that
> > suggest we can pin memory that is not page backed.
> 
> Would you mind explaining the use case for pinning memory that
> isn't page backed? And do we have such a use case so far?

Sorry, I should have deleted that sentence.  I wrote it before spending
some more time digging through the code, and all the locked memory has
page backing.  There just seem to be a lot of leftover checks for whether
a pfn is page backed, mostly due to the pfn-based calling conventions.

> I can do that. I tried once, but there were just too many changes
> inside the type1 code that felt like a chain reaction. If we plan to
> eventually replace it with the IOMMUFD implementation, these changes in
> type1 might not be necessary, I thought.

To make sure we keep full compatibility, I suspect the final iommufd
implementation has to be gradually created from the existing code
anyway.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-19  6:18         ` [Intel-gfx] " Christoph Hellwig
@ 2022-06-19  6:41           ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-19  6:41 UTC (permalink / raw)
  To: Christoph Hellwig, jgg
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Sat, Jun 18, 2022 at 11:18:17PM -0700, Christoph Hellwig wrote:
> > > There is a bunch of code and comments in the iommu type1 code that
> > > suggest we can pin memory that is not page backed.
> > 
> > Would you mind explaining the use case for pinning memory that
> > isn't page backed? And do we have such a use case so far?
> 
> Sorry, I should have deleted that sentence.  I wrote it before spending
> some more time digging through the code, and all the locked memory has
> page backing.  There just seem to be a lot of leftover checks for whether
> a pfn is page backed, mostly due to the pfn-based calling conventions.

OK. We'd be safe to move on then. Thanks for the clarification.

> > I can do that. I tried once, but there were just too many changes
> > inside the type1 code that felt like a chain reaction. If we plan to
> > eventually replace it with the IOMMUFD implementation, these changes in
> > type1 might not be necessary, I thought.
> 
> To make sure we keep full compatibility, I suspect the final iommufd
> implementation has to be gradually created from the existing code
> anyway.

Hmm. I think Jason can give some insight. Meanwhile, I will try
to add a patch to the type1 code, in case we end up with what you
suspected.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-17  8:44     ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-06-20  2:57       ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-06-20  2:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nicolin Chen, kwankhede, corbet, hca, gor, agordeev, borntraeger,
	svens, zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, tvrtko.ursulin, airlied, daniel, farman, mjrosato,
	pasic, vneethv, oberpar, freude, akrowiak, jjherne,
	alex.williamson, cohuck, kevin.tian, jchrist, kvm, linux-doc,
	linux-kernel, linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Fri, Jun 17, 2022 at 01:44:30AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 16, 2022 at 04:52:11PM -0700, Nicolin Chen wrote:
> > The pinned PFN list returned from vfio_pin_pages() is simply converted
> > to addresses without protection, so direct access via memcpy() will
> > crash on S390 if the PFN is an IO PFN. Instead, the pages should be
> > touched using kmap_local_page().
> 
> I don't see how this helps.  kmap_local_page only works for either
> pages in the kernel direct map or highmem, but not for memory that needs
> to be ioremapped.  And there is no highmem on s390.

The remark about io memory is because on s390 memcpy() will crash even
on ioremapped memory; you have to use memcpy_to/fromio(), which use
the special s390 io access instructions.
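
I.e. for ioremapped memory you would have to do something like this
(sketch, with phys/to/len standing in for whatever the caller has):

	void __iomem *vaddr = ioremap(phys, len);

	if (!vaddr)
		return -ENOMEM;
	/* uses the s390 PCI load instructions instead of a plain copy */
	memcpy_fromio(to, vaddr, len);
	/* a plain memcpy(to, (void __force *)vaddr, len) would fault here */
	iounmap(vaddr);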

This helps because we now block io memory from ever getting into these
call paths. I'm pretty sure this is a serious security bug, but I would
let the IBM folks comment as I don't know it all that well.

As for the kmap, I thought it was standard practice even if the page is
not highmem? Aren't people trying to use this for other security
stuff these days?

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-17  8:54     ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-06-20  3:00       ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-06-20  3:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nicolin Chen, kwankhede, corbet, hca, gor, agordeev, borntraeger,
	svens, zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, tvrtko.ursulin, airlied, daniel, farman, mjrosato,
	pasic, vneethv, oberpar, freude, akrowiak, jjherne,
	alex.williamson, cohuck, kevin.tian, jchrist, kvm, linux-doc,
	linux-kernel, linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> There is a bunch of code and comments in the iommu type1 code that
> suggest we can pin memory that is not page backed.  

AFAIK you can.. The whole follow_pte() mechanism allows raw PFNs to be
loaded into the type1 maps and the pin API will happily return
them. This happens in almost every qemu scenario because PCI MMIO BAR
memory ends up routed down this path.
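
(A rough sketch of the distinction, using only core mm helpers; the function
name is made up. Raw PFNs loaded via follow_pte(), e.g. PCI BAR mappings,
typically have no memmap entry, which is what pfn_valid() reports:)

#include <linux/mm.h>

/* Hypothetical helper: only hand back a page for PFNs that actually
 * have a struct page; BAR/VM_PFNMAP PFNs will typically fail this. */
static struct page *example_pfn_to_backing_page(unsigned long pfn)
{
        if (!pfn_valid(pfn))
                return NULL;
        return pfn_to_page(pfn);
}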

Thanks,
Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-20  3:00       ` Jason Gunthorpe
@ 2022-06-20  5:51         ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2022-06-20  5:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Nicolin Chen, kwankhede, corbet, hca, gor,
	agordeev, borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Mon, Jun 20, 2022 at 12:00:46AM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> > There is a bunch of code and comments in the iommu type1 code that
> > suggest we can pin memory that is not page backed.  
> 
> AFAIK you can.. The whole follow_pte() mechanism allows raw PFNs to be
> loaded into the type1 maps and the pin API will happily return
> them. This happens in almost every qemu scenario because PCI MMIO BAR
> memory ends up routed down this path.

Indeed, my read wasn't deep enough.  Which means that we can't change
the ->pin_pages interface to return a struct pages array, as we don't
have one for those.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-20  2:57       ` Jason Gunthorpe
@ 2022-06-20  6:32         ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2022-06-20  6:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Nicolin Chen, kwankhede, corbet, hca, gor,
	agordeev, borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Sun, Jun 19, 2022 at 11:57:26PM -0300, Jason Gunthorpe wrote:
> The remark about io memory is because on s390 memcpy() will crash even
> on ioremapped memory; you have to use memcpy_to/fromio(), which
> uses the special s390 io access instructions.

Yes.  The same is true for various other architectures, including arm64
under the right circumstances.

> This helps because we now block io memory from ever getting into these
> call paths. I'm pretty sure this is a serious security bug, but would
> let the IBM folks remark as I don't know it all that well..

Prevent as in crash when trying to convert it to a page?

> As for the kmap, I thought it was standard practice even if it is a
> non-highmem? Aren't people trying to use this for other security
> stuff these days?

Ira has been looking into the protection keys, although they don't
apply to s390.  Either way I don't object to using kmap, but the
commit log doesn't make much sense to me.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-20  5:51         ` [Intel-gfx] " Christoph Hellwig
@ 2022-06-20  6:37           ` Christoph Hellwig
  -1 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2022-06-20  6:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Nicolin Chen, kwankhede, corbet, hca, gor,
	agordeev, borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Sun, Jun 19, 2022 at 10:51:47PM -0700, Christoph Hellwig wrote:
> On Mon, Jun 20, 2022 at 12:00:46AM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> > > There is a bunch of code and comments in the iommu type1 code that
> > > suggest we can pin memory that is not page backed.  
> > 
> > AFAIK you can.. The whole follow_pte() mechanism allows raw PFNs to be
> > loaded into the type1 maps and the pin API will happily return
> > them. This happens in almost every qemu scenario because PCI MMIO BAR
> > memory ends up routed down this path.
> 
> Indeed, my read wasn't deep enough.  Which means that we can't change
> the ->pin_pages interface to return a struct pages array, as we don't
> have one for those.

Actually.  gvt requires a struct page, and both s390 seem to require
normal non-I/O, non-remapped kernel pointers.  So I think for the
vfio_pin_pages we can assume that we only want page backed memory and
remove the follow_fault_pfn case entirely.   But we'll probably have
to keep it for the vfio_iommu_replay case that is not tied to
emulated IOMMU drivers.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 1/6] vfio/ap: Pass in physical address of ind to ap_aqic()
  2022-06-16 23:52   ` Nicolin Chen
@ 2022-06-20 10:00     ` Harald Freudenberger
  -1 siblings, 0 replies; 75+ messages in thread
From: Harald Freudenberger @ 2022-06-20 10:00 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, akrowiak, jjherne, alex.williamson, cohuck,
	jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On 2022-06-17 01:52, Nicolin Chen wrote:
> The ap_aqic() is called by vfio_ap_irq_enable() where it passes in a
> virt value that's cast from a physical address "h_nib". Inside the
> ap_aqic(), it does virt_to_phys() again.
> 
> Since ap_aqic() needs a physical address, let's just pass in a pa of
> ind directly. So change the "ind" to "pa_ind".
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  arch/s390/include/asm/ap.h        | 6 +++---
>  drivers/s390/crypto/ap_queue.c    | 2 +-
>  drivers/s390/crypto/vfio_ap_ops.c | 7 ++++---
>  3 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/s390/include/asm/ap.h b/arch/s390/include/asm/ap.h
> index b515cfa62bd9..f508f5025e38 100644
> --- a/arch/s390/include/asm/ap.h
> +++ b/arch/s390/include/asm/ap.h
> @@ -227,13 +227,13 @@ struct ap_qirq_ctrl {
>   * ap_aqic(): Control interruption for a specific AP.
>   * @qid: The AP queue number
>   * @qirqctrl: struct ap_qirq_ctrl (64 bit value)
> - * @ind: The notification indicator byte
> + * @pa_ind: Physical address of the notification indicator byte
>   *
>   * Returns AP queue status.
>   */
>  static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
>  					     struct ap_qirq_ctrl qirqctrl,
> -					     void *ind)
> +					     phys_addr_t pa_ind)
>  {
>  	unsigned long reg0 = qid | (3UL << 24);  /* fc 3UL is AQIC */
>  	union {
> @@ -241,7 +241,7 @@ static inline struct ap_queue_status 
> ap_aqic(ap_qid_t qid,
>  		struct ap_qirq_ctrl qirqctrl;
>  		struct ap_queue_status status;
>  	} reg1;
> -	unsigned long reg2 = virt_to_phys(ind);
> +	unsigned long reg2 = pa_ind;
> 
>  	reg1.qirqctrl = qirqctrl;
> 
> diff --git a/drivers/s390/crypto/ap_queue.c 
> b/drivers/s390/crypto/ap_queue.c
> index c48b0db824e3..a32457b4cbb8 100644
> --- a/drivers/s390/crypto/ap_queue.c
> +++ b/drivers/s390/crypto/ap_queue.c
> @@ -34,7 +34,7 @@ static int ap_queue_enable_irq(struct ap_queue *aq, 
> void *ind)
> 
>  	qirqctrl.ir = 1;
>  	qirqctrl.isc = AP_ISC;
> -	status = ap_aqic(aq->qid, qirqctrl, ind);
> +	status = ap_aqic(aq->qid, qirqctrl, virt_to_phys(ind));
>  	switch (status.response_code) {
>  	case AP_RESPONSE_NORMAL:
>  	case AP_RESPONSE_OTHERWISE_CHANGED:
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index a7d2a95796d3..bb869b28cebd 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -154,7 +154,7 @@ static struct ap_queue_status
> vfio_ap_irq_disable(struct vfio_ap_queue *q)
>  	int retries = 5;
> 
>  	do {
> -		status = ap_aqic(q->apqn, aqic_gisa, NULL);
> +		status = ap_aqic(q->apqn, aqic_gisa, 0);
>  		switch (status.response_code) {
>  		case AP_RESPONSE_OTHERWISE_CHANGED:
>  		case AP_RESPONSE_NORMAL:
> @@ -245,7 +245,8 @@ static struct ap_queue_status
> vfio_ap_irq_enable(struct vfio_ap_queue *q,
>  	struct kvm_s390_gisa *gisa;
>  	int nisc;
>  	struct kvm *kvm;
> -	unsigned long h_nib, g_pfn, h_pfn;
> +	unsigned long g_pfn, h_pfn;
> +	phys_addr_t h_nib;
>  	int ret;
> 
>  	/* Verify that the notification indicator byte address is valid */
> @@ -290,7 +291,7 @@ static struct ap_queue_status
> vfio_ap_irq_enable(struct vfio_ap_queue *q,
>  	aqic_gisa.ir = 1;
>  	aqic_gisa.gisa = (uint64_t)gisa >> 4;
> 
> -	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
> +	status = ap_aqic(q->apqn, aqic_gisa, h_nib);
>  	switch (status.response_code) {
>  	case AP_RESPONSE_NORMAL:
>  		/* See if we did clear older IRQ configuration */
Add my Reviewed-By: Harald Freudenberger <freude@linux.ibm.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-20  6:37           ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-06-20 15:36             ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-06-20 15:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nicolin Chen, kwankhede, corbet, hca, gor, agordeev, borntraeger,
	svens, zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, tvrtko.ursulin, airlied, daniel, farman, mjrosato,
	pasic, vneethv, oberpar, freude, akrowiak, jjherne,
	alex.williamson, cohuck, kevin.tian, jchrist, kvm, linux-doc,
	linux-kernel, linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Sun, Jun 19, 2022 at 11:37:47PM -0700, Christoph Hellwig wrote:
> On Sun, Jun 19, 2022 at 10:51:47PM -0700, Christoph Hellwig wrote:
> > On Mon, Jun 20, 2022 at 12:00:46AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> > > > There is a bunch of code and comments in the iommu type1 code that
> > > > suggest we can pin memory that is not page backed.  
> > > 
> > > AFAIK you can.. The whole follow_pte() mechanism allows raw PFNs to be
> > > loaded into the type1 maps and the pin API will happily return
> > > them. This happens in almost every qemu scenario because PCI MMIO BAR
> > > memory ends up routed down this path.
> > 
> > Indeed, my read wasn't deep enough.  Which means that we can't change
> > the ->pin_pages interface to return a struct pages array, as we don't
> > have one for those.
> 
> Actually.  gvt requires a struct page, and both s390 seem to require
> normal non-I/O, non-remapped kernel pointers.  So I think for the
> vfio_pin_pages we can assume that we only want page backed memory and
> remove the follow_fault_pfn case entirely.   But we'll probably have
> to keep it for the vfio_iommu_replay case that is not tied to
> emulated IOMMU drivers.

Right, that is my thinking - since all drivers actually need a struct
page we should have the API return a working struct page and have the
VFIO layer isolate the non-struct page stuff so it never leaks out of
this API.
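
(A hedged sketch of what that would look like from a driver, assuming the
post-series prototype from the cover letter (IOVA in, struct page out), a
copy that stays within one page, and made-up names and error handling:)

#include <linux/highmem.h>
#include <linux/iommu.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/vfio.h>

static int example_copy_to_guest(struct vfio_device *vdev, dma_addr_t iova,
                                 const void *src, size_t len)
{
        struct page *page;
        void *vaddr;
        int ret;

        ret = vfio_pin_pages(vdev, iova, 1, IOMMU_READ | IOMMU_WRITE, &page);
        if (ret != 1)
                return ret < 0 ? ret : -EINVAL;

        /* The API now only ever hands back real struct pages, so a
         * normal kernel mapping and memcpy() are always safe here. */
        vaddr = kmap_local_page(page);
        memcpy(vaddr + offset_in_page(iova), src, len);
        kunmap_local(vaddr);

        vfio_unpin_pages(vdev, iova, 1);
        return 0;
}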

This nicely fixes the various problems in all the drivers if io memory
comes down this path.

It is also why doing too much surgery deeper into type 1 probably
isn't too worthwhile as it still needs raw pfns in its data
structures for iommu modes.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-20  6:32         ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-06-20 15:39           ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-06-20 15:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nicolin Chen, kwankhede, corbet, hca, gor, agordeev, borntraeger,
	svens, zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, tvrtko.ursulin, airlied, daniel, farman, mjrosato,
	pasic, vneethv, oberpar, freude, akrowiak, jjherne,
	alex.williamson, cohuck, kevin.tian, jchrist, kvm, linux-doc,
	linux-kernel, linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Sun, Jun 19, 2022 at 11:32:07PM -0700, Christoph Hellwig wrote:

> > This helps because we now block io memory from ever getting into these
> > call paths. I'm pretty sure this is a serious security bug, but would
> > let the IBM folks remark as I don't know it all that well..
> 
> Prevent as in crash when trying to convert it to a page?

That or when calling memcpy() on an IO memory PFN that the guest
passed into the dma s/g list the ccw driver is processing.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 1/6] vfio/ap: Pass in physical address of ind to ap_aqic()
  2022-06-20 10:00     ` Harald Freudenberger
  (?)
@ 2022-06-21 21:01       ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-21 21:01 UTC (permalink / raw)
  To: Harald Freudenberger
  Cc: mjrosato, linux-doc, airlied, kevin.tian, dri-devel,
	linux-kernel, kwankhede, vneethv, agordeev, linux-s390, kvm,
	corbet, pasic, jgg, borntraeger, intel-gfx, zhi.a.wang, akrowiak,
	farman, jchrist, gor, hca, alex.williamson, rodrigo.vivi,
	intel-gvt-dev, jjherne, tvrtko.ursulin, cohuck, oberpar, svens

On Mon, Jun 20, 2022 at 12:00:53PM +0200, Harald Freudenberger wrote:
> On 2022-06-17 01:52, Nicolin Chen wrote:
> > The ap_aqic() is called by vfio_ap_irq_enable() where it passes in a
> > virt value that's cast from a physical address "h_nib". Inside the
> > ap_aqic(), it does virt_to_phys() again.
> > 
> > Since ap_aqic() needs a physical address, let's just pass in a pa of
> > ind directly. So change the "ind" to "pa_ind".
> > 
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

> Add my Reviewed-By: Harald Freudenberger <freude@linux.ibm.com>

Will do. Thanks!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-20  6:32         ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-06-21 21:21           ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-21 21:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mjrosato, linux-doc, airlied, kevin.tian, dri-devel,
	linux-kernel, kwankhede, vneethv, agordeev, linux-s390, kvm,
	corbet, pasic, Jason Gunthorpe, borntraeger, intel-gfx,
	zhi.a.wang, jjherne, farman, jchrist, gor, hca, alex.williamson,
	freude, rodrigo.vivi, intel-gvt-dev, akrowiak, tvrtko.ursulin,
	cohuck, oberpar, svens

On Sun, Jun 19, 2022 at 11:32:07PM -0700, Christoph Hellwig wrote:
> On Sun, Jun 19, 2022 at 11:57:26PM -0300, Jason Gunthorpe wrote:
> > The remark about io memory is because on s390 memcpy() will crash even
> > on ioremapped memory; you have to use memcpy_to/fromio(), which
> > uses the special s390 io access instructions.
> 
> > Yes.  The same is true for various other architectures, including arm64
> under the right circumstances.
> 
> > This helps because we now block io memory from ever getting into these
> > call paths. I'm pretty sure this is a serious security bug, but would
> > let the IBM folks remark as I don't know it all that well..
> 
> Prevent as in crash when trying to convert it to a page?
> 
> > As for the kmap, I thought it was standard practice even if it is a
> > non-highmem? Aren't people trying to use this for other security
> > stuff these days?
> 
> > Ira has been looking into the protection keys, although they don't
> apply to s390.  Either way I don't object to using kmap, but the
> commit log doesn't make much sense to me.

How about the updated commit log below? Thanks.

The pinned PFN list returned from vfio_pin_pages() is converted using
page_to_pfn(), so direct access via memcpy() will crash on S390 if the
PFN is an IO PFN, as we would have to use memcpy_to/fromio(), which
uses the special s390 IO access instructions.

As a standard practice for security purposes, add kmap_local_page() to
block any IO memory from ever getting into this call path.
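
(Roughly, the before/after pattern this commit log describes, sketched with
made-up names rather than the actual vfio_ccw hunk:)

#include <linux/highmem.h>
#include <linux/string.h>

/* Before (conceptually): raw PFN arithmetic, which blows up if the PFN
 * is an IO PFN on s390:
 *
 *         memcpy(phys_to_virt(pfn << PAGE_SHIFT) + offset, src, len);
 *
 * After: go through the struct page and a local kernel mapping. */
static void example_copy_to_pinned(struct page *page, unsigned int offset,
                                   const void *src, size_t len)
{
        void *to = kmap_local_page(page);

        memcpy(to + offset, src, len);
        kunmap_local(to);
}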

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
  2022-06-20 15:36             ` Jason Gunthorpe
  (?)
@ 2022-06-21 21:47               ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-21 21:47 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Mon, Jun 20, 2022 at 12:36:28PM -0300, Jason Gunthorpe wrote:
> On Sun, Jun 19, 2022 at 11:37:47PM -0700, Christoph Hellwig wrote:
> > On Sun, Jun 19, 2022 at 10:51:47PM -0700, Christoph Hellwig wrote:
> > > On Mon, Jun 20, 2022 at 12:00:46AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> > > > > There is a bunch of code and comments in the iommu type1 code that
> > > > > suggest we can pin memory that is not page backed.  
> > > > 
> > > > AFAIK you can.. The whole follow_pte() mechanism allows raw PFNs to be
> > > > loaded into the type1 maps and the pin API will happily return
> > > > them. This happens in almost every qemu scenario because PCI MMIO BAR
> > > > memory ends up routed down this path.
> > > 
> > > Indeed, my read wasn't deep enough.  Which means that we can't change
> > > the ->pin_pages interface to return a struct pages array, as we don't
> > > have one for those.
> > 
> > Actually.  gvt requires a struct page, and both s390 seem to require
> > normal non-I/O, non-remapped kernel pointers.  So I think for the
> > vfio_pin_pages we can assume that we only want page backed memory and
> > remove the follow_fault_pfn case entirely.   But we'll probably have
> > to keep it for the vfio_iommu_replay case that is not tied to
> > emulated IOMMU drivers.
> 
> Right, that is my thinking - since all drivers actually need a struct
> page we should have the API return a working struct page and have the
> VFIO layer isolate the non-struct page stuff so it never leaks out of
> this API.
> 
> This nicely fixes the various problems in all the drivers if io memory
> comes down this path.
> 
> It is also why doing too much surgery deeper into type 1 probably
> isn't too worthwhile as it still needs raw pfns in its data
> structures for iommu modes.

Christoph, do you agree with Jason's remark on not doing too much
surgery into type1 code? Or do you still want this series to change
type1 like removing follow_fault_pfn() that you mentioned above?

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 3/6] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
  2022-06-17  8:42     ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-06-22  1:18       ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-22  1:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwankhede, corbet, hca, gor, agordeev, borntraeger, svens,
	zhenyuw, zhi.a.wang, jani.nikula, joonas.lahtinen, rodrigo.vivi,
	tvrtko.ursulin, airlied, daniel, farman, mjrosato, pasic,
	vneethv, oberpar, freude, akrowiak, jjherne, alex.williamson,
	cohuck, jgg, kevin.tian, jchrist, kvm, linux-doc, linux-kernel,
	linux-s390, intel-gvt-dev, intel-gfx, dri-devel

On Fri, Jun 17, 2022 at 01:42:42AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 16, 2022 at 04:52:09PM -0700, Nicolin Chen wrote:
> > +	ret = vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT, npage);
> > +	drm_WARN_ON(&i915->drm, ret != npage);
> 
> The shifting of gfn seems to happen both here and in the callers.

Sorry. I overlooked this line. I can add another preparatory patch
for callers to pass in an IOVA rather than "pfn << PAGE_SHIFT", if you
think it's necessary: although GVT still does things in PFNs, both ap
and ccw prepare their PFN lists from IOVAs, which can now be omitted.
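
(For illustration, with a hypothetical helper name: the preparatory patch
would simply keep the gfn-to-IOVA conversion at the caller:)

#include <linux/mm.h>
#include <linux/types.h>

/* Hypothetical helper for gfn-based callers such as kvmgt */
static inline dma_addr_t example_gfn_to_iova(unsigned long gfn)
{
        return (dma_addr_t)gfn << PAGE_SHIFT;
}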

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-21 21:21           ` Nicolin Chen
  (?)
@ 2022-06-24 13:56             ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-06-24 13:56 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Christoph Hellwig, kwankhede, corbet, hca, gor, agordeev,
	borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Tue, Jun 21, 2022 at 02:21:22PM -0700, Nicolin Chen wrote:
> On Sun, Jun 19, 2022 at 11:32:07PM -0700, Christoph Hellwig wrote:
> > On Sun, Jun 19, 2022 at 11:57:26PM -0300, Jason Gunthorpe wrote:
> > > The remark about io memory is because on s390 memcpy() will crash even
> > > on ioremapped memory, you have to use the memcpy_to/fromio() which
> > > uses the special s390 io access instructions.
> > 
> > Yes.  The same is true for various other architectures, including arm64
> > under the right circumstances.
> > 
> > > This helps because we now block io memory from ever getting into these
> > > call paths. I'm pretty sure this is a serious security bug, but would
> > > let the IBM folks remark as I don't know it all that well..
> > 
> > Prevent as in crash when trying to convert it to a page?
> > 
> > > As for the kmap, I thought it was standard practice even if it is a
> > > non-highmem? Aren't people trying to use this for other security
> > > stuff these days?
> > 
> > Ira has been looking into the protection keys, although they don't
> > apply to s390.  Either way I don't object to using kmap, but the
> > commit log doesn't make much sense to me.
> 
> How about the updated commit log below? Thanks.
> 
> The pinned PFN list returned from vfio_pin_pages() is converted using
> page_to_pfn(), so direct access via memcpy() will crash on S390 if the
> PFN is an IO PFN, as we have to use the memcpy_to/fromio(), which uses
> the special s390 IO access instructions.
> 
> As a standard practice for security purpose, add kmap_local_page() to
> block any IO memory from ever getting into this call path.

The kmap_local_page is not about the IO memory, the switch to struct
page is what is protecting against IO memory.

Using kmap_local_page() is just the correct way to convert a struct page
into a CPU address to use with memcpy and it is a NOP on S390 because
it doesn't use highmem/etc.
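
(For reference, the io memory distinction mentioned earlier is roughly
the following; the variable names here are made up:)

        /* normal RAM, CPU coherent: plain memcpy() is fine */
        memcpy(dst, src, len);

        /* ioremapped (__iomem) memory needs the io accessors, which on
         * s390 expand to the special IO access instructions */
        memcpy_toio(io_dst, src, len);
        memcpy_fromio(dst, io_src, len);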

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-24 13:56             ` Jason Gunthorpe
@ 2022-06-24 19:22               ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-24 19:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, kwankhede, corbet, hca, gor, agordeev,
	borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Fri, Jun 24, 2022 at 10:56:15AM -0300, Jason Gunthorpe wrote:

> > How about the updated commit log below? Thanks.
> > 
> > The pinned PFN list returned from vfio_pin_pages() is converted using
> > page_to_pfn(), so direct access via memcpy() will crash on S390 if the
> > PFN is an IO PFN, as we have to use the memcpy_to/fromio(), which uses
> > the special s390 IO access instructions.
> > 
> > As a standard practice for security purpose, add kmap_local_page() to
> > block any IO memory from ever getting into this call path.
> 
> The kmap_local_page is not about the IO memory, the switch to struct
> page is what is protecting against IO memory.
> 
> Using kmap_local_page() is just the correct way to convert a struct page
> into a CPU address to use with memcpy and it is a NOP on S390 because
> it doesn't use highmem/etc.

I thought the whole purpose of switching to "struct page *" was to use
kmap_local_page() for the memcpy call, and the combination of these two
does the protection. Do you mind explaining how the switching part does
the protection?

Thanks!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-24 19:22               ` Nicolin Chen
  (?)
@ 2022-06-24 19:30                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-06-24 19:30 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Christoph Hellwig, kwankhede, corbet, hca, gor, agordeev,
	borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Fri, Jun 24, 2022 at 12:22:36PM -0700, Nicolin Chen wrote:
> On Fri, Jun 24, 2022 at 10:56:15AM -0300, Jason Gunthorpe wrote:
> 
> > > How about the updated commit log below? Thanks.
> > > 
> > > The pinned PFN list returned from vfio_pin_pages() is converted using
> > > page_to_pfn(), so direct access via memcpy() will crash on S390 if the
> > > PFN is an IO PFN, as we have to use the memcpy_to/fromio(), which uses
> > > the special s390 IO access instructions.
> > > 
> > > As a standard practice for security purpose, add kmap_local_page() to
> > > block any IO memory from ever getting into this call path.
> > 
> > The kmap_local_page is not about the IO memory, the switch to struct
> > page is what is protecting against IO memory.
> > 
> > Using kmap_local_page() is just the correct way to convert a struct page
> > into a CPU address to use with memcpy and it is a NOP on S390 because
> > it doesn't use highmem/etc.
> 
> I thought the whole purpose of switching to "struct page *" was to use
> kmap_local_page() for the memcpy call, and the combination of these two
> does the protection. Do you mind explaining how the switching part does
> the protection?

A 'struct page' (ignoring ZONE_DEVICE) cannot represent IO memory
inherently because a 'struct page' is always a CPU coherent thing.

So, when VFIO returns only struct pages it also is promising that the
memory is not IO.

The kmap_local_page() arose because the code doing memcpy had to be
updated to go from a struct page to a void * for use with memcpy and
the kmap_local_page() is the correct API to use for that.

The existing code which casts a pfn to a void * is improper.
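
To sketch it with illustrative names (not the exact vfio_ccw diff), the
improper pattern is:

        /* treats a raw pfn as if it were a usable kernel virtual address */
        void *from = (void *)(pfn << PAGE_SHIFT) + offset;

        memcpy(to, from, len);

and the proper one is:

        /* struct page -> CPU address via kmap_local_page() */
        void *vaddr = kmap_local_page(page);

        memcpy(to, vaddr + offset, len);
        kunmap_local(vaddr);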

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-24 19:30                 ` Jason Gunthorpe
@ 2022-06-24 20:12                   ` Nicolin Chen
  -1 siblings, 0 replies; 75+ messages in thread
From: Nicolin Chen @ 2022-06-24 20:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, kwankhede, corbet, hca, gor, agordeev,
	borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Fri, Jun 24, 2022 at 04:30:42PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 24, 2022 at 12:22:36PM -0700, Nicolin Chen wrote:
> > On Fri, Jun 24, 2022 at 10:56:15AM -0300, Jason Gunthorpe wrote:
> > 
> > > > How about the updated commit log below? Thanks.
> > > > 
> > > > The pinned PFN list returned from vfio_pin_pages() is converted using
> > > > page_to_pfn(), so direct access via memcpy() will crash on S390 if the
> > > > PFN is an IO PFN, as we have to use the memcpy_to/fromio(), which uses
> > > > the special s390 IO access instructions.
> > > > 
> > > > As a standard practice for security purpose, add kmap_local_page() to
> > > > block any IO memory from ever getting into this call path.
> > > 
> > > The kmap_local_page is not about the IO memory, the switch to struct
> > > page is what is protecting against IO memory.
> > > 
> > > Using kmap_local_page() is just the correct way to convert a struct page
> > > into a CPU address to use with memcpy and it is a NOP on S390 because
> > > it doesn't use highmem/etc.
> > 
> > I thought the whole purpose of switching to "struct page *" was to use
> > kmap_local_page() for the memcpy call, and the combination of these two
> > does the protection. Do you mind explaining how the switching part does
> > the protection?
> 
> A 'struct page' (ignoring ZONE_DEVICE) cannot represent IO memory
> inherently because a 'struct page' is always a CPU coherent thing.
> 
> So, when VFIO returns only struct pages it also is promising that the
> memory is not IO.

Ah. The "switch to struct page" means the next patch that changes the
vfio_pin_pages, not the pfn_to_page in this patch.

> The kmap_local_page() arose because the code doing memcpy had to be
> updated to go from a struct page to a void * for use with memcpy and
> the kmap_local_page() is the correct API to use for that.
> 
> The existing code which casts a pfn to a void * is improper.

Yes.

If I understand everything correctly:

A PFN is not secure enough to promise that the memory is not IO. And
direct access via memcpy() that only handles CPU memory will crash on
S390 if the PFN is an IO PFN, as we have to use the memcpy_to/fromio()
that uses the special S390 IO access instructions. On the other hand,
a "struct page *" is always a CPU coherent thing that fits memcpy().

Also, casting a PFN to "void *" for memcpy() is not a proper practice;
kmap_local_page() is the correct API to call here, though S390 doesn't
use highmem, which means kmap_local_page() is a NOP.

There's a following patch changing the vfio_pin_pages() API to return
a list of "struct page *" instead of PFNs. It will block any IO memory
from ever getting into this call path, for such a security purpose. In
this patch, add kmap_local_page() to prepare for that.
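
In other words, the direction of the series is roughly the following
(prototypes sketched from memory, so the exact parameter names in v1
may differ):

        /* before: non-contiguous pfn list in, pinned pfn list out */
        int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
                           int npage, int prot, unsigned long *phys_pfn);

        /* after: starting IOVA in, pinned struct page list out */
        int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
                           int npage, int prot, struct page **pages);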

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFT][PATCH v1 5/6] vfio/ccw: Add kmap_local_page() for memcpy
  2022-06-24 20:12                   ` Nicolin Chen
  (?)
@ 2022-06-24 22:42                     ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-06-24 22:42 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Christoph Hellwig, kwankhede, corbet, hca, gor, agordeev,
	borntraeger, svens, zhenyuw, zhi.a.wang, jani.nikula,
	joonas.lahtinen, rodrigo.vivi, tvrtko.ursulin, airlied, daniel,
	farman, mjrosato, pasic, vneethv, oberpar, freude, akrowiak,
	jjherne, alex.williamson, cohuck, kevin.tian, jchrist, kvm,
	linux-doc, linux-kernel, linux-s390, intel-gvt-dev, intel-gfx,
	dri-devel

On Fri, Jun 24, 2022 at 01:12:56PM -0700, Nicolin Chen wrote:

> > The kmap_local_page() arose because the code doing memcpy had to be
> > updated to go from a struct page to a void * for use with memcpy and
> > the kmap_local_page() is the correct API to use for that.
> > 
> > The existing code which casts a pfn to a void * is improper.
> 
> Yes.
> 
> If I understand everything correctly:
> 
> A PFN is not secure enough to promise that the memory is not IO. And
> direct access via memcpy() that only handles CPU memory will crash on
> S390 if the PFN is an IO PFN, as we have to use the memcpy_to/fromio()
> that uses the special S390 IO access instructions. On the other hand,
> a "struct page *" is always a CPU coherent thing that fits memcpy().
> 
> Also, casting a PFN to "void *" for memcpy() is not a proper practice;
> kmap_local_page() is the correct API to call here, though S390 doesn't
> use highmem, which means kmap_local_page() is a NOP.
> 
> There's a following patch changing the vfio_pin_pages() API to return
> a list of "struct page *" instead of PFNs. It will block any IO memory
> from ever getting into this call path, for such a security purpose. In
> this patch, add kmap_local_page() to prepare for that.

Yes, basically

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread
