All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/7] s390: vfio-ccw fixes
@ 2019-05-03 13:49 Eric Farman
  2019-05-03 13:49 ` [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain Eric Farman
                   ` (6 more replies)
  0 siblings, 7 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

The attached are a few fixes to the vfio-ccw kernel code for potential
errors or architecture anomalies.  Under normal usage, and even most
abnormal usage, they don't expose any problems to a well-behaved guest
and its devices.  But, they are deficiencies just the same and could
cause some weird behavior if they ever popped up in real life.

I have tried to arrange these patches in a "solves a noticeable problem
with existing workloads" to "solves a theoretical problem with
hypothetical workloads" order.  This way, the bigger ones at the end
can be discussed without impeding the smaller and more impactful ones
at the start.

They are based on today's master, not Conny's vfio-ccw tree even though
there are some good fixes pending there.  I've run this series both with
and without that code, but couldn't decide which base would provide an
easier time applying patches.  "I think" they should apply fine to both,
but I apologize in advance if I guessed wrong!  :)

Eric Farman (7):
  s390/cio: Update SCSW if it points to the end of the chain
  s390/cio: Set vfio-ccw FSM state before ioeventfd
  s390/cio: Split pfn_array_alloc_pin into pieces
  s390/cio: Initialize the host addresses in pfn_array
  s390/cio: Allow zero-length CCWs in vfio-ccw
  s390/cio: Don't pin vfio pages for empty transfers
  s390/cio: Remove vfio-ccw checks of command codes

 drivers/s390/cio/vfio_ccw_cp.c  | 163 ++++++++++++++++++++++++++++------------
 drivers/s390/cio/vfio_ccw_drv.c |   6 +-
 2 files changed, 116 insertions(+), 53 deletions(-)

-- 
2.16.4

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain
  2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
@ 2019-05-03 13:49 ` Eric Farman
  2019-05-06 14:47   ` Cornelia Huck
  2019-05-03 13:49 ` [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd Eric Farman
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

Per the POPs [1], when processing an interrupt the SCSW.CPA field of an
IRB generally points to 8 bytes after the last CCW that was executed
(there are exceptions, but this is the most common behavior).

In the case of an error, this points us to the first un-executed CCW
in the chain.  But in the case of normal I/O, the address points beyond
the end of the chain.  While the guest generally only cares about this
when possibly restarting a channel program after error recovery, we
should convert the address even in the good scenario so that we provide
a consistent, valid, response upon I/O completion.

[1] Figure 16-6 in SA22-7832-11.  The footnotes in that table also state
that this is true even if the resulting address is invalid or protected,
but moving to the end of the guest chain should not be a surprise.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 384b3987eeb4..f86da78eaeaa 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -870,7 +870,7 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw)
 	 */
 	list_for_each_entry(chain, &cp->ccwchain_list, next) {
 		ccw_head = (u32)(u64)chain->ch_ccw;
-		if (is_cpa_within_range(cpa, ccw_head, chain->ch_len)) {
+		if (is_cpa_within_range(cpa, ccw_head, chain->ch_len + 1)) {
 			/*
 			 * (cpa - ccw_head) is the offset value of the host
 			 * physical ccw to its chain head.
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd
  2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
  2019-05-03 13:49 ` [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain Eric Farman
@ 2019-05-03 13:49 ` Eric Farman
  2019-05-06 14:51   ` Cornelia Huck
  2019-05-03 13:49 ` [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces Eric Farman
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

Otherwise, the guest can believe it's okay to start another I/O
and bump into the non-idle state.  This results in a cc=3
(or cc=2 with the pending async CSCH/HSCH code [1]) to the guest,
which is unfortunate since everything is otherwise working normally.

[1] https://patchwork.kernel.org/comment/22588563/

Signed-off-by: Eric Farman <farman@linux.ibm.com>

---

I think this might've been part of Pierre's FSM cleanup?
---
 drivers/s390/cio/vfio_ccw_drv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 0b3b9de45c60..ddd21b6149fd 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -86,11 +86,11 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 	}
 	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
 
-	if (private->io_trigger)
-		eventfd_signal(private->io_trigger, 1);
-
 	if (private->mdev && is_final)
 		private->state = VFIO_CCW_STATE_IDLE;
+
+	if (private->io_trigger)
+		eventfd_signal(private->io_trigger, 1);
 }
 
 /*
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces
  2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
  2019-05-03 13:49 ` [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain Eric Farman
  2019-05-03 13:49 ` [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd Eric Farman
@ 2019-05-03 13:49 ` Eric Farman
  2019-05-08 10:43   ` Cornelia Huck
  2019-05-03 13:49 ` [PATCH 4/7] s390/cio: Initialize the host addresses in pfn_array Eric Farman
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

The pfn_array_alloc_pin routine is doing too much.  Today, it does the
alloc of the pfn_array struct and its member arrays, builds the iova
address lists out of a contiguous piece of guest memory, and asks vfio
to pin the resulting pages.

Let's effectively revert a significant portion of commit 5c1cfb1c3948
("vfio: ccw: refactor and improve pfn_array_alloc_pin()") such that we
break pfn_array_alloc_pin() into its component pieces, and have one
routine that allocates/populates the pfn_array structs, and another
that actually pins the memory.  In the future, we will be able to
handle scenarios where pinning memory isn't actually appropriate.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 72 +++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index f86da78eaeaa..b70306c06150 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -50,28 +50,25 @@ struct ccwchain {
 };
 
 /*
- * pfn_array_alloc_pin() - alloc memory for PFNs, then pin user pages in memory
+ * pfn_array_alloc() - alloc memory for PFNs
  * @pa: pfn_array on which to perform the operation
- * @mdev: the mediated device to perform pin/unpin operations
  * @iova: target guest physical address
  * @len: number of bytes that should be pinned from @iova
  *
- * Attempt to allocate memory for PFNs, and pin user pages in memory.
+ * Attempt to allocate memory for PFN.
  *
  * Usage of pfn_array:
  * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
  * this structure will be filled in by this function.
  *
  * Returns:
- *   Number of pages pinned on success.
- *   If @pa->pa_nr is not 0, or @pa->pa_iova_pfn is not NULL initially,
- *   returns -EINVAL.
- *   If no pages were pinned, returns -errno.
+ *         0 if PFNs are allocated
+ *   -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova_pfn is not NULL
+ *   -ENOMEM if alloc failed
  */
-static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
-			       u64 iova, unsigned int len)
+static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
 {
-	int i, ret = 0;
+	int i;
 
 	if (!len)
 		return 0;
@@ -97,23 +94,33 @@ static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
 	for (i = 1; i < pa->pa_nr; i++)
 		pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
 
+	return 0;
+}
+
+/*
+ * pfn_array_pin() - Pin user pages in memory
+ * @pa: pfn_array on which to perform the operation
+ * @mdev: the mediated device to perform pin operations
+ *
+ * Returns:
+ *   Number of pages pinned on success.
+ *   If fewer pages than requested were pinned, returns -EINVAL
+ *   If no pages were pinned, returns -errno.
+ */
+static int pfn_array_pin(struct pfn_array *pa, struct device *mdev)
+{
+	int ret = 0;
+
 	ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
 			     IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
 
-	if (ret < 0) {
-		goto err_out;
-	} else if (ret > 0 && ret != pa->pa_nr) {
+	if (ret > 0 && ret != pa->pa_nr) {
 		vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
 		ret = -EINVAL;
-		goto err_out;
 	}
 
-	return ret;
-
-err_out:
-	pa->pa_nr = 0;
-	kfree(pa->pa_iova_pfn);
-	pa->pa_iova_pfn = NULL;
+	if (ret < 0)
+		pa->pa_iova = 0;
 
 	return ret;
 }
@@ -209,10 +216,16 @@ static long copy_from_iova(struct device *mdev,
 	int i, ret;
 	unsigned long l, m;
 
-	ret = pfn_array_alloc_pin(&pa, mdev, iova, n);
-	if (ret <= 0)
+	ret = pfn_array_alloc(&pa, iova, n);
+	if (ret < 0)
 		return ret;
 
+	ret = pfn_array_pin(&pa, mdev);
+	if (ret < 0) {
+		pfn_array_unpin_free(&pa, mdev);
+		return ret;
+	}
+
 	l = n;
 	for (i = 0; i < pa.pa_nr; i++) {
 		from = pa.pa_pfn[i] << PAGE_SHIFT;
@@ -559,7 +572,11 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
 	if (ret)
 		goto out_init;
 
-	ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, ccw->count);
+	ret = pfn_array_alloc(pat->pat_pa, ccw->cda, ccw->count);
+	if (ret < 0)
+		goto out_unpin;
+
+	ret = pfn_array_pin(pat->pat_pa, cp->mdev);
 	if (ret < 0)
 		goto out_unpin;
 
@@ -589,6 +606,7 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
 {
 	struct ccw1 *ccw;
 	struct pfn_array_table *pat;
+	struct pfn_array *pa;
 	unsigned long *idaws;
 	u64 idaw_iova;
 	unsigned int idaw_nr, idaw_len;
@@ -627,9 +645,13 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
 
 	for (i = 0; i < idaw_nr; i++) {
 		idaw_iova = *(idaws + i);
+		pa = pat->pat_pa + i;
+
+		ret = pfn_array_alloc(pa, idaw_iova, 1);
+		if (ret < 0)
+			goto out_free_idaws;
 
-		ret = pfn_array_alloc_pin(pat->pat_pa + i, cp->mdev,
-					  idaw_iova, 1);
+		ret = pfn_array_pin(pa, cp->mdev);
 		if (ret < 0)
 			goto out_free_idaws;
 	}
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/7] s390/cio: Initialize the host addresses in pfn_array
  2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
                   ` (2 preceding siblings ...)
  2019-05-03 13:49 ` [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces Eric Farman
@ 2019-05-03 13:49 ` Eric Farman
  2019-05-03 13:49 ` [PATCH 5/7] s390/cio: Allow zero-length CCWs in vfio-ccw Eric Farman
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

Let's initialize the host address to something that is invalid,
rather than letting it default to zero.  This just makes it easier
to notice when a pin operation has failed or been skipped.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index b70306c06150..43c6d1cf1653 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -91,8 +91,11 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
 	pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
 
 	pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
-	for (i = 1; i < pa->pa_nr; i++)
+	pa->pa_pfn[0] = -1ULL;
+	for (i = 1; i < pa->pa_nr; i++) {
 		pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+		pa->pa_pfn[i] = -1ULL;
+	}
 
 	return 0;
 }
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/7] s390/cio: Allow zero-length CCWs in vfio-ccw
  2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
                   ` (3 preceding siblings ...)
  2019-05-03 13:49 ` [PATCH 4/7] s390/cio: Initialize the host addresses in pfn_array Eric Farman
@ 2019-05-03 13:49 ` Eric Farman
  2019-05-03 13:49 ` [PATCH 6/7] s390/cio: Don't pin vfio pages for empty transfers Eric Farman
  2019-05-03 13:49 ` [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes Eric Farman
  6 siblings, 0 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

It is possible that a guest might issue a CCW with a length of zero,
and will expect a particular response.  Consider this chain:

   Address   Format-1 CCW
   --------  -----------------
 0 33110EC0  346022CC 33177468
 1 33110EC8  CF200000 3318300C

CCW[0] moves a little more than two pages, but also has the
Suppress Length Indication (SLI) bit set to handle the expectation
that considerably less data will be moved.  CCW[1] also has the SLI
bit set, and has a length of zero.  Once vfio-ccw does its magic,
the kernel issues a start subchannel on behalf of the guest with this:

   Address   Format-1 CCW
   --------  -----------------
 0 021EDED0  346422CC 021F0000
 1 021EDED8  CF240000 3318300C

Both CCWs were converted to an IDAL and have the corresponding flags
set (which is by design), but only the address of the first data
address is converted to something the host is aware of.  The second
CCW still has the address used by the guest, which happens to be (A)
(probably) an invalid address for the host, and (B) an invalid IDAW
address (doubleword boundary, etc.).

While the I/O fails, it doesn't fail correctly.  In this example, we
would receive a program check for an invalid IDAW address, instead of
a unit check for an invalid command.

To fix this, revert commit 4cebc5d6a6ff ("vfio: ccw: validate the
count field of a ccw before pinning") and allow the individual fetch
routines to process them like anything else.  We'll make a slight
adjustment to our allocation of the pfn_array (for direct CCWs) or
IDAL (for IDAL CCWs) memory, so that we have room for at least one
address even though no data will be transferred.

Note that this doesn't provide us with a channel program that will
fail in the expected way.  Since our length is zero, vfio_pin_pages()
returns -EINVAL and cp_prefetch() will thus fail.  This will be fixed
in the next patch.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 43c6d1cf1653..c3fffac92aa1 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -70,9 +70,6 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
 {
 	int i;
 
-	if (!len)
-		return 0;
-
 	if (pa->pa_nr || pa->pa_iova_pfn)
 		return -EINVAL;
 
@@ -366,8 +363,6 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx)
 
 	if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
 		return;
-	if (!ccw->count)
-		return;
 
 	kfree((void *)(u64)ccw->cda);
 }
@@ -552,18 +547,12 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
 	struct pfn_array_table *pat;
 	unsigned long *idaws;
 	int ret;
+	int bytes = 1;
 
 	ccw = chain->ch_ccw + idx;
 
-	if (!ccw->count) {
-		/*
-		 * We just want the translation result of any direct ccw
-		 * to be an IDA ccw, so let's add the IDA flag for it.
-		 * Although the flag will be ignored by firmware.
-		 */
-		ccw->flags |= CCW_FLAG_IDA;
-		return 0;
-	}
+	if (ccw->count)
+		bytes = ccw->count;
 
 	/*
 	 * Pin data page(s) in memory.
@@ -575,7 +564,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
 	if (ret)
 		goto out_init;
 
-	ret = pfn_array_alloc(pat->pat_pa, ccw->cda, ccw->count);
+	ret = pfn_array_alloc(pat->pat_pa, ccw->cda, bytes);
 	if (ret < 0)
 		goto out_unpin;
 
@@ -614,17 +603,18 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
 	u64 idaw_iova;
 	unsigned int idaw_nr, idaw_len;
 	int i, ret;
+	int bytes = 1;
 
 	ccw = chain->ch_ccw + idx;
 
-	if (!ccw->count)
-		return 0;
+	if (ccw->count)
+		bytes = ccw->count;
 
 	/* Calculate size of idaws. */
 	ret = copy_from_iova(cp->mdev, &idaw_iova, ccw->cda, sizeof(idaw_iova));
 	if (ret)
 		return ret;
-	idaw_nr = idal_nr_words((void *)(idaw_iova), ccw->count);
+	idaw_nr = idal_nr_words((void *)(idaw_iova), bytes);
 	idaw_len = idaw_nr * sizeof(*idaws);
 
 	/* Pin data page(s) in memory. */
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/7] s390/cio: Don't pin vfio pages for empty transfers
  2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
                   ` (4 preceding siblings ...)
  2019-05-03 13:49 ` [PATCH 5/7] s390/cio: Allow zero-length CCWs in vfio-ccw Eric Farman
@ 2019-05-03 13:49 ` Eric Farman
  2019-05-06 15:20   ` Cornelia Huck
  2019-05-03 13:49 ` [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes Eric Farman
  6 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

If a CCW has a count of zero, then no data will be transferred and
pinning/unpinning memory is unnecessary.

In addition to that, the skip flag of a CCW offers the possibility of
data not being transferred, but is only meaningful for certain commands.
Specifically, it is only applicable for a read, read backward, sense, or
sense ID CCW and will be ignored for any other command code
(SA22-7832-11 page 15-64, and figure 15-30 on page 15-75).

(A sense ID is xE4, while a sense is x04 with possible modifiers in the
upper four bits.  So we will cover the whole "family" of sense CCWs.)

For all those scenarios, since there is no requirement for the target
address to be valid, we should skip the call to vfio_pin_pages() and
rely on the IDAL address we have allocated/built for the channel
program.  The fact that the individual IDAWs within the IDAL are
invalid is fine, since they aren't actually checked in these cases.

Set pa_nr to zero, when skipping the pfn_array_pin() call, since it is
defined as the number of pages pinned.  This will cause the vfio unpin
logic to return -EINVAL, but since the return code is not checked it
will not harm our cleanup path.

As we do this, since the pfn_array_pin() routine returns the number of
pages pinned, and we might not be doing that, the logic for converting
a CCW from direct-addressed to IDAL needs to ensure there is room for
one IDAW in the IDAL being built since a zero-length IDAL isn't great.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 61 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index c3fffac92aa1..36d76b821209 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -285,6 +285,10 @@ static long copy_ccw_from_iova(struct channel_program *cp,
 /*
  * Helpers to operate ccwchain.
  */
+#define ccw_is_read(_ccw) (((_ccw)->cmd_code & 0x03) == 0x02)
+#define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
+#define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == CCW_CMD_BASIC_SENSE)
+
 #define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
 
 #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
@@ -292,10 +296,43 @@ static long copy_ccw_from_iova(struct channel_program *cp,
 #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
 
 #define ccw_is_idal(_ccw) ((_ccw)->flags & CCW_FLAG_IDA)
-
+#define ccw_is_skip(_ccw) ((_ccw)->flags & CCW_FLAG_SKIP)
 
 #define ccw_is_chain(_ccw) ((_ccw)->flags & (CCW_FLAG_CC | CCW_FLAG_DC))
 
+/*
+ * ccw_does_data_transfer()
+ *
+ * Determine whether a CCW will move any data, such that the guest pages
+ * would need to be pinned before performing the I/O.
+ *
+ * Returns 1 if yes, 0 if no.
+ */
+static inline int ccw_does_data_transfer(struct ccw1 *ccw)
+{
+	/* If the count field is zero, then no data will be transferred */
+	if (ccw->count == 0)
+		return 0;
+
+	/* If the skip flag is off, then data will be transferred */
+	if (!ccw_is_skip(ccw))
+		return 1;
+
+	/*
+	 * If the skip flag is on, it is only meaningful if the command
+	 * code is a read, read backward, sense, or sense ID.  In those
+	 * cases, no data will be transferred.
+	 */
+	if (ccw_is_read(ccw) || ccw_is_read_backward(ccw))
+		return 0;
+
+	if (ccw_is_sense(ccw))
+		return 0;
+
+	/* The skip flag is on, but it is ignored for this command code. */
+	return 1;
+}
+
 /*
  * is_cpa_within_range()
  *
@@ -548,11 +585,14 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
 	unsigned long *idaws;
 	int ret;
 	int bytes = 1;
+	int idaw_nr = 1;
 
 	ccw = chain->ch_ccw + idx;
 
-	if (ccw->count)
+	if (ccw->count) {
 		bytes = ccw->count;
+		idaw_nr = idal_nr_words((void *)(u64)ccw->cda, ccw->count);
+	}
 
 	/*
 	 * Pin data page(s) in memory.
@@ -568,12 +608,16 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
 	if (ret < 0)
 		goto out_unpin;
 
-	ret = pfn_array_pin(pat->pat_pa, cp->mdev);
-	if (ret < 0)
-		goto out_unpin;
+	if (ccw_does_data_transfer(ccw)) {
+		ret = pfn_array_pin(pat->pat_pa, cp->mdev);
+		if (ret < 0)
+			goto out_unpin;
+	} else {
+		pat->pat_pa->pa_nr = 0;
+	}
 
 	/* Translate this direct ccw to a idal ccw. */
-	idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
+	idaws = kcalloc(idaw_nr, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
 	if (!idaws) {
 		ret = -ENOMEM;
 		goto out_unpin;
@@ -644,6 +688,11 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
 		if (ret < 0)
 			goto out_free_idaws;
 
+		if (!ccw_does_data_transfer(ccw)) {
+			pa->pa_nr = 0;
+			continue;
+		}
+
 		ret = pfn_array_pin(pa, cp->mdev);
 		if (ret < 0)
 			goto out_free_idaws;
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
                   ` (5 preceding siblings ...)
  2019-05-03 13:49 ` [PATCH 6/7] s390/cio: Don't pin vfio pages for empty transfers Eric Farman
@ 2019-05-03 13:49 ` Eric Farman
  2019-05-06 12:56   ` Pierre Morel
  2019-05-06 15:37   ` Cornelia Huck
  6 siblings, 2 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-03 13:49 UTC (permalink / raw)
  To: Cornelia Huck, Farhan Ali
  Cc: Halil Pasic, Pierre Morel, linux-s390, kvm, Eric Farman

If the CCW being processed is a No-Operation, then by definition no
data is being transferred.  Let's fold those checks into the normal
CCW processors, rather than skipping out early.

Likewise, if the CCW being processed is a "test" (an invented
definition to simply mean it ends in a zero), let's permit that to go
through to the hardware.  There's nothing inherently unique about
those command codes versus one that ends in an eight [1], or any other
otherwise valid command codes that are undefined for the device type
in question.

[1] POPS states that a x08 is a TIC CCW, and that having any high-order
bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
high-order bits are ignored.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 36d76b821209..c0a52025bf06 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct channel_program *cp,
 #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
 #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == CCW_CMD_BASIC_SENSE)
 
-#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
-
 #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
 
 #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
@@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct ccw1 *ccw)
 	if (ccw->count == 0)
 		return 0;
 
+	/* If the command is a NOP, then no data will be transferred */
+	if (ccw_is_noop(ccw))
+		return 0;
+
 	/* If the skip flag is off, then data will be transferred */
 	if (!ccw_is_skip(ccw))
 		return 1;
@@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx)
 {
 	struct ccw1 *ccw = chain->ch_ccw + idx;
 
-	if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
+	if (ccw_is_tic(ccw))
 		return;
 
 	kfree((void *)(u64)ccw->cda);
@@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain *chain,
 {
 	struct ccw1 *ccw = chain->ch_ccw + idx;
 
-	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
-		return 0;
-
 	if (ccw_is_tic(ccw))
 		return ccwchain_fetch_tic(chain, idx, cp);
 
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-03 13:49 ` [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes Eric Farman
@ 2019-05-06 12:56   ` Pierre Morel
  2019-05-06 15:39     ` Eric Farman
  2019-05-06 15:37   ` Cornelia Huck
  1 sibling, 1 reply; 35+ messages in thread
From: Pierre Morel @ 2019-05-06 12:56 UTC (permalink / raw)
  To: Eric Farman, Cornelia Huck, Farhan Ali; +Cc: Halil Pasic, linux-s390, kvm

On 03/05/2019 15:49, Eric Farman wrote:
> If the CCW being processed is a No-Operation, then by definition no
> data is being transferred.  Let's fold those checks into the normal
> CCW processors, rather than skipping out early.
> 
> Likewise, if the CCW being processed is a "test" (an invented
> definition to simply mean it ends in a zero), let's permit that to go
> through to the hardware.  There's nothing inherently unique about
> those command codes versus one that ends in an eight [1], or any other
> otherwise valid command codes that are undefined for the device type
> in question.
> 
> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
> high-order bits are ignored.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>   1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 36d76b821209..c0a52025bf06 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct channel_program *cp,
>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == CCW_CMD_BASIC_SENSE)
>   
> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
> -
>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>   
>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct ccw1 *ccw)
>   	if (ccw->count == 0)
>   		return 0;
>   
> +	/* If the command is a NOP, then no data will be transferred */
> +	if (ccw_is_noop(ccw))
> +		return 0;
> +
>   	/* If the skip flag is off, then data will be transferred */
>   	if (!ccw_is_skip(ccw))
>   		return 1;
> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx)
>   {
>   	struct ccw1 *ccw = chain->ch_ccw + idx;
>   
> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
> +	if (ccw_is_tic(ccw))


AFAIR, we introduced this code to protect against noop and test with a 
non zero CDA.
This could go away only if there is somewhere the guaranty that noop 
have always a null CDA (same for test).



>   		return;
>   
>   	kfree((void *)(u64)ccw->cda);
> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain *chain,
>   {
>   	struct ccw1 *ccw = chain->ch_ccw + idx;
>   
> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
> -		return 0;
> -
>   	if (ccw_is_tic(ccw))
>   		return ccwchain_fetch_tic(chain, idx, cp);
>   
> 


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain
  2019-05-03 13:49 ` [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain Eric Farman
@ 2019-05-06 14:47   ` Cornelia Huck
  2019-05-06 15:23     ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-06 14:47 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Fri,  3 May 2019 15:49:06 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Per the POPs [1], when processing an interrupt the SCSW.CPA field of an
> IRB generally points to 8 bytes after the last CCW that was executed
> (there are exceptions, but this is the most common behavior).
> 
> In the case of an error, this points us to the first un-executed CCW
> in the chain.  But in the case of normal I/O, the address points beyond
> the end of the chain.  While the guest generally only cares about this
> when possibly restarting a channel program after error recovery, we
> should convert the address even in the good scenario so that we provide
> a consistent, valid, response upon I/O completion.
> 
> [1] Figure 16-6 in SA22-7832-11.  The footnotes in that table also state
> that this is true even if the resulting address is invalid or protected,
> but moving to the end of the guest chain should not be a surprise.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 384b3987eeb4..f86da78eaeaa 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -870,7 +870,7 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw)
>  	 */
>  	list_for_each_entry(chain, &cp->ccwchain_list, next) {
>  		ccw_head = (u32)(u64)chain->ch_ccw;
> -		if (is_cpa_within_range(cpa, ccw_head, chain->ch_len)) {
> +		if (is_cpa_within_range(cpa, ccw_head, chain->ch_len + 1)) {

Maybe add a comment

/* On successful execution, cpa points just beyond the end of the chain. */

or so, to avoid head-scratching and PoP-reading in the future?

>  			/*
>  			 * (cpa - ccw_head) is the offset value of the host
>  			 * physical ccw to its chain head.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd
  2019-05-03 13:49 ` [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd Eric Farman
@ 2019-05-06 14:51   ` Cornelia Huck
  2019-05-06 16:36     ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-06 14:51 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Fri,  3 May 2019 15:49:07 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Otherwise, the guest can believe it's okay to start another I/O
> and bump into the non-idle state.  This results in a cc=3
> (or cc=2 with the pending async CSCH/HSCH code [1]) to the guest,

I think you can now refer to cc=2, as the csch/hsch is on its way in :)

> which is unfortunate since everything is otherwise working normally.
> 
> [1] https://patchwork.kernel.org/comment/22588563/
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> 
> ---
> 
> I think this might've been part of Pierre's FSM cleanup?

Not sure if I saw this before, but there have been quite a number of
patches going around...

> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 0b3b9de45c60..ddd21b6149fd 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -86,11 +86,11 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  	}
>  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>  
> -	if (private->io_trigger)
> -		eventfd_signal(private->io_trigger, 1);
> -
>  	if (private->mdev && is_final)
>  		private->state = VFIO_CCW_STATE_IDLE;
> +
> +	if (private->io_trigger)
> +		eventfd_signal(private->io_trigger, 1);
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/7] s390/cio: Don't pin vfio pages for empty transfers
  2019-05-03 13:49 ` [PATCH 6/7] s390/cio: Don't pin vfio pages for empty transfers Eric Farman
@ 2019-05-06 15:20   ` Cornelia Huck
  2019-05-06 15:40     ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-06 15:20 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Fri,  3 May 2019 15:49:11 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> If a CCW has a count of zero, then no data will be transferred and
> pinning/unpinning memory is unnecessary.
> 
> In addition to that, the skip flag of a CCW offers the possibility of
> data not being transferred, but is only meaningful for certain commands.
> Specifically, it is only applicable for a read, read backward, sense, or
> sense ID CCW and will be ignored for any other command code
> (SA22-7832-11 page 15-64, and figure 15-30 on page 15-75).

This made me look at QEMU, and it seems that we cheerfully ignore that
flag so far in our ccw interpretation code :/

> 
> (A sense ID is xE4, while a sense is x04 with possible modifiers in the
> upper four bits.  So we will cover the whole "family" of sense CCWs.)
> 
> For all those scenarios, since there is no requirement for the target
> address to be valid, we should skip the call to vfio_pin_pages() and
> rely on the IDAL address we have allocated/built for the channel
> program.  The fact that the individual IDAWs within the IDAL are
> invalid is fine, since they aren't actually checked in these cases.
> 
> Set pa_nr to zero, when skipping the pfn_array_pin() call, since it is
> defined as the number of pages pinned.  This will cause the vfio unpin
> logic to return -EINVAL, but since the return code is not checked it
> will not harm our cleanup path.

We could also try to skip the unpinning, but this works as well.

> 
> As we do this, since the pfn_array_pin() routine returns the number of
> pages pinned, and we might not be doing that, the logic for converting
> a CCW from direct-addressed to IDAL needs to ensure there is room for
> one IDAW in the IDAL being built since a zero-length IDAL isn't great.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 61 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 55 insertions(+), 6 deletions(-)

Looks good to me.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain
  2019-05-06 14:47   ` Cornelia Huck
@ 2019-05-06 15:23     ` Eric Farman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-06 15:23 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm



On 5/6/19 10:47 AM, Cornelia Huck wrote:
> On Fri,  3 May 2019 15:49:06 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> Per the POPs [1], when processing an interrupt the SCSW.CPA field of an
>> IRB generally points to 8 bytes after the last CCW that was executed
>> (there are exceptions, but this is the most common behavior).
>>
>> In the case of an error, this points us to the first un-executed CCW
>> in the chain.  But in the case of normal I/O, the address points beyond
>> the end of the chain.  While the guest generally only cares about this
>> when possibly restarting a channel program after error recovery, we
>> should convert the address even in the good scenario so that we provide
>> a consistent, valid, response upon I/O completion.
>>
>> [1] Figure 16-6 in SA22-7832-11.  The footnotes in that table also state
>> that this is true even if the resulting address is invalid or protected,
>> but moving to the end of the guest chain should not be a surprise.
>>
>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_cp.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>> index 384b3987eeb4..f86da78eaeaa 100644
>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>> @@ -870,7 +870,7 @@ void cp_update_scsw(struct channel_program *cp, union scsw *scsw)
>>   	 */
>>   	list_for_each_entry(chain, &cp->ccwchain_list, next) {
>>   		ccw_head = (u32)(u64)chain->ch_ccw;
>> -		if (is_cpa_within_range(cpa, ccw_head, chain->ch_len)) {
>> +		if (is_cpa_within_range(cpa, ccw_head, chain->ch_len + 1)) {
> 
> Maybe add a comment
> 
> /* On successful execution, cpa points just beyond the end of the chain. */
> 
> or so, to avoid head-scratching and PoP-reading in the future?

And deny future visitors the confusion?  :)

Good point; added.

> 
>>   			/*
>>   			 * (cpa - ccw_head) is the offset value of the host
>>   			 * physical ccw to its chain head.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-03 13:49 ` [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes Eric Farman
  2019-05-06 12:56   ` Pierre Morel
@ 2019-05-06 15:37   ` Cornelia Huck
  2019-05-06 15:46     ` Eric Farman
  1 sibling, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-06 15:37 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Fri,  3 May 2019 15:49:12 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> If the CCW being processed is a No-Operation, then by definition no
> data is being transferred.  Let's fold those checks into the normal
> CCW processors, rather than skipping out early.
> 
> Likewise, if the CCW being processed is a "test" (an invented
> definition to simply mean it ends in a zero), 

The "Common I/O Device Commands" document actually defines this :)

> let's permit that to go
> through to the hardware.  There's nothing inherently unique about
> those command codes versus one that ends in an eight [1], or any other
> otherwise valid command codes that are undefined for the device type
> in question.

But I agree that everything possible should be sent to the hardware.

> 
> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
> high-order bits are ignored.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 36d76b821209..c0a52025bf06 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct channel_program *cp,
>  #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
>  #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == CCW_CMD_BASIC_SENSE)
>  
> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
> -
>  #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>  
>  #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct ccw1 *ccw)
>  	if (ccw->count == 0)
>  		return 0;
>  
> +	/* If the command is a NOP, then no data will be transferred */
> +	if (ccw_is_noop(ccw))
> +		return 0;
> +

Don't you need to return 0 here for any test command as well?

(If I read the doc correctly, we'll just get a unit check in any case,
as there are no parallel I/O interfaces on modern s390 boxes. Even if
we had a parallel I/O interface, we'd just collect the status, and not
get any data transfer. FWIW, the QEMU ccw interpreter for emulated
devices rejects test ccws with a channel program check, which looks
wrong; should be a command reject instead.)

>  	/* If the skip flag is off, then data will be transferred */
>  	if (!ccw_is_skip(ccw))
>  		return 1;
> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx)
>  {
>  	struct ccw1 *ccw = chain->ch_ccw + idx;
>  
> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
> +	if (ccw_is_tic(ccw))
>  		return;
>  
>  	kfree((void *)(u64)ccw->cda);
> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain *chain,
>  {
>  	struct ccw1 *ccw = chain->ch_ccw + idx;
>  
> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
> -		return 0;
> -
>  	if (ccw_is_tic(ccw))
>  		return ccwchain_fetch_tic(chain, idx, cp);
>  

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-06 12:56   ` Pierre Morel
@ 2019-05-06 15:39     ` Eric Farman
  2019-05-06 20:47       ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-06 15:39 UTC (permalink / raw)
  To: pmorel, Cornelia Huck, Farhan Ali; +Cc: Halil Pasic, linux-s390, kvm



On 5/6/19 8:56 AM, Pierre Morel wrote:
> On 03/05/2019 15:49, Eric Farman wrote:
>> If the CCW being processed is a No-Operation, then by definition no
>> data is being transferred.  Let's fold those checks into the normal
>> CCW processors, rather than skipping out early.
>>
>> Likewise, if the CCW being processed is a "test" (an invented
>> definition to simply mean it ends in a zero), let's permit that to go
>> through to the hardware.  There's nothing inherently unique about
>> those command codes versus one that ends in an eight [1], or any other
>> otherwise valid command codes that are undefined for the device type
>> in question.
>>
>> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
>> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
>> high-order bits are ignored.
>>
>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c 
>> b/drivers/s390/cio/vfio_ccw_cp.c
>> index 36d76b821209..c0a52025bf06 100644
>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct 
>> channel_program *cp,
>>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
>>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>> CCW_CMD_BASIC_SENSE)
>> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
>> -
>>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
>> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct 
>> ccw1 *ccw)
>>       if (ccw->count == 0)
>>           return 0;
>> +    /* If the command is a NOP, then no data will be transferred */
>> +    if (ccw_is_noop(ccw))
>> +        return 0;
>> +
>>       /* If the skip flag is off, then data will be transferred */
>>       if (!ccw_is_skip(ccw))
>>           return 1;
>> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain 
>> *chain, int idx)
>>   {
>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
>> +    if (ccw_is_tic(ccw))
> 
> 
> AFAIR, we introduced this code to protect against noop and test with a 
> non zero CDA.
> This could go away only if there is somewhere the guaranty that noop 
> have always a null CDA (same for test).

What was generating either the null or "test" command codes?  I can 
provide plenty of examples for both these command codes and how they 
look coming out of vfio-ccw now.

The noop check is moved up into the "does data transfer" routine, to 
determine whether the pages should be pinned or not.  Regardless of 
whether or not the input CDA is null, we'll end up with a CCW pointing 
to a valid IDAL of invalid addresses.

The "test" command codes always struck me as funky, because x18 and xF8 
and everything in between that ends in x8 is architecturally invalid 
too, but we don't check for them like we do for things that end in x0. 
And there's a TON of other opcodes that are invalid for today's ECKD 
devices, or perhaps were valid for older DASD but have since been 
deprecated, or are only valid for non-DASD device types.  We have no 
logic to permit them, either.  If those CCWs had a non-zero CDA, we 
either pin it successfully and let the targeted device sort it out or an 
error occurs and we fail at that point.  (QEMU will see a "wirte" region 
error of -EINVAL because of vfio_pin_pages())

> 
> 
> 
>>           return;
>>       kfree((void *)(u64)ccw->cda);
>> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain *chain,
>>   {
>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw))
>> -        return 0;
>> -
>>       if (ccw_is_tic(ccw))
>>           return ccwchain_fetch_tic(chain, idx, cp);
>>
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/7] s390/cio: Don't pin vfio pages for empty transfers
  2019-05-06 15:20   ` Cornelia Huck
@ 2019-05-06 15:40     ` Eric Farman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-06 15:40 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm



On 5/6/19 11:20 AM, Cornelia Huck wrote:
> On Fri,  3 May 2019 15:49:11 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> If a CCW has a count of zero, then no data will be transferred and
>> pinning/unpinning memory is unnecessary.
>>
>> In addition to that, the skip flag of a CCW offers the possibility of
>> data not being transferred, but is only meaningful for certain commands.
>> Specifically, it is only applicable for a read, read backward, sense, or
>> sense ID CCW and will be ignored for any other command code
>> (SA22-7832-11 page 15-64, and figure 15-30 on page 15-75).
> 
> This made me look at QEMU, and it seems that we cheerfully ignore that
> flag so far in our ccw interpretation code :/

Yup...  :(

> 
>>
>> (A sense ID is xE4, while a sense is x04 with possible modifiers in the
>> upper four bits.  So we will cover the whole "family" of sense CCWs.)
>>
>> For all those scenarios, since there is no requirement for the target
>> address to be valid, we should skip the call to vfio_pin_pages() and
>> rely on the IDAL address we have allocated/built for the channel
>> program.  The fact that the individual IDAWs within the IDAL are
>> invalid is fine, since they aren't actually checked in these cases.
>>
>> Set pa_nr to zero, when skipping the pfn_array_pin() call, since it is
>> defined as the number of pages pinned.  This will cause the vfio unpin
>> logic to return -EINVAL, but since the return code is not checked it
>> will not harm our cleanup path.
> 
> We could also try to skip the unpinning, but this works as well.

In an earlier version I had, I was re-purposing other fields in 
pfn_array, which was rather kludgy.  I could easily add a check for 
non-zero pa_nr here, just to be clear of what we're doing (or in case we 
decide TO check the return code from vfio_unpin_pages() some day).

> 
>>
>> As we do this, since the pfn_array_pin() routine returns the number of
>> pages pinned, and we might not be doing that, the logic for converting
>> a CCW from direct-addressed to IDAL needs to ensure there is room for
>> one IDAW in the IDAL being built since a zero-length IDAL isn't great.
>>
>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_cp.c | 61 +++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 55 insertions(+), 6 deletions(-)
> 
> Looks good to me.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-06 15:37   ` Cornelia Huck
@ 2019-05-06 15:46     ` Eric Farman
  2019-05-06 16:18       ` Cornelia Huck
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-06 15:46 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm



On 5/6/19 11:37 AM, Cornelia Huck wrote:
> On Fri,  3 May 2019 15:49:12 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> If the CCW being processed is a No-Operation, then by definition no
>> data is being transferred.  Let's fold those checks into the normal
>> CCW processors, rather than skipping out early.
>>
>> Likewise, if the CCW being processed is a "test" (an invented
>> definition to simply mean it ends in a zero),
> 
> The "Common I/O Device Commands" document actually defines this :)

Blech, okay so I didn't look early enough in that document.  Section 1.5 
it is.  :)

> 
>> let's permit that to go
>> through to the hardware.  There's nothing inherently unique about
>> those command codes versus one that ends in an eight [1], or any other
>> otherwise valid command codes that are undefined for the device type
>> in question.
> 
> But I agree that everything possible should be sent to the hardware.
> 
>>
>> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
>> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
>> high-order bits are ignored.
>>
>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>> index 36d76b821209..c0a52025bf06 100644
>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct channel_program *cp,
>>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
>>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == CCW_CMD_BASIC_SENSE)
>>   
>> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
>> -
>>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>>   
>>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
>> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct ccw1 *ccw)
>>   	if (ccw->count == 0)
>>   		return 0;
>>   
>> +	/* If the command is a NOP, then no data will be transferred */
>> +	if (ccw_is_noop(ccw))
>> +		return 0;
>> +
> 
> Don't you need to return 0 here for any test command as well?
> 
> (If I read the doc correctly, we'll just get a unit check in any case,
> as there are no parallel I/O interfaces on modern s390 boxes. Even if
> we had a parallel I/O interface, we'd just collect the status, and not
> get any data transfer. FWIW, the QEMU ccw interpreter for emulated
> devices rejects test ccws with a channel program check, which looks
> wrong; should be a command reject instead.)

I will go back and look.  I thought when I sent a test command with an 
address that wasn't translated I got an unhappy result, which is why I 
ripped this check out.

I was trying to use test CCWs as a safety valve for Halil's Status 
Modifier concern, so maybe I had something else wrong on that pile. 
(The careful observer would note that that code was not included here.  :)

> 
>>   	/* If the skip flag is off, then data will be transferred */
>>   	if (!ccw_is_skip(ccw))
>>   		return 1;
>> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx)
>>   {
>>   	struct ccw1 *ccw = chain->ch_ccw + idx;
>>   
>> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
>> +	if (ccw_is_tic(ccw))
>>   		return;
>>   
>>   	kfree((void *)(u64)ccw->cda);
>> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain *chain,
>>   {
>>   	struct ccw1 *ccw = chain->ch_ccw + idx;
>>   
>> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
>> -		return 0;
>> -
>>   	if (ccw_is_tic(ccw))
>>   		return ccwchain_fetch_tic(chain, idx, cp);
>>   
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-06 15:46     ` Eric Farman
@ 2019-05-06 16:18       ` Cornelia Huck
  2019-05-06 16:25         ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-06 16:18 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Mon, 6 May 2019 11:46:59 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On 5/6/19 11:37 AM, Cornelia Huck wrote:
> > On Fri,  3 May 2019 15:49:12 +0200
> > Eric Farman <farman@linux.ibm.com> wrote:
> >   
> >> If the CCW being processed is a No-Operation, then by definition no
> >> data is being transferred.  Let's fold those checks into the normal
> >> CCW processors, rather than skipping out early.
> >>
> >> Likewise, if the CCW being processed is a "test" (an invented
> >> definition to simply mean it ends in a zero),  
> > 
> > The "Common I/O Device Commands" document actually defines this :)  
> 
> Blech, okay so I didn't look early enough in that document.  Section 1.5 
> it is.  :)
> 
> >   
> >> let's permit that to go
> >> through to the hardware.  There's nothing inherently unique about
> >> those command codes versus one that ends in an eight [1], or any other
> >> otherwise valid command codes that are undefined for the device type
> >> in question.  
> > 
> > But I agree that everything possible should be sent to the hardware.
> >   
> >>
> >> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
> >> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
> >> high-order bits are ignored.
> >>
> >> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
> >>   1 file changed, 5 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> >> index 36d76b821209..c0a52025bf06 100644
> >> --- a/drivers/s390/cio/vfio_ccw_cp.c
> >> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> >> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct channel_program *cp,
> >>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
> >>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == CCW_CMD_BASIC_SENSE)
> >>   
> >> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
> >> -
> >>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
> >>   
> >>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
> >> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct ccw1 *ccw)
> >>   	if (ccw->count == 0)
> >>   		return 0;
> >>   
> >> +	/* If the command is a NOP, then no data will be transferred */
> >> +	if (ccw_is_noop(ccw))
> >> +		return 0;
> >> +  
> > 
> > Don't you need to return 0 here for any test command as well?
> > 
> > (If I read the doc correctly, we'll just get a unit check in any case,
> > as there are no parallel I/O interfaces on modern s390 boxes. Even if
> > we had a parallel I/O interface, we'd just collect the status, and not
> > get any data transfer. FWIW, the QEMU ccw interpreter for emulated
> > devices rejects test ccws with a channel program check, which looks
> > wrong; should be a command reject instead.)  
> 
> I will go back and look.  I thought when I sent a test command with an 
> address that wasn't translated I got an unhappy result, which is why I 
> ripped this check out.

Ugh, I just looked at the current PoP and that specifies ccws[1] of test
type as 'invalid' (generating a channel program check). So, the current
PoP and the (old) I/O device commands seem to disagree :/ Do you know
if there's any update to the latter? I think I'll just leave QEMU as it
is, as that at least agrees with the current PoP...

> 
> I was trying to use test CCWs as a safety valve for Halil's Status 
> Modifier concern, so maybe I had something else wrong on that pile. 
> (The careful observer would note that that code was not included here.  :)

:)

> 
> >   
> >>   	/* If the skip flag is off, then data will be transferred */
> >>   	if (!ccw_is_skip(ccw))
> >>   		return 1;
> >> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx)
> >>   {
> >>   	struct ccw1 *ccw = chain->ch_ccw + idx;
> >>   
> >> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
> >> +	if (ccw_is_tic(ccw))
> >>   		return;
> >>   
> >>   	kfree((void *)(u64)ccw->cda);
> >> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain *chain,
> >>   {
> >>   	struct ccw1 *ccw = chain->ch_ccw + idx;
> >>   
> >> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
> >> -		return 0;
> >> -
> >>   	if (ccw_is_tic(ccw))
> >>   		return ccwchain_fetch_tic(chain, idx, cp);
> >>     
> >   

[1] tcws are a bit different; but we don't support them anyway.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-06 16:18       ` Cornelia Huck
@ 2019-05-06 16:25         ` Eric Farman
  2019-05-06 16:31           ` Cornelia Huck
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-06 16:25 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm



On 5/6/19 12:18 PM, Cornelia Huck wrote:
> On Mon, 6 May 2019 11:46:59 -0400
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 5/6/19 11:37 AM, Cornelia Huck wrote:
>>> On Fri,  3 May 2019 15:49:12 +0200
>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>    
>>>> If the CCW being processed is a No-Operation, then by definition no
>>>> data is being transferred.  Let's fold those checks into the normal
>>>> CCW processors, rather than skipping out early.
>>>>
>>>> Likewise, if the CCW being processed is a "test" (an invented
>>>> definition to simply mean it ends in a zero),
>>>
>>> The "Common I/O Device Commands" document actually defines this :)
>>
>> Blech, okay so I didn't look early enough in that document.  Section 1.5
>> it is.  :)
>>
>>>    
>>>> let's permit that to go
>>>> through to the hardware.  There's nothing inherently unique about
>>>> those command codes versus one that ends in an eight [1], or any other
>>>> otherwise valid command codes that are undefined for the device type
>>>> in question.
>>>
>>> But I agree that everything possible should be sent to the hardware.
>>>    
>>>>
>>>> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
>>>> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
>>>> high-order bits are ignored.
>>>>
>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>>>>    1 file changed, 5 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>>>> index 36d76b821209..c0a52025bf06 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct channel_program *cp,
>>>>    #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
>>>>    #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == CCW_CMD_BASIC_SENSE)
>>>>    
>>>> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
>>>> -
>>>>    #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>>>>    
>>>>    #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
>>>> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct ccw1 *ccw)
>>>>    	if (ccw->count == 0)
>>>>    		return 0;
>>>>    
>>>> +	/* If the command is a NOP, then no data will be transferred */
>>>> +	if (ccw_is_noop(ccw))
>>>> +		return 0;
>>>> +
>>>
>>> Don't you need to return 0 here for any test command as well?
>>>
>>> (If I read the doc correctly, we'll just get a unit check in any case,
>>> as there are no parallel I/O interfaces on modern s390 boxes. Even if
>>> we had a parallel I/O interface, we'd just collect the status, and not
>>> get any data transfer. FWIW, the QEMU ccw interpreter for emulated
>>> devices rejects test ccws with a channel program check, which looks
>>> wrong; should be a command reject instead.)
>>
>> I will go back and look.  I thought when I sent a test command with an
>> address that wasn't translated I got an unhappy result, which is why I
>> ripped this check out.
> 
> Ugh, I just looked at the current PoP and that specifies ccws[1] of test
> type as 'invalid' (generating a channel program check). So, the current
> PoP and the (old) I/O device commands seem to disagree :/ Do you know
> if there's any update to the latter? I think I'll just leave QEMU as it
> is, as that at least agrees with the current PoP...

Double ugh.  I am not aware, but I'll poke around on our side.  Probably 
better to focus on PoP, unless it specifically points to ye olde document.

> 
>>
>> I was trying to use test CCWs as a safety valve for Halil's Status
>> Modifier concern, so maybe I had something else wrong on that pile.
>> (The careful observer would note that that code was not included here.  :)
> 
> :)
> 
>>
>>>    
>>>>    	/* If the skip flag is off, then data will be transferred */
>>>>    	if (!ccw_is_skip(ccw))
>>>>    		return 1;
>>>> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx)
>>>>    {
>>>>    	struct ccw1 *ccw = chain->ch_ccw + idx;
>>>>    
>>>> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
>>>> +	if (ccw_is_tic(ccw))
>>>>    		return;
>>>>    
>>>>    	kfree((void *)(u64)ccw->cda);
>>>> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain *chain,
>>>>    {
>>>>    	struct ccw1 *ccw = chain->ch_ccw + idx;
>>>>    
>>>> -	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
>>>> -		return 0;
>>>> -
>>>>    	if (ccw_is_tic(ccw))
>>>>    		return ccwchain_fetch_tic(chain, idx, cp);
>>>>      
>>>    
> 
> [1] tcws are a bit different; but we don't support them anyway.
> 

Yeah...  I'd like to get the code cleaned up to a point where if we want 
to support TCWs, we could just up front say "if command go here, if 
transport go there."  Not that that'll be anytime soon, but it would 
prevent us from having to dis-entangle everything at that future point.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-06 16:25         ` Eric Farman
@ 2019-05-06 16:31           ` Cornelia Huck
  0 siblings, 0 replies; 35+ messages in thread
From: Cornelia Huck @ 2019-05-06 16:31 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Mon, 6 May 2019 12:25:01 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On 5/6/19 12:18 PM, Cornelia Huck wrote:

> > [1] tcws are a bit different; but we don't support them anyway.
> >   
> 
> Yeah...  I'd like to get the code cleaned up to a point where if we want 
> to support TCWs, we could just up front say "if command go here, if 
> transport go there."  Not that that'll be anytime soon, but it would 
> prevent us from having to dis-entangle everything at that future point.
> 

Let's see how much work that is :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd
  2019-05-06 14:51   ` Cornelia Huck
@ 2019-05-06 16:36     ` Eric Farman
  2019-05-07  8:32       ` Pierre Morel
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-06 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm



On 5/6/19 10:51 AM, Cornelia Huck wrote:
> On Fri,  3 May 2019 15:49:07 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> Otherwise, the guest can believe it's okay to start another I/O
>> and bump into the non-idle state.  This results in a cc=3
>> (or cc=2 with the pending async CSCH/HSCH code [1]) to the guest,
> 
> I think you can now refer to cc=2, as the csch/hsch is on its way in :)

Woohoo!  :)

> 
>> which is unfortunate since everything is otherwise working normally.
>>
>> [1] https://patchwork.kernel.org/comment/22588563/
>>
>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>
>> ---
>>
>> I think this might've been part of Pierre's FSM cleanup?
> 
> Not sure if I saw this before, but there have been quite a number of
> patches going around...

I guess I should have said his original cleanup from last year.  I 
didn't find it, but it also seems familiar to me.

> 
>> ---
>>   drivers/s390/cio/vfio_ccw_drv.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>> index 0b3b9de45c60..ddd21b6149fd 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -86,11 +86,11 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>   	}
>>   	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>>   
>> -	if (private->io_trigger)
>> -		eventfd_signal(private->io_trigger, 1);
>> -
>>   	if (private->mdev && is_final)
>>   		private->state = VFIO_CCW_STATE_IDLE;
>> +
>> +	if (private->io_trigger)
>> +		eventfd_signal(private->io_trigger, 1);
>>   }
>>   
>>   /*
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-06 15:39     ` Eric Farman
@ 2019-05-06 20:47       ` Eric Farman
  2019-05-07  8:52         ` Pierre Morel
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-06 20:47 UTC (permalink / raw)
  To: pmorel, Cornelia Huck, Farhan Ali; +Cc: Halil Pasic, linux-s390, kvm



On 5/6/19 11:39 AM, Eric Farman wrote:
> 
> 
> On 5/6/19 8:56 AM, Pierre Morel wrote:
>> On 03/05/2019 15:49, Eric Farman wrote:
>>> If the CCW being processed is a No-Operation, then by definition no
>>> data is being transferred.  Let's fold those checks into the normal
>>> CCW processors, rather than skipping out early.
>>>
>>> Likewise, if the CCW being processed is a "test" (an invented
>>> definition to simply mean it ends in a zero), let's permit that to go
>>> through to the hardware.  There's nothing inherently unique about
>>> those command codes versus one that ends in an eight [1], or any other
>>> otherwise valid command codes that are undefined for the device type
>>> in question.
>>>
>>> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
>>> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
>>> high-order bits are ignored.
>>>
>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>> ---
>>>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c 
>>> b/drivers/s390/cio/vfio_ccw_cp.c
>>> index 36d76b821209..c0a52025bf06 100644
>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct 
>>> channel_program *cp,
>>>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 0x0C)
>>>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>>> CCW_CMD_BASIC_SENSE)
>>> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
>>> -
>>>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>>>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
>>> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct 
>>> ccw1 *ccw)
>>>       if (ccw->count == 0)
>>>           return 0;
>>> +    /* If the command is a NOP, then no data will be transferred */
>>> +    if (ccw_is_noop(ccw))
>>> +        return 0;
>>> +
>>>       /* If the skip flag is off, then data will be transferred */
>>>       if (!ccw_is_skip(ccw))
>>>           return 1;
>>> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain 
>>> *chain, int idx)
>>>   {
>>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
>>> +    if (ccw_is_tic(ccw))
>>
>>
>> AFAIR, we introduced this code to protect against noop and test with a 
>> non zero CDA.
>> This could go away only if there is somewhere the guaranty that noop 
>> have always a null CDA (same for test).
> 
> What was generating either the null or "test" command codes?  I can 
> provide plenty of examples for both these command codes and how they 
> look coming out of vfio-ccw now.

I've sent both x00 and x03 (NOP) CCWs with zero and non-zero CDAs to 
hardware without this patch.  I don't see anything particuarly 
surpising, so I'm not sure what the original code was attempting to protect.

Maybe, since you question this in ccwchain_cda_free(), you're referring 
to commit 408358b50dea ("s390: vfio-ccw: Do not attempt to free no-op, 
test and tic cda."), which fixed up our attempt to clean things up that 
weren't allocated on the transmit side?  With this series, that is 
reverted, but the cda is indeed set to something that needs to be free'd 
(see below).  So maybe I should at least mention that commit here.

Regardless, while the I/Os work/fail as I expect, the cda addresses 
themselves are wrong in much the same way I describe in patch 4.  Yes, 
without this patch we don't convert them to an IDAL so certain program 
checks aren't applicable.  But the addresses that we end up sending to 
the hardware are nonsensical, though potentially valid, locations.

> 
> The noop check is moved up into the "does data transfer" routine, to 
> determine whether the pages should be pinned or not.  Regardless of 
> whether or not the input CDA is null, we'll end up with a CCW pointing 
> to a valid IDAL of invalid addresses.
> 
> The "test" command codes always struck me as funky, because x18 and xF8 
> and everything in between that ends in x8 is architecturally invalid 
> too, but we don't check for them like we do for things that end in x0. 
> And there's a TON of other opcodes that are invalid for today's ECKD 
> devices, or perhaps were valid for older DASD but have since been 
> deprecated, or are only valid for non-DASD device types.  We have no 
> logic to permit them, either.  If those CCWs had a non-zero CDA, we 
> either pin it successfully and let the targeted device sort it out or an 
> error occurs and we fail at that point.  (QEMU will see a "wirte" region 
> error of -EINVAL because of vfio_pin_pages())
> 
>>
>>
>>
>>>           return;
>>>       kfree((void *)(u64)ccw->cda);
>>> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain 
>>> *chain,
>>>   {
>>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw))
>>> -        return 0;
>>> -
>>>       if (ccw_is_tic(ccw))
>>>           return ccwchain_fetch_tic(chain, idx, cp);
>>>
>>
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd
  2019-05-06 16:36     ` Eric Farman
@ 2019-05-07  8:32       ` Pierre Morel
  0 siblings, 0 replies; 35+ messages in thread
From: Pierre Morel @ 2019-05-07  8:32 UTC (permalink / raw)
  To: Eric Farman, Cornelia Huck; +Cc: Farhan Ali, Halil Pasic, linux-s390, kvm

On 06/05/2019 18:36, Eric Farman wrote:
> 
> 
> On 5/6/19 10:51 AM, Cornelia Huck wrote:
>> On Fri,  3 May 2019 15:49:07 +0200
>> Eric Farman <farman@linux.ibm.com> wrote:
>>
>>> Otherwise, the guest can believe it's okay to start another I/O
>>> and bump into the non-idle state.  This results in a cc=3
>>> (or cc=2 with the pending async CSCH/HSCH code [1]) to the guest,
>>
>> I think you can now refer to cc=2, as the csch/hsch is on its way in :)
> 
> Woohoo!  :)
> 
>>
>>> which is unfortunate since everything is otherwise working normally.
>>>
>>> [1] https://patchwork.kernel.org/comment/22588563/
>>>
>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>
>>> ---
>>>
>>> I think this might've been part of Pierre's FSM cleanup?
>>
>> Not sure if I saw this before, but there have been quite a number of
>> patches going around...
> 
> I guess I should have said his original cleanup from last year.  I 
> didn't find it, but it also seems familiar to me.

May be, I am not sure, but does not mater.
It looks good to me to change the state before to send the IRQ signal to 
the guest, just in case we get asynchronism sometime.

> 
>>
>>> ---
>>>   drivers/s390/cio/vfio_ccw_drv.c | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c 
>>> b/drivers/s390/cio/vfio_ccw_drv.c
>>> index 0b3b9de45c60..ddd21b6149fd 100644
>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>> @@ -86,11 +86,11 @@ static void vfio_ccw_sch_io_todo(struct 
>>> work_struct *work)
>>>       }
>>>       memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>>> -    if (private->io_trigger)
>>> -        eventfd_signal(private->io_trigger, 1);
>>> -
>>>       if (private->mdev && is_final)
>>>           private->state = VFIO_CCW_STATE_IDLE;
>>> +
>>> +    if (private->io_trigger)
>>> +        eventfd_signal(private->io_trigger, 1);
>>>   }
>>>   /*
>>

Reviewed-by: Pierre Morel<pmorel@linux.ibm.com>



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-06 20:47       ` Eric Farman
@ 2019-05-07  8:52         ` Pierre Morel
  2019-05-07 16:43           ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Pierre Morel @ 2019-05-07  8:52 UTC (permalink / raw)
  To: Eric Farman, Cornelia Huck, Farhan Ali; +Cc: Halil Pasic, linux-s390, kvm

On 06/05/2019 22:47, Eric Farman wrote:
> 
> 
> On 5/6/19 11:39 AM, Eric Farman wrote:
>>
>>
>> On 5/6/19 8:56 AM, Pierre Morel wrote:
>>> On 03/05/2019 15:49, Eric Farman wrote:
>>>> If the CCW being processed is a No-Operation, then by definition no
>>>> data is being transferred.  Let's fold those checks into the normal
>>>> CCW processors, rather than skipping out early.
>>>>
>>>> Likewise, if the CCW being processed is a "test" (an invented
>>>> definition to simply mean it ends in a zero), let's permit that to go
>>>> through to the hardware.  There's nothing inherently unique about
>>>> those command codes versus one that ends in an eight [1], or any other
>>>> otherwise valid command codes that are undefined for the device type
>>>> in question.
>>>>
>>>> [1] POPS states that a x08 is a TIC CCW, and that having any high-order
>>>> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
>>>> high-order bits are ignored.
>>>>
>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>> ---
>>>>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>>>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c 
>>>> b/drivers/s390/cio/vfio_ccw_cp.c
>>>> index 36d76b821209..c0a52025bf06 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct 
>>>> channel_program *cp,
>>>>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>>>> 0x0C)
>>>>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>>>> CCW_CMD_BASIC_SENSE)
>>>> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
>>>> -
>>>>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>>>>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
>>>> @@ -314,6 +312,10 @@ static inline int ccw_does_data_transfer(struct 
>>>> ccw1 *ccw)
>>>>       if (ccw->count == 0)
>>>>           return 0;
>>>> +    /* If the command is a NOP, then no data will be transferred */
>>>> +    if (ccw_is_noop(ccw))
>>>> +        return 0;
>>>> +
>>>>       /* If the skip flag is off, then data will be transferred */
>>>>       if (!ccw_is_skip(ccw))
>>>>           return 1;
>>>> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain 
>>>> *chain, int idx)
>>>>   {
>>>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>>>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
>>>> +    if (ccw_is_tic(ccw))
>>>
>>>
>>> AFAIR, we introduced this code to protect against noop and test with 
>>> a non zero CDA.
>>> This could go away only if there is somewhere the guaranty that noop 
>>> have always a null CDA (same for test).
>>
>> What was generating either the null or "test" command codes?  I can 
>> provide plenty of examples for both these command codes and how they 
>> look coming out of vfio-ccw now.
> 
> I've sent both x00 and x03 (NOP) CCWs with zero and non-zero CDAs to 
> hardware without this patch.  I don't see anything particuarly 
> surpising, so I'm not sure what the original code was attempting to 
> protect.
> 
> Maybe, since you question this in ccwchain_cda_free(), you're referring 
> to commit 408358b50dea ("s390: vfio-ccw: Do not attempt to free no-op, 
> test and tic cda."), which fixed up our attempt to clean things up that 
> weren't allocated on the transmit side?  With this series, that is 
> reverted, but the cda is indeed set to something that needs to be free'd 
> (see below).  So maybe I should at least mention that commit here.
> 
> Regardless, while the I/Os work/fail as I expect, the cda addresses 
> themselves are wrong in much the same way I describe in patch 4.  Yes, 
> without this patch we don't convert them to an IDAL so certain program 
> checks aren't applicable.  But the addresses that we end up sending to 
> the hardware are nonsensical, though potentially valid, locations.
> 

I am not comfortable with this.
with NOOP no data transfer take place and the role of VFIO is to take 
care about data transfer.
So in my logic better do nothing and send the original CCW to the hardware.

>>
>> The noop check is moved up into the "does data transfer" routine, to 
>> determine whether the pages should be pinned or not.  Regardless of 
>> whether or not the input CDA is null, we'll end up with a CCW pointing 
>> to a valid IDAL of invalid addresses.
>>
>> The "test" command codes always struck me as funky, because x18 and 
>> xF8 and everything in between that ends in x8 is architecturally 
>> invalid too, but we don't check for them like we do for things that 
>> end in x0. And there's a TON of other opcodes that are invalid for 
>> today's ECKD devices, or perhaps were valid for older DASD but have 
>> since been deprecated, or are only valid for non-DASD device types.  
>> We have no logic to permit them, either.  If those CCWs had a non-zero 
>> CDA, we either pin it successfully and let the targeted device sort it 
>> out or an error occurs and we fail at that point.  (QEMU will see a 
>> "wirte" region error of -EINVAL because of vfio_pin_pages())

The test command is AFAIU even more sensible that the NOOP command and 
in my opinion should never be modified since it is highly device 
dependent and do not induce data transfer anyway.

We even do not know how the CDA field may be used by the device.
May be I am a little dramatic with this.
Just to say that I would feel more comfortable if the test command reach 
the device unchanged.

>>
>>>
>>>
>>>
>>>>           return;
>>>>       kfree((void *)(u64)ccw->cda);
>>>> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain 
>>>> *chain,
>>>>   {
>>>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>>>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw))
>>>> -        return 0;
>>>> -
>>>>       if (ccw_is_tic(ccw))
>>>>           return ccwchain_fetch_tic(chain, idx, cp);
>>>>
>>>
>>>


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-07  8:52         ` Pierre Morel
@ 2019-05-07 16:43           ` Eric Farman
  2019-05-08  9:22             ` Pierre Morel
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-07 16:43 UTC (permalink / raw)
  To: pmorel, Cornelia Huck, Farhan Ali; +Cc: Halil Pasic, linux-s390, kvm



On 5/7/19 4:52 AM, Pierre Morel wrote:
> On 06/05/2019 22:47, Eric Farman wrote:
>>
>>
>> On 5/6/19 11:39 AM, Eric Farman wrote:
>>>
>>>
>>> On 5/6/19 8:56 AM, Pierre Morel wrote:
>>>> On 03/05/2019 15:49, Eric Farman wrote:
>>>>> If the CCW being processed is a No-Operation, then by definition no
>>>>> data is being transferred.  Let's fold those checks into the normal
>>>>> CCW processors, rather than skipping out early.
>>>>>
>>>>> Likewise, if the CCW being processed is a "test" (an invented
>>>>> definition to simply mean it ends in a zero), let's permit that to go
>>>>> through to the hardware.  There's nothing inherently unique about
>>>>> those command codes versus one that ends in an eight [1], or any other
>>>>> otherwise valid command codes that are undefined for the device type
>>>>> in question.
>>>>>
>>>>> [1] POPS states that a x08 is a TIC CCW, and that having any 
>>>>> high-order
>>>>> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
>>>>> high-order bits are ignored.
>>>>>
>>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>>> ---
>>>>>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>>>>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c 
>>>>> b/drivers/s390/cio/vfio_ccw_cp.c
>>>>> index 36d76b821209..c0a52025bf06 100644
>>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>>> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct 
>>>>> channel_program *cp,
>>>>>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>>>>> 0x0C)
>>>>>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>>>>> CCW_CMD_BASIC_SENSE)
>>>>> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
>>>>> -
>>>>>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>>>>>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
>>>>> @@ -314,6 +312,10 @@ static inline int 
>>>>> ccw_does_data_transfer(struct ccw1 *ccw)
>>>>>       if (ccw->count == 0)
>>>>>           return 0;
>>>>> +    /* If the command is a NOP, then no data will be transferred */
>>>>> +    if (ccw_is_noop(ccw))
>>>>> +        return 0;
>>>>> +
>>>>>       /* If the skip flag is off, then data will be transferred */
>>>>>       if (!ccw_is_skip(ccw))
>>>>>           return 1;
>>>>> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain 
>>>>> *chain, int idx)
>>>>>   {
>>>>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>>>>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
>>>>> +    if (ccw_is_tic(ccw))
>>>>
>>>>
>>>> AFAIR, we introduced this code to protect against noop and test with 
>>>> a non zero CDA.
>>>> This could go away only if there is somewhere the guaranty that noop 
>>>> have always a null CDA (same for test).
>>>
>>> What was generating either the null or "test" command codes?  I can 
>>> provide plenty of examples for both these command codes and how they 
>>> look coming out of vfio-ccw now.
>>
>> I've sent both x00 and x03 (NOP) CCWs with zero and non-zero CDAs to 
>> hardware without this patch.  I don't see anything particuarly 
>> surpising, so I'm not sure what the original code was attempting to 
>> protect.
>>
>> Maybe, since you question this in ccwchain_cda_free(), you're 
>> referring to commit 408358b50dea ("s390: vfio-ccw: Do not attempt to 
>> free no-op, test and tic cda."), which fixed up our attempt to clean 
>> things up that weren't allocated on the transmit side?  With this 
>> series, that is reverted, but the cda is indeed set to something that 
>> needs to be free'd (see below).  So maybe I should at least mention 
>> that commit here.
>>
>> Regardless, while the I/Os work/fail as I expect, the cda addresses 
>> themselves are wrong in much the same way I describe in patch 4.  Yes, 
>> without this patch we don't convert them to an IDAL so certain program 
>> checks aren't applicable.  But the addresses that we end up sending to 
>> the hardware are nonsensical, though potentially valid, locations.
>>
> 
> I am not comfortable with this.
> with NOOP no data transfer take place and the role of VFIO is to take 
> care about data transfer.
> So in my logic better do nothing and send the original CCW to the hardware.
> 
>>>
>>> The noop check is moved up into the "does data transfer" routine, to 
>>> determine whether the pages should be pinned or not.  Regardless of 
>>> whether or not the input CDA is null, we'll end up with a CCW 
>>> pointing to a valid IDAL of invalid addresses.
>>>
>>> The "test" command codes always struck me as funky, because x18 and 
>>> xF8 and everything in between that ends in x8 is architecturally 
>>> invalid too, but we don't check for them like we do for things that 
>>> end in x0. And there's a TON of other opcodes that are invalid for 
>>> today's ECKD devices, or perhaps were valid for older DASD but have 
>>> since been deprecated, or are only valid for non-DASD device types. 
>>> We have no logic to permit them, either.  If those CCWs had a 
>>> non-zero CDA, we either pin it successfully and let the targeted 
>>> device sort it out or an error occurs and we fail at that point.  
>>> (QEMU will see a "wirte" region error of -EINVAL because of 
>>> vfio_pin_pages())
> 
> The test command is AFAIU even more sensible that the NOOP command and 
> in my opinion should never be modified since it is highly device 
> dependent and do not induce data transfer anyway.
> 
> We even do not know how the CDA field may be used by the device.

Exactly, which is why I think sending an unpinned, non-translated, guest 
address to the hardware (which is what happens today) is a Bad Idea.  If 
the associated command code WERE going to cause the channel to modify 
any memory, the provided address from the guest would (best case) cause 
a program check if the address were not available, or some data 
corruption if it were.

> May be I am a little dramatic with this.
> Just to say that I would feel more comfortable if the test command reach 
> the device unchanged.
> 

As I say above, I disagree.  I'd rather that the command (test or 
otherwise) hit the channel (and the device if applicable) with a valid 
host address in ccw.cda, so that if any data transfer occurs we're not 
exposed.

If there's an application that wants to send a test CCW with an invalid 
CDA (and thus would fail the pin, as I have seen with NOP), then I guess 
I can add ccw_is_test() to ccw_does_data_transfer(), but since I still 
don't see the use case for test CCWs I'm not as thrilled about it.

Do you recall what caused them to be added originally?

>>>
>>>>
>>>>
>>>>
>>>>>           return;
>>>>>       kfree((void *)(u64)ccw->cda);
>>>>> @@ -723,9 +725,6 @@ static int ccwchain_fetch_one(struct ccwchain 
>>>>> *chain,
>>>>>   {
>>>>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>>>>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw))
>>>>> -        return 0;
>>>>> -
>>>>>       if (ccw_is_tic(ccw))
>>>>>           return ccwchain_fetch_tic(chain, idx, cp);
>>>>>
>>>>
>>>>
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-07 16:43           ` Eric Farman
@ 2019-05-08  9:22             ` Pierre Morel
  2019-05-08 10:06               ` Cornelia Huck
  2019-05-10 11:47               ` Cornelia Huck
  0 siblings, 2 replies; 35+ messages in thread
From: Pierre Morel @ 2019-05-08  9:22 UTC (permalink / raw)
  To: Eric Farman, Cornelia Huck, Farhan Ali; +Cc: Halil Pasic, linux-s390, kvm

On 07/05/2019 18:43, Eric Farman wrote:
> 
> 
> On 5/7/19 4:52 AM, Pierre Morel wrote:
>> On 06/05/2019 22:47, Eric Farman wrote:
>>>
>>>
>>> On 5/6/19 11:39 AM, Eric Farman wrote:
>>>>
>>>>
>>>> On 5/6/19 8:56 AM, Pierre Morel wrote:
>>>>> On 03/05/2019 15:49, Eric Farman wrote:
>>>>>> If the CCW being processed is a No-Operation, then by definition no
>>>>>> data is being transferred.  Let's fold those checks into the normal
>>>>>> CCW processors, rather than skipping out early.
>>>>>>
>>>>>> Likewise, if the CCW being processed is a "test" (an invented
>>>>>> definition to simply mean it ends in a zero), let's permit that to go
>>>>>> through to the hardware.  There's nothing inherently unique about
>>>>>> those command codes versus one that ends in an eight [1], or any 
>>>>>> other
>>>>>> otherwise valid command codes that are undefined for the device type
>>>>>> in question.
>>>>>>
>>>>>> [1] POPS states that a x08 is a TIC CCW, and that having any 
>>>>>> high-order
>>>>>> bits enabled is invalid for format-1 CCWs.  For format-0 CCWs, the
>>>>>> high-order bits are ignored.
>>>>>>
>>>>>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>>>>>> ---
>>>>>>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++------
>>>>>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c 
>>>>>> b/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> index 36d76b821209..c0a52025bf06 100644
>>>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> @@ -289,8 +289,6 @@ static long copy_ccw_from_iova(struct 
>>>>>> channel_program *cp,
>>>>>>   #define ccw_is_read_backward(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>>>>>> 0x0C)
>>>>>>   #define ccw_is_sense(_ccw) (((_ccw)->cmd_code & 0x0F) == 
>>>>>> CCW_CMD_BASIC_SENSE)
>>>>>> -#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
>>>>>> -
>>>>>>   #define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
>>>>>>   #define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
>>>>>> @@ -314,6 +312,10 @@ static inline int 
>>>>>> ccw_does_data_transfer(struct ccw1 *ccw)
>>>>>>       if (ccw->count == 0)
>>>>>>           return 0;
>>>>>> +    /* If the command is a NOP, then no data will be transferred */
>>>>>> +    if (ccw_is_noop(ccw))
>>>>>> +        return 0;
>>>>>> +
>>>>>>       /* If the skip flag is off, then data will be transferred */
>>>>>>       if (!ccw_is_skip(ccw))
>>>>>>           return 1;
>>>>>> @@ -398,7 +400,7 @@ static void ccwchain_cda_free(struct ccwchain 
>>>>>> *chain, int idx)
>>>>>>   {
>>>>>>       struct ccw1 *ccw = chain->ch_ccw + idx;
>>>>>> -    if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw))
>>>>>> +    if (ccw_is_tic(ccw))
>>>>>
>>>>>
>>>>> AFAIR, we introduced this code to protect against noop and test 
>>>>> with a non zero CDA.
>>>>> This could go away only if there is somewhere the guaranty that 
>>>>> noop have always a null CDA (same for test).
>>>>
>>>> What was generating either the null or "test" command codes?  I can 
>>>> provide plenty of examples for both these command codes and how they 
>>>> look coming out of vfio-ccw now.
>>>
>>> I've sent both x00 and x03 (NOP) CCWs with zero and non-zero CDAs to 
>>> hardware without this patch.  I don't see anything particuarly 
>>> surpising, so I'm not sure what the original code was attempting to 
>>> protect.
>>>
>>> Maybe, since you question this in ccwchain_cda_free(), you're 
>>> referring to commit 408358b50dea ("s390: vfio-ccw: Do not attempt to 
>>> free no-op, test and tic cda."), which fixed up our attempt to clean 
>>> things up that weren't allocated on the transmit side?  With this 
>>> series, that is reverted, but the cda is indeed set to something that 
>>> needs to be free'd (see below).  So maybe I should at least mention 
>>> that commit here.
>>>
>>> Regardless, while the I/Os work/fail as I expect, the cda addresses 
>>> themselves are wrong in much the same way I describe in patch 4.  
>>> Yes, without this patch we don't convert them to an IDAL so certain 
>>> program checks aren't applicable.  But the addresses that we end up 
>>> sending to the hardware are nonsensical, though potentially valid, 
>>> locations.
>>>
>>
>> I am not comfortable with this.
>> with NOOP no data transfer take place and the role of VFIO is to take 
>> care about data transfer.
>> So in my logic better do nothing and send the original CCW to the 
>> hardware.
>>
>>>>
>>>> The noop check is moved up into the "does data transfer" routine, to 
>>>> determine whether the pages should be pinned or not.  Regardless of 
>>>> whether or not the input CDA is null, we'll end up with a CCW 
>>>> pointing to a valid IDAL of invalid addresses.
>>>>
>>>> The "test" command codes always struck me as funky, because x18 and 
>>>> xF8 and everything in between that ends in x8 is architecturally 
>>>> invalid too, but we don't check for them like we do for things that 
>>>> end in x0. And there's a TON of other opcodes that are invalid for 
>>>> today's ECKD devices, or perhaps were valid for older DASD but have 
>>>> since been deprecated, or are only valid for non-DASD device types. 
>>>> We have no logic to permit them, either.  If those CCWs had a 
>>>> non-zero CDA, we either pin it successfully and let the targeted 
>>>> device sort it out or an error occurs and we fail at that point. 
>>>> (QEMU will see a "wirte" region error of -EINVAL because of 
>>>> vfio_pin_pages())
>>
>> The test command is AFAIU even more sensible that the NOOP command and 
>> in my opinion should never be modified since it is highly device 
>> dependent and do not induce data transfer anyway.
>>
>> We even do not know how the CDA field may be used by the device.
> 
> Exactly, which is why I think sending an unpinned, non-translated, guest 
> address to the hardware (which is what happens today) is a Bad Idea.  If 
> the associated command code WERE going to cause the channel to modify 
> any memory, the provided address from the guest would (best case) cause 
> a program check if the address were not available, or some data 
> corruption if it were.
> 
>> May be I am a little dramatic with this.
>> Just to say that I would feel more comfortable if the test command 
>> reach the device unchanged.
>>
> 
> As I say above, I disagree.  I'd rather that the command (test or 
> otherwise) hit the channel (and the device if applicable) with a valid 
> host address in ccw.cda, so that if any data transfer occurs we're not 
> exposed.
> 
> If there's an application that wants to send a test CCW with an invalid 
> CDA (and thus would fail the pin, as I have seen with NOP), then I guess 
> I can add ccw_is_test() to ccw_does_data_transfer(), but since I still 
> don't see the use case for test CCWs I'm not as thrilled about it.
> 
> Do you recall what caused them to be added originally?



AFAIK they have bin added because they are standard CCW commands.


For the NOOP its clearly stated that it does not start a data transfer.
If we pin the CDA, it could then eventually be the cause of errors if 
the address indicated by the CDA is not accessible.

The NOOP is a particular CONTROL operation for which no data is transfered.
Other CONTROL operation may start a data transfer.

The TEST command is used to retrieve the status of the I/O-device 
__path__ and do not go up to the device.
I did not find clearly that it does not start a data transfer but I 
really do not think it does.
May be we should ask people from hardware.
I only found that test I/O (a specific test command) do not initiate an 
operation.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-08  9:22             ` Pierre Morel
@ 2019-05-08 10:06               ` Cornelia Huck
  2019-05-08 19:38                 ` Eric Farman
  2019-05-10 11:47               ` Cornelia Huck
  1 sibling, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-08 10:06 UTC (permalink / raw)
  To: Pierre Morel; +Cc: Eric Farman, Farhan Ali, Halil Pasic, linux-s390, kvm

On Wed, 8 May 2019 11:22:07 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> The TEST command is used to retrieve the status of the I/O-device 
> __path__ and do not go up to the device.
> I did not find clearly that it does not start a data transfer but I 
> really do not think it does.
> May be we should ask people from hardware.
> I only found that test I/O (a specific test command) do not initiate an 
> operation.

FWIW, I'm not sure about what we should do with the test command in any
case.

Currently, I see it defined as a proper command in the rather ancient
"Common I/O Device Commands" (I don't know of any newer public
version), which states that it retrieves the status on the parallel
interface _only_ (and generates a command reject on the serial
interface). IIRC, the parallel interface has been phased out quite some
time ago.

The current PoP, in contrast, defines this as an _invalid_ command
(generating a channel program check).

So, while the test command originally was designed to never initiate a
data transfer, we now have an invalid command in its place, and we
don't know if something else might change in the future (for transfer
mode, a test-like command is already defined in the PoP).

So, the safest course would probably be to handle the ->cda portion and
send the command down. We'll get a check condition on current hardware,
but it should be safe if something changes in the future.

Of course, asking some hardware folks is not a bad idea, either :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces
  2019-05-03 13:49 ` [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces Eric Farman
@ 2019-05-08 10:43   ` Cornelia Huck
  2019-05-08 13:25     ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-08 10:43 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Fri,  3 May 2019 15:49:08 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> The pfn_array_alloc_pin routine is doing too much.  Today, it does the
> alloc of the pfn_array struct and its member arrays, builds the iova
> address lists out of a contiguous piece of guest memory, and asks vfio
> to pin the resulting pages.
> 
> Let's effectively revert a significant portion of commit 5c1cfb1c3948
> ("vfio: ccw: refactor and improve pfn_array_alloc_pin()") such that we
> break pfn_array_alloc_pin() into its component pieces, and have one
> routine that allocates/populates the pfn_array structs, and another
> that actually pins the memory.  In the future, we will be able to
> handle scenarios where pinning memory isn't actually appropriate.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 72 +++++++++++++++++++++++++++---------------
>  1 file changed, 47 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index f86da78eaeaa..b70306c06150 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -50,28 +50,25 @@ struct ccwchain {
>  };
>  
>  /*
> - * pfn_array_alloc_pin() - alloc memory for PFNs, then pin user pages in memory
> + * pfn_array_alloc() - alloc memory for PFNs
>   * @pa: pfn_array on which to perform the operation
> - * @mdev: the mediated device to perform pin/unpin operations
>   * @iova: target guest physical address
>   * @len: number of bytes that should be pinned from @iova
>   *
> - * Attempt to allocate memory for PFNs, and pin user pages in memory.
> + * Attempt to allocate memory for PFN.

s/PFN/PFNs/

>   *
>   * Usage of pfn_array:
>   * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
>   * this structure will be filled in by this function.
>   *
>   * Returns:
> - *   Number of pages pinned on success.
> - *   If @pa->pa_nr is not 0, or @pa->pa_iova_pfn is not NULL initially,
> - *   returns -EINVAL.
> - *   If no pages were pinned, returns -errno.
> + *         0 if PFNs are allocated
> + *   -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova_pfn is not NULL
> + *   -ENOMEM if alloc failed
>   */
> -static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
> -			       u64 iova, unsigned int len)
> +static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
>  {
> -	int i, ret = 0;
> +	int i;
>  
>  	if (!len)
>  		return 0;
> @@ -97,23 +94,33 @@ static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
>  	for (i = 1; i < pa->pa_nr; i++)
>  		pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
>  
> +	return 0;
> +}
> +
> +/*
> + * pfn_array_pin() - Pin user pages in memory
> + * @pa: pfn_array on which to perform the operation
> + * @mdev: the mediated device to perform pin operations
> + *
> + * Returns:
> + *   Number of pages pinned on success.
> + *   If fewer pages than requested were pinned, returns -EINVAL
> + *   If no pages were pinned, returns -errno.

I don't really like the 'returns -errno' :) It's actually the return
code of vfio_pin_pages(), and that might include -EINVAL as well.

So, what about mentioning in the function description that
pfn_array_pin() only succeeds if it coult pin all pages, and simply
stating that it returns a negative error value on failure?

> + */
> +static int pfn_array_pin(struct pfn_array *pa, struct device *mdev)
> +{
> +	int ret = 0;
> +
>  	ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
>  			     IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
>  
> -	if (ret < 0) {
> -		goto err_out;
> -	} else if (ret > 0 && ret != pa->pa_nr) {
> +	if (ret > 0 && ret != pa->pa_nr) {
>  		vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
>  		ret = -EINVAL;
> -		goto err_out;
>  	}
>  
> -	return ret;
> -
> -err_out:
> -	pa->pa_nr = 0;
> -	kfree(pa->pa_iova_pfn);
> -	pa->pa_iova_pfn = NULL;
> +	if (ret < 0)
> +		pa->pa_iova = 0;
>  
>  	return ret;
>  }

(...)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces
  2019-05-08 10:43   ` Cornelia Huck
@ 2019-05-08 13:25     ` Eric Farman
  2019-05-08 13:36       ` Cornelia Huck
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-08 13:25 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm



On 5/8/19 6:43 AM, Cornelia Huck wrote:
> On Fri,  3 May 2019 15:49:08 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> The pfn_array_alloc_pin routine is doing too much.  Today, it does the
>> alloc of the pfn_array struct and its member arrays, builds the iova
>> address lists out of a contiguous piece of guest memory, and asks vfio
>> to pin the resulting pages.
>>
>> Let's effectively revert a significant portion of commit 5c1cfb1c3948
>> ("vfio: ccw: refactor and improve pfn_array_alloc_pin()") such that we
>> break pfn_array_alloc_pin() into its component pieces, and have one
>> routine that allocates/populates the pfn_array structs, and another
>> that actually pins the memory.  In the future, we will be able to
>> handle scenarios where pinning memory isn't actually appropriate.
>>
>> Signed-off-by: Eric Farman <farman@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_cp.c | 72 +++++++++++++++++++++++++++---------------
>>   1 file changed, 47 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>> index f86da78eaeaa..b70306c06150 100644
>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>> @@ -50,28 +50,25 @@ struct ccwchain {
>>   };
>>   
>>   /*
>> - * pfn_array_alloc_pin() - alloc memory for PFNs, then pin user pages in memory
>> + * pfn_array_alloc() - alloc memory for PFNs
>>    * @pa: pfn_array on which to perform the operation
>> - * @mdev: the mediated device to perform pin/unpin operations
>>    * @iova: target guest physical address
>>    * @len: number of bytes that should be pinned from @iova
>>    *
>> - * Attempt to allocate memory for PFNs, and pin user pages in memory.
>> + * Attempt to allocate memory for PFN.
> 
> s/PFN/PFNs/
> 
>>    *
>>    * Usage of pfn_array:
>>    * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
>>    * this structure will be filled in by this function.
>>    *
>>    * Returns:
>> - *   Number of pages pinned on success.
>> - *   If @pa->pa_nr is not 0, or @pa->pa_iova_pfn is not NULL initially,
>> - *   returns -EINVAL.
>> - *   If no pages were pinned, returns -errno.
>> + *         0 if PFNs are allocated
>> + *   -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova_pfn is not NULL
>> + *   -ENOMEM if alloc failed
>>    */
>> -static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
>> -			       u64 iova, unsigned int len)
>> +static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
>>   {
>> -	int i, ret = 0;
>> +	int i;
>>   
>>   	if (!len)
>>   		return 0;
>> @@ -97,23 +94,33 @@ static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
>>   	for (i = 1; i < pa->pa_nr; i++)
>>   		pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
>>   
>> +	return 0;
>> +}
>> +
>> +/*
>> + * pfn_array_pin() - Pin user pages in memory
>> + * @pa: pfn_array on which to perform the operation
>> + * @mdev: the mediated device to perform pin operations
>> + *
>> + * Returns:
>> + *   Number of pages pinned on success.
>> + *   If fewer pages than requested were pinned, returns -EINVAL
>> + *   If no pages were pinned, returns -errno.
> 
> I don't really like the 'returns -errno' :) It's actually the return
> code of vfio_pin_pages(), and that might include -EINVAL as well.
> 
> So, what about mentioning in the function description that
> pfn_array_pin() only succeeds if it coult pin all pages, and simply
> stating that it returns a negative error value on failure?

Seems reasonable to me...  Something like:

  * Returns number of pages pinned upon success.
  * If the pin request partially succeeds, or fails completely,
  * all pages are left unpinned and a negative error value is returned.

> 
>> + */
>> +static int pfn_array_pin(struct pfn_array *pa, struct device *mdev)
>> +{
>> +	int ret = 0;
>> +
>>   	ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
>>   			     IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
>>   
>> -	if (ret < 0) {
>> -		goto err_out;
>> -	} else if (ret > 0 && ret != pa->pa_nr) {
>> +	if (ret > 0 && ret != pa->pa_nr) {
>>   		vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
>>   		ret = -EINVAL;
>> -		goto err_out;
>>   	}
>>   
>> -	return ret;
>> -
>> -err_out:
>> -	pa->pa_nr = 0;
>> -	kfree(pa->pa_iova_pfn);
>> -	pa->pa_iova_pfn = NULL;
>> +	if (ret < 0)
>> +		pa->pa_iova = 0;
>>   
>>   	return ret;
>>   }
> 
> (...)
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces
  2019-05-08 13:25     ` Eric Farman
@ 2019-05-08 13:36       ` Cornelia Huck
  0 siblings, 0 replies; 35+ messages in thread
From: Cornelia Huck @ 2019-05-08 13:36 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Halil Pasic, Pierre Morel, linux-s390, kvm

On Wed, 8 May 2019 09:25:57 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On 5/8/19 6:43 AM, Cornelia Huck wrote:
> > On Fri,  3 May 2019 15:49:08 +0200
> > Eric Farman <farman@linux.ibm.com> wrote:

> >> +/*
> >> + * pfn_array_pin() - Pin user pages in memory
> >> + * @pa: pfn_array on which to perform the operation
> >> + * @mdev: the mediated device to perform pin operations
> >> + *
> >> + * Returns:
> >> + *   Number of pages pinned on success.
> >> + *   If fewer pages than requested were pinned, returns -EINVAL
> >> + *   If no pages were pinned, returns -errno.  
> > 
> > I don't really like the 'returns -errno' :) It's actually the return
> > code of vfio_pin_pages(), and that might include -EINVAL as well.
> > 
> > So, what about mentioning in the function description that
> > pfn_array_pin() only succeeds if it coult pin all pages, and simply
> > stating that it returns a negative error value on failure?  
> 
> Seems reasonable to me...  Something like:
> 
>   * Returns number of pages pinned upon success.
>   * If the pin request partially succeeds, or fails completely,
>   * all pages are left unpinned and a negative error value is returned.

Sounds good to me!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-08 10:06               ` Cornelia Huck
@ 2019-05-08 19:38                 ` Eric Farman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-08 19:38 UTC (permalink / raw)
  To: Cornelia Huck, Pierre Morel; +Cc: Farhan Ali, Halil Pasic, linux-s390, kvm



On 5/8/19 6:06 AM, Cornelia Huck wrote:
> On Wed, 8 May 2019 11:22:07 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> The TEST command is used to retrieve the status of the I/O-device
>> __path__ and do not go up to the device.
>> I did not find clearly that it does not start a data transfer but I
>> really do not think it does.
>> May be we should ask people from hardware.
>> I only found that test I/O (a specific test command) do not initiate an
>> operation.
> 
> FWIW, I'm not sure about what we should do with the test command in any
> case.
> 
> Currently, I see it defined as a proper command in the rather ancient
> "Common I/O Device Commands" (I don't know of any newer public
> version), 

Nor I.  I had to rummage around a few dumpsters to find a copy of this 
one, even.

> which states that it retrieves the status on the parallel
> interface _only_ (and generates a command reject on the serial
> interface). IIRC, the parallel interface has been phased out quite some
> time ago.

The current POPs, towards the bottom left side of page 13-3, has this 
statement:

---
The term “serial-I/O interface” is used to refer the ESCON I/O 
interface, the FICON I/O interface, and the FICON-converted I/O 
interface. The term “parallel-I/O interface” is used to refer to the IBM 
System/360 and System/370 I/O interface.
---

So, yes it was phased out some time ago.  :)

> 
> The current PoP, in contrast, defines this as an _invalid_ command
> (generating a channel program check).

Ditto the ESA/390 POPs (SA22-7201-08).

> 
> So, while the test command originally was designed to never initiate a
> data transfer, we now have an invalid command in its place, and we
> don't know if something else might change in the future (for transfer
> mode, a test-like command is already defined in the PoP).

Indeed, the ccw_is_test() check would need to be reworked if we ever 
want to support transport mode anyway.  :shudder:

> 
> So, the safest course would probably be to handle the ->cda portion and
> send the command down. We'll get a check condition on current hardware,
> but it should be safe if something changes in the future.
> 
> Of course, asking some hardware folks is not a bad idea, either :)
> 

I'll shoot a quick note (and cc Pierre) just for the sake of sanity, but 
I'm still convinced this patch is fine as-is.  :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-08  9:22             ` Pierre Morel
  2019-05-08 10:06               ` Cornelia Huck
@ 2019-05-10 11:47               ` Cornelia Huck
  2019-05-10 14:24                 ` Eric Farman
  1 sibling, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-10 11:47 UTC (permalink / raw)
  To: Pierre Morel; +Cc: Eric Farman, Farhan Ali, Halil Pasic, linux-s390, kvm

On Wed, 8 May 2019 11:22:07 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> For the NOOP its clearly stated that it does not start a data transfer.
> If we pin the CDA, it could then eventually be the cause of errors if 
> the address indicated by the CDA is not accessible.
> 
> The NOOP is a particular CONTROL operation for which no data is transfered.
> Other CONTROL operation may start a data transfer.

I've just looked at the documentation again.

The Olde Common I/O Device Commands document indicates that a NOOP
simply causes channel end/device end.

The PoP seems to indicate that the cda is always checked (i.e. does it
point to a valid memory area?), but I'm not sure whether the area that
is pointed to is checked for accessibility etc. as well, even if the
command does not transfer any data.

Has somebody tried to find out what happens on Real Hardware(tm) if you
send a command that is not supposed to transfer any data where the cda
points to a valid, but not accessible area?

In general, I think doing the translation (and probably already hitting
errors there) is better than sending down a guest address.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-10 11:47               ` Cornelia Huck
@ 2019-05-10 14:24                 ` Eric Farman
  2019-05-14 14:29                   ` Cornelia Huck
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Farman @ 2019-05-10 14:24 UTC (permalink / raw)
  To: Cornelia Huck, Pierre Morel; +Cc: Farhan Ali, Halil Pasic, linux-s390, kvm



On 5/10/19 7:47 AM, Cornelia Huck wrote:
> On Wed, 8 May 2019 11:22:07 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> For the NOOP its clearly stated that it does not start a data transfer.
>> If we pin the CDA, it could then eventually be the cause of errors if
>> the address indicated by the CDA is not accessible.
>>
>> The NOOP is a particular CONTROL operation for which no data is transfered.
>> Other CONTROL operation may start a data transfer.
> 
> I've just looked at the documentation again.
> 
> The Olde Common I/O Device Commands document indicates that a NOOP
> simply causes channel end/device end.
> 
> The PoP seems to indicate that the cda is always checked (i.e. does it
> point to a valid memory area?), but I'm not sure whether the area that
> is pointed to is checked for accessibility etc. as well, even if the
> command does not transfer any data.
> 
> Has somebody tried to find out what happens on Real Hardware(tm) if you
> send a command that is not supposed to transfer any data where the cda
> points to a valid, but not accessible area?

Hrm...  The CDA itself?  I don't think so.  Since every CCW is converted 
to an IDAL in vfio-ccw, we guarantee that it's pointing to something 
valid at that point.

So, I hacked ccwchain_fetch_direct() to NOT set the IDAL flag in a NOP 
CCW, and to leave the CDA alone.  This means it will still contain the 
guest address, which is risky but hey it's a test system.  :)  (I 
offline'd a bunch of host memory too, to make sure I had some 
unavailable addresses.)

In my traces, the non-IDA NOP CCWs were issued to the host with and 
without the skip flag, with zero and non-zero counts, and with zero and 
non-zero CDAs.  All of them work just fine, including the ones who's 
addresses fall into the offline space.  Even the combination of no skip, 
non-zero count, and zero cda.

I modified that hack to do the same for a known invalid control opcode, 
and it seemed to be okay too.  We got an (expected) invalid command 
before we noticed any problem with the provided address.


> 
> In general, I think doing the translation (and probably already hitting
> errors there) is better than sending down a guest address.
> 

I mostly agree, but I have one test program that generates invalid GUEST 
addresses with its NOP CCWs, since it doesn't seem to care about whether 
they're valid or not.  So any attempt to pin them will end badly, which 
is why I call that opcode out in ccw_does_data_transfer(), and just send 
invalid IDAWs with it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-10 14:24                 ` Eric Farman
@ 2019-05-14 14:29                   ` Cornelia Huck
  2019-05-14 18:29                     ` Eric Farman
  0 siblings, 1 reply; 35+ messages in thread
From: Cornelia Huck @ 2019-05-14 14:29 UTC (permalink / raw)
  To: Eric Farman; +Cc: Pierre Morel, Farhan Ali, Halil Pasic, linux-s390, kvm

On Fri, 10 May 2019 10:24:31 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On 5/10/19 7:47 AM, Cornelia Huck wrote:
> > On Wed, 8 May 2019 11:22:07 +0200
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> >   
> >> For the NOOP its clearly stated that it does not start a data transfer.
> >> If we pin the CDA, it could then eventually be the cause of errors if
> >> the address indicated by the CDA is not accessible.
> >>
> >> The NOOP is a particular CONTROL operation for which no data is transfered.
> >> Other CONTROL operation may start a data transfer.  
> > 
> > I've just looked at the documentation again.
> > 
> > The Olde Common I/O Device Commands document indicates that a NOOP
> > simply causes channel end/device end.
> > 
> > The PoP seems to indicate that the cda is always checked (i.e. does it
> > point to a valid memory area?), but I'm not sure whether the area that
> > is pointed to is checked for accessibility etc. as well, even if the
> > command does not transfer any data.
> > 
> > Has somebody tried to find out what happens on Real Hardware(tm) if you
> > send a command that is not supposed to transfer any data where the cda
> > points to a valid, but not accessible area?  
> 
> Hrm...  The CDA itself?  I don't think so.  Since every CCW is converted 
> to an IDAL in vfio-ccw, we guarantee that it's pointing to something 
> valid at that point.
> 
> So, I hacked ccwchain_fetch_direct() to NOT set the IDAL flag in a NOP 
> CCW, and to leave the CDA alone.  This means it will still contain the 
> guest address, which is risky but hey it's a test system.  :)  (I 
> offline'd a bunch of host memory too, to make sure I had some 
> unavailable addresses.)
> 
> In my traces, the non-IDA NOP CCWs were issued to the host with and 
> without the skip flag, with zero and non-zero counts, and with zero and 
> non-zero CDAs.  All of them work just fine, including the ones who's 
> addresses fall into the offline space.  Even the combination of no skip, 
> non-zero count, and zero cda.
> 
> I modified that hack to do the same for a known invalid control opcode, 
> and it seemed to be okay too.  We got an (expected) invalid command 
> before we noticed any problem with the provided address.

That's interesting; I would not have arrived at this by interpreting
the PoP...

> > 
> > In general, I think doing the translation (and probably already hitting
> > errors there) is better than sending down a guest address.
> >   
> 
> I mostly agree, but I have one test program that generates invalid GUEST 
> addresses with its NOP CCWs, since it doesn't seem to care about whether 
> they're valid or not.  So any attempt to pin them will end badly, which 
> is why I call that opcode out in ccw_does_data_transfer(), and just send 
> invalid IDAWs with it.

So, without the attempt to pin they do not fail? Maybe the right
approach would be to rewrite the cda before sending the ccws?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes
  2019-05-14 14:29                   ` Cornelia Huck
@ 2019-05-14 18:29                     ` Eric Farman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Farman @ 2019-05-14 18:29 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Pierre Morel, Farhan Ali, Halil Pasic, linux-s390, kvm



On 5/14/19 10:29 AM, Cornelia Huck wrote:
> On Fri, 10 May 2019 10:24:31 -0400
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 5/10/19 7:47 AM, Cornelia Huck wrote:
>>> On Wed, 8 May 2019 11:22:07 +0200
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>    
>>>> For the NOOP its clearly stated that it does not start a data transfer.
>>>> If we pin the CDA, it could then eventually be the cause of errors if
>>>> the address indicated by the CDA is not accessible.
>>>>
>>>> The NOOP is a particular CONTROL operation for which no data is transfered.
>>>> Other CONTROL operation may start a data transfer.
>>>
>>> I've just looked at the documentation again.
>>>
>>> The Olde Common I/O Device Commands document indicates that a NOOP
>>> simply causes channel end/device end.
>>>
>>> The PoP seems to indicate that the cda is always checked (i.e. does it
>>> point to a valid memory area?), but I'm not sure whether the area that
>>> is pointed to is checked for accessibility etc. as well, even if the
>>> command does not transfer any data.
>>>
>>> Has somebody tried to find out what happens on Real Hardware(tm) if you
>>> send a command that is not supposed to transfer any data where the cda
>>> points to a valid, but not accessible area?
>>
>> Hrm...  The CDA itself?  I don't think so.  Since every CCW is converted
>> to an IDAL in vfio-ccw, we guarantee that it's pointing to something
>> valid at that point.
>>
>> So, I hacked ccwchain_fetch_direct() to NOT set the IDAL flag in a NOP
>> CCW, and to leave the CDA alone.  This means it will still contain the
>> guest address, which is risky but hey it's a test system.  :)  (I
>> offline'd a bunch of host memory too, to make sure I had some
>> unavailable addresses.)
>>
>> In my traces, the non-IDA NOP CCWs were issued to the host with and
>> without the skip flag, with zero and non-zero counts, and with zero and
>> non-zero CDAs.  All of them work just fine, including the ones who's
>> addresses fall into the offline space.  Even the combination of no skip,
>> non-zero count, and zero cda.
>>
>> I modified that hack to do the same for a known invalid control opcode,
>> and it seemed to be okay too.  We got an (expected) invalid command
>> before we noticed any problem with the provided address.
> 
> That's interesting; I would not have arrived at this by interpreting
> the PoP...
> 
>>>
>>> In general, I think doing the translation (and probably already hitting
>>> errors there) is better than sending down a guest address.
>>>    
>>
>> I mostly agree, but I have one test program that generates invalid GUEST
>> addresses with its NOP CCWs, since it doesn't seem to care about whether
>> they're valid or not.  So any attempt to pin them will end badly, which
>> is why I call that opcode out in ccw_does_data_transfer(), and just send
>> invalid IDAWs with it.
> 
> So, without the attempt to pin they do not fail? 

Correct.

> Maybe the right
> approach would be to rewrite the cda before sending the ccws?
> 

That would be my vote.  (And it's what this series does.  :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2019-05-14 18:29 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-03 13:49 [PATCH v1 0/7] s390: vfio-ccw fixes Eric Farman
2019-05-03 13:49 ` [PATCH 1/7] s390/cio: Update SCSW if it points to the end of the chain Eric Farman
2019-05-06 14:47   ` Cornelia Huck
2019-05-06 15:23     ` Eric Farman
2019-05-03 13:49 ` [PATCH 2/7] s390/cio: Set vfio-ccw FSM state before ioeventfd Eric Farman
2019-05-06 14:51   ` Cornelia Huck
2019-05-06 16:36     ` Eric Farman
2019-05-07  8:32       ` Pierre Morel
2019-05-03 13:49 ` [PATCH 3/7] s390/cio: Split pfn_array_alloc_pin into pieces Eric Farman
2019-05-08 10:43   ` Cornelia Huck
2019-05-08 13:25     ` Eric Farman
2019-05-08 13:36       ` Cornelia Huck
2019-05-03 13:49 ` [PATCH 4/7] s390/cio: Initialize the host addresses in pfn_array Eric Farman
2019-05-03 13:49 ` [PATCH 5/7] s390/cio: Allow zero-length CCWs in vfio-ccw Eric Farman
2019-05-03 13:49 ` [PATCH 6/7] s390/cio: Don't pin vfio pages for empty transfers Eric Farman
2019-05-06 15:20   ` Cornelia Huck
2019-05-06 15:40     ` Eric Farman
2019-05-03 13:49 ` [PATCH 7/7] s390/cio: Remove vfio-ccw checks of command codes Eric Farman
2019-05-06 12:56   ` Pierre Morel
2019-05-06 15:39     ` Eric Farman
2019-05-06 20:47       ` Eric Farman
2019-05-07  8:52         ` Pierre Morel
2019-05-07 16:43           ` Eric Farman
2019-05-08  9:22             ` Pierre Morel
2019-05-08 10:06               ` Cornelia Huck
2019-05-08 19:38                 ` Eric Farman
2019-05-10 11:47               ` Cornelia Huck
2019-05-10 14:24                 ` Eric Farman
2019-05-14 14:29                   ` Cornelia Huck
2019-05-14 18:29                     ` Eric Farman
2019-05-06 15:37   ` Cornelia Huck
2019-05-06 15:46     ` Eric Farman
2019-05-06 16:18       ` Cornelia Huck
2019-05-06 16:25         ` Eric Farman
2019-05-06 16:31           ` Cornelia Huck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.