All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/9]
@ 2017-08-30 16:36 Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH Halil Pasic
                   ` (9 more replies)
  0 siblings, 10 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

This series has a character of a refactoring, as the initial motivation
was improving readability and reducing complexity.

Despite of the original intent the tree first patches buxfixes, and
according to Dong Jia the patch-set also has a functional value: for ccw
pass-through, he is planing to pass-through the instruction completion
information (cc or interruption condition) from the kernel, and this
patch set can pretty much be seen as a preparation for that.

The basic idea is: tell how to handle an unusual conditon where it's
identified, instead of mapping it to an errno (more or less arbitrarily),
then possibly mapping these errnos around, to finally (mentally) map the
errno back to the condition and take appropriate action. 

At the end of the series we also have 125 lines of code less to maintain,
and the executable got a bit smaller too.

Halil Pasic (9):
  s390x/css: fix cc handling for XSCH
  s390x: fix invalid use of cc 1 for SSCH
  s390x/css: be more consistent if broken beyond repair
  s390x: refactor error handling for SSCH and RSCH
  s390x: refactor error handling for XSCH handler
  s390x: refactor error handling for CSCH handler
  s390x: refactor error handling for HSCH handler
  s390x: refactor error handling for MSCH handler
  s390x: factor out common ioinst handler logic

 hw/s390x/css.c              | 158 ++++++++++++--------------------------
 hw/s390x/s390-ccw.c         |   8 +-
 hw/vfio/ccw.c               |  32 ++++++--
 include/hw/s390x/css.h      |  38 +++++++---
 include/hw/s390x/s390-ccw.h |   2 +-
 target/s390x/ioinst.c       | 181 ++++++++++----------------------------------
 6 files changed, 147 insertions(+), 272 deletions(-)

-- 
2.13.5

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-31  5:51   ` Thomas Huth
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH Halil Pasic
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
present cc 1 and the other way around, because css_do_xsch has the error
codes mixed up. Fixing the error codes also fixes the priority.

Let us fix this.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---
 hw/s390x/css.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 1880b1a0ff..a50fb0727e 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
         (!(s->ctrl &
            (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
         (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
-        ret = -EINPROGRESS;
+        ret = -EBUSY;
         goto out;
     }
 
     if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
-        ret = -EBUSY;
+        ret = -EINPROGRESS;
         goto out;
     }
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-31  7:50   ` Thomas Huth
  2017-08-31  9:19   ` Cornelia Huck
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair Halil Pasic
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

According to the POP a start subchannel instruction (SSCH) returning with
cc 1 implies that the subchannel was status pending when SSCH executed.

Due to a somewhat confusing error handling, where error codes are mapped
to cc value, sane looking error codes result in non AR compliant
behavior.

Let's fix this! Instead of cc 1 we use cc 3 which means device not
operational, and is much closer to the truth in the given cases.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---

This patch turned out quite controversial. We did not reach a consensus
during the internal review.

The most of the discussion revolved around the ORB flag which
architecturally must be supported, but are currently not supported by
vfio-ccw (not yet, or can't be). The idea showing the most promise for
consensus was to handle this via device status (along the lines better a
strange acting device than a non-conform machine) but since it's a
radical change we decided to first discuss upstream and then do whatever
needs to be done.
---
 hw/s390x/css.c      | 15 ++++++---------
 hw/s390x/s390-ccw.c |  2 +-
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index a50fb0727e..0822538cde 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
      */
     if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
         !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
-        return -EINVAL;
+        return -ENODEV;
     }
 
     ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
@@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
         break;
     case -ENODEV:
         break;
+    case -EFAULT:
+         break;
     case -EACCES:
         /* Let's reflect an inaccessible host device by cc 3. */
-        ret = -ENODEV;
-        break;
     default:
-       /*
-        * All other return codes will trigger a program check,
-        * or set cc to 1.
-        */
-       break;
+        /* Let's make all other return codes map to cc 3.  */
+        ret = -ENODEV;
     };
 
     return ret;
@@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
     if (sch->do_subchannel_work) {
         return sch->do_subchannel_work(sch);
     } else {
-        return -EINVAL;
+        return -ENODEV;
     }
 }
 
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
index 8614dda6f8..2b0741741c 100644
--- a/hw/s390x/s390-ccw.c
+++ b/hw/s390x/s390-ccw.c
@@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
     if (cdc->handle_request) {
         return cdc->handle_request(orb, scsw, data);
     } else {
-        return -ENOSYS;
+        return -ENODEV;
     }
 }
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-31  6:10   ` Thomas Huth
  2017-08-31  9:33   ` Cornelia Huck
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH Halil Pasic
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

If we detect that the internally manged state of the subchannel
is broken beyond repair while in do_subchannel_work in case of
virtual we just abort the operation and pretend all went well,
while in case of pass-through we honor the situation with -ENODEV
which results in cc 3 for the instruction whose handler triggered
the call.

Let's be consistent on this and do the -ENODEV also for virtual
subchannels.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---
 hw/s390x/css.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 0822538cde..bc47bf5b20 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1078,7 +1078,7 @@ int do_subchannel_work_virtual(SubchDev *sch)
         sch_handle_start_func_virtual(sch);
     } else {
         /* Cannot happen. */
-        return 0;
+        return -ENODEV;
     }
     css_inject_io_interrupt(sch);
     return 0;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
                   ` (2 preceding siblings ...)
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-31  9:55   ` Cornelia Huck
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 5/9] s390x: refactor error handling for XSCH handler Halil Pasic
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

Simplify the error handling of the SSCH and RSCH handler avoiding
arbitrary and cryptic error codes being mapped to what a subchannel is
supposed to do.  Let the code detecting the condition tell how it's to be
handled in a less ambiguous way.  It's best to handle SSCH and RSCH in
one go as the emulation of the two shares a lot of code.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>

---
Notes:
Funny, we had a different swich-case for SSCH and RSCH. For
virtual it did not matter, but for passtrough it could. In practice
-EINVAL from the kernel would have been mapped to cc 2 in case of
RSCH and to cc 1 in case of SSHC which is absurd. Same goes for
-EBUSY from kernel which is correctly mapped to cc 2 in case of
SSCH, but for RSCH it gets mapped to cc 1 which is also absurd.
---
 hw/s390x/css.c              | 86 ++++++++++++++-------------------------------
 hw/s390x/s390-ccw.c         |  8 ++---
 hw/vfio/ccw.c               | 32 +++++++++++++----
 include/hw/s390x/css.h      | 30 ++++++++++++----
 include/hw/s390x/s390-ccw.h |  2 +-
 target/s390x/ioinst.c       | 61 +++++++++-----------------------
 6 files changed, 97 insertions(+), 122 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index bc47bf5b20..1102642c10 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1015,12 +1015,11 @@ static void sch_handle_start_func_virtual(SubchDev *sch)
 
 }
 
-static int sch_handle_start_func_passthrough(SubchDev *sch)
+static void sch_handle_start_func_passthrough(SubchDev *sch)
 {
 
     PMCW *p = &sch->curr_status.pmcw;
     SCSW *s = &sch->curr_status.scsw;
-    int ret;
 
     ORB *orb = &sch->orb;
     if (!(s->ctrl & SCSW_ACTL_SUSP)) {
@@ -1034,28 +1033,10 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
      */
     if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
         !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
-        return -ENODEV;
+        sch->iret.cc = 3;
     }
 
-    ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
-    switch (ret) {
-    /* Currently we don't update control block and just return the cc code. */
-    case 0:
-        break;
-    case -EBUSY:
-        break;
-    case -ENODEV:
-        break;
-    case -EFAULT:
-         break;
-    case -EACCES:
-        /* Let's reflect an inaccessible host device by cc 3. */
-    default:
-        /* Let's make all other return codes map to cc 3.  */
-        ret = -ENODEV;
-    };
-
-    return ret;
+    s390_ccw_cmd_request(sch);
 }
 
 /*
@@ -1064,7 +1045,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
  * read/writes) asynchronous later on if we start supporting more than
  * our current very simple devices.
  */
-int do_subchannel_work_virtual(SubchDev *sch)
+void do_subchannel_work_virtual(SubchDev *sch)
 {
 
     SCSW *s = &sch->curr_status.scsw;
@@ -1078,41 +1059,35 @@ int do_subchannel_work_virtual(SubchDev *sch)
         sch_handle_start_func_virtual(sch);
     } else {
         /* Cannot happen. */
-        return -ENODEV;
+        sch->iret.cc = 3;
     }
     css_inject_io_interrupt(sch);
-    return 0;
 }
 
-int do_subchannel_work_passthrough(SubchDev *sch)
+void do_subchannel_work_passthrough(SubchDev *sch)
 {
-    int ret;
     SCSW *s = &sch->curr_status.scsw;
 
     if (s->ctrl & SCSW_FCTL_CLEAR_FUNC) {
         /* TODO: Clear handling */
         sch_handle_clear_func(sch);
-        ret = 0;
     } else if (s->ctrl & SCSW_FCTL_HALT_FUNC) {
         /* TODO: Halt handling */
         sch_handle_halt_func(sch);
-        ret = 0;
     } else if (s->ctrl & SCSW_FCTL_START_FUNC) {
-        ret = sch_handle_start_func_passthrough(sch);
+        sch_handle_start_func_passthrough(sch);
     } else {
         /* Cannot happen. */
-        return -ENODEV;
+        sch->iret.cc = 3;
     }
-
-    return ret;
 }
 
-static int do_subchannel_work(SubchDev *sch)
+static void do_subchannel_work(SubchDev *sch)
 {
     if (sch->do_subchannel_work) {
-        return sch->do_subchannel_work(sch);
+        sch->do_subchannel_work(sch);
     } else {
-        return -ENODEV;
+        sch->iret.cc = 3;
     }
 }
 
@@ -1400,27 +1375,26 @@ static void css_update_chnmon(SubchDev *sch)
     }
 }
 
-int css_do_ssch(SubchDev *sch, ORB *orb)
+void css_do_ssch(SubchDev *sch, ORB *orb)
 {
     SCSW *s = &sch->curr_status.scsw;
     PMCW *p = &sch->curr_status.pmcw;
-    int ret;
 
     if (~(p->flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
-        ret = -ENODEV;
-        goto out;
+        sch->iret.cc = 3;
+        return;
     }
 
     if (s->ctrl & SCSW_STCTL_STATUS_PEND) {
-        ret = -EINPROGRESS;
-        goto out;
+        sch->iret.cc = 1;
+        return;
     }
 
     if (s->ctrl & (SCSW_FCTL_START_FUNC |
                    SCSW_FCTL_HALT_FUNC |
                    SCSW_FCTL_CLEAR_FUNC)) {
-        ret = -EBUSY;
-        goto out;
+        sch->iret.cc = 2;
+        return;
     }
 
     /* If monitoring is active, update counter. */
@@ -1433,10 +1407,7 @@ int css_do_ssch(SubchDev *sch, ORB *orb)
     s->ctrl |= (SCSW_FCTL_START_FUNC | SCSW_ACTL_START_PEND);
     s->flags &= ~SCSW_FLAGS_MASK_PNO;
 
-    ret = do_subchannel_work(sch);
-
-out:
-    return ret;
+    do_subchannel_work(sch);
 }
 
 static void copy_irb_to_guest(IRB *dest, const IRB *src, PMCW *pmcw,
@@ -1683,27 +1654,26 @@ void css_do_schm(uint8_t mbk, int update, int dct, uint64_t mbo)
     }
 }
 
-int css_do_rsch(SubchDev *sch)
+void css_do_rsch(SubchDev *sch)
 {
     SCSW *s = &sch->curr_status.scsw;
     PMCW *p = &sch->curr_status.pmcw;
-    int ret;
 
     if (~(p->flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
-        ret = -ENODEV;
-        goto out;
+        sch->iret.cc = 3;
+        return;
     }
 
     if (s->ctrl & SCSW_STCTL_STATUS_PEND) {
-        ret = -EINPROGRESS;
-        goto out;
+        sch->iret.cc = 1;
+        return;
     }
 
     if (((s->ctrl & SCSW_CTRL_MASK_FCTL) != SCSW_FCTL_START_FUNC) ||
         (s->ctrl & SCSW_ACTL_RESUME_PEND) ||
         (!(s->ctrl & SCSW_ACTL_SUSP))) {
-        ret = -EINVAL;
-        goto out;
+        sch->iret.cc = 2;
+        return;
     }
 
     /* If monitoring is active, update counter. */
@@ -1713,10 +1683,6 @@ int css_do_rsch(SubchDev *sch)
 
     s->ctrl |= SCSW_ACTL_RESUME_PEND;
     do_subchannel_work(sch);
-    ret = 0;
-
-out:
-    return ret;
 }
 
 int css_do_rchp(uint8_t cssid, uint8_t chpid)
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
index 2b0741741c..fa0947894d 100644
--- a/hw/s390x/s390-ccw.c
+++ b/hw/s390x/s390-ccw.c
@@ -18,14 +18,14 @@
 #include "hw/s390x/css-bridge.h"
 #include "hw/s390x/s390-ccw.h"
 
-int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
+void s390_ccw_cmd_request(SubchDev *sch)
 {
-    S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(data);
+    S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(sch->driver_data);
 
     if (cdc->handle_request) {
-        return cdc->handle_request(orb, scsw, data);
+        cdc->handle_request(sch);
     } else {
-        return -ENODEV;
+        sch->iret.cc = 3;
     }
 }
 
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index a8baadf57a..b2827ce987 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -23,6 +23,7 @@
 #include "hw/s390x/s390-ccw.h"
 #include "hw/s390x/ccw-device.h"
 #include "qemu/error-report.h"
+#include "cpu.h"
 
 #define TYPE_VFIO_CCW "vfio-ccw"
 typedef struct VFIOCCWDevice {
@@ -47,9 +48,9 @@ struct VFIODeviceOps vfio_ccw_ops = {
     .vfio_compute_needs_reset = vfio_ccw_compute_needs_reset,
 };
 
-static int vfio_ccw_handle_request(ORB *orb, SCSW *scsw, void *data)
+static void vfio_ccw_handle_request(SubchDev *sch)
 {
-    S390CCWDevice *cdev = data;
+    S390CCWDevice *cdev = sch->driver_data;
     VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
     struct ccw_io_region *region = vcdev->io_region;
     int ret;
@@ -60,8 +61,8 @@ static int vfio_ccw_handle_request(ORB *orb, SCSW *scsw, void *data)
 
     memset(region, 0, sizeof(*region));
 
-    memcpy(region->orb_area, orb, sizeof(ORB));
-    memcpy(region->scsw_area, scsw, sizeof(SCSW));
+    memcpy(region->orb_area, &sch->orb, sizeof(ORB));
+    memcpy(region->scsw_area, &sch->curr_status.scsw, sizeof(SCSW));
 
 again:
     ret = pwrite(vcdev->vdev.fd, region,
@@ -71,10 +72,27 @@ again:
             goto again;
         }
         error_report("vfio-ccw: wirte I/O region failed with errno=%d", errno);
-        return -errno;
+        ret = -errno;
+    } else {
+        ret = region->ret_code;
+    }
+    switch (-ret) {
+    /* Currently we don't update control block and just return the cc code. */
+    case 0:
+        sch->iret.cc = 0;
+        break;
+    case EBUSY:
+        sch->iret.cc = 2;
+        break;
+    case EFAULT:
+        sch->iret.pgm_chk = true;
+        sch->iret.irq_code = PGM_ADDRESSING;
+        break;
+    case ENODEV:
+    case EACCES:
+    default:
+        sch->iret.cc = 3;
     }
-
-    return region->ret_code;
 }
 
 static void vfio_ccw_reset(DeviceState *dev)
diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index 5c5fe6b202..d093181a9e 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -94,13 +94,31 @@ struct SubchDev {
     /* transport-provided data: */
     int (*ccw_cb) (SubchDev *, CCW1);
     void (*disable_cb)(SubchDev *);
-    int (*do_subchannel_work) (SubchDev *);
+    void (*do_subchannel_work) (SubchDev *);
     SenseId id;
     void *driver_data;
+    /* io instructions conclude according to iret */
+    struct {
+        /*
+         * General semantic of cc codes of IO instructions is (brief):
+         * 0 -- produced expected result
+         * 1 -- produced alternate result
+         * 2 -- ineffective, because busy with previously initiated function
+         * 3 -- ineffective, not operational
+         */
+        uint32_t cc:4;
+        bool pgm_chk:1;
+        uint32_t irq_code;
+    } iret;
 };
 
 extern const VMStateDescription vmstate_subch_dev;
 
+static inline void css_subch_clear_iret(SubchDev *sch)
+{
+    memset(&sch->iret, 0, sizeof(sch->iret));
+}
+
 /*
  * Identify a device within the channel subsystem.
  * Note that this can be used to identify either the subchannel or
@@ -156,9 +174,9 @@ void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
 void css_generate_chp_crws(uint8_t cssid, uint8_t chpid);
 void css_generate_css_crws(uint8_t cssid);
 void css_clear_sei_pending(void);
-int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data);
-int do_subchannel_work_virtual(SubchDev *sub);
-int do_subchannel_work_passthrough(SubchDev *sub);
+void s390_ccw_cmd_request(SubchDev *sch);
+void do_subchannel_work_virtual(SubchDev *sub);
+void do_subchannel_work_passthrough(SubchDev *sub);
 
 typedef enum {
     CSS_IO_ADAPTER_VIRTIO = 0,
@@ -189,7 +207,7 @@ int css_do_msch(SubchDev *sch, const SCHIB *schib);
 int css_do_xsch(SubchDev *sch);
 int css_do_csch(SubchDev *sch);
 int css_do_hsch(SubchDev *sch);
-int css_do_ssch(SubchDev *sch, ORB *orb);
+void css_do_ssch(SubchDev *sch, ORB *orb);
 int css_do_tsch_get_irb(SubchDev *sch, IRB *irb, int *irb_len);
 void css_do_tsch_update_subch(SubchDev *sch);
 int css_do_stcrw(CRW *crw);
@@ -200,7 +218,7 @@ int css_collect_chp_desc(int m, uint8_t cssid, uint8_t f_chpid, uint8_t l_chpid,
 void css_do_schm(uint8_t mbk, int update, int dct, uint64_t mbo);
 int css_enable_mcsse(void);
 int css_enable_mss(void);
-int css_do_rsch(SubchDev *sch);
+void css_do_rsch(SubchDev *sch);
 int css_do_rchp(uint8_t cssid, uint8_t chpid);
 bool css_present(uint8_t cssid);
 #endif
diff --git a/include/hw/s390x/s390-ccw.h b/include/hw/s390x/s390-ccw.h
index 9f45cf1347..d2159c9ed7 100644
--- a/include/hw/s390x/s390-ccw.h
+++ b/include/hw/s390x/s390-ccw.h
@@ -33,7 +33,7 @@ typedef struct S390CCWDeviceClass {
     CCWDeviceClass parent_class;
     void (*realize)(S390CCWDevice *dev, char *sysfsdev, Error **errp);
     void (*unrealize)(S390CCWDevice *dev, Error **errp);
-    int (*handle_request) (ORB *, SCSW *, void *);
+    void (*handle_request) (SubchDev *sch);
 } S390CCWDeviceClass;
 
 #endif
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index 51fbea620d..e8044068c2 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -217,8 +217,6 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, uint32_t ipb)
     SubchDev *sch;
     ORB orig_orb, orb;
     uint64_t addr;
-    int ret = -ENODEV;
-    int cc;
     CPUS390XState *env = &cpu->env;
     uint8_t ar;
 
@@ -238,33 +236,17 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, uint32_t ipb)
     }
     trace_ioinst_sch_id("ssch", cssid, ssid, schid);
     sch = css_find_subch(m, cssid, ssid, schid);
-    if (sch && css_subch_visible(sch)) {
-        ret = css_do_ssch(sch, &orb);
+    if (!sch || !css_subch_visible(sch)) {
+        setcc(cpu, 3);
+        return;
     }
-    switch (ret) {
-    case -ENODEV:
-        cc = 3;
-        break;
-    case -EBUSY:
-        cc = 2;
-        break;
-    case -EFAULT:
-        /*
-         * TODO:
-         * I'm wondering whether there is something better
-         * to do for us here (like setting some device or
-         * subchannel status).
-         */
-        program_interrupt(env, PGM_ADDRESSING, 4);
+    css_subch_clear_iret(sch);
+    css_do_ssch(sch, &orb);
+    if (sch->iret.pgm_chk) {
+        program_interrupt(env, sch->iret.irq_code, 4);
         return;
-    case 0:
-        cc = 0;
-        break;
-    default:
-        cc = 1;
-        break;
     }
-    setcc(cpu, cc);
+    setcc(cpu, sch->iret.cc);
 }
 
 void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb)
@@ -767,8 +749,6 @@ void ioinst_handle_rsch(S390CPU *cpu, uint64_t reg1)
 {
     int cssid, ssid, schid, m;
     SubchDev *sch;
-    int ret = -ENODEV;
-    int cc;
 
     if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
         program_interrupt(&cpu->env, PGM_OPERAND, 4);
@@ -776,24 +756,17 @@ void ioinst_handle_rsch(S390CPU *cpu, uint64_t reg1)
     }
     trace_ioinst_sch_id("rsch", cssid, ssid, schid);
     sch = css_find_subch(m, cssid, ssid, schid);
-    if (sch && css_subch_visible(sch)) {
-        ret = css_do_rsch(sch);
+    if (!sch || !css_subch_visible(sch)) {
+        setcc(cpu, 3);
+        return;
     }
-    switch (ret) {
-    case -ENODEV:
-        cc = 3;
-        break;
-    case -EINVAL:
-        cc = 2;
-        break;
-    case 0:
-        cc = 0;
-        break;
-    default:
-        cc = 1;
-        break;
+    css_subch_clear_iret(sch);
+    css_do_rsch(sch);
+    if (sch->iret.pgm_chk) {
+        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
+        return;
     }
-    setcc(cpu, cc);
+    setcc(cpu, sch->iret.cc);
 }
 
 #define RCHP_REG1_RES(_reg) (_reg & 0x00000000ff00ff00)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 5/9] s390x: refactor error handling for XSCH handler
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
                   ` (3 preceding siblings ...)
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 6/9] s390x: refactor error handling for CSCH handler Halil Pasic
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

Simplify the error handling of the XSCH handler avoiding arbitrary and
cryptic error codes being mapped to what a subchannel is supposed to do.
Let the code detecting the condition tell how it's to be handled in a
less ambiguous way.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---
 hw/s390x/css.c         | 19 +++++++------------
 include/hw/s390x/css.h |  2 +-
 target/s390x/ioinst.c  | 27 +++++++++------------------
 3 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 1102642c10..7880c2c025 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1237,15 +1237,14 @@ out:
     return ret;
 }
 
-int css_do_xsch(SubchDev *sch)
+void css_do_xsch(SubchDev *sch)
 {
     SCSW *s = &sch->curr_status.scsw;
     PMCW *p = &sch->curr_status.pmcw;
-    int ret;
 
     if (~(p->flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
-        ret = -ENODEV;
-        goto out;
+        sch->iret.cc = 3;
+        return;
     }
 
     if (!(s->ctrl & SCSW_CTRL_MASK_FCTL) ||
@@ -1253,13 +1252,13 @@ int css_do_xsch(SubchDev *sch)
         (!(s->ctrl &
            (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
         (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
-        ret = -EBUSY;
-        goto out;
+        sch->iret.cc = 2;
+        return;
     }
 
     if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
-        ret = -EINPROGRESS;
-        goto out;
+        sch->iret.cc = 1;
+        return;
     }
 
     /* Cancel the current operation. */
@@ -1271,10 +1270,6 @@ int css_do_xsch(SubchDev *sch)
     sch->last_cmd_valid = false;
     s->dstat = 0;
     s->cstat = 0;
-    ret = 0;
-
-out:
-    return ret;
 }
 
 int css_do_csch(SubchDev *sch)
diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index d093181a9e..d78ee5714b 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -204,7 +204,7 @@ void css_conditional_io_interrupt(SubchDev *sch);
 int css_do_stsch(SubchDev *sch, SCHIB *schib);
 bool css_schid_final(int m, uint8_t cssid, uint8_t ssid, uint16_t schid);
 int css_do_msch(SubchDev *sch, const SCHIB *schib);
-int css_do_xsch(SubchDev *sch);
+void css_do_xsch(SubchDev *sch);
 int css_do_csch(SubchDev *sch);
 int css_do_hsch(SubchDev *sch);
 void css_do_ssch(SubchDev *sch, ORB *orb);
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index e8044068c2..f8e038fac7 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -41,8 +41,6 @@ void ioinst_handle_xsch(S390CPU *cpu, uint64_t reg1)
 {
     int cssid, ssid, schid, m;
     SubchDev *sch;
-    int ret = -ENODEV;
-    int cc;
 
     if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
         program_interrupt(&cpu->env, PGM_OPERAND, 4);
@@ -50,24 +48,17 @@ void ioinst_handle_xsch(S390CPU *cpu, uint64_t reg1)
     }
     trace_ioinst_sch_id("xsch", cssid, ssid, schid);
     sch = css_find_subch(m, cssid, ssid, schid);
-    if (sch && css_subch_visible(sch)) {
-        ret = css_do_xsch(sch);
+    if (!sch || !css_subch_visible(sch)) {
+        setcc(cpu, 3);
+        return;
     }
-    switch (ret) {
-    case -ENODEV:
-        cc = 3;
-        break;
-    case -EBUSY:
-        cc = 2;
-        break;
-    case 0:
-        cc = 0;
-        break;
-    default:
-        cc = 1;
-        break;
+    css_subch_clear_iret(sch);
+    css_do_xsch(sch);
+    if (sch->iret.pgm_chk) {
+        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
+        return;
     }
-    setcc(cpu, cc);
+    setcc(cpu, sch->iret.cc);
 }
 
 void ioinst_handle_csch(S390CPU *cpu, uint64_t reg1)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 6/9] s390x: refactor error handling for CSCH handler
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
                   ` (4 preceding siblings ...)
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 5/9] s390x: refactor error handling for XSCH handler Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 7/9] s390x: refactor error handling for HSCH handler Halil Pasic
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

Simplify the error handling of the CSCH handler avoiding arbitrary and
cryptic error codes being mapped to what a subchannel is supposed to do.
Let the code detecting the condition tell how it's to be handled in a
less ambiguous way.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---
 hw/s390x/css.c         | 11 +++--------
 include/hw/s390x/css.h |  2 +-
 target/s390x/ioinst.c  | 18 +++++++++---------
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 7880c2c025..e661a87387 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1272,15 +1272,14 @@ void css_do_xsch(SubchDev *sch)
     s->cstat = 0;
 }
 
-int css_do_csch(SubchDev *sch)
+void css_do_csch(SubchDev *sch)
 {
     SCSW *s = &sch->curr_status.scsw;
     PMCW *p = &sch->curr_status.pmcw;
-    int ret;
 
     if (~(p->flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
-        ret = -ENODEV;
-        goto out;
+        sch->iret.cc = 3;
+        return;
     }
 
     /* Trigger the clear function. */
@@ -1288,10 +1287,6 @@ int css_do_csch(SubchDev *sch)
     s->ctrl |= SCSW_FCTL_CLEAR_FUNC | SCSW_ACTL_CLEAR_PEND;
 
     do_subchannel_work(sch);
-    ret = 0;
-
-out:
-    return ret;
 }
 
 int css_do_hsch(SubchDev *sch)
diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index d78ee5714b..d5a7e460ec 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -205,7 +205,7 @@ int css_do_stsch(SubchDev *sch, SCHIB *schib);
 bool css_schid_final(int m, uint8_t cssid, uint8_t ssid, uint16_t schid);
 int css_do_msch(SubchDev *sch, const SCHIB *schib);
 void css_do_xsch(SubchDev *sch);
-int css_do_csch(SubchDev *sch);
+void css_do_csch(SubchDev *sch);
 int css_do_hsch(SubchDev *sch);
 void css_do_ssch(SubchDev *sch, ORB *orb);
 int css_do_tsch_get_irb(SubchDev *sch, IRB *irb, int *irb_len);
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index f8e038fac7..501f3135ad 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -65,8 +65,6 @@ void ioinst_handle_csch(S390CPU *cpu, uint64_t reg1)
 {
     int cssid, ssid, schid, m;
     SubchDev *sch;
-    int ret = -ENODEV;
-    int cc;
 
     if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
         program_interrupt(&cpu->env, PGM_OPERAND, 4);
@@ -74,15 +72,17 @@ void ioinst_handle_csch(S390CPU *cpu, uint64_t reg1)
     }
     trace_ioinst_sch_id("csch", cssid, ssid, schid);
     sch = css_find_subch(m, cssid, ssid, schid);
-    if (sch && css_subch_visible(sch)) {
-        ret = css_do_csch(sch);
+    if (!sch || !css_subch_visible(sch)) {
+        setcc(cpu, 3);
+        return;
     }
-    if (ret == -ENODEV) {
-        cc = 3;
-    } else {
-        cc = 0;
+    css_subch_clear_iret(sch);
+    css_do_csch(sch);
+    if (sch->iret.pgm_chk) {
+        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
+        return;
     }
-    setcc(cpu, cc);
+    setcc(cpu, sch->iret.cc);
 }
 
 void ioinst_handle_hsch(S390CPU *cpu, uint64_t reg1)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 7/9] s390x: refactor error handling for HSCH handler
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
                   ` (5 preceding siblings ...)
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 6/9] s390x: refactor error handling for CSCH handler Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 8/9] s390x: refactor error handling for MSCH handler Halil Pasic
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

Simplify the error handling of the HSCH handler avoiding arbitrary and
cryptic error codes being mapped to what a subchannel is supposed to do.
Let the code detecting the condition tell how it's to be handled in a
less ambiguous way.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---
 hw/s390x/css.c         | 19 +++++++------------
 include/hw/s390x/css.h |  2 +-
 target/s390x/ioinst.c  | 27 +++++++++------------------
 3 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index e661a87387..eff1b91a8c 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1289,28 +1289,27 @@ void css_do_csch(SubchDev *sch)
     do_subchannel_work(sch);
 }
 
-int css_do_hsch(SubchDev *sch)
+void css_do_hsch(SubchDev *sch)
 {
     SCSW *s = &sch->curr_status.scsw;
     PMCW *p = &sch->curr_status.pmcw;
-    int ret;
 
     if (~(p->flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
-        ret = -ENODEV;
-        goto out;
+        sch->iret.cc = 3;
+        return;
     }
 
     if (((s->ctrl & SCSW_CTRL_MASK_STCTL) == SCSW_STCTL_STATUS_PEND) ||
         (s->ctrl & (SCSW_STCTL_PRIMARY |
                     SCSW_STCTL_SECONDARY |
                     SCSW_STCTL_ALERT))) {
-        ret = -EINPROGRESS;
-        goto out;
+        sch->iret.cc = 1;
+        return;
     }
 
     if (s->ctrl & (SCSW_FCTL_HALT_FUNC | SCSW_FCTL_CLEAR_FUNC)) {
-        ret = -EBUSY;
-        goto out;
+        sch->iret.cc = 2;
+        return;
     }
 
     /* Trigger the halt function. */
@@ -1324,10 +1323,6 @@ int css_do_hsch(SubchDev *sch)
     s->ctrl |= SCSW_ACTL_HALT_PEND;
 
     do_subchannel_work(sch);
-    ret = 0;
-
-out:
-    return ret;
 }
 
 static void css_update_chnmon(SubchDev *sch)
diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index d5a7e460ec..d8d949da4f 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -206,7 +206,7 @@ bool css_schid_final(int m, uint8_t cssid, uint8_t ssid, uint16_t schid);
 int css_do_msch(SubchDev *sch, const SCHIB *schib);
 void css_do_xsch(SubchDev *sch);
 void css_do_csch(SubchDev *sch);
-int css_do_hsch(SubchDev *sch);
+void css_do_hsch(SubchDev *sch);
 void css_do_ssch(SubchDev *sch, ORB *orb);
 int css_do_tsch_get_irb(SubchDev *sch, IRB *irb, int *irb_len);
 void css_do_tsch_update_subch(SubchDev *sch);
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index 501f3135ad..e91788d586 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -89,8 +89,6 @@ void ioinst_handle_hsch(S390CPU *cpu, uint64_t reg1)
 {
     int cssid, ssid, schid, m;
     SubchDev *sch;
-    int ret = -ENODEV;
-    int cc;
 
     if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
         program_interrupt(&cpu->env, PGM_OPERAND, 4);
@@ -98,24 +96,17 @@ void ioinst_handle_hsch(S390CPU *cpu, uint64_t reg1)
     }
     trace_ioinst_sch_id("hsch", cssid, ssid, schid);
     sch = css_find_subch(m, cssid, ssid, schid);
-    if (sch && css_subch_visible(sch)) {
-        ret = css_do_hsch(sch);
+    if (!sch || !css_subch_visible(sch)) {
+        setcc(cpu, 3);
+        return;
     }
-    switch (ret) {
-    case -ENODEV:
-        cc = 3;
-        break;
-    case -EBUSY:
-        cc = 2;
-        break;
-    case 0:
-        cc = 0;
-        break;
-    default:
-        cc = 1;
-        break;
+    css_subch_clear_iret(sch);
+    css_do_hsch(sch);
+    if (sch->iret.pgm_chk) {
+        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
+        return;
     }
-    setcc(cpu, cc);
+    setcc(cpu, sch->iret.cc);
 }
 
 static int ioinst_schib_valid(SCHIB *schib)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 8/9] s390x: refactor error handling for MSCH handler
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
                   ` (6 preceding siblings ...)
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 7/9] s390x: refactor error handling for HSCH handler Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 9/9] s390x: factor out common ioinst handler logic Halil Pasic
  2017-08-31 10:04 ` [Qemu-devel] [PATCH 0/9] Cornelia Huck
  9 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

Simplify the error handling of the MSCH handler avoiding arbitrary and
cryptic error codes being mapped to what a subchannel is supposed to do.
Let the code detecting the condition tell how it's to be handled in a
less ambiguous way.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---
 hw/s390x/css.c         | 20 +++++++-------------
 include/hw/s390x/css.h |  2 +-
 target/s390x/ioinst.c  | 27 +++++++++------------------
 3 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index eff1b91a8c..931a097d6a 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1182,28 +1182,27 @@ static void copy_schib_from_guest(SCHIB *dest, const SCHIB *src)
     }
 }
 
-int css_do_msch(SubchDev *sch, const SCHIB *orig_schib)
+void css_do_msch(SubchDev *sch, const SCHIB *orig_schib)
 {
     SCSW *s = &sch->curr_status.scsw;
     PMCW *p = &sch->curr_status.pmcw;
     uint16_t oldflags;
-    int ret;
     SCHIB schib;
 
     if (!(sch->curr_status.pmcw.flags & PMCW_FLAGS_MASK_DNV)) {
-        ret = 0;
-        goto out;
+        sch->iret.cc = 0;
+        return;
     }
 
     if (s->ctrl & SCSW_STCTL_STATUS_PEND) {
-        ret = -EINPROGRESS;
-        goto out;
+        sch->iret.cc = 1;
+        return;
     }
 
     if (s->ctrl &
         (SCSW_FCTL_START_FUNC|SCSW_FCTL_HALT_FUNC|SCSW_FCTL_CLEAR_FUNC)) {
-        ret = -EBUSY;
-        goto out;
+        sch->iret.cc = 2;
+        return;
     }
 
     copy_schib_from_guest(&schib, orig_schib);
@@ -1230,11 +1229,6 @@ int css_do_msch(SubchDev *sch, const SCHIB *orig_schib)
         && (p->flags & PMCW_FLAGS_MASK_ENA) == 0) {
         sch->disable_cb(sch);
     }
-
-    ret = 0;
-
-out:
-    return ret;
 }
 
 void css_do_xsch(SubchDev *sch)
diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index d8d949da4f..37ac36b014 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -203,7 +203,7 @@ bool css_subch_visible(SubchDev *sch);
 void css_conditional_io_interrupt(SubchDev *sch);
 int css_do_stsch(SubchDev *sch, SCHIB *schib);
 bool css_schid_final(int m, uint8_t cssid, uint8_t ssid, uint16_t schid);
-int css_do_msch(SubchDev *sch, const SCHIB *schib);
+void css_do_msch(SubchDev *sch, const SCHIB *schib);
 void css_do_xsch(SubchDev *sch);
 void css_do_csch(SubchDev *sch);
 void css_do_hsch(SubchDev *sch);
diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index e91788d586..a286495219 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -128,8 +128,6 @@ void ioinst_handle_msch(S390CPU *cpu, uint64_t reg1, uint32_t ipb)
     SubchDev *sch;
     SCHIB schib;
     uint64_t addr;
-    int ret = -ENODEV;
-    int cc;
     CPUS390XState *env = &cpu->env;
     uint8_t ar;
 
@@ -148,24 +146,17 @@ void ioinst_handle_msch(S390CPU *cpu, uint64_t reg1, uint32_t ipb)
     }
     trace_ioinst_sch_id("msch", cssid, ssid, schid);
     sch = css_find_subch(m, cssid, ssid, schid);
-    if (sch && css_subch_visible(sch)) {
-        ret = css_do_msch(sch, &schib);
+    if (!sch || !css_subch_visible(sch)) {
+        setcc(cpu, 3);
+        return;
     }
-    switch (ret) {
-    case -ENODEV:
-        cc = 3;
-        break;
-    case -EBUSY:
-        cc = 2;
-        break;
-    case 0:
-        cc = 0;
-        break;
-    default:
-        cc = 1;
-        break;
+    css_subch_clear_iret(sch);
+    css_do_msch(sch, &schib);
+    if (sch->iret.pgm_chk) {
+        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
+        return;
     }
-    setcc(cpu, cc);
+    setcc(cpu, sch->iret.cc);
 }
 
 static void copy_orb_from_guest(ORB *dest, const ORB *src)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 9/9] s390x: factor out common ioinst handler logic
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
                   ` (7 preceding siblings ...)
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 8/9] s390x: refactor error handling for MSCH handler Halil Pasic
@ 2017-08-30 16:36 ` Halil Pasic
  2017-08-31 10:04 ` [Qemu-devel] [PATCH 0/9] Cornelia Huck
  9 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-30 16:36 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Halil Pasic

Some of ioinst the handlers look very much the same: they basically
delegate the work to the appropriate css function (doing some always the
same stuff before and after the call to the appropriate css function).
Let us create a template and get rid of some code.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Suggested-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
---

The handlers for MSCH and SSCH look very similar to our template too.
It's just that there is one more parameter to the corresponding css
function. Maybe one could do something to unify that too (e.g. some
crazy macro) but I don't think it's worth the effort. Another idea
was to make the two corresponding functions single argument too
(by Dong Jia).
---
 target/s390x/ioinst.c | 77 ++++++++++-----------------------------------------
 1 file changed, 14 insertions(+), 63 deletions(-)

diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
index a286495219..6583e5216d 100644
--- a/target/s390x/ioinst.c
+++ b/target/s390x/ioinst.c
@@ -37,7 +37,10 @@ int ioinst_disassemble_sch_ident(uint32_t value, int *m, int *cssid, int *ssid,
     return 0;
 }
 
-void ioinst_handle_xsch(S390CPU *cpu, uint64_t reg1)
+/* many ionst handlers look the same: they just delegate to a some css func */
+static void ioinst_handler_template_sch(S390CPU *cpu, uint64_t reg1,
+                                      const char *iname,
+                                      void (*handler_css)(SubchDev *))
 {
     int cssid, ssid, schid, m;
     SubchDev *sch;
@@ -46,14 +49,14 @@ void ioinst_handle_xsch(S390CPU *cpu, uint64_t reg1)
         program_interrupt(&cpu->env, PGM_OPERAND, 4);
         return;
     }
-    trace_ioinst_sch_id("xsch", cssid, ssid, schid);
+    trace_ioinst_sch_id(iname, cssid, ssid, schid);
     sch = css_find_subch(m, cssid, ssid, schid);
     if (!sch || !css_subch_visible(sch)) {
         setcc(cpu, 3);
         return;
     }
     css_subch_clear_iret(sch);
-    css_do_xsch(sch);
+    handler_css(sch);
     if (sch->iret.pgm_chk) {
         program_interrupt(&cpu->env, sch->iret.irq_code, 4);
         return;
@@ -61,52 +64,19 @@ void ioinst_handle_xsch(S390CPU *cpu, uint64_t reg1)
     setcc(cpu, sch->iret.cc);
 }
 
-void ioinst_handle_csch(S390CPU *cpu, uint64_t reg1)
+void ioinst_handle_xsch(S390CPU *cpu, uint64_t reg1)
 {
-    int cssid, ssid, schid, m;
-    SubchDev *sch;
+    ioinst_handler_template_sch(cpu, reg1, "xsch", css_do_xsch);
+}
 
-    if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
-        program_interrupt(&cpu->env, PGM_OPERAND, 4);
-        return;
-    }
-    trace_ioinst_sch_id("csch", cssid, ssid, schid);
-    sch = css_find_subch(m, cssid, ssid, schid);
-    if (!sch || !css_subch_visible(sch)) {
-        setcc(cpu, 3);
-        return;
-    }
-    css_subch_clear_iret(sch);
-    css_do_csch(sch);
-    if (sch->iret.pgm_chk) {
-        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
-        return;
-    }
-    setcc(cpu, sch->iret.cc);
+void ioinst_handle_csch(S390CPU *cpu, uint64_t reg1)
+{
+    ioinst_handler_template_sch(cpu, reg1, "csch", css_do_csch);
 }
 
 void ioinst_handle_hsch(S390CPU *cpu, uint64_t reg1)
 {
-    int cssid, ssid, schid, m;
-    SubchDev *sch;
-
-    if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
-        program_interrupt(&cpu->env, PGM_OPERAND, 4);
-        return;
-    }
-    trace_ioinst_sch_id("hsch", cssid, ssid, schid);
-    sch = css_find_subch(m, cssid, ssid, schid);
-    if (!sch || !css_subch_visible(sch)) {
-        setcc(cpu, 3);
-        return;
-    }
-    css_subch_clear_iret(sch);
-    css_do_hsch(sch);
-    if (sch->iret.pgm_chk) {
-        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
-        return;
-    }
-    setcc(cpu, sch->iret.cc);
+    ioinst_handler_template_sch(cpu, reg1, "hsch", css_do_hsch);
 }
 
 static int ioinst_schib_valid(SCHIB *schib)
@@ -720,26 +690,7 @@ void ioinst_handle_schm(S390CPU *cpu, uint64_t reg1, uint64_t reg2,
 
 void ioinst_handle_rsch(S390CPU *cpu, uint64_t reg1)
 {
-    int cssid, ssid, schid, m;
-    SubchDev *sch;
-
-    if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
-        program_interrupt(&cpu->env, PGM_OPERAND, 4);
-        return;
-    }
-    trace_ioinst_sch_id("rsch", cssid, ssid, schid);
-    sch = css_find_subch(m, cssid, ssid, schid);
-    if (!sch || !css_subch_visible(sch)) {
-        setcc(cpu, 3);
-        return;
-    }
-    css_subch_clear_iret(sch);
-    css_do_rsch(sch);
-    if (sch->iret.pgm_chk) {
-        program_interrupt(&cpu->env, sch->iret.irq_code, 4);
-        return;
-    }
-    setcc(cpu, sch->iret.cc);
+    ioinst_handler_template_sch(cpu, reg1, "rsch", css_do_rsch);
 }
 
 #define RCHP_REG1_RES(_reg) (_reg & 0x00000000ff00ff00)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH Halil Pasic
@ 2017-08-31  5:51   ` Thomas Huth
  2017-08-31  6:38     ` Cornelia Huck
  2017-08-31  9:09     ` Halil Pasic
  0 siblings, 2 replies; 60+ messages in thread
From: Thomas Huth @ 2017-08-31  5:51 UTC (permalink / raw)
  To: Halil Pasic, Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On 30.08.2017 18:36, Halil Pasic wrote:
> The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
> present cc 1 and the other way around, because css_do_xsch has the error
> codes mixed up. Fixing the error codes also fixes the priority.
> 
> Let us fix this.

(Nit: In case you respin, I'd suggest to remove the last sentence. You
already used "fix" two times in the previous one)

> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>

Space missing -------------^

> ---
>  hw/s390x/css.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index 1880b1a0ff..a50fb0727e 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
>          (!(s->ctrl &
>             (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
>          (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
> -        ret = -EINPROGRESS;
> +        ret = -EBUSY;
>          goto out;
>      }
>  
>      if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
> -        ret = -EBUSY;
> +        ret = -EINPROGRESS;
>          goto out;
>      }

Using both, EBUSY and EINPROGRESS as error codes sounds very confusing
to me here ... what's the difference between busy and in-progress? So
while you're at it, maybe you could replace the code for CC 2 ("CANCEL
SUBCHANNEL not applicable") with a different error code?

 Thomas

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair Halil Pasic
@ 2017-08-31  6:10   ` Thomas Huth
  2017-08-31  7:44     ` Thomas Huth
  2017-08-31  9:33   ` Cornelia Huck
  1 sibling, 1 reply; 60+ messages in thread
From: Thomas Huth @ 2017-08-31  6:10 UTC (permalink / raw)
  To: Halil Pasic, Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On 30.08.2017 18:36, Halil Pasic wrote:
> If we detect that the internally manged state of the subchannel
> is broken beyond repair while in do_subchannel_work in case of
> virtual we just abort the operation and pretend all went well,
> while in case of pass-through we honor the situation with -ENODEV
> which results in cc 3 for the instruction whose handler triggered
> the call.
> 
> Let's be consistent on this and do the -ENODEV also for virtual
> subchannels.
> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
> ---
>  hw/s390x/css.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index 0822538cde..bc47bf5b20 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -1078,7 +1078,7 @@ int do_subchannel_work_virtual(SubchDev *sch)
>          sch_handle_start_func_virtual(sch);
>      } else {
>          /* Cannot happen. */
> -        return 0;
> +        return -ENODEV;
>      }
>      css_inject_io_interrupt(sch);
>      return 0;

First, I think ENODEV is not really a good choice here since there
certainly was a device. So maybe use EINVAL or EBADR or something else
instead?

Second, while return an error code is certainly better than returning 0,
I think most errors will still go unnoticed here, since most callers of
do_subchannel_work() seem to ignore the return value ... so I wonder
whether we rather want to have another way to recognize this condition.
If the comment is right and this really can not happen, I think you
should use an g_assert_notreached() here instead. Otherwise the comment
should be changed and it's maybe a good idea to use a
qemu_log_mask(LOG_GUEST_ERROR, "subchannel in bad state bla bla...") here.

 Thomas

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-31  5:51   ` Thomas Huth
@ 2017-08-31  6:38     ` Cornelia Huck
  2017-08-31  7:32       ` Thomas Huth
  2017-08-31  9:09     ` Halil Pasic
  1 sibling, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-08-31  6:38 UTC (permalink / raw)
  To: Thomas Huth; +Cc: Halil Pasic, Dong Jia Shi, Pierre Morel, qemu-devel

On Thu, 31 Aug 2017 07:51:17 +0200
Thomas Huth <thuth@redhat.com> wrote:

> On 30.08.2017 18:36, Halil Pasic wrote:
> > The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
> > present cc 1 and the other way around, because css_do_xsch has the error
> > codes mixed up. Fixing the error codes also fixes the priority.
> > 
> > Let us fix this.  
> 
> (Nit: In case you respin, I'd suggest to remove the last sentence. You
> already used "fix" two times in the previous one)

I can also remove it on applying, if Halil agrees (I have not yet
reviewed it, though).

> 
> > Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> > Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>  
> 
> Space missing -------------^

And I can also add that space on applying :)

> 
> > ---
> >  hw/s390x/css.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> > index 1880b1a0ff..a50fb0727e 100644
> > --- a/hw/s390x/css.c
> > +++ b/hw/s390x/css.c
> > @@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
> >          (!(s->ctrl &
> >             (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
> >          (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
> > -        ret = -EINPROGRESS;
> > +        ret = -EBUSY;
> >          goto out;
> >      }
> >  
> >      if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
> > -        ret = -EBUSY;
> > +        ret = -EINPROGRESS;
> >          goto out;
> >      }  
> 
> Using both, EBUSY and EINPROGRESS as error codes sounds very confusing
> to me here ... what's the difference between busy and in-progress? So
> while you're at it, maybe you could replace the code for CC 2 ("CANCEL
> SUBCHANNEL not applicable") with a different error code?

IIRC, I used these two as they matched my idea of what happens best
(there's a difference between "subchannel is busy" and "the I/O is
already in progress, too late to cancel"). "xsch not applicable" is
very hard to translate to an Unix error code :/

I'll double-check with the PoP.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-31  6:38     ` Cornelia Huck
@ 2017-08-31  7:32       ` Thomas Huth
  2017-08-31  8:42         ` Cornelia Huck
  0 siblings, 1 reply; 60+ messages in thread
From: Thomas Huth @ 2017-08-31  7:32 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Halil Pasic, Pierre Morel, Dong Jia Shi, qemu-devel

On 31.08.2017 08:38, Cornelia Huck wrote:
> On Thu, 31 Aug 2017 07:51:17 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> On 30.08.2017 18:36, Halil Pasic wrote:
>>> The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
>>> present cc 1 and the other way around, because css_do_xsch has the error
>>> codes mixed up. Fixing the error codes also fixes the priority.
>>>
>>> Let us fix this.  
>>
>> (Nit: In case you respin, I'd suggest to remove the last sentence. You
>> already used "fix" two times in the previous one)
> 
> I can also remove it on applying, if Halil agrees (I have not yet
> reviewed it, though).
> 
>>
>>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>>> Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>  
>>
>> Space missing -------------^
> 
> And I can also add that space on applying :)
> 
>>
>>> ---
>>>  hw/s390x/css.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>>> index 1880b1a0ff..a50fb0727e 100644
>>> --- a/hw/s390x/css.c
>>> +++ b/hw/s390x/css.c
>>> @@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
>>>          (!(s->ctrl &
>>>             (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
>>>          (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
>>> -        ret = -EINPROGRESS;
>>> +        ret = -EBUSY;
>>>          goto out;
>>>      }
>>>  
>>>      if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
>>> -        ret = -EBUSY;
>>> +        ret = -EINPROGRESS;
>>>          goto out;
>>>      }  
>>
>> Using both, EBUSY and EINPROGRESS as error codes sounds very confusing
>> to me here ... what's the difference between busy and in-progress? So
>> while you're at it, maybe you could replace the code for CC 2 ("CANCEL
>> SUBCHANNEL not applicable") with a different error code?
> 
> IIRC, I used these two as they matched my idea of what happens best
> (there's a difference between "subchannel is busy" and "the I/O is
> already in progress, too late to cancel"). "xsch not applicable" is
> very hard to translate to an Unix error code :/

OK, the codes make more sense with your description ==> Maybe simply add
a proper comment for each of the return codes?

 Thomas

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair
  2017-08-31  6:10   ` Thomas Huth
@ 2017-08-31  7:44     ` Thomas Huth
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Huth @ 2017-08-31  7:44 UTC (permalink / raw)
  To: Halil Pasic, Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On 31.08.2017 08:10, Thomas Huth wrote:
> On 30.08.2017 18:36, Halil Pasic wrote:
>> If we detect that the internally manged state of the subchannel
>> is broken beyond repair while in do_subchannel_work in case of
>> virtual we just abort the operation and pretend all went well,
>> while in case of pass-through we honor the situation with -ENODEV
>> which results in cc 3 for the instruction whose handler triggered
>> the call.
>>
>> Let's be consistent on this and do the -ENODEV also for virtual
>> subchannels.
>>
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
>> ---
>>  hw/s390x/css.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> index 0822538cde..bc47bf5b20 100644
>> --- a/hw/s390x/css.c
>> +++ b/hw/s390x/css.c
>> @@ -1078,7 +1078,7 @@ int do_subchannel_work_virtual(SubchDev *sch)
>>          sch_handle_start_func_virtual(sch);
>>      } else {
>>          /* Cannot happen. */
>> -        return 0;
>> +        return -ENODEV;
>>      }
>>      css_inject_io_interrupt(sch);
>>      return 0;
> 
> First, I think ENODEV is not really a good choice here since there
> certainly was a device. So maybe use EINVAL or EBADR or something else
> instead?
>
> Second, while return an error code is certainly better than returning 0,
> I think most errors will still go unnoticed here, since most callers of
> do_subchannel_work() seem to ignore the return value ... so I wonder
> whether we rather want to have another way to recognize this condition.
> If the comment is right and this really can not happen, I think you
> should use an g_assert_notreached() here instead. Otherwise the comment
> should be changed and it's maybe a good idea to use a
> qemu_log_mask(LOG_GUEST_ERROR, "subchannel in bad state bla bla...") here.

OK, after reading through patch 4/9 I think I've got the basic idea now
... you'll eventually set sch->iret.cc = 3 here instead, so the exact
error code does not really matter here.
But still - if it "Cannot happen", maybe an assert() or an additional
qemu_log would be appropriate?

 Thomas

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH Halil Pasic
@ 2017-08-31  7:50   ` Thomas Huth
  2017-08-31 10:54     ` Halil Pasic
  2017-08-31  9:19   ` Cornelia Huck
  1 sibling, 1 reply; 60+ messages in thread
From: Thomas Huth @ 2017-08-31  7:50 UTC (permalink / raw)
  To: Halil Pasic, Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On 30.08.2017 18:36, Halil Pasic wrote:
> According to the POP a start subchannel instruction (SSCH) returning with
> cc 1 implies that the subchannel was status pending when SSCH executed.
> 
> Due to a somewhat confusing error handling, where error codes are mapped
> to cc value, sane looking error codes result in non AR compliant
> behavior.
> 
> Let's fix this! Instead of cc 1 we use cc 3 which means device not
> operational, and is much closer to the truth in the given cases.
> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
> ---
> 
> This patch turned out quite controversial. We did not reach a consensus
> during the internal review.
> 
> The most of the discussion revolved around the ORB flag which
> architecturally must be supported, but are currently not supported by
> vfio-ccw (not yet, or can't be). The idea showing the most promise for
> consensus was to handle this via device status (along the lines better a
> strange acting device than a non-conform machine) but since it's a
> radical change we decided to first discuss upstream and then do whatever
> needs to be done.
> ---
>  hw/s390x/css.c      | 15 ++++++---------
>  hw/s390x/s390-ccw.c |  2 +-
>  2 files changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index a50fb0727e..0822538cde 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>       */
>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
> -        return -EINVAL;
> +        return -ENODEV;

I don't really like ENODEV in this case (since the device is apparently
there)... but well, since you're later change it again to set cc=3
directly, I guess the temporary ENODEV is ok.

>      }
>  
>      ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
> @@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>          break;
>      case -ENODEV:
>          break;
> +    case -EFAULT:
> +         break;

I think you should mention this in the patch description. Why is EFAULT
suddenly handled here?

>      case -EACCES:
>          /* Let's reflect an inaccessible host device by cc 3. */
> -        ret = -ENODEV;
> -        break;
>      default:
> -       /*
> -        * All other return codes will trigger a program check,
> -        * or set cc to 1.
> -        */
> -       break;
> +        /* Let's make all other return codes map to cc 3.  */
> +        ret = -ENODEV;
>      };
>  
>      return ret;
> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>      if (sch->do_subchannel_work) {
>          return sch->do_subchannel_work(sch);
>      } else {
> -        return -EINVAL;
> +        return -ENODEV;
>      }
>  }
>  
> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
> index 8614dda6f8..2b0741741c 100644
> --- a/hw/s390x/s390-ccw.c
> +++ b/hw/s390x/s390-ccw.c
> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
>      if (cdc->handle_request) {
>          return cdc->handle_request(orb, scsw, data);
>      } else {
> -        return -ENOSYS;
> +        return -ENODEV;
>      }
>  }

 Thomas

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-31  7:32       ` Thomas Huth
@ 2017-08-31  8:42         ` Cornelia Huck
  2017-08-31 10:19           ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-08-31  8:42 UTC (permalink / raw)
  To: Thomas Huth; +Cc: Halil Pasic, Pierre Morel, Dong Jia Shi, qemu-devel

On Thu, 31 Aug 2017 09:32:49 +0200
Thomas Huth <thuth@redhat.com> wrote:

> On 31.08.2017 08:38, Cornelia Huck wrote:
> > On Thu, 31 Aug 2017 07:51:17 +0200
> > Thomas Huth <thuth@redhat.com> wrote:
> >   
> >> On 30.08.2017 18:36, Halil Pasic wrote:  
> >>> The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
> >>> present cc 1 and the other way around, because css_do_xsch has the error
> >>> codes mixed up. Fixing the error codes also fixes the priority.
> >>>
> >>> Let us fix this.    
> >>
> >> (Nit: In case you respin, I'd suggest to remove the last sentence. You
> >> already used "fix" two times in the previous one)  
> > 
> > I can also remove it on applying, if Halil agrees (I have not yet
> > reviewed it, though).
> >   
> >>  
> >>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> >>> Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>    
> >>
> >> Space missing -------------^  
> > 
> > And I can also add that space on applying :)
> >   
> >>  
> >>> ---
> >>>  hw/s390x/css.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> >>> index 1880b1a0ff..a50fb0727e 100644
> >>> --- a/hw/s390x/css.c
> >>> +++ b/hw/s390x/css.c
> >>> @@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
> >>>          (!(s->ctrl &
> >>>             (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
> >>>          (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
> >>> -        ret = -EINPROGRESS;
> >>> +        ret = -EBUSY;
> >>>          goto out;
> >>>      }
> >>>  
> >>>      if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
> >>> -        ret = -EBUSY;
> >>> +        ret = -EINPROGRESS;
> >>>          goto out;
> >>>      }    
> >>
> >> Using both, EBUSY and EINPROGRESS as error codes sounds very confusing
> >> to me here ... what's the difference between busy and in-progress? So
> >> while you're at it, maybe you could replace the code for CC 2 ("CANCEL
> >> SUBCHANNEL not applicable") with a different error code?  
> > 
> > IIRC, I used these two as they matched my idea of what happens best
> > (there's a difference between "subchannel is busy" and "the I/O is
> > already in progress, too late to cancel"). "xsch not applicable" is
> > very hard to translate to an Unix error code :/  
> 
> OK, the codes make more sense with your description ==> Maybe simply add
> a proper comment for each of the return codes?

Taking a step back and looking at the other I/O instructions and their
implementation in qemu:

- For those instructions that can set it, cc 1 is set when the
  subchannel is status pending. That's usually the "default" branch in
  ioinst.c.
- cc 2 is set when the subchannel is "busy" (or, in the case of xsch,
  "not applicable for xsch"... sigh) This is supposed to be handled via
  -EBUSY.

So, there are actually two problems with the current implementation of
xsch:

- The return codes are switched around, which this patch fixes.
- But "status pending" should also take precedence over "not
  applicable", if I read the PoP correctly, so the second if needs to
  be moved up.

And yes, it makes sense do add some comments...

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-31  5:51   ` Thomas Huth
  2017-08-31  6:38     ` Cornelia Huck
@ 2017-08-31  9:09     ` Halil Pasic
  2017-08-31  9:16       ` Thomas Huth
  1 sibling, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-08-31  9:09 UTC (permalink / raw)
  To: Thomas Huth, Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 08/31/2017 07:51 AM, Thomas Huth wrote:
> On 30.08.2017 18:36, Halil Pasic wrote:
>> The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
>> present cc 1 and the other way around, because css_do_xsch has the error
>> codes mixed up. Fixing the error codes also fixes the priority.
>>
>> Let us fix this.
> 
> (Nit: In case you respin, I'd suggest to remove the last sentence. You
> already used "fix" two times in the previous one)
> 
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>> Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
> 
> Space missing -------------^
> 

copy-paste :(

>> ---
>>  hw/s390x/css.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> index 1880b1a0ff..a50fb0727e 100644
>> --- a/hw/s390x/css.c
>> +++ b/hw/s390x/css.c
>> @@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
>>          (!(s->ctrl &
>>             (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
>>          (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
>> -        ret = -EINPROGRESS;
>> +        ret = -EBUSY;
>>          goto out;
>>      }
>>  
>>      if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
>> -        ret = -EBUSY;
>> +        ret = -EINPROGRESS;
>>          goto out;
>>      }
> 
> Using both, EBUSY and EINPROGRESS as error codes sounds very confusing
> to me here ... what's the difference between busy and in-progress? So
> while you're at it, maybe you could replace the code for CC 2 ("CANCEL
> SUBCHANNEL not applicable") with a different error code?
> 
>  Thomas
> 

Well, the idea of the series is to get rid of these artificial error codes,
so your concern of using EBUSY and EINPROGRESS will be dealt with in patch
5.

The idea was to first do the fixes and then do the transformation without
changing behavior.

Thanks for having a look!

Regards,

Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-31  9:09     ` Halil Pasic
@ 2017-08-31  9:16       ` Thomas Huth
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Huth @ 2017-08-31  9:16 UTC (permalink / raw)
  To: Halil Pasic, Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On 31.08.2017 11:09, Halil Pasic wrote:
> 
> 
> On 08/31/2017 07:51 AM, Thomas Huth wrote:
>> On 30.08.2017 18:36, Halil Pasic wrote:
>>> The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
>>> present cc 1 and the other way around, because css_do_xsch has the error
>>> codes mixed up. Fixing the error codes also fixes the priority.
>>>
>>> Let us fix this.
>>
>> (Nit: In case you respin, I'd suggest to remove the last sentence. You
>> already used "fix" two times in the previous one)
>>
>>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>>> Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
>>
>> Space missing -------------^
>>
> 
> copy-paste :(
> 
>>> ---
>>>  hw/s390x/css.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>>> index 1880b1a0ff..a50fb0727e 100644
>>> --- a/hw/s390x/css.c
>>> +++ b/hw/s390x/css.c
>>> @@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
>>>          (!(s->ctrl &
>>>             (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
>>>          (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
>>> -        ret = -EINPROGRESS;
>>> +        ret = -EBUSY;
>>>          goto out;
>>>      }
>>>  
>>>      if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
>>> -        ret = -EBUSY;
>>> +        ret = -EINPROGRESS;
>>>          goto out;
>>>      }
>>
>> Using both, EBUSY and EINPROGRESS as error codes sounds very confusing
>> to me here ... what's the difference between busy and in-progress? So
>> while you're at it, maybe you could replace the code for CC 2 ("CANCEL
>> SUBCHANNEL not applicable") with a different error code?
>>
>>  Thomas
>>
> 
> Well, the idea of the series is to get rid of these artificial error codes,
> so your concern of using EBUSY and EINPROGRESS will be dealt with in patch
> 5.
> 
> The idea was to first do the fixes and then do the transformation without
> changing behavior.

Yeah, I realized that when I started to look at the later patches ... so
please ignore my comment, it should be OK the way you're doing it.

 Thomas

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH Halil Pasic
  2017-08-31  7:50   ` Thomas Huth
@ 2017-08-31  9:19   ` Cornelia Huck
  2017-08-31 10:41     ` Halil Pasic
  1 sibling, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-08-31  9:19 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Wed, 30 Aug 2017 18:36:02 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> According to the POP a start subchannel instruction (SSCH) returning with
> cc 1 implies that the subchannel was status pending when SSCH executed.
> 
> Due to a somewhat confusing error handling, where error codes are mapped
> to cc value, sane looking error codes result in non AR compliant
> behavior.
> 
> Let's fix this! Instead of cc 1 we use cc 3 which means device not
> operational, and is much closer to the truth in the given cases.
> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
> ---
> 
> This patch turned out quite controversial. We did not reach a consensus
> during the internal review.
> 
> The most of the discussion revolved around the ORB flag which
> architecturally must be supported, but are currently not supported by
> vfio-ccw (not yet, or can't be). The idea showing the most promise for
> consensus was to handle this via device status (along the lines better a
> strange acting device than a non-conform machine) but since it's a
> radical change we decided to first discuss upstream and then do whatever
> needs to be done.
> ---
>  hw/s390x/css.c      | 15 ++++++---------
>  hw/s390x/s390-ccw.c |  2 +-
>  2 files changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index a50fb0727e..0822538cde 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>       */
>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
> -        return -EINVAL;
> +        return -ENODEV;

This feels wrong. If we don't support this yet, doing something like a
channel-program check or an operand exception feels closer to the
architecture than indicating a gone device.

>      }
>  
>      ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
> @@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>          break;
>      case -ENODEV:
>          break;
> +    case -EFAULT:
> +         break;
>      case -EACCES:
>          /* Let's reflect an inaccessible host device by cc 3. */
> -        ret = -ENODEV;
> -        break;
>      default:
> -       /*
> -        * All other return codes will trigger a program check,
> -        * or set cc to 1.
> -        */
> -       break;
> +        /* Let's make all other return codes map to cc 3.  */
> +        ret = -ENODEV;

Why? This feels wrong. For those cases where we want to signal an error
but cc 1 is conceptually wrong, either an operand exception (for very
few cases) or a channel-program check feels more in line with the
architecture.

That's a general problem with doing stuff in the hypervisor: We have
sets of internal problems that obviously don't show up in the
architecture, and we can either handle them internally or use what the
architecture offers for problem signaling. z/VM has probably faced the
same problems :)

>      };
>  
>      return ret;
> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>      if (sch->do_subchannel_work) {
>          return sch->do_subchannel_work(sch);
>      } else {
> -        return -EINVAL;
> +        return -ENODEV;

This rather seems like a job for an assert? If we don't have a function
for the 'asynchronous' handling of the various functions assigned for a
subchannel, that looks like an internal error.

>      }
>  }
>  
> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
> index 8614dda6f8..2b0741741c 100644
> --- a/hw/s390x/s390-ccw.c
> +++ b/hw/s390x/s390-ccw.c
> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
>      if (cdc->handle_request) {
>          return cdc->handle_request(orb, scsw, data);
>      } else {
> -        return -ENOSYS;
> +        return -ENODEV;

If we get here, it means that we called a request handler (which is
only done for the passthrough variety) without having assigned a
request handler beforehand. This also looks like an internal error to
me...

>      }
>  }
>  

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair Halil Pasic
  2017-08-31  6:10   ` Thomas Huth
@ 2017-08-31  9:33   ` Cornelia Huck
  1 sibling, 0 replies; 60+ messages in thread
From: Cornelia Huck @ 2017-08-31  9:33 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Wed, 30 Aug 2017 18:36:03 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> If we detect that the internally manged state of the subchannel
> is broken beyond repair while in do_subchannel_work in case of
> virtual we just abort the operation and pretend all went well,
> while in case of pass-through we honor the situation with -ENODEV
> which results in cc 3 for the instruction whose handler triggered
> the call.
> 
> Let's be consistent on this and do the -ENODEV also for virtual
> subchannels.
> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
> ---
>  hw/s390x/css.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index 0822538cde..bc47bf5b20 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -1078,7 +1078,7 @@ int do_subchannel_work_virtual(SubchDev *sch)
>          sch_handle_start_func_virtual(sch);
>      } else {
>          /* Cannot happen. */
> -        return 0;
> +        return -ENODEV;

No, this _really_ cannot happen. fctl is a three-bit field, which means
that one of the branches above must have executed. fctl cannot be 0, as
any caller of do_subchannel_work() either sets a bit there or, in the
case of rsch, checks for a bit already set. This is an internal error,
so an assert seems more suitable here. [We might need to keep the
return to keep mingw happy.]

>      }
>      css_inject_io_interrupt(sch);
>      return 0;

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH Halil Pasic
@ 2017-08-31  9:55   ` Cornelia Huck
  2017-09-05 15:55     ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-08-31  9:55 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Wed, 30 Aug 2017 18:36:04 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> Simplify the error handling of the SSCH and RSCH handler avoiding
> arbitrary and cryptic error codes being mapped to what a subchannel is
> supposed to do.  Let the code detecting the condition tell how it's to be
> handled in a less ambiguous way.  It's best to handle SSCH and RSCH in
> one go as the emulation of the two shares a lot of code.

So determining the return code at the point in time where we can
instead of trying to map to return codes and back again makes quite a
bit of sense, but I'll have to look at the rest of this. For one thing,
would a better split to introduce the cc-reporting infrastructure first
and then convert the different I/O functions?

> 
> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
> 
> ---
> Notes:
> Funny, we had a different swich-case for SSCH and RSCH. For
> virtual it did not matter, but for passtrough it could. In practice
> -EINVAL from the kernel would have been mapped to cc 2 in case of
> RSCH and to cc 1 in case of SSHC which is absurd. Same goes for
> -EBUSY from kernel which is correctly mapped to cc 2 in case of
> SSCH, but for RSCH it gets mapped to cc 1 which is also absurd.
> ---
>  hw/s390x/css.c              | 86 ++++++++++++++-------------------------------
>  hw/s390x/s390-ccw.c         |  8 ++---
>  hw/vfio/ccw.c               | 32 +++++++++++++----
>  include/hw/s390x/css.h      | 30 ++++++++++++----
>  include/hw/s390x/s390-ccw.h |  2 +-
>  target/s390x/ioinst.c       | 61 +++++++++-----------------------
>  6 files changed, 97 insertions(+), 122 deletions(-)
> 
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index bc47bf5b20..1102642c10 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -1015,12 +1015,11 @@ static void sch_handle_start_func_virtual(SubchDev *sch)
>  
>  }
>  
> -static int sch_handle_start_func_passthrough(SubchDev *sch)
> +static void sch_handle_start_func_passthrough(SubchDev *sch)
>  {
>  
>      PMCW *p = &sch->curr_status.pmcw;
>      SCSW *s = &sch->curr_status.scsw;
> -    int ret;
>  
>      ORB *orb = &sch->orb;
>      if (!(s->ctrl & SCSW_ACTL_SUSP)) {
> @@ -1034,28 +1033,10 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>       */
>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
> -        return -ENODEV;
> +        sch->iret.cc = 3;

Same as already commented: I don't think cc 3 is a good match.

>      }
>  
> -    ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
> -    switch (ret) {
> -    /* Currently we don't update control block and just return the cc code. */
> -    case 0:
> -        break;
> -    case -EBUSY:
> -        break;
> -    case -ENODEV:
> -        break;
> -    case -EFAULT:
> -         break;
> -    case -EACCES:
> -        /* Let's reflect an inaccessible host device by cc 3. */
> -    default:
> -        /* Let's make all other return codes map to cc 3.  */
> -        ret = -ENODEV;
> -    };
> -
> -    return ret;
> +    s390_ccw_cmd_request(sch);

As you change the handling anyway: Don't change this logic in prior
patches?

>  }
>  
>  /*
> @@ -1064,7 +1045,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>   * read/writes) asynchronous later on if we start supporting more than
>   * our current very simple devices.
>   */
> -int do_subchannel_work_virtual(SubchDev *sch)
> +void do_subchannel_work_virtual(SubchDev *sch)
>  {
>  
>      SCSW *s = &sch->curr_status.scsw;
> @@ -1078,41 +1059,35 @@ int do_subchannel_work_virtual(SubchDev *sch)
>          sch_handle_start_func_virtual(sch);
>      } else {
>          /* Cannot happen. */
> -        return -ENODEV;
> +        sch->iret.cc = 3;

See comment for the last patch.

>      }
>      css_inject_io_interrupt(sch);
> -    return 0;
>  }
>  
> -int do_subchannel_work_passthrough(SubchDev *sch)
> +void do_subchannel_work_passthrough(SubchDev *sch)
>  {
> -    int ret;
>      SCSW *s = &sch->curr_status.scsw;
>  
>      if (s->ctrl & SCSW_FCTL_CLEAR_FUNC) {
>          /* TODO: Clear handling */
>          sch_handle_clear_func(sch);
> -        ret = 0;
>      } else if (s->ctrl & SCSW_FCTL_HALT_FUNC) {
>          /* TODO: Halt handling */
>          sch_handle_halt_func(sch);
> -        ret = 0;
>      } else if (s->ctrl & SCSW_FCTL_START_FUNC) {
> -        ret = sch_handle_start_func_passthrough(sch);
> +        sch_handle_start_func_passthrough(sch);
>      } else {
>          /* Cannot happen. */
> -        return -ENODEV;
> +        sch->iret.cc = 3;

ftcl == 0 should have been rejected already by the function handlers
here as well, shouldn't it?

>      }
> -
> -    return ret;
>  }
>  
> -static int do_subchannel_work(SubchDev *sch)
> +static void do_subchannel_work(SubchDev *sch)
>  {
>      if (sch->do_subchannel_work) {
> -        return sch->do_subchannel_work(sch);
> +        sch->do_subchannel_work(sch);
>      } else {
> -        return -ENODEV;
> +        sch->iret.cc = 3;

See my comment for a prior patch.

>      }
>  }
>  

(...)

> diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
> index 5c5fe6b202..d093181a9e 100644
> --- a/include/hw/s390x/css.h
> +++ b/include/hw/s390x/css.h
> @@ -94,13 +94,31 @@ struct SubchDev {
>      /* transport-provided data: */
>      int (*ccw_cb) (SubchDev *, CCW1);
>      void (*disable_cb)(SubchDev *);
> -    int (*do_subchannel_work) (SubchDev *);
> +    void (*do_subchannel_work) (SubchDev *);
>      SenseId id;
>      void *driver_data;
> +    /* io instructions conclude according to iret */
> +    struct {
> +        /*
> +         * General semantic of cc codes of IO instructions is (brief):
> +         * 0 -- produced expected result
> +         * 1 -- produced alternate result
> +         * 2 -- ineffective, because busy with previously initiated function
> +         * 3 -- ineffective, not operational

I'm not sure you can summarize this that way in all cases.

Also, what does "ineffective" mean? If I get a cc 1 for, say, ssch, I
don't expect something either as the subchannel was already status
pending.

> +         */
> +        uint32_t cc:4;
> +        bool pgm_chk:1;

This looks weird?

> +        uint32_t irq_code;
> +    } iret;
>  };
>  
>  extern const VMStateDescription vmstate_subch_dev;

(...)

> @@ -238,33 +236,17 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, uint32_t ipb)
>      }
>      trace_ioinst_sch_id("ssch", cssid, ssid, schid);
>      sch = css_find_subch(m, cssid, ssid, schid);
> -    if (sch && css_subch_visible(sch)) {
> -        ret = css_do_ssch(sch, &orb);
> +    if (!sch || !css_subch_visible(sch)) {
> +        setcc(cpu, 3);
> +        return;
>      }
> -    switch (ret) {
> -    case -ENODEV:
> -        cc = 3;
> -        break;
> -    case -EBUSY:
> -        cc = 2;
> -        break;
> -    case -EFAULT:
> -        /*
> -         * TODO:
> -         * I'm wondering whether there is something better
> -         * to do for us here (like setting some device or
> -         * subchannel status).
> -         */

You removed the TODO :(

There still might be a better way to reflect this...

> -        program_interrupt(env, PGM_ADDRESSING, 4);
> +    css_subch_clear_iret(sch);
> +    css_do_ssch(sch, &orb);
> +    if (sch->iret.pgm_chk) {
> +        program_interrupt(env, sch->iret.irq_code, 4);
>          return;
> -    case 0:
> -        cc = 0;
> -        break;
> -    default:
> -        cc = 1;
> -        break;
>      }
> -    setcc(cpu, cc);
> +    setcc(cpu, sch->iret.cc);
>  }
>  
>  void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9]
  2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
                   ` (8 preceding siblings ...)
  2017-08-30 16:36 ` [Qemu-devel] [PATCH 9/9] s390x: factor out common ioinst handler logic Halil Pasic
@ 2017-08-31 10:04 ` Cornelia Huck
  2017-08-31 10:43   ` Halil Pasic
  9 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-08-31 10:04 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Wed, 30 Aug 2017 18:36:00 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> This series has a character of a refactoring, as the initial motivation
> was improving readability and reducing complexity.

But you reduced the cover letter subject too much ;)

> 
> Despite of the original intent the tree first patches buxfixes, and
> according to Dong Jia the patch-set also has a functional value: for ccw
> pass-through, he is planing to pass-through the instruction completion
> information (cc or interruption condition) from the kernel, and this
> patch set can pretty much be seen as a preparation for that.

I think the first one (with the precedence fixed as well) is pretty
uncontroversial, and I'd be happy to queue an updated version.

> 
> The basic idea is: tell how to handle an unusual conditon where it's
> identified, instead of mapping it to an errno (more or less arbitrarily),
> then possibly mapping these errnos around, to finally (mentally) map the
> errno back to the condition and take appropriate action. 
> 
> At the end of the series we also have 125 lines of code less to maintain,
> and the executable got a bit smaller too.

I agree with the basic direction, but I think this needs more hashing
out.

> 
> Halil Pasic (9):
>   s390x/css: fix cc handling for XSCH
>   s390x: fix invalid use of cc 1 for SSCH
>   s390x/css: be more consistent if broken beyond repair
>   s390x: refactor error handling for SSCH and RSCH
>   s390x: refactor error handling for XSCH handler
>   s390x: refactor error handling for CSCH handler
>   s390x: refactor error handling for HSCH handler
>   s390x: refactor error handling for MSCH handler
>   s390x: factor out common ioinst handler logic
> 
>  hw/s390x/css.c              | 158 ++++++++++++--------------------------
>  hw/s390x/s390-ccw.c         |   8 +-
>  hw/vfio/ccw.c               |  32 ++++++--
>  include/hw/s390x/css.h      |  38 +++++++---
>  include/hw/s390x/s390-ccw.h |   2 +-
>  target/s390x/ioinst.c       | 181 ++++++++++----------------------------------
>  6 files changed, 147 insertions(+), 272 deletions(-)
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH
  2017-08-31  8:42         ` Cornelia Huck
@ 2017-08-31 10:19           ` Halil Pasic
  0 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-31 10:19 UTC (permalink / raw)
  To: Cornelia Huck, Thomas Huth; +Cc: Pierre Morel, Dong Jia Shi, qemu-devel



On 08/31/2017 10:42 AM, Cornelia Huck wrote:
> On Thu, 31 Aug 2017 09:32:49 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
>> On 31.08.2017 08:38, Cornelia Huck wrote:
>>> On Thu, 31 Aug 2017 07:51:17 +0200
>>> Thomas Huth <thuth@redhat.com> wrote:
>>>   
>>>> On 30.08.2017 18:36, Halil Pasic wrote:  
>>>>> The function ioinst_handle_xsch is presenting cc 2 when it's supposed to
>>>>> present cc 1 and the other way around, because css_do_xsch has the error
>>>>> codes mixed up. Fixing the error codes also fixes the priority.
>>>>>
>>>>> Let us fix this.    
>>>>
>>>> (Nit: In case you respin, I'd suggest to remove the last sentence. You
>>>> already used "fix" two times in the previous one)  
>>>
>>> I can also remove it on applying, if Halil agrees (I have not yet
>>> reviewed it, though).
>>>   
>>>>  
>>>>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>>>>> Reported-by: Pierre Morel<pmorel@linux.vnet.ibm.com>    
>>>>
>>>> Space missing -------------^  
>>>
>>> And I can also add that space on applying :)
>>>   
>>>>  
>>>>> ---
>>>>>  hw/s390x/css.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>>>>> index 1880b1a0ff..a50fb0727e 100644
>>>>> --- a/hw/s390x/css.c
>>>>> +++ b/hw/s390x/css.c
>>>>> @@ -1281,12 +1281,12 @@ int css_do_xsch(SubchDev *sch)
>>>>>          (!(s->ctrl &
>>>>>             (SCSW_ACTL_RESUME_PEND | SCSW_ACTL_START_PEND | SCSW_ACTL_SUSP))) ||
>>>>>          (s->ctrl & SCSW_ACTL_SUBCH_ACTIVE)) {
>>>>> -        ret = -EINPROGRESS;
>>>>> +        ret = -EBUSY;
>>>>>          goto out;
>>>>>      }
>>>>>  
>>>>>      if (s->ctrl & SCSW_CTRL_MASK_STCTL) {
>>>>> -        ret = -EBUSY;
>>>>> +        ret = -EINPROGRESS;
>>>>>          goto out;
>>>>>      }    
>>>>
>>>> Using both, EBUSY and EINPROGRESS as error codes sounds very confusing
>>>> to me here ... what's the difference between busy and in-progress? So
>>>> while you're at it, maybe you could replace the code for CC 2 ("CANCEL
>>>> SUBCHANNEL not applicable") with a different error code?  
>>>
>>> IIRC, I used these two as they matched my idea of what happens best
>>> (there's a difference between "subchannel is busy" and "the I/O is
>>> already in progress, too late to cancel"). "xsch not applicable" is
>>> very hard to translate to an Unix error code :/  
>>
>> OK, the codes make more sense with your description ==> Maybe simply add
>> a proper comment for each of the return codes?
> 
> Taking a step back and looking at the other I/O instructions and their
> implementation in qemu:
> 
> - For those instructions that can set it, cc 1 is set when the
>   subchannel is status pending. That's usually the "default" branch in
>   ioinst.c.
> - cc 2 is set when the subchannel is "busy" (or, in the case of xsch,
>   "not applicable for xsch"... sigh) This is supposed to be handled via
>   -EBUSY.
> 
> So, there are actually two problems with the current implementation of
> xsch:
> 
> - The return codes are switched around, which this patch fixes.
> - But "status pending" should also take precedence over "not
>   applicable", if I read the PoP correctly, so the second if needs to
>   be moved up.

You are right and I was wrong. "Condition code 1 has precedence over
condition code 2." So it's 3 > 1 > 2 (and I remembered 3 > 2 > 1).

I will fix this for v2.

> 
> And yes, it makes sense do add some comments...
> 

If we apply the series as a whole adding comments would an overkill
IMHO. We will switch this to iret.cc = ? so it should become obvious.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-08-31  9:19   ` Cornelia Huck
@ 2017-08-31 10:41     ` Halil Pasic
  2017-09-05  8:02       ` Cornelia Huck
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-08-31 10:41 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 08/31/2017 11:19 AM, Cornelia Huck wrote:
> On Wed, 30 Aug 2017 18:36:02 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> According to the POP a start subchannel instruction (SSCH) returning with
>> cc 1 implies that the subchannel was status pending when SSCH executed.
>>
>> Due to a somewhat confusing error handling, where error codes are mapped
>> to cc value, sane looking error codes result in non AR compliant
>> behavior.
>>
>> Let's fix this! Instead of cc 1 we use cc 3 which means device not
>> operational, and is much closer to the truth in the given cases.
>>
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
>> ---
>>
>> This patch turned out quite controversial. We did not reach a consensus
>> during the internal review.
>>
>> The most of the discussion revolved around the ORB flag which
>> architecturally must be supported, but are currently not supported by
>> vfio-ccw (not yet, or can't be). The idea showing the most promise for
>> consensus was to handle this via device status (along the lines better a
>> strange acting device than a non-conform machine) but since it's a
>> radical change we decided to first discuss upstream and then do whatever
>> needs to be done.
>> ---
>>  hw/s390x/css.c      | 15 ++++++---------
>>  hw/s390x/s390-ccw.c |  2 +-
>>  2 files changed, 7 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> index a50fb0727e..0822538cde 100644
>> --- a/hw/s390x/css.c
>> +++ b/hw/s390x/css.c
>> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>       */
>>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
>> -        return -EINVAL;
>> +        return -ENODEV;
> 
> This feels wrong. If we don't support this yet, doing something like a
> channel-program check or an operand exception feels closer to the
> architecture than indicating a gone device.

I disagree, a channel-program check or an operand exception, or cc 1
(current solution) makes the machine obviously non-conform.

My train of thought was that architecturally you can loose connection
to the device at any time (you can't prohibit admins pulling cables or
smashing equipment with a 10kg hammer).

Also from the guest OS perspective I think saying device not operational
could provoke a proper reaction form the guest OS: that is just give up
on the device. The things you propose would in my opinion put the blame
on the guest OS driver (making non-conform requests) so in that case
it would make sense to give up on the driver (but the same driver could
wonderfully work with let's say a fully emulated device).

As I have stated in the cover letter of this patch, I would find
setting device status even better, but I wanted to discuss first
before going from setting cc to something else.

Setting cc was not my idea in the first place (AFAIK the -EINVAL
here effectively triggers cc 1).

> 
>>      }
>>  
>>      ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
>> @@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>          break;
>>      case -ENODEV:
>>          break;
>> +    case -EFAULT:
>> +         break;
>>      case -EACCES:
>>          /* Let's reflect an inaccessible host device by cc 3. */
>> -        ret = -ENODEV;
>> -        break;
>>      default:
>> -       /*
>> -        * All other return codes will trigger a program check,
>> -        * or set cc to 1.
>> -        */
>> -       break;
>> +        /* Let's make all other return codes map to cc 3.  */
>> +        ret = -ENODEV;
> 
> Why? This feels wrong. For those cases where we want to signal an error
> but cc 1 is conceptually wrong, either an operand exception (for very
> few cases) or a channel-program check feels more in line with the
> architecture.

You mean the original code feels wrong, or? I keep the program check for
-EFAULT (that's why it's added) and just change cc 1 to cc 3 for the
not explicitly handled error codes (reason stated in the commit message).

> 
> That's a general problem with doing stuff in the hypervisor: We have
> sets of internal problems that obviously don't show up in the
> architecture, and we can either handle them internally or use what the
> architecture offers for problem signaling. z/VM has probably faced the
> same problems :)

I agree.

> 
>>      };
>>  
>>      return ret;
>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>>      if (sch->do_subchannel_work) {
>>          return sch->do_subchannel_work(sch);
>>      } else {
>> -        return -EINVAL;
>> +        return -ENODEV;
> 
> This rather seems like a job for an assert? If we don't have a function
> for the 'asynchronous' handling of the various functions assigned for a
> subchannel, that looks like an internal error.
> 

IMHO it depends. Aborting qemu is heavy handed, and as an user I would not
be happy about it. But certainly it is an assert situation.  We can look for
an even better solution, but I think this is an improvement. The logic behind
is that the device is broken and can't be talked to properly.

>>      }
>>  }
>>  
>> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
>> index 8614dda6f8..2b0741741c 100644
>> --- a/hw/s390x/s390-ccw.c
>> +++ b/hw/s390x/s390-ccw.c
>> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
>>      if (cdc->handle_request) {
>>          return cdc->handle_request(orb, scsw, data);
>>      } else {
>> -        return -ENOSYS;
>> +        return -ENODEV;
> 
> If we get here, it means that we called a request handler (which is
> only done for the passthrough variety) without having assigned a
> request handler beforehand. This also looks like an internal error to
> me...
> 

Certainly. Again I was not the one who wrote or accepted the original
code. My previous comment about whether assert or not applies here as
well.

>>      }
>>  }
>>  
> 
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9]
  2017-08-31 10:04 ` [Qemu-devel] [PATCH 0/9] Cornelia Huck
@ 2017-08-31 10:43   ` Halil Pasic
  0 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-31 10:43 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 08/31/2017 12:04 PM, Cornelia Huck wrote:
> On Wed, 30 Aug 2017 18:36:00 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> This series has a character of a refactoring, as the initial motivation
>> was improving readability and reducing complexity.
> 
> But you reduced the cover letter subject too much ;)

Noticed that after sending. The plan was to copy paste
it form the internal version, but I'm no good at copy-pasting.

> 
>>
>> Despite of the original intent the tree first patches buxfixes, and
>> according to Dong Jia the patch-set also has a functional value: for ccw
>> pass-through, he is planing to pass-through the instruction completion
>> information (cc or interruption condition) from the kernel, and this
>> patch set can pretty much be seen as a preparation for that.
> 
> I think the first one (with the precedence fixed as well) is pretty
> uncontroversial, and I'd be happy to queue an updated version.
> 

I will send it as a stand-alone patch (probably today).

>>
>> The basic idea is: tell how to handle an unusual conditon where it's
>> identified, instead of mapping it to an errno (more or less arbitrarily),
>> then possibly mapping these errnos around, to finally (mentally) map the
>> errno back to the condition and take appropriate action. 
>>
>> At the end of the series we also have 125 lines of code less to maintain,
>> and the executable got a bit smaller too.
> 
> I agree with the basic direction, but I think this needs more hashing
> out.
> 

I'm sure we will sort the details out.

>>
>> Halil Pasic (9):
>>   s390x/css: fix cc handling for XSCH
>>   s390x: fix invalid use of cc 1 for SSCH
>>   s390x/css: be more consistent if broken beyond repair
>>   s390x: refactor error handling for SSCH and RSCH
>>   s390x: refactor error handling for XSCH handler
>>   s390x: refactor error handling for CSCH handler
>>   s390x: refactor error handling for HSCH handler
>>   s390x: refactor error handling for MSCH handler
>>   s390x: factor out common ioinst handler logic
>>
>>  hw/s390x/css.c              | 158 ++++++++++++--------------------------
>>  hw/s390x/s390-ccw.c         |   8 +-
>>  hw/vfio/ccw.c               |  32 ++++++--
>>  include/hw/s390x/css.h      |  38 +++++++---
>>  include/hw/s390x/s390-ccw.h |   2 +-
>>  target/s390x/ioinst.c       | 181 ++++++++++----------------------------------
>>  6 files changed, 147 insertions(+), 272 deletions(-)
>>
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-08-31  7:50   ` Thomas Huth
@ 2017-08-31 10:54     ` Halil Pasic
  0 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-08-31 10:54 UTC (permalink / raw)
  To: Thomas Huth, Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 08/31/2017 09:50 AM, Thomas Huth wrote:
> On 30.08.2017 18:36, Halil Pasic wrote:
>> According to the POP a start subchannel instruction (SSCH) returning with
>> cc 1 implies that the subchannel was status pending when SSCH executed.
>>
>> Due to a somewhat confusing error handling, where error codes are mapped
>> to cc value, sane looking error codes result in non AR compliant
>> behavior.
>>
>> Let's fix this! Instead of cc 1 we use cc 3 which means device not
>> operational, and is much closer to the truth in the given cases.
>>
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
>> ---
>>
>> This patch turned out quite controversial. We did not reach a consensus
>> during the internal review.
>>
>> The most of the discussion revolved around the ORB flag which
>> architecturally must be supported, but are currently not supported by
>> vfio-ccw (not yet, or can't be). The idea showing the most promise for
>> consensus was to handle this via device status (along the lines better a
>> strange acting device than a non-conform machine) but since it's a
>> radical change we decided to first discuss upstream and then do whatever
>> needs to be done.
>> ---
>>  hw/s390x/css.c      | 15 ++++++---------
>>  hw/s390x/s390-ccw.c |  2 +-
>>  2 files changed, 7 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> index a50fb0727e..0822538cde 100644
>> --- a/hw/s390x/css.c
>> +++ b/hw/s390x/css.c
>> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>       */
>>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
>> -        return -EINVAL;
>> +        return -ENODEV;
> 
> I don't really like ENODEV in this case (since the device is apparently
> there)... but well, since you're later change it again to set cc=3
> directly, I guess the temporary ENODEV is ok.
> 
>>      }
>>  
>>      ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
>> @@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>          break;
>>      case -ENODEV:
>>          break;
>> +    case -EFAULT:
>> +         break;
> 
> I think you should mention this in the patch description. Why is EFAULT
> suddenly handled here?

It is not suddenly :) If you examine ioinst_handle_ssch which really
handles the error codes (here we are just mapping them around) you see:
   switch (ret) {
    case -ENODEV:
        cc = 3;
        break;
    case -EBUSY:
        cc = 2;
        break;
    case -EFAULT:
        /*
         * TODO:
         * I'm wondering whether there is something better
         * to do for us here (like setting some device or
         * subchannel status).
         */
        program_interrupt(env, PGM_ADDRESSING, 4);
        return;
    case 0:
        cc = 0;
        break;
    default:
        cc = 1;
        break;
    }

That is -EFAULT is handled with a program interrupt, and I want to keep
that. Hence break, that is keep unchanged.

What I do want to change is the other not explicitly handled error codes
(which actually should not happen) should be cc 3 and not cc 1.

So the default branch sets ret to -ENODEV.


> 
>>      case -EACCES:
>>          /* Let's reflect an inaccessible host device by cc 3. */
>> -        ret = -ENODEV;
>> -        break;
>>      default:
>> -       /*
>> -        * All other return codes will trigger a program check,
>> -        * or set cc to 1.
>> -        */
>> -       break;
>> +        /* Let's make all other return codes map to cc 3.  */
>> +        ret = -ENODEV;
>>      };
>>  
>>      return ret;
>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>>      if (sch->do_subchannel_work) {
>>          return sch->do_subchannel_work(sch);
>>      } else {
>> -        return -EINVAL;
>> +        return -ENODEV;
>>      }
>>  }
>>  
>> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
>> index 8614dda6f8..2b0741741c 100644
>> --- a/hw/s390x/s390-ccw.c
>> +++ b/hw/s390x/s390-ccw.c
>> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
>>      if (cdc->handle_request) {
>>          return cdc->handle_request(orb, scsw, data);
>>      } else {
>> -        return -ENOSYS;
>> +        return -ENODEV;
>>      }
>>  }
> 
>  Thomas
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-08-31 10:41     ` Halil Pasic
@ 2017-09-05  8:02       ` Cornelia Huck
  2017-09-05 15:24         ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-05  8:02 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Thu, 31 Aug 2017 12:41:05 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 08/31/2017 11:19 AM, Cornelia Huck wrote:
> > On Wed, 30 Aug 2017 18:36:02 +0200
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >   
> >> According to the POP a start subchannel instruction (SSCH) returning with
> >> cc 1 implies that the subchannel was status pending when SSCH executed.
> >>
> >> Due to a somewhat confusing error handling, where error codes are mapped
> >> to cc value, sane looking error codes result in non AR compliant
> >> behavior.
> >>
> >> Let's fix this! Instead of cc 1 we use cc 3 which means device not
> >> operational, and is much closer to the truth in the given cases.
> >>
> >> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> >> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
> >> ---
> >>
> >> This patch turned out quite controversial. We did not reach a consensus
> >> during the internal review.
> >>
> >> The most of the discussion revolved around the ORB flag which
> >> architecturally must be supported, but are currently not supported by
> >> vfio-ccw (not yet, or can't be). The idea showing the most promise for
> >> consensus was to handle this via device status (along the lines better a
> >> strange acting device than a non-conform machine) but since it's a
> >> radical change we decided to first discuss upstream and then do whatever
> >> needs to be done.
> >> ---
> >>  hw/s390x/css.c      | 15 ++++++---------
> >>  hw/s390x/s390-ccw.c |  2 +-
> >>  2 files changed, 7 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> >> index a50fb0727e..0822538cde 100644
> >> --- a/hw/s390x/css.c
> >> +++ b/hw/s390x/css.c
> >> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
> >>       */
> >>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
> >>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
> >> -        return -EINVAL;
> >> +        return -ENODEV;  
> > 
> > This feels wrong. If we don't support this yet, doing something like a
> > channel-program check or an operand exception feels closer to the
> > architecture than indicating a gone device.  
> 
> I disagree, a channel-program check or an operand exception, or cc 1
> (current solution) makes the machine obviously non-conform.

Well, it *is* not compliant...

> 
> My train of thought was that architecturally you can loose connection
> to the device at any time (you can't prohibit admins pulling cables or
> smashing equipment with a 10kg hammer).

But that's not what you're doing: Iff you want to signal 'device
gone' (and I'm not convinced it's a good idea), you need to do more
than signal cc 3.

> 
> Also from the guest OS perspective I think saying device not operational
> could provoke a proper reaction form the guest OS: that is just give up
> on the device. The things you propose would in my opinion put the blame
> on the guest OS driver (making non-conform requests) so in that case
> it would make sense to give up on the driver (but the same driver could
> wonderfully work with let's say a fully emulated device).

I'd not call that 'putting blame on the driver'. What happens is that
we signal the driver 'you send us something we cannot deal with' - the
more common reason for that is that the driver built an invalid
request, but I've certainly seen rejects for stuff that simply was not
implemented in the past.

The important thing is that I don't want to lie to the driver: The
device is there, and will work with a different request. Also see my
comment above.

(The real fix is of course to implement what is required by the
architecture :), but I don't think cc 3 is a good stop-gap measure.)

> 
> As I have stated in the cover letter of this patch, I would find
> setting device status even better, but I wanted to discuss first
> before going from setting cc to something else.
> 
> Setting cc was not my idea in the first place (AFAIK the -EINVAL
> here effectively triggers cc 1).

Something else than cc 1 is better, yes (but the intention was probably
a channel-program check or so).

> 
> >   
> >>      }
> >>  
> >>      ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
> >> @@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
> >>          break;
> >>      case -ENODEV:
> >>          break;
> >> +    case -EFAULT:
> >> +         break;
> >>      case -EACCES:
> >>          /* Let's reflect an inaccessible host device by cc 3. */
> >> -        ret = -ENODEV;
> >> -        break;
> >>      default:
> >> -       /*
> >> -        * All other return codes will trigger a program check,
> >> -        * or set cc to 1.
> >> -        */
> >> -       break;
> >> +        /* Let's make all other return codes map to cc 3.  */
> >> +        ret = -ENODEV;  
> > 
> > Why? This feels wrong. For those cases where we want to signal an error
> > but cc 1 is conceptually wrong, either an operand exception (for very
> > few cases) or a channel-program check feels more in line with the
> > architecture.  
> 
> You mean the original code feels wrong, or? I keep the program check for
> -EFAULT (that's why it's added) and just change cc 1 to cc 3 for the
> not explicitly handled error codes (reason stated in the commit message).

I mean that both feel wrong :) Moving away from abuses of cc 1 makes
sense, but I don't think cc 3 is the correct direction.

> 
> > 
> > That's a general problem with doing stuff in the hypervisor: We have
> > sets of internal problems that obviously don't show up in the
> > architecture, and we can either handle them internally or use what the
> > architecture offers for problem signaling. z/VM has probably faced the
> > same problems :)  
> 
> I agree.
> 
> >   
> >>      };
> >>  
> >>      return ret;
> >> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
> >>      if (sch->do_subchannel_work) {
> >>          return sch->do_subchannel_work(sch);
> >>      } else {
> >> -        return -EINVAL;
> >> +        return -ENODEV;  
> > 
> > This rather seems like a job for an assert? If we don't have a function
> > for the 'asynchronous' handling of the various functions assigned for a
> > subchannel, that looks like an internal error.
> >   
> 
> IMHO it depends. Aborting qemu is heavy handed, and as an user I would not
> be happy about it. But certainly it is an assert situation.  We can look for
> an even better solution, but I think this is an improvement. The logic behind
> is that the device is broken and can't be talked to properly.

We currently don't have a vast array of subchannel types (and are
unlikely to get more types that need a different handler function). We
know the current ones are fine, and an assert would catch programming
errors early.

> 
> >>      }
> >>  }
> >>  
> >> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
> >> index 8614dda6f8..2b0741741c 100644
> >> --- a/hw/s390x/s390-ccw.c
> >> +++ b/hw/s390x/s390-ccw.c
> >> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
> >>      if (cdc->handle_request) {
> >>          return cdc->handle_request(orb, scsw, data);
> >>      } else {
> >> -        return -ENOSYS;
> >> +        return -ENODEV;  
> > 
> > If we get here, it means that we called a request handler (which is
> > only done for the passthrough variety) without having assigned a
> > request handler beforehand. This also looks like an internal error to
> > me...
> >   
> 
> Certainly. Again I was not the one who wrote or accepted the original
> code. My previous comment about whether assert or not applies here as
> well.

My answer applies even more here :)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-05  8:02       ` Cornelia Huck
@ 2017-09-05 15:24         ` Halil Pasic
  2017-09-05 15:46           ` Cornelia Huck
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-05 15:24 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 09/05/2017 10:02 AM, Cornelia Huck wrote:
> On Thu, 31 Aug 2017 12:41:05 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> On 08/31/2017 11:19 AM, Cornelia Huck wrote:
>>> On Wed, 30 Aug 2017 18:36:02 +0200
>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>   
>>>> According to the POP a start subchannel instruction (SSCH) returning with
>>>> cc 1 implies that the subchannel was status pending when SSCH executed.
>>>>
>>>> Due to a somewhat confusing error handling, where error codes are mapped
>>>> to cc value, sane looking error codes result in non AR compliant
>>>> behavior.
>>>>
>>>> Let's fix this! Instead of cc 1 we use cc 3 which means device not
>>>> operational, and is much closer to the truth in the given cases.
>>>>
>>>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>>>> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
>>>> ---
>>>>
>>>> This patch turned out quite controversial. We did not reach a consensus
>>>> during the internal review.
>>>>
>>>> The most of the discussion revolved around the ORB flag which
>>>> architecturally must be supported, but are currently not supported by
>>>> vfio-ccw (not yet, or can't be). The idea showing the most promise for
>>>> consensus was to handle this via device status (along the lines better a
>>>> strange acting device than a non-conform machine) but since it's a
>>>> radical change we decided to first discuss upstream and then do whatever
>>>> needs to be done.
>>>> ---
>>>>  hw/s390x/css.c      | 15 ++++++---------
>>>>  hw/s390x/s390-ccw.c |  2 +-
>>>>  2 files changed, 7 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>>>> index a50fb0727e..0822538cde 100644
>>>> --- a/hw/s390x/css.c
>>>> +++ b/hw/s390x/css.c
>>>> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>>>       */
>>>>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>>>>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
>>>> -        return -EINVAL;
>>>> +        return -ENODEV;  
>>>
>>> This feels wrong. If we don't support this yet, doing something like a
>>> channel-program check or an operand exception feels closer to the
>>> architecture than indicating a gone device.  
>>
>> I disagree, a channel-program check or an operand exception, or cc 1
>> (current solution) makes the machine obviously non-conform.
> 
> Well, it *is* not compliant...
> 

I'm trying to explain that could be. It is a design decision whether
to put the non-compliant in the machine or in the device attached
to the machine.

>>
>> My train of thought was that architecturally you can loose connection
>> to the device at any time (you can't prohibit admins pulling cables or
>> smashing equipment with a 10kg hammer).
> 
> But that's not what you're doing: Iff you want to signal 'device
> gone' (and I'm not convinced it's a good idea), you need to do more
> than signal cc 3.
> 

Valid (probably), but we already use this kind of gone cc 3 (IMHO).

>>
>> Also from the guest OS perspective I think saying device not operational
>> could provoke a proper reaction form the guest OS: that is just give up
>> on the device. The things you propose would in my opinion put the blame
>> on the guest OS driver (making non-conform requests) so in that case
>> it would make sense to give up on the driver (but the same driver could
>> wonderfully work with let's say a fully emulated device).
> 
> I'd not call that 'putting blame on the driver'. What happens is that
> we signal the driver 'you send us something we cannot deal with' - the
> more common reason for that is that the driver built an invalid
> request, but I've certainly seen rejects for stuff that simply was not
> implemented in the past.

I assume you are talking about a channel-program check or operand exception
now and not about cc 1. IMHO while your argument does make sense, it
contradicts the specified architecture. In my reading the architecture 
defines operand exception and program-check condition as triggered
solely by program (OS) error.

Yes, if one could re-write the architecture, the cleanest way would probably
be capability indication (e.g. can live without prefetching) and channel
program-check if the capability indication is disregarded. But as far as
I know we can't re-write the architecture.

> 
> The important thing is that I don't want to lie to the driver: The
> device is there, and will work with a different request. Also see my
> comment above.

I agree, telling the truth is important. My problem is, that
all the choice we have seems to be picking a lie. And we seem to not
agree, which lie is better.

> 
> (The real fix is of course to implement what is required by the
> architecture :), but I don't think cc 3 is a good stop-gap measure.)
> 
>>
>> As I have stated in the cover letter of this patch, I would find
>> setting device status even better, but I wanted to discuss first
>> before going from setting cc to something else.
>>
>> Setting cc was not my idea in the first place (AFAIK the -EINVAL
>> here effectively triggers cc 1).
> 
> Something else than cc 1 is better, yes (but the intention was probably
> a channel-program check or so).
> 

My problem with a program check (indicated by SCSW word 2 bit 10) is
that, in my reading of the architecture, the semantic behind it is: The
channel subsystem (not the cu or device) has detected, that the 
the channel program (previously submitted as an ORB) is erroneous. Which
programs are erroneous is specified by the architecture. What we have
here does not qualify.

My idea was to rather blame the virtual hardware (device) and put no blame
on the program nor he channel subsystem. This could be done using device
status (unit check with command reject, maybe unit exception) or interface
check. My train of thought was, the problem is not consistent across a
device type, so it has to be device specific.

Of course blaming the device could mislead the person encountering the
problem, and make him believe it's an non-virtual hardware problem.

About the misleading, I think the best we can do is log out a message
indicating what really happened.

In the end I don't care that deeply about vfio-ccw, and this problem
already took me more time than I intended to spend on this. We have
people driving this whole vfio-ccw stuff and I'm not one of them (I'm
rather in the supporting role).

I'm also fine with me being credited with a reported-by once the
more involved people figure out what to do, and keeping the vfio-ccw
stuff as is. Should we go with that option? 

>>
>>>   
>>>>      }
>>>>  
>>>>      ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
>>>> @@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>>>          break;
>>>>      case -ENODEV:
>>>>          break;
>>>> +    case -EFAULT:
>>>> +         break;
>>>>      case -EACCES:
>>>>          /* Let's reflect an inaccessible host device by cc 3. */
>>>> -        ret = -ENODEV;
>>>> -        break;
>>>>      default:
>>>> -       /*
>>>> -        * All other return codes will trigger a program check,
>>>> -        * or set cc to 1.
>>>> -        */
>>>> -       break;
>>>> +        /* Let's make all other return codes map to cc 3.  */
>>>> +        ret = -ENODEV;  
>>>
>>> Why? This feels wrong. For those cases where we want to signal an error
>>> but cc 1 is conceptually wrong, either an operand exception (for very
>>> few cases) or a channel-program check feels more in line with the
>>> architecture.  
>>
>> You mean the original code feels wrong, or? I keep the program check for
>> -EFAULT (that's why it's added) and just change cc 1 to cc 3 for the
>> not explicitly handled error codes (reason stated in the commit message).
> 
> I mean that both feel wrong :) Moving away from abuses of cc 1 makes
> sense, but I don't think cc 3 is the correct direction.
> 

Noted. Discussed above.

>>
>>>
>>> That's a general problem with doing stuff in the hypervisor: We have
>>> sets of internal problems that obviously don't show up in the
>>> architecture, and we can either handle them internally or use what the
>>> architecture offers for problem signaling. z/VM has probably faced the
>>> same problems :)  
>>
>> I agree.
>>
>>>   
>>>>      };
>>>>  
>>>>      return ret;
>>>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>>>>      if (sch->do_subchannel_work) {
>>>>          return sch->do_subchannel_work(sch);
>>>>      } else {
>>>> -        return -EINVAL;
>>>> +        return -ENODEV;  
>>>
>>> This rather seems like a job for an assert? If we don't have a function
>>> for the 'asynchronous' handling of the various functions assigned for a
>>> subchannel, that looks like an internal error.
>>>   
>>
>> IMHO it depends. Aborting qemu is heavy handed, and as an user I would not
>> be happy about it. But certainly it is an assert situation.  We can look for
>> an even better solution, but I think this is an improvement. The logic behind
>> is that the device is broken and can't be talked to properly.
> 
> We currently don't have a vast array of subchannel types (and are
> unlikely to get more types that need a different handler function). We
> know the current ones are fine, and an assert would catch programming
> errors early.
> 

Despite of that we already had a problem of this type: see 1728cff2ab
("s390x/3270: fix instruction interception handler", 2017-06-09) by 
Dong Jia. If we had some automated testing covering all the asserts
I would not think twice about using an assert here. But I don't think
we do and I'm reluctant (not positive that assert is superior to what
we have now). Maybe we could agree on reported by again.

>>
>>>>      }
>>>>  }
>>>>  
>>>> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
>>>> index 8614dda6f8..2b0741741c 100644
>>>> --- a/hw/s390x/s390-ccw.c
>>>> +++ b/hw/s390x/s390-ccw.c
>>>> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
>>>>      if (cdc->handle_request) {
>>>>          return cdc->handle_request(orb, scsw, data);
>>>>      } else {
>>>> -        return -ENOSYS;
>>>> +        return -ENODEV;  
>>>
>>> If we get here, it means that we called a request handler (which is
>>> only done for the passthrough variety) without having assigned a
>>> request handler beforehand. This also looks like an internal error to
>>> me...
>>>   
>>
>> Certainly. Again I was not the one who wrote or accepted the original
>> code. My previous comment about whether assert or not applies here as
>> well.
> 
> My answer applies even more here :)
> 

This is again vfio-ccw.

I've just noticed that I did not answer to your comments
for patch 4. Sorry, I was waiting for more feedback on patches 4-9,
but it turns out the ball is in my court. I will do that right away.

Regards,
Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-05 15:24         ` Halil Pasic
@ 2017-09-05 15:46           ` Cornelia Huck
  2017-09-05 17:20             ` Halil Pasic
  2017-09-06  8:37             ` Dong Jia Shi
  0 siblings, 2 replies; 60+ messages in thread
From: Cornelia Huck @ 2017-09-05 15:46 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Tue, 5 Sep 2017 17:24:19 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> My problem with a program check (indicated by SCSW word 2 bit 10) is
> that, in my reading of the architecture, the semantic behind it is: The
> channel subsystem (not the cu or device) has detected, that the 
> the channel program (previously submitted as an ORB) is erroneous. Which
> programs are erroneous is specified by the architecture. What we have
> here does not qualify.
> 
> My idea was to rather blame the virtual hardware (device) and put no blame
> on the program nor he channel subsystem. This could be done using device
> status (unit check with command reject, maybe unit exception) or interface
> check. My train of thought was, the problem is not consistent across a
> device type, so it has to be device specific.

Unit exception might be a better way to express what is happening here.
At least, it moves us away from cc 1 and not towards cc 3 :)

> 
> Of course blaming the device could mislead the person encountering the
> problem, and make him believe it's an non-virtual hardware problem.
> 
> About the misleading, I think the best we can do is log out a message
> indicating what really happened.

Just document it in the code? If it doesn't happen with Linux as a
guest, it is highly unlikely to be seen in the wild.

> 
> In the end I don't care that deeply about vfio-ccw, and this problem
> already took me more time than I intended to spend on this. We have
> people driving this whole vfio-ccw stuff and I'm not one of them (I'm
> rather in the supporting role).
> 
> I'm also fine with me being credited with a reported-by once the
> more involved people figure out what to do, and keeping the vfio-ccw
> stuff as is. Should we go with that option? 

If converting the reporting to a device status is straightforward
enough, let's do that. I'm fine with postponing this and waiting for a
real fix as well (I don't really have spare cycles to actually write
vfio-ccw code currently...)

> >>>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
> >>>>      if (sch->do_subchannel_work) {
> >>>>          return sch->do_subchannel_work(sch);
> >>>>      } else {
> >>>> -        return -EINVAL;
> >>>> +        return -ENODEV;    
> >>>
> >>> This rather seems like a job for an assert? If we don't have a function
> >>> for the 'asynchronous' handling of the various functions assigned for a
> >>> subchannel, that looks like an internal error.
> >>>     
> >>
> >> IMHO it depends. Aborting qemu is heavy handed, and as an user I would not
> >> be happy about it. But certainly it is an assert situation.  We can look for
> >> an even better solution, but I think this is an improvement. The logic behind
> >> is that the device is broken and can't be talked to properly.  
> > 
> > We currently don't have a vast array of subchannel types (and are
> > unlikely to get more types that need a different handler function). We
> > know the current ones are fine, and an assert would catch programming
> > errors early.
> >   
> 
> Despite of that we already had a problem of this type: see 1728cff2ab
> ("s390x/3270: fix instruction interception handler", 2017-06-09) by 
> Dong Jia. If we had some automated testing covering all the asserts
> I would not think twice about using an assert here. But I don't think
> we do and I'm reluctant (not positive that assert is superior to what
> we have now). Maybe we could agree on reported by again.

Yes, we (as in generally 'we') are really lacking automated testing...
(it is somewhere on my todo list).

Either leave it as-is, or do an assert. -ENODEV just feels wrong.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-08-31  9:55   ` Cornelia Huck
@ 2017-09-05 15:55     ` Halil Pasic
  2017-09-05 16:25       ` Cornelia Huck
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-05 15:55 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 08/31/2017 11:55 AM, Cornelia Huck wrote:
> On Wed, 30 Aug 2017 18:36:04 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> Simplify the error handling of the SSCH and RSCH handler avoiding
>> arbitrary and cryptic error codes being mapped to what a subchannel is
>> supposed to do.  Let the code detecting the condition tell how it's to be
>> handled in a less ambiguous way.  It's best to handle SSCH and RSCH in
>> one go as the emulation of the two shares a lot of code.
> 
> So determining the return code at the point in time where we can
> instead of trying to map to return codes and back again makes quite a
> bit of sense, but I'll have to look at the rest of this.


Looging forward to.

> For one thing,
> would a better split to introduce the cc-reporting infrastructure first
> and then convert the different I/O functions?
> 

Speaks nothing against it, although I don't see anything wrong with
not introducing unused infrastructure (that is introducing
the infrastructure and it's first user together). As you prefer.

>>
>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
>> Acked-by: Pierre Morel<pmorel@linux.vnet.ibm.com>
>>
>> ---
>> Notes:
>> Funny, we had a different swich-case for SSCH and RSCH. For
>> virtual it did not matter, but for passtrough it could. In practice
>> -EINVAL from the kernel would have been mapped to cc 2 in case of
>> RSCH and to cc 1 in case of SSHC which is absurd. Same goes for
>> -EBUSY from kernel which is correctly mapped to cc 2 in case of
>> SSCH, but for RSCH it gets mapped to cc 1 which is also absurd.
>> ---
>>  hw/s390x/css.c              | 86 ++++++++++++++-------------------------------
>>  hw/s390x/s390-ccw.c         |  8 ++---
>>  hw/vfio/ccw.c               | 32 +++++++++++++----
>>  include/hw/s390x/css.h      | 30 ++++++++++++----
>>  include/hw/s390x/s390-ccw.h |  2 +-
>>  target/s390x/ioinst.c       | 61 +++++++++-----------------------
>>  6 files changed, 97 insertions(+), 122 deletions(-)
>>
>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> index bc47bf5b20..1102642c10 100644
>> --- a/hw/s390x/css.c
>> +++ b/hw/s390x/css.c
>> @@ -1015,12 +1015,11 @@ static void sch_handle_start_func_virtual(SubchDev *sch)
>>  
>>  }
>>  
>> -static int sch_handle_start_func_passthrough(SubchDev *sch)
>> +static void sch_handle_start_func_passthrough(SubchDev *sch)
>>  {
>>  
>>      PMCW *p = &sch->curr_status.pmcw;
>>      SCSW *s = &sch->curr_status.scsw;
>> -    int ret;
>>  
>>      ORB *orb = &sch->orb;
>>      if (!(s->ctrl & SCSW_ACTL_SUSP)) {
>> @@ -1034,28 +1033,10 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>       */
>>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
>> -        return -ENODEV;
>> +        sch->iret.cc = 3;
> 
> Same as already commented: I don't think cc 3 is a good match.
> 

Yeah, this is just dumb conversion.

>>      }
>>  
>> -    ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
>> -    switch (ret) {
>> -    /* Currently we don't update control block and just return the cc code. */
>> -    case 0:
>> -        break;
>> -    case -EBUSY:
>> -        break;
>> -    case -ENODEV:
>> -        break;
>> -    case -EFAULT:
>> -         break;
>> -    case -EACCES:
>> -        /* Let's reflect an inaccessible host device by cc 3. */
>> -    default:
>> -        /* Let's make all other return codes map to cc 3.  */
>> -        ret = -ENODEV;
>> -    };
>> -
>> -    return ret;
>> +    s390_ccw_cmd_request(sch);
> 
> As you change the handling anyway: Don't change this logic in prior
> patches?

I preserve the logic as altered by the previous patches (at least,
that is the intention). This is the mapping around part which is going
away if we push the handling down.

> 
>>  }
>>  
>>  /*
>> @@ -1064,7 +1045,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>   * read/writes) asynchronous later on if we start supporting more than
>>   * our current very simple devices.
>>   */
>> -int do_subchannel_work_virtual(SubchDev *sch)
>> +void do_subchannel_work_virtual(SubchDev *sch)
>>  {
>>  
>>      SCSW *s = &sch->curr_status.scsw;
>> @@ -1078,41 +1059,35 @@ int do_subchannel_work_virtual(SubchDev *sch)
>>          sch_handle_start_func_virtual(sch);
>>      } else {
>>          /* Cannot happen. */
>> -        return -ENODEV;
>> +        sch->iret.cc = 3;
> 
> See comment for the last patch.
> 

Again just a conversion. If we don't do it in the previous patch
it has to be done differently in this patch.

>>      }
>>      css_inject_io_interrupt(sch);
>> -    return 0;
>>  }
>>  
>> -int do_subchannel_work_passthrough(SubchDev *sch)
>> +void do_subchannel_work_passthrough(SubchDev *sch)
>>  {
>> -    int ret;
>>      SCSW *s = &sch->curr_status.scsw;
>>  
>>      if (s->ctrl & SCSW_FCTL_CLEAR_FUNC) {
>>          /* TODO: Clear handling */
>>          sch_handle_clear_func(sch);
>> -        ret = 0;
>>      } else if (s->ctrl & SCSW_FCTL_HALT_FUNC) {
>>          /* TODO: Halt handling */
>>          sch_handle_halt_func(sch);
>> -        ret = 0;
>>      } else if (s->ctrl & SCSW_FCTL_START_FUNC) {
>> -        ret = sch_handle_start_func_passthrough(sch);
>> +        sch_handle_start_func_passthrough(sch);
>>      } else {
>>          /* Cannot happen. */
>> -        return -ENODEV;
>> +        sch->iret.cc = 3;
> 
> ftcl == 0 should have been rejected already by the function handlers
> here as well, shouldn't it?

Which function handlers. Basically the instruction handlers set fctl
and call do_subchannel_work to get the indicated work done.

> 
>>      }
>> -
>> -    return ret;
>>  }
>>  
>> -static int do_subchannel_work(SubchDev *sch)
>> +static void do_subchannel_work(SubchDev *sch)
>>  {
>>      if (sch->do_subchannel_work) {
>> -        return sch->do_subchannel_work(sch);
>> +        sch->do_subchannel_work(sch);
>>      } else {
>> -        return -ENODEV;
>> +        sch->iret.cc = 3;
> 
> See my comment for a prior patch.
>

Nod.

 
>>      }
>>  }
>>  
> 
> (...)
> 
>> diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
>> index 5c5fe6b202..d093181a9e 100644
>> --- a/include/hw/s390x/css.h
>> +++ b/include/hw/s390x/css.h
>> @@ -94,13 +94,31 @@ struct SubchDev {
>>      /* transport-provided data: */
>>      int (*ccw_cb) (SubchDev *, CCW1);
>>      void (*disable_cb)(SubchDev *);
>> -    int (*do_subchannel_work) (SubchDev *);
>> +    void (*do_subchannel_work) (SubchDev *);
>>      SenseId id;
>>      void *driver_data;
>> +    /* io instructions conclude according to iret */
>> +    struct {
>> +        /*
>> +         * General semantic of cc codes of IO instructions is (brief):
>> +         * 0 -- produced expected result
>> +         * 1 -- produced alternate result
>> +         * 2 -- ineffective, because busy with previously initiated function
>> +         * 3 -- ineffective, not operational
> 
> I'm not sure you can summarize this that way in all cases.
> 
> Also, what does "ineffective" mean? If I get a cc 1 for, say, ssch, I
> don't expect something either as the subchannel was already status
> pending.

You are right cc 1 would have been better off with
 'produced alternate result or status pending'

I've tried to make this shorter (from PoP 14-2):
"""
Condition Code 0: Instruction execution produced
the expected or most probable result. (See “Deferred
Condition Code (CC)” on page 9 for a description of
conditions that can be encountered subsequent to
the presentation of condition code 0 that result in a
nonzero deferred condition code.)
Condition Code 1: Instruction execution produced
the alternate or second-most-probable result, or sta-
tus conditions were present that may or may not have
prevented the expected result.
Condition Code 2: Instruction execution was inef-
fective because the designated subchannel or chan-
nel-subsystem facility was busy with a previously
initiated function.
Condition Code 3: Instruction execution was inef-
fective because the designated element was not
operational or because some condition precluded ini-
tiation of the normal function.
"""

Well ineffective means not-effective ;).

> 
>> +         */
>> +        uint32_t cc:4;
>> +        bool pgm_chk:1;
> 
> This looks weird?>

What do you mean? Any suggestions how to do it better?
 
>> +        uint32_t irq_code;
>> +    } iret;
>>  };
>>  
>>  extern const VMStateDescription vmstate_subch_dev;
> 
> (...)
> 
>> @@ -238,33 +236,17 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, uint32_t ipb)
>>      }
>>      trace_ioinst_sch_id("ssch", cssid, ssid, schid);
>>      sch = css_find_subch(m, cssid, ssid, schid);
>> -    if (sch && css_subch_visible(sch)) {
>> -        ret = css_do_ssch(sch, &orb);
>> +    if (!sch || !css_subch_visible(sch)) {
>> +        setcc(cpu, 3);
>> +        return;
>>      }
>> -    switch (ret) {
>> -    case -ENODEV:
>> -        cc = 3;
>> -        break;
>> -    case -EBUSY:
>> -        cc = 2;
>> -        break;
>> -    case -EFAULT:
>> -        /*
>> -         * TODO:
>> -         * I'm wondering whether there is something better
>> -         * to do for us here (like setting some device or
>> -         * subchannel status).
>> -         */
> 
> You removed the TODO :(
> 
> There still might be a better way to reflect this...
> 

OK I should have pushed it down to 
@@ -71,10 +72,27 @@ again:
             goto again;
         }
         error_report("vfio-ccw: wirte I/O region failed with errno=%d", errno);
-        return -errno;
+        ret = -errno;
+    } else {
+        ret = region->ret_code;
+    }
+    switch (-ret) {
+    /* Currently we don't update control block and just return the cc code. */
+    case 0:
+        sch->iret.cc = 0;
+        break;
+    case EBUSY:
+        sch->iret.cc = 2;
+        break;
+    case EFAULT:

Like this:

+        /*
+         * TODO:
+         * I'm wondering whether there is something better
+         * to do for us here (like setting some device or
+         * subchannel status).
+         */
+        sch->iret.pgm_chk = true;
+        sch->iret.irq_code = PGM_ADDRESSING;
+        break;
+    case ENODEV:
+    case EACCES:
+    default:
+        sch->iret.cc = 3;
     }
-
-    return region->ret_code;
 }

 static void vfio_ccw_reset(DeviceState *dev)

(deleted in your mail, file hw/vfio/ccw.c).

But I guess, I was afraid of somebody blaming me for this
comment (such TODOs in production code are a bit strange -- what
does it mean: we did not bother to figure it out?).

>> -        program_interrupt(env, PGM_ADDRESSING, 4);
>> +    css_subch_clear_iret(sch);
>> +    css_do_ssch(sch, &orb);
>> +    if (sch->iret.pgm_chk) {
>> +        program_interrupt(env, sch->iret.irq_code, 4);
>>          return;
>> -    case 0:
>> -        cc = 0;
>> -        break;
>> -    default:
>> -        cc = 1;
>> -        break;
>>      }
>> -    setcc(cpu, cc);
>> +    setcc(cpu, sch->iret.cc);
>>  }
>>  
>>  void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb)
> 

Looking forward to the rest of the feedback.

Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-05 15:55     ` Halil Pasic
@ 2017-09-05 16:25       ` Cornelia Huck
  2017-09-05 22:30         ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-05 16:25 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Tue, 5 Sep 2017 17:55:17 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 08/31/2017 11:55 AM, Cornelia Huck wrote:
> > On Wed, 30 Aug 2017 18:36:04 +0200
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >   
> >> Simplify the error handling of the SSCH and RSCH handler avoiding
> >> arbitrary and cryptic error codes being mapped to what a subchannel is
> >> supposed to do.  Let the code detecting the condition tell how it's to be
> >> handled in a less ambiguous way.  It's best to handle SSCH and RSCH in
> >> one go as the emulation of the two shares a lot of code.  
> > 
> > So determining the return code at the point in time where we can
> > instead of trying to map to return codes and back again makes quite a
> > bit of sense, but I'll have to look at the rest of this.  
> 
> 
> Looging forward to.

Once I manage to find some quiet time for thinking :(


> >> -    ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
> >> -    switch (ret) {
> >> -    /* Currently we don't update control block and just return the cc code. */
> >> -    case 0:
> >> -        break;
> >> -    case -EBUSY:
> >> -        break;
> >> -    case -ENODEV:
> >> -        break;
> >> -    case -EFAULT:
> >> -         break;
> >> -    case -EACCES:
> >> -        /* Let's reflect an inaccessible host device by cc 3. */
> >> -    default:
> >> -        /* Let's make all other return codes map to cc 3.  */
> >> -        ret = -ENODEV;
> >> -    };
> >> -
> >> -    return ret;
> >> +    s390_ccw_cmd_request(sch);  
> > 
> > As you change the handling anyway: Don't change this logic in prior
> > patches?  
> 
> I preserve the logic as altered by the previous patches (at least,
> that is the intention). This is the mapping around part which is going
> away if we push the handling down.

My point is that you touch the code path twice. But we disagreed on the
original change anyway :)

> >> -int do_subchannel_work_passthrough(SubchDev *sch)
> >> +void do_subchannel_work_passthrough(SubchDev *sch)
> >>  {
> >> -    int ret;
> >>      SCSW *s = &sch->curr_status.scsw;
> >>  
> >>      if (s->ctrl & SCSW_FCTL_CLEAR_FUNC) {
> >>          /* TODO: Clear handling */
> >>          sch_handle_clear_func(sch);
> >> -        ret = 0;
> >>      } else if (s->ctrl & SCSW_FCTL_HALT_FUNC) {
> >>          /* TODO: Halt handling */
> >>          sch_handle_halt_func(sch);
> >> -        ret = 0;
> >>      } else if (s->ctrl & SCSW_FCTL_START_FUNC) {
> >> -        ret = sch_handle_start_func_passthrough(sch);
> >> +        sch_handle_start_func_passthrough(sch);
> >>      } else {
> >>          /* Cannot happen. */
> >> -        return -ENODEV;
> >> +        sch->iret.cc = 3;  
> > 
> > ftcl == 0 should have been rejected already by the function handlers
> > here as well, shouldn't it?  
> 
> Which function handlers. Basically the instruction handlers set fctl
> and call do_subchannel_work to get the indicated work done.

Or set. My point is that we can't get here with fctl == 0. So an assert
sounds better to me.


> >> diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
> >> index 5c5fe6b202..d093181a9e 100644
> >> --- a/include/hw/s390x/css.h
> >> +++ b/include/hw/s390x/css.h
> >> @@ -94,13 +94,31 @@ struct SubchDev {
> >>      /* transport-provided data: */
> >>      int (*ccw_cb) (SubchDev *, CCW1);
> >>      void (*disable_cb)(SubchDev *);
> >> -    int (*do_subchannel_work) (SubchDev *);
> >> +    void (*do_subchannel_work) (SubchDev *);
> >>      SenseId id;
> >>      void *driver_data;
> >> +    /* io instructions conclude according to iret */
> >> +    struct {
> >> +        /*
> >> +         * General semantic of cc codes of IO instructions is (brief):
> >> +         * 0 -- produced expected result
> >> +         * 1 -- produced alternate result
> >> +         * 2 -- ineffective, because busy with previously initiated function
> >> +         * 3 -- ineffective, not operational  
> > 
> > I'm not sure you can summarize this that way in all cases.
> > 
> > Also, what does "ineffective" mean? If I get a cc 1 for, say, ssch, I
> > don't expect something either as the subchannel was already status
> > pending.  
> 
> You are right cc 1 would have been better off with
>  'produced alternate result or status pending'
> 
> I've tried to make this shorter (from PoP 14-2):
> """
> Condition Code 0: Instruction execution produced
> the expected or most probable result. (See “Deferred
> Condition Code (CC)” on page 9 for a description of
> conditions that can be encountered subsequent to
> the presentation of condition code 0 that result in a
> nonzero deferred condition code.)
> Condition Code 1: Instruction execution produced
> the alternate or second-most-probable result, or sta-
> tus conditions were present that may or may not have
> prevented the expected result.
> Condition Code 2: Instruction execution was inef-
> fective because the designated subchannel or chan-
> nel-subsystem facility was busy with a previously
> initiated function.
> Condition Code 3: Instruction execution was inef-
> fective because the designated element was not
> operational or because some condition precluded ini-
> tiation of the normal function.
> """
> 
> Well ineffective means not-effective ;).

Yes :)

I think the summary in the PoP is already a bit on the over-generalized
side, and condensing it further makes it rather ineffective ;) at
explaining what happens. Frankly, I'd just drop this and rely on
interested parties referring to the PoP.

> 
> >   
> >> +         */
> >> +        uint32_t cc:4;
> >> +        bool pgm_chk:1;  
> > 
> > This looks weird?>  
> 
> What do you mean? Any suggestions how to do it better?

Taking one bit from a bool looks odd. I'd either use a bool normally or
take one bit from an uint8_t.

>  
> >> +        uint32_t irq_code;
> >> +    } iret;
> >>  };
> >>  
> >>  extern const VMStateDescription vmstate_subch_dev;  

> But I guess, I was afraid of somebody blaming me for this
> comment (such TODOs in production code are a bit strange -- what
> does it mean: we did not bother to figure it out?).

For one, the TODO is preexisting... and we really should remember to
figure out if there's something better rather than just drop the
comment.

(And I sure hope nobody is using vfio-ccw in production yet...)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-05 15:46           ` Cornelia Huck
@ 2017-09-05 17:20             ` Halil Pasic
  2017-09-06  8:27               ` Dong Jia Shi
  2017-09-06 11:37               ` Cornelia Huck
  2017-09-06  8:37             ` Dong Jia Shi
  1 sibling, 2 replies; 60+ messages in thread
From: Halil Pasic @ 2017-09-05 17:20 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 09/05/2017 05:46 PM, Cornelia Huck wrote:
> On Tue, 5 Sep 2017 17:24:19 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> My problem with a program check (indicated by SCSW word 2 bit 10) is
>> that, in my reading of the architecture, the semantic behind it is: The
>> channel subsystem (not the cu or device) has detected, that the 
>> the channel program (previously submitted as an ORB) is erroneous. Which
>> programs are erroneous is specified by the architecture. What we have
>> here does not qualify.
>>
>> My idea was to rather blame the virtual hardware (device) and put no blame
>> on the program nor he channel subsystem. This could be done using device
>> status (unit check with command reject, maybe unit exception) or interface
>> check. My train of thought was, the problem is not consistent across a
>> device type, so it has to be device specific.
> 
> Unit exception might be a better way to express what is happening here.
> At least, it moves us away from cc 1 and not towards cc 3 :)
> 

I will do a follow up patch pursuing device exception.

>>
>> Of course blaming the device could mislead the person encountering the
>> problem, and make him believe it's an non-virtual hardware problem.
>>
>> About the misleading, I think the best we can do is log out a message
>> indicating what really happened.
> 
> Just document it in the code? If it doesn't happen with Linux as a
> guest, it is highly unlikely to be seen in the wild.
> 


Well we have two problems here:
1) Unit exception can be already defined by the device type for the
command (reference: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
I think this one is what you mean. And I agree that's best handled
with comment in code.
2) The poor user/programmer is trying to figure out why things
don't work (why are we getting the unit exception)? I think that's
best remedied with producing something for the log (maybe a warning
with warn_report which states that the implementation vfio-ccw requires
the given flags).

[..] 
>>>>>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>>>>>>      if (sch->do_subchannel_work) {
>>>>>>          return sch->do_subchannel_work(sch);
>>>>>>      } else {
>>>>>> -        return -EINVAL;
>>>>>> +        return -ENODEV;    
>>>>>
>>>>> This rather seems like a job for an assert? If we don't have a function
>>>>> for the 'asynchronous' handling of the various functions assigned for a
>>>>> subchannel, that looks like an internal error.
>>>>>     
>>>>
>>>> IMHO it depends. Aborting qemu is heavy handed, and as an user I would not
>>>> be happy about it. But certainly it is an assert situation.  We can look for
>>>> an even better solution, but I think this is an improvement. The logic behind
>>>> is that the device is broken and can't be talked to properly.  
>>>
>>> We currently don't have a vast array of subchannel types (and are
>>> unlikely to get more types that need a different handler function). We
>>> know the current ones are fine, and an assert would catch programming
>>> errors early.
>>>   
>>
>> Despite of that we already had a problem of this type: see 1728cff2ab
>> ("s390x/3270: fix instruction interception handler", 2017-06-09) by 
>> Dong Jia. If we had some automated testing covering all the asserts
>> I would not think twice about using an assert here. But I don't think
>> we do and I'm reluctant (not positive that assert is superior to what
>> we have now). Maybe we could agree on reported by again.
> 
> Yes, we (as in generally 'we') are really lacking automated testing...
> (it is somewhere on my todo list).
> 
> Either leave it as-is, or do an assert. -ENODEV just feels wrong.
> 

I think I will leave this one as is and maybe try to discuss with
the folks here about reliable test coverage. Just spoke with Marc H.,
and according to that we have a long way to go.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-05 16:25       ` Cornelia Huck
@ 2017-09-05 22:30         ` Halil Pasic
  2017-09-06  4:31           ` Dong Jia Shi
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-05 22:30 UTC (permalink / raw)
  To: Cornelia Huck, Xiao Feng Ren; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 09/05/2017 06:25 PM, Cornelia Huck wrote:
> On Tue, 5 Sep 2017 17:55:17 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> On 08/31/2017 11:55 AM, Cornelia Huck wrote:
>>> On Wed, 30 Aug 2017 18:36:04 +0200
>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>   
>>>> Simplify the error handling of the SSCH and RSCH handler avoiding
>>>> arbitrary and cryptic error codes being mapped to what a subchannel is
>>>> supposed to do.  Let the code detecting the condition tell how it's to be
>>>> handled in a less ambiguous way.  It's best to handle SSCH and RSCH in
>>>> one go as the emulation of the two shares a lot of code.  
>>>
>>> So determining the return code at the point in time where we can
>>> instead of trying to map to return codes and back again makes quite a
>>> bit of sense, but I'll have to look at the rest of this.  
>>
>>
>> Looging forward to.
> 
> Once I manage to find some quiet time for thinking :(

I think it's the best if you ignore the rest until v2.

Since we agreed that cc 3 is not the way to go and that the basic
idea is sane, IMHO it does not make much sense to do a thorough
review of this any more.

Not diverting valuable maintainer resources from my indirect data 
access patch set is also a point (IDA is something I have to deliver, while
this is just for fun).
 
> 
>>>> -    ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
>>>> -    switch (ret) {
>>>> -    /* Currently we don't update control block and just return the cc code. */
>>>> -    case 0:
>>>> -        break;
>>>> -    case -EBUSY:
>>>> -        break;
>>>> -    case -ENODEV:
>>>> -        break;
>>>> -    case -EFAULT:
>>>> -         break;
>>>> -    case -EACCES:
>>>> -        /* Let's reflect an inaccessible host device by cc 3. */
>>>> -    default:
>>>> -        /* Let's make all other return codes map to cc 3.  */
>>>> -        ret = -ENODEV;
>>>> -    };
>>>> -
>>>> -    return ret;
>>>> +    s390_ccw_cmd_request(sch);  
>>>
>>> As you change the handling anyway: Don't change this logic in prior
>>> patches?  
>>
>> I preserve the logic as altered by the previous patches (at least,
>> that is the intention). This is the mapping around part which is going
>> away if we push the handling down.
> 
> My point is that you touch the code path twice. But we disagreed on the
> original change anyway :)

Nod.

> 
>>>> -int do_subchannel_work_passthrough(SubchDev *sch)
>>>> +void do_subchannel_work_passthrough(SubchDev *sch)
>>>>  {
>>>> -    int ret;
>>>>      SCSW *s = &sch->curr_status.scsw;
>>>> 
>>>>      if (s->ctrl & SCSW_FCTL_CLEAR_FUNC) {
>>>>          /* TODO: Clear handling */
>>>>          sch_handle_clear_func(sch);
>>>> -        ret = 0;
>>>>      } else if (s->ctrl & SCSW_FCTL_HALT_FUNC) {
>>>>          /* TODO: Halt handling */
>>>>          sch_handle_halt_func(sch);
>>>> -        ret = 0;
>>>>      } else if (s->ctrl & SCSW_FCTL_START_FUNC) {
>>>> -        ret = sch_handle_start_func_passthrough(sch);
>>>> +        sch_handle_start_func_passthrough(sch);
>>>>      } else {
>>>>          /* Cannot happen. */
>>>> -        return -ENODEV;
>>>> +        sch->iret.cc = 3;  
>>>
>>> ftcl == 0 should have been rejected already by the function handlers
>>> here as well, shouldn't it?  
>>
>> Which function handlers. Basically the instruction handlers set fctl
>> and call do_subchannel_work to get the indicated work done.
> 
> Or set. My point is that we can't get here with fctl == 0. So an assert
> sounds better to me.
> 
Yeah, I agree assert is the way to go here.

> 
>>>> diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
>>>> index 5c5fe6b202..d093181a9e 100644
>>>> --- a/include/hw/s390x/css.h
>>>> +++ b/include/hw/s390x/css.h
>>>> @@ -94,13 +94,31 @@ struct SubchDev {
>>>>      /* transport-provided data: */
>>>>      int (*ccw_cb) (SubchDev *, CCW1);
>>>>      void (*disable_cb)(SubchDev *);
>>>> -    int (*do_subchannel_work) (SubchDev *);
>>>> +    void (*do_subchannel_work) (SubchDev *);
>>>>      SenseId id;
>>>>      void *driver_data;
>>>> +    /* io instructions conclude according to iret */
>>>> +    struct {
>>>> +        /*
>>>> +         * General semantic of cc codes of IO instructions is (brief):
>>>> +         * 0 -- produced expected result
>>>> +         * 1 -- produced alternate result
>>>> +         * 2 -- ineffective, because busy with previously initiated function
>>>> +         * 3 -- ineffective, not operational  
>>>
>>> I'm not sure you can summarize this that way in all cases.
>>>
>>> Also, what does "ineffective" mean? If I get a cc 1 for, say, ssch, I
>>> don't expect something either as the subchannel was already status
>>> pending.  
>>
>> You are right cc 1 would have been better off with
>>  'produced alternate result or status pending'
>>
>> I've tried to make this shorter (from PoP 14-2):
>> """
>> Condition Code 0: Instruction execution produced
>> the expected or most probable result. (See “Deferred
>> Condition Code (CC)” on page 9 for a description of
>> conditions that can be encountered subsequent to
>> the presentation of condition code 0 that result in a
>> nonzero deferred condition code.)
>> Condition Code 1: Instruction execution produced
>> the alternate or second-most-probable result, or sta-
>> tus conditions were present that may or may not have
>> prevented the expected result.
>> Condition Code 2: Instruction execution was inef-
>> fective because the designated subchannel or chan-
>> nel-subsystem facility was busy with a previously
>> initiated function.
>> Condition Code 3: Instruction execution was inef-
>> fective because the designated element was not
>> operational or because some condition precluded ini-
>> tiation of the normal function.
>> """
>>
>> Well ineffective means not-effective ;).
> 
> Yes :)
> 
> I think the summary in the PoP is already a bit on the over-generalized
> side, and condensing it further makes it rather ineffective ;) at
> explaining what happens. Frankly, I'd just drop this and rely on
> interested parties referring to the PoP.
> 

That's what I did in the first place, but then Janosch complained.
I will meditate on this (having some sort of explanation is helpful
and I think cc 0 2 and 3 are actually quite OK).

>>
>>>   
>>>> +         */
>>>> +        uint32_t cc:4;
>>>> +        bool pgm_chk:1;  
>>>
>>> This looks weird?>  
>>
>> What do you mean? Any suggestions how to do it better?
> 
> Taking one bit from a bool looks odd. I'd either use a bool normally or
> take one bit from an uint8_t.
> 

I have to think about this.

>>  
>>>> +        uint32_t irq_code;
>>>> +    } iret;
>>>>  };
>>>>  
>>>>  extern const VMStateDescription vmstate_subch_dev;  
> 
>> But I guess, I was afraid of somebody blaming me for this
>> comment (such TODOs in production code are a bit strange -- what
>> does it mean: we did not bother to figure it out?).
> 
> For one, the TODO is preexisting... and we really should remember to
> figure out if there's something better rather than just drop the
> comment.
> 
> (And I sure hope nobody is using vfio-ccw in production yet...)
>
Since blame says the TODO has been around since 2017-05-17
let me have a critical look at it.

At a first glance I would say addressing exception for SSCH
is not what we want: the only possibility I see for address
exception for SSCH is due to the ORB address. But if that's
the case we will never reach the code in question. So I suppose
we can do better.

Adding Ren. @Ren: Do you agree with my analysis. If you do,
I could come up with a proposal what to do -- after some reading.

Regards,
Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-05 22:30         ` Halil Pasic
@ 2017-09-06  4:31           ` Dong Jia Shi
  2017-09-06 12:25             ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-06  4:31 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Xiao Feng Ren, Dong Jia Shi, Pierre Morel, qemu-devel

* Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-06 00:30:58 +0200]:

[...]

> > 
> >> But I guess, I was afraid of somebody blaming me for this
> >> comment (such TODOs in production code are a bit strange -- what
> >> does it mean: we did not bother to figure it out?).
> > 
> > For one, the TODO is preexisting... and we really should remember to
> > figure out if there's something better rather than just drop the
> > comment.
> > 
> > (And I sure hope nobody is using vfio-ccw in production yet...)
> >
> Since blame says the TODO has been around since 2017-05-17
> let me have a critical look at it.
> 
> At a first glance I would say addressing exception for SSCH
> is not what we want: the only possibility I see for address
> exception for SSCH is due to the ORB address. But if that's
> the case we will never reach the code in question.
Agree.

> So I suppose we can do better.
As the comment said, I'm (still) in the state of 'wondering'.

> 
> Adding Ren. @Ren: Do you agree with my analysis. If you do,
> I could come up with a proposal what to do -- after some reading.
If you have a better idea, and time, why not? ;)

> 
> Regards,
> Halil

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-05 17:20             ` Halil Pasic
@ 2017-09-06  8:27               ` Dong Jia Shi
  2017-09-06 11:25                 ` Cornelia Huck
  2017-09-06 11:37               ` Cornelia Huck
  1 sibling, 1 reply; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-06  8:27 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Cornelia Huck, Dong Jia Shi, Pierre Morel, qemu-devel

* Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:

> 
> 
> On 09/05/2017 05:46 PM, Cornelia Huck wrote:
> > On Tue, 5 Sep 2017 17:24:19 +0200
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > 
> >> My problem with a program check (indicated by SCSW word 2 bit 10) is
> >> that, in my reading of the architecture, the semantic behind it is: The
> >> channel subsystem (not the cu or device) has detected, that the 
> >> the channel program (previously submitted as an ORB) is erroneous. Which
> >> programs are erroneous is specified by the architecture. What we have
> >> here does not qualify.
> >>
> >> My idea was to rather blame the virtual hardware (device) and put no blame
> >> on the program nor he channel subsystem. This could be done using device
> >> status (unit check with command reject, maybe unit exception) or interface
> >> check. My train of thought was, the problem is not consistent across a
> >> device type, so it has to be device specific.
> > 
> > Unit exception might be a better way to express what is happening here.
> > At least, it moves us away from cc 1 and not towards cc 3 :)
> > 
> 
> I will do a follow up patch pursuing device exception.
> 
> >>
> >> Of course blaming the device could mislead the person encountering the
> >> problem, and make him believe it's an non-virtual hardware problem.
> >>
> >> About the misleading, I think the best we can do is log out a message
> >> indicating what really happened.
> > 
> > Just document it in the code? If it doesn't happen with Linux as a
> > guest, it is highly unlikely to be seen in the wild.
> > 
> 
> 
> Well we have two problems here:
> 1) Unit exception can be already defined by the device type for the
> command (reference: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
> I think this one is what you mean. And I agree that's best handled
> with comment in code.
Using unit check, with bit 3 byte 0 of the sense data set to 1, to
indicate an 'Equipment check', sounds a bit more proper than unit
exception.

> 2) The poor user/programmer is trying to figure out why things
> don't work (why are we getting the unit exception)? I think that's
> best remedied with producing something for the log (maybe a warning
> with warn_report which states that the implementation vfio-ccw requires
> the given flags).
Fine with me.

[...]

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-05 15:46           ` Cornelia Huck
  2017-09-05 17:20             ` Halil Pasic
@ 2017-09-06  8:37             ` Dong Jia Shi
  2017-09-06 11:38               ` Cornelia Huck
  1 sibling, 1 reply; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-06  8:37 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Halil Pasic, Dong Jia Shi, Pierre Morel, qemu-devel

* Cornelia Huck <cohuck@redhat.com> [2017-09-05 17:46:06 +0200]:

> On Tue, 5 Sep 2017 17:24:19 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
> > My problem with a program check (indicated by SCSW word 2 bit 10) is
> > that, in my reading of the architecture, the semantic behind it is: The
> > channel subsystem (not the cu or device) has detected, that the 
> > the channel program (previously submitted as an ORB) is erroneous. Which
> > programs are erroneous is specified by the architecture. What we have
> > here does not qualify.
> > 
> > My idea was to rather blame the virtual hardware (device) and put no blame
> > on the program nor he channel subsystem. This could be done using device
> > status (unit check with command reject, maybe unit exception) or interface
> > check. My train of thought was, the problem is not consistent across a
> > device type, so it has to be device specific.
> 
> Unit exception might be a better way to express what is happening here.
> At least, it moves us away from cc 1 and not towards cc 3 :)
> 
> > 
> > Of course blaming the device could mislead the person encountering the
> > problem, and make him believe it's an non-virtual hardware problem.
> > 
> > About the misleading, I think the best we can do is log out a message
> > indicating what really happened.
> 
> Just document it in the code? If it doesn't happen with Linux as a
> guest, it is highly unlikely to be seen in the wild.
> 
> > 
> > In the end I don't care that deeply about vfio-ccw, and this problem
> > already took me more time than I intended to spend on this. We have
> > people driving this whole vfio-ccw stuff and I'm not one of them (I'm
> > rather in the supporting role).
> > 
> > I'm also fine with me being credited with a reported-by once the
> > more involved people figure out what to do, and keeping the vfio-ccw
> > stuff as is. Should we go with that option? 
> 
> If converting the reporting to a device status is straightforward
> enough, let's do that. I'm fine with postponing this and waiting for a
> real fix as well (I don't really have spare cycles to actually write
> vfio-ccw code currently...)
> 

I can do this after this series.

[...]

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-06  8:27               ` Dong Jia Shi
@ 2017-09-06 11:25                 ` Cornelia Huck
  2017-09-07  8:02                   ` Dong Jia Shi
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-06 11:25 UTC (permalink / raw)
  To: Dong Jia Shi; +Cc: Halil Pasic, Pierre Morel, qemu-devel

On Wed, 6 Sep 2017 16:27:20 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
> 
> > 
> > 
> > On 09/05/2017 05:46 PM, Cornelia Huck wrote:  
> > > On Tue, 5 Sep 2017 17:24:19 +0200
> > > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > >   
> > >> My problem with a program check (indicated by SCSW word 2 bit 10) is
> > >> that, in my reading of the architecture, the semantic behind it is: The
> > >> channel subsystem (not the cu or device) has detected, that the 
> > >> the channel program (previously submitted as an ORB) is erroneous. Which
> > >> programs are erroneous is specified by the architecture. What we have
> > >> here does not qualify.
> > >>
> > >> My idea was to rather blame the virtual hardware (device) and put no blame
> > >> on the program nor he channel subsystem. This could be done using device
> > >> status (unit check with command reject, maybe unit exception) or interface
> > >> check. My train of thought was, the problem is not consistent across a
> > >> device type, so it has to be device specific.  
> > > 
> > > Unit exception might be a better way to express what is happening here.
> > > At least, it moves us away from cc 1 and not towards cc 3 :)
> > >   
> > 
> > I will do a follow up patch pursuing device exception.
> >   
> > >>
> > >> Of course blaming the device could mislead the person encountering the
> > >> problem, and make him believe it's an non-virtual hardware problem.
> > >>
> > >> About the misleading, I think the best we can do is log out a message
> > >> indicating what really happened.  
> > > 
> > > Just document it in the code? If it doesn't happen with Linux as a
> > > guest, it is highly unlikely to be seen in the wild.
> > >   
> > 
> > 
> > Well we have two problems here:
> > 1) Unit exception can be already defined by the device type for the
> > command (reference: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
> > I think this one is what you mean. And I agree that's best handled
> > with comment in code.  
> Using unit check, with bit 3 byte 0 of the sense data set to 1, to
> indicate an 'Equipment check', sounds a bit more proper than unit
> exception.

I don't agree: Equipment check sounds a lot more dire (and seems to
imply a malfunction). I like unit exception better.

> 
> > 2) The poor user/programmer is trying to figure out why things
> > don't work (why are we getting the unit exception)? I think that's
> > best remedied with producing something for the log (maybe a warning
> > with warn_report which states that the implementation vfio-ccw requires
> > the given flags).  
> Fine with me.

With me as well.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-05 17:20             ` Halil Pasic
  2017-09-06  8:27               ` Dong Jia Shi
@ 2017-09-06 11:37               ` Cornelia Huck
  1 sibling, 0 replies; 60+ messages in thread
From: Cornelia Huck @ 2017-09-06 11:37 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel, Thomas Huth

On Tue, 5 Sep 2017 19:20:43 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 09/05/2017 05:46 PM, Cornelia Huck wrote:
> > On Tue, 5 Sep 2017 17:24:19 +0200
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> >> Despite of that we already had a problem of this type: see 1728cff2ab
> >> ("s390x/3270: fix instruction interception handler", 2017-06-09) by 
> >> Dong Jia. If we had some automated testing covering all the asserts
> >> I would not think twice about using an assert here. But I don't think
> >> we do and I'm reluctant (not positive that assert is superior to what
> >> we have now). Maybe we could agree on reported by again.  
> > 
> > Yes, we (as in generally 'we') are really lacking automated testing...
> > (it is somewhere on my todo list).
> > 
> > Either leave it as-is, or do an assert. -ENODEV just feels wrong.
> >   
> 
> I think I will leave this one as is and maybe try to discuss with
> the folks here about reliable test coverage. Just spoke with Marc H.,
> and according to that we have a long way to go.

Ideally, we want something that can be executed from 'make check'. We
can already cover some basic stuff via tcg (I need to look into wiring
up more stuff), people with access to hardware should be able to cover
the rest.

That's not to say that extensive in-house testing by you guys wouldn't
be helpful, quite the contrary :)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-06  8:37             ` Dong Jia Shi
@ 2017-09-06 11:38               ` Cornelia Huck
  0 siblings, 0 replies; 60+ messages in thread
From: Cornelia Huck @ 2017-09-06 11:38 UTC (permalink / raw)
  To: Dong Jia Shi; +Cc: Halil Pasic, Pierre Morel, qemu-devel

On Wed, 6 Sep 2017 16:37:08 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Cornelia Huck <cohuck@redhat.com> [2017-09-05 17:46:06 +0200]:
> 
> > On Tue, 5 Sep 2017 17:24:19 +0200
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> > > In the end I don't care that deeply about vfio-ccw, and this problem
> > > already took me more time than I intended to spend on this. We have
> > > people driving this whole vfio-ccw stuff and I'm not one of them (I'm
> > > rather in the supporting role).
> > > 
> > > I'm also fine with me being credited with a reported-by once the
> > > more involved people figure out what to do, and keeping the vfio-ccw
> > > stuff as is. Should we go with that option?   
> > 
> > If converting the reporting to a device status is straightforward
> > enough, let's do that. I'm fine with postponing this and waiting for a
> > real fix as well (I don't really have spare cycles to actually write
> > vfio-ccw code currently...)
> >   
> 
> I can do this after this series.

Cool, thanks!

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-06  4:31           ` Dong Jia Shi
@ 2017-09-06 12:25             ` Halil Pasic
  2017-09-06 14:20               ` Cornelia Huck
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-06 12:25 UTC (permalink / raw)
  To: Cornelia Huck, Xiao Feng Ren, Dong Jia Shi, Pierre Morel, qemu-devel



On 09/06/2017 06:31 AM, Dong Jia Shi wrote:
> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-06 00:30:58 +0200]:
> 
> [...]
> 
>>>
>>>> But I guess, I was afraid of somebody blaming me for this
>>>> comment (such TODOs in production code are a bit strange -- what
>>>> does it mean: we did not bother to figure it out?).
>>>
>>> For one, the TODO is preexisting... and we really should remember to
>>> figure out if there's something better rather than just drop the
>>> comment.
>>>
>>> (And I sure hope nobody is using vfio-ccw in production yet...)
>>>
>> Since blame says the TODO has been around since 2017-05-17
>> let me have a critical look at it.
>>
>> At a first glance I would say addressing exception for SSCH
>> is not what we want: the only possibility I see for address
>> exception for SSCH is due to the ORB address. But if that's
>> the case we will never reach the code in question.
> Agree.
> 
>> So I suppose we can do better.
> As the comment said, I'm (still) in the state of 'wondering'.
> 
>>
>> Adding Ren. @Ren: Do you agree with my analysis. If you do,
>> I could come up with a proposal what to do -- after some reading.
> If you have a better idea, and time, why not? ;)
> 

We have basically two possibilities/options which ask for different
handling:
1) EFAULT is due to a bug in the vfio-ccw implementation
(can be QEMU or kernel).
2) EFAULT is due to buggy channel program.

Option 2) is basically to be handled with a channel-program check and
setting primary secondary and alert status. For reference see PoP page
15-59 ("Designation of Storage Area").  An exception may be an invalid
channel program address in the ORB. There the channel-program check ain't
explicitly stated (although) I would expect one. It may be implied by the
things on page 15-59 though.

Option 1) is however very similar to other we have figured out that the
implementation is broken situations and should be handled consequently.
The current state of the discussion is with a unit exception.

Does that make sense?

Now, Dong Jia, I need your help to figure out do we have option 1 or
option 2 here? After quick look at the kernel code, it appears to me that
I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
was really very superficial.

I would expect option 2 to be handled differently (kernel provides the
SCSW) though.

Regards,
Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-06 12:25             ` Halil Pasic
@ 2017-09-06 14:20               ` Cornelia Huck
  2017-09-06 14:43                 ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-06 14:20 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Xiao Feng Ren, Dong Jia Shi, Pierre Morel, qemu-devel

On Wed, 6 Sep 2017 14:25:13 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> We have basically two possibilities/options which ask for different
> handling:
> 1) EFAULT is due to a bug in the vfio-ccw implementation
> (can be QEMU or kernel).
> 2) EFAULT is due to buggy channel program.
> 
> Option 2) is basically to be handled with a channel-program check and
> setting primary secondary and alert status. For reference see PoP page
> 15-59 ("Designation of Storage Area").  An exception may be an invalid
> channel program address in the ORB. There the channel-program check ain't
> explicitly stated (although) I would expect one. It may be implied by the
> things on page 15-59 though.
> 
> Option 1) is however very similar to other we have figured out that the
> implementation is broken situations and should be handled consequently.
> The current state of the discussion is with a unit exception.
> 
> Does that make sense?

I think the situation is slightly different here, though. For the orb
flags, we reject something out of hand because we have not implemented
it, and for that, unit exception sounds like a good fit. Processing
errors, however, are more similar to errors in the hardware, and as
such can probably be reported via something like equipment check.

> 
> Now, Dong Jia, I need your help to figure out do we have option 1 or
> option 2 here? After quick look at the kernel code, it appears to me that
> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
> was really very superficial.
> 
> I would expect option 2 to be handled differently (kernel provides the
> SCSW) though.
> 
> Regards,
> Halil
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-06 14:20               ` Cornelia Huck
@ 2017-09-06 14:43                 ` Halil Pasic
  2017-09-07  8:58                   ` Dong Jia Shi
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-06 14:43 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel



On 09/06/2017 04:20 PM, Cornelia Huck wrote:
> On Wed, 6 Sep 2017 14:25:13 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> We have basically two possibilities/options which ask for different
>> handling:
>> 1) EFAULT is due to a bug in the vfio-ccw implementation
>> (can be QEMU or kernel).
>> 2) EFAULT is due to buggy channel program.
>>
>> Option 2) is basically to be handled with a channel-program check and
>> setting primary secondary and alert status. For reference see PoP page
>> 15-59 ("Designation of Storage Area").  An exception may be an invalid
>> channel program address in the ORB. There the channel-program check ain't
>> explicitly stated (although) I would expect one. It may be implied by the
>> things on page 15-59 though.
>>
>> Option 1) is however very similar to other we have figured out that the
>> implementation is broken situations and should be handled consequently.
>> The current state of the discussion is with a unit exception.
>>
>> Does that make sense?
> 
> I think the situation is slightly different here, though. For the orb
> flags, we reject something out of hand because we have not implemented
> it, and for that, unit exception sounds like a good fit. Processing
> errors, however, are more similar to errors in the hardware, and as
> such can probably be reported via something like equipment check.
> 

Noted. Let's see what Dong Jia has to say, before we continuing a
discussion on something (option 1) what may be irrelevant anyway.

>>
>> Now, Dong Jia, I need your help to figure out do we have option 1 or
>> option 2 here? After quick look at the kernel code, it appears to me that
>> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
>> was really very superficial.
>>
>> I would expect option 2 to be handled differently (kernel provides the
>> SCSW) though.
>>
>> Regards,
>> Halil
>>
> 
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-06 11:25                 ` Cornelia Huck
@ 2017-09-07  8:02                   ` Dong Jia Shi
  2017-09-07 11:01                     ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-07  8:02 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Halil Pasic, Pierre Morel, qemu-devel

* Cornelia Huck <cohuck@redhat.com> [2017-09-06 13:25:38 +0200]:

> On Wed, 6 Sep 2017 16:27:20 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
> > 
> > > 
> > > 
> > > On 09/05/2017 05:46 PM, Cornelia Huck wrote:  
> > > > On Tue, 5 Sep 2017 17:24:19 +0200
> > > > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > > >   
> > > >> My problem with a program check (indicated by SCSW word 2 bit 10) is
> > > >> that, in my reading of the architecture, the semantic behind it is: The
> > > >> channel subsystem (not the cu or device) has detected, that the 
> > > >> the channel program (previously submitted as an ORB) is erroneous. Which
> > > >> programs are erroneous is specified by the architecture. What we have
> > > >> here does not qualify.
> > > >>
> > > >> My idea was to rather blame the virtual hardware (device) and put no blame
> > > >> on the program nor he channel subsystem. This could be done using device
> > > >> status (unit check with command reject, maybe unit exception) or interface
> > > >> check. My train of thought was, the problem is not consistent across a
> > > >> device type, so it has to be device specific.  
> > > > 
> > > > Unit exception might be a better way to express what is happening here.
> > > > At least, it moves us away from cc 1 and not towards cc 3 :)
> > > >   
> > > 
> > > I will do a follow up patch pursuing device exception.
> > >   
> > > >>
> > > >> Of course blaming the device could mislead the person encountering the
> > > >> problem, and make him believe it's an non-virtual hardware problem.
> > > >>
> > > >> About the misleading, I think the best we can do is log out a message
> > > >> indicating what really happened.  
> > > > 
> > > > Just document it in the code? If it doesn't happen with Linux as a
> > > > guest, it is highly unlikely to be seen in the wild.
> > > >   
> > > 
> > > 
> > > Well we have two problems here:
> > > 1) Unit exception can be already defined by the device type for the
> > > command (reference: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
> > > I think this one is what you mean. And I agree that's best handled
> > > with comment in code.  
> > Using unit check, with bit 3 byte 0 of the sense data set to 1, to
> > indicate an 'Equipment check', sounds a bit more proper than unit
> > exception.
> 
> I don't agree: Equipment check sounds a lot more dire (and seems to
> imply a malfunction). I like unit exception better.
Got the point. Fair enough!

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-06 14:43                 ` Halil Pasic
@ 2017-09-07  8:58                   ` Dong Jia Shi
  2017-09-07 10:15                     ` Halil Pasic
  2017-09-07 10:24                     ` Cornelia Huck
  0 siblings, 2 replies; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-07  8:58 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel

* Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-06 16:43:42 +0200]:

> 
> 
> On 09/06/2017 04:20 PM, Cornelia Huck wrote:
> > On Wed, 6 Sep 2017 14:25:13 +0200
> > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > 
> >> We have basically two possibilities/options which ask for different
> >> handling:
> >> 1) EFAULT is due to a bug in the vfio-ccw implementation
> >> (can be QEMU or kernel).
> >> 2) EFAULT is due to buggy channel program.
> >>
> >> Option 2) is basically to be handled with a channel-program check and
> >> setting primary secondary and alert status. For reference see PoP page
> >> 15-59 ("Designation of Storage Area").  An exception may be an invalid
> >> channel program address in the ORB. There the channel-program check ain't
> >> explicitly stated (although) I would expect one. It may be implied by the
> >> things on page 15-59 though.
> >>
> >> Option 1) is however very similar to other we have figured out that the
> >> implementation is broken situations and should be handled consequently.
> >> The current state of the discussion is with a unit exception.
> >>
> >> Does that make sense?
> > 
> > I think the situation is slightly different here, though. For the orb
> > flags, we reject something out of hand because we have not implemented
> > it, and for that, unit exception sounds like a good fit. Processing
> > errors, however, are more similar to errors in the hardware, and as
> > such can probably be reported via something like equipment check.
> > 
> 
> Noted. Let's see what Dong Jia has to say, before we continuing a
> discussion on something (option 1) what may be irrelevant anyway.
> 
> >>
> >> Now, Dong Jia, I need your help to figure out do we have option 1 or
> >> option 2 here? After quick look at the kernel code, it appears to me that
> >> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
> >> was really very superficial.
There are three cases (all in the kernel) that generate a -EFAULT ret
code:
a. vfio_ccw_mdev_write: copy_from_user() fails.
  This is option 1.

b. ccwchain_fetch_tic
  It's mostly likely that the vfio-ccw driver processed the ccw chains
  wrongly. (Actually I can not think of any other reason.)
  This is option 1.

c. ccwchain_fetch_idal
  When we find that an IDAW contents an invalid address
  This is option 2.

> >>
> >> I would expect option 2 to be handled differently (kernel provides the
> >> SCSW) though.
> >>
> >> Regards,
> >> Halil
> >>
> > 
> > 

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-07  8:58                   ` Dong Jia Shi
@ 2017-09-07 10:15                     ` Halil Pasic
  2017-09-07 10:24                     ` Cornelia Huck
  1 sibling, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-09-07 10:15 UTC (permalink / raw)
  To: Cornelia Huck, Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel



On 09/07/2017 10:58 AM, Dong Jia Shi wrote:
> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-06 16:43:42 +0200]:
> 
>>
>>
>> On 09/06/2017 04:20 PM, Cornelia Huck wrote:
>>> On Wed, 6 Sep 2017 14:25:13 +0200
>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>
>>>> We have basically two possibilities/options which ask for different
>>>> handling:
>>>> 1) EFAULT is due to a bug in the vfio-ccw implementation
>>>> (can be QEMU or kernel).
>>>> 2) EFAULT is due to buggy channel program.
>>>>
>>>> Option 2) is basically to be handled with a channel-program check and
>>>> setting primary secondary and alert status. For reference see PoP page
>>>> 15-59 ("Designation of Storage Area").  An exception may be an invalid
>>>> channel program address in the ORB. There the channel-program check ain't
>>>> explicitly stated (although) I would expect one. It may be implied by the
>>>> things on page 15-59 though.
>>>>
>>>> Option 1) is however very similar to other we have figured out that the
>>>> implementation is broken situations and should be handled consequently.
>>>> The current state of the discussion is with a unit exception.
>>>>
>>>> Does that make sense?
>>>
>>> I think the situation is slightly different here, though. For the orb
>>> flags, we reject something out of hand because we have not implemented
>>> it, and for that, unit exception sounds like a good fit. Processing
>>> errors, however, are more similar to errors in the hardware, and as
>>> such can probably be reported via something like equipment check.
>>>
>>
>> Noted. Let's see what Dong Jia has to say, before we continuing a
>> discussion on something (option 1) what may be irrelevant anyway.
>>
>>>>
>>>> Now, Dong Jia, I need your help to figure out do we have option 1 or
>>>> option 2 here? After quick look at the kernel code, it appears to me that
>>>> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
>>>> was really very superficial.
> There are three cases (all in the kernel) that generate a -EFAULT ret
> code:
> a. vfio_ccw_mdev_write: copy_from_user() fails.
>   This is option 1.
> 
> b. ccwchain_fetch_tic
>   It's mostly likely that the vfio-ccw driver processed the ccw chains
>   wrongly. (Actually I can not think of any other reason.)
>   This is option 1.
> 
> c. ccwchain_fetch_idal
>   When we find that an IDAW contents an invalid address
>   This is option 2.
> 

So my fear was justified. If we can't tell if its option 1 or 2, and
currently we can't, I would say we shall trust our infrastructure and
blame the channel program: that boils down to channel-program check.
That's my assessment with the kernel driver being as-is.

If we are willing to change the vfio-ccw kernel driver, then generating
a response to option 2 conditions in the kernel (basically an SCSW) is
IMHO better. For instance we can do SCSW.cpa and SCWS.count properly.
AFAIR we are not allowed to present conditions that logically did not
happen (yet) -- for instance if we have per-fetched a bad CCW address
but the given ccw did not become active yet.

If we handle option 2 via a different channel (SCWS instead of return
code) then we could happily do the proper handling for option 1 here.

So Dong Jia the call is again yours to make!

Regards,
Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-07  8:58                   ` Dong Jia Shi
  2017-09-07 10:15                     ` Halil Pasic
@ 2017-09-07 10:24                     ` Cornelia Huck
  2017-09-07 11:32                       ` Halil Pasic
  1 sibling, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-07 10:24 UTC (permalink / raw)
  To: Dong Jia Shi; +Cc: Halil Pasic, Pierre Morel, Xiao Feng Ren, qemu-devel

On Thu, 7 Sep 2017 16:58:31 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-06 16:43:42 +0200]:
> 
> > 
> > 
> > On 09/06/2017 04:20 PM, Cornelia Huck wrote:  
> > > On Wed, 6 Sep 2017 14:25:13 +0200
> > > Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> > >   
> > >> We have basically two possibilities/options which ask for different
> > >> handling:
> > >> 1) EFAULT is due to a bug in the vfio-ccw implementation
> > >> (can be QEMU or kernel).
> > >> 2) EFAULT is due to buggy channel program.
> > >>
> > >> Option 2) is basically to be handled with a channel-program check and
> > >> setting primary secondary and alert status. For reference see PoP page
> > >> 15-59 ("Designation of Storage Area").  An exception may be an invalid
> > >> channel program address in the ORB. There the channel-program check ain't
> > >> explicitly stated (although) I would expect one. It may be implied by the
> > >> things on page 15-59 though.
> > >>
> > >> Option 1) is however very similar to other we have figured out that the
> > >> implementation is broken situations and should be handled consequently.
> > >> The current state of the discussion is with a unit exception.
> > >>
> > >> Does that make sense?  
> > > 
> > > I think the situation is slightly different here, though. For the orb
> > > flags, we reject something out of hand because we have not implemented
> > > it, and for that, unit exception sounds like a good fit. Processing
> > > errors, however, are more similar to errors in the hardware, and as
> > > such can probably be reported via something like equipment check.
> > >   
> > 
> > Noted. Let's see what Dong Jia has to say, before we continuing a
> > discussion on something (option 1) what may be irrelevant anyway.
> >   
> > >>
> > >> Now, Dong Jia, I need your help to figure out do we have option 1 or
> > >> option 2 here? After quick look at the kernel code, it appears to me that
> > >> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
> > >> was really very superficial.  
> There are three cases (all in the kernel) that generate a -EFAULT ret
> code:
> a. vfio_ccw_mdev_write: copy_from_user() fails.
>   This is option 1.
> 
> b. ccwchain_fetch_tic
>   It's mostly likely that the vfio-ccw driver processed the ccw chains
>   wrongly. (Actually I can not think of any other reason.)

Me neither, I'd consider hitting this a bug in the implementation.

>   This is option 1.
> 
> c. ccwchain_fetch_idal
>   When we find that an IDAW contents an invalid address
>   This is option 2.
> 
> > >>
> > >> I would expect option 2 to be handled differently (kernel provides the
> > >> SCSW) though.

So we would do an equipment check for the first two ("equipment", i.e.
the software, is malfunctioning) and use a more appropriate way for the
malformed idaw?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-07  8:02                   ` Dong Jia Shi
@ 2017-09-07 11:01                     ` Halil Pasic
  2017-09-13 10:08                       ` Cornelia Huck
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-07 11:01 UTC (permalink / raw)
  To: Cornelia Huck, Dong Jia Shi, Pierre Morel, qemu-devel



On 09/07/2017 10:02 AM, Dong Jia Shi wrote:
> * Cornelia Huck <cohuck@redhat.com> [2017-09-06 13:25:38 +0200]:
> 
>> On Wed, 6 Sep 2017 16:27:20 +0800
>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
>>
>>> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
>>>
>>>>
>>>>
>>>> On 09/05/2017 05:46 PM, Cornelia Huck wrote:  
>>>>> On Tue, 5 Sep 2017 17:24:19 +0200
>>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>>>   
>>>>>> My problem with a program check (indicated by SCSW word 2 bit 10) is
>>>>>> that, in my reading of the architecture, the semantic behind it is: The
>>>>>> channel subsystem (not the cu or device) has detected, that the 
>>>>>> the channel program (previously submitted as an ORB) is erroneous. Which
>>>>>> programs are erroneous is specified by the architecture. What we have
>>>>>> here does not qualify.
>>>>>>
>>>>>> My idea was to rather blame the virtual hardware (device) and put no blame
>>>>>> on the program nor he channel subsystem. This could be done using device
>>>>>> status (unit check with command reject, maybe unit exception) or interface
>>>>>> check. My train of thought was, the problem is not consistent across a
>>>>>> device type, so it has to be device specific.  
>>>>>
>>>>> Unit exception might be a better way to express what is happening here.
>>>>> At least, it moves us away from cc 1 and not towards cc 3 :)
>>>>>   
>>>>
>>>> I will do a follow up patch pursuing device exception.
>>>>   
>>>>>>
>>>>>> Of course blaming the device could mislead the person encountering the
>>>>>> problem, and make him believe it's an non-virtual hardware problem.
>>>>>>
>>>>>> About the misleading, I think the best we can do is log out a message
>>>>>> indicating what really happened.  
>>>>>
>>>>> Just document it in the code? If it doesn't happen with Linux as a
>>>>> guest, it is highly unlikely to be seen in the wild.
>>>>>   
>>>>
>>>>
>>>> Well we have two problems here:
>>>> 1) Unit exception can be already defined by the device type for the
>>>> command (reference: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
>>>> I think this one is what you mean. And I agree that's best handled
>>>> with comment in code.  
>>> Using unit check, with bit 3 byte 0 of the sense data set to 1, to
>>> indicate an 'Equipment check', sounds a bit more proper than unit
>>> exception.
>>
>> I don't agree: Equipment check sounds a lot more dire (and seems to
>> imply a malfunction). I like unit exception better.
> Got the point. Fair enough!
> 

I do see some benefit in doing unit check over unit exception. Just
kept quite to see the discussion unfold. As already said, unit exception
seems to be something reserved for the device type to define in a more
or less arbitrary but unambiguous way. I agreed to use this, because
I trust Connie's assessment about not really being used by the
devices in the wild (obviously nothing changed here).

If we consider the semantic of unit check with command reject, it's
a surprisingly good match: basically device detected a programming
error (which can not be detected by the channel-subsystem because it
is device (type) specific). For reference see:
http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.7.2.1?DT=19920904110920

IMHO that's almost exactly what we have here: the channel-program
is good from the perspective of the channel subsystem, but the device
can't deal with it. So we would not lie that the device is at fault
(was Connie's concern initially) but we would not lie about having
a generally invalid channel program (was my concern).

So how about an unit check with a command reject? (The only problem
I see is is on the device vs device type plane -- but that ain't better
for unit exception.)

Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-07 10:24                     ` Cornelia Huck
@ 2017-09-07 11:32                       ` Halil Pasic
  2017-09-07 11:41                         ` Cornelia Huck
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-07 11:32 UTC (permalink / raw)
  To: Cornelia Huck, Dong Jia Shi; +Cc: Pierre Morel, Xiao Feng Ren, qemu-devel



On 09/07/2017 12:24 PM, Cornelia Huck wrote:
> On Thu, 7 Sep 2017 16:58:31 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
>> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-06 16:43:42 +0200]:
>>
>>>
>>>
>>> On 09/06/2017 04:20 PM, Cornelia Huck wrote:  
>>>> On Wed, 6 Sep 2017 14:25:13 +0200
>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>>   
>>>>> We have basically two possibilities/options which ask for different
>>>>> handling:
>>>>> 1) EFAULT is due to a bug in the vfio-ccw implementation
>>>>> (can be QEMU or kernel).
>>>>> 2) EFAULT is due to buggy channel program.
>>>>>
>>>>> Option 2) is basically to be handled with a channel-program check and
>>>>> setting primary secondary and alert status. For reference see PoP page
>>>>> 15-59 ("Designation of Storage Area").  An exception may be an invalid
>>>>> channel program address in the ORB. There the channel-program check ain't
>>>>> explicitly stated (although) I would expect one. It may be implied by the
>>>>> things on page 15-59 though.
>>>>>
>>>>> Option 1) is however very similar to other we have figured out that the
>>>>> implementation is broken situations and should be handled consequently.
>>>>> The current state of the discussion is with a unit exception.
>>>>>
>>>>> Does that make sense?  
>>>>
>>>> I think the situation is slightly different here, though. For the orb
>>>> flags, we reject something out of hand because we have not implemented
>>>> it, and for that, unit exception sounds like a good fit. Processing
>>>> errors, however, are more similar to errors in the hardware, and as
>>>> such can probably be reported via something like equipment check.
>>>>   
>>>
>>> Noted. Let's see what Dong Jia has to say, before we continuing a
>>> discussion on something (option 1) what may be irrelevant anyway.
>>>   
>>>>>
>>>>> Now, Dong Jia, I need your help to figure out do we have option 1 or
>>>>> option 2 here? After quick look at the kernel code, it appears to me that
>>>>> I've seen both option 1 and option 2 (I'm afraid) -- but my assessment
>>>>> was really very superficial.  
>> There are three cases (all in the kernel) that generate a -EFAULT ret
>> code:
>> a. vfio_ccw_mdev_write: copy_from_user() fails.
>>   This is option 1.
>>
>> b. ccwchain_fetch_tic
>>   It's mostly likely that the vfio-ccw driver processed the ccw chains
>>   wrongly. (Actually I can not think of any other reason.)
> 
> Me neither, I'd consider hitting this a bug in the implementation.

Nod.

> 
>>   This is option 1.
>>
>> c. ccwchain_fetch_idal
>>   When we find that an IDAW contents an invalid address
>>   This is option 2.
>>
>>>>>
>>>>> I would expect option 2 to be handled differently (kernel provides the
>>>>> SCSW) though.
> 
> So we would do an equipment check for the first two ("equipment", i.e.
> the software, is malfunctioning) and use a more appropriate way for the
> malformed idaw?
> 

You have probably missed my previous email where I state something very
similar (MID  <2aa8cf98-c331-fe5a-0a7e-1a553c6c5054@linux.vnet.ibm.com>).
There I say: if we change the kernel code, yes, if we don't I prefer a
program check.


Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-07 11:32                       ` Halil Pasic
@ 2017-09-07 11:41                         ` Cornelia Huck
  2017-09-08  3:41                           ` Dong Jia Shi
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-07 11:41 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel

On Thu, 7 Sep 2017 13:32:41 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 09/07/2017 12:24 PM, Cornelia Huck wrote:

> > So we would do an equipment check for the first two ("equipment", i.e.
> > the software, is malfunctioning) and use a more appropriate way for the
> > malformed idaw?
> >   
> 
> You have probably missed my previous email where I state something very
> similar (MID  <2aa8cf98-c331-fe5a-0a7e-1a553c6c5054@linux.vnet.ibm.com>).
> There I say: if we change the kernel code, yes, if we don't I prefer a
> program check.

Not missed, they crossed in mid-air.

But I agree, the decision is Dong Jia's.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-07 11:41                         ` Cornelia Huck
@ 2017-09-08  3:41                           ` Dong Jia Shi
  2017-09-08  9:21                             ` Halil Pasic
  2017-09-08 10:02                             ` Cornelia Huck
  0 siblings, 2 replies; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-08  3:41 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel

* Cornelia Huck <cohuck@redhat.com> [2017-09-07 13:41:34 +0200]:

> On Thu, 7 Sep 2017 13:32:41 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
> > On 09/07/2017 12:24 PM, Cornelia Huck wrote:
> 
> > > So we would do an equipment check for the first two ("equipment", i.e.
> > > the software, is malfunctioning) and use a more appropriate way for the
> > > malformed idaw?
> > >   
> > 
> > You have probably missed my previous email where I state something very
> > similar (MID  <2aa8cf98-c331-fe5a-0a7e-1a553c6c5054@linux.vnet.ibm.com>).
> > There I say: if we change the kernel code, yes, if we don't I prefer a
> > program check.
> 
> Not missed, they crossed in mid-air.
> 
> But I agree, the decision is Dong Jia's.
> 
Let' me summarize here, in case I misunderstand things. Now we have
two ways to choose:

A. Kernel: no change.
   Qemu  : handle -EFAULT as option 2 by generating a program check.

B. Kernel: return -EFAULT
           +
           update the IRB area in the I/O region for option 1 to present
           a unit check SCSW (with proper sense byte ECW), and for option
           2 to present a program check.
   Qemu  : handle -EFAULT according to the information that the IRB area
           provided.

I think the difficult part is in the Qemu side. For either A or B, in
Qemu, we'd need to return a cc0 to indicate the channel program is
accepted successfully by the device, and then update the virtual IRB and
inject an I/O interrupt to notify the guest.

The question is, now we need to map -EFAULT to cc0? This is too odd...
And we need to find a proper place to do the interrupt injection. I
guess it could be in the sch_handle_start_func_passthrough().

If we do not like such modification in the Qemu side, then A is not the
way to go.

And for B, we need to update the IRB area of the I/O region in the three
cases listed in the former mail, and:
1. We need to set the ret_code as 0 for them, so that Qemu could map it
   to cc0.
2. We need to signal Qemu to fetch the IRB we generated with the checks.
All of these are doable I think.

Any comment for the above thoughts?

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-08  3:41                           ` Dong Jia Shi
@ 2017-09-08  9:21                             ` Halil Pasic
  2017-09-08  9:59                               ` Cornelia Huck
  2017-09-08 10:02                             ` Cornelia Huck
  1 sibling, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-08  9:21 UTC (permalink / raw)
  To: Cornelia Huck, Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel



On 09/08/2017 05:41 AM, Dong Jia Shi wrote:
> Let' me summarize here, in case I misunderstand things. Now we have
> two ways to choose:
> 
> A. Kernel: no change.
>    Qemu  : handle -EFAULT as option 2 by generating a program check.
> 
> B. Kernel: return -EFAULT
>            +
>            update the IRB area in the I/O region for option 1 to present
>            a unit check SCSW (with proper sense byte ECW), and for option
>            2 to present a program check.
>    Qemu  : handle -EFAULT according to the information that the IRB area
>            provided.

This is not what I was trying to say. You got my message regarding A, but
B was supposed to be understood like this.

Keep the current handling for option 1, that is return -EFAULT. For option
2 do what the spec says, execute the program until the bad address and then
generate a program-check (SCSW) once the bad stuff has it's turn. Thus
the only change in QEMU would be handling -EFAULT with an unit check (because
now it's just option 1).

Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-08  9:21                             ` Halil Pasic
@ 2017-09-08  9:59                               ` Cornelia Huck
  2017-09-25  7:31                                 ` Dong Jia Shi
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-08  9:59 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel

On Fri, 8 Sep 2017 11:21:57 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 09/08/2017 05:41 AM, Dong Jia Shi wrote:
> > Let' me summarize here, in case I misunderstand things. Now we have
> > two ways to choose:
> > 
> > A. Kernel: no change.
> >    Qemu  : handle -EFAULT as option 2 by generating a program check.
> > 
> > B. Kernel: return -EFAULT
> >            +
> >            update the IRB area in the I/O region for option 1 to present
> >            a unit check SCSW (with proper sense byte ECW), and for option
> >            2 to present a program check.
> >    Qemu  : handle -EFAULT according to the information that the IRB area
> >            provided.  
> 
> This is not what I was trying to say. You got my message regarding A, but
> B was supposed to be understood like this.
> 
> Keep the current handling for option 1, that is return -EFAULT. For option
> 2 do what the spec says, execute the program until the bad address and then
> generate a program-check (SCSW) once the bad stuff has it's turn. Thus
> the only change in QEMU would be handling -EFAULT with an unit check (because
> now it's just option 1).

That makes sense to me.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-08  3:41                           ` Dong Jia Shi
  2017-09-08  9:21                             ` Halil Pasic
@ 2017-09-08 10:02                             ` Cornelia Huck
  2017-09-25  7:14                               ` Dong Jia Shi
  1 sibling, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-08 10:02 UTC (permalink / raw)
  To: Dong Jia Shi; +Cc: Halil Pasic, Pierre Morel, Xiao Feng Ren, qemu-devel

On Fri, 8 Sep 2017 11:41:00 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> I think the difficult part is in the Qemu side. For either A or B, in
> Qemu, we'd need to return a cc0 to indicate the channel program is
> accepted successfully by the device, and then update the virtual IRB and
> inject an I/O interrupt to notify the guest.
> 
> The question is, now we need to map -EFAULT to cc0? This is too odd...

It's not odd at all, if you consider these as two stages:

- Initial acceptance of the command, set cc 0 to indicate you got it.
- Execution of the start function. This can then fail, of course.

> And we need to find a proper place to do the interrupt injection. I
> guess it could be in the sch_handle_start_func_passthrough().

Probably, yes.

> 
> If we do not like such modification in the Qemu side, then A is not the
> way to go.
> 
> And for B, we need to update the IRB area of the I/O region in the three
> cases listed in the former mail, and:
> 1. We need to set the ret_code as 0 for them, so that Qemu could map it
>    to cc0.
> 2. We need to signal Qemu to fetch the IRB we generated with the checks.
> All of these are doable I think.
> 
> Any comment for the above thoughts?
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-07 11:01                     ` Halil Pasic
@ 2017-09-13 10:08                       ` Cornelia Huck
  2017-09-13 14:05                         ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Cornelia Huck @ 2017-09-13 10:08 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel

On Thu, 7 Sep 2017 13:01:34 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 09/07/2017 10:02 AM, Dong Jia Shi wrote:
> > * Cornelia Huck <cohuck@redhat.com> [2017-09-06 13:25:38 +0200]:
> >   
> >> On Wed, 6 Sep 2017 16:27:20 +0800
> >> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >>  
> >>> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
> >>>  
> >>>>
> >>>>
> >>>> On 09/05/2017 05:46 PM, Cornelia Huck wrote:    
> >>>>> On Tue, 5 Sep 2017 17:24:19 +0200
> >>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >>>>>     
> >>>>>> My problem with a program check (indicated by SCSW word 2 bit 10) is
> >>>>>> that, in my reading of the architecture, the semantic behind it is: The
> >>>>>> channel subsystem (not the cu or device) has detected, that the 
> >>>>>> the channel program (previously submitted as an ORB) is erroneous. Which
> >>>>>> programs are erroneous is specified by the architecture. What we have
> >>>>>> here does not qualify.
> >>>>>>
> >>>>>> My idea was to rather blame the virtual hardware (device) and put no blame
> >>>>>> on the program nor he channel subsystem. This could be done using device
> >>>>>> status (unit check with command reject, maybe unit exception) or interface
> >>>>>> check. My train of thought was, the problem is not consistent across a
> >>>>>> device type, so it has to be device specific.    
> >>>>>
> >>>>> Unit exception might be a better way to express what is happening here.
> >>>>> At least, it moves us away from cc 1 and not towards cc 3 :)
> >>>>>     
> >>>>
> >>>> I will do a follow up patch pursuing device exception.
> >>>>     
> >>>>>>
> >>>>>> Of course blaming the device could mislead the person encountering the
> >>>>>> problem, and make him believe it's an non-virtual hardware problem.
> >>>>>>
> >>>>>> About the misleading, I think the best we can do is log out a message
> >>>>>> indicating what really happened.    
> >>>>>
> >>>>> Just document it in the code? If it doesn't happen with Linux as a
> >>>>> guest, it is highly unlikely to be seen in the wild.
> >>>>>     
> >>>>
> >>>>
> >>>> Well we have two problems here:
> >>>> 1) Unit exception can be already defined by the device type for the
> >>>> command (reference: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
> >>>> I think this one is what you mean. And I agree that's best handled
> >>>> with comment in code.    
> >>> Using unit check, with bit 3 byte 0 of the sense data set to 1, to
> >>> indicate an 'Equipment check', sounds a bit more proper than unit
> >>> exception.  
> >>
> >> I don't agree: Equipment check sounds a lot more dire (and seems to
> >> imply a malfunction). I like unit exception better.  
> > Got the point. Fair enough!
> >   
> 
> I do see some benefit in doing unit check over unit exception. Just
> kept quite to see the discussion unfold. As already said, unit exception
> seems to be something reserved for the device type to define in a more
> or less arbitrary but unambiguous way. I agreed to use this, because
> I trust Connie's assessment about not really being used by the
> devices in the wild (obviously nothing changed here).
> 
> If we consider the semantic of unit check with command reject, it's
> a surprisingly good match: basically device detected a programming
> error (which can not be detected by the channel-subsystem because it
> is device (type) specific). For reference see:
> http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.7.2.1?DT=19920904110920
> 
> IMHO that's almost exactly what we have here: the channel-program
> is good from the perspective of the channel subsystem, but the device
> can't deal with it. So we would not lie that the device is at fault
> (was Connie's concern initially) but we would not lie about having
> a generally invalid channel program (was my concern).
> 
> So how about an unit check with a command reject? (The only problem
> I see is is on the device vs device type plane -- but that ain't better
> for unit exception.)

I don't know, it feels a bit weird if I look at the cases where I saw
command reject in the wild before, even if seems to agree with the
architecture... but just a gut feeling.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
  2017-09-13 10:08                       ` Cornelia Huck
@ 2017-09-13 14:05                         ` Halil Pasic
  0 siblings, 0 replies; 60+ messages in thread
From: Halil Pasic @ 2017-09-13 14:05 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dong Jia Shi, Pierre Morel, qemu-devel



On 09/13/2017 12:08 PM, Cornelia Huck wrote:
> On Thu, 7 Sep 2017 13:01:34 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
>> On 09/07/2017 10:02 AM, Dong Jia Shi wrote:
>>> * Cornelia Huck <cohuck@redhat.com> [2017-09-06 13:25:38 +0200]:
>>>   
>>>> On Wed, 6 Sep 2017 16:27:20 +0800
>>>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
>>>>  
>>>>> * Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-05 19:20:43 +0200]:
>>>>>  
>>>>>>
>>>>>>
>>>>>> On 09/05/2017 05:46 PM, Cornelia Huck wrote:    
>>>>>>> On Tue, 5 Sep 2017 17:24:19 +0200
>>>>>>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>>>>>>     
>>>>>>>> My problem with a program check (indicated by SCSW word 2 bit 10) is
>>>>>>>> that, in my reading of the architecture, the semantic behind it is: The
>>>>>>>> channel subsystem (not the cu or device) has detected, that the 
>>>>>>>> the channel program (previously submitted as an ORB) is erroneous. Which
>>>>>>>> programs are erroneous is specified by the architecture. What we have
>>>>>>>> here does not qualify.
>>>>>>>>
>>>>>>>> My idea was to rather blame the virtual hardware (device) and put no blame
>>>>>>>> on the program nor he channel subsystem. This could be done using device
>>>>>>>> status (unit check with command reject, maybe unit exception) or interface
>>>>>>>> check. My train of thought was, the problem is not consistent across a
>>>>>>>> device type, so it has to be device specific.    
>>>>>>>
>>>>>>> Unit exception might be a better way to express what is happening here.
>>>>>>> At least, it moves us away from cc 1 and not towards cc 3 :)
>>>>>>>     
>>>>>>
>>>>>> I will do a follow up patch pursuing device exception.
>>>>>>     
>>>>>>>>
>>>>>>>> Of course blaming the device could mislead the person encountering the
>>>>>>>> problem, and make him believe it's an non-virtual hardware problem.
>>>>>>>>
>>>>>>>> About the misleading, I think the best we can do is log out a message
>>>>>>>> indicating what really happened.    
>>>>>>>
>>>>>>> Just document it in the code? If it doesn't happen with Linux as a
>>>>>>> guest, it is highly unlikely to be seen in the wild.
>>>>>>>     
>>>>>>
>>>>>>
>>>>>> Well we have two problems here:
>>>>>> 1) Unit exception can be already defined by the device type for the
>>>>>> command (reference: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.6.10?DT=19920904110920).
>>>>>> I think this one is what you mean. And I agree that's best handled
>>>>>> with comment in code.    
>>>>> Using unit check, with bit 3 byte 0 of the sense data set to 1, to
>>>>> indicate an 'Equipment check', sounds a bit more proper than unit
>>>>> exception.  
>>>>
>>>> I don't agree: Equipment check sounds a lot more dire (and seems to
>>>> imply a malfunction). I like unit exception better.  
>>> Got the point. Fair enough!
>>>   
>>
>> I do see some benefit in doing unit check over unit exception. Just
>> kept quite to see the discussion unfold. As already said, unit exception
>> seems to be something reserved for the device type to define in a more
>> or less arbitrary but unambiguous way. I agreed to use this, because
>> I trust Connie's assessment about not really being used by the
>> devices in the wild (obviously nothing changed here).
>>
>> If we consider the semantic of unit check with command reject, it's
>> a surprisingly good match: basically device detected a programming
>> error (which can not be detected by the channel-subsystem because it
>> is device (type) specific). For reference see:
>> http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar110/2.7.2.1?DT=19920904110920
>>
>> IMHO that's almost exactly what we have here: the channel-program
>> is good from the perspective of the channel subsystem, but the device
>> can't deal with it. So we would not lie that the device is at fault
>> (was Connie's concern initially) but we would not lie about having
>> a generally invalid channel program (was my concern).
>>
>> So how about an unit check with a command reject? (The only problem
>> I see is is on the device vs device type plane -- but that ain't better
>> for unit exception.)
> 
> I don't know, it feels a bit weird if I look at the cases where I saw
> command reject in the wild before, even if seems to agree with the
> architecture... but just a gut feeling.
> 

Then let's settle for unit exception for now. I will let this topic
(series) rest for a couple of days in favor of things like virtio-crypto
spec review, maybe IDA, and some other stuff. But I definitely intend
to pick this series up again.

Halil

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-08 10:02                             ` Cornelia Huck
@ 2017-09-25  7:14                               ` Dong Jia Shi
  0 siblings, 0 replies; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-25  7:14 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Dong Jia Shi, Halil Pasic, Pierre Morel, Xiao Feng Ren, qemu-devel

* Cornelia Huck <cohuck@redhat.com> [2017-09-08 12:02:38 +0200]:

> On Fri, 8 Sep 2017 11:41:00 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > I think the difficult part is in the Qemu side. For either A or B, in
> > Qemu, we'd need to return a cc0 to indicate the channel program is
> > accepted successfully by the device, and then update the virtual IRB and
> > inject an I/O interrupt to notify the guest.
> > 
> > The question is, now we need to map -EFAULT to cc0? This is too odd...
> 
> It's not odd at all, if you consider these as two stages:
> 
> - Initial acceptance of the command, set cc 0 to indicate you got it.
> - Execution of the start function. This can then fail, of course.
Ok. I got the idea!

> 
> > And we need to find a proper place to do the interrupt injection. I
> > guess it could be in the sch_handle_start_func_passthrough().
> 
> Probably, yes.
> 
> > 
> > If we do not like such modification in the Qemu side, then A is not the
> > way to go.
> > 
> > And for B, we need to update the IRB area of the I/O region in the three
> > cases listed in the former mail, and:
> > 1. We need to set the ret_code as 0 for them, so that Qemu could map it
> >    to cc0.
> > 2. We need to signal Qemu to fetch the IRB we generated with the checks.
> > All of these are doable I think.
> > 
> > Any comment for the above thoughts?
> > 
> 

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-08  9:59                               ` Cornelia Huck
@ 2017-09-25  7:31                                 ` Dong Jia Shi
  2017-09-25 10:57                                   ` Halil Pasic
  0 siblings, 1 reply; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-25  7:31 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Dong Jia Shi, Pierre Morel, Xiao Feng Ren, qemu-devel

* Cornelia Huck <cohuck@redhat.com> [2017-09-08 11:59:50 +0200]:

> On Fri, 8 Sep 2017 11:21:57 +0200
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> 
> > On 09/08/2017 05:41 AM, Dong Jia Shi wrote:
> > > Let' me summarize here, in case I misunderstand things. Now we have
> > > two ways to choose:
> > > 
> > > A. Kernel: no change.
> > >    Qemu  : handle -EFAULT as option 2 by generating a program check.
> > > 
> > > B. Kernel: return -EFAULT
> > >            +
> > >            update the IRB area in the I/O region for option 1 to present
> > >            a unit check SCSW (with proper sense byte ECW), and for option
> > >            2 to present a program check.
> > >    Qemu  : handle -EFAULT according to the information that the IRB area
> > >            provided.  
> > 
> > This is not what I was trying to say. You got my message regarding A, but
> > B was supposed to be understood like this.
> > 
> > Keep the current handling for option 1, that is return -EFAULT. For option
> > 2 do what the spec says, execute the program until the bad address and then
> > generate a program-check (SCSW) once the bad stuff has it's turn. Thus
> > the only change in QEMU would be handling -EFAULT with an unit check (because
> > now it's just option 1).
Let me adding some context information here by copying some words from the
previous mail in this thread:
The only option 2 case in the kernel is ccwchain_fetch_idal() finding a
bad idaw_iova.

What you propose to do for this case is (correct me if I get it wrong):
In ccwchain_fetch_idal(), we do not return -EFAULT, instead we return 0,
and issuing the incompletely translated channel program with the bad
address to the physical device. And QEMU will eventually get the SCSW
with the program-check from the physical device I/O result, and inject
it to guest for further handling.

Is this understanding right? If so, I'm fine with that, and I can
provide the fix in the kernel.

> 
> That makes sense to me.
> 

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-25  7:31                                 ` Dong Jia Shi
@ 2017-09-25 10:57                                   ` Halil Pasic
  2017-09-27  7:55                                     ` Dong Jia Shi
  0 siblings, 1 reply; 60+ messages in thread
From: Halil Pasic @ 2017-09-25 10:57 UTC (permalink / raw)
  To: qemu-devel



On 09/25/2017 09:31 AM, Dong Jia Shi wrote:
> * Cornelia Huck <cohuck@redhat.com> [2017-09-08 11:59:50 +0200]:
> 
>> On Fri, 8 Sep 2017 11:21:57 +0200
>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>
>>> On 09/08/2017 05:41 AM, Dong Jia Shi wrote:
>>>> Let' me summarize here, in case I misunderstand things. Now we have
>>>> two ways to choose:
>>>>
>>>> A. Kernel: no change.
>>>>    Qemu  : handle -EFAULT as option 2 by generating a program check.
>>>>
>>>> B. Kernel: return -EFAULT
>>>>            +
>>>>            update the IRB area in the I/O region for option 1 to present
>>>>            a unit check SCSW (with proper sense byte ECW), and for option
>>>>            2 to present a program check.
>>>>    Qemu  : handle -EFAULT according to the information that the IRB area
>>>>            provided.  
>>>
>>> This is not what I was trying to say. You got my message regarding A, but
>>> B was supposed to be understood like this.
>>>
>>> Keep the current handling for option 1, that is return -EFAULT. For option
>>> 2 do what the spec says, execute the program until the bad address and then
>>> generate a program-check (SCSW) once the bad stuff has it's turn. Thus
>>> the only change in QEMU would be handling -EFAULT with an unit check (because
>>> now it's just option 1).
> Let me adding some context information here by copying some words from the
> previous mail in this thread:
> The only option 2 case in the kernel is ccwchain_fetch_idal() finding a
> bad idaw_iova.
> 
> What you propose to do for this case is (correct me if I get it wrong):
> In ccwchain_fetch_idal(), we do not return -EFAULT, instead we return 0,
> and issuing the incompletely translated channel program with the bad
> address to the physical device. And QEMU will eventually get the SCSW
> with the program-check from the physical device I/O result, and inject
> it to guest for further handling.
> 

I guess that would be the cleanest. I would also be fine with not making
the physical device program-check (issuing a shortened channel program,
and doing the program check in software) but that's probably more
complicated to implement.

> Is this understanding right? If so, I'm fine with that, and I can
> provide the fix in the kernel.
> 

That would be nice.

>>
>> That makes sense to me.
>>
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH
  2017-09-25 10:57                                   ` Halil Pasic
@ 2017-09-27  7:55                                     ` Dong Jia Shi
  0 siblings, 0 replies; 60+ messages in thread
From: Dong Jia Shi @ 2017-09-27  7:55 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dong Jia Shi, Cornelia Huck, Pierre Morel, qemu-devel

* Halil Pasic <pasic@linux.vnet.ibm.com> [2017-09-25 12:57:31 +0200]:

[restored Cc:]

> 
> 
> On 09/25/2017 09:31 AM, Dong Jia Shi wrote:
> > * Cornelia Huck <cohuck@redhat.com> [2017-09-08 11:59:50 +0200]:
> > 
> >> On Fri, 8 Sep 2017 11:21:57 +0200
> >> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> >>
> >>> On 09/08/2017 05:41 AM, Dong Jia Shi wrote:
> >>>> Let' me summarize here, in case I misunderstand things. Now we have
> >>>> two ways to choose:
> >>>>
> >>>> A. Kernel: no change.
> >>>>    Qemu  : handle -EFAULT as option 2 by generating a program check.
> >>>>
> >>>> B. Kernel: return -EFAULT
> >>>>            +
> >>>>            update the IRB area in the I/O region for option 1 to present
> >>>>            a unit check SCSW (with proper sense byte ECW), and for option
> >>>>            2 to present a program check.
> >>>>    Qemu  : handle -EFAULT according to the information that the IRB area
> >>>>            provided.  
> >>>
> >>> This is not what I was trying to say. You got my message regarding A, but
> >>> B was supposed to be understood like this.
> >>>
> >>> Keep the current handling for option 1, that is return -EFAULT. For option
> >>> 2 do what the spec says, execute the program until the bad address and then
> >>> generate a program-check (SCSW) once the bad stuff has it's turn. Thus
> >>> the only change in QEMU would be handling -EFAULT with an unit check (because
> >>> now it's just option 1).
> > Let me adding some context information here by copying some words from the
> > previous mail in this thread:
> > The only option 2 case in the kernel is ccwchain_fetch_idal() finding a
> > bad idaw_iova.
> > 
> > What you propose to do for this case is (correct me if I get it wrong):
> > In ccwchain_fetch_idal(), we do not return -EFAULT, instead we return 0,
> > and issuing the incompletely translated channel program with the bad
> > address to the physical device. And QEMU will eventually get the SCSW
> > with the program-check from the physical device I/O result, and inject
> > it to guest for further handling.
> > 
> 
> I guess that would be the cleanest. I would also be fine with not making
> the physical device program-check (issuing a shortened channel program,
> and doing the program check in software) but that's probably more
> complicated to implement.
That's far more complicated. I will try the simple approach.

> > Is this understanding right? If so, I'm fine with that, and I can
> > provide the fix in the kernel.
> > 
> 
> That would be nice.
Ok.

-- 
Dong Jia Shi

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2017-09-27  7:55 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-30 16:36 [Qemu-devel] [PATCH 0/9] Halil Pasic
2017-08-30 16:36 ` [Qemu-devel] [PATCH 1/9] s390x/css: fix cc handling for XSCH Halil Pasic
2017-08-31  5:51   ` Thomas Huth
2017-08-31  6:38     ` Cornelia Huck
2017-08-31  7:32       ` Thomas Huth
2017-08-31  8:42         ` Cornelia Huck
2017-08-31 10:19           ` Halil Pasic
2017-08-31  9:09     ` Halil Pasic
2017-08-31  9:16       ` Thomas Huth
2017-08-30 16:36 ` [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH Halil Pasic
2017-08-31  7:50   ` Thomas Huth
2017-08-31 10:54     ` Halil Pasic
2017-08-31  9:19   ` Cornelia Huck
2017-08-31 10:41     ` Halil Pasic
2017-09-05  8:02       ` Cornelia Huck
2017-09-05 15:24         ` Halil Pasic
2017-09-05 15:46           ` Cornelia Huck
2017-09-05 17:20             ` Halil Pasic
2017-09-06  8:27               ` Dong Jia Shi
2017-09-06 11:25                 ` Cornelia Huck
2017-09-07  8:02                   ` Dong Jia Shi
2017-09-07 11:01                     ` Halil Pasic
2017-09-13 10:08                       ` Cornelia Huck
2017-09-13 14:05                         ` Halil Pasic
2017-09-06 11:37               ` Cornelia Huck
2017-09-06  8:37             ` Dong Jia Shi
2017-09-06 11:38               ` Cornelia Huck
2017-08-30 16:36 ` [Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair Halil Pasic
2017-08-31  6:10   ` Thomas Huth
2017-08-31  7:44     ` Thomas Huth
2017-08-31  9:33   ` Cornelia Huck
2017-08-30 16:36 ` [Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH Halil Pasic
2017-08-31  9:55   ` Cornelia Huck
2017-09-05 15:55     ` Halil Pasic
2017-09-05 16:25       ` Cornelia Huck
2017-09-05 22:30         ` Halil Pasic
2017-09-06  4:31           ` Dong Jia Shi
2017-09-06 12:25             ` Halil Pasic
2017-09-06 14:20               ` Cornelia Huck
2017-09-06 14:43                 ` Halil Pasic
2017-09-07  8:58                   ` Dong Jia Shi
2017-09-07 10:15                     ` Halil Pasic
2017-09-07 10:24                     ` Cornelia Huck
2017-09-07 11:32                       ` Halil Pasic
2017-09-07 11:41                         ` Cornelia Huck
2017-09-08  3:41                           ` Dong Jia Shi
2017-09-08  9:21                             ` Halil Pasic
2017-09-08  9:59                               ` Cornelia Huck
2017-09-25  7:31                                 ` Dong Jia Shi
2017-09-25 10:57                                   ` Halil Pasic
2017-09-27  7:55                                     ` Dong Jia Shi
2017-09-08 10:02                             ` Cornelia Huck
2017-09-25  7:14                               ` Dong Jia Shi
2017-08-30 16:36 ` [Qemu-devel] [PATCH 5/9] s390x: refactor error handling for XSCH handler Halil Pasic
2017-08-30 16:36 ` [Qemu-devel] [PATCH 6/9] s390x: refactor error handling for CSCH handler Halil Pasic
2017-08-30 16:36 ` [Qemu-devel] [PATCH 7/9] s390x: refactor error handling for HSCH handler Halil Pasic
2017-08-30 16:36 ` [Qemu-devel] [PATCH 8/9] s390x: refactor error handling for MSCH handler Halil Pasic
2017-08-30 16:36 ` [Qemu-devel] [PATCH 9/9] s390x: factor out common ioinst handler logic Halil Pasic
2017-08-31 10:04 ` [Qemu-devel] [PATCH 0/9] Cornelia Huck
2017-08-31 10:43   ` Halil Pasic

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.