All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] vfio-ccw: support hsch/csch (kernel part)
@ 2019-01-21 11:03 ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, Cornelia Huck, Alex Williamson, qemu-devel, qemu-s390x

[This is the Linux kernel part, git tree is available at
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-eagain-caps

The companion QEMU patches are available at
https://github.com/cohuck/qemu vfio-ccw-caps]

Currently, vfio-ccw only relays START SUBCHANNEL requests to the real
device. This tends to work well for the most common 'good path' scenarios;
however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like
clearing pending requests at the device is currently not supported.
This may be a problem for e.g. error recovery.

This patch series introduces capabilities (similar to what vfio-pci uses)
and exposes a new async region for handling hsch/csch.

New in v2 are two patches dealing with concurrency.

Lightly tested (I can interact with a dasd as before, and reserve/release
seems to work well.) Not sure if there is a better way to test this, ideas
welcome.

Changes v1->v2:
- New patch 1: make it safe to use the cp accessors at any time; this
  should avoid problems with unsolicited interrupt handling
- New patch 2: handle concurrent accesses to the io region; the idea is
  to return -EAGAIN to userspace more often (so it can simply retry)
- also handle concurrent accesses to the async io region
- change VFIO_REGION_TYPE_CCW
- merge events for halt and clear to a single async event; this turned out
  to make the code quite a bit simpler
- probably some small changes I forgot to note down


Cornelia Huck (5):
  vfio-ccw: make it safe to access channel programs
  vfio-ccw: concurrent I/O handling
  vfio-ccw: add capabilities chain
  s390/cio: export hsch to modules
  vfio-ccw: add handling for async channel instructions

 drivers/s390/cio/Makefile           |   3 +-
 drivers/s390/cio/ioasm.c            |   1 +
 drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++
 drivers/s390/cio/vfio_ccw_cp.c      |   3 +
 drivers/s390/cio/vfio_ccw_cp.h      |   2 +
 drivers/s390/cio/vfio_ccw_drv.c     |  46 ++++--
 drivers/s390/cio/vfio_ccw_fsm.c     | 122 +++++++++++++++-
 drivers/s390/cio/vfio_ccw_ops.c     | 211 +++++++++++++++++++++++-----
 drivers/s390/cio/vfio_ccw_private.h |  45 ++++++
 include/uapi/linux/vfio.h           |   4 +
 include/uapi/linux/vfio_ccw.h       |  12 ++
 11 files changed, 487 insertions(+), 53 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_async.c

-- 
2.17.2

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [Qemu-devel] [PATCH v2 0/5] vfio-ccw: support hsch/csch (kernel part)
@ 2019-01-21 11:03 ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson, Cornelia Huck

[This is the Linux kernel part, git tree is available at
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-eagain-caps

The companion QEMU patches are available at
https://github.com/cohuck/qemu vfio-ccw-caps]

Currently, vfio-ccw only relays START SUBCHANNEL requests to the real
device. This tends to work well for the most common 'good path' scenarios;
however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like
clearing pending requests at the device is currently not supported.
This may be a problem for e.g. error recovery.

This patch series introduces capabilities (similar to what vfio-pci uses)
and exposes a new async region for handling hsch/csch.

New in v2 are two patches dealing with concurrency.

Lightly tested (I can interact with a dasd as before, and reserve/release
seems to work well.) Not sure if there is a better way to test this, ideas
welcome.

Changes v1->v2:
- New patch 1: make it safe to use the cp accessors at any time; this
  should avoid problems with unsolicited interrupt handling
- New patch 2: handle concurrent accesses to the io region; the idea is
  to return -EAGAIN to userspace more often (so it can simply retry)
- also handle concurrent accesses to the async io region
- change VFIO_REGION_TYPE_CCW
- merge events for halt and clear to a single async event; this turned out
  to make the code quite a bit simpler
- probably some small changes I forgot to note down


Cornelia Huck (5):
  vfio-ccw: make it safe to access channel programs
  vfio-ccw: concurrent I/O handling
  vfio-ccw: add capabilities chain
  s390/cio: export hsch to modules
  vfio-ccw: add handling for async channel instructions

 drivers/s390/cio/Makefile           |   3 +-
 drivers/s390/cio/ioasm.c            |   1 +
 drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++
 drivers/s390/cio/vfio_ccw_cp.c      |   3 +
 drivers/s390/cio/vfio_ccw_cp.h      |   2 +
 drivers/s390/cio/vfio_ccw_drv.c     |  46 ++++--
 drivers/s390/cio/vfio_ccw_fsm.c     | 122 +++++++++++++++-
 drivers/s390/cio/vfio_ccw_ops.c     | 211 +++++++++++++++++++++++-----
 drivers/s390/cio/vfio_ccw_private.h |  45 ++++++
 include/uapi/linux/vfio.h           |   4 +
 include/uapi/linux/vfio_ccw.h       |  12 ++
 11 files changed, 487 insertions(+), 53 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_async.c

-- 
2.17.2

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 1/5] vfio-ccw: make it safe to access channel programs
  2019-01-21 11:03 ` [Qemu-devel] " Cornelia Huck
@ 2019-01-21 11:03   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, Cornelia Huck, Alex Williamson, qemu-devel, qemu-s390x

When we get a solicited interrupt, the start function may have
been cleared by a csch, but we still have a channel program
structure allocated. Make it safe to call the cp accessors in
any case, so we can call them unconditionally.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 3 +++
 drivers/s390/cio/vfio_ccw_cp.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 70a006ba4d05..714987ceea9a 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp)
 	struct ccwchain *chain, *temp;
 	int i;
 
+	cp->initialized = false;
 	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
 		for (i = 0; i < chain->ch_len; i++) {
 			pfn_array_table_unpin_free(chain->ch_pat + i,
@@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	 */
 	cp->orb.cmd.c64 = 1;
 
+	cp->initialized = true;
+
 	return ret;
 }
 
diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
index a4b74fb1aa57..3c20cd208da5 100644
--- a/drivers/s390/cio/vfio_ccw_cp.h
+++ b/drivers/s390/cio/vfio_ccw_cp.h
@@ -21,6 +21,7 @@
  * @ccwchain_list: list head of ccwchains
  * @orb: orb for the currently processed ssch request
  * @mdev: the mediated device to perform page pinning/unpinning
+ * @initialized: whether this instance is actually initialized
  *
  * @ccwchain_list is the head of a ccwchain list, that contents the
  * translated result of the guest channel program that pointed out by
@@ -30,6 +31,7 @@ struct channel_program {
 	struct list_head ccwchain_list;
 	union orb orb;
 	struct device *mdev;
+	bool initialized;
 };
 
 extern int cp_init(struct channel_program *cp, struct device *mdev,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [Qemu-devel] [PATCH v2 1/5] vfio-ccw: make it safe to access channel programs
@ 2019-01-21 11:03   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson, Cornelia Huck

When we get a solicited interrupt, the start function may have
been cleared by a csch, but we still have a channel program
structure allocated. Make it safe to call the cp accessors in
any case, so we can call them unconditionally.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 3 +++
 drivers/s390/cio/vfio_ccw_cp.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 70a006ba4d05..714987ceea9a 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp)
 	struct ccwchain *chain, *temp;
 	int i;
 
+	cp->initialized = false;
 	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
 		for (i = 0; i < chain->ch_len; i++) {
 			pfn_array_table_unpin_free(chain->ch_pat + i,
@@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	 */
 	cp->orb.cmd.c64 = 1;
 
+	cp->initialized = true;
+
 	return ret;
 }
 
diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
index a4b74fb1aa57..3c20cd208da5 100644
--- a/drivers/s390/cio/vfio_ccw_cp.h
+++ b/drivers/s390/cio/vfio_ccw_cp.h
@@ -21,6 +21,7 @@
  * @ccwchain_list: list head of ccwchains
  * @orb: orb for the currently processed ssch request
  * @mdev: the mediated device to perform page pinning/unpinning
+ * @initialized: whether this instance is actually initialized
  *
  * @ccwchain_list is the head of a ccwchain list, that contents the
  * translated result of the guest channel program that pointed out by
@@ -30,6 +31,7 @@ struct channel_program {
 	struct list_head ccwchain_list;
 	union orb orb;
 	struct device *mdev;
+	bool initialized;
 };
 
 extern int cp_init(struct channel_program *cp, struct device *mdev,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-21 11:03 ` [Qemu-devel] " Cornelia Huck
@ 2019-01-21 11:03   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, Cornelia Huck, Alex Williamson, qemu-devel, qemu-s390x

Rework handling of multiple I/O requests to return -EAGAIN if
we are already processing an I/O request. Introduce a mutex
to disallow concurrent writes to the I/O region.

The expectation is that userspace simply retries the operation
if it gets -EAGAIN.

We currently don't allow multiple ssch requests at the same
time, as we don't have support for keeping channel programs
around for more than one request.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_drv.c     |  1 +
 drivers/s390/cio/vfio_ccw_fsm.c     |  8 +++-----
 drivers/s390/cio/vfio_ccw_ops.c     | 31 +++++++++++++++++++----------
 drivers/s390/cio/vfio_ccw_private.h |  2 ++
 4 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index a10cec0e86eb..2ef189fe45ed 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -125,6 +125,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 
 	private->sch = sch;
 	dev_set_drvdata(&sch->dev, private);
+	mutex_init(&private->io_mutex);
 
 	spin_lock_irq(sch->lock);
 	private->state = VFIO_CCW_STATE_NOT_OPER;
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index cab17865aafe..f6ed934cc565 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -28,7 +28,6 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 	sch = private->sch;
 
 	spin_lock_irqsave(sch->lock, flags);
-	private->state = VFIO_CCW_STATE_BUSY;
 
 	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
 
@@ -42,6 +41,8 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 		 */
 		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
 		ret = 0;
+		/* Don't allow another ssch for now */
+		private->state = VFIO_CCW_STATE_BUSY;
 		break;
 	case 1:		/* Status pending */
 	case 2:		/* Busy */
@@ -99,7 +100,7 @@ static void fsm_io_error(struct vfio_ccw_private *private,
 static void fsm_io_busy(struct vfio_ccw_private *private,
 			enum vfio_ccw_event event)
 {
-	private->io_region->ret_code = -EBUSY;
+	private->io_region->ret_code = -EAGAIN;
 }
 
 static void fsm_disabled_irq(struct vfio_ccw_private *private,
@@ -130,8 +131,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 	struct mdev_device *mdev = private->mdev;
 	char *errstr = "request";
 
-	private->state = VFIO_CCW_STATE_BUSY;
-
 	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
 
 	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
@@ -176,7 +175,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 	}
 
 err_out:
-	private->state = VFIO_CCW_STATE_IDLE;
 	trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
 			       io_region->ret_code, errstr);
 }
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index f673e106c041..3fa9fc570400 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
 {
 	struct vfio_ccw_private *private;
 	struct ccw_io_region *region;
+	int ret;
 
 	if (*ppos + count > sizeof(*region))
 		return -EINVAL;
 
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	mutex_lock(&private->io_mutex);
 	region = private->io_region;
 	if (copy_to_user(buf, (void *)region + *ppos, count))
-		return -EFAULT;
-
-	return count;
+		ret = -EFAULT;
+	else
+		ret = count;
+	mutex_unlock(&private->io_mutex);
+	return ret;
 }
 
 static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
@@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 {
 	struct vfio_ccw_private *private;
 	struct ccw_io_region *region;
+	int ret;
 
 	if (*ppos + count > sizeof(*region))
 		return -EINVAL;
 
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
-	if (private->state != VFIO_CCW_STATE_IDLE)
+	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
+	    private->state == VFIO_CCW_STATE_STANDBY)
 		return -EACCES;
+	if (!mutex_trylock(&private->io_mutex))
+		return -EAGAIN;
 
 	region = private->io_region;
-	if (copy_from_user((void *)region + *ppos, buf, count))
-		return -EFAULT;
+	if (copy_from_user((void *)region + *ppos, buf, count)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
 
 	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
-	if (region->ret_code != 0) {
-		private->state = VFIO_CCW_STATE_IDLE;
-		return region->ret_code;
-	}
+	ret = (region->ret_code != 0) ? region->ret_code : count;
 
-	return count;
+out_unlock:
+	mutex_unlock(&private->io_mutex);
+	return ret;
 }
 
 static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 08e9a7dc9176..e88237697f83 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -28,6 +28,7 @@
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
+ * @io_mutex: protect against concurrent update of I/O structures
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
@@ -42,6 +43,7 @@ struct vfio_ccw_private {
 	struct mdev_device	*mdev;
 	struct notifier_block	nb;
 	struct ccw_io_region	*io_region;
+	struct mutex		io_mutex;
 
 	struct channel_program	cp;
 	struct irb		irb;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-21 11:03   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson, Cornelia Huck

Rework handling of multiple I/O requests to return -EAGAIN if
we are already processing an I/O request. Introduce a mutex
to disallow concurrent writes to the I/O region.

The expectation is that userspace simply retries the operation
if it gets -EAGAIN.

We currently don't allow multiple ssch requests at the same
time, as we don't have support for keeping channel programs
around for more than one request.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_drv.c     |  1 +
 drivers/s390/cio/vfio_ccw_fsm.c     |  8 +++-----
 drivers/s390/cio/vfio_ccw_ops.c     | 31 +++++++++++++++++++----------
 drivers/s390/cio/vfio_ccw_private.h |  2 ++
 4 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index a10cec0e86eb..2ef189fe45ed 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -125,6 +125,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 
 	private->sch = sch;
 	dev_set_drvdata(&sch->dev, private);
+	mutex_init(&private->io_mutex);
 
 	spin_lock_irq(sch->lock);
 	private->state = VFIO_CCW_STATE_NOT_OPER;
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index cab17865aafe..f6ed934cc565 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -28,7 +28,6 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 	sch = private->sch;
 
 	spin_lock_irqsave(sch->lock, flags);
-	private->state = VFIO_CCW_STATE_BUSY;
 
 	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
 
@@ -42,6 +41,8 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 		 */
 		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
 		ret = 0;
+		/* Don't allow another ssch for now */
+		private->state = VFIO_CCW_STATE_BUSY;
 		break;
 	case 1:		/* Status pending */
 	case 2:		/* Busy */
@@ -99,7 +100,7 @@ static void fsm_io_error(struct vfio_ccw_private *private,
 static void fsm_io_busy(struct vfio_ccw_private *private,
 			enum vfio_ccw_event event)
 {
-	private->io_region->ret_code = -EBUSY;
+	private->io_region->ret_code = -EAGAIN;
 }
 
 static void fsm_disabled_irq(struct vfio_ccw_private *private,
@@ -130,8 +131,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 	struct mdev_device *mdev = private->mdev;
 	char *errstr = "request";
 
-	private->state = VFIO_CCW_STATE_BUSY;
-
 	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
 
 	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
@@ -176,7 +175,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 	}
 
 err_out:
-	private->state = VFIO_CCW_STATE_IDLE;
 	trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
 			       io_region->ret_code, errstr);
 }
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index f673e106c041..3fa9fc570400 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
 {
 	struct vfio_ccw_private *private;
 	struct ccw_io_region *region;
+	int ret;
 
 	if (*ppos + count > sizeof(*region))
 		return -EINVAL;
 
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	mutex_lock(&private->io_mutex);
 	region = private->io_region;
 	if (copy_to_user(buf, (void *)region + *ppos, count))
-		return -EFAULT;
-
-	return count;
+		ret = -EFAULT;
+	else
+		ret = count;
+	mutex_unlock(&private->io_mutex);
+	return ret;
 }
 
 static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
@@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 {
 	struct vfio_ccw_private *private;
 	struct ccw_io_region *region;
+	int ret;
 
 	if (*ppos + count > sizeof(*region))
 		return -EINVAL;
 
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
-	if (private->state != VFIO_CCW_STATE_IDLE)
+	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
+	    private->state == VFIO_CCW_STATE_STANDBY)
 		return -EACCES;
+	if (!mutex_trylock(&private->io_mutex))
+		return -EAGAIN;
 
 	region = private->io_region;
-	if (copy_from_user((void *)region + *ppos, buf, count))
-		return -EFAULT;
+	if (copy_from_user((void *)region + *ppos, buf, count)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
 
 	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
-	if (region->ret_code != 0) {
-		private->state = VFIO_CCW_STATE_IDLE;
-		return region->ret_code;
-	}
+	ret = (region->ret_code != 0) ? region->ret_code : count;
 
-	return count;
+out_unlock:
+	mutex_unlock(&private->io_mutex);
+	return ret;
 }
 
 static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 08e9a7dc9176..e88237697f83 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -28,6 +28,7 @@
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
+ * @io_mutex: protect against concurrent update of I/O structures
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
@@ -42,6 +43,7 @@ struct vfio_ccw_private {
 	struct mdev_device	*mdev;
 	struct notifier_block	nb;
 	struct ccw_io_region	*io_region;
+	struct mutex		io_mutex;
 
 	struct channel_program	cp;
 	struct irb		irb;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 3/5] vfio-ccw: add capabilities chain
  2019-01-21 11:03 ` [Qemu-devel] " Cornelia Huck
@ 2019-01-21 11:03   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, Cornelia Huck, Alex Williamson, qemu-devel, qemu-s390x

Allow to extend the regions used by vfio-ccw. The first user will be
handling of halt and clear subchannel.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
 drivers/s390/cio/vfio_ccw_private.h |  38 ++++++
 include/uapi/linux/vfio.h           |   2 +
 3 files changed, 195 insertions(+), 26 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 3fa9fc570400..5a89d09f9271 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -3,9 +3,11 @@
  * Physical device callbacks for vfio_ccw
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
  *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #include <linux/vfio.h>
@@ -157,27 +159,33 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev)
 {
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
+	int i;
 
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
 				 &private->nb);
+
+	for (i = 0; i < private->num_regions; i++)
+		private->region[i].ops->release(private, &private->region[i]);
+
+	private->num_regions = 0;
+	kfree(private->region);
+	private->region = NULL;
 }
 
-static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
-				  char __user *buf,
-				  size_t count,
-				  loff_t *ppos)
+static ssize_t vfio_ccw_mdev_read_io_region(struct vfio_ccw_private *private,
+					    char __user *buf, size_t count,
+					    loff_t *ppos)
 {
-	struct vfio_ccw_private *private;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
 	struct ccw_io_region *region;
 	int ret;
 
-	if (*ppos + count > sizeof(*region))
+	if (pos + count > sizeof(*region))
 		return -EINVAL;
 
-	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	mutex_lock(&private->io_mutex);
 	region = private->io_region;
-	if (copy_to_user(buf, (void *)region + *ppos, count))
+	if (copy_to_user(buf, (void *)region + pos, count))
 		ret = -EFAULT;
 	else
 		ret = count;
@@ -185,19 +193,42 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
 	return ret;
 }
 
-static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
-				   const char __user *buf,
-				   size_t count,
-				   loff_t *ppos)
+static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
+				  char __user *buf,
+				  size_t count,
+				  loff_t *ppos)
 {
+	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
 	struct vfio_ccw_private *private;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+
+	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
+		return -EINVAL;
+
+	switch (index) {
+	case VFIO_CCW_CONFIG_REGION_INDEX:
+		return vfio_ccw_mdev_read_io_region(private, buf, count, ppos);
+	default:
+		index -= VFIO_CCW_NUM_REGIONS;
+		return private->region[index].ops->read(private, buf, count,
+							ppos);
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t vfio_ccw_mdev_write_io_region(struct vfio_ccw_private *private,
+					     const char __user *buf,
+					     size_t count, loff_t *ppos)
+{
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
 	struct ccw_io_region *region;
 	int ret;
 
-	if (*ppos + count > sizeof(*region))
+	if (pos + count > sizeof(*region))
 		return -EINVAL;
 
-	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
 	    private->state == VFIO_CCW_STATE_STANDBY)
 		return -EACCES;
@@ -205,7 +236,7 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 		return -EAGAIN;
 
 	region = private->io_region;
-	if (copy_from_user((void *)region + *ppos, buf, count)) {
+	if (copy_from_user((void *)region + pos, buf, count)) {
 		ret = -EFAULT;
 		goto out_unlock;
 	}
@@ -218,19 +249,52 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	return ret;
 }
 
-static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
+static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
+				   const char __user *buf,
+				   size_t count,
+				   loff_t *ppos)
+{
+	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
+	struct vfio_ccw_private *private;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+
+	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
+		return -EINVAL;
+
+	switch (index) {
+	case VFIO_CCW_CONFIG_REGION_INDEX:
+		return vfio_ccw_mdev_write_io_region(private, buf, count, ppos);
+	default:
+		index -= VFIO_CCW_NUM_REGIONS;
+		return private->region[index].ops->write(private, buf, count,
+							 ppos);
+	}
+
+	return -EINVAL;
+}
+
+static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info,
+					 struct mdev_device *mdev)
 {
+	struct vfio_ccw_private *private;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
-	info->num_regions = VFIO_CCW_NUM_REGIONS;
+	info->num_regions = VFIO_CCW_NUM_REGIONS + private->num_regions;
 	info->num_irqs = VFIO_CCW_NUM_IRQS;
 
 	return 0;
 }
 
 static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
-					 u16 *cap_type_id,
-					 void **cap_type)
+					 struct mdev_device *mdev,
+					 unsigned long arg)
 {
+	struct vfio_ccw_private *private;
+	int i;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	switch (info->index) {
 	case VFIO_CCW_CONFIG_REGION_INDEX:
 		info->offset = 0;
@@ -238,9 +302,51 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
 		info->flags = VFIO_REGION_INFO_FLAG_READ
 			      | VFIO_REGION_INFO_FLAG_WRITE;
 		return 0;
-	default:
-		return -EINVAL;
+	default: /* all other regions are handled via capability chain */
+	{
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		struct vfio_region_info_cap_type cap_type = {
+			.header.id = VFIO_REGION_INFO_CAP_TYPE,
+			.header.version = 1 };
+		int ret;
+
+		if (info->index >=
+		    VFIO_CCW_NUM_REGIONS + private->num_regions)
+			return -EINVAL;
+
+		i = info->index - VFIO_CCW_NUM_REGIONS;
+
+		info->offset = VFIO_CCW_INDEX_TO_OFFSET(info->index);
+		info->size = private->region[i].size;
+		info->flags = private->region[i].flags;
+
+		cap_type.type = private->region[i].type;
+		cap_type.subtype = private->region[i].subtype;
+
+		ret = vfio_info_add_capability(&caps, &cap_type.header,
+					       sizeof(cap_type));
+		if (ret)
+			return ret;
+
+		info->flags |= VFIO_REGION_INFO_FLAG_CAPS;
+		if (info->argsz < sizeof(*info) + caps.size) {
+			info->argsz = sizeof(*info) + caps.size;
+			info->cap_offset = 0;
+		} else {
+			vfio_info_cap_shift(&caps, sizeof(*info));
+			if (copy_to_user((void __user *)arg + sizeof(*info),
+					 caps.buf, caps.size)) {
+				kfree(caps.buf);
+				return -EFAULT;
+			}
+			info->cap_offset = sizeof(*info);
+		}
+
+		kfree(caps.buf);
+
+	}
 	}
+	return 0;
 }
 
 static int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
@@ -317,6 +423,32 @@ static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
 	}
 }
 
+int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
+				 unsigned int subtype,
+				 const struct vfio_ccw_regops *ops,
+				 size_t size, u32 flags, void *data)
+{
+	struct vfio_ccw_region *region;
+
+	region = krealloc(private->region,
+			  (private->num_regions + 1) * sizeof(*region),
+			  GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
+
+	private->region = region;
+	private->region[private->num_regions].type = VFIO_REGION_TYPE_CCW;
+	private->region[private->num_regions].subtype = subtype;
+	private->region[private->num_regions].ops = ops;
+	private->region[private->num_regions].size = size;
+	private->region[private->num_regions].flags = flags;
+	private->region[private->num_regions].data = data;
+
+	private->num_regions++;
+
+	return 0;
+}
+
 static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 				   unsigned int cmd,
 				   unsigned long arg)
@@ -337,7 +469,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 		if (info.argsz < minsz)
 			return -EINVAL;
 
-		ret = vfio_ccw_mdev_get_device_info(&info);
+		ret = vfio_ccw_mdev_get_device_info(&info, mdev);
 		if (ret)
 			return ret;
 
@@ -346,8 +478,6 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 	case VFIO_DEVICE_GET_REGION_INFO:
 	{
 		struct vfio_region_info info;
-		u16 cap_type_id = 0;
-		void *cap_type = NULL;
 
 		minsz = offsetofend(struct vfio_region_info, offset);
 
@@ -357,8 +487,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 		if (info.argsz < minsz)
 			return -EINVAL;
 
-		ret = vfio_ccw_mdev_get_region_info(&info, &cap_type_id,
-						    &cap_type);
+		ret = vfio_ccw_mdev_get_region_info(&info, mdev, arg);
 		if (ret)
 			return ret;
 
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index e88237697f83..20e75f4f3695 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -3,9 +3,11 @@
  * Private stuff for vfio_ccw driver
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
  *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #ifndef _VFIO_CCW_PRIVATE_H_
@@ -19,6 +21,38 @@
 #include "css.h"
 #include "vfio_ccw_cp.h"
 
+#define VFIO_CCW_OFFSET_SHIFT   40
+#define VFIO_CCW_OFFSET_TO_INDEX(off)	(off >> VFIO_CCW_OFFSET_SHIFT)
+#define VFIO_CCW_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_CCW_OFFSET_SHIFT)
+#define VFIO_CCW_OFFSET_MASK	(((u64)(1) << VFIO_CCW_OFFSET_SHIFT) - 1)
+
+/* capability chain handling similar to vfio-pci */
+struct vfio_ccw_private;
+struct vfio_ccw_region;
+
+struct vfio_ccw_regops {
+	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
+			size_t count, loff_t *ppos);
+	size_t	(*write)(struct vfio_ccw_private *private,
+			 const char __user *buf, size_t count, loff_t *ppos);
+	void	(*release)(struct vfio_ccw_private *private,
+			   struct vfio_ccw_region *region);
+};
+
+struct vfio_ccw_region {
+	u32				type;
+	u32				subtype;
+	const struct vfio_ccw_regops	*ops;
+	void				*data;
+	size_t				size;
+	u32				flags;
+};
+
+int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
+				 unsigned int subtype,
+				 const struct vfio_ccw_regops *ops,
+				 size_t size, u32 flags, void *data);
+
 /**
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
@@ -29,6 +63,8 @@
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
  * @io_mutex: protect against concurrent update of I/O structures
+ * @region: additional regions for other subchannel operations
+ * @num_regions: number of additional regions
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
@@ -44,6 +80,8 @@ struct vfio_ccw_private {
 	struct notifier_block	nb;
 	struct ccw_io_region	*io_region;
 	struct mutex		io_mutex;
+	struct vfio_ccw_region *region;
+	int num_regions;
 
 	struct channel_program	cp;
 	struct irb		irb;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 02bb7ad6e986..56e2413d3e00 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -353,6 +353,8 @@ struct vfio_region_gfx_edid {
 #define VFIO_DEVICE_GFX_LINK_STATE_DOWN  2
 };
 
+#define VFIO_REGION_TYPE_CCW			(2)
+
 /*
  * 10de vendor sub-type
  *
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [Qemu-devel] [PATCH v2 3/5] vfio-ccw: add capabilities chain
@ 2019-01-21 11:03   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson, Cornelia Huck

Allow to extend the regions used by vfio-ccw. The first user will be
handling of halt and clear subchannel.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
 drivers/s390/cio/vfio_ccw_private.h |  38 ++++++
 include/uapi/linux/vfio.h           |   2 +
 3 files changed, 195 insertions(+), 26 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 3fa9fc570400..5a89d09f9271 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -3,9 +3,11 @@
  * Physical device callbacks for vfio_ccw
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
  *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #include <linux/vfio.h>
@@ -157,27 +159,33 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev)
 {
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
+	int i;
 
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
 				 &private->nb);
+
+	for (i = 0; i < private->num_regions; i++)
+		private->region[i].ops->release(private, &private->region[i]);
+
+	private->num_regions = 0;
+	kfree(private->region);
+	private->region = NULL;
 }
 
-static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
-				  char __user *buf,
-				  size_t count,
-				  loff_t *ppos)
+static ssize_t vfio_ccw_mdev_read_io_region(struct vfio_ccw_private *private,
+					    char __user *buf, size_t count,
+					    loff_t *ppos)
 {
-	struct vfio_ccw_private *private;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
 	struct ccw_io_region *region;
 	int ret;
 
-	if (*ppos + count > sizeof(*region))
+	if (pos + count > sizeof(*region))
 		return -EINVAL;
 
-	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	mutex_lock(&private->io_mutex);
 	region = private->io_region;
-	if (copy_to_user(buf, (void *)region + *ppos, count))
+	if (copy_to_user(buf, (void *)region + pos, count))
 		ret = -EFAULT;
 	else
 		ret = count;
@@ -185,19 +193,42 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
 	return ret;
 }
 
-static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
-				   const char __user *buf,
-				   size_t count,
-				   loff_t *ppos)
+static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
+				  char __user *buf,
+				  size_t count,
+				  loff_t *ppos)
 {
+	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
 	struct vfio_ccw_private *private;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+
+	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
+		return -EINVAL;
+
+	switch (index) {
+	case VFIO_CCW_CONFIG_REGION_INDEX:
+		return vfio_ccw_mdev_read_io_region(private, buf, count, ppos);
+	default:
+		index -= VFIO_CCW_NUM_REGIONS;
+		return private->region[index].ops->read(private, buf, count,
+							ppos);
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t vfio_ccw_mdev_write_io_region(struct vfio_ccw_private *private,
+					     const char __user *buf,
+					     size_t count, loff_t *ppos)
+{
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
 	struct ccw_io_region *region;
 	int ret;
 
-	if (*ppos + count > sizeof(*region))
+	if (pos + count > sizeof(*region))
 		return -EINVAL;
 
-	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
 	    private->state == VFIO_CCW_STATE_STANDBY)
 		return -EACCES;
@@ -205,7 +236,7 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 		return -EAGAIN;
 
 	region = private->io_region;
-	if (copy_from_user((void *)region + *ppos, buf, count)) {
+	if (copy_from_user((void *)region + pos, buf, count)) {
 		ret = -EFAULT;
 		goto out_unlock;
 	}
@@ -218,19 +249,52 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	return ret;
 }
 
-static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
+static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
+				   const char __user *buf,
+				   size_t count,
+				   loff_t *ppos)
+{
+	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
+	struct vfio_ccw_private *private;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+
+	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
+		return -EINVAL;
+
+	switch (index) {
+	case VFIO_CCW_CONFIG_REGION_INDEX:
+		return vfio_ccw_mdev_write_io_region(private, buf, count, ppos);
+	default:
+		index -= VFIO_CCW_NUM_REGIONS;
+		return private->region[index].ops->write(private, buf, count,
+							 ppos);
+	}
+
+	return -EINVAL;
+}
+
+static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info,
+					 struct mdev_device *mdev)
 {
+	struct vfio_ccw_private *private;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
-	info->num_regions = VFIO_CCW_NUM_REGIONS;
+	info->num_regions = VFIO_CCW_NUM_REGIONS + private->num_regions;
 	info->num_irqs = VFIO_CCW_NUM_IRQS;
 
 	return 0;
 }
 
 static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
-					 u16 *cap_type_id,
-					 void **cap_type)
+					 struct mdev_device *mdev,
+					 unsigned long arg)
 {
+	struct vfio_ccw_private *private;
+	int i;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	switch (info->index) {
 	case VFIO_CCW_CONFIG_REGION_INDEX:
 		info->offset = 0;
@@ -238,9 +302,51 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
 		info->flags = VFIO_REGION_INFO_FLAG_READ
 			      | VFIO_REGION_INFO_FLAG_WRITE;
 		return 0;
-	default:
-		return -EINVAL;
+	default: /* all other regions are handled via capability chain */
+	{
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		struct vfio_region_info_cap_type cap_type = {
+			.header.id = VFIO_REGION_INFO_CAP_TYPE,
+			.header.version = 1 };
+		int ret;
+
+		if (info->index >=
+		    VFIO_CCW_NUM_REGIONS + private->num_regions)
+			return -EINVAL;
+
+		i = info->index - VFIO_CCW_NUM_REGIONS;
+
+		info->offset = VFIO_CCW_INDEX_TO_OFFSET(info->index);
+		info->size = private->region[i].size;
+		info->flags = private->region[i].flags;
+
+		cap_type.type = private->region[i].type;
+		cap_type.subtype = private->region[i].subtype;
+
+		ret = vfio_info_add_capability(&caps, &cap_type.header,
+					       sizeof(cap_type));
+		if (ret)
+			return ret;
+
+		info->flags |= VFIO_REGION_INFO_FLAG_CAPS;
+		if (info->argsz < sizeof(*info) + caps.size) {
+			info->argsz = sizeof(*info) + caps.size;
+			info->cap_offset = 0;
+		} else {
+			vfio_info_cap_shift(&caps, sizeof(*info));
+			if (copy_to_user((void __user *)arg + sizeof(*info),
+					 caps.buf, caps.size)) {
+				kfree(caps.buf);
+				return -EFAULT;
+			}
+			info->cap_offset = sizeof(*info);
+		}
+
+		kfree(caps.buf);
+
+	}
 	}
+	return 0;
 }
 
 static int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
@@ -317,6 +423,32 @@ static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
 	}
 }
 
+int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
+				 unsigned int subtype,
+				 const struct vfio_ccw_regops *ops,
+				 size_t size, u32 flags, void *data)
+{
+	struct vfio_ccw_region *region;
+
+	region = krealloc(private->region,
+			  (private->num_regions + 1) * sizeof(*region),
+			  GFP_KERNEL);
+	if (!region)
+		return -ENOMEM;
+
+	private->region = region;
+	private->region[private->num_regions].type = VFIO_REGION_TYPE_CCW;
+	private->region[private->num_regions].subtype = subtype;
+	private->region[private->num_regions].ops = ops;
+	private->region[private->num_regions].size = size;
+	private->region[private->num_regions].flags = flags;
+	private->region[private->num_regions].data = data;
+
+	private->num_regions++;
+
+	return 0;
+}
+
 static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 				   unsigned int cmd,
 				   unsigned long arg)
@@ -337,7 +469,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 		if (info.argsz < minsz)
 			return -EINVAL;
 
-		ret = vfio_ccw_mdev_get_device_info(&info);
+		ret = vfio_ccw_mdev_get_device_info(&info, mdev);
 		if (ret)
 			return ret;
 
@@ -346,8 +478,6 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 	case VFIO_DEVICE_GET_REGION_INFO:
 	{
 		struct vfio_region_info info;
-		u16 cap_type_id = 0;
-		void *cap_type = NULL;
 
 		minsz = offsetofend(struct vfio_region_info, offset);
 
@@ -357,8 +487,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 		if (info.argsz < minsz)
 			return -EINVAL;
 
-		ret = vfio_ccw_mdev_get_region_info(&info, &cap_type_id,
-						    &cap_type);
+		ret = vfio_ccw_mdev_get_region_info(&info, mdev, arg);
 		if (ret)
 			return ret;
 
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index e88237697f83..20e75f4f3695 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -3,9 +3,11 @@
  * Private stuff for vfio_ccw driver
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
  *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #ifndef _VFIO_CCW_PRIVATE_H_
@@ -19,6 +21,38 @@
 #include "css.h"
 #include "vfio_ccw_cp.h"
 
+#define VFIO_CCW_OFFSET_SHIFT   40
+#define VFIO_CCW_OFFSET_TO_INDEX(off)	(off >> VFIO_CCW_OFFSET_SHIFT)
+#define VFIO_CCW_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_CCW_OFFSET_SHIFT)
+#define VFIO_CCW_OFFSET_MASK	(((u64)(1) << VFIO_CCW_OFFSET_SHIFT) - 1)
+
+/* capability chain handling similar to vfio-pci */
+struct vfio_ccw_private;
+struct vfio_ccw_region;
+
+struct vfio_ccw_regops {
+	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
+			size_t count, loff_t *ppos);
+	size_t	(*write)(struct vfio_ccw_private *private,
+			 const char __user *buf, size_t count, loff_t *ppos);
+	void	(*release)(struct vfio_ccw_private *private,
+			   struct vfio_ccw_region *region);
+};
+
+struct vfio_ccw_region {
+	u32				type;
+	u32				subtype;
+	const struct vfio_ccw_regops	*ops;
+	void				*data;
+	size_t				size;
+	u32				flags;
+};
+
+int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
+				 unsigned int subtype,
+				 const struct vfio_ccw_regops *ops,
+				 size_t size, u32 flags, void *data);
+
 /**
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
@@ -29,6 +63,8 @@
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
  * @io_mutex: protect against concurrent update of I/O structures
+ * @region: additional regions for other subchannel operations
+ * @num_regions: number of additional regions
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
@@ -44,6 +80,8 @@ struct vfio_ccw_private {
 	struct notifier_block	nb;
 	struct ccw_io_region	*io_region;
 	struct mutex		io_mutex;
+	struct vfio_ccw_region *region;
+	int num_regions;
 
 	struct channel_program	cp;
 	struct irb		irb;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 02bb7ad6e986..56e2413d3e00 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -353,6 +353,8 @@ struct vfio_region_gfx_edid {
 #define VFIO_DEVICE_GFX_LINK_STATE_DOWN  2
 };
 
+#define VFIO_REGION_TYPE_CCW			(2)
+
 /*
  * 10de vendor sub-type
  *
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 4/5] s390/cio: export hsch to modules
  2019-01-21 11:03 ` [Qemu-devel] " Cornelia Huck
@ 2019-01-21 11:03   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, Cornelia Huck, Alex Williamson, qemu-devel, qemu-s390x

The vfio-ccw code will need this, and it matches treatment of ssch
and csch.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/ioasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/s390/cio/ioasm.c b/drivers/s390/cio/ioasm.c
index 14d328338ce2..08eb10283b18 100644
--- a/drivers/s390/cio/ioasm.c
+++ b/drivers/s390/cio/ioasm.c
@@ -233,6 +233,7 @@ int hsch(struct subchannel_id schid)
 
 	return ccode;
 }
+EXPORT_SYMBOL(hsch);
 
 static inline int __xsch(struct subchannel_id schid)
 {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [Qemu-devel] [PATCH v2 4/5] s390/cio: export hsch to modules
@ 2019-01-21 11:03   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson, Cornelia Huck

The vfio-ccw code will need this, and it matches treatment of ssch
and csch.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/ioasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/s390/cio/ioasm.c b/drivers/s390/cio/ioasm.c
index 14d328338ce2..08eb10283b18 100644
--- a/drivers/s390/cio/ioasm.c
+++ b/drivers/s390/cio/ioasm.c
@@ -233,6 +233,7 @@ int hsch(struct subchannel_id schid)
 
 	return ccode;
 }
+EXPORT_SYMBOL(hsch);
 
 static inline int __xsch(struct subchannel_id schid)
 {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
  2019-01-21 11:03 ` [Qemu-devel] " Cornelia Huck
@ 2019-01-21 11:03   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, Cornelia Huck, Alex Williamson, qemu-devel, qemu-s390x

Add a region to the vfio-ccw device that can be used to submit
asynchronous I/O instructions. ssch continues to be handled by the
existing I/O region; the new region handles hsch and csch.

Interrupt status continues to be reported through the same channels
as for ssch.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/Makefile           |   3 +-
 drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_drv.c     |  45 +++++++----
 drivers/s390/cio/vfio_ccw_fsm.c     | 114 +++++++++++++++++++++++++++-
 drivers/s390/cio/vfio_ccw_ops.c     |  13 +++-
 drivers/s390/cio/vfio_ccw_private.h |   9 ++-
 include/uapi/linux/vfio.h           |   2 +
 include/uapi/linux/vfio_ccw.h       |  12 +++
 8 files changed, 269 insertions(+), 20 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_async.c

diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index f230516abb96..f6a8db04177c 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -20,5 +20,6 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o \
+	vfio_ccw_async.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_async.c b/drivers/s390/cio/vfio_ccw_async.c
new file mode 100644
index 000000000000..604806c2970f
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_async.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async I/O region for vfio_ccw
+ *
+ * Copyright Red Hat, Inc. 2019
+ *
+ * Author(s): Cornelia Huck <cohuck@redhat.com>
+ */
+
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+
+#include "vfio_ccw_private.h"
+
+static ssize_t vfio_ccw_async_region_read(struct vfio_ccw_private *private,
+					  char __user *buf, size_t count,
+					  loff_t *ppos)
+{
+	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
+	struct ccw_cmd_region *region;
+	int ret;
+
+	if (pos + count > sizeof(*region))
+		return -EINVAL;
+
+	mutex_lock(&private->io_mutex);
+	region = private->region[i].data;
+	if (copy_to_user(buf, (void *)region + pos, count))
+		ret = -EFAULT;
+	else
+		ret = count;
+	mutex_unlock(&private->io_mutex);
+	return ret;
+}
+
+static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private *private,
+					   const char __user *buf, size_t count,
+					   loff_t *ppos)
+{
+	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
+	struct ccw_cmd_region *region;
+	int ret;
+
+	if (pos + count > sizeof(*region))
+		return -EINVAL;
+
+	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
+	    private->state == VFIO_CCW_STATE_STANDBY)
+		return -EACCES;
+	if (!mutex_trylock(&private->io_mutex))
+		return -EAGAIN;
+
+	region = private->region[i].data;
+	if (copy_from_user((void *)region + pos, buf, count)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
+
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
+
+	ret = region->ret_code ? region->ret_code : count;
+
+out_unlock:
+	mutex_unlock(&private->io_mutex);
+	return ret;
+}
+
+static void vfio_ccw_async_region_release(struct vfio_ccw_private *private,
+					  struct vfio_ccw_region *region)
+{
+
+}
+
+const struct vfio_ccw_regops vfio_ccw_async_region_ops = {
+	.read = vfio_ccw_async_region_read,
+	.write = vfio_ccw_async_region_write,
+	.release = vfio_ccw_async_region_release,
+};
+
+int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private)
+{
+	return vfio_ccw_register_dev_region(private,
+					    VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD,
+					    &vfio_ccw_async_region_ops,
+					    sizeof(struct ccw_cmd_region),
+					    VFIO_REGION_INFO_FLAG_READ |
+					    VFIO_REGION_INFO_FLAG_WRITE,
+					    private->cmd_region);
+}
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 2ef189fe45ed..d807911b8ed5 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -3,9 +3,11 @@
  * VFIO based Physical Subchannel device driver
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
  *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #include <linux/module.h>
@@ -23,6 +25,7 @@
 
 struct workqueue_struct *vfio_ccw_work_q;
 static struct kmem_cache *vfio_ccw_io_region;
+static struct kmem_cache *vfio_ccw_cmd_region;
 
 /*
  * Helpers
@@ -104,7 +107,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 {
 	struct pmcw *pmcw = &sch->schib.pmcw;
 	struct vfio_ccw_private *private;
-	int ret;
+	int ret = -ENOMEM;
 
 	if (pmcw->qf) {
 		dev_warn(&sch->dev, "vfio: ccw: does not support QDIO: %s\n",
@@ -118,10 +121,13 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 
 	private->io_region = kmem_cache_zalloc(vfio_ccw_io_region,
 					       GFP_KERNEL | GFP_DMA);
-	if (!private->io_region) {
-		kfree(private);
-		return -ENOMEM;
-	}
+	if (!private->io_region)
+		goto out_free;
+
+	private->cmd_region = kmem_cache_zalloc(vfio_ccw_cmd_region,
+						GFP_KERNEL | GFP_DMA);
+	if (!private->cmd_region)
+		goto out_free;
 
 	private->sch = sch;
 	dev_set_drvdata(&sch->dev, private);
@@ -149,7 +155,10 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	cio_disable_subchannel(sch);
 out_free:
 	dev_set_drvdata(&sch->dev, NULL);
-	kmem_cache_free(vfio_ccw_io_region, private->io_region);
+	if (private->cmd_region)
+		kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
+	if (private->io_region)
+		kmem_cache_free(vfio_ccw_io_region, private->io_region);
 	kfree(private);
 	return ret;
 }
@@ -238,7 +247,7 @@ static struct css_driver vfio_ccw_sch_driver = {
 
 static int __init vfio_ccw_sch_init(void)
 {
-	int ret;
+	int ret = -ENOMEM;
 
 	vfio_ccw_work_q = create_singlethread_workqueue("vfio-ccw");
 	if (!vfio_ccw_work_q)
@@ -248,20 +257,30 @@ static int __init vfio_ccw_sch_init(void)
 					sizeof(struct ccw_io_region), 0,
 					SLAB_ACCOUNT, 0,
 					sizeof(struct ccw_io_region), NULL);
-	if (!vfio_ccw_io_region) {
-		destroy_workqueue(vfio_ccw_work_q);
-		return -ENOMEM;
-	}
+	if (!vfio_ccw_io_region)
+		goto out_err;
+
+	vfio_ccw_cmd_region = kmem_cache_create_usercopy("vfio_ccw_cmd_region",
+					sizeof(struct ccw_cmd_region), 0,
+					SLAB_ACCOUNT, 0,
+					sizeof(struct ccw_cmd_region), NULL);
+	if (!vfio_ccw_cmd_region)
+		goto out_err;
 
 	isc_register(VFIO_CCW_ISC);
 	ret = css_driver_register(&vfio_ccw_sch_driver);
 	if (ret) {
 		isc_unregister(VFIO_CCW_ISC);
-		kmem_cache_destroy(vfio_ccw_io_region);
-		destroy_workqueue(vfio_ccw_work_q);
+		goto out_err;
 	}
 
 	return ret;
+
+out_err:
+	kmem_cache_destroy(vfio_ccw_cmd_region);
+	kmem_cache_destroy(vfio_ccw_io_region);
+	destroy_workqueue(vfio_ccw_work_q);
+	return ret;
 }
 
 static void __exit vfio_ccw_sch_exit(void)
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index f6ed934cc565..72912d596181 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -3,8 +3,10 @@
  * Finite state machine for vfio-ccw device handling
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #include <linux/vfio.h>
@@ -69,6 +71,81 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 	return ret;
 }
 
+static int fsm_do_halt(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	unsigned long flags;
+	int ccode;
+	int ret;
+
+	sch = private->sch;
+
+	spin_lock_irqsave(sch->lock, flags);
+
+	/* Issue "Halt Subchannel" */
+	ccode = hsch(sch->schid);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl |= SCSW_ACTL_HALT_PEND;
+		ret = 0;
+		private->state = VFIO_CCW_STATE_BUSY;
+		break;
+	case 1:		/* Status pending */
+	case 2:		/* Busy */
+		ret = -EBUSY;
+		break;
+	case 3:		/* Device not operational */
+	{
+		ret = -ENODEV;
+		break;
+	}
+	default:
+		ret = ccode;
+	}
+	spin_unlock_irqrestore(sch->lock, flags);
+	return ret;
+}
+
+static int fsm_do_clear(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	unsigned long flags;
+	int ccode;
+	int ret;
+
+	sch = private->sch;
+
+	spin_lock_irqsave(sch->lock, flags);
+
+	/* Issue "Clear Subchannel" */
+	ccode = csch(sch->schid);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl = SCSW_ACTL_CLEAR_PEND;
+		/* TODO: check what else we might need to clear */
+		ret = 0;
+		private->state = VFIO_CCW_STATE_BUSY;
+		break;
+	case 3:		/* Device not operational */
+	{
+		ret = -ENODEV;
+		break;
+	}
+	default:
+		ret = ccode;
+	}
+	spin_unlock_irqrestore(sch->lock, flags);
+	return ret;
+}
+
 static void fsm_notoper(struct vfio_ccw_private *private,
 			enum vfio_ccw_event event)
 {
@@ -103,6 +180,14 @@ static void fsm_io_busy(struct vfio_ccw_private *private,
 	private->io_region->ret_code = -EAGAIN;
 }
 
+static void fsm_async_error(struct vfio_ccw_private *private,
+			    enum vfio_ccw_event event)
+{
+	pr_err("vfio-ccw: FSM: halt/clear request from state:%d\n",
+	       private->state);
+	private->cmd_region->ret_code = -EIO;
+}
+
 static void fsm_disabled_irq(struct vfio_ccw_private *private,
 			     enum vfio_ccw_event event)
 {
@@ -165,11 +250,11 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 		}
 		return;
 	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
-		/* XXX: Handle halt. */
+		/* halt is handled via the async cmd region */
 		io_region->ret_code = -EOPNOTSUPP;
 		goto err_out;
 	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
-		/* XXX: Handle clear. */
+		/* clear is handled via the async cmd region */
 		io_region->ret_code = -EOPNOTSUPP;
 		goto err_out;
 	}
@@ -179,6 +264,27 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 			       io_region->ret_code, errstr);
 }
 
+/*
+ * Deal with an async request from userspace.
+ */
+static void fsm_async_request(struct vfio_ccw_private *private,
+			      enum vfio_ccw_event event)
+{
+	struct ccw_cmd_region *cmd_region = private->cmd_region;
+
+	switch (cmd_region->command) {
+	case VFIO_CCW_ASYNC_CMD_HSCH:
+		cmd_region->ret_code = fsm_do_halt(private);
+		break;
+	case VFIO_CCW_ASYNC_CMD_CSCH:
+		cmd_region->ret_code = fsm_do_clear(private);
+		break;
+	default:
+		/* should not happen? */
+		cmd_region->ret_code = -EINVAL;
+	}
+}
+
 /*
  * Got an interrupt for a normal io (state busy).
  */
@@ -202,21 +308,25 @@ fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
 	[VFIO_CCW_STATE_NOT_OPER] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_nop,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
 	},
 	[VFIO_CCW_STATE_STANDBY] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 	},
 	[VFIO_CCW_STATE_IDLE] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_request,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 	},
 	[VFIO_CCW_STATE_BUSY] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 	},
 };
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 5a89d09f9271..755806cb8d53 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -148,11 +148,20 @@ static int vfio_ccw_mdev_open(struct mdev_device *mdev)
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
 	unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
+	int ret;
 
 	private->nb.notifier_call = vfio_ccw_mdev_notifier;
 
-	return vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
-				      &events, &private->nb);
+	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				     &events, &private->nb);
+	if (ret)
+		return ret;
+
+	ret = vfio_ccw_register_async_dev_regions(private);
+	if (ret)
+		vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+					 &private->nb);
+	return ret;
 }
 
 static void vfio_ccw_mdev_release(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 20e75f4f3695..ed8b94ea2f08 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -31,9 +31,9 @@ struct vfio_ccw_private;
 struct vfio_ccw_region;
 
 struct vfio_ccw_regops {
-	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
+	ssize_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
 			size_t count, loff_t *ppos);
-	size_t	(*write)(struct vfio_ccw_private *private,
+	ssize_t	(*write)(struct vfio_ccw_private *private,
 			 const char __user *buf, size_t count, loff_t *ppos);
 	void	(*release)(struct vfio_ccw_private *private,
 			   struct vfio_ccw_region *region);
@@ -53,6 +53,8 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
 				 const struct vfio_ccw_regops *ops,
 				 size_t size, u32 flags, void *data);
 
+int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private);
+
 /**
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
@@ -64,6 +66,7 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
  * @io_region: MMIO region to input/output I/O arguments/results
  * @io_mutex: protect against concurrent update of I/O structures
  * @region: additional regions for other subchannel operations
+ * @cmd_region: MMIO region for asynchronous I/O commands other than START
  * @num_regions: number of additional regions
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
@@ -81,6 +84,7 @@ struct vfio_ccw_private {
 	struct ccw_io_region	*io_region;
 	struct mutex		io_mutex;
 	struct vfio_ccw_region *region;
+	struct ccw_cmd_region	*cmd_region;
 	int num_regions;
 
 	struct channel_program	cp;
@@ -115,6 +119,7 @@ enum vfio_ccw_event {
 	VFIO_CCW_EVENT_NOT_OPER,
 	VFIO_CCW_EVENT_IO_REQ,
 	VFIO_CCW_EVENT_INTERRUPT,
+	VFIO_CCW_EVENT_ASYNC_REQ,
 	/* last element! */
 	NR_VFIO_CCW_EVENTS
 };
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 56e2413d3e00..8f10748dac79 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -354,6 +354,8 @@ struct vfio_region_gfx_edid {
 };
 
 #define VFIO_REGION_TYPE_CCW			(2)
+/* ccw sub-types */
+#define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
 
 /*
  * 10de vendor sub-type
diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
index 2ec5f367ff78..cbecbf0cd54f 100644
--- a/include/uapi/linux/vfio_ccw.h
+++ b/include/uapi/linux/vfio_ccw.h
@@ -12,6 +12,7 @@
 
 #include <linux/types.h>
 
+/* used for START SUBCHANNEL, always present */
 struct ccw_io_region {
 #define ORB_AREA_SIZE 12
 	__u8	orb_area[ORB_AREA_SIZE];
@@ -22,4 +23,15 @@ struct ccw_io_region {
 	__u32	ret_code;
 } __packed;
 
+/*
+ * used for processing commands that trigger asynchronous actions
+ * Note: this is controlled by a capability
+ */
+#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
+#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
+struct ccw_cmd_region {
+	__u32 command;
+	__u32 ret_code;
+} __packed;
+
 #endif
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [Qemu-devel] [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
@ 2019-01-21 11:03   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-21 11:03 UTC (permalink / raw)
  To: Halil Pasic, Eric Farman, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson, Cornelia Huck

Add a region to the vfio-ccw device that can be used to submit
asynchronous I/O instructions. ssch continues to be handled by the
existing I/O region; the new region handles hsch and csch.

Interrupt status continues to be reported through the same channels
as for ssch.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/Makefile           |   3 +-
 drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_drv.c     |  45 +++++++----
 drivers/s390/cio/vfio_ccw_fsm.c     | 114 +++++++++++++++++++++++++++-
 drivers/s390/cio/vfio_ccw_ops.c     |  13 +++-
 drivers/s390/cio/vfio_ccw_private.h |   9 ++-
 include/uapi/linux/vfio.h           |   2 +
 include/uapi/linux/vfio_ccw.h       |  12 +++
 8 files changed, 269 insertions(+), 20 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_async.c

diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index f230516abb96..f6a8db04177c 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -20,5 +20,6 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o \
+	vfio_ccw_async.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_async.c b/drivers/s390/cio/vfio_ccw_async.c
new file mode 100644
index 000000000000..604806c2970f
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_async.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async I/O region for vfio_ccw
+ *
+ * Copyright Red Hat, Inc. 2019
+ *
+ * Author(s): Cornelia Huck <cohuck@redhat.com>
+ */
+
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+
+#include "vfio_ccw_private.h"
+
+static ssize_t vfio_ccw_async_region_read(struct vfio_ccw_private *private,
+					  char __user *buf, size_t count,
+					  loff_t *ppos)
+{
+	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
+	struct ccw_cmd_region *region;
+	int ret;
+
+	if (pos + count > sizeof(*region))
+		return -EINVAL;
+
+	mutex_lock(&private->io_mutex);
+	region = private->region[i].data;
+	if (copy_to_user(buf, (void *)region + pos, count))
+		ret = -EFAULT;
+	else
+		ret = count;
+	mutex_unlock(&private->io_mutex);
+	return ret;
+}
+
+static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private *private,
+					   const char __user *buf, size_t count,
+					   loff_t *ppos)
+{
+	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
+	struct ccw_cmd_region *region;
+	int ret;
+
+	if (pos + count > sizeof(*region))
+		return -EINVAL;
+
+	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
+	    private->state == VFIO_CCW_STATE_STANDBY)
+		return -EACCES;
+	if (!mutex_trylock(&private->io_mutex))
+		return -EAGAIN;
+
+	region = private->region[i].data;
+	if (copy_from_user((void *)region + pos, buf, count)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
+
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
+
+	ret = region->ret_code ? region->ret_code : count;
+
+out_unlock:
+	mutex_unlock(&private->io_mutex);
+	return ret;
+}
+
+static void vfio_ccw_async_region_release(struct vfio_ccw_private *private,
+					  struct vfio_ccw_region *region)
+{
+
+}
+
+const struct vfio_ccw_regops vfio_ccw_async_region_ops = {
+	.read = vfio_ccw_async_region_read,
+	.write = vfio_ccw_async_region_write,
+	.release = vfio_ccw_async_region_release,
+};
+
+int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private)
+{
+	return vfio_ccw_register_dev_region(private,
+					    VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD,
+					    &vfio_ccw_async_region_ops,
+					    sizeof(struct ccw_cmd_region),
+					    VFIO_REGION_INFO_FLAG_READ |
+					    VFIO_REGION_INFO_FLAG_WRITE,
+					    private->cmd_region);
+}
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 2ef189fe45ed..d807911b8ed5 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -3,9 +3,11 @@
  * VFIO based Physical Subchannel device driver
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
  *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #include <linux/module.h>
@@ -23,6 +25,7 @@
 
 struct workqueue_struct *vfio_ccw_work_q;
 static struct kmem_cache *vfio_ccw_io_region;
+static struct kmem_cache *vfio_ccw_cmd_region;
 
 /*
  * Helpers
@@ -104,7 +107,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 {
 	struct pmcw *pmcw = &sch->schib.pmcw;
 	struct vfio_ccw_private *private;
-	int ret;
+	int ret = -ENOMEM;
 
 	if (pmcw->qf) {
 		dev_warn(&sch->dev, "vfio: ccw: does not support QDIO: %s\n",
@@ -118,10 +121,13 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 
 	private->io_region = kmem_cache_zalloc(vfio_ccw_io_region,
 					       GFP_KERNEL | GFP_DMA);
-	if (!private->io_region) {
-		kfree(private);
-		return -ENOMEM;
-	}
+	if (!private->io_region)
+		goto out_free;
+
+	private->cmd_region = kmem_cache_zalloc(vfio_ccw_cmd_region,
+						GFP_KERNEL | GFP_DMA);
+	if (!private->cmd_region)
+		goto out_free;
 
 	private->sch = sch;
 	dev_set_drvdata(&sch->dev, private);
@@ -149,7 +155,10 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	cio_disable_subchannel(sch);
 out_free:
 	dev_set_drvdata(&sch->dev, NULL);
-	kmem_cache_free(vfio_ccw_io_region, private->io_region);
+	if (private->cmd_region)
+		kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
+	if (private->io_region)
+		kmem_cache_free(vfio_ccw_io_region, private->io_region);
 	kfree(private);
 	return ret;
 }
@@ -238,7 +247,7 @@ static struct css_driver vfio_ccw_sch_driver = {
 
 static int __init vfio_ccw_sch_init(void)
 {
-	int ret;
+	int ret = -ENOMEM;
 
 	vfio_ccw_work_q = create_singlethread_workqueue("vfio-ccw");
 	if (!vfio_ccw_work_q)
@@ -248,20 +257,30 @@ static int __init vfio_ccw_sch_init(void)
 					sizeof(struct ccw_io_region), 0,
 					SLAB_ACCOUNT, 0,
 					sizeof(struct ccw_io_region), NULL);
-	if (!vfio_ccw_io_region) {
-		destroy_workqueue(vfio_ccw_work_q);
-		return -ENOMEM;
-	}
+	if (!vfio_ccw_io_region)
+		goto out_err;
+
+	vfio_ccw_cmd_region = kmem_cache_create_usercopy("vfio_ccw_cmd_region",
+					sizeof(struct ccw_cmd_region), 0,
+					SLAB_ACCOUNT, 0,
+					sizeof(struct ccw_cmd_region), NULL);
+	if (!vfio_ccw_cmd_region)
+		goto out_err;
 
 	isc_register(VFIO_CCW_ISC);
 	ret = css_driver_register(&vfio_ccw_sch_driver);
 	if (ret) {
 		isc_unregister(VFIO_CCW_ISC);
-		kmem_cache_destroy(vfio_ccw_io_region);
-		destroy_workqueue(vfio_ccw_work_q);
+		goto out_err;
 	}
 
 	return ret;
+
+out_err:
+	kmem_cache_destroy(vfio_ccw_cmd_region);
+	kmem_cache_destroy(vfio_ccw_io_region);
+	destroy_workqueue(vfio_ccw_work_q);
+	return ret;
 }
 
 static void __exit vfio_ccw_sch_exit(void)
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index f6ed934cc565..72912d596181 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -3,8 +3,10 @@
  * Finite state machine for vfio-ccw device handling
  *
  * Copyright IBM Corp. 2017
+ * Copyright Red Hat, Inc. 2019
  *
  * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Cornelia Huck <cohuck@redhat.com>
  */
 
 #include <linux/vfio.h>
@@ -69,6 +71,81 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 	return ret;
 }
 
+static int fsm_do_halt(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	unsigned long flags;
+	int ccode;
+	int ret;
+
+	sch = private->sch;
+
+	spin_lock_irqsave(sch->lock, flags);
+
+	/* Issue "Halt Subchannel" */
+	ccode = hsch(sch->schid);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl |= SCSW_ACTL_HALT_PEND;
+		ret = 0;
+		private->state = VFIO_CCW_STATE_BUSY;
+		break;
+	case 1:		/* Status pending */
+	case 2:		/* Busy */
+		ret = -EBUSY;
+		break;
+	case 3:		/* Device not operational */
+	{
+		ret = -ENODEV;
+		break;
+	}
+	default:
+		ret = ccode;
+	}
+	spin_unlock_irqrestore(sch->lock, flags);
+	return ret;
+}
+
+static int fsm_do_clear(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	unsigned long flags;
+	int ccode;
+	int ret;
+
+	sch = private->sch;
+
+	spin_lock_irqsave(sch->lock, flags);
+
+	/* Issue "Clear Subchannel" */
+	ccode = csch(sch->schid);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl = SCSW_ACTL_CLEAR_PEND;
+		/* TODO: check what else we might need to clear */
+		ret = 0;
+		private->state = VFIO_CCW_STATE_BUSY;
+		break;
+	case 3:		/* Device not operational */
+	{
+		ret = -ENODEV;
+		break;
+	}
+	default:
+		ret = ccode;
+	}
+	spin_unlock_irqrestore(sch->lock, flags);
+	return ret;
+}
+
 static void fsm_notoper(struct vfio_ccw_private *private,
 			enum vfio_ccw_event event)
 {
@@ -103,6 +180,14 @@ static void fsm_io_busy(struct vfio_ccw_private *private,
 	private->io_region->ret_code = -EAGAIN;
 }
 
+static void fsm_async_error(struct vfio_ccw_private *private,
+			    enum vfio_ccw_event event)
+{
+	pr_err("vfio-ccw: FSM: halt/clear request from state:%d\n",
+	       private->state);
+	private->cmd_region->ret_code = -EIO;
+}
+
 static void fsm_disabled_irq(struct vfio_ccw_private *private,
 			     enum vfio_ccw_event event)
 {
@@ -165,11 +250,11 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 		}
 		return;
 	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
-		/* XXX: Handle halt. */
+		/* halt is handled via the async cmd region */
 		io_region->ret_code = -EOPNOTSUPP;
 		goto err_out;
 	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
-		/* XXX: Handle clear. */
+		/* clear is handled via the async cmd region */
 		io_region->ret_code = -EOPNOTSUPP;
 		goto err_out;
 	}
@@ -179,6 +264,27 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 			       io_region->ret_code, errstr);
 }
 
+/*
+ * Deal with an async request from userspace.
+ */
+static void fsm_async_request(struct vfio_ccw_private *private,
+			      enum vfio_ccw_event event)
+{
+	struct ccw_cmd_region *cmd_region = private->cmd_region;
+
+	switch (cmd_region->command) {
+	case VFIO_CCW_ASYNC_CMD_HSCH:
+		cmd_region->ret_code = fsm_do_halt(private);
+		break;
+	case VFIO_CCW_ASYNC_CMD_CSCH:
+		cmd_region->ret_code = fsm_do_clear(private);
+		break;
+	default:
+		/* should not happen? */
+		cmd_region->ret_code = -EINVAL;
+	}
+}
+
 /*
  * Got an interrupt for a normal io (state busy).
  */
@@ -202,21 +308,25 @@ fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
 	[VFIO_CCW_STATE_NOT_OPER] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_nop,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
 	},
 	[VFIO_CCW_STATE_STANDBY] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 	},
 	[VFIO_CCW_STATE_IDLE] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_request,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 	},
 	[VFIO_CCW_STATE_BUSY] = {
 		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
 		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
 		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
 	},
 };
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 5a89d09f9271..755806cb8d53 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -148,11 +148,20 @@ static int vfio_ccw_mdev_open(struct mdev_device *mdev)
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
 	unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
+	int ret;
 
 	private->nb.notifier_call = vfio_ccw_mdev_notifier;
 
-	return vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
-				      &events, &private->nb);
+	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				     &events, &private->nb);
+	if (ret)
+		return ret;
+
+	ret = vfio_ccw_register_async_dev_regions(private);
+	if (ret)
+		vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+					 &private->nb);
+	return ret;
 }
 
 static void vfio_ccw_mdev_release(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 20e75f4f3695..ed8b94ea2f08 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -31,9 +31,9 @@ struct vfio_ccw_private;
 struct vfio_ccw_region;
 
 struct vfio_ccw_regops {
-	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
+	ssize_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
 			size_t count, loff_t *ppos);
-	size_t	(*write)(struct vfio_ccw_private *private,
+	ssize_t	(*write)(struct vfio_ccw_private *private,
 			 const char __user *buf, size_t count, loff_t *ppos);
 	void	(*release)(struct vfio_ccw_private *private,
 			   struct vfio_ccw_region *region);
@@ -53,6 +53,8 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
 				 const struct vfio_ccw_regops *ops,
 				 size_t size, u32 flags, void *data);
 
+int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private);
+
 /**
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
@@ -64,6 +66,7 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
  * @io_region: MMIO region to input/output I/O arguments/results
  * @io_mutex: protect against concurrent update of I/O structures
  * @region: additional regions for other subchannel operations
+ * @cmd_region: MMIO region for asynchronous I/O commands other than START
  * @num_regions: number of additional regions
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
@@ -81,6 +84,7 @@ struct vfio_ccw_private {
 	struct ccw_io_region	*io_region;
 	struct mutex		io_mutex;
 	struct vfio_ccw_region *region;
+	struct ccw_cmd_region	*cmd_region;
 	int num_regions;
 
 	struct channel_program	cp;
@@ -115,6 +119,7 @@ enum vfio_ccw_event {
 	VFIO_CCW_EVENT_NOT_OPER,
 	VFIO_CCW_EVENT_IO_REQ,
 	VFIO_CCW_EVENT_INTERRUPT,
+	VFIO_CCW_EVENT_ASYNC_REQ,
 	/* last element! */
 	NR_VFIO_CCW_EVENTS
 };
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 56e2413d3e00..8f10748dac79 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -354,6 +354,8 @@ struct vfio_region_gfx_edid {
 };
 
 #define VFIO_REGION_TYPE_CCW			(2)
+/* ccw sub-types */
+#define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
 
 /*
  * 10de vendor sub-type
diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
index 2ec5f367ff78..cbecbf0cd54f 100644
--- a/include/uapi/linux/vfio_ccw.h
+++ b/include/uapi/linux/vfio_ccw.h
@@ -12,6 +12,7 @@
 
 #include <linux/types.h>
 
+/* used for START SUBCHANNEL, always present */
 struct ccw_io_region {
 #define ORB_AREA_SIZE 12
 	__u8	orb_area[ORB_AREA_SIZE];
@@ -22,4 +23,15 @@ struct ccw_io_region {
 	__u32	ret_code;
 } __packed;
 
+/*
+ * used for processing commands that trigger asynchronous actions
+ * Note: this is controlled by a capability
+ */
+#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
+#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
+struct ccw_cmd_region {
+	__u32 command;
+	__u32 ret_code;
+} __packed;
+
 #endif
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-21 20:20     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-21 20:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 21 Jan 2019 12:03:51 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> Rework handling of multiple I/O requests to return -EAGAIN if
> we are already processing an I/O request. Introduce a mutex
> to disallow concurrent writes to the I/O region.
> 
> The expectation is that userspace simply retries the operation
> if it gets -EAGAIN.
> 
> We currently don't allow multiple ssch requests at the same
> time, as we don't have support for keeping channel programs
> around for more than one request.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---

[..]

>  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>  {
>  	struct vfio_ccw_private *private;
>  	struct ccw_io_region *region;
> +	int ret;
>  
>  	if (*ppos + count > sizeof(*region))
>  		return -EINVAL;
>  
>  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> -	if (private->state != VFIO_CCW_STATE_IDLE)
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
>  		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;
>  
>  	region = private->io_region;
> -	if (copy_from_user((void *)region + *ppos, buf, count))
> -		return -EFAULT;
> +	if (copy_from_user((void *)region + *ppos, buf, count)) {

This might race with vfio_ccw_sch_io_todo() on
private->io_region->irb_area, or?

> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
>  
>  	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
> -	if (region->ret_code != 0) {
> -		private->state = VFIO_CCW_STATE_IDLE;
> -		return region->ret_code;
> -	}
> +	ret = (region->ret_code != 0) ? region->ret_code : count;
>  
> -	return count;
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
>  }
>  
[..]

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-21 20:20     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-21 20:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Mon, 21 Jan 2019 12:03:51 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> Rework handling of multiple I/O requests to return -EAGAIN if
> we are already processing an I/O request. Introduce a mutex
> to disallow concurrent writes to the I/O region.
> 
> The expectation is that userspace simply retries the operation
> if it gets -EAGAIN.
> 
> We currently don't allow multiple ssch requests at the same
> time, as we don't have support for keeping channel programs
> around for more than one request.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---

[..]

>  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>  {
>  	struct vfio_ccw_private *private;
>  	struct ccw_io_region *region;
> +	int ret;
>  
>  	if (*ppos + count > sizeof(*region))
>  		return -EINVAL;
>  
>  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> -	if (private->state != VFIO_CCW_STATE_IDLE)
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
>  		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;
>  
>  	region = private->io_region;
> -	if (copy_from_user((void *)region + *ppos, buf, count))
> -		return -EFAULT;
> +	if (copy_from_user((void *)region + *ppos, buf, count)) {

This might race with vfio_ccw_sch_io_todo() on
private->io_region->irb_area, or?

> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
>  
>  	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
> -	if (region->ret_code != 0) {
> -		private->state = VFIO_CCW_STATE_IDLE;
> -		return region->ret_code;
> -	}
> +	ret = (region->ret_code != 0) ? region->ret_code : count;
>  
> -	return count;
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
>  }
>  
[..]

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-21 20:20     ` [Qemu-devel] " Halil Pasic
@ 2019-01-22 10:29       ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 10:29 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 21 Jan 2019 21:20:18 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:51 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Rework handling of multiple I/O requests to return -EAGAIN if
> > we are already processing an I/O request. Introduce a mutex
> > to disallow concurrent writes to the I/O region.
> > 
> > The expectation is that userspace simply retries the operation
> > if it gets -EAGAIN.
> > 
> > We currently don't allow multiple ssch requests at the same
> > time, as we don't have support for keeping channel programs
> > around for more than one request.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---  
> 
> [..]
> 
> >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >  {
> >  	struct vfio_ccw_private *private;
> >  	struct ccw_io_region *region;
> > +	int ret;
> >  
> >  	if (*ppos + count > sizeof(*region))
> >  		return -EINVAL;
> >  
> >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > +	    private->state == VFIO_CCW_STATE_STANDBY)
> >  		return -EACCES;
> > +	if (!mutex_trylock(&private->io_mutex))
> > +		return -EAGAIN;
> >  
> >  	region = private->io_region;
> > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > -		return -EFAULT;
> > +	if (copy_from_user((void *)region + *ppos, buf, count)) {  
> 
> This might race with vfio_ccw_sch_io_todo() on
> private->io_region->irb_area, or?

Ah yes, this should also take the mutex (should work because we're on a
workqueue).

> 
> > +		ret = -EFAULT;
> > +		goto out_unlock;
> > +	}
> >  
> >  	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
> > -	if (region->ret_code != 0) {
> > -		private->state = VFIO_CCW_STATE_IDLE;
> > -		return region->ret_code;
> > -	}
> > +	ret = (region->ret_code != 0) ? region->ret_code : count;
> >  
> > -	return count;
> > +out_unlock:
> > +	mutex_unlock(&private->io_mutex);
> > +	return ret;
> >  }
> >    
> [..]
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-22 10:29       ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 10:29 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Mon, 21 Jan 2019 21:20:18 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:51 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Rework handling of multiple I/O requests to return -EAGAIN if
> > we are already processing an I/O request. Introduce a mutex
> > to disallow concurrent writes to the I/O region.
> > 
> > The expectation is that userspace simply retries the operation
> > if it gets -EAGAIN.
> > 
> > We currently don't allow multiple ssch requests at the same
> > time, as we don't have support for keeping channel programs
> > around for more than one request.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---  
> 
> [..]
> 
> >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >  {
> >  	struct vfio_ccw_private *private;
> >  	struct ccw_io_region *region;
> > +	int ret;
> >  
> >  	if (*ppos + count > sizeof(*region))
> >  		return -EINVAL;
> >  
> >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > +	    private->state == VFIO_CCW_STATE_STANDBY)
> >  		return -EACCES;
> > +	if (!mutex_trylock(&private->io_mutex))
> > +		return -EAGAIN;
> >  
> >  	region = private->io_region;
> > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > -		return -EFAULT;
> > +	if (copy_from_user((void *)region + *ppos, buf, count)) {  
> 
> This might race with vfio_ccw_sch_io_todo() on
> private->io_region->irb_area, or?

Ah yes, this should also take the mutex (should work because we're on a
workqueue).

> 
> > +		ret = -EFAULT;
> > +		goto out_unlock;
> > +	}
> >  
> >  	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
> > -	if (region->ret_code != 0) {
> > -		private->state = VFIO_CCW_STATE_IDLE;
> > -		return region->ret_code;
> > -	}
> > +	ret = (region->ret_code != 0) ? region->ret_code : count;
> >  
> > -	return count;
> > +out_unlock:
> > +	mutex_unlock(&private->io_mutex);
> > +	return ret;
> >  }
> >    
> [..]
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-22 10:29       ` [Qemu-devel] " Cornelia Huck
@ 2019-01-22 11:17         ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 11:17 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 11:29:26 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 21 Jan 2019 21:20:18 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 21 Jan 2019 12:03:51 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > we are already processing an I/O request. Introduce a mutex
> > > to disallow concurrent writes to the I/O region.
> > > 
> > > The expectation is that userspace simply retries the operation
> > > if it gets -EAGAIN.
> > > 
> > > We currently don't allow multiple ssch requests at the same
> > > time, as we don't have support for keeping channel programs
> > > around for more than one request.
> > > 
> > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > ---  
> > 
> > [..]
> > 
> > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > >  {
> > >  	struct vfio_ccw_private *private;
> > >  	struct ccw_io_region *region;
> > > +	int ret;
> > >  
> > >  	if (*ppos + count > sizeof(*region))
> > >  		return -EINVAL;
> > >  
> > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > >  		return -EACCES;
> > > +	if (!mutex_trylock(&private->io_mutex))
> > > +		return -EAGAIN;
> > >  
> > >  	region = private->io_region;
> > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > -		return -EFAULT;
> > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {  
> > 
> > This might race with vfio_ccw_sch_io_todo() on
> > private->io_region->irb_area, or?
> 
> Ah yes, this should also take the mutex (should work because we're on a
> workqueue).
> 

I'm not sure that will do the trick (assumed I understood the
intention correctly). Let's say the things happen in this order:
1) vfio_ccw_sch_io_todo() goes first, I guess updates
private->io_region->irb_area and releases the mutex.
2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
and finally,
3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().

Or am I misunderstanding something? 

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-22 11:17         ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 11:17 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 11:29:26 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Mon, 21 Jan 2019 21:20:18 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 21 Jan 2019 12:03:51 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > we are already processing an I/O request. Introduce a mutex
> > > to disallow concurrent writes to the I/O region.
> > > 
> > > The expectation is that userspace simply retries the operation
> > > if it gets -EAGAIN.
> > > 
> > > We currently don't allow multiple ssch requests at the same
> > > time, as we don't have support for keeping channel programs
> > > around for more than one request.
> > > 
> > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > ---  
> > 
> > [..]
> > 
> > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > >  {
> > >  	struct vfio_ccw_private *private;
> > >  	struct ccw_io_region *region;
> > > +	int ret;
> > >  
> > >  	if (*ppos + count > sizeof(*region))
> > >  		return -EINVAL;
> > >  
> > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > >  		return -EACCES;
> > > +	if (!mutex_trylock(&private->io_mutex))
> > > +		return -EAGAIN;
> > >  
> > >  	region = private->io_region;
> > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > -		return -EFAULT;
> > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {  
> > 
> > This might race with vfio_ccw_sch_io_todo() on
> > private->io_region->irb_area, or?
> 
> Ah yes, this should also take the mutex (should work because we're on a
> workqueue).
> 

I'm not sure that will do the trick (assumed I understood the
intention correctly). Let's say the things happen in this order:
1) vfio_ccw_sch_io_todo() goes first, I guess updates
private->io_region->irb_area and releases the mutex.
2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
and finally,
3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().

Or am I misunderstanding something? 

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-22 11:17         ` [Qemu-devel] " Halil Pasic
@ 2019-01-22 11:53           ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 11:53 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 12:17:37 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 22 Jan 2019 11:29:26 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Mon, 21 Jan 2019 21:20:18 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:
> > >   
> > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > we are already processing an I/O request. Introduce a mutex
> > > > to disallow concurrent writes to the I/O region.
> > > > 
> > > > The expectation is that userspace simply retries the operation
> > > > if it gets -EAGAIN.
> > > > 
> > > > We currently don't allow multiple ssch requests at the same
> > > > time, as we don't have support for keeping channel programs
> > > > around for more than one request.
> > > > 
> > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > ---    
> > > 
> > > [..]
> > >   
> > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > >  {
> > > >  	struct vfio_ccw_private *private;
> > > >  	struct ccw_io_region *region;
> > > > +	int ret;
> > > >  
> > > >  	if (*ppos + count > sizeof(*region))
> > > >  		return -EINVAL;
> > > >  
> > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > >  		return -EACCES;
> > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > +		return -EAGAIN;
> > > >  
> > > >  	region = private->io_region;
> > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > -		return -EFAULT;
> > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {    
> > > 
> > > This might race with vfio_ccw_sch_io_todo() on
> > > private->io_region->irb_area, or?  
> > 
> > Ah yes, this should also take the mutex (should work because we're on a
> > workqueue).
> >   
> 
> I'm not sure that will do the trick (assumed I understood the
> intention correctly). Let's say the things happen in this order:
> 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> private->io_region->irb_area and releases the mutex.
> 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> and finally,
> 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> 
> Or am I misunderstanding something? 

You're not, but dealing with that race is outside the scope of this
patch. If userspace submits a request and then tries to get the old
data for a prior request, I suggest that userspace needs to fix their
sequencing.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-22 11:53           ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 11:53 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 12:17:37 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 22 Jan 2019 11:29:26 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Mon, 21 Jan 2019 21:20:18 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:
> > >   
> > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > we are already processing an I/O request. Introduce a mutex
> > > > to disallow concurrent writes to the I/O region.
> > > > 
> > > > The expectation is that userspace simply retries the operation
> > > > if it gets -EAGAIN.
> > > > 
> > > > We currently don't allow multiple ssch requests at the same
> > > > time, as we don't have support for keeping channel programs
> > > > around for more than one request.
> > > > 
> > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > ---    
> > > 
> > > [..]
> > >   
> > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > >  {
> > > >  	struct vfio_ccw_private *private;
> > > >  	struct ccw_io_region *region;
> > > > +	int ret;
> > > >  
> > > >  	if (*ppos + count > sizeof(*region))
> > > >  		return -EINVAL;
> > > >  
> > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > >  		return -EACCES;
> > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > +		return -EAGAIN;
> > > >  
> > > >  	region = private->io_region;
> > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > -		return -EFAULT;
> > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {    
> > > 
> > > This might race with vfio_ccw_sch_io_todo() on
> > > private->io_region->irb_area, or?  
> > 
> > Ah yes, this should also take the mutex (should work because we're on a
> > workqueue).
> >   
> 
> I'm not sure that will do the trick (assumed I understood the
> intention correctly). Let's say the things happen in this order:
> 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> private->io_region->irb_area and releases the mutex.
> 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> and finally,
> 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> 
> Or am I misunderstanding something? 

You're not, but dealing with that race is outside the scope of this
patch. If userspace submits a request and then tries to get the old
data for a prior request, I suggest that userspace needs to fix their
sequencing.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-22 11:53           ` [Qemu-devel] " Cornelia Huck
@ 2019-01-22 12:46             ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 12:46 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 12:53:22 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 12:17:37 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Tue, 22 Jan 2019 11:29:26 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > On Mon, 21 Jan 2019 21:20:18 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > >   
> > > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > >   
> > > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > > we are already processing an I/O request. Introduce a mutex
> > > > > to disallow concurrent writes to the I/O region.
> > > > > 
> > > > > The expectation is that userspace simply retries the operation
> > > > > if it gets -EAGAIN.
> > > > > 
> > > > > We currently don't allow multiple ssch requests at the same
> > > > > time, as we don't have support for keeping channel programs
> > > > > around for more than one request.
> > > > > 
> > > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > > ---    
> > > > 
> > > > [..]
> > > >   
> > > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > >  {
> > > > >  	struct vfio_ccw_private *private;
> > > > >  	struct ccw_io_region *region;
> > > > > +	int ret;
> > > > >  
> > > > >  	if (*ppos + count > sizeof(*region))
> > > > >  		return -EINVAL;
> > > > >  
> > > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > > >  		return -EACCES;
> > > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > > +		return -EAGAIN;
> > > > >  
> > > > >  	region = private->io_region;
> > > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > > -		return -EFAULT;
> > > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {    
> > > > 
> > > > This might race with vfio_ccw_sch_io_todo() on
> > > > private->io_region->irb_area, or?  
> > > 
> > > Ah yes, this should also take the mutex (should work because we're on a
> > > workqueue).
> > >   
> > 
> > I'm not sure that will do the trick (assumed I understood the
> > intention correctly). Let's say the things happen in this order:
> > 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> > private->io_region->irb_area and releases the mutex.
> > 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> > and finally,
> > 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> > 
> > Or am I misunderstanding something? 
> 
> You're not, but dealing with that race is outside the scope of this
> patch. If userspace submits a request and then tries to get the old
> data for a prior request, I suggest that userspace needs to fix their
> sequencing.
> 

I tend to disagree, because I think this would be a degradation compared
to what we have right now.

Let me explain. I guess the current idea is that the private->state !=
VFIO_CCW_STATE_IDLE check safeguards against this. Yes we lack proper
synchronization (atomic/interlocked access or locks) that would guarantee
that different thread observe state transitions as required -- no
splint brain. But if state were atomic the scenario I have in mind can
not happen, because we get the solicited interrupt in BUSY state (and
set IDLE in vfio_ccw_sch_io_todo()). Unsolicited interrupts are another
piece of cake -- I have no idea how may of those do we get. And because
of this the broken sequencing in userspace could actually be the kernels
fault.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-22 12:46             ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 12:46 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 12:53:22 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 12:17:37 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Tue, 22 Jan 2019 11:29:26 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > On Mon, 21 Jan 2019 21:20:18 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > >   
> > > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > >   
> > > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > > we are already processing an I/O request. Introduce a mutex
> > > > > to disallow concurrent writes to the I/O region.
> > > > > 
> > > > > The expectation is that userspace simply retries the operation
> > > > > if it gets -EAGAIN.
> > > > > 
> > > > > We currently don't allow multiple ssch requests at the same
> > > > > time, as we don't have support for keeping channel programs
> > > > > around for more than one request.
> > > > > 
> > > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > > ---    
> > > > 
> > > > [..]
> > > >   
> > > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > >  {
> > > > >  	struct vfio_ccw_private *private;
> > > > >  	struct ccw_io_region *region;
> > > > > +	int ret;
> > > > >  
> > > > >  	if (*ppos + count > sizeof(*region))
> > > > >  		return -EINVAL;
> > > > >  
> > > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > > >  		return -EACCES;
> > > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > > +		return -EAGAIN;
> > > > >  
> > > > >  	region = private->io_region;
> > > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > > -		return -EFAULT;
> > > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {    
> > > > 
> > > > This might race with vfio_ccw_sch_io_todo() on
> > > > private->io_region->irb_area, or?  
> > > 
> > > Ah yes, this should also take the mutex (should work because we're on a
> > > workqueue).
> > >   
> > 
> > I'm not sure that will do the trick (assumed I understood the
> > intention correctly). Let's say the things happen in this order:
> > 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> > private->io_region->irb_area and releases the mutex.
> > 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> > and finally,
> > 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> > 
> > Or am I misunderstanding something? 
> 
> You're not, but dealing with that race is outside the scope of this
> patch. If userspace submits a request and then tries to get the old
> data for a prior request, I suggest that userspace needs to fix their
> sequencing.
> 

I tend to disagree, because I think this would be a degradation compared
to what we have right now.

Let me explain. I guess the current idea is that the private->state !=
VFIO_CCW_STATE_IDLE check safeguards against this. Yes we lack proper
synchronization (atomic/interlocked access or locks) that would guarantee
that different thread observe state transitions as required -- no
splint brain. But if state were atomic the scenario I have in mind can
not happen, because we get the solicited interrupt in BUSY state (and
set IDLE in vfio_ccw_sch_io_todo()). Unsolicited interrupts are another
piece of cake -- I have no idea how may of those do we get. And because
of this the broken sequencing in userspace could actually be the kernels
fault.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 1/5] vfio-ccw: make it safe to access channel programs
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-22 14:56     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 14:56 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 21 Jan 2019 12:03:50 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> When we get a solicited interrupt, the start function may have
> been cleared by a csch, but we still have a channel program
> structure allocated. Make it safe to call the cp accessors in
> any case, so we can call them unconditionally.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 3 +++
>  drivers/s390/cio/vfio_ccw_cp.h | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 70a006ba4d05..714987ceea9a 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp)
>  	struct ccwchain *chain, *temp;
>  	int i;
>  
> +	cp->initialized = false;
>  	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
>  		for (i = 0; i < chain->ch_len; i++) {
>  			pfn_array_table_unpin_free(chain->ch_pat + i,
> @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>  	 */
>  	cp->orb.cmd.c64 = 1;
>  
> +	cp->initialized = true;
> +
>  	return ret;
>  }
>  
> diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
> index a4b74fb1aa57..3c20cd208da5 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.h
> +++ b/drivers/s390/cio/vfio_ccw_cp.h
> @@ -21,6 +21,7 @@
>   * @ccwchain_list: list head of ccwchains
>   * @orb: orb for the currently processed ssch request
>   * @mdev: the mediated device to perform page pinning/unpinning
> + * @initialized: whether this instance is actually initialized
>   *
>   * @ccwchain_list is the head of a ccwchain list, that contents the
>   * translated result of the guest channel program that pointed out by
> @@ -30,6 +31,7 @@ struct channel_program {
>  	struct list_head ccwchain_list;
>  	union orb orb;
>  	struct device *mdev;
> +	bool initialized;
>  };
>  
>  extern int cp_init(struct channel_program *cp, struct device *mdev,

I fail to see how is this patch delivering on the promise made by it's
title. I see only code that sets cp->initialized but none that reads it.
The follow-up patches don't seem to care for cp->initialized either.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] vfio-ccw: make it safe to access channel programs
@ 2019-01-22 14:56     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 14:56 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Mon, 21 Jan 2019 12:03:50 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> When we get a solicited interrupt, the start function may have
> been cleared by a csch, but we still have a channel program
> structure allocated. Make it safe to call the cp accessors in
> any case, so we can call them unconditionally.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 3 +++
>  drivers/s390/cio/vfio_ccw_cp.h | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 70a006ba4d05..714987ceea9a 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp)
>  	struct ccwchain *chain, *temp;
>  	int i;
>  
> +	cp->initialized = false;
>  	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
>  		for (i = 0; i < chain->ch_len; i++) {
>  			pfn_array_table_unpin_free(chain->ch_pat + i,
> @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>  	 */
>  	cp->orb.cmd.c64 = 1;
>  
> +	cp->initialized = true;
> +
>  	return ret;
>  }
>  
> diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
> index a4b74fb1aa57..3c20cd208da5 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.h
> +++ b/drivers/s390/cio/vfio_ccw_cp.h
> @@ -21,6 +21,7 @@
>   * @ccwchain_list: list head of ccwchains
>   * @orb: orb for the currently processed ssch request
>   * @mdev: the mediated device to perform page pinning/unpinning
> + * @initialized: whether this instance is actually initialized
>   *
>   * @ccwchain_list is the head of a ccwchain list, that contents the
>   * translated result of the guest channel program that pointed out by
> @@ -30,6 +31,7 @@ struct channel_program {
>  	struct list_head ccwchain_list;
>  	union orb orb;
>  	struct device *mdev;
> +	bool initialized;
>  };
>  
>  extern int cp_init(struct channel_program *cp, struct device *mdev,

I fail to see how is this patch delivering on the promise made by it's
title. I see only code that sets cp->initialized but none that reads it.
The follow-up patches don't seem to care for cp->initialized either.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 1/5] vfio-ccw: make it safe to access channel programs
  2019-01-22 14:56     ` [Qemu-devel] " Halil Pasic
@ 2019-01-22 15:19       ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 15:19 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 15:56:56 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:50 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > When we get a solicited interrupt, the start function may have
> > been cleared by a csch, but we still have a channel program
> > structure allocated. Make it safe to call the cp accessors in
> > any case, so we can call them unconditionally.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---
> >  drivers/s390/cio/vfio_ccw_cp.c | 3 +++
> >  drivers/s390/cio/vfio_ccw_cp.h | 2 ++
> >  2 files changed, 5 insertions(+)
> > 
> > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> > index 70a006ba4d05..714987ceea9a 100644
> > --- a/drivers/s390/cio/vfio_ccw_cp.c
> > +++ b/drivers/s390/cio/vfio_ccw_cp.c
> > @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp)
> >  	struct ccwchain *chain, *temp;
> >  	int i;
> >  
> > +	cp->initialized = false;
> >  	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
> >  		for (i = 0; i < chain->ch_len; i++) {
> >  			pfn_array_table_unpin_free(chain->ch_pat + i,
> > @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
> >  	 */
> >  	cp->orb.cmd.c64 = 1;
> >  
> > +	cp->initialized = true;
> > +
> >  	return ret;
> >  }
> >  
> > diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
> > index a4b74fb1aa57..3c20cd208da5 100644
> > --- a/drivers/s390/cio/vfio_ccw_cp.h
> > +++ b/drivers/s390/cio/vfio_ccw_cp.h
> > @@ -21,6 +21,7 @@
> >   * @ccwchain_list: list head of ccwchains
> >   * @orb: orb for the currently processed ssch request
> >   * @mdev: the mediated device to perform page pinning/unpinning
> > + * @initialized: whether this instance is actually initialized
> >   *
> >   * @ccwchain_list is the head of a ccwchain list, that contents the
> >   * translated result of the guest channel program that pointed out by
> > @@ -30,6 +31,7 @@ struct channel_program {
> >  	struct list_head ccwchain_list;
> >  	union orb orb;
> >  	struct device *mdev;
> > +	bool initialized;
> >  };
> >  
> >  extern int cp_init(struct channel_program *cp, struct device *mdev,  
> 
> I fail to see how is this patch delivering on the promise made by it's
> title. I see only code that sets cp->initialized but none that reads it.
> The follow-up patches don't seem to care for cp->initialized either.

Grr, has been lost in the rebase. I'll redo this.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] vfio-ccw: make it safe to access channel programs
@ 2019-01-22 15:19       ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 15:19 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 15:56:56 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:50 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > When we get a solicited interrupt, the start function may have
> > been cleared by a csch, but we still have a channel program
> > structure allocated. Make it safe to call the cp accessors in
> > any case, so we can call them unconditionally.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---
> >  drivers/s390/cio/vfio_ccw_cp.c | 3 +++
> >  drivers/s390/cio/vfio_ccw_cp.h | 2 ++
> >  2 files changed, 5 insertions(+)
> > 
> > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> > index 70a006ba4d05..714987ceea9a 100644
> > --- a/drivers/s390/cio/vfio_ccw_cp.c
> > +++ b/drivers/s390/cio/vfio_ccw_cp.c
> > @@ -335,6 +335,7 @@ static void cp_unpin_free(struct channel_program *cp)
> >  	struct ccwchain *chain, *temp;
> >  	int i;
> >  
> > +	cp->initialized = false;
> >  	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
> >  		for (i = 0; i < chain->ch_len; i++) {
> >  			pfn_array_table_unpin_free(chain->ch_pat + i,
> > @@ -701,6 +702,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
> >  	 */
> >  	cp->orb.cmd.c64 = 1;
> >  
> > +	cp->initialized = true;
> > +
> >  	return ret;
> >  }
> >  
> > diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
> > index a4b74fb1aa57..3c20cd208da5 100644
> > --- a/drivers/s390/cio/vfio_ccw_cp.h
> > +++ b/drivers/s390/cio/vfio_ccw_cp.h
> > @@ -21,6 +21,7 @@
> >   * @ccwchain_list: list head of ccwchains
> >   * @orb: orb for the currently processed ssch request
> >   * @mdev: the mediated device to perform page pinning/unpinning
> > + * @initialized: whether this instance is actually initialized
> >   *
> >   * @ccwchain_list is the head of a ccwchain list, that contents the
> >   * translated result of the guest channel program that pointed out by
> > @@ -30,6 +31,7 @@ struct channel_program {
> >  	struct list_head ccwchain_list;
> >  	union orb orb;
> >  	struct device *mdev;
> > +	bool initialized;
> >  };
> >  
> >  extern int cp_init(struct channel_program *cp, struct device *mdev,  
> 
> I fail to see how is this patch delivering on the promise made by it's
> title. I see only code that sets cp->initialized but none that reads it.
> The follow-up patches don't seem to care for cp->initialized either.

Grr, has been lost in the rebase. I'll redo this.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [qemu-s390x] [PATCH v2 4/5] s390/cio: export hsch to modules
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-22 15:21     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 15:21 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 21 Jan 2019 12:03:53 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> The vfio-ccw code will need this, and it matches treatment of ssch
> and csch.
> 
> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>

;)

> ---
>  drivers/s390/cio/ioasm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/s390/cio/ioasm.c b/drivers/s390/cio/ioasm.c
> index 14d328338ce2..08eb10283b18 100644
> --- a/drivers/s390/cio/ioasm.c
> +++ b/drivers/s390/cio/ioasm.c
> @@ -233,6 +233,7 @@ int hsch(struct subchannel_id schid)
>  
>  	return ccode;
>  }
> +EXPORT_SYMBOL(hsch);
>  
>  static inline int __xsch(struct subchannel_id schid)
>  {

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v2 4/5] s390/cio: export hsch to modules
@ 2019-01-22 15:21     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 15:21 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Mon, 21 Jan 2019 12:03:53 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> The vfio-ccw code will need this, and it matches treatment of ssch
> and csch.
> 
> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>

;)

> ---
>  drivers/s390/cio/ioasm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/s390/cio/ioasm.c b/drivers/s390/cio/ioasm.c
> index 14d328338ce2..08eb10283b18 100644
> --- a/drivers/s390/cio/ioasm.c
> +++ b/drivers/s390/cio/ioasm.c
> @@ -233,6 +233,7 @@ int hsch(struct subchannel_id schid)
>  
>  	return ccode;
>  }
> +EXPORT_SYMBOL(hsch);
>  
>  static inline int __xsch(struct subchannel_id schid)
>  {

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-22 12:46             ` [Qemu-devel] " Halil Pasic
@ 2019-01-22 17:26               ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 17:26 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 13:46:12 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 22 Jan 2019 12:53:22 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 22 Jan 2019 12:17:37 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Tue, 22 Jan 2019 11:29:26 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:
> > >   
> > > > On Mon, 21 Jan 2019 21:20:18 +0100
> > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > >     
> > > > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > > >     
> > > > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > > > we are already processing an I/O request. Introduce a mutex
> > > > > > to disallow concurrent writes to the I/O region.
> > > > > > 
> > > > > > The expectation is that userspace simply retries the operation
> > > > > > if it gets -EAGAIN.
> > > > > > 
> > > > > > We currently don't allow multiple ssch requests at the same
> > > > > > time, as we don't have support for keeping channel programs
> > > > > > around for more than one request.
> > > > > > 
> > > > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > ---      
> > > > > 
> > > > > [..]
> > > > >     
> > > > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > >  {
> > > > > >  	struct vfio_ccw_private *private;
> > > > > >  	struct ccw_io_region *region;
> > > > > > +	int ret;
> > > > > >  
> > > > > >  	if (*ppos + count > sizeof(*region))
> > > > > >  		return -EINVAL;
> > > > > >  
> > > > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > > > >  		return -EACCES;
> > > > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > > > +		return -EAGAIN;
> > > > > >  
> > > > > >  	region = private->io_region;
> > > > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > > > -		return -EFAULT;
> > > > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {      
> > > > > 
> > > > > This might race with vfio_ccw_sch_io_todo() on
> > > > > private->io_region->irb_area, or?    
> > > > 
> > > > Ah yes, this should also take the mutex (should work because we're on a
> > > > workqueue).
> > > >     
> > > 
> > > I'm not sure that will do the trick (assumed I understood the
> > > intention correctly). Let's say the things happen in this order:
> > > 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> > > private->io_region->irb_area and releases the mutex.
> > > 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> > > and finally,
> > > 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> > > 
> > > Or am I misunderstanding something?   
> > 
> > You're not, but dealing with that race is outside the scope of this
> > patch. If userspace submits a request and then tries to get the old
> > data for a prior request, I suggest that userspace needs to fix their
> > sequencing.
> >   
> 
> I tend to disagree, because I think this would be a degradation compared
> to what we have right now.
> 
> Let me explain. I guess the current idea is that the private->state !=
> VFIO_CCW_STATE_IDLE check safeguards against this. Yes we lack proper
> synchronization (atomic/interlocked access or locks) that would guarantee
> that different thread observe state transitions as required -- no
> splint brain. But if state were atomic the scenario I have in mind can
> not happen, because we get the solicited interrupt in BUSY state (and
> set IDLE in vfio_ccw_sch_io_todo()). 

This BUSY handling is broken for another case: If the guest requests
intermediate interrupts, there may be more than one interrupt by the
hardware -- and we still go out of BUSY state. (Freeing the cp is also
broken in that case.) However, the Linux dasd driver does not seem to
do that.

> Unsolicited interrupts are another
> piece of cake -- I have no idea how may of those do we get.

They at least don't have the "free the cp before we got final state"
bug. But I think both are reasons to get away from "use the BUSY state
to ensure the right sequence".

> And because
> of this the broken sequencing in userspace could actually be the kernels
> fault.

Here, I can't follow you at all :(

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-22 17:26               ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-22 17:26 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 13:46:12 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 22 Jan 2019 12:53:22 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 22 Jan 2019 12:17:37 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Tue, 22 Jan 2019 11:29:26 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:
> > >   
> > > > On Mon, 21 Jan 2019 21:20:18 +0100
> > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > >     
> > > > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > > >     
> > > > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > > > we are already processing an I/O request. Introduce a mutex
> > > > > > to disallow concurrent writes to the I/O region.
> > > > > > 
> > > > > > The expectation is that userspace simply retries the operation
> > > > > > if it gets -EAGAIN.
> > > > > > 
> > > > > > We currently don't allow multiple ssch requests at the same
> > > > > > time, as we don't have support for keeping channel programs
> > > > > > around for more than one request.
> > > > > > 
> > > > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > ---      
> > > > > 
> > > > > [..]
> > > > >     
> > > > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > >  {
> > > > > >  	struct vfio_ccw_private *private;
> > > > > >  	struct ccw_io_region *region;
> > > > > > +	int ret;
> > > > > >  
> > > > > >  	if (*ppos + count > sizeof(*region))
> > > > > >  		return -EINVAL;
> > > > > >  
> > > > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > > > >  		return -EACCES;
> > > > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > > > +		return -EAGAIN;
> > > > > >  
> > > > > >  	region = private->io_region;
> > > > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > > > -		return -EFAULT;
> > > > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {      
> > > > > 
> > > > > This might race with vfio_ccw_sch_io_todo() on
> > > > > private->io_region->irb_area, or?    
> > > > 
> > > > Ah yes, this should also take the mutex (should work because we're on a
> > > > workqueue).
> > > >     
> > > 
> > > I'm not sure that will do the trick (assumed I understood the
> > > intention correctly). Let's say the things happen in this order:
> > > 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> > > private->io_region->irb_area and releases the mutex.
> > > 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> > > and finally,
> > > 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> > > 
> > > Or am I misunderstanding something?   
> > 
> > You're not, but dealing with that race is outside the scope of this
> > patch. If userspace submits a request and then tries to get the old
> > data for a prior request, I suggest that userspace needs to fix their
> > sequencing.
> >   
> 
> I tend to disagree, because I think this would be a degradation compared
> to what we have right now.
> 
> Let me explain. I guess the current idea is that the private->state !=
> VFIO_CCW_STATE_IDLE check safeguards against this. Yes we lack proper
> synchronization (atomic/interlocked access or locks) that would guarantee
> that different thread observe state transitions as required -- no
> splint brain. But if state were atomic the scenario I have in mind can
> not happen, because we get the solicited interrupt in BUSY state (and
> set IDLE in vfio_ccw_sch_io_todo()). 

This BUSY handling is broken for another case: If the guest requests
intermediate interrupts, there may be more than one interrupt by the
hardware -- and we still go out of BUSY state. (Freeing the cp is also
broken in that case.) However, the Linux dasd driver does not seem to
do that.

> Unsolicited interrupts are another
> piece of cake -- I have no idea how may of those do we get.

They at least don't have the "free the cp before we got final state"
bug. But I think both are reasons to get away from "use the BUSY state
to ensure the right sequence".

> And because
> of this the broken sequencing in userspace could actually be the kernels
> fault.

Here, I can't follow you at all :(

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-22 18:33     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 18:33 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 21 Jan 2019 12:03:51 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -28,6 +28,7 @@
>   * @mdev: pointer to the mediated device
>   * @nb: notifier for vfio events
>   * @io_region: MMIO region to input/output I/O arguments/results
> + * @io_mutex: protect against concurrent update of I/O structures

We could be a bit more specific about what does this mutex guard.
Is it only io_region, or cp, irb and the new regions a well? ->state does
not seem to be covered, but should need some sort of synchronisation
too, or?

>   * @cp: channel program for the current I/O operation
>   * @irb: irb info received from interrupt
>   * @scsw: scsw info
> @@ -42,6 +43,7 @@ struct vfio_ccw_private {
>  	struct mdev_device	*mdev;
>  	struct notifier_block	nb;
>  	struct ccw_io_region	*io_region;
> +	struct mutex		io_mutex;
>  
>  	struct channel_program	cp;
>  	struct irb		irb;
> -- 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-22 18:33     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 18:33 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Mon, 21 Jan 2019 12:03:51 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -28,6 +28,7 @@
>   * @mdev: pointer to the mediated device
>   * @nb: notifier for vfio events
>   * @io_region: MMIO region to input/output I/O arguments/results
> + * @io_mutex: protect against concurrent update of I/O structures

We could be a bit more specific about what does this mutex guard.
Is it only io_region, or cp, irb and the new regions a well? ->state does
not seem to be covered, but should need some sort of synchronisation
too, or?

>   * @cp: channel program for the current I/O operation
>   * @irb: irb info received from interrupt
>   * @scsw: scsw info
> @@ -42,6 +43,7 @@ struct vfio_ccw_private {
>  	struct mdev_device	*mdev;
>  	struct notifier_block	nb;
>  	struct ccw_io_region	*io_region;
> +	struct mutex		io_mutex;
>  
>  	struct channel_program	cp;
>  	struct irb		irb;
> -- 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-22 17:26               ` [Qemu-devel] " Cornelia Huck
@ 2019-01-22 19:03                 ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 19:03 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 18:26:17 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 13:46:12 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Tue, 22 Jan 2019 12:53:22 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > On Tue, 22 Jan 2019 12:17:37 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > >   
> > > > On Tue, 22 Jan 2019 11:29:26 +0100
> > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > >   
> > > > > On Mon, 21 Jan 2019 21:20:18 +0100
> > > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > > >     
> > > > > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > > > >     
> > > > > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > > > > we are already processing an I/O request. Introduce a mutex
> > > > > > > to disallow concurrent writes to the I/O region.
> > > > > > > 
> > > > > > > The expectation is that userspace simply retries the operation
> > > > > > > if it gets -EAGAIN.
> > > > > > > 
> > > > > > > We currently don't allow multiple ssch requests at the same
> > > > > > > time, as we don't have support for keeping channel programs
> > > > > > > around for more than one request.
> > > > > > > 
> > > > > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > > ---      
> > > > > > 
> > > > > > [..]
> > > > > >     
> > > > > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > > >  {
> > > > > > >  	struct vfio_ccw_private *private;
> > > > > > >  	struct ccw_io_region *region;
> > > > > > > +	int ret;
> > > > > > >  
> > > > > > >  	if (*ppos + count > sizeof(*region))
> > > > > > >  		return -EINVAL;
> > > > > > >  
> > > > > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > > > > >  		return -EACCES;
> > > > > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > > > > +		return -EAGAIN;
> > > > > > >  
> > > > > > >  	region = private->io_region;
> > > > > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > > > > -		return -EFAULT;
> > > > > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {      
> > > > > > 
> > > > > > This might race with vfio_ccw_sch_io_todo() on
> > > > > > private->io_region->irb_area, or?    
> > > > > 
> > > > > Ah yes, this should also take the mutex (should work because we're on a
> > > > > workqueue).
> > > > >     
> > > > 
> > > > I'm not sure that will do the trick (assumed I understood the
> > > > intention correctly). Let's say the things happen in this order:
> > > > 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> > > > private->io_region->irb_area and releases the mutex.
> > > > 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> > > > and finally,
> > > > 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> > > > 
> > > > Or am I misunderstanding something?   
> > > 
> > > You're not, but dealing with that race is outside the scope of this
> > > patch. If userspace submits a request and then tries to get the old
> > > data for a prior request, I suggest that userspace needs to fix their
> > > sequencing.
> > >   
> > 
> > I tend to disagree, because I think this would be a degradation compared
> > to what we have right now.
> > 
> > Let me explain. I guess the current idea is that the private->state !=
> > VFIO_CCW_STATE_IDLE check safeguards against this. Yes we lack proper
> > synchronization (atomic/interlocked access or locks) that would guarantee
> > that different thread observe state transitions as required -- no
> > splint brain. But if state were atomic the scenario I have in mind can
> > not happen, because we get the solicited interrupt in BUSY state (and
> > set IDLE in vfio_ccw_sch_io_todo()). 
> 
> This BUSY handling is broken for another case: If the guest requests
> intermediate interrupts, there may be more than one interrupt by the
> hardware -- and we still go out of BUSY state. (Freeing the cp is also
> broken in that case.) However, the Linux dasd driver does not seem to
> do that.
> 

Nod.

> > Unsolicited interrupts are another
> > piece of cake -- I have no idea how may of those do we get.
> 
> They at least don't have the "free the cp before we got final state"
> bug. But I think both are reasons to get away from "use the BUSY state
> to ensure the right sequence".
> 

I'm not sure I understand you correctly. I was under the impression that
the whole point in having a state machine was to ensure the states are
traversed in the right sequence with the right stuff being done on each
transition. At least in theory.

You've probably figured out that IMHO vfio-ccw is not in a good shape
(to put it mildly). I have a hard time reviewing a non-holistic
concurrency fix. Please tell sould I get perceived as non-constructive,
I will try to cut back on criticism. 

> > And because
> > of this the broken sequencing in userspace could actually be the kernels
> > fault.
> 
> Here, I can't follow you at all :(
> 

Should we ever deliver a zeroed out IRB to the userspace, for the next
ioinst it would look like we have no status nor FC bit set. That is, the
guest could end up with stuff in parallel that was never supposed to
be in parallel (i.e. broken sequencing because kernel feeds false
information due to race with unsolicited interrupt).

Does that help?

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-22 19:03                 ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-22 19:03 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 18:26:17 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 13:46:12 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Tue, 22 Jan 2019 12:53:22 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > On Tue, 22 Jan 2019 12:17:37 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > >   
> > > > On Tue, 22 Jan 2019 11:29:26 +0100
> > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > >   
> > > > > On Mon, 21 Jan 2019 21:20:18 +0100
> > > > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > > > >     
> > > > > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > > > > Cornelia Huck <cohuck@redhat.com> wrote:
> > > > > >     
> > > > > > > Rework handling of multiple I/O requests to return -EAGAIN if
> > > > > > > we are already processing an I/O request. Introduce a mutex
> > > > > > > to disallow concurrent writes to the I/O region.
> > > > > > > 
> > > > > > > The expectation is that userspace simply retries the operation
> > > > > > > if it gets -EAGAIN.
> > > > > > > 
> > > > > > > We currently don't allow multiple ssch requests at the same
> > > > > > > time, as we don't have support for keeping channel programs
> > > > > > > around for more than one request.
> > > > > > > 
> > > > > > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > > ---      
> > > > > > 
> > > > > > [..]
> > > > > >     
> > > > > > >  static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > > > @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> > > > > > >  {
> > > > > > >  	struct vfio_ccw_private *private;
> > > > > > >  	struct ccw_io_region *region;
> > > > > > > +	int ret;
> > > > > > >  
> > > > > > >  	if (*ppos + count > sizeof(*region))
> > > > > > >  		return -EINVAL;
> > > > > > >  
> > > > > > >  	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > > > > > > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > > > > > > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > > > > > > +	    private->state == VFIO_CCW_STATE_STANDBY)
> > > > > > >  		return -EACCES;
> > > > > > > +	if (!mutex_trylock(&private->io_mutex))
> > > > > > > +		return -EAGAIN;
> > > > > > >  
> > > > > > >  	region = private->io_region;
> > > > > > > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > > > > > > -		return -EFAULT;
> > > > > > > +	if (copy_from_user((void *)region + *ppos, buf, count)) {      
> > > > > > 
> > > > > > This might race with vfio_ccw_sch_io_todo() on
> > > > > > private->io_region->irb_area, or?    
> > > > > 
> > > > > Ah yes, this should also take the mutex (should work because we're on a
> > > > > workqueue).
> > > > >     
> > > > 
> > > > I'm not sure that will do the trick (assumed I understood the
> > > > intention correctly). Let's say the things happen in this order:
> > > > 1) vfio_ccw_sch_io_todo() goes first, I guess updates
> > > > private->io_region->irb_area and releases the mutex.
> > > > 2) Then vfio_ccw_mdev_write() destroys the irb_area by zeriong it out,
> > > > and finally,
> > > > 3) userspace reads the destroyed irb_area using vfio_ccw_mdev_read().
> > > > 
> > > > Or am I misunderstanding something?   
> > > 
> > > You're not, but dealing with that race is outside the scope of this
> > > patch. If userspace submits a request and then tries to get the old
> > > data for a prior request, I suggest that userspace needs to fix their
> > > sequencing.
> > >   
> > 
> > I tend to disagree, because I think this would be a degradation compared
> > to what we have right now.
> > 
> > Let me explain. I guess the current idea is that the private->state !=
> > VFIO_CCW_STATE_IDLE check safeguards against this. Yes we lack proper
> > synchronization (atomic/interlocked access or locks) that would guarantee
> > that different thread observe state transitions as required -- no
> > splint brain. But if state were atomic the scenario I have in mind can
> > not happen, because we get the solicited interrupt in BUSY state (and
> > set IDLE in vfio_ccw_sch_io_todo()). 
> 
> This BUSY handling is broken for another case: If the guest requests
> intermediate interrupts, there may be more than one interrupt by the
> hardware -- and we still go out of BUSY state. (Freeing the cp is also
> broken in that case.) However, the Linux dasd driver does not seem to
> do that.
> 

Nod.

> > Unsolicited interrupts are another
> > piece of cake -- I have no idea how may of those do we get.
> 
> They at least don't have the "free the cp before we got final state"
> bug. But I think both are reasons to get away from "use the BUSY state
> to ensure the right sequence".
> 

I'm not sure I understand you correctly. I was under the impression that
the whole point in having a state machine was to ensure the states are
traversed in the right sequence with the right stuff being done on each
transition. At least in theory.

You've probably figured out that IMHO vfio-ccw is not in a good shape
(to put it mildly). I have a hard time reviewing a non-holistic
concurrency fix. Please tell sould I get perceived as non-constructive,
I will try to cut back on criticism. 

> > And because
> > of this the broken sequencing in userspace could actually be the kernels
> > fault.
> 
> Here, I can't follow you at all :(
> 

Should we ever deliver a zeroed out IRB to the userspace, for the next
ioinst it would look like we have no status nor FC bit set. That is, the
guest could end up with stuff in parallel that was never supposed to
be in parallel (i.e. broken sequencing because kernel feeds false
information due to race with unsolicited interrupt).

Does that help?

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-22 18:33     ` [Qemu-devel] " Halil Pasic
@ 2019-01-23 10:21       ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-23 10:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 19:33:46 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:51 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > --- a/drivers/s390/cio/vfio_ccw_private.h
> > +++ b/drivers/s390/cio/vfio_ccw_private.h
> > @@ -28,6 +28,7 @@
> >   * @mdev: pointer to the mediated device
> >   * @nb: notifier for vfio events
> >   * @io_region: MMIO region to input/output I/O arguments/results
> > + * @io_mutex: protect against concurrent update of I/O structures  
> 
> We could be a bit more specific about what does this mutex guard.
> Is it only io_region, or cp, irb and the new regions a well? ->state does
> not seem to be covered, but should need some sort of synchronisation
> too, or?

I'm not sure. IIRC Pierre had some ideas about locking in the fsm?

> 
> >   * @cp: channel program for the current I/O operation
> >   * @irb: irb info received from interrupt
> >   * @scsw: scsw info
> > @@ -42,6 +43,7 @@ struct vfio_ccw_private {
> >  	struct mdev_device	*mdev;
> >  	struct notifier_block	nb;
> >  	struct ccw_io_region	*io_region;
> > +	struct mutex		io_mutex;
> >  
> >  	struct channel_program	cp;
> >  	struct irb		irb;
> > --   
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-23 10:21       ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-23 10:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 19:33:46 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:51 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > --- a/drivers/s390/cio/vfio_ccw_private.h
> > +++ b/drivers/s390/cio/vfio_ccw_private.h
> > @@ -28,6 +28,7 @@
> >   * @mdev: pointer to the mediated device
> >   * @nb: notifier for vfio events
> >   * @io_region: MMIO region to input/output I/O arguments/results
> > + * @io_mutex: protect against concurrent update of I/O structures  
> 
> We could be a bit more specific about what does this mutex guard.
> Is it only io_region, or cp, irb and the new regions a well? ->state does
> not seem to be covered, but should need some sort of synchronisation
> too, or?

I'm not sure. IIRC Pierre had some ideas about locking in the fsm?

> 
> >   * @cp: channel program for the current I/O operation
> >   * @irb: irb info received from interrupt
> >   * @scsw: scsw info
> > @@ -42,6 +43,7 @@ struct vfio_ccw_private {
> >  	struct mdev_device	*mdev;
> >  	struct notifier_block	nb;
> >  	struct ccw_io_region	*io_region;
> > +	struct mutex		io_mutex;
> >  
> >  	struct channel_program	cp;
> >  	struct irb		irb;
> > --   
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-22 19:03                 ` [Qemu-devel] " Halil Pasic
@ 2019-01-23 10:34                   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-23 10:34 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 22 Jan 2019 20:03:31 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 22 Jan 2019 18:26:17 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 22 Jan 2019 13:46:12 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:

> > > Unsolicited interrupts are another
> > > piece of cake -- I have no idea how may of those do we get.  
> > 
> > They at least don't have the "free the cp before we got final state"
> > bug. But I think both are reasons to get away from "use the BUSY state
> > to ensure the right sequence".
> >   
> 
> I'm not sure I understand you correctly. I was under the impression that
> the whole point in having a state machine was to ensure the states are
> traversed in the right sequence with the right stuff being done on each
> transition. At least in theory.

Sequence in user space programs, not in the state machine.

> 
> You've probably figured out that IMHO vfio-ccw is not in a good shape
> (to put it mildly). I have a hard time reviewing a non-holistic
> concurrency fix. Please tell sould I get perceived as non-constructive,
> I will try to cut back on criticism. 

I'm afraid this is just confusing me :(

> 
> > > And because
> > > of this the broken sequencing in userspace could actually be the kernels
> > > fault.  
> > 
> > Here, I can't follow you at all :(
> >   
> 
> Should we ever deliver a zeroed out IRB to the userspace, for the next
> ioinst it would look like we have no status nor FC bit set. That is, the
> guest could end up with stuff in parallel that was never supposed to
> be in parallel (i.e. broken sequencing because kernel feeds false
> information due to race with unsolicited interrupt).
> 
> Does that help?

Not at all, I'm afraid :( User space programs still need to make sure
they poke the interfaces in the right order IMO...

At this point, I'm mostly confused... I'd prefer to simply fix things
as they come up so that we can finally move forward with the halt/clear
handling (and probably rework the state machine on top of that.)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-23 10:34                   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-23 10:34 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Tue, 22 Jan 2019 20:03:31 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 22 Jan 2019 18:26:17 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 22 Jan 2019 13:46:12 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:

> > > Unsolicited interrupts are another
> > > piece of cake -- I have no idea how may of those do we get.  
> > 
> > They at least don't have the "free the cp before we got final state"
> > bug. But I think both are reasons to get away from "use the BUSY state
> > to ensure the right sequence".
> >   
> 
> I'm not sure I understand you correctly. I was under the impression that
> the whole point in having a state machine was to ensure the states are
> traversed in the right sequence with the right stuff being done on each
> transition. At least in theory.

Sequence in user space programs, not in the state machine.

> 
> You've probably figured out that IMHO vfio-ccw is not in a good shape
> (to put it mildly). I have a hard time reviewing a non-holistic
> concurrency fix. Please tell sould I get perceived as non-constructive,
> I will try to cut back on criticism. 

I'm afraid this is just confusing me :(

> 
> > > And because
> > > of this the broken sequencing in userspace could actually be the kernels
> > > fault.  
> > 
> > Here, I can't follow you at all :(
> >   
> 
> Should we ever deliver a zeroed out IRB to the userspace, for the next
> ioinst it would look like we have no status nor FC bit set. That is, the
> guest could end up with stuff in parallel that was never supposed to
> be in parallel (i.e. broken sequencing because kernel feeds false
> information due to race with unsolicited interrupt).
> 
> Does that help?

Not at all, I'm afraid :( User space programs still need to make sure
they poke the interfaces in the right order IMO...

At this point, I'm mostly confused... I'd prefer to simply fix things
as they come up so that we can finally move forward with the halt/clear
handling (and probably rework the state machine on top of that.)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-23 10:34                   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-23 13:06                     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 13:06 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Wed, 23 Jan 2019 11:34:47 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 20:03:31 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Tue, 22 Jan 2019 18:26:17 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > On Tue, 22 Jan 2019 13:46:12 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > > > Unsolicited interrupts are another
> > > > piece of cake -- I have no idea how may of those do we get.  
> > > 
> > > They at least don't have the "free the cp before we got final state"
> > > bug. But I think both are reasons to get away from "use the BUSY state
> > > to ensure the right sequence".
> > >   
> > 
> > I'm not sure I understand you correctly. I was under the impression that
> > the whole point in having a state machine was to ensure the states are
> > traversed in the right sequence with the right stuff being done on each
> > transition. At least in theory.
> 
> Sequence in user space programs, not in the state machine.
> 

I'm a bit confused.

> > 
> > You've probably figured out that IMHO vfio-ccw is not in a good shape
> > (to put it mildly). I have a hard time reviewing a non-holistic
> > concurrency fix. Please tell sould I get perceived as non-constructive,
> > I will try to cut back on criticism. 
> 
> I'm afraid this is just confusing me :(
> 
> > 
> > > > And because
> > > > of this the broken sequencing in userspace could actually be the kernels
> > > > fault.  
> > > 
> > > Here, I can't follow you at all :(
> > >   
> > 
> > Should we ever deliver a zeroed out IRB to the userspace, for the next
> > ioinst it would look like we have no status nor FC bit set. That is, the
> > guest could end up with stuff in parallel that was never supposed to
> > be in parallel (i.e. broken sequencing because kernel feeds false
> > information due to race with unsolicited interrupt).
> > 
> > Does that help?
> 
> Not at all, I'm afraid :( User space programs still need to make sure
> they poke the interfaces in the right order IMO...
> 

Yes, one can usually think of interfaces as contracts: both sides need
to keep their end for things to work as intended. Unfortunately the
vfio-ccw iterface is not a very well specified one, and that makes
reasoning about right order so much harder.

I was under the impression that the right ordering is dictated by the
SCSW in userspace. E.g. if there is an FC bit set there userspace is not
ought to issue a SSCH request (write to the io_region). The kernel part
however may say 'userspace read the actual SCSW' by signaling
the io_trigger eventfd. Userspace is supposed to read the IRB from the
region and update it's SCSW.

Now if userspace reads a broken SCSW from the IRB, because of a race
(due to poorly written kernel part -- userspace not at fault), it is
going to make wrong assumptions about currently legal and illegal
operations (ordering).

Previously I described a scenario where IRB can break without userspace
being at fault (race between unsolicited interrupt -- can happen at any
time -- and a legit io request). I was under the impression we agreed on
this.

This in turn could lead to userspace violating the contract, as perceived
by the kernel side.

> At this point, I'm mostly confused... I'd prefer to simply fix things
> as they come up so that we can finally move forward with the halt/clear
> handling (and probably rework the state machine on top of that.)
> 

I understand. I guess you will want to send a new version because of the
stuff that got lost in the rebase, or?

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-23 13:06                     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 13:06 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Wed, 23 Jan 2019 11:34:47 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 20:03:31 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Tue, 22 Jan 2019 18:26:17 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > On Tue, 22 Jan 2019 13:46:12 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > > > Unsolicited interrupts are another
> > > > piece of cake -- I have no idea how may of those do we get.  
> > > 
> > > They at least don't have the "free the cp before we got final state"
> > > bug. But I think both are reasons to get away from "use the BUSY state
> > > to ensure the right sequence".
> > >   
> > 
> > I'm not sure I understand you correctly. I was under the impression that
> > the whole point in having a state machine was to ensure the states are
> > traversed in the right sequence with the right stuff being done on each
> > transition. At least in theory.
> 
> Sequence in user space programs, not in the state machine.
> 

I'm a bit confused.

> > 
> > You've probably figured out that IMHO vfio-ccw is not in a good shape
> > (to put it mildly). I have a hard time reviewing a non-holistic
> > concurrency fix. Please tell sould I get perceived as non-constructive,
> > I will try to cut back on criticism. 
> 
> I'm afraid this is just confusing me :(
> 
> > 
> > > > And because
> > > > of this the broken sequencing in userspace could actually be the kernels
> > > > fault.  
> > > 
> > > Here, I can't follow you at all :(
> > >   
> > 
> > Should we ever deliver a zeroed out IRB to the userspace, for the next
> > ioinst it would look like we have no status nor FC bit set. That is, the
> > guest could end up with stuff in parallel that was never supposed to
> > be in parallel (i.e. broken sequencing because kernel feeds false
> > information due to race with unsolicited interrupt).
> > 
> > Does that help?
> 
> Not at all, I'm afraid :( User space programs still need to make sure
> they poke the interfaces in the right order IMO...
> 

Yes, one can usually think of interfaces as contracts: both sides need
to keep their end for things to work as intended. Unfortunately the
vfio-ccw iterface is not a very well specified one, and that makes
reasoning about right order so much harder.

I was under the impression that the right ordering is dictated by the
SCSW in userspace. E.g. if there is an FC bit set there userspace is not
ought to issue a SSCH request (write to the io_region). The kernel part
however may say 'userspace read the actual SCSW' by signaling
the io_trigger eventfd. Userspace is supposed to read the IRB from the
region and update it's SCSW.

Now if userspace reads a broken SCSW from the IRB, because of a race
(due to poorly written kernel part -- userspace not at fault), it is
going to make wrong assumptions about currently legal and illegal
operations (ordering).

Previously I described a scenario where IRB can break without userspace
being at fault (race between unsolicited interrupt -- can happen at any
time -- and a legit io request). I was under the impression we agreed on
this.

This in turn could lead to userspace violating the contract, as perceived
by the kernel side.

> At this point, I'm mostly confused... I'd prefer to simply fix things
> as they come up so that we can finally move forward with the halt/clear
> handling (and probably rework the state machine on top of that.)
> 

I understand. I guess you will want to send a new version because of the
stuff that got lost in the rebase, or?

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-23 10:21       ` [Qemu-devel] " Cornelia Huck
@ 2019-01-23 13:30         ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 13:30 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Wed, 23 Jan 2019 11:21:12 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 19:33:46 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 21 Jan 2019 12:03:51 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > --- a/drivers/s390/cio/vfio_ccw_private.h
> > > +++ b/drivers/s390/cio/vfio_ccw_private.h
> > > @@ -28,6 +28,7 @@
> > >   * @mdev: pointer to the mediated device
> > >   * @nb: notifier for vfio events
> > >   * @io_region: MMIO region to input/output I/O arguments/results
> > > + * @io_mutex: protect against concurrent update of I/O structures  
> > 
> > We could be a bit more specific about what does this mutex guard.
> > Is it only io_region, or cp, irb and the new regions a well? ->state does
> > not seem to be covered, but should need some sort of synchronisation
> > too, or?
> 
> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> 

Yes there was something. I usually do review by sanity checking the
resulting code. IMHO the fsm stuff is broken now and differently broken
after this series. If we don't want to fix what we are touching, maybe a
pointing out ignored problems in patch descriptions and a
minimal-invasive approach could help ease review.

Regards,
Halil

> > 
> > >   * @cp: channel program for the current I/O operation
> > >   * @irb: irb info received from interrupt
> > >   * @scsw: scsw info
> > > @@ -42,6 +43,7 @@ struct vfio_ccw_private {
> > >  	struct mdev_device	*mdev;
> > >  	struct notifier_block	nb;
> > >  	struct ccw_io_region	*io_region;
> > > +	struct mutex		io_mutex;
> > >  
> > >  	struct channel_program	cp;
> > >  	struct irb		irb;
> > > --   
> > 
> 
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-23 13:30         ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 13:30 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Wed, 23 Jan 2019 11:21:12 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 22 Jan 2019 19:33:46 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 21 Jan 2019 12:03:51 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > --- a/drivers/s390/cio/vfio_ccw_private.h
> > > +++ b/drivers/s390/cio/vfio_ccw_private.h
> > > @@ -28,6 +28,7 @@
> > >   * @mdev: pointer to the mediated device
> > >   * @nb: notifier for vfio events
> > >   * @io_region: MMIO region to input/output I/O arguments/results
> > > + * @io_mutex: protect against concurrent update of I/O structures  
> > 
> > We could be a bit more specific about what does this mutex guard.
> > Is it only io_region, or cp, irb and the new regions a well? ->state does
> > not seem to be covered, but should need some sort of synchronisation
> > too, or?
> 
> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> 

Yes there was something. I usually do review by sanity checking the
resulting code. IMHO the fsm stuff is broken now and differently broken
after this series. If we don't want to fix what we are touching, maybe a
pointing out ignored problems in patch descriptions and a
minimal-invasive approach could help ease review.

Regards,
Halil

> > 
> > >   * @cp: channel program for the current I/O operation
> > >   * @irb: irb info received from interrupt
> > >   * @scsw: scsw info
> > > @@ -42,6 +43,7 @@ struct vfio_ccw_private {
> > >  	struct mdev_device	*mdev;
> > >  	struct notifier_block	nb;
> > >  	struct ccw_io_region	*io_region;
> > > +	struct mutex		io_mutex;
> > >  
> > >  	struct channel_program	cp;
> > >  	struct irb		irb;
> > > --   
> > 
> 
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-23 13:06                     ` [Qemu-devel] " Halil Pasic
@ 2019-01-23 13:34                       ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-23 13:34 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Wed, 23 Jan 2019 14:06:01 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Wed, 23 Jan 2019 11:34:47 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:

> Yes, one can usually think of interfaces as contracts: both sides need
> to keep their end for things to work as intended. Unfortunately the
> vfio-ccw iterface is not a very well specified one, and that makes
> reasoning about right order so much harder.

That's probably where our disconnect comes from.

> 
> I was under the impression that the right ordering is dictated by the
> SCSW in userspace. E.g. if there is an FC bit set there userspace is not
> ought to issue a SSCH request (write to the io_region). The kernel part
> however may say 'userspace read the actual SCSW' by signaling
> the io_trigger eventfd. Userspace is supposed to read the IRB from the
> region and update it's SCSW.
> 
> Now if userspace reads a broken SCSW from the IRB, because of a race
> (due to poorly written kernel part -- userspace not at fault), it is
> going to make wrong assumptions about currently legal and illegal
> operations (ordering).

My understanding of the interface was that writing to the I/O region
triggers a ssch (unless rejected with error) and that reading it just
gets whatever the kernel wrote there the last time it updated its
internal structures. The eventfd simply triggers to say "the region has
been updated with an IRB", not to say "userspace, read this".

> 
> Previously I described a scenario where IRB can break without userspace
> being at fault (race between unsolicited interrupt -- can happen at any
> time -- and a legit io request). I was under the impression we agreed on
> this.

There is a bug in there (clearing the cp for non-final interrupts), and
it needs to be fixed. I'm not so sure if the unsolicited interrupt
thing is a bug (beyond that the internal state machine is confused).

> 
> This in turn could lead to userspace violating the contract, as perceived
> by the kernel side.

Which contract? ;)

Also, I'm not sure if we'd rather get a deferred cc 1?

> 
> > At this point, I'm mostly confused... I'd prefer to simply fix things
> > as they come up so that we can finally move forward with the halt/clear
> > handling (and probably rework the state machine on top of that.)
> >   
> 
> I understand. I guess you will want to send a new version because of the
> stuff that got lost in the rebase, or?

Yes, I'll send a new version; but I'll wait for more feedback for a bit.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-23 13:34                       ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-23 13:34 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Wed, 23 Jan 2019 14:06:01 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Wed, 23 Jan 2019 11:34:47 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:

> Yes, one can usually think of interfaces as contracts: both sides need
> to keep their end for things to work as intended. Unfortunately the
> vfio-ccw iterface is not a very well specified one, and that makes
> reasoning about right order so much harder.

That's probably where our disconnect comes from.

> 
> I was under the impression that the right ordering is dictated by the
> SCSW in userspace. E.g. if there is an FC bit set there userspace is not
> ought to issue a SSCH request (write to the io_region). The kernel part
> however may say 'userspace read the actual SCSW' by signaling
> the io_trigger eventfd. Userspace is supposed to read the IRB from the
> region and update it's SCSW.
> 
> Now if userspace reads a broken SCSW from the IRB, because of a race
> (due to poorly written kernel part -- userspace not at fault), it is
> going to make wrong assumptions about currently legal and illegal
> operations (ordering).

My understanding of the interface was that writing to the I/O region
triggers a ssch (unless rejected with error) and that reading it just
gets whatever the kernel wrote there the last time it updated its
internal structures. The eventfd simply triggers to say "the region has
been updated with an IRB", not to say "userspace, read this".

> 
> Previously I described a scenario where IRB can break without userspace
> being at fault (race between unsolicited interrupt -- can happen at any
> time -- and a legit io request). I was under the impression we agreed on
> this.

There is a bug in there (clearing the cp for non-final interrupts), and
it needs to be fixed. I'm not so sure if the unsolicited interrupt
thing is a bug (beyond that the internal state machine is confused).

> 
> This in turn could lead to userspace violating the contract, as perceived
> by the kernel side.

Which contract? ;)

Also, I'm not sure if we'd rather get a deferred cc 1?

> 
> > At this point, I'm mostly confused... I'd prefer to simply fix things
> > as they come up so that we can finally move forward with the halt/clear
> > handling (and probably rework the state machine on top of that.)
> >   
> 
> I understand. I guess you will want to send a new version because of the
> stuff that got lost in the rebase, or?

Yes, I'll send a new version; but I'll wait for more feedback for a bit.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-23 15:51     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 15:51 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Mon, 21 Jan 2019 12:03:54 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> Add a region to the vfio-ccw device that can be used to submit
> asynchronous I/O instructions. ssch continues to be handled by the
> existing I/O region; the new region handles hsch and csch.
> 
> Interrupt status continues to be reported through the same channels
> as for ssch.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>

I had a look, and I don't have any new concerns.(New like not raised
before.)

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
@ 2019-01-23 15:51     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 15:51 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Mon, 21 Jan 2019 12:03:54 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> Add a region to the vfio-ccw device that can be used to submit
> asynchronous I/O instructions. ssch continues to be handled by the
> existing I/O region; the new region handles hsch and csch.
> 
> Interrupt status continues to be reported through the same channels
> as for ssch.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>

I had a look, and I don't have any new concerns.(New like not raised
before.)

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [qemu-s390x] [PATCH v2 3/5] vfio-ccw: add capabilities chain
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-23 15:57     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 15:57 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 21 Jan 2019 12:03:52 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> Allow to extend the regions used by vfio-ccw. The first user will be
> handling of halt and clear subchannel.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>

Looks OK to me, but I did not look to hard. I'm likely to invest more
when v3 comes along.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v2 3/5] vfio-ccw: add capabilities chain
@ 2019-01-23 15:57     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-23 15:57 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Mon, 21 Jan 2019 12:03:52 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> Allow to extend the regions used by vfio-ccw. The first user will be
> handling of halt and clear subchannel.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>

Looks OK to me, but I did not look to hard. I'm likely to invest more
when v3 comes along.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-23 13:30         ` [Qemu-devel] " Halil Pasic
@ 2019-01-24 10:05           ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-24 10:05 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Wed, 23 Jan 2019 14:30:51 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Wed, 23 Jan 2019 11:21:12 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 22 Jan 2019 19:33:46 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:
> > >   
> > > > --- a/drivers/s390/cio/vfio_ccw_private.h
> > > > +++ b/drivers/s390/cio/vfio_ccw_private.h
> > > > @@ -28,6 +28,7 @@
> > > >   * @mdev: pointer to the mediated device
> > > >   * @nb: notifier for vfio events
> > > >   * @io_region: MMIO region to input/output I/O arguments/results
> > > > + * @io_mutex: protect against concurrent update of I/O structures    
> > > 
> > > We could be a bit more specific about what does this mutex guard.
> > > Is it only io_region, or cp, irb and the new regions a well? ->state does
> > > not seem to be covered, but should need some sort of synchronisation
> > > too, or?  
> > 
> > I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> >   
> 
> Yes there was something. I usually do review by sanity checking the
> resulting code. IMHO the fsm stuff is broken now and differently broken
> after this series. If we don't want to fix what we are touching, maybe a
> pointing out ignored problems in patch descriptions and a
> minimal-invasive approach could help ease review.

So, would changing the description above to reference "the I/O
regions" (as it will also be taken when writing to the async region)
and stating that this handles concurrent reading/writing of the regions
help? I really don't want to enumerate everything I don't fix...

> 
> Regards,
> Halil
> 
> > >   
> > > >   * @cp: channel program for the current I/O operation
> > > >   * @irb: irb info received from interrupt
> > > >   * @scsw: scsw info
> > > > @@ -42,6 +43,7 @@ struct vfio_ccw_private {
> > > >  	struct mdev_device	*mdev;
> > > >  	struct notifier_block	nb;
> > > >  	struct ccw_io_region	*io_region;
> > > > +	struct mutex		io_mutex;
> > > >  
> > > >  	struct channel_program	cp;
> > > >  	struct irb		irb;
> > > > --     
> > >   
> > 
> >   
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-24 10:05           ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-24 10:05 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Wed, 23 Jan 2019 14:30:51 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Wed, 23 Jan 2019 11:21:12 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Tue, 22 Jan 2019 19:33:46 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Mon, 21 Jan 2019 12:03:51 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:
> > >   
> > > > --- a/drivers/s390/cio/vfio_ccw_private.h
> > > > +++ b/drivers/s390/cio/vfio_ccw_private.h
> > > > @@ -28,6 +28,7 @@
> > > >   * @mdev: pointer to the mediated device
> > > >   * @nb: notifier for vfio events
> > > >   * @io_region: MMIO region to input/output I/O arguments/results
> > > > + * @io_mutex: protect against concurrent update of I/O structures    
> > > 
> > > We could be a bit more specific about what does this mutex guard.
> > > Is it only io_region, or cp, irb and the new regions a well? ->state does
> > > not seem to be covered, but should need some sort of synchronisation
> > > too, or?  
> > 
> > I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> >   
> 
> Yes there was something. I usually do review by sanity checking the
> resulting code. IMHO the fsm stuff is broken now and differently broken
> after this series. If we don't want to fix what we are touching, maybe a
> pointing out ignored problems in patch descriptions and a
> minimal-invasive approach could help ease review.

So, would changing the description above to reference "the I/O
regions" (as it will also be taken when writing to the async region)
and stating that this handles concurrent reading/writing of the regions
help? I really don't want to enumerate everything I don't fix...

> 
> Regards,
> Halil
> 
> > >   
> > > >   * @cp: channel program for the current I/O operation
> > > >   * @irb: irb info received from interrupt
> > > >   * @scsw: scsw info
> > > > @@ -42,6 +43,7 @@ struct vfio_ccw_private {
> > > >  	struct mdev_device	*mdev;
> > > >  	struct notifier_block	nb;
> > > >  	struct ccw_io_region	*io_region;
> > > > +	struct mutex		io_mutex;
> > > >  
> > > >  	struct channel_program	cp;
> > > >  	struct irb		irb;
> > > > --     
> > >   
> > 
> >   
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
  2019-01-23 15:51     ` [Qemu-devel] " Halil Pasic
@ 2019-01-24 10:06       ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-24 10:06 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Wed, 23 Jan 2019 16:51:48 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:54 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Add a region to the vfio-ccw device that can be used to submit
> > asynchronous I/O instructions. ssch continues to be handled by the
> > existing I/O region; the new region handles hsch and csch.
> > 
> > Interrupt status continues to be reported through the same channels
> > as for ssch.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>  
> 
> I had a look, and I don't have any new concerns.(New like not raised
> before.)

So, what was raised before that I did not address?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
@ 2019-01-24 10:06       ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-24 10:06 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Wed, 23 Jan 2019 16:51:48 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 21 Jan 2019 12:03:54 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Add a region to the vfio-ccw device that can be used to submit
> > asynchronous I/O instructions. ssch continues to be handled by the
> > existing I/O region; the new region handles hsch and csch.
> > 
> > Interrupt status continues to be reported through the same channels
> > as for ssch.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>  
> 
> I had a look, and I don't have any new concerns.(New like not raised
> before.)

So, what was raised before that I did not address?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-23 10:21       ` [Qemu-devel] " Cornelia Huck
@ 2019-01-24 10:08         ` Pierre Morel
  -1 siblings, 0 replies; 134+ messages in thread
From: Pierre Morel @ 2019-01-24 10:08 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic
  Cc: linux-s390, Eric Farman, kvm, qemu-s390x, Farhan Ali, qemu-devel,
	Alex Williamson

On 23/01/2019 11:21, Cornelia Huck wrote:
> On Tue, 22 Jan 2019 19:33:46 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Mon, 21 Jan 2019 12:03:51 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>> @@ -28,6 +28,7 @@
>>>    * @mdev: pointer to the mediated device
>>>    * @nb: notifier for vfio events
>>>    * @io_region: MMIO region to input/output I/O arguments/results
>>> + * @io_mutex: protect against concurrent update of I/O structures
>>
>> We could be a bit more specific about what does this mutex guard.
>> Is it only io_region, or cp, irb and the new regions a well? ->state does
>> not seem to be covered, but should need some sort of synchronisation
>> too, or?
> 
> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> 

Yes I postponed this work to not collide with your patch series.

Do you think I should provide a new version of the FSM reworking series 
based on the last comment I got?

I would take into account that the asynchronous commands will come with 
your patch series and only provide the framework changes.


Regards,
Pierre



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-24 10:08         ` Pierre Morel
  0 siblings, 0 replies; 134+ messages in thread
From: Pierre Morel @ 2019-01-24 10:08 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic
  Cc: Eric Farman, Farhan Ali, linux-s390, kvm, Alex Williamson,
	qemu-devel, qemu-s390x

On 23/01/2019 11:21, Cornelia Huck wrote:
> On Tue, 22 Jan 2019 19:33:46 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Mon, 21 Jan 2019 12:03:51 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>> @@ -28,6 +28,7 @@
>>>    * @mdev: pointer to the mediated device
>>>    * @nb: notifier for vfio events
>>>    * @io_region: MMIO region to input/output I/O arguments/results
>>> + * @io_mutex: protect against concurrent update of I/O structures
>>
>> We could be a bit more specific about what does this mutex guard.
>> Is it only io_region, or cp, irb and the new regions a well? ->state does
>> not seem to be covered, but should need some sort of synchronisation
>> too, or?
> 
> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> 

Yes I postponed this work to not collide with your patch series.

Do you think I should provide a new version of the FSM reworking series 
based on the last comment I got?

I would take into account that the asynchronous commands will come with 
your patch series and only provide the framework changes.


Regards,
Pierre



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-24 10:08         ` [Qemu-devel] " Pierre Morel
@ 2019-01-24 10:19           ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-24 10:19 UTC (permalink / raw)
  To: Pierre Morel
  Cc: linux-s390, Eric Farman, kvm, qemu-s390x, Farhan Ali, qemu-devel,
	Halil Pasic, Alex Williamson

On Thu, 24 Jan 2019 11:08:02 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 23/01/2019 11:21, Cornelia Huck wrote:
> > On Tue, 22 Jan 2019 19:33:46 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> >> On Mon, 21 Jan 2019 12:03:51 +0100
> >> Cornelia Huck <cohuck@redhat.com> wrote:
> >>  
> >>> --- a/drivers/s390/cio/vfio_ccw_private.h
> >>> +++ b/drivers/s390/cio/vfio_ccw_private.h
> >>> @@ -28,6 +28,7 @@
> >>>    * @mdev: pointer to the mediated device
> >>>    * @nb: notifier for vfio events
> >>>    * @io_region: MMIO region to input/output I/O arguments/results
> >>> + * @io_mutex: protect against concurrent update of I/O structures  
> >>
> >> We could be a bit more specific about what does this mutex guard.
> >> Is it only io_region, or cp, irb and the new regions a well? ->state does
> >> not seem to be covered, but should need some sort of synchronisation
> >> too, or?  
> > 
> > I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> >   
> 
> Yes I postponed this work to not collide with your patch series.
> 
> Do you think I should provide a new version of the FSM reworking series 
> based on the last comment I got?
> 
> I would take into account that the asynchronous commands will come with 
> your patch series and only provide the framework changes.

This was more an answer to Halil's concerns around state
synchronization. I would prefer to first get this series (or a
variation) into decent shape, and then address state machine handling
on top of that (when we know more about the transitions involved), just
to avoid confusion.

Does that sound reasonable?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-24 10:19           ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-24 10:19 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Halil Pasic, Eric Farman, Farhan Ali, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Thu, 24 Jan 2019 11:08:02 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 23/01/2019 11:21, Cornelia Huck wrote:
> > On Tue, 22 Jan 2019 19:33:46 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> >> On Mon, 21 Jan 2019 12:03:51 +0100
> >> Cornelia Huck <cohuck@redhat.com> wrote:
> >>  
> >>> --- a/drivers/s390/cio/vfio_ccw_private.h
> >>> +++ b/drivers/s390/cio/vfio_ccw_private.h
> >>> @@ -28,6 +28,7 @@
> >>>    * @mdev: pointer to the mediated device
> >>>    * @nb: notifier for vfio events
> >>>    * @io_region: MMIO region to input/output I/O arguments/results
> >>> + * @io_mutex: protect against concurrent update of I/O structures  
> >>
> >> We could be a bit more specific about what does this mutex guard.
> >> Is it only io_region, or cp, irb and the new regions a well? ->state does
> >> not seem to be covered, but should need some sort of synchronisation
> >> too, or?  
> > 
> > I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> >   
> 
> Yes I postponed this work to not collide with your patch series.
> 
> Do you think I should provide a new version of the FSM reworking series 
> based on the last comment I got?
> 
> I would take into account that the asynchronous commands will come with 
> your patch series and only provide the framework changes.

This was more an answer to Halil's concerns around state
synchronization. I would prefer to first get this series (or a
variation) into decent shape, and then address state machine handling
on top of that (when we know more about the transitions involved), just
to avoid confusion.

Does that sound reasonable?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
  2019-01-24 10:06       ` [Qemu-devel] " Cornelia Huck
@ 2019-01-24 10:37         ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-24 10:37 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Thu, 24 Jan 2019 11:06:37 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 23 Jan 2019 16:51:48 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 21 Jan 2019 12:03:54 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > Add a region to the vfio-ccw device that can be used to submit
> > > asynchronous I/O instructions. ssch continues to be handled by the
> > > existing I/O region; the new region handles hsch and csch.
> > > 
> > > Interrupt status continues to be reported through the same channels
> > > as for ssch.
> > > 
> > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>  
> > 
> > I had a look, and I don't have any new concerns.(New like not raised
> > before.)
> 
> So, what was raised before that I did not address?
> 

I had the cp->initialized in mind here. My understanding is that this is
the point at which safe accessors are necessary. But I consider that
addressed.

I'm still not a fan of this try_lock() and -EAGAIN in write (and just
lock() in read approach), for the reasons I stated before. But it ain't
a deal-breaker for me. It is just that I don't get the benefit of the
busy looping userspace.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
@ 2019-01-24 10:37         ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-24 10:37 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Thu, 24 Jan 2019 11:06:37 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 23 Jan 2019 16:51:48 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 21 Jan 2019 12:03:54 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> > 
> > > Add a region to the vfio-ccw device that can be used to submit
> > > asynchronous I/O instructions. ssch continues to be handled by the
> > > existing I/O region; the new region handles hsch and csch.
> > > 
> > > Interrupt status continues to be reported through the same channels
> > > as for ssch.
> > > 
> > > Signed-off-by: Cornelia Huck <cohuck@redhat.com>  
> > 
> > I had a look, and I don't have any new concerns.(New like not raised
> > before.)
> 
> So, what was raised before that I did not address?
> 

I had the cp->initialized in mind here. My understanding is that this is
the point at which safe accessors are necessary. But I consider that
addressed.

I'm still not a fan of this try_lock() and -EAGAIN in write (and just
lock() in read approach), for the reasons I stated before. But it ain't
a deal-breaker for me. It is just that I don't get the benefit of the
busy looping userspace.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-24 10:19           ` [Qemu-devel] " Cornelia Huck
@ 2019-01-24 11:18             ` Pierre Morel
  -1 siblings, 0 replies; 134+ messages in thread
From: Pierre Morel @ 2019-01-24 11:18 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, kvm, qemu-s390x, Farhan Ali, qemu-devel,
	Halil Pasic, Alex Williamson

On 24/01/2019 11:19, Cornelia Huck wrote:
> On Thu, 24 Jan 2019 11:08:02 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 23/01/2019 11:21, Cornelia Huck wrote:
>>> On Tue, 22 Jan 2019 19:33:46 +0100
>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>    
>>>> On Mon, 21 Jan 2019 12:03:51 +0100
>>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>>   
>>>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>>>> @@ -28,6 +28,7 @@
>>>>>     * @mdev: pointer to the mediated device
>>>>>     * @nb: notifier for vfio events
>>>>>     * @io_region: MMIO region to input/output I/O arguments/results
>>>>> + * @io_mutex: protect against concurrent update of I/O structures
>>>>
>>>> We could be a bit more specific about what does this mutex guard.
>>>> Is it only io_region, or cp, irb and the new regions a well? ->state does
>>>> not seem to be covered, but should need some sort of synchronisation
>>>> too, or?
>>>
>>> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
>>>    
>>
>> Yes I postponed this work to not collide with your patch series.
>>
>> Do you think I should provide a new version of the FSM reworking series
>> based on the last comment I got?
>>
>> I would take into account that the asynchronous commands will come with
>> your patch series and only provide the framework changes.
> 
> This was more an answer to Halil's concerns around state
> synchronization. I would prefer to first get this series (or a
> variation) into decent shape, and then address state machine handling
> on top of that (when we know more about the transitions involved), just
> to avoid confusion.
> 
> Does that sound reasonable?
> 

Absolutely, this was why I waited with my series. :)


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-24 11:18             ` Pierre Morel
  0 siblings, 0 replies; 134+ messages in thread
From: Pierre Morel @ 2019-01-24 11:18 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Eric Farman, Farhan Ali, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On 24/01/2019 11:19, Cornelia Huck wrote:
> On Thu, 24 Jan 2019 11:08:02 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 23/01/2019 11:21, Cornelia Huck wrote:
>>> On Tue, 22 Jan 2019 19:33:46 +0100
>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>    
>>>> On Mon, 21 Jan 2019 12:03:51 +0100
>>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>>   
>>>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>>>> @@ -28,6 +28,7 @@
>>>>>     * @mdev: pointer to the mediated device
>>>>>     * @nb: notifier for vfio events
>>>>>     * @io_region: MMIO region to input/output I/O arguments/results
>>>>> + * @io_mutex: protect against concurrent update of I/O structures
>>>>
>>>> We could be a bit more specific about what does this mutex guard.
>>>> Is it only io_region, or cp, irb and the new regions a well? ->state does
>>>> not seem to be covered, but should need some sort of synchronisation
>>>> too, or?
>>>
>>> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
>>>    
>>
>> Yes I postponed this work to not collide with your patch series.
>>
>> Do you think I should provide a new version of the FSM reworking series
>> based on the last comment I got?
>>
>> I would take into account that the asynchronous commands will come with
>> your patch series and only provide the framework changes.
> 
> This was more an answer to Halil's concerns around state
> synchronization. I would prefer to first get this series (or a
> variation) into decent shape, and then address state machine handling
> on top of that (when we know more about the transitions involved), just
> to avoid confusion.
> 
> Does that sound reasonable?
> 

Absolutely, this was why I waited with my series. :)


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-24 10:19           ` [Qemu-devel] " Cornelia Huck
@ 2019-01-24 11:45             ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-24 11:45 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Pierre Morel, kvm, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Thu, 24 Jan 2019 11:19:34 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 24 Jan 2019 11:08:02 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
> > On 23/01/2019 11:21, Cornelia Huck wrote:
> > > On Tue, 22 Jan 2019 19:33:46 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > >   
> > >> On Mon, 21 Jan 2019 12:03:51 +0100
> > >> Cornelia Huck <cohuck@redhat.com> wrote:
> > >>  
> > >>> --- a/drivers/s390/cio/vfio_ccw_private.h
> > >>> +++ b/drivers/s390/cio/vfio_ccw_private.h
> > >>> @@ -28,6 +28,7 @@
> > >>>    * @mdev: pointer to the mediated device
> > >>>    * @nb: notifier for vfio events
> > >>>    * @io_region: MMIO region to input/output I/O arguments/results
> > >>> + * @io_mutex: protect against concurrent update of I/O structures  
> > >>
> > >> We could be a bit more specific about what does this mutex guard.
> > >> Is it only io_region, or cp, irb and the new regions a well? ->state does
> > >> not seem to be covered, but should need some sort of synchronisation
> > >> too, or?  
> > > 
> > > I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> > >   
> > 
> > Yes I postponed this work to not collide with your patch series.
> > 
> > Do you think I should provide a new version of the FSM reworking series 
> > based on the last comment I got?
> > 
> > I would take into account that the asynchronous commands will come with 
> > your patch series and only provide the framework changes.
> 
> This was more an answer to Halil's concerns around state
> synchronization. I would prefer to first get this series (or a
> variation) into decent shape, and then address state machine handling
> on top of that (when we know more about the transitions involved), just
> to avoid confusion.
> 
> Does that sound reasonable?
> 

I would like the two hitting the same kernel release. In that case I'm
fine with deferring some of the concurrency fixes after the csch/hsch
stuff. Otherwise I would have a bad feeling about increasing the
complexity without fixing known bugs.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-24 11:45             ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-24 11:45 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Pierre Morel, Eric Farman, Farhan Ali, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Thu, 24 Jan 2019 11:19:34 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 24 Jan 2019 11:08:02 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
> > On 23/01/2019 11:21, Cornelia Huck wrote:
> > > On Tue, 22 Jan 2019 19:33:46 +0100
> > > Halil Pasic <pasic@linux.ibm.com> wrote:
> > >   
> > >> On Mon, 21 Jan 2019 12:03:51 +0100
> > >> Cornelia Huck <cohuck@redhat.com> wrote:
> > >>  
> > >>> --- a/drivers/s390/cio/vfio_ccw_private.h
> > >>> +++ b/drivers/s390/cio/vfio_ccw_private.h
> > >>> @@ -28,6 +28,7 @@
> > >>>    * @mdev: pointer to the mediated device
> > >>>    * @nb: notifier for vfio events
> > >>>    * @io_region: MMIO region to input/output I/O arguments/results
> > >>> + * @io_mutex: protect against concurrent update of I/O structures  
> > >>
> > >> We could be a bit more specific about what does this mutex guard.
> > >> Is it only io_region, or cp, irb and the new regions a well? ->state does
> > >> not seem to be covered, but should need some sort of synchronisation
> > >> too, or?  
> > > 
> > > I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
> > >   
> > 
> > Yes I postponed this work to not collide with your patch series.
> > 
> > Do you think I should provide a new version of the FSM reworking series 
> > based on the last comment I got?
> > 
> > I would take into account that the asynchronous commands will come with 
> > your patch series and only provide the framework changes.
> 
> This was more an answer to Halil's concerns around state
> synchronization. I would prefer to first get this series (or a
> variation) into decent shape, and then address state machine handling
> on top of that (when we know more about the transitions involved), just
> to avoid confusion.
> 
> Does that sound reasonable?
> 

I would like the two hitting the same kernel release. In that case I'm
fine with deferring some of the concurrency fixes after the csch/hsch
stuff. Otherwise I would have a bad feeling about increasing the
complexity without fixing known bugs.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-24 10:19           ` [Qemu-devel] " Cornelia Huck
@ 2019-01-24 19:14             ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-24 19:14 UTC (permalink / raw)
  To: Cornelia Huck, Pierre Morel
  Cc: linux-s390, kvm, qemu-s390x, Farhan Ali, qemu-devel, Halil Pasic,
	Alex Williamson



On 01/24/2019 05:19 AM, Cornelia Huck wrote:
> On Thu, 24 Jan 2019 11:08:02 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 23/01/2019 11:21, Cornelia Huck wrote:
>>> On Tue, 22 Jan 2019 19:33:46 +0100
>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>    
>>>> On Mon, 21 Jan 2019 12:03:51 +0100
>>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>>   
>>>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>>>> @@ -28,6 +28,7 @@
>>>>>     * @mdev: pointer to the mediated device
>>>>>     * @nb: notifier for vfio events
>>>>>     * @io_region: MMIO region to input/output I/O arguments/results
>>>>> + * @io_mutex: protect against concurrent update of I/O structures
>>>>
>>>> We could be a bit more specific about what does this mutex guard.
>>>> Is it only io_region, or cp, irb and the new regions a well? ->state does
>>>> not seem to be covered, but should need some sort of synchronisation
>>>> too, or?
>>>
>>> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
>>>    
>>
>> Yes I postponed this work to not collide with your patch series.
>>
>> Do you think I should provide a new version of the FSM reworking series
>> based on the last comment I got?
>>
>> I would take into account that the asynchronous commands will come with
>> your patch series and only provide the framework changes.
> 
> This was more an answer to Halil's concerns around state
> synchronization. I would prefer to first get this series (or a
> variation) into decent shape, and then address state machine handling
> on top of that (when we know more about the transitions involved), just
> to avoid confusion.
> 
> Does that sound reasonable?
> 

It does to me.

<Sorry for my silence; we teach our daughter to share, and she shares 
whatever bug is passed around daycare.  I'm catching up on my "todo" 
emails now!>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-24 19:14             ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-24 19:14 UTC (permalink / raw)
  To: Cornelia Huck, Pierre Morel
  Cc: Halil Pasic, Farhan Ali, linux-s390, kvm, Alex Williamson,
	qemu-devel, qemu-s390x



On 01/24/2019 05:19 AM, Cornelia Huck wrote:
> On Thu, 24 Jan 2019 11:08:02 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 23/01/2019 11:21, Cornelia Huck wrote:
>>> On Tue, 22 Jan 2019 19:33:46 +0100
>>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>>    
>>>> On Mon, 21 Jan 2019 12:03:51 +0100
>>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>>   
>>>>> --- a/drivers/s390/cio/vfio_ccw_private.h
>>>>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>>>>> @@ -28,6 +28,7 @@
>>>>>     * @mdev: pointer to the mediated device
>>>>>     * @nb: notifier for vfio events
>>>>>     * @io_region: MMIO region to input/output I/O arguments/results
>>>>> + * @io_mutex: protect against concurrent update of I/O structures
>>>>
>>>> We could be a bit more specific about what does this mutex guard.
>>>> Is it only io_region, or cp, irb and the new regions a well? ->state does
>>>> not seem to be covered, but should need some sort of synchronisation
>>>> too, or?
>>>
>>> I'm not sure. IIRC Pierre had some ideas about locking in the fsm?
>>>    
>>
>> Yes I postponed this work to not collide with your patch series.
>>
>> Do you think I should provide a new version of the FSM reworking series
>> based on the last comment I got?
>>
>> I would take into account that the asynchronous commands will come with
>> your patch series and only provide the framework changes.
> 
> This was more an answer to Halil's concerns around state
> synchronization. I would prefer to first get this series (or a
> variation) into decent shape, and then address state machine handling
> on top of that (when we know more about the transitions involved), just
> to avoid confusion.
> 
> Does that sound reasonable?
> 

It does to me.

<Sorry for my silence; we teach our daughter to share, and she shares 
whatever bug is passed around daycare.  I'm catching up on my "todo" 
emails now!>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-23 13:34                       ` [Qemu-devel] " Cornelia Huck
@ 2019-01-24 19:16                         ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-24 19:16 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic
  Cc: linux-s390, Pierre Morel, kvm, qemu-s390x, Farhan Ali,
	qemu-devel, Alex Williamson



On 01/23/2019 08:34 AM, Cornelia Huck wrote:
> On Wed, 23 Jan 2019 14:06:01 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Wed, 23 Jan 2019 11:34:47 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> Yes, one can usually think of interfaces as contracts: both sides need
>> to keep their end for things to work as intended. Unfortunately the
>> vfio-ccw iterface is not a very well specified one, and that makes
>> reasoning about right order so much harder.
> 
> That's probably where our disconnect comes from.
> 
>>
>> I was under the impression that the right ordering is dictated by the
>> SCSW in userspace. E.g. if there is an FC bit set there userspace is not
>> ought to issue a SSCH request (write to the io_region). The kernel part
>> however may say 'userspace read the actual SCSW' by signaling
>> the io_trigger eventfd. Userspace is supposed to read the IRB from the
>> region and update it's SCSW.
>>
>> Now if userspace reads a broken SCSW from the IRB, because of a race
>> (due to poorly written kernel part -- userspace not at fault), it is
>> going to make wrong assumptions about currently legal and illegal
>> operations (ordering).
> 
> My understanding of the interface was that writing to the I/O region
> triggers a ssch (unless rejected with error) and that reading it just
> gets whatever the kernel wrote there the last time it updated its
> internal structures. The eventfd simply triggers to say "the region has
> been updated with an IRB", not to say "userspace, read this".
> 
>>
>> Previously I described a scenario where IRB can break without userspace
>> being at fault (race between unsolicited interrupt -- can happen at any
>> time -- and a legit io request). I was under the impression we agreed on
>> this.
> 
> There is a bug in there (clearing the cp for non-final interrupts), and
> it needs to be fixed. I'm not so sure if the unsolicited interrupt
> thing is a bug (beyond that the internal state machine is confused).
> 
>>
>> This in turn could lead to userspace violating the contract, as perceived
>> by the kernel side.
> 
> Which contract? ;)
> 
> Also, I'm not sure if we'd rather get a deferred cc 1?

As I'm encountering dcc=1 quite regularly lately, it's a nice error. 
But we don't have a good way of recovering from it, and so my test tends 
to go down in a heap quite quickly.  This patch set will probably help; 
I should really get it applied and try it out.

> 
>>
>>> At this point, I'm mostly confused... I'd prefer to simply fix things
>>> as they come up so that we can finally move forward with the halt/clear
>>> handling (and probably rework the state machine on top of that.)

+1 for fixing things as we go.  I hear the complaints about this code 
(and probably say them too), but remain convinced that a large rewrite 
is unnecessary.  Lots of opportunities for improvement, with lots of 
willing and motivated participants, means it can only get better!

>>>    
>>
>> I understand. I guess you will want to send a new version because of the
>> stuff that got lost in the rebase, or?
> 
> Yes, I'll send a new version; but I'll wait for more feedback for a bit.
> 

I'll try to provide some now.  Still digging through the emails marked 
"todo" :)

  - Eric

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-24 19:16                         ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-24 19:16 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic
  Cc: Farhan Ali, Pierre Morel, linux-s390, kvm, Alex Williamson,
	qemu-devel, qemu-s390x



On 01/23/2019 08:34 AM, Cornelia Huck wrote:
> On Wed, 23 Jan 2019 14:06:01 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Wed, 23 Jan 2019 11:34:47 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> Yes, one can usually think of interfaces as contracts: both sides need
>> to keep their end for things to work as intended. Unfortunately the
>> vfio-ccw iterface is not a very well specified one, and that makes
>> reasoning about right order so much harder.
> 
> That's probably where our disconnect comes from.
> 
>>
>> I was under the impression that the right ordering is dictated by the
>> SCSW in userspace. E.g. if there is an FC bit set there userspace is not
>> ought to issue a SSCH request (write to the io_region). The kernel part
>> however may say 'userspace read the actual SCSW' by signaling
>> the io_trigger eventfd. Userspace is supposed to read the IRB from the
>> region and update it's SCSW.
>>
>> Now if userspace reads a broken SCSW from the IRB, because of a race
>> (due to poorly written kernel part -- userspace not at fault), it is
>> going to make wrong assumptions about currently legal and illegal
>> operations (ordering).
> 
> My understanding of the interface was that writing to the I/O region
> triggers a ssch (unless rejected with error) and that reading it just
> gets whatever the kernel wrote there the last time it updated its
> internal structures. The eventfd simply triggers to say "the region has
> been updated with an IRB", not to say "userspace, read this".
> 
>>
>> Previously I described a scenario where IRB can break without userspace
>> being at fault (race between unsolicited interrupt -- can happen at any
>> time -- and a legit io request). I was under the impression we agreed on
>> this.
> 
> There is a bug in there (clearing the cp for non-final interrupts), and
> it needs to be fixed. I'm not so sure if the unsolicited interrupt
> thing is a bug (beyond that the internal state machine is confused).
> 
>>
>> This in turn could lead to userspace violating the contract, as perceived
>> by the kernel side.
> 
> Which contract? ;)
> 
> Also, I'm not sure if we'd rather get a deferred cc 1?

As I'm encountering dcc=1 quite regularly lately, it's a nice error. 
But we don't have a good way of recovering from it, and so my test tends 
to go down in a heap quite quickly.  This patch set will probably help; 
I should really get it applied and try it out.

> 
>>
>>> At this point, I'm mostly confused... I'd prefer to simply fix things
>>> as they come up so that we can finally move forward with the halt/clear
>>> handling (and probably rework the state machine on top of that.)

+1 for fixing things as we go.  I hear the complaints about this code 
(and probably say them too), but remain convinced that a large rewrite 
is unnecessary.  Lots of opportunities for improvement, with lots of 
willing and motivated participants, means it can only get better!

>>>    
>>
>> I understand. I guess you will want to send a new version because of the
>> stuff that got lost in the rebase, or?
> 
> Yes, I'll send a new version; but I'll wait for more feedback for a bit.
> 

I'll try to provide some now.  Still digging through the emails marked 
"todo" :)

  - Eric

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25  2:25     ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25  2:25 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, qemu-s390x, Alex Williamson, qemu-devel, kvm



On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> Rework handling of multiple I/O requests to return -EAGAIN if
> we are already processing an I/O request. Introduce a mutex
> to disallow concurrent writes to the I/O region.
> 
> The expectation is that userspace simply retries the operation
> if it gets -EAGAIN.
> 
> We currently don't allow multiple ssch requests at the same
> time, as we don't have support for keeping channel programs
> around for more than one request.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>   drivers/s390/cio/vfio_ccw_drv.c     |  1 +
>   drivers/s390/cio/vfio_ccw_fsm.c     |  8 +++-----
>   drivers/s390/cio/vfio_ccw_ops.c     | 31 +++++++++++++++++++----------
>   drivers/s390/cio/vfio_ccw_private.h |  2 ++
>   4 files changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index a10cec0e86eb..2ef189fe45ed 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -125,6 +125,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   
>   	private->sch = sch;
>   	dev_set_drvdata(&sch->dev, private);
> +	mutex_init(&private->io_mutex);
>   
>   	spin_lock_irq(sch->lock);
>   	private->state = VFIO_CCW_STATE_NOT_OPER;
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index cab17865aafe..f6ed934cc565 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -28,7 +28,6 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
>   	sch = private->sch;
>   
>   	spin_lock_irqsave(sch->lock, flags);
> -	private->state = VFIO_CCW_STATE_BUSY;

[1]

>   
>   	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
>   
> @@ -42,6 +41,8 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
>   		 */
>   		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
>   		ret = 0;
> +		/* Don't allow another ssch for now */
> +		private->state = VFIO_CCW_STATE_BUSY;

[1]

>   		break;
>   	case 1:		/* Status pending */
>   	case 2:		/* Busy */
> @@ -99,7 +100,7 @@ static void fsm_io_error(struct vfio_ccw_private *private,
>   static void fsm_io_busy(struct vfio_ccw_private *private,
>   			enum vfio_ccw_event event)
>   {
> -	private->io_region->ret_code = -EBUSY;
> +	private->io_region->ret_code = -EAGAIN;
>   }
>   
>   static void fsm_disabled_irq(struct vfio_ccw_private *private,
> @@ -130,8 +131,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   	struct mdev_device *mdev = private->mdev;
>   	char *errstr = "request";
>   
> -	private->state = VFIO_CCW_STATE_BUSY;
> -

[1]

>   	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
>   
>   	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
> @@ -176,7 +175,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   	}
>   
>   err_out:
> -	private->state = VFIO_CCW_STATE_IDLE;

[1] I think these changes are cool.  We end up going into (and staying 
in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
bumble along.

But why can't these be separated out from this patch?  It does change 
the behavior of the state machine, and seem distinct from the addition 
of the mutex you otherwise add here?  At the very least, this behavior 
change should be documented in the commit since it's otherwise lost in 
the mutex/EAGAIN stuff.

>   	trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
>   			       io_region->ret_code, errstr);
>   }
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index f673e106c041..3fa9fc570400 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
>   {
>   	struct vfio_ccw_private *private;
>   	struct ccw_io_region *region;
> +	int ret;
>   
>   	if (*ppos + count > sizeof(*region))
>   		return -EINVAL;
>   
>   	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	mutex_lock(&private->io_mutex);
>   	region = private->io_region;
>   	if (copy_to_user(buf, (void *)region + *ppos, count))
> -		return -EFAULT;
> -
> -	return count;
> +		ret = -EFAULT;
> +	else
> +		ret = count;
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
>   }
>   
>   static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>   {
>   	struct vfio_ccw_private *private;
>   	struct ccw_io_region *region;
> +	int ret;
>   
>   	if (*ppos + count > sizeof(*region))
>   		return -EINVAL;
>   
>   	private = dev_get_drvdata(mdev_parent_dev(mdev));
> -	if (private->state != VFIO_CCW_STATE_IDLE)
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
>   		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;

Ah, I see Halil's difficulty here.

It is true there is a race condition today, and that this doesn't 
address it.  That's fine, add it to the todo list.  But even with that, 
I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
serialized (one will get kicked out with a failed trylock() call), while 
still leaving the window open between cc=0 on the SSCH and the 
subsequent interrupt.  In the latter case, a second SSCH will come 
through here, do the copy_from_user below, and then jump to fsm_io_busy 
to return EAGAIN.  Do we really want to stomp on io_region in that case? 
  Why can't we simply return EAGAIN if state==BUSY?

>   
>   	region = private->io_region;
> -	if (copy_from_user((void *)region + *ppos, buf, count))
> -		return -EFAULT;
> +	if (copy_from_user((void *)region + *ppos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
>   
>   	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ)
> -	if (region->ret_code != 0) {
> -		private->state = VFIO_CCW_STATE_IDLE;

[1] (above)

> -		return region->ret_code;
> -	}
> +	ret = (region->ret_code != 0) ? region->ret_code : count;
>   
> -	return count;
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
>   }
>   
>   static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 08e9a7dc9176..e88237697f83 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -28,6 +28,7 @@
>    * @mdev: pointer to the mediated device
>    * @nb: notifier for vfio events
>    * @io_region: MMIO region to input/output I/O arguments/results
> + * @io_mutex: protect against concurrent update of I/O structures
>    * @cp: channel program for the current I/O operation
>    * @irb: irb info received from interrupt
>    * @scsw: scsw info
> @@ -42,6 +43,7 @@ struct vfio_ccw_private {
>   	struct mdev_device	*mdev;
>   	struct notifier_block	nb;
>   	struct ccw_io_region	*io_region;
> +	struct mutex		io_mutex;
>   
>   	struct channel_program	cp;
>   	struct irb		irb;
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25  2:25     ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25  2:25 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson



On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> Rework handling of multiple I/O requests to return -EAGAIN if
> we are already processing an I/O request. Introduce a mutex
> to disallow concurrent writes to the I/O region.
> 
> The expectation is that userspace simply retries the operation
> if it gets -EAGAIN.
> 
> We currently don't allow multiple ssch requests at the same
> time, as we don't have support for keeping channel programs
> around for more than one request.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>   drivers/s390/cio/vfio_ccw_drv.c     |  1 +
>   drivers/s390/cio/vfio_ccw_fsm.c     |  8 +++-----
>   drivers/s390/cio/vfio_ccw_ops.c     | 31 +++++++++++++++++++----------
>   drivers/s390/cio/vfio_ccw_private.h |  2 ++
>   4 files changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index a10cec0e86eb..2ef189fe45ed 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -125,6 +125,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   
>   	private->sch = sch;
>   	dev_set_drvdata(&sch->dev, private);
> +	mutex_init(&private->io_mutex);
>   
>   	spin_lock_irq(sch->lock);
>   	private->state = VFIO_CCW_STATE_NOT_OPER;
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index cab17865aafe..f6ed934cc565 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -28,7 +28,6 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
>   	sch = private->sch;
>   
>   	spin_lock_irqsave(sch->lock, flags);
> -	private->state = VFIO_CCW_STATE_BUSY;

[1]

>   
>   	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
>   
> @@ -42,6 +41,8 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
>   		 */
>   		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
>   		ret = 0;
> +		/* Don't allow another ssch for now */
> +		private->state = VFIO_CCW_STATE_BUSY;

[1]

>   		break;
>   	case 1:		/* Status pending */
>   	case 2:		/* Busy */
> @@ -99,7 +100,7 @@ static void fsm_io_error(struct vfio_ccw_private *private,
>   static void fsm_io_busy(struct vfio_ccw_private *private,
>   			enum vfio_ccw_event event)
>   {
> -	private->io_region->ret_code = -EBUSY;
> +	private->io_region->ret_code = -EAGAIN;
>   }
>   
>   static void fsm_disabled_irq(struct vfio_ccw_private *private,
> @@ -130,8 +131,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   	struct mdev_device *mdev = private->mdev;
>   	char *errstr = "request";
>   
> -	private->state = VFIO_CCW_STATE_BUSY;
> -

[1]

>   	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
>   
>   	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
> @@ -176,7 +175,6 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   	}
>   
>   err_out:
> -	private->state = VFIO_CCW_STATE_IDLE;

[1] I think these changes are cool.  We end up going into (and staying 
in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
bumble along.

But why can't these be separated out from this patch?  It does change 
the behavior of the state machine, and seem distinct from the addition 
of the mutex you otherwise add here?  At the very least, this behavior 
change should be documented in the commit since it's otherwise lost in 
the mutex/EAGAIN stuff.

>   	trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
>   			       io_region->ret_code, errstr);
>   }
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index f673e106c041..3fa9fc570400 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
>   {
>   	struct vfio_ccw_private *private;
>   	struct ccw_io_region *region;
> +	int ret;
>   
>   	if (*ppos + count > sizeof(*region))
>   		return -EINVAL;
>   
>   	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	mutex_lock(&private->io_mutex);
>   	region = private->io_region;
>   	if (copy_to_user(buf, (void *)region + *ppos, count))
> -		return -EFAULT;
> -
> -	return count;
> +		ret = -EFAULT;
> +	else
> +		ret = count;
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
>   }
>   
>   static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>   {
>   	struct vfio_ccw_private *private;
>   	struct ccw_io_region *region;
> +	int ret;
>   
>   	if (*ppos + count > sizeof(*region))
>   		return -EINVAL;
>   
>   	private = dev_get_drvdata(mdev_parent_dev(mdev));
> -	if (private->state != VFIO_CCW_STATE_IDLE)
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
>   		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;

Ah, I see Halil's difficulty here.

It is true there is a race condition today, and that this doesn't 
address it.  That's fine, add it to the todo list.  But even with that, 
I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
serialized (one will get kicked out with a failed trylock() call), while 
still leaving the window open between cc=0 on the SSCH and the 
subsequent interrupt.  In the latter case, a second SSCH will come 
through here, do the copy_from_user below, and then jump to fsm_io_busy 
to return EAGAIN.  Do we really want to stomp on io_region in that case? 
  Why can't we simply return EAGAIN if state==BUSY?

>   
>   	region = private->io_region;
> -	if (copy_from_user((void *)region + *ppos, buf, count))
> -		return -EFAULT;
> +	if (copy_from_user((void *)region + *ppos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
>   
>   	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ)
> -	if (region->ret_code != 0) {
> -		private->state = VFIO_CCW_STATE_IDLE;

[1] (above)

> -		return region->ret_code;
> -	}
> +	ret = (region->ret_code != 0) ? region->ret_code : count;
>   
> -	return count;
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
>   }
>   
>   static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 08e9a7dc9176..e88237697f83 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -28,6 +28,7 @@
>    * @mdev: pointer to the mediated device
>    * @nb: notifier for vfio events
>    * @io_region: MMIO region to input/output I/O arguments/results
> + * @io_mutex: protect against concurrent update of I/O structures
>    * @cp: channel program for the current I/O operation
>    * @irb: irb info received from interrupt
>    * @scsw: scsw info
> @@ -42,6 +43,7 @@ struct vfio_ccw_private {
>   	struct mdev_device	*mdev;
>   	struct notifier_block	nb;
>   	struct ccw_io_region	*io_region;
> +	struct mutex		io_mutex;
>   
>   	struct channel_program	cp;
>   	struct irb		irb;
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25  2:25     ` [Qemu-devel] " Eric Farman
@ 2019-01-25  2:37       ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25  2:37 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, qemu-s390x, Alex Williamson, qemu-devel, kvm



On 01/24/2019 09:25 PM, Eric Farman wrote:
> 
> 
> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>> Rework handling of multiple I/O requests to return -EAGAIN if
>> we are already processing an I/O request. Introduce a mutex
>> to disallow concurrent writes to the I/O region.
>>
>> The expectation is that userspace simply retries the operation
>> if it gets -EAGAIN.
>>
>> We currently don't allow multiple ssch requests at the same
>> time, as we don't have support for keeping channel programs
>> around for more than one request.
>>
>> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_drv.c     |  1 +
>>   drivers/s390/cio/vfio_ccw_fsm.c     |  8 +++-----
>>   drivers/s390/cio/vfio_ccw_ops.c     | 31 +++++++++++++++++++----------
>>   drivers/s390/cio/vfio_ccw_private.h |  2 ++
>>   4 files changed, 26 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c 
>> b/drivers/s390/cio/vfio_ccw_drv.c
>> index a10cec0e86eb..2ef189fe45ed 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -125,6 +125,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>>       private->sch = sch;
>>       dev_set_drvdata(&sch->dev, private);
>> +    mutex_init(&private->io_mutex);
>>       spin_lock_irq(sch->lock);
>>       private->state = VFIO_CCW_STATE_NOT_OPER;
>> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c 
>> b/drivers/s390/cio/vfio_ccw_fsm.c
>> index cab17865aafe..f6ed934cc565 100644
>> --- a/drivers/s390/cio/vfio_ccw_fsm.c
>> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
>> @@ -28,7 +28,6 @@ static int fsm_io_helper(struct vfio_ccw_private 
>> *private)
>>       sch = private->sch;
>>       spin_lock_irqsave(sch->lock, flags);
>> -    private->state = VFIO_CCW_STATE_BUSY;
> 
> [1]
> 
>>       orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
>> @@ -42,6 +41,8 @@ static int fsm_io_helper(struct vfio_ccw_private 
>> *private)
>>            */
>>           sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
>>           ret = 0;
>> +        /* Don't allow another ssch for now */
>> +        private->state = VFIO_CCW_STATE_BUSY;
> 
> [1]
> 
>>           break;
>>       case 1:        /* Status pending */
>>       case 2:        /* Busy */
>> @@ -99,7 +100,7 @@ static void fsm_io_error(struct vfio_ccw_private 
>> *private,
>>   static void fsm_io_busy(struct vfio_ccw_private *private,
>>               enum vfio_ccw_event event)
>>   {
>> -    private->io_region->ret_code = -EBUSY;
>> +    private->io_region->ret_code = -EAGAIN;
>>   }
>>   static void fsm_disabled_irq(struct vfio_ccw_private *private,
>> @@ -130,8 +131,6 @@ static void fsm_io_request(struct vfio_ccw_private 
>> *private,
>>       struct mdev_device *mdev = private->mdev;
>>       char *errstr = "request";
>> -    private->state = VFIO_CCW_STATE_BUSY;
>> -
> 
> [1]
> 
>>       memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
>>       if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
>> @@ -176,7 +175,6 @@ static void fsm_io_request(struct vfio_ccw_private 
>> *private,
>>       }
>>   err_out:
>> -    private->state = VFIO_CCW_STATE_IDLE;
> 
> [1] I think these changes are cool.  We end up going into (and staying 
> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> bumble along.
> 
> But why can't these be separated out from this patch?  It does change 
> the behavior of the state machine, and seem distinct from the addition 
> of the mutex you otherwise add here?  At the very least, this behavior 
> change should be documented in the commit since it's otherwise lost in 
> the mutex/EAGAIN stuff.
> 
>>       trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
>>                      io_region->ret_code, errstr);
>>   }
>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c 
>> b/drivers/s390/cio/vfio_ccw_ops.c
>> index f673e106c041..3fa9fc570400 100644
>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct 
>> mdev_device *mdev,
>>   {
>>       struct vfio_ccw_private *private;
>>       struct ccw_io_region *region;
>> +    int ret;
>>       if (*ppos + count > sizeof(*region))
>>           return -EINVAL;
>>       private = dev_get_drvdata(mdev_parent_dev(mdev));
>> +    mutex_lock(&private->io_mutex);
>>       region = private->io_region;
>>       if (copy_to_user(buf, (void *)region + *ppos, count))
>> -        return -EFAULT;
>> -
>> -    return count;
>> +        ret = -EFAULT;
>> +    else
>> +        ret = count;
>> +    mutex_unlock(&private->io_mutex);
>> +    return ret;
>>   }
>>   static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct 
>> mdev_device *mdev,
>>   {
>>       struct vfio_ccw_private *private;
>>       struct ccw_io_region *region;
>> +    int ret;
>>       if (*ppos + count > sizeof(*region))
>>           return -EINVAL;
>>       private = dev_get_drvdata(mdev_parent_dev(mdev));
>> -    if (private->state != VFIO_CCW_STATE_IDLE)
>> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>> +        private->state == VFIO_CCW_STATE_STANDBY)
>>           return -EACCES;
>> +    if (!mutex_trylock(&private->io_mutex))
>> +        return -EAGAIN;
> 
> Ah, I see Halil's difficulty here.
> 
> It is true there is a race condition today, and that this doesn't 
> address it.  That's fine, add it to the todo list.  But even with that, 
> I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
> serialized (one will get kicked out with a failed trylock() call), while 
> still leaving the window open between cc=0 on the SSCH and the 
> subsequent interrupt.  In the latter case, a second SSCH will come 
> through here, do the copy_from_user below, and then jump to fsm_io_busy 
> to return EAGAIN.  Do we really want to stomp on io_region in that case? 
>   Why can't we simply return EAGAIN if state==BUSY?

(Answering my own questions as I skim patch 5...)

Because of course this series is for async handling, while I was looking 
specifically at the synchronous code that exists today.  I guess then my 
question just remains on how the mutex is adding protection in the sync 
case, because that's still not apparent to me.  (Perhaps I missed it in 
a reply to Halil; if so I apologize, there were a lot when I returned.)

> 
>>       region = private->io_region;
>> -    if (copy_from_user((void *)region + *ppos, buf, count))
>> -        return -EFAULT;
>> +    if (copy_from_user((void *)region + *ppos, buf, count)) {
>> +        ret = -EFAULT;
>> +        goto out_unlock;
>> +    }
>>       vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ)
>> -    if (region->ret_code != 0) {
>> -        private->state = VFIO_CCW_STATE_IDLE;
> 
> [1] (above)
> 
>> -        return region->ret_code;
>> -    }
>> +    ret = (region->ret_code != 0) ? region->ret_code : count;
>> -    return count;
>> +out_unlock:
>> +    mutex_unlock(&private->io_mutex);
>> +    return ret;
>>   }
>>   static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
>> diff --git a/drivers/s390/cio/vfio_ccw_private.h 
>> b/drivers/s390/cio/vfio_ccw_private.h
>> index 08e9a7dc9176..e88237697f83 100644
>> --- a/drivers/s390/cio/vfio_ccw_private.h
>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>> @@ -28,6 +28,7 @@
>>    * @mdev: pointer to the mediated device
>>    * @nb: notifier for vfio events
>>    * @io_region: MMIO region to input/output I/O arguments/results
>> + * @io_mutex: protect against concurrent update of I/O structures
>>    * @cp: channel program for the current I/O operation
>>    * @irb: irb info received from interrupt
>>    * @scsw: scsw info
>> @@ -42,6 +43,7 @@ struct vfio_ccw_private {
>>       struct mdev_device    *mdev;
>>       struct notifier_block    nb;
>>       struct ccw_io_region    *io_region;
>> +    struct mutex        io_mutex;
>>       struct channel_program    cp;
>>       struct irb        irb;
>>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25  2:37       ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25  2:37 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson



On 01/24/2019 09:25 PM, Eric Farman wrote:
> 
> 
> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>> Rework handling of multiple I/O requests to return -EAGAIN if
>> we are already processing an I/O request. Introduce a mutex
>> to disallow concurrent writes to the I/O region.
>>
>> The expectation is that userspace simply retries the operation
>> if it gets -EAGAIN.
>>
>> We currently don't allow multiple ssch requests at the same
>> time, as we don't have support for keeping channel programs
>> around for more than one request.
>>
>> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_drv.c     |  1 +
>>   drivers/s390/cio/vfio_ccw_fsm.c     |  8 +++-----
>>   drivers/s390/cio/vfio_ccw_ops.c     | 31 +++++++++++++++++++----------
>>   drivers/s390/cio/vfio_ccw_private.h |  2 ++
>>   4 files changed, 26 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c 
>> b/drivers/s390/cio/vfio_ccw_drv.c
>> index a10cec0e86eb..2ef189fe45ed 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -125,6 +125,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>>       private->sch = sch;
>>       dev_set_drvdata(&sch->dev, private);
>> +    mutex_init(&private->io_mutex);
>>       spin_lock_irq(sch->lock);
>>       private->state = VFIO_CCW_STATE_NOT_OPER;
>> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c 
>> b/drivers/s390/cio/vfio_ccw_fsm.c
>> index cab17865aafe..f6ed934cc565 100644
>> --- a/drivers/s390/cio/vfio_ccw_fsm.c
>> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
>> @@ -28,7 +28,6 @@ static int fsm_io_helper(struct vfio_ccw_private 
>> *private)
>>       sch = private->sch;
>>       spin_lock_irqsave(sch->lock, flags);
>> -    private->state = VFIO_CCW_STATE_BUSY;
> 
> [1]
> 
>>       orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
>> @@ -42,6 +41,8 @@ static int fsm_io_helper(struct vfio_ccw_private 
>> *private)
>>            */
>>           sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
>>           ret = 0;
>> +        /* Don't allow another ssch for now */
>> +        private->state = VFIO_CCW_STATE_BUSY;
> 
> [1]
> 
>>           break;
>>       case 1:        /* Status pending */
>>       case 2:        /* Busy */
>> @@ -99,7 +100,7 @@ static void fsm_io_error(struct vfio_ccw_private 
>> *private,
>>   static void fsm_io_busy(struct vfio_ccw_private *private,
>>               enum vfio_ccw_event event)
>>   {
>> -    private->io_region->ret_code = -EBUSY;
>> +    private->io_region->ret_code = -EAGAIN;
>>   }
>>   static void fsm_disabled_irq(struct vfio_ccw_private *private,
>> @@ -130,8 +131,6 @@ static void fsm_io_request(struct vfio_ccw_private 
>> *private,
>>       struct mdev_device *mdev = private->mdev;
>>       char *errstr = "request";
>> -    private->state = VFIO_CCW_STATE_BUSY;
>> -
> 
> [1]
> 
>>       memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
>>       if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
>> @@ -176,7 +175,6 @@ static void fsm_io_request(struct vfio_ccw_private 
>> *private,
>>       }
>>   err_out:
>> -    private->state = VFIO_CCW_STATE_IDLE;
> 
> [1] I think these changes are cool.  We end up going into (and staying 
> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> bumble along.
> 
> But why can't these be separated out from this patch?  It does change 
> the behavior of the state machine, and seem distinct from the addition 
> of the mutex you otherwise add here?  At the very least, this behavior 
> change should be documented in the commit since it's otherwise lost in 
> the mutex/EAGAIN stuff.
> 
>>       trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
>>                      io_region->ret_code, errstr);
>>   }
>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c 
>> b/drivers/s390/cio/vfio_ccw_ops.c
>> index f673e106c041..3fa9fc570400 100644
>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct 
>> mdev_device *mdev,
>>   {
>>       struct vfio_ccw_private *private;
>>       struct ccw_io_region *region;
>> +    int ret;
>>       if (*ppos + count > sizeof(*region))
>>           return -EINVAL;
>>       private = dev_get_drvdata(mdev_parent_dev(mdev));
>> +    mutex_lock(&private->io_mutex);
>>       region = private->io_region;
>>       if (copy_to_user(buf, (void *)region + *ppos, count))
>> -        return -EFAULT;
>> -
>> -    return count;
>> +        ret = -EFAULT;
>> +    else
>> +        ret = count;
>> +    mutex_unlock(&private->io_mutex);
>> +    return ret;
>>   }
>>   static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct 
>> mdev_device *mdev,
>>   {
>>       struct vfio_ccw_private *private;
>>       struct ccw_io_region *region;
>> +    int ret;
>>       if (*ppos + count > sizeof(*region))
>>           return -EINVAL;
>>       private = dev_get_drvdata(mdev_parent_dev(mdev));
>> -    if (private->state != VFIO_CCW_STATE_IDLE)
>> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>> +        private->state == VFIO_CCW_STATE_STANDBY)
>>           return -EACCES;
>> +    if (!mutex_trylock(&private->io_mutex))
>> +        return -EAGAIN;
> 
> Ah, I see Halil's difficulty here.
> 
> It is true there is a race condition today, and that this doesn't 
> address it.  That's fine, add it to the todo list.  But even with that, 
> I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
> serialized (one will get kicked out with a failed trylock() call), while 
> still leaving the window open between cc=0 on the SSCH and the 
> subsequent interrupt.  In the latter case, a second SSCH will come 
> through here, do the copy_from_user below, and then jump to fsm_io_busy 
> to return EAGAIN.  Do we really want to stomp on io_region in that case? 
>   Why can't we simply return EAGAIN if state==BUSY?

(Answering my own questions as I skim patch 5...)

Because of course this series is for async handling, while I was looking 
specifically at the synchronous code that exists today.  I guess then my 
question just remains on how the mutex is adding protection in the sync 
case, because that's still not apparent to me.  (Perhaps I missed it in 
a reply to Halil; if so I apologize, there were a lot when I returned.)

> 
>>       region = private->io_region;
>> -    if (copy_from_user((void *)region + *ppos, buf, count))
>> -        return -EFAULT;
>> +    if (copy_from_user((void *)region + *ppos, buf, count)) {
>> +        ret = -EFAULT;
>> +        goto out_unlock;
>> +    }
>>       vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ)
>> -    if (region->ret_code != 0) {
>> -        private->state = VFIO_CCW_STATE_IDLE;
> 
> [1] (above)
> 
>> -        return region->ret_code;
>> -    }
>> +    ret = (region->ret_code != 0) ? region->ret_code : count;
>> -    return count;
>> +out_unlock:
>> +    mutex_unlock(&private->io_mutex);
>> +    return ret;
>>   }
>>   static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
>> diff --git a/drivers/s390/cio/vfio_ccw_private.h 
>> b/drivers/s390/cio/vfio_ccw_private.h
>> index 08e9a7dc9176..e88237697f83 100644
>> --- a/drivers/s390/cio/vfio_ccw_private.h
>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>> @@ -28,6 +28,7 @@
>>    * @mdev: pointer to the mediated device
>>    * @nb: notifier for vfio events
>>    * @io_region: MMIO region to input/output I/O arguments/results
>> + * @io_mutex: protect against concurrent update of I/O structures
>>    * @cp: channel program for the current I/O operation
>>    * @irb: irb info received from interrupt
>>    * @scsw: scsw info
>> @@ -42,6 +43,7 @@ struct vfio_ccw_private {
>>       struct mdev_device    *mdev;
>>       struct notifier_block    nb;
>>       struct ccw_io_region    *io_region;
>> +    struct mutex        io_mutex;
>>       struct channel_program    cp;
>>       struct irb        irb;
>>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-24 19:16                         ` [Qemu-devel] " Eric Farman
@ 2019-01-25 10:13                           ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 10:13 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Pierre Morel, kvm, qemu-s390x, Farhan Ali,
	qemu-devel, Halil Pasic, Alex Williamson

On Thu, 24 Jan 2019 14:16:21 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/23/2019 08:34 AM, Cornelia Huck wrote:
> > On Wed, 23 Jan 2019 14:06:01 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> >> On Wed, 23 Jan 2019 11:34:47 +0100
> >> Cornelia Huck <cohuck@redhat.com> wrote:  
> >   
> >> Yes, one can usually think of interfaces as contracts: both sides need
> >> to keep their end for things to work as intended. Unfortunately the
> >> vfio-ccw iterface is not a very well specified one, and that makes
> >> reasoning about right order so much harder.  
> > 
> > That's probably where our disconnect comes from.
> >   
> >>
> >> I was under the impression that the right ordering is dictated by the
> >> SCSW in userspace. E.g. if there is an FC bit set there userspace is not
> >> ought to issue a SSCH request (write to the io_region). The kernel part
> >> however may say 'userspace read the actual SCSW' by signaling
> >> the io_trigger eventfd. Userspace is supposed to read the IRB from the
> >> region and update it's SCSW.
> >>
> >> Now if userspace reads a broken SCSW from the IRB, because of a race
> >> (due to poorly written kernel part -- userspace not at fault), it is
> >> going to make wrong assumptions about currently legal and illegal
> >> operations (ordering).  
> > 
> > My understanding of the interface was that writing to the I/O region
> > triggers a ssch (unless rejected with error) and that reading it just
> > gets whatever the kernel wrote there the last time it updated its
> > internal structures. The eventfd simply triggers to say "the region has
> > been updated with an IRB", not to say "userspace, read this".
> >   
> >>
> >> Previously I described a scenario where IRB can break without userspace
> >> being at fault (race between unsolicited interrupt -- can happen at any
> >> time -- and a legit io request). I was under the impression we agreed on
> >> this.  
> > 
> > There is a bug in there (clearing the cp for non-final interrupts), and
> > it needs to be fixed. I'm not so sure if the unsolicited interrupt
> > thing is a bug (beyond that the internal state machine is confused).
> >   
> >>
> >> This in turn could lead to userspace violating the contract, as perceived
> >> by the kernel side.  
> > 
> > Which contract? ;)
> > 
> > Also, I'm not sure if we'd rather get a deferred cc 1?  
> 
> As I'm encountering dcc=1 quite regularly lately, it's a nice error. 
> But we don't have a good way of recovering from it, and so my test tends 
> to go down in a heap quite quickly.  This patch set will probably help; 
> I should really get it applied and try it out.

The deferred cc 1 is probably more likely simply due to the overhead we
get from intercepting the I/O calls.

> 
> >   
> >>  
> >>> At this point, I'm mostly confused... I'd prefer to simply fix things
> >>> as they come up so that we can finally move forward with the halt/clear
> >>> handling (and probably rework the state machine on top of that.)  
> 
> +1 for fixing things as we go.  I hear the complaints about this code 
> (and probably say them too), but remain convinced that a large rewrite 
> is unnecessary.  Lots of opportunities for improvement, with lots of 
> willing and motivated participants, means it can only get better!

Yeah, the code would probably look a bit different if I started writing
it from scratch now, but I don't think the basic design is unfixably
broken.

> 
> >>>      
> >>
> >> I understand. I guess you will want to send a new version because of the
> >> stuff that got lost in the rebase, or?  
> > 
> > Yes, I'll send a new version; but I'll wait for more feedback for a bit.
> >   
> 
> I'll try to provide some now.  Still digging through the emails marked 
> "todo" :)

Ok, I'll wait for a bit more :)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 10:13                           ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 10:13 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	Alex Williamson, qemu-devel, qemu-s390x

On Thu, 24 Jan 2019 14:16:21 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/23/2019 08:34 AM, Cornelia Huck wrote:
> > On Wed, 23 Jan 2019 14:06:01 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> >> On Wed, 23 Jan 2019 11:34:47 +0100
> >> Cornelia Huck <cohuck@redhat.com> wrote:  
> >   
> >> Yes, one can usually think of interfaces as contracts: both sides need
> >> to keep their end for things to work as intended. Unfortunately the
> >> vfio-ccw iterface is not a very well specified one, and that makes
> >> reasoning about right order so much harder.  
> > 
> > That's probably where our disconnect comes from.
> >   
> >>
> >> I was under the impression that the right ordering is dictated by the
> >> SCSW in userspace. E.g. if there is an FC bit set there userspace is not
> >> ought to issue a SSCH request (write to the io_region). The kernel part
> >> however may say 'userspace read the actual SCSW' by signaling
> >> the io_trigger eventfd. Userspace is supposed to read the IRB from the
> >> region and update it's SCSW.
> >>
> >> Now if userspace reads a broken SCSW from the IRB, because of a race
> >> (due to poorly written kernel part -- userspace not at fault), it is
> >> going to make wrong assumptions about currently legal and illegal
> >> operations (ordering).  
> > 
> > My understanding of the interface was that writing to the I/O region
> > triggers a ssch (unless rejected with error) and that reading it just
> > gets whatever the kernel wrote there the last time it updated its
> > internal structures. The eventfd simply triggers to say "the region has
> > been updated with an IRB", not to say "userspace, read this".
> >   
> >>
> >> Previously I described a scenario where IRB can break without userspace
> >> being at fault (race between unsolicited interrupt -- can happen at any
> >> time -- and a legit io request). I was under the impression we agreed on
> >> this.  
> > 
> > There is a bug in there (clearing the cp for non-final interrupts), and
> > it needs to be fixed. I'm not so sure if the unsolicited interrupt
> > thing is a bug (beyond that the internal state machine is confused).
> >   
> >>
> >> This in turn could lead to userspace violating the contract, as perceived
> >> by the kernel side.  
> > 
> > Which contract? ;)
> > 
> > Also, I'm not sure if we'd rather get a deferred cc 1?  
> 
> As I'm encountering dcc=1 quite regularly lately, it's a nice error. 
> But we don't have a good way of recovering from it, and so my test tends 
> to go down in a heap quite quickly.  This patch set will probably help; 
> I should really get it applied and try it out.

The deferred cc 1 is probably more likely simply due to the overhead we
get from intercepting the I/O calls.

> 
> >   
> >>  
> >>> At this point, I'm mostly confused... I'd prefer to simply fix things
> >>> as they come up so that we can finally move forward with the halt/clear
> >>> handling (and probably rework the state machine on top of that.)  
> 
> +1 for fixing things as we go.  I hear the complaints about this code 
> (and probably say them too), but remain convinced that a large rewrite 
> is unnecessary.  Lots of opportunities for improvement, with lots of 
> willing and motivated participants, means it can only get better!

Yeah, the code would probably look a bit different if I started writing
it from scratch now, but I don't think the basic design is unfixably
broken.

> 
> >>>      
> >>
> >> I understand. I guess you will want to send a new version because of the
> >> stuff that got lost in the rebase, or?  
> > 
> > Yes, I'll send a new version; but I'll wait for more feedback for a bit.
> >   
> 
> I'll try to provide some now.  Still digging through the emails marked 
> "todo" :)

Ok, I'll wait for a bit more :)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25  2:37       ` [Qemu-devel] " Eric Farman
@ 2019-01-25 10:24         ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 10:24 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Thu, 24 Jan 2019 21:37:44 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/24/2019 09:25 PM, Eric Farman wrote:
> > 
> > 
> > On 01/21/2019 06:03 AM, Cornelia Huck wrote:  

> > [1] I think these changes are cool.  We end up going into (and staying 
> > in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> > bumble along.
> > 
> > But why can't these be separated out from this patch?  It does change 
> > the behavior of the state machine, and seem distinct from the addition 
> > of the mutex you otherwise add here?  At the very least, this behavior 
> > change should be documented in the commit since it's otherwise lost in 
> > the mutex/EAGAIN stuff.

That's a very good idea. I'll factor them out into a separate patch.

> >   
> >>       trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
> >>                      io_region->ret_code, errstr);
> >>   }
> >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c 
> >> b/drivers/s390/cio/vfio_ccw_ops.c
> >> index f673e106c041..3fa9fc570400 100644
> >> --- a/drivers/s390/cio/vfio_ccw_ops.c
> >> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> >> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct 
> >> mdev_device *mdev,
> >>   {
> >>       struct vfio_ccw_private *private;
> >>       struct ccw_io_region *region;
> >> +    int ret;
> >>       if (*ppos + count > sizeof(*region))
> >>           return -EINVAL;
> >>       private = dev_get_drvdata(mdev_parent_dev(mdev));
> >> +    mutex_lock(&private->io_mutex);
> >>       region = private->io_region;
> >>       if (copy_to_user(buf, (void *)region + *ppos, count))
> >> -        return -EFAULT;
> >> -
> >> -    return count;
> >> +        ret = -EFAULT;
> >> +    else
> >> +        ret = count;
> >> +    mutex_unlock(&private->io_mutex);
> >> +    return ret;
> >>   }
> >>   static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct 
> >> mdev_device *mdev,
> >>   {
> >>       struct vfio_ccw_private *private;
> >>       struct ccw_io_region *region;
> >> +    int ret;
> >>       if (*ppos + count > sizeof(*region))
> >>           return -EINVAL;
> >>       private = dev_get_drvdata(mdev_parent_dev(mdev));
> >> -    if (private->state != VFIO_CCW_STATE_IDLE)
> >> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> >> +        private->state == VFIO_CCW_STATE_STANDBY)
> >>           return -EACCES;
> >> +    if (!mutex_trylock(&private->io_mutex))
> >> +        return -EAGAIN;  
> > 
> > Ah, I see Halil's difficulty here.
> > 
> > It is true there is a race condition today, and that this doesn't 
> > address it.  That's fine, add it to the todo list.  But even with that, 
> > I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
> > serialized (one will get kicked out with a failed trylock() call), while 
> > still leaving the window open between cc=0 on the SSCH and the 
> > subsequent interrupt.  In the latter case, a second SSCH will come 
> > through here, do the copy_from_user below, and then jump to fsm_io_busy 
> > to return EAGAIN.  Do we really want to stomp on io_region in that case? 
> >   Why can't we simply return EAGAIN if state==BUSY?  
> 
> (Answering my own questions as I skim patch 5...)
> 
> Because of course this series is for async handling, while I was looking 
> specifically at the synchronous code that exists today.  I guess then my 
> question just remains on how the mutex is adding protection in the sync 
> case, because that's still not apparent to me.  (Perhaps I missed it in 
> a reply to Halil; if so I apologize, there were a lot when I returned.)

My idea behind the mutex was to make sure that we get consistent data
when reading/writing (e.g. if one user space thread is reading the I/O
region while another is writing to it).

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 10:24         ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 10:24 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Thu, 24 Jan 2019 21:37:44 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/24/2019 09:25 PM, Eric Farman wrote:
> > 
> > 
> > On 01/21/2019 06:03 AM, Cornelia Huck wrote:  

> > [1] I think these changes are cool.  We end up going into (and staying 
> > in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> > bumble along.
> > 
> > But why can't these be separated out from this patch?  It does change 
> > the behavior of the state machine, and seem distinct from the addition 
> > of the mutex you otherwise add here?  At the very least, this behavior 
> > change should be documented in the commit since it's otherwise lost in 
> > the mutex/EAGAIN stuff.

That's a very good idea. I'll factor them out into a separate patch.

> >   
> >>       trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
> >>                      io_region->ret_code, errstr);
> >>   }
> >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c 
> >> b/drivers/s390/cio/vfio_ccw_ops.c
> >> index f673e106c041..3fa9fc570400 100644
> >> --- a/drivers/s390/cio/vfio_ccw_ops.c
> >> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> >> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct 
> >> mdev_device *mdev,
> >>   {
> >>       struct vfio_ccw_private *private;
> >>       struct ccw_io_region *region;
> >> +    int ret;
> >>       if (*ppos + count > sizeof(*region))
> >>           return -EINVAL;
> >>       private = dev_get_drvdata(mdev_parent_dev(mdev));
> >> +    mutex_lock(&private->io_mutex);
> >>       region = private->io_region;
> >>       if (copy_to_user(buf, (void *)region + *ppos, count))
> >> -        return -EFAULT;
> >> -
> >> -    return count;
> >> +        ret = -EFAULT;
> >> +    else
> >> +        ret = count;
> >> +    mutex_unlock(&private->io_mutex);
> >> +    return ret;
> >>   }
> >>   static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct 
> >> mdev_device *mdev,
> >>   {
> >>       struct vfio_ccw_private *private;
> >>       struct ccw_io_region *region;
> >> +    int ret;
> >>       if (*ppos + count > sizeof(*region))
> >>           return -EINVAL;
> >>       private = dev_get_drvdata(mdev_parent_dev(mdev));
> >> -    if (private->state != VFIO_CCW_STATE_IDLE)
> >> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> >> +        private->state == VFIO_CCW_STATE_STANDBY)
> >>           return -EACCES;
> >> +    if (!mutex_trylock(&private->io_mutex))
> >> +        return -EAGAIN;  
> > 
> > Ah, I see Halil's difficulty here.
> > 
> > It is true there is a race condition today, and that this doesn't 
> > address it.  That's fine, add it to the todo list.  But even with that, 
> > I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
> > serialized (one will get kicked out with a failed trylock() call), while 
> > still leaving the window open between cc=0 on the SSCH and the 
> > subsequent interrupt.  In the latter case, a second SSCH will come 
> > through here, do the copy_from_user below, and then jump to fsm_io_busy 
> > to return EAGAIN.  Do we really want to stomp on io_region in that case? 
> >   Why can't we simply return EAGAIN if state==BUSY?  
> 
> (Answering my own questions as I skim patch 5...)
> 
> Because of course this series is for async handling, while I was looking 
> specifically at the synchronous code that exists today.  I guess then my 
> question just remains on how the mutex is adding protection in the sync 
> case, because that's still not apparent to me.  (Perhaps I missed it in 
> a reply to Halil; if so I apologize, there were a lot when I returned.)

My idea behind the mutex was to make sure that we get consistent data
when reading/writing (e.g. if one user space thread is reading the I/O
region while another is writing to it).

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25  2:25     ` [Qemu-devel] " Eric Farman
@ 2019-01-25 12:58       ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 12:58 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Cornelia Huck, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Thu, 24 Jan 2019 21:25:10 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> >   	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > +	    private->state == VFIO_CCW_STATE_STANDBY)
> >   		return -EACCES;
> > +	if (!mutex_trylock(&private->io_mutex))
> > +		return -EAGAIN;  
> 
> Ah, I see Halil's difficulty here.
> 
> It is true there is a race condition today, and that this doesn't 
> address it.  That's fine, add it to the todo list.  But even with that, 
> I don't see what the mutex is enforcing?  

It is protecting the io regions. AFAIU the idea was that only one
thread is accessing the io region(s) at a time to prevent corruption and
reading half-morphed data.

> Two simultaneous SSCHs will be 
> serialized (one will get kicked out with a failed trylock() call), while 
> still leaving the window open between cc=0 on the SSCH and the 
> subsequent interrupt.  In the latter case, a second SSCH will come 
> through here, do the copy_from_user below, and then jump to fsm_io_busy 
> to return EAGAIN.  Do we really want to stomp on io_region in that case?

I'm not sure I understood you correctly. The interrupt handler does not
take the lock before writing to the io_region. That is one race but it is
easy to fix.

The bigger problem is that between the interrupt handler has written IRB
area and userspace has read it we may end up destroying it by stomping on
it (to use your words). The userspace reading a wrong (given todays qemu
zeroed out) IRB could lead to follow on problems.
 
>   Why can't we simply return EAGAIN if state==BUSY?
> 

Sure we can. That would essentially go back to the old way of things:
if not idle return with error. Just the error code returned would change
form EACCESS to EAGAIN. Which Isn't necessarily a win, because
conceptually here should be never two interleaved io_requests/start
commands hitting the module.


> >   
> >   	region = private->io_region;
> > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > -		return -EFAULT;
> > +	if (copy_from_user((void *)region + *ppos, buf, count)) {
> > +		ret = -EFAULT;
> > +		goto out_unlock;
> > +	}

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 12:58       ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 12:58 UTC (permalink / raw)
  To: Eric Farman
  Cc: Cornelia Huck, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Thu, 24 Jan 2019 21:25:10 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> >   	private = dev_get_drvdata(mdev_parent_dev(mdev));
> > -	if (private->state != VFIO_CCW_STATE_IDLE)
> > +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> > +	    private->state == VFIO_CCW_STATE_STANDBY)
> >   		return -EACCES;
> > +	if (!mutex_trylock(&private->io_mutex))
> > +		return -EAGAIN;  
> 
> Ah, I see Halil's difficulty here.
> 
> It is true there is a race condition today, and that this doesn't 
> address it.  That's fine, add it to the todo list.  But even with that, 
> I don't see what the mutex is enforcing?  

It is protecting the io regions. AFAIU the idea was that only one
thread is accessing the io region(s) at a time to prevent corruption and
reading half-morphed data.

> Two simultaneous SSCHs will be 
> serialized (one will get kicked out with a failed trylock() call), while 
> still leaving the window open between cc=0 on the SSCH and the 
> subsequent interrupt.  In the latter case, a second SSCH will come 
> through here, do the copy_from_user below, and then jump to fsm_io_busy 
> to return EAGAIN.  Do we really want to stomp on io_region in that case?

I'm not sure I understood you correctly. The interrupt handler does not
take the lock before writing to the io_region. That is one race but it is
easy to fix.

The bigger problem is that between the interrupt handler has written IRB
area and userspace has read it we may end up destroying it by stomping on
it (to use your words). The userspace reading a wrong (given todays qemu
zeroed out) IRB could lead to follow on problems.
 
>   Why can't we simply return EAGAIN if state==BUSY?
> 

Sure we can. That would essentially go back to the old way of things:
if not idle return with error. Just the error code returned would change
form EACCESS to EAGAIN. Which Isn't necessarily a win, because
conceptually here should be never two interleaved io_requests/start
commands hitting the module.


> >   
> >   	region = private->io_region;
> > -	if (copy_from_user((void *)region + *ppos, buf, count))
> > -		return -EFAULT;
> > +	if (copy_from_user((void *)region + *ppos, buf, count)) {
> > +		ret = -EFAULT;
> > +		goto out_unlock;
> > +	}

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 10:24         ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25 12:58           ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 12:58 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Fri, 25 Jan 2019 11:24:37 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 24 Jan 2019 21:37:44 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > On 01/24/2019 09:25 PM, Eric Farman wrote:  
> > > 
> > > 
> > > On 01/21/2019 06:03 AM, Cornelia Huck wrote:    
> 
> > > [1] I think these changes are cool.  We end up going into (and staying 
> > > in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> > > bumble along.
> > > 
> > > But why can't these be separated out from this patch?  It does change 
> > > the behavior of the state machine, and seem distinct from the addition 
> > > of the mutex you otherwise add here?  At the very least, this behavior 
> > > change should be documented in the commit since it's otherwise lost in 
> > > the mutex/EAGAIN stuff.  
> 
> That's a very good idea. I'll factor them out into a separate patch.

And now that I've factored it out, I noticed some more problems.

What we basically need is the following, I think:

- The code should not be interrupted while we process the channel
  program, do the ssch etc. We want the caller to try again later (i.e.
  return -EAGAIN)
- We currently do not want the user space to submit another channel
  program while the first one is still in flight. As submitting another
  one is a valid request, however, we should allow this in the future
  (once we have the code to handle that in place).
- With the async interface, we want user space to be able to submit a
  halt/clear while a start request is still in flight, but not while
  we're processing a start request with translation etc. We probably
  want to do -EAGAIN in that case.

My idea would be:

- The BUSY state denotes "I'm busy processing a request right now, try
  again". We hold it while processing the cp and doing the ssch and
  leave it afterwards (i.e., while the start request is processed by
  the hardware). I/O requests and async requests get -EAGAIN in that
  state.
- A new state (CP_PENDING?) is entered after ssch returned with cc 0
  (from the BUSY state). We stay in there as long as no final state for
  that request has been received and delivered. (This may be final
  interrupt for that request, a deferred cc, or successful halt/clear.)
  I/O requests get -EBUSY, async requests are processed. This state can
  be removed again once we are able to handle more than one outstanding
  cp.

Does that make sense?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 12:58           ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 12:58 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 11:24:37 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, 24 Jan 2019 21:37:44 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > On 01/24/2019 09:25 PM, Eric Farman wrote:  
> > > 
> > > 
> > > On 01/21/2019 06:03 AM, Cornelia Huck wrote:    
> 
> > > [1] I think these changes are cool.  We end up going into (and staying 
> > > in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> > > bumble along.
> > > 
> > > But why can't these be separated out from this patch?  It does change 
> > > the behavior of the state machine, and seem distinct from the addition 
> > > of the mutex you otherwise add here?  At the very least, this behavior 
> > > change should be documented in the commit since it's otherwise lost in 
> > > the mutex/EAGAIN stuff.  
> 
> That's a very good idea. I'll factor them out into a separate patch.

And now that I've factored it out, I noticed some more problems.

What we basically need is the following, I think:

- The code should not be interrupted while we process the channel
  program, do the ssch etc. We want the caller to try again later (i.e.
  return -EAGAIN)
- We currently do not want the user space to submit another channel
  program while the first one is still in flight. As submitting another
  one is a valid request, however, we should allow this in the future
  (once we have the code to handle that in place).
- With the async interface, we want user space to be able to submit a
  halt/clear while a start request is still in flight, but not while
  we're processing a start request with translation etc. We probably
  want to do -EAGAIN in that case.

My idea would be:

- The BUSY state denotes "I'm busy processing a request right now, try
  again". We hold it while processing the cp and doing the ssch and
  leave it afterwards (i.e., while the start request is processed by
  the hardware). I/O requests and async requests get -EAGAIN in that
  state.
- A new state (CP_PENDING?) is entered after ssch returned with cc 0
  (from the BUSY state). We stay in there as long as no final state for
  that request has been received and delivered. (This may be final
  interrupt for that request, a deferred cc, or successful halt/clear.)
  I/O requests get -EBUSY, async requests are processed. This state can
  be removed again once we are able to handle more than one outstanding
  cp.

Does that make sense?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25  2:37       ` [Qemu-devel] " Eric Farman
@ 2019-01-25 13:09         ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 13:09 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Cornelia Huck, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Thu, 24 Jan 2019 21:37:44 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> >> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct 
> >> mdev_device *mdev,
> >>   {
> >>       struct vfio_ccw_private *private;
> >>       struct ccw_io_region *region;
> >> +    int ret;
> >>       if (*ppos + count > sizeof(*region))
> >>           return -EINVAL;
> >>       private = dev_get_drvdata(mdev_parent_dev(mdev));
> >> -    if (private->state != VFIO_CCW_STATE_IDLE)
> >> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> >> +        private->state == VFIO_CCW_STATE_STANDBY)
> >>           return -EACCES;
> >> +    if (!mutex_trylock(&private->io_mutex))
> >> +        return -EAGAIN;  
> > 
> > Ah, I see Halil's difficulty here.
> > 
> > It is true there is a race condition today, and that this doesn't 
> > address it.  That's fine, add it to the todo list.  But even with that, 
> > I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
> > serialized (one will get kicked out with a failed trylock() call), while 
> > still leaving the window open between cc=0 on the SSCH and the 
> > subsequent interrupt.  In the latter case, a second SSCH will come 
> > through here, do the copy_from_user below, and then jump to fsm_io_busy 
> > to return EAGAIN.  Do we really want to stomp on io_region in that case? 
> >   Why can't we simply return EAGAIN if state==BUSY?  
> 
> (Answering my own questions as I skim patch 5...)
> 
> Because of course this series is for async handling, while I was looking 
> specifically at the synchronous code that exists today.  I guess then my 
> question just remains on how the mutex is adding protection in the sync 
> case, because that's still not apparent to me.  (Perhaps I missed it in 
> a reply to Halil; if so I apologize, there were a lot when I returned.)

Careful, at the end we have vfio_ccw_mdev_write_io_region() and the
write callback for the capchain regions. We could return EAGAIN if
state==BUSY in the vfio_ccw_mdev_write_io_region() (but I would prefer a
different error code -- see my other response).

I answered your mutex question as well. Just a small addendum the mutex
is not only for the cases the userspace acts sane (but also when it acts
insane;).

Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 13:09         ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 13:09 UTC (permalink / raw)
  To: Eric Farman
  Cc: Cornelia Huck, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Thu, 24 Jan 2019 21:37:44 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> >> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct 
> >> mdev_device *mdev,
> >>   {
> >>       struct vfio_ccw_private *private;
> >>       struct ccw_io_region *region;
> >> +    int ret;
> >>       if (*ppos + count > sizeof(*region))
> >>           return -EINVAL;
> >>       private = dev_get_drvdata(mdev_parent_dev(mdev));
> >> -    if (private->state != VFIO_CCW_STATE_IDLE)
> >> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> >> +        private->state == VFIO_CCW_STATE_STANDBY)
> >>           return -EACCES;
> >> +    if (!mutex_trylock(&private->io_mutex))
> >> +        return -EAGAIN;  
> > 
> > Ah, I see Halil's difficulty here.
> > 
> > It is true there is a race condition today, and that this doesn't 
> > address it.  That's fine, add it to the todo list.  But even with that, 
> > I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be 
> > serialized (one will get kicked out with a failed trylock() call), while 
> > still leaving the window open between cc=0 on the SSCH and the 
> > subsequent interrupt.  In the latter case, a second SSCH will come 
> > through here, do the copy_from_user below, and then jump to fsm_io_busy 
> > to return EAGAIN.  Do we really want to stomp on io_region in that case? 
> >   Why can't we simply return EAGAIN if state==BUSY?  
> 
> (Answering my own questions as I skim patch 5...)
> 
> Because of course this series is for async handling, while I was looking 
> specifically at the synchronous code that exists today.  I guess then my 
> question just remains on how the mutex is adding protection in the sync 
> case, because that's still not apparent to me.  (Perhaps I missed it in 
> a reply to Halil; if so I apologize, there were a lot when I returned.)

Careful, at the end we have vfio_ccw_mdev_write_io_region() and the
write callback for the capchain regions. We could return EAGAIN if
state==BUSY in the vfio_ccw_mdev_write_io_region() (but I would prefer a
different error code -- see my other response).

I answered your mutex question as well. Just a small addendum the mutex
is not only for the cases the userspace acts sane (but also when it acts
insane;).

Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 12:58           ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25 14:01             ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 14:01 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Fri, 25 Jan 2019 13:58:35 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 11:24:37 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Thu, 24 Jan 2019 21:37:44 -0500
> > Eric Farman <farman@linux.ibm.com> wrote:
> > 
> > > On 01/24/2019 09:25 PM, Eric Farman wrote:  
> > > > 
> > > > 
> > > > On 01/21/2019 06:03 AM, Cornelia Huck wrote:    
> > 
> > > > [1] I think these changes are cool.  We end up going into (and staying 
> > > > in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> > > > bumble along.
> > > > 
> > > > But why can't these be separated out from this patch?  It does change 
> > > > the behavior of the state machine, and seem distinct from the addition 
> > > > of the mutex you otherwise add here?  At the very least, this behavior 
> > > > change should be documented in the commit since it's otherwise lost in 
> > > > the mutex/EAGAIN stuff.  
> > 
> > That's a very good idea. I'll factor them out into a separate patch.
> 
> And now that I've factored it out, I noticed some more problems.
> 
> What we basically need is the following, I think:
> 
> - The code should not be interrupted while we process the channel
>   program, do the ssch etc. We want the caller to try again later (i.e.
>   return -EAGAIN)

We could also interrupt it e.g. by a TRANSLATE -> REQ_ABORT_TRANSLATE
state transition. The thread doing the translation could pick that up
and make sure we don't do the ssch(). Would match the architecture
better, but would be more complicated. And can be done any time later.

> - We currently do not want the user space to submit another channel
>   program while the first one is still in flight. As submitting another
>   one is a valid request, however, we should allow this in the future
>   (once we have the code to handle that in place).

I don't agree. There is at most one channel program processed by a
subchannel at any time. I would prefer an early error code if channel
programs are issued on top of each other (our virtual subchannel
is channel pending or FC start function bit set or similar).

Of course the interface exposed by the vfio-ccw module does not need to
look like the architecture interface. But IMHO any
unjustified deviation from the good old architecure ways of things will
just make it harder to reason about stuff. In the end you have that
interface both as input (guests passed-thorough subchannel) and as
output (the subchannel in the host that's being passed-thorough).


> - With the async interface, we want user space to be able to submit a
>   halt/clear while a start request is still in flight, but not while
>   we're processing a start request with translation etc. We probably
>   want to do -EAGAIN in that case.

This reads very similar to your first point.

> 
> My idea would be:
> 
> - The BUSY state denotes "I'm busy processing a request right now, try
>   again". We hold it while processing the cp and doing the ssch and
>   leave it afterwards (i.e., while the start request is processed by
>   the hardware). I/O requests and async requests get -EAGAIN in that
>   state.
> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>   (from the BUSY state). We stay in there as long as no final state for
>   that request has been received and delivered. (This may be final
>   interrupt for that request, a deferred cc, or successful halt/clear.)
>   I/O requests get -EBUSY, async requests are processed. This state can
>   be removed again once we are able to handle more than one outstanding
>   cp.
> 
> Does that make sense?
> 

AFAIU your idea is to split up the busy state into two states: CP_PENDING
and of busy without CP_PENDING called BUSY. I like the idea of having a
separate state for CP_PENDING but I don't like the new semantic of BUSY.

Hm mashing a conceptual state machine and the jumptabe stuff ain't
making reasoning about this simpler either. I'm taking about the
conceptual state machine. It would be nice to have a picture of it and
then think about how to express that in code.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 14:01             ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 14:01 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 13:58:35 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 11:24:37 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Thu, 24 Jan 2019 21:37:44 -0500
> > Eric Farman <farman@linux.ibm.com> wrote:
> > 
> > > On 01/24/2019 09:25 PM, Eric Farman wrote:  
> > > > 
> > > > 
> > > > On 01/21/2019 06:03 AM, Cornelia Huck wrote:    
> > 
> > > > [1] I think these changes are cool.  We end up going into (and staying 
> > > > in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> > > > bumble along.
> > > > 
> > > > But why can't these be separated out from this patch?  It does change 
> > > > the behavior of the state machine, and seem distinct from the addition 
> > > > of the mutex you otherwise add here?  At the very least, this behavior 
> > > > change should be documented in the commit since it's otherwise lost in 
> > > > the mutex/EAGAIN stuff.  
> > 
> > That's a very good idea. I'll factor them out into a separate patch.
> 
> And now that I've factored it out, I noticed some more problems.
> 
> What we basically need is the following, I think:
> 
> - The code should not be interrupted while we process the channel
>   program, do the ssch etc. We want the caller to try again later (i.e.
>   return -EAGAIN)

We could also interrupt it e.g. by a TRANSLATE -> REQ_ABORT_TRANSLATE
state transition. The thread doing the translation could pick that up
and make sure we don't do the ssch(). Would match the architecture
better, but would be more complicated. And can be done any time later.

> - We currently do not want the user space to submit another channel
>   program while the first one is still in flight. As submitting another
>   one is a valid request, however, we should allow this in the future
>   (once we have the code to handle that in place).

I don't agree. There is at most one channel program processed by a
subchannel at any time. I would prefer an early error code if channel
programs are issued on top of each other (our virtual subchannel
is channel pending or FC start function bit set or similar).

Of course the interface exposed by the vfio-ccw module does not need to
look like the architecture interface. But IMHO any
unjustified deviation from the good old architecure ways of things will
just make it harder to reason about stuff. In the end you have that
interface both as input (guests passed-thorough subchannel) and as
output (the subchannel in the host that's being passed-thorough).


> - With the async interface, we want user space to be able to submit a
>   halt/clear while a start request is still in flight, but not while
>   we're processing a start request with translation etc. We probably
>   want to do -EAGAIN in that case.

This reads very similar to your first point.

> 
> My idea would be:
> 
> - The BUSY state denotes "I'm busy processing a request right now, try
>   again". We hold it while processing the cp and doing the ssch and
>   leave it afterwards (i.e., while the start request is processed by
>   the hardware). I/O requests and async requests get -EAGAIN in that
>   state.
> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>   (from the BUSY state). We stay in there as long as no final state for
>   that request has been received and delivered. (This may be final
>   interrupt for that request, a deferred cc, or successful halt/clear.)
>   I/O requests get -EBUSY, async requests are processed. This state can
>   be removed again once we are able to handle more than one outstanding
>   cp.
> 
> Does that make sense?
> 

AFAIU your idea is to split up the busy state into two states: CP_PENDING
and of busy without CP_PENDING called BUSY. I like the idea of having a
separate state for CP_PENDING but I don't like the new semantic of BUSY.

Hm mashing a conceptual state machine and the jumptabe stuff ain't
making reasoning about this simpler either. I'm taking about the
conceptual state machine. It would be nice to have a picture of it and
then think about how to express that in code.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 14:01             ` [Qemu-devel] " Halil Pasic
@ 2019-01-25 14:21               ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 14:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Fri, 25 Jan 2019 15:01:01 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 25 Jan 2019 13:58:35 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:

> > - We currently do not want the user space to submit another channel
> >   program while the first one is still in flight. As submitting another
> >   one is a valid request, however, we should allow this in the future
> >   (once we have the code to handle that in place).  
> 
> I don't agree. There is at most one channel program processed by a
> subchannel at any time. I would prefer an early error code if channel
> programs are issued on top of each other (our virtual subchannel
> is channel pending or FC start function bit set or similar).

You can submit a new request if the subchannel is pending with primary,
but not with secondary state.

Regardless of that, I think it is much easier to push as much as
possible of sorting out of requests to the hardware.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 14:21               ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-25 14:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 15:01:01 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 25 Jan 2019 13:58:35 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:

> > - We currently do not want the user space to submit another channel
> >   program while the first one is still in flight. As submitting another
> >   one is a valid request, however, we should allow this in the future
> >   (once we have the code to handle that in place).  
> 
> I don't agree. There is at most one channel program processed by a
> subchannel at any time. I would prefer an early error code if channel
> programs are issued on top of each other (our virtual subchannel
> is channel pending or FC start function bit set or similar).

You can submit a new request if the subchannel is pending with primary,
but not with secondary state.

Regardless of that, I think it is much easier to push as much as
possible of sorting out of requests to the hardware.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 12:58           ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25 15:57             ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 15:57 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x



On 01/25/2019 07:58 AM, Cornelia Huck wrote:
> On Fri, 25 Jan 2019 11:24:37 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Thu, 24 Jan 2019 21:37:44 -0500
>> Eric Farman <farman@linux.ibm.com> wrote:
>>
>>> On 01/24/2019 09:25 PM, Eric Farman wrote:
>>>>
>>>>
>>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>>
>>>> [1] I think these changes are cool.  We end up going into (and staying
>>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
>>>> bumble along.
>>>>
>>>> But why can't these be separated out from this patch?  It does change
>>>> the behavior of the state machine, and seem distinct from the addition
>>>> of the mutex you otherwise add here?  At the very least, this behavior
>>>> change should be documented in the commit since it's otherwise lost in
>>>> the mutex/EAGAIN stuff.
>>
>> That's a very good idea. I'll factor them out into a separate patch.
> 
> And now that I've factored it out, I noticed some more problems.

That's good!  Maybe it helps us with the circles we're on :)

> 
> What we basically need is the following, I think:
> 
> - The code should not be interrupted while we process the channel
>    program, do the ssch etc. We want the caller to try again later (i.e.
>    return -EAGAIN)
> - We currently do not want the user space to submit another channel
>    program while the first one is still in flight. 

These two seem to contradict one another.  I think you're saying is that 
we don't _want_ userspace to issue another channel program, even though 
its _allowed_ to as far as vfio-ccw is concerned.

As submitting another
>    one is a valid request, however, we should allow this in the future
>    (once we have the code to handle that in place).
> - With the async interface, we want user space to be able to submit a
>    halt/clear while a start request is still in flight, but not while
>    we're processing a start request with translation etc. We probably
>    want to do -EAGAIN in that case.
> 
> My idea would be:
> 
> - The BUSY state denotes "I'm busy processing a request right now, try
>    again". We hold it while processing the cp and doing the ssch and
>    leave it afterwards (i.e., while the start request is processed by
>    the hardware). I/O requests and async requests get -EAGAIN in that
>    state.
> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>    (from the BUSY state). We stay in there as long as no final state for
>    that request has been received and delivered. (This may be final
>    interrupt for that request, a deferred cc, or successful halt/clear.)
>    I/O requests get -EBUSY

I liked CP_PENDING, since it corresponds to the subchannel being marked 
"start pending" as described in POPS, but this statement suggests that 
the BUSY/PENDING state to be swapped, such that state=PENDING returns 
-EAGAIN and state=BUSY returns -EBUSY.  Not super-concerned with the 
terminology though.

, async requests are processed. This state can
>    be removed again once we are able to handle more than one outstanding
>    cp.
> 
> Does that make sense?
> 

I think so, and I think I like it.  So you want to distinguish between 
(I have swapped BUSY/PENDING in this example per my above comment):

A) SSCH issued by userspace (IDLE->PENDING)
B) SSCH issued (successfully) by kernel (PENDING->BUSY)
B') SSCH issued (unsuccessfully) by kernel (PENDING->IDLE?)
C) Interrupt received by kernel (no change?)
D) Interrupt given to userspace (BUSY->IDLE)

If we receive A and A, the second A gets EAGAIN

If we do A+B and A, the second A gets EBUSY (unless async, which is 
processed)

Does the boundary of "in flight" in the interrupt side (C and D) need to 
be defined, such that we go BUSY->PENDING->IDLE instead of BUSY->IDLE ?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 15:57             ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 15:57 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson



On 01/25/2019 07:58 AM, Cornelia Huck wrote:
> On Fri, 25 Jan 2019 11:24:37 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Thu, 24 Jan 2019 21:37:44 -0500
>> Eric Farman <farman@linux.ibm.com> wrote:
>>
>>> On 01/24/2019 09:25 PM, Eric Farman wrote:
>>>>
>>>>
>>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>>
>>>> [1] I think these changes are cool.  We end up going into (and staying
>>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
>>>> bumble along.
>>>>
>>>> But why can't these be separated out from this patch?  It does change
>>>> the behavior of the state machine, and seem distinct from the addition
>>>> of the mutex you otherwise add here?  At the very least, this behavior
>>>> change should be documented in the commit since it's otherwise lost in
>>>> the mutex/EAGAIN stuff.
>>
>> That's a very good idea. I'll factor them out into a separate patch.
> 
> And now that I've factored it out, I noticed some more problems.

That's good!  Maybe it helps us with the circles we're on :)

> 
> What we basically need is the following, I think:
> 
> - The code should not be interrupted while we process the channel
>    program, do the ssch etc. We want the caller to try again later (i.e.
>    return -EAGAIN)
> - We currently do not want the user space to submit another channel
>    program while the first one is still in flight. 

These two seem to contradict one another.  I think you're saying is that 
we don't _want_ userspace to issue another channel program, even though 
its _allowed_ to as far as vfio-ccw is concerned.

As submitting another
>    one is a valid request, however, we should allow this in the future
>    (once we have the code to handle that in place).
> - With the async interface, we want user space to be able to submit a
>    halt/clear while a start request is still in flight, but not while
>    we're processing a start request with translation etc. We probably
>    want to do -EAGAIN in that case.
> 
> My idea would be:
> 
> - The BUSY state denotes "I'm busy processing a request right now, try
>    again". We hold it while processing the cp and doing the ssch and
>    leave it afterwards (i.e., while the start request is processed by
>    the hardware). I/O requests and async requests get -EAGAIN in that
>    state.
> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>    (from the BUSY state). We stay in there as long as no final state for
>    that request has been received and delivered. (This may be final
>    interrupt for that request, a deferred cc, or successful halt/clear.)
>    I/O requests get -EBUSY

I liked CP_PENDING, since it corresponds to the subchannel being marked 
"start pending" as described in POPS, but this statement suggests that 
the BUSY/PENDING state to be swapped, such that state=PENDING returns 
-EAGAIN and state=BUSY returns -EBUSY.  Not super-concerned with the 
terminology though.

, async requests are processed. This state can
>    be removed again once we are able to handle more than one outstanding
>    cp.
> 
> Does that make sense?
> 

I think so, and I think I like it.  So you want to distinguish between 
(I have swapped BUSY/PENDING in this example per my above comment):

A) SSCH issued by userspace (IDLE->PENDING)
B) SSCH issued (successfully) by kernel (PENDING->BUSY)
B') SSCH issued (unsuccessfully) by kernel (PENDING->IDLE?)
C) Interrupt received by kernel (no change?)
D) Interrupt given to userspace (BUSY->IDLE)

If we receive A and A, the second A gets EAGAIN

If we do A+B and A, the second A gets EBUSY (unless async, which is 
processed)

Does the boundary of "in flight" in the interrupt side (C and D) need to 
be defined, such that we go BUSY->PENDING->IDLE instead of BUSY->IDLE ?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 14:21               ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25 16:04                 ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 16:04 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, kvm, Pierre Morel, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Fri, 25 Jan 2019 15:21:54 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 15:01:01 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Fri, 25 Jan 2019 13:58:35 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > > - We currently do not want the user space to submit another channel
> > >   program while the first one is still in flight. As submitting another
> > >   one is a valid request, however, we should allow this in the future
> > >   (once we have the code to handle that in place).  
> > 
> > I don't agree. There is at most one channel program processed by a
> > subchannel at any time. I would prefer an early error code if channel
> > programs are issued on top of each other (our virtual subchannel
> > is channel pending or FC start function bit set or similar).
> 
> You can submit a new request if the subchannel is pending with primary,
> but not with secondary state.
> 
> Regardless of that, I think it is much easier to push as much as
> possible of sorting out of requests to the hardware.
> 

Do we expect userspace/QEMU to fence the bad scenarios as tries to do
today, or is this supposed to change to hardware should sort out
requests whenever possible.

The problem I see with the let the hardware sort it out is that, for that
to work, we need to juggle multiple translations simultaneously (or am I
wrong?). Doing that does not appear particularly simple to me.
Furthermore we would go through all that hassle knowingly that the sole
reason is working around bugs. We still expect our Linux guests
serializing it's ssch() stuff as it does today. Thus I would except this
code not getting the love nor the coverage that would guard against bugs
in that code.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 16:04                 ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-25 16:04 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Fri, 25 Jan 2019 15:21:54 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 15:01:01 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Fri, 25 Jan 2019 13:58:35 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > > - We currently do not want the user space to submit another channel
> > >   program while the first one is still in flight. As submitting another
> > >   one is a valid request, however, we should allow this in the future
> > >   (once we have the code to handle that in place).  
> > 
> > I don't agree. There is at most one channel program processed by a
> > subchannel at any time. I would prefer an early error code if channel
> > programs are issued on top of each other (our virtual subchannel
> > is channel pending or FC start function bit set or similar).
> 
> You can submit a new request if the subchannel is pending with primary,
> but not with secondary state.
> 
> Regardless of that, I think it is much easier to push as much as
> possible of sorting out of requests to the hardware.
> 

Do we expect userspace/QEMU to fence the bad scenarios as tries to do
today, or is this supposed to change to hardware should sort out
requests whenever possible.

The problem I see with the let the hardware sort it out is that, for that
to work, we need to juggle multiple translations simultaneously (or am I
wrong?). Doing that does not appear particularly simple to me.
Furthermore we would go through all that hassle knowingly that the sole
reason is working around bugs. We still expect our Linux guests
serializing it's ssch() stuff as it does today. Thus I would except this
code not getting the love nor the coverage that would guard against bugs
in that code.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 3/5] vfio-ccw: add capabilities chain
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25 16:19     ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 16:19 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, qemu-s390x, Alex Williamson, qemu-devel, kvm



On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> Allow to extend the regions used by vfio-ccw. The first user will be
> handling of halt and clear subchannel.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>   drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
>   drivers/s390/cio/vfio_ccw_private.h |  38 ++++++
>   include/uapi/linux/vfio.h           |   2 +
>   3 files changed, 195 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 3fa9fc570400..5a89d09f9271 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -3,9 +3,11 @@
>    * Physical device callbacks for vfio_ccw
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #include <linux/vfio.h>
> @@ -157,27 +159,33 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev)
>   {
>   	struct vfio_ccw_private *private =
>   		dev_get_drvdata(mdev_parent_dev(mdev));
> +	int i;
>   
>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>   				 &private->nb);
> +
> +	for (i = 0; i < private->num_regions; i++)
> +		private->region[i].ops->release(private, &private->region[i]);
> +
> +	private->num_regions = 0;
> +	kfree(private->region);
> +	private->region = NULL;
>   }
>   
> -static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
> -				  char __user *buf,
> -				  size_t count,
> -				  loff_t *ppos)
> +static ssize_t vfio_ccw_mdev_read_io_region(struct vfio_ccw_private *private,
> +					    char __user *buf, size_t count,
> +					    loff_t *ppos)
>   {
> -	struct vfio_ccw_private *private;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
>   	struct ccw_io_region *region;
>   	int ret;
>   
> -	if (*ppos + count > sizeof(*region))
> +	if (pos + count > sizeof(*region))
>   		return -EINVAL;
>   
> -	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	mutex_lock(&private->io_mutex);
>   	region = private->io_region;
> -	if (copy_to_user(buf, (void *)region + *ppos, count))
> +	if (copy_to_user(buf, (void *)region + pos, count))
>   		ret = -EFAULT;
>   	else
>   		ret = count;
> @@ -185,19 +193,42 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
>   	return ret;
>   }
>   
> -static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> -				   const char __user *buf,
> -				   size_t count,
> -				   loff_t *ppos)
> +static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
> +				  char __user *buf,
> +				  size_t count,
> +				  loff_t *ppos)
>   {
> +	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
>   	struct vfio_ccw_private *private;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +
> +	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
> +		return -EINVAL;
> +
> +	switch (index) {
> +	case VFIO_CCW_CONFIG_REGION_INDEX:
> +		return vfio_ccw_mdev_read_io_region(private, buf, count, ppos);
> +	default:
> +		index -= VFIO_CCW_NUM_REGIONS;
> +		return private->region[index].ops->read(private, buf, count,
> +							ppos);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t vfio_ccw_mdev_write_io_region(struct vfio_ccw_private *private,
> +					     const char __user *buf,
> +					     size_t count, loff_t *ppos)
> +{
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
>   	struct ccw_io_region *region;
>   	int ret;
>   
> -	if (*ppos + count > sizeof(*region))
> +	if (pos + count > sizeof(*region))
>   		return -EINVAL;
>   
> -	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>   	    private->state == VFIO_CCW_STATE_STANDBY)
>   		return -EACCES;
> @@ -205,7 +236,7 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>   		return -EAGAIN;
>   
>   	region = private->io_region;
> -	if (copy_from_user((void *)region + *ppos, buf, count)) {
> +	if (copy_from_user((void *)region + pos, buf, count)) {
>   		ret = -EFAULT;
>   		goto out_unlock;
>   	}
> @@ -218,19 +249,52 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>   	return ret;
>   }
>   
> -static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
> +static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> +				   const char __user *buf,
> +				   size_t count,
> +				   loff_t *ppos)
> +{
> +	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
> +	struct vfio_ccw_private *private;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +
> +	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
> +		return -EINVAL;
> +
> +	switch (index) {
> +	case VFIO_CCW_CONFIG_REGION_INDEX:
> +		return vfio_ccw_mdev_write_io_region(private, buf, count, ppos);
> +	default:
> +		index -= VFIO_CCW_NUM_REGIONS;
> +		return private->region[index].ops->write(private, buf, count,
> +							 ppos);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info,
> +					 struct mdev_device *mdev)
>   {
> +	struct vfio_ccw_private *private;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
> -	info->num_regions = VFIO_CCW_NUM_REGIONS;
> +	info->num_regions = VFIO_CCW_NUM_REGIONS + private->num_regions;
>   	info->num_irqs = VFIO_CCW_NUM_IRQS;
>   
>   	return 0;
>   }
>   
>   static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
> -					 u16 *cap_type_id,
> -					 void **cap_type)
> +					 struct mdev_device *mdev,
> +					 unsigned long arg)
>   {
> +	struct vfio_ccw_private *private;
> +	int i;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	switch (info->index) {
>   	case VFIO_CCW_CONFIG_REGION_INDEX:
>   		info->offset = 0;
> @@ -238,9 +302,51 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
>   		info->flags = VFIO_REGION_INFO_FLAG_READ
>   			      | VFIO_REGION_INFO_FLAG_WRITE;
>   		return 0;
> -	default:
> -		return -EINVAL;
> +	default: /* all other regions are handled via capability chain */
> +	{
> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> +		struct vfio_region_info_cap_type cap_type = {
> +			.header.id = VFIO_REGION_INFO_CAP_TYPE,
> +			.header.version = 1 };
> +		int ret;
> +
> +		if (info->index >=
> +		    VFIO_CCW_NUM_REGIONS + private->num_regions)
> +			return -EINVAL;
> +
> +		i = info->index - VFIO_CCW_NUM_REGIONS;
> +
> +		info->offset = VFIO_CCW_INDEX_TO_OFFSET(info->index);
> +		info->size = private->region[i].size;
> +		info->flags = private->region[i].flags;
> +
> +		cap_type.type = private->region[i].type;
> +		cap_type.subtype = private->region[i].subtype;
> +
> +		ret = vfio_info_add_capability(&caps, &cap_type.header,
> +					       sizeof(cap_type));
> +		if (ret)
> +			return ret;
> +
> +		info->flags |= VFIO_REGION_INFO_FLAG_CAPS;
> +		if (info->argsz < sizeof(*info) + caps.size) {
> +			info->argsz = sizeof(*info) + caps.size;
> +			info->cap_offset = 0;
> +		} else {
> +			vfio_info_cap_shift(&caps, sizeof(*info));
> +			if (copy_to_user((void __user *)arg + sizeof(*info),
> +					 caps.buf, caps.size)) {
> +				kfree(caps.buf);
> +				return -EFAULT;
> +			}
> +			info->cap_offset = sizeof(*info);
> +		}
> +
> +		kfree(caps.buf);
> +
> +	}
>   	}
> +	return 0;
>   }
>   
>   static int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
> @@ -317,6 +423,32 @@ static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
>   	}
>   }
>   
> +int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
> +				 unsigned int subtype,
> +				 const struct vfio_ccw_regops *ops,
> +				 size_t size, u32 flags, void *data)
> +{
> +	struct vfio_ccw_region *region;
> +
> +	region = krealloc(private->region,
> +			  (private->num_regions + 1) * sizeof(*region),
> +			  GFP_KERNEL);
> +	if (!region)
> +		return -ENOMEM;
> +
> +	private->region = region;
> +	private->region[private->num_regions].type = VFIO_REGION_TYPE_CCW;
> +	private->region[private->num_regions].subtype = subtype;
> +	private->region[private->num_regions].ops = ops;
> +	private->region[private->num_regions].size = size;
> +	private->region[private->num_regions].flags = flags;
> +	private->region[private->num_regions].data = data;
> +
> +	private->num_regions++;
> +
> +	return 0;
> +}
> +
>   static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   				   unsigned int cmd,
>   				   unsigned long arg)
> @@ -337,7 +469,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   		if (info.argsz < minsz)
>   			return -EINVAL;
>   
> -		ret = vfio_ccw_mdev_get_device_info(&info);
> +		ret = vfio_ccw_mdev_get_device_info(&info, mdev);
>   		if (ret)
>   			return ret;
>   
> @@ -346,8 +478,6 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   	case VFIO_DEVICE_GET_REGION_INFO:
>   	{
>   		struct vfio_region_info info;
> -		u16 cap_type_id = 0;
> -		void *cap_type = NULL;
>   
>   		minsz = offsetofend(struct vfio_region_info, offset);
>   
> @@ -357,8 +487,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   		if (info.argsz < minsz)
>   			return -EINVAL;
>   
> -		ret = vfio_ccw_mdev_get_region_info(&info, &cap_type_id,
> -						    &cap_type);
> +		ret = vfio_ccw_mdev_get_region_info(&info, mdev, arg);
>   		if (ret)
>   			return ret;
>   
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index e88237697f83..20e75f4f3695 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -3,9 +3,11 @@
>    * Private stuff for vfio_ccw driver
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #ifndef _VFIO_CCW_PRIVATE_H_
> @@ -19,6 +21,38 @@
>   #include "css.h"
>   #include "vfio_ccw_cp.h"
>   
> +#define VFIO_CCW_OFFSET_SHIFT   40
> +#define VFIO_CCW_OFFSET_TO_INDEX(off)	(off >> VFIO_CCW_OFFSET_SHIFT)
> +#define VFIO_CCW_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_CCW_OFFSET_SHIFT)
> +#define VFIO_CCW_OFFSET_MASK	(((u64)(1) << VFIO_CCW_OFFSET_SHIFT) - 1)
> +
> +/* capability chain handling similar to vfio-pci */
> +struct vfio_ccw_private;
> +struct vfio_ccw_region;
> +
> +struct vfio_ccw_regops {
> +	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
> +			size_t count, loff_t *ppos);
> +	size_t	(*write)(struct vfio_ccw_private *private,
> +			 const char __user *buf, size_t count, loff_t *ppos);
> +	void	(*release)(struct vfio_ccw_private *private,
> +			   struct vfio_ccw_region *region);
> +};
> +
> +struct vfio_ccw_region {
> +	u32				type;
> +	u32				subtype;
> +	const struct vfio_ccw_regops	*ops;
> +	void				*data;
> +	size_t				size;
> +	u32				flags;
> +};
> +
> +int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
> +				 unsigned int subtype,
> +				 const struct vfio_ccw_regops *ops,
> +				 size_t size, u32 flags, void *data);
> +
>   /**
>    * struct vfio_ccw_private
>    * @sch: pointer to the subchannel
> @@ -29,6 +63,8 @@
>    * @nb: notifier for vfio events
>    * @io_region: MMIO region to input/output I/O arguments/results
>    * @io_mutex: protect against concurrent update of I/O structures
> + * @region: additional regions for other subchannel operations
> + * @num_regions: number of additional regions
>    * @cp: channel program for the current I/O operation
>    * @irb: irb info received from interrupt
>    * @scsw: scsw info
> @@ -44,6 +80,8 @@ struct vfio_ccw_private {
>   	struct notifier_block	nb;
>   	struct ccw_io_region	*io_region;
>   	struct mutex		io_mutex;
> +	struct vfio_ccw_region *region;
> +	int num_regions;
>   
>   	struct channel_program	cp;
>   	struct irb		irb;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02bb7ad6e986..56e2413d3e00 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -353,6 +353,8 @@ struct vfio_region_gfx_edid {
>   #define VFIO_DEVICE_GFX_LINK_STATE_DOWN  2
>   };
>   
> +#define VFIO_REGION_TYPE_CCW			(2)
> +

Cool.  :)

>   /*
>    * 10de vendor sub-type
>    *
> 

Looks fine to me.  I'd love to think there was a way to generalize this 
for other vfio drivers, but man that's a tall task.  So...

Reviewed-by: Eric Farman <farman@linux.ibm.com>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] vfio-ccw: add capabilities chain
@ 2019-01-25 16:19     ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 16:19 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson



On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> Allow to extend the regions used by vfio-ccw. The first user will be
> handling of halt and clear subchannel.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>   drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
>   drivers/s390/cio/vfio_ccw_private.h |  38 ++++++
>   include/uapi/linux/vfio.h           |   2 +
>   3 files changed, 195 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 3fa9fc570400..5a89d09f9271 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -3,9 +3,11 @@
>    * Physical device callbacks for vfio_ccw
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #include <linux/vfio.h>
> @@ -157,27 +159,33 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev)
>   {
>   	struct vfio_ccw_private *private =
>   		dev_get_drvdata(mdev_parent_dev(mdev));
> +	int i;
>   
>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>   				 &private->nb);
> +
> +	for (i = 0; i < private->num_regions; i++)
> +		private->region[i].ops->release(private, &private->region[i]);
> +
> +	private->num_regions = 0;
> +	kfree(private->region);
> +	private->region = NULL;
>   }
>   
> -static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
> -				  char __user *buf,
> -				  size_t count,
> -				  loff_t *ppos)
> +static ssize_t vfio_ccw_mdev_read_io_region(struct vfio_ccw_private *private,
> +					    char __user *buf, size_t count,
> +					    loff_t *ppos)
>   {
> -	struct vfio_ccw_private *private;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
>   	struct ccw_io_region *region;
>   	int ret;
>   
> -	if (*ppos + count > sizeof(*region))
> +	if (pos + count > sizeof(*region))
>   		return -EINVAL;
>   
> -	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	mutex_lock(&private->io_mutex);
>   	region = private->io_region;
> -	if (copy_to_user(buf, (void *)region + *ppos, count))
> +	if (copy_to_user(buf, (void *)region + pos, count))
>   		ret = -EFAULT;
>   	else
>   		ret = count;
> @@ -185,19 +193,42 @@ static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
>   	return ret;
>   }
>   
> -static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> -				   const char __user *buf,
> -				   size_t count,
> -				   loff_t *ppos)
> +static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
> +				  char __user *buf,
> +				  size_t count,
> +				  loff_t *ppos)
>   {
> +	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
>   	struct vfio_ccw_private *private;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +
> +	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
> +		return -EINVAL;
> +
> +	switch (index) {
> +	case VFIO_CCW_CONFIG_REGION_INDEX:
> +		return vfio_ccw_mdev_read_io_region(private, buf, count, ppos);
> +	default:
> +		index -= VFIO_CCW_NUM_REGIONS;
> +		return private->region[index].ops->read(private, buf, count,
> +							ppos);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t vfio_ccw_mdev_write_io_region(struct vfio_ccw_private *private,
> +					     const char __user *buf,
> +					     size_t count, loff_t *ppos)
> +{
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
>   	struct ccw_io_region *region;
>   	int ret;
>   
> -	if (*ppos + count > sizeof(*region))
> +	if (pos + count > sizeof(*region))
>   		return -EINVAL;
>   
> -	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>   	    private->state == VFIO_CCW_STATE_STANDBY)
>   		return -EACCES;
> @@ -205,7 +236,7 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>   		return -EAGAIN;
>   
>   	region = private->io_region;
> -	if (copy_from_user((void *)region + *ppos, buf, count)) {
> +	if (copy_from_user((void *)region + pos, buf, count)) {
>   		ret = -EFAULT;
>   		goto out_unlock;
>   	}
> @@ -218,19 +249,52 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>   	return ret;
>   }
>   
> -static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
> +static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> +				   const char __user *buf,
> +				   size_t count,
> +				   loff_t *ppos)
> +{
> +	unsigned int index = VFIO_CCW_OFFSET_TO_INDEX(*ppos);
> +	struct vfio_ccw_private *private;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +
> +	if (index >= VFIO_CCW_NUM_REGIONS + private->num_regions)
> +		return -EINVAL;
> +
> +	switch (index) {
> +	case VFIO_CCW_CONFIG_REGION_INDEX:
> +		return vfio_ccw_mdev_write_io_region(private, buf, count, ppos);
> +	default:
> +		index -= VFIO_CCW_NUM_REGIONS;
> +		return private->region[index].ops->write(private, buf, count,
> +							 ppos);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info,
> +					 struct mdev_device *mdev)
>   {
> +	struct vfio_ccw_private *private;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
> -	info->num_regions = VFIO_CCW_NUM_REGIONS;
> +	info->num_regions = VFIO_CCW_NUM_REGIONS + private->num_regions;
>   	info->num_irqs = VFIO_CCW_NUM_IRQS;
>   
>   	return 0;
>   }
>   
>   static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
> -					 u16 *cap_type_id,
> -					 void **cap_type)
> +					 struct mdev_device *mdev,
> +					 unsigned long arg)
>   {
> +	struct vfio_ccw_private *private;
> +	int i;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
>   	switch (info->index) {
>   	case VFIO_CCW_CONFIG_REGION_INDEX:
>   		info->offset = 0;
> @@ -238,9 +302,51 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
>   		info->flags = VFIO_REGION_INFO_FLAG_READ
>   			      | VFIO_REGION_INFO_FLAG_WRITE;
>   		return 0;
> -	default:
> -		return -EINVAL;
> +	default: /* all other regions are handled via capability chain */
> +	{
> +		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> +		struct vfio_region_info_cap_type cap_type = {
> +			.header.id = VFIO_REGION_INFO_CAP_TYPE,
> +			.header.version = 1 };
> +		int ret;
> +
> +		if (info->index >=
> +		    VFIO_CCW_NUM_REGIONS + private->num_regions)
> +			return -EINVAL;
> +
> +		i = info->index - VFIO_CCW_NUM_REGIONS;
> +
> +		info->offset = VFIO_CCW_INDEX_TO_OFFSET(info->index);
> +		info->size = private->region[i].size;
> +		info->flags = private->region[i].flags;
> +
> +		cap_type.type = private->region[i].type;
> +		cap_type.subtype = private->region[i].subtype;
> +
> +		ret = vfio_info_add_capability(&caps, &cap_type.header,
> +					       sizeof(cap_type));
> +		if (ret)
> +			return ret;
> +
> +		info->flags |= VFIO_REGION_INFO_FLAG_CAPS;
> +		if (info->argsz < sizeof(*info) + caps.size) {
> +			info->argsz = sizeof(*info) + caps.size;
> +			info->cap_offset = 0;
> +		} else {
> +			vfio_info_cap_shift(&caps, sizeof(*info));
> +			if (copy_to_user((void __user *)arg + sizeof(*info),
> +					 caps.buf, caps.size)) {
> +				kfree(caps.buf);
> +				return -EFAULT;
> +			}
> +			info->cap_offset = sizeof(*info);
> +		}
> +
> +		kfree(caps.buf);
> +
> +	}
>   	}
> +	return 0;
>   }
>   
>   static int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
> @@ -317,6 +423,32 @@ static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
>   	}
>   }
>   
> +int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
> +				 unsigned int subtype,
> +				 const struct vfio_ccw_regops *ops,
> +				 size_t size, u32 flags, void *data)
> +{
> +	struct vfio_ccw_region *region;
> +
> +	region = krealloc(private->region,
> +			  (private->num_regions + 1) * sizeof(*region),
> +			  GFP_KERNEL);
> +	if (!region)
> +		return -ENOMEM;
> +
> +	private->region = region;
> +	private->region[private->num_regions].type = VFIO_REGION_TYPE_CCW;
> +	private->region[private->num_regions].subtype = subtype;
> +	private->region[private->num_regions].ops = ops;
> +	private->region[private->num_regions].size = size;
> +	private->region[private->num_regions].flags = flags;
> +	private->region[private->num_regions].data = data;
> +
> +	private->num_regions++;
> +
> +	return 0;
> +}
> +
>   static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   				   unsigned int cmd,
>   				   unsigned long arg)
> @@ -337,7 +469,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   		if (info.argsz < minsz)
>   			return -EINVAL;
>   
> -		ret = vfio_ccw_mdev_get_device_info(&info);
> +		ret = vfio_ccw_mdev_get_device_info(&info, mdev);
>   		if (ret)
>   			return ret;
>   
> @@ -346,8 +478,6 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   	case VFIO_DEVICE_GET_REGION_INFO:
>   	{
>   		struct vfio_region_info info;
> -		u16 cap_type_id = 0;
> -		void *cap_type = NULL;
>   
>   		minsz = offsetofend(struct vfio_region_info, offset);
>   
> @@ -357,8 +487,7 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>   		if (info.argsz < minsz)
>   			return -EINVAL;
>   
> -		ret = vfio_ccw_mdev_get_region_info(&info, &cap_type_id,
> -						    &cap_type);
> +		ret = vfio_ccw_mdev_get_region_info(&info, mdev, arg);
>   		if (ret)
>   			return ret;
>   
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index e88237697f83..20e75f4f3695 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -3,9 +3,11 @@
>    * Private stuff for vfio_ccw driver
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #ifndef _VFIO_CCW_PRIVATE_H_
> @@ -19,6 +21,38 @@
>   #include "css.h"
>   #include "vfio_ccw_cp.h"
>   
> +#define VFIO_CCW_OFFSET_SHIFT   40
> +#define VFIO_CCW_OFFSET_TO_INDEX(off)	(off >> VFIO_CCW_OFFSET_SHIFT)
> +#define VFIO_CCW_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_CCW_OFFSET_SHIFT)
> +#define VFIO_CCW_OFFSET_MASK	(((u64)(1) << VFIO_CCW_OFFSET_SHIFT) - 1)
> +
> +/* capability chain handling similar to vfio-pci */
> +struct vfio_ccw_private;
> +struct vfio_ccw_region;
> +
> +struct vfio_ccw_regops {
> +	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
> +			size_t count, loff_t *ppos);
> +	size_t	(*write)(struct vfio_ccw_private *private,
> +			 const char __user *buf, size_t count, loff_t *ppos);
> +	void	(*release)(struct vfio_ccw_private *private,
> +			   struct vfio_ccw_region *region);
> +};
> +
> +struct vfio_ccw_region {
> +	u32				type;
> +	u32				subtype;
> +	const struct vfio_ccw_regops	*ops;
> +	void				*data;
> +	size_t				size;
> +	u32				flags;
> +};
> +
> +int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
> +				 unsigned int subtype,
> +				 const struct vfio_ccw_regops *ops,
> +				 size_t size, u32 flags, void *data);
> +
>   /**
>    * struct vfio_ccw_private
>    * @sch: pointer to the subchannel
> @@ -29,6 +63,8 @@
>    * @nb: notifier for vfio events
>    * @io_region: MMIO region to input/output I/O arguments/results
>    * @io_mutex: protect against concurrent update of I/O structures
> + * @region: additional regions for other subchannel operations
> + * @num_regions: number of additional regions
>    * @cp: channel program for the current I/O operation
>    * @irb: irb info received from interrupt
>    * @scsw: scsw info
> @@ -44,6 +80,8 @@ struct vfio_ccw_private {
>   	struct notifier_block	nb;
>   	struct ccw_io_region	*io_region;
>   	struct mutex		io_mutex;
> +	struct vfio_ccw_region *region;
> +	int num_regions;
>   
>   	struct channel_program	cp;
>   	struct irb		irb;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02bb7ad6e986..56e2413d3e00 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -353,6 +353,8 @@ struct vfio_region_gfx_edid {
>   #define VFIO_DEVICE_GFX_LINK_STATE_DOWN  2
>   };
>   
> +#define VFIO_REGION_TYPE_CCW			(2)
> +

Cool.  :)

>   /*
>    * 10de vendor sub-type
>    *
> 

Looks fine to me.  I'd love to think there was a way to generalize this 
for other vfio drivers, but man that's a tall task.  So...

Reviewed-by: Eric Farman <farman@linux.ibm.com>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 12:58       ` [Qemu-devel] " Halil Pasic
@ 2019-01-25 20:21         ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 20:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Cornelia Huck, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x



On 01/25/2019 07:58 AM, Halil Pasic wrote:
> On Thu, 24 Jan 2019 21:25:10 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>>>    	private = dev_get_drvdata(mdev_parent_dev(mdev));
>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
>>> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>>> +	    private->state == VFIO_CCW_STATE_STANDBY)
>>>    		return -EACCES;
>>> +	if (!mutex_trylock(&private->io_mutex))
>>> +		return -EAGAIN;
>>
>> Ah, I see Halil's difficulty here.
>>
>> It is true there is a race condition today, and that this doesn't
>> address it.  That's fine, add it to the todo list.  But even with that,
>> I don't see what the mutex is enforcing?
> 
> It is protecting the io regions. AFAIU the idea was that only one
> thread is accessing the io region(s) at a time to prevent corruption and
> reading half-morphed data.
> 
>> Two simultaneous SSCHs will be
>> serialized (one will get kicked out with a failed trylock() call), while
>> still leaving the window open between cc=0 on the SSCH and the
>> subsequent interrupt.  In the latter case, a second SSCH will come
>> through here, do the copy_from_user below, and then jump to fsm_io_busy
>> to return EAGAIN.  Do we really want to stomp on io_region in that case?
> 
> I'm not sure I understood you correctly. The interrupt handler does not
> take the lock before writing to the io_region. That is one race but it is
> easy to fix.
> 
> The bigger problem is that between the interrupt handler has written IRB
> area and userspace has read it we may end up destroying it by stomping on
> it (to use your words). The userspace reading a wrong (given todays qemu
> zeroed out) IRB could lead to follow on problems.

I wasn't thinking about a race between the start and interrupt handler, 
but rather between two near-simultaneous starts.  Looking at it more 
closely, the orb and scsw structs as well as the ret_code field in 
ccw_io_region are only referenced under the protection of the new mutex 
(within fsm_io_request, for example), which I guess is the point.

So that leaves us with just the irb fields, which you'd mentioned a 
couple days ago (and which I was trying to ignore since it'd seems to 
have been discussed enough at the time).  So I withdraw my concerns on 
this point.  For now.  ;-)

>   
>>    Why can't we simply return EAGAIN if state==BUSY?
>>
> 
> Sure we can. That would essentially go back to the old way of things:
> if not idle return with error. 

I think this happens both before and after this series.  With this 
series, we just update the io_region with things that are never used 
because we're busy.

Just the error code returned would change
> form EACCESS to EAGAIN. Which Isn't necessarily a win, because
> conceptually here should be never two interleaved io_requests/start
> commands hitting the module.
> 
> 
>>>    
>>>    	region = private->io_region;
>>> -	if (copy_from_user((void *)region + *ppos, buf, count))
>>> -		return -EFAULT;
>>> +	if (copy_from_user((void *)region + *ppos, buf, count)) {
>>> +		ret = -EFAULT;
>>> +		goto out_unlock;
>>> +	}
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 20:21         ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 20:21 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson



On 01/25/2019 07:58 AM, Halil Pasic wrote:
> On Thu, 24 Jan 2019 21:25:10 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>>>    	private = dev_get_drvdata(mdev_parent_dev(mdev));
>>> -	if (private->state != VFIO_CCW_STATE_IDLE)
>>> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>>> +	    private->state == VFIO_CCW_STATE_STANDBY)
>>>    		return -EACCES;
>>> +	if (!mutex_trylock(&private->io_mutex))
>>> +		return -EAGAIN;
>>
>> Ah, I see Halil's difficulty here.
>>
>> It is true there is a race condition today, and that this doesn't
>> address it.  That's fine, add it to the todo list.  But even with that,
>> I don't see what the mutex is enforcing?
> 
> It is protecting the io regions. AFAIU the idea was that only one
> thread is accessing the io region(s) at a time to prevent corruption and
> reading half-morphed data.
> 
>> Two simultaneous SSCHs will be
>> serialized (one will get kicked out with a failed trylock() call), while
>> still leaving the window open between cc=0 on the SSCH and the
>> subsequent interrupt.  In the latter case, a second SSCH will come
>> through here, do the copy_from_user below, and then jump to fsm_io_busy
>> to return EAGAIN.  Do we really want to stomp on io_region in that case?
> 
> I'm not sure I understood you correctly. The interrupt handler does not
> take the lock before writing to the io_region. That is one race but it is
> easy to fix.
> 
> The bigger problem is that between the interrupt handler has written IRB
> area and userspace has read it we may end up destroying it by stomping on
> it (to use your words). The userspace reading a wrong (given todays qemu
> zeroed out) IRB could lead to follow on problems.

I wasn't thinking about a race between the start and interrupt handler, 
but rather between two near-simultaneous starts.  Looking at it more 
closely, the orb and scsw structs as well as the ret_code field in 
ccw_io_region are only referenced under the protection of the new mutex 
(within fsm_io_request, for example), which I guess is the point.

So that leaves us with just the irb fields, which you'd mentioned a 
couple days ago (and which I was trying to ignore since it'd seems to 
have been discussed enough at the time).  So I withdraw my concerns on 
this point.  For now.  ;-)

>   
>>    Why can't we simply return EAGAIN if state==BUSY?
>>
> 
> Sure we can. That would essentially go back to the old way of things:
> if not idle return with error. 

I think this happens both before and after this series.  With this 
series, we just update the io_region with things that are never used 
because we're busy.

Just the error code returned would change
> form EACCESS to EAGAIN. Which Isn't necessarily a win, because
> conceptually here should be never two interleaved io_requests/start
> commands hitting the module.
> 
> 
>>>    
>>>    	region = private->io_region;
>>> -	if (copy_from_user((void *)region + *ppos, buf, count))
>>> -		return -EFAULT;
>>> +	if (copy_from_user((void *)region + *ppos, buf, count)) {
>>> +		ret = -EFAULT;
>>> +		goto out_unlock;
>>> +	}
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 10:24         ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25 20:22           ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 20:22 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x



On 01/25/2019 05:24 AM, Cornelia Huck wrote:
> On Thu, 24 Jan 2019 21:37:44 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 01/24/2019 09:25 PM, Eric Farman wrote:
>>>
>>>
>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> 
>>> [1] I think these changes are cool.  We end up going into (and staying
>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
>>> bumble along.
>>>
>>> But why can't these be separated out from this patch?  It does change
>>> the behavior of the state machine, and seem distinct from the addition
>>> of the mutex you otherwise add here?  At the very least, this behavior
>>> change should be documented in the commit since it's otherwise lost in
>>> the mutex/EAGAIN stuff.
> 
> That's a very good idea. I'll factor them out into a separate patch.
> 
>>>    
>>>>        trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
>>>>                       io_region->ret_code, errstr);
>>>>    }
>>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c
>>>> b/drivers/s390/cio/vfio_ccw_ops.c
>>>> index f673e106c041..3fa9fc570400 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>>>> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct
>>>> mdev_device *mdev,
>>>>    {
>>>>        struct vfio_ccw_private *private;
>>>>        struct ccw_io_region *region;
>>>> +    int ret;
>>>>        if (*ppos + count > sizeof(*region))
>>>>            return -EINVAL;
>>>>        private = dev_get_drvdata(mdev_parent_dev(mdev));
>>>> +    mutex_lock(&private->io_mutex);
>>>>        region = private->io_region;
>>>>        if (copy_to_user(buf, (void *)region + *ppos, count))
>>>> -        return -EFAULT;
>>>> -
>>>> -    return count;
>>>> +        ret = -EFAULT;
>>>> +    else
>>>> +        ret = count;
>>>> +    mutex_unlock(&private->io_mutex);
>>>> +    return ret;
>>>>    }
>>>>    static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>>>> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct
>>>> mdev_device *mdev,
>>>>    {
>>>>        struct vfio_ccw_private *private;
>>>>        struct ccw_io_region *region;
>>>> +    int ret;
>>>>        if (*ppos + count > sizeof(*region))
>>>>            return -EINVAL;
>>>>        private = dev_get_drvdata(mdev_parent_dev(mdev));
>>>> -    if (private->state != VFIO_CCW_STATE_IDLE)
>>>> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>>>> +        private->state == VFIO_CCW_STATE_STANDBY)
>>>>            return -EACCES;
>>>> +    if (!mutex_trylock(&private->io_mutex))
>>>> +        return -EAGAIN;
>>>
>>> Ah, I see Halil's difficulty here.
>>>
>>> It is true there is a race condition today, and that this doesn't
>>> address it.  That's fine, add it to the todo list.  But even with that,
>>> I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be
>>> serialized (one will get kicked out with a failed trylock() call), while
>>> still leaving the window open between cc=0 on the SSCH and the
>>> subsequent interrupt.  In the latter case, a second SSCH will come
>>> through here, do the copy_from_user below, and then jump to fsm_io_busy
>>> to return EAGAIN.  Do we really want to stomp on io_region in that case?
>>>    Why can't we simply return EAGAIN if state==BUSY?
>>
>> (Answering my own questions as I skim patch 5...)
>>
>> Because of course this series is for async handling, while I was looking
>> specifically at the synchronous code that exists today.  I guess then my
>> question just remains on how the mutex is adding protection in the sync
>> case, because that's still not apparent to me.  (Perhaps I missed it in
>> a reply to Halil; if so I apologize, there were a lot when I returned.)
> 
> My idea behind the mutex was to make sure that we get consistent data
> when reading/writing (e.g. if one user space thread is reading the I/O
> region while another is writing to it).
> 

And from that angle, this accomplishes that.  It just wasn't apparent to 
me at first.

I'm still not certain of how we handle mdev_write when state=BUSY, so 
let me ask my question a different way...

If we come into mdev_write with state=BUSY and we get the lock, 
copy_from_user, and do our jump table we go to fsm_io_busy to set 
ret_code and return -EAGAIN.  Why then don't we set the jump table for 
state=NOT_OPER||STANDBY to do something that will return -EACCES instead 
of how we currently do a direct return of -EACCES before all the 
lock/copy stuff (and the jump table that would take us to fsm_io_error 
and an error message before returning -EIO)?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-25 20:22           ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 20:22 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson



On 01/25/2019 05:24 AM, Cornelia Huck wrote:
> On Thu, 24 Jan 2019 21:37:44 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 01/24/2019 09:25 PM, Eric Farman wrote:
>>>
>>>
>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> 
>>> [1] I think these changes are cool.  We end up going into (and staying
>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
>>> bumble along.
>>>
>>> But why can't these be separated out from this patch?  It does change
>>> the behavior of the state machine, and seem distinct from the addition
>>> of the mutex you otherwise add here?  At the very least, this behavior
>>> change should be documented in the commit since it's otherwise lost in
>>> the mutex/EAGAIN stuff.
> 
> That's a very good idea. I'll factor them out into a separate patch.
> 
>>>    
>>>>        trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
>>>>                       io_region->ret_code, errstr);
>>>>    }
>>>> diff --git a/drivers/s390/cio/vfio_ccw_ops.c
>>>> b/drivers/s390/cio/vfio_ccw_ops.c
>>>> index f673e106c041..3fa9fc570400 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_ops.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_ops.c
>>>> @@ -169,16 +169,20 @@ static ssize_t vfio_ccw_mdev_read(struct
>>>> mdev_device *mdev,
>>>>    {
>>>>        struct vfio_ccw_private *private;
>>>>        struct ccw_io_region *region;
>>>> +    int ret;
>>>>        if (*ppos + count > sizeof(*region))
>>>>            return -EINVAL;
>>>>        private = dev_get_drvdata(mdev_parent_dev(mdev));
>>>> +    mutex_lock(&private->io_mutex);
>>>>        region = private->io_region;
>>>>        if (copy_to_user(buf, (void *)region + *ppos, count))
>>>> -        return -EFAULT;
>>>> -
>>>> -    return count;
>>>> +        ret = -EFAULT;
>>>> +    else
>>>> +        ret = count;
>>>> +    mutex_unlock(&private->io_mutex);
>>>> +    return ret;
>>>>    }
>>>>    static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>>>> @@ -188,25 +192,30 @@ static ssize_t vfio_ccw_mdev_write(struct
>>>> mdev_device *mdev,
>>>>    {
>>>>        struct vfio_ccw_private *private;
>>>>        struct ccw_io_region *region;
>>>> +    int ret;
>>>>        if (*ppos + count > sizeof(*region))
>>>>            return -EINVAL;
>>>>        private = dev_get_drvdata(mdev_parent_dev(mdev));
>>>> -    if (private->state != VFIO_CCW_STATE_IDLE)
>>>> +    if (private->state == VFIO_CCW_STATE_NOT_OPER ||
>>>> +        private->state == VFIO_CCW_STATE_STANDBY)
>>>>            return -EACCES;
>>>> +    if (!mutex_trylock(&private->io_mutex))
>>>> +        return -EAGAIN;
>>>
>>> Ah, I see Halil's difficulty here.
>>>
>>> It is true there is a race condition today, and that this doesn't
>>> address it.  That's fine, add it to the todo list.  But even with that,
>>> I don't see what the mutex is enforcing?  Two simultaneous SSCHs will be
>>> serialized (one will get kicked out with a failed trylock() call), while
>>> still leaving the window open between cc=0 on the SSCH and the
>>> subsequent interrupt.  In the latter case, a second SSCH will come
>>> through here, do the copy_from_user below, and then jump to fsm_io_busy
>>> to return EAGAIN.  Do we really want to stomp on io_region in that case?
>>>    Why can't we simply return EAGAIN if state==BUSY?
>>
>> (Answering my own questions as I skim patch 5...)
>>
>> Because of course this series is for async handling, while I was looking
>> specifically at the synchronous code that exists today.  I guess then my
>> question just remains on how the mutex is adding protection in the sync
>> case, because that's still not apparent to me.  (Perhaps I missed it in
>> a reply to Halil; if so I apologize, there were a lot when I returned.)
> 
> My idea behind the mutex was to make sure that we get consistent data
> when reading/writing (e.g. if one user space thread is reading the I/O
> region while another is writing to it).
> 

And from that angle, this accomplishes that.  It just wasn't apparent to 
me at first.

I'm still not certain of how we handle mdev_write when state=BUSY, so 
let me ask my question a different way...

If we come into mdev_write with state=BUSY and we get the lock, 
copy_from_user, and do our jump table we go to fsm_io_busy to set 
ret_code and return -EAGAIN.  Why then don't we set the jump table for 
state=NOT_OPER||STANDBY to do something that will return -EACCES instead 
of how we currently do a direct return of -EACCES before all the 
lock/copy stuff (and the jump table that would take us to fsm_io_error 
and an error message before returning -EIO)?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 3/5] vfio-ccw: add capabilities chain
  2019-01-25 16:19     ` [Qemu-devel] " Eric Farman
@ 2019-01-25 21:00       ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 21:00 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, qemu-s390x, Alex Williamson, qemu-devel, kvm



On 01/25/2019 11:19 AM, Eric Farman wrote:
> 
> 
> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>> Allow to extend the regions used by vfio-ccw. The first user will be
>> handling of halt and clear subchannel.
>>
>> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
>>   drivers/s390/cio/vfio_ccw_private.h |  38 ++++++
...snip...
>> diff --git a/drivers/s390/cio/vfio_ccw_private.h 
>> b/drivers/s390/cio/vfio_ccw_private.h
>> index e88237697f83..20e75f4f3695 100644
>> --- a/drivers/s390/cio/vfio_ccw_private.h
>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>> @@ -3,9 +3,11 @@
>>    * Private stuff for vfio_ccw driver
>>    *
>>    * Copyright IBM Corp. 2017
>> + * Copyright Red Hat, Inc. 2019
>>    *
>>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
>> + *            Cornelia Huck <cohuck@redhat.com>
>>    */
>>   #ifndef _VFIO_CCW_PRIVATE_H_
>> @@ -19,6 +21,38 @@
>>   #include "css.h"
>>   #include "vfio_ccw_cp.h"
>> +#define VFIO_CCW_OFFSET_SHIFT   40
>> +#define VFIO_CCW_OFFSET_TO_INDEX(off)    (off >> VFIO_CCW_OFFSET_SHIFT)
>> +#define VFIO_CCW_INDEX_TO_OFFSET(index)    ((u64)(index) << 
>> VFIO_CCW_OFFSET_SHIFT)
>> +#define VFIO_CCW_OFFSET_MASK    (((u64)(1) << VFIO_CCW_OFFSET_SHIFT) 
>> - 1)
>> +
>> +/* capability chain handling similar to vfio-pci */
>> +struct vfio_ccw_private;
>> +struct vfio_ccw_region;
>> +
>> +struct vfio_ccw_regops {
>> +    size_t    (*read)(struct vfio_ccw_private *private, char __user 
>> *buf,
>> +            size_t count, loff_t *ppos);
>> +    size_t    (*write)(struct vfio_ccw_private *private,
>> +             const char __user *buf, size_t count, loff_t *ppos);

Oops.  Per my recommendation on v1, you change these to "ssize_t" in 
patch 5.  Might as well just do that here.

>> +    void    (*release)(struct vfio_ccw_private *private,
>> +               struct vfio_ccw_region *region);
>> +};
>> +
>> +struct vfio_ccw_region {
>> +    u32                type;
>> +    u32                subtype;
>> +    const struct vfio_ccw_regops    *ops;
>> +    void                *data;
>> +    size_t                size;
>> +    u32                flags;
>> +};
>> +
>> +int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
>> +                 unsigned int subtype,
>> +                 const struct vfio_ccw_regops *ops,
>> +                 size_t size, u32 flags, void *data);
>> +
>>   /**
>>    * struct vfio_ccw_private
>>    * @sch: pointer to the subchannel
...snip...
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 02bb7ad6e986..56e2413d3e00 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -353,6 +353,8 @@ struct vfio_region_gfx_edid {
>>   #define VFIO_DEVICE_GFX_LINK_STATE_DOWN  2
>>   };
>> +#define VFIO_REGION_TYPE_CCW            (2)
>> +
> 
> Cool.  :)
> 
>>   /*
>>    * 10de vendor sub-type
>>    *
>>
> 
> Looks fine to me.  I'd love to think there was a way to generalize this 
> for other vfio drivers, but man that's a tall task.  So...

With the ssize_t fixup from patch 5...

> 
> Reviewed-by: Eric Farman <farman@linux.ibm.com>


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] vfio-ccw: add capabilities chain
@ 2019-01-25 21:00       ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 21:00 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson



On 01/25/2019 11:19 AM, Eric Farman wrote:
> 
> 
> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>> Allow to extend the regions used by vfio-ccw. The first user will be
>> handling of halt and clear subchannel.
>>
>> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
>>   drivers/s390/cio/vfio_ccw_private.h |  38 ++++++
...snip...
>> diff --git a/drivers/s390/cio/vfio_ccw_private.h 
>> b/drivers/s390/cio/vfio_ccw_private.h
>> index e88237697f83..20e75f4f3695 100644
>> --- a/drivers/s390/cio/vfio_ccw_private.h
>> +++ b/drivers/s390/cio/vfio_ccw_private.h
>> @@ -3,9 +3,11 @@
>>    * Private stuff for vfio_ccw driver
>>    *
>>    * Copyright IBM Corp. 2017
>> + * Copyright Red Hat, Inc. 2019
>>    *
>>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
>> + *            Cornelia Huck <cohuck@redhat.com>
>>    */
>>   #ifndef _VFIO_CCW_PRIVATE_H_
>> @@ -19,6 +21,38 @@
>>   #include "css.h"
>>   #include "vfio_ccw_cp.h"
>> +#define VFIO_CCW_OFFSET_SHIFT   40
>> +#define VFIO_CCW_OFFSET_TO_INDEX(off)    (off >> VFIO_CCW_OFFSET_SHIFT)
>> +#define VFIO_CCW_INDEX_TO_OFFSET(index)    ((u64)(index) << 
>> VFIO_CCW_OFFSET_SHIFT)
>> +#define VFIO_CCW_OFFSET_MASK    (((u64)(1) << VFIO_CCW_OFFSET_SHIFT) 
>> - 1)
>> +
>> +/* capability chain handling similar to vfio-pci */
>> +struct vfio_ccw_private;
>> +struct vfio_ccw_region;
>> +
>> +struct vfio_ccw_regops {
>> +    size_t    (*read)(struct vfio_ccw_private *private, char __user 
>> *buf,
>> +            size_t count, loff_t *ppos);
>> +    size_t    (*write)(struct vfio_ccw_private *private,
>> +             const char __user *buf, size_t count, loff_t *ppos);

Oops.  Per my recommendation on v1, you change these to "ssize_t" in 
patch 5.  Might as well just do that here.

>> +    void    (*release)(struct vfio_ccw_private *private,
>> +               struct vfio_ccw_region *region);
>> +};
>> +
>> +struct vfio_ccw_region {
>> +    u32                type;
>> +    u32                subtype;
>> +    const struct vfio_ccw_regops    *ops;
>> +    void                *data;
>> +    size_t                size;
>> +    u32                flags;
>> +};
>> +
>> +int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
>> +                 unsigned int subtype,
>> +                 const struct vfio_ccw_regops *ops,
>> +                 size_t size, u32 flags, void *data);
>> +
>>   /**
>>    * struct vfio_ccw_private
>>    * @sch: pointer to the subchannel
...snip...
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 02bb7ad6e986..56e2413d3e00 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -353,6 +353,8 @@ struct vfio_region_gfx_edid {
>>   #define VFIO_DEVICE_GFX_LINK_STATE_DOWN  2
>>   };
>> +#define VFIO_REGION_TYPE_CCW            (2)
>> +
> 
> Cool.  :)
> 
>>   /*
>>    * 10de vendor sub-type
>>    *
>>
> 
> Looks fine to me.  I'd love to think there was a way to generalize this 
> for other vfio drivers, but man that's a tall task.  So...

With the ssize_t fixup from patch 5...

> 
> Reviewed-by: Eric Farman <farman@linux.ibm.com>


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
  2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-25 21:00     ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 21:00 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, qemu-s390x, Alex Williamson, qemu-devel, kvm



On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> Add a region to the vfio-ccw device that can be used to submit
> asynchronous I/O instructions. ssch continues to be handled by the
> existing I/O region; the new region handles hsch and csch.
> 
> Interrupt status continues to be reported through the same channels
> as for ssch.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>   drivers/s390/cio/Makefile           |   3 +-
>   drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++++++++++++
>   drivers/s390/cio/vfio_ccw_drv.c     |  45 +++++++----
>   drivers/s390/cio/vfio_ccw_fsm.c     | 114 +++++++++++++++++++++++++++-
>   drivers/s390/cio/vfio_ccw_ops.c     |  13 +++-
>   drivers/s390/cio/vfio_ccw_private.h |   9 ++-
>   include/uapi/linux/vfio.h           |   2 +
>   include/uapi/linux/vfio_ccw.h       |  12 +++
>   8 files changed, 269 insertions(+), 20 deletions(-)
>   create mode 100644 drivers/s390/cio/vfio_ccw_async.c
> 
> diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
> index f230516abb96..f6a8db04177c 100644
> --- a/drivers/s390/cio/Makefile
> +++ b/drivers/s390/cio/Makefile
> @@ -20,5 +20,6 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
>   qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
>   obj-$(CONFIG_QDIO) += qdio.o
>   
> -vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o
> +vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o \
> +	vfio_ccw_async.o
>   obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
> diff --git a/drivers/s390/cio/vfio_ccw_async.c b/drivers/s390/cio/vfio_ccw_async.c
> new file mode 100644
> index 000000000000..604806c2970f
> --- /dev/null
> +++ b/drivers/s390/cio/vfio_ccw_async.c
> @@ -0,0 +1,91 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Async I/O region for vfio_ccw
> + *
> + * Copyright Red Hat, Inc. 2019
> + *
> + * Author(s): Cornelia Huck <cohuck@redhat.com>
> + */
> +
> +#include <linux/vfio.h>
> +#include <linux/mdev.h>
> +
> +#include "vfio_ccw_private.h"
> +
> +static ssize_t vfio_ccw_async_region_read(struct vfio_ccw_private *private,
> +					  char __user *buf, size_t count,
> +					  loff_t *ppos)
> +{
> +	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
> +	struct ccw_cmd_region *region;
> +	int ret;
> +
> +	if (pos + count > sizeof(*region))
> +		return -EINVAL;
> +
> +	mutex_lock(&private->io_mutex);
> +	region = private->region[i].data;
> +	if (copy_to_user(buf, (void *)region + pos, count))
> +		ret = -EFAULT;
> +	else
> +		ret = count;
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
> +}
> +
> +static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private *private,
> +					   const char __user *buf, size_t count,
> +					   loff_t *ppos)
> +{
> +	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
> +	struct ccw_cmd_region *region;
> +	int ret;
> +
> +	if (pos + count > sizeof(*region))
> +		return -EINVAL;
> +
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
> +		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;
> +
> +	region = private->region[i].data;
> +	if (copy_from_user((void *)region + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
> +
> +	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
> +
> +	ret = region->ret_code ? region->ret_code : count;
> +
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
> +}
> +
> +static void vfio_ccw_async_region_release(struct vfio_ccw_private *private,
> +					  struct vfio_ccw_region *region)
> +{
> +
> +}
> +
> +const struct vfio_ccw_regops vfio_ccw_async_region_ops = {
> +	.read = vfio_ccw_async_region_read,
> +	.write = vfio_ccw_async_region_write,
> +	.release = vfio_ccw_async_region_release,
> +};
> +
> +int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private)
> +{
> +	return vfio_ccw_register_dev_region(private,
> +					    VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD,
> +					    &vfio_ccw_async_region_ops,
> +					    sizeof(struct ccw_cmd_region),
> +					    VFIO_REGION_INFO_FLAG_READ |
> +					    VFIO_REGION_INFO_FLAG_WRITE,
> +					    private->cmd_region);
> +}
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 2ef189fe45ed..d807911b8ed5 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -3,9 +3,11 @@
>    * VFIO based Physical Subchannel device driver
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #include <linux/module.h>
> @@ -23,6 +25,7 @@
>   
>   struct workqueue_struct *vfio_ccw_work_q;
>   static struct kmem_cache *vfio_ccw_io_region;
> +static struct kmem_cache *vfio_ccw_cmd_region;
>   
>   /*
>    * Helpers
> @@ -104,7 +107,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   {
>   	struct pmcw *pmcw = &sch->schib.pmcw;
>   	struct vfio_ccw_private *private;
> -	int ret;
> +	int ret = -ENOMEM;
>   
>   	if (pmcw->qf) {
>   		dev_warn(&sch->dev, "vfio: ccw: does not support QDIO: %s\n",
> @@ -118,10 +121,13 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   
>   	private->io_region = kmem_cache_zalloc(vfio_ccw_io_region,
>   					       GFP_KERNEL | GFP_DMA);
> -	if (!private->io_region) {
> -		kfree(private);
> -		return -ENOMEM;
> -	}
> +	if (!private->io_region)
> +		goto out_free;
> +
> +	private->cmd_region = kmem_cache_zalloc(vfio_ccw_cmd_region,
> +						GFP_KERNEL | GFP_DMA);
> +	if (!private->cmd_region)
> +		goto out_free;
>   
>   	private->sch = sch;
>   	dev_set_drvdata(&sch->dev, private);
> @@ -149,7 +155,10 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   	cio_disable_subchannel(sch);
>   out_free:
>   	dev_set_drvdata(&sch->dev, NULL);
> -	kmem_cache_free(vfio_ccw_io_region, private->io_region);
> +	if (private->cmd_region)
> +		kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
> +	if (private->io_region)
> +		kmem_cache_free(vfio_ccw_io_region, private->io_region);

Well, adding the check if private->xxx_region is non-NULL is fine.  I'd 
have made it a separate patch for io_region, but whatever.

Since you're adding that check, you should add the same if statement in 
vfio_ccw_sch_remove().  And you should certainly call 
kmem_cache_free(private->cmd_region) there too.  :)

>   	kfree(private);
>   	return ret;
>   }
> @@ -238,7 +247,7 @@ static struct css_driver vfio_ccw_sch_driver = {
>   
>   static int __init vfio_ccw_sch_init(void)
>   {
> -	int ret;
> +	int ret = -ENOMEM;
>   
>   	vfio_ccw_work_q = create_singlethread_workqueue("vfio-ccw");
>   	if (!vfio_ccw_work_q)
> @@ -248,20 +257,30 @@ static int __init vfio_ccw_sch_init(void)
>   					sizeof(struct ccw_io_region), 0,
>   					SLAB_ACCOUNT, 0,
>   					sizeof(struct ccw_io_region), NULL);
> -	if (!vfio_ccw_io_region) {
> -		destroy_workqueue(vfio_ccw_work_q);
> -		return -ENOMEM;
> -	}
> +	if (!vfio_ccw_io_region)
> +		goto out_err;
> +
> +	vfio_ccw_cmd_region = kmem_cache_create_usercopy("vfio_ccw_cmd_region",
> +					sizeof(struct ccw_cmd_region), 0,
> +					SLAB_ACCOUNT, 0,
> +					sizeof(struct ccw_cmd_region), NULL);
> +	if (!vfio_ccw_cmd_region)
> +		goto out_err;
>   
>   	isc_register(VFIO_CCW_ISC);
>   	ret = css_driver_register(&vfio_ccw_sch_driver);
>   	if (ret) {
>   		isc_unregister(VFIO_CCW_ISC);
> -		kmem_cache_destroy(vfio_ccw_io_region);
> -		destroy_workqueue(vfio_ccw_work_q);
> +		goto out_err;
>   	}
>   
>   	return ret;
> +
> +out_err:
> +	kmem_cache_destroy(vfio_ccw_cmd_region);
> +	kmem_cache_destroy(vfio_ccw_io_region);
> +	destroy_workqueue(vfio_ccw_work_q);
> +	return ret;
>   }
>   
>   static void __exit vfio_ccw_sch_exit(void)
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index f6ed934cc565..72912d596181 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -3,8 +3,10 @@
>    * Finite state machine for vfio-ccw device handling
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #include <linux/vfio.h>
> @@ -69,6 +71,81 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
>   	return ret;
>   }
>   
> +static int fsm_do_halt(struct vfio_ccw_private *private)
> +{
> +	struct subchannel *sch;
> +	unsigned long flags;
> +	int ccode;
> +	int ret;
> +
> +	sch = private->sch;
> +
> +	spin_lock_irqsave(sch->lock, flags);
> +
> +	/* Issue "Halt Subchannel" */
> +	ccode = hsch(sch->schid);
> +
> +	switch (ccode) {
> +	case 0:
> +		/*
> +		 * Initialize device status information
> +		 */
> +		sch->schib.scsw.cmd.actl |= SCSW_ACTL_HALT_PEND;
> +		ret = 0;
> +		private->state = VFIO_CCW_STATE_BUSY;
> +		break;
> +	case 1:		/* Status pending */
> +	case 2:		/* Busy */
> +		ret = -EBUSY;
> +		break;
> +	case 3:		/* Device not operational */
> +	{
> +		ret = -ENODEV;
> +		break;
> +	}

Why does cc3 get braces, but no other case does?  (Ditto for clear)

(I guess the answer is "because fsm_io_helper does" but I didn't notice 
that before either.  :-)

> +	default:
> +		ret = ccode;
> +	}
> +	spin_unlock_irqrestore(sch->lock, flags);
> +	return ret;
> +}
> +
> +static int fsm_do_clear(struct vfio_ccw_private *private)
> +{
> +	struct subchannel *sch;
> +	unsigned long flags;
> +	int ccode;
> +	int ret;
> +
> +	sch = private->sch;
> +
> +	spin_lock_irqsave(sch->lock, flags);
> +
> +	/* Issue "Clear Subchannel" */
> +	ccode = csch(sch->schid);
> +
> +	switch (ccode) {
> +	case 0:
> +		/*
> +		 * Initialize device status information
> +		 */
> +		sch->schib.scsw.cmd.actl = SCSW_ACTL_CLEAR_PEND;
> +		/* TODO: check what else we might need to clear */
> +		ret = 0;
> +		private->state = VFIO_CCW_STATE_BUSY;
> +		break;
> +	case 3:		/* Device not operational */
> +	{
> +		ret = -ENODEV;
> +		break;
> +	}
> +	default:
> +		ret = ccode;
> +	}
> +	spin_unlock_irqrestore(sch->lock, flags);
> +	return ret;
> +}
> +
>   static void fsm_notoper(struct vfio_ccw_private *private,
>   			enum vfio_ccw_event event)
>   {
> @@ -103,6 +180,14 @@ static void fsm_io_busy(struct vfio_ccw_private *private,
>   	private->io_region->ret_code = -EAGAIN;
>   }
>   
> +static void fsm_async_error(struct vfio_ccw_private *private,
> +			    enum vfio_ccw_event event)
> +{
> +	pr_err("vfio-ccw: FSM: halt/clear request from state:%d\n",
> +	       private->state);

Had a little deja vu here.  :)

Can this message use private->cmd_region->command to tell us if it's a 
halt, clear, or unknown?  Instead of just saying "halt/clear" statically.

> +	private->cmd_region->ret_code = -EIO;
> +}
> +
>   static void fsm_disabled_irq(struct vfio_ccw_private *private,
>   			     enum vfio_ccw_event event)
>   {
> @@ -165,11 +250,11 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   		}
>   		return;
>   	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> -		/* XXX: Handle halt. */
> +		/* halt is handled via the async cmd region */
>   		io_region->ret_code = -EOPNOTSUPP;
>   		goto err_out;
>   	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
> -		/* XXX: Handle clear. */
> +		/* clear is handled via the async cmd region */
>   		io_region->ret_code = -EOPNOTSUPP;
>   		goto err_out;
>   	}
> @@ -179,6 +264,27 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   			       io_region->ret_code, errstr);
>   }
>   
> +/*
> + * Deal with an async request from userspace.
> + */
> +static void fsm_async_request(struct vfio_ccw_private *private,
> +			      enum vfio_ccw_event event)
> +{
> +	struct ccw_cmd_region *cmd_region = private->cmd_region;
> +
> +	switch (cmd_region->command) {
> +	case VFIO_CCW_ASYNC_CMD_HSCH:
> +		cmd_region->ret_code = fsm_do_halt(private);
> +		break;
> +	case VFIO_CCW_ASYNC_CMD_CSCH:
> +		cmd_region->ret_code = fsm_do_clear(private);
> +		break;
> +	default:
> +		/* should not happen? */
> +		cmd_region->ret_code = -EINVAL;
> +	}
> +}
> +
>   /*
>    * Got an interrupt for a normal io (state busy).
>    */
> @@ -202,21 +308,25 @@ fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
>   	[VFIO_CCW_STATE_NOT_OPER] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_nop,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
>   	},
>   	[VFIO_CCW_STATE_STANDBY] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
>   	},
>   	[VFIO_CCW_STATE_IDLE] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_request,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
>   	},
>   	[VFIO_CCW_STATE_BUSY] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
>   	},
>   };
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 5a89d09f9271..755806cb8d53 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -148,11 +148,20 @@ static int vfio_ccw_mdev_open(struct mdev_device *mdev)
>   	struct vfio_ccw_private *private =
>   		dev_get_drvdata(mdev_parent_dev(mdev));
>   	unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> +	int ret;
>   
>   	private->nb.notifier_call = vfio_ccw_mdev_notifier;
>   
> -	return vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> -				      &events, &private->nb);
> +	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +				     &events, &private->nb);
> +	if (ret)
> +		return ret;
> +
> +	ret = vfio_ccw_register_async_dev_regions(private);
> +	if (ret)
> +		vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +					 &private->nb);
> +	return ret;
>   }
>   
>   static void vfio_ccw_mdev_release(struct mdev_device *mdev)
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 20e75f4f3695..ed8b94ea2f08 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -31,9 +31,9 @@ struct vfio_ccw_private;
>   struct vfio_ccw_region;
>   
>   struct vfio_ccw_regops {
> -	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
> +	ssize_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
>   			size_t count, loff_t *ppos);
> -	size_t	(*write)(struct vfio_ccw_private *private,
> +	ssize_t	(*write)(struct vfio_ccw_private *private,

Ah, amending my r-b to patch 3 :)

>   			 const char __user *buf, size_t count, loff_t *ppos);
>   	void	(*release)(struct vfio_ccw_private *private,
>   			   struct vfio_ccw_region *region);
> @@ -53,6 +53,8 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
>   				 const struct vfio_ccw_regops *ops,
>   				 size_t size, u32 flags, void *data);
>   
> +int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private);
> +
>   /**
>    * struct vfio_ccw_private
>    * @sch: pointer to the subchannel
> @@ -64,6 +66,7 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
>    * @io_region: MMIO region to input/output I/O arguments/results
>    * @io_mutex: protect against concurrent update of I/O structures
>    * @region: additional regions for other subchannel operations
> + * @cmd_region: MMIO region for asynchronous I/O commands other than START
>    * @num_regions: number of additional regions
>    * @cp: channel program for the current I/O operation
>    * @irb: irb info received from interrupt
> @@ -81,6 +84,7 @@ struct vfio_ccw_private {
>   	struct ccw_io_region	*io_region;
>   	struct mutex		io_mutex;
>   	struct vfio_ccw_region *region;
> +	struct ccw_cmd_region	*cmd_region;
>   	int num_regions;
>   
>   	struct channel_program	cp;
> @@ -115,6 +119,7 @@ enum vfio_ccw_event {
>   	VFIO_CCW_EVENT_NOT_OPER,
>   	VFIO_CCW_EVENT_IO_REQ,
>   	VFIO_CCW_EVENT_INTERRUPT,
> +	VFIO_CCW_EVENT_ASYNC_REQ,
>   	/* last element! */
>   	NR_VFIO_CCW_EVENTS
>   };
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 56e2413d3e00..8f10748dac79 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -354,6 +354,8 @@ struct vfio_region_gfx_edid {
>   };
>   
>   #define VFIO_REGION_TYPE_CCW			(2)
> +/* ccw sub-types */
> +#define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
>   
>   /*
>    * 10de vendor sub-type
> diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
> index 2ec5f367ff78..cbecbf0cd54f 100644
> --- a/include/uapi/linux/vfio_ccw.h
> +++ b/include/uapi/linux/vfio_ccw.h
> @@ -12,6 +12,7 @@
>   
>   #include <linux/types.h>
>   
> +/* used for START SUBCHANNEL, always present */
>   struct ccw_io_region {
>   #define ORB_AREA_SIZE 12
>   	__u8	orb_area[ORB_AREA_SIZE];
> @@ -22,4 +23,15 @@ struct ccw_io_region {
>   	__u32	ret_code;
>   } __packed;
>   
> +/*
> + * used for processing commands that trigger asynchronous actions
> + * Note: this is controlled by a capability
> + */
> +#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
> +#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
> +struct ccw_cmd_region {
> +	__u32 command;
> +	__u32 ret_code;
> +} __packed;
> +
>   #endif
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
@ 2019-01-25 21:00     ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-25 21:00 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic, Farhan Ali, Pierre Morel
  Cc: linux-s390, kvm, qemu-devel, qemu-s390x, Alex Williamson



On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> Add a region to the vfio-ccw device that can be used to submit
> asynchronous I/O instructions. ssch continues to be handled by the
> existing I/O region; the new region handles hsch and csch.
> 
> Interrupt status continues to be reported through the same channels
> as for ssch.
> 
> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> ---
>   drivers/s390/cio/Makefile           |   3 +-
>   drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++++++++++++
>   drivers/s390/cio/vfio_ccw_drv.c     |  45 +++++++----
>   drivers/s390/cio/vfio_ccw_fsm.c     | 114 +++++++++++++++++++++++++++-
>   drivers/s390/cio/vfio_ccw_ops.c     |  13 +++-
>   drivers/s390/cio/vfio_ccw_private.h |   9 ++-
>   include/uapi/linux/vfio.h           |   2 +
>   include/uapi/linux/vfio_ccw.h       |  12 +++
>   8 files changed, 269 insertions(+), 20 deletions(-)
>   create mode 100644 drivers/s390/cio/vfio_ccw_async.c
> 
> diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
> index f230516abb96..f6a8db04177c 100644
> --- a/drivers/s390/cio/Makefile
> +++ b/drivers/s390/cio/Makefile
> @@ -20,5 +20,6 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
>   qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
>   obj-$(CONFIG_QDIO) += qdio.o
>   
> -vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o
> +vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o \
> +	vfio_ccw_async.o
>   obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
> diff --git a/drivers/s390/cio/vfio_ccw_async.c b/drivers/s390/cio/vfio_ccw_async.c
> new file mode 100644
> index 000000000000..604806c2970f
> --- /dev/null
> +++ b/drivers/s390/cio/vfio_ccw_async.c
> @@ -0,0 +1,91 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Async I/O region for vfio_ccw
> + *
> + * Copyright Red Hat, Inc. 2019
> + *
> + * Author(s): Cornelia Huck <cohuck@redhat.com>
> + */
> +
> +#include <linux/vfio.h>
> +#include <linux/mdev.h>
> +
> +#include "vfio_ccw_private.h"
> +
> +static ssize_t vfio_ccw_async_region_read(struct vfio_ccw_private *private,
> +					  char __user *buf, size_t count,
> +					  loff_t *ppos)
> +{
> +	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
> +	struct ccw_cmd_region *region;
> +	int ret;
> +
> +	if (pos + count > sizeof(*region))
> +		return -EINVAL;
> +
> +	mutex_lock(&private->io_mutex);
> +	region = private->region[i].data;
> +	if (copy_to_user(buf, (void *)region + pos, count))
> +		ret = -EFAULT;
> +	else
> +		ret = count;
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
> +}
> +
> +static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private *private,
> +					   const char __user *buf, size_t count,
> +					   loff_t *ppos)
> +{
> +	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
> +	struct ccw_cmd_region *region;
> +	int ret;
> +
> +	if (pos + count > sizeof(*region))
> +		return -EINVAL;
> +
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
> +		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;
> +
> +	region = private->region[i].data;
> +	if (copy_from_user((void *)region + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
> +
> +	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
> +
> +	ret = region->ret_code ? region->ret_code : count;
> +
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
> +}
> +
> +static void vfio_ccw_async_region_release(struct vfio_ccw_private *private,
> +					  struct vfio_ccw_region *region)
> +{
> +
> +}
> +
> +const struct vfio_ccw_regops vfio_ccw_async_region_ops = {
> +	.read = vfio_ccw_async_region_read,
> +	.write = vfio_ccw_async_region_write,
> +	.release = vfio_ccw_async_region_release,
> +};
> +
> +int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private)
> +{
> +	return vfio_ccw_register_dev_region(private,
> +					    VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD,
> +					    &vfio_ccw_async_region_ops,
> +					    sizeof(struct ccw_cmd_region),
> +					    VFIO_REGION_INFO_FLAG_READ |
> +					    VFIO_REGION_INFO_FLAG_WRITE,
> +					    private->cmd_region);
> +}
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 2ef189fe45ed..d807911b8ed5 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -3,9 +3,11 @@
>    * VFIO based Physical Subchannel device driver
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #include <linux/module.h>
> @@ -23,6 +25,7 @@
>   
>   struct workqueue_struct *vfio_ccw_work_q;
>   static struct kmem_cache *vfio_ccw_io_region;
> +static struct kmem_cache *vfio_ccw_cmd_region;
>   
>   /*
>    * Helpers
> @@ -104,7 +107,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   {
>   	struct pmcw *pmcw = &sch->schib.pmcw;
>   	struct vfio_ccw_private *private;
> -	int ret;
> +	int ret = -ENOMEM;
>   
>   	if (pmcw->qf) {
>   		dev_warn(&sch->dev, "vfio: ccw: does not support QDIO: %s\n",
> @@ -118,10 +121,13 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   
>   	private->io_region = kmem_cache_zalloc(vfio_ccw_io_region,
>   					       GFP_KERNEL | GFP_DMA);
> -	if (!private->io_region) {
> -		kfree(private);
> -		return -ENOMEM;
> -	}
> +	if (!private->io_region)
> +		goto out_free;
> +
> +	private->cmd_region = kmem_cache_zalloc(vfio_ccw_cmd_region,
> +						GFP_KERNEL | GFP_DMA);
> +	if (!private->cmd_region)
> +		goto out_free;
>   
>   	private->sch = sch;
>   	dev_set_drvdata(&sch->dev, private);
> @@ -149,7 +155,10 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
>   	cio_disable_subchannel(sch);
>   out_free:
>   	dev_set_drvdata(&sch->dev, NULL);
> -	kmem_cache_free(vfio_ccw_io_region, private->io_region);
> +	if (private->cmd_region)
> +		kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
> +	if (private->io_region)
> +		kmem_cache_free(vfio_ccw_io_region, private->io_region);

Well, adding the check if private->xxx_region is non-NULL is fine.  I'd 
have made it a separate patch for io_region, but whatever.

Since you're adding that check, you should add the same if statement in 
vfio_ccw_sch_remove().  And you should certainly call 
kmem_cache_free(private->cmd_region) there too.  :)

>   	kfree(private);
>   	return ret;
>   }
> @@ -238,7 +247,7 @@ static struct css_driver vfio_ccw_sch_driver = {
>   
>   static int __init vfio_ccw_sch_init(void)
>   {
> -	int ret;
> +	int ret = -ENOMEM;
>   
>   	vfio_ccw_work_q = create_singlethread_workqueue("vfio-ccw");
>   	if (!vfio_ccw_work_q)
> @@ -248,20 +257,30 @@ static int __init vfio_ccw_sch_init(void)
>   					sizeof(struct ccw_io_region), 0,
>   					SLAB_ACCOUNT, 0,
>   					sizeof(struct ccw_io_region), NULL);
> -	if (!vfio_ccw_io_region) {
> -		destroy_workqueue(vfio_ccw_work_q);
> -		return -ENOMEM;
> -	}
> +	if (!vfio_ccw_io_region)
> +		goto out_err;
> +
> +	vfio_ccw_cmd_region = kmem_cache_create_usercopy("vfio_ccw_cmd_region",
> +					sizeof(struct ccw_cmd_region), 0,
> +					SLAB_ACCOUNT, 0,
> +					sizeof(struct ccw_cmd_region), NULL);
> +	if (!vfio_ccw_cmd_region)
> +		goto out_err;
>   
>   	isc_register(VFIO_CCW_ISC);
>   	ret = css_driver_register(&vfio_ccw_sch_driver);
>   	if (ret) {
>   		isc_unregister(VFIO_CCW_ISC);
> -		kmem_cache_destroy(vfio_ccw_io_region);
> -		destroy_workqueue(vfio_ccw_work_q);
> +		goto out_err;
>   	}
>   
>   	return ret;
> +
> +out_err:
> +	kmem_cache_destroy(vfio_ccw_cmd_region);
> +	kmem_cache_destroy(vfio_ccw_io_region);
> +	destroy_workqueue(vfio_ccw_work_q);
> +	return ret;
>   }
>   
>   static void __exit vfio_ccw_sch_exit(void)
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index f6ed934cc565..72912d596181 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -3,8 +3,10 @@
>    * Finite state machine for vfio-ccw device handling
>    *
>    * Copyright IBM Corp. 2017
> + * Copyright Red Hat, Inc. 2019
>    *
>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> + *            Cornelia Huck <cohuck@redhat.com>
>    */
>   
>   #include <linux/vfio.h>
> @@ -69,6 +71,81 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
>   	return ret;
>   }
>   
> +static int fsm_do_halt(struct vfio_ccw_private *private)
> +{
> +	struct subchannel *sch;
> +	unsigned long flags;
> +	int ccode;
> +	int ret;
> +
> +	sch = private->sch;
> +
> +	spin_lock_irqsave(sch->lock, flags);
> +
> +	/* Issue "Halt Subchannel" */
> +	ccode = hsch(sch->schid);
> +
> +	switch (ccode) {
> +	case 0:
> +		/*
> +		 * Initialize device status information
> +		 */
> +		sch->schib.scsw.cmd.actl |= SCSW_ACTL_HALT_PEND;
> +		ret = 0;
> +		private->state = VFIO_CCW_STATE_BUSY;
> +		break;
> +	case 1:		/* Status pending */
> +	case 2:		/* Busy */
> +		ret = -EBUSY;
> +		break;
> +	case 3:		/* Device not operational */
> +	{
> +		ret = -ENODEV;
> +		break;
> +	}

Why does cc3 get braces, but no other case does?  (Ditto for clear)

(I guess the answer is "because fsm_io_helper does" but I didn't notice 
that before either.  :-)

> +	default:
> +		ret = ccode;
> +	}
> +	spin_unlock_irqrestore(sch->lock, flags);
> +	return ret;
> +}
> +
> +static int fsm_do_clear(struct vfio_ccw_private *private)
> +{
> +	struct subchannel *sch;
> +	unsigned long flags;
> +	int ccode;
> +	int ret;
> +
> +	sch = private->sch;
> +
> +	spin_lock_irqsave(sch->lock, flags);
> +
> +	/* Issue "Clear Subchannel" */
> +	ccode = csch(sch->schid);
> +
> +	switch (ccode) {
> +	case 0:
> +		/*
> +		 * Initialize device status information
> +		 */
> +		sch->schib.scsw.cmd.actl = SCSW_ACTL_CLEAR_PEND;
> +		/* TODO: check what else we might need to clear */
> +		ret = 0;
> +		private->state = VFIO_CCW_STATE_BUSY;
> +		break;
> +	case 3:		/* Device not operational */
> +	{
> +		ret = -ENODEV;
> +		break;
> +	}
> +	default:
> +		ret = ccode;
> +	}
> +	spin_unlock_irqrestore(sch->lock, flags);
> +	return ret;
> +}
> +
>   static void fsm_notoper(struct vfio_ccw_private *private,
>   			enum vfio_ccw_event event)
>   {
> @@ -103,6 +180,14 @@ static void fsm_io_busy(struct vfio_ccw_private *private,
>   	private->io_region->ret_code = -EAGAIN;
>   }
>   
> +static void fsm_async_error(struct vfio_ccw_private *private,
> +			    enum vfio_ccw_event event)
> +{
> +	pr_err("vfio-ccw: FSM: halt/clear request from state:%d\n",
> +	       private->state);

Had a little deja vu here.  :)

Can this message use private->cmd_region->command to tell us if it's a 
halt, clear, or unknown?  Instead of just saying "halt/clear" statically.

> +	private->cmd_region->ret_code = -EIO;
> +}
> +
>   static void fsm_disabled_irq(struct vfio_ccw_private *private,
>   			     enum vfio_ccw_event event)
>   {
> @@ -165,11 +250,11 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   		}
>   		return;
>   	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> -		/* XXX: Handle halt. */
> +		/* halt is handled via the async cmd region */
>   		io_region->ret_code = -EOPNOTSUPP;
>   		goto err_out;
>   	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
> -		/* XXX: Handle clear. */
> +		/* clear is handled via the async cmd region */
>   		io_region->ret_code = -EOPNOTSUPP;
>   		goto err_out;
>   	}
> @@ -179,6 +264,27 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   			       io_region->ret_code, errstr);
>   }
>   
> +/*
> + * Deal with an async request from userspace.
> + */
> +static void fsm_async_request(struct vfio_ccw_private *private,
> +			      enum vfio_ccw_event event)
> +{
> +	struct ccw_cmd_region *cmd_region = private->cmd_region;
> +
> +	switch (cmd_region->command) {
> +	case VFIO_CCW_ASYNC_CMD_HSCH:
> +		cmd_region->ret_code = fsm_do_halt(private);
> +		break;
> +	case VFIO_CCW_ASYNC_CMD_CSCH:
> +		cmd_region->ret_code = fsm_do_clear(private);
> +		break;
> +	default:
> +		/* should not happen? */
> +		cmd_region->ret_code = -EINVAL;
> +	}
> +}
> +
>   /*
>    * Got an interrupt for a normal io (state busy).
>    */
> @@ -202,21 +308,25 @@ fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
>   	[VFIO_CCW_STATE_NOT_OPER] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_nop,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
>   	},
>   	[VFIO_CCW_STATE_STANDBY] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_error,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
>   	},
>   	[VFIO_CCW_STATE_IDLE] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_request,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
>   	},
>   	[VFIO_CCW_STATE_BUSY] = {
>   		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
>   		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
> +		[VFIO_CCW_EVENT_ASYNC_REQ]	= fsm_async_request,
>   		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
>   	},
>   };
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 5a89d09f9271..755806cb8d53 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -148,11 +148,20 @@ static int vfio_ccw_mdev_open(struct mdev_device *mdev)
>   	struct vfio_ccw_private *private =
>   		dev_get_drvdata(mdev_parent_dev(mdev));
>   	unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> +	int ret;
>   
>   	private->nb.notifier_call = vfio_ccw_mdev_notifier;
>   
> -	return vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> -				      &events, &private->nb);
> +	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +				     &events, &private->nb);
> +	if (ret)
> +		return ret;
> +
> +	ret = vfio_ccw_register_async_dev_regions(private);
> +	if (ret)
> +		vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +					 &private->nb);
> +	return ret;
>   }
>   
>   static void vfio_ccw_mdev_release(struct mdev_device *mdev)
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 20e75f4f3695..ed8b94ea2f08 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -31,9 +31,9 @@ struct vfio_ccw_private;
>   struct vfio_ccw_region;
>   
>   struct vfio_ccw_regops {
> -	size_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
> +	ssize_t	(*read)(struct vfio_ccw_private *private, char __user *buf,
>   			size_t count, loff_t *ppos);
> -	size_t	(*write)(struct vfio_ccw_private *private,
> +	ssize_t	(*write)(struct vfio_ccw_private *private,

Ah, amending my r-b to patch 3 :)

>   			 const char __user *buf, size_t count, loff_t *ppos);
>   	void	(*release)(struct vfio_ccw_private *private,
>   			   struct vfio_ccw_region *region);
> @@ -53,6 +53,8 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
>   				 const struct vfio_ccw_regops *ops,
>   				 size_t size, u32 flags, void *data);
>   
> +int vfio_ccw_register_async_dev_regions(struct vfio_ccw_private *private);
> +
>   /**
>    * struct vfio_ccw_private
>    * @sch: pointer to the subchannel
> @@ -64,6 +66,7 @@ int vfio_ccw_register_dev_region(struct vfio_ccw_private *private,
>    * @io_region: MMIO region to input/output I/O arguments/results
>    * @io_mutex: protect against concurrent update of I/O structures
>    * @region: additional regions for other subchannel operations
> + * @cmd_region: MMIO region for asynchronous I/O commands other than START
>    * @num_regions: number of additional regions
>    * @cp: channel program for the current I/O operation
>    * @irb: irb info received from interrupt
> @@ -81,6 +84,7 @@ struct vfio_ccw_private {
>   	struct ccw_io_region	*io_region;
>   	struct mutex		io_mutex;
>   	struct vfio_ccw_region *region;
> +	struct ccw_cmd_region	*cmd_region;
>   	int num_regions;
>   
>   	struct channel_program	cp;
> @@ -115,6 +119,7 @@ enum vfio_ccw_event {
>   	VFIO_CCW_EVENT_NOT_OPER,
>   	VFIO_CCW_EVENT_IO_REQ,
>   	VFIO_CCW_EVENT_INTERRUPT,
> +	VFIO_CCW_EVENT_ASYNC_REQ,
>   	/* last element! */
>   	NR_VFIO_CCW_EVENTS
>   };
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 56e2413d3e00..8f10748dac79 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -354,6 +354,8 @@ struct vfio_region_gfx_edid {
>   };
>   
>   #define VFIO_REGION_TYPE_CCW			(2)
> +/* ccw sub-types */
> +#define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
>   
>   /*
>    * 10de vendor sub-type
> diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
> index 2ec5f367ff78..cbecbf0cd54f 100644
> --- a/include/uapi/linux/vfio_ccw.h
> +++ b/include/uapi/linux/vfio_ccw.h
> @@ -12,6 +12,7 @@
>   
>   #include <linux/types.h>
>   
> +/* used for START SUBCHANNEL, always present */
>   struct ccw_io_region {
>   #define ORB_AREA_SIZE 12
>   	__u8	orb_area[ORB_AREA_SIZE];
> @@ -22,4 +23,15 @@ struct ccw_io_region {
>   	__u32	ret_code;
>   } __packed;
>   
> +/*
> + * used for processing commands that trigger asynchronous actions
> + * Note: this is controlled by a capability
> + */
> +#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
> +#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
> +struct ccw_cmd_region {
> +	__u32 command;
> +	__u32 ret_code;
> +} __packed;
> +
>   #endif
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 14:01             ` [Qemu-devel] " Halil Pasic
@ 2019-01-28 17:09               ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:09 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Fri, 25 Jan 2019 15:01:01 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 25 Jan 2019 13:58:35 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:

> > - The code should not be interrupted while we process the channel
> >   program, do the ssch etc. We want the caller to try again later (i.e.
> >   return -EAGAIN)  

(...)

> > - With the async interface, we want user space to be able to submit a
> >   halt/clear while a start request is still in flight, but not while
> >   we're processing a start request with translation etc. We probably
> >   want to do -EAGAIN in that case.  
> 
> This reads very similar to your first point.

Not quite. ssch() means that we have a cp around; for hsch()/csch() we
don't have such a thing. So we want to protect the process of
translating the cp etc., but we don't need such protection for the
halt/clear processing.

> 
> > 
> > My idea would be:
> > 
> > - The BUSY state denotes "I'm busy processing a request right now, try
> >   again". We hold it while processing the cp and doing the ssch and
> >   leave it afterwards (i.e., while the start request is processed by
> >   the hardware). I/O requests and async requests get -EAGAIN in that
> >   state.
> > - A new state (CP_PENDING?) is entered after ssch returned with cc 0
> >   (from the BUSY state). We stay in there as long as no final state for
> >   that request has been received and delivered. (This may be final
> >   interrupt for that request, a deferred cc, or successful halt/clear.)
> >   I/O requests get -EBUSY, async requests are processed. This state can
> >   be removed again once we are able to handle more than one outstanding
> >   cp.
> > 
> > Does that make sense?
> >   
> 
> AFAIU your idea is to split up the busy state into two states: CP_PENDING
> and of busy without CP_PENDING called BUSY. I like the idea of having a
> separate state for CP_PENDING but I don't like the new semantic of BUSY.
> 
> Hm mashing a conceptual state machine and the jumptabe stuff ain't
> making reasoning about this simpler either. I'm taking about the
> conceptual state machine. It would be nice to have a picture of it and
> then think about how to express that in code.

Sorry, I'm having a hard time parsing your comments. Are you looking
for something like the below?

IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final
state for I/O)
(normal ssch)

BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
(user space is supposed to retry, as we'll eventually progress from
BUSY)

CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
(user space is supposed to map this to the appropriate cc for the guest)

IDLE --- ASYNC_REQ ---> IDLE
(user space is welcome to do anything else right away)

BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
(user space is supposed to retry, as above)

CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
(the interrupt will get us out of CP_PENDING eventually)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 17:09               ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:09 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 15:01:01 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 25 Jan 2019 13:58:35 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:

> > - The code should not be interrupted while we process the channel
> >   program, do the ssch etc. We want the caller to try again later (i.e.
> >   return -EAGAIN)  

(...)

> > - With the async interface, we want user space to be able to submit a
> >   halt/clear while a start request is still in flight, but not while
> >   we're processing a start request with translation etc. We probably
> >   want to do -EAGAIN in that case.  
> 
> This reads very similar to your first point.

Not quite. ssch() means that we have a cp around; for hsch()/csch() we
don't have such a thing. So we want to protect the process of
translating the cp etc., but we don't need such protection for the
halt/clear processing.

> 
> > 
> > My idea would be:
> > 
> > - The BUSY state denotes "I'm busy processing a request right now, try
> >   again". We hold it while processing the cp and doing the ssch and
> >   leave it afterwards (i.e., while the start request is processed by
> >   the hardware). I/O requests and async requests get -EAGAIN in that
> >   state.
> > - A new state (CP_PENDING?) is entered after ssch returned with cc 0
> >   (from the BUSY state). We stay in there as long as no final state for
> >   that request has been received and delivered. (This may be final
> >   interrupt for that request, a deferred cc, or successful halt/clear.)
> >   I/O requests get -EBUSY, async requests are processed. This state can
> >   be removed again once we are able to handle more than one outstanding
> >   cp.
> > 
> > Does that make sense?
> >   
> 
> AFAIU your idea is to split up the busy state into two states: CP_PENDING
> and of busy without CP_PENDING called BUSY. I like the idea of having a
> separate state for CP_PENDING but I don't like the new semantic of BUSY.
> 
> Hm mashing a conceptual state machine and the jumptabe stuff ain't
> making reasoning about this simpler either. I'm taking about the
> conceptual state machine. It would be nice to have a picture of it and
> then think about how to express that in code.

Sorry, I'm having a hard time parsing your comments. Are you looking
for something like the below?

IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final
state for I/O)
(normal ssch)

BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
(user space is supposed to retry, as we'll eventually progress from
BUSY)

CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
(user space is supposed to map this to the appropriate cc for the guest)

IDLE --- ASYNC_REQ ---> IDLE
(user space is welcome to do anything else right away)

BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
(user space is supposed to retry, as above)

CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
(the interrupt will get us out of CP_PENDING eventually)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 16:04                 ` [Qemu-devel] " Halil Pasic
@ 2019-01-28 17:13                   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:13 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, kvm, Pierre Morel, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Fri, 25 Jan 2019 17:04:04 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> Do we expect userspace/QEMU to fence the bad scenarios as tries to do
> today, or is this supposed to change to hardware should sort out
> requests whenever possible.

Does my other mail answer that?

> The problem I see with the let the hardware sort it out is that, for that
> to work, we need to juggle multiple translations simultaneously (or am I
> wrong?). Doing that does not appear particularly simple to me.

None in the first stage, at most two in the second stage, I guess.

> Furthermore we would go through all that hassle knowingly that the sole
> reason is working around bugs. We still expect our Linux guests
> serializing it's ssch() stuff as it does today. Thus I would except this
> code not getting the love nor the coverage that would guard against bugs
> in that code.

So, we should have test code for that? (Any IBM-internal channel I/O
exercisers that may help?)

We should not rely on the guest being sane, although Linux probably is
in that respect.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 17:13                   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:13 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Fri, 25 Jan 2019 17:04:04 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> Do we expect userspace/QEMU to fence the bad scenarios as tries to do
> today, or is this supposed to change to hardware should sort out
> requests whenever possible.

Does my other mail answer that?

> The problem I see with the let the hardware sort it out is that, for that
> to work, we need to juggle multiple translations simultaneously (or am I
> wrong?). Doing that does not appear particularly simple to me.

None in the first stage, at most two in the second stage, I guess.

> Furthermore we would go through all that hassle knowingly that the sole
> reason is working around bugs. We still expect our Linux guests
> serializing it's ssch() stuff as it does today. Thus I would except this
> code not getting the love nor the coverage that would guard against bugs
> in that code.

So, we should have test code for that? (Any IBM-internal channel I/O
exercisers that may help?)

We should not rely on the guest being sane, although Linux probably is
in that respect.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 15:57             ` [Qemu-devel] " Eric Farman
@ 2019-01-28 17:24               ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:24 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Fri, 25 Jan 2019 10:57:38 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/25/2019 07:58 AM, Cornelia Huck wrote:
> > On Fri, 25 Jan 2019 11:24:37 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> >> On Thu, 24 Jan 2019 21:37:44 -0500
> >> Eric Farman <farman@linux.ibm.com> wrote:
> >>  
> >>> On 01/24/2019 09:25 PM, Eric Farman wrote:  
> >>>>
> >>>>
> >>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:  
> >>  
> >>>> [1] I think these changes are cool.  We end up going into (and staying
> >>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
> >>>> bumble along.
> >>>>
> >>>> But why can't these be separated out from this patch?  It does change
> >>>> the behavior of the state machine, and seem distinct from the addition
> >>>> of the mutex you otherwise add here?  At the very least, this behavior
> >>>> change should be documented in the commit since it's otherwise lost in
> >>>> the mutex/EAGAIN stuff.  
> >>
> >> That's a very good idea. I'll factor them out into a separate patch.  
> > 
> > And now that I've factored it out, I noticed some more problems.  
> 
> That's good!  Maybe it helps us with the circles we're on :)

:)

> 
> > 
> > What we basically need is the following, I think:
> > 
> > - The code should not be interrupted while we process the channel
> >    program, do the ssch etc. We want the caller to try again later (i.e.
> >    return -EAGAIN)
> > - We currently do not want the user space to submit another channel
> >    program while the first one is still in flight.   
> 
> These two seem to contradict one another.  I think you're saying is that 
> we don't _want_ userspace to issue another channel program, even though 
> its _allowed_ to as far as vfio-ccw is concerned.

What I'm trying to say is that we want to distinguish two things:
- The code is currently doing translation etc. We probably want to keep
  that atomic, in order not to make things too complicated.
- We have sent the ssch() to the hardware, but have not yet received
  the final interrupt for that request (that's what I meant with "in
  flight"). It's easier for the first shot to disallow a second ssch()
  as that would need handling of more than one cp request, but we may
  want to allow it in the future.
  A hsch()/csch() (which does not generate a new cp) should be fine.

(see also my reply to Halil's mail)

> 
> As submitting another
> >    one is a valid request, however, we should allow this in the future
> >    (once we have the code to handle that in place).
> > - With the async interface, we want user space to be able to submit a
> >    halt/clear while a start request is still in flight, but not while
> >    we're processing a start request with translation etc. We probably
> >    want to do -EAGAIN in that case.
> > 
> > My idea would be:
> > 
> > - The BUSY state denotes "I'm busy processing a request right now, try
> >    again". We hold it while processing the cp and doing the ssch and
> >    leave it afterwards (i.e., while the start request is processed by
> >    the hardware). I/O requests and async requests get -EAGAIN in that
> >    state.
> > - A new state (CP_PENDING?) is entered after ssch returned with cc 0
> >    (from the BUSY state). We stay in there as long as no final state for
> >    that request has been received and delivered. (This may be final
> >    interrupt for that request, a deferred cc, or successful halt/clear.)
> >    I/O requests get -EBUSY  
> 
> I liked CP_PENDING, since it corresponds to the subchannel being marked 
> "start pending" as described in POPS, but this statement suggests that 
> the BUSY/PENDING state to be swapped, such that state=PENDING returns 
> -EAGAIN and state=BUSY returns -EBUSY.  Not super-concerned with the 
> terminology though.

What about s/BUSY/CP_PROCESSING/ ?

> 
> , async requests are processed. This state can
> >    be removed again once we are able to handle more than one outstanding
> >    cp.
> > 
> > Does that make sense?
> >   
> 
> I think so, and I think I like it.  So you want to distinguish between 
> (I have swapped BUSY/PENDING in this example per my above comment):
> 
> A) SSCH issued by userspace (IDLE->PENDING)
> B) SSCH issued (successfully) by kernel (PENDING->BUSY)
> B') SSCH issued (unsuccessfully) by kernel (PENDING->IDLE?)

I think so.

> C) Interrupt received by kernel (no change?)
> D) Interrupt given to userspace (BUSY->IDLE)

Only if that is the final interrupt for that cp.

> 
> If we receive A and A, the second A gets EAGAIN
> 
> If we do A+B and A, the second A gets EBUSY (unless async, which is 
> processed)

Nod.

> Does the boundary of "in flight" in the interrupt side (C and D) need to 
> be defined, such that we go BUSY->PENDING->IDLE instead of BUSY->IDLE ?

I don't think we can go BUSY->PENDING (in your terminology), at that
would imply a retry of the ssch()?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 17:24               ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:24 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 10:57:38 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/25/2019 07:58 AM, Cornelia Huck wrote:
> > On Fri, 25 Jan 2019 11:24:37 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >   
> >> On Thu, 24 Jan 2019 21:37:44 -0500
> >> Eric Farman <farman@linux.ibm.com> wrote:
> >>  
> >>> On 01/24/2019 09:25 PM, Eric Farman wrote:  
> >>>>
> >>>>
> >>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:  
> >>  
> >>>> [1] I think these changes are cool.  We end up going into (and staying
> >>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
> >>>> bumble along.
> >>>>
> >>>> But why can't these be separated out from this patch?  It does change
> >>>> the behavior of the state machine, and seem distinct from the addition
> >>>> of the mutex you otherwise add here?  At the very least, this behavior
> >>>> change should be documented in the commit since it's otherwise lost in
> >>>> the mutex/EAGAIN stuff.  
> >>
> >> That's a very good idea. I'll factor them out into a separate patch.  
> > 
> > And now that I've factored it out, I noticed some more problems.  
> 
> That's good!  Maybe it helps us with the circles we're on :)

:)

> 
> > 
> > What we basically need is the following, I think:
> > 
> > - The code should not be interrupted while we process the channel
> >    program, do the ssch etc. We want the caller to try again later (i.e.
> >    return -EAGAIN)
> > - We currently do not want the user space to submit another channel
> >    program while the first one is still in flight.   
> 
> These two seem to contradict one another.  I think you're saying is that 
> we don't _want_ userspace to issue another channel program, even though 
> its _allowed_ to as far as vfio-ccw is concerned.

What I'm trying to say is that we want to distinguish two things:
- The code is currently doing translation etc. We probably want to keep
  that atomic, in order not to make things too complicated.
- We have sent the ssch() to the hardware, but have not yet received
  the final interrupt for that request (that's what I meant with "in
  flight"). It's easier for the first shot to disallow a second ssch()
  as that would need handling of more than one cp request, but we may
  want to allow it in the future.
  A hsch()/csch() (which does not generate a new cp) should be fine.

(see also my reply to Halil's mail)

> 
> As submitting another
> >    one is a valid request, however, we should allow this in the future
> >    (once we have the code to handle that in place).
> > - With the async interface, we want user space to be able to submit a
> >    halt/clear while a start request is still in flight, but not while
> >    we're processing a start request with translation etc. We probably
> >    want to do -EAGAIN in that case.
> > 
> > My idea would be:
> > 
> > - The BUSY state denotes "I'm busy processing a request right now, try
> >    again". We hold it while processing the cp and doing the ssch and
> >    leave it afterwards (i.e., while the start request is processed by
> >    the hardware). I/O requests and async requests get -EAGAIN in that
> >    state.
> > - A new state (CP_PENDING?) is entered after ssch returned with cc 0
> >    (from the BUSY state). We stay in there as long as no final state for
> >    that request has been received and delivered. (This may be final
> >    interrupt for that request, a deferred cc, or successful halt/clear.)
> >    I/O requests get -EBUSY  
> 
> I liked CP_PENDING, since it corresponds to the subchannel being marked 
> "start pending" as described in POPS, but this statement suggests that 
> the BUSY/PENDING state to be swapped, such that state=PENDING returns 
> -EAGAIN and state=BUSY returns -EBUSY.  Not super-concerned with the 
> terminology though.

What about s/BUSY/CP_PROCESSING/ ?

> 
> , async requests are processed. This state can
> >    be removed again once we are able to handle more than one outstanding
> >    cp.
> > 
> > Does that make sense?
> >   
> 
> I think so, and I think I like it.  So you want to distinguish between 
> (I have swapped BUSY/PENDING in this example per my above comment):
> 
> A) SSCH issued by userspace (IDLE->PENDING)
> B) SSCH issued (successfully) by kernel (PENDING->BUSY)
> B') SSCH issued (unsuccessfully) by kernel (PENDING->IDLE?)

I think so.

> C) Interrupt received by kernel (no change?)
> D) Interrupt given to userspace (BUSY->IDLE)

Only if that is the final interrupt for that cp.

> 
> If we receive A and A, the second A gets EAGAIN
> 
> If we do A+B and A, the second A gets EBUSY (unless async, which is 
> processed)

Nod.

> Does the boundary of "in flight" in the interrupt side (C and D) need to 
> be defined, such that we go BUSY->PENDING->IDLE instead of BUSY->IDLE ?

I don't think we can go BUSY->PENDING (in your terminology), at that
would imply a retry of the ssch()?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-25 20:22           ` [Qemu-devel] " Eric Farman
@ 2019-01-28 17:31             ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:31 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Fri, 25 Jan 2019 15:22:56 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> If we come into mdev_write with state=BUSY and we get the lock, 
> copy_from_user, and do our jump table we go to fsm_io_busy to set 
> ret_code and return -EAGAIN.  Why then don't we set the jump table for 
> state=NOT_OPER||STANDBY to do something that will return -EACCES instead 
> of how we currently do a direct return of -EACCES before all the 
> lock/copy stuff (and the jump table that would take us to fsm_io_error 
> and an error message before returning -EIO)?

If you phrase it like that, I'm wondering why we're not already doing
it that way :) We just need to make sure to revert to the previous
state on error instead of IDLE.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 17:31             ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:31 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 15:22:56 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> If we come into mdev_write with state=BUSY and we get the lock, 
> copy_from_user, and do our jump table we go to fsm_io_busy to set 
> ret_code and return -EAGAIN.  Why then don't we set the jump table for 
> state=NOT_OPER||STANDBY to do something that will return -EACCES instead 
> of how we currently do a direct return of -EACCES before all the 
> lock/copy stuff (and the jump table that would take us to fsm_io_error 
> and an error message before returning -EIO)?

If you phrase it like that, I'm wondering why we're not already doing
it that way :) We just need to make sure to revert to the previous
state on error instead of IDLE.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 3/5] vfio-ccw: add capabilities chain
  2019-01-25 21:00       ` [Qemu-devel] " Eric Farman
@ 2019-01-28 17:34         ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:34 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Fri, 25 Jan 2019 16:00:18 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/25/2019 11:19 AM, Eric Farman wrote:
> > 
> > 
> > On 01/21/2019 06:03 AM, Cornelia Huck wrote:  
> >> Allow to extend the regions used by vfio-ccw. The first user will be
> >> handling of halt and clear subchannel.
> >>
> >> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
> >>   drivers/s390/cio/vfio_ccw_private.h |  38 ++++++  
> ...snip...
> >> diff --git a/drivers/s390/cio/vfio_ccw_private.h 
> >> b/drivers/s390/cio/vfio_ccw_private.h
> >> index e88237697f83..20e75f4f3695 100644
> >> --- a/drivers/s390/cio/vfio_ccw_private.h
> >> +++ b/drivers/s390/cio/vfio_ccw_private.h
> >> @@ -3,9 +3,11 @@
> >>    * Private stuff for vfio_ccw driver
> >>    *
> >>    * Copyright IBM Corp. 2017
> >> + * Copyright Red Hat, Inc. 2019
> >>    *
> >>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> >>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> >> + *            Cornelia Huck <cohuck@redhat.com>
> >>    */
> >>   #ifndef _VFIO_CCW_PRIVATE_H_
> >> @@ -19,6 +21,38 @@
> >>   #include "css.h"
> >>   #include "vfio_ccw_cp.h"
> >> +#define VFIO_CCW_OFFSET_SHIFT   40
> >> +#define VFIO_CCW_OFFSET_TO_INDEX(off)    (off >> VFIO_CCW_OFFSET_SHIFT)
> >> +#define VFIO_CCW_INDEX_TO_OFFSET(index)    ((u64)(index) << 
> >> VFIO_CCW_OFFSET_SHIFT)
> >> +#define VFIO_CCW_OFFSET_MASK    (((u64)(1) << VFIO_CCW_OFFSET_SHIFT) 
> >> - 1)
> >> +
> >> +/* capability chain handling similar to vfio-pci */
> >> +struct vfio_ccw_private;
> >> +struct vfio_ccw_region;
> >> +
> >> +struct vfio_ccw_regops {
> >> +    size_t    (*read)(struct vfio_ccw_private *private, char __user 
> >> *buf,
> >> +            size_t count, loff_t *ppos);
> >> +    size_t    (*write)(struct vfio_ccw_private *private,
> >> +             const char __user *buf, size_t count, loff_t *ppos);  
> 
> Oops.  Per my recommendation on v1, you change these to "ssize_t" in 
> patch 5.  Might as well just do that here.

Seems to have slipped into the wrong patch during rebase. Will fix.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] vfio-ccw: add capabilities chain
@ 2019-01-28 17:34         ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:34 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 16:00:18 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/25/2019 11:19 AM, Eric Farman wrote:
> > 
> > 
> > On 01/21/2019 06:03 AM, Cornelia Huck wrote:  
> >> Allow to extend the regions used by vfio-ccw. The first user will be
> >> handling of halt and clear subchannel.
> >>
> >> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_ops.c     | 181 ++++++++++++++++++++++++----
> >>   drivers/s390/cio/vfio_ccw_private.h |  38 ++++++  
> ...snip...
> >> diff --git a/drivers/s390/cio/vfio_ccw_private.h 
> >> b/drivers/s390/cio/vfio_ccw_private.h
> >> index e88237697f83..20e75f4f3695 100644
> >> --- a/drivers/s390/cio/vfio_ccw_private.h
> >> +++ b/drivers/s390/cio/vfio_ccw_private.h
> >> @@ -3,9 +3,11 @@
> >>    * Private stuff for vfio_ccw driver
> >>    *
> >>    * Copyright IBM Corp. 2017
> >> + * Copyright Red Hat, Inc. 2019
> >>    *
> >>    * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> >>    *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> >> + *            Cornelia Huck <cohuck@redhat.com>
> >>    */
> >>   #ifndef _VFIO_CCW_PRIVATE_H_
> >> @@ -19,6 +21,38 @@
> >>   #include "css.h"
> >>   #include "vfio_ccw_cp.h"
> >> +#define VFIO_CCW_OFFSET_SHIFT   40
> >> +#define VFIO_CCW_OFFSET_TO_INDEX(off)    (off >> VFIO_CCW_OFFSET_SHIFT)
> >> +#define VFIO_CCW_INDEX_TO_OFFSET(index)    ((u64)(index) << 
> >> VFIO_CCW_OFFSET_SHIFT)
> >> +#define VFIO_CCW_OFFSET_MASK    (((u64)(1) << VFIO_CCW_OFFSET_SHIFT) 
> >> - 1)
> >> +
> >> +/* capability chain handling similar to vfio-pci */
> >> +struct vfio_ccw_private;
> >> +struct vfio_ccw_region;
> >> +
> >> +struct vfio_ccw_regops {
> >> +    size_t    (*read)(struct vfio_ccw_private *private, char __user 
> >> *buf,
> >> +            size_t count, loff_t *ppos);
> >> +    size_t    (*write)(struct vfio_ccw_private *private,
> >> +             const char __user *buf, size_t count, loff_t *ppos);  
> 
> Oops.  Per my recommendation on v1, you change these to "ssize_t" in 
> patch 5.  Might as well just do that here.

Seems to have slipped into the wrong patch during rebase. Will fix.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
  2019-01-25 21:00     ` [Qemu-devel] " Eric Farman
@ 2019-01-28 17:40       ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:40 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Fri, 25 Jan 2019 16:00:38 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> > Add a region to the vfio-ccw device that can be used to submit
> > asynchronous I/O instructions. ssch continues to be handled by the
> > existing I/O region; the new region handles hsch and csch.
> > 
> > Interrupt status continues to be reported through the same channels
> > as for ssch.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---
> >   drivers/s390/cio/Makefile           |   3 +-
> >   drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++++++++++++
> >   drivers/s390/cio/vfio_ccw_drv.c     |  45 +++++++----
> >   drivers/s390/cio/vfio_ccw_fsm.c     | 114 +++++++++++++++++++++++++++-
> >   drivers/s390/cio/vfio_ccw_ops.c     |  13 +++-
> >   drivers/s390/cio/vfio_ccw_private.h |   9 ++-
> >   include/uapi/linux/vfio.h           |   2 +
> >   include/uapi/linux/vfio_ccw.h       |  12 +++
> >   8 files changed, 269 insertions(+), 20 deletions(-)
> >   create mode 100644 drivers/s390/cio/vfio_ccw_async.c

(...)

> > @@ -149,7 +155,10 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
> >   	cio_disable_subchannel(sch);
> >   out_free:
> >   	dev_set_drvdata(&sch->dev, NULL);
> > -	kmem_cache_free(vfio_ccw_io_region, private->io_region);
> > +	if (private->cmd_region)
> > +		kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
> > +	if (private->io_region)
> > +		kmem_cache_free(vfio_ccw_io_region, private->io_region);  
> 
> Well, adding the check if private->xxx_region is non-NULL is fine.  I'd 
> have made it a separate patch for io_region, but whatever.
> 
> Since you're adding that check, you should add the same if statement in 
> vfio_ccw_sch_remove().  And you should certainly call 
> kmem_cache_free(private->cmd_region) there too.  :)

Ehm, yes :)

> 
> >   	kfree(private);
> >   	return ret;
> >   }

(...)

> > +static int fsm_do_halt(struct vfio_ccw_private *private)
> > +{
> > +	struct subchannel *sch;
> > +	unsigned long flags;
> > +	int ccode;
> > +	int ret;
> > +
> > +	sch = private->sch;
> > +
> > +	spin_lock_irqsave(sch->lock, flags);
> > +
> > +	/* Issue "Halt Subchannel" */
> > +	ccode = hsch(sch->schid);
> > +
> > +	switch (ccode) {
> > +	case 0:
> > +		/*
> > +		 * Initialize device status information
> > +		 */
> > +		sch->schib.scsw.cmd.actl |= SCSW_ACTL_HALT_PEND;
> > +		ret = 0;
> > +		private->state = VFIO_CCW_STATE_BUSY;
> > +		break;
> > +	case 1:		/* Status pending */
> > +	case 2:		/* Busy */
> > +		ret = -EBUSY;
> > +		break;
> > +	case 3:		/* Device not operational */
> > +	{
> > +		ret = -ENODEV;
> > +		break;
> > +	}  
> 
> Why does cc3 get braces, but no other case does?  (Ditto for clear)
> 
> (I guess the answer is "because fsm_io_helper does" but I didn't notice 
> that before either.  :-)

Yes, the power of copy/paste :) But it makes sense to avoid adding them.

> 
> > +	default:
> > +		ret = ccode;
> > +	}
> > +	spin_unlock_irqrestore(sch->lock, flags);
> > +	return ret;
> > +}

(...)

> > +static void fsm_async_error(struct vfio_ccw_private *private,
> > +			    enum vfio_ccw_event event)
> > +{
> > +	pr_err("vfio-ccw: FSM: halt/clear request from state:%d\n",
> > +	       private->state);  
> 
> Had a little deja vu here.  :)
> 
> Can this message use private->cmd_region->command to tell us if it's a 
> halt, clear, or unknown?  Instead of just saying "halt/clear" statically.

Can do that; need to check if we need the mutex.

> 
> > +	private->cmd_region->ret_code = -EIO;
> > +}
> > +

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions
@ 2019-01-28 17:40       ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-28 17:40 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Fri, 25 Jan 2019 16:00:38 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
> > Add a region to the vfio-ccw device that can be used to submit
> > asynchronous I/O instructions. ssch continues to be handled by the
> > existing I/O region; the new region handles hsch and csch.
> > 
> > Interrupt status continues to be reported through the same channels
> > as for ssch.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---
> >   drivers/s390/cio/Makefile           |   3 +-
> >   drivers/s390/cio/vfio_ccw_async.c   |  91 ++++++++++++++++++++++
> >   drivers/s390/cio/vfio_ccw_drv.c     |  45 +++++++----
> >   drivers/s390/cio/vfio_ccw_fsm.c     | 114 +++++++++++++++++++++++++++-
> >   drivers/s390/cio/vfio_ccw_ops.c     |  13 +++-
> >   drivers/s390/cio/vfio_ccw_private.h |   9 ++-
> >   include/uapi/linux/vfio.h           |   2 +
> >   include/uapi/linux/vfio_ccw.h       |  12 +++
> >   8 files changed, 269 insertions(+), 20 deletions(-)
> >   create mode 100644 drivers/s390/cio/vfio_ccw_async.c

(...)

> > @@ -149,7 +155,10 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
> >   	cio_disable_subchannel(sch);
> >   out_free:
> >   	dev_set_drvdata(&sch->dev, NULL);
> > -	kmem_cache_free(vfio_ccw_io_region, private->io_region);
> > +	if (private->cmd_region)
> > +		kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
> > +	if (private->io_region)
> > +		kmem_cache_free(vfio_ccw_io_region, private->io_region);  
> 
> Well, adding the check if private->xxx_region is non-NULL is fine.  I'd 
> have made it a separate patch for io_region, but whatever.
> 
> Since you're adding that check, you should add the same if statement in 
> vfio_ccw_sch_remove().  And you should certainly call 
> kmem_cache_free(private->cmd_region) there too.  :)

Ehm, yes :)

> 
> >   	kfree(private);
> >   	return ret;
> >   }

(...)

> > +static int fsm_do_halt(struct vfio_ccw_private *private)
> > +{
> > +	struct subchannel *sch;
> > +	unsigned long flags;
> > +	int ccode;
> > +	int ret;
> > +
> > +	sch = private->sch;
> > +
> > +	spin_lock_irqsave(sch->lock, flags);
> > +
> > +	/* Issue "Halt Subchannel" */
> > +	ccode = hsch(sch->schid);
> > +
> > +	switch (ccode) {
> > +	case 0:
> > +		/*
> > +		 * Initialize device status information
> > +		 */
> > +		sch->schib.scsw.cmd.actl |= SCSW_ACTL_HALT_PEND;
> > +		ret = 0;
> > +		private->state = VFIO_CCW_STATE_BUSY;
> > +		break;
> > +	case 1:		/* Status pending */
> > +	case 2:		/* Busy */
> > +		ret = -EBUSY;
> > +		break;
> > +	case 3:		/* Device not operational */
> > +	{
> > +		ret = -ENODEV;
> > +		break;
> > +	}  
> 
> Why does cc3 get braces, but no other case does?  (Ditto for clear)
> 
> (I guess the answer is "because fsm_io_helper does" but I didn't notice 
> that before either.  :-)

Yes, the power of copy/paste :) But it makes sense to avoid adding them.

> 
> > +	default:
> > +		ret = ccode;
> > +	}
> > +	spin_unlock_irqrestore(sch->lock, flags);
> > +	return ret;
> > +}

(...)

> > +static void fsm_async_error(struct vfio_ccw_private *private,
> > +			    enum vfio_ccw_event event)
> > +{
> > +	pr_err("vfio-ccw: FSM: halt/clear request from state:%d\n",
> > +	       private->state);  
> 
> Had a little deja vu here.  :)
> 
> Can this message use private->cmd_region->command to tell us if it's a 
> halt, clear, or unknown?  Instead of just saying "halt/clear" statically.

Can do that; need to check if we need the mutex.

> 
> > +	private->cmd_region->ret_code = -EIO;
> > +}
> > +

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-28 17:09               ` [Qemu-devel] " Cornelia Huck
@ 2019-01-28 19:15                 ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-28 19:15 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Mon, 28 Jan 2019 18:09:48 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 15:01:01 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Fri, 25 Jan 2019 13:58:35 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > > - The code should not be interrupted while we process the channel
> > >   program, do the ssch etc. We want the caller to try again later (i.e.
> > >   return -EAGAIN)  
> 
> (...)
> 
> > > - With the async interface, we want user space to be able to submit a
> > >   halt/clear while a start request is still in flight, but not while
> > >   we're processing a start request with translation etc. We probably
> > >   want to do -EAGAIN in that case.  
> > 
> > This reads very similar to your first point.
> 
> Not quite. ssch() means that we have a cp around; for hsch()/csch() we
> don't have such a thing. So we want to protect the process of
> translating the cp etc., but we don't need such protection for the
> halt/clear processing.
> 

What does this don't 'need such protection' mean in terms of code,
moving the unlock of the io_mutex upward (in
vfio_ccw_async_region_write())?

Here the function in question for reference:

+static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private
*private,
+					   const char __user *buf,
size_t count,
+					   loff_t *ppos)
+{
+	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) -
VFIO_CCW_NUM_REGIONS;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
+	struct ccw_cmd_region *region;
+	int ret;
+
+	if (pos + count > sizeof(*region))
+		return -EINVAL;
+
+	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
+	    private->state == VFIO_CCW_STATE_STANDBY)
+		return -EACCES;
+	if (!mutex_trylock(&private->io_mutex))
+		return -EAGAIN;
+
+	region = private->region[i].data;
+	if (copy_from_user((void *)region + pos, buf, count)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
+
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
+
+	ret = region->ret_code ? region->ret_code : count;
+
+out_unlock:
+	mutex_unlock(&private->io_mutex);
+	return ret;
+}

That does not make much sense to me at the moment (so I guess I
misunderstood again).

> > 
> > > 
> > > My idea would be:
> > > 
> > > - The BUSY state denotes "I'm busy processing a request right now, try
> > >   again". We hold it while processing the cp and doing the ssch and
> > >   leave it afterwards (i.e., while the start request is processed by
> > >   the hardware). I/O requests and async requests get -EAGAIN in that
> > >   state.
> > > - A new state (CP_PENDING?) is entered after ssch returned with cc 0
> > >   (from the BUSY state). We stay in there as long as no final state for
> > >   that request has been received and delivered. (This may be final
> > >   interrupt for that request, a deferred cc, or successful halt/clear.)
> > >   I/O requests get -EBUSY, async requests are processed. This state can
> > >   be removed again once we are able to handle more than one outstanding
> > >   cp.
> > > 
> > > Does that make sense?
> > >   
> > 
> > AFAIU your idea is to split up the busy state into two states: CP_PENDING
> > and of busy without CP_PENDING called BUSY. I like the idea of having a
> > separate state for CP_PENDING but I don't like the new semantic of BUSY.
> > 
> > Hm mashing a conceptual state machine and the jumptabe stuff ain't
> > making reasoning about this simpler either. I'm taking about the
> > conceptual state machine. It would be nice to have a picture of it and
> > then think about how to express that in code.
> 
> Sorry, I'm having a hard time parsing your comments. Are you looking
> for something like the below?

I had more something like this 
https://en.wikipedia.org/wiki/UML_state_machine,
in mind but the lists of state transitions are also useful.

> 
> IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final

There ain't no trigger/action list  between BUSY and CP_PENDING.
I'm also in the  dark about where the issuing of the ssch() happen
here (is it an internal transition within CP_PENDING?). I guess if
the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
transition
won't take place. And I guess the IRQ is a final one.

Sorry abstraction is not a concept unknown to me. But this is too much
abstraction for me in this context. The devil is in the details, and
AFAIU we are discussing these details right now.
 

> state for I/O)
> (normal ssch)
> 
> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> (user space is supposed to retry, as we'll eventually progress from
> BUSY)
> 
> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> (user space is supposed to map this to the appropriate cc for the guest)

From this it seems you don't intend to issue the second  requested ssch()
any more (and don't want to do any translation). Is that right? (If it
is, that what I was asking for for a while, but then it's a pity for the
retries.)

> 
> IDLE --- ASYNC_REQ ---> IDLE
> (user space is welcome to do anything else right away)

Your idea is to not issue a requested hsch() if we think we are IDLE
it seems. Do I understand this right? We would end up with a different
semantic for hsch()/and csch() (compared to PoP) in the guest with this
(AFAICT).

> 
> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> (user space is supposed to retry, as above)
> 
> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> (the interrupt will get us out of CP_PENDING eventually)

Issue (c|h)sch() is an action that is done on this internal 
transition (within CP_PENDING).

Thank you very much for investing into this description of the state
machine. I'm afraid I'm acting like a not so nice person (self censored)
at the moment. I can't help myself, sorry. Maybe Farhan and Eric can take
this as a starting point and come up with something that we can integrate
into our documentation. Maybe not...

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 19:15                 ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-28 19:15 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Mon, 28 Jan 2019 18:09:48 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 15:01:01 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Fri, 25 Jan 2019 13:58:35 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > > - The code should not be interrupted while we process the channel
> > >   program, do the ssch etc. We want the caller to try again later (i.e.
> > >   return -EAGAIN)  
> 
> (...)
> 
> > > - With the async interface, we want user space to be able to submit a
> > >   halt/clear while a start request is still in flight, but not while
> > >   we're processing a start request with translation etc. We probably
> > >   want to do -EAGAIN in that case.  
> > 
> > This reads very similar to your first point.
> 
> Not quite. ssch() means that we have a cp around; for hsch()/csch() we
> don't have such a thing. So we want to protect the process of
> translating the cp etc., but we don't need such protection for the
> halt/clear processing.
> 

What does this don't 'need such protection' mean in terms of code,
moving the unlock of the io_mutex upward (in
vfio_ccw_async_region_write())?

Here the function in question for reference:

+static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private
*private,
+					   const char __user *buf,
size_t count,
+					   loff_t *ppos)
+{
+	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) -
VFIO_CCW_NUM_REGIONS;
+	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
+	struct ccw_cmd_region *region;
+	int ret;
+
+	if (pos + count > sizeof(*region))
+		return -EINVAL;
+
+	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
+	    private->state == VFIO_CCW_STATE_STANDBY)
+		return -EACCES;
+	if (!mutex_trylock(&private->io_mutex))
+		return -EAGAIN;
+
+	region = private->region[i].data;
+	if (copy_from_user((void *)region + pos, buf, count)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
+
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
+
+	ret = region->ret_code ? region->ret_code : count;
+
+out_unlock:
+	mutex_unlock(&private->io_mutex);
+	return ret;
+}

That does not make much sense to me at the moment (so I guess I
misunderstood again).

> > 
> > > 
> > > My idea would be:
> > > 
> > > - The BUSY state denotes "I'm busy processing a request right now, try
> > >   again". We hold it while processing the cp and doing the ssch and
> > >   leave it afterwards (i.e., while the start request is processed by
> > >   the hardware). I/O requests and async requests get -EAGAIN in that
> > >   state.
> > > - A new state (CP_PENDING?) is entered after ssch returned with cc 0
> > >   (from the BUSY state). We stay in there as long as no final state for
> > >   that request has been received and delivered. (This may be final
> > >   interrupt for that request, a deferred cc, or successful halt/clear.)
> > >   I/O requests get -EBUSY, async requests are processed. This state can
> > >   be removed again once we are able to handle more than one outstanding
> > >   cp.
> > > 
> > > Does that make sense?
> > >   
> > 
> > AFAIU your idea is to split up the busy state into two states: CP_PENDING
> > and of busy without CP_PENDING called BUSY. I like the idea of having a
> > separate state for CP_PENDING but I don't like the new semantic of BUSY.
> > 
> > Hm mashing a conceptual state machine and the jumptabe stuff ain't
> > making reasoning about this simpler either. I'm taking about the
> > conceptual state machine. It would be nice to have a picture of it and
> > then think about how to express that in code.
> 
> Sorry, I'm having a hard time parsing your comments. Are you looking
> for something like the below?

I had more something like this 
https://en.wikipedia.org/wiki/UML_state_machine,
in mind but the lists of state transitions are also useful.

> 
> IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final

There ain't no trigger/action list  between BUSY and CP_PENDING.
I'm also in the  dark about where the issuing of the ssch() happen
here (is it an internal transition within CP_PENDING?). I guess if
the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
transition
won't take place. And I guess the IRQ is a final one.

Sorry abstraction is not a concept unknown to me. But this is too much
abstraction for me in this context. The devil is in the details, and
AFAIU we are discussing these details right now.
 

> state for I/O)
> (normal ssch)
> 
> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> (user space is supposed to retry, as we'll eventually progress from
> BUSY)
> 
> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> (user space is supposed to map this to the appropriate cc for the guest)

From this it seems you don't intend to issue the second  requested ssch()
any more (and don't want to do any translation). Is that right? (If it
is, that what I was asking for for a while, but then it's a pity for the
retries.)

> 
> IDLE --- ASYNC_REQ ---> IDLE
> (user space is welcome to do anything else right away)

Your idea is to not issue a requested hsch() if we think we are IDLE
it seems. Do I understand this right? We would end up with a different
semantic for hsch()/and csch() (compared to PoP) in the guest with this
(AFAICT).

> 
> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> (user space is supposed to retry, as above)
> 
> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> (the interrupt will get us out of CP_PENDING eventually)

Issue (c|h)sch() is an action that is done on this internal 
transition (within CP_PENDING).

Thank you very much for investing into this description of the state
machine. I'm afraid I'm acting like a not so nice person (self censored)
at the moment. I can't help myself, sorry. Maybe Farhan and Eric can take
this as a starting point and come up with something that we can integrate
into our documentation. Maybe not...

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-28 17:13                   ` [Qemu-devel] " Cornelia Huck
@ 2019-01-28 19:30                     ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-28 19:30 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, kvm, Pierre Morel, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 28 Jan 2019 18:13:55 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 17:04:04 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > Do we expect userspace/QEMU to fence the bad scenarios as tries to do
> > today, or is this supposed to change to hardware should sort out
> > requests whenever possible.
> 
> Does my other mail answer that?

Sorry, I can't find the answer in your other (Date: Mon, 28 Jan 2019
17:59:10 +0100, Message-Id: <20190128175910.5d9677e7@oc2783563651>) mail.
AFAIU that mail talks abut the kernel and not about the userspace.

I guess the answer is we don't expect changes to userspace, so we do
expect userspace to fence bad scenarios.

> 
> > The problem I see with the let the hardware sort it out is that, for
> > that to work, we need to juggle multiple translations simultaneously
> > (or am I wrong?). Doing that does not appear particularly simple to
> > me.
> 
> None in the first stage, at most two in the second stage, I guess.
> 

Expected benefit of the second stage over the first stage? (I see none.)

> > Furthermore we would go through all that hassle knowingly that the
> > sole reason is working around bugs. We still expect our Linux guests
> > serializing it's ssch() stuff as it does today. Thus I would except
> > this code not getting the love nor the coverage that would guard
> > against bugs in that code.
> 
> So, we should have test code for that? (Any IBM-internal channel I/O
> exercisers that may help?)
>

None that I'm aware of. Anyone else? 

But the point I was trying to make is the following: I prefer keeping
the handling for the case "ssch()'s on top of each other" as trivial as
possible. (E.g. bail out if CP_PENDING without doing any translation.)
 
> We should not rely on the guest being sane, although Linux probably is
> in that respect.
> 

I agree 100%: we should not rely on either guest or userspace emulator
being sane. But IMHO we should handle insanity with the least possible
investment.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 19:30                     ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-28 19:30 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Mon, 28 Jan 2019 18:13:55 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 25 Jan 2019 17:04:04 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > Do we expect userspace/QEMU to fence the bad scenarios as tries to do
> > today, or is this supposed to change to hardware should sort out
> > requests whenever possible.
> 
> Does my other mail answer that?

Sorry, I can't find the answer in your other (Date: Mon, 28 Jan 2019
17:59:10 +0100, Message-Id: <20190128175910.5d9677e7@oc2783563651>) mail.
AFAIU that mail talks abut the kernel and not about the userspace.

I guess the answer is we don't expect changes to userspace, so we do
expect userspace to fence bad scenarios.

> 
> > The problem I see with the let the hardware sort it out is that, for
> > that to work, we need to juggle multiple translations simultaneously
> > (or am I wrong?). Doing that does not appear particularly simple to
> > me.
> 
> None in the first stage, at most two in the second stage, I guess.
> 

Expected benefit of the second stage over the first stage? (I see none.)

> > Furthermore we would go through all that hassle knowingly that the
> > sole reason is working around bugs. We still expect our Linux guests
> > serializing it's ssch() stuff as it does today. Thus I would except
> > this code not getting the love nor the coverage that would guard
> > against bugs in that code.
> 
> So, we should have test code for that? (Any IBM-internal channel I/O
> exercisers that may help?)
>

None that I'm aware of. Anyone else? 

But the point I was trying to make is the following: I prefer keeping
the handling for the case "ssch()'s on top of each other" as trivial as
possible. (E.g. bail out if CP_PENDING without doing any translation.)
 
> We should not rely on the guest being sane, although Linux probably is
> in that respect.
> 

I agree 100%: we should not rely on either guest or userspace emulator
being sane. But IMHO we should handle insanity with the least possible
investment.

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-28 19:15                 ` [Qemu-devel] " Halil Pasic
@ 2019-01-28 21:48                   ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-28 21:48 UTC (permalink / raw)
  To: Halil Pasic, Cornelia Huck
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, qemu-s390x



On 01/28/2019 02:15 PM, Halil Pasic wrote:
> On Mon, 28 Jan 2019 18:09:48 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Fri, 25 Jan 2019 15:01:01 +0100
>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>>> On Fri, 25 Jan 2019 13:58:35 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>>> - The code should not be interrupted while we process the channel
>>>>    program, do the ssch etc. We want the caller to try again later (i.e.
>>>>    return -EAGAIN)
>>
>> (...)
>>
>>>> - With the async interface, we want user space to be able to submit a
>>>>    halt/clear while a start request is still in flight, but not while
>>>>    we're processing a start request with translation etc. We probably
>>>>    want to do -EAGAIN in that case.
>>>
>>> This reads very similar to your first point.
>>
>> Not quite. ssch() means that we have a cp around; for hsch()/csch() we
>> don't have such a thing. So we want to protect the process of
>> translating the cp etc., but we don't need such protection for the
>> halt/clear processing.
>>
> 
> What does this don't 'need such protection' mean in terms of code,
> moving the unlock of the io_mutex upward (in
> vfio_ccw_async_region_write())?
> 
> Here the function in question for reference:
> 
> +static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private
> *private,
> +					   const char __user *buf,
> size_t count,
> +					   loff_t *ppos)
> +{
> +	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) -
> VFIO_CCW_NUM_REGIONS;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
> +	struct ccw_cmd_region *region;
> +	int ret;
> +
> +	if (pos + count > sizeof(*region))
> +		return -EINVAL;
> +
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
> +		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;
> +
> +	region = private->region[i].data;
> +	if (copy_from_user((void *)region + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
> +
> +	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
> +
> +	ret = region->ret_code ? region->ret_code : count;
> +
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
> +}
> 
> That does not make much sense to me at the moment (so I guess I
> misunderstood again).
> 
>>>
>>>>
>>>> My idea would be:
>>>>
>>>> - The BUSY state denotes "I'm busy processing a request right now, try
>>>>    again". We hold it while processing the cp and doing the ssch and
>>>>    leave it afterwards (i.e., while the start request is processed by
>>>>    the hardware). I/O requests and async requests get -EAGAIN in that
>>>>    state.
>>>> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>>>>    (from the BUSY state). We stay in there as long as no final state for
>>>>    that request has been received and delivered. (This may be final
>>>>    interrupt for that request, a deferred cc, or successful halt/clear.)
>>>>    I/O requests get -EBUSY, async requests are processed. This state can
>>>>    be removed again once we are able to handle more than one outstanding
>>>>    cp.
>>>>
>>>> Does that make sense?
>>>>    
>>>
>>> AFAIU your idea is to split up the busy state into two states: CP_PENDING
>>> and of busy without CP_PENDING called BUSY. I like the idea of having a
>>> separate state for CP_PENDING but I don't like the new semantic of BUSY.
>>>
>>> Hm mashing a conceptual state machine and the jumptabe stuff ain't
>>> making reasoning about this simpler either. I'm taking about the
>>> conceptual state machine. It would be nice to have a picture of it and
>>> then think about how to express that in code.
>>
>> Sorry, I'm having a hard time parsing your comments. Are you looking
>> for something like the below?
> 
> I had more something like this
> https://en.wikipedia.org/wiki/UML_state_machine,
> in mind but the lists of state transitions are also useful.
> 

I think the picture Connie paints below is just as useful as any 
formalized UML diagram.

>>
>> IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final
> 
> There ain't no trigger/action list  between BUSY and CP_PENDING.

Right, because BUSY means "KVM started processing a SSCH" and CP_PENDING 
means "KVM finished processing the SSCH and issued it to the hardware, 
and got cc=0."

> I'm also in the  dark about where the issuing of the ssch() happen
> here (is it an internal transition within CP_PENDING?). 

Connie said...

 >>>> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
 >>>>    (from the BUSY state).

...and I agree with that.

I guess if
> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> transition
> won't take place. And I guess the IRQ is a final one.

Yes this is the one point I hadn't seen explicitly stated.  We shouldn't 
remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE 
when processing the failure.  In Connie's response (Mon, 28 Jan 2019 
18:24:24 +0100) to my note, she expressed some agreement to that.

> 
> Sorry abstraction is not a concept unknown to me. But this is too much
> abstraction for me in this context. The devil is in the details, and
> AFAIU we are discussing these details right now.
>   
> 
>> state for I/O)
>> (normal ssch)
>>
>> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
>> (user space is supposed to retry, as we'll eventually progress from
>> BUSY)
>>
>> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
>> (user space is supposed to map this to the appropriate cc for the guest)
> 
>  From this it seems you don't intend to issue the second  requested ssch()
> any more (and don't want to do any translation). Is that right? (If it
> is, that what I was asking for for a while, but then it's a pity for the
> retries.)
> 
>>
>> IDLE --- ASYNC_REQ ---> IDLE
>> (user space is welcome to do anything else right away)
> 
> Your idea is to not issue a requested hsch() if we think we are IDLE
> it seems. Do I understand this right? We would end up with a different
> semantic for hsch()/and csch() (compared to PoP) in the guest with this
> (AFAICT).
> 
>>
>> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
>> (user space is supposed to retry, as above)
>>
>> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
>> (the interrupt will get us out of CP_PENDING eventually)
> 
> Issue (c|h)sch() is an action that is done on this internal
> transition (within CP_PENDING).

These three do read like CSCH/HSCH are subject to the same rules as 
SSCH, when in fact they would be (among other reasons) issued to clean 
up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on 
state=BUSY (don't race ourselves with the start), but issue to hardware 
if CP_PENDING.

If we get an async request when state=IDLE, then maybe just issue it for 
fun, as if it were an SSCH?

> 
> Thank you very much for investing into this description of the state
> machine. I'm afraid I'm acting like a not so nice person (self censored)
> at the moment. I can't help myself, sorry. Maybe Farhan and Eric can take
> this as a starting point and come up with something that we can integrate
> into our documentation. Maybe not...
> 
> Regards,
> Halil
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 21:48                   ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-28 21:48 UTC (permalink / raw)
  To: Halil Pasic, Cornelia Huck
  Cc: Farhan Ali, Pierre Morel, linux-s390, kvm, qemu-devel,
	qemu-s390x, Alex Williamson



On 01/28/2019 02:15 PM, Halil Pasic wrote:
> On Mon, 28 Jan 2019 18:09:48 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> On Fri, 25 Jan 2019 15:01:01 +0100
>> Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>>> On Fri, 25 Jan 2019 13:58:35 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>>> - The code should not be interrupted while we process the channel
>>>>    program, do the ssch etc. We want the caller to try again later (i.e.
>>>>    return -EAGAIN)
>>
>> (...)
>>
>>>> - With the async interface, we want user space to be able to submit a
>>>>    halt/clear while a start request is still in flight, but not while
>>>>    we're processing a start request with translation etc. We probably
>>>>    want to do -EAGAIN in that case.
>>>
>>> This reads very similar to your first point.
>>
>> Not quite. ssch() means that we have a cp around; for hsch()/csch() we
>> don't have such a thing. So we want to protect the process of
>> translating the cp etc., but we don't need such protection for the
>> halt/clear processing.
>>
> 
> What does this don't 'need such protection' mean in terms of code,
> moving the unlock of the io_mutex upward (in
> vfio_ccw_async_region_write())?
> 
> Here the function in question for reference:
> 
> +static ssize_t vfio_ccw_async_region_write(struct vfio_ccw_private
> *private,
> +					   const char __user *buf,
> size_t count,
> +					   loff_t *ppos)
> +{
> +	unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) -
> VFIO_CCW_NUM_REGIONS;
> +	loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
> +	struct ccw_cmd_region *region;
> +	int ret;
> +
> +	if (pos + count > sizeof(*region))
> +		return -EINVAL;
> +
> +	if (private->state == VFIO_CCW_STATE_NOT_OPER ||
> +	    private->state == VFIO_CCW_STATE_STANDBY)
> +		return -EACCES;
> +	if (!mutex_trylock(&private->io_mutex))
> +		return -EAGAIN;
> +
> +	region = private->region[i].data;
> +	if (copy_from_user((void *)region + pos, buf, count)) {
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
> +
> +	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_ASYNC_REQ);
> +
> +	ret = region->ret_code ? region->ret_code : count;
> +
> +out_unlock:
> +	mutex_unlock(&private->io_mutex);
> +	return ret;
> +}
> 
> That does not make much sense to me at the moment (so I guess I
> misunderstood again).
> 
>>>
>>>>
>>>> My idea would be:
>>>>
>>>> - The BUSY state denotes "I'm busy processing a request right now, try
>>>>    again". We hold it while processing the cp and doing the ssch and
>>>>    leave it afterwards (i.e., while the start request is processed by
>>>>    the hardware). I/O requests and async requests get -EAGAIN in that
>>>>    state.
>>>> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>>>>    (from the BUSY state). We stay in there as long as no final state for
>>>>    that request has been received and delivered. (This may be final
>>>>    interrupt for that request, a deferred cc, or successful halt/clear.)
>>>>    I/O requests get -EBUSY, async requests are processed. This state can
>>>>    be removed again once we are able to handle more than one outstanding
>>>>    cp.
>>>>
>>>> Does that make sense?
>>>>    
>>>
>>> AFAIU your idea is to split up the busy state into two states: CP_PENDING
>>> and of busy without CP_PENDING called BUSY. I like the idea of having a
>>> separate state for CP_PENDING but I don't like the new semantic of BUSY.
>>>
>>> Hm mashing a conceptual state machine and the jumptabe stuff ain't
>>> making reasoning about this simpler either. I'm taking about the
>>> conceptual state machine. It would be nice to have a picture of it and
>>> then think about how to express that in code.
>>
>> Sorry, I'm having a hard time parsing your comments. Are you looking
>> for something like the below?
> 
> I had more something like this
> https://en.wikipedia.org/wiki/UML_state_machine,
> in mind but the lists of state transitions are also useful.
> 

I think the picture Connie paints below is just as useful as any 
formalized UML diagram.

>>
>> IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final
> 
> There ain't no trigger/action list  between BUSY and CP_PENDING.

Right, because BUSY means "KVM started processing a SSCH" and CP_PENDING 
means "KVM finished processing the SSCH and issued it to the hardware, 
and got cc=0."

> I'm also in the  dark about where the issuing of the ssch() happen
> here (is it an internal transition within CP_PENDING?). 

Connie said...

 >>>> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
 >>>>    (from the BUSY state).

...and I agree with that.

I guess if
> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> transition
> won't take place. And I guess the IRQ is a final one.

Yes this is the one point I hadn't seen explicitly stated.  We shouldn't 
remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE 
when processing the failure.  In Connie's response (Mon, 28 Jan 2019 
18:24:24 +0100) to my note, she expressed some agreement to that.

> 
> Sorry abstraction is not a concept unknown to me. But this is too much
> abstraction for me in this context. The devil is in the details, and
> AFAIU we are discussing these details right now.
>   
> 
>> state for I/O)
>> (normal ssch)
>>
>> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
>> (user space is supposed to retry, as we'll eventually progress from
>> BUSY)
>>
>> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
>> (user space is supposed to map this to the appropriate cc for the guest)
> 
>  From this it seems you don't intend to issue the second  requested ssch()
> any more (and don't want to do any translation). Is that right? (If it
> is, that what I was asking for for a while, but then it's a pity for the
> retries.)
> 
>>
>> IDLE --- ASYNC_REQ ---> IDLE
>> (user space is welcome to do anything else right away)
> 
> Your idea is to not issue a requested hsch() if we think we are IDLE
> it seems. Do I understand this right? We would end up with a different
> semantic for hsch()/and csch() (compared to PoP) in the guest with this
> (AFAICT).
> 
>>
>> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
>> (user space is supposed to retry, as above)
>>
>> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
>> (the interrupt will get us out of CP_PENDING eventually)
> 
> Issue (c|h)sch() is an action that is done on this internal
> transition (within CP_PENDING).

These three do read like CSCH/HSCH are subject to the same rules as 
SSCH, when in fact they would be (among other reasons) issued to clean 
up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on 
state=BUSY (don't race ourselves with the start), but issue to hardware 
if CP_PENDING.

If we get an async request when state=IDLE, then maybe just issue it for 
fun, as if it were an SSCH?

> 
> Thank you very much for investing into this description of the state
> machine. I'm afraid I'm acting like a not so nice person (self censored)
> at the moment. I can't help myself, sorry. Maybe Farhan and Eric can take
> this as a starting point and come up with something that we can integrate
> into our documentation. Maybe not...
> 
> Regards,
> Halil
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-28 17:24               ` [Qemu-devel] " Cornelia Huck
@ 2019-01-28 21:50                 ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-28 21:50 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x



On 01/28/2019 12:24 PM, Cornelia Huck wrote:
> On Fri, 25 Jan 2019 10:57:38 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 01/25/2019 07:58 AM, Cornelia Huck wrote:
>>> On Fri, 25 Jan 2019 11:24:37 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>    
>>>> On Thu, 24 Jan 2019 21:37:44 -0500
>>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>>   
>>>>> On 01/24/2019 09:25 PM, Eric Farman wrote:
>>>>>>
>>>>>>
>>>>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>>>>   
>>>>>> [1] I think these changes are cool.  We end up going into (and staying
>>>>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
>>>>>> bumble along.
>>>>>>
>>>>>> But why can't these be separated out from this patch?  It does change
>>>>>> the behavior of the state machine, and seem distinct from the addition
>>>>>> of the mutex you otherwise add here?  At the very least, this behavior
>>>>>> change should be documented in the commit since it's otherwise lost in
>>>>>> the mutex/EAGAIN stuff.
>>>>
>>>> That's a very good idea. I'll factor them out into a separate patch.
>>>
>>> And now that I've factored it out, I noticed some more problems.
>>
>> That's good!  Maybe it helps us with the circles we're on :)
> 
> :)
> 
>>
>>>
>>> What we basically need is the following, I think:
>>>
>>> - The code should not be interrupted while we process the channel
>>>     program, do the ssch etc. We want the caller to try again later (i.e.
>>>     return -EAGAIN)
>>> - We currently do not want the user space to submit another channel
>>>     program while the first one is still in flight.
>>
>> These two seem to contradict one another.  I think you're saying is that
>> we don't _want_ userspace to issue another channel program, even though
>> its _allowed_ to as far as vfio-ccw is concerned.
> 
> What I'm trying to say is that we want to distinguish two things:
> - The code is currently doing translation etc. We probably want to keep
>    that atomic, in order not to make things too complicated.
> - We have sent the ssch() to the hardware, but have not yet received
>    the final interrupt for that request (that's what I meant with "in
>    flight"). It's easier for the first shot to disallow a second ssch()
>    as that would need handling of more than one cp request, but we may
>    want to allow it in the future.
>    A hsch()/csch() (which does not generate a new cp) should be fine.
> 
> (see also my reply to Halil's mail)
> 
>>
>> As submitting another
>>>     one is a valid request, however, we should allow this in the future
>>>     (once we have the code to handle that in place).
>>> - With the async interface, we want user space to be able to submit a
>>>     halt/clear while a start request is still in flight, but not while
>>>     we're processing a start request with translation etc. We probably
>>>     want to do -EAGAIN in that case.
>>>
>>> My idea would be:
>>>
>>> - The BUSY state denotes "I'm busy processing a request right now, try
>>>     again". We hold it while processing the cp and doing the ssch and
>>>     leave it afterwards (i.e., while the start request is processed by
>>>     the hardware). I/O requests and async requests get -EAGAIN in that
>>>     state.
>>> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>>>     (from the BUSY state). We stay in there as long as no final state for
>>>     that request has been received and delivered. (This may be final
>>>     interrupt for that request, a deferred cc, or successful halt/clear.)
>>>     I/O requests get -EBUSY
>>
>> I liked CP_PENDING, since it corresponds to the subchannel being marked
>> "start pending" as described in POPS, but this statement suggests that
>> the BUSY/PENDING state to be swapped, such that state=PENDING returns
>> -EAGAIN and state=BUSY returns -EBUSY.  Not super-concerned with the
>> terminology though.
> 
> What about s/BUSY/CP_PROCESSING/ ?

So we go IDLE -> CP_PROCESSING -> CP_PENDING -> (IRQ) -> IDLE right? 
Seems good to me.

> 
>>
>> , async requests are processed. This state can
>>>     be removed again once we are able to handle more than one outstanding
>>>     cp.
>>>
>>> Does that make sense?
>>>    
>>
>> I think so, and I think I like it.  So you want to distinguish between
>> (I have swapped BUSY/PENDING in this example per my above comment):
>>
>> A) SSCH issued by userspace (IDLE->PENDING)
>> B) SSCH issued (successfully) by kernel (PENDING->BUSY)
>> B') SSCH issued (unsuccessfully) by kernel (PENDING->IDLE?)
> 
> I think so.
> 
>> C) Interrupt received by kernel (no change?)
>> D) Interrupt given to userspace (BUSY->IDLE)
> 
> Only if that is the final interrupt for that cp.

Agreed.

> 
>>
>> If we receive A and A, the second A gets EAGAIN
>>
>> If we do A+B and A, the second A gets EBUSY (unless async, which is
>> processed)
> 
> Nod.
> 
>> Does the boundary of "in flight" in the interrupt side (C and D) need to
>> be defined, such that we go BUSY->PENDING->IDLE instead of BUSY->IDLE ?
> 
> I don't think we can go BUSY->PENDING (in your terminology), at that
> would imply a retry of the ssch()?
> 

I didn't think so, but figured it's worth asking while we're already 
confused.  :)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-28 21:50                 ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-28 21:50 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson



On 01/28/2019 12:24 PM, Cornelia Huck wrote:
> On Fri, 25 Jan 2019 10:57:38 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 01/25/2019 07:58 AM, Cornelia Huck wrote:
>>> On Fri, 25 Jan 2019 11:24:37 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>    
>>>> On Thu, 24 Jan 2019 21:37:44 -0500
>>>> Eric Farman <farman@linux.ibm.com> wrote:
>>>>   
>>>>> On 01/24/2019 09:25 PM, Eric Farman wrote:
>>>>>>
>>>>>>
>>>>>> On 01/21/2019 06:03 AM, Cornelia Huck wrote:
>>>>   
>>>>>> [1] I think these changes are cool.  We end up going into (and staying
>>>>>> in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we
>>>>>> bumble along.
>>>>>>
>>>>>> But why can't these be separated out from this patch?  It does change
>>>>>> the behavior of the state machine, and seem distinct from the addition
>>>>>> of the mutex you otherwise add here?  At the very least, this behavior
>>>>>> change should be documented in the commit since it's otherwise lost in
>>>>>> the mutex/EAGAIN stuff.
>>>>
>>>> That's a very good idea. I'll factor them out into a separate patch.
>>>
>>> And now that I've factored it out, I noticed some more problems.
>>
>> That's good!  Maybe it helps us with the circles we're on :)
> 
> :)
> 
>>
>>>
>>> What we basically need is the following, I think:
>>>
>>> - The code should not be interrupted while we process the channel
>>>     program, do the ssch etc. We want the caller to try again later (i.e.
>>>     return -EAGAIN)
>>> - We currently do not want the user space to submit another channel
>>>     program while the first one is still in flight.
>>
>> These two seem to contradict one another.  I think you're saying is that
>> we don't _want_ userspace to issue another channel program, even though
>> its _allowed_ to as far as vfio-ccw is concerned.
> 
> What I'm trying to say is that we want to distinguish two things:
> - The code is currently doing translation etc. We probably want to keep
>    that atomic, in order not to make things too complicated.
> - We have sent the ssch() to the hardware, but have not yet received
>    the final interrupt for that request (that's what I meant with "in
>    flight"). It's easier for the first shot to disallow a second ssch()
>    as that would need handling of more than one cp request, but we may
>    want to allow it in the future.
>    A hsch()/csch() (which does not generate a new cp) should be fine.
> 
> (see also my reply to Halil's mail)
> 
>>
>> As submitting another
>>>     one is a valid request, however, we should allow this in the future
>>>     (once we have the code to handle that in place).
>>> - With the async interface, we want user space to be able to submit a
>>>     halt/clear while a start request is still in flight, but not while
>>>     we're processing a start request with translation etc. We probably
>>>     want to do -EAGAIN in that case.
>>>
>>> My idea would be:
>>>
>>> - The BUSY state denotes "I'm busy processing a request right now, try
>>>     again". We hold it while processing the cp and doing the ssch and
>>>     leave it afterwards (i.e., while the start request is processed by
>>>     the hardware). I/O requests and async requests get -EAGAIN in that
>>>     state.
>>> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>>>     (from the BUSY state). We stay in there as long as no final state for
>>>     that request has been received and delivered. (This may be final
>>>     interrupt for that request, a deferred cc, or successful halt/clear.)
>>>     I/O requests get -EBUSY
>>
>> I liked CP_PENDING, since it corresponds to the subchannel being marked
>> "start pending" as described in POPS, but this statement suggests that
>> the BUSY/PENDING state to be swapped, such that state=PENDING returns
>> -EAGAIN and state=BUSY returns -EBUSY.  Not super-concerned with the
>> terminology though.
> 
> What about s/BUSY/CP_PROCESSING/ ?

So we go IDLE -> CP_PROCESSING -> CP_PENDING -> (IRQ) -> IDLE right? 
Seems good to me.

> 
>>
>> , async requests are processed. This state can
>>>     be removed again once we are able to handle more than one outstanding
>>>     cp.
>>>
>>> Does that make sense?
>>>    
>>
>> I think so, and I think I like it.  So you want to distinguish between
>> (I have swapped BUSY/PENDING in this example per my above comment):
>>
>> A) SSCH issued by userspace (IDLE->PENDING)
>> B) SSCH issued (successfully) by kernel (PENDING->BUSY)
>> B') SSCH issued (unsuccessfully) by kernel (PENDING->IDLE?)
> 
> I think so.
> 
>> C) Interrupt received by kernel (no change?)
>> D) Interrupt given to userspace (BUSY->IDLE)
> 
> Only if that is the final interrupt for that cp.

Agreed.

> 
>>
>> If we receive A and A, the second A gets EAGAIN
>>
>> If we do A+B and A, the second A gets EBUSY (unless async, which is
>> processed)
> 
> Nod.
> 
>> Does the boundary of "in flight" in the interrupt side (C and D) need to
>> be defined, such that we go BUSY->PENDING->IDLE instead of BUSY->IDLE ?
> 
> I don't think we can go BUSY->PENDING (in your terminology), at that
> would imply a retry of the ssch()?
> 

I didn't think so, but figured it's worth asking while we're already 
confused.  :)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-28 19:30                     ` [Qemu-devel] " Halil Pasic
@ 2019-01-29  9:58                       ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29  9:58 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, kvm, Pierre Morel, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Mon, 28 Jan 2019 20:30:00 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 28 Jan 2019 18:13:55 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Fri, 25 Jan 2019 17:04:04 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > Do we expect userspace/QEMU to fence the bad scenarios as tries to do
> > > today, or is this supposed to change to hardware should sort out
> > > requests whenever possible.  
> > 
> > Does my other mail answer that?  
> 
> Sorry, I can't find the answer in your other (Date: Mon, 28 Jan 2019
> 17:59:10 +0100, Message-Id: <20190128175910.5d9677e7@oc2783563651>) mail.
> AFAIU that mail talks abut the kernel and not about the userspace.
> 
> I guess the answer is we don't expect changes to userspace, so we do
> expect userspace to fence bad scenarios.

Then, I really have no idea what you are aiming at with your comment :(

> 
> >   
> > > The problem I see with the let the hardware sort it out is that, for
> > > that to work, we need to juggle multiple translations simultaneously
> > > (or am I wrong?). Doing that does not appear particularly simple to
> > > me.  
> > 
> > None in the first stage, at most two in the second stage, I guess.
> >   
> 
> Expected benefit of the second stage over the first stage? (I see none.)

Making something possible that is allowed by the architecture. Not
really important, though.

> 
> > > Furthermore we would go through all that hassle knowingly that the
> > > sole reason is working around bugs. We still expect our Linux guests
> > > serializing it's ssch() stuff as it does today. Thus I would except
> > > this code not getting the love nor the coverage that would guard
> > > against bugs in that code.  
> > 
> > So, we should have test code for that? (Any IBM-internal channel I/O
> > exercisers that may help?)
> >  
> 
> None that I'm aware of. Anyone else? 
> 
> But the point I was trying to make is the following: I prefer keeping
> the handling for the case "ssch()'s on top of each other" as trivial as
> possible. (E.g. bail out if CP_PENDING without doing any translation.)
>  
> > We should not rely on the guest being sane, although Linux probably is
> > in that respect.
> >   
> 
> I agree 100%: we should not rely on either guest or userspace emulator
> being sane. But IMHO we should handle insanity with the least possible
> investment.

We probably disagree what the least possible investment is.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-29  9:58                       ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29  9:58 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Mon, 28 Jan 2019 20:30:00 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 28 Jan 2019 18:13:55 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Fri, 25 Jan 2019 17:04:04 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > Do we expect userspace/QEMU to fence the bad scenarios as tries to do
> > > today, or is this supposed to change to hardware should sort out
> > > requests whenever possible.  
> > 
> > Does my other mail answer that?  
> 
> Sorry, I can't find the answer in your other (Date: Mon, 28 Jan 2019
> 17:59:10 +0100, Message-Id: <20190128175910.5d9677e7@oc2783563651>) mail.
> AFAIU that mail talks abut the kernel and not about the userspace.
> 
> I guess the answer is we don't expect changes to userspace, so we do
> expect userspace to fence bad scenarios.

Then, I really have no idea what you are aiming at with your comment :(

> 
> >   
> > > The problem I see with the let the hardware sort it out is that, for
> > > that to work, we need to juggle multiple translations simultaneously
> > > (or am I wrong?). Doing that does not appear particularly simple to
> > > me.  
> > 
> > None in the first stage, at most two in the second stage, I guess.
> >   
> 
> Expected benefit of the second stage over the first stage? (I see none.)

Making something possible that is allowed by the architecture. Not
really important, though.

> 
> > > Furthermore we would go through all that hassle knowingly that the
> > > sole reason is working around bugs. We still expect our Linux guests
> > > serializing it's ssch() stuff as it does today. Thus I would except
> > > this code not getting the love nor the coverage that would guard
> > > against bugs in that code.  
> > 
> > So, we should have test code for that? (Any IBM-internal channel I/O
> > exercisers that may help?)
> >  
> 
> None that I'm aware of. Anyone else? 
> 
> But the point I was trying to make is the following: I prefer keeping
> the handling for the case "ssch()'s on top of each other" as trivial as
> possible. (E.g. bail out if CP_PENDING without doing any translation.)
>  
> > We should not rely on the guest being sane, although Linux probably is
> > in that respect.
> >   
> 
> I agree 100%: we should not rely on either guest or userspace emulator
> being sane. But IMHO we should handle insanity with the least possible
> investment.

We probably disagree what the least possible investment is.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-28 19:15                 ` [Qemu-devel] " Halil Pasic
@ 2019-01-29 10:10                   ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29 10:10 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Mon, 28 Jan 2019 20:15:48 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 28 Jan 2019 18:09:48 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Fri, 25 Jan 2019 15:01:01 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Fri, 25 Jan 2019 13:58:35 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:  
> >   
> > > > - The code should not be interrupted while we process the channel
> > > >   program, do the ssch etc. We want the caller to try again later (i.e.
> > > >   return -EAGAIN)    
> > 
> > (...)
> >   
> > > > - With the async interface, we want user space to be able to submit a
> > > >   halt/clear while a start request is still in flight, but not while
> > > >   we're processing a start request with translation etc. We probably
> > > >   want to do -EAGAIN in that case.    
> > > 
> > > This reads very similar to your first point.  
> > 
> > Not quite. ssch() means that we have a cp around; for hsch()/csch() we
> > don't have such a thing. So we want to protect the process of
> > translating the cp etc., but we don't need such protection for the
> > halt/clear processing.
> >   
> 
> What does this don't 'need such protection' mean in terms of code,
> moving the unlock of the io_mutex upward (in
> vfio_ccw_async_region_write())?

We don't have a cp that we need to process, so we don't need protection
for that.

> > 
> > IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final  
> 
> There ain't no trigger/action list  between BUSY and CP_PENDING.
> I'm also in the  dark about where the issuing of the ssch() happen
> here (is it an internal transition within CP_PENDING?). I guess if
> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> transition
> won't take place. And I guess the IRQ is a final one.

Please refer to the original ideas. This is obviously not supposed to
be a complete description of every case we might encounter.

> > state for I/O)
> > (normal ssch)
> > 
> > BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> > (user space is supposed to retry, as we'll eventually progress from
> > BUSY)
> > 
> > CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> > (user space is supposed to map this to the appropriate cc for the guest)  
> 
> From this it seems you don't intend to issue the second  requested ssch()
> any more (and don't want to do any translation). Is that right? (If it
> is, that what I was asking for for a while, but then it's a pity for the
> retries.)

Which "second requested ssch"? In the first case, user space is
supposed to retry; in the second case, it should map it to a cc (and
the guest does whatever it does on busy conditions). We can't issue a
ssch if we're not able to handle multiple cps.

> 
> > 
> > IDLE --- ASYNC_REQ ---> IDLE
> > (user space is welcome to do anything else right away)  
> 
> Your idea is to not issue a requested hsch() if we think we are IDLE
> it seems. Do I understand this right? We would end up with a different
> semantic for hsch()/and csch() (compared to PoP) in the guest with this
> (AFAICT).

Nope, we're doing hsch/csch. We're just not moving out of IDLE, as we
(a) don't have any cp processing we need to protect and (b) no need to
fence of multiple attempts of hsch/csch.

> 
> > 
> > BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> > (user space is supposed to retry, as above)
> > 
> > CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> > (the interrupt will get us out of CP_PENDING eventually)  
> 
> Issue (c|h)sch() is an action that is done on this internal 
> transition (within CP_PENDING).

Yes. hsch/csch do not trigger a state change (other than possibly
dropping into NOT_OPER for cc 3).

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-29 10:10                   ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29 10:10 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Eric Farman, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Mon, 28 Jan 2019 20:15:48 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 28 Jan 2019 18:09:48 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Fri, 25 Jan 2019 15:01:01 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Fri, 25 Jan 2019 13:58:35 +0100
> > > Cornelia Huck <cohuck@redhat.com> wrote:  
> >   
> > > > - The code should not be interrupted while we process the channel
> > > >   program, do the ssch etc. We want the caller to try again later (i.e.
> > > >   return -EAGAIN)    
> > 
> > (...)
> >   
> > > > - With the async interface, we want user space to be able to submit a
> > > >   halt/clear while a start request is still in flight, but not while
> > > >   we're processing a start request with translation etc. We probably
> > > >   want to do -EAGAIN in that case.    
> > > 
> > > This reads very similar to your first point.  
> > 
> > Not quite. ssch() means that we have a cp around; for hsch()/csch() we
> > don't have such a thing. So we want to protect the process of
> > translating the cp etc., but we don't need such protection for the
> > halt/clear processing.
> >   
> 
> What does this don't 'need such protection' mean in terms of code,
> moving the unlock of the io_mutex upward (in
> vfio_ccw_async_region_write())?

We don't have a cp that we need to process, so we don't need protection
for that.

> > 
> > IDLE --- IO_REQ --> BUSY ---> CP_PENDING --- IRQ ---> IDLE (if final  
> 
> There ain't no trigger/action list  between BUSY and CP_PENDING.
> I'm also in the  dark about where the issuing of the ssch() happen
> here (is it an internal transition within CP_PENDING?). I guess if
> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> transition
> won't take place. And I guess the IRQ is a final one.

Please refer to the original ideas. This is obviously not supposed to
be a complete description of every case we might encounter.

> > state for I/O)
> > (normal ssch)
> > 
> > BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> > (user space is supposed to retry, as we'll eventually progress from
> > BUSY)
> > 
> > CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> > (user space is supposed to map this to the appropriate cc for the guest)  
> 
> From this it seems you don't intend to issue the second  requested ssch()
> any more (and don't want to do any translation). Is that right? (If it
> is, that what I was asking for for a while, but then it's a pity for the
> retries.)

Which "second requested ssch"? In the first case, user space is
supposed to retry; in the second case, it should map it to a cc (and
the guest does whatever it does on busy conditions). We can't issue a
ssch if we're not able to handle multiple cps.

> 
> > 
> > IDLE --- ASYNC_REQ ---> IDLE
> > (user space is welcome to do anything else right away)  
> 
> Your idea is to not issue a requested hsch() if we think we are IDLE
> it seems. Do I understand this right? We would end up with a different
> semantic for hsch()/and csch() (compared to PoP) in the guest with this
> (AFAICT).

Nope, we're doing hsch/csch. We're just not moving out of IDLE, as we
(a) don't have any cp processing we need to protect and (b) no need to
fence of multiple attempts of hsch/csch.

> 
> > 
> > BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> > (user space is supposed to retry, as above)
> > 
> > CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> > (the interrupt will get us out of CP_PENDING eventually)  
> 
> Issue (c|h)sch() is an action that is done on this internal 
> transition (within CP_PENDING).

Yes. hsch/csch do not trigger a state change (other than possibly
dropping into NOT_OPER for cc 3).

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-28 21:48                   ` [Qemu-devel] " Eric Farman
@ 2019-01-29 10:20                     ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29 10:20 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Mon, 28 Jan 2019 16:48:10 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/28/2019 02:15 PM, Halil Pasic wrote:
> > On Mon, 28 Jan 2019 18:09:48 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:

> I guess if
> > the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> > transition
> > won't take place. And I guess the IRQ is a final one.  
> 
> Yes this is the one point I hadn't seen explicitly stated.  We shouldn't 
> remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE 
> when processing the failure.  In Connie's response (Mon, 28 Jan 2019 
> 18:24:24 +0100) to my note, she expressed some agreement to that.

Yes, I think that's what should happen.


> >> state for I/O)
> >> (normal ssch)
> >>
> >> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> >> (user space is supposed to retry, as we'll eventually progress from
> >> BUSY)
> >>
> >> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> >> (user space is supposed to map this to the appropriate cc for the guest)  
> > 
> >  From this it seems you don't intend to issue the second  requested ssch()
> > any more (and don't want to do any translation). Is that right? (If it
> > is, that what I was asking for for a while, but then it's a pity for the
> > retries.)
> >   
> >>
> >> IDLE --- ASYNC_REQ ---> IDLE
> >> (user space is welcome to do anything else right away)  
> > 
> > Your idea is to not issue a requested hsch() if we think we are IDLE
> > it seems. Do I understand this right? We would end up with a different
> > semantic for hsch()/and csch() (compared to PoP) in the guest with this
> > (AFAICT).
> >   
> >>
> >> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> >> (user space is supposed to retry, as above)
> >>
> >> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> >> (the interrupt will get us out of CP_PENDING eventually)  
> > 
> > Issue (c|h)sch() is an action that is done on this internal
> > transition (within CP_PENDING).  
> 
> These three do read like CSCH/HSCH are subject to the same rules as 
> SSCH, when in fact they would be (among other reasons) issued to clean 
> up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on 
> state=BUSY (don't race ourselves with the start), but issue to hardware 
> if CP_PENDING.

I think there are some devices which require a certain hsch/csch
sequence during device bringup, so it's not just cleaning up after a
ssch. Therefore, we should always try to do the requested hsch/csch,
unless things like "we're in the process of translating a cp, and can't
deal with another request right now" prevent it.

> 
> If we get an async request when state=IDLE, then maybe just issue it for 
> fun, as if it were an SSCH?

For fun, but mainly because the guest wants it :)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-29 10:20                     ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29 10:20 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Mon, 28 Jan 2019 16:48:10 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/28/2019 02:15 PM, Halil Pasic wrote:
> > On Mon, 28 Jan 2019 18:09:48 +0100
> > Cornelia Huck <cohuck@redhat.com> wrote:

> I guess if
> > the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> > transition
> > won't take place. And I guess the IRQ is a final one.  
> 
> Yes this is the one point I hadn't seen explicitly stated.  We shouldn't 
> remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE 
> when processing the failure.  In Connie's response (Mon, 28 Jan 2019 
> 18:24:24 +0100) to my note, she expressed some agreement to that.

Yes, I think that's what should happen.


> >> state for I/O)
> >> (normal ssch)
> >>
> >> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> >> (user space is supposed to retry, as we'll eventually progress from
> >> BUSY)
> >>
> >> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> >> (user space is supposed to map this to the appropriate cc for the guest)  
> > 
> >  From this it seems you don't intend to issue the second  requested ssch()
> > any more (and don't want to do any translation). Is that right? (If it
> > is, that what I was asking for for a while, but then it's a pity for the
> > retries.)
> >   
> >>
> >> IDLE --- ASYNC_REQ ---> IDLE
> >> (user space is welcome to do anything else right away)  
> > 
> > Your idea is to not issue a requested hsch() if we think we are IDLE
> > it seems. Do I understand this right? We would end up with a different
> > semantic for hsch()/and csch() (compared to PoP) in the guest with this
> > (AFAICT).
> >   
> >>
> >> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> >> (user space is supposed to retry, as above)
> >>
> >> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> >> (the interrupt will get us out of CP_PENDING eventually)  
> > 
> > Issue (c|h)sch() is an action that is done on this internal
> > transition (within CP_PENDING).  
> 
> These three do read like CSCH/HSCH are subject to the same rules as 
> SSCH, when in fact they would be (among other reasons) issued to clean 
> up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on 
> state=BUSY (don't race ourselves with the start), but issue to hardware 
> if CP_PENDING.

I think there are some devices which require a certain hsch/csch
sequence during device bringup, so it's not just cleaning up after a
ssch. Therefore, we should always try to do the requested hsch/csch,
unless things like "we're in the process of translating a cp, and can't
deal with another request right now" prevent it.

> 
> If we get an async request when state=IDLE, then maybe just issue it for 
> fun, as if it were an SSCH?

For fun, but mainly because the guest wants it :)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-29 10:20                     ` [Qemu-devel] " Cornelia Huck
@ 2019-01-29 14:14                       ` Eric Farman
  -1 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-29 14:14 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x



On 01/29/2019 05:20 AM, Cornelia Huck wrote:
> On Mon, 28 Jan 2019 16:48:10 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 01/28/2019 02:15 PM, Halil Pasic wrote:
>>> On Mon, 28 Jan 2019 18:09:48 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> I guess if
>>> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
>>> transition
>>> won't take place. And I guess the IRQ is a final one.
>>
>> Yes this is the one point I hadn't seen explicitly stated.  We shouldn't
>> remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE
>> when processing the failure.  In Connie's response (Mon, 28 Jan 2019
>> 18:24:24 +0100) to my note, she expressed some agreement to that.
> 
> Yes, I think that's what should happen.
> 
> 
>>>> state for I/O)
>>>> (normal ssch)
>>>>
>>>> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
>>>> (user space is supposed to retry, as we'll eventually progress from
>>>> BUSY)
>>>>
>>>> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
>>>> (user space is supposed to map this to the appropriate cc for the guest)
>>>
>>>   From this it seems you don't intend to issue the second  requested ssch()
>>> any more (and don't want to do any translation). Is that right? (If it
>>> is, that what I was asking for for a while, but then it's a pity for the
>>> retries.)
>>>    
>>>>
>>>> IDLE --- ASYNC_REQ ---> IDLE
>>>> (user space is welcome to do anything else right away)
>>>
>>> Your idea is to not issue a requested hsch() if we think we are IDLE
>>> it seems. Do I understand this right? We would end up with a different
>>> semantic for hsch()/and csch() (compared to PoP) in the guest with this
>>> (AFAICT).
>>>    
>>>>
>>>> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
>>>> (user space is supposed to retry, as above)
>>>>
>>>> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
>>>> (the interrupt will get us out of CP_PENDING eventually)
>>>
>>> Issue (c|h)sch() is an action that is done on this internal
>>> transition (within CP_PENDING).
>>
>> These three do read like CSCH/HSCH are subject to the same rules as
>> SSCH, when in fact they would be (among other reasons) issued to clean
>> up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on
>> state=BUSY (don't race ourselves with the start), but issue to hardware
>> if CP_PENDING.
> 
> I think there are some devices which require a certain hsch/csch
> sequence during device bringup, so it's not just cleaning up after a
> ssch. 

Ah, yes.

Therefore, we should always try to do the requested hsch/csch,
> unless things like "we're in the process of translating a cp, and can't
> deal with another request right now" prevent it.

Agreed.  I'm in support of all of this.

> 
>>
>> If we get an async request when state=IDLE, then maybe just issue it for
>> fun, as if it were an SSCH?
> 
> For fun, but mainly because the guest wants it :)
> 

Well, that too.  ;-)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-29 14:14                       ` Eric Farman
  0 siblings, 0 replies; 134+ messages in thread
From: Eric Farman @ 2019-01-29 14:14 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson



On 01/29/2019 05:20 AM, Cornelia Huck wrote:
> On Mon, 28 Jan 2019 16:48:10 -0500
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 01/28/2019 02:15 PM, Halil Pasic wrote:
>>> On Mon, 28 Jan 2019 18:09:48 +0100
>>> Cornelia Huck <cohuck@redhat.com> wrote:
> 
>> I guess if
>>> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
>>> transition
>>> won't take place. And I guess the IRQ is a final one.
>>
>> Yes this is the one point I hadn't seen explicitly stated.  We shouldn't
>> remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE
>> when processing the failure.  In Connie's response (Mon, 28 Jan 2019
>> 18:24:24 +0100) to my note, she expressed some agreement to that.
> 
> Yes, I think that's what should happen.
> 
> 
>>>> state for I/O)
>>>> (normal ssch)
>>>>
>>>> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
>>>> (user space is supposed to retry, as we'll eventually progress from
>>>> BUSY)
>>>>
>>>> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
>>>> (user space is supposed to map this to the appropriate cc for the guest)
>>>
>>>   From this it seems you don't intend to issue the second  requested ssch()
>>> any more (and don't want to do any translation). Is that right? (If it
>>> is, that what I was asking for for a while, but then it's a pity for the
>>> retries.)
>>>    
>>>>
>>>> IDLE --- ASYNC_REQ ---> IDLE
>>>> (user space is welcome to do anything else right away)
>>>
>>> Your idea is to not issue a requested hsch() if we think we are IDLE
>>> it seems. Do I understand this right? We would end up with a different
>>> semantic for hsch()/and csch() (compared to PoP) in the guest with this
>>> (AFAICT).
>>>    
>>>>
>>>> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
>>>> (user space is supposed to retry, as above)
>>>>
>>>> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
>>>> (the interrupt will get us out of CP_PENDING eventually)
>>>
>>> Issue (c|h)sch() is an action that is done on this internal
>>> transition (within CP_PENDING).
>>
>> These three do read like CSCH/HSCH are subject to the same rules as
>> SSCH, when in fact they would be (among other reasons) issued to clean
>> up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on
>> state=BUSY (don't race ourselves with the start), but issue to hardware
>> if CP_PENDING.
> 
> I think there are some devices which require a certain hsch/csch
> sequence during device bringup, so it's not just cleaning up after a
> ssch. 

Ah, yes.

Therefore, we should always try to do the requested hsch/csch,
> unless things like "we're in the process of translating a cp, and can't
> deal with another request right now" prevent it.

Agreed.  I'm in support of all of this.

> 
>>
>> If we get an async request when state=IDLE, then maybe just issue it for
>> fun, as if it were an SSCH?
> 
> For fun, but mainly because the guest wants it :)
> 

Well, that too.  ;-)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-29 14:14                       ` [Qemu-devel] " Eric Farman
@ 2019-01-29 18:53                         ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29 18:53 UTC (permalink / raw)
  To: Eric Farman
  Cc: linux-s390, Alex Williamson, Pierre Morel, kvm, Farhan Ali,
	qemu-devel, Halil Pasic, qemu-s390x

On Tue, 29 Jan 2019 09:14:40 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/29/2019 05:20 AM, Cornelia Huck wrote:
> > On Mon, 28 Jan 2019 16:48:10 -0500
> > Eric Farman <farman@linux.ibm.com> wrote:
> >   
> >> On 01/28/2019 02:15 PM, Halil Pasic wrote:  
> >>> On Mon, 28 Jan 2019 18:09:48 +0100
> >>> Cornelia Huck <cohuck@redhat.com> wrote:  
> >   
> >> I guess if  
> >>> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> >>> transition
> >>> won't take place. And I guess the IRQ is a final one.  
> >>
> >> Yes this is the one point I hadn't seen explicitly stated.  We shouldn't
> >> remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE
> >> when processing the failure.  In Connie's response (Mon, 28 Jan 2019
> >> 18:24:24 +0100) to my note, she expressed some agreement to that.  
> > 
> > Yes, I think that's what should happen.
> > 
> >   
> >>>> state for I/O)
> >>>> (normal ssch)
> >>>>
> >>>> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> >>>> (user space is supposed to retry, as we'll eventually progress from
> >>>> BUSY)
> >>>>
> >>>> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> >>>> (user space is supposed to map this to the appropriate cc for the guest)  
> >>>
> >>>   From this it seems you don't intend to issue the second  requested ssch()
> >>> any more (and don't want to do any translation). Is that right? (If it
> >>> is, that what I was asking for for a while, but then it's a pity for the
> >>> retries.)
> >>>      
> >>>>
> >>>> IDLE --- ASYNC_REQ ---> IDLE
> >>>> (user space is welcome to do anything else right away)  
> >>>
> >>> Your idea is to not issue a requested hsch() if we think we are IDLE
> >>> it seems. Do I understand this right? We would end up with a different
> >>> semantic for hsch()/and csch() (compared to PoP) in the guest with this
> >>> (AFAICT).
> >>>      
> >>>>
> >>>> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> >>>> (user space is supposed to retry, as above)
> >>>>
> >>>> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> >>>> (the interrupt will get us out of CP_PENDING eventually)  
> >>>
> >>> Issue (c|h)sch() is an action that is done on this internal
> >>> transition (within CP_PENDING).  
> >>
> >> These three do read like CSCH/HSCH are subject to the same rules as
> >> SSCH, when in fact they would be (among other reasons) issued to clean
> >> up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on
> >> state=BUSY (don't race ourselves with the start), but issue to hardware
> >> if CP_PENDING.  
> > 
> > I think there are some devices which require a certain hsch/csch
> > sequence during device bringup, so it's not just cleaning up after a
> > ssch.   
> 
> Ah, yes.
> 
> Therefore, we should always try to do the requested hsch/csch,
> > unless things like "we're in the process of translating a cp, and can't
> > deal with another request right now" prevent it.  
> 
> Agreed.  I'm in support of all of this.

Cool. In the meantime, I've coded the changes, and I think the result
looks reasonable. I'll give it some testing and then send it out; it's
probably easier to discuss it with some code in front of us.

[The QEMU part should not need any changes.]

> 
> >   
> >>
> >> If we get an async request when state=IDLE, then maybe just issue it for
> >> fun, as if it were an SSCH?  
> > 
> > For fun, but mainly because the guest wants it :)
> >   
> 
> Well, that too.  ;-)
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-29 18:53                         ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-29 18:53 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Farhan Ali, Pierre Morel, linux-s390, kvm,
	qemu-devel, qemu-s390x, Alex Williamson

On Tue, 29 Jan 2019 09:14:40 -0500
Eric Farman <farman@linux.ibm.com> wrote:

> On 01/29/2019 05:20 AM, Cornelia Huck wrote:
> > On Mon, 28 Jan 2019 16:48:10 -0500
> > Eric Farman <farman@linux.ibm.com> wrote:
> >   
> >> On 01/28/2019 02:15 PM, Halil Pasic wrote:  
> >>> On Mon, 28 Jan 2019 18:09:48 +0100
> >>> Cornelia Huck <cohuck@redhat.com> wrote:  
> >   
> >> I guess if  
> >>> the ssch() returns with non cc == 0 the CP_PENDING ---IRQ---> IDLE
> >>> transition
> >>> won't take place. And I guess the IRQ is a final one.  
> >>
> >> Yes this is the one point I hadn't seen explicitly stated.  We shouldn't
> >> remain in state=BUSY if the ssch got cc!=0, and probably return to IDLE
> >> when processing the failure.  In Connie's response (Mon, 28 Jan 2019
> >> 18:24:24 +0100) to my note, she expressed some agreement to that.  
> > 
> > Yes, I think that's what should happen.
> > 
> >   
> >>>> state for I/O)
> >>>> (normal ssch)
> >>>>
> >>>> BUSY --- IO_REQ ---> return -EAGAIN, stay in BUSY
> >>>> (user space is supposed to retry, as we'll eventually progress from
> >>>> BUSY)
> >>>>
> >>>> CP_PENDING --- IO_REQ ---> return -EBUSY, stay in CP_PENDING
> >>>> (user space is supposed to map this to the appropriate cc for the guest)  
> >>>
> >>>   From this it seems you don't intend to issue the second  requested ssch()
> >>> any more (and don't want to do any translation). Is that right? (If it
> >>> is, that what I was asking for for a while, but then it's a pity for the
> >>> retries.)
> >>>      
> >>>>
> >>>> IDLE --- ASYNC_REQ ---> IDLE
> >>>> (user space is welcome to do anything else right away)  
> >>>
> >>> Your idea is to not issue a requested hsch() if we think we are IDLE
> >>> it seems. Do I understand this right? We would end up with a different
> >>> semantic for hsch()/and csch() (compared to PoP) in the guest with this
> >>> (AFAICT).
> >>>      
> >>>>
> >>>> BUSY --- ASYNC_REQ ---> return -EAGAIN, stay in BUSY
> >>>> (user space is supposed to retry, as above)
> >>>>
> >>>> CP_PENDING --- ASYNC_REQ ---> return success, stay in CP_PENDING
> >>>> (the interrupt will get us out of CP_PENDING eventually)  
> >>>
> >>> Issue (c|h)sch() is an action that is done on this internal
> >>> transition (within CP_PENDING).  
> >>
> >> These three do read like CSCH/HSCH are subject to the same rules as
> >> SSCH, when in fact they would be (among other reasons) issued to clean
> >> up a lost interrupt from a previous SSCH.  So maybe return -EAGAIN on
> >> state=BUSY (don't race ourselves with the start), but issue to hardware
> >> if CP_PENDING.  
> > 
> > I think there are some devices which require a certain hsch/csch
> > sequence during device bringup, so it's not just cleaning up after a
> > ssch.   
> 
> Ah, yes.
> 
> Therefore, we should always try to do the requested hsch/csch,
> > unless things like "we're in the process of translating a cp, and can't
> > deal with another request right now" prevent it.  
> 
> Agreed.  I'm in support of all of this.

Cool. In the meantime, I've coded the changes, and I think the result
looks reasonable. I'll give it some testing and then send it out; it's
probably easier to discuss it with some code in front of us.

[The QEMU part should not need any changes.]

> 
> >   
> >>
> >> If we get an async request when state=IDLE, then maybe just issue it for
> >> fun, as if it were an SSCH?  
> > 
> > For fun, but mainly because the guest wants it :)
> >   
> 
> Well, that too.  ;-)
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-29  9:58                       ` [Qemu-devel] " Cornelia Huck
@ 2019-01-29 19:39                         ` Halil Pasic
  -1 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-29 19:39 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, kvm, Pierre Morel, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 29 Jan 2019 10:58:40 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> > > > The problem I see with the let the hardware sort it out is that, for
> > > > that to work, we need to juggle multiple translations simultaneously
> > > > (or am I wrong?). Doing that does not appear particularly simple to
> > > > me.    
> > > 
> > > None in the first stage, at most two in the second stage, I guess.
> > >     
> > 
> > Expected benefit of the second stage over the first stage? (I see none.)  
> 
> Making something possible that is allowed by the architecture. Not
> really important, though.
> 

I had a chat with Farhan, and he suggested that by 'allowed by
architecture' you mean " You can submit a new request if the subchannel
is pending with primary, but not with secondary state." (from Message-ID:
<20190125152154.05120461.cohuck@redhat.com>).

So I re-read the PoP.

From the description of the start subchannel instruction:
"""
Special Conditions

Condition code 1 is set, and no other action is
taken, when the subchannel is status pending when
START SUBCHANNEL is executed. On some mod-
els, condition code 1 is not set when the subchannel
is status pending with only secondary status; instead,
the status-pending condition is discarded.

Condition code 2 is set, and no other action is
taken, when a start, halt, or clear function is currently
in progress at the subchannel (see “Function Control
(FC)” on page 13).

"""

So I guess you mixed primary and secondary up and wanted to say:
"You can submit a new request if the subchannel
is pending with _secondary_, but not with _primary_ _status_."

But does that really mean architecture allows the subchannel
to accept multiple ssch() instructions so that it ends up processing
two or more channel programs in parallel.

My answer to that question is: no it does not, and furthermore
it would not make sense.

So let me provide some quotes that explain how this ominous accepting
the ssch() with a status pending can occur.

"""
Conclusion of I/O Operations

The conclusion of an I/O operation normally is indi-
cated by two status conditions: channel end and
device end. The channel-end condition indicates that
the I/O device has received or provided all data asso-
ciated with the operation and no longer needs chan-
nel-subsystem facilities. This condition is called the
primary interruption condition, and the channel end
in this case is the primary status. Generally, the pri-
mary interruption condition is any interruption condi-
tion that relates to an I/O operation and that signals
the conclusion at the subchannel of the I/O operation
or chain of I/O operations.

The device-end signal indicates that the I/O device
has concluded execution and is ready to perform
another operation. This condition is called the sec-
ondary interruption condition, and the device end in
this case is the secondary status. Generally, the sec-
ondary interruption condition is any interruption con-
dition that relates to an I/O operation and that signals
the conclusion at the device of the I/O operation or
chain of operations. The secondary interruption con-
dition can occur concurrently with, or later than, the
primary interruption condition.
"""

In my reading this means that a device may lag with signaling
that it is done (with respect to the conclusion at the subchannel).

It that window between primary and secondary the driver could do the
proper driving of a ccw device stuff, do the store subchannel and clear
the primary status. See  And happily start the next ssch(). If that
ssch() now hit a subchannel thah just got the secondary status, for some
modes we apparently don't need to wait for the secondary status before
issuing the next ssch(), nor do we need to do we need to do the clear the
pending status dance because ssch() says cc == 1. The subchannel will
discard the secondary status and process the second ssch(). But the
previous ssch() has long concluded. Because as the quoted text says
the primary status is either simultaneous with or precedes the secondary
status.

Also if the subchannel were to process more than one channel program
at the time, questions would arise like what happens if one of them fails
(does that affect the other one?). It's something I find very hard to
even thing about.

BTW we would have to deal with these problems as well, if we were
to implement your second stage.

> >   
> > > > Furthermore we would go through all that hassle knowingly that
> > > > the sole reason is working around bugs. We still expect our
> > > > Linux guests serializing it's ssch() stuff as it does today.
> > > > Thus I would except this code not getting the love nor the
> > > > coverage that would guard against bugs in that code.    
> > > 
> > > So, we should have test code for that? (Any IBM-internal channel
> > > I/O exercisers that may help?)
> > >    
> > 
> > None that I'm aware of. Anyone else? 
> > 
> > But the point I was trying to make is the following: I prefer keeping
> > the handling for the case "ssch()'s on top of each other" as trivial
> > as possible. (E.g. bail out if CP_PENDING without doing any
> > translation.) 
> > > We should not rely on the guest being sane, although Linux
> > > probably is in that respect.
> > >     
> > 
> > I agree 100%: we should not rely on either guest or userspace
> > emulator being sane. But IMHO we should handle insanity with the
> > least possible investment.  
> 
> We probably disagree what the least possible investment is.
> 

Yes. IMHO making sure that we accept io_requests only if we are in
an appropriate state (currently the IDLE) and rejecting requests with
the appropriate error code is easy. And juggling parallel translations is
hard. My intuition. I can try to prove my point should anybody ever try
to submit patches that attempt this juggling parallel translations. But
I'm loosing my confidence in my ability to convince people.

Farhan, Eric any opinions?

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-29 19:39                         ` Halil Pasic
  0 siblings, 0 replies; 134+ messages in thread
From: Halil Pasic @ 2019-01-29 19:39 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Tue, 29 Jan 2019 10:58:40 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> > > > The problem I see with the let the hardware sort it out is that, for
> > > > that to work, we need to juggle multiple translations simultaneously
> > > > (or am I wrong?). Doing that does not appear particularly simple to
> > > > me.    
> > > 
> > > None in the first stage, at most two in the second stage, I guess.
> > >     
> > 
> > Expected benefit of the second stage over the first stage? (I see none.)  
> 
> Making something possible that is allowed by the architecture. Not
> really important, though.
> 

I had a chat with Farhan, and he suggested that by 'allowed by
architecture' you mean " You can submit a new request if the subchannel
is pending with primary, but not with secondary state." (from Message-ID:
<20190125152154.05120461.cohuck@redhat.com>).

So I re-read the PoP.

From the description of the start subchannel instruction:
"""
Special Conditions

Condition code 1 is set, and no other action is
taken, when the subchannel is status pending when
START SUBCHANNEL is executed. On some mod-
els, condition code 1 is not set when the subchannel
is status pending with only secondary status; instead,
the status-pending condition is discarded.

Condition code 2 is set, and no other action is
taken, when a start, halt, or clear function is currently
in progress at the subchannel (see “Function Control
(FC)” on page 13).

"""

So I guess you mixed primary and secondary up and wanted to say:
"You can submit a new request if the subchannel
is pending with _secondary_, but not with _primary_ _status_."

But does that really mean architecture allows the subchannel
to accept multiple ssch() instructions so that it ends up processing
two or more channel programs in parallel.

My answer to that question is: no it does not, and furthermore
it would not make sense.

So let me provide some quotes that explain how this ominous accepting
the ssch() with a status pending can occur.

"""
Conclusion of I/O Operations

The conclusion of an I/O operation normally is indi-
cated by two status conditions: channel end and
device end. The channel-end condition indicates that
the I/O device has received or provided all data asso-
ciated with the operation and no longer needs chan-
nel-subsystem facilities. This condition is called the
primary interruption condition, and the channel end
in this case is the primary status. Generally, the pri-
mary interruption condition is any interruption condi-
tion that relates to an I/O operation and that signals
the conclusion at the subchannel of the I/O operation
or chain of I/O operations.

The device-end signal indicates that the I/O device
has concluded execution and is ready to perform
another operation. This condition is called the sec-
ondary interruption condition, and the device end in
this case is the secondary status. Generally, the sec-
ondary interruption condition is any interruption con-
dition that relates to an I/O operation and that signals
the conclusion at the device of the I/O operation or
chain of operations. The secondary interruption con-
dition can occur concurrently with, or later than, the
primary interruption condition.
"""

In my reading this means that a device may lag with signaling
that it is done (with respect to the conclusion at the subchannel).

It that window between primary and secondary the driver could do the
proper driving of a ccw device stuff, do the store subchannel and clear
the primary status. See  And happily start the next ssch(). If that
ssch() now hit a subchannel thah just got the secondary status, for some
modes we apparently don't need to wait for the secondary status before
issuing the next ssch(), nor do we need to do we need to do the clear the
pending status dance because ssch() says cc == 1. The subchannel will
discard the secondary status and process the second ssch(). But the
previous ssch() has long concluded. Because as the quoted text says
the primary status is either simultaneous with or precedes the secondary
status.

Also if the subchannel were to process more than one channel program
at the time, questions would arise like what happens if one of them fails
(does that affect the other one?). It's something I find very hard to
even thing about.

BTW we would have to deal with these problems as well, if we were
to implement your second stage.

> >   
> > > > Furthermore we would go through all that hassle knowingly that
> > > > the sole reason is working around bugs. We still expect our
> > > > Linux guests serializing it's ssch() stuff as it does today.
> > > > Thus I would except this code not getting the love nor the
> > > > coverage that would guard against bugs in that code.    
> > > 
> > > So, we should have test code for that? (Any IBM-internal channel
> > > I/O exercisers that may help?)
> > >    
> > 
> > None that I'm aware of. Anyone else? 
> > 
> > But the point I was trying to make is the following: I prefer keeping
> > the handling for the case "ssch()'s on top of each other" as trivial
> > as possible. (E.g. bail out if CP_PENDING without doing any
> > translation.) 
> > > We should not rely on the guest being sane, although Linux
> > > probably is in that respect.
> > >     
> > 
> > I agree 100%: we should not rely on either guest or userspace
> > emulator being sane. But IMHO we should handle insanity with the
> > least possible investment.  
> 
> We probably disagree what the least possible investment is.
> 

Yes. IMHO making sure that we accept io_requests only if we are in
an appropriate state (currently the IDLE) and rejecting requests with
the appropriate error code is easy. And juggling parallel translations is
hard. My intuition. I can try to prove my point should anybody ever try
to submit patches that attempt this juggling parallel translations. But
I'm loosing my confidence in my ability to convince people.

Farhan, Eric any opinions?

Regards,
Halil

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-29 19:39                         ` [Qemu-devel] " Halil Pasic
@ 2019-01-30 13:29                           ` Cornelia Huck
  -1 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-30 13:29 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, kvm, Pierre Morel, qemu-s390x,
	Farhan Ali, qemu-devel, Alex Williamson

On Tue, 29 Jan 2019 20:39:33 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 29 Jan 2019 10:58:40 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > > > > The problem I see with the let the hardware sort it out is that, for
> > > > > that to work, we need to juggle multiple translations simultaneously
> > > > > (or am I wrong?). Doing that does not appear particularly simple to
> > > > > me.      
> > > > 
> > > > None in the first stage, at most two in the second stage, I guess.
> > > >       
> > > 
> > > Expected benefit of the second stage over the first stage? (I see none.)    
> > 
> > Making something possible that is allowed by the architecture. Not
> > really important, though.
> >   
> 
> I had a chat with Farhan, and he suggested that by 'allowed by
> architecture' you mean " You can submit a new request if the subchannel
> is pending with primary, but not with secondary state." (from Message-ID:
> <20190125152154.05120461.cohuck@redhat.com>).

Yes. I might have mixed things up, though.

> 
> So I re-read the PoP.
> 
> From the description of the start subchannel instruction:
> """
> Special Conditions
> 
> Condition code 1 is set, and no other action is
> taken, when the subchannel is status pending when
> START SUBCHANNEL is executed. On some mod-
> els, condition code 1 is not set when the subchannel
> is status pending with only secondary status; instead,
> the status-pending condition is discarded.
> 
> Condition code 2 is set, and no other action is
> taken, when a start, halt, or clear function is currently
> in progress at the subchannel (see “Function Control
> (FC)” on page 13).
> 
> """
> 
> So I guess you mixed primary and secondary up and wanted to say:
> "You can submit a new request if the subchannel
> is pending with _secondary_, but not with _primary_ _status_."
> 
> But does that really mean architecture allows the subchannel
> to accept multiple ssch() instructions so that it ends up processing
> two or more channel programs in parallel.

That's not what I meant. The vfio-ccw driver still holds on to one cp,
while a second one could be submitted.

But let's just end discussing this here, and continue with discussing
the reworked state machine, ok? It's not really relevant for going
forward with halt/clear.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-30 13:29                           ` Cornelia Huck
  0 siblings, 0 replies; 134+ messages in thread
From: Cornelia Huck @ 2019-01-30 13:29 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	Farhan Ali, qemu-devel, qemu-s390x

On Tue, 29 Jan 2019 20:39:33 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Tue, 29 Jan 2019 10:58:40 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > > > > The problem I see with the let the hardware sort it out is that, for
> > > > > that to work, we need to juggle multiple translations simultaneously
> > > > > (or am I wrong?). Doing that does not appear particularly simple to
> > > > > me.      
> > > > 
> > > > None in the first stage, at most two in the second stage, I guess.
> > > >       
> > > 
> > > Expected benefit of the second stage over the first stage? (I see none.)    
> > 
> > Making something possible that is allowed by the architecture. Not
> > really important, though.
> >   
> 
> I had a chat with Farhan, and he suggested that by 'allowed by
> architecture' you mean " You can submit a new request if the subchannel
> is pending with primary, but not with secondary state." (from Message-ID:
> <20190125152154.05120461.cohuck@redhat.com>).

Yes. I might have mixed things up, though.

> 
> So I re-read the PoP.
> 
> From the description of the start subchannel instruction:
> """
> Special Conditions
> 
> Condition code 1 is set, and no other action is
> taken, when the subchannel is status pending when
> START SUBCHANNEL is executed. On some mod-
> els, condition code 1 is not set when the subchannel
> is status pending with only secondary status; instead,
> the status-pending condition is discarded.
> 
> Condition code 2 is set, and no other action is
> taken, when a start, halt, or clear function is currently
> in progress at the subchannel (see “Function Control
> (FC)” on page 13).
> 
> """
> 
> So I guess you mixed primary and secondary up and wanted to say:
> "You can submit a new request if the subchannel
> is pending with _secondary_, but not with _primary_ _status_."
> 
> But does that really mean architecture allows the subchannel
> to accept multiple ssch() instructions so that it ends up processing
> two or more channel programs in parallel.

That's not what I meant. The vfio-ccw driver still holds on to one cp,
while a second one could be submitted.

But let's just end discussing this here, and continue with discussing
the reworked state machine, ok? It's not really relevant for going
forward with halt/clear.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
  2019-01-30 13:29                           ` [Qemu-devel] " Cornelia Huck
@ 2019-01-30 14:32                             ` Farhan Ali
  -1 siblings, 0 replies; 134+ messages in thread
From: Farhan Ali @ 2019-01-30 14:32 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic
  Cc: linux-s390, Eric Farman, kvm, Pierre Morel, qemu-s390x,
	qemu-devel, Alex Williamson



On 01/30/2019 08:29 AM, Cornelia Huck wrote:
> On Tue, 29 Jan 2019 20:39:33 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Tue, 29 Jan 2019 10:58:40 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>>>>> The problem I see with the let the hardware sort it out is that, for
>>>>>> that to work, we need to juggle multiple translations simultaneously
>>>>>> (or am I wrong?). Doing that does not appear particularly simple to
>>>>>> me.
>>>>>
>>>>> None in the first stage, at most two in the second stage, I guess.
>>>>>        
>>>>
>>>> Expected benefit of the second stage over the first stage? (I see none.)
>>>
>>> Making something possible that is allowed by the architecture. Not
>>> really important, though.
>>>    
>>
>> I had a chat with Farhan, and he suggested that by 'allowed by
>> architecture' you mean " You can submit a new request if the subchannel
>> is pending with primary, but not with secondary state." (from Message-ID:
>> <20190125152154.05120461.cohuck@redhat.com>).
> 
> Yes. I might have mixed things up, though.
> 
>>
>> So I re-read the PoP.
>>
>>  From the description of the start subchannel instruction:
>> """
>> Special Conditions
>>
>> Condition code 1 is set, and no other action is
>> taken, when the subchannel is status pending when
>> START SUBCHANNEL is executed. On some mod-
>> els, condition code 1 is not set when the subchannel
>> is status pending with only secondary status; instead,
>> the status-pending condition is discarded.
>>
>> Condition code 2 is set, and no other action is
>> taken, when a start, halt, or clear function is currently
>> in progress at the subchannel (see “Function Control
>> (FC)” on page 13).
>>
>> """
>>
>> So I guess you mixed primary and secondary up and wanted to say:
>> "You can submit a new request if the subchannel
>> is pending with _secondary_, but not with _primary_ _status_."
>>
>> But does that really mean architecture allows the subchannel
>> to accept multiple ssch() instructions so that it ends up processing
>> two or more channel programs in parallel.
> 
> That's not what I meant. The vfio-ccw driver still holds on to one cp,
> while a second one could be submitted.
> 
> But let's just end discussing this here, and continue with discussing
> the reworked state machine, ok? It's not really relevant for going
> forward with halt/clear.
> 
> 
+1
  I think we should move forward with halt/clear.

Thanks
Farhan

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] vfio-ccw: concurrent I/O handling
@ 2019-01-30 14:32                             ` Farhan Ali
  0 siblings, 0 replies; 134+ messages in thread
From: Farhan Ali @ 2019-01-30 14:32 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic
  Cc: linux-s390, Eric Farman, Alex Williamson, Pierre Morel, kvm,
	qemu-devel, qemu-s390x



On 01/30/2019 08:29 AM, Cornelia Huck wrote:
> On Tue, 29 Jan 2019 20:39:33 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On Tue, 29 Jan 2019 10:58:40 +0100
>> Cornelia Huck <cohuck@redhat.com> wrote:
>>
>>>>>> The problem I see with the let the hardware sort it out is that, for
>>>>>> that to work, we need to juggle multiple translations simultaneously
>>>>>> (or am I wrong?). Doing that does not appear particularly simple to
>>>>>> me.
>>>>>
>>>>> None in the first stage, at most two in the second stage, I guess.
>>>>>        
>>>>
>>>> Expected benefit of the second stage over the first stage? (I see none.)
>>>
>>> Making something possible that is allowed by the architecture. Not
>>> really important, though.
>>>    
>>
>> I had a chat with Farhan, and he suggested that by 'allowed by
>> architecture' you mean " You can submit a new request if the subchannel
>> is pending with primary, but not with secondary state." (from Message-ID:
>> <20190125152154.05120461.cohuck@redhat.com>).
> 
> Yes. I might have mixed things up, though.
> 
>>
>> So I re-read the PoP.
>>
>>  From the description of the start subchannel instruction:
>> """
>> Special Conditions
>>
>> Condition code 1 is set, and no other action is
>> taken, when the subchannel is status pending when
>> START SUBCHANNEL is executed. On some mod-
>> els, condition code 1 is not set when the subchannel
>> is status pending with only secondary status; instead,
>> the status-pending condition is discarded.
>>
>> Condition code 2 is set, and no other action is
>> taken, when a start, halt, or clear function is currently
>> in progress at the subchannel (see “Function Control
>> (FC)” on page 13).
>>
>> """
>>
>> So I guess you mixed primary and secondary up and wanted to say:
>> "You can submit a new request if the subchannel
>> is pending with _secondary_, but not with _primary_ _status_."
>>
>> But does that really mean architecture allows the subchannel
>> to accept multiple ssch() instructions so that it ends up processing
>> two or more channel programs in parallel.
> 
> That's not what I meant. The vfio-ccw driver still holds on to one cp,
> while a second one could be submitted.
> 
> But let's just end discussing this here, and continue with discussing
> the reworked state machine, ok? It's not really relevant for going
> forward with halt/clear.
> 
> 
+1
  I think we should move forward with halt/clear.

Thanks
Farhan

^ permalink raw reply	[flat|nested] 134+ messages in thread

end of thread, other threads:[~2019-01-30 14:33 UTC | newest]

Thread overview: 134+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-21 11:03 [PATCH v2 0/5] vfio-ccw: support hsch/csch (kernel part) Cornelia Huck
2019-01-21 11:03 ` [Qemu-devel] " Cornelia Huck
2019-01-21 11:03 ` [PATCH v2 1/5] vfio-ccw: make it safe to access channel programs Cornelia Huck
2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
2019-01-22 14:56   ` Halil Pasic
2019-01-22 14:56     ` [Qemu-devel] " Halil Pasic
2019-01-22 15:19     ` Cornelia Huck
2019-01-22 15:19       ` [Qemu-devel] " Cornelia Huck
2019-01-21 11:03 ` [PATCH v2 2/5] vfio-ccw: concurrent I/O handling Cornelia Huck
2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
2019-01-21 20:20   ` Halil Pasic
2019-01-21 20:20     ` [Qemu-devel] " Halil Pasic
2019-01-22 10:29     ` Cornelia Huck
2019-01-22 10:29       ` [Qemu-devel] " Cornelia Huck
2019-01-22 11:17       ` Halil Pasic
2019-01-22 11:17         ` [Qemu-devel] " Halil Pasic
2019-01-22 11:53         ` Cornelia Huck
2019-01-22 11:53           ` [Qemu-devel] " Cornelia Huck
2019-01-22 12:46           ` Halil Pasic
2019-01-22 12:46             ` [Qemu-devel] " Halil Pasic
2019-01-22 17:26             ` Cornelia Huck
2019-01-22 17:26               ` [Qemu-devel] " Cornelia Huck
2019-01-22 19:03               ` Halil Pasic
2019-01-22 19:03                 ` [Qemu-devel] " Halil Pasic
2019-01-23 10:34                 ` Cornelia Huck
2019-01-23 10:34                   ` [Qemu-devel] " Cornelia Huck
2019-01-23 13:06                   ` Halil Pasic
2019-01-23 13:06                     ` [Qemu-devel] " Halil Pasic
2019-01-23 13:34                     ` Cornelia Huck
2019-01-23 13:34                       ` [Qemu-devel] " Cornelia Huck
2019-01-24 19:16                       ` Eric Farman
2019-01-24 19:16                         ` [Qemu-devel] " Eric Farman
2019-01-25 10:13                         ` Cornelia Huck
2019-01-25 10:13                           ` [Qemu-devel] " Cornelia Huck
2019-01-22 18:33   ` Halil Pasic
2019-01-22 18:33     ` [Qemu-devel] " Halil Pasic
2019-01-23 10:21     ` Cornelia Huck
2019-01-23 10:21       ` [Qemu-devel] " Cornelia Huck
2019-01-23 13:30       ` Halil Pasic
2019-01-23 13:30         ` [Qemu-devel] " Halil Pasic
2019-01-24 10:05         ` Cornelia Huck
2019-01-24 10:05           ` [Qemu-devel] " Cornelia Huck
2019-01-24 10:08       ` Pierre Morel
2019-01-24 10:08         ` [Qemu-devel] " Pierre Morel
2019-01-24 10:19         ` Cornelia Huck
2019-01-24 10:19           ` [Qemu-devel] " Cornelia Huck
2019-01-24 11:18           ` Pierre Morel
2019-01-24 11:18             ` [Qemu-devel] " Pierre Morel
2019-01-24 11:45           ` Halil Pasic
2019-01-24 11:45             ` [Qemu-devel] " Halil Pasic
2019-01-24 19:14           ` Eric Farman
2019-01-24 19:14             ` [Qemu-devel] " Eric Farman
2019-01-25  2:25   ` Eric Farman
2019-01-25  2:25     ` [Qemu-devel] " Eric Farman
2019-01-25  2:37     ` Eric Farman
2019-01-25  2:37       ` [Qemu-devel] " Eric Farman
2019-01-25 10:24       ` Cornelia Huck
2019-01-25 10:24         ` [Qemu-devel] " Cornelia Huck
2019-01-25 12:58         ` Cornelia Huck
2019-01-25 12:58           ` [Qemu-devel] " Cornelia Huck
2019-01-25 14:01           ` Halil Pasic
2019-01-25 14:01             ` [Qemu-devel] " Halil Pasic
2019-01-25 14:21             ` Cornelia Huck
2019-01-25 14:21               ` [Qemu-devel] " Cornelia Huck
2019-01-25 16:04               ` Halil Pasic
2019-01-25 16:04                 ` [Qemu-devel] " Halil Pasic
2019-01-28 17:13                 ` Cornelia Huck
2019-01-28 17:13                   ` [Qemu-devel] " Cornelia Huck
2019-01-28 19:30                   ` Halil Pasic
2019-01-28 19:30                     ` [Qemu-devel] " Halil Pasic
2019-01-29  9:58                     ` Cornelia Huck
2019-01-29  9:58                       ` [Qemu-devel] " Cornelia Huck
2019-01-29 19:39                       ` Halil Pasic
2019-01-29 19:39                         ` [Qemu-devel] " Halil Pasic
2019-01-30 13:29                         ` Cornelia Huck
2019-01-30 13:29                           ` [Qemu-devel] " Cornelia Huck
2019-01-30 14:32                           ` Farhan Ali
2019-01-30 14:32                             ` [Qemu-devel] " Farhan Ali
2019-01-28 17:09             ` Cornelia Huck
2019-01-28 17:09               ` [Qemu-devel] " Cornelia Huck
2019-01-28 19:15               ` Halil Pasic
2019-01-28 19:15                 ` [Qemu-devel] " Halil Pasic
2019-01-28 21:48                 ` Eric Farman
2019-01-28 21:48                   ` [Qemu-devel] " Eric Farman
2019-01-29 10:20                   ` Cornelia Huck
2019-01-29 10:20                     ` [Qemu-devel] " Cornelia Huck
2019-01-29 14:14                     ` Eric Farman
2019-01-29 14:14                       ` [Qemu-devel] " Eric Farman
2019-01-29 18:53                       ` Cornelia Huck
2019-01-29 18:53                         ` [Qemu-devel] " Cornelia Huck
2019-01-29 10:10                 ` Cornelia Huck
2019-01-29 10:10                   ` [Qemu-devel] " Cornelia Huck
2019-01-25 15:57           ` Eric Farman
2019-01-25 15:57             ` [Qemu-devel] " Eric Farman
2019-01-28 17:24             ` Cornelia Huck
2019-01-28 17:24               ` [Qemu-devel] " Cornelia Huck
2019-01-28 21:50               ` Eric Farman
2019-01-28 21:50                 ` [Qemu-devel] " Eric Farman
2019-01-25 20:22         ` Eric Farman
2019-01-25 20:22           ` [Qemu-devel] " Eric Farman
2019-01-28 17:31           ` Cornelia Huck
2019-01-28 17:31             ` [Qemu-devel] " Cornelia Huck
2019-01-25 13:09       ` Halil Pasic
2019-01-25 13:09         ` [Qemu-devel] " Halil Pasic
2019-01-25 12:58     ` Halil Pasic
2019-01-25 12:58       ` [Qemu-devel] " Halil Pasic
2019-01-25 20:21       ` Eric Farman
2019-01-25 20:21         ` [Qemu-devel] " Eric Farman
2019-01-21 11:03 ` [PATCH v2 3/5] vfio-ccw: add capabilities chain Cornelia Huck
2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
2019-01-23 15:57   ` [qemu-s390x] " Halil Pasic
2019-01-23 15:57     ` [Qemu-devel] " Halil Pasic
2019-01-25 16:19   ` Eric Farman
2019-01-25 16:19     ` [Qemu-devel] " Eric Farman
2019-01-25 21:00     ` Eric Farman
2019-01-25 21:00       ` [Qemu-devel] " Eric Farman
2019-01-28 17:34       ` Cornelia Huck
2019-01-28 17:34         ` [Qemu-devel] " Cornelia Huck
2019-01-21 11:03 ` [PATCH v2 4/5] s390/cio: export hsch to modules Cornelia Huck
2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
2019-01-22 15:21   ` [qemu-s390x] " Halil Pasic
2019-01-22 15:21     ` [Qemu-devel] " Halil Pasic
2019-01-21 11:03 ` [PATCH v2 5/5] vfio-ccw: add handling for async channel instructions Cornelia Huck
2019-01-21 11:03   ` [Qemu-devel] " Cornelia Huck
2019-01-23 15:51   ` Halil Pasic
2019-01-23 15:51     ` [Qemu-devel] " Halil Pasic
2019-01-24 10:06     ` Cornelia Huck
2019-01-24 10:06       ` [Qemu-devel] " Cornelia Huck
2019-01-24 10:37       ` Halil Pasic
2019-01-24 10:37         ` [Qemu-devel] " Halil Pasic
2019-01-25 21:00   ` Eric Farman
2019-01-25 21:00     ` [Qemu-devel] " Eric Farman
2019-01-28 17:40     ` Cornelia Huck
2019-01-28 17:40       ` [Qemu-devel] " Cornelia Huck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.