kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 0/5] Some vfio-ccw fixes
@ 2019-07-08 20:10 Farhan Ali
  2019-07-08 20:10 ` [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64 Farhan Ali
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Farhan Ali @ 2019-07-08 20:10 UTC (permalink / raw)
  To: cohuck, farman, pasic; +Cc: linux-s390, kvm, alifm

Hi,

While trying to chase down the problem regarding the stacktraces,
I have also found some minor problems in the code.

Would appreciate any review or feedback regarding them.

Thanks
Farhan

ChangeLog
---------
v1 -> v2
   - Update docs for csch/hsch since we can support those
     instructions now (patch 5)
   - Fix the memory leak where we fail to free a ccwchain (patch 2)
   - Add fixes tag where appropriate.
   - Fix comment instead of the order when setting orb.cmd.c64 (patch 1)

Farhan Ali (5):
  vfio-ccw: Fix misleading comment when setting orb.cmd.c64
  vfio-ccw: Fix memory leak and don't call cp_free in cp_init
  vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn
  vfio-ccw: Don't call cp_free if we are processing a channel program
  vfio-ccw: Update documentation for csch/hsch

 Documentation/s390/vfio-ccw.rst | 31 ++++++++++++++++++++++++++++---
 drivers/s390/cio/vfio_ccw_cp.c  | 19 ++++++++++++-------
 drivers/s390/cio/vfio_ccw_drv.c |  2 +-
 3 files changed, 41 insertions(+), 11 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64
  2019-07-08 20:10 [RFC v2 0/5] Some vfio-ccw fixes Farhan Ali
@ 2019-07-08 20:10 ` Farhan Ali
  2019-07-09  9:57   ` Cornelia Huck
  2019-07-08 20:10 ` [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init Farhan Ali
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-08 20:10 UTC (permalink / raw)
  To: cohuck, farman, pasic; +Cc: linux-s390, kvm, alifm

The comment is misleading because it tells us that
we should set orb.cmd.c64 before calling ccwchain_calc_length,
otherwise the function ccwchain_calc_length would return an
error. This is not completely accurate.

We want to allow an orb without cmd.c64, and this is fine
as long as the channel program does not use IDALs. But we do
want to reject any channel program that uses IDALs and does
not set the flag, which what we do in ccwchain_calc_length.

After we have done the ccw processing, it should be safe
to set cmd.c64, since we will convert them into IDALs.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index d6a8dff..7622b72 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -645,8 +645,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	if (ret)
 		cp_free(cp);
 
-	/* It is safe to force: if not set but idals used
-	 * ccwchain_calc_length returns an error.
+	/* It is safe to force: if it was not set but idals used
+	 * ccwchain_calc_length would have returned an error.
 	 */
 	cp->orb.cmd.c64 = 1;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init
  2019-07-08 20:10 [RFC v2 0/5] Some vfio-ccw fixes Farhan Ali
  2019-07-08 20:10 ` [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64 Farhan Ali
@ 2019-07-08 20:10 ` Farhan Ali
  2019-07-09 10:06   ` Cornelia Huck
  2019-07-08 20:10 ` [RFC v2 3/5] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn Farhan Ali
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-08 20:10 UTC (permalink / raw)
  To: cohuck, farman, pasic; +Cc: linux-s390, kvm, alifm

We don't set cp->initialized to true so calling cp_free
will just return and not do anything.

Also fix a memory leak where we fail to free a ccwchain
on an error.

Fixes: 812271b910 ("s390/cio: Squash cp_free() and cp_unpin_free()")
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 7622b72..31a04a5 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -421,7 +421,7 @@ static int ccwchain_loop_tic(struct ccwchain *chain,
 static int ccwchain_handle_ccw(u32 cda, struct channel_program *cp)
 {
 	struct ccwchain *chain;
-	int len;
+	int len, ret;
 
 	/* Copy 2K (the most we support today) of possible CCWs */
 	len = copy_from_iova(cp->mdev, cp->guest_cp, cda,
@@ -448,7 +448,12 @@ static int ccwchain_handle_ccw(u32 cda, struct channel_program *cp)
 	memcpy(chain->ch_ccw, cp->guest_cp, len * sizeof(struct ccw1));
 
 	/* Loop for tics on this new chain. */
-	return ccwchain_loop_tic(chain, cp);
+	ret = ccwchain_loop_tic(chain, cp);
+
+	if (ret)
+		ccwchain_free(chain);
+
+	return ret;
 }
 
 /* Loop for TICs. */
@@ -642,8 +647,6 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 
 	/* Build a ccwchain for the first CCW segment */
 	ret = ccwchain_handle_ccw(orb->cmd.cpa, cp);
-	if (ret)
-		cp_free(cp);
 
 	/* It is safe to force: if it was not set but idals used
 	 * ccwchain_calc_length would have returned an error.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC v2 3/5] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn
  2019-07-08 20:10 [RFC v2 0/5] Some vfio-ccw fixes Farhan Ali
  2019-07-08 20:10 ` [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64 Farhan Ali
  2019-07-08 20:10 ` [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init Farhan Ali
@ 2019-07-08 20:10 ` Farhan Ali
  2019-07-09 10:08   ` Cornelia Huck
  2019-07-08 20:10 ` [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program Farhan Ali
  2019-07-08 20:10 ` [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch Farhan Ali
  4 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-08 20:10 UTC (permalink / raw)
  To: cohuck, farman, pasic; +Cc: linux-s390, kvm, alifm

So we don't call try to call vfio_unpin_pages() incorrectly.

Fixes: 0a19e61e6d4c ("vfio: ccw: introduce channel program interfaces")
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 31a04a5..1b93863 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -72,8 +72,10 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
 				  sizeof(*pa->pa_iova_pfn) +
 				  sizeof(*pa->pa_pfn),
 				  GFP_KERNEL);
-	if (unlikely(!pa->pa_iova_pfn))
+	if (unlikely(!pa->pa_iova_pfn)) {
+		pa->pa_nr = 0;
 		return -ENOMEM;
+	}
 	pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
 
 	pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-08 20:10 [RFC v2 0/5] Some vfio-ccw fixes Farhan Ali
                   ` (2 preceding siblings ...)
  2019-07-08 20:10 ` [RFC v2 3/5] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn Farhan Ali
@ 2019-07-08 20:10 ` Farhan Ali
  2019-07-09 10:16   ` Cornelia Huck
  2019-07-08 20:10 ` [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch Farhan Ali
  4 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-08 20:10 UTC (permalink / raw)
  To: cohuck, farman, pasic; +Cc: linux-s390, kvm, alifm

There is a small window where it's possible that we could be working
on an interrupt (queued in the workqueue) and setting up a channel
program (i.e allocating memory, pinning pages, translating address).
This can lead to allocating and freeing the channel program at the
same time and can cause memory corruption.

Let's not call cp_free if we are currently processing a channel program.
The only way we know for sure that we don't have a thread setting
up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 4e3a903..0357165 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final)
+		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
 			cp_free(&private->cp);
 	}
 	mutex_lock(&private->io_mutex);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch
  2019-07-08 20:10 [RFC v2 0/5] Some vfio-ccw fixes Farhan Ali
                   ` (3 preceding siblings ...)
  2019-07-08 20:10 ` [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program Farhan Ali
@ 2019-07-08 20:10 ` Farhan Ali
  2019-07-09 10:14   ` Cornelia Huck
  4 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-08 20:10 UTC (permalink / raw)
  To: cohuck, farman, pasic; +Cc: linux-s390, kvm, alifm

We now support CLEAR SUBCHANNEL and HALT SUBCHANNEL
via ccw_cmd_region.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 Documentation/s390/vfio-ccw.rst | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/s390/vfio-ccw.rst
index 1f6d0b5..40e4110 100644
--- a/Documentation/s390/vfio-ccw.rst
+++ b/Documentation/s390/vfio-ccw.rst
@@ -180,6 +180,13 @@ The process of how these work together.
    add it to an iommu_group and a vfio_group. Then we could pass through
    the mdev to a guest.
 
+
+VFIO-CCW Regions
+----------------
+
+The vfio-ccw driver exposes MMIO regions to accept requests from and return
+results to userspace.
+
 vfio-ccw I/O region
 -------------------
 
@@ -205,6 +212,25 @@ irb_area stores the I/O result.
 
 ret_code stores a return code for each access of the region.
 
+This region is always available.
+
+vfio-ccw cmd region
+-------------------
+
+The vfio-ccw cmd region is used to accept asynchronous instructions
+from userspace.
+
+#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
+#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
+struct ccw_cmd_region {
+       __u32 command;
+       __u32 ret_code;
+} __packed;
+
+This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.
+
+CLEAR SUBCHANNEL and HALT SUBCHANNEL currently uses this region.
+
 vfio-ccw operation details
 --------------------------
 
@@ -306,9 +332,8 @@ Together with the corresponding work in QEMU, we can bring the passed
 through DASD/ECKD device online in a guest now and use it as a block
 device.
 
-While the current code allows the guest to start channel programs via
-START SUBCHANNEL, support for HALT SUBCHANNEL or CLEAR SUBCHANNEL is
-not yet implemented.
+The current code allows the guest to start channel programs via
+START SUBCHANNEL, and supports HALT SUBCHANNEL or CLEAR SUBCHANNEL.
 
 vfio-ccw supports classic (command mode) channel I/O only. Transport
 mode (HPF) is not supported.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64
  2019-07-08 20:10 ` [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64 Farhan Ali
@ 2019-07-09  9:57   ` Cornelia Huck
  0 siblings, 0 replies; 23+ messages in thread
From: Cornelia Huck @ 2019-07-09  9:57 UTC (permalink / raw)
  To: Farhan Ali; +Cc: farman, pasic, linux-s390, kvm

On Mon,  8 Jul 2019 16:10:34 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> The comment is misleading because it tells us that
> we should set orb.cmd.c64 before calling ccwchain_calc_length,
> otherwise the function ccwchain_calc_length would return an
> error. This is not completely accurate.
> 
> We want to allow an orb without cmd.c64, and this is fine
> as long as the channel program does not use IDALs. But we do
> want to reject any channel program that uses IDALs and does
> not set the flag, which what we do in ccwchain_calc_length.
> 
> After we have done the ccw processing, it should be safe
> to set cmd.c64, since we will convert them into IDALs.

"After we have done the ccw processing, we need to set cmd.c64, as we
use IDALs for all translated channel programs." ?

>

Fixes: fb9e7880af35 ("vfio: ccw: push down unsupported IDA check")
 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index d6a8dff..7622b72 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -645,8 +645,8 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>  	if (ret)
>  		cp_free(cp);
>  
> -	/* It is safe to force: if not set but idals used
> -	 * ccwchain_calc_length returns an error.
> +	/* It is safe to force: if it was not set but idals used
> +	 * ccwchain_calc_length would have returned an error.

Thanks, much clearer.

>  	 */
>  	cp->orb.cmd.c64 = 1;
>  

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init
  2019-07-08 20:10 ` [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init Farhan Ali
@ 2019-07-09 10:06   ` Cornelia Huck
  2019-07-09 14:07     ` Farhan Ali
  0 siblings, 1 reply; 23+ messages in thread
From: Cornelia Huck @ 2019-07-09 10:06 UTC (permalink / raw)
  To: Farhan Ali; +Cc: farman, pasic, linux-s390, kvm

On Mon,  8 Jul 2019 16:10:35 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> We don't set cp->initialized to true so calling cp_free
> will just return and not do anything.
> 
> Also fix a memory leak where we fail to free a ccwchain
> on an error.
> 
> Fixes: 812271b910 ("s390/cio: Squash cp_free() and cp_unpin_free()")
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)

(...)

> @@ -642,8 +647,6 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>  
>  	/* Build a ccwchain for the first CCW segment */
>  	ret = ccwchain_handle_ccw(orb->cmd.cpa, cp);
> -	if (ret)
> -		cp_free(cp);

Now that I look again: it's a bit odd that we set the bit in all cases,
even if we have an error. We could move that into the !ret branch that
sets ->initialized; but it does not really hurt.

>  
>  	/* It is safe to force: if it was not set but idals used
>  	 * ccwchain_calc_length would have returned an error.

The rest of the patch looks good to me.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 3/5] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn
  2019-07-08 20:10 ` [RFC v2 3/5] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn Farhan Ali
@ 2019-07-09 10:08   ` Cornelia Huck
  0 siblings, 0 replies; 23+ messages in thread
From: Cornelia Huck @ 2019-07-09 10:08 UTC (permalink / raw)
  To: Farhan Ali; +Cc: farman, pasic, linux-s390, kvm

On Mon,  8 Jul 2019 16:10:36 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> So we don't call try to call vfio_unpin_pages() incorrectly.
> 
> Fixes: 0a19e61e6d4c ("vfio: ccw: introduce channel program interfaces")

So that was there since the beginning :)

> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> Reviewed-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_cp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 31a04a5..1b93863 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -72,8 +72,10 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
>  				  sizeof(*pa->pa_iova_pfn) +
>  				  sizeof(*pa->pa_pfn),
>  				  GFP_KERNEL);
> -	if (unlikely(!pa->pa_iova_pfn))
> +	if (unlikely(!pa->pa_iova_pfn)) {
> +		pa->pa_nr = 0;
>  		return -ENOMEM;
> +	}
>  	pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
>  
>  	pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch
  2019-07-08 20:10 ` [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch Farhan Ali
@ 2019-07-09 10:14   ` Cornelia Huck
  2019-07-09 12:47     ` Farhan Ali
  0 siblings, 1 reply; 23+ messages in thread
From: Cornelia Huck @ 2019-07-09 10:14 UTC (permalink / raw)
  To: Farhan Ali; +Cc: farman, pasic, linux-s390, kvm

On Mon,  8 Jul 2019 16:10:38 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> We now support CLEAR SUBCHANNEL and HALT SUBCHANNEL
> via ccw_cmd_region.
> 

Thanks, I forgot about this.

Fixes: d5afd5d135c8 ("vfio-ccw: add handling for async channel instructions")

> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  Documentation/s390/vfio-ccw.rst | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/s390/vfio-ccw.rst
> index 1f6d0b5..40e4110 100644
> --- a/Documentation/s390/vfio-ccw.rst
> +++ b/Documentation/s390/vfio-ccw.rst
> @@ -180,6 +180,13 @@ The process of how these work together.
>     add it to an iommu_group and a vfio_group. Then we could pass through
>     the mdev to a guest.
>  
> +
> +VFIO-CCW Regions
> +----------------
> +
> +The vfio-ccw driver exposes MMIO regions to accept requests from and return
> +results to userspace.
> +
>  vfio-ccw I/O region
>  -------------------
>  
> @@ -205,6 +212,25 @@ irb_area stores the I/O result.
>  
>  ret_code stores a return code for each access of the region.
>  
> +This region is always available.
> +
> +vfio-ccw cmd region
> +-------------------
> +
> +The vfio-ccw cmd region is used to accept asynchronous instructions
> +from userspace.
> +
> +#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
> +#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
> +struct ccw_cmd_region {
> +       __u32 command;
> +       __u32 ret_code;
> +} __packed;
> +
> +This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.
> +
> +CLEAR SUBCHANNEL and HALT SUBCHANNEL currently uses this region.

"Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region". ?

> +
>  vfio-ccw operation details
>  --------------------------
>  
> @@ -306,9 +332,8 @@ Together with the corresponding work in QEMU, we can bring the passed
>  through DASD/ECKD device online in a guest now and use it as a block
>  device.
>  
> -While the current code allows the guest to start channel programs via
> -START SUBCHANNEL, support for HALT SUBCHANNEL or CLEAR SUBCHANNEL is
> -not yet implemented.
> +The current code allows the guest to start channel programs via
> +START SUBCHANNEL, and supports HALT SUBCHANNEL or CLEAR SUBCHANNEL.

"The current code allows the guest to start channel programs via START
SUBCHANNEL, and to issue HALT SUBCHANNEL and CLEAR SUBCHANNEL." ?

>  
>  vfio-ccw supports classic (command mode) channel I/O only. Transport
>  mode (HPF) is not supported.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-08 20:10 ` [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program Farhan Ali
@ 2019-07-09 10:16   ` Cornelia Huck
  2019-07-09 13:46     ` Farhan Ali
  0 siblings, 1 reply; 23+ messages in thread
From: Cornelia Huck @ 2019-07-09 10:16 UTC (permalink / raw)
  To: Farhan Ali; +Cc: farman, pasic, linux-s390, kvm

On Mon,  8 Jul 2019 16:10:37 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> There is a small window where it's possible that we could be working
> on an interrupt (queued in the workqueue) and setting up a channel
> program (i.e allocating memory, pinning pages, translating address).
> This can lead to allocating and freeing the channel program at the
> same time and can cause memory corruption.
> 
> Let's not call cp_free if we are currently processing a channel program.
> The only way we know for sure that we don't have a thread setting
> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.

Can we pinpoint a commit that introduced this bug, or has it been there
since the beginning?

> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 4e3a903..0357165 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>  	if (scsw_is_solicited(&irb->scsw)) {
>  		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final)
> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
>  			cp_free(&private->cp);
>  	}
>  	mutex_lock(&private->io_mutex);

Reviewed-by: Cornelia Huck <cohuck@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch
  2019-07-09 10:14   ` Cornelia Huck
@ 2019-07-09 12:47     ` Farhan Ali
  0 siblings, 0 replies; 23+ messages in thread
From: Farhan Ali @ 2019-07-09 12:47 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: farman, pasic, linux-s390, kvm



On 07/09/2019 06:14 AM, Cornelia Huck wrote:
> On Mon,  8 Jul 2019 16:10:38 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> We now support CLEAR SUBCHANNEL and HALT SUBCHANNEL
>> via ccw_cmd_region.
>>
> 
> Thanks, I forgot about this.
> 
> Fixes: d5afd5d135c8 ("vfio-ccw: add handling for async channel instructions")
> 

I don't know if this warranted a fixes tag but I will add it.

>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   Documentation/s390/vfio-ccw.rst | 31 ++++++++++++++++++++++++++++---
>>   1 file changed, 28 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/s390/vfio-ccw.rst
>> index 1f6d0b5..40e4110 100644
>> --- a/Documentation/s390/vfio-ccw.rst
>> +++ b/Documentation/s390/vfio-ccw.rst
>> @@ -180,6 +180,13 @@ The process of how these work together.
>>      add it to an iommu_group and a vfio_group. Then we could pass through
>>      the mdev to a guest.
>>   
>> +
>> +VFIO-CCW Regions
>> +----------------
>> +
>> +The vfio-ccw driver exposes MMIO regions to accept requests from and return
>> +results to userspace.
>> +
>>   vfio-ccw I/O region
>>   -------------------
>>   
>> @@ -205,6 +212,25 @@ irb_area stores the I/O result.
>>   
>>   ret_code stores a return code for each access of the region.
>>   
>> +This region is always available.
>> +
>> +vfio-ccw cmd region
>> +-------------------
>> +
>> +The vfio-ccw cmd region is used to accept asynchronous instructions
>> +from userspace.
>> +
>> +#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
>> +#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
>> +struct ccw_cmd_region {
>> +       __u32 command;
>> +       __u32 ret_code;
>> +} __packed;
>> +
>> +This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.
>> +
>> +CLEAR SUBCHANNEL and HALT SUBCHANNEL currently uses this region.
> 
> "Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region". ?
> 

This sounds better to me :)

>> +
>>   vfio-ccw operation details
>>   --------------------------
>>   
>> @@ -306,9 +332,8 @@ Together with the corresponding work in QEMU, we can bring the passed
>>   through DASD/ECKD device online in a guest now and use it as a block
>>   device.
>>   
>> -While the current code allows the guest to start channel programs via
>> -START SUBCHANNEL, support for HALT SUBCHANNEL or CLEAR SUBCHANNEL is
>> -not yet implemented.
>> +The current code allows the guest to start channel programs via
>> +START SUBCHANNEL, and supports HALT SUBCHANNEL or CLEAR SUBCHANNEL.
> 
> "The current code allows the guest to start channel programs via START
> SUBCHANNEL, and to issue HALT SUBCHANNEL and CLEAR SUBCHANNEL." ?
>

Same. I will update with the changes you requested.

>>   
>>   vfio-ccw supports classic (command mode) channel I/O only. Transport
>>   mode (HPF) is not supported.
> 
> 

Thanks for taking a look.

Thanks
Farhan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-09 10:16   ` Cornelia Huck
@ 2019-07-09 13:46     ` Farhan Ali
  2019-07-09 14:21       ` Halil Pasic
  0 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-09 13:46 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: farman, pasic, linux-s390, kvm



On 07/09/2019 06:16 AM, Cornelia Huck wrote:
> On Mon,  8 Jul 2019 16:10:37 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> There is a small window where it's possible that we could be working
>> on an interrupt (queued in the workqueue) and setting up a channel
>> program (i.e allocating memory, pinning pages, translating address).
>> This can lead to allocating and freeing the channel program at the
>> same time and can cause memory corruption.
>>
>> Let's not call cp_free if we are currently processing a channel program.
>> The only way we know for sure that we don't have a thread setting
>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
> 
> Can we pinpoint a commit that introduced this bug, or has it been there
> since the beginning?
> 

I think the problem was always there.

>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>> index 4e3a903..0357165 100644
>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>   		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>   	if (scsw_is_solicited(&irb->scsw)) {
>>   		cp_update_scsw(&private->cp, &irb->scsw);
>> -		if (is_final)
>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
>>   			cp_free(&private->cp);
>>   	}
>>   	mutex_lock(&private->io_mutex);
> 
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> 
> 
Thanks for reviewing.

Thanks
Farhan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init
  2019-07-09 10:06   ` Cornelia Huck
@ 2019-07-09 14:07     ` Farhan Ali
  2019-07-09 14:18       ` Cornelia Huck
  0 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-09 14:07 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: farman, pasic, linux-s390, kvm



On 07/09/2019 06:06 AM, Cornelia Huck wrote:
> On Mon,  8 Jul 2019 16:10:35 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> We don't set cp->initialized to true so calling cp_free
>> will just return and not do anything.
>>
>> Also fix a memory leak where we fail to free a ccwchain
>> on an error.
>>
>> Fixes: 812271b910 ("s390/cio: Squash cp_free() and cp_unpin_free()")
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
> (...)
> 
>> @@ -642,8 +647,6 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>>   
>>   	/* Build a ccwchain for the first CCW segment */
>>   	ret = ccwchain_handle_ccw(orb->cmd.cpa, cp);
>> -	if (ret)
>> -		cp_free(cp);
> 
> Now that I look again: it's a bit odd that we set the bit in all cases,
> even if we have an error. We could move that into the !ret branch that
> sets ->initialized; but it does not really hurt.

By setting the bit, I am assuming you meant cmd.c64?

Yes, it doesn't harm anything but for better code readability you have a 
good point. I will move it into !ret branch in the first patch since I 
think it would be more appropriate there, no?

> 
>>   
>>   	/* It is safe to force: if it was not set but idals used
>>   	 * ccwchain_calc_length would have returned an error.
> 
> The rest of the patch looks good to me.
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init
  2019-07-09 14:07     ` Farhan Ali
@ 2019-07-09 14:18       ` Cornelia Huck
  0 siblings, 0 replies; 23+ messages in thread
From: Cornelia Huck @ 2019-07-09 14:18 UTC (permalink / raw)
  To: Farhan Ali; +Cc: farman, pasic, linux-s390, kvm

On Tue, 9 Jul 2019 10:07:57 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 07/09/2019 06:06 AM, Cornelia Huck wrote:
> > On Mon,  8 Jul 2019 16:10:35 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> >> We don't set cp->initialized to true so calling cp_free
> >> will just return and not do anything.
> >>
> >> Also fix a memory leak where we fail to free a ccwchain
> >> on an error.
> >>
> >> Fixes: 812271b910 ("s390/cio: Squash cp_free() and cp_unpin_free()")
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_cp.c | 11 +++++++----
> >>   1 file changed, 7 insertions(+), 4 deletions(-)  
> > 
> > (...)
> >   
> >> @@ -642,8 +647,6 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
> >>   
> >>   	/* Build a ccwchain for the first CCW segment */
> >>   	ret = ccwchain_handle_ccw(orb->cmd.cpa, cp);
> >> -	if (ret)
> >> -		cp_free(cp);  
> > 
> > Now that I look again: it's a bit odd that we set the bit in all cases,
> > even if we have an error. We could move that into the !ret branch that
> > sets ->initialized; but it does not really hurt.  
> 
> By setting the bit, I am assuming you meant cmd.c64?
> 
> Yes, it doesn't harm anything but for better code readability you have a 
> good point. I will move it into !ret branch in the first patch since I 
> think it would be more appropriate there, no?

Yes to all of that :)

> 
> >   
> >>   
> >>   	/* It is safe to force: if it was not set but idals used
> >>   	 * ccwchain_calc_length would have returned an error.  
> > 
> > The rest of the patch looks good to me.
> > 
> >   
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-09 13:46     ` Farhan Ali
@ 2019-07-09 14:21       ` Halil Pasic
  2019-07-09 21:27         ` Farhan Ali
  0 siblings, 1 reply; 23+ messages in thread
From: Halil Pasic @ 2019-07-09 14:21 UTC (permalink / raw)
  To: Farhan Ali; +Cc: Cornelia Huck, farman, linux-s390, kvm

On Tue, 9 Jul 2019 09:46:51 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> 
> 
> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
> > On Mon,  8 Jul 2019 16:10:37 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> > 
> >> There is a small window where it's possible that we could be working
> >> on an interrupt (queued in the workqueue) and setting up a channel
> >> program (i.e allocating memory, pinning pages, translating address).
> >> This can lead to allocating and freeing the channel program at the
> >> same time and can cause memory corruption.
> >>
> >> Let's not call cp_free if we are currently processing a channel program.
> >> The only way we know for sure that we don't have a thread setting
> >> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
> > 
> > Can we pinpoint a commit that introduced this bug, or has it been there
> > since the beginning?
> > 
> 
> I think the problem was always there.
> 

I think it became relevant with the async stuff. Because after the async
stuff was added we start getting solicited interrupts that are not about
channel program is done. At least this is how I remember the discussion.

> >>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> >> index 4e3a903..0357165 100644
> >> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >>   		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>   	if (scsw_is_solicited(&irb->scsw)) {
> >>   		cp_update_scsw(&private->cp, &irb->scsw);
> >> -		if (is_final)
> >> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)

Ain't private->state potentially used by multiple threads of execution?
Do we need to use atomic operations or external synchronization to avoid
this being another gamble? Or am I missing something?

> >>   			cp_free(&private->cp);
> >>   	}
> >>   	mutex_lock(&private->io_mutex);
> > 
> > Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> > 
> > 
> Thanks for reviewing.
> 
> Thanks
> Farhan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-09 14:21       ` Halil Pasic
@ 2019-07-09 21:27         ` Farhan Ali
  2019-07-10 13:45           ` Cornelia Huck
  2019-07-11 14:57           ` Halil Pasic
  0 siblings, 2 replies; 23+ messages in thread
From: Farhan Ali @ 2019-07-09 21:27 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Cornelia Huck, farman, linux-s390, kvm



On 07/09/2019 10:21 AM, Halil Pasic wrote:
> On Tue, 9 Jul 2019 09:46:51 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>>
>>
>> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
>>> On Mon,  8 Jul 2019 16:10:37 -0400
>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>
>>>> There is a small window where it's possible that we could be working
>>>> on an interrupt (queued in the workqueue) and setting up a channel
>>>> program (i.e allocating memory, pinning pages, translating address).
>>>> This can lead to allocating and freeing the channel program at the
>>>> same time and can cause memory corruption.
>>>>
>>>> Let's not call cp_free if we are currently processing a channel program.
>>>> The only way we know for sure that we don't have a thread setting
>>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
>>>
>>> Can we pinpoint a commit that introduced this bug, or has it been there
>>> since the beginning?
>>>
>>
>> I think the problem was always there.
>>
> 
> I think it became relevant with the async stuff. Because after the async
> stuff was added we start getting solicited interrupts that are not about
> channel program is done. At least this is how I remember the discussion.
> 
>>>>
>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>>>> index 4e3a903..0357165 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>    	if (scsw_is_solicited(&irb->scsw)) {
>>>>    		cp_update_scsw(&private->cp, &irb->scsw);
>>>> -		if (is_final)
>>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> 
> Ain't private->state potentially used by multiple threads of execution?

yes

One of the paths I can think of is a machine check from the host which 
will ultimately call vfio_ccw_sch_event callback which could set state 
to NOT_OPER or IDLE.

> Do we need to use atomic operations or external synchronization to avoid
> this being another gamble? Or am I missing something?

I think we probably should think about atomic operations for 
synchronizing the state (and it could be a separate add on patch?).

But for preventing 2 threads from stomping on the cp the check should be 
enough, unless I am missing something?

> 
>>>>    			cp_free(&private->cp);
>>>>    	}
>>>>    	mutex_lock(&private->io_mutex);
>>>
>>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>>>
>>>
>> Thanks for reviewing.
>>
>> Thanks
>> Farhan
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-09 21:27         ` Farhan Ali
@ 2019-07-10 13:45           ` Cornelia Huck
  2019-07-10 16:10             ` Farhan Ali
  2019-07-11 14:57           ` Halil Pasic
  1 sibling, 1 reply; 23+ messages in thread
From: Cornelia Huck @ 2019-07-10 13:45 UTC (permalink / raw)
  To: Farhan Ali; +Cc: Halil Pasic, farman, linux-s390, kvm

On Tue, 9 Jul 2019 17:27:47 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 07/09/2019 10:21 AM, Halil Pasic wrote:
> > On Tue, 9 Jul 2019 09:46:51 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> >>
> >>
> >> On 07/09/2019 06:16 AM, Cornelia Huck wrote:  
> >>> On Mon,  8 Jul 2019 16:10:37 -0400
> >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>  
> >>>> There is a small window where it's possible that we could be working
> >>>> on an interrupt (queued in the workqueue) and setting up a channel
> >>>> program (i.e allocating memory, pinning pages, translating address).
> >>>> This can lead to allocating and freeing the channel program at the
> >>>> same time and can cause memory corruption.
> >>>>
> >>>> Let's not call cp_free if we are currently processing a channel program.
> >>>> The only way we know for sure that we don't have a thread setting
> >>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.  
> >>>
> >>> Can we pinpoint a commit that introduced this bug, or has it been there
> >>> since the beginning?
> >>>  
> >>
> >> I think the problem was always there.
> >>  
> > 
> > I think it became relevant with the async stuff. Because after the async
> > stuff was added we start getting solicited interrupts that are not about
> > channel program is done. At least this is how I remember the discussion.
> >   
> >>>>
> >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>> ---
> >>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> index 4e3a903..0357165 100644
> >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>>>    	if (scsw_is_solicited(&irb->scsw)) {
> >>>>    		cp_update_scsw(&private->cp, &irb->scsw);
> >>>> -		if (is_final)
> >>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)  
> > 
> > Ain't private->state potentially used by multiple threads of execution?  
> 
> yes
> 
> One of the paths I can think of is a machine check from the host which 
> will ultimately call vfio_ccw_sch_event callback which could set state 
> to NOT_OPER or IDLE.

Now I went through the machine check rabbit hole because I thought
freeing the cp in there might be a good idea, but it's not that easy
(who'd have thought...)

If I read the POP correctly, an IPI or IPR in the subchannel CRW will
indicate that the subchannel has been restored to a state after an I/O
reset; in particular, that means that the subchannel does not have any
I/O pending. However, that does not seem to be the case e.g. for an IPM
(the doc does not seem to be very clear on that, though.) We can't
unconditionally do something, as we do not know what event we're being
called for (please disregard the positively ancient "we're called for
IPI" comment in css_process_crw(), I think I added that one in the
Linux 2.4 or 2.5 timeframe...) tl;dr We can't rely on anything...

> 
> > Do we need to use atomic operations or external synchronization to avoid
> > this being another gamble? Or am I missing something?  
> 
> I think we probably should think about atomic operations for 
> synchronizing the state (and it could be a separate add on patch?).

+1 to thinking about some atomicity changes later.

> 
> But for preventing 2 threads from stomping on the cp the check should be 
> enough, unless I am missing something?

I think so. Plus, the patch is small enough that we can merge it right
away, and figure out a more generic change later.

> 
> >   
> >>>>    			cp_free(&private->cp);
> >>>>    	}
> >>>>    	mutex_lock(&private->io_mutex);  
> >>>
> >>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> >>>
> >>>  
> >> Thanks for reviewing.
> >>
> >> Thanks
> >> Farhan  
> > 
> >   


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-10 13:45           ` Cornelia Huck
@ 2019-07-10 16:10             ` Farhan Ali
  2019-07-11 12:28               ` Eric Farman
  0 siblings, 1 reply; 23+ messages in thread
From: Farhan Ali @ 2019-07-10 16:10 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Halil Pasic, farman, linux-s390, kvm



On 07/10/2019 09:45 AM, Cornelia Huck wrote:
> On Tue, 9 Jul 2019 17:27:47 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 07/09/2019 10:21 AM, Halil Pasic wrote:
>>> On Tue, 9 Jul 2019 09:46:51 -0400
>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>    
>>>>
>>>>
>>>> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
>>>>> On Mon,  8 Jul 2019 16:10:37 -0400
>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>   
>>>>>> There is a small window where it's possible that we could be working
>>>>>> on an interrupt (queued in the workqueue) and setting up a channel
>>>>>> program (i.e allocating memory, pinning pages, translating address).
>>>>>> This can lead to allocating and freeing the channel program at the
>>>>>> same time and can cause memory corruption.
>>>>>>
>>>>>> Let's not call cp_free if we are currently processing a channel program.
>>>>>> The only way we know for sure that we don't have a thread setting
>>>>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
>>>>>
>>>>> Can we pinpoint a commit that introduced this bug, or has it been there
>>>>> since the beginning?
>>>>>   
>>>>
>>>> I think the problem was always there.
>>>>   
>>>
>>> I think it became relevant with the async stuff. Because after the async
>>> stuff was added we start getting solicited interrupts that are not about
>>> channel program is done. At least this is how I remember the discussion.
>>>    
>>>>>>
>>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>>> ---
>>>>>>     drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> index 4e3a903..0357165 100644
>>>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>>>>>     		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>>>     	if (scsw_is_solicited(&irb->scsw)) {
>>>>>>     		cp_update_scsw(&private->cp, &irb->scsw);
>>>>>> -		if (is_final)
>>>>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
>>>
>>> Ain't private->state potentially used by multiple threads of execution?
>>
>> yes
>>
>> One of the paths I can think of is a machine check from the host which
>> will ultimately call vfio_ccw_sch_event callback which could set state
>> to NOT_OPER or IDLE.
> 
> Now I went through the machine check rabbit hole because I thought
> freeing the cp in there might be a good idea, but it's not that easy
> (who'd have thought...)

Thanks for taking a deeper look :)

> 
> If I read the POP correctly, an IPI or IPR in the subchannel CRW will
> indicate that the subchannel has been restored to a state after an I/O
> reset; in particular, that means that the subchannel does not have any
> I/O pending. However, that does not seem to be the case e.g. for an IPM
> (the doc does not seem to be very clear on that, though.) We can't
> unconditionally do something, as we do not know what event we're being
> called for (please disregard the positively ancient "we're called for
> IPI" comment in css_process_crw(), I think I added that one in the
> Linux 2.4 or 2.5 timeframe...) tl;dr We can't rely on anything...

Yes, the CRW infrastructure in Linux does not convey the exact event 
back to the subchannel driver.

> 
>>
>>> Do we need to use atomic operations or external synchronization to avoid
>>> this being another gamble? Or am I missing something?
>>
>> I think we probably should think about atomic operations for
>> synchronizing the state (and it could be a separate add on patch?).
> 
> +1 to thinking about some atomicity changes later.
> 
>>
>> But for preventing 2 threads from stomping on the cp the check should be
>> enough, unless I am missing something?
> 
> I think so. Plus, the patch is small enough that we can merge it right
> away, and figure out a more generic change later.

I will send out a v3 soon if no one else has any other suggestions.

> 
>>
>>>    
>>>>>>     			cp_free(&private->cp);
>>>>>>     	}
>>>>>>     	mutex_lock(&private->io_mutex);
>>>>>
>>>>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>>>>>
>>>>>   
>>>> Thanks for reviewing.
>>>>
>>>> Thanks
>>>> Farhan
>>>
>>>    
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-10 16:10             ` Farhan Ali
@ 2019-07-11 12:28               ` Eric Farman
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Farman @ 2019-07-11 12:28 UTC (permalink / raw)
  To: Farhan Ali, Cornelia Huck; +Cc: Halil Pasic, linux-s390, kvm



On 7/10/19 12:10 PM, Farhan Ali wrote:
> 
> 
> On 07/10/2019 09:45 AM, Cornelia Huck wrote:
>> On Tue, 9 Jul 2019 17:27:47 -0400
>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>
>>> On 07/09/2019 10:21 AM, Halil Pasic wrote:
>>>> Do we need to use atomic operations or external synchronization to
>>>> avoid
>>>> this being another gamble? Or am I missing something?
>>>
>>> I think we probably should think about atomic operations for
>>> synchronizing the state (and it could be a separate add on patch?).
>>
>> +1 to thinking about some atomicity changes later.

+1

>>
>>>
>>> But for preventing 2 threads from stomping on the cp the check should be
>>> enough, unless I am missing something?
>>
>> I think so. Plus, the patch is small enough that we can merge it right
>> away, and figure out a more generic change later.
> 
> I will send out a v3 soon if no one else has any other suggestions.
> 

I thumbed through them and think they look good with Conny's
suggestions.  Nothing else jumps to mind for me.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-09 21:27         ` Farhan Ali
  2019-07-10 13:45           ` Cornelia Huck
@ 2019-07-11 14:57           ` Halil Pasic
  2019-07-11 20:09             ` Eric Farman
  1 sibling, 1 reply; 23+ messages in thread
From: Halil Pasic @ 2019-07-11 14:57 UTC (permalink / raw)
  To: Farhan Ali; +Cc: Cornelia Huck, farman, linux-s390, kvm

On Tue, 9 Jul 2019 17:27:47 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> 
> 
> On 07/09/2019 10:21 AM, Halil Pasic wrote:
> > On Tue, 9 Jul 2019 09:46:51 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> > 
> >>
> >>
> >> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
> >>> On Mon,  8 Jul 2019 16:10:37 -0400
> >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>
> >>>> There is a small window where it's possible that we could be working
> >>>> on an interrupt (queued in the workqueue) and setting up a channel
> >>>> program (i.e allocating memory, pinning pages, translating address).
> >>>> This can lead to allocating and freeing the channel program at the
> >>>> same time and can cause memory corruption.
> >>>>
> >>>> Let's not call cp_free if we are currently processing a channel program.
> >>>> The only way we know for sure that we don't have a thread setting
> >>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
> >>>
> >>> Can we pinpoint a commit that introduced this bug, or has it been there
> >>> since the beginning?
> >>>
> >>
> >> I think the problem was always there.
> >>
> > 
> > I think it became relevant with the async stuff. Because after the async
> > stuff was added we start getting solicited interrupts that are not about
> > channel program is done. At least this is how I remember the discussion.
> > 

You seem to have ignored this comment. BTW wasn't the cp->is_initialized
make 'Make it safe to call the cp accessors in any case, so we can call
them unconditionally.'?

@Connie: Your opinion as the author of that patch and of the cited
sentence?

> >>>>
> >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>> ---
> >>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> index 4e3a903..0357165 100644
> >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>>>    	if (scsw_is_solicited(&irb->scsw)) {
> >>>>    		cp_update_scsw(&private->cp, &irb->scsw);
> >>>> -		if (is_final)
> >>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> > 
> > Ain't private->state potentially used by multiple threads of execution?
> 
> yes
> 
> One of the paths I can think of is a machine check from the host which 
> will ultimately call vfio_ccw_sch_event callback which could set state 
> to NOT_OPER or IDLE.
> 
> > Do we need to use atomic operations or external synchronization to avoid
> > this being another gamble? Or am I missing something?
> 
> I think we probably should think about atomic operations for 
> synchronizing the state (and it could be a separate add on patch?).
> 
> But for preventing 2 threads from stomping on the cp the check should be 
> enough, unless I am missing something?
> 

Usually programming languages don't like incorrectly synchronized
programs. One tends to end up in undefined behavior land -- form language
perspective. That doesn't actually mean you are bound to see strange
stuff. With implementation spec + ABI spec + platform/architecture
spec one may end up with things being well defined. But it that is a much
deeper rabbit hole.

The nice thing about condition state == VFIO_CCW_STATE_CP_PENDING is
that it can tolerate stale state values. The bad case at hand
(you free but you should not) would be we see a stale
VFIO_CCW_STATE_CP_PENDING but we are actually
VFIO_CCW_STATE_CP_PROCESSING. That is pretty difficult to imagine
because one can enter VFIO_CCW_STATE_CP_PROCESSING only form
VFIO_CCW_STATE_CP_PENDING afair. On s390x torn reads/writes (i.e.
observing something that ain't either the old nor the new value) on an
int shouldn't be a concern.

The other bad case (where you don't free albeit you should) looks a
bit trickier.

I'm not a fan of keeping races around without good reasons. And I don't
see good reasons here. I'm no fan of needlessly complicated solutions
either.

But seems, at least with my beliefs about races, I'm the oddball
here. 

Regards,
Halil

> > 
> >>>>    			cp_free(&private->cp);
> >>>>    	}
> >>>>    	mutex_lock(&private->io_mutex);
> >>>
> >>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> >>>
> >>>
> >> Thanks for reviewing.
> >>
> >> Thanks
> >> Farhan
> > 
> > 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-11 14:57           ` Halil Pasic
@ 2019-07-11 20:09             ` Eric Farman
  2019-07-12 13:59               ` Halil Pasic
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Farman @ 2019-07-11 20:09 UTC (permalink / raw)
  To: Halil Pasic, Farhan Ali; +Cc: Cornelia Huck, linux-s390, kvm



On 7/11/19 10:57 AM, Halil Pasic wrote:
> On Tue, 9 Jul 2019 17:27:47 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>>
>>
>> On 07/09/2019 10:21 AM, Halil Pasic wrote:
>>> On Tue, 9 Jul 2019 09:46:51 -0400
>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>
>>>>
>>>>
>>>> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
>>>>> On Mon,  8 Jul 2019 16:10:37 -0400
>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>
>>>>>> There is a small window where it's possible that we could be working
>>>>>> on an interrupt (queued in the workqueue) and setting up a channel
>>>>>> program (i.e allocating memory, pinning pages, translating address).
>>>>>> This can lead to allocating and freeing the channel program at the
>>>>>> same time and can cause memory corruption.
>>>>>>
>>>>>> Let's not call cp_free if we are currently processing a channel program.
>>>>>> The only way we know for sure that we don't have a thread setting
>>>>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
>>>>>
>>>>> Can we pinpoint a commit that introduced this bug, or has it been there
>>>>> since the beginning?
>>>>>
>>>>
>>>> I think the problem was always there.
>>>>
>>>
>>> I think it became relevant with the async stuff. Because after the async
>>> stuff was added we start getting solicited interrupts that are not about
>>> channel program is done. At least this is how I remember the discussion.
>>>
> 
> You seem to have ignored this comment. 

I read both comments as being in agreement with one another.  The
problem has always been there, but didn't mean anything until we had
another mechanism (async) to drive additional interrupts.  Hence the v3
patch including the async patch in a Fixes tag.

BTW wasn't the cp->is_initialized
> make 'Make it safe to call the cp accessors in any case, so we can call
> them unconditionally.'?
> 
> @Connie: Your opinion as the author of that patch and of the cited
> sentence?
> >>>>>>
>>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>>> ---
>>>>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> index 4e3a903..0357165 100644
>>>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>>>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>>>    	if (scsw_is_solicited(&irb->scsw)) {
>>>>>>    		cp_update_scsw(&private->cp, &irb->scsw);
>>>>>> -		if (is_final)
>>>>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
>>>
>>> Ain't private->state potentially used by multiple threads of execution?
>>
>> yes
>>
>> One of the paths I can think of is a machine check from the host which 
>> will ultimately call vfio_ccw_sch_event callback which could set state 
>> to NOT_OPER or IDLE.
>>
>>> Do we need to use atomic operations or external synchronization to avoid
>>> this being another gamble? Or am I missing something?
>>
>> I think we probably should think about atomic operations for 
>> synchronizing the state (and it could be a separate add on patch?).
>>
>> But for preventing 2 threads from stomping on the cp the check should be 
>> enough, unless I am missing something?
>>
> 
> Usually programming languages don't like incorrectly synchronized
> programs. One tends to end up in undefined behavior land -- form language
> perspective. That doesn't actually mean you are bound to see strange
> stuff. With implementation spec + ABI spec + platform/architecture
> spec one may end up with things being well defined. But it that is a much
> deeper rabbit hole.
> 
> The nice thing about condition state == VFIO_CCW_STATE_CP_PENDING is
> that it can tolerate stale state values. The bad case at hand
> (you free but you should not) would be we see a stale
> VFIO_CCW_STATE_CP_PENDING but we are actually
> VFIO_CCW_STATE_CP_PROCESSING. That is pretty difficult to imagine
> because one can enter VFIO_CCW_STATE_CP_PROCESSING only form
> VFIO_CCW_STATE_CP_PENDING afair. 

I think you're backwards here.  The path is IDLE -> CP_PROCESSING ->
(CP_PENDING | IDLE)

On s390x torn reads/writes (i.e.
> observing something that ain't either the old nor the new value) on an
> int shouldn't be a concern.
> 
> The other bad case (where you don't free albeit you should) looks a
> bit trickier.

I'm afraid I don't understand your intention with the above paragraphs.  :(

> 
> I'm not a fan of keeping races around without good reasons. And I don't
> see good reasons here. I'm no fan of needlessly complicated solutions
> either.
> 
> But seems, at least with my beliefs about races, I'm the oddball
> here. 

The "race" here is that we have one synchronous operation (SSCH) and two
asynchronous operations (HSCH, CSCH), both of which interact with one
another and generate interrupts that pass through this chunk of code.

I have not fully considered this patch yet, but the race is a concern to
all of us oddballs.  I have not chimed in any great detail because I
only got through the first couple patches in v1 before going on holiday,
and the discussions on v1/v2 are numerous.

 - Eric

> 
> Regards,
> Halil
> 
>>>
>>>>>>    			cp_free(&private->cp);
>>>>>>    	}
>>>>>>    	mutex_lock(&private->io_mutex);
>>>>>
>>>>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>>>>>
>>>>>
>>>> Thanks for reviewing.
>>>>
>>>> Thanks
>>>> Farhan
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
  2019-07-11 20:09             ` Eric Farman
@ 2019-07-12 13:59               ` Halil Pasic
  0 siblings, 0 replies; 23+ messages in thread
From: Halil Pasic @ 2019-07-12 13:59 UTC (permalink / raw)
  To: Eric Farman; +Cc: Farhan Ali, Cornelia Huck, linux-s390, kvm

On Thu, 11 Jul 2019 16:09:22 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> 
> 
> On 7/11/19 10:57 AM, Halil Pasic wrote:
> > On Tue, 9 Jul 2019 17:27:47 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> > 
> >>
> >>
> >> On 07/09/2019 10:21 AM, Halil Pasic wrote:
> >>> On Tue, 9 Jul 2019 09:46:51 -0400
> >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
> >>>>> On Mon,  8 Jul 2019 16:10:37 -0400
> >>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>>>
> >>>>>> There is a small window where it's possible that we could be working
> >>>>>> on an interrupt (queued in the workqueue) and setting up a channel
> >>>>>> program (i.e allocating memory, pinning pages, translating address).
> >>>>>> This can lead to allocating and freeing the channel program at the
> >>>>>> same time and can cause memory corruption.
> >>>>>>
> >>>>>> Let's not call cp_free if we are currently processing a channel program.
> >>>>>> The only way we know for sure that we don't have a thread setting
> >>>>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
> >>>>>
> >>>>> Can we pinpoint a commit that introduced this bug, or has it been there
> >>>>> since the beginning?
> >>>>>
> >>>>
> >>>> I think the problem was always there.
> >>>>
> >>>
> >>> I think it became relevant with the async stuff. Because after the async
> >>> stuff was added we start getting solicited interrupts that are not about
> >>> channel program is done. At least this is how I remember the discussion.
> >>>
> > 
> > You seem to have ignored this comment. 
> 
> I read both comments as being in agreement with one another.

Which both comments do you see in agreement? The one that states 'was
always there' and the one that states 'was introduced by the async
series'?

> The
> problem has always been there, but didn't mean anything until we had
> another mechanism (async) to drive additional interrupts.  Hence the v3
> patch including the async patch in a Fixes tag.
> 

Sorry, when I started writing this response, there was no v3 out yet.
Later I've seen the Fixes tag in v3. I'm not sure it is the correct one
though. Hence my question about cp->is_initialized.

> > BTW wasn't the cp->is_initialized
> > make 'Make it safe to call the cp accessors in any case, so we can call
> > them unconditionally.'?
> > 
> > @Connie: Your opinion as the author of that patch and of the cited
> > sentence?
> > >>>>>>
> >>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>>>> ---
> >>>>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> >>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> >>>>>> index 4e3a903..0357165 100644
> >>>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >>>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >>>>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >>>>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>>>>>    	if (scsw_is_solicited(&irb->scsw)) {
> >>>>>>    		cp_update_scsw(&private->cp, &irb->scsw);

<BACKREF_1>

> >>>>>> -		if (is_final)
> >>>>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> >>>
> >>> Ain't private->state potentially used by multiple threads of execution?
> >>
> >> yes
> >>

</BACKREF_1>

> >> One of the paths I can think of is a machine check from the host which 
> >> will ultimately call vfio_ccw_sch_event callback which could set state 
> >> to NOT_OPER or IDLE.
> >>

<BACKREF_2>

> >>> Do we need to use atomic operations or external synchronization to avoid
> >>> this being another gamble? Or am I missing something?
> >>
> >> I think we probably should think about atomic operations for 
> >> synchronizing the state (and it could be a separate add on patch?).
> >>

</BACKREF_2>

> >> But for preventing 2 threads from stomping on the cp the check should be 
> >> enough, unless I am missing something?
> >>
> > 
> > Usually programming languages don't like incorrectly synchronized
> > programs. One tends to end up in undefined behavior land -- form language
> > perspective. That doesn't actually mean you are bound to see strange
> > stuff. With implementation spec + ABI spec + platform/architecture
> > spec one may end up with things being well defined. But it that is a much
> > deeper rabbit hole.
> > 
> > The nice thing about condition state == VFIO_CCW_STATE_CP_PENDING is
> > that it can tolerate stale state values. The bad case at hand
> > (you free but you should not) would be we see a stale
> > VFIO_CCW_STATE_CP_PENDING but we are actually
> > VFIO_CCW_STATE_CP_PROCESSING. That is pretty difficult to imagine
> > because one can enter VFIO_CCW_STATE_CP_PROCESSING only form
> > VFIO_CCW_STATE_CP_PENDING afair. 
> 
> I think you're backwards here.  The path is IDLE -> CP_PROCESSING ->
> (CP_PENDING | IDLE)

That is what I tried to say. The backwards twist probably comes from
the fact that I'm discussing what can happen if we read stale value
from state.

> 
> > On s390x torn reads/writes (i.e.
> > observing something that ain't either the old nor the new value) on an
> > int shouldn't be a concern.
> > 
> > The other bad case (where you don't free albeit you should) looks a
> > bit trickier.
> 
> I'm afraid I don't understand your intention with the above paragraphs.  :(
> 

All three paragraphs are about discussing what can happen or can not
happen in practice. The paragraph before those three is about the so
called academic aspects.

> > 
> > I'm not a fan of keeping races around without good reasons. And I don't
> > see good reasons here. I'm no fan of needlessly complicated solutions
> > either.
> > 
> > But seems, at least with my beliefs about races, I'm the oddball
> > here. 
> 
> The "race" here is that we have one synchronous operation (SSCH) and two
> asynchronous operations (HSCH, CSCH), both of which interact with one
> another and generate interrupts that pass through this chunk of code.
> 

I don't agree. Please consider the stuff between the <BACKREF_[12]> tags.
In my reading Farhan agrees that we have a data race on private->state.

If you did not get that, no wonder my email makes little sense.

> I have not fully considered this patch yet, but the race is a concern to
> all of us oddballs.  

Yes, but seems to a different extent. The rest of the guys are fine with
just plainly accessing private->state in <BACKREF_1>, and do different
logic based on the value, even though nobody seems to argue that the
accesses to private->state involve race.

> I have not chimed in any great detail because I
> only got through the first couple patches in v1 before going on holiday,
> and the discussions on v1/v2 are numerous.

My email was mostly addressed to Farhan, the author of the patch. No
need to for apologies. :)

Regards,
Halil


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-07-12 13:59 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-08 20:10 [RFC v2 0/5] Some vfio-ccw fixes Farhan Ali
2019-07-08 20:10 ` [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64 Farhan Ali
2019-07-09  9:57   ` Cornelia Huck
2019-07-08 20:10 ` [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init Farhan Ali
2019-07-09 10:06   ` Cornelia Huck
2019-07-09 14:07     ` Farhan Ali
2019-07-09 14:18       ` Cornelia Huck
2019-07-08 20:10 ` [RFC v2 3/5] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn Farhan Ali
2019-07-09 10:08   ` Cornelia Huck
2019-07-08 20:10 ` [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program Farhan Ali
2019-07-09 10:16   ` Cornelia Huck
2019-07-09 13:46     ` Farhan Ali
2019-07-09 14:21       ` Halil Pasic
2019-07-09 21:27         ` Farhan Ali
2019-07-10 13:45           ` Cornelia Huck
2019-07-10 16:10             ` Farhan Ali
2019-07-11 12:28               ` Eric Farman
2019-07-11 14:57           ` Halil Pasic
2019-07-11 20:09             ` Eric Farman
2019-07-12 13:59               ` Halil Pasic
2019-07-08 20:10 ` [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch Farhan Ali
2019-07-09 10:14   ` Cornelia Huck
2019-07-09 12:47     ` Farhan Ali

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).