All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
@ 2021-05-11 19:56 Eric Farman
  2021-05-11 19:56 ` [PATCH v6 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Eric Farman @ 2021-05-11 19:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

Hi Conny, Matt, Halil,

Here's one (last?) update to my proposal for handling the collision
between interrupts for START SUBCHANNEL and HALT/CLEAR SUBCHANNEL.

Only change here is to include Conny's suggestions on patch 3.

Thanks,
Eric

Changelog:
v5->v6:
 - Add a block comment and rename variable in patch 3 [CH]
 - Drop RFC tag [EF]

v4->v5:
 - Applied Conny's r-b to patches 1 and 3
 - Dropped patch 2 and 4
 - Use a "finished" flag in the interrupt completion path

Previous versions:
v5: https://lore.kernel.org/kvm/20210510205646.1845844-1-farman@linux.ibm.com/
v4: https://lore.kernel.org/kvm/20210413182410.1396170-1-farman@linux.ibm.com/
v3: https://lore.kernel.org/kvm/20200616195053.99253-1-farman@linux.ibm.com/
v2: https://lore.kernel.org/kvm/20200513142934.28788-1-farman@linux.ibm.com/
v1: https://lore.kernel.org/kvm/20200124145455.51181-1-farman@linux.ibm.com/

Eric Farman (3):
  vfio-ccw: Check initialized flag in cp_init()
  vfio-ccw: Reset FSM state to IDLE inside FSM
  vfio-ccw: Serialize FSM IDLE state with I/O completion

 drivers/s390/cio/vfio_ccw_cp.c  |  4 ++++
 drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
 drivers/s390/cio/vfio_ccw_fsm.c |  1 +
 drivers/s390/cio/vfio_ccw_ops.c |  2 --
 4 files changed, 15 insertions(+), 4 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v6 1/3] vfio-ccw: Check initialized flag in cp_init()
  2021-05-11 19:56 [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
@ 2021-05-11 19:56 ` Eric Farman
  2021-05-11 20:10   ` Matthew Rosato
  2021-05-11 19:56 ` [PATCH v6 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Farman @ 2021-05-11 19:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

We have a really nice flag in the channel_program struct that
indicates if it had been initialized by cp_init(), and use it
as a guard in the other cp accessor routines, but not for a
duplicate call into cp_init(). The possibility of this occurring
is low, because that flow is protected by the private->io_mutex
and FSM CP_PROCESSING state. But then why bother checking it
in (for example) cp_prefetch() then?

Let's just be consistent and check for that in cp_init() too.

Fixes: 71189f263f8a3 ("vfio-ccw: make it safe to access channel programs")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index b9febc581b1f..8d1b2771c1aa 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -638,6 +638,10 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 1);
 	int ret;
 
+	/* this is an error in the caller */
+	if (cp->initialized)
+		return -EBUSY;
+
 	/*
 	 * We only support prefetching the channel program. We assume all channel
 	 * programs executed by supported guests likewise support prefetching.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM
  2021-05-11 19:56 [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
  2021-05-11 19:56 ` [PATCH v6 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
@ 2021-05-11 19:56 ` Eric Farman
  2021-05-11 20:38   ` Matthew Rosato
  2021-05-11 19:56 ` [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Eric Farman @ 2021-05-11 19:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

When an I/O request is made, the fsm_io_request() routine
moves the FSM state from IDLE to CP_PROCESSING, and then
fsm_io_helper() moves it to CP_PENDING if the START SUBCHANNEL
received a cc0. Yet, the error case to go from CP_PROCESSING
back to IDLE is done after the FSM call returns.

Let's move this up into the FSM proper, to provide some
better symmetry when unwinding in this case.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_fsm.c | 1 +
 drivers/s390/cio/vfio_ccw_ops.c | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index 23e61aa638e4..e435a9cd92da 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -318,6 +318,7 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 	}
 
 err_out:
+	private->state = VFIO_CCW_STATE_IDLE;
 	trace_vfio_ccw_fsm_io_request(scsw->cmd.fctl, schid,
 				      io_region->ret_code, errstr);
 }
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 767ac41686fe..5971641964c6 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -276,8 +276,6 @@ static ssize_t vfio_ccw_mdev_write_io_region(struct vfio_ccw_private *private,
 	}
 
 	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
-	if (region->ret_code != 0)
-		private->state = VFIO_CCW_STATE_IDLE;
 	ret = (region->ret_code != 0) ? region->ret_code : count;
 
 out_unlock:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion
  2021-05-11 19:56 [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
  2021-05-11 19:56 ` [PATCH v6 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
  2021-05-11 19:56 ` [PATCH v6 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
@ 2021-05-11 19:56 ` Eric Farman
  2021-05-11 20:45   ` Matthew Rosato
  2021-05-12 10:49   ` Cornelia Huck
  2021-05-13  1:05 ` [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Halil Pasic
  2021-05-18  9:49 ` Cornelia Huck
  4 siblings, 2 replies; 12+ messages in thread
From: Eric Farman @ 2021-05-11 19:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

Today, the stacked call to vfio_ccw_sch_io_todo() does three things:

  1) Update a solicited IRB with CP information, and release the CP
     if the interrupt was the end of a START operation.
  2) Copy the IRB data into the io_region, under the protection of
     the io_mutex
  3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
     vfio-ccw can accept more work.

The trouble is that step 3 is (A) invoked for both solicited and
unsolicited interrupts, and (B) sitting after the mutex for step 2.
This second piece becomes a problem if it processes an interrupt
for a CLEAR SUBCHANNEL while another thread initiates a START,
thus allowing the CP and FSM states to get out of sync. That is:

    CPU 1                           CPU 2
    fsm_do_clear()
    fsm_irq()
                                    fsm_io_request()
    vfio_ccw_sch_io_todo()
                                    fsm_io_helper()

Since the FSM state and CP should be kept in sync, let's make a
note when the CP is released, and rely on that as an indication
that the FSM should also be reset at the end of this routine and
open up the device for more work.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 8c625b530035..9b61e9b131ad 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -86,6 +86,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 	struct vfio_ccw_private *private;
 	struct irb *irb;
 	bool is_final;
+	bool cp_is_finished = false;
 
 	private = container_of(work, struct vfio_ccw_private, io_work);
 	irb = &private->irb;
@@ -94,14 +95,21 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
+		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
 			cp_free(&private->cp);
+			cp_is_finished = true;
+		}
 	}
 	mutex_lock(&private->io_mutex);
 	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
 	mutex_unlock(&private->io_mutex);
 
-	if (private->mdev && is_final)
+	/*
+	 * Reset to IDLE only if processing of a channel program
+	 * has finished. Do not overwrite a possible processing
+	 * state if the final interrupt was for HSCH or CSCH.
+	 */
+	if (private->mdev && cp_is_finished)
 		private->state = VFIO_CCW_STATE_IDLE;
 
 	if (private->io_trigger)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/3] vfio-ccw: Check initialized flag in cp_init()
  2021-05-11 19:56 ` [PATCH v6 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
@ 2021-05-11 20:10   ` Matthew Rosato
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Rosato @ 2021-05-11 20:10 UTC (permalink / raw)
  To: Eric Farman, Cornelia Huck, Halil Pasic; +Cc: Jared Rossi, linux-s390, kvm

On 5/11/21 3:56 PM, Eric Farman wrote:
> We have a really nice flag in the channel_program struct that
> indicates if it had been initialized by cp_init(), and use it
> as a guard in the other cp accessor routines, but not for a
> duplicate call into cp_init(). The possibility of this occurring
> is low, because that flow is protected by the private->io_mutex
> and FSM CP_PROCESSING state. But then why bother checking it
> in (for example) cp_prefetch() then?
> 
> Let's just be consistent and check for that in cp_init() too.
> 
> Fixes: 71189f263f8a3 ("vfio-ccw: make it safe to access channel programs")
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>

Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>

> ---
>   drivers/s390/cio/vfio_ccw_cp.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index b9febc581b1f..8d1b2771c1aa 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -638,6 +638,10 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>   	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 1);
>   	int ret;
>   
> +	/* this is an error in the caller */
> +	if (cp->initialized)
> +		return -EBUSY;
> +
>   	/*
>   	 * We only support prefetching the channel program. We assume all channel
>   	 * programs executed by supported guests likewise support prefetching.
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM
  2021-05-11 19:56 ` [PATCH v6 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
@ 2021-05-11 20:38   ` Matthew Rosato
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Rosato @ 2021-05-11 20:38 UTC (permalink / raw)
  To: Eric Farman, Cornelia Huck, Halil Pasic; +Cc: Jared Rossi, linux-s390, kvm

On 5/11/21 3:56 PM, Eric Farman wrote:
> When an I/O request is made, the fsm_io_request() routine
> moves the FSM state from IDLE to CP_PROCESSING, and then
> fsm_io_helper() moves it to CP_PENDING if the START SUBCHANNEL
> received a cc0. Yet, the error case to go from CP_PROCESSING
> back to IDLE is done after the FSM call returns.
> 
> Let's move this up into the FSM proper, to provide some
> better symmetry when unwinding in this case.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>

Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>

> ---
>   drivers/s390/cio/vfio_ccw_fsm.c | 1 +
>   drivers/s390/cio/vfio_ccw_ops.c | 2 --
>   2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index 23e61aa638e4..e435a9cd92da 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -318,6 +318,7 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>   	}
>   
>   err_out:
> +	private->state = VFIO_CCW_STATE_IDLE;
>   	trace_vfio_ccw_fsm_io_request(scsw->cmd.fctl, schid,
>   				      io_region->ret_code, errstr);
>   }
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 767ac41686fe..5971641964c6 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -276,8 +276,6 @@ static ssize_t vfio_ccw_mdev_write_io_region(struct vfio_ccw_private *private,
>   	}
>   
>   	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
> -	if (region->ret_code != 0)
> -		private->state = VFIO_CCW_STATE_IDLE;
>   	ret = (region->ret_code != 0) ? region->ret_code : count;
>   
>   out_unlock:
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion
  2021-05-11 19:56 ` [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
@ 2021-05-11 20:45   ` Matthew Rosato
  2021-05-12 10:49   ` Cornelia Huck
  1 sibling, 0 replies; 12+ messages in thread
From: Matthew Rosato @ 2021-05-11 20:45 UTC (permalink / raw)
  To: Eric Farman, Cornelia Huck, Halil Pasic; +Cc: Jared Rossi, linux-s390, kvm

On 5/11/21 3:56 PM, Eric Farman wrote:
> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> 
>    1) Update a solicited IRB with CP information, and release the CP
>       if the interrupt was the end of a START operation.
>    2) Copy the IRB data into the io_region, under the protection of
>       the io_mutex
>    3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>       vfio-ccw can accept more work.
> 
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
> 
>      CPU 1                           CPU 2
>      fsm_do_clear()
>      fsm_irq()
>                                      fsm_io_request()
>      vfio_ccw_sch_io_todo()
>                                      fsm_io_helper()
> 
> Since the FSM state and CP should be kept in sync, let's make a
> note when the CP is released, and rely on that as an indication
> that the FSM should also be reset at the end of this routine and
> open up the device for more work.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>

Thanks for the detailed commit message and comment block -- this makes 
sense to me.

Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>

> ---
>   drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 8c625b530035..9b61e9b131ad 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -86,6 +86,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>   	struct vfio_ccw_private *private;
>   	struct irb *irb;
>   	bool is_final;
> +	bool cp_is_finished = false;
>   
>   	private = container_of(work, struct vfio_ccw_private, io_work);
>   	irb = &private->irb;
> @@ -94,14 +95,21 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>   		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>   	if (scsw_is_solicited(&irb->scsw)) {
>   		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
>   			cp_free(&private->cp);
> +			cp_is_finished = true;
> +		}
>   	}
>   	mutex_lock(&private->io_mutex);
>   	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>   	mutex_unlock(&private->io_mutex);
>   
> -	if (private->mdev && is_final)
> +	/*
> +	 * Reset to IDLE only if processing of a channel program
> +	 * has finished. Do not overwrite a possible processing
> +	 * state if the final interrupt was for HSCH or CSCH.
> +	 */
> +	if (private->mdev && cp_is_finished)
>   		private->state = VFIO_CCW_STATE_IDLE;
>   
>   	if (private->io_trigger)
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion
  2021-05-11 19:56 ` [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
  2021-05-11 20:45   ` Matthew Rosato
@ 2021-05-12 10:49   ` Cornelia Huck
  1 sibling, 0 replies; 12+ messages in thread
From: Cornelia Huck @ 2021-05-12 10:49 UTC (permalink / raw)
  To: Eric Farman; +Cc: Matthew Rosato, Halil Pasic, Jared Rossi, linux-s390, kvm

On Tue, 11 May 2021 21:56:31 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> 
>   1) Update a solicited IRB with CP information, and release the CP
>      if the interrupt was the end of a START operation.
>   2) Copy the IRB data into the io_region, under the protection of
>      the io_mutex
>   3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>      vfio-ccw can accept more work.
> 
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
> 
>     CPU 1                           CPU 2
>     fsm_do_clear()
>     fsm_irq()
>                                     fsm_io_request()
>     vfio_ccw_sch_io_todo()
>                                     fsm_io_helper()
> 
> Since the FSM state and CP should be kept in sync, let's make a
> note when the CP is released, and rely on that as an indication
> that the FSM should also be reset at the end of this routine and
> open up the device for more work.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)

Looking good :)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
  2021-05-11 19:56 [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
                   ` (2 preceding siblings ...)
  2021-05-11 19:56 ` [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
@ 2021-05-13  1:05 ` Halil Pasic
  2021-05-13 18:33   ` Eric Farman
  2021-05-18  9:49 ` Cornelia Huck
  4 siblings, 1 reply; 12+ messages in thread
From: Halil Pasic @ 2021-05-13  1:05 UTC (permalink / raw)
  To: Eric Farman; +Cc: Cornelia Huck, Matthew Rosato, Jared Rossi, linux-s390, kvm

On Tue, 11 May 2021 21:56:28 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Hi Conny, Matt, Halil,
> 
> Here's one (last?) update to my proposal for handling the collision
> between interrupts for START SUBCHANNEL and HALT/CLEAR SUBCHANNEL.
> 
> Only change here is to include Conny's suggestions on patch 3.
> 
> Thanks,

I believe these changes are beneficial, although I don't understand
everything about them. In that sense I'm happy with the these getting
merged.

Let me also spend some words answering the unasked question, what I'm
not understanding about these.

Not understanding how the problem stated in the cover letter of v4 is
actually resolved is certainly the most important one. Let me cite
the relevant part of it (your cover letter already contains a link to
the full version).

"""

	CPU 1			CPU 2
 1	CLEAR SUBCHANNEL
 2	fsm_irq()
 3				START SUBCHANNEL
 4	vfio_ccw_sch_io_todo()
 5				fsm_irq()
 6				vfio_ccw_sch_io_todo()

From the channel subsystem's point of view the CLEAR SUBCHANNEL (step 1)
is complete once step 2 is called, as the Interrupt Response Block (IRB)
has been presented and the TEST SUBCHANNEL was driven by the cio layer.
Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets a cc=0 to
indicate the I/O was accepted. However, step 2 stacks the bulk of the
actual work onto a workqueue for when the subchannel lock is NOT held,
and is unqueued at step 4. That code misidentifies the data in the IRB
as being associated with the newly active I/O, and may release memory
that is actively in use by the channel subsystem and/or device. Eww.
"""

The last sentence clearly states "may release memory that is actively
used by ... the device", and I understood it refers to the invocation
of cp_free() from vfio_ccw_sch_io_todo(). Patch 3 of this series does
not change the conditions under which cp_free() is called.

Looking at the cited diagram, since patch 3 changes things in
vfio_ccw_sch_io_todo() it probably ain't affecting steps 1-3 and
I understood the description so that bad free happens in step 4.

My guess is that your change from patch 3 somehow via the fsm prevents
the SSCH on CPU 2 (using the diagram) from being executed  if it actually
happens to be after vfio_ccw_sch_io_todo(). And patch 1 is supposed to
prevent the SSCH on CPU2 from being executed in the depicted case because
if there is a cp to free, then we would bail out form if we see it
while processing the new IO request.

In any case, I don't want to hold this up any further.

Regards,
Halil

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
  2021-05-13  1:05 ` [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Halil Pasic
@ 2021-05-13 18:33   ` Eric Farman
  2021-05-14  0:29     ` Halil Pasic
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Farman @ 2021-05-13 18:33 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Cornelia Huck, Matthew Rosato, Jared Rossi, linux-s390, kvm

On Thu, 2021-05-13 at 03:05 +0200, Halil Pasic wrote:
> On Tue, 11 May 2021 21:56:28 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > Hi Conny, Matt, Halil,
> > 
> > Here's one (last?) update to my proposal for handling the collision
> > between interrupts for START SUBCHANNEL and HALT/CLEAR SUBCHANNEL.
> > 
> > Only change here is to include Conny's suggestions on patch 3.
> > 
> > Thanks,
> 
> I believe these changes are beneficial, although I don't understand
> everything about them. In that sense I'm happy with the these getting
> merged.
> 
> Let me also spend some words answering the unasked question, what I'm
> not understanding about these.
> 
> Not understanding how the problem stated in the cover letter of v4 is
> actually resolved is certainly the most important one. 

Per our phone call last week, one of Conny's suggestions from that
particular version was related to vfio_ccw_sch_io_todo() and was giving
me some difficulties. We all agreed that I should send what I had, and
leave the other corner case(s) to be addressed later along with the
broader serialization topic throughout the driver. That is still my
intention, but I suspect that's where you are going here...

(I realize I said "last?" at the top here. Poor decision on my part.)

> Let me cite
> the relevant part of it (your cover letter already contains a link to
> the full version).
> 
> """
> 
> 	CPU 1			CPU 2
>  1	CLEAR SUBCHANNEL
>  2	fsm_irq()
>  3				START SUBCHANNEL
>  4	vfio_ccw_sch_io_todo()
>  5				fsm_irq()
>  6				vfio_ccw_sch_io_todo()
> 
> From the channel subsystem's point of view the CLEAR SUBCHANNEL (step
> 1)
> is complete once step 2 is called, as the Interrupt Response Block
> (IRB)
> has been presented and the TEST SUBCHANNEL was driven by the cio
> layer.
> Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets a cc=0
> to
> indicate the I/O was accepted. However, step 2 stacks the bulk of the
> actual work onto a workqueue for when the subchannel lock is NOT
> held,
> and is unqueued at step 4. That code misidentifies the data in the
> IRB
> as being associated with the newly active I/O, and may release memory
> that is actively in use by the channel subsystem and/or device. Eww.
> """
> 
> The last sentence clearly states "may release memory that is actively
> used by ... the device", and I understood it refers to the invocation
> of cp_free() from vfio_ccw_sch_io_todo(). Patch 3 of this series does
> not change the conditions under which cp_free() is called.

Correct.

> 
> Looking at the cited diagram, since patch 3 changes things in
> vfio_ccw_sch_io_todo() it probably ain't affecting steps 1-3 and
> I understood the description so that bad free happens in step 4.

You are correct that patch 3 touches vfio_ccw_sch_io_todo(), but it is
not addressing the possibility of a bad free described in the old cover
letter. The commit message for patch 3 describes pretty clearly the
scenario in question.

> 
> My guess is that your change from patch 3 somehow via the fsm
> prevents
> the SSCH on CPU 2 (using the diagram) from being executed  if it
> actually
> happens to be after vfio_ccw_sch_io_todo(). 

That's an incorrect guess. The code in vfio_ccw_sch_io_todo() today
says "If another CPU is building an I/O (FSM is CP_PROCESSING), or
there is no CPU building an I/O (FSM is IDLE), then skip the cp_free()
call." The change in patch 3 says that in that situation, it should
also not adjust the FSM state because the interrupt being handled on
CPU1 was unrelated (maybe it was for a HALT/CLEAR, maybe it was an
unsolicited interrupt). The SSCH on CPU2 will still go on as expected.

> And patch 1 is supposed to
> prevent the SSCH on CPU2 from being executed in the depicted case
> because
> if there is a cp to free, then we would bail out form if we see it
> while processing the new IO request.

Not really. It's the FSM's job to prevent a second SSCH, and route to
fsm_io_retry() or fsm_io_busy() as appropriate. But the scenario
described by patch 3 in this series would leave the cp initialized,
while also resetting the FSM back to IDLE. As such, the FSM was free to
allow another SSCH in, which would then re-initialize the cp and orphan
the existing (active) cp resources.

With the application of patch 3, that concern isn't present, so the
change in patch 1 is really a NOP. But it allows for consistency in how
the cp_*() functions are working, and a safety valve should this
situation show up another way. (We'll get trace data that says
cp_init() bailed out, rather than going on as if nothing were wrong.)

> 
> In any case, I don't want to hold this up any further.
> 

Thanks for that. You are correct that there's still a potential issue
here, in the handoff between fsm_irq() and vfio_ccw_sch_io_todo(), and
another fsm_io_request() that would arrive between those points. But
it's not anything that we haven't already discussed, and will hopefully
begin discussing in the next couple of weeks.

Thanks,
Eric

> Regards,
> Halil


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
  2021-05-13 18:33   ` Eric Farman
@ 2021-05-14  0:29     ` Halil Pasic
  0 siblings, 0 replies; 12+ messages in thread
From: Halil Pasic @ 2021-05-14  0:29 UTC (permalink / raw)
  To: Eric Farman; +Cc: Cornelia Huck, Matthew Rosato, Jared Rossi, linux-s390, kvm

On Thu, 13 May 2021 14:33:20 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> > 
> > In any case, I don't want to hold this up any further.
> >   
> 
> Thanks for that. You are correct that there's still a potential issue
> here, in the handoff between fsm_irq() and vfio_ccw_sch_io_todo(), and
> another fsm_io_request() that would arrive between those points. But
> it's not anything that we haven't already discussed, and will hopefully
> begin discussing in the next couple of weeks.

Thanks for all the explanations and your patience. I know, I can be
difficult when I'm at discomfort due to dissonances in my mental model
of a certain problem or a certain solution. Will try to carve out some
time to at least have a look at those as well.

Have a nice weekend!

Halil 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
  2021-05-11 19:56 [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
                   ` (3 preceding siblings ...)
  2021-05-13  1:05 ` [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Halil Pasic
@ 2021-05-18  9:49 ` Cornelia Huck
  4 siblings, 0 replies; 12+ messages in thread
From: Cornelia Huck @ 2021-05-18  9:49 UTC (permalink / raw)
  To: Eric Farman; +Cc: Matthew Rosato, Halil Pasic, Jared Rossi, linux-s390, kvm

On Tue, 11 May 2021 21:56:28 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Hi Conny, Matt, Halil,
> 
> Here's one (last?) update to my proposal for handling the collision
> between interrupts for START SUBCHANNEL and HALT/CLEAR SUBCHANNEL.
> 
> Only change here is to include Conny's suggestions on patch 3.
> 
> Thanks,
> Eric
> 
> Changelog:
> v5->v6:
>  - Add a block comment and rename variable in patch 3 [CH]
>  - Drop RFC tag [EF]
> 
> v4->v5:
>  - Applied Conny's r-b to patches 1 and 3
>  - Dropped patch 2 and 4
>  - Use a "finished" flag in the interrupt completion path
> 
> Previous versions:
> v5: https://lore.kernel.org/kvm/20210510205646.1845844-1-farman@linux.ibm.com/
> v4: https://lore.kernel.org/kvm/20210413182410.1396170-1-farman@linux.ibm.com/
> v3: https://lore.kernel.org/kvm/20200616195053.99253-1-farman@linux.ibm.com/
> v2: https://lore.kernel.org/kvm/20200513142934.28788-1-farman@linux.ibm.com/
> v1: https://lore.kernel.org/kvm/20200124145455.51181-1-farman@linux.ibm.com/
> 
> Eric Farman (3):
>   vfio-ccw: Check initialized flag in cp_init()
>   vfio-ccw: Reset FSM state to IDLE inside FSM
>   vfio-ccw: Serialize FSM IDLE state with I/O completion
> 
>  drivers/s390/cio/vfio_ccw_cp.c  |  4 ++++
>  drivers/s390/cio/vfio_ccw_drv.c | 12 ++++++++++--
>  drivers/s390/cio/vfio_ccw_fsm.c |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c |  2 --
>  4 files changed, 15 insertions(+), 4 deletions(-)
> 

Thanks, applied.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-05-18  9:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-11 19:56 [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2021-05-11 19:56 ` [PATCH v6 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
2021-05-11 20:10   ` Matthew Rosato
2021-05-11 19:56 ` [PATCH v6 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
2021-05-11 20:38   ` Matthew Rosato
2021-05-11 19:56 ` [PATCH v6 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
2021-05-11 20:45   ` Matthew Rosato
2021-05-12 10:49   ` Cornelia Huck
2021-05-13  1:05 ` [PATCH v6 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Halil Pasic
2021-05-13 18:33   ` Eric Farman
2021-05-14  0:29     ` Halil Pasic
2021-05-18  9:49 ` Cornelia Huck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.