kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v5 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
@ 2021-05-10 20:56 Eric Farman
  2021-05-10 20:56 ` [RFC PATCH v5 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eric Farman @ 2021-05-10 20:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

Hi Conny, Matt, Halil,

Here's the update to my proposed series for handling the collision
between interrupts for START SUBCHANNEL and HALT/CLEAR SUBCHANNEL.
I'm feeling more confident in the state of them now based on the
discussion on v4, so will keep the cover letter brief. :)

I carried patches 1 and 3 from the last version forward, as patch
1 and 2 here. (Thanks, Conny, for the r-b's on them.)

I dropped patches 2 and 4 from the last version, as part of this
newest attempt.  The conversation on patch 4 [1] has formed into
the new patch 3.

As we'd discussed offline last week, I still have the todo for
a more proper audit of the serialization across these codepaths.
But this seems a better, simpler, fix for the code in its current
form, which addresses my problematic test case and does not
impact my usual regression tests. Any further rework for the
serialization [2] will be more invasive, and take a bit longer.

Thanks,
Eric

Changelog:
v4->v5:
 - Applied Conny's r-b to patches 1 and 3
 - Dropped patch 2 and 4
 - Use a "finished" flag in the interrupt completion path

Previous versions:
v4: https://lore.kernel.org/kvm/20210413182410.1396170-1-farman@linux.ibm.com/
v3: https://lore.kernel.org/kvm/20200616195053.99253-1-farman@linux.ibm.com/
v2: https://lore.kernel.org/kvm/20200513142934.28788-1-farman@linux.ibm.com/
v1: https://lore.kernel.org/kvm/20200124145455.51181-1-farman@linux.ibm.com/

References:
[1] https://lore.kernel.org/kvm/2c1c1e73d488673ec39d7c085a343cbd6b50fb41.camel@linux.ibm.com/
[2] https://lore.kernel.org/kvm/20210416164137.23f4631b.cohuck@redhat.com/

Eric Farman (3):
  vfio-ccw: Check initialized flag in cp_init()
  vfio-ccw: Reset FSM state to IDLE inside FSM
  vfio-ccw: Serialize FSM IDLE state with I/O completion

 drivers/s390/cio/vfio_ccw_cp.c  | 4 ++++
 drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
 drivers/s390/cio/vfio_ccw_fsm.c | 1 +
 drivers/s390/cio/vfio_ccw_ops.c | 2 --
 4 files changed, 10 insertions(+), 5 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH v5 1/3] vfio-ccw: Check initialized flag in cp_init()
  2021-05-10 20:56 [RFC PATCH v5 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
@ 2021-05-10 20:56 ` Eric Farman
  2021-05-10 20:56 ` [RFC PATCH v5 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
  2021-05-10 20:56 ` [RFC PATCH v5 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Farman @ 2021-05-10 20:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

We have a really nice flag in the channel_program struct that
indicates if it had been initialized by cp_init(), and use it
as a guard in the other cp accessor routines, but not for a
duplicate call into cp_init(). The possibility of this occurring
is low, because that flow is protected by the private->io_mutex
and FSM CP_PROCESSING state. But then why bother checking it
in (for example) cp_prefetch() then?

Let's just be consistent and check for that in cp_init() too.

Fixes: 71189f263f8a3 ("vfio-ccw: make it safe to access channel programs")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_cp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index b9febc581b1f..8d1b2771c1aa 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -638,6 +638,10 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 1);
 	int ret;
 
+	/* this is an error in the caller */
+	if (cp->initialized)
+		return -EBUSY;
+
 	/*
 	 * We only support prefetching the channel program. We assume all channel
 	 * programs executed by supported guests likewise support prefetching.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH v5 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM
  2021-05-10 20:56 [RFC PATCH v5 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
  2021-05-10 20:56 ` [RFC PATCH v5 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
@ 2021-05-10 20:56 ` Eric Farman
  2021-05-10 20:56 ` [RFC PATCH v5 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Farman @ 2021-05-10 20:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

When an I/O request is made, the fsm_io_request() routine
moves the FSM state from IDLE to CP_PROCESSING, and then
fsm_io_helper() moves it to CP_PENDING if the START SUBCHANNEL
received a cc0. Yet, the error case to go from CP_PROCESSING
back to IDLE is done after the FSM call returns.

Let's move this up into the FSM proper, to provide some
better symmetry when unwinding in this case.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/cio/vfio_ccw_fsm.c | 1 +
 drivers/s390/cio/vfio_ccw_ops.c | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index 23e61aa638e4..e435a9cd92da 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -318,6 +318,7 @@ static void fsm_io_request(struct vfio_ccw_private *private,
 	}
 
 err_out:
+	private->state = VFIO_CCW_STATE_IDLE;
 	trace_vfio_ccw_fsm_io_request(scsw->cmd.fctl, schid,
 				      io_region->ret_code, errstr);
 }
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 767ac41686fe..5971641964c6 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -276,8 +276,6 @@ static ssize_t vfio_ccw_mdev_write_io_region(struct vfio_ccw_private *private,
 	}
 
 	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
-	if (region->ret_code != 0)
-		private->state = VFIO_CCW_STATE_IDLE;
 	ret = (region->ret_code != 0) ? region->ret_code : count;
 
 out_unlock:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH v5 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion
  2021-05-10 20:56 [RFC PATCH v5 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
  2021-05-10 20:56 ` [RFC PATCH v5 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
  2021-05-10 20:56 ` [RFC PATCH v5 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
@ 2021-05-10 20:56 ` Eric Farman
  2021-05-11 11:31   ` Cornelia Huck
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Farman @ 2021-05-10 20:56 UTC (permalink / raw)
  To: Cornelia Huck, Matthew Rosato, Halil Pasic
  Cc: Jared Rossi, linux-s390, kvm, Eric Farman

Today, the stacked call to vfio_ccw_sch_io_todo() does three things:

  1) Update a solicited IRB with CP information, and release the CP
     if the interrupt was the end of a START operation.
  2) Copy the IRB data into the io_region, under the protection of
     the io_mutex
  3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
     vfio-ccw can accept more work.

The trouble is that step 3 is (A) invoked for both solicited and
unsolicited interrupts, and (B) sitting after the mutex for step 2.
This second piece becomes a problem if it processes an interrupt
for a CLEAR SUBCHANNEL while another thread initiates a START,
thus allowing the CP and FSM states to get out of sync. That is:

    CPU 1                           CPU 2
    fsm_do_clear()
    fsm_irq()
                                    fsm_io_request()
    vfio_ccw_sch_io_todo()
                                    fsm_io_helper()

Since the FSM state and CP should be kept in sync, let's make a
note when the CP is released, and rely on that as an indication
that the FSM should also be reset at the end of this routine and
open up the device for more work.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 8c625b530035..ef39182edab5 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 {
 	struct vfio_ccw_private *private;
 	struct irb *irb;
-	bool is_final;
+	bool is_final, is_finished = false;
 
 	private = container_of(work, struct vfio_ccw_private, io_work);
 	irb = &private->irb;
@@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
 	if (scsw_is_solicited(&irb->scsw)) {
 		cp_update_scsw(&private->cp, &irb->scsw);
-		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
+		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
 			cp_free(&private->cp);
+			is_finished = true;
+		}
 	}
 	mutex_lock(&private->io_mutex);
 	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
 	mutex_unlock(&private->io_mutex);
 
-	if (private->mdev && is_final)
+	if (private->mdev && is_finished)
 		private->state = VFIO_CCW_STATE_IDLE;
 
 	if (private->io_trigger)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v5 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion
  2021-05-10 20:56 ` [RFC PATCH v5 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
@ 2021-05-11 11:31   ` Cornelia Huck
  2021-05-11 18:02     ` Eric Farman
  0 siblings, 1 reply; 6+ messages in thread
From: Cornelia Huck @ 2021-05-11 11:31 UTC (permalink / raw)
  To: Eric Farman; +Cc: Matthew Rosato, Halil Pasic, Jared Rossi, linux-s390, kvm

On Mon, 10 May 2021 22:56:46 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Today, the stacked call to vfio_ccw_sch_io_todo() does three things:
> 
>   1) Update a solicited IRB with CP information, and release the CP
>      if the interrupt was the end of a START operation.
>   2) Copy the IRB data into the io_region, under the protection of
>      the io_mutex
>   3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
>      vfio-ccw can accept more work.
> 
> The trouble is that step 3 is (A) invoked for both solicited and
> unsolicited interrupts, and (B) sitting after the mutex for step 2.
> This second piece becomes a problem if it processes an interrupt
> for a CLEAR SUBCHANNEL while another thread initiates a START,
> thus allowing the CP and FSM states to get out of sync. That is:
> 
>     CPU 1                           CPU 2
>     fsm_do_clear()
>     fsm_irq()
>                                     fsm_io_request()
>     vfio_ccw_sch_io_todo()
>                                     fsm_io_helper()
> 
> Since the FSM state and CP should be kept in sync, let's make a
> note when the CP is released, and rely on that as an indication
> that the FSM should also be reset at the end of this routine and
> open up the device for more work.
> 
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> index 8c625b530035..ef39182edab5 100644
> --- a/drivers/s390/cio/vfio_ccw_drv.c
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  {
>  	struct vfio_ccw_private *private;
>  	struct irb *irb;
> -	bool is_final;
> +	bool is_final, is_finished = false;

<bikeshed>
"is_finished" does not really say what is finished; maybe call it
"cp_is_finished"?
</bikeshed>

>  
>  	private = container_of(work, struct vfio_ccw_private, io_work);
>  	irb = &private->irb;
> @@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>  	if (scsw_is_solicited(&irb->scsw)) {
>  		cp_update_scsw(&private->cp, &irb->scsw);
> -		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING) {
>  			cp_free(&private->cp);
> +			is_finished = true;
> +		}
>  	}
>  	mutex_lock(&private->io_mutex);
>  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
>  	mutex_unlock(&private->io_mutex);
>  
> -	if (private->mdev && is_final)
> +	if (private->mdev && is_finished)

Maybe add a comment?

/*
 * Reset to idle if processing of a channel program
 * has finished; but do not overwrite a possible
 * processing state if we got a final interrupt for hsch
 * or csch.
 */

Otherwise, I see us scratching our heads again in a few months :)

>  		private->state = VFIO_CCW_STATE_IDLE;
>  
>  	if (private->io_trigger)

Patch looks good to me.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v5 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion
  2021-05-11 11:31   ` Cornelia Huck
@ 2021-05-11 18:02     ` Eric Farman
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Farman @ 2021-05-11 18:02 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Matthew Rosato, Halil Pasic, Jared Rossi, linux-s390, kvm

On Tue, 2021-05-11 at 13:31 +0200, Cornelia Huck wrote:
> On Mon, 10 May 2021 22:56:46 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > Today, the stacked call to vfio_ccw_sch_io_todo() does three
> > things:
> > 
> >   1) Update a solicited IRB with CP information, and release the CP
> >      if the interrupt was the end of a START operation.
> >   2) Copy the IRB data into the io_region, under the protection of
> >      the io_mutex
> >   3) Reset the vfio-ccw FSM state to IDLE to acknowledge that
> >      vfio-ccw can accept more work.
> > 
> > The trouble is that step 3 is (A) invoked for both solicited and
> > unsolicited interrupts, and (B) sitting after the mutex for step 2.
> > This second piece becomes a problem if it processes an interrupt
> > for a CLEAR SUBCHANNEL while another thread initiates a START,
> > thus allowing the CP and FSM states to get out of sync. That is:
> > 
> >     CPU 1                           CPU 2
> >     fsm_do_clear()
> >     fsm_irq()
> >                                     fsm_io_request()
> >     vfio_ccw_sch_io_todo()
> >                                     fsm_io_helper()
> > 
> > Since the FSM state and CP should be kept in sync, let's make a
> > note when the CP is released, and rely on that as an indication
> > that the FSM should also be reset at the end of this routine and
> > open up the device for more work.
> > 
> > Signed-off-by: Eric Farman <farman@linux.ibm.com>
> > ---
> >  drivers/s390/cio/vfio_ccw_drv.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/s390/cio/vfio_ccw_drv.c
> > b/drivers/s390/cio/vfio_ccw_drv.c
> > index 8c625b530035..ef39182edab5 100644
> > --- a/drivers/s390/cio/vfio_ccw_drv.c
> > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > @@ -85,7 +85,7 @@ static void vfio_ccw_sch_io_todo(struct
> > work_struct *work)
> >  {
> >  	struct vfio_ccw_private *private;
> >  	struct irb *irb;
> > -	bool is_final;
> > +	bool is_final, is_finished = false;
> 
> <bikeshed>
> "is_finished" does not really say what is finished; maybe call it
> "cp_is_finished"?
> </bikeshed>

Sure, that's a bit clearer.

> 
> >  
> >  	private = container_of(work, struct vfio_ccw_private, io_work);
> >  	irb = &private->irb;
> > @@ -94,14 +94,16 @@ static void vfio_ccw_sch_io_todo(struct
> > work_struct *work)
> >  		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >  	if (scsw_is_solicited(&irb->scsw)) {
> >  		cp_update_scsw(&private->cp, &irb->scsw);
> > -		if (is_final && private->state ==
> > VFIO_CCW_STATE_CP_PENDING)
> > +		if (is_final && private->state ==
> > VFIO_CCW_STATE_CP_PENDING) {
> >  			cp_free(&private->cp);
> > +			is_finished = true;
> > +		}
> >  	}
> >  	mutex_lock(&private->io_mutex);
> >  	memcpy(private->io_region->irb_area, irb, sizeof(*irb));
> >  	mutex_unlock(&private->io_mutex);
> >  
> > -	if (private->mdev && is_final)
> > +	if (private->mdev && is_finished)
> 
> Maybe add a comment?
> 
> /*
>  * Reset to idle if processing of a channel program
>  * has finished; but do not overwrite a possible
>  * processing state if we got a final interrupt for hsch
>  * or csch.
>  */
> 
> Otherwise, I see us scratching our heads again in a few months :)

Almost certainly. :)

> 
> >  		private->state = VFIO_CCW_STATE_IDLE;
> >  
> >  	if (private->io_trigger)
> 
> Patch looks good to me.
> 

Thanks. Will make the above improvements and send as non-RFC.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-05-11 18:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-10 20:56 [RFC PATCH v5 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2021-05-10 20:56 ` [RFC PATCH v5 1/3] vfio-ccw: Check initialized flag in cp_init() Eric Farman
2021-05-10 20:56 ` [RFC PATCH v5 2/3] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
2021-05-10 20:56 ` [RFC PATCH v5 3/3] vfio-ccw: Serialize FSM IDLE state with I/O completion Eric Farman
2021-05-11 11:31   ` Cornelia Huck
2021-05-11 18:02     ` Eric Farman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).