* [RFC PATCH v4 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR
@ 2021-04-13 18:24 Eric Farman
From: Eric Farman @ 2021-04-13 18:24 UTC
  To: Cornelia Huck, Halil Pasic
  Cc: Matthew Rosato, Jared Rossi, linux-s390, kvm, Eric Farman

Hi Conny, Halil,

Let's restart our discussion about the collision between interrupts for
START SUBCHANNEL and HALT/CLEAR SUBCHANNEL. It's been a quarter million
minutes (give or take), so here is the problematic scenario again:

	CPU 1			CPU 2
 1	CLEAR SUBCHANNEL
 2	fsm_irq()
 3				START SUBCHANNEL
 4	vfio_ccw_sch_io_todo()
 5				fsm_irq()
 6				vfio_ccw_sch_io_todo()

From the channel subsystem's point of view the CLEAR SUBCHANNEL (step 1)
is complete once step 2 is called, as the Interrupt Response Block (IRB)
has been presented and the TEST SUBCHANNEL was driven by the cio layer.
Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets a cc=0 to
indicate the I/O was accepted. However, step 2 stacks the bulk of the
actual work onto a workqueue, to run when the subchannel lock is NOT
held, and that work is dequeued at step 4. That code misidentifies the
data in the IRB
as being associated with the newly active I/O, and may release memory
that is actively in use by the channel subsystem and/or device. Eww.

In this version...

Patches 1 and 2 are defensive checks. Patch 2 was part of v3 [2], but I
would love a better option here to guard the window between steps 2 and 4.

Patch 3 is a subset of the removal of the CP_PENDING FSM state in v3.
I've obviously moved away from that idea, but I think this piece is
still valuable.

Patch 4 collapses the code on the interrupt path so that changes to
the FSM state and the channel_program struct are handled at the same
point, rather than separated by a mutex boundary. Because of the
possibility of a START and HALT/CLEAR running concurrently, it does
not make sense to split them here.

With the above patches, maybe it then makes sense to hold the io_mutex
across the entirety of vfio_ccw_sch_io_todo(). But I'm not completely
sure that would be acceptable.

So... Thoughts?

Thanks,
Eric

Previous versions:
v3: https://lore.kernel.org/kvm/20200616195053.99253-1-farman@linux.ibm.com/
v2: https://lore.kernel.org/kvm/20200513142934.28788-1-farman@linux.ibm.com/
v1: https://lore.kernel.org/kvm/20200124145455.51181-1-farman@linux.ibm.com/

Footnotes:
[1] Halil correctly asserts that today's QEMU should prohibit this, but I
    still have not looked into why. The above is the sequence that is
    occurring in the kernel, and we shouldn't rely on a well-behaved
    userspace to enforce things for us. It is still on my list for further
    investigation, but it's lower in priority.
[2] https://lore.kernel.org/kvm/20200619134005.512fc54f.cohuck@redhat.com/

Eric Farman (4):
  vfio-ccw: Check initialized flag in cp_init()
  vfio-ccw: Check workqueue before doing START
  vfio-ccw: Reset FSM state to IDLE inside FSM
  vfio-ccw: Reset FSM state to IDLE before io_mutex

 drivers/s390/cio/vfio_ccw_cp.c  | 4 ++++
 drivers/s390/cio/vfio_ccw_drv.c | 7 +++----
 drivers/s390/cio/vfio_ccw_fsm.c | 6 ++++++
 drivers/s390/cio/vfio_ccw_ops.c | 2 --
 4 files changed, 13 insertions(+), 6 deletions(-)

-- 
2.25.1


