From: Eric Farman <firstname.lastname@example.org>
To: Cornelia Huck <email@example.com>, Halil Pasic <firstname.lastname@example.org>
Cc: Matthew Rosato <email@example.com>, Jared Rossi <firstname.lastname@example.org>,
	email@example.com, firstname.lastname@example.org, Eric Farman <email@example.com>
Subject: [RFC PATCH v4 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR
Date: Tue, 13 Apr 2021 20:24:06 +0200
Message-ID: <firstname.lastname@example.org>

Hi Conny, Halil,

Let's restart our discussion about the collision between interrupts for
START SUBCHANNEL and HALT/CLEAR SUBCHANNEL. It's been a quarter million
minutes (give or take), so here is the problematic scenario again:

	CPU 1			CPU 2
 1	CLEAR SUBCHANNEL
 2	fsm_irq()
 3				START SUBCHANNEL
 4	vfio_ccw_sch_io_todo()
 5				fsm_irq()
 6				vfio_ccw_sch_io_todo()

From the channel subsystem's point of view the CLEAR SUBCHANNEL (step 1)
is complete once step 2 is called, as the Interrupt Response Block (IRB)
has been presented and the TEST SUBCHANNEL was driven by the cio layer.
Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets a cc=0 to
indicate the I/O was accepted. However, step 2 stacks the bulk of the
actual work onto a workqueue for when the subchannel lock is NOT held,
and is unqueued at step 4. That code misidentifies the data in the IRB
as being associated with the newly active I/O, and may release memory
that is actively in use by the channel subsystem and/or device. Eww.

In this version...

Patches 1 and 2 are defensive checks. Patch 2 was part of v3 [2], but I
would love a better option here to guard between steps 2 and 4.

Patch 3 is a subset of the removal of the CP_PENDING FSM state in v3.
I've obviously gone away from this idea, but I thought this piece is
still valuable.

Patch 4 collapses the code on the interrupt path so that changes to
the FSM state and the channel_program struct are handled at the same
point, rather than separated by a mutex boundary. Because of the
possibility of a START and HALT/CLEAR running concurrently, it does
not make sense to split them here.

With the above patches, maybe it then makes sense to hold the io_mutex
across the entirety of vfio_ccw_sch_io_todo(). But I'm not completely
sure that would be acceptable.

So... Thoughts?

Thanks,
Eric

Previous versions:
v3: https://email@example.com/
v2: https://firstname.lastname@example.org/
v1: https://email@example.com/

Footnotes:
[1] Halil correctly asserts that today's QEMU should prohibit this, but I
    still have not looked into why. The above is the sequence that is
    occurring in the kernel, and we shouldn't rely on a well-behaved
    userspace to enforce things for us. It is still on my list for
    further investigation, but it's lower in priority.

[2] https://firstname.lastname@example.org/

Eric Farman (4):
  vfio-ccw: Check initialized flag in cp_init()
  vfio-ccw: Check workqueue before doing START
  vfio-ccw: Reset FSM state to IDLE inside FSM
  vfio-ccw: Reset FSM state to IDLE before io_mutex

 drivers/s390/cio/vfio_ccw_cp.c  | 4 ++++
 drivers/s390/cio/vfio_ccw_drv.c | 7 +++----
 drivers/s390/cio/vfio_ccw_fsm.c | 6 ++++++
 drivers/s390/cio/vfio_ccw_ops.c | 2 --
 4 files changed, 13 insertions(+), 6 deletions(-)

-- 
2.25.1
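
For anyone who wants to poke at steps 1-4 of the scenario outside the
kernel, here is a minimal userspace sketch. It is a toy model and not the
driver code: struct sch_model, model_irq(), model_start(), model_io_todo()
and model_run() are all hypothetical names. The "guard" flag is only one
assumed reading of the patch 2 title (refuse a START while deferred
interrupt work is still queued); the actual patch may do this differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are the kernel's. */
enum req { REQ_NONE, REQ_START, REQ_CLEAR };

struct sch_model {
	bool cp_pending;	/* a START's channel program is active */
	bool cp_allocated;	/* memory backing that program is held */
	bool work_queued;	/* deferred interrupt work has not run yet */
	enum req irb_for;	/* request the stored interrupt answers */
};

/* Steps 2 and 5: record the interrupt and defer the real work. */
static void model_irq(struct sch_model *m, enum req answered)
{
	assert(answered != REQ_NONE);
	m->irb_for = answered;
	m->work_queued = true;
}

/*
 * Step 3: accept a START. With guard set (an assumption, see above),
 * the START is refused while deferred work is still queued.
 * Returns 0 if accepted, -1 if refused.
 */
static int model_start(struct sch_model *m, bool guard)
{
	if (guard && m->work_queued)
		return -1;
	m->cp_allocated = true;
	m->cp_pending = true;
	return 0;
}

/*
 * Steps 4 and 6: the deferred work. It releases the channel program
 * whenever one is pending, without asking which request the stored
 * interrupt belongs to. Returns true if it released a channel program
 * that the interrupt did not answer.
 */
static bool model_io_todo(struct sch_model *m)
{
	bool wrong_free = false;

	if (!m->work_queued)
		return false;
	m->work_queued = false;

	if (m->cp_pending) {
		wrong_free = (m->irb_for != REQ_START);
		m->cp_allocated = false;
		m->cp_pending = false;
	}
	return wrong_free;
}

/* Steps 1-4: CLEAR's interrupt, a START, then the CLEAR's deferred work. */
static bool model_run(bool guard)
{
	struct sch_model m = { 0 };

	model_irq(&m, REQ_CLEAR);	/* steps 1-2 */
	(void)model_start(&m, guard);	/* step 3 */
	return model_io_todo(&m);	/* step 4 */
}
```

Calling model_run(false) walks the unguarded sequence, where the deferred
work for the CLEAR releases the channel program of the newly accepted
START. Calling model_run(true) refuses the START instead, so there is
nothing for step 4 to release by mistake.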