From: Cornelia Huck <cohuck@redhat.com>
To: Eric Farman <farman@linux.ibm.com>
Cc: Jared Rossi <jrossi@linux.ibm.com>,
Halil Pasic <pasic@linux.ibm.com>,
linux-s390@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC PATCH v3 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
Date: Mon, 29 Jun 2020 16:56:29 +0200 [thread overview]
Message-ID: <20200629165629.24f21585.cohuck@redhat.com> (raw)
In-Reply-To: <5ae6151b-31de-eca6-2917-4e23ecd4f0df@linux.ibm.com>
On Wed, 17 Jun 2020 07:24:17 -0400
Eric Farman <farman@linux.ibm.com> wrote:
> On 6/16/20 3:50 PM, Eric Farman wrote:
> > Let's continue our discussion of the handling of vfio-ccw interrupts.
> >
> > The initial fix [1] relied upon the interrupt path's examination of the
> > FSM state, and freeing all resources if it were CP_PENDING. But the
> > interface used by HALT/CLEAR SUBCHANNEL doesn't affect the FSM state.
> > Consider this sequence:
> >
> > CPU 1 CPU 2
> > CLEAR (state=IDLE/no change)
> > START [2]
> > INTERRUPT (set state=IDLE)
> > INTERRUPT (set state=IDLE)
> >
> > This translates to a couple of possible scenarios:
> >
> > A) The START gets a cc2 because of the outstanding CLEAR, -EBUSY is
> > returned, resources are freed, and state remains IDLE
> > B) The START gets a cc0 because the CLEAR has already presented an
> > interrupt, and state is set to CP_PENDING
> >
> > If the START gets a cc0 before the CLEAR INTERRUPT (stacked onto a
> > workqueue by the IRQ context) gets a chance to run, then the INTERRUPT
> > will release the channel program memory prematurely. If the two
> > operations run concurrently, then the FSM state set to CP_PROCESSING
> > will prevent the cp_free() from being invoked. But the io_mutex
> > boundary on that path will pause itself until the START completes,
> > and then allow the FSM to be reset to IDLE without considering the
> > outstanding START. Neither scenario would be considered good.
> >
> > Having said all of that, in v2 Conny suggested [3] the following:
> >
> >> - Detach the cp from the subchannel (or better, remove the 1:1
> >> relationship). By that I mean building the cp as a separately
> >> allocated structure (maybe embedding a kref, but that might not be
> >> needed), and appending it to a list after SSCH with cc=0. Discard it
> >> if cc!=0.
> >> - Remove the CP_PENDING state. The state is either IDLE after any
> >> successful SSCH/HSCH/CSCH, or a new state in that case. But no
> >> special state for SSCH.
> >> - A successful CSCH removes the first queued request, if any.
> >> - A final interrupt removes the first queued request, if any.
> >
> > What I have implemented here is basically this, with a few changes:
> >
> > - I don't queue cp's. Since there should only be one START in process
> > at a time, and HALT/CLEAR doesn't build a cp, I didn't see a pressing
> > need to introduce that complexity.
> > - Furthermore, while I initially made a separately allocated cp, adding
> > an alloc for a cp on each I/O AND moving the guest_cp alloc from the
> > probe path to the I/O path seems excessive. So I implemented a
> > "started" flag to the cp, set after a cc0 from the START, and examine
> > that on the interrupt path to determine whether cp_free() is needed.
>
> FYI... After a day or two of running, I sprung a kernel debug oops for
> list corruption in ccwchain_free(). I'm going to blame this piece, since
> it was the last thing I changed and I hadn't come across any such damage
> since v2. So either "started" is a bad idea, or a broken one. Or both. :)
Have you come to any conclusion wrt 'started'? Not wanting to generate
stress, just asking :)
next prev parent reply other threads:[~2020-06-29 21:11 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-16 19:50 [RFC PATCH v3 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2020-06-16 19:50 ` [RFC PATCH v3 1/3] vfio-ccw: Indicate if a channel_program is started Eric Farman
2020-06-17 23:11 ` Halil Pasic
2020-06-18 11:47 ` Eric Farman
2020-06-16 19:50 ` [RFC PATCH v3 2/3] vfio-ccw: Remove the CP_PENDING FSM state Eric Farman
2020-06-16 19:50 ` [RFC PATCH v3 3/3] vfio-ccw: Check workqueue before doing START Eric Farman
2020-06-19 11:40 ` Cornelia Huck
2020-06-17 11:24 ` [RFC PATCH v3 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2020-06-29 14:56 ` Cornelia Huck [this message]
2020-06-30 19:10 ` Eric Farman
2020-06-19 11:21 ` Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200629165629.24f21585.cohuck@redhat.com \
--to=cohuck@redhat.com \
--cc=farman@linux.ibm.com \
--cc=jrossi@linux.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=pasic@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).