kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Halil Pasic <pasic@linux.ibm.com>
To: Eric Farman <farman@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Jared Rossi <jrossi@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC PATCH v4 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR
Date: Thu, 22 Apr 2021 02:52:58 +0200	[thread overview]
Message-ID: <20210422025258.6ed7619d.pasic@linux.ibm.com> (raw)
In-Reply-To: <20210413182410.1396170-1-farman@linux.ibm.com>

On Tue, 13 Apr 2021 20:24:06 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Hi Conny, Halil,
> 
> Let's restart our discussion about the collision between interrupts for
> START SUBCHANNEL and HALT/CLEAR SUBCHANNEL. It's been a quarter million
> minutes (give or take), so here is the problematic scenario again:
> 
> 	CPU 1			CPU 2
>  1	CLEAR SUBCHANNEL
>  2	fsm_irq()
>  3				START SUBCHANNEL
>  4	vfio_ccw_sch_io_todo()
>  5				fsm_irq()
>  6				vfio_ccw_sch_io_todo()
> 
> From the channel subsystem's point of view the CLEAR SUBCHANNEL (step 1)
> is complete once step 2 is called, as the Interrupt Response Block (IRB)
> has been presented and the TEST SUBCHANNEL was driven by the cio layer.
> Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets a cc=0 to
> indicate the I/O was accepted. However, step 2 stacks the bulk of the
> actual work onto a workqueue for when the subchannel lock is NOT held,
> and is unqueued at step 4. That code misidentifies the data in the IRB
> as being associated with the newly active I/O, and may release memory
> that is actively in use by the channel subsystem and/or device. Eww.
> 
> In this version...
> 
> Patch 1 and 2 are defensive checks. Patch 2 was part of v3 [2], but I
> would love a better option here to guard between steps 2 and 4.
> 
> Patch 3 is a subset of the removal of the CP_PENDING FSM state in v3.
> I've obviously gone away from this idea, but I thought this piece is
> still valuable.
> 
> Patch 4 collapses the code on the interrupt path so that changes to
> the FSM state and the channel_program struct are handled at the same
> point, rather than separated by a mutex boundary. Because of the
> possibility of a START and HALT/CLEAR running concurrently, it does
> not make sense to split them here.
> 
> With the above patches, maybe it then makes sense to hold the io_mutex
> across the entirety of vfio_ccw_sch_io_todo(). But I'm not completely
> sure that would be acceptable.
> 
> So... Thoughts?

I believe we should address the concurrency, encapsulation and layering
issues in the subchannel/ccw pass-through code (vfio-ccw) by taking a
holistic approach as soon as possible.

I find the current state of art very hard to reason about, and that
adversely  affects my ability to reason about attempts at partial
improvements.

I understand that such a holistic approach needs a lot of work, and we
may have to stop some bleeding first. In the stop the bleeding phase we
can take a pragmatic approach and accept changes that empirically seem to
work towards stopping the bleeding. I.e. if your tests say it's better,
I'm willing to accept that it is better.

I have to admit, I don't understand how synchronization is done in the
vfio-ccw kernel module (in the sense of avoiding data races).

Regarding your patches, I have to admit, I have a hard time figuring out
which one of these (or what combination of them) is supposed to solve
the problem you described above. If I had to guess, I would guess it is
either patch 4, because it has a similar scenario diagram in the
commit message like the one in the problem statement. Is my guess right?

If it is right I don't quite understand the mechanics of the fix,
because what the patch seems to do is changing the content of step 4 in
the above diagram. And I don't see how is change that code
so that it does not "misidentifies the data in the IRB as being
associated with the newly active I/O". Moreover patch 4 seems to rely on
private->state which, AFAIR is still used in a racy fashion.

But if strong empirical evidence shows that it performs better (stops
the bleeding), I think we can go ahead with it.

Regards,
Halil








  parent reply	other threads:[~2021-04-22  0:53 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-13 18:24 [RFC PATCH v4 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2021-04-13 18:24 ` [RFC PATCH v4 1/4] vfio-ccw: Check initialized flag in cp_init() Eric Farman
2021-04-14 16:30   ` Cornelia Huck
2021-04-13 18:24 ` [RFC PATCH v4 2/4] vfio-ccw: Check workqueue before doing START Eric Farman
2021-04-15 10:51   ` Cornelia Huck
2021-04-15 13:48     ` Eric Farman
2021-04-15 16:19       ` Cornelia Huck
2021-04-15 18:42         ` Eric Farman
2021-04-16 14:41           ` Cornelia Huck
2021-04-13 18:24 ` [RFC PATCH v4 3/4] vfio-ccw: Reset FSM state to IDLE inside FSM Eric Farman
2021-04-15 10:54   ` Cornelia Huck
2021-04-13 18:24 ` [RFC PATCH v4 4/4] vfio-ccw: Reset FSM state to IDLE before io_mutex Eric Farman
2021-04-21 10:25   ` Cornelia Huck
2021-04-21 12:58     ` Eric Farman
2021-04-22 16:16       ` Eric Farman
2021-04-22  0:52 ` Halil Pasic [this message]
2021-04-22 20:49   ` [RFC PATCH v4 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2021-04-23 11:06     ` Cornelia Huck
2021-04-23 13:23       ` Halil Pasic
2021-04-23 13:28         ` Niklas Schnelle
2021-04-23 15:53         ` Eric Farman
2021-04-23 11:50     ` Halil Pasic
2021-04-23 15:53       ` Eric Farman
2021-04-23 17:08         ` Halil Pasic
2021-04-23 19:07           ` Eric Farman
2021-04-24  0:18             ` Halil Pasic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210422025258.6ed7619d.pasic@linux.ibm.com \
    --to=pasic@linux.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=jrossi@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).