kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>,
	Jared Rossi <jrossi@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC PATCH v2 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR
Date: Fri, 15 May 2020 17:58:49 +0200	[thread overview]
Message-ID: <20200515175850.79e2ac74.cohuck@redhat.com> (raw)
In-Reply-To: <20200515165539.2e4a8485.pasic@linux.ibm.com>

[only some very high-level comments; I have not had time for this yet
and it's late on a Friday]

On Fri, 15 May 2020 16:55:39 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 15 May 2020 09:09:47 -0400
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > 
> > 
> > On 5/14/20 9:46 AM, Halil Pasic wrote:  
> > > On Wed, 13 May 2020 16:29:30 +0200
> > > Eric Farman <farman@linux.ibm.com> wrote:
> > >   
> > >> Hi Conny,
> > >>
> > >> Back in January, I suggested a small patch [1] to try to clean up
> > >> the handling of HSCH/CSCH interrupts, especially as it relates to
> > >> concurrent SSCH interrupts. Here is a new attempt to address this.
> > >>
> > >> There was some suggestion earlier about locking the FSM, but I'm not
> > >> seeing any problems with that. Rather, what I'm noticing is that the
> > >> flow between a synchronous START and asynchronous HALT/CLEAR have
> > >> different impacts on the FSM state. Consider:
> > >>
> > >>     CPU 1                           CPU 2
> > >>
> > >>     SSCH (set state=CP_PENDING)
> > >>     INTERRUPT (set state=IDLE)
> > >>     CSCH (no change in state)
> > >>                                     SSCH (set state=CP_PENDING)
> > >>     INTERRUPT (set state=IDLE)
> > >>                                     INTERRUPT (set state=IDLE)  
> > > 
> > > How does the PoP view of the composite device look like 
> > > (where composite device = vfio-ccw + host device)?  
> > 
> > (Just want to be sure that "composite device" is a term that you're
> > creating, because it's not one I'm familiar with.)  
> 
> Yes I created this term because I'm unaware of an established one, and
> I could not come up with a better one. If you have something established
> or better please do tell, I will start using that.

I don't think "composite" is a term I would use; in the end, we are
talking about a vfio-ccw device that gets some of its state from the
host device. As such, I don't think there's really anything like a "PoP
view" of it; the part that should comply with what the PoP specifies is
what gets exposed to the guest.

> 
> > 
> > In today's code, there isn't a "composite device" that contains
> > POPs-defined state information like we find in, for example, the host
> > SCHIB, but that incorporates vfio-ccw state. This series (in a far more
> > architecturally valid, non-RFC, form) would get us closer to that.
> >   
> 
> I think it is very important to start thinking about the ccw devices
> that are exposed to the guest as a "composite device" in a sense that
> the (passed-through) host ccw device is wrapped by vfio-ccw. For instance
> the guest sees the SCHIB exposed by the vfio-ccw wrapper (adaptation
> code in qemu and the kernel module), and not the SCHIB of the host
> device.

See my comments on that above.

> 
> This series definitely has some progressive thoughts. It just that
> IMHO we sould to be more radical about this. And yes I'm a bit
> frustrated.
> 
> > > 
> > > I suppose at the moment where we accept the CSCH the FC bit
> > > indicated clear function (19) goes to set. When this happens
> > > there are 2 possibilities: either the start (17) bit is set,
> > > or it is not. You describe a scenario where the start bit is
> > > not set. In that case we don't have a channel program that
> > > is currently being processed, and any SSCH must bounce right
> > > off without doing anything (with cc 2) as long as FC clear
> > > is set. Please note that we are still talking about the composite
> > > device here.  
> > 
> > Correct. Though, whether the START function control bit is currently set
> > is immaterial to a CLEAR function; that's the biggest recovery action we
> > have at our disposal, and will always go through.
> > 
> > The point is that there is nothing to prevent the START on CPU 2 from
> > going through. The CLEAR does not affect the FSM today, and doesn't
> > record a FC CLEAR bit within vfio-ccw, and so we're only relying on the
> > return code from the SSCH instruction to give us cc0 vs cc2 (or
> > whatever). The actual results of that will depend, since the CPU may
> > have presented the interrupt for the CLEAR (via the .irq callback and
> > thus FSM VFIO_CCW_EVENT_INTERRUPT) and thus a new START is perfectly
> > legal from its point of view. Since vfio-ccw may not have unstacked the
> > work it placed to finish up that interrupt handler means we have a problem.
> >   
> > > 
> > > Thus in my reading CPU2 making the IDLE -> CP_PENDING transition
> > > happen is already wrong. We should have rejected to look at the
> > > channel program in the first place. Because as far as I can tell
> > > for the composite device the FC clear bit remains set until we
> > > process the last interrupt on the CPU 1 side in your scenario. Or
> > > am I wrong?  
> > 
> > You're not wrong. You're agreeing with everything I've described.  :)
> >   
> 
> I'm happy our understanding seems to converge! :)
> 
> My problem is that you tie the denial of SSCH to outstanding interrupts
> ("C) A SSCH cannot be issued while an interrupt is outstanding") while
> the PoP says "Condition code 2 is set, and no other action is
> taken, when a start, halt, or clear function is currently
> in progress at the subchannel (see “Function Control
> (FC)” on page 16-22)".
> 
> This may or man not be what you have actually implemented, but it is what
> you describe in this cover letter. Also patches 2 and 3 do the 
> serialization operate on activity controls instead of on the function
> controls (as described by the PoP).

I have not really had the cycles yet to look at this series in detail,
but should our goal really be to mimic what the PoP talks about in
vfio-ccw (both kernel and userspace parts)? IMO, the important part is
that the guest sees a device that acts *towards the guest* in a way
that is compliant with the PoP; we can take different paths inside
vfio-ccw.

(...)

> BTW the whole notion of synchronous and asynchronous commands is IMHO
> broken. I tried to argue against it back then. If you read the PoP SSCH
> is asynchronous as well.

I don't see where we ever stated that SSCH was synchronous. That would
be silly. The async region is called the async region because it is
used for some async commands (HSCH/CSCH), not because it is the only
way to do async commands. (The original idea was to extend the I/O
region for HSCH/CSCH, but that turned out to be awkward.)

(...)

I hope I can find more time to look at this next week, but as it will
be a short work week for me and I'm already swamped with various other
things, I fear that you should keep your expectations low.


  reply	other threads:[~2020-05-15 15:59 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-13 14:29 [RFC PATCH v2 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 1/4] vfio-ccw: Do not reset FSM state for unsolicited interrupts Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 2/4] vfio-ccw: Utilize scsw actl to serialize start operations Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 3/4] vfio-ccw: Expand SCSW usage to HALT and CLEAR Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 4/4] vfio-ccw: Clean up how to react to a failed START Eric Farman
2020-05-14 13:46 ` [RFC PATCH v2 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR Halil Pasic
2020-05-15 13:09   ` Eric Farman
2020-05-15 14:55     ` Halil Pasic
2020-05-15 15:58       ` Cornelia Huck [this message]
2020-05-15 17:41         ` Halil Pasic
2020-05-15 18:19           ` Eric Farman
2020-05-15 18:12       ` Eric Farman
2020-05-15 18:37         ` Halil Pasic
2020-05-18 22:01           ` Eric Farman
2020-05-15 19:35         ` Halil Pasic
2020-05-18 16:09 ` Cornelia Huck
2020-05-18 21:57   ` Eric Farman
2020-05-19 11:23     ` Cornelia Huck
2020-05-18 22:09   ` Halil Pasic
2020-05-19 11:36     ` Cornelia Huck
2020-05-19 12:10       ` Halil Pasic
2020-05-26  9:55 ` Cornelia Huck
2020-05-26 11:08   ` Eric Farman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200515175850.79e2ac74.cohuck@redhat.com \
    --to=cohuck@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=jrossi@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pasic@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).