All of lore.kernel.org
 help / color / mirror / Atom feed
From: Halil Pasic <pasic@linux.ibm.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: Farhan Ali <alifm@linux.ibm.com>,
	Eric Farman <farman@linux.ibm.com>,
	kvm@vger.kernel.org, linux-s390@vger.kernel.org,
	pmorel@linux.ibm.com
Subject: Re: [PATCH v3 1/1] vfio-ccw: Prevent quiesce function going into an infinite loop
Date: Wed, 24 Apr 2019 12:02:38 +0200	[thread overview]
Message-ID: <20190424120238.4e8e1c2f.pasic@linux.ibm.com> (raw)
In-Reply-To: <20190424090915.4a5ab1d9.cohuck@redhat.com>

On Wed, 24 Apr 2019 09:09:15 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 23 Apr 2019 15:41:34 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
> > On 04/23/2019 01:42 PM, Halil Pasic wrote:
> > > One thing I'm confused about is, that we don't seem to prevent
> > > new I/O being submitted. That is we could still loop indefinitely
> > > if we get new IO after the 'kill I/O on the subchannel' is done but
> > > before the msch() with the disable is issued.  
> > 
> > So the quiesce function will be called in the remove, release functions 
> > and also in the mdev reset callback via an ioctl VFIO_DEVICE_RESET.
> > 
> > Now the release function is invoked in cases when we hot unplug the 
> > device or the guest is gone (or anything that will close the vfio mdev 
> > file descriptor, I believe). In such scenarios it's really the userspace 
> > which is asking to release the device. Similar for remove, where the 
> > user has to explicitly write to the remove file for the mdev to invoke 
> > it. Under normal conditions no sane userspace should be doing 
> > release/remove while there are still on going I/Os :)
> > 
> > Me and Conny had some discussion on this in v1 of this patch:
> > https://marc.info/?l=kvm&m=155437117823248&w=2
> > 
> > > 
> > > The 'flush all I/O' parts in the commit message and in the code make
> > > this even more confusing.  
> > 
> > Maybe...if it's too confusing it could be fixed, but IMHO I don't think 
> > it's a dealbreaker. If anyone else thinks otherwise, I can go ahead and 
> > change it.
> 
> I think it's fine -- I wasn't confused.
> 

What I/O is flushed in the workqueue? I guess it is about the line

flush_workqueue(vfio_ccw_work_q);

But there is no I/O that can be flushed in vfio_ccw_work_q, but the
bottom half of the interrupt handler if you like. 

> > 
> > > 
> > > Another thing that I don't quite understand is injecting interrupts
> > > into QEMU for stuff that is actually not guest initiated.  
> > 
> > As mentioned above under normal conditions we shouldn't be doing 
> > quiesce. But wouldn't those interrupts just be unsolicited interrupts 
> > for the guest?
> 
> Yes, you simply cannot keep an enabled subchannel from getting status
> pending with unsolicited status.
> 

I don't think a status that results from a halt subchannel can be called
unsolicited. For example if no halt signal was issued, the halt remains
pending. IMHO it is, form a guest perspective to see a halt pending in an
IRB without having a HSCH executed.

> > 
> > > 
> > > Furthermore I find how cio_cancel_halt_clear() quite confusing. We
> 
> Well, that's a problem (if any) with the common I/O layer and beyond
> the scope of this patch.
> 
> > > TL;DR:
> > > 
> > > I welcome  this batch (with an r-b) but I would like the commit message
> 
> So, what does this sentence mean? Confused.
> 

s/batch/patch/ and the sentence misses 'is gone' at the very end.

> > > and the comment changed so that the misleading 'flush all I/O in the
> > > workqueue'.
> > > 
> > > I think 'vfio-ccw: fix cio_cancel_halt_clear() usage' would reflect the
> > > content of this patch better, because reasoning about the upper limit,
> > > and what happens if this upper limit is hit is not what this patch is
> > > about. It is about a client code bug that rendered iretry ineffective.
> > >   
> > 
> > I politely disagree with the change in subject line. I think the current 
> > subject line describe what we are trying to prevent with this patch. But 
> > again if anyone else feels otherwise, I will go ahead and change :)
> 
> No, I agree that the subject line is completely fine.
> 

This is the 'infinite' loop in question.

                iretry = 255;                                                                            
                                                                                                         
                ret = cio_cancel_halt_clear(sch, &iretry);                                               
                while (ret == -EBUSY) {                                                                  
                        /*                                                                               
                         * Flush all I/O and wait for                                                    
                         * cancel/halt/clear completion.                                                 
                         */                                                                              
                        private->completion = &completion;                                               
                        spin_unlock_irq(sch->lock);                                                      
                                                                                                         
                        wait_for_completion_timeout(&completion, 3*HZ);                                  
                                                                                                         
                        private->completion = NULL;                                                      
                        flush_workqueue(vfio_ccw_work_q);                                                
                        spin_lock_irq(sch->lock);                                                        
                        ret = cio_cancel_halt_clear(sch, &iretry);                                       
                };

Considering the documentation of cio_cancel_halt_clear() and without
fully understanding the implementation of cio_cancel_halt_clear() it
looks IMHO limited to 255 retries. But it is not. Because
cio_cancel_halt_clear() is used incorrectly.

Regarding the changed code, sorry, I missed the break on -EIO. With that
the new loop should indeed be limited to ~255 iterations.

Acked-by: Halil Pasic <pasic@linux.ibm.com>

Regards,
Halil

  reply	other threads:[~2019-04-24 10:02 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1555449329.git.alifm@linux.ibm.com>
2019-04-16 21:23 ` [PATCH v3 1/1] vfio-ccw: Prevent quiesce function going into an infinite loop Farhan Ali
2019-04-17  9:03   ` Cornelia Huck
2019-04-17 13:58     ` Eric Farman
2019-04-17 15:13       ` Halil Pasic
2019-04-17 15:18         ` Farhan Ali
2019-04-19 20:12           ` Halil Pasic
2019-04-22 14:01             ` Farhan Ali
2019-04-23 17:42               ` Halil Pasic
2019-04-23 19:41                 ` Farhan Ali
2019-04-23 20:37                   ` Eric Farman
2019-04-24  7:09                   ` Cornelia Huck
2019-04-24 10:02                     ` Halil Pasic [this message]
2019-04-24 10:21                   ` Halil Pasic
2019-04-18 14:36         ` Cornelia Huck
2019-04-17 14:02     ` Farhan Ali
2019-04-24 16:35   ` Cornelia Huck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190424120238.4e8e1c2f.pasic@linux.ibm.com \
    --to=pasic@linux.ibm.com \
    --cc=alifm@linux.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pmorel@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.