kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Halil Pasic <pasic@linux.ibm.com>
To: Farhan Ali <alifm@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	farman@linux.ibm.com, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
Date: Thu, 11 Jul 2019 16:57:03 +0200	[thread overview]
Message-ID: <20190711165703.3a1a8462.pasic@linux.ibm.com> (raw)
In-Reply-To: <87f7a37f-cc34-36fb-3a33-309e33bbbdde@linux.ibm.com>

On Tue, 9 Jul 2019 17:27:47 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> 
> 
> On 07/09/2019 10:21 AM, Halil Pasic wrote:
> > On Tue, 9 Jul 2019 09:46:51 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> > 
> >>
> >>
> >> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
> >>> On Mon,  8 Jul 2019 16:10:37 -0400
> >>> Farhan Ali <alifm@linux.ibm.com> wrote:
> >>>
> >>>> There is a small window where it's possible that we could be working
> >>>> on an interrupt (queued in the workqueue) and setting up a channel
> >>>> program (i.e allocating memory, pinning pages, translating address).
> >>>> This can lead to allocating and freeing the channel program at the
> >>>> same time and can cause memory corruption.
> >>>>
> >>>> Let's not call cp_free if we are currently processing a channel program.
> >>>> The only way we know for sure that we don't have a thread setting
> >>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
> >>>
> >>> Can we pinpoint a commit that introduced this bug, or has it been there
> >>> since the beginning?
> >>>
> >>
> >> I think the problem was always there.
> >>
> > 
> > I think it became relevant with the async stuff. Because after the async
> > stuff was added we start getting solicited interrupts that are not about
> > channel program is done. At least this is how I remember the discussion.
> > 

You seem to have ignored this comment. BTW wasn't the cp->is_initialized
make 'Make it safe to call the cp accessors in any case, so we can call
them unconditionally.'?

@Connie: Your opinion as the author of that patch and of the cited
sentence?

> >>>>
> >>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >>>> ---
> >>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> index 4e3a903..0357165 100644
> >>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
> >>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> >>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
> >>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
> >>>>    	if (scsw_is_solicited(&irb->scsw)) {
> >>>>    		cp_update_scsw(&private->cp, &irb->scsw);
> >>>> -		if (is_final)
> >>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> > 
> > Ain't private->state potentially used by multiple threads of execution?
> 
> yes
> 
> One of the paths I can think of is a machine check from the host which 
> will ultimately call vfio_ccw_sch_event callback which could set state 
> to NOT_OPER or IDLE.
> 
> > Do we need to use atomic operations or external synchronization to avoid
> > this being another gamble? Or am I missing something?
> 
> I think we probably should think about atomic operations for 
> synchronizing the state (and it could be a separate add on patch?).
> 
> But for preventing 2 threads from stomping on the cp the check should be 
> enough, unless I am missing something?
> 

Usually programming languages don't like incorrectly synchronized
programs. One tends to end up in undefined behavior land -- form language
perspective. That doesn't actually mean you are bound to see strange
stuff. With implementation spec + ABI spec + platform/architecture
spec one may end up with things being well defined. But it that is a much
deeper rabbit hole.

The nice thing about condition state == VFIO_CCW_STATE_CP_PENDING is
that it can tolerate stale state values. The bad case at hand
(you free but you should not) would be we see a stale
VFIO_CCW_STATE_CP_PENDING but we are actually
VFIO_CCW_STATE_CP_PROCESSING. That is pretty difficult to imagine
because one can enter VFIO_CCW_STATE_CP_PROCESSING only form
VFIO_CCW_STATE_CP_PENDING afair. On s390x torn reads/writes (i.e.
observing something that ain't either the old nor the new value) on an
int shouldn't be a concern.

The other bad case (where you don't free albeit you should) looks a
bit trickier.

I'm not a fan of keeping races around without good reasons. And I don't
see good reasons here. I'm no fan of needlessly complicated solutions
either.

But seems, at least with my beliefs about races, I'm the oddball
here. 

Regards,
Halil

> > 
> >>>>    			cp_free(&private->cp);
> >>>>    	}
> >>>>    	mutex_lock(&private->io_mutex);
> >>>
> >>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> >>>
> >>>
> >> Thanks for reviewing.
> >>
> >> Thanks
> >> Farhan
> > 
> > 


  parent reply	other threads:[~2019-07-11 14:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-08 20:10 [RFC v2 0/5] Some vfio-ccw fixes Farhan Ali
2019-07-08 20:10 ` [RFC v2 1/5] vfio-ccw: Fix misleading comment when setting orb.cmd.c64 Farhan Ali
2019-07-09  9:57   ` Cornelia Huck
2019-07-08 20:10 ` [RFC v2 2/5] vfio-ccw: Fix memory leak and don't call cp_free in cp_init Farhan Ali
2019-07-09 10:06   ` Cornelia Huck
2019-07-09 14:07     ` Farhan Ali
2019-07-09 14:18       ` Cornelia Huck
2019-07-08 20:10 ` [RFC v2 3/5] vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn Farhan Ali
2019-07-09 10:08   ` Cornelia Huck
2019-07-08 20:10 ` [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program Farhan Ali
2019-07-09 10:16   ` Cornelia Huck
2019-07-09 13:46     ` Farhan Ali
2019-07-09 14:21       ` Halil Pasic
2019-07-09 21:27         ` Farhan Ali
2019-07-10 13:45           ` Cornelia Huck
2019-07-10 16:10             ` Farhan Ali
2019-07-11 12:28               ` Eric Farman
2019-07-11 14:57           ` Halil Pasic [this message]
2019-07-11 20:09             ` Eric Farman
2019-07-12 13:59               ` Halil Pasic
2019-07-08 20:10 ` [RFC v2 5/5] vfio-ccw: Update documentation for csch/hsch Farhan Ali
2019-07-09 10:14   ` Cornelia Huck
2019-07-09 12:47     ` Farhan Ali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190711165703.3a1a8462.pasic@linux.ibm.com \
    --to=pasic@linux.ibm.com \
    --cc=alifm@linux.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).