kvm.vger.kernel.org archive mirror
From: Farhan Ali <alifm@linux.ibm.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: Halil Pasic <pasic@linux.ibm.com>,
	farman@linux.ibm.com, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
Date: Wed, 10 Jul 2019 12:10:20 -0400
Message-ID: <75e71cc4-7552-b9e5-5649-4de2cdd8f59a@linux.ibm.com>
In-Reply-To: <20190710154549.5c31cc0c.cohuck@redhat.com>



On 07/10/2019 09:45 AM, Cornelia Huck wrote:
> On Tue, 9 Jul 2019 17:27:47 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> On 07/09/2019 10:21 AM, Halil Pasic wrote:
>>> On Tue, 9 Jul 2019 09:46:51 -0400
>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>    
>>>>
>>>>
>>>> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
>>>>> On Mon,  8 Jul 2019 16:10:37 -0400
>>>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>>>   
>>>>>> There is a small window where it's possible that we could be working
>>>>>> on an interrupt (queued in the workqueue) and setting up a channel
>>>>>> program (i.e. allocating memory, pinning pages, translating addresses).
>>>>>> This can lead to allocating and freeing the channel program at the
>>>>>> same time and can cause memory corruption.
>>>>>>
>>>>>> Let's not call cp_free if we are currently processing a channel program.
>>>>>> The only way we know for sure that we don't have a thread setting
>>>>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
>>>>>
>>>>> Can we pinpoint a commit that introduced this bug, or has it been there
>>>>> since the beginning?
>>>>>   
>>>>
>>>> I think the problem was always there.
>>>>   
>>>
>>> I think it became relevant with the async stuff. Because after the async
>>> stuff was added we started getting solicited interrupts that do not mean
>>> the channel program is done. At least this is how I remember the discussion.
>>>    
>>>>>>
>>>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>>>> ---
>>>>>>     drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> index 4e3a903..0357165 100644
>>>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>>>>>     		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>>>     	if (scsw_is_solicited(&irb->scsw)) {
>>>>>>     		cp_update_scsw(&private->cp, &irb->scsw);
>>>>>> -		if (is_final)
>>>>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
>>>
>>> Ain't private->state potentially used by multiple threads of execution?
>>
>> yes
>>
>> One of the paths I can think of is a machine check from the host which
>> will ultimately call vfio_ccw_sch_event callback which could set state
>> to NOT_OPER or IDLE.
> 
> Now I went through the machine check rabbit hole because I thought
> freeing the cp in there might be a good idea, but it's not that easy
> (who'd have thought...)

Thanks for taking a deeper look :)

> 
> If I read the POP correctly, an IPI or IPR in the subchannel CRW will
> indicate that the subchannel has been restored to a state after an I/O
> reset; in particular, that means that the subchannel does not have any
> I/O pending. However, that does not seem to be the case e.g. for an IPM
> (the doc does not seem to be very clear on that, though.) We can't
> unconditionally do something, as we do not know what event we're being
> called for (please disregard the positively ancient "we're called for
> IPI" comment in css_process_crw(), I think I added that one in the
> Linux 2.4 or 2.5 timeframe...) tl;dr We can't rely on anything...

Yes, the CRW infrastructure in Linux does not convey the exact event 
back to the subchannel driver.

> 
>>
>>> Do we need to use atomic operations or external synchronization to avoid
>>> this being another gamble? Or am I missing something?
>>
>> I think we probably should think about atomic operations for
>> synchronizing the state (and it could be a separate add on patch?).
> 
> +1 to thinking about some atomicity changes later.
> 
>>
>> But for preventing 2 threads from stomping on the cp the check should be
>> enough, unless I am missing something?
> 
> I think so. Plus, the patch is small enough that we can merge it right
> away, and figure out a more generic change later.

I will send out a v3 soon if no one else has any other suggestions.
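
For reference, the intent of the check can be modeled in userspace roughly
as below. This is only a sketch under assumptions: the model_* names, the
simplified state enum and the guest_data buffer are hypothetical stand-ins,
not the driver's actual definitions, and the model ignores locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the vfio-ccw state machine. */
enum model_state {
	MODEL_STATE_IDLE,
	MODEL_STATE_CP_PROCESSING,	/* a thread is still building the cp */
	MODEL_STATE_CP_PENDING,		/* cp was handed to the hardware */
};

/* Hypothetical stand-in for the translated channel program. */
struct model_cp {
	void *guest_data;
	bool initialized;
};

struct model_private {
	enum model_state state;
	struct model_cp cp;
};

/* Release the cp resources; safe to call on an uninitialized cp. */
void model_cp_free(struct model_cp *cp)
{
	assert(cp != NULL);
	if (!cp->initialized)
		return;
	free(cp->guest_data);
	cp->guest_data = NULL;
	cp->initialized = false;
}

/*
 * Models the solicited-interrupt branch of the todo work: free the cp
 * only when the interrupt is final AND no thread can still be setting
 * up a channel program, i.e. the state is CP_PENDING.
 * Returns true if the cp was freed.
 */
bool model_io_todo(struct model_private *p, bool is_final)
{
	if (is_final && p->state == MODEL_STATE_CP_PENDING) {
		model_cp_free(&p->cp);
		return true;
	}
	return false;
}
```

In this model a final interrupt that arrives while the state is still
CP_PROCESSING leaves the cp alone, and the same call frees it once the
state is CP_PENDING. It says nothing about the atomicity of the state
itself, which is the part we would look at separately.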

> 
>>
>>>    
>>>>>>     			cp_free(&private->cp);
>>>>>>     	}
>>>>>>     	mutex_lock(&private->io_mutex);
>>>>>
>>>>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>>>>>
>>>>>   
>>>> Thanks for reviewing.
>>>>
>>>> Thanks
>>>> Farhan
>>>
>>>    
> 
> 

