From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933470AbdLRJZG (ORCPT <rfc822;w@1wt.eu>);
        Mon, 18 Dec 2017 04:25:06 -0500
Received: from mail-wm0-f68.google.com ([74.125.82.68]:40441 "EHLO
        mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1758195AbdLRJZB (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 18 Dec 2017 04:25:01 -0500
X-Google-Smtp-Source: ACJfBotdbhu5aIZEnFyKUPWeErKKNPhikWWEfBSlPSP2ZPE/q10c8mXa8PMyl2wVTvp0ALAASBUXng==
Subject: Re: [PATCH] KVM/Eventfd: Avoid crash when assign and deassign same
 eventfd in parallel.
To: David Hildenbrand <david@redhat.com>,
        Lan Tianyu <tianyu.lan@intel.com>
Cc: rkrcmar@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
        dvyukov@google.com, kernellwp@gmail.com
References: <1513554007-12302-1-git-send-email-tianyu.lan@intel.com>
 <d47054f9-16ee-5ef2-4024-935683ce8dfa@redhat.com>
 <f6337d64-d41a-aba1-d9da-3f646b033a8d@redhat.com>
 <fc20c4ec-acb2-00d3-ee78-a3eec2220288@redhat.com>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <74af03d3-d1c2-a8b3-3f8b-b80ac0eee461@redhat.com>
Date: Mon, 18 Dec 2017 10:24:58 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <fc20c4ec-acb2-00d3-ee78-a3eec2220288@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 18/12/2017 10:08, David Hildenbrand wrote:
> On 18.12.2017 09:50, Paolo Bonzini wrote:
>> On 18/12/2017 09:30, David Hildenbrand wrote:
>>> The ugly thing in kvm_irqfd_assign() is that we access irqfd without
>>> holding a lock. I think that should rather be fixed than working around
>>> that issue. (e.g. lock() -> lookup again -> verify still in list ->
>>> unlock())
>>
>> I wonder if it's even simpler:
>>
>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>> index f2ac53ab8243..17ed298bd66f 100644
>> --- a/virt/kvm/eventfd.c
>> +++ b/virt/kvm/eventfd.c
>> @@ -387,7 +387,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>>  
>>  	idx = srcu_read_lock(&kvm->irq_srcu);
>>  	irqfd_update(kvm, irqfd);
>> -	srcu_read_unlock(&kvm->irq_srcu, idx);
>>  
>>  	list_add_tail(&irqfd->list, &kvm->irqfds.items);
>>  
>> @@ -420,10 +419,12 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>>  				irqfd->consumer.token, ret);
>>  	}
>>  #endif
>> +	srcu_read_unlock(&kvm->irq_srcu, idx);
>>  
> 
> Was worried about the poll() call. But if that works, it would be very nice.

Good point.

The poll() call is effectively a callback to irqfd_ptable_queue_proc.
So, after the above change, irqfd_wakeup takes irq_srcu inside
wqh->lock, while kvm_irqfd_assign would take them in the opposite order.

However, this is a read-side critical section so this doesn't cause a
deadlock directly.  The effect is only that synchronize_srcu would now
wait for wqh->lock to be released.  The opposite, which *would* cause a
deadlock, would be a call to synchronize_srcu while wqh->lock is held.

This cannot happen, though, because wqh->lock is a spinlock and
synchronize_srcu, which sleeps, cannot be called at all while wqh->lock
is held.  So I think it's okay.
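
To make the ordering argument concrete, here is a toy, single-threaded
model in plain C.  It is only a sketch: struct model, the model_*
helpers and the enum are made up for illustration and are not kernel
APIs, and the model does not exercise real concurrency.  It only encodes
the three rules used above: an SRCU read-side section never blocks, a
grace period waits for readers, and a grace period may not be started
under a spinlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model.  None of these names are kernel APIs. */
struct model {
	int srcu_readers;	/* open irq_srcu read-side sections */
	bool wqh_lock_held;	/* stands in for the wqh->lock spinlock */
};

enum sync_result { SYNC_DONE, SYNC_MUST_WAIT, SYNC_INVALID_IN_ATOMIC };

/* Entering a read-side section never blocks, whatever else is held. */
static void model_srcu_read_lock(struct model *m)
{
	m->srcu_readers++;
}

static void model_srcu_read_unlock(struct model *m)
{
	m->srcu_readers--;
}

/* Returns false if the lock is already held (a real spinlock would spin). */
static bool model_spin_lock(struct model *m)
{
	if (m->wqh_lock_held)
		return false;
	m->wqh_lock_held = true;
	return true;
}

static void model_spin_unlock(struct model *m)
{
	m->wqh_lock_held = false;
}

/*
 * A grace period sleeps, so it is invalid under a spinlock; otherwise it
 * finishes only once no reader is left.
 */
static enum sync_result model_synchronize_srcu(const struct model *m)
{
	if (m->wqh_lock_held)
		return SYNC_INVALID_IN_ATOMIC;
	return m->srcu_readers > 0 ? SYNC_MUST_WAIT : SYNC_DONE;
}

/* Order in kvm_irqfd_assign after the change: irq_srcu, then wqh->lock. */
static bool assign_order(struct model *m)
{
	bool ok;

	model_srcu_read_lock(m);
	ok = model_spin_lock(m);	/* poll() -> queue_proc callback */
	if (ok)
		model_spin_unlock(m);
	model_srcu_read_unlock(m);
	return ok;
}

/* Order in irqfd_wakeup: wqh->lock (taken by the waker), then irq_srcu. */
static bool wakeup_order(struct model *m)
{
	if (!model_spin_lock(m))
		return false;
	model_srcu_read_lock(m);	/* cannot block */
	model_srcu_read_unlock(m);
	model_spin_unlock(m);
	return true;
}
```

Calling assign_order() and wakeup_order() on a zero-initialized struct
model both succeed, while model_synchronize_srcu() reports
SYNC_INVALID_IN_ATOMIC whenever the modeled spinlock is held, which is
the only combination that could have closed the cycle.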

Thanks,

Paolo

> 
>>  	return 0;
>>  
>>  fail:
>> +	/* irq_srcu is *not* held here.  */
>>  	if (irqfd->resampler)
>>  		irqfd_resampler_shutdown(irqfd);
>>  
>>
>> Thanks,
>>
>> Paolo
>>
> 
>