From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Gonda <pgonda@google.com>
Cc: John Sperbeck <jsperbeck@google.com>,
	kvm list <kvm@vger.kernel.org>,
	David Rientjes <rientjes@google.com>,
	Sean Christopherson <seanjc@google.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3] KVM: SEV: Mark nested locking of vcpu->lock
Date: Fri, 29 Apr 2022 17:38:16 +0200
Message-ID: <8a2c5f8c-503c-b4f0-75e7-039533c9852d@redhat.com>
In-Reply-To: <CAMkAt6ragq4OmnX+n628Yd5pn51qFv4qV20upGR6tTvyYw3U5A@mail.gmail.com>

On 4/29/22 17:35, Peter Gonda wrote:
> On Thu, Apr 28, 2022 at 5:59 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 4/28/22 23:28, Peter Gonda wrote:
>>>
>>> So when actually trying this out I noticed that we are mock-releasing
>>> the current vcpu's lock, but we haven't actually taken that lock yet.
>>> So we'd need to maintain a prev_* pointer and release that one instead.
>>
>> Not entirely true, because all vcpu->mutex.dep_maps will be for the same
>> lock class.  The dep_map is essentially a fancy string, in this case
>> "&vcpu->mutex".
>>
>> See the definition of mutex_init:
>>
>> #define mutex_init(mutex)                                              \
>> do {                                                                   \
>>           static struct lock_class_key __key;                            \
>>                                                                          \
>>           __mutex_init((mutex), #mutex, &__key);                         \
>> } while (0)
>>
>> and the dep_map field is initialized with
>>
>>           lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_SLEEP);
>>
>> (i.e. all vcpu->mutexes share the same name and key because they have a
>> single mutex_init-ialization site).  Lockdep is as crude in theory as it
>> is effective in practice!
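>>
>> For instance, with a made-up struct (foo here is purely illustrative,
>> not kernel code):
>>
>>	struct foo {
>>		struct mutex lock;
>>	};
>>
>>	static void foo_init(struct foo *f)
>>	{
>>		/*
>>		 * Every foo's mutex is initialized at this single site, so
>>		 * all instances share the one static __key and the name
>>		 * "&f->lock": a single lock class, just like "&vcpu->mutex".
>>		 */
>>		mutex_init(&f->lock);
>>	}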
>>
>>>
>>>            bool acquired = false;
>>>            kvm_for_each_vcpu(...) {
>>>                    if (!acquired) {
>>>                            if (mutex_lock_killable_nested(&vcpu->mutex, role))
>>>                                    goto out_unlock;
>>>                            acquired = true;
>>>                    } else {
>>>                            if (mutex_lock_killable(&vcpu->mutex))
>>>                                    goto out_unlock;
>>>                    }
>>>            }
>>
>> This will cause a lockdep splat because it uses subclass 0.  The
>> *_nested functions exist precisely to let you specify a subclass other
>> than zero.
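>>
>> For example (sketch only; SINGLE_DEPTH_NESTING is just subclass 1):
>>
>>	if (mutex_lock_killable_nested(&a->mutex, 0))
>>		return -EINTR;
>>	if (mutex_lock_killable_nested(&b->mutex, SINGLE_DEPTH_NESTING))
>>		goto unlock_a;
>>
>> is fine even though a->mutex and b->mutex are in the same class, while
>> two plain mutex_lock_killable() calls would both use subclass 0 and
>> produce exactly that splat.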
> 
> OK got it. I now have this to lock:
> 
>           kvm_for_each_vcpu(i, vcpu, kvm) {
>                   if (prev_vcpu != NULL) {
>                           mutex_release(&prev_vcpu->mutex.dep_map, _THIS_IP_);
>                           prev_vcpu = NULL;
>                   }
> 
>                   if (mutex_lock_killable_nested(&vcpu->mutex, role))
>                           goto out_unlock;
>                   prev_vcpu = vcpu;
>           }
> 
> But I've noticed the unlocking happens in the wrong order: since we are
> locking with kvm_for_each_vcpu(), I think we'd need a
> kvm_for_each_vcpu_rev() or something to unlock in reverse, which may be
> a bit of work:
> https://elixir.bootlin.com/linux/latest/source/lib/xarray.c#L1119.

No, you don't need any of this.  You can rely on there being only one 
depmap, otherwise you wouldn't need the mock releases and acquires at 
all.  Also the unlocking order does not matter for deadlocks, only the 
locking order does.  You're overdoing it. :)
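
Since all the dep_maps are interchangeable you can even drop prev_vcpu
and mock-release the lock you have just taken, for every vcpu but the
first.  Untested sketch (each loop would live in its own lock/unlock
helper, with role plumbed through):

	bool first = true;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (mutex_lock_killable_nested(&vcpu->mutex, role))
			goto out_unlock;
		/* Tell lockdep we released it; only the first stays held. */
		if (first)
			first = false;
		else
			mutex_release(&vcpu->mutex.dep_map, _THIS_IP_);
	}

and on the unlock side mock-acquire it back just before unlocking:

	bool first = true;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (first)
			first = false;
		else
			mutex_acquire(&vcpu->mutex.dep_map, role, 0, _THIS_IP_);
		mutex_unlock(&vcpu->mutex);
	}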

Paolo


Thread overview: 23+ messages
2022-04-07 19:59 [PATCH v3] KVM: SEV: Mark nested locking of vcpu->lock Peter Gonda
2022-04-07 21:17 ` John Sperbeck
2022-04-08 15:08   ` Peter Gonda
2022-04-20 20:14     ` Peter Gonda
2022-04-21 15:56       ` Paolo Bonzini
2022-04-26 19:06         ` Peter Gonda
2022-04-27 16:04           ` Paolo Bonzini
2022-04-27 20:18             ` Peter Gonda
2022-04-28 21:28               ` Peter Gonda
2022-04-28 23:59                 ` Paolo Bonzini
2022-04-29 15:35                   ` Peter Gonda
2022-04-29 15:38                     ` Paolo Bonzini [this message]
2022-04-29 15:51                       ` Peter Gonda
2022-04-29 15:58                         ` Paolo Bonzini
2022-04-29 17:12                           ` Peter Gonda
2022-04-29 17:21                             ` Paolo Bonzini
2022-04-29 17:27                               ` Peter Gonda
2022-04-29 17:32                                 ` Paolo Bonzini
2022-04-29 17:33                                   ` Peter Gonda
     [not found]                 ` <20220429010312.4013-1-hdanton@sina.com>
2022-04-29  8:48                   ` Paolo Bonzini
     [not found]                   ` <20220429114012.4127-1-hdanton@sina.com>
2022-04-29 13:44                     ` Paolo Bonzini
     [not found]                     ` <20220430015008.4257-1-hdanton@sina.com>
2022-04-30  8:11                       ` Paolo Bonzini
2022-04-30  8:11                       ` Paolo Bonzini
