All of lore.kernel.org
* kvm_read_guest_page() missing kvm->srcu read lock?
@ 2018-05-10 17:41 Andre Przywara
  2018-05-10 17:43 ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Andre Przywara @ 2018-05-10 17:41 UTC (permalink / raw)
  To: Paolo Bonzini, kvm; +Cc: Marc Zyngier, Jan Glauber, kvmarm

Hi,

Jan posted a lockdep splat complaining about a suspicious
rcu_dereference_check:
https://lists.cs.columbia.edu/pipermail/kvmarm/2018-May/031116.html

The gist of that is:
...
[ 1025.695517]  dump_stack+0x9c/0xd4
[ 1025.695524]  lockdep_rcu_suspicious+0xcc/0x118
[ 1025.695537]  gfn_to_memslot+0x174/0x190
[ 1025.695546]  kvm_read_guest+0x50/0xb0
[ 1025.695553]  vgic_its_check_id.isra.0+0x114/0x148
...
I chased that down and wonder whether kvm_read_guest{,_page}() is
supposed to be called inside a kvm->srcu read-side critical section.

We have a check that suggests that eventually someone needs to enter the
SRCU critical section:
static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm,
						  int as_id)
{
        return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
                        lockdep_is_held(&kvm->slots_lock) ||
                        !refcount_read(&kvm->users_count));
}

If I get this correctly, this means that for accessing kvm->memslots we
either need to be inside an SRCU read-side critical section or hold
kvm->slots_lock (for updates only).

If I am not mistaken, it is not necessary for *callers* of
kvm_read_guest_page() to do this, as this could be entirely contained
inside this function - since we only use the reference to the memslot
entry while doing the copy_from_user(), and the data is safe afterwards
from an RCU point of view because it has been *copied*.

Does that sound correct? Has anyone run a lockdep enabled kernel with
PROVE_RCU before and stumbled upon this message?
I am a bit doubtful that this is valid because this is generic KVM code
and should trigger with every kvm_read_guest() user.

With my limited understanding I would say that we just need an
srcu_read_lock() call at the beginning of kvm_read_guest_page() and the
corresponding srcu_read_unlock() call at the end of that function. Is
that enough?
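
For illustration, the change I have in mind would look roughly like this
(just a sketch against virt/kvm/kvm_main.c, not compile-tested):

```c
int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
			int len)
{
	struct kvm_memory_slot *slot;
	int idx, ret;

	idx = srcu_read_lock(&kvm->srcu);	/* enter SRCU read side */
	slot = gfn_to_memslot(kvm, gfn);	/* memslot deref now safe */
	ret = __kvm_read_guest_page(slot, gfn, data, offset, len);
	srcu_read_unlock(&kvm->srcu, idx);	/* data has been copied out */

	return ret;
}
```

The point being that the memslot reference never leaves the critical
section, only the copied data does.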

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: kvm_read_guest_page() missing kvm->srcu read lock?
  2018-05-10 17:41 kvm_read_guest_page() missing kvm->srcu read lock? Andre Przywara
@ 2018-05-10 17:43 ` Paolo Bonzini
  2018-05-11 11:02   ` Andre Przywara
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2018-05-10 17:43 UTC (permalink / raw)
  To: Andre Przywara, kvm; +Cc: Marc Zyngier, Jan Glauber, kvmarm

On 10/05/2018 19:41, Andre Przywara wrote:
> Hi,
> 
> Jan posted a lockdep splat complaining about a suspicious
> rcu_dereference_check:
> https://lists.cs.columbia.edu/pipermail/kvmarm/2018-May/031116.html
> 
> The gist of that is:
> ...
> [ 1025.695517]  dump_stack+0x9c/0xd4
> [ 1025.695524]  lockdep_rcu_suspicious+0xcc/0x118
> [ 1025.695537]  gfn_to_memslot+0x174/0x190
> [ 1025.695546]  kvm_read_guest+0x50/0xb0
> [ 1025.695553]  vgic_its_check_id.isra.0+0x114/0x148
> ...
> I chased that down and wonder if kvm_read_guest{,_page} is supposed to
> be called inside a kvm->srcu critical section?
> 
> We have a check that suggests that eventually someone needs to enter the
> SRCU critical section:
> static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm,
> 						  int as_id)
> {
>         return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
>                         lockdep_is_held(&kvm->slots_lock) ||
>                         !refcount_read(&kvm->users_count));
> }
> 
> If I get this correctly this means for accessing kvm->memslots we either
> need to be inside an srcu critical section or hold the kvm->slots_lock
> (for updates only).
> 
> If I am not mistaken, it is not necessary for *callers* of
> kvm_read_guest_page() to do this, as this could be entirely contained
> inside this function - since we only use the reference to the memslot
> entry while doing the copy_from_user(), and the data is safe afterwards
> from an RCU point of view because it has been *copied*.

Yes, it's the caller's responsibility.  srcu_read_lock/unlock is pretty
expensive so KVM assumes that the topmost callers do it.

Paolo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: kvm_read_guest_page() missing kvm->srcu read lock?
  2018-05-10 17:43 ` Paolo Bonzini
@ 2018-05-11 11:02   ` Andre Przywara
  2018-05-11 11:43     ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Andre Przywara @ 2018-05-11 11:02 UTC (permalink / raw)
  To: Paolo Bonzini, kvm; +Cc: Marc Zyngier, Jan Glauber, kvmarm

Hi Paolo,

thanks for the answer!
Took me a bit, but I think you are right (see below).

On 10/05/18 18:43, Paolo Bonzini wrote:
> On 10/05/2018 19:41, Andre Przywara wrote:
>> Hi,
>>
>> Jan posted a lockdep splat complaining about a suspicious
>> rcu_dereference_check:
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2018-May/031116.html
>>
>> The gist of that is:
>> ...
>> [ 1025.695517]  dump_stack+0x9c/0xd4
>> [ 1025.695524]  lockdep_rcu_suspicious+0xcc/0x118
>> [ 1025.695537]  gfn_to_memslot+0x174/0x190
>> [ 1025.695546]  kvm_read_guest+0x50/0xb0
>> [ 1025.695553]  vgic_its_check_id.isra.0+0x114/0x148
>> ...
>> I chased that down and wonder if kvm_read_guest{,_page} is supposed to
>> be called inside a kvm->srcu critical section?
>>
>> We have a check that suggests that eventually someone needs to enter the
>> SRCU critical section:
>> static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm,
>> 						  int as_id)
>> {
>>         return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
>>                         lockdep_is_held(&kvm->slots_lock) ||
>>                         !refcount_read(&kvm->users_count));
>> }
>>
>> If I get this correctly this means for accessing kvm->memslots we either
>> need to be inside an srcu critical section or hold the kvm->slots_lock
>> (for updates only).
>>
>> If I am not mistaken, it is not necessary for *callers* of
>> kvm_read_guest_page() to do this, as this could be entirely contained
>> inside this function - since we only use the reference to the memslot
>> entry while doing the copy_from_user(), and the data is safe afterwards
>> from an RCU point of view because it has been *copied*.
> 
> Yes, it's the caller's responsibility.  srcu_read_lock/unlock is pretty
> expensive

Is that so? I was under the impression that declaring RCU critical
sections is very cheap, is that different with SRCU?

> so KVM assumes that the topmost callers do it.

OK, fair enough. And with some hints from Jörg I understand now that x86
and s390 do a "srcu_read_lock(&kvm->srcu);" right after leaving the
guest and unlock it only shortly before entering again, so that any
intermediate calls are protected. That leaves the locking duty only up
to calls originating from userspace.
But AFAICT none of mips, powerpc, or arm/arm64 is doing this. I am
checking now whether this is an omission or whether they are really
doing fine-grained locking for all memslots accesses.

We are definitely missing it around kvm_read_guest_page(), so I need to
either add it there or do it like x86/s390.

Thanks!
Andre.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: kvm_read_guest_page() missing kvm->srcu read lock?
  2018-05-11 11:02   ` Andre Przywara
@ 2018-05-11 11:43     ` Paolo Bonzini
  2018-05-11 13:25       ` Andre Przywara
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2018-05-11 11:43 UTC (permalink / raw)
  To: Andre Przywara, kvm
  Cc: James Hogan, Marc Zyngier, Jan Glauber, Paul Mackerras, kvmarm,
	David Gibson

On 11/05/2018 13:02, Andre Przywara wrote:
> Hi Paolo,
> 
> thanks for the answer!
> Took me a bit, but I think you are right (see below).
> 
> On 10/05/18 18:43, Paolo Bonzini wrote:
>> On 10/05/2018 19:41, Andre Przywara wrote:
>>> Hi,
>>>
>>> Jan posted a lockdep splat complaining about a suspicious
>>> rcu_dereference_check:
>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2018-May/031116.html
>>>
>>> The gist of that is:
>>> ...
>>> [ 1025.695517]  dump_stack+0x9c/0xd4
>>> [ 1025.695524]  lockdep_rcu_suspicious+0xcc/0x118
>>> [ 1025.695537]  gfn_to_memslot+0x174/0x190
>>> [ 1025.695546]  kvm_read_guest+0x50/0xb0
>>> [ 1025.695553]  vgic_its_check_id.isra.0+0x114/0x148
>>> ...
>>> I chased that down and wonder if kvm_read_guest{,_page} is supposed to
>>> be called inside a kvm->srcu critical section?
>>>
>>> We have a check that suggests that eventually someone needs to enter the
>>> SRCU critical section:
>>> static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm,
>>> 						  int as_id)
>>> {
>>>         return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
>>>                         lockdep_is_held(&kvm->slots_lock) ||
>>>                         !refcount_read(&kvm->users_count));
>>> }
>>>
>>> If I get this correctly this means for accessing kvm->memslots we either
>>> need to be inside an srcu critical section or hold the kvm->slots_lock
>>> (for updates only).
>>>
>>> If I am not mistaken, it is not necessary for *callers* of
>>> kvm_read_guest_page() to do this, as this could be entirely contained
>>> inside this function - since we only use the reference to the memslot
>>> entry while doing the copy_from_user(), and the data is safe afterwards
>>> from an RCU point of view because it has been *copied*.
>>
>> Yes, it's the caller's responsibility.  srcu_read_lock/unlock is pretty
>> expensive
> 
> Is that so? I was under the impression that declaring RCU critical
> sections is very cheap, is that different with SRCU?

Yes, because RCU effectively lets the scheduler do the expensive parts.
With SRCU you have to do them yourself, with the advantage that: 1) you
can sleep inside SRCU critical sections; 2) synchronize_srcu is much
cheaper than synchronize_rcu and synchronize_sched.

It is still relatively cheap, and it doesn't serialize against writers,
but the order of magnitude is 100 clock cycles for each of lock and
unlock.  Compared with rcu_read_lock/unlock, which are nops on any
kernel but PREEMPT_RT, that counts as expensive. :)

>> so KVM assumes that the topmost callers do it.
> 
> OK, fair enough. And with some hints from Jörg I understand now that x86
> and s390 do a "srcu_read_lock(&kvm->srcu);" right after leaving the
> guest and unlock it only shortly before entering again, so that any
> intermediate calls are protected. That leaves the locking duty only up
> to calls originating from userspace.
> But AFAICT none of mips, powerpc, or arm/arm64 is doing this. I am
> checking now whether this is an omission or whether they are really
> doing fine grained locking for all memslots accesses.

Ok, let me Cc the maintainers.  I suppose at least some of them do use
lockdep from time to time, but it is certainly possible that some cases
have been missed.

Adding the srcu_read_lock/unlock directly in kvm_arch_vcpu_ioctl_run and
any other ioctls that need it is best, but in any case adding more pairs
is safe because they can be nested.
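
Roughly, the x86/s390-style pattern in kvm_arch_vcpu_ioctl_run looks
like this (a sketch only; details differ per architecture, and the lock
is dropped around the actual guest entry):

```c
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	struct kvm *kvm = vcpu->kvm;
	int ret, idx;

	idx = srcu_read_lock(&kvm->srcu);
	for (;;) {
		/*
		 * Exit handling, MMIO emulation etc. run with kvm->srcu
		 * held, so kvm_read_guest() & friends are covered.
		 */
		ret = handle_exit(vcpu, run);	/* arch-specific */
		if (ret <= 0)
			break;
		srcu_read_unlock(&kvm->srcu, idx);
		/* enter the guest without holding the SRCU read lock */
		ret = enter_guest(vcpu);	/* arch-specific */
		idx = srcu_read_lock(&kvm->srcu);
	}
	srcu_read_unlock(&kvm->srcu, idx);
	return ret;
}
```

handle_exit/enter_guest are stand-ins for the respective per-architecture
functions, not real symbols.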

Thanks for the report!

Paolo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: kvm_read_guest_page() missing kvm->srcu read lock?
  2018-05-11 11:43     ` Paolo Bonzini
@ 2018-05-11 13:25       ` Andre Przywara
  2018-05-11 16:44         ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Andre Przywara @ 2018-05-11 13:25 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: James Hogan, Marc Zyngier, Jan Glauber, Paul Mackerras, kvmarm,
	David Gibson

Hi,

On 11/05/18 12:43, Paolo Bonzini wrote:
> On 11/05/2018 13:02, Andre Przywara wrote:
>> Hi Paolo,
>>
>> thanks for the answer!
>> Took me a bit, but I think you are right (see below).
>>
>> On 10/05/18 18:43, Paolo Bonzini wrote:
>>> On 10/05/2018 19:41, Andre Przywara wrote:
>>>> Hi,
>>>>
>>>> Jan posted a lockdep splat complaining about a suspicious
>>>> rcu_dereference_check:
>>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2018-May/031116.html
>>>>
>>>> The gist of that is:
>>>> ...
>>>> [ 1025.695517]  dump_stack+0x9c/0xd4
>>>> [ 1025.695524]  lockdep_rcu_suspicious+0xcc/0x118
>>>> [ 1025.695537]  gfn_to_memslot+0x174/0x190
>>>> [ 1025.695546]  kvm_read_guest+0x50/0xb0
>>>> [ 1025.695553]  vgic_its_check_id.isra.0+0x114/0x148
>>>> ...
>>>> I chased that down and wonder if kvm_read_guest{,_page} is supposed to
>>>> be called inside a kvm->srcu critical section?
>>>>
>>>> We have a check that suggests that eventually someone needs to enter the
>>>> SRCU critical section:
>>>> static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm,
>>>> 						  int as_id)
>>>> {
>>>>         return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
>>>>                         lockdep_is_held(&kvm->slots_lock) ||
>>>>                         !refcount_read(&kvm->users_count));
>>>> }
>>>>
>>>> If I get this correctly this means for accessing kvm->memslots we either
>>>> need to be inside an srcu critical section or hold the kvm->slots_lock
>>>> (for updates only).
>>>>
>>>> If I am not mistaken, it is not necessary for *callers* of
>>>> kvm_read_guest_page() to do this, as this could be entirely contained
>>>> inside this function - since we only use the reference to the memslot
>>>> entry while doing the copy_from_user(), and the data is safe afterwards
>>>> from an RCU point of view because it has been *copied*.
>>>
>>> Yes, it's the caller's responsibility.  srcu_read_lock/unlock is pretty
>>> expensive
>>
>> Is that so? I was under the impression that declaring RCU critical
>> sections is very cheap, is that different with SRCU?
> 
> Yes, because RCU effectively lets the scheduler do the expensive parts.
> With SRCU you have to do them yourself, with the advantage that: 1) you
> can sleep inside SRCU critical sections; 2) synchronize_srcu is much
> cheaper than synchronize_rcu and synchronize_sched.
> 
> It is still relatively cheap, and it doesn't serialize against writers,
> but the order of magnitude is 100 clock cycles for each of lock and
> unlock.  Compared with rcu_read_lock/unlock, which are nops on any
> kernel but PREEMPT_RT, that counts as expensive. :)
> 
>>> so KVM assumes that the topmost callers do it.
>>
>> OK, fair enough. And with some hints from Jörg I understand now that x86
>> and s390 do a "srcu_read_lock(&kvm->srcu);" right after leaving the
>> guest and unlock it only shortly before entering again, so that any
>> intermediate calls are protected. That leaves the locking duty only up
>> to calls originating from userspace.
>> But AFAICT none of mips, powerpc, or arm/arm64 is doing this. I am
>> checking now whether this is an omission or whether they are really
>> doing fine grained locking for all memslots accesses.
> 
> Ok, let me Cc the maintainers.  I suppose at least some of them do use
> lockdep from time to time, but it is certainly possible that some cases
> have been missed.

Thanks for that. As far as I can see, mips seems to be safe, because
they don't use kvm_read_guest() and the other memslot references seem to
be properly protected. So James can enjoy this weekend ;-)

For powerpc it's a bit more complex, I tried to chase down at least the
four users of kvm_read_guest():
- The one in arch/powerpc/kvm/book3s_rtas.c is safe.
- The two users in arch/powerpc/kvm/book3s_64_mmu_radix.c don't seem to
take any locks, but are only called when creating the VM, so that's
supposedly somewhat safe (?)
- I couldn't find any protection for the usage in
arch/powerpc/kvm/powerpc.c, but the call chain is quite convoluted
there, so I might have missed something. It would be good if someone
more familiar with this code would take a look.

> Adding the srcu_read_lock/unlock directly in kvm_arch_vcpu_ioctl_run and
> any other ioctls that need it is best, but in any case adding more pairs
> is safe because they can be nested.

So I added a small wrapper around kvm_read_guest(), which takes and
drops the lock. Will send out the patch shortly. If powerpc needs it, I
am happy to provide this wrapper in kvm_main.c instead of some arm
header file.
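
The wrapper I have in mind is just this (sketch; name and location still
up for discussion):

```c
static inline int kvm_read_guest_lock(struct kvm *kvm, gpa_t gpa,
				      void *data, unsigned long len)
{
	int srcu_idx = srcu_read_lock(&kvm->srcu);
	int ret = kvm_read_guest(kvm, gpa, data, len);

	srcu_read_unlock(&kvm->srcu, srcu_idx);
	return ret;
}
```

Since SRCU read-side sections nest, callers that already hold kvm->srcu
can use it as well without harm.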

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: kvm_read_guest_page() missing kvm->srcu read lock?
  2018-05-11 13:25       ` Andre Przywara
@ 2018-05-11 16:44         ` Paolo Bonzini
  2018-05-11 16:59           ` Andre Przywara
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2018-05-11 16:44 UTC (permalink / raw)
  To: Andre Przywara, kvm
  Cc: James Hogan, Marc Zyngier, Jan Glauber, Paul Mackerras, kvmarm,
	David Gibson

On 11/05/2018 15:25, Andre Przywara wrote:
> - I couldn't find any protection for the usage in
> arch/powerpc/kvm/powerpc.c, but the call chain is quite convoluted
> there, so I might have missed something. It would be good if someone
> more familiar with this code would take a look.

I also didn't find anything, I got up to kvmppc_handle_exit_pr in
book3s_pr.c and kvmppc_handle_exit in booke.c, then the callers are
assembly and I decided it's buggy. :)

>> Adding the srcu_read_lock/unlock directly in kvm_arch_vcpu_ioctl_run and
>> any other ioctls that need it is best, but in any case adding more pairs
>> is safe because they can be nested.
>
> So I added a small wrapper around kvm_read_guest(), which takes and
> drops the lock. Will send out the patch shortly. If powerpc needs it, I
> am happy to provide this wrapper in kvm_main.c instead of some arm
> header file.

I think that risks having some performance impact, though perhaps
mitigated by ARM having many virtual devices in the core.  Moving it
above would be better, to the equivalent of POWER's kvmppc_handle_exit*
functions.

Paolo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: kvm_read_guest_page() missing kvm->srcu read lock?
  2018-05-11 16:44         ` Paolo Bonzini
@ 2018-05-11 16:59           ` Andre Przywara
  0 siblings, 0 replies; 7+ messages in thread
From: Andre Przywara @ 2018-05-11 16:59 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: James Hogan, Marc Zyngier, Jan Glauber, Paul Mackerras, kvmarm,
	David Gibson

Hi,

On 11/05/18 17:44, Paolo Bonzini wrote:
> On 11/05/2018 15:25, Andre Przywara wrote:
>> - I couldn't find any protection for the usage in
>> arch/powerpc/kvm/powerpc.c, but the call chain is quite convoluted
>> there, so I might have missed something. It would be good if someone
>> more familiar with this code would take a look.
> 
> I also didn't find anything, I got up to kvmppc_handle_exit_pr in
> book3s_pr.c and kvmppc_handle_exit in booke.c, then the callers are
> assembly and I decided it's buggy. :)
> 
>>> Adding the srcu_read_lock/unlock directly in kvm_arch_vcpu_ioctl_run and
>>> any other ioctls that need it is best, but in any case adding more pairs
>>> is safe because they can be nested.
>>
>> So I added a small wrapper around kvm_read_guest(), which takes and
>> drops the lock. Will send out the patch shortly. If powerpc needs it, I
>> am happy to provide this wrapper in kvm_main.c instead of some arm
>> header file.
> 
> I think that risks having some performance impact, though perhaps
> mitigated by ARM having many virtual devices in the core.  Moving it
> above would be better, to the equivalent of POWER's kvmppc_handle_exit*
> functions.

Well, I am not sure the performance is super important here, since doing
kvm_read_guest() in the first place is probably not very cheap anyway.
Also some instances of kvm_read_guest() are called from the save/restore
code, which originates from userland, so we wouldn't be covered by doing
it in handle_exit.
But we can optimize somewhat by taking the lock outside of loops, for
instance. That might be premature optimization, though; I would prefer
doing that later. I think in the past I had some idea of how to avoid
accessing guest memory in some cases, not sure those still apply.
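
For instance, in table-walk style loops we could take the lock once
around the whole loop instead of once per read (sketch; nr_entries,
base, and esz are placeholders, not real variables from the vgic code):

```c
idx = srcu_read_lock(&kvm->srcu);
for (i = 0; i < nr_entries; i++) {
	ret = kvm_read_guest(kvm, base + i * esz, &entry, esz);
	if (ret)
		break;
	/* ... process entry ... */
}
srcu_read_unlock(&kvm->srcu, idx);
```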

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-05-11 16:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-10 17:41 kvm_read_guest_page() missing kvm->srcu read lock? Andre Przywara
2018-05-10 17:43 ` Paolo Bonzini
2018-05-11 11:02   ` Andre Przywara
2018-05-11 11:43     ` Paolo Bonzini
2018-05-11 13:25       ` Andre Przywara
2018-05-11 16:44         ` Paolo Bonzini
2018-05-11 16:59           ` Andre Przywara
