From: David Woodhouse <dwmw2@infradead.org>
To: Sean Christopherson <seanjc@google.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
Paul Durrant <pdurrant@amazon.co.uk>
Subject: Re: [PATCH] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
Date: Tue, 23 Jan 2024 15:06:13 +0000 [thread overview]
Message-ID: <6dc0e9d1f5db41a053b734b29403ad48c288dea3.camel@infradead.org> (raw)
In-Reply-To: <Zaf7yCYt8XFuMhAd@google.com>
[-- Attachment #1: Type: text/plain, Size: 4912 bytes --]
On Wed, 2024-01-17 at 08:09 -0800, Sean Christopherson wrote:
>
> NAK, at least as a bug fix.
OK. Not a bug fix.
>
> The contract with the gfn_to_pfn_cache, or rather the lack thereof,
> is all kinds of screwed up.
Well yes, that's my point. How's this for a reworked commit message
which doesn't claim it's a bug fix....
KVM: pfncache: clean up and simplify locking mess
The locking on the gfn_to_pfn_cache is... interesting. And awful.
There is a rwlock in ->lock which readers take to ensure protection
against concurrent changes. But __kvm_gpc_refresh() makes assumptions
that certain fields will not change even while it drops the write lock
and performs MM operations to revalidate the target PFN and kernel
mapping.
Commit 93984f19e7bc ("KVM: Fully serialize gfn=>pfn cache refresh via
mutex") partly addressed that — not by fixing it, but by adding a new
mutex, ->refresh_lock. This prevented concurrent __kvm_gpc_refresh()
calls on a given gfn_to_pfn_cache, but is still only a partial solution.
There is still a theoretical race where __kvm_gpc_refresh() runs in
parallel with kvm_gpc_deactivate(). While __kvm_gpc_refresh() has
dropped the write lock, kvm_gpc_deactivate() clears the ->active flag
and unmaps ->khva. Then __kvm_gpc_refresh() determines that the previous
->pfn and ->khva are still valid, and reinstalls those values into the
structure. This leaves the gfn_to_pfn_cache with the ->valid bit set,
but ->active clear. And a ->khva which looks like a reasonable kernel
address but is actually unmapped.
All it takes is a subsequent reactivation to cause that ->khva to be
dereferenced. This would theoretically cause an oops which would look
something like this:
[1724749.564994] BUG: unable to handle page fault for address: ffffaa3540ace0e0
[1724749.565039] RIP: 0010:__kvm_xen_has_interrupt+0x8b/0xb0
I say "theoretically" because theoretically, that oops that was seen in
production cannot happen. The code which uses the gfn_to_pfn_cache is
supposed to have its *own* locking, to further paper over the fact that
the gfn_to_pfn_cache's own papering-over (->refresh_lock) of its own
rwlock abuse is not sufficient.
For the Xen vcpu_info that external lock is the vcpu->mutex, and for the
shared info it's kvm->arch.xen.xen_lock. Those locks ought to protect
the gfn_to_pfn_cache against concurrent deactivation vs. refresh in all
but the cases where the vcpu or kvm object is being *destroyed*, in
which case the subsequent reactivation should never happen.
Theoretically.
Nevertheless, this locking abuse is awful and should be fixed, even if
no clear explanation can be found for how the oops happened. So...
Clean up the semantics of hva_to_pfn_retry() so that it no longer does
any locking gymnastics because it no longer operates on the gpc object
at all. It is now called with a uhva and simply returns the
corresponding pfn (pinned), and a mapped khva for it.
Its caller __kvm_gpc_refresh() now sets gpc->uhva and clears gpc->valid
before dropping ->lock, calling hva_to_pfn_retry() and retaking ->lock
for write.
If hva_to_pfn_retry() fails, *or* if the ->uhva or ->active fields in
the gpc changed while the lock was dropped, the new mapping is discarded
and the gpc is not modified. On success with an unchanged gpc, the new
mapping is installed and the current ->pfn and ->uhva are taken into the
local old_pfn and old_khva variables to be unmapped once the locks are
all released.
This simplification means that ->refresh_lock is no longer needed for
correctness, but it does still provide a minor optimisation because it
will prevent two concurrent __kvm_gpc_refresh() calls from mapping a
given PFN, only for one of them to lose the race and discard its
mapping.
The optimisation in hva_to_pfn_retry() where it attempts to use the old
mapping if the pfn doesn't change is dropped, since it makes the pinning
more complex. It's a pointless optimisation anyway, since the odds of
the pfn ending up the same when the uhva has changed (i.e. the odds of
the two userspace addresses both pointing to the same underlying
physical page) are negligible,
The 'hva_changed' local variable in __kvm_gpc_refresh() is also removed,
since it's simpler just to clear gpc->valid if the uhva changed.
Likewise the unmap_old variable is dropped because it's just as easy to
check the old_pfn variable for KVM_PFN_ERR_FAULT.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
prev parent reply other threads:[~2024-01-23 15:06 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-12 20:38 [PATCH] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues Woodhouse, David
2024-01-17 16:09 ` Sean Christopherson
2024-01-17 16:32 ` Woodhouse, David
2024-01-17 16:51 ` Sean Christopherson
2024-01-17 16:59 ` David Woodhouse
2024-01-17 18:14 ` Sean Christopherson
2024-01-17 18:33 ` David Woodhouse
2024-01-23 15:06 ` David Woodhouse [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6dc0e9d1f5db41a053b734b29403ad48c288dea3.camel@infradead.org \
--to=dwmw2@infradead.org \
--cc=kvm@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=pdurrant@amazon.co.uk \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).