From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com, avi.kivity@gmail.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v3 04/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
Date: Fri, 15 Nov 2013 15:09:13 +0800 [thread overview]
Message-ID: <5285C899.6040005@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131114183959.GA27064@amt.cnet>
On 11/15/2013 02:39 AM, Marcelo Tosatti wrote:
> On Thu, Nov 14, 2013 at 01:15:24PM +0800, Xiao Guangrong wrote:
>>
>> Hi Marcelo,
>>
>> On 11/14/2013 08:36 AM, Marcelo Tosatti wrote:
>>
>>>
>>> Any code location which reads the writable bit in the spte and assumes if its not
>>> set, that the translation which the spte refers to is not cached in a
>>> remote CPU's TLB can become buggy. (*)
>>>
>>> It might be the case that now its not an issue, but its so subtle that
>>> it should be improved.
>>>
>>> Can you add a fat comment on top of is_writeable_bit describing this?
>>> (and explain why is_writable_pte users do not make an assumption
>>> about (*).
>>>
>>> "Writeable bit of locklessly modifiable sptes might be cleared
>>> but TLBs not flushed: so whenever reading locklessly modifiable sptes
>>> you cannot assume TLBs are flushed".
>>>
>>> For example this one is unclear:
>>>
>>> if (!can_unsync && is_writable_pte(*sptep))
>>> goto set_pte;
>>> And:
>>>
>>> if (!is_writable_pte(spte) &&
>>> !(pt_protect && spte_is_locklessly_modifiable(spte)))
>>> return false;
>>>
>>> This is safe because get_dirty_log/kvm_mmu_slot_remove_write_access are
>>> serialized by a single mutex (if there were two mutexes, it would not be
>>> safe). Can you add an assert to both
>>> kvm_mmu_slot_remove_write_access/kvm_vm_ioctl_get_dirty_log
>>> for (slots_lock) is locked, and explain?
>>>
>>> So just improve the comments please, thanks (no need to resend whole
>>> series).
>>
>> Thank you very much for your time to review it and really appreciate
>> for you detailed the issue so clearly to me.
>>
>> I will do it on the top of this patchset or after it is merged
>> (if it's possiable).
>
> Ok, can you explain why every individual caller of is_writable_pte have
> no such assumption now? (the one mentioned above is not clear to me for
> example, should explain all of them).
Okay.
Generally speak, we 1) needn't care readonly spte too much since it
can not be locklessly write-protected and 2) if is_writable_pte() is used
to check mmu-mode's state we can check SPTE_MMU_WRITEABLE instead.
There are the places is_writable_pte is used:
1) in spte_has_volatile_bits():
527 static bool spte_has_volatile_bits(u64 spte)
528 {
529 /*
530 * Always atomicly update spte if it can be updated
531 * out of mmu-lock, it can ensure dirty bit is not lost,
532 * also, it can help us to get a stable is_writable_pte()
533 * to ensure tlb flush is not missed.
534 */
535 if (spte_is_locklessly_modifiable(spte))
536 return true;
537
538 if (!shadow_accessed_mask)
539 return false;
540
541 if (!is_shadow_present_pte(spte))
542 return false;
543
544 if ((spte & shadow_accessed_mask) &&
545 (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
546 return false;
547
548 return true;
549 }
this path is not broken since any spte can be lockless modifiable will do
lockless update (will always return 'true' in the line 536).
2): in mmu_spte_update()
594 /*
595 * For the spte updated out of mmu-lock is safe, since
596 * we always atomicly update it, see the comments in
597 * spte_has_volatile_bits().
598 */
599 if (spte_is_locklessly_modifiable(old_spte) &&
600 !is_writable_pte(new_spte))
601 ret = true;
The new_spte is a temp value that can not be fetched by lockless
write-protection and !is_writable_pte() is stable enough (can not be
locklessly write-protected).
3) in spte_write_protect()
1368 if (!is_writable_pte(spte) &&
1369 !spte_is_locklessly_modifiable(spte))
1370 return false;
1371
It always do write-protection if the spte is lockelss modifiable.
(This code is the aspect after applying the whole pachset, the code is safe too
before patch "[PATCH v3 14/15] KVM: MMU: clean up spte_write_protect" since
the lockless write-protection path is serialized by a single lock.).
4) in set_spte()
2690 /*
2691 * Optimization: for pte sync, if spte was writable the hash
2692 * lookup is unnecessary (and expensive). Write protection
2693 * is responsibility of mmu_get_page / kvm_sync_page.
2694 * Same reasoning can be applied to dirty page accounting.
2695 */
2696 if (!can_unsync && is_writable_pte(*sptep))
2697 goto set_pte;
It is used for a optimization and the worst case is the optimization is disabled
(walking the shadow pages in the hast table) when the spte has been locklessly
write-protected. It does not hurt anything since it is a rare event. And the
optimization can be back if we check SPTE_MMU_WRITEABLE instead.
5) fast_page_fault()
3110 /*
3111 * Check if it is a spurious fault caused by TLB lazily flushed.
3112 *
3113 * Need not check the access of upper level table entries since
3114 * they are always ACC_ALL.
3115 */
3116 if (is_writable_pte(spte)) {
3117 ret = true;
3118 goto exit;
3119 }
Since kvm_vm_ioctl_get_dirty_log() firstly get-and-clear dirty-bitmap before
do write-protect, the dirty-bitmap will be properly set again when fast_page_fault
fix the spte who is write-protected by lockless write-protection.
6) in fast_page_fault's tracepoint:
244 #define __spte_satisfied(__spte) \
245 (__entry->retry && is_writable_pte(__entry->__spte))
It causes the tracepoint reports the wrong result when fast_page_fault
and tdp_page_fault/lockless-write-protect run concurrently, i guess it's
okay since it's only used for trace.
7) in audit_write_protection():
202 if (is_writable_pte(*sptep))
203 audit_printk(kvm, "shadow page has writable "
204 "mappings: gfn %llx role %x\n",
205 sp->gfn, sp->role.word);
It's okay since lockless-write-protection does not update the readonly sptes.
>
> OK to improve comments later.
Thank you, Marcelo!
next prev parent reply other threads:[~2013-11-15 7:09 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-23 13:29 [PATCH v3 00/15] KVM: MMU: locklessly write-protect Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 01/15] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-11-12 0:25 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 02/15] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-11-12 22:44 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 03/15] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-11-13 0:10 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 04/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
2013-11-14 0:36 ` Marcelo Tosatti
2013-11-14 5:15 ` Xiao Guangrong
2013-11-14 18:39 ` Marcelo Tosatti
2013-11-15 7:09 ` Xiao Guangrong [this message]
2013-11-19 0:19 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 05/15] KVM: MMU: update spte and add it into rmap before dirty log Xiao Guangrong
2013-11-15 0:08 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 06/15] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-11-19 0:48 ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 07/15] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-11-22 19:14 ` Marcelo Tosatti
2013-11-25 6:11 ` Xiao Guangrong
2013-11-25 6:29 ` Xiao Guangrong
2013-11-25 18:12 ` Marcelo Tosatti
2013-11-26 3:21 ` Xiao Guangrong
2013-11-26 10:12 ` Gleb Natapov
2013-11-26 19:31 ` Marcelo Tosatti
2013-11-28 8:53 ` Xiao Guangrong
2013-12-03 7:10 ` Xiao Guangrong
2013-12-05 13:50 ` Marcelo Tosatti
2013-12-05 15:30 ` Xiao Guangrong
2013-12-06 0:15 ` Marcelo Tosatti
2013-12-06 0:22 ` Marcelo Tosatti
2013-12-10 6:58 ` Xiao Guangrong
2013-11-25 10:19 ` Gleb Natapov
2013-11-25 10:25 ` Xiao Guangrong
2013-11-25 12:48 ` Avi Kivity
2013-11-25 14:23 ` Marcelo Tosatti
2013-11-25 14:29 ` Gleb Natapov
2013-11-25 18:06 ` Marcelo Tosatti
2013-11-26 3:10 ` Xiao Guangrong
2013-11-26 10:15 ` Gleb Natapov
2013-11-26 19:58 ` Marcelo Tosatti
2013-11-28 8:32 ` Xiao Guangrong
2013-11-25 14:08 ` Marcelo Tosatti
2013-11-26 3:02 ` Xiao Guangrong
2013-11-25 9:31 ` Peter Zijlstra
2013-11-25 10:59 ` Xiao Guangrong
2013-11-25 11:05 ` Peter Zijlstra
2013-11-25 11:29 ` Peter Zijlstra
2013-10-23 13:29 ` [PATCH v3 08/15] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 09/15] KVM: MMU: initialize the pointers in pte_list_desc properly Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 10/15] KVM: MMU: allocate shadow pages from slab Xiao Guangrong
2013-10-24 9:19 ` Gleb Natapov
2013-10-24 9:29 ` Xiao Guangrong
2013-10-24 9:52 ` Gleb Natapov
2013-10-24 10:10 ` Xiao Guangrong
2013-10-24 10:39 ` Gleb Natapov
2013-10-24 11:01 ` Xiao Guangrong
2013-10-24 12:32 ` Gleb Natapov
2013-10-28 3:16 ` Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 11/15] KVM: MMU: locklessly access shadow page under rcu protection Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 12/15] KVM: MMU: check last spte with unawareness of mapping level Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 13/15] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-10-24 9:17 ` Gleb Natapov
2013-10-24 9:24 ` Xiao Guangrong
2013-10-24 9:32 ` Gleb Natapov
2013-10-23 13:29 ` [PATCH v3 14/15] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 15/15] KVM: MMU: use rcu functions to access the pointer Xiao Guangrong
2013-11-03 12:29 ` [PATCH v3 00/15] KVM: MMU: locklessly write-protect Gleb Natapov
2013-11-11 5:33 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5285C899.6040005@linux.vnet.ibm.com \
--to=xiaoguangrong@linux.vnet.ibm.com \
--cc=avi.kivity@gmail.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).