From: Paolo Bonzini
Subject: Re: [PATCH v2 26/28] KVM: x86/mmu: Allow enabling / disabling dirty logging under MMU read lock
To: Ben Gardon, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Peter Xu, Sean Christopherson, Peter Shier, Peter Feiner,
    Junaid Shahid, Jim Mattson, Yulei Zhang, Wanpeng Li,
    Vitaly Kuznetsov, Xiao Guangrong
Date: Wed, 3 Feb 2021 12:38:47 +0100
In-Reply-To: <20210202185734.1680553-27-bgardon@google.com>
References: <20210202185734.1680553-1-bgardon@google.com>
 <20210202185734.1680553-27-bgardon@google.com>

On 02/02/21 19:57, Ben Gardon wrote:
> To reduce lock contention and interference with page fault handlers,
> allow the TDP MMU functions which enable and disable dirty logging
> to operate under the MMU read lock.
>
> Extend dirty logging enable disable functions read lock-ness
>
> Signed-off-by: Ben Gardon
> ---
>   arch/x86/kvm/mmu/mmu.c     | 14 +++---
>   arch/x86/kvm/mmu/tdp_mmu.c | 93 ++++++++++++++++++++++++++++++--------
>   arch/x86/kvm/mmu/tdp_mmu.h |  2 +-
>   3 files changed, 84 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e3cf868be6bd..6ba2a72d4330 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5638,9 +5638,10 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
>
>   	write_lock(&kvm->mmu_lock);
>   	flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
> +	write_unlock(&kvm->mmu_lock);
> +
>   	if (kvm->arch.tdp_mmu_enabled)
>   		flush |= kvm_tdp_mmu_clear_dirty_slot(kvm, memslot);
> -	write_unlock(&kvm->mmu_lock);
>
>   	/*
>   	 * It's also safe to flush TLBs out of mmu lock here as currently this
> @@ -5661,9 +5662,10 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
>   	write_lock(&kvm->mmu_lock);
>   	flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect,
>   					false);
> +	write_unlock(&kvm->mmu_lock);
> +
>   	if (kvm->arch.tdp_mmu_enabled)
>   		flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_2M);
> -	write_unlock(&kvm->mmu_lock);
>
>   	if (flush)
>   		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> @@ -5677,12 +5679,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
>
>   	write_lock(&kvm->mmu_lock);
>   	flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false);
> -	if (kvm->arch.tdp_mmu_enabled)
> -		flush |= kvm_tdp_mmu_slot_set_dirty(kvm, memslot);
> -	write_unlock(&kvm->mmu_lock);
> -
>   	if (flush)
>   		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> +	write_unlock(&kvm->mmu_lock);
> +
> +	if (kvm->arch.tdp_mmu_enabled)
> +		kvm_tdp_mmu_slot_set_dirty(kvm, memslot);
>   }
>   EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>
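The pattern in the three hunks above is the heart of the change: the
rmap-based walk keeps the write lock, while the TDP MMU walk moves out
from under it and takes mmu_lock for read inside the callee. As a
sketch of the resulting shape of the first function (an illustration
of the pattern, not the exact code):

	void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
					   struct kvm_memory_slot *memslot)
	{
		bool flush;

		/* The rmap-based (legacy) MMU still needs exclusive access. */
		write_lock(&kvm->mmu_lock);
		flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
		write_unlock(&kvm->mmu_lock);

		/*
		 * The TDP MMU updates SPTEs with atomic cmpxchg, so its walk
		 * can run with mmu_lock taken for read, inside the callee.
		 */
		if (kvm->arch.tdp_mmu_enabled)
			flush |= kvm_tdp_mmu_clear_dirty_slot(kvm, memslot);

		/* Flushing TLBs outside mmu_lock is safe, see the comment above. */
		if (flush)
			kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
	}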
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index cfe66b8d39fa..6093926a6bc5 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -553,18 +553,22 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
>   }
>
>   /*
> - * tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically and handle the
> + * __tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically and handle the
>    * associated bookkeeping
>    *
>    * @kvm: kvm instance
>    * @iter: a tdp_iter instance currently on the SPTE that should be set
>    * @new_spte: The value the SPTE should be set to
> + * @record_dirty_log: Record the page as dirty in the dirty bitmap if
> + *		      appropriate for the change being made. Should be set
> + *		      unless performing certain dirty logging operations.
> + *		      Leaving record_dirty_log unset in that case prevents page
> + *		      writes from being double counted.
>    * Returns: true if the SPTE was set, false if it was not. If false is returned,
>    * this function will have no side-effects.
>    */
> -static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
> -					   struct tdp_iter *iter,
> -					   u64 new_spte)
> +static inline bool __tdp_mmu_set_spte_atomic(struct kvm *kvm,
> +		struct tdp_iter *iter, u64 new_spte, bool record_dirty_log)

Instead of adding the bool argument, just name this
tdp_mmu_set_spte_atomic_no_dirty_log...

>   {
>   	u64 *root_pt = tdp_iter_root_pt(iter);
>   	struct kvm_mmu_page *root = sptep_to_sp(root_pt);
> @@ -583,12 +587,31 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
>   		      new_spte) != iter->old_spte)
>   		return false;
>
> -	handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte,
> -			    iter->level, true);
> +	__handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte,
> +			      iter->level, true);
> +	handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
> +	if (record_dirty_log)
> +		handle_changed_spte_dirty_log(kvm, as_id, iter->gfn,
> +					      iter->old_spte, new_spte,
> +					      iter->level);

... and tdp_mmu_set_spte_atomic becomes

	if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, iter, new_spte))
		return false;

	handle_changed_spte_dirty_log(kvm, as_id, iter->gfn,
				      iter->old_spte, new_spte,
				      iter->level);
	return true;
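Spelled out as a full wrapper, that would look roughly like this (a
sketch, untested; as_id is local to the helper, so the wrapper would
have to recompute it from the iterator's root):

	static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
						   struct tdp_iter *iter,
						   u64 new_spte)
	{
		u64 *root_pt = tdp_iter_root_pt(iter);
		int as_id = kvm_mmu_page_as_id(sptep_to_sp(root_pt));

		if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, iter, new_spte))
			return false;

		/* The only extra work: dirty log bookkeeping. */
		handle_changed_spte_dirty_log(kvm, as_id, iter->gfn,
					      iter->old_spte, new_spte,
					      iter->level);
		return true;
	}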
> @@ -1301,7 +1344,8 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>   	int root_as_id;
>   	bool spte_set = false;
>
> -	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
> +	read_lock(&kvm->mmu_lock);
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, true) {
>   		root_as_id = kvm_mmu_page_as_id(root);
>   		if (root_as_id != slot->as_id)
>   			continue;
> @@ -1309,6 +1353,7 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>   		spte_set |= clear_dirty_gfn_range(kvm, root, slot->base_gfn,
>   				slot->base_gfn + slot->npages);
>   	}
> +	read_unlock(&kvm->mmu_lock);

Same remark as before.

>   	return spte_set;
>   }
>
> @@ -1397,7 +1442,8 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   	rcu_read_lock();
>
>   	tdp_root_for_each_pte(iter, root, start, end) {
> -		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, false))
> +retry:
> +		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
>   			continue;
>
>   		if (!is_shadow_present_pte(iter.old_spte) ||
> @@ -1406,7 +1452,14 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>
>   		new_spte = iter.old_spte | shadow_dirty_mask;
>
> -		tdp_mmu_set_spte(kvm, &iter, new_spte);
> +		if (!tdp_mmu_set_spte_atomic(kvm, &iter, new_spte)) {
> +			/*
> +			 * The iter must explicitly re-read the SPTE because
> +			 * the atomic cmpxchg failed.
> +			 */
> +			iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
> +			goto retry;
> +		}
>   		spte_set = true;

Yep, looks like that spte_set assignment should not have been removed. :)

>   	}
>
> @@ -1417,15 +1470,15 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   /*
>    * Set the dirty status of all the SPTEs mapping GFNs in the memslot. This is
>    * only used for PML, and so will involve setting the dirty bit on each SPTE.
> - * Returns true if an SPTE has been changed and the TLBs need to be flushed.
>    */
> -bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
> +void kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
>   {
>   	struct kvm_mmu_page *root;
>   	int root_as_id;
>   	bool spte_set = false;
>
> -	for_each_tdp_mmu_root_yield_safe(kvm, root, false) {
> +	read_lock(&kvm->mmu_lock);

And again here.

Paolo

> +	for_each_tdp_mmu_root_yield_safe(kvm, root, true) {
>   		root_as_id = kvm_mmu_page_as_id(root);
>   		if (root_as_id != slot->as_id)
>   			continue;
> @@ -1433,7 +1486,11 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
>   		spte_set |= set_dirty_gfn_range(kvm, root, slot->base_gfn,
>   				slot->base_gfn + slot->npages);
>   	}
> -	return spte_set;
> +
> +	if (spte_set)
> +		kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> +
> +	read_unlock(&kvm->mmu_lock);
>   }
>
>   /*
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 10ada884270b..848b41b20985 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -38,7 +38,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
>   				       struct kvm_memory_slot *slot,
>   				       gfn_t gfn, unsigned long mask,
>   				       bool wrprot);
> -bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
> +void kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
>   void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
>   				       const struct kvm_memory_slot *slot);
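(One recurring idiom in this conversion is worth spelling out:
tdp_mmu_set_spte_atomic performs its cmpxchg against the SPTE value
cached in iter.old_spte, so when it loses a race the cached value is
stale and must be refreshed before the iteration is redone. A minimal
sketch of the loop body, with the intermediate checks elided:)

	tdp_root_for_each_pte(iter, root, start, end) {
retry:
		/* ... skip SPTEs that need no change ... */

		new_spte = iter.old_spte | shadow_dirty_mask;

		if (!tdp_mmu_set_spte_atomic(kvm, &iter, new_spte)) {
			/*
			 * Lost the race to a concurrent updater: refresh
			 * the cached SPTE and redo the checks from the top.
			 */
			iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
			goto retry;
		}
		spte_set = true;
	}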