Date: Thu, 3 Mar 2022 21:06:14 +0000
From: Sean Christopherson
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, David Hildenbrand, David Matlack, Ben Gardon, Mingwei Zhang
Subject: Re: [PATCH v4 21/30] KVM: x86/mmu: Zap invalidated roots via asynchronous worker
References: <20220303193842.370645-1-pbonzini@redhat.com> <20220303193842.370645-22-pbonzini@redhat.com>

On Thu, Mar 03, 2022, Sean Christopherson wrote:
> On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > +	root->tdp_mmu_async_data = kvm;
> > +	INIT_WORK(&root->tdp_mmu_async_work, tdp_mmu_zap_root_work);
> > +	queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
> > +}
> > +
> > +static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
> > +{
> > +	union kvm_mmu_page_role role = page->role;
> > +	role.invalid = true;
> > +
> > +	/* No need to use cmpxchg, only the invalid bit can change. */
> > +	role.word = xchg(&page->role.word, role.word);
> > +	return role.invalid;
> 
> This helper is unused.  It _could_ be used here, but I think it belongs in the
> next patch.  Critically, until zapping defunct roots creates the invariant that
> invalid roots are _always_ zapped via worker, kvm_tdp_mmu_invalidate_all_roots()
> must not assume that an invalid root is queued for zapping.  I.e. doing this
> before the "Zap defunct roots" patch would be wrong:
> 
> 	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
> 		if (kvm_tdp_root_mark_invalid(root))
> 			continue;
> 
> 		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
> 			continue;
> 
> 		tdp_mmu_schedule_zap_root(kvm, root);
> 	}

Gah, lost my train of thought and forgot that this _can_ re-queue a root even
in this patch, it just can't re-queue a root that is _currently_ queued.

The re-queue scenario happens if a root is queued and zapped, but is kept alive
by a vCPU that hasn't yet put its reference.  If another memslot update comes
along before the (sleeping) vCPU drops its reference, this will re-queue the
root.

It's not a major problem in this patch as it's a small amount of wasted effort,
but it will be an issue when the "put" path starts using the queue, as that
will create a scenario where a memslot update (or NX toggle) can come along
while a defunct root is in the zap queue.

Checking for role.invalid is wrong (as above), so for this patch I think the
easiest thing is to use tdp_mmu_async_data as a sentinel that the root was
zapped in the past and doesn't need to be re-zapped.

/*
 * Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that
 * is about to be zapped, e.g. in response to a memslots update.  The actual
 * zapping is performed asynchronously, so a reference is taken on all roots.
 * Using a separate workqueue makes it easy to ensure that the destruction is
 * performed before the "fast zap" completes, without keeping a separate list
 * of invalidated roots; the list is effectively the list of work items in
 * the workqueue.
 *
 * Skip roots that were already queued for zapping, the "fast zap" path is the
 * only user of the zap queue and always flushes the queue under slots_lock,
 * i.e. the queued zap is guaranteed to have completed already.
 *
 * Because mmu_lock is held for write, it should be impossible to observe a
 * root with zero refcount, i.e. the list of roots cannot be stale.
 *
 * This has essentially the same effect for the TDP MMU as updating
 * mmu_valid_gen does for the shadow MMU.
 */
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
{
	struct kvm_mmu_page *root;

	lockdep_assert_held_write(&kvm->mmu_lock);
	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
		if (root->tdp_mmu_async_data)
			continue;

		if (WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
			continue;

		root->role.invalid = true;
		tdp_mmu_schedule_zap_root(kvm, root);
	}
}
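As an aside, here's a minimal user-space sketch (all names are illustrative,
not KVM's; C11 stdatomic stands in for the kernel's xchg) of why a plain
exchange is sufficient in the quoted kvm_tdp_root_mark_invalid() helper: the
invalid bit is the only bit that can change concurrently, so a stale read of
the other role bits is harmless and no cmpxchg retry loop is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the role word; bit 0 plays the part of role.invalid. */
#define TOY_ROLE_INVALID (1u << 0)

struct toy_page {
	_Atomic unsigned int role_word;
};

/*
 * Returns the *previous* invalid state, mirroring the semantics of the
 * quoted helper: read the role, set invalid, then atomically exchange.
 * No cmpxchg needed because only TOY_ROLE_INVALID can change under us.
 */
static bool toy_mark_invalid(struct toy_page *page)
{
	unsigned int new_role = atomic_load(&page->role_word) | TOY_ROLE_INVALID;
	unsigned int old_role = atomic_exchange(&page->role_word, new_role);

	return old_role & TOY_ROLE_INVALID;
}
```

The first caller to mark a given page sees "false" back, every subsequent
caller sees "true", which is exactly the "did someone beat me to it?"
signal the hypothetical zap loop above would key off of.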
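And to make the sentinel idea concrete, a single-threaded toy model (again,
all names hypothetical, no real workqueue or locking) of how checking
tdp_mmu_async_data prevents re-queueing a root that was already zapped but is
kept alive by a lingering reference:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a TDP MMU root; async_data mirrors tdp_mmu_async_data. */
struct toy_root {
	void *async_data;	/* non-NULL once the root has ever been queued */
	bool invalid;
	int times_queued;
};

/* Toy stand-in for tdp_mmu_schedule_zap_root(): record the queueing. */
static void toy_schedule_zap(struct toy_root *root, void *kvm)
{
	root->async_data = kvm;
	root->times_queued++;
}

/*
 * Toy stand-in for kvm_tdp_mmu_invalidate_all_roots(): skip roots whose
 * sentinel shows a previously queued (and, under slots_lock, already
 * completed) zap, so back-to-back memslot updates don't double-queue.
 */
static void toy_invalidate_all(struct toy_root *roots, size_t nr, void *kvm)
{
	for (size_t i = 0; i < nr; i++) {
		if (roots[i].async_data)
			continue;

		roots[i].invalid = true;
		toy_schedule_zap(&roots[i], kvm);
	}
}
```

Running two "memslot updates" back to back queues each root exactly once,
which is the wasted-effort case the role.invalid check can't distinguish.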