From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30BE5C433F5 for ; Wed, 2 Mar 2022 20:47:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244257AbiCBUsU (ORCPT ); Wed, 2 Mar 2022 15:48:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47220 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231127AbiCBUsQ (ORCPT ); Wed, 2 Mar 2022 15:48:16 -0500 Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC0B31260A for ; Wed, 2 Mar 2022 12:47:31 -0800 (PST) Received: by mail-pl1-x629.google.com with SMTP id z11so2609925pla.7 for ; Wed, 02 Mar 2022 12:47:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=w2++eX0uxoFYGn6KsfxgmKtFhWBbHG1cFniS6G65R/8=; b=ZvIT3UHrIkyHs392SzCJH5dcOt5KKLuzWyEeIFZOeA2qaJiPK7ton6Y6Lr1ivan4Js UzcI337HwVIoduHg/JgXtl2CQGrDZmUSf3ucbmrtJ6IgtSvNqsQw36nM9FV52EIaWM4b nNaiaF1+GKtFMTHQmBdtFfqVv6D55x5mgDz3jaj/0PN4CGVqdFbUWNqbbt7gXCxUwNEZ da1NehZYGkvWBSXYrG4bJB5le52dQRBSQDyWXcguuRqdmtaZ4BLJPrvK2Xn/b+IdPCLb oKjS+t4qWNIvM64IklH38agjF+NntHojUc4Pfaf5e5zzZCaWU8He9HomqZ7O07yFoHYc 1kCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=w2++eX0uxoFYGn6KsfxgmKtFhWBbHG1cFniS6G65R/8=; b=yIv089c9RQ9oCZh15YB8Ga5iKpiP7nOvb1T3mK6XwLZpIHHxICnDaxbKoV0BxBAtig 6C55jJI91bW8DxwYefYrXXHU2cClqKb/+q+SXtAypAqj/rQR99tm23UZiVpxEogxSGZS n6uK4hyE7QdtMpUcdl/YGT4VNi0tbU9l6WzXhJJRNNHO4bjz+aIQ4XzzCLKjVf04T6WV Yb7WSXjn9LSkWa/UxzVJLsdleeIZFghhMoAelKhDTfuBppObh5JdjzaJi+CPm+OrBn6x mviy8AWWAhS2kOSM/XmJReRsZl0SwBXTgpj1Fp3zmAOJiQ1EQHvJQsJ9H/C8SObQJh6e bxqA== X-Gm-Message-State: AOAM531GN0TCtZc/0glmxVD8FfbVEXzNrg5oKCemtkuWLNNxgb4T1zjI UEvSBmuSBTsrnol3DyajqcAHsw== X-Google-Smtp-Source: ABdhPJxkbHtoXR7YSBFfm/MXCJjH0S84GP7djHmq1H20YYhQpwJ2f0WnZE7RRove22yTCjGhMKM0DQ== X-Received: by 2002:a17:90b:4595:b0:1be:db22:8327 with SMTP id hd21-20020a17090b459500b001bedb228327mr1644495pjb.99.1646254051075; Wed, 02 Mar 2022 12:47:31 -0800 (PST) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id v23-20020a17090a521700b001bbfc181c93sm5892971pjh.19.2022.03.02.12.47.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 02 Mar 2022 12:47:30 -0800 (PST) Date: Wed, 2 Mar 2022 20:47:26 +0000 From: Sean Christopherson To: Paolo Bonzini Cc: Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , David Hildenbrand , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Ben Gardon , Mingwei Zhang Subject: Re: [PATCH v3 22/28] KVM: x86/mmu: Zap defunct roots via asynchronous worker Message-ID: References: <20220226001546.360188-1-seanjc@google.com> <20220226001546.360188-23-seanjc@google.com> <442859af-6454-b15e-b2ad-0fc7c4e22909@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <442859af-6454-b15e-b2ad-0fc7c4e22909@redhat.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 02, 2022, Paolo Bonzini wrote: > On 3/2/22 20:33, Sean Christopherson wrote: > > What about that idea? Put roots invalidated by "fast zap" on_another_ list? > > My very original idea of moving the roots to a separate list didn't work because > > the roots needed to be reachable by the mmu_notifier. But we could just add > > another list_head (inside the unsync_child_bitmap union) and add the roots to > > _that_ list. > > Perhaps the "separate list" idea could be extended to have a single worker > for all kvm_tdp_mmu_put_root() work, and then indeed replace > kvm_tdp_mmu_zap_invalidated_roots() with a flush of _that_ worker. The > disadvantage is a little less parallelism in zapping invalidated roots; but > what is good for kvm_tdp_mmu_zap_invalidated_roots() is just as good for > kvm_tdp_mmu_put_root(), I suppose. If one wants separate work items, KVM > could have its own workqueue, and then you flush that workqueue. > > For now let's do it the simple but ugly way. Keeping > next_invalidated_root() does not make things worse than the status quo, and > further work will be easier to review if it's kept separate from this > already-complex work. Oof, that's not gonna work. My approach here in v3 doesn't work either. I finally remembered why I had the dedicated tdp_mmu_defunct_root flag and thus the smp_mb_*() dance. kvm_tdp_mmu_zap_invalidated_roots() assumes that it was gifted a reference to _all_ invalid roots by kvm_tdp_mmu_invalidate_all_roots(). This works in the current code base only because kvm->slots_lock is held for the entire duration, i.e. roots can't become invalid between the end of kvm_tdp_mmu_invalidate_all_roots() and the end of kvm_tdp_mmu_zap_invalidated_roots(). Marking a root invalid in kvm_tdp_mmu_put_root() breaks that assumption, e.g. if a new root is created and then dropped, it will be marked invalid but the "fast zap" will not have a reference. The "defunct" flag prevents this scenario by allowing the "fast zap" path to identify invalid roots for which it did not take a reference. By virtue of holding a reference, "fast zap" also guarantees that the roots it needs to invalidate and put can't become defunct. My preference would be to either go back to a variant of v2, or to implement my "second list" idea. I also need to figure out why I didn't encounter errors in v3, because I distinctly remember underflowing the refcount before adding the defunct flag...