Re: [PATCH 15/15] KVM: x86/mmu: Promote pages in-place when disabling dirty logging

From: Peter Xu <peterx@redhat.com>
To: Ben Gardon <bgardon@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Peter Shier <pshier@google.com>,
	David Matlack <dmatlack@google.com>,
	Mingwei Zhang <mizhang@google.com>,
	Yulei Zhang <yulei.kernel@gmail.com>,
	Wanpeng Li <kernellwp@gmail.com>,
	Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
	Kai Huang <kai.huang@intel.com>,
	Keqian Zhu <zhukeqian1@huawei.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH 15/15] KVM: x86/mmu: Promote pages in-place when disabling dirty logging
Date: Tue, 30 Nov 2021 15:28:07 +0800
Message-ID: <YaXSh6RUOH7NHG8G@xz-m1.local>
In-Reply-To: <CANgfPd9pK83S+yoRokLg7wiroE6-OkieATTqgGn3yCCzwNFi4A@mail.gmail.com>

On Mon, Nov 29, 2021 at 10:31:14AM -0800, Ben Gardon wrote:
> > As comment above handle_removed_tdp_mmu_page() showed, at this point IIUC
> > current thread should have exclusive ownership of this orphaned and abandoned
> > pgtable page, then why in handle_removed_tdp_mmu_page() we still need all the
> > atomic operations and REMOVED_SPTE tricks to protect from concurrent access?
> > Since that's cmpxchg-ed out of the old pgtable, what can be accessing it
> > besides the current thread?
> 
> The cmpxchg does nothing to guarantee that other threads can't have a
> pointer to the page table, only that this thread knows it's the one
> that removed it from the page table. Other threads could still have
> pointers to it in two ways:
> 1. A kernel thread could be in the process of modifying an SPTE in the
> page table, under the MMU lock in read mode. In that case, there's no
> guarantee that there's not another kernel thread with a pointer to the
> SPTE until the end of an RCU grace period.

Right, I definitely missed that whole picture of the RCU usage.  Thanks.
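
To check my understanding, here is a minimal userspace sketch of point 1.  It
uses C11 atomics and is not KVM code: parent_slot, REMOVED, try_unlink() and
stale_pointer_demo() are names made up for illustration.  It only tries to
show that winning the cmpxchg proves who unlinked the table, not that nobody
else still holds a pointer into it:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker value, standing in for the idea of REMOVED_SPTE. */
#define REMOVED ((uintptr_t)0x1)

struct table {
	_Atomic uintptr_t entries[4];
};

static struct table child;
static _Atomic uintptr_t parent_slot;

/* Returns true only for the one caller that swaps 'expected' out. */
static bool try_unlink(_Atomic uintptr_t *slot, uintptr_t expected)
{
	return atomic_compare_exchange_strong(slot, &expected, REMOVED);
}

/*
 * Single-threaded model of the interleaving: a walker loads the child
 * pointer, then the table is unlinked, then the walker still writes
 * through the pointer it loaded earlier.
 */
static bool stale_pointer_demo(void)
{
	atomic_store(&parent_slot, (uintptr_t)&child);

	/* Walker: picks up the child pointer before the unlink. */
	struct table *seen = (struct table *)atomic_load(&parent_slot);

	/* Remover: exactly one unlink attempt can succeed. */
	bool won = try_unlink(&parent_slot, (uintptr_t)&child);
	bool won_again = try_unlink(&parent_slot, (uintptr_t)&child);

	/* Walker: its pointer still works, so the table is not private. */
	atomic_store(&seen->entries[0], (uintptr_t)0x42);

	return won && !won_again &&
	       atomic_load(&child.entries[0]) == (uintptr_t)0x42;
}
```

So the remover has to keep treating the unlinked table as shared until every
walker that could have loaded the pointer is done with it, which (if I read
you correctly) is what the RCU grace period provides.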

> 2. There could be a pointer to the page table in a vCPU's paging
> structure caches, which are similar to the TLB but cache partial
> translations. These are also cleared out on TLB flush.

Could you elaborate on the paging-structure caches that you mentioned?  I
thought the processor's page walker just uses the data caches (L1-L3) to cache
page tables, in which case IIUC the invalidation happens when we do
WRITE_ONCE(), which invalidates the line in every data cache other than the
writer core's.  But I could be completely missing something.

> Sean's recent series linked the RCU grace period and TLB flush in a
> clever way so that we can ensure that the end of a grace period
> implies that the necessary flushes have happened already, but we still
> need to clear out the disconnected page table with atomic operations.
> We need to clear it out mostly to collect dirty / accessed bits and
> update page size stats.

Yes, this sounds reasonable too.
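
For my own notes, a rough userspace sketch of that teardown step, assuming
C11 atomics rather than the real KVM helpers.  DIRTY_BIT, REMOVED_MARK and
teardown_count_dirty() are hypothetical names and values, not the actual SPTE
layout.  The point is that an atomic exchange returns whatever value was in
the entry at the moment it was replaced, so a dirty bit set by a concurrent
writer just before the swap is not lost, which a plain read followed by a
plain write could not guarantee:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative values only, not the real SPTE bit layout. */
#define DIRTY_BIT	((uint64_t)1 << 6)
#define REMOVED_MARK	((uint64_t)1 << 59)

/*
 * Replace every entry with the removed marker and count how many of the
 * old values had the dirty bit set.
 */
static int teardown_count_dirty(_Atomic uint64_t *entries, int n)
{
	int dirty = 0;

	for (int i = 0; i < n; i++) {
		/* Swap in the marker and get the previous value atomically. */
		uint64_t old = atomic_exchange(&entries[i], REMOVED_MARK);

		if (old & DIRTY_BIT)
			dirty++;
	}
	return dirty;
}
```

Running it a second time over the same table finds no dirty entries, since
everything already holds the marker.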

-- 
Peter Xu