From: zhukeqian <zhukeqian1@huawei.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>,
	"Zengtao \(B\)" <prime.zeng@hisilicon.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [Question] Hardware management of stage2 page dirty state
Date: Fri, 15 May 2020 12:20:00 +0800	[thread overview]
Message-ID: <dce26d58-7b6b-5eaf-8f7d-41361cb5cc9c@huawei.com> (raw)
In-Reply-To: <20200514161427.GD1907@gaia>

Hi Catalin,

On 2020/5/15 0:14, Catalin Marinas wrote:
> Hi Keqian,
> 
> On Thu, May 14, 2020 at 05:16:52PM +0800, zhukeqian wrote:
>> I have some questions after carefully reading your patch
>> https://patchwork.kernel.org/patch/8824261/, which enables hardware updates
>> of the Access Flag for Stage 2 page tables.
>>
>> I notice that at the bottom of the commit message, you said the following:
>> "After some digging through the KVM code, I concluded that hardware DBM
>> (dirty bit management) support is not feasible for Stage 2. A potential
>> user would be dirty logging but this requires a different bitmap exposed
>> to Qemu and, to avoid races, the stage 2 mappings need to be mapped
>> read-only on clean, writable on fault. This assumption simplifies the
>> hardware Stage 2 AF support."
>>
>> I have three questions here.
>>
>> 1. I do not understand the reason for "not feasible" very well. Is the main reason
>>    the "races" you referred to?
> 
> IIRC, dirty logging works by having a bitmap populated by the host
> kernel when the guest writes a page. Such a write triggers a stage 2 fault
> and the kernel populates the bitmap. With S2 DBM, you wouldn't get a
> fault when the guest writes the page, so the host kernel would have to
> periodically check which S2 entries became writable to update the qemu
> bitmap.
Sure, the performance cost of traversing the page table entries is a drawback of the
DBM mechanism.
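
To make the polling idea concrete, here is a rough standalone sketch of what such a
pass over the stage 2 tables could look like (the bit values and helper name are for
illustration only; this is not the actual KVM code):

#include <stdint.h>
#include <stddef.h>

/* Illustrative stage 2 descriptor bits (simplified: page mappings only,
 * no TLB maintenance shown). */
#define S2_PTE_DBM      (1ULL << 51)   /* Dirty Bit Modifier            */
#define S2_PTE_S2AP_W   (1ULL << 7)    /* S2AP[1]: stage 2 write perm   */

/*
 * Hypothetical polling pass over a flattened array of stage 2 PTEs that
 * covers one memslot. A DBM-managed entry starts out read-only; the MMU
 * sets S2AP[1] on the first guest write instead of faulting, so an entry
 * that has both DBM and write permission set has been dirtied since the
 * last scan.
 */
static void scan_s2_dirty(uint64_t *ptes, size_t nr_ptes,
                          unsigned long *dirty_bitmap)
{
	for (size_t i = 0; i < nr_ptes; i++) {
		uint64_t pte = ptes[i];

		if (!(pte & S2_PTE_DBM) || !(pte & S2_PTE_S2AP_W))
			continue;	/* not DBM-managed, or still clean */

		/* Report the dirty page to userspace (Qemu's bitmap). */
		dirty_bitmap[i / (8 * sizeof(unsigned long))] |=
			1UL << (i % (8 * sizeof(unsigned long)));

		/* Make the entry clean again so the next guest write
		 * re-marks it dirty (TLB invalidation omitted here). */
		ptes[i] = pte & ~S2_PTE_S2AP_W;
	}
}

In practice such a pass would of course have to run under the appropriate MMU lock
and be followed by TLB invalidation; the sketch only shows the bitmap transfer.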

> 
> I think the race I had in mind was that the bitmap still reports the
> page as clean while the guest already updated it.
> 
> Looking at this again, it may not matter much as qemu can copy those
> pages again when migrating and before control is handed over to the new
> host.
Yes, the race is not a problem. Qemu will not miss any dirty pages when control is
handed over to the new Qemu.

> 
>> 2. What do the "races" refer to? Do you mean the races between [hardware S2 DBM]
>>    and [dirty information collection executed by KVM]?
> 
> Yes.
> 
>>    During VM live migration, Qemu sends dirty pages iteratively and finally stops the
>>    VM when there are not too many dirty pages left. We may miss dirty pages during each
>>    iteration before the VM stops, but there are no races after the VM stops, so we won't
>>    miss dirty pages in the end. It seems that "races" is not a convincing reason for "not feasible".
> 
> You are probably right. But you'd have to change the dirty tracking from
> a fault mechanism to a polling one, checking the S2 page tables
> periodically. Or, can you check them only once after the VM stops?

Our purpose is to remove the performance side effect that the fault mechanism has on the
guest, so we want to use DBM from beginning to end.

For now, the only problem with DBM that we can identify is the page-table traversal
performance. We have done some demo tests on this, and the situation is not that bad.
Besides, we have come up with some optimizations which ease this situation effectively.

I plan to send all the test data and an RFC patch to the community next week. It should be
functionally correct but without any optimizations. After that I will add the optimizations
on top of the RFC and send PATCH v1.

> 
>> 3. You said that disabling hardware S2 DBM support simplifies the hardware S2 AF support.
>>    Could you please explain the reason in detail?
> 
> I probably meant that it simplifies the patch rather than something
> specific to the AF support. If you add DBM, you'd need to make sure that
> making a pte read-only doesn't lose the dirty information (see
> ptep_set_wrprotect(), not sure whether KVM uses the same macro).
> 
OK, I will take note of this problem, thanks!
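
If I understand it correctly, the stage 1 code handles this by folding the hardware
dirty state into the software dirty bit before clearing write permission. A simplified,
standalone sketch of that idea (illustrative bit values and helper names, not the actual
arch/arm64 source):

#include <stdbool.h>
#include <stdint.h>

/* Simplified arm64 stage 1 descriptor bits (illustrative only). */
#define PTE_RDONLY  (1ULL << 7)    /* AP[2]: read-only                 */
#define PTE_WRITE   (1ULL << 51)   /* DBM: hardware may clear RDONLY   */
#define PTE_DIRTY   (1ULL << 55)   /* software dirty bit               */

/* Hardware-dirty: DBM is set and the MMU has already cleared RDONLY. */
static bool pte_hw_dirty(uint64_t pte)
{
	return (pte & PTE_WRITE) && !(pte & PTE_RDONLY);
}

/*
 * Write-protect an entry without losing dirty information: latch the
 * hardware dirty state into the software dirty bit first, then make the
 * entry read-only (the idea behind arm64's ptep_set_wrprotect()).
 */
static uint64_t pte_wrprotect_keep_dirty(uint64_t pte)
{
	if (pte_hw_dirty(pte))
		pte |= PTE_DIRTY;

	pte &= ~PTE_WRITE;
	pte |= PTE_RDONLY;
	return pte;
}

We will make sure the stage 2 path preserves the dirty state in the same way when it
write-protects entries.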

Thanks,
Keqian
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
