KVM Archive on lore.kernel.org
From: Keqian Zhu <zhukeqian1@huawei.com>
To: Ben Gardon <bgardon@google.com>
Cc: LKML <linux-kernel@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	<wanghaibin.wang@huawei.com>
Subject: Re: [RFC PATCH v2 2/2] KVM: x86: Not wr-protect huge page with init_all_set dirty log
Date: Tue, 20 Apr 2021 15:49:26 +0800
Message-ID: <49e6bf4f-0142-c9ea-a8c1-7cfe211c8d7b@huawei.com>
In-Reply-To: <CANgfPd_WzX6Fm7BiMoBoehuLL8tjh4WEqehUhF8biPyL8vS4XQ@mail.gmail.com>

Hi Ben,

On 2021/4/20 3:20, Ben Gardon wrote:
> On Fri, Apr 16, 2021 at 1:25 AM Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>
>> Currently during start dirty logging, if we're with init-all-set,
>> we write protect huge pages and leave normal pages untouched, for
>> that we can enable dirty logging for these pages lazily.
>>
>> Actually enable dirty logging lazily for huge pages is feasible
>> too, which not only reduces the time of start dirty logging, also
>> greatly reduces side-effect on guest when there is high dirty rate.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>  arch/x86/kvm/mmu/mmu.c | 48 ++++++++++++++++++++++++++++++++++++++----
>>  arch/x86/kvm/x86.c     | 37 +++++++++-----------------------
>>  2 files changed, 54 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 2ce5bc2ea46d..98fa25172b9a 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -1188,8 +1188,7 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
>>   * @gfn_offset: start of the BITS_PER_LONG pages we care about
>>   * @mask: indicates which pages we should protect
>>   *
>> - * Used when we do not need to care about huge page mappings: e.g. during dirty
>> - * logging we do not have any such mappings.
>> + * Used when we do not need to care about huge page mappings.
>>   */
>>  static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>>                                      struct kvm_memory_slot *slot,
>> @@ -1246,13 +1245,54 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
>>   * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
>>   * enable dirty logging for them.
>>   *
>> - * Used when we do not need to care about huge page mappings: e.g. during dirty
>> - * logging we do not have any such mappings.
>> + * We need to care about huge page mappings: e.g. during dirty logging we may
>> + * have any such mappings.
>>   */
>>  void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>                                 struct kvm_memory_slot *slot,
>>                                 gfn_t gfn_offset, unsigned long mask)
>>  {
>> +       gfn_t start, end;
>> +
>> +       /*
>> +        * Huge pages are NOT write protected when we start dirty log with
>> +        * init-all-set, so we must write protect them at here.
>> +        *
>> +        * The gfn_offset is guaranteed to be aligned to 64, but the base_gfn
>> +        * of memslot has no such restriction, so the range can cross two large
>> +        * pages.
>> +        */
>> +       if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
>> +               start = slot->base_gfn + gfn_offset + __ffs(mask);
>> +               end = slot->base_gfn + gfn_offset + __fls(mask);
>> +               kvm_mmu_slot_gfn_write_protect(kvm, slot, start, PG_LEVEL_2M);
>> +
>> +               /* Cross two large pages? */
>> +               if (ALIGN(start << PAGE_SHIFT, PMD_SIZE) !=
>> +                   ALIGN(end << PAGE_SHIFT, PMD_SIZE))
>> +                       kvm_mmu_slot_gfn_write_protect(kvm, slot, end,
>> +                                                      PG_LEVEL_2M);
>> +       }
>> +
>> +       /*
>> +        * RFC:
>> +        *
>> +        * 1. I don't return early when kvm_mmu_slot_gfn_write_protect() returns
>> +        * true, because I am not very clear about the relationship between
>> +        * legacy mmu and tdp mmu. AFAICS, the code logic is NOT an if/else
>> +        * manner.
>> +        *
>> +        * The kvm_mmu_slot_gfn_write_protect() returns true when we hit a
>> +        * writable large page mapping in legacy mmu mapping or tdp mmu mapping.
>> +        * Do we still have normal mapping in that case? (e.g. We have large
>> +        * mapping in legacy mmu and normal mapping in tdp mmu).
> 
> Right, we can't return early because the two MMUs could map the page
> in different ways, but each MMU could also map the page in multiple
> ways independently.
> For example, if the legacy MMU was being used and we were running a
> nested VM, a page could be mapped 2M in EPT01 and 4K in EPT02, so we'd
> still need kvm_mmu_slot_gfn_write_protect  calls for both levels.
> I don't think there's a case where we can return early here with the
> information that the first calls to kvm_mmu_slot_gfn_write_protect
> access.
Thanks for the detailed explanation.

> 
>> +        *
>> +        * 2. kvm_mmu_slot_gfn_write_protect() doesn't tell us whether the large
>> +        * page mapping exist. If it exists but is clean, we can return early.
>> +        * However, we have to do invasive change.
> 
> What do you mean by invasive change?
We would need kvm_mmu_slot_gfn_write_protect() to report whether all mappings
are large and clean, so that we can return early. However, that is not part of
the semantics of this function.

If this is the final code, then compared to the old code we have an extra
gfn_write_protect() call. I don't know whether that is acceptable?

Thanks,
Keqian


> 
>> +        */
>> +
>> +       /* Then we can handle the PT level pages */
>>         if (kvm_x86_ops.cpu_dirty_log_size)
>>                 kvm_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset, mask);
>>         else
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index eca63625aee4..dfd676ffa7da 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10888,36 +10888,19 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>>                  */
>>                 kvm_mmu_zap_collapsible_sptes(kvm, new);
>>         } else {
>> -               /* By default, write-protect everything to log writes. */
>> -               int level = PG_LEVEL_4K;
>> +               /*
>> +                * If we're with initial-all-set, we don't need to write protect
>> +                * any page because they're reported as dirty already.
>> +                */
>> +               if (kvm_dirty_log_manual_protect_and_init_set(kvm))
>> +                       return;
>>
>>                 if (kvm_x86_ops.cpu_dirty_log_size) {
>> -                       /*
>> -                        * Clear all dirty bits, unless pages are treated as
>> -                        * dirty from the get-go.
>> -                        */
>> -                       if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
>> -                               kvm_mmu_slot_leaf_clear_dirty(kvm, new);
>> -
>> -                       /*
>> -                        * Write-protect large pages on write so that dirty
>> -                        * logging happens at 4k granularity.  No need to
>> -                        * write-protect small SPTEs since write accesses are
>> -                        * logged by the CPU via dirty bits.
>> -                        */
>> -                       level = PG_LEVEL_2M;
>> -               } else if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
>> -                       /*
>> -                        * If we're with initial-all-set, we don't need
>> -                        * to write protect any small page because
>> -                        * they're reported as dirty already.  However
>> -                        * we still need to write-protect huge pages
>> -                        * so that the page split can happen lazily on
>> -                        * the first write to the huge page.
>> -                        */
>> -                       level = PG_LEVEL_2M;
>> +                       kvm_mmu_slot_leaf_clear_dirty(kvm, new);
>> +                       kvm_mmu_slot_remove_write_access(kvm, new, PG_LEVEL_2M);
>> +               } else {
>> +                       kvm_mmu_slot_remove_write_access(kvm, new, PG_LEVEL_4K);
>>                 }
>> -               kvm_mmu_slot_remove_write_access(kvm, new, level);
>>         }
>>  }
>>
>> --
>> 2.23.0
>>
> .
> 
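For readers following the quoted mmu.c hunk, here is a minimal userspace
sketch of the range computation in kvm_arch_mmu_enable_log_dirty_pt_masked().
It is not kernel code: toy_lowest_bit(), toy_highest_bit(),
toy_huge_page_base() and toy_mask_crosses_huge_page() are hypothetical names,
and it assumes 4 KiB base pages with 2 MiB huge pages (512 gfns each). It
compares the rounded-down huge page base of the first and last gfn selected by
the mask, which is one way to express the "cross two large pages?" test. Note
that this differs from the ALIGN() comparison in the patch, which rounds up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB huge pages over 4 KiB base pages, i.e. 512 gfns each. */
#define TOY_GFNS_PER_2M 512ULL

typedef uint64_t gfn_t;

/* Index of the lowest set bit; mask must be non-zero (models __ffs()). */
static unsigned int toy_lowest_bit(uint64_t mask)
{
	unsigned int bit = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		bit++;
	}
	return bit;
}

/* Index of the highest set bit; mask must be non-zero (models __fls()). */
static unsigned int toy_highest_bit(uint64_t mask)
{
	unsigned int bit = 0;

	while (mask >>= 1)
		bit++;
	return bit;
}

/* First gfn of the 2 MiB region that contains gfn (rounds down). */
static gfn_t toy_huge_page_base(gfn_t gfn)
{
	return gfn & ~(TOY_GFNS_PER_2M - 1);
}

/*
 * Compute the first and last gfn selected by mask and report whether they
 * fall into two different 2 MiB regions.
 */
static bool toy_mask_crosses_huge_page(gfn_t base_gfn, gfn_t gfn_offset,
				       uint64_t mask, gfn_t *start, gfn_t *end)
{
	*start = base_gfn + gfn_offset + toy_lowest_bit(mask);
	*end = base_gfn + gfn_offset + toy_highest_bit(mask);

	return toy_huge_page_base(*start) != toy_huge_page_base(*end);
}
```

With base_gfn = 449, gfn_offset = 0 and a full 64-bit mask, the selected range
is gfn 449..512, which touches two huge pages because gfn 512 is the first gfn
of the next 2 MiB region.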

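The quoted x86.c hunk changes what kvm_mmu_slot_apply_flags() does when dirty
logging is enabled on a slot. A rough userspace model of the resulting
decision is sketched below. The names toy_action and toy_enable_actions() are
hypothetical, and kvm_x86_ops.cpu_dirty_log_size is treated as a plain boolean
for illustration.

```c
#include <stdbool.h>

/* Hypothetical flags naming the three operations used in the hunk. */
enum toy_action {
	TOY_NOTHING     = 0,
	TOY_CLEAR_DIRTY = 1 << 0, /* kvm_mmu_slot_leaf_clear_dirty() */
	TOY_WRPROT_2M   = 1 << 1, /* remove write access at PG_LEVEL_2M and up */
	TOY_WRPROT_4K   = 1 << 2, /* remove write access at PG_LEVEL_4K and up */
};

/*
 * Model of the patched branch: with init-all-set nothing is done up front,
 * because every page is already reported dirty and protection is applied
 * lazily later.
 */
static unsigned int toy_enable_actions(bool init_all_set, bool cpu_dirty_log)
{
	if (init_all_set)
		return TOY_NOTHING;

	if (cpu_dirty_log)
		return TOY_CLEAR_DIRTY | TOY_WRPROT_2M;

	return TOY_WRPROT_4K;
}
```

In this model the init-all-set case returns TOY_NOTHING regardless of whether
the CPU logs dirty pages itself, which matches the early return added by the
patch.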