All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/8] KVM: Resolve memslot ID via a hash table instead of via a static array
Date: Sat, 22 May 2021 13:11:30 +0200	[thread overview]
Message-ID: <5887de10-c615-175b-e491-86f94e542425@maciej.szmigiero.name> (raw)
In-Reply-To: <c7ba42ee-dc70-a86c-aeb2-d410c136a5ec@maciej.szmigiero.name>

On 21.05.2021 09:05, Maciej S. Szmigiero wrote:
> On 20.05.2021 00:31, Sean Christopherson wrote:
>> On Sun, May 16, 2021, Maciej S. Szmigiero wrote:
(..)
>>>           new_size = old_size;
>>>       slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
>>> -    if (likely(slots))
>>> -        memcpy(slots, old, old_size);
>>> +    if (unlikely(!slots))
>>> +        return NULL;
>>> +
>>> +    memcpy(slots, old, old_size);
>>> +
>>> +    hash_init(slots->id_hash);
>>> +    kvm_for_each_memslot(memslot, slots)
>>> +        hash_add(slots->id_hash, &memslot->id_node, memslot->id);
>>
>> What's the perf penalty if the number of memslots gets large?  I ask because the
>> lazy rmap allocation is adding multiple calls to kvm_dup_memslots().
> 
> I would expect the "move inactive" benchmark to be closest to measuring
> the performance of just a memslot array copy operation but the results
> suggest that the performance stays within ~10% window from 10 to 509
> memslots on the old code (it then climbs 13x for 32k case).
> 
> That suggests that something else is dominating this benchmark for these
> memslot counts (probably zapping of shadow pages).
> 
> At the same time, the tree-based memslots implementation is clearly
> faster in this benchmark, even for smaller memslot counts, so apparently
> copying of the memslot array has some performance impact, too.
> 
> Measuring just kvm_dup_memslots() performance would probably be done
> best by benchmarking KVM_MR_FLAGS_ONLY operation - will try to add this
> operation to my set of benchmarks and see how it performs with different
> memslot counts.

Update:
I've implemented a simple KVM_MR_FLAGS_ONLY benchmark, that repeatably
sets and unsets KVM_MEM_LOG_DIRTY_PAGES flag on a memslot with a single
page of memory in it. [1]

Since on the current code with higher memslot counts the "set flags"
operation spends a significant time in kvm_mmu_calculate_default_mmu_pages()
a second set of measurements was done with patch [2] applied.

In this case, the top functions in the perf trace are "memcpy" and
"clear_page" (called from kvm_set_memslot(), most likely from inlined
kvm_dup_memslots()).

For reference, a set of measurements with the whole patch series
(patches 1 - 8) applied was also done, as "new code".
In this case, SRCU-related functions dominate the perf trace.

32k memslots:
Current code:             0.00130s
Current code + patch [2]: 0.00104s (13x 4k result)
New code:                 0.0000144s

4k memslots:
Current code:             0.0000899s
Current code + patch [2]: 0.0000799s (+78% 2k result)
New code:                 0.0000144s

2k memslots:
Current code:             0.0000495s
Current code + patch [2]: 0.0000447s (+54% 509 result)
New code:                 0.0000143s

509 memslots:
Current code:             0.0000305s
Current code + patch [2]: 0.0000290s (+5% 100 result)
New code:                 0.0000141s

100 memslots:
Current code:             0.0000280s
Current code + patch [2]: 0.0000275s (same as for 10 slots)
New code:                 0.0000142s

10 memslots:
Current code:             0.0000272s
Current code + patch [2]: 0.0000272s
New code:                 0.0000141s

Thanks,
Maciej

[1]: The patch against memslot_perf_test.c is available here:
https://github.com/maciejsszmigiero/linux/commit/841e94898a55ff79af9d20a08205aa80808bd2a8

[2]: "[PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array"

  reply	other threads:[~2021-05-22 11:15 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-16 21:44 [PATCH v3 0/8] KVM: Scalable memslots implementation Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array Maciej S. Szmigiero
2021-05-19 21:00   ` Sean Christopherson
2021-05-21  7:03     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 2/8] KVM: Integrate gfn_to_memslot_approx() into search_memslots() Maciej S. Szmigiero
2021-05-19 21:24   ` Sean Christopherson
2021-05-21  7:03     ` Maciej S. Szmigiero
2021-06-10 16:17     ` Paolo Bonzini
2021-05-16 21:44 ` [PATCH v3 3/8] KVM: Resolve memslot ID via a hash table instead of via a static array Maciej S. Szmigiero
2021-05-19 22:31   ` Sean Christopherson
2021-05-21  7:05     ` Maciej S. Szmigiero
2021-05-22 11:11       ` Maciej S. Szmigiero [this message]
2021-05-16 21:44 ` [PATCH v3 4/8] KVM: Introduce memslots hva tree Maciej S. Szmigiero
2021-05-19 23:07   ` Sean Christopherson
2021-05-21  7:06     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 5/8] KVM: s390: Introduce kvm_s390_get_gfn_end() Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 6/8] KVM: Keep memslots in tree-based structures instead of array-based ones Maciej S. Szmigiero
2021-05-19 23:10   ` Sean Christopherson
2021-05-21  7:06     ` Maciej S. Szmigiero
2021-05-25 23:21   ` Sean Christopherson
2021-06-01 20:24     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 7/8] KVM: Optimize gfn lookup in kvm_zap_gfn_range() Maciej S. Szmigiero
2021-05-26 17:33   ` Sean Christopherson
2021-06-01 20:25     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 8/8] KVM: Optimize overlapping memslots check Maciej S. Szmigiero

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5887de10-c615-175b-e491-86f94e542425@maciej.szmigiero.name \
    --to=mail@maciej.szmigiero.name \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=borntraeger@de.ibm.com \
    --cc=chenhuacai@kernel.org \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=imammedo@redhat.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=james.morse@arm.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=paulus@ozlabs.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.