linux-kernel.vger.kernel.org archive mirror
* [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
@ 2022-03-25 23:31 Vipin Sharma
  2022-03-25 23:52 ` David Matlack
  0 siblings, 1 reply; 9+ messages in thread
From: Vipin Sharma @ 2022-03-25 23:31 UTC (permalink / raw)
  To: pbonzini
  Cc: seanjc, vkuznets, wanpengli, jmattson, joro, dmatlack, kvm,
	linux-kernel, Vipin Sharma

Avoid calling handlers on empty rmap entries and skip to the next
non-empty rmap entry.

Empty rmap entries are a no-op in handlers.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Change-Id: I8abf0f4d82a2aae4c5d58b80bcc17ffc30785ffc
---
 arch/x86/kvm/mmu/mmu.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 51671cb34fb6..f296340803ba 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1499,11 +1499,14 @@ static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
 	return !!iterator->rmap;
 }
 
-static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+static noinline void
+slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
 {
-	if (++iterator->rmap <= iterator->end_rmap) {
+	while (++iterator->rmap <= iterator->end_rmap) {
 		iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
-		return;
+
+		if (iterator->rmap->val)
+			return;
 	}
 
 	if (++iterator->level > iterator->end_level) {

base-commit: c9b8fecddb5bb4b67e351bbaeaa648a6f7456912
-- 
2.35.1.1021.g381101b075-goog
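For context, the behavioral change in the hunk above can be modeled outside the kernel. The sketch below uses hypothetical stand-in types (struct rmap_head is reduced to its val field, and the gfn bookkeeping is omitted), not the real KVM definitions, to show how the patched loop skips empty entries:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; the real definitions
 * live in arch/x86/kvm/mmu/. An entry with val == 0 is empty. */
struct rmap_head { unsigned long val; };

struct iter {
	struct rmap_head *rmap;     /* current position */
	struct rmap_head *end_rmap; /* last valid entry */
};

/* Pre-patch behavior: advance one entry per call, so the caller's
 * handler runs even on empty entries (a no-op each time). */
static struct rmap_head *next_any(struct iter *it)
{
	if (++it->rmap <= it->end_rmap)
		return it->rmap;
	return NULL;
}

/* Patched behavior: keep advancing until a non-empty entry is found,
 * so handlers never see empty rmap entries. */
static struct rmap_head *next_nonempty(struct iter *it)
{
	while (++it->rmap <= it->end_rmap) {
		if (it->rmap->val)
			return it->rmap;
	}
	return NULL;
}
```

In a sparsely populated memslot most entries are empty, so folding the skip into the iterator saves one handler invocation per empty entry.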


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-03-25 23:31 [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps Vipin Sharma
@ 2022-03-25 23:52 ` David Matlack
  2022-03-26  0:31   ` Vipin Sharma
  0 siblings, 1 reply; 9+ messages in thread
From: David Matlack @ 2022-03-25 23:52 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML

On Fri, Mar 25, 2022 at 4:31 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> Avoid calling handlers on empty rmap entries and skip to the next non
> empty rmap entry.
>
> Empty rmap entries are noop in handlers.
>
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Change-Id: I8abf0f4d82a2aae4c5d58b80bcc17ffc30785ffc

nit: Omit Change-Id tags from upstream commits.

> ---
>  arch/x86/kvm/mmu/mmu.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 51671cb34fb6..f296340803ba 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1499,11 +1499,14 @@ static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
>         return !!iterator->rmap;
>  }
>
> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> +static noinline void

What is the reason to add noinline?


> +slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
>  {
> -       if (++iterator->rmap <= iterator->end_rmap) {
> +       while (++iterator->rmap <= iterator->end_rmap) {
>                 iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
> -               return;
> +
> +               if (iterator->rmap->val)
> +                       return;
>         }
>
>         if (++iterator->level > iterator->end_level) {
>
> base-commit: c9b8fecddb5bb4b67e351bbaeaa648a6f7456912
> --
> 2.35.1.1021.g381101b075-goog
>


* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-03-25 23:52 ` David Matlack
@ 2022-03-26  0:31   ` Vipin Sharma
  2022-03-27 10:40     ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Vipin Sharma @ 2022-03-26  0:31 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML

On Fri, Mar 25, 2022 at 4:53 PM David Matlack <dmatlack@google.com> wrote:
>
> On Fri, Mar 25, 2022 at 4:31 PM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > Avoid calling handlers on empty rmap entries and skip to the next non
> > empty rmap entry.
> >
> > Empty rmap entries are noop in handlers.
> >
> > Signed-off-by: Vipin Sharma <vipinsh@google.com>
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Change-Id: I8abf0f4d82a2aae4c5d58b80bcc17ffc30785ffc
>
> nit: Omit Change-Id tags from upstream commits.

Thanks for catching it.

>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 51671cb34fb6..f296340803ba 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1499,11 +1499,14 @@ static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
> >         return !!iterator->rmap;
> >  }
> >
> > -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> > +static noinline void
>
> What is the reason to add noinline?

My understanding is that since this method is called from
__always_inline methods, noinline will prevent gcc from inlining
slot_rmap_walk_next into those functions, generating smaller code.
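A minimal sketch of the attributes under discussion, with the GCC/Clang attribute syntax spelled out (the kernel's compiler headers define the noinline and __always_inline shorthands; the names and toy walk loop below are illustrative, not kernel code):

```c
#include <assert.h>

/* Rough equivalents of the kernel's helpers (illustrative only). */
#define my_noinline      __attribute__((noinline))
#define my_always_inline inline __attribute__((always_inline))

/* Kept out of line: with several always_inline callers, its body is
 * emitted once and reached via a call, instead of being duplicated
 * into every expanded caller. */
static my_noinline int step(int x)
{
	return x + 1;
}

/* Forcibly inlined into each caller; only the call to step() remains
 * as a real call in the generated code. */
static my_always_inline int walk_to_ten(int x)
{
	while (x < 10)
		x = step(x);
	return x;
}
```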


* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-03-26  0:31   ` Vipin Sharma
@ 2022-03-27 10:40     ` Paolo Bonzini
  2022-03-28 19:13       ` Vipin Sharma
  2022-04-08 19:31       ` Vipin Sharma
  0 siblings, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2022-03-27 10:40 UTC (permalink / raw)
  To: Vipin Sharma, David Matlack
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm list, LKML

On 3/26/22 01:31, Vipin Sharma wrote:
>>> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
>>> +static noinline void
>>
>> What is the reason to add noinline?
> 
> My understanding is that since this method is called from
> __always_inline methods, noinline will avoid gcc inlining the
> slot_rmap_walk_next in those functions and generate smaller code.
> 

Iterators are written in such a way that it's way more beneficial to 
inline them.  After inlining, compilers replace the aggregates (in this 
case, struct slot_rmap_walk_iterator) with one variable per field and 
that in turn enables a lot of optimizations, so the iterators should 
actually be always_inline if anything.

For the same reason I'd guess the effect on the generated code should be 
small (next time please include the output of "size mmu.o"), but should 
still be there.  I'll do a quick check of the generated code and apply 
the patch.

Paolo
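Paolo's point about replacing aggregates with per-field variables (scalar replacement of aggregates) can be seen with a toy iterator (hypothetical names, not kernel code): once the small accessors below are inlined into sum_range(), the compiler is free to rewrite the struct's fields as plain locals, i.e. as an ordinary counted loop, which marking any accessor noinline would forbid.

```c
#include <assert.h>

/* Toy iterator, analogous in shape to slot_rmap_walk_iterator. */
struct range_iter { int cur, end; };

static inline void iter_init(struct range_iter *it, int start, int end)
{
	it->cur = start;
	it->end = end;
}

static inline int iter_ok(const struct range_iter *it)
{
	return it->cur < it->end;
}

static inline void iter_next(struct range_iter *it)
{
	it->cur++;
}

/* After full inlining, "it" never escapes, so the compiler can split
 * the aggregate into the scalars cur/end and keep them in registers;
 * the loop then optimizes like "for (cur = start; cur < end; cur++)". */
static int sum_range(int start, int end)
{
	struct range_iter it;
	int sum = 0;

	for (iter_init(&it, start, end); iter_ok(&it); iter_next(&it))
		sum += it.cur;
	return sum;
}
```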



* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-03-27 10:40     ` Paolo Bonzini
@ 2022-03-28 19:13       ` Vipin Sharma
  2022-03-28 20:29         ` Sean Christopherson
  2022-04-08 19:31       ` Vipin Sharma
  1 sibling, 1 reply; 9+ messages in thread
From: Vipin Sharma @ 2022-03-28 19:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Matlack, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML

Thank you, David and Paolo, for checking this patch carefully. In
hindsight, I should have explicitly called out the addition of
"noinline" in my patch email.

On Sun, Mar 27, 2022 at 3:41 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 3/26/22 01:31, Vipin Sharma wrote:
> >>> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> >>> +static noinline void
> >>
> >> What is the reason to add noinline?
> >
> > My understanding is that since this method is called from
> > __always_inline methods, noinline will avoid gcc inlining the
> > slot_rmap_walk_next in those functions and generate smaller code.
> >
>
> Iterators are written in such a way that it's way more beneficial to
> inline them.  After inlining, compilers replace the aggregates (in this
> case, struct slot_rmap_walk_iterator) with one variable per field and
> that in turn enables a lot of optimizations, so the iterators should
> actually be always_inline if anything.
>
> For the same reason I'd guess the effect on the generated code should be
> small (next time please include the output of "size mmu.o"), but should
> still be there.  I'll do a quick check of the generated code and apply
> the patch.

Yeah, I should have added the "size mmu.o" output. Here is what I have found:

size arch/x86/kvm/mmu/mmu.o

Without noinline:
              text      data     bss       dec        hex filename
          89938   15793      72  105803   19d4b arch/x86/kvm/mmu/mmu.o

With noinline:
              text      data     bss        dec       hex filename
          90058   15793      72  105923   19dc3 arch/x86/kvm/mmu/mmu.o

With noinline, increase in size = 120

Curiously, I also checked the file size with the "ls -l" command.
File size:
        Without noinline: 1394272 bytes
        With noinline: 1381216 bytes

With noinline, decrease in size = 13056 bytes

I also disassembled mmu.o via "objdump -d" and found the following.
Total lines in the generated assembly:
        Without noinline: 23438
        With noinline: 23393

With noinline, decrease in assembly code = 45

I can see in the assembly code that there are multiple "call" operations
in the "with noinline" object file, which is expected, and it has fewer
lines of code compared to "without noinline". I am not sure why the
size command is showing an increase in the text segment for "with
noinline", or what to infer from all of this data.

Thanks
Vipin


* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-03-28 19:13       ` Vipin Sharma
@ 2022-03-28 20:29         ` Sean Christopherson
  0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2022-03-28 20:29 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: Paolo Bonzini, David Matlack, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML

On Mon, Mar 28, 2022, Vipin Sharma wrote:
> Thank you David and Paolo, for checking this patch carefully. With
> hindsight, I should have explicitly mentioned adding "noinline" in my
> patch email.
> 
> On Sun, Mar 27, 2022 at 3:41 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 3/26/22 01:31, Vipin Sharma wrote:
> > >>> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> > >>> +static noinline void
> > >>
> > >> What is the reason to add noinline?
> > >
> > > My understanding is that since this method is called from
> > > __always_inline methods, noinline will avoid gcc inlining the
> > > slot_rmap_walk_next in those functions and generate smaller code.
> > >
> >
> > Iterators are written in such a way that it's way more beneficial to
> > inline them.  After inlining, compilers replace the aggregates (in this
> > case, struct slot_rmap_walk_iterator) with one variable per field and
> > that in turn enables a lot of optimizations, so the iterators should
> > actually be always_inline if anything.
> >
> > For the same reason I'd guess the effect on the generated code should be
> > small (next time please include the output of "size mmu.o"), but should
> > still be there.  I'll do a quick check of the generated code and apply
> > the patch.
> 
> Yeah, I should have added the "size mmu.o" output. Here is what I have found:
> 
> size arch/x86/kvm/mmu/mmu.o
> 
> Without noinline:
>               text      data     bss       dec        hex filename
>           89938   15793      72  105803   19d4b arch/x86/kvm/mmu/mmu.o
> 
> With noinline:
>               text      data     bss        dec       hex filename
>           90058   15793      72  105923   19dc3 arch/x86/kvm/mmu/mmu.o
> 
> With noinline, increase in size = 120
> 
> Curiously, I also checked file size with "ls -l" command
> File size:
>         Without noinline: 1394272 bytes
>         With noinline: 1381216 bytes
> 
> With noinline, decrease in size = 13056 bytes
> 
> I also disassembled mmu.o via "objdump -d" and found following
> Total lines in the generated assembly:
>         Without noinline: 23438
>         With noinline: 23393
> 
> With noinline, decrease in assembly code = 45
> 
> I can see in assembly code that there are multiple "call" operations
> in the "with noinline" object file, which is expected and has less
> lines of code compared to "without noinline". I am not sure why the
> size command is showing an increase in text segment for "with
> noinline" and what to infer with all of this data.

The most common takeaway from these types of exercises is that trying to be smarter
than the compiler is usually a fool's errand.  Smaller code footprint doesn't
necessarily equate to better runtime performance.  And conversely, inlining may
not always be a win, which is why tagging static helpers (not in headers) with
"inline" is generally discouraged.

IMO, unless there's an explicit side effect we want (or want to avoid), we should
never use "noinline".  E.g. the VMX <insn>_error() handlers use noinline so that
KVM only WARNs once per failure of instruction type, and fxregs_fixup() uses it
to keep the stack size manageable.


* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-03-27 10:40     ` Paolo Bonzini
  2022-03-28 19:13       ` Vipin Sharma
@ 2022-04-08 19:31       ` Vipin Sharma
  2022-04-18 16:29         ` Vipin Sharma
  1 sibling, 1 reply; 9+ messages in thread
From: Vipin Sharma @ 2022-04-08 19:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Matlack, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML

On Sun, Mar 27, 2022 at 3:41 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 3/26/22 01:31, Vipin Sharma wrote:
> >>> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> >>> +static noinline void
> >>
> >> What is the reason to add noinline?
> >
> > My understanding is that since this method is called from
> > __always_inline methods, noinline will avoid gcc inlining the
> > slot_rmap_walk_next in those functions and generate smaller code.
> >
>
> Iterators are written in such a way that it's way more beneficial to
> inline them.  After inlining, compilers replace the aggregates (in this
> case, struct slot_rmap_walk_iterator) with one variable per field and
> that in turn enables a lot of optimizations, so the iterators should
> actually be always_inline if anything.
>
> For the same reason I'd guess the effect on the generated code should be
> small (next time please include the output of "size mmu.o"), but should
> still be there.  I'll do a quick check of the generated code and apply
> the patch.
>
> Paolo
>

Let me know if you are still planning to modify the current patch by
removing "noinline" and merge it, or if you prefer a v2 without noinline.

Thanks
Vipin


* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-04-08 19:31       ` Vipin Sharma
@ 2022-04-18 16:29         ` Vipin Sharma
  2022-04-26 20:33           ` Vipin Sharma
  0 siblings, 1 reply; 9+ messages in thread
From: Vipin Sharma @ 2022-04-18 16:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Matlack, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML

On Fri, Apr 8, 2022 at 12:31 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On Sun, Mar 27, 2022 at 3:41 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 3/26/22 01:31, Vipin Sharma wrote:
> > >>> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> > >>> +static noinline void
> > >>
> > >> What is the reason to add noinline?
> > >
> > > My understanding is that since this method is called from
> > > __always_inline methods, noinline will avoid gcc inlining the
> > > slot_rmap_walk_next in those functions and generate smaller code.
> > >
> >
> > Iterators are written in such a way that it's way more beneficial to
> > inline them.  After inlining, compilers replace the aggregates (in this
> > case, struct slot_rmap_walk_iterator) with one variable per field and
> > that in turn enables a lot of optimizations, so the iterators should
> > actually be always_inline if anything.
> >
> > For the same reason I'd guess the effect on the generated code should be
> > small (next time please include the output of "size mmu.o"), but should
> > still be there.  I'll do a quick check of the generated code and apply
> > the patch.
> >
> > Paolo
> >
>
> Let me know if you are still planning to modify the current patch by
> removing "noinline" and merge or if you prefer a v2 without noinline.

Hi Paolo,

Any update on this patch?

Thanks
Vipin


* Re: [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps
  2022-04-18 16:29         ` Vipin Sharma
@ 2022-04-26 20:33           ` Vipin Sharma
  0 siblings, 0 replies; 9+ messages in thread
From: Vipin Sharma @ 2022-04-26 20:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Matlack, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm list, LKML

On Mon, Apr 18, 2022 at 9:29 AM Vipin Sharma <vipinsh@google.com> wrote:
>
> On Fri, Apr 8, 2022 at 12:31 PM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On Sun, Mar 27, 2022 at 3:41 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > >
> > > On 3/26/22 01:31, Vipin Sharma wrote:
> > > >>> -static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
> > > >>> +static noinline void
> > > >>
> > > >> What is the reason to add noinline?
> > > >
> > > > My understanding is that since this method is called from
> > > > __always_inline methods, noinline will avoid gcc inlining the
> > > > slot_rmap_walk_next in those functions and generate smaller code.
> > > >
> > >
> > > Iterators are written in such a way that it's way more beneficial to
> > > inline them.  After inlining, compilers replace the aggregates (in this
> > > case, struct slot_rmap_walk_iterator) with one variable per field and
> > > that in turn enables a lot of optimizations, so the iterators should
> > > actually be always_inline if anything.
> > >
> > > For the same reason I'd guess the effect on the generated code should be
> > > small (next time please include the output of "size mmu.o"), but should
> > > still be there.  I'll do a quick check of the generated code and apply
> > > the patch.
> > >
> > > Paolo
> > >
> >
> > Let me know if you are still planning to modify the current patch by
> > removing "noinline" and merge or if you prefer a v2 without noinline.
>
> Hi Paolo,
>
> Any update on this patch?
>

Hi Paolo,

Still waiting for your response on this patch :)
Please let me know if you prefer a v2 (without noinline) or if you
will merge this patch, with noinline removed, from your side. If there
is any concern or feedback that I can address, please let me know.

Thanks
Vipin Sharma


end of thread, other threads:[~2022-04-26 20:34 UTC | newest]

Thread overview: 9+ messages
2022-03-25 23:31 [PATCH] KVM: x86/mmu: Speed up slot_rmap_walk_next for sparsely populated rmaps Vipin Sharma
2022-03-25 23:52 ` David Matlack
2022-03-26  0:31   ` Vipin Sharma
2022-03-27 10:40     ` Paolo Bonzini
2022-03-28 19:13       ` Vipin Sharma
2022-03-28 20:29         ` Sean Christopherson
2022-04-08 19:31       ` Vipin Sharma
2022-04-18 16:29         ` Vipin Sharma
2022-04-26 20:33           ` Vipin Sharma
