All of lore.kernel.org
 help / color / mirror / Atom feed
From: Saravana Kannan <saravanak@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: David Dai <davidai@google.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Rob Herring <robh+dt@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	Sudeep Holla <sudeep.holla@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	kernel-team@android.com, linux-pm@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Subject: Re: [RFC PATCH 0/6] Improve VM DVFS and task placement behavior
Date: Wed, 5 Apr 2023 14:00:59 -0700	[thread overview]
Message-ID: <CAGETcx90SiaztPO21GsHSr18XUTHoLWt3Jv+y=EW5yfjJgzJHw@mail.gmail.com> (raw)
In-Reply-To: <86sfdfv0e1.wl-maz@kernel.org>

On Tue, Apr 4, 2023 at 1:49 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 04 Apr 2023 20:43:40 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Folks,
> >
> > On Thu, Mar 30, 2023 at 03:43:35PM -0700, David Dai wrote:
> >
> > <snip>
> >
> > > PCMark
> > > Higher is better
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Test Case (score) | Baseline |  Hypercall | %delta |  MMIO | %delta |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Weighted Total    |     6136 |       7274 |   +19% |  6867 |   +12% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Web Browsing      |     5558 |       6273 |   +13% |  6035 |    +9% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Video Editing     |     4921 |       5221 |    +6% |  5167 |    +5% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Writing           |     6864 |       8825 |   +29% |  8529 |   +24% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Photo Editing     |     7983 |      11593 |   +45% | 10812 |   +35% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Data Manipulation |     5814 |       6081 |    +5% |  5327 |    -8% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > >
> > > PCMark Performance/mAh
> > > Higher is better
> > > +-----------+----------+-----------+--------+------+--------+
> > > |           | Baseline | Hypercall | %delta | MMIO | %delta |
> > > +-----------+----------+-----------+--------+------+--------+
> > > | Score/mAh |       79 |        88 |   +11% |   83 |    +7% |
> > > +-----------+----------+-----------+--------+------+--------+
> > >
> > > Roblox
> > > Higher is better
> > > +-----+----------+------------+--------+-------+--------+
> > > |     | Baseline |  Hypercall | %delta |  MMIO | %delta |
> > > +-----+----------+------------+--------+-------+--------+
> > > | FPS |    18.25 |      28.66 |   +57% | 24.06 |   +32% |
> > > +-----+----------+------------+--------+-------+--------+
> > >
> > > Roblox Frames/mAh
> > > Higher is better
> > > +------------+----------+------------+--------+--------+--------+
> > > |            | Baseline |  Hypercall | %delta |   MMIO | %delta |
> > > +------------+----------+------------+--------+--------+--------+
> > > | Frames/mAh |    91.25 |     114.64 |   +26% | 103.11 |   +13% |
> > > +------------+----------+------------+--------+--------+--------+
> >
> > </snip>
> >
> > > Next steps:
> > > ===========
> > > We are continuing to look into communication mechanisms other than
> > > hypercalls that are just as/more efficient and avoid switching into the VMM
> > > userspace. Any inputs in this regard are greatly appreciated.

Hi Oliver and Marc,

Replying to both of you in this one email.

> >
> > We're highly unlikely to entertain such an interface in KVM.
> >
> > The entire feature is dependent on pinning vCPUs to physical cores, for which
> > userspace is in the driver's seat. That is a well established and documented
> > policy which can be seen in the way we handle heterogeneous systems and
> > vPMU.
> >
> > Additionally, this bloats the KVM PV ABI with highly VMM-dependent interfaces
> > that I would not expect to benefit the typical user of KVM.
> >
> > Based on the data above, it would appear that the userspace implementation is
> > in the same neighborhood as a KVM-based implementation, which only further
> > weakens the case for moving this into the kernel.

Oliver,

Sorry if the tables/data aren't presented in an intuitive way, but
MMIO vs hypercall is definitely not in the same neighborhood. The
hypercall method often gives close to 2x the improvement that the MMIO
method gives. For example:

- Roblox FPS: MMIO improves it by 32% vs hypercall improves it by 57%.
- Frames/mAh: MMIO improves it by 13% vs hypercall improves it by 26%.
- PC Mark Data manipulation: MMIO makes it worse by 8% vs hypercall
improves it by 5%

Hypercall does better for other cases too, just not as good. For example,
- PC Mark Photo editing: Going from MMIO to hypercall gives a 10% improvement.

These are all pretty non-trivial, at least in the mobile world. Heck,
whole teams would spend months for 2% improvement in battery :)

> >
> > I certainly can appreciate the motivation for the series, but this feature
> > should be in userspace as some form of a virtual device.
>
> +1 on all of the above.

Marc and Oliver,

We are not tied to hypercalls. We want to do the right thing here, but
MMIO going all the way to userspace definitely doesn't cut it as is.
This is where we need some guidance. See more below.

> The one thing I'd like to understand that the comment seems to imply
> that there is a significant difference in overhead between a hypercall
> and an MMIO. In my experience, both are pretty similar in cost for a
> handling location (both in userspace or both in the kernel).

I think the main difference really is that in our hypercall vs MMIO
comparison the hypercall is handled in the kernel vs MMIO goes all the
way to userspace. I agree with you that the difference probably won't
be significant if both of them go to the same "depth" in the privilege
levels.

> MMIO
> handling is a tiny bit more expensive due to a guaranteed TLB miss
> followed by a walk of the in-kernel device ranges, but that's all. It
> should hardly register.
>
> And if you really want some super-low latency, low overhead
> signalling, maybe an exception is the wrong tool for the job. Shared
> memory communication could be more appropriate.

Yeah, that's one of our next steps. Ideally, we want to use shared
memory for the host to guest information flow. It's a 32-bit value
representing the current frequency that the host can update whenever
the host CPU frequency changes and the guest can read whenever it
needs it.

For guest to host information flow, we'll need a kick from guest to
host because we need to take action on the host side when threads
migrate between vCPUs and cause a significant change in vCPU util.
Again it can be just a shared memory and some kick. This is what we
are currently trying to figure out how to do.

If there are APIs to do this, can you point us to those please? We'd
also want the shared memory to be accessible by the VMM (so, shared
between guest kernel, host kernel and VMM).

Are the above next steps sane? Or is that a no-go? The main thing we
want to cut out is the need for having to switch to userspace for
every single interaction because, as is, it leaves a lot on the table.

Also, thanks for all the feedback. Glad to receive it.

-Saravana

WARNING: multiple messages have this Message-ID (diff)
From: Saravana Kannan <saravanak@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: David Dai <davidai@google.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	 "Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	 Rob Herring <robh+dt@kernel.org>,
	 Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	James Morse <james.morse@arm.com>,
	 Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	 Mark Rutland <mark.rutland@arm.com>,
	Lorenzo Pieralisi <lpieralisi@kernel.org>,
	 Sudeep Holla <sudeep.holla@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	 Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	kernel-team@android.com,  linux-pm@vger.kernel.org,
	devicetree@vger.kernel.org,  linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Subject: Re: [RFC PATCH 0/6] Improve VM DVFS and task placement behavior
Date: Wed, 5 Apr 2023 14:00:59 -0700	[thread overview]
Message-ID: <CAGETcx90SiaztPO21GsHSr18XUTHoLWt3Jv+y=EW5yfjJgzJHw@mail.gmail.com> (raw)
In-Reply-To: <86sfdfv0e1.wl-maz@kernel.org>

On Tue, Apr 4, 2023 at 1:49 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 04 Apr 2023 20:43:40 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Folks,
> >
> > On Thu, Mar 30, 2023 at 03:43:35PM -0700, David Dai wrote:
> >
> > <snip>
> >
> > > PCMark
> > > Higher is better
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Test Case (score) | Baseline |  Hypercall | %delta |  MMIO | %delta |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Weighted Total    |     6136 |       7274 |   +19% |  6867 |   +12% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Web Browsing      |     5558 |       6273 |   +13% |  6035 |    +9% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Video Editing     |     4921 |       5221 |    +6% |  5167 |    +5% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Writing           |     6864 |       8825 |   +29% |  8529 |   +24% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Photo Editing     |     7983 |      11593 |   +45% | 10812 |   +35% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > > | Data Manipulation |     5814 |       6081 |    +5% |  5327 |    -8% |
> > > +-------------------+----------+------------+--------+-------+--------+
> > >
> > > PCMark Performance/mAh
> > > Higher is better
> > > +-----------+----------+-----------+--------+------+--------+
> > > |           | Baseline | Hypercall | %delta | MMIO | %delta |
> > > +-----------+----------+-----------+--------+------+--------+
> > > | Score/mAh |       79 |        88 |   +11% |   83 |    +7% |
> > > +-----------+----------+-----------+--------+------+--------+
> > >
> > > Roblox
> > > Higher is better
> > > +-----+----------+------------+--------+-------+--------+
> > > |     | Baseline |  Hypercall | %delta |  MMIO | %delta |
> > > +-----+----------+------------+--------+-------+--------+
> > > | FPS |    18.25 |      28.66 |   +57% | 24.06 |   +32% |
> > > +-----+----------+------------+--------+-------+--------+
> > >
> > > Roblox Frames/mAh
> > > Higher is better
> > > +------------+----------+------------+--------+--------+--------+
> > > |            | Baseline |  Hypercall | %delta |   MMIO | %delta |
> > > +------------+----------+------------+--------+--------+--------+
> > > | Frames/mAh |    91.25 |     114.64 |   +26% | 103.11 |   +13% |
> > > +------------+----------+------------+--------+--------+--------+
> >
> > </snip>
> >
> > > Next steps:
> > > ===========
> > > We are continuing to look into communication mechanisms other than
> > > hypercalls that are just as/more efficient and avoid switching into the VMM
> > > userspace. Any inputs in this regard are greatly appreciated.

Hi Oliver and Marc,

Replying to both of you in this one email.

> >
> > We're highly unlikely to entertain such an interface in KVM.
> >
> > The entire feature is dependent on pinning vCPUs to physical cores, for which
> > userspace is in the driver's seat. That is a well established and documented
> > policy which can be seen in the way we handle heterogeneous systems and
> > vPMU.
> >
> > Additionally, this bloats the KVM PV ABI with highly VMM-dependent interfaces
> > that I would not expect to benefit the typical user of KVM.
> >
> > Based on the data above, it would appear that the userspace implementation is
> > in the same neighborhood as a KVM-based implementation, which only further
> > weakens the case for moving this into the kernel.

Oliver,

Sorry if the tables/data aren't presented in an intuitive way, but
MMIO vs hypercall is definitely not in the same neighborhood. The
hypercall method often gives close to 2x the improvement that the MMIO
method gives. For example:

- Roblox FPS: MMIO improves it by 32% vs hypercall improves it by 57%.
- Frames/mAh: MMIO improves it by 13% vs hypercall improves it by 26%.
- PC Mark Data manipulation: MMIO makes it worse by 8% vs hypercall
improves it by 5%

Hypercall does better for other cases too, just not as good. For example,
- PC Mark Photo editing: Going from MMIO to hypercall gives a 10% improvement.

These are all pretty non-trivial, at least in the mobile world. Heck,
whole teams would spend months for 2% improvement in battery :)

> >
> > I certainly can appreciate the motivation for the series, but this feature
> > should be in userspace as some form of a virtual device.
>
> +1 on all of the above.

Marc and Oliver,

We are not tied to hypercalls. We want to do the right thing here, but
MMIO going all the way to userspace definitely doesn't cut it as is.
This is where we need some guidance. See more below.

> The one thing I'd like to understand that the comment seems to imply
> that there is a significant difference in overhead between a hypercall
> and an MMIO. In my experience, both are pretty similar in cost for a
> handling location (both in userspace or both in the kernel).

I think the main difference really is that in our hypercall vs MMIO
comparison the hypercall is handled in the kernel vs MMIO goes all the
way to userspace. I agree with you that the difference probably won't
be significant if both of them go to the same "depth" in the privilege
levels.

> MMIO
> handling is a tiny bit more expensive due to a guaranteed TLB miss
> followed by a walk of the in-kernel device ranges, but that's all. It
> should hardly register.
>
> And if you really want some super-low latency, low overhead
> signalling, maybe an exception is the wrong tool for the job. Shared
> memory communication could be more appropriate.

Yeah, that's one of our next steps. Ideally, we want to use shared
memory for the host to guest information flow. It's a 32-bit value
representing the current frequency that the host can update whenever
the host CPU frequency changes and the guest can read whenever it
needs it.

For guest to host information flow, we'll need a kick from guest to
host because we need to take action on the host side when threads
migrate between vCPUs and cause a significant change in vCPU util.
Again it can be just a shared memory and some kick. This is what we
are currently trying to figure out how to do.

If there are APIs to do this, can you point us to those please? We'd
also want the shared memory to be accessible by the VMM (so, shared
between guest kernel, host kernel and VMM).

Are the above next steps sane? Or is that a no-go? The main thing we
want to cut out is the need for having to switch to userspace for
every single interaction because, as is, it leaves a lot on the table.

Also, thanks for all the feedback. Glad to receive it.

-Saravana

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2023-04-05 21:01 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-30 22:43 [RFC PATCH 0/6] Improve VM DVFS and task placement behavior David Dai
2023-03-30 22:43 ` David Dai
2023-03-30 22:43 ` [RFC PATCH 1/6] sched/fair: Add util_guest for tasks David Dai
2023-03-31  2:01   ` kernel test robot
2023-04-03 11:40   ` Dietmar Eggemann
2023-04-04  1:11     ` David Dai
2023-04-05  8:29       ` Quentin Perret
2023-04-05 10:50       ` Dietmar Eggemann
2023-04-05 21:42         ` Saravana Kannan
2023-04-05 23:36           ` David Dai
2023-04-05  8:14   ` Peter Zijlstra
2023-04-05 22:54     ` David Dai
2023-04-06  7:33       ` Peter Zijlstra
2023-03-30 22:43 ` [RFC PATCH 2/6] kvm: arm64: Add support for get_cur_cpufreq service David Dai
2023-03-30 22:43   ` David Dai
2023-04-05  8:04   ` Quentin Perret
2023-04-05  8:04     ` Quentin Perret
2023-03-30 22:43 ` [RFC PATCH 3/6] kvm: arm64: Add support for util_hint service David Dai
2023-03-30 22:43   ` David Dai
2023-03-30 22:43 ` [RFC PATCH 4/6] kvm: arm64: Add support for get_freqtbl service David Dai
2023-03-30 22:43   ` David Dai
2023-03-30 22:43 ` [RFC PATCH 5/6] dt-bindings: cpufreq: add bindings for virtual kvm cpufreq David Dai
2023-03-30 22:43 ` [RFC PATCH 6/6] cpufreq: add kvm-cpufreq driver David Dai
2023-04-05  8:22   ` Peter Zijlstra
2023-04-05 22:42     ` David Dai
2023-03-30 23:20 ` [RFC PATCH 0/6] Improve VM DVFS and task placement behavior Oliver Upton
2023-03-30 23:20   ` Oliver Upton
2023-03-30 23:36   ` Saravana Kannan
2023-03-30 23:36     ` Saravana Kannan
2023-03-30 23:40     ` Oliver Upton
2023-03-30 23:40       ` Oliver Upton
2023-03-31  0:34       ` Saravana Kannan
2023-03-31  0:34         ` Saravana Kannan
2023-03-31  0:49 ` Matthew Wilcox
2023-03-31  0:49   ` Matthew Wilcox
2023-04-03 10:18   ` Mel Gorman
2023-04-03 10:18     ` Mel Gorman
2023-04-04 19:43 ` Oliver Upton
2023-04-04 19:43   ` Oliver Upton
2023-04-04 20:49   ` Marc Zyngier
2023-04-04 20:49     ` Marc Zyngier
2023-04-05  7:48     ` Quentin Perret
2023-04-05  7:48       ` Quentin Perret
2023-04-05  8:33       ` Vincent Guittot
2023-04-05  8:33         ` Vincent Guittot
2023-04-05 21:07       ` Saravana Kannan
2023-04-05 21:07         ` Saravana Kannan
2023-04-06 12:52         ` Quentin Perret
2023-04-06 12:52           ` Quentin Perret
2023-04-06 21:39           ` David Dai
2023-04-06 21:39             ` David Dai
2023-04-05 21:00     ` Saravana Kannan [this message]
2023-04-05 21:00       ` Saravana Kannan
2023-04-06  8:42       ` Marc Zyngier
2023-04-06  8:42         ` Marc Zyngier
2023-04-05  8:05 ` Peter Zijlstra
2023-04-05  8:05   ` Peter Zijlstra
2023-04-05 21:08   ` Saravana Kannan
2023-04-05 21:08     ` Saravana Kannan
2023-04-06  7:36     ` Peter Zijlstra
2023-04-06  7:36       ` Peter Zijlstra
2023-04-06  7:38     ` Peter Zijlstra
2023-04-06  7:38       ` Peter Zijlstra
2023-04-27  7:46 ` Pavan Kondeti
2023-04-27  7:46   ` Pavan Kondeti
2023-04-27  9:52   ` Gupta, Pankaj
2023-04-27  9:52     ` Gupta, Pankaj
2023-04-27 11:26     ` Pavan Kondeti
2023-04-27 11:26       ` Pavan Kondeti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGETcx90SiaztPO21GsHSr18XUTHoLWt3Jv+y=EW5yfjJgzJHw@mail.gmail.com' \
    --to=saravanak@google.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=davidai@google.com \
    --cc=devicetree@vger.kernel.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=james.morse@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=krzysztof.kozlowski+dt@linaro.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lpieralisi@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sudeep.holla@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.