Linux-PM Archive on lore.kernel.org
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	kvm-devel <kvm@vger.kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim KrÄ?máÅ?" <rkrcmar@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Wanpeng Li" <kernellwp@gmail.com>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	"Raslan KarimAllah" <karahmed@amazon.de>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Ankur Arora" <ankur.a.arora@oracle.com>,
	"Christian Borntraeger" <borntraeger@de.ibm.com>,
	"Linux PM" <linux-pm@vger.kernel.org>
Subject: Re: [patch 0/3] cpuidle-haltpoll driver (v2)
Date: Tue, 11 Jun 2019 23:24:39 +0200
Message-ID: <CAJZ5v0gPbSXB3r71XaT-4Q7LsiFO_UVymBwOmU8J1W5+COk_1g@mail.gmail.com>
In-Reply-To: <20190611142627.GB4791@amt.cnet>

On Tue, Jun 11, 2019 at 4:27 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Tue, Jun 11, 2019 at 12:03:26AM +0200, Rafael J. Wysocki wrote:
> > On Mon, Jun 10, 2019 at 5:00 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > On Fri, Jun 07, 2019 at 11:49:51AM +0200, Rafael J. Wysocki wrote:
> > > > On 6/4/2019 12:52 AM, Marcelo Tosatti wrote:
> > > > >The cpuidle-haltpoll driver allows guest vCPUs to poll for a specified
> > > > >amount of time before halting. This provides the following benefits
> > > > >over host-side polling:
> > > > >
> > > > >         1) The POLL flag is set while polling is performed, which allows
> > > > >            a remote vCPU to avoid sending an IPI (and the associated
> > > > >            cost of handling the IPI) when performing a wakeup.
> > > > >
> > > > >         2) The HLT VM-exit cost can be avoided.
> > > > >
> > > > >The downside of guest-side polling is that polling is performed
> > > > >even when there are other runnable tasks on the host.
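
A minimal, illustrative sketch of the guest-side poll-before-halt idea
described above (an editorial sketch, not the patch code: the function name
and the poll_limit_ns parameter are made up, and the HLT fallback uses the
x86 default_idle() path):

#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/sched/idle.h>

/* Sketch only: poll_limit_ns would come from guest_halt_poll_ns or the
 * auto-tuning logic. */
static void guest_poll_then_halt(u64 poll_limit_ns)
{
        ktime_t start = ktime_get();

        /* TIF_POLLING_NRFLAG lets a remote CPU wake us by setting
         * TIF_NEED_RESCHED instead of sending an IPI
         * (see trace_sched_wake_idle_without_ipi). */
        if (!current_set_polling_and_test()) {
                while (!need_resched() &&
                       ktime_to_ns(ktime_sub(ktime_get(), start)) < poll_limit_ns)
                        cpu_relax();
        }
        current_clr_polling();

        /* Nothing arrived within the poll window: fall back to HLT and
         * pay the VM-exit. */
        if (!need_resched())
                default_idle();
        else
                local_irq_enable();
}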
> > > > >
> > > > >Results for a server/client application where a small packet is
> > > > >ping-ponged, comparing host-side halt_poll_ns and guest-side polling:
> > > > >
> > > > >host                                        --> 31.33
> > > > >halt_poll_ns=300000 / no guest busy spin    --> 33.40   (93.8%)
> > > > >halt_poll_ns=0 / guest_halt_poll_ns=300000  --> 32.73   (95.7%)
> > > > >
> > > > >For the SAP HANA benchmarks (idle_spin was a parameter of the
> > > > >previous version of this patch; the results should be the
> > > > >same):
> > > > >
> > > > >hpns == halt_poll_ns
> > > > >
> > > > >                           idle_spin=0/   idle_spin=800/    idle_spin=0/
> > > > >                           hpns=200000    hpns=0            hpns=800000
> > > > >DeleteC06T03 (100 thread) 1.76           1.71 (-3%)        1.78   (+1%)
> > > > >InsertC16T02 (100 thread) 2.14           2.07 (-3%)        2.18   (+1.8%)
> > > > >DeleteC00T01 (1 thread)   1.34           1.28 (-4.5%)      1.29   (-3.7%)
> > > > >UpdateC00T03 (1 thread)   4.72           4.18 (-12%)       4.53   (-5%)
> > > > >
> > > > >V2:
> > > > >
> > > > >- Move from x86 to generic code (Paolo/Christian).
> > > > >- Add auto-tuning logic (Paolo).
> > > > >- Add MSR to disable host side polling (Paolo).
> > > > >
> > > > >
> > > > >
> > > > First of all, please CC power management patches (including cpuidle,
> > > > cpufreq etc) to linux-pm@vger.kernel.org (there are people on that
> > > > list who may want to see your changes before they go in) and CC
> > > > cpuidle material (in particular) to Peter Zijlstra.
> > > >
> > > > Second, I'm not a big fan of this approach to be honest, as it is
> > > > essentially a driver trying to play the role of a governor.
> > > >
> > > > We have a "polling state" already that could be used here in
> > > > principle so I wonder what would be wrong with that.  Also note that
> > > > there seems to be at least some code duplication between your code
> > > > and the "polling state" implementation, so maybe it would be
> > > > possible to do some things in a common way?
> > >
> > > Hi Rafael,
> > >
> > > After modifying poll_state.c to use a generic "poll time" driver
> > > callback [1] (since using a variable "target_residency" for that
> > > looks really ugly), we would need a governor which does:
> > >
> > > haltpoll_governor_select_next_state()
> > >         if (prev_state was poll and evt happened on prev poll window) -> POLL.
> > >         if (prev_state == HLT)  -> POLL
> > >         otherwise               -> HLT
> > >
> > > And a "default_idle" cpuidle driver that:
> > >
> > > defaultidle_idle()
> > >         if (current_clr_polling_and_test()) {
> > >                 local_irq_enable();
> > >                 return index;
> > >         }
> > >         default_idle();
> > >         return index;
> > >
> > > Using such a governor with any other cpuidle driver would
> > > be pointless (since it would enter the first state only
> > > and therefore not save power).
> > >
> > > I'm not certain about using the default_idle driver with
> > > other governors: one would rather use a driver that
> > > supports all states on a given machine.
> > >
> > > This governor/driver combination, introduced just for the sake
> > > of sharing the idle loop, seems awkward to me.
> > > It also defeats the governor/driver separation: in practice the
> > > two will only ever be used as a pair.
> > >
> > > But I have no problem with it, so I'll proceed with that.
> > >
> > > Let me know otherwise.
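
As an editorial illustration of the governor logic sketched earlier (the
POLL/HLT selection rules), here is a rough C version of such a ->select()
callback; the state indices, the per-CPU bookkeeping and all names are
assumptions made up for this sketch, not the actual implementation:

#include <linux/cpuidle.h>
#include <linux/percpu.h>

enum { HALTPOLL_STATE_POLL = 0, HALTPOLL_STATE_HALT = 1 };

/* Illustrative bookkeeping: whether the last poll window expired without
 * a wakeup, and which state was entered last. */
static DEFINE_PER_CPU(bool, poll_timed_out);
static DEFINE_PER_CPU(int, haltpoll_last_state);

static int haltpoll_select_sketch(struct cpuidle_driver *drv,
                                  struct cpuidle_device *dev,
                                  bool *stop_tick)
{
        int prev = __this_cpu_read(haltpoll_last_state);
        int next;

        if (prev == HALTPOLL_STATE_POLL && !__this_cpu_read(poll_timed_out))
                next = HALTPOLL_STATE_POLL;     /* event arrived while polling */
        else if (prev == HALTPOLL_STATE_HALT)
                next = HALTPOLL_STATE_POLL;     /* give polling a chance after HLT */
        else
                next = HALTPOLL_STATE_HALT;     /* poll window expired: halt */

        /* Keep the tick running while polling: stopping and restarting the
         * sched tick is expensive in a guest. */
        if (next == HALTPOLL_STATE_POLL)
                *stop_tick = false;

        __this_cpu_write(haltpoll_last_state, next);
        return next;
}

The matching driver would expose only the two states (POLL backed by
poll_state.c, HLT by a default_idle-style enter callback), which is why,
as noted above, the two only make sense as a pair.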
> >
> > If my understanding of your argument is correct, it is only
> > necessary to take the default_idle_call() branch of
> > cpuidle_idle_call() in the VM case, so it should be sufficient to
> > provide a suitable default_idle_call(), which is what you seem to be
> > trying to do.
>
> In the VM case, we need to poll before actually halting (this is because
> it's tricky to implement MWAIT in guests, so polling for some amount
> of time allows the IPI avoidance optimization,
> see trace_sched_wake_idle_without_ipi, to take place).
>
> The amount of time we poll is variable and adjusted (see adjust_haltpoll_ns
> in the patchset).
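
As a rough editorial illustration of that kind of adjustment (in the spirit
of KVM's host-side halt_poll_ns grow/shrink; the factors, the cap and the
function name below are placeholders, not the values from adjust_haltpoll_ns):

#include <linux/kernel.h>
#include <linux/types.h>

#define POLL_NS_MAX     500000ULL       /* placeholder cap on the poll window */
#define POLL_GROW       2               /* placeholder growth factor */
#define POLL_SHRINK     2               /* placeholder shrink factor */

static u64 adjust_poll_ns_sketch(u64 poll_ns, bool woken_during_poll,
                                 u64 halted_ns)
{
        if (woken_during_poll) {
                /* Polling paid off: allow a longer window next time. */
                poll_ns = min_t(u64, poll_ns * POLL_GROW, POLL_NS_MAX);
        } else if (halted_ns > POLL_NS_MAX) {
                /* We halted for a long time anyway, so polling was pure
                 * overhead: back off. */
                poll_ns /= POLL_SHRINK;
        }
        return poll_ns;
}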
>
> > I might have been confused by the terminology used in the patch series
> > if that's the case.
> >
> > Also, if that's the case, this is not cpuidle matter really.  It is a
> > matter of providing a better default_idle_call() for the arch at hand.
>
> Peter Zijlstra suggested a cpuidle driver for this.

So I wonder what his rationale was.

> Also, other architectures will use the same "poll before exiting the guest"
> logic (so we'd rather avoid duplicating this code): PPC, x86, S/390,
> MIPS... So from my POV it makes sense to unify this.

The logic is fine IMO, but the implementation here is questionable.

> So, back to your initial suggestion:
>
> Q) "Can you unify code with poll_state.c?"
> A) Yes, but it requires a new governor, which seems overkill and unfit
> for the purpose.
>
> Moreover, the logic in menu to decide whether or not it is necessary
> to stop the sched tick is useful for us (so a default_idle_call is not
> sufficient), because the cost of enabling/disabling the sched tick is
> high on VMs.

So in fact you need a governor, but you really only need it to decide
whether or not to stop the tick for you.

menu has quite a high overhead for that. :-)

> So I'll address the comments on the cpuidle driver (which everyone seems
> to agree with, except for your understandable distaste for it) and repost.

Thread overview: 10+ messages
     [not found] <20190603225242.289109849@amt.cnet>
2019-06-07  9:49 ` Rafael J. Wysocki
2019-06-07 17:16   ` Marcelo Tosatti
2019-06-07 18:22     ` Paolo Bonzini
2019-06-07 21:38       ` Marcelo Tosatti
2019-06-10 14:59   ` Marcelo Tosatti
2019-06-10 22:03     ` Rafael J. Wysocki
2019-06-11 14:26       ` Marcelo Tosatti
2019-06-11 21:24         ` Rafael J. Wysocki [this message]
2019-06-17 15:57           ` Peter Zijlstra
     [not found] ` <20190603225254.212931277@amt.cnet>
2019-06-07  9:54   ` [patch 1/3] drivers/cpuidle: add cpuidle-haltpoll driver Rafael J. Wysocki
