From: Frederic Weisbecker <frederic@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, Christoph Lameter <cl@linux.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Juri Lelli <juri.lelli@redhat.com>, Nitesh Lal <nilal@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Nicolas Saenz <nsaenzju@redhat.com>
Subject: Re: [patch 1/5] sched: isolation: introduce quiesce_on_exit_to_usermode isolcpu flags
Date: Mon, 19 Jul 2021 16:14:40 +0200
Message-ID: <20210719141440.GE116346@lothringen>
In-Reply-To: <20210714204233.648529431@fuller.cnet>

On Wed, Jul 14, 2021 at 05:42:06PM -0300, Marcelo Tosatti wrote:
> Add a new isolcpus flag "quiesce_on_exit_to_usermode" to enable
> quiescing of deferred actions on return to userspace.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> Index: linux-2.6-vmstat-update/include/linux/sched/isolation.h
> ===================================================================
> --- linux-2.6-vmstat-update.orig/include/linux/sched/isolation.h
> +++ linux-2.6-vmstat-update/include/linux/sched/isolation.h
> Index: linux-2.6-vmstat-update/Documentation/admin-guide/kernel-parameters.txt
> ===================================================================
> --- linux-2.6-vmstat-update.orig/Documentation/admin-guide/kernel-parameters.txt
> +++ linux-2.6-vmstat-update/Documentation/admin-guide/kernel-parameters.txt
> @@ -2124,6 +2124,43 @@
>  
>  			The format of <cpu-list> is described above.
>  
> +                         quiesce_on_exit_to_usermode
> +
> +			  This flag allows userspace to take preventive measures to
> +			  avoid deferred actions and create an OS-noise-free environment for
> +			  the application, by quiescing such activities on
> +			  return from syscalls (that is, perform the necessary
> +			  background work on return to userspace, rather than allowing
> +			  it to happen when userspace is executing, in the form of
> +			  an interruption to the application).
> +
> +			  There might be a performance degradation from using this,
> +			  on system-call-heavy workloads, for the isolated CPUs.
> +			  This option is intended to be used by specialized workloads.
> +
> +			  It should be deprecated in favour of a prctl() interface
> +			  to enable this mode (which allows the quiescing to take
> +			  place only on select sections of userspace execution, namely
> +			  the latency sensitive loops).

I don't believe in that. If boot parameters could be deprecated, isolcpus would
have been removed already. Now that it's here, we have to support it forever
and even fight to keep it usable with modern interfaces like cpuset.

Besides, such (very costly) quiescing on kernel exit should only be useful in
specific sections of a workload. No need to kill performance everywhere.

It's a new feature, not a fix, so let's introduce a proper prctl() interface
once and for all. We can't postpone that step forever.
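
To make the idea concrete, here is a rough userspace sketch of what bracketing
only the latency-sensitive section could look like. Everything about the
interface is an assumption for illustration: PR_QUIESCE_ON_EXIT_HYPOTHETICAL,
its value, and the QUIESCE_ON/QUIESCE_OFF arguments do not exist in any kernel,
and the helper names are made up. On current kernels the prctl() call simply
fails with EINVAL and the section runs unquiesced.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/prctl.h>

/*
 * HYPOTHETICAL: no such prctl option exists upstream. The value is a
 * placeholder picked so that current kernels reject it with EINVAL.
 */
#define PR_QUIESCE_ON_EXIT_HYPOTHETICAL 0x7fff0001
#define QUIESCE_OFF 0UL
#define QUIESCE_ON  1UL

/*
 * Ask the kernel to quiesce deferred work on each return to userspace
 * for the calling task. Returns 0 on success, -1 with errno set otherwise.
 */
static int quiesce_set(unsigned long mode)
{
	return prctl(PR_QUIESCE_ON_EXIT_HYPOTHETICAL, mode, 0UL, 0UL, 0UL);
}

/*
 * Run fn(arg) with quiescing requested only around the call, so the
 * rest of the workload does not pay the cost on every kernel exit.
 * Returns 1 if quiescing was active, 0 if the kernel refused the request.
 */
static int run_latency_sensitive(void (*fn)(void *), void *arg)
{
	int active;

	assert(fn != NULL);
	active = (quiesce_set(QUIESCE_ON) == 0);

	fn(arg);

	if (active)
		quiesce_set(QUIESCE_OFF);
	return active;
}

/* Stand-in for the application's latency-sensitive loop. */
static void hot_loop(void *arg)
{
	volatile unsigned long *iterations = arg;
	int i;

	for (i = 0; i < 1000; i++)
		(*iterations)++;
}
```

A caller would do something like run_latency_sensitive(hot_loop, &counter) and
leave setup, teardown and logging paths outside the bracketed section.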

Thanks.

> +
> +			  Note: one of the preventive measures this option
> +			  enables is the following.
> +
> +			  Page counters are maintained in per-CPU counters to
> +			  improve performance. When a CPU modifies a page counter,
> +			  this modification is kept in the per-CPU counter.
> +			  Certain activities require a global count, which
> +			  involves requesting each CPU to flush its local counters
> +			  to the global VM counters.
> +			  This flush is implemented via a workqueue item, which
> +			  requires scheduling the workqueue task on isolated CPUs.
> +
> +			  To avoid this interruption, quiesce_on_exit_to_usermode
> +			  syncs the page counters on each return from system calls.
> +			  To ensure the application returns to userspace
> +			  with no modified per-CPU counters, it's necessary to
> +			  use mlockall() in addition to this isolcpus flag.
> +
>  	iucv=		[HW,NET]
>  
>  	ivrs_ioapic	[HW,X86-64]
> 
> 
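
For readers following the note in the quoted documentation, the per-CPU counter
scheme can be modelled in a few lines. This is a single-threaded userspace toy,
not the vmstat code: NR_CPUS, the array layout and all function names are
assumptions for illustration, and the real implementation uses per-CPU
accessors and atomics. It only shows why folding a CPU's pending delta on
return to userspace leaves nothing for a later global refresh to queue on that
CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4			/* illustrative CPU count */

static long global_count;		/* global counter */
static long cpu_diff[NR_CPUS];		/* pending per-CPU deltas */

/* A CPU modifies the counter: the change stays in its local delta. */
static void counter_add(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_diff[cpu] += delta;
}

/* True if this CPU still holds a delta that was not folded. */
static bool counter_dirty(int cpu)
{
	return cpu_diff[cpu] != 0;
}

/*
 * Fold one CPU's pending delta into the global counter.
 * Returns true if there was anything to fold.
 */
static bool counter_fold(int cpu)
{
	long delta = cpu_diff[cpu];

	if (delta == 0)
		return false;
	cpu_diff[cpu] = 0;
	global_count += delta;
	return true;
}

/* Models the quiesce step: sync this CPU's counters on kernel exit. */
static void exit_to_usermode(int cpu)
{
	counter_fold(cpu);
}

/*
 * Models a global refresh. Returns how many CPUs had pending deltas,
 * i.e. how many would have needed a work item scheduled on them.
 */
static int refresh_all(void)
{
	int cpu, queued = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (counter_fold(cpu))
			queued++;
	return queued;
}
```

In this model a CPU that has gone through exit_to_usermode() is clean, so
refresh_all() has no reason to touch it until it modifies a counter again.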

Thread overview: 8+ messages
2021-07-14 20:42 [patch 0/5] optionally perform deferred actions on return to userspace (v3) Marcelo Tosatti
2021-07-14 20:42 ` [patch 1/5] sched: isolation: introduce quiesce_on_exit_to_usermode isolcpu flags Marcelo Tosatti
2021-07-19 14:14   ` Frederic Weisbecker [this message]
2021-07-14 20:42 ` [patch 2/5] common entry: add hook for isolation to __syscall_exit_to_user_mode_work Marcelo Tosatti
2021-07-14 20:42 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace Marcelo Tosatti
2021-07-14 20:42 ` [patch 4/5] mm: vmstat: move need_update Marcelo Tosatti
2021-07-14 20:42 ` [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
  -- strict thread matches above, loose matches on Subject: below --
2021-07-09 17:37 [patch 0/5] optionally perform deferred actions on return to userspace Marcelo Tosatti
2021-07-09 17:37 ` [patch 1/5] sched: isolation: introduce quiesce_on_exit_to_usermode isolcpu flags Marcelo Tosatti