Linux-api Archive on lore.kernel.org
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>,
	Alex Belits <abelits@marvell.com>
Cc: Alex Belits <abelits@marvell.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Prasun Kapoor <pkapoor@marvell.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	"linux-mm@vger.kernel.org" <linux-mm@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Subject: Re: [PATCH 03/12] task_isolation: userspace hard isolation from kernel
Date: Tue, 28 Apr 2020 11:12:50 -0300
Message-ID: <20200428141250.GA28012@fuller.cnet>
In-Reply-To: <20200305183313.GA29033@lenoir>


I like the idea as well, especially the reporting infrastructure, and 
would like to see something like this integrated upstream.

On Thu, Mar 05, 2020 at 07:33:13PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 04, 2020 at 04:07:12PM +0000, Alex Belits wrote:
> > The existing nohz_full mode is designed as a "soft" isolation mode
> > that makes tradeoffs to minimize userspace interruptions while
> > still attempting to avoid overheads in the kernel entry/exit path,
> > to provide 100% kernel semantics, etc.
> > 
> > However, some applications require a "hard" commitment from the
> > kernel to avoid interruptions, in particular userspace device driver
> > style applications, such as high-speed networking code.
> > 
> > This change introduces a framework to allow applications
> > to elect to have the "hard" semantics as needed, specifying
> > prctl(PR_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE) to do so.
> > 
> > The kernel must be built with the new TASK_ISOLATION Kconfig flag
> > to enable this mode, and the kernel booted with an appropriate
> > "isolcpus=nohz,domain,CPULIST" boot argument to enable
> > nohz_full and isolcpus. The "task_isolation" state is then indicated
> > by setting a new task struct field, task_isolation_flag, to the
> > value passed by prctl(), and also setting a TIF_TASK_ISOLATION
> > bit in the thread_info flags. When the kernel is returning to
> > userspace from the prctl() call and sees TIF_TASK_ISOLATION set,
> > it calls the new task_isolation_start() routine to arrange for
> > the task to avoid being interrupted in the future.
> > 
> > With interrupts disabled, task_isolation_start() ensures that kernel
> > subsystems that might cause a future interrupt are quiesced. If it
> > doesn't succeed, it adjusts the syscall return value to indicate that
> > fact, and userspace can retry as desired. In addition to stopping
> > the scheduler tick, the code takes any actions that might avoid
> > a future interrupt to the core, such as a worker thread being
> > scheduled that could be quiesced now (e.g. the vmstat worker)
> > or a future IPI to the core to clean up some state that could be
> > cleaned up now (e.g. the mm lru per-cpu cache).
> > 
> > Once the task has returned to userspace after issuing the prctl(),
> > if it enters the kernel again via system call, page fault, or any
> > other exception or irq, the kernel will kill it with SIGKILL.
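A minimal userspace sketch of the enable-and-retry flow described in the changelog above. Several things here are assumptions. PR_TASK_ISOLATION and PR_TASK_ISOLATION_ENABLE are not in mainline <sys/prctl.h>, so the fallback values are placeholders for the ones in the patched uapi header. Treating EAGAIN as the "retry" error is a guess at how the adjusted return value is reported. enter_isolation_with_retry() and stub_attempt() are hypothetical helpers. The attempt is passed as a function pointer so the loop can be exercised with the stub on an unpatched kernel.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Placeholder values (assumption): the real constants come from the
 * patched <linux/prctl.h>; they are not part of mainline headers. */
#ifndef PR_TASK_ISOLATION
#define PR_TASK_ISOLATION 48
#endif
#ifndef PR_TASK_ISOLATION_ENABLE
#define PR_TASK_ISOLATION_ENABLE (1 << 0)
#endif

/* One real attempt: 0 on success, -1 with errno set on failure.
 * After success, any syscall other than exit/exit_group and the
 * isolation prctl() kills the task, so the caller should go straight
 * into its isolated loop. */
static int try_enter_isolation(void)
{
    return prctl(PR_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE, 0, 0, 0);
}

/* Hypothetical helper: retry while the kernel reports that it could
 * not quiesce this CPU yet (assumed to be EAGAIN). Any other errno is
 * returned to the caller as a hard failure. */
static int enter_isolation_with_retry(int (*attempt)(void), int max_attempts)
{
    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; i++) {
        if (attempt() == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;
    }
    errno = EAGAIN;
    return -1;
}

/* Stub for trying the loop without a patched kernel:
 * fails with EAGAIN twice, then succeeds. */
static int stub_calls;
static int stub_attempt(void)
{
    if (++stub_calls <= 2) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

On a patched kernel the real call would be enter_isolation_with_retry(try_enter_isolation, n), issued from a task already pinned to a nohz_full/isolcpus CPU.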

Killing the task on any kernel entry severely limits usage of the interface.

I suppose the reason for blocking system calls is to make sure
userspace does not initiate actions that might generate interruptions,
such as IPI flushes (memory unmaps or changes) or vmstat work items
(page dirtying). Or is there another reason for it?


+/* Only a few syscalls are valid once we are in task isolation mode. */
+static bool is_acceptable_syscall(int syscall)
+{
+       /* No need to incur an isolation signal if we are just exiting. */
+       if (syscall == __NR_exit || syscall == __NR_exit_group)
+               return true;
+       
+       /* Check to see if it's the prctl for isolation. */
+       if (syscall == __NR_prctl) {
+               unsigned long arg[SYSCALL_MAX_ARGS];
+       
+               syscall_get_arguments(current, current_pt_regs(), arg);
+               if (arg[0] == PR_TASK_ISOLATION)
+                       return true;
+       }
+ 
+       return false;
+}
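The same decision can be modeled in userspace to see exactly which calls survive. This is a sketch, not the kernel code: model_is_acceptable_syscall() is a hypothetical name, it receives the syscall number and first argument directly instead of fetching them with syscall_get_arguments(), it uses the libc SYS_* numbers from <sys/syscall.h>, and PR_TASK_ISOLATION is again a placeholder value.

```c
#include <stdbool.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Placeholder value (assumption); see the patched <linux/prctl.h>. */
#ifndef PR_TASK_ISOLATION
#define PR_TASK_ISOLATION 48
#endif

/* Userspace model of is_acceptable_syscall(): `nr` and `arg0` stand in
 * for the syscall number and arg[0] that the kernel reads from pt_regs. */
static bool model_is_acceptable_syscall(long nr, unsigned long arg0)
{
    /* Exiting never needs an isolation signal. */
    if (nr == SYS_exit || nr == SYS_exit_group)
        return true;

    /* The isolation prctl() itself is allowed, so isolation can be
     * turned off again. */
    if (nr == SYS_prctl && arg0 == PR_TASK_ISOLATION)
        return true;

    /* Everything else costs the task its isolation. */
    return false;
}
```

Everything else, including something as harmless as prctl(PR_SET_NAME, ...) or write(2), falls through to false.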


> > In addition to sending a signal, the code supports a kernel
> > command-line "task_isolation_debug" flag which causes a stack
> > backtrace to be generated whenever a task loses isolation.
> > 
> > To allow the state to be entered and exited, the syscall checking
> > test ignores the prctl(PR_TASK_ISOLATION) syscall so that we can
> > clear the bit again later, and ignores exit/exit_group to allow
> > exiting the task without a pointless signal being delivered.
> > 
> > The prctl() API allows for specifying a signal number to use instead
> > of the default SIGKILL, to allow for catching the notification
> > signal; for example, in a production environment, it might be
> > helpful to log information to the application logging mechanism
> > before exiting. Or, the signal handler might choose to reset the
> > program counter back to the code segment intended to be run isolated
> > via prctl() to continue execution.
> 
> Hi Alex,
> 
> I'm glad this patchset is being resurrected.
> Reading that changelog, I like the general idea and the direction.
> The diff is a bit scary, though, but I'll check the patches in detail
> in the upcoming days.
> 
> > 
> > In a number of cases we can tell on a remote cpu that we are
> > going to be interrupting the cpu, e.g. via an IPI or a TLB flush.
> > In that case we generate the diagnostic (and optional stack dump)
> > on the remote core to be able to deliver better diagnostics.
> > If the interrupt is not something caught by Linux (e.g. a
> > hypervisor interrupt) we can also request a reschedule IPI to
> > be sent to the remote core so it can be sure to generate a
> > signal to notify the process.
> 
> I'm wondering if it's wise to run that on a guest at all :-)
> Or we should consider any guest exit to the host as a
> disturbance; we would then need some sort of paravirt
> driver to notify that, etc. That doesn't sound appealing.
> 
> Thanks.
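A sketch of the non-SIGKILL option from the changelog: request a catchable signal and log before exiting. The flag layout (an enable bit, a "user signal" bit and a 7-bit signal number at bit 8) and the names ISOL_*, isolation_flags_for_signal() and install_isolation_handler() are hypothetical; the real encoding lives in the patched <linux/prctl.h>. The handler only calls async-signal-safe functions (write(2), _exit(2)).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag layout, for illustration only; take the real bits
 * from the patched <linux/prctl.h>. */
#define ISOL_ENABLE    (1UL << 0)   /* enter isolation */
#define ISOL_USERSIG   (1UL << 1)   /* use the signal below, not SIGKILL */
#define ISOL_SIG_SHIFT 8
#define ISOL_SIG_MASK  0x7fUL

/* Build the flags argument: enable isolation, notify with `sig`. */
static unsigned long isolation_flags_for_signal(int sig)
{
    assert(sig > 0 && (unsigned long)sig <= ISOL_SIG_MASK);
    return ISOL_ENABLE | ISOL_USERSIG |
           (((unsigned long)sig & ISOL_SIG_MASK) << ISOL_SIG_SHIFT);
}

/* Recover the signal number from a flags value. */
static int isolation_signal_from_flags(unsigned long flags)
{
    return (int)((flags >> ISOL_SIG_SHIFT) & ISOL_SIG_MASK);
}

/* Log to stderr and exit when isolation is lost. Only
 * async-signal-safe calls are made inside the handler. */
static void on_isolation_lost(int sig)
{
    static const char msg[] = "task isolation lost, exiting\n";
    (void)sig;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* Nothing useful to do if logging fails. */
    }
    _exit(1);
}

/* Install on_isolation_lost() for `sig`; returns sigaction()'s result. */
static int install_isolation_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_isolation_lost;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

Once the handler is installed, the value from isolation_flags_for_signal(SIGUSR1) would be passed as the second prctl(PR_TASK_ISOLATION, ...) argument. Resetting the program counter from the handler, the other option mentioned, would need an SA_SIGINFO handler that edits the ucontext and is not shown here.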



Thread overview: 71+ messages
2020-03-04 16:01 [PATCH 00/12] "Task_isolation" mode Alex Belits
2020-03-04 16:03 ` [PATCH 01/12] task_isolation: vmstat: add quiet_vmstat_sync function Alex Belits
2020-03-04 16:04 ` [PATCH 02/12] task_isolation: vmstat: add vmstat_idle function Alex Belits
2020-03-04 16:07 ` [PATCH 03/12] task_isolation: userspace hard isolation from kernel Alex Belits
2020-03-05 18:33   ` Frederic Weisbecker
2020-03-08  5:32     ` [EXT] " Alex Belits
2020-04-28 14:12     ` Marcelo Tosatti [this message]
2020-03-06 15:26   ` Frederic Weisbecker
2020-03-08  6:06     ` [EXT] " Alex Belits
2020-03-06 16:00   ` Frederic Weisbecker
2020-03-08  7:16     ` [EXT] " Alex Belits
2020-03-04 16:08 ` [PATCH 04/12] task_isolation: Add task isolation hooks to arch-independent code Alex Belits
2020-03-04 16:09 ` [PATCH 05/12] task_isolation: arch/x86: enable task isolation functionality Alex Belits
2020-03-04 16:10 ` [PATCH 06/12] task_isolation: arch/arm64: " Alex Belits
2020-03-04 16:31   ` Mark Rutland
2020-03-08  4:48     ` [EXT] " Alex Belits
2020-03-04 16:11 ` [PATCH 07/12] task_isolation: arch/arm: " Alex Belits
2020-03-04 16:12 ` [PATCH 08/12] task_isolation: don't interrupt CPUs with tick_nohz_full_kick_cpu() Alex Belits
2020-03-06 16:03   ` Frederic Weisbecker
2020-03-08  7:28     ` [EXT] " Alex Belits
2020-03-09  2:38       ` Frederic Weisbecker
2020-03-04 16:13 ` [PATCH 09/12] task_isolation: net: don't flush backlog on CPUs running isolated tasks Alex Belits
2020-03-04 16:14 ` [PATCH 10/12] task_isolation: ringbuffer: don't interrupt CPUs running isolated tasks on buffer resize Alex Belits
2020-03-04 16:15 ` [PATCH 11/12] task_isolation: kick_all_cpus_sync: don't kick isolated cpus Alex Belits
2020-03-06 15:34   ` Frederic Weisbecker
2020-03-08  6:48     ` [EXT] " Alex Belits
2020-03-09  2:28       ` Frederic Weisbecker
2020-03-04 16:16 ` [PATCH 12/12] task_isolation: CONFIG_TASK_ISOLATION prevents distribution of jobs to non-housekeeping CPUs Alex Belits
2020-03-08  3:42 ` [PATCH v2 00/12] "Task_isolation" mode Alex Belits
2020-03-08  3:44   ` [PATCH v2 01/12] task_isolation: vmstat: add quiet_vmstat_sync function Alex Belits
2020-03-08  3:46   ` [PATCH v2 02/12] task_isolation: vmstat: add vmstat_idle function Alex Belits
2020-03-08  3:47   ` [PATCH v2 03/12] task_isolation: userspace hard isolation from kernel Alex Belits
     [not found]     ` <20200307214254.7a8f6c22@hermes.lan>
2020-03-08  7:33       ` [EXT] " Alex Belits
2020-03-27  8:42     ` Marta Rybczynska
2020-04-06  4:31     ` Kevyn-Alexandre Paré
2020-04-06  4:43     ` Kevyn-Alexandre Paré
2020-03-08  3:48   ` [PATCH v2 04/12] task_isolation: Add task isolation hooks to arch-independent code Alex Belits
2020-03-08  3:49   ` [PATCH v2 05/12] task_isolation: arch/x86: enable task isolation functionality Alex Belits
2020-03-08  3:50   ` [PATCH v2 06/12] task_isolation: arch/arm64: " Alex Belits
2020-03-09 16:59     ` Mark Rutland
2020-03-08  3:52   ` [PATCH v2 07/12] task_isolation: arch/arm: " Alex Belits
2020-03-08  3:53   ` [PATCH v2 08/12] task_isolation: don't interrupt CPUs with tick_nohz_full_kick_cpu() Alex Belits
2020-03-08  3:54   ` [PATCH v2 09/12] task_isolation: net: don't flush backlog on CPUs running isolated tasks Alex Belits
2020-03-08  3:55   ` [PATCH v2 10/12] task_isolation: ringbuffer: don't interrupt CPUs running isolated tasks on buffer resize Alex Belits
2020-04-06  4:27     ` Kevyn-Alexandre Paré
2020-03-08  3:56   ` [PATCH v2 11/12] task_isolation: kick_all_cpus_sync: don't kick isolated cpus Alex Belits
2020-03-08  3:57   ` [PATCH v2 12/12] task_isolation: CONFIG_TASK_ISOLATION prevents distribution of jobs to non-housekeeping CPUs Alex Belits
2020-04-09 15:09   ` [PATCH v3 00/13] "Task_isolation" mode Alex Belits
2020-04-09 15:15     ` [PATCH 01/13] task_isolation: vmstat: add quiet_vmstat_sync function Alex Belits
2020-04-09 15:16     ` [PATCH 02/13] task_isolation: vmstat: add vmstat_idle function Alex Belits
2020-04-09 15:17     ` [PATCH v3 03/13] task_isolation: add instruction synchronization memory barrier Alex Belits
2020-04-15 12:44       ` Mark Rutland
2020-04-19  5:02         ` [EXT] " Alex Belits
2020-04-20 12:23           ` Will Deacon
2020-04-20 12:36             ` Mark Rutland
2020-04-20 13:55               ` Will Deacon
2020-04-21  7:41                 ` Will Deacon
2020-04-20 12:45           ` Mark Rutland
2020-04-09 15:20     ` [PATCH v3 04/13] task_isolation: userspace hard isolation from kernel Alex Belits
2020-04-09 18:00       ` Andy Lutomirski
2020-04-19  5:07         ` Alex Belits
2020-04-09 15:21     ` [PATCH 05/13] task_isolation: Add task isolation hooks to arch-independent code Alex Belits
2020-04-09 15:22     ` [PATCH 06/13] task_isolation: arch/x86: enable task isolation functionality Alex Belits
2020-04-09 15:23     ` [PATCH v3 07/13] task_isolation: arch/arm64: " Alex Belits
2020-04-22 12:08       ` Catalin Marinas
2020-04-09 15:24     ` [PATCH v3 08/13] task_isolation: arch/arm: " Alex Belits
2020-04-09 15:25     ` [PATCH v3 09/13] task_isolation: don't interrupt CPUs with tick_nohz_full_kick_cpu() Alex Belits
2020-04-09 15:26     ` [PATCH v3 10/13] task_isolation: net: don't flush backlog on CPUs running isolated tasks Alex Belits
2020-04-09 15:27     ` [PATCH v3 11/13] task_isolation: ringbuffer: don't interrupt CPUs running isolated tasks on buffer resize Alex Belits
2020-04-09 15:27     ` [PATCH v3 12/13] task_isolation: kick_all_cpus_sync: don't kick isolated cpus Alex Belits
2020-04-09 15:28     ` [PATCH v3 13/13] task_isolation: CONFIG_TASK_ISOLATION prevents distribution of jobs to non-housekeeping CPUs Alex Belits
