linux-kernel.vger.kernel.org archive mirror
* [patch v8 02/10] add prctl task isolation prctl docs and samples
@ 2021-12-08 16:09 Marcelo Tosatti
  2022-01-06 23:49 ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2021-12-08 16:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nitesh Lal, Nicolas Saenz Julienne, Frederic Weisbecker,
	Christoph Lameter, Juri Lelli, Peter Zijlstra, Alex Belits,
	Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira,
	Marcelo Tosatti

Add documentation and userspace sample code for prctl
task isolation interface.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
v8:
- Document the possibility of adding ISOL_F_QUIESCE_ONE, to configure
individual features (Frederic Weisbecker).
- Fix PR_ISOL_CFG_GET typo in documentation                      (Frederic Weisbecker).
- Rebased against linux-2.6.git.

v7:
- No changes
v6:
 - Update docs and samples regarding oneshot mode (Frederic Weisbecker).
 - Update docs and samples regarding more extensibility of
   CFG_SET of ISOL_F_QUIESCE                   (Frederic Weisbecker).

v5:
 - Fix documentation typos		      (Frederic Weisbecker).
 - Fix oneshot example comment

v4:
 - Switch to structures for parameters when possible
   (which are more extensible).
 - Switch to CFG_{S,G}ET naming and drop the
   "internal configuration" prctls            (Frederic Weisbecker).
 - Add summary of terms to documentation      (Frederic Weisbecker).
 - Examples for compute and one-shot modes    (Thomas G/Christoph L).

v3:
 - Split in smaller patches              (Nitesh Lal).
 - Misc cleanups                         (Nitesh Lal).
 - Clarify nohz_full is not a dependency (Nicolas Saenz).
 - Fix incorrect values for prctl definitions (kernel robot).
 - Save configured state, so applications
   can activate externally configured
   task isolation parameters.
 - Remove "system default" notion (chisol should
   make it obsolete).
 - Update documentation: add new section with explanation
   about configuration/activation and code example.
 - Update samples.

v2:

- Finer-grained control of quiescing (Frederic Weisbecker / Nicolas Saenz).
- Avoid potential regressions by allowing applications
  to use ISOL_F_QUIESCE_DEFMASK (whose default value
  is configurable in /sys/).         (Nitesh Lal / Nicolas Saenz).

 Documentation/userspace-api/task_isolation.rst |  370 +++++++++++++++++++++++++
 samples/Kconfig                                |    7 
 samples/Makefile                               |    1 
 samples/task_isolation/Makefile                |   11 
 samples/task_isolation/task_isol.c             |   92 ++++++
 samples/task_isolation/task_isol.h             |    9 
 samples/task_isolation/task_isol_computation.c |   89 ++++++
 samples/task_isolation/task_isol_oneshot.c     |  104 +++++++
 samples/task_isolation/task_isol_userloop.c    |   54 +++
 9 files changed, 737 insertions(+)

Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
===================================================================
--- /dev/null
+++ linux-2.6/Documentation/userspace-api/task_isolation.rst
@@ -0,0 +1,375 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Task isolation prctl interface
+===============================
+
+Certain types of applications benefit from running uninterrupted by
+background OS activities. Realtime systems and high-bandwidth networking
+applications with user-space drivers can fall into this category.
+
+To create an OS-noise-free environment for the application, this
+interface allows userspace to inform the kernel of the start and
+end of the latency-sensitive section of the application (with
+configurable system behaviour for that section).
+
+Note: the prctl interface is independent of nohz_full=.
+
+The prctl options are:
+
+
+        - PR_ISOL_FEAT_GET: Retrieve supported features.
+        - PR_ISOL_CFG_GET: Retrieve task isolation configuration.
+        - PR_ISOL_CFG_SET: Set task isolation configuration.
+        - PR_ISOL_ACTIVATE_GET: Retrieve task isolation activation state.
+        - PR_ISOL_ACTIVATE_SET: Set task isolation activation state.
+
+Summary of terms:
+
+
+- feature:
+
+        A distinct attribute or aspect of task isolation. Examples of
+        features could be logging, new operating modes (e.g. syscalls disallowed),
+        userspace notifications, etc. The only feature currently available is quiescing.
+
+- configuration:
+
+        A specific choice, from a given set
+        of possible choices, that dictates how the particular feature
+        in question should behave.
+
+- activation state:
+
+        The activation state (whether active/inactive) of the task
+        isolation features (features must be configured before
+        being activated).
+
+Inheritance of the isolation parameters and state across
+fork(2) and clone(2) can be queried and changed via
+PR_ISOL_CFG_GET/PR_ISOL_CFG_SET (I_CFG_INHERIT).
+
+
+At a high level, task isolation is divided into two steps:
+
+1. Configuration.
+2. Activation.
+
+Section "Userspace support" describes how to use
+task isolation.
+
+In terms of the interface, the sequence of steps to activate
+task isolation is:
+
+1. Retrieve supported task isolation features (PR_ISOL_FEAT_GET).
+2. Configure task isolation features (PR_ISOL_CFG_GET/PR_ISOL_CFG_SET).
+3. Activate or deactivate task isolation features (PR_ISOL_ACTIVATE_GET/PR_ISOL_ACTIVATE_SET).
+
+This interface is based on ideas and code from the
+task isolation patchset from Alex Belits:
+https://lwn.net/Articles/816298/
+
+Note: if the need arises to configure an individual quiesce feature
+with its own extensible structure, please add ISOL_F_QUIESCE_ONE
+to PR_ISOL_CFG_GET/PR_ISOL_CFG_SET (ISOL_F_QUIESCE currently operates
+on multiple features per syscall).
+
+--------------------
+Feature description
+--------------------
+
+        - ``ISOL_F_QUIESCE``
+
+        This feature allows quiescing selected kernel activities on
+        return from system calls.
+
+---------------------
+Interface description
+---------------------
+
+**PR_ISOL_FEAT_GET**:
+
+        Retrieve the supported features, or the capabilities
+        of a given feature. The general format is::
+
+                prctl(PR_ISOL_FEAT_GET, feat, arg3, arg4, arg5);
+
+        The 'feat' argument specifies whether to return
+        supported features (if zero), or feature capabilities
+        (if not zero). Possible values for 'feat' are:
+
+
+        - ``0``:
+               Return the bitmask of supported features, in the location
+               pointed to by ``(__u64 *)arg3``. The buffer should allow space
+               for 8 bytes.
+
+        - ``ISOL_F_QUIESCE``:
+
+               Return a structure containing which kernel
+               activities are supported for quiescing, in the location
+               pointed to by ``(struct task_isol_quiesce_extensions *)arg3``::
+
+                        struct task_isol_quiesce_extensions {
+                                __u64 flags;
+                                __u64 supported_quiesce_bits;
+                                __u64 pad[6];
+                        };
+
+               Where:
+
+               *flags*: Additional flags (should be zero).
+
+               *supported_quiesce_bits*: Bitmask indicating
+                which features are supported for quiescing.
+
+               *pad*: Additional space for future enhancements.
+
+
+        Features and their capabilities are defined in
+        include/uapi/linux/task_isolation.h.
+
+**PR_ISOL_CFG_GET**:
+
+        Retrieve task isolation configuration.
+        The general format is::
+
+                prctl(PR_ISOL_CFG_GET, what, arg3, arg4, arg5);
+
+        The 'what' argument specifies what to retrieve. Possible values are:
+
+        - ``I_CFG_FEAT``:
+
+                Return configuration of task isolation features. The 'arg3' argument specifies
+                whether to return configured features (if zero), or individual
+                feature configuration (if not zero), as follows.
+
+                - ``0``:
+
+                        Return the bitmask of configured features, in the location
+                        pointed to by ``(__u64 *)arg4``. The buffer should allow space
+                        for 8 bytes.
+
+                - ``ISOL_F_QUIESCE``:
+
+                        If arg4 is QUIESCE_CONTROL, return the control structure for
+                        quiescing of background kernel activities, in the location
+                        pointed to by ``(struct task_isol_quiesce_control *)arg5``::
+
+                         struct task_isol_quiesce_control {
+                                __u64 flags;
+                                __u64 quiesce_mask;
+                                __u64 quiesce_oneshot_mask;
+                                __u64 pad[5];
+                         };
+
+                        See PR_ISOL_CFG_SET description for meaning of fields.
+
+        - ``I_CFG_INHERIT``:
+
+                Retrieve inheritance configuration across fork/clone.
+
+                Return the structure which configures inheritance
+                across fork/clone, in the location pointed to
+                by ``(struct task_isol_inherit_control *)arg4``::
+
+                        struct task_isol_inherit_control {
+                                __u8    inherit_mask;
+                                __u8    pad[7];
+                        };
+
+                See PR_ISOL_CFG_SET description for meaning of fields.
+
+**PR_ISOL_CFG_SET**:
+
+        Set task isolation configuration.
+        The general format is::
+
+                prctl(PR_ISOL_CFG_SET, what, arg3, arg4, arg5);
+
+        The 'what' argument specifies what to configure. Possible values are:
+
+        - ``I_CFG_FEAT``:
+
+                Set configuration of task isolation features. 'arg3' specifies
+                the feature. Possible values are:
+
+                - ``ISOL_F_QUIESCE``:
+
+                        If arg4 is QUIESCE_CONTROL, set the control structure
+                        for quiescing of background kernel activities, from
+                        the location pointed to by ``(struct task_isol_quiesce_control *)arg5``::
+
+                         struct task_isol_quiesce_control {
+                                __u64 flags;
+                                __u64 quiesce_mask;
+                                __u64 quiesce_oneshot_mask;
+                                __u64 pad[5];
+                         };
+
+                        Where:
+
+                        *flags*: Additional flags (should be zero).
+
+                        *quiesce_mask*: A bitmask containing which kernel
+                        activities to quiesce.
+
+                        *quiesce_oneshot_mask*: A bitmask indicating which kernel
+                        activities should behave in oneshot mode, that is, quiescing
+                        will happen on return from prctl(PR_ISOL_ACTIVATE_SET), but not
+                        on return from subsequent system calls. The corresponding bit(s)
+                        must also be set in quiesce_mask.
+
+                        *pad*: Additional space for future enhancements.
+
+                        For quiesce_mask (and quiesce_oneshot_mask), the possible bits are:
+
+                        - ``ISOL_F_QUIESCE_VMSTATS``
+
+                        VM statistics are maintained in per-CPU counters to
+                        improve performance. When a CPU modifies a VM statistic,
+                        this modification is kept in the per-CPU counter.
+                        Certain activities require a global count, which
+                        involves requesting each CPU to flush its local counters
+                        to the global VM counters.
+
+                        This flush is implemented via a workqueue item, which
+                        might be queued on isolated CPUs.
+
+                        To avoid this interruption, task isolation can be
+                        configured to synchronize the per-CPU counters to the
+                        global counters upon return from system calls, so the
+                        flush does not have to interrupt the isolated CPU.
+
+        - ``I_CFG_INHERIT``:
+                Set the inheritance configuration applied when a new task
+                is created via fork(2) or clone(2).
+
+                The ``arg4`` argument is a pointer to::
+
+                        struct task_isol_inherit_control {
+                                __u8    inherit_mask;
+                                __u8    pad[7];
+                        };
+
+                inherit_mask is a bitmask that specifies which part
+                of task isolation should be inherited:
+
+                - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
+                  This is the state written via prctl(PR_ISOL_CFG_SET, ...).
+
+                - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
+                  (requires ISOL_INHERIT_CONF to be set). The new task
+                  should behave, after fork/clone, in the same manner
+                  as the parent task after it executed::
+
+                        prctl(PR_ISOL_ACTIVATE_SET, &mask, ...);
+
+**PR_ISOL_ACTIVATE_GET**:
+
+        Retrieve task isolation activation state.
+
+        The general format is::
+
+                prctl(PR_ISOL_ACTIVATE_GET, pmask, arg3, arg4, arg5);
+
+        'pmask' specifies the location of a feature mask, where
+        the current active mask will be copied. See PR_ISOL_ACTIVATE_SET
+        for description of individual bits.
+
+
+**PR_ISOL_ACTIVATE_SET**:
+
+        Set task isolation activation state (activates/deactivates
+        task isolation).
+
+        The general format is::
+
+                prctl(PR_ISOL_ACTIVATE_SET, pmask, arg3, arg4, arg5);
+
+
+        The 'pmask' argument specifies the location of an 8 byte mask
+        containing which features should be activated. Features whose
+        bits are cleared will be deactivated. The possible
+        bits for this mask are:
+
+                - ``ISOL_F_QUIESCE``:
+
+                Activate quiescing of background kernel activities.
+                Quiescing happens on return to userspace from this
+                system call, and on return from subsequent
+                system calls (except for activities whose bits were also
+                set in quiesce_oneshot_mask at PR_ISOL_CFG_SET time).
+
+        Quiescing can be adjusted (while active) by
+        prctl(PR_ISOL_ACTIVATE_SET, &new_mask, ...).
+
+
+==================
+Userspace support
+==================
+
+Task isolation is divided into two main steps: configuration and activation.
+
+Each step can be performed by an external tool or by the latency-sensitive
+application itself. util-linux contains the "chisol" tool for this
+purpose.
+
+This results in three combinations:
+
+1. Both configuration and activation are performed by the
+   latency-sensitive application.
+   This allows fine-grained control of which task isolation
+   features are enabled and when (see the Examples section below).
+
+2. Only activation is performed by the latency-sensitive application
+   (configuration is performed by chisol).
+   This allows the admin/user to control task isolation parameters,
+   and applications have to be modified only once.
+
+3. Configuration and activation are performed by an external tool.
+   This allows unmodified applications to take advantage of
+   task isolation. Activation is performed by the "-a" option
+   of chisol.
+
+========
+Examples
+========
+
+The ``samples/task_isolation/`` directory contains 3 examples:
+
+* task_isol_userloop.c:
+
+        Example of a program that busy-loops in userspace with task isolation active.
+
+* task_isol_computation.c:
+
+        Example of a program that enters task isolated mode,
+        performs some computation, exits task
+        isolated mode, and writes the results to disk.
+
+* task_isol_oneshot.c:
+
+        Example of a program that enables one-shot
+        mode for quiescing, enters a processing loop, then upon an external
+        event performs a number of syscalls to handle that event.
+
+This is a snippet of code to activate task isolation if
+it has been previously configured (by chisol for example)::
+
+        #include <stdio.h>
+        #include <sys/prctl.h>
+        #include <linux/types.h>
+
+        #ifdef PR_ISOL_CFG_GET
+        unsigned long long fmask;
+        int ret;
+
+        ret = prctl(PR_ISOL_CFG_GET, I_CFG_FEAT, 0, &fmask, 0);
+        if (ret != -1 && fmask != 0) {
+                ret = prctl(PR_ISOL_ACTIVATE_SET, &fmask, 0, 0, 0);
+                if (ret == -1) {
+                        perror("prctl PR_ISOL_ACTIVATE_SET");
+                        return ret;
+                }
+        }
+        #endif
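The rules spelled out in the documentation above for the configuration structures (flags must be zero, every bit of quiesce_oneshot_mask must also be set in quiesce_mask, and ISOL_INHERIT_ACTIVE requires ISOL_INHERIT_CONF) can be checked in userspace before calling PR_ISOL_CFG_SET. What follows is a minimal sketch, not part of the patch: it mirrors the documented layouts with <stdint.h> types so it builds without the patched uapi headers, the EXAMPLE_* bit values are placeholders rather than the real definitions, and the helper names are hypothetical. It also conservatively requires the reserved padding and any undocumented inherit bits to be zero, which the text does not state.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented struct task_isol_quiesce_control. */
struct quiesce_control_mirror {
	uint64_t flags;
	uint64_t quiesce_mask;
	uint64_t quiesce_oneshot_mask;
	uint64_t pad[5];
};

/* Local mirror of the documented struct task_isol_inherit_control. */
struct inherit_control_mirror {
	uint8_t inherit_mask;
	uint8_t pad[7];
};

/* Both layouts are fixed-size; the pad fields leave room for extensions. */
_Static_assert(sizeof(struct quiesce_control_mirror) == 64, "64 bytes");
_Static_assert(sizeof(struct inherit_control_mirror) == 8, "8 bytes");

/* Placeholder bit values, NOT the real uapi definitions. */
#define EXAMPLE_QUIESCE_VMSTATS	(1ULL << 0)
#define EXAMPLE_INHERIT_CONF	(1U << 0)
#define EXAMPLE_INHERIT_ACTIVE	(1U << 1)

/*
 * Hypothetical helper: 0 if the quiesce control follows the documented
 * rules, -EINVAL otherwise. supported_bits is the supported_quiesce_bits
 * value reported by PR_ISOL_FEAT_GET(ISOL_F_QUIESCE).
 */
static int quiesce_control_check(const struct quiesce_control_mirror *c,
				 uint64_t supported_bits)
{
	size_t i;

	if (c->flags != 0)
		return -EINVAL;
	if (c->quiesce_mask & ~supported_bits)
		return -EINVAL;
	/* Every oneshot bit must also be set in quiesce_mask. */
	if (c->quiesce_oneshot_mask & ~c->quiesce_mask)
		return -EINVAL;
	/* Assumption: keep the reserved padding zeroed. */
	for (i = 0; i < sizeof(c->pad) / sizeof(c->pad[0]); i++)
		if (c->pad[i] != 0)
			return -EINVAL;
	return 0;
}

/* Hypothetical helper for inherit_mask. */
static int inherit_mask_check(uint8_t mask)
{
	/* Assumption: reject bits other than the two documented ones. */
	if (mask & ~(EXAMPLE_INHERIT_CONF | EXAMPLE_INHERIT_ACTIVE))
		return -EINVAL;
	/* ISOL_INHERIT_ACTIVE requires ISOL_INHERIT_CONF. */
	if ((mask & EXAMPLE_INHERIT_ACTIVE) && !(mask & EXAMPLE_INHERIT_CONF))
		return -EINVAL;
	return 0;
}
```

With the real headers, the mirrors and placeholders would be replaced by the uapi structures and bit definitions.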
+
Index: linux-2.6/samples/task_isolation/task_isol.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include <errno.h>
+#include "task_isol.h"
+
+#ifdef PR_ISOL_FEAT_GET
+int task_isol_setup(int oneshot)
+{
+	int ret;
+	int errnosv;
+	unsigned long long fmask;
+	struct task_isol_quiesce_extensions qext;
+	struct task_isol_quiesce_control qctrl;
+
+	/* Retrieve supported task isolation features */
+	ret = prctl(PR_ISOL_FEAT_GET, 0, &fmask, 0, 0);
+	if (ret == -1) {
+		perror("prctl PR_ISOL_FEAT_GET");
+		return ret;
+	}
+	printf("supported features bitmask: 0x%llx\n", fmask);
+
+	/* Retrieve supported ISOL_F_QUIESCE bits */
+	ret = prctl(PR_ISOL_FEAT_GET, ISOL_F_QUIESCE, &qext, 0, 0);
+	if (ret == -1) {
+		perror("prctl PR_ISOL_FEAT_GET (ISOL_F_QUIESCE)");
+		return ret;
+	}
+	printf("supported ISOL_F_QUIESCE bits: 0x%llx\n",
+		(unsigned long long)qext.supported_quiesce_bits);
+
+	fmask = 0;
+	ret = prctl(PR_ISOL_CFG_GET, I_CFG_FEAT, 0, &fmask, 0);
+	errnosv = errno;
+	if (ret != -1 && fmask != 0) {
+		printf("Task isolation parameters already configured!\n");
+		return (int)fmask;
+	}
+	if (ret == -1 && errnosv != ENODATA) {
+		perror("prctl PR_ISOL_CFG_GET");
+		return ret;
+	}
+	memset(&qctrl, 0, sizeof(struct task_isol_quiesce_control));
+	qctrl.quiesce_mask = ISOL_F_QUIESCE_VMSTATS;
+	if (oneshot)
+		qctrl.quiesce_oneshot_mask = ISOL_F_QUIESCE_VMSTATS;
+
+	ret = prctl(PR_ISOL_CFG_SET, I_CFG_FEAT, ISOL_F_QUIESCE,
+		    QUIESCE_CONTROL, &qctrl);
+	if (ret == -1) {
+		perror("prctl PR_ISOL_CFG_SET");
+		return ret;
+	}
+	return ISOL_F_QUIESCE;
+}
+
+int task_isol_activate_set(unsigned long long mask)
+{
+	int ret;
+
+	ret = prctl(PR_ISOL_ACTIVATE_SET, &mask, 0, 0, 0);
+	if (ret == -1) {
+		perror("prctl PR_ISOL_ACTIVATE_SET");
+		return -1;
+	}
+
+	return 0;
+}
+
+#else
+
+int task_isol_setup(int oneshot)
+{
+	return 0;
+}
+
+int task_isol_activate_set(unsigned long long mask)
+{
+	return 0;
+}
+#endif
+
+
Index: linux-2.6/samples/task_isolation/task_isol.h
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TASK_ISOL_H
+#define __TASK_ISOL_H
+
+int task_isol_setup(int oneshot);
+
+int task_isol_activate_set(unsigned long long mask);
+
+#endif /* __TASK_ISOL_H */
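task_isol_setup() above distinguishes three outcomes of PR_ISOL_CFG_GET(I_CFG_FEAT): parameters already configured (for example by chisol), nothing configured yet (the call succeeds with a zero mask, or fails with ENODATA), and a genuine error. That decision can be isolated in a small pure function; the sketch below is not part of the samples, and the enum and function names are hypothetical.

```c
#include <errno.h>

/* Outcomes of PR_ISOL_CFG_GET(I_CFG_FEAT, 0, &fmask), as interpreted
 * by task_isol_setup(). */
enum cfg_state {
	CFG_PRESENT,	/* already configured, e.g. by chisol: activate it */
	CFG_ABSENT,	/* nothing configured: configure it ourselves */
	CFG_ERROR,	/* unexpected failure: give up */
};

/*
 * Hypothetical helper. ret and saved_errno are the prctl() return value
 * and the errno value saved immediately after the call; fmask is the
 * feature mask the call wrote (only meaningful when ret != -1).
 */
static enum cfg_state classify_cfg_get(int ret, int saved_errno,
				       unsigned long long fmask)
{
	if (ret != -1)
		return fmask != 0 ? CFG_PRESENT : CFG_ABSENT;
	return saved_errno == ENODATA ? CFG_ABSENT : CFG_ERROR;
}
```

Passing errno in as an argument mirrors the errnosv variable in task_isol.c: errno has to be saved right after prctl(), before any other library call can overwrite it.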
Index: linux-2.6/samples/task_isolation/task_isol_userloop.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol_userloop.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include "task_isol.h"
+
+int main(void)
+{
+	int ret;
+	void *buf = malloc(4096);
+	unsigned long mask;
+
+	memset(buf, 1, 4096);
+	ret = mlock(buf, 4096);
+	if (ret) {
+		perror("mlock");
+		return EXIT_FAILURE;
+	}
+
+	ret = task_isol_setup(0);
+	if (ret == -1)
+		return EXIT_FAILURE;
+
+	mask = ret;
+	/* enable quiescing on system call return */
+	ret = task_isol_activate_set(mask);
+	if (ret)
+		return EXIT_FAILURE;
+
+#define NR_LOOPS 999999999
+#define NR_PRINT 100000000
+	/* busy loop */
+	while (ret < NR_LOOPS)  {
+		memset(buf, 0, 4096);
+		ret = ret+1;
+		if (!(ret % NR_PRINT))
+			printf("loops=%d of %d\n", ret, NR_LOOPS);
+	}
+
+
+	ret = task_isol_activate_set(mask & ~ISOL_F_QUIESCE);
+	if (ret)
+		return EXIT_FAILURE;
+
+	return EXIT_SUCCESS;
+}
+
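task_isol_userloop.c pairs task_isol_activate_set(mask) with task_isol_activate_set(mask & ~ISOL_F_QUIESCE) around the busy loop, and reuses ret as the loop counter (which only works because the activation call returned 0). Below is a sketch of the same pairing as a reusable wrapper that deactivates quiescing even when the isolated work fails. run_isolated() is hypothetical, and the quiesce bit is passed in because the real ISOL_F_QUIESCE value lives in the uapi header, which is not shown here.

```c
/* Same signature as task_isol_activate_set() in task_isol.h. */
typedef int (*activate_fn)(unsigned long long mask);

/*
 * Hypothetical helper: activate the features in 'mask', run 'work',
 * then clear 'quiesce_bit' again, even if 'work' failed.
 * Returns 0 on success, -1 if any step failed.
 */
static int run_isolated(activate_fn activate, unsigned long long mask,
			unsigned long long quiesce_bit,
			int (*work)(void *), void *arg)
{
	int work_ret;

	if (activate(mask))
		return -1;

	/* The isolated section: ideally no system calls in here. */
	work_ret = work(arg);

	/* Stop quiescing before the caller issues further syscalls. */
	if (activate(mask & ~quiesce_bit))
		return -1;

	return work_ret ? -1 : 0;
}
```

In the sample this could be called as run_isolated(task_isol_activate_set, mask, ISOL_F_QUIESCE, busy_loop, buf), where busy_loop is a hypothetical function holding the loop body with its own counter.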
Index: linux-2.6/samples/Kconfig
===================================================================
--- linux-2.6.orig/samples/Kconfig
+++ linux-2.6/samples/Kconfig
@@ -241,6 +241,13 @@ config SAMPLE_WATCH_QUEUE
 	  Build example userspace program to use the new mount_notify(),
 	  sb_notify() syscalls and the KEYCTL_WATCH_KEY keyctl() function.
 
+config SAMPLE_TASK_ISOLATION
+	bool "task isolation sample"
+	depends on CC_CAN_LINK && HEADERS_INSTALL
+	help
+	  Build example userspace program to use prctl task isolation
+	  interface.
+
 endif # SAMPLES
 
 config HAVE_SAMPLE_FTRACE_DIRECT
Index: linux-2.6/samples/Makefile
===================================================================
--- linux-2.6.orig/samples/Makefile
+++ linux-2.6/samples/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI)		+= mei/
 subdir-$(CONFIG_SAMPLE_WATCHDOG)	+= watchdog
 subdir-$(CONFIG_SAMPLE_WATCH_QUEUE)	+= watch_queue
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST)	+= kmemleak/
+subdir-$(CONFIG_SAMPLE_TASK_ISOLATION)	+= task_isolation
Index: linux-2.6/samples/task_isolation/Makefile
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+userprogs-always-y += task_isol_userloop task_isol_computation task_isol_oneshot
+task_isol_userloop-objs := task_isol.o task_isol_userloop.o
+task_isol_computation-objs := task_isol.o task_isol_computation.o
+task_isol_oneshot-objs := task_isol.o task_isol_oneshot.o
+
+userccflags += -I usr/include
+
+
+#$(CC) $^ -o $@
Index: linux-2.6/samples/task_isolation/task_isol_computation.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol_computation.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Example of task isolation prctl interface with a loop:
+ *
+ *	do {
+ *		enable quiescing of kernel activities
+ *		perform computation
+ *		disable quiescing of kernel activities
+ *		write computation results to disk
+ *	} while (condition);
+ *
+ */
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include "task_isol.h"
+
+int main(void)
+{
+	int ret, fd, write_loops;
+	void *buf = malloc(4096);
+	unsigned long mask;
+
+	fd = open("/tmp/comp_output.data", O_RDWR | O_CREAT, 0644);
+	if (fd == -1) {
+		perror("open");
+		return EXIT_FAILURE;
+	}
+
+	memset(buf, 1, 4096);
+	ret = mlock(buf, 4096);
+	if (ret) {
+		perror("mlock");
+		return EXIT_FAILURE;
+	}
+
+	ret = task_isol_setup(0);
+	if (ret == -1)
+		return EXIT_FAILURE;
+
+	mask = ret;
+
+	write_loops = 0;
+	do {
+#define NR_LOOPS 999999999
+#define NR_PRINT 100000000
+		/* enable quiescing on system call return */
+		ret = task_isol_activate_set(mask);
+		if (ret)
+			return EXIT_FAILURE;
+
+		/* busy loop */
+		while (ret < NR_LOOPS)  {
+			memset(buf, 0xf, 4096);
+			ret = ret+1;
+			if (!(ret % NR_PRINT))
+				printf("wloop=%d loops=%d of %d\n", write_loops,
+					ret, NR_LOOPS);
+		}
+		/* disable quiescing on system call return */
+		ret = task_isol_activate_set(mask & ~ISOL_F_QUIESCE);
+		if (ret)
+			return EXIT_FAILURE;
+
+		/*
+		 * write computed data to disk, this would be
+		 * multiple writes on a real application, so
+		 * disabling quiescing is advantageous
+		 */
+		ret = write(fd, buf, 4096);
+		if (ret == -1) {
+			perror("write");
+			return EXIT_FAILURE;
+		}
+
+		write_loops += 1;
+	} while (write_loops < 5);
+
+
+	return EXIT_SUCCESS;
+}
+
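task_isol_computation.c treats any non-negative return value of write() as success, but write(2) may transfer fewer bytes than requested. A real application flushing its results after deactivating quiescing would normally loop until the whole buffer is written. A minimal sketch; write_all() is a hypothetical helper, not part of the samples.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all 'len' bytes of 'buf' to 'fd',
 * retrying on short writes and on EINTR.
 * Returns 0 on success, -1 on error (errno set by write()).
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

In the sample loop, write(fd, buf, 4096) would become write_all(fd, buf, 4096), with the same perror() and EXIT_FAILURE handling on -1.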
Index: linux-2.6/samples/task_isolation/task_isol_oneshot.c
===================================================================
--- /dev/null
+++ linux-2.6/samples/task_isolation/task_isol_oneshot.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Example of task isolation prctl interface using
+ * oneshot mode for quiescing.
+ *
+ *
+ *	enable oneshot quiescing of kernel activities
+ *	do {
+ *		process data (no system calls)
+ *		if (event) {
+ *			process event with syscalls
+ *			enable oneshot quiescing of kernel activities
+ *		}
+ *	} while (!exit_condition);
+ *
+ */
+#include <sys/mman.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/prctl.h>
+#include <linux/prctl.h>
+#include "task_isol.h"
+
+int main(void)
+{
+	int ret, fd, cnt;
+	void *buf = malloc(4096);
+	unsigned long mask;
+
+	fd = open("/dev/zero", O_RDONLY);
+	if (fd == -1) {
+		perror("open");
+		return EXIT_FAILURE;
+	}
+
+	memset(buf, 1, 4096);
+	ret = mlock(buf, 4096);
+	if (ret) {
+		perror("mlock");
+		return EXIT_FAILURE;
+	}
+
+	ret = task_isol_setup(1);
+	if (ret == -1)
+		return EXIT_FAILURE;
+
+	mask = ret;
+
+#define NR_LOOPS 999999999
+#define NR_PRINT 100000000
+
+	/* enable quiescing on system call return, oneshot */
+	ret = task_isol_activate_set(mask);
+	if (ret)
+		return EXIT_FAILURE;
+	/* busy loop */
+	cnt = 0;
+	while (cnt < NR_LOOPS)  {
+		memset(buf, 0xf, 4096);
+		cnt = cnt+1;
+		if (!(cnt % NR_PRINT)) {
+			int i, r;
+
+			/* this could be considered handling an external
+			 * event: with one-shot mode, system calls
+			 * after prctl(PR_ISOL_ACTIVATE_SET) will not incur
+			 * the penalty of quiescing
+			 */
+			printf("loops=%d of %d\n", cnt, NR_LOOPS);
+			for (i = 0; i < 100; i++) {
+				r = read(fd, buf, 4096);
+				if (r == -1) {
+					perror("read");
+					return EXIT_FAILURE;
+				}
+			}
+
+			ret = munlock(buf, 4096);
+			if (ret) {
+				perror("munlock");
+				return EXIT_FAILURE;
+			}
+
+			ret = mlock(buf, 4096);
+			if (ret) {
+				perror("mlock");
+				return EXIT_FAILURE;
+			}
+
+			/* re-arm oneshot quiescing (done on return from this prctl) */
+			ret = task_isol_activate_set(mask);
+			if (ret)
+				return EXIT_FAILURE;
+		}
+	}
+
+	return EXIT_SUCCESS;
+}
+
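The comment at the top of task_isol_oneshot.c describes the one-shot pattern: quiescing is performed once, on return from PR_ISOL_ACTIVATE_SET, so system calls made while handling an event do not pay that cost again, and the application re-arms quiescing before going back to its syscall-free loop. Below is a sketch of that control flow with the activation call, the event check and the handlers passed in as function pointers, so the logic can be exercised without the patched kernel. struct oneshot_ops and oneshot_loop() are hypothetical names.

```c
/* Same signature as task_isol_activate_set() in task_isol.h. */
typedef int (*activate_fn)(unsigned long long mask);

struct oneshot_ops {
	activate_fn activate;			/* e.g. task_isol_activate_set */
	int (*event_pending)(void *ctx);	/* checked without syscalls */
	int (*handle_event)(void *ctx);		/* may issue system calls */
	void (*process)(void *ctx);		/* pure userspace work */
};

/*
 * Hypothetical helper: run 'iterations' processing steps with oneshot
 * quiescing. Returns the number of events handled, or -1 on error.
 */
static long oneshot_loop(const struct oneshot_ops *ops, void *ctx,
			 unsigned long long mask, long iterations)
{
	long i, events = 0;

	/* Arm: quiescing happens on return from this call only. */
	if (ops->activate(mask))
		return -1;

	for (i = 0; i < iterations; i++) {
		ops->process(ctx);
		if (!ops->event_pending(ctx))
			continue;
		/* Syscalls here do not trigger quiescing again. */
		if (ops->handle_event(ctx))
			return -1;
		events++;
		/* Re-arm before returning to the syscall-free loop. */
		if (ops->activate(mask))
			return -1;
	}
	return events;
}
```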




* Re: [patch v8 02/10] add prctl task isolation prctl docs and samples
  2021-12-08 16:09 [patch v8 02/10] add prctl task isolation prctl docs and samples Marcelo Tosatti
@ 2022-01-06 23:49 ` Frederic Weisbecker
  2022-01-07 11:30   ` Marcelo Tosatti
  0 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2022-01-06 23:49 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Nitesh Lal, Nicolas Saenz Julienne,
	Christoph Lameter, Juri Lelli, Peter Zijlstra, Alex Belits,
	Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira

On Wed, Dec 08, 2021 at 01:09:08PM -0300, Marcelo Tosatti wrote:
> Add documentation and userspace sample code for prctl
> task isolation interface.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Acked-by: Frederic Weisbecker <frederic@kernel.org>

Thanks a lot! Time for me to look at the rest of the series.

Would be nice to have Thomas's opinion as well at least on
the interface (this patch).


* Re: [patch v8 02/10] add prctl task isolation prctl docs and samples
  2022-01-06 23:49 ` Frederic Weisbecker
@ 2022-01-07 11:30   ` Marcelo Tosatti
  2022-01-08  0:03     ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2022-01-07 11:30 UTC (permalink / raw)
  To: Frederic Weisbecker, Christoph Lameter
  Cc: linux-kernel, Nitesh Lal, Nicolas Saenz Julienne,
	Christoph Lameter, Juri Lelli, Peter Zijlstra, Alex Belits,
	Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira

On Fri, Jan 07, 2022 at 12:49:56AM +0100, Frederic Weisbecker wrote:
> On Wed, Dec 08, 2021 at 01:09:08PM -0300, Marcelo Tosatti wrote:
> > Add documentation and userspace sample code for prctl
> > task isolation interface.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> Acked-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Thanks a lot! Time for me to look at the rest of the series.
> 
> Would be nice to have Thomas's opinion as well at least on
> the interface (this patch).

Yes. AFAIAW most of his earlier comments on what the 
interface should look like have been addressed (or at
least I've tried to)... including the ability for
the system admin to configure the isolation options.

The one thing missing is to attempt to enter nohz_full
on activation (which Christoph asked for).

Christoph, have a question on that. At
https://lkml.org/lkml/2021/12/14/346, you wrote:

"Applications running would ideally have no performance penalty and there
is no  issue with kernel activity unless the application is in its special
low latency loop. NOHZ is currently only activated after spinning in that
loop for 2 seconds or so. Would be best to be able to trigger that
manually somehow."

So I was thinking of something similar to what the full task isolation
patchset does (with the behavior of returning an error as option...):

+int try_stop_full_tick(void)
+{
+	int cpu = smp_processor_id();
+	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+	/* For an unstable clock, we should return a permanent error code. */
+	if (atomic_read(&tick_dep_mask) & TICK_DEP_MASK_CLOCK_UNSTABLE)
+		return -EINVAL;
+
+	if (!can_stop_full_tick(cpu, ts))
+		return -EAGAIN;
+
+	tick_nohz_stop_sched_tick(ts, cpu);
+	return 0;
+}

Is that sufficient? (Note that entering nohz_full might still
fail for a number of reasons; see tick_nohz_stop_sched_tick.)



* Re: [patch v8 02/10] add prctl task isolation prctl docs and samples
  2022-01-07 11:30   ` Marcelo Tosatti
@ 2022-01-08  0:03     ` Frederic Weisbecker
  2022-01-24 18:10       ` Marcelo Tosatti
  0 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2022-01-08  0:03 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Christoph Lameter, linux-kernel, Nitesh Lal,
	Nicolas Saenz Julienne, Juri Lelli, Peter Zijlstra, Alex Belits,
	Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira

On Fri, Jan 07, 2022 at 08:30:01AM -0300, Marcelo Tosatti wrote:
> On Fri, Jan 07, 2022 at 12:49:56AM +0100, Frederic Weisbecker wrote:
> > On Wed, Dec 08, 2021 at 01:09:08PM -0300, Marcelo Tosatti wrote:
> > > Add documentation and userspace sample code for prctl
> > > task isolation interface.
> > > 
> > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > Acked-by: Frederic Weisbecker <frederic@kernel.org>
> > 
> > Thanks a lot! Time for me to look at the rest of the series.
> > 
> > Would be nice to have Thomas's opinion as well at least on
> > the interface (this patch).
> 
> Yes. AFAIAW most of his earlier comments on what the 
> interface should look like have been addressed (or at
> least i've tried to)... including the ability for
> the system admin to configure the isolation options.
> 
> The one thing missing is to attempt to enter nohz_full
> on activation (which Christoph asked for).
> 
> Christoph, have a question on that. At
> https://lkml.org/lkml/2021/12/14/346, you wrote:
> 
> "Applications running would ideally have no performance penalty and there
> is no  issue with kernel activity unless the application is in its special
> low latency loop. NOHZ is currently only activated after spinning in that
> loop for 2 seconds or so. Would be best to be able to trigger that
> manually somehow."
> 
> So was thinking of something similar to what the full task isolation
> patchset does (with the behavior of returning an error as option...):
> 
> +int try_stop_full_tick(void)
> +{
> +	int cpu = smp_processor_id();
> +	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> +
> +	/* For an unstable clock, we should return a permanent error code. */
> +	if (atomic_read(&tick_dep_mask) & TICK_DEP_MASK_CLOCK_UNSTABLE)
> +		return -EINVAL;
> +
> +	if (!can_stop_full_tick(cpu, ts))
> +		return -EAGAIN;
> +
> +	tick_nohz_stop_sched_tick(ts, cpu);
> +	return 0;
> +}
> 
> Is that sufficient? (note it might still be possible 
> for a failure to enter nohz_full due to a number of 
> reasons), see tick_nohz_stop_sched_tick.

Well, I guess we could simply turn tick_nohz_full_update_tick() into an API;
then it could become a QUIESCE feature.

But keep in mind that we may not only fail to enter nohz_full mode; we
may also enter it without completely stopping the tick, which is instead
deferred to some future time if a timer callback is still queued somewhere.

Make sure you test "ts->next_tick == KTIME_MAX" after stopping the tick.

This raises the question: what do we do if quiescing fails? In the oneshot
case we can at least return -EBUSY from prctl(), but otherwise subsequent
kernel entries/exits are a problem.


* Re: [patch v8 02/10] add prctl task isolation prctl docs and samples
  2022-01-08  0:03     ` Frederic Weisbecker
@ 2022-01-24 18:10       ` Marcelo Tosatti
  2022-01-24 18:20         ` Marcelo Tosatti
  0 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2022-01-24 18:10 UTC
  To: Frederic Weisbecker
  Cc: Christoph Lameter, linux-kernel, Nitesh Lal,
	Nicolas Saenz Julienne, Juri Lelli, Peter Zijlstra, Alex Belits,
	Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira

On Sat, Jan 08, 2022 at 01:03:08AM +0100, Frederic Weisbecker wrote:
> On Fri, Jan 07, 2022 at 08:30:01AM -0300, Marcelo Tosatti wrote:
> > On Fri, Jan 07, 2022 at 12:49:56AM +0100, Frederic Weisbecker wrote:
> > > On Wed, Dec 08, 2021 at 01:09:08PM -0300, Marcelo Tosatti wrote:
> > > > Add documentation and userspace sample code for prctl
> > > > task isolation interface.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > 
> > > Acked-by: Frederic Weisbecker <frederic@kernel.org>
> > > 
> > > Thanks a lot! Time for me to look at the rest of the series.
> > > 
> > > Would be nice to have Thomas's opinion as well at least on
> > > the interface (this patch).
> > 
> > Yes. AFAIAW most of his earlier comments on what the 
> > interface should look like have been addressed (or at
> > least i've tried to)... including the ability for
> > the system admin to configure the isolation options.
> > 
> > The one thing missing is to attempt to enter nohz_full
> > on activation (which Christoph asked for).
> > 
> > Christoph, have a question on that. At
> > https://lkml.org/lkml/2021/12/14/346, you wrote:
> > 
> > "Applications running would ideally have no performance penalty and there
> > is no  issue with kernel activity unless the application is in its special
> > low latency loop. NOHZ is currently only activated after spinning in that
> > loop for 2 seconds or so. Would be best to be able to trigger that
> > manually somehow."
> > 
> > So was thinking of something similar to what the full task isolation
> > patchset does (with the behavior of returning an error as option...):
> > 
> > +int try_stop_full_tick(void)
> > +{
> > +	int cpu = smp_processor_id();
> > +	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> > +
> > +	/* For an unstable clock, we should return a permanent error code. */
> > +	if (atomic_read(&tick_dep_mask) & TICK_DEP_MASK_CLOCK_UNSTABLE)
> > +		return -EINVAL;
> > +
> > +	if (!can_stop_full_tick(cpu, ts))
> > +		return -EAGAIN;
> > +
> > +	tick_nohz_stop_sched_tick(ts, cpu);
> > +	return 0;
> > +}
> > 
> > Is that sufficient? (note it might still be possible 
> > for a failure to enter nohz_full due to a number of 
> > reasons), see tick_nohz_stop_sched_tick.
> 
> Well, I guess we can simply make tick_nohz_full_update_tick() an API, then
> it could be a QUIESCE feature.
> 
> But keep in mind we may not only fail to enter into nohz_full mode, we
> may also enter it but, instead of completely stopping the tick, it can
> be delayed to some future if there is still a timer callback queued somewhere.
> 
> Make sure you test "ts->next_tick == KTIME_MAX" after stopping the tick.
> 
> This raise the question: what do we do if a quiescing fails? At least if it's a
> oneshot, we can return an -EBUSY from the prctl() but otherwise, subsequent kernel
> entry/exit are a problem.

Well, maybe two modes could be specified for the NOHZ_FULL task isolation
feature, applied on activation of task isolation:

	- Hint (default). Attempt to enter nohz_full mode;
	  continue if unable to do so.

	- Mandatory. Return an error if unable to enter nohz_full mode
	  (tracing is required to determine the actual reason, since
	  check_tick_dependency() only reports it via trace_tick_stop();
	  is that OK?)

static bool check_tick_dependency(atomic_t *dep)
{
        int val = atomic_read(dep);

        if (val & TICK_DEP_MASK_POSIX_TIMER) {
                trace_tick_stop(0, TICK_DEP_MASK_POSIX_TIMER);
                return true;
        }

        if (val & TICK_DEP_MASK_PERF_EVENTS) {
                trace_tick_stop(0, TICK_DEP_MASK_PERF_EVENTS);
                return true;
        }

        if (val & TICK_DEP_MASK_SCHED) {
                trace_tick_stop(0, TICK_DEP_MASK_SCHED);
                return true;
        }

        if (val & TICK_DEP_MASK_CLOCK_UNSTABLE) {
                trace_tick_stop(0, TICK_DEP_MASK_CLOCK_UNSTABLE);
                return true;
        }

        if (val & TICK_DEP_MASK_RCU) {
                trace_tick_stop(0, TICK_DEP_MASK_RCU);
                return true;
        }

        return false;
}

One thing that could be done in the handlers is to run any pending irq_work,
which would fix:

https://lkml.org/lkml/2021/6/18/1174

How about that?


* Re: [patch v8 02/10] add prctl task isolation prctl docs and samples
  2022-01-24 18:10       ` Marcelo Tosatti
@ 2022-01-24 18:20         ` Marcelo Tosatti
  0 siblings, 0 replies; 6+ messages in thread
From: Marcelo Tosatti @ 2022-01-24 18:20 UTC
  To: Frederic Weisbecker
  Cc: Christoph Lameter, linux-kernel, Nitesh Lal,
	Nicolas Saenz Julienne, Juri Lelli, Peter Zijlstra, Alex Belits,
	Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira

On Mon, Jan 24, 2022 at 03:10:42PM -0300, Marcelo Tosatti wrote:
> On Sat, Jan 08, 2022 at 01:03:08AM +0100, Frederic Weisbecker wrote:
> > On Fri, Jan 07, 2022 at 08:30:01AM -0300, Marcelo Tosatti wrote:
> > > On Fri, Jan 07, 2022 at 12:49:56AM +0100, Frederic Weisbecker wrote:
> > > > On Wed, Dec 08, 2021 at 01:09:08PM -0300, Marcelo Tosatti wrote:
> > > > > Add documentation and userspace sample code for prctl
> > > > > task isolation interface.
> > > > > 
> > > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > > 
> > > > Acked-by: Frederic Weisbecker <frederic@kernel.org>
> > > > 
> > > > Thanks a lot! Time for me to look at the rest of the series.
> > > > 
> > > > Would be nice to have Thomas's opinion as well at least on
> > > > the interface (this patch).
> > > 
> > > Yes. AFAIAW most of his earlier comments on what the 
> > > interface should look like have been addressed (or at
> > > least i've tried to)... including the ability for
> > > the system admin to configure the isolation options.
> > > 
> > > The one thing missing is to attempt to enter nohz_full
> > > on activation (which Christoph asked for).
> > > 
> > > Christoph, have a question on that. At
> > > https://lkml.org/lkml/2021/12/14/346, you wrote:
> > > 
> > > "Applications running would ideally have no performance penalty and there
> > > is no  issue with kernel activity unless the application is in its special
> > > low latency loop. NOHZ is currently only activated after spinning in that
> > > loop for 2 seconds or so. Would be best to be able to trigger that
> > > manually somehow."
> > > 
> > > So was thinking of something similar to what the full task isolation
> > > patchset does (with the behavior of returning an error as option...):
> > > 
> > > +int try_stop_full_tick(void)
> > > +{
> > > +	int cpu = smp_processor_id();
> > > +	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> > > +
> > > +	/* For an unstable clock, we should return a permanent error code. */
> > > +	if (atomic_read(&tick_dep_mask) & TICK_DEP_MASK_CLOCK_UNSTABLE)
> > > +		return -EINVAL;
> > > +
> > > +	if (!can_stop_full_tick(cpu, ts))
> > > +		return -EAGAIN;
> > > +
> > > +	tick_nohz_stop_sched_tick(ts, cpu);
> > > +	return 0;
> > > +}
> > > 
> > > Is that sufficient? (note it might still be possible 
> > > for a failure to enter nohz_full due to a number of 
> > > reasons), see tick_nohz_stop_sched_tick.
> > 
> > Well, I guess we can simply make tick_nohz_full_update_tick() an API, then
> > it could be a QUIESCE feature.
> > 
> > But keep in mind we may not only fail to enter into nohz_full mode, we
> > may also enter it but, instead of completely stopping the tick, it can
> > be delayed to some future if there is still a timer callback queued somewhere.
> > 
> > Make sure you test "ts->next_tick == KTIME_MAX" after stopping the tick.
> > 
> > This raise the question: what do we do if a quiescing fails? At least if it's a
> > oneshot, we can return an -EBUSY from the prctl() but otherwise, subsequent kernel
> > entry/exit are a problem.
> 
> Well, maybe two modes can be specified for the NOHZ_FULL task isolation
> feature. On activation of task isolation:
> 
> 	- Hint (default). Attempt to enter nohz_full mode,
> 	  continue if unable to do so.
> 
> 	- Mandatory. Return an error if unable to enter nohz_full mode
> 	  (tracing required to determine actual reason. is that OK?)

This mode is poorly defined. What happens if some event after task
isolation activation causes nohz_full mode to be disabled?

Alternatively, the verification of nohz_full mode could take place
elsewhere, for example in a BPF tool. I believe that works for our
use case.

> 
> static bool check_tick_dependency(atomic_t *dep)
> {
>         int val = atomic_read(dep);
> 
>         if (val & TICK_DEP_MASK_POSIX_TIMER) {
>                 trace_tick_stop(0, TICK_DEP_MASK_POSIX_TIMER);
>                 return true;
>         }
> 
>         if (val & TICK_DEP_MASK_PERF_EVENTS) {
>                 trace_tick_stop(0, TICK_DEP_MASK_PERF_EVENTS);
>                 return true;
>         }
> 
>         if (val & TICK_DEP_MASK_SCHED) {
>                 trace_tick_stop(0, TICK_DEP_MASK_SCHED);
>                 return true;
>         }
> 
>         if (val & TICK_DEP_MASK_CLOCK_UNSTABLE) {
>                 trace_tick_stop(0, TICK_DEP_MASK_CLOCK_UNSTABLE);
>                 return true;
>         }
> 
>         if (val & TICK_DEP_MASK_RCU) {
>                 trace_tick_stop(0, TICK_DEP_MASK_RCU);
>                 return true;
>         }
> 
>         return false;
> }
> 
> One thing that can be done on the handlers is to execute any pending irq_work, which
> would fix:
> 
> https://lkml.org/lkml/2021/6/18/1174
> 
> How about that ?
> 


Thread overview: 6+ messages
2021-12-08 16:09 [patch v8 02/10] add prctl task isolation prctl docs and samples Marcelo Tosatti
2022-01-06 23:49 ` Frederic Weisbecker
2022-01-07 11:30   ` Marcelo Tosatti
2022-01-08  0:03     ` Frederic Weisbecker
2022-01-24 18:10       ` Marcelo Tosatti
2022-01-24 18:20         ` Marcelo Tosatti
