* [PATCH RFC 0/4] Fix file lock cache accounting, again
@ 2024-01-17 16:14 Josh Poimboeuf
  2024-01-17 16:14 ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2024-01-17 16:14 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Layton, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, Michal Hocko
  Cc: linux-kernel, Jens Axboe, Tejun Heo, Vasily Averin,
	Michal Koutny, Waiman Long, Muchun Song, Jiri Kosina, cgroups,
	linux-mm

This is an attempt to fix file lock cache accounting (again).  The bug
was originally fixed 2+ years ago [1], but the fix was quickly reverted
[2] for performance reasons.

A few years ago some ideas [3] were floated about how to improve the
performance.  Did any of those ever get implemented?

Testing shows that "mm: improve performance of accounted kernel memory
allocations" [4] helps somewhat.  But even with those patches, much of the
original performance regression still remains, at least according to
microbenchmarks.

Despite that regression, since this is a security and correctness issue,
it really needs to be fixed by default.  Those who want to live on the
edge (or have trusted user space) can disable it.

Patch 1 enables the fix by default, but allows disabling it at boot
time.

Patch 2 allows disabling it at build time.

Patches 3 and 4 allow disabling it (along with all the CPU mitigations)
using mitigations=off.
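
With the series applied, either of these boot options should disable the
accounting (the latter also disables all the other optional
mitigations):

	flock_accounting=off
	mitigations=off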

[1] 0f12156dff28 ("memcg: enable accounting for file lock caches")
[2] 3754707bcc3e ("Revert "memcg: enable accounting for file lock caches"")
[3] https://lore.kernel.org/lkml/dbc9955d-6c28-1dd5-b842-ef39a762aa3b@kernel.dk/
[4] https://lore.kernel.org/lkml/20231019225346.1822282-1-roman.gushchin@linux.dev/

Josh Poimboeuf (4):
  fs/locks: Fix file lock cache accounting, again
  fs/locks: Add CONFIG_FLOCK_ACCOUNTING
  mitigations: Expand 'mitigations=off' to include optional software
    mitigations
  mitigations: Add flock cache accounting to 'mitigations=off'

 .../admin-guide/kernel-parameters.txt         | 48 ++++++++++++++----
 arch/arm64/kernel/cpufeature.c                |  2 +-
 arch/arm64/kernel/proton-pack.c               |  6 +--
 arch/powerpc/kernel/security.c                | 14 +++---
 arch/s390/kernel/nospec-branch.c              |  2 +-
 arch/x86/kernel/cpu/bugs.c                    | 35 ++++++-------
 arch/x86/kvm/mmu/mmu.c                        |  2 +-
 arch/x86/mm/pti.c                             |  3 +-
 fs/Kconfig                                    | 15 ++++++
 fs/locks.c                                    | 31 +++++++++++-
 include/linux/bpf.h                           |  5 +-
 include/linux/cpu.h                           |  3 --
 include/linux/mitigations.h                   |  4 ++
 kernel/Makefile                               |  3 +-
 kernel/cpu.c                                  | 43 ----------------
 kernel/mitigations.c                          | 50 +++++++++++++++++++
 16 files changed, 174 insertions(+), 92 deletions(-)
 create mode 100644 include/linux/mitigations.h
 create mode 100644 kernel/mitigations.c

-- 
2.43.0



* [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 16:14 [PATCH RFC 0/4] Fix file lock cache accounting, again Josh Poimboeuf
@ 2024-01-17 16:14 ` Josh Poimboeuf
  2024-01-17 19:00   ` Jeff Layton
  2024-01-17 16:14 ` [PATCH RFC 2/4] fs/locks: Add CONFIG_FLOCK_ACCOUNTING Josh Poimboeuf
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2024-01-17 16:14 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Layton, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, Michal Hocko
  Cc: linux-kernel, Jens Axboe, Tejun Heo, Vasily Averin,
	Michal Koutny, Waiman Long, Muchun Song, Jiri Kosina, cgroups,
	linux-mm

A container can exceed its memcg limits by allocating a bunch of file
locks.
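
For example, here is a hypothetical (untuned) userspace sketch of the
allocation pattern; the file name and loop bounds are made up:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct flock fl = {
		.l_type   = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_len    = 1,		/* one-byte ranges */
	};
	int fd = open("lockfile", O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return 1;

	/*
	 * Leave gaps between the ranges so the locks never coalesce:
	 * each F_SETLK pins a separate struct file_lock in the kernel,
	 * none of it charged to the task's memcg.
	 */
	for (off_t i = 0; ; i += 2) {
		fl.l_start = i;
		if (fcntl(fd, F_SETLK, &fl) < 0) {
			perror("fcntl");
			break;
		}
	}
	pause();	/* keep the kernel memory pinned */
	return 0;
}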

This bug was originally fixed by commit 0f12156dff28 ("memcg: enable
accounting for file lock caches"), but was later reverted by commit
3754707bcc3e ("Revert "memcg: enable accounting for file lock caches"")
due to performance issues.

Unfortunately those performance issues were never addressed and the bug
has remained unfixed for over two years.

Fix it by default but allow users to disable it with a cmdline option
(flock_accounting=off).

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         | 17 +++++++++++
 fs/locks.c                                    | 30 +++++++++++++++++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6ee0f9a5da70..91987b06bc52 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1527,6 +1527,23 @@
 			See Documentation/admin-guide/sysctl/net.rst for
 			fb_tunnels_only_for_init_ns
 
+	flock_accounting=
+			[KNL] Enable/disable accounting for kernel
+			memory allocations related to file locks.
+			Format: { on | off }
+			Default: on
+			on:	Enable kernel memory accounting for file
+				locks.  This prevents task groups from
+				exceeding their memcg allocation limits.
+				However, it may cause slowdowns in the
+				flock() system call.
+			off:	Disable kernel memory accounting for
+				file locks.  This may allow a rogue task
+				to DoS the system by forcing the kernel
+				to allocate memory beyond the task
+				group's memcg limits.  Not recommended
+				unless you have trusted user space.
+
 	floppy=		[HW]
 			See Documentation/admin-guide/blockdev/floppy.rst.
 
diff --git a/fs/locks.c b/fs/locks.c
index cc7c117ee192..235ac56c557d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2905,15 +2905,41 @@ static int __init proc_locks_init(void)
 fs_initcall(proc_locks_init);
 #endif
 
+static bool flock_accounting __ro_after_init = true;
+
+static int __init flock_accounting_cmdline(char *str)
+{
+	if (!str)
+		return -EINVAL;
+
+	if (!strcmp(str, "off"))
+		flock_accounting = false;
+	else if (!strcmp(str, "on"))
+		flock_accounting = true;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+early_param("flock_accounting", flock_accounting_cmdline);
+
+#define FLOCK_ACCOUNTING_MSG "WARNING: File lock accounting is disabled, container-triggered host memory exhaustion possible!\n"
+
 static int __init filelock_init(void)
 {
 	int i;
+	slab_flags_t flags = SLAB_PANIC;
+
+	if (!flock_accounting)
+		pr_err(FLOCK_ACCOUNTING_MSG);
+	else
+		flags |= SLAB_ACCOUNT;
 
 	flctx_cache = kmem_cache_create("file_lock_ctx",
-			sizeof(struct file_lock_context), 0, SLAB_PANIC, NULL);
+			sizeof(struct file_lock_context), 0, flags, NULL);
 
 	filelock_cache = kmem_cache_create("file_lock_cache",
-			sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
+			sizeof(struct file_lock), 0, flags, NULL);
 
 	for_each_possible_cpu(i) {
 		struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);
-- 
2.43.0



* [PATCH RFC 2/4] fs/locks: Add CONFIG_FLOCK_ACCOUNTING
  2024-01-17 16:14 [PATCH RFC 0/4] Fix file lock cache accounting, again Josh Poimboeuf
  2024-01-17 16:14 ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
@ 2024-01-17 16:14 ` Josh Poimboeuf
  2024-01-17 16:14 ` [PATCH RFC 3/4] mitigations: Expand 'mitigations=off' to include optional software mitigations Josh Poimboeuf
  2024-01-17 16:14 ` [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off' Josh Poimboeuf
  3 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2024-01-17 16:14 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Layton, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, Michal Hocko
  Cc: linux-kernel, Jens Axboe, Tejun Heo, Vasily Averin,
	Michal Koutny, Waiman Long, Muchun Song, Jiri Kosina, cgroups,
	linux-mm

Allow flock cache accounting to be disabled at build time.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 fs/Kconfig | 15 +++++++++++++++
 fs/locks.c |  2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index a3159831ba98..591f54a03059 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -129,6 +129,21 @@ config FILE_LOCKING
           for filesystems like NFS and for the flock() system
           call. Disabling this option saves about 11k.
 
+config FLOCK_ACCOUNTING
+	bool "Enable kernel memory accounting for file locks" if EXPERT
+	depends on FILE_LOCKING
+	default y
+	help
+	  This option enables kernel memory accounting for file locks.  This
+	  prevents task groups from exceeding their memcg allocation limits.
+	  However, it may cause slowdowns in the flock() system call.
+
+	  Disabling this option is not recommended as it may allow a rogue task
+	  to DoS the system by forcing the kernel to allocate memory beyond the
+	  task group's memcg limits.
+
+	  If unsure, say Y.
+
 source "fs/crypto/Kconfig"
 
 source "fs/verity/Kconfig"
diff --git a/fs/locks.c b/fs/locks.c
index 235ac56c557d..e2799a18c4e8 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2905,7 +2905,7 @@ static int __init proc_locks_init(void)
 fs_initcall(proc_locks_init);
 #endif
 
-static bool flock_accounting __ro_after_init = true;
+static bool flock_accounting __ro_after_init = IS_ENABLED(CONFIG_FLOCK_ACCOUNTING);
 
 static int __init flock_accounting_cmdline(char *str)
 {
-- 
2.43.0



* [PATCH RFC 3/4] mitigations: Expand 'mitigations=off' to include optional software mitigations
  2024-01-17 16:14 [PATCH RFC 0/4] Fix file lock cache accounting, again Josh Poimboeuf
  2024-01-17 16:14 ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
  2024-01-17 16:14 ` [PATCH RFC 2/4] fs/locks: Add CONFIG_FLOCK_ACCOUNTING Josh Poimboeuf
@ 2024-01-17 16:14 ` Josh Poimboeuf
  2024-01-17 16:14 ` [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off' Josh Poimboeuf
  3 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2024-01-17 16:14 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Layton, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, Michal Hocko
  Cc: linux-kernel, Jens Axboe, Tejun Heo, Vasily Averin,
	Michal Koutny, Waiman Long, Muchun Song, Jiri Kosina, cgroups,
	linux-mm

The 'mitigations=off' cmdline option disables all CPU mitigations at
runtime.  It's intended for users who are running with trusted user
space and don't want the performance impact associated with all the
mitigations.

Up until now, it has only been used for CPU mitigations.  However, there
can also be optional software mitigations which have a performance
impact.

Expand 'mitigations=' to include optional software mitigations.  After
all, there's nothing in the "mitigations" name which limits it to CPU
vulnerabilities.

In theory we could introduce separate {cpu,sw}_mitigations= options, but
for the time being there's no need to separate them out.

It's simpler to have them combined since the use case of "I have trusted
user space and don't want the performance impacts of unneeded
mitigations" is the same, regardless of the source of the bug.

Move the interfaces around and rename them to reflect the new broader
impact of mitigations=off.

No functional changes.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         | 27 ++++++----
 arch/arm64/kernel/cpufeature.c                |  2 +-
 arch/arm64/kernel/proton-pack.c               |  6 +--
 arch/powerpc/kernel/security.c                | 14 +++---
 arch/s390/kernel/nospec-branch.c              |  2 +-
 arch/x86/kernel/cpu/bugs.c                    | 35 ++++++-------
 arch/x86/kvm/mmu/mmu.c                        |  2 +-
 arch/x86/mm/pti.c                             |  3 +-
 include/linux/bpf.h                           |  5 +-
 include/linux/cpu.h                           |  3 --
 include/linux/mitigations.h                   |  4 ++
 kernel/Makefile                               |  3 +-
 kernel/cpu.c                                  | 43 ----------------
 kernel/mitigations.c                          | 50 +++++++++++++++++++
 14 files changed, 109 insertions(+), 90 deletions(-)
 create mode 100644 include/linux/mitigations.h
 create mode 100644 kernel/mitigations.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91987b06bc52..24e873351368 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3391,16 +3391,23 @@
 			https://repo.or.cz/w/linux-2.6/mini2440.git
 
 	mitigations=
-			[X86,PPC,S390,ARM64] Control optional mitigations for
-			CPU vulnerabilities.  This is a set of curated,
-			arch-independent options, each of which is an
-			aggregation of existing arch-specific options.
+			[KNL] Control optional mitigations for CPU
+			vulnerabilities and performance-impacting
+			software vulnerabilities.  This is a set of
+			curated, arch-independent options, each of which
+			is an aggregation of existing arch-specific
+			options.
 
 			off
-				Disable all optional CPU mitigations.  This
-				improves system performance, but it may also
-				expose users to several CPU vulnerabilities.
-				Equivalent to: if nokaslr then kpti=0 [ARM64]
+				Disable all optional mitigations.  This
+				improves system performance, but may also
+				expose users to several vulnerabilities.
+
+				Equivalent to:
+
+					       CPU mitigations:
+					       ----------------
+					       if nokaslr then kpti=0 [ARM64]
 					       gather_data_sampling=off [X86]
 					       kvm.nx_huge_pages=off [X86]
 					       l1tf=off [X86]
@@ -3426,7 +3433,7 @@
 					       kvm.nx_huge_pages=force.
 
 			auto (default)
-				Mitigate all CPU vulnerabilities, but leave SMT
+				Enable all optional mitigations, but leave SMT
 				enabled, even if it's vulnerable.  This is for
 				users who don't want to be surprised by SMT
 				getting disabled across kernel upgrades, or who
@@ -3434,7 +3441,7 @@
 				Equivalent to: (default behavior)
 
 			auto,nosmt
-				Mitigate all CPU vulnerabilities, disabling SMT
+				Enable all optional mitigations, disabling SMT
 				if needed.  This is for users who always want to
 				be fully mitigated, even if it means losing SMT.
 				Equivalent to: l1tf=flush,nosmt [X86]
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 01a4c1d7fc09..ae37898e5b1a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1719,7 +1719,7 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
 		}
 	}
 
-	if (cpu_mitigations_off() && !__kpti_forced) {
+	if (mitigations_off() && !__kpti_forced) {
 		str = "mitigations=off";
 		__kpti_forced = -1;
 	}
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index 6268a13a1d58..00242edf1885 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -91,7 +91,7 @@ early_param("nospectre_v2", parse_spectre_v2_param);
 
 static bool spectre_v2_mitigations_off(void)
 {
-	bool ret = __nospectre_v2 || cpu_mitigations_off();
+	bool ret = __nospectre_v2 || mitigations_off();
 
 	if (ret)
 		pr_info_once("spectre-v2 mitigation disabled by command line option\n");
@@ -421,7 +421,7 @@ early_param("ssbd", parse_spectre_v4_param);
  */
 static bool spectre_v4_mitigations_off(void)
 {
-	bool ret = cpu_mitigations_off() ||
+	bool ret = mitigations_off() ||
 		   __spectre_v4_policy == SPECTRE_V4_POLICY_MITIGATION_DISABLED;
 
 	if (ret)
@@ -1000,7 +1000,7 @@ void spectre_bhb_enable_mitigation(const struct arm64_cpu_capabilities *entry)
 		/* No point mitigating Spectre-BHB alone. */
 	} else if (!IS_ENABLED(CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY)) {
 		pr_info_once("spectre-bhb mitigation disabled by compile time option\n");
-	} else if (cpu_mitigations_off() || __nospectre_bhb) {
+	} else if (mitigations_off() || __nospectre_bhb) {
 		pr_info_once("spectre-bhb mitigation disabled by command line option\n");
 	} else if (supports_ecbhb(SCOPE_LOCAL_CPU)) {
 		state = SPECTRE_MITIGATED;
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 4856e1a5161c..52cf79b5d87a 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -64,7 +64,7 @@ void __init setup_barrier_nospec(void)
 	enable = security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
 		 security_ftr_enabled(SEC_FTR_BNDS_CHK_SPEC_BAR);
 
-	if (!no_nospec && !cpu_mitigations_off())
+	if (!no_nospec && !mitigations_off())
 		enable_barrier_nospec(enable);
 }
 
@@ -135,7 +135,7 @@ early_param("nospectre_v2", handle_nospectre_v2);
 #ifdef CONFIG_PPC_E500
 void __init setup_spectre_v2(void)
 {
-	if (no_spectrev2 || cpu_mitigations_off())
+	if (no_spectrev2 || mitigations_off())
 		do_btb_flush_fixups();
 	else
 		btb_flush_enabled = true;
@@ -331,7 +331,7 @@ void setup_stf_barrier(void)
 
 	stf_enabled_flush_types = type;
 
-	if (!no_stf_barrier && !cpu_mitigations_off())
+	if (!no_stf_barrier && !mitigations_off())
 		stf_barrier_enable(enable);
 }
 
@@ -530,7 +530,7 @@ void setup_count_cache_flush(void)
 {
 	bool enable = true;
 
-	if (no_spectrev2 || cpu_mitigations_off()) {
+	if (no_spectrev2 || mitigations_off()) {
 		if (security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED) ||
 		    security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED))
 			pr_warn("Spectre v2 mitigations not fully under software control, can't disable\n");
@@ -700,13 +700,13 @@ void setup_rfi_flush(enum l1d_flush_type types, bool enable)
 
 	enabled_flush_types = types;
 
-	if (!cpu_mitigations_off() && !no_rfi_flush)
+	if (!mitigations_off() && !no_rfi_flush)
 		rfi_flush_enable(enable);
 }
 
 void setup_entry_flush(bool enable)
 {
-	if (cpu_mitigations_off())
+	if (mitigations_off())
 		return;
 
 	if (!no_entry_flush)
@@ -715,7 +715,7 @@ void setup_entry_flush(bool enable)
 
 void setup_uaccess_flush(bool enable)
 {
-	if (cpu_mitigations_off())
+	if (mitigations_off())
 		return;
 
 	if (!no_uaccess_flush)
diff --git a/arch/s390/kernel/nospec-branch.c b/arch/s390/kernel/nospec-branch.c
index d1b16d83e49a..75ec4ad4198b 100644
--- a/arch/s390/kernel/nospec-branch.c
+++ b/arch/s390/kernel/nospec-branch.c
@@ -59,7 +59,7 @@ early_param("nospectre_v2", nospectre_v2_setup_early);
 
 void __init nospec_auto_detect(void)
 {
-	if (test_facility(156) || cpu_mitigations_off()) {
+	if (test_facility(156) || mitigations_off()) {
 		/*
 		 * The machine supports etokens.
 		 * Disable expolines and disable nobp.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bb0ab8466b91..45d4c2664011 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -16,6 +16,7 @@
 #include <linux/sched/smt.h>
 #include <linux/pgtable.h>
 #include <linux/bpf.h>
+#include <linux/mitigations.h>
 
 #include <asm/spec-ctrl.h>
 #include <asm/cmdline.h>
@@ -243,7 +244,7 @@ static const char * const mds_strings[] = {
 
 static void __init mds_select_mitigation(void)
 {
-	if (!boot_cpu_has_bug(X86_BUG_MDS) || cpu_mitigations_off()) {
+	if (!boot_cpu_has_bug(X86_BUG_MDS) || mitigations_off()) {
 		mds_mitigation = MDS_MITIGATION_OFF;
 		return;
 	}
@@ -255,7 +256,7 @@ static void __init mds_select_mitigation(void)
 		static_branch_enable(&mds_user_clear);
 
 		if (!boot_cpu_has(X86_BUG_MSBDS_ONLY) &&
-		    (mds_nosmt || cpu_mitigations_auto_nosmt()))
+		    (mds_nosmt || mitigations_auto_nosmt()))
 			cpu_smt_disable(false);
 	}
 }
@@ -317,7 +318,7 @@ static void __init taa_select_mitigation(void)
 		return;
 	}
 
-	if (cpu_mitigations_off()) {
+	if (mitigations_off()) {
 		taa_mitigation = TAA_MITIGATION_OFF;
 		return;
 	}
@@ -358,7 +359,7 @@ static void __init taa_select_mitigation(void)
 	 */
 	static_branch_enable(&mds_user_clear);
 
-	if (taa_nosmt || cpu_mitigations_auto_nosmt())
+	if (taa_nosmt || mitigations_auto_nosmt())
 		cpu_smt_disable(false);
 }
 
@@ -408,7 +409,7 @@ static void __init mmio_select_mitigation(void)
 
 	if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
 	     boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) ||
-	     cpu_mitigations_off()) {
+	     mitigations_off()) {
 		mmio_mitigation = MMIO_MITIGATION_OFF;
 		return;
 	}
@@ -451,7 +452,7 @@ static void __init mmio_select_mitigation(void)
 	else
 		mmio_mitigation = MMIO_MITIGATION_UCODE_NEEDED;
 
-	if (mmio_nosmt || cpu_mitigations_auto_nosmt())
+	if (mmio_nosmt || mitigations_auto_nosmt())
 		cpu_smt_disable(false);
 }
 
@@ -481,7 +482,7 @@ early_param("mmio_stale_data", mmio_stale_data_parse_cmdline);
 
 static void __init md_clear_update_mitigation(void)
 {
-	if (cpu_mitigations_off())
+	if (mitigations_off())
 		return;
 
 	if (!static_key_enabled(&mds_user_clear))
@@ -611,7 +612,7 @@ static void __init srbds_select_mitigation(void)
 		srbds_mitigation = SRBDS_MITIGATION_HYPERVISOR;
 	else if (!boot_cpu_has(X86_FEATURE_SRBDS_CTRL))
 		srbds_mitigation = SRBDS_MITIGATION_UCODE_NEEDED;
-	else if (cpu_mitigations_off() || srbds_off)
+	else if (mitigations_off() || srbds_off)
 		srbds_mitigation = SRBDS_MITIGATION_OFF;
 
 	update_srbds_msr();
@@ -742,7 +743,7 @@ static void __init gds_select_mitigation(void)
 		goto out;
 	}
 
-	if (cpu_mitigations_off())
+	if (mitigations_off())
 		gds_mitigation = GDS_MITIGATION_OFF;
 	/* Will verify below that mitigation _can_ be disabled */
 
@@ -841,7 +842,7 @@ static bool smap_works_speculatively(void)
 
 static void __init spectre_v1_select_mitigation(void)
 {
-	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1) || cpu_mitigations_off()) {
+	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1) || mitigations_off()) {
 		spectre_v1_mitigation = SPECTRE_V1_MITIGATION_NONE;
 		return;
 	}
@@ -974,7 +975,7 @@ static void __init retbleed_select_mitigation(void)
 {
 	bool mitigate_smt = false;
 
-	if (!boot_cpu_has_bug(X86_BUG_RETBLEED) || cpu_mitigations_off())
+	if (!boot_cpu_has_bug(X86_BUG_RETBLEED) || mitigations_off())
 		return;
 
 	switch (retbleed_cmd) {
@@ -1068,7 +1069,7 @@ static void __init retbleed_select_mitigation(void)
 	}
 
 	if (mitigate_smt && !boot_cpu_has(X86_FEATURE_STIBP) &&
-	    (retbleed_nosmt || cpu_mitigations_auto_nosmt()))
+	    (retbleed_nosmt || mitigations_auto_nosmt()))
 		cpu_smt_disable(false);
 
 	/*
@@ -1391,7 +1392,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	int ret, i;
 
 	if (cmdline_find_option_bool(boot_command_line, "nospectre_v2") ||
-	    cpu_mitigations_off())
+	    mitigations_off())
 		return SPECTRE_V2_CMD_NONE;
 
 	ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
@@ -1885,7 +1886,7 @@ static enum ssb_mitigation_cmd __init ssb_parse_cmdline(void)
 	int ret, i;
 
 	if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable") ||
-	    cpu_mitigations_off()) {
+	    mitigations_off()) {
 		return SPEC_STORE_BYPASS_CMD_NONE;
 	} else {
 		ret = cmdline_find_option(boot_command_line, "spec_store_bypass_disable",
@@ -2283,9 +2284,9 @@ static void __init l1tf_select_mitigation(void)
 	if (!boot_cpu_has_bug(X86_BUG_L1TF))
 		return;
 
-	if (cpu_mitigations_off())
+	if (mitigations_off())
 		l1tf_mitigation = L1TF_MITIGATION_OFF;
-	else if (cpu_mitigations_auto_nosmt())
+	else if (mitigations_auto_nosmt())
 		l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOSMT;
 
 	override_cache_bits(&boot_cpu_data);
@@ -2410,7 +2411,7 @@ static void __init srso_select_mitigation(void)
 {
 	bool has_microcode = boot_cpu_has(X86_FEATURE_IBPB_BRTYPE);
 
-	if (cpu_mitigations_off())
+	if (mitigations_off())
 		return;
 
 	if (!boot_cpu_has_bug(X86_BUG_SRSO)) {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0b1f991b9a31..f0d105f740ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6819,7 +6819,7 @@ static int get_nx_huge_pages(char *buffer, const struct kernel_param *kp)
 static bool get_nx_auto_mode(void)
 {
 	/* Return true when CPU has the bug, and mitigations are ON */
-	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();
+	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !mitigations_off();
 }
 
 static void __set_nx_huge_pages(bool val)
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 669ba1c345b3..16a63c241e1e 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -28,6 +28,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/cpu.h>
+#include <linux/mitigations.h>
 
 #include <asm/cpufeature.h>
 #include <asm/hypervisor.h>
@@ -84,7 +85,7 @@ void __init pti_check_boottime_disable(void)
 		return;
 	}
 
-	if (cpu_mitigations_off())
+	if (mitigations_off())
 		pti_mode = PTI_FORCE_OFF;
 	if (pti_mode == PTI_FORCE_OFF) {
 		pti_print_if_insecure("disabled on command line.");
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e30100597d0a..04356b9fa82a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -30,6 +30,7 @@
 #include <linux/static_call.h>
 #include <linux/memcontrol.h>
 #include <linux/cfi.h>
+#include <linux/mitigations.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -2214,12 +2215,12 @@ static inline bool bpf_allow_uninit_stack(void)
 
 static inline bool bpf_bypass_spec_v1(void)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return mitigations_off() || perfmon_capable();
 }
 
 static inline bool bpf_bypass_spec_v4(void)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return mitigations_off() || perfmon_capable();
 }
 
 int bpf_map_new_fd(struct bpf_map *map, int flags);
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index fc8094419084..b8c81d924a62 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -212,7 +212,4 @@ void cpuhp_report_idle_dead(void);
 static inline void cpuhp_report_idle_dead(void) { }
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
-extern bool cpu_mitigations_off(void);
-extern bool cpu_mitigations_auto_nosmt(void);
-
 #endif /* _LINUX_CPU_H_ */
diff --git a/include/linux/mitigations.h b/include/linux/mitigations.h
new file mode 100644
index 000000000000..5acc80d49230
--- /dev/null
+++ b/include/linux/mitigations.h
@@ -0,0 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+extern bool mitigations_off(void);
+extern bool mitigations_auto_nosmt(void);
diff --git a/kernel/Makefile b/kernel/Makefile
index ce105a5558fc..d1514432bbc7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,8 @@ obj-y     = fork.o exec_domain.o panic.o \
 	    extable.o params.o \
 	    kthread.o sys_ni.o nsproxy.o \
 	    notifier.o ksysfs.o cred.o reboot.o \
-	    async.o range.o smpboot.o ucount.o regset.o ksyms_common.o
+	    async.o range.o smpboot.o ucount.o regset.o ksyms_common.o \
+	    mitigations.o
 
 obj-$(CONFIG_USERMODE_DRIVER) += usermode_driver.o
 obj-$(CONFIG_MULTIUSER) += groups.o
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e6ec3ba4950b..e273478cd437 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3195,46 +3195,3 @@ void __init boot_cpu_hotplug_init(void)
 	this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
 	this_cpu_write(cpuhp_state.target, CPUHP_ONLINE);
 }
-
-/*
- * These are used for a global "mitigations=" cmdline option for toggling
- * optional CPU mitigations.
- */
-enum cpu_mitigations {
-	CPU_MITIGATIONS_OFF,
-	CPU_MITIGATIONS_AUTO,
-	CPU_MITIGATIONS_AUTO_NOSMT,
-};
-
-static enum cpu_mitigations cpu_mitigations __ro_after_init =
-	CPU_MITIGATIONS_AUTO;
-
-static int __init mitigations_parse_cmdline(char *arg)
-{
-	if (!strcmp(arg, "off"))
-		cpu_mitigations = CPU_MITIGATIONS_OFF;
-	else if (!strcmp(arg, "auto"))
-		cpu_mitigations = CPU_MITIGATIONS_AUTO;
-	else if (!strcmp(arg, "auto,nosmt"))
-		cpu_mitigations = CPU_MITIGATIONS_AUTO_NOSMT;
-	else
-		pr_crit("Unsupported mitigations=%s, system may still be vulnerable\n",
-			arg);
-
-	return 0;
-}
-early_param("mitigations", mitigations_parse_cmdline);
-
-/* mitigations=off */
-bool cpu_mitigations_off(void)
-{
-	return cpu_mitigations == CPU_MITIGATIONS_OFF;
-}
-EXPORT_SYMBOL_GPL(cpu_mitigations_off);
-
-/* mitigations=auto,nosmt */
-bool cpu_mitigations_auto_nosmt(void)
-{
-	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
-}
-EXPORT_SYMBOL_GPL(cpu_mitigations_auto_nosmt);
diff --git a/kernel/mitigations.c b/kernel/mitigations.c
new file mode 100644
index 000000000000..2828a755a719
--- /dev/null
+++ b/kernel/mitigations.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/cache.h>
+#include <linux/init.h>
+#include <linux/mitigations.h>
+#include <linux/printk.h>
+#include <linux/string.h>
+
+enum mitigations {
+	MITIGATIONS_OFF,
+	MITIGATIONS_AUTO,
+	MITIGATIONS_AUTO_NOSMT,
+};
+
+static enum mitigations mitigations __ro_after_init =
+	MITIGATIONS_AUTO;
+
+/*
+ * The "mitigations=" cmdline option is for toggling optional CPU or software
+ * mitigations which may impact performance.  Mitigations should only be turned
+ * off if user space and VMs are running trusted code.
+ */
+static int __init mitigations_parse_cmdline(char *arg)
+{
+	if (!strcmp(arg, "off"))
+		mitigations = MITIGATIONS_OFF;
+	else if (!strcmp(arg, "auto"))
+		mitigations = MITIGATIONS_AUTO;
+	else if (!strcmp(arg, "auto,nosmt"))
+		mitigations = MITIGATIONS_AUTO_NOSMT;
+	else
+		pr_crit("Unsupported mitigations=%s, system may still be vulnerable\n", arg);
+
+	return 0;
+}
+early_param("mitigations", mitigations_parse_cmdline);
+
+/* mitigations=off */
+bool mitigations_off(void)
+{
+	return mitigations == MITIGATIONS_OFF;
+}
+EXPORT_SYMBOL_GPL(mitigations_off);
+
+/* mitigations=auto,nosmt */
+bool mitigations_auto_nosmt(void)
+{
+	return mitigations == MITIGATIONS_AUTO_NOSMT;
+}
+EXPORT_SYMBOL_GPL(mitigations_auto_nosmt);
-- 
2.43.0



* [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off'
  2024-01-17 16:14 [PATCH RFC 0/4] Fix file lock cache accounting, again Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2024-01-17 16:14 ` [PATCH RFC 3/4] mitigations: Expand 'mitigations=off' to include optional software mitigations Josh Poimboeuf
@ 2024-01-17 16:14 ` Josh Poimboeuf
  2024-01-18  9:04   ` Michal Koutný
  3 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2024-01-17 16:14 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Layton, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, Michal Hocko
  Cc: linux-kernel, Jens Axboe, Tejun Heo, Vasily Averin,
	Michal Koutny, Waiman Long, Muchun Song, Jiri Kosina, cgroups,
	linux-mm

Allow flock cache accounting to be disabled with 'mitigations=off', as
it fits the profile for that option: trusted user space combined with a
performance-impacting mitigation.

Also, for consistency with the other CONFIG_MITIGATION_* options, rename
CONFIG_FLOCK_ACCOUNTING to CONFIG_MITIGATION_FLOCK_ACCOUNTING.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 4 ++++
 fs/Kconfig                                      | 2 +-
 fs/locks.c                                      | 5 +++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 24e873351368..b31fe7433b48 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3427,6 +3427,10 @@
 					       ssbd=force-off [ARM64]
 					       tsx_async_abort=off [X86]
 
+					       Software mitigations:
+					       ---------------------
+					       flock_accounting=off [KNL]
+
 				Exceptions:
 					       This does not have any effect on
 					       kvm.nx_huge_pages when
diff --git a/fs/Kconfig b/fs/Kconfig
index 591f54a03059..4345b79d3b40 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -129,7 +129,7 @@ config FILE_LOCKING
           for filesystems like NFS and for the flock() system
           call. Disabling this option saves about 11k.
 
-config FLOCK_ACCOUNTING
+config MITIGATION_FLOCK_ACCOUNTING
 	bool "Enable kernel memory accounting for file locks" if EXPERT
 	depends on FILE_LOCKING
 	default y
diff --git a/fs/locks.c b/fs/locks.c
index e2799a18c4e8..fd4157ccd504 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -64,6 +64,7 @@
 #include <linux/hashtable.h>
 #include <linux/percpu.h>
 #include <linux/sysctl.h>
+#include <linux/mitigations.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/filelock.h>
@@ -2905,7 +2906,7 @@ static int __init proc_locks_init(void)
 fs_initcall(proc_locks_init);
 #endif
 
-static bool flock_accounting __ro_after_init = IS_ENABLED(CONFIG_FLOCK_ACCOUNTING);
+static bool flock_accounting __ro_after_init = IS_ENABLED(CONFIG_MITIGATION_FLOCK_ACCOUNTING);
 
 static int __init flock_accounting_cmdline(char *str)
 {
@@ -2930,7 +2931,7 @@ static int __init filelock_init(void)
 	int i;
 	slab_flags_t flags = SLAB_PANIC;
 
-	if (!flock_accounting)
+	if (mitigations_off() || !flock_accounting)
 		pr_err(FLOCK_ACCOUNTING_MSG);
 	else
 		flags |= SLAB_ACCOUNT;
-- 
2.43.0



* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 16:14 ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
@ 2024-01-17 19:00   ` Jeff Layton
  2024-01-17 19:39     ` Josh Poimboeuf
  2024-01-18  9:49     ` Michal Hocko
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff Layton @ 2024-01-17 19:00 UTC (permalink / raw)
  To: Josh Poimboeuf, Linus Torvalds, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, Michal Hocko
  Cc: linux-kernel, Jens Axboe, Tejun Heo, Vasily Averin,
	Michal Koutny, Waiman Long, Muchun Song, Jiri Kosina, cgroups,
	linux-mm

On Wed, 2024-01-17 at 08:14 -0800, Josh Poimboeuf wrote:
> A container can exceed its memcg limits by allocating a bunch of file
> locks.
> 
> This bug was originally fixed by commit 0f12156dff28 ("memcg: enable
> accounting for file lock caches"), but was later reverted by commit
> 3754707bcc3e ("Revert "memcg: enable accounting for file lock caches"")
> due to performance issues.
> 
> Unfortunately those performance issues were never addressed and the bug
> has remained unfixed for over two years.
> 
> Fix it by default but allow users to disable it with a cmdline option
> (flock_accounting=off).
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> ---
>  .../admin-guide/kernel-parameters.txt         | 17 +++++++++++
>  fs/locks.c                                    | 30 +++++++++++++++++--
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6ee0f9a5da70..91987b06bc52 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1527,6 +1527,23 @@
>  			See Documentation/admin-guide/sysctl/net.rst for
>  			fb_tunnels_only_for_init_ns
>  
> +	flock_accounting=
> +			[KNL] Enable/disable accounting for kernel
> +			memory allocations related to file locks.
> +			Format: { on | off }
> +			Default: on
> +			on:	Enable kernel memory accounting for file
> +				locks.  This prevents task groups from
> +				exceeding their memcg allocation limits.
> +				However, it may cause slowdowns in the
> +				flock() system call.
> +			off:	Disable kernel memory accounting for
> +				file locks.  This may allow a rogue task
> +				to DoS the system by forcing the kernel
> +				to allocate memory beyond the task
> +				group's memcg limits.  Not recommended
> +				unless you have trusted user space.
> +
>  	floppy=		[HW]
>  			See Documentation/admin-guide/blockdev/floppy.rst.
>  
> diff --git a/fs/locks.c b/fs/locks.c
> index cc7c117ee192..235ac56c557d 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2905,15 +2905,41 @@ static int __init proc_locks_init(void)
>  fs_initcall(proc_locks_init);
>  #endif
>  
> +static bool flock_accounting __ro_after_init = true;
> +
> +static int __init flock_accounting_cmdline(char *str)
> +{
> +	if (!str)
> +		return -EINVAL;
> +
> +	if (!strcmp(str, "off"))
> +		flock_accounting = false;
> +	else if (!strcmp(str, "on"))
> +		flock_accounting = true;
> +	else
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +early_param("flock_accounting", flock_accounting_cmdline);
> +
> +#define FLOCK_ACCOUNTING_MSG "WARNING: File lock accounting is disabled, container-triggered host memory exhaustion possible!\n"
> +
>  static int __init filelock_init(void)
>  {
>  	int i;
> +	slab_flags_t flags = SLAB_PANIC;
> +
> +	if (!flock_accounting)
> +		pr_err(FLOCK_ACCOUNTING_MSG);
> +	else
> +		flags |= SLAB_ACCOUNT;
>  
>  	flctx_cache = kmem_cache_create("file_lock_ctx",
> -			sizeof(struct file_lock_context), 0, SLAB_PANIC, NULL);
> +			sizeof(struct file_lock_context), 0, flags, NULL);
>  
>  	filelock_cache = kmem_cache_create("file_lock_cache",
> -			sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
> +			sizeof(struct file_lock), 0, flags, NULL);
>  
>  	for_each_possible_cpu(i) {
>  		struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);

I'm really not a fan of tunables or different kconfig options,
especially for something niche like this.

I also question whether this accounting will show up under any real-
world workloads, and whether it was just wrong to revert those patches
back in 2021.

File locking is an activity where we inherently expect to block. Ideally
we don't if the lock is uncontended of course, but it's always a
possibility.

The benchmark that flagged the regression basically just tries to
create and release a bunch of file locks as quickly as possible.
Legitimate applications that do a lot of very rapid locking like this
benchmark are basically non-existent. Usually the pattern is:

    acquire lock
    do some (relatively slow) I/O
    release lock
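
i.e. something like this minimal sketch (made-up helper, error handling
elided):

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

static void update_shared_file(int fd, const char *buf, size_t len)
{
	flock(fd, LOCK_EX);		/* acquire lock (may block) */
	pwrite(fd, buf, len, 0);	/* relatively slow I/O */
	fsync(fd);
	flock(fd, LOCK_UN);		/* release lock */
}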

In that sort of scenario, is this memcg accounting more than just line
noise? I wonder whether we should just bite the bullet and see whether
there are any real workloads that suffer due to SLAB_ACCOUNT being
enabled on these caches?
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 19:00   ` Jeff Layton
@ 2024-01-17 19:39     ` Josh Poimboeuf
  2024-01-17 20:20       ` Linus Torvalds
  2024-01-18  9:49     ` Michal Hocko
  1 sibling, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2024-01-17 19:39 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds, Chuck Lever, Shakeel Butt, Roman Gushchin,
	Johannes Weiner, Michal Hocko, linux-kernel, Jens Axboe,
	Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, Jan 17, 2024 at 02:00:55PM -0500, Jeff Layton wrote:
> I'm really not a fan of tunables or different kconfig options,
> especially for something niche like this.
> 
> I also question whether this accounting will show up under any real-
> world workloads, and whether it was just wrong to revert those patches
> back in 2021.
> 
> File locking is an activity where we inherently expect to block. Ideally
> we don't if the lock is uncontended of course, but it's always a
> possibility.
> 
> The benchmark that flagged the regression basically just tries to
> create and release a bunch of file locks as quickly as possible.
> Legitimate applications that do a lot of very rapid locking like this
> benchmark are basically non-existent. Usually the pattern is:
> 
>     acquire lock
>     do some (relatively slow) I/O
>     release lock
> 
> In that sort of scenario, is this memcg accounting more than just line
> noise? I wonder whether we should just bite the bullet and see whether
> there are any real workloads that suffer due to SLAB_ACCOUNT being
> enabled on these caches?

That's a good point.  If the microbenchmark isn't likely to be even
remotely realistic, maybe we should just revert the revert until if/when
somebody shows a real world impact.

Linus, any objections to that?

-- 
Josh


* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 19:39     ` Josh Poimboeuf
@ 2024-01-17 20:20       ` Linus Torvalds
  2024-01-17 21:02         ` Shakeel Butt
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2024-01-17 20:20 UTC (permalink / raw)
  To: Josh Poimboeuf, Vlastimil Babka
  Cc: Jeff Layton, Chuck Lever, Shakeel Butt, Roman Gushchin,
	Johannes Weiner, Michal Hocko, linux-kernel, Jens Axboe,
	Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> That's a good point.  If the microbenchmark isn't likely to be even
> remotely realistic, maybe we should just revert the revert until if/when
> somebody shows a real world impact.
>
> Linus, any objections to that?

We use SLAB_ACCOUNT for much more common allocations like queued
signals, so I would tend to agree with Jeff that it's probably just
some not very interesting microbenchmark that shows any file locking
effects from SLAB_ACCOUNT, not any real use.

That said, those benchmarks do matter. It's very easy to say "not
relevant in the big picture" and then the end result is that
everything is a bit of a pig.

And the regression was absolutely *ENORMOUS*. We're not talking "a few
percent". We're talking a 33% regression that caused the revert:

   https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/

I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
single allocation, it would be much nicer to account at a bigger
granularity, possibly by having per-thread counters first before
falling back to the obj_cgroup_charge. Whatever.
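
Something along these lines, purely as a sketch (made-up names,
preemption and objcg refcounting elided, and it assumes size <=
PAGE_SIZE per object):

#include <linux/gfp.h>
#include <linux/memcontrol.h>
#include <linux/percpu.h>

struct precharge {
	struct obj_cgroup *objcg;
	unsigned int bytes;	/* charged upstream, not yet handed out */
};

static DEFINE_PER_CPU(struct precharge, precharge);

static int objcg_charge_batched(struct obj_cgroup *objcg, gfp_t gfp,
				size_t size)
{
	struct precharge *pc = this_cpu_ptr(&precharge);

	/* Fast path: consume from the local batch, no atomics. */
	if (pc->objcg == objcg && pc->bytes >= size) {
		pc->bytes -= size;
		return 0;
	}

	/*
	 * Slow path: refill a whole page worth of charge at once.  (A
	 * real version would return any leftover charge from a
	 * previous objcg here instead of just dropping it.)
	 */
	if (obj_cgroup_charge(objcg, gfp, PAGE_SIZE))
		return -ENOMEM;

	pc->objcg = objcg;
	pc->bytes = PAGE_SIZE - size;
	return 0;
}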

It's kind of stupid to have a benchmark that just allocates and
deallocates a file lock in quick succession spend lots of time
incrementing and decrementing cgroup charges for that repeated
alloc/free.

However, that problem with SLAB_ACCOUNT is not the fault of file
locking, but more of a slab issue.

End result: I think we should bring in Vlastimil and whoever else is
doing SLAB_ACCOUNT things, and have them look at that side.

And then just enable SLAB_ACCOUNT for file locks. But very much look
at silly costs in SLAB_ACCOUNT first, at least for trivial
"alloc/free" patterns..

Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
file lock caches") for the history here.

                 Linus


* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 20:20       ` Linus Torvalds
@ 2024-01-17 21:02         ` Shakeel Butt
  2024-01-17 22:20           ` Roman Gushchin
  2024-01-17 21:19         ` Vlastimil Babka
  2024-01-17 21:50         ` Roman Gushchin
  2 siblings, 1 reply; 20+ messages in thread
From: Shakeel Butt @ 2024-01-17 21:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Vlastimil Babka, Jeff Layton, Chuck Lever,
	Roman Gushchin, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > That's a good point.  If the microbenchmark isn't likely to be even
> > remotely realistic, maybe we should just revert the revert until if/when
> > somebody shows a real world impact.
> >
> > Linus, any objections to that?
>
> We use SLAB_ACCOUNT for much more common allocations like queued
> signals, so I would tend to agree with Jeff that it's probably just
> some not very interesting microbenchmark that shows any file locking
> effects from SLAB_ACCOUNT, not any real use.
>
> That said, those benchmarks do matter. It's very easy to say "not
> relevant in the big picture" and then the end result is that
> everything is a bit of a pig.
>
> And the regression was absolutely *ENORMOUS*. We're not talking "a few
> percent". We're talking a 33% regression that caused the revert:
>
>    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
>
> I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> single allocation, it would be much nicer to account at a bigger
> granularity, possibly by having per-thread counters first before
> falling back to the obj_cgroup_charge. Whatever.
>
> It's kind of stupid to have a benchmark that just allocates and
> deallocates a file lock in quick succession spend lots of time
> incrementing and decrementing cgroup charges for that repeated
> alloc/free.
>
> However, that problem with SLAB_ACCOUNT is not the fault of file
> locking, but more of a slab issue.
>
> End result: I think we should bring in Vlastimil and whoever else is
> doing SLAB_ACCOUNT things, and have them look at that side.
>
> And then just enable SLAB_ACCOUNT for file locks. But very much look
> at silly costs in SLAB_ACCOUNT first, at least for trivial
> "alloc/free" patterns..
>
> Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> file lock caches") for the history here.
>

Roman last looked into optimizing this code path. I suspect
mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
try to measure this path and see if I can improve it.


* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 20:20       ` Linus Torvalds
  2024-01-17 21:02         ` Shakeel Butt
@ 2024-01-17 21:19         ` Vlastimil Babka
  2024-01-17 21:50         ` Roman Gushchin
  2 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2024-01-17 21:19 UTC (permalink / raw)
  To: Linus Torvalds, Josh Poimboeuf
  Cc: Jeff Layton, Chuck Lever, Shakeel Butt, Roman Gushchin,
	Johannes Weiner, Michal Hocko, linux-kernel, Jens Axboe,
	Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On 1/17/24 21:20, Linus Torvalds wrote:
> On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>>
>> That's a good point.  If the microbenchmark isn't likely to be even
>> remotely realistic, maybe we should just revert the revert until if/when
>> somebody shows a real world impact.
>>
>> Linus, any objections to that?
> 
> We use SLAB_ACCOUNT for much more common allocations like queued
> signals, so I would tend to agree with Jeff that it's probably just
> some not very interesting microbenchmark that shows any file locking
> effects from SLAB_ACCOUNT, not any real use.
> 
> That said, those benchmarks do matter. It's very easy to say "not
> relevant in the big picture" and then the end result is that
> everything is a bit of a pig.
> 
> And the regression was absolutely *ENORMOUS*. We're not talking "a few
> percent". We're talking a 33% regression that caused the revert:
> 
>    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> 
> I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> single allocation, it would be much nicer to account at a bigger
> granularity, possibly by having per-thread counters first before
> falling back to the obj_cgroup_charge. Whatever.

Counters are one thing (afaik some batching happens on the memcg side via
"stocks"), but another is associating the memcg with the allocated objects
in slab pages, so kmem_cache_free() knows which counter to decrement. We'll
have to see where the overhead is today.

If there's overhead due to calls between mm/slub.c and mm/memcontrol.c we
can now reduce that with SLAB gone.

> It's kind of stupid to have a benchmark that just allocates and
> deallocates a file lock in quick succession spend lots of time
> incrementing and decrementing cgroup charges for that repeated
> alloc/free.
> 
> However, that problem with SLAB_ACCOUNT is not the fault of file
> locking, but more of a slab issue.
> 
> End result: I think we should bring in Vlastimil and whoever else is
> doing SLAB_ACCOUNT things, and have them look at that side.

Roman and Shakeel are already Cc'd. Roman recently did
https://lore.kernel.org/lkml/20231019225346.1822282-1-roman.gushchin@linux.dev/
which is mentioned in the cover letter and was merged in 6.7, but the
cover letter says it didn't help much, too bad.  So is it still 33%, or
how much?

> And then just enable SLAB_ACCOUNT for file locks. But very much look
> at silly costs in SLAB_ACCOUNT first, at least for trivial
> "alloc/free" patterns..
> 
> Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> file lock caches") for the history here.
> 
>                  Linus



* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 20:20       ` Linus Torvalds
  2024-01-17 21:02         ` Shakeel Butt
  2024-01-17 21:19         ` Vlastimil Babka
@ 2024-01-17 21:50         ` Roman Gushchin
  2 siblings, 0 replies; 20+ messages in thread
From: Roman Gushchin @ 2024-01-17 21:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Vlastimil Babka, Jeff Layton, Chuck Lever,
	Shakeel Butt, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, Jan 17, 2024 at 12:20:59PM -0800, Linus Torvalds wrote:
> On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > That's a good point.  If the microbenchmark isn't likely to be even
> > remotely realistic, maybe we should just revert the revert until if/when
> > somebody shows a real world impact.
> >
> > Linus, any objections to that?
> 
> We use SLAB_ACCOUNT for much more common allocations like queued
> signals, so I would tend to agree with Jeff that it's probably just
> some not very interesting microbenchmark that shows any file locking
> effects from SLAB_ACCOUNT, not any real use.
> 
> That said, those benchmarks do matter. It's very easy to say "not
> relevant in the big picture" and then the end result is that
> everything is a bit of a pig.
> 
> And the regression was absolutely *ENORMOUS*. We're not talking "a few
> percent". We're talking a 33% regression that caused the revert:
> 
>    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> 
> I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> single allocation, it would be much nicer to account at a bigger
> granularity, possibly by having per-thread counters first before
> falling back to the obj_cgroup_charge. Whatever.
> 
> It's kind of stupid to have a benchmark that just allocates and
> deallocates a file lock in quick succession spend lots of time
> incrementing and decrementing cgroup charges for that repeated
> alloc/free.
> 
> However, that problem with SLAB_ACCOUNT is not the fault of file
> locking, but more of a slab issue.
> 
> End result: I think we should bring in Vlastimil and whoever else is
> doing SLAB_ACCOUNT things, and have them look at that side.
> 
> And then just enable SLAB_ACCOUNT for file locks. But very much look
> at silly costs in SLAB_ACCOUNT first, at least for trivial
> "alloc/free" patterns..
> 
> Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> thing?

Probably me.

I recently did some work on improving the kmem accounting performance,
which is mentioned in this thread and shaves off about 30%:
https://lore.kernel.org/lkml/20231019225346.1822282-1-roman.gushchin@linux.dev/

Overall the SLAB_ACCOUNT overhead looks big on micro-benchmarks simply
because the SLAB allocation path is really fast, so even touching a
per-cpu variable adds
a noticeable overhead. There is nothing particularly slow on the kmem allocation
and release paths, but saving a memcg/objcg pointer, bumping the charge
and stats adds up, even though we have batching in place.

I believe the only real way to make it significantly faster is to cache
pre-charged slab objects, but it adds to the complexity and increases the memory
footprint.  So far it has all been about micro-benchmarks; I haven't
seen any complaints about the performance of real workloads.

Thanks!


* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 21:02         ` Shakeel Butt
@ 2024-01-17 22:20           ` Roman Gushchin
  2024-01-17 22:56             ` Shakeel Butt
  2024-01-19  7:47             ` Shakeel Butt
  0 siblings, 2 replies; 20+ messages in thread
From: Roman Gushchin @ 2024-01-17 22:20 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Linus Torvalds, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
	Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, Jan 17, 2024 at 01:02:19PM -0800, Shakeel Butt wrote:
> On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > That's a good point.  If the microbenchmark isn't likely to be even
> > > remotely realistic, maybe we should just revert the revert until if/when
> > > somebody shows a real world impact.
> > >
> > > Linus, any objections to that?
> >
> > We use SLAB_ACCOUNT for much more common allocations like queued
> > signals, so I would tend to agree with Jeff that it's probably just
> > some not very interesting microbenchmark that shows any file locking
> > effects from SLAB_ACCOUNT, not any real use.
> >
> > That said, those benchmarks do matter. It's very easy to say "not
> > relevant in the big picture" and then the end result is that
> > everything is a bit of a pig.
> >
> > And the regression was absolutely *ENORMOUS*. We're not talking "a few
> > percent". We're talking a 33% regression that caused the revert:
> >
> >    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> >
> > I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> > single allocation, it would be much nicer to account at a bigger
> > granularity, possibly by having per-thread counters first before
> > falling back to the obj_cgroup_charge. Whatever.
> >
> > It's kind of stupid to have a benchmark that just allocates and
> > deallocates a file lock in quick succession spend lots of time
> > incrementing and decrementing cgroup charges for that repeated
> > alloc/free.
> >
> > However, that problem with SLAB_ACCOUNT is not the fault of file
> > locking, but more of a slab issue.
> >
> > End result: I think we should bring in Vlastimil and whoever else is
> > doing SLAB_ACCOUNT things, and have them look at that side.
> >
> > And then just enable SLAB_ACCOUNT for file locks. But very much look
> > at silly costs in SLAB_ACCOUNT first, at least for trivial
> > "alloc/free" patterns..
> >
> > Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> > thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> > file lock caches") for the history here.
> >
> 
> Roman last looked into optimizing this code path. I suspect
> mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
> try to measure this path and see if I can improve it.

It's roughly an equal split between mod_objcg_state() and obj_cgroup_charge().
And each is comparable (by order of magnitude) to the slab allocation cost
itself. On the free() path a significant cost comes simply from reading
the objcg pointer (it's usually a cache miss).

So I don't see how we can make it really cheap (say, less than 5% overhead)
without caching pre-accounted objects.

I thought about merging the charge and stats handling paths, which
_maybe_ can shave off another 20-30%, but there will still be a
double-digit percent accounting overhead.

I'm curious to hear other ideas and suggestions.

Thanks!


* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 22:20           ` Roman Gushchin
@ 2024-01-17 22:56             ` Shakeel Butt
  2024-01-22  5:10               ` Linus Torvalds
  2024-01-19  7:47             ` Shakeel Butt
  1 sibling, 1 reply; 20+ messages in thread
From: Shakeel Butt @ 2024-01-17 22:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Linus Torvalds, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
	Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, Jan 17, 2024 at 2:20 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Jan 17, 2024 at 01:02:19PM -0800, Shakeel Butt wrote:
> > On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > > >
> > > > That's a good point.  If the microbenchmark isn't likely to be even
> > > > remotely realistic, maybe we should just revert the revert until if/when
> > > > somebody shows a real world impact.
> > > >
> > > > Linus, any objections to that?
> > >
> > > We use SLAB_ACCOUNT for much more common allocations like queued
> > > signals, so I would tend to agree with Jeff that it's probably just
> > > some not very interesting microbenchmark that shows any file locking
> > > effects from SLAB_ACCOUNT, not any real use.
> > >
> > > That said, those benchmarks do matter. It's very easy to say "not
> > > relevant in the big picture" and then the end result is that
> > > everything is a bit of a pig.
> > >
> > > And the regression was absolutely *ENORMOUS*. We're not talking "a few
> > > percent". We're talking a 33% regression that caused the revert:
> > >
> > >    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> > >
> > > I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> > > single allocation, it would be much nicer to account at a bigger
> > > granularity, possibly by having per-thread counters first before
> > > falling back to the obj_cgroup_charge. Whatever.
> > >
> > > It's kind of stupid to have a benchmark that just allocates and
> > > deallocates a file lock in quick succession spend lots of time
> > > incrementing and decrementing cgroup charges for that repeated
> > > alloc/free.
> > >
> > > However, that problem with SLAB_ACCOUNT is not the fault of file
> > > locking, but more of a slab issue.
> > >
> > > End result: I think we should bring in Vlastimil and whoever else is
> > > doing SLAB_ACCOUNT things, and have them look at that side.
> > >
> > > And then just enable SLAB_ACCOUNT for file locks. But very much look
> > > at silly costs in SLAB_ACCOUNT first, at least for trivial
> > > "alloc/free" patterns..
> > >
> > > Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> > > thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> > > file lock caches") for the history here.
> > >
> >
> > Roman last looked into optimizing this code path. I suspect
> > mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
> > try to measure this path and see if I can improve it.
>
> It's roughly an equal split between mod_objcg_state() and obj_cgroup_charge().
> And each is comparable (by order of magnitude) to the slab allocation cost
> itself. On the free() path a significant cost comes simply from reading
> the objcg pointer (it's usually a cache miss).
>
> So I don't see how we can make it really cheap (say, less than 5% overhead)
> without caching pre-accounted objects.

Maybe this is what we want. Now that we are down to just SLUB, maybe such
caching of pre-accounted objects can live in the SLUB layer, and we can
decide whether to make this caching per-kmem-cache opt-in or always on.

>
> I thought about merging the charge and stats handling paths, which _maybe_
> can shave off another 20-30%, but there would still be a double-digit
> percentage accounting overhead.
>
> I'm curious to hear other ideas and suggestions.
>
> Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off'
  2024-01-17 16:14 ` [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off' Josh Poimboeuf
@ 2024-01-18  9:04   ` Michal Koutný
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Koutný @ 2024-01-18  9:04 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Jeff Layton, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Waiman Long, Muchun Song,
	Jiri Kosina, cgroups, linux-mm

On Wed, Jan 17, 2024 at 08:14:46AM -0800, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> Allow flock cache accounting to be disabled with 'mitigations=off', as
> it fits the profile for that option: trusted user space combined with a
> performance-impacting mitigation.

Note that some other kernel objects without any tight limit are already
charged too (though their charging likely did not stand out in any
performance regression tests).
In the situation you describe, users can already pass
`cgroup.memory=nokmem` and get rid of the charging overhead in general.

IOW, if flock objects are charged, there already is a boot option to
turn off such behavior.
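
For example, booting with the following command-line parameter disables
kernel memory accounting (and with it this charging) entirely:

    cgroup.memory=nokmem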

Regards,
Michal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 19:00   ` Jeff Layton
  2024-01-17 19:39     ` Josh Poimboeuf
@ 2024-01-18  9:49     ` Michal Hocko
  1 sibling, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2024-01-18  9:49 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Josh Poimboeuf, Linus Torvalds, Chuck Lever, Shakeel Butt,
	Roman Gushchin, Johannes Weiner, linux-kernel, Jens Axboe,
	Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed 17-01-24 14:00:55, Jeff Layton wrote:
> I'm really not a fan of tunables or different kconfig options,
> especially for something niche like this.

I completely agree with that. We do have plethora of kernel objects
which are accounted and we really do not want to have a kernel cmd line
option for many/each of them. Kernel memory accounting can be disabled
through kernel cmd line already. 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 22:20           ` Roman Gushchin
  2024-01-17 22:56             ` Shakeel Butt
@ 2024-01-19  7:47             ` Shakeel Butt
  1 sibling, 0 replies; 20+ messages in thread
From: Shakeel Butt @ 2024-01-19  7:47 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Linus Torvalds, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
	Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, Jan 17, 2024 at 2:20 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Jan 17, 2024 at 01:02:19PM -0800, Shakeel Butt wrote:
> > On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > > >
> > > > That's a good point.  If the microbenchmark isn't likely to be even
> > > > remotely realistic, maybe we should just revert the revert until if/when
> > > > somebody shows a real world impact.
> > > >
> > > > Linus, any objections to that?
> > >
> > > We use SLAB_ACCOUNT for much more common allocations like queued
> > > signals, so I would tend to agree with Jeff that it's probably just
> > > some not very interesting microbenchmark that shows any file locking
> > > effects from SLAB_ACCOUNT, not any real use.
> > >
> > > That said, those benchmarks do matter. It's very easy to say "not
> > > relevant in the big picture" and then the end result is that
> > > everything is a bit of a pig.
> > >
> > > And the regression was absolutely *ENORMOUS*. We're not talking "a few
> > > percent". We're talking a 33% regression that caused the revert:
> > >
> > >    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> > >
> > > I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> > > single allocation, it would be much nicer to account at a bigger
> > > granularity, possibly by having per-thread counters first before
> > > falling back to the obj_cgroup_charge. Whatever.
> > >
> > > It's kind of stupid to have a benchmark that just allocates and
> > > deallocates a file lock in quick succession spend lots of time
> > > incrementing and decrementing cgroup charges for that repeated
> > > alloc/free.
> > >
> > > However, that problem with SLAB_ACCOUNT is not the fault of file
> > > locking, but more of a slab issue.
> > >
> > > End result: I think we should bring in Vlastimil and whoever else is
> > > doing SLAB_ACCOUNT things, and have them look at that side.
> > >
> > > And then just enable SLAB_ACCOUNT for file locks. But very much look
> > > at silly costs in SLAB_ACCOUNT first, at least for trivial
> > > "alloc/free" patterns..
> > >
> > > Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> > > thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> > > file lock caches") for the history here.
> > >
> >
> > Roman last looked into optimizing this code path. I suspect
> > mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
> > try to measure this path and see if I can improve it.
>
> It's roughly an equal split between mod_objcg_state() and obj_cgroup_charge().
> And each is comparable (by order of magnitude) to the slab allocation cost
> itself. On the free() path a significant cost comes simply from reading
> the objcg pointer (it's usually a cache miss).
>
> So I don't see how we can make it really cheap (say, less than 5% overhead)
> without caching pre-accounted objects.
>
> I thought about merging the charge and stats handling paths, which _maybe_
> can shave off another 20-30%, but there would still be a double-digit
> percentage accounting overhead.
>
> I'm curious to hear other ideas and suggestions.
>
> Thanks!

I profiled (perf record -a) the same benchmark, i.e. lock1_processes, on
an Ice Lake machine with 72 cores and got the following results:

  12.72%  lock1_processes  [kernel.kallsyms]   [k] mod_objcg_state
  10.89%  lock1_processes  [kernel.kallsyms]   [k] kmem_cache_free
   8.40%  lock1_processes  [kernel.kallsyms]   [k] slab_post_alloc_hook
   8.36%  lock1_processes  [kernel.kallsyms]   [k] kmem_cache_alloc
   5.18%  lock1_processes  [kernel.kallsyms]   [k] refill_obj_stock
   5.18%  lock1_processes  [kernel.kallsyms]   [k] _copy_from_user

On annotating mod_objcg_state(), the following irq-disabling
instructions take 30% of its time.

  6.64 │       pushfq
 10.26│       popq   -0x38(%rbp)
  6.05 │       mov    -0x38(%rbp),%rcx
  7.60 │       cli
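
Those four instructions are the standard x86 lowering of
local_irq_save(): the flags register is saved to a stack slot and
interrupts are then disabled. Schematically, assuming this is the usual
per-cpu update pattern inside mod_objcg_state():

    unsigned long flags;

    local_irq_save(flags);          /* pushfq; pop/mov of saved flags; cli */
    /* ... update the per-cpu objcg state and stats ... */
    local_irq_restore(flags);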

For kmem_cache_free() & kmem_cache_alloc(), the following instruction
was expensive; it corresponds to __update_cpu_freelist_fast().

 16.33 │      cmpxchg16b %gs:(%rsi)

For slab_post_alloc_hook(), the cost is spread all over the place, and
refill_obj_stock() looks very similar to mod_objcg_state().

I will dig more in the next couple of days.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-17 22:56             ` Shakeel Butt
@ 2024-01-22  5:10               ` Linus Torvalds
  2024-01-22 17:38                 ` Shakeel Butt
  2024-01-26  9:50                 ` Vlastimil Babka
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2024-01-22  5:10 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Roman Gushchin, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
	Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@google.com> wrote:
> >
> > So I don't see how we can make it really cheap (say, less than 5% overhead)
> > without caching pre-accounted objects.
>
> Maybe this is what we want. Now that we are down to just SLUB, maybe such
> caching of pre-accounted objects can live in the SLUB layer, and we can
> decide whether to make this caching per-kmem-cache opt-in or always on.

So it turns out that we have another case of SLAB_ACCOUNT being quite
a big expense, and it's actually the normal - but failed - open() or
execve() case.

See the thread at

    https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/

and to see the effect in profiles, you can use this EXTREMELY stupid
test program:

    #include <fcntl.h>

    int main(int argc, char **argv)
    {
        for (int i = 0; i < 10000000; i++)
                open("nonexistent", O_RDONLY);
    }

where the point of course is that the "nonexistent" pathname doesn't
actually exist (so don't create a file called that for the test).

What happens is that open() allocates a 'struct file *' early from the
filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
for it, fail the pathname open, and free it again, which uncharges
the accounting.

Now, in this case, I actually have a suggestion: could we please just
make SLAB_ACCOUNT be something that we do *after* the allocation, kind
of the same way the zeroing works?

IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
slab_post_alloc_hook() do all the "charge the memcg if required".

Obviously that means that now a failure to charge the memcg would have
to then de-allocate things, but that's an uncommon path and would be
marked unlikely and not be in the hot path at all.

Now, the reason I would prefer that is that the *second* step would be to

 (a) expose a "kmem_cache_charge()" function that takes a
*non*-accounted slab allocation, and turns it into an accounted one
(and obviously this is why you want to do everything in the post-alloc
hook: just try to share this code)

 (b) remove the SLAB_ACCOUNT from the filp_cachep, making all file
allocations start out unaccounted.

 (c) when we have *actually* looked up the pathname and open the file
successfully, at *that* point we'd do a

        error = kmem_cache_charge(filp_cachep, file);

    in do_dentry_open() to turn the unaccounted file pointer into an
accounted one (and if that fails, we do the cleanup and free it, of
course, exactly like we do when file_get_write_access() fails)

which means that now the failure case doesn't unnecessarily charge the
allocation that never ends up being finalized.

NOTE! I think this would clean up mm/slub.c too, simply because it
would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
rid of the need to carry the "struct obj_cgroup **objcgp" pointer
along until the post-alloc hook: everything would be done post-alloc.

The actual kmem_cache_free() code already deals with "this slab hasn't
been accounted" because it obviously has to deal with allocations that
were done without __GFP_ACCOUNT anyway. So there's no change needed on
the freeing path, it already has to handle this all gracefully.

I may be missing something, but it would seem to have very little
downside, and fix a case that actually is visible right now.

              Linus
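
A minimal sketch of what the proposed kmem_cache_charge() could look
like. This is hypothetical: no such function exists at this point, and
memcg_charge_obj() below is a made-up stand-in for the shared
post-alloc charging code:

    /*
     * Turn an unaccounted object into an accounted one by charging it
     * to the current memcg.  On failure the caller frees the object,
     * as on any other error path.
     */
    int kmem_cache_charge(struct kmem_cache *s, gfp_t gfpflags, void *obj)
    {
            if (!memcg_kmem_online())
                    return 0;

            /*
             * Stand-in for the work slab_post_alloc_hook() does for
             * __GFP_ACCOUNT allocations: look up the current objcg,
             * charge s->size bytes, and record the objcg pointer for
             * the object so kmem_cache_free() can later uncharge it.
             */
            if (!memcg_charge_obj(s, gfpflags, obj))
                    return -ENOMEM;

            return 0;
    }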

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-22  5:10               ` Linus Torvalds
@ 2024-01-22 17:38                 ` Shakeel Butt
  2024-01-26  9:50                 ` Vlastimil Babka
  1 sibling, 0 replies; 20+ messages in thread
From: Shakeel Butt @ 2024-01-22 17:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Gushchin, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
	Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel,
	Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm

Hi Linus,

On Sun, Jan 21, 2024 at 9:16 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > So I don't see how we can make it really cheap (say, less than 5% overhead)
> > > without caching pre-accounted objects.
> >
> > Maybe this is what we want. Now that we are down to just SLUB, maybe such
> > caching of pre-accounted objects can live in the SLUB layer, and we can
> > decide whether to make this caching per-kmem-cache opt-in or always on.
>
> So it turns out that we have another case of SLAB_ACCOUNT being quite
> a big expense, and it's actually the normal - but failed - open() or
> execve() case.
>
> See the thread at
>
>     https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/
>
> and to see the effect in profiles, you can use this EXTREMELY stupid
> test program:
>
>     #include <fcntl.h>
>
>     int main(int argc, char **argv)
>     {
>         for (int i = 0; i < 10000000; i++)
>                 open("nonexistent", O_RDONLY);
>     }
>
> where the point of course is that the "nonexistent" pathname doesn't
> actually exist (so don't create a file called that for the test).
>
> What happens is that open() allocates a 'struct file *' early from the
> filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
> for it, fail the pathname open, and free it again, which uncharges
> the accounting.
>
> Now, in this case, I actually have a suggestion: could we please just
> make SLAB_ACCOUNT be something that we do *after* the allocation, kind
> of the same way the zeroing works?
>
> IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
> slab_post_alloc_hook() do all the "charge the memcg if required".
>
> Obviously that means that now a failure to charge the memcg would have
> to then de-allocate things, but that's an uncommon path and would be
> marked unlikely and not be in the hot path at all.
>
> Now, the reason I would prefer that is that the *second* step would be to
>
>  (a) expose a "kmem_cache_charge()" function that takes a
> *non*-accounted slab allocation, and turns it into an accounted one
> (and obviously this is why you want to do everything in the post-alloc
> hook: just try to share this code)
>
>  (b) remove the SLAB_ACCOUNT from the filp_cachep, making all file
> allocations start out unaccounted.
>
>  (c) when we have *actually* looked up the pathname and open the file
> successfully, at *that* point we'd do a
>
>         error = kmem_cache_charge(filp_cachep, file);
>
>     in do_dentry_open() to turn the unaccounted file pointer into an
> accounted one (and if that fails, we do the cleanup and free it, of
> course, exactly like we do when file_get_write_access() fails)
>
> which means that now the failure case doesn't unnecessarily charge the
> allocation that never ends up being finalized.
>
> NOTE! I think this would clean up mm/slub.c too, simply because it
> would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
> rid of the need to carry the "struct obj_cgroup **objcgp" pointer
> along until the post-alloc hook: everything would be done post-alloc.
>
> The actual kmem_cache_free() code already deals with "this slab hasn't
> been accounted" because it obviously has to deal with allocations that
> were done without __GFP_ACCOUNT anyway. So there's no change needed on
> the freeing path, it already has to handle this all gracefully.
>
> I may be missing something, but it would seem to have very little
> downside, and fix a case that actually is visible right now.
>

Thanks for the idea. Actually, I had a discussion with Vlastimil at LPC
'22 on a similar idea, but for a different problem: allocation in irq
context, or more specifically, charging sockets for incoming TCP
connections. If you look at inet_csk_accept(), you'll see we solved this
for TCP memory by charging at accept() syscall time, but the kernel
memory holding the socket itself is still uncharged.
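
For reference, the pattern described above, roughly as it appears in
inet_csk_accept() (heavily abridged; error handling and the exact
computation of `amt` are elided): the socket is allocated unaccounted
in softirq context and only charged once accept() runs in process
context:

    if (mem_cgroup_sockets_enabled) {
            /* Associate the new socket with the accepting task's memcg. */
            mem_cgroup_sk_alloc(newsk);
            if (newsk->sk_memcg) {
                    /* Charge the memory the socket consumed before accept(). */
                    amt = sk_mem_pages(...);        /* abridged */
                    if (amt)
                            mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
                                                    GFP_KERNEL | __GFP_NOFAIL);
            }
    }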

So I think having the framework you suggested would help more use
cases. I will take a stab at this in the next couple of days (sorry,
I'm stuck in some other stuff atm).

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-22  5:10               ` Linus Torvalds
  2024-01-22 17:38                 ` Shakeel Butt
@ 2024-01-26  9:50                 ` Vlastimil Babka
  2024-01-30 11:04                   ` Vlastimil Babka
  1 sibling, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2024-01-26  9:50 UTC (permalink / raw)
  To: Linus Torvalds, Shakeel Butt
  Cc: Roman Gushchin, Josh Poimboeuf, Jeff Layton, Chuck Lever,
	Johannes Weiner, Michal Hocko, linux-kernel, Jens Axboe,
	Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm, Howard McLauchlan,
	bpf, Jens Axboe

On 1/22/24 06:10, Linus Torvalds wrote:
> On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@google.com> wrote:
>> >
>> > So I don't see how we can make it really cheap (say, less than 5% overhead)
>> > without caching pre-accounted objects.
>>
>> Maybe this is what we want. Now that we are down to just SLUB, maybe such
>> caching of pre-accounted objects can live in the SLUB layer, and we can
>> decide whether to make this caching per-kmem-cache opt-in or always on.
> 
> So it turns out that we have another case of SLAB_ACCOUNT being quite
> a big expense, and it's actually the normal - but failed - open() or
> execve() case.
> 
> See the thread at
> 
>     https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/
> 
> and to see the effect in profiles, you can use this EXTREMELY stupid
> test program:
> 
>     #include <fcntl.h>
> 
>     int main(int argc, char **argv)
>     {
>         for (int i = 0; i < 10000000; i++)
>                 open("nonexistent", O_RDONLY);
>     }

This reminded me that I can see should_failslab() in the profiles (1.43%,
plus the overhead in its caller) even if it does nothing at all; it's been
completely unconditional since commit 4f6923fbb352 ("mm: make
should_failslab always available for fault injection").

We discussed it briefly when Jens tried to change it in [1] to depend on
CONFIG_FAILSLAB again. But now I think it should even be possible to leave
it always available, but behind a static key. BPF, or whoever else uses
these error-injection hooks, would have to track how many users of them
are active and manage the static key accordingly. Then it could be always
available, yet have no overhead when there's no active user. Other similar
hooks could benefit from such an approach too?

[1]
https://lore.kernel.org/all/e01e5e40-692a-519c-4cba-e3331f173c82@kernel.dk/#t


> where the point of course is that the "nonexistent" pathname doesn't
> actually exist (so don't create a file called that for the test).
> 
> What happens is that open() allocates a 'struct file *' early from the
> filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
> for it, fail the pathname open, and free it again, which uncharges
> the accounting.
> 
> Now, in this case, I actually have a suggestion: could we please just
> make SLAB_ACCOUNT be something that we do *after* the allocation, kind
> of the same way the zeroing works?
> 
> IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
> slab_post_alloc_hook() do all the "charge the memcg if required".
> 
> Obviously that means that now a failure to charge the memcg would have
> to then de-allocate things, but that's an uncommon path and would be
> marked unlikely and not be in the hot path at all.
> 
> Now, the reason I would prefer that is that the *second* step would be to
> 
>  (a) expose a "kmem_cache_charge()" function that takes a
> *non*-accounted slab allocation, and turns it into an accounted one
> (and obviously this is why you want to do everything in the post-alloc
> hook: just try to share this code)
> 
>  (b) remove the SLAB_ACCOUNT from the filp_cachep, making all file
> allocations start out unaccounted.
> 
>  (c) when we have *actually* looked up the pathname and open the file
> successfully, at *that* point we'd do a
> 
>         error = kmem_cache_charge(filp_cachep, file);
> 
>     in do_dentry_open() to turn the unaccounted file pointer into an
> accounted one (and if that fails, we do the cleanup and free it, of
> course, exactly like we do when file_get_write_access() fails)
> 
> which means that now the failure case doesn't unnecessarily charge the
> allocation that never ends up being finalized.
> 
> NOTE! I think this would clean up mm/slub.c too, simply because it
> would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
> rid of the need to carry the "struct obj_cgroup **objcgp" pointer
> along until the post-alloc hook: everything would be done post-alloc.
> 
> The actual kmem_cache_free() code already deals with "this slab hasn't
> been accounted" because it obviously has to deal with allocations that
> were done without __GFP_ACCOUNT anyway. So there's no change needed on
> the freeing path, it already has to handle this all gracefully.
> 
> I may be missing something, but it would seem to have very little
> downside, and fix a case that actually is visible right now.
> 
>               Linus


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
  2024-01-26  9:50                 ` Vlastimil Babka
@ 2024-01-30 11:04                   ` Vlastimil Babka
  0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2024-01-30 11:04 UTC (permalink / raw)
  To: Linus Torvalds, Shakeel Butt
  Cc: Roman Gushchin, Josh Poimboeuf, Jeff Layton, Chuck Lever,
	Johannes Weiner, Michal Hocko, linux-kernel, Jens Axboe,
	Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
	Muchun Song, Jiri Kosina, cgroups, linux-mm, Howard McLauchlan,
	bpf

On 1/26/24 10:50, Vlastimil Babka wrote:
> On 1/22/24 06:10, Linus Torvalds wrote:
>> On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@google.com> wrote:
>>> >
>>> > So I don't see how we can make it really cheap (say, less than 5% overhead)
>>> > without caching pre-accounted objects.
>>>
>>> Maybe this is what we want. Now that we are down to just SLUB, maybe such
>>> caching of pre-accounted objects can live in the SLUB layer, and we can
>>> decide whether to make this caching per-kmem-cache opt-in or always on.
>> 
>> So it turns out that we have another case of SLAB_ACCOUNT being quite
>> a big expense, and it's actually the normal - but failed - open() or
>> execve() case.
>> 
>> See the thread at
>> 
>>     https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/
>> 
>> and to see the effect in profiles, you can use this EXTREMELY stupid
>> test program:
>> 
>>     #include <fcntl.h>
>> 
>>     int main(int argc, char **argv)
>>     {
>>         for (int i = 0; i < 10000000; i++)
>>                 open("nonexistent", O_RDONLY);
>>     }
> 
> This reminded me that I can see should_failslab() in the profiles (1.43%,
> plus the overhead in its caller) even if it does nothing at all; it's been
> completely unconditional since commit 4f6923fbb352 ("mm: make
> should_failslab always available for fault injection").
> 
> We discussed it briefly when Jens tried to change it in [1] to depend on
> CONFIG_FAILSLAB again. But now I think it should even be possible to leave
> it always available, but behind a static key. BPF, or whoever else uses
> these error-injection hooks, would have to track how many users of them
> are active and manage the static key accordingly. Then it could be always
> available, yet have no overhead when there's no active user. Other similar
> hooks could benefit from such an approach too?
> 
> [1]
> https://lore.kernel.org/all/e01e5e40-692a-519c-4cba-e3331f173c82@kernel.dk/#t

Just for completeness, with the hunk below I've seen some 2% improvement on
the test program from Linus.
Of course something needs to operate the static key, and I'm not familiar
enough with bpf and the related machinery to know whether it already tracks
users of the injection points somewhere that the static key toggling could
be hooked into.

diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..da07b358d092 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3750,6 +3750,8 @@ noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
 }
 ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
 
+static DEFINE_STATIC_KEY_FALSE(failslab_enabled);
+
 static __fastpath_inline
 struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
                                       struct list_lru *lru,
@@ -3760,8 +3762,10 @@ struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 
        might_alloc(flags);
 
-       if (unlikely(should_failslab(s, flags)))
-               return NULL;
+       if (static_branch_unlikely(&failslab_enabled)) {
+               if (unlikely(should_failslab(s, flags)))
+                       return NULL;
+       }
 
        if (unlikely(!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags)))
                return NULL;
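
If no such tracking exists yet, the key could presumably be flipped from
the error-injection attach/detach paths. A sketch (these function names
are made up; only static_branch_inc()/static_branch_dec() and the key
from the hunk above are real):

    /* Called whenever an error-injection user attaches to should_failslab(). */
    void failslab_injection_attach(void)
    {
            static_branch_inc(&failslab_enabled);
    }

    /* Called whenever such a user detaches; the branch goes back to
     * false once the last user is gone. */
    void failslab_injection_detach(void)
    {
            static_branch_dec(&failslab_enabled);
    }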




^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-01-30 11:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-17 16:14 [PATCH RFC 0/4] Fix file lock cache accounting, again Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
2024-01-17 19:00   ` Jeff Layton
2024-01-17 19:39     ` Josh Poimboeuf
2024-01-17 20:20       ` Linus Torvalds
2024-01-17 21:02         ` Shakeel Butt
2024-01-17 22:20           ` Roman Gushchin
2024-01-17 22:56             ` Shakeel Butt
2024-01-22  5:10               ` Linus Torvalds
2024-01-22 17:38                 ` Shakeel Butt
2024-01-26  9:50                 ` Vlastimil Babka
2024-01-30 11:04                   ` Vlastimil Babka
2024-01-19  7:47             ` Shakeel Butt
2024-01-17 21:19         ` Vlastimil Babka
2024-01-17 21:50         ` Roman Gushchin
2024-01-18  9:49     ` Michal Hocko
2024-01-17 16:14 ` [PATCH RFC 2/4] fs/locks: Add CONFIG_FLOCK_ACCOUNTING Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 3/4] mitigations: Expand 'mitigations=off' to include optional software mitigations Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off' Josh Poimboeuf
2024-01-18  9:04   ` Michal Koutný
