Linux-mm Archive on lore.kernel.org
* [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
@ 2020-03-25 12:03 Vlastimil Babka
  2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Vlastimil Babka @ 2020-03-25 12:03 UTC
  To: Luis Chamberlain, Kees Cook, Iurii Zaikin
  Cc: linux-kernel, linux-api, linux-mm, Ivan Teterevkov, Michal Hocko,
	David Rientjes, Matthew Wilcox, Eric W . Biederman,
	Guilherme G . Piccoli, Vlastimil Babka

A recently proposed patch to add a vm_swappiness command line parameter in
addition to the existing sysctl [1] made me wonder why we don't have general
support for passing sysctl parameters via the command line. Googling found only
somebody else wondering the same [2], but I haven't found any prior discussion
with reasons why not to do this.

Setting the vm_swappiness issue aside (the underlying issue might be solved in
a different way), a quick search of kernel-parameters.txt shows there are already
some that exist as both sysctl and kernel parameter - hung_task_panic,
nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
would remove the need to add more of those one-offs and might be handy in
situations where configuration by e.g. /etc/sysctl.d/ is impractical.
Also, after 61a47c1ad3a4 ("sysctl: Remove the sysctl system call") the only way
to set sysctls is via procfs, so this would eventually allow small systems to be
built without CONFIG_PROC_SYSCTL and still be able to change sysctl parameters.

Hence, this patch adds a new parse_args() pass that looks for parameters
prefixed by 'sysctl.' and searches for them in the sysctl ctl_tables. When
found, the respective proc handler is invoked. The search is just a naive
linear one, to avoid using the whole procfs layer. It should be acceptable,
as the cost depends on the number of 'sysctl.' parameters passed.
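A rough userspace model of that naive walk, for illustration only: it assumes a simplified, hypothetical `struct ctl_table` with just `procname` and `child` (the real structure has more members, and the patch calls `proc_handler` on the leaf instead of returning it). Each dot-separated component is matched against one table level, and a directory match descends into its `child` table.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct ctl_table. */
struct ctl_table {
	const char *procname;     /* NULL terminates a table */
	struct ctl_table *child;  /* non-NULL for directories */
};

/*
 * Walk nested tables following a dotted path such as "vm.swappiness".
 * Returns the matching leaf entry, or NULL if the whole path is not found.
 */
static struct ctl_table *find_sysctl(struct ctl_table *table, const char *path)
{
	const char *remaining = path;
	struct ctl_table *ctl = table;

	while (ctl->procname) {
		size_t len = strlen(ctl->procname);

		/*
		 * strncmp() returns nonzero if remaining is shorter than
		 * procname, so remaining[len] is in bounds when it is 0.
		 */
		if (strncmp(remaining, ctl->procname, len) == 0) {
			if (ctl->child && remaining[len] == '.') {
				remaining += len + 1;  /* descend one level */
				ctl = ctl->child;
				continue;
			}
			if (!ctl->child && remaining[len] == '\0')
				return ctl;            /* full path matched a leaf */
		}
		ctl++;
	}
	return NULL;
}
```

With a `vm` directory containing `swappiness`, the path `vm.swappiness` resolves to that leaf, while `vm` alone or `vm.swappinessx` resolve to nothing.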

However, the main limitation of avoiding the procfs layer is that sysctls
dynamically registered by register_sysctl_table() or register_sysctl_paths()
cannot currently be set by this method.

The processing is hooked right before the init process is loaded, as some
handlers might be more complicated than simple setters and might need some
subsystems to be initialized. Once the init process can be started and
eventually execute a process writing to /proc/sys/, it should also be fine
to do the same from the kernel.

[1] https://lore.kernel.org/linux-doc/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
[2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
v2: - handle any nesting level of parameter name
 - add Documentation/admin-guide/kernel-parameters.txt blurb
 - alias support for legacy one-off parameters, with first conversion (patch 2)
 - still no support for dynamically registered sysctls

 .../admin-guide/kernel-parameters.txt         |  9 +++
 include/linux/sysctl.h                        |  1 +
 init/main.c                                   | 21 +++++++
 kernel/sysctl.c                               | 62 +++++++++++++++++++
 4 files changed, 93 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c07815d230bc..5076e288f93f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4793,6 +4793,15 @@
 
 	switches=	[HW,M68k]
 
+	sysctl.*=	[KNL]
+			Set a sysctl parameter right before loading the init
+			process, as if the value was written to the respective
+			/proc/sys/... file. Currently a subset of sysctl
+			parameters is supported that is not registered
+			dynamically. Unrecognized parameters and invalid values
+			are reported in the kernel log.
+			Example: sysctl.vm.swappiness=40
+
 	sysfs.deprecated=0|1 [KNL]
 			Enable/disable old style sysfs layout for old udev
 			on older distributions. When this option is enabled
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 02fa84493f23..62ae963a5c0c 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -206,6 +206,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 void unregister_sysctl_table(struct ctl_table_header * table);
 
 extern int sysctl_init(void);
+int process_sysctl_arg(char *param, char *val, const char *unused, void *arg);
 
 extern struct ctl_table sysctl_mount_point[];
 
diff --git a/init/main.c b/init/main.c
index ee4947af823f..74a094c6b8b9 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1345,6 +1345,25 @@ void __weak free_initmem(void)
 	free_initmem_default(POISON_FREE_INITMEM);
 }
 
+static void do_sysctl_args(void)
+{
+#ifdef CONFIG_SYSCTL
+	size_t len = strlen(saved_command_line) + 1;
+	char *command_line;
+
+	command_line = kzalloc(len, GFP_KERNEL);
+	if (!command_line)
+		panic("%s: Failed to allocate %zu bytes\n", __func__, len);
+
+	strcpy(command_line, saved_command_line);
+
+	parse_args("Setting sysctl args", command_line,
+		   NULL, 0, -1, -1, NULL, process_sysctl_arg);
+
+	kfree(command_line);
+#endif
+}
+
 static int __ref kernel_init(void *unused)
 {
 	int ret;
@@ -1367,6 +1386,8 @@ static int __ref kernel_init(void *unused)
 
 	rcu_end_inkernel_boot();
 
+	do_sysctl_args();
+
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
 		if (!ret)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ad5b88a53c5a..18c7f5606d55 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1980,6 +1980,68 @@ int __init sysctl_init(void)
 	return 0;
 }
 
+/* Set sysctl value passed on kernel command line. */
+int process_sysctl_arg(char *param, char *val,
+			       const char *unused, void *arg)
+{
+	size_t count;
+	char *remaining;
+	int err;
+	loff_t ppos = 0;
+	struct ctl_table *ctl, *found = NULL;
+
+	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
+		return 0;
+
+	param += sizeof("sysctl.") - 1;
+
+	remaining = param;
+	ctl = &sysctl_base_table[0];
+
+	while(ctl->procname != 0) {
+		int len = strlen(ctl->procname);
+		if (strncmp(remaining, ctl->procname, len)) {
+			ctl++;
+			continue;
+		}
+		if (ctl->child) {
+			if (remaining[len] == '.') {
+				remaining += len + 1;
+				ctl = ctl->child;
+				continue;
+			}
+		} else {
+			if (remaining[len] == '\0') {
+				found = ctl;
+				break;
+			}
+		}
+		ctl++;
+	}
+
+	if (!found) {
+		pr_warn("Unknown sysctl param '%s' on command line", param);
+		return 0;
+	}
+
+	if (!(found->mode & 0200)) {
+		pr_warn("Cannot set sysctl '%s=%s' from command line - not writable",
+			param, val);
+		return 0;
+	}
+
+	count = strlen(val);
+	err = found->proc_handler(found, 1, val, &count, &ppos);
+
+	if (err)
+		pr_warn("Error %d setting sysctl '%s=%s' from command line",
+			err, param, val);
+
+	pr_debug("Set sysctl '%s=%s' from command line", param, val);
+
+	return 0;
+}
+
 #endif /* CONFIG_SYSCTL */
 
 /*
-- 
2.25.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC v2 2/2] kernel/sysctl: support handling command line aliases
  2020-03-25 12:03 [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Vlastimil Babka
@ 2020-03-25 12:03 ` Vlastimil Babka
  2020-03-25 14:29   ` Michal Hocko
                     ` (2 more replies)
  2020-03-25 21:21 ` [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Kees Cook
  2020-03-25 22:20 ` Eric W. Biederman
  2 siblings, 3 replies; 18+ messages in thread
From: Vlastimil Babka @ 2020-03-25 12:03 UTC
  To: Luis Chamberlain, Kees Cook, Iurii Zaikin
  Cc: linux-kernel, linux-api, linux-mm, Ivan Teterevkov, Michal Hocko,
	David Rientjes, Matthew Wilcox, Eric W . Biederman,
	Guilherme G . Piccoli, Vlastimil Babka

We can now handle sysctl parameters on the kernel command line, but historically
some parameters introduced their own command line equivalent, which we don't
want to remove for compatibility reasons. We can, however, convert them to the
generic infrastructure with a table translating the legacy command line
parameters to their sysctl names, and remove the one-off param handlers.

This patch adds the support and makes the first conversion to demonstrate it,
on the (deprecated) numa_zonelist_order parameter.
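A userspace sketch of the name resolution order this gives process_sysctl_arg(): a 'sysctl.' prefix is stripped, otherwise the alias table is consulted, and anything else is ignored. The helper name `resolve_sysctl_name` is hypothetical, and the sketch uses `const char *` where the patch uses `char *`.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the patch's alias table: legacy name -> sysctl path. */
struct sysctl_alias {
	const char *kernel_param;
	const char *sysctl_param;
};

static const struct sysctl_alias sysctl_aliases[] = {
	{ "numa_zonelist_order", "vm.numa_zonelist_order" },
	{ NULL, NULL },
};

/*
 * Map a command line parameter name to a dotted sysctl path:
 *   "sysctl.vm.swappiness" -> "vm.swappiness" (prefix stripped)
 *   "numa_zonelist_order"  -> "vm.numa_zonelist_order" (alias)
 *   anything else          -> NULL (not a sysctl, ignored)
 */
static const char *resolve_sysctl_name(const char *param)
{
	static const char prefix[] = "sysctl.";
	const struct sysctl_alias *alias;

	if (strncmp(param, prefix, sizeof(prefix) - 1) == 0)
		return param + sizeof(prefix) - 1;

	for (alias = sysctl_aliases; alias->kernel_param; alias++)
		if (strcmp(alias->kernel_param, param) == 0)
			return alias->sysctl_param;

	return NULL;
}
```

Because the alias table is only consulted after the prefix check, a legacy name never shadows an explicit 'sysctl.' parameter.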

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 kernel/sysctl.c | 39 +++++++++++++++++++++++++++++++++++----
 mm/page_alloc.c |  9 ---------
 2 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 18c7f5606d55..fd72853396f9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1971,6 +1971,22 @@ static struct ctl_table dev_table[] = {
 	{ }
 };
 
+struct sysctl_alias {
+	char *kernel_param;
+	char *sysctl_param;
+};
+
+/*
+ * Historically some settings had both sysctl and a command line parameter.
+ * With the generic sysctl. parameter support, we can handle them at a single
+ * place and only keep the historical name for compatibility. This is not meant
+ * to add brand new aliases.
+ */
+static struct sysctl_alias sysctl_aliases[] = {
+	{"numa_zonelist_order",		"vm.numa_zonelist_order" },
+	{ }
+};
+
 int __init sysctl_init(void)
 {
 	struct ctl_table_header *hdr;
@@ -1980,6 +1996,18 @@ int __init sysctl_init(void)
 	return 0;
 }
 
+char *sysctl_find_alias(char *param)
+{
+	struct sysctl_alias *alias;
+
+	for (alias = &sysctl_aliases[0]; alias->kernel_param != NULL; alias++) {
+		if (strcmp(alias->kernel_param, param) == 0)
+			return alias->sysctl_param;
+	}
+
+	return NULL;
+}
+
 /* Set sysctl value passed on kernel command line. */
 int process_sysctl_arg(char *param, char *val,
 			       const char *unused, void *arg)
@@ -1990,10 +2018,13 @@ int process_sysctl_arg(char *param, char *val,
 	loff_t ppos = 0;
 	struct ctl_table *ctl, *found = NULL;
 
-	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
-		return 0;
-
-	param += sizeof("sysctl.") - 1;
+	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1) == 0) {
+		param += sizeof("sysctl.") - 1;
+	} else {
+		param = sysctl_find_alias(param);
+		if (!param)
+			return 0;
+	}
 
 	remaining = param;
 	ctl = &sysctl_base_table[0];
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..de7a134b1b8a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5460,15 +5460,6 @@ static int __parse_numa_zonelist_order(char *s)
 	return 0;
 }
 
-static __init int setup_numa_zonelist_order(char *s)
-{
-	if (!s)
-		return 0;
-
-	return __parse_numa_zonelist_order(s);
-}
-early_param("numa_zonelist_order", setup_numa_zonelist_order);
-
 char numa_zonelist_order[] = "Node";
 
 /*
-- 
2.25.1




* Re: [RFC v2 2/2] kernel/sysctl: support handling command line aliases
  2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
@ 2020-03-25 14:29   ` Michal Hocko
  2020-03-25 14:36     ` Vlastimil Babka
  2020-03-25 22:42   ` Kees Cook
  2020-03-29 15:00   ` Arvind Sankar
  2 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2020-03-25 14:29 UTC
  To: Vlastimil Babka
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-api, linux-mm, Ivan Teterevkov, David Rientjes,
	Matthew Wilcox, Eric W . Biederman, Guilherme G . Piccoli

Both patches look really great to me. I haven't really checked all the
details but from a quick glance they both seem ok.

I would just add a small clarification here. Unless I am mistaken
early_param is called earlier than it would be now. But that shouldn't
cause any problems because the underlying implementation is just a noop
for backward compatibility.

Thanks a lot, this looks like a very nice improvement.

On Wed 25-03-20 13:03:45, Vlastimil Babka wrote:
[...]
> -static __init int setup_numa_zonelist_order(char *s)
> -{
> -	if (!s)
> -		return 0;
> -
> -	return __parse_numa_zonelist_order(s);
> -}
> -early_param("numa_zonelist_order", setup_numa_zonelist_order);
> -
>  char numa_zonelist_order[] = "Node";
>  
>  /*
> -- 
> 2.25.1

-- 
Michal Hocko
SUSE Labs



* Re: [RFC v2 2/2] kernel/sysctl: support handling command line aliases
  2020-03-25 14:29   ` Michal Hocko
@ 2020-03-25 14:36     ` Vlastimil Babka
  2020-03-25 14:44       ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Vlastimil Babka @ 2020-03-25 14:36 UTC
  To: Michal Hocko
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-api, linux-mm, Ivan Teterevkov, David Rientjes,
	Matthew Wilcox, Eric W . Biederman, Guilherme G . Piccoli

On 3/25/20 3:29 PM, Michal Hocko wrote:
> Both patches look really great to me. I haven't really checked all the
> details but from a quick glance they both seem ok.

Thanks.

> I would just add a small clarification here. Unless I am mistaken
> early_param is called earlier than it would be now. But that shouldn't
> cause any problems because the underlying implementation is just a noop
> for backward compatibility.

Yeah, indeed worth noting somewhere explicitly. The conversion can't be done
blindly, one has to consider whether the delay compared to early_param can be a
disadvantage or not. For example the nmi_watchdog parameter is probably best
left as it is?

> Thanks a lot, this looks like a very nice improvement.
> 
> On Wed 25-03-20 13:03:45, Vlastimil Babka wrote:
> [...]
>> -static __init int setup_numa_zonelist_order(char *s)
>> -{
>> -	if (!s)
>> -		return 0;
>> -
>> -	return __parse_numa_zonelist_order(s);
>> -}
>> -early_param("numa_zonelist_order", setup_numa_zonelist_order);
>> -
>>  char numa_zonelist_order[] = "Node";
>>  
>>  /*
>> -- 
>> 2.25.1
> 




* Re: [RFC v2 2/2] kernel/sysctl: support handling command line aliases
  2020-03-25 14:36     ` Vlastimil Babka
@ 2020-03-25 14:44       ` Michal Hocko
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2020-03-25 14:44 UTC
  To: Vlastimil Babka
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-api, linux-mm, Ivan Teterevkov, David Rientjes,
	Matthew Wilcox, Eric W . Biederman, Guilherme G . Piccoli

On Wed 25-03-20 15:36:23, Vlastimil Babka wrote:
> On 3/25/20 3:29 PM, Michal Hocko wrote:
> > Both patches look really great to me. I haven't really checked all the
> > details but from a quick glance they both seem ok.
> 
> Thanks.
> 
> > I would just add a small clarification here. Unless I am mistaken
> > early_param is called earlier than it would be now. But that shouldn't
> > cause any problems because the underlying implementation is just a noop
> > for backward compatibility.
> 
> Yeah, indeed worth noting somewhere explicitly. The conversion can't be done
> blindly, one has to consider whether the delay compared to early_param can be a
> disadvantage or not. For example the nmi_watchdog parameter is probably best
> left as it is?

I wouldn't mind moving nmi_watchdog timeout initialization to later. If
there is a use case that relies on early initialization, then the patch can
be reverted, but I struggle to think of anything reasonable. If the early
init code needs a longer timeout to prevent false positives, then
there is clearly a bug that is better fixed. And a necessary shorter timeout
sounds quite exotic to me TBH.

-- 
Michal Hocko
SUSE Labs



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-25 12:03 [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Vlastimil Babka
  2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
@ 2020-03-25 21:21 ` Kees Cook
  2020-03-26  9:30   ` Vlastimil Babka
  2020-03-25 22:20 ` Eric W. Biederman
  2 siblings, 1 reply; 18+ messages in thread
From: Kees Cook @ 2020-03-25 21:21 UTC
  To: Vlastimil Babka
  Cc: Luis Chamberlain, Iurii Zaikin, linux-kernel, linux-api,
	linux-mm, Ivan Teterevkov, Michal Hocko, David Rientjes,
	Matthew Wilcox, Eric W . Biederman, Guilherme G . Piccoli

On Wed, Mar 25, 2020 at 01:03:44PM +0100, Vlastimil Babka wrote:
> A recently proposed patch to add a vm_swappiness command line parameter in
> addition to the existing sysctl [1] made me wonder why we don't have general
> support for passing sysctl parameters via the command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
> 
> Setting the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), a quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
> Also, after 61a47c1ad3a4 ("sysctl: Remove the sysctl system call") the only way
> to set sysctls is via procfs, so this would eventually allow small systems to be
> built without CONFIG_PROC_SYSCTL and still be able to change sysctl parameters.
> 
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and searches for them in the sysctl ctl_tables. When
> found, the respective proc handler is invoked. The search is just a naive
> linear one, to avoid using the whole procfs layer. It should be acceptable,
> as the cost depends on the number of 'sysctl.' parameters passed.
> 
> However, the main limitation of avoiding the procfs layer is that sysctls
> dynamically registered by register_sysctl_table() or register_sysctl_paths()
> cannot currently be set by this method.
> 
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. Once the init process can be started and
> eventually execute a process writing to /proc/sys/, it should also be fine
> to do the same from the kernel.
> 
> [1] https://lore.kernel.org/linux-doc/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> v2: - handle any nesting level of parameter name
>  - add Documentation/admin-guide/kernel-parameters.txt blurb
>  - alias support for legacy one-off parameters, with first conversion (patch 2)
>  - still no support for dynamically registered sysctls
> 
>  .../admin-guide/kernel-parameters.txt         |  9 +++
>  include/linux/sysctl.h                        |  1 +
>  init/main.c                                   | 21 +++++++
>  kernel/sysctl.c                               | 62 +++++++++++++++++++
>  4 files changed, 93 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..5076e288f93f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4793,6 +4793,15 @@
>  
>  	switches=	[HW,M68k]
>  
> +	sysctl.*=	[KNL]
> +			Set a sysctl parameter right before loading the init
> +			process, as if the value was written to the respective
> +			/proc/sys/... file. Currently a subset of sysctl
> +			parameters is supported that is not registered
> +			dynamically. Unrecognized parameters and invalid values
> +			are reported in the kernel log.
> +			Example: sysctl.vm.swappiness=40
> +
>  	sysfs.deprecated=0|1 [KNL]
>  			Enable/disable old style sysfs layout for old udev
>  			on older distributions. When this option is enabled
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 02fa84493f23..62ae963a5c0c 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -206,6 +206,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
>  void unregister_sysctl_table(struct ctl_table_header * table);
>  
>  extern int sysctl_init(void);
> +int process_sysctl_arg(char *param, char *val, const char *unused, void *arg);
>  
>  extern struct ctl_table sysctl_mount_point[];
>  
> diff --git a/init/main.c b/init/main.c
> index ee4947af823f..74a094c6b8b9 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1345,6 +1345,25 @@ void __weak free_initmem(void)
>  	free_initmem_default(POISON_FREE_INITMEM);
>  }
>  
> +static void do_sysctl_args(void)
> +{
> +#ifdef CONFIG_SYSCTL
> +	size_t len = strlen(saved_command_line) + 1;
> +	char *command_line;
> +
> +	command_line = kzalloc(len, GFP_KERNEL);
> +	if (!command_line)
> +		panic("%s: Failed to allocate %zu bytes\n", __func__, len);
> +
> +	strcpy(command_line, saved_command_line);

No need to open-code this:

	char *command_line;

	command_line = kstrdup(saved_command_line, GFP_KERNEL);
	if (!command_line)
		panic("%s: Failed to duplicate the command line\n", __func__);
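Whichever way the copy is made, it is needed because parse_args() splits its buffer in place by writing NUL terminators, while saved_command_line must stay intact. A rough userspace illustration, using strdup() in place of kstrdup() and a hypothetical count_sysctl_args() that only handles space-separated, unquoted name=value pairs (the kernel's next_arg() also handles quoting):

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() and strtok_r() */
#include <stdlib.h>
#include <string.h>

/*
 * Count "sysctl."-prefixed parameters in a command line without modifying
 * it. Tokenizing writes NULs into the buffer, so work on a private copy.
 * Returns -1 if the copy cannot be allocated.
 */
static int count_sysctl_args(const char *cmdline)
{
	char *copy = strdup(cmdline);
	char *saveptr = NULL;
	char *tok;
	int n = 0;

	if (!copy)
		return -1;

	for (tok = strtok_r(copy, " ", &saveptr); tok;
	     tok = strtok_r(NULL, " ", &saveptr)) {
		char *eq = strchr(tok, '=');

		if (eq)
			*eq = '\0';  /* split name from value, in the copy only */
		if (strncmp(tok, "sysctl.", 7) == 0)
			n++;
	}

	free(copy);
	return n;
}
```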

> +
> +	parse_args("Setting sysctl args", command_line,
> +		   NULL, 0, -1, -1, NULL, process_sysctl_arg);
> +
> +	kfree(command_line);
> +#endif
> +}
> +
>  static int __ref kernel_init(void *unused)
>  {
>  	int ret;
> @@ -1367,6 +1386,8 @@ static int __ref kernel_init(void *unused)
>  
>  	rcu_end_inkernel_boot();
>  
> +	do_sysctl_args();
> +
>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ad5b88a53c5a..18c7f5606d55 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1980,6 +1980,68 @@ int __init sysctl_init(void)
>  	return 0;
>  }
>  
> +/* Set sysctl value passed on kernel command line. */
> +int process_sysctl_arg(char *param, char *val,
> +			       const char *unused, void *arg)
> +{
> +	size_t count;
> +	char *remaining;
> +	int err;
> +	loff_t ppos = 0;
> +	struct ctl_table *ctl, *found = NULL;
> +
> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> +		return 0;
> +
> +	param += sizeof("sysctl.") - 1;
> +
> +	remaining = param;
> +	ctl = &sysctl_base_table[0];
> +
> +	while(ctl->procname != 0) {
> +		int len = strlen(ctl->procname);
> +		if (strncmp(remaining, ctl->procname, len)) {
> +			ctl++;
> +			continue;
> +		}

I think you need to validate that "len" is within "remaining" here
first.

> +		if (ctl->child) {
> +			if (remaining[len] == '.') {
> +				remaining += len + 1;

And that "len + 1" is still valid.

> +				ctl = ctl->child;
> +				continue;
> +			}
> +		} else {
> +			if (remaining[len] == '\0') {
> +				found = ctl;
> +				break;
> +			}
> +		}
> +		ctl++;
> +	}
> +
> +	if (!found) {
> +		pr_warn("Unknown sysctl param '%s' on command line", param);
> +		return 0;
> +	}
> +
> +	if (!(found->mode & 0200)) {
> +		pr_warn("Cannot set sysctl '%s=%s' from command line - not writable",
> +			param, val);
> +		return 0;
> +	}

Oh yes; good call about this writable mode test.

> +
> +	count = strlen(val);
> +	err = found->proc_handler(found, 1, val, &count, &ppos);
> +
> +	if (err)
> +		pr_warn("Error %d setting sysctl '%s=%s' from command line",
> +			err, param, val);
> +
> +	pr_debug("Set sysctl '%s=%s' from command line", param, val);
> +
> +	return 0;
> +}
> +
>  #endif /* CONFIG_SYSCTL */
>  
>  /*
> -- 
> 2.25.1
> 

Outside of the nits and missing bounds check, I like it! :) 

-- 
Kees Cook



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-25 12:03 [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Vlastimil Babka
  2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
  2020-03-25 21:21 ` [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Kees Cook
@ 2020-03-25 22:20 ` Eric W. Biederman
  2020-03-25 22:54   ` Kees Cook
                     ` (2 more replies)
  2 siblings, 3 replies; 18+ messages in thread
From: Eric W. Biederman @ 2020-03-25 22:20 UTC
  To: Vlastimil Babka
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-api, linux-mm, Ivan Teterevkov, Michal Hocko,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

Vlastimil Babka <vbabka@suse.cz> writes:

> A recently proposed patch to add a vm_swappiness command line parameter in
> addition to the existing sysctl [1] made me wonder why we don't have general
> support for passing sysctl parameters via the command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
>
> Setting the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), a quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
> Also, after 61a47c1ad3a4 ("sysctl: Remove the sysctl system call") the only way
> to set sysctls is via procfs, so this would eventually allow small systems to be
> built without CONFIG_PROC_SYSCTL and still be able to change sysctl parameters.
>
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and searches for them in the sysctl ctl_tables. When
> found, the respective proc handler is invoked. The search is just a naive
> linear one, to avoid using the whole procfs layer. It should be acceptable,
> as the cost depends on the number of 'sysctl.' parameters passed.
>
> However, the main limitation of avoiding the procfs layer is that sysctls
> dynamically registered by register_sysctl_table() or register_sysctl_paths()
> cannot currently be set by this method.
>
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. Once the init process can be started and
> eventually execute a process writing to /proc/sys/, it should also be fine
> to do the same from the kernel.
>
> [1] https://lore.kernel.org/linux-doc/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> v2: - handle any nesting level of parameter name
>  - add Documentation/admin-guide/kernel-parameters.txt blurb
>  - alias support for legacy one-off parameters, with first conversion (patch 2)
>  - still no support for dynamically registered sysctls
>
>  .../admin-guide/kernel-parameters.txt         |  9 +++
>  include/linux/sysctl.h                        |  1 +
>  init/main.c                                   | 21 +++++++
>  kernel/sysctl.c                               | 62 +++++++++++++++++++
>  4 files changed, 93 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..5076e288f93f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4793,6 +4793,15 @@
>  
>  	switches=	[HW,M68k]
>  
> +	sysctl.*=	[KNL]
> +			Set a sysctl parameter right before loading the init
> +			process, as if the value was written to the respective
> +			/proc/sys/... file. Currently a subset of sysctl
> +			parameters is supported that is not registered
> +			dynamically. Unrecognized parameters and invalid values
> +			are reported in the kernel log.
> +			Example: sysctl.vm.swappiness=40
> +
>  	sysfs.deprecated=0|1 [KNL]
>  			Enable/disable old style sysfs layout for old udev
>  			on older distributions. When this option is enabled
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 02fa84493f23..62ae963a5c0c 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -206,6 +206,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
>  void unregister_sysctl_table(struct ctl_table_header * table);
>  
>  extern int sysctl_init(void);
> +int process_sysctl_arg(char *param, char *val, const char *unused, void *arg);
>  
>  extern struct ctl_table sysctl_mount_point[];
>  
> diff --git a/init/main.c b/init/main.c
> index ee4947af823f..74a094c6b8b9 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1345,6 +1345,25 @@ void __weak free_initmem(void)
>  	free_initmem_default(POISON_FREE_INITMEM);
>  }
>  
> +static void do_sysctl_args(void)
> +{
> +#ifdef CONFIG_SYSCTL
> +	size_t len = strlen(saved_command_line) + 1;
> +	char *command_line;
> +
> +	command_line = kzalloc(len, GFP_KERNEL);
> +	if (!command_line)
> +		panic("%s: Failed to allocate %zu bytes\n", __func__, len);
> +
> +	strcpy(command_line, saved_command_line);
> +
> +	parse_args("Setting sysctl args", command_line,
> +		   NULL, 0, -1, -1, NULL, process_sysctl_arg);
> +
> +	kfree(command_line);
> +#endif
> +}
> +
>  static int __ref kernel_init(void *unused)
>  {
>  	int ret;
> @@ -1367,6 +1386,8 @@ static int __ref kernel_init(void *unused)
>  
>  	rcu_end_inkernel_boot();
>  
> +	do_sysctl_args();
> +
>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ad5b88a53c5a..18c7f5606d55 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1980,6 +1980,68 @@ int __init sysctl_init(void)
>  	return 0;
>  }
>  
> +/* Set sysctl value passed on kernel command line. */
> +int process_sysctl_arg(char *param, char *val,
> +			       const char *unused, void *arg)
> +{
> +	size_t count;
> +	char *remaining;
> +	int err;
> +	loff_t ppos = 0;
> +	struct ctl_table *ctl, *found = NULL;
> +
> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> +		return 0;

Is there any way we can use a slash-separated path?  I know
in practice there are not any sysctl names that have
a '.' in them, but why should we artificially limit ourselves?

I guess as long as we don't mind not being able to set sysctls
that have a '.' in them it doesn't matter.

> +
> +	param += sizeof("sysctl.") - 1;
> +
> +	remaining = param;
> +	ctl = &sysctl_base_table[0];
> +
> +	while(ctl->procname != 0) {
              ^^^^^^^^^^^^^^^^^^

Please either test "while(ctl->procname)" or
"while(ctl->procname != NULL)" testing against 0 makes it look like
procname is an integer.  The style in the kernel is to test against
NULL, to make it clear when something is a pointer.

> +		int len = strlen(ctl->procname);

You should have done "strchr(remaining, '.')" and figured out if there is
another '.' and only compared up to that dot, probably skipping this
entry entirely if the two lengths don't match.
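A minimal userspace sketch of the strchr()-based comparison suggested above, with hypothetical helper names: measure the current component up to the next '.', and accept an entry only when its name has exactly that length. Only bytes inside the component are compared, so a prefix such as "swap" can never match "swappiness".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the first dot-separated component of a sysctl path. */
static size_t component_len(const char *remaining)
{
	const char *dot = strchr(remaining, '.');

	return dot ? (size_t)(dot - remaining) : strlen(remaining);
}

/* True only if procname equals the whole current component. */
static bool component_matches(const char *remaining, const char *procname)
{
	size_t len = component_len(remaining);

	return strlen(procname) == len &&
	       strncmp(remaining, procname, len) == 0;
}
```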

> +		if (strncmp(remaining, ctl->procname, len)) {
> +			ctl++;
> +			continue;
> +		}
> +		if (ctl->child) {
> +			if (remaining[len] == '.') {
> +				remaining += len + 1;
> +				ctl = ctl->child;
> +				continue;
> +			}
> +		} else {
> +			if (remaining[len] == '\0') {
> +				found = ctl;
> +				break;
> +			}
> +		}
> +		ctl++;

There should be exactly one match for a name in a table.
If you get here the code should break, not continue on.
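Combining the two review points (compare whole components, and treat the first name match in a table as the only candidate), a userspace sketch of the lookup could look like this. The simplified struct ctl_table and both function names are hypothetical stand-ins, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct ctl_table. */
struct ctl_table {
	const char *procname;     /* NULL terminates a table */
	struct ctl_table *child;  /* non-NULL for directories */
};

/* Return the entry named exactly name[0..len) in this table, or NULL. */
static struct ctl_table *find_in_table(struct ctl_table *table,
				       const char *name, size_t len)
{
	struct ctl_table *ctl;

	for (ctl = table; ctl->procname; ctl++)
		if (strlen(ctl->procname) == len &&
		    strncmp(ctl->procname, name, len) == 0)
			return ctl;  /* names are unique: stop at first match */
	return NULL;
}

/* Resolve a dotted path; a name match of the wrong kind is a failure. */
static struct ctl_table *find_sysctl(struct ctl_table *table, const char *path)
{
	for (;;) {
		const char *dot = strchr(path, '.');
		size_t len = dot ? (size_t)(dot - path) : strlen(path);
		struct ctl_table *ctl = find_in_table(table, path, len);

		if (!ctl)
			return NULL;
		if (!dot)            /* last component: must be a leaf */
			return ctl->child ? NULL : ctl;
		if (!ctl->child)     /* more components, but this is a leaf */
			return NULL;
		table = ctl->child;
		path = dot + 1;
	}
}
```

Unlike the loop in the patch, a matching name that turns out to be the wrong kind (a directory where a leaf was expected, or vice versa) ends the search instead of scanning the rest of the table.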

> +	}
> +
> +	if (!found) {
> +		pr_warn("Unknown sysctl param '%s' on command line", param);
> +		return 0;
> +	}
> +
> +	if (!(found->mode & 0200)) {
> +		pr_warn("Cannot set sysctl '%s=%s' from command line - not writable",
> +			param, val);
> +		return 0;
> +	}
> +
> +	count = strlen(val);
> +	err = found->proc_handler(found, 1, val, &count, &ppos);
> +
> +	if (err)
> +		pr_warn("Error %d setting sysctl '%s=%s' from command line",
> +			err, param, val);
> +
> +	pr_debug("Set sysctl '%s=%s' from command line", param, val);
> +
> +	return 0;
> +}

You really should be able to have this code live in
fs/proc/proc_sysctl.c and utilize lookup_entry.

That should give you the ability to lookup any sysctl.  If
kernel/sysctl.c is compiled into the kernel proc_sysctl.c is compiled
into the kernel.  Systems that don't select CONFIG_PROC_SYSCTL won't
have any sysctl tables installed at all so they do not make sense to
consider or design for.

Further it will be faster to lookup the sysctls using the code from
proc_sysctl.c as it constructs an rbtree of all of the entries in
a directory.  The code might as well take advantage of that for large
directories.

Arguably the main sysctl tables in kernel/sysctl.c should be split up so
that things are more localized and there is less global state exported
throughout the kernel.  I certainly don't want to discourage anyone from
doing that just so their sysctl can be used on the command line.


Hmm.  There is a big gotcha in here and I think it should be mentioned.
This code only works because no one has done set_fs(KERNEL_DS).  Which
means this only works with strings that are kernel addresses essentially
by mistake.  A big fat comment documenting why it is safe to pass in
kernel addresses to a function that takes a "char __user*" pointer
would be very good.

Eric
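Eric's strchr() suggestion can be shown with a minimal user-space sketch. It assumes a simplified, hypothetical `struct demo_ctl` in place of `struct ctl_table` (only `procname` and `child`) and a hypothetical `lookup_dotted()` helper. It does not use `lookup_entry()` or the rbtree in fs/proc/proc_sysctl.c. Each dot-separated component is delimited first and compared by exact length, so an entry whose name is only a prefix can never match and the scan of a table can stop at the first hit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ctl_table: only what the lookup needs. */
struct demo_ctl {
	const char *procname;            /* NULL terminates a table */
	const struct demo_ctl *child;    /* non-NULL for a directory */
};

/*
 * Resolve a dot-separated name such as "vm.swappiness". Each component is
 * delimited with strchr() and compared by exact length, so an entry whose
 * procname is only a prefix of the component never matches, and the scan
 * of one table can stop at the first hit.
 */
static const struct demo_ctl *lookup_dotted(const struct demo_ctl *table,
					    const char *name)
{
	while (table) {
		const char *dot = strchr(name, '.');
		size_t len = dot ? (size_t)(dot - name) : strlen(name);
		const struct demo_ctl *ctl;

		for (ctl = table; ctl->procname; ctl++) {
			if (strlen(ctl->procname) == len &&
			    strncmp(ctl->procname, name, len) == 0)
				break;	/* names are unique within a table */
		}

		if (!ctl->procname)
			return NULL;			/* unknown component */
		if (!dot)
			return ctl->child ? NULL : ctl;	/* must end on a leaf */
		name = dot + 1;
		table = ctl->child;	/* NULL for a leaf, so the lookup fails */
	}
	return NULL;
}

/* Example tables; "vm.swap" is hypothetical, used to exercise prefixes. */
static const struct demo_ctl demo_vm[] = {
	{ "swap", NULL },
	{ "swappiness", NULL },
	{ NULL, NULL },
};

static const struct demo_ctl demo_root[] = {
	{ "vm", demo_vm },
	{ NULL, NULL },
};
```

With these tables, "vm.swappiness" resolves to the second entry even though "swap" is listed first, while "vm.swapp" and the bare directory name "vm" both fail.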


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC v2 2/2] kernel/sysctl: support handling command line aliases
  2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
  2020-03-25 14:29   ` Michal Hocko
@ 2020-03-25 22:42   ` Kees Cook
  2020-03-29 15:00   ` Arvind Sankar
  2 siblings, 0 replies; 18+ messages in thread
From: Kees Cook @ 2020-03-25 22:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Luis Chamberlain, Iurii Zaikin, linux-kernel, linux-api,
	linux-mm, Ivan Teterevkov, Michal Hocko, David Rientjes,
	Matthew Wilcox, Eric W . Biederman, Guilherme G . Piccoli

On Wed, Mar 25, 2020 at 01:03:45PM +0100, Vlastimil Babka wrote:
> We can now handle sysctl parameters on kernel command line, but historically
> some parameters introduced their own command line equivalent, which we don't
> want to remove for compatibility reasons. We can however convert them to the
> generic infrastructure with a table translating the legacy command line
> parameters to their sysctl names, and removing the one-off param handlers.
> 
> This patch adds the support and makes the first conversion to demonstrate it,
> on the (deprecated) numa_zonelist_order parameter.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  kernel/sysctl.c | 39 +++++++++++++++++++++++++++++++++++----
>  mm/page_alloc.c |  9 ---------
>  2 files changed, 35 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 18c7f5606d55..fd72853396f9 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1971,6 +1971,22 @@ static struct ctl_table dev_table[] = {
>  	{ }
>  };
>  
> +struct sysctl_alias {
> +	char *kernel_param;

const char ...

> +	char *sysctl_param;
> +};
> +
> +/*
> + * Historically some settings had both sysctl and a command line parameter.
> + * With the generic sysctl. parameter support, we can handle them at a single
> + * place and only keep the historical name for compatibility. This is not meant
> + * to add brand new aliases.
> + */
> +static struct sysctl_alias sysctl_aliases[] = {

static const ...

> +	{"numa_zonelist_order",		"vm.numa_zonelist_order" },
> +	{ }
> +};
> +
>  int __init sysctl_init(void)
>  {
>  	struct ctl_table_header *hdr;
> @@ -1980,6 +1996,18 @@ int __init sysctl_init(void)
>  	return 0;
>  }
>  
> +char *sysctl_find_alias(char *param)
> +{
> +	struct sysctl_alias *alias;
> +
> +	for (alias = &sysctl_aliases[0]; alias->kernel_param != NULL; alias++) {
> +		if (strcmp(alias->kernel_param, param) == 0)
> +			return alias->sysctl_param;
> +	}
> +
> +	return NULL;
> +}
> +
>  /* Set sysctl value passed on kernel command line. */
>  int process_sysctl_arg(char *param, char *val,
>  			       const char *unused, void *arg)
> @@ -1990,10 +2018,13 @@ int process_sysctl_arg(char *param, char *val,
>  	loff_t ppos = 0;
>  	struct ctl_table *ctl, *found = NULL;
>  
> -	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> -		return 0;
> -
> -	param += sizeof("sysctl.") - 1;
> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1) == 0) {
> +		param += sizeof("sysctl.") - 1;
> +	} else {
> +		param = sysctl_find_alias(param);
> +		if (!param)
> +			return 0;
> +	}
>  
>  	remaining = param;
>  	ctl = &sysctl_base_table[0];
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..de7a134b1b8a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5460,15 +5460,6 @@ static int __parse_numa_zonelist_order(char *s)
>  	return 0;
>  }
>  
> -static __init int setup_numa_zonelist_order(char *s)
> -{
> -	if (!s)
> -		return 0;
> -
> -	return __parse_numa_zonelist_order(s);
> -}
> -early_param("numa_zonelist_order", setup_numa_zonelist_order);
> -
>  char numa_zonelist_order[] = "Node";

Nice. :) Effectively: -9 lines +1 line for using the aliasing. I think
it would be worth identifying the specific requirements for a sysctl
alias to be safe to use, and likely in a comment before the alias table:

- boot param parsing must be identical to the sysctl parsing
- temporal changes must be tolerable: i.e. early_param() runs earlier
  than when the sysctl-in-boot-param runs -- must the variable be set
  before the code's other __init functions run?
- must be for a non-module code (since we don't have the dynamic support
  yet)

As it turns out, "numa_zonelist_order" has literally no effect on
anything -- it's a parsed but ignored setting:

static int __parse_numa_zonelist_order(char *s)
{
	/*
	 * We used to support different zonlists modes but they turned
	 * out to be just not useful. Let's keep the warning in place
	 * if somebody still use the cmd line parameter so that we do
	 * not fail it silently
	 */
	if (!(*s == 'd' || *s == 'D' || *s == 'n' || *s == 'N')) {
		pr_warn("Ignoring unsupported numa_zonelist_order value: %s\n", s);
		return -EINVAL;
	}
	return 0;
}

But anyway, do you have a way to generate a list of potential candidates?

-- 
Kees Cook
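A user-space sketch of how the alias table could look with the review points applied: `const` data, a `static` lookup, and the safety requirements recorded as a comment above the table. `sysctl_name_for()` is a hypothetical helper that combines the "sysctl." prefix check with the alias fallback, mirroring the hunk in process_sysctl_arg(). Returning `const char *` assumes the caller's `param` becomes `const char *` as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sysctl_alias {
	const char *kernel_param;
	const char *sysctl_param;
};

/*
 * Legacy boot parameter -> sysctl name. An entry is only safe to add if:
 *  - the boot param parsing is identical to the sysctl parsing;
 *  - applying the value later than an early_param() would is tolerable;
 *  - the sysctl belongs to built-in (non-module) code.
 * Not meant for brand new aliases.
 */
static const struct sysctl_alias sysctl_aliases[] = {
	{ "numa_zonelist_order", "vm.numa_zonelist_order" },
	{ NULL, NULL }
};

static const char *sysctl_find_alias(const char *param)
{
	const struct sysctl_alias *alias;

	for (alias = &sysctl_aliases[0]; alias->kernel_param != NULL; alias++) {
		if (strcmp(alias->kernel_param, param) == 0)
			return alias->sysctl_param;
	}
	return NULL;
}

/* Map a boot parameter to a sysctl name, or NULL if it is not one. */
static const char *sysctl_name_for(const char *param)
{
	static const char prefix[] = "sysctl.";

	if (strncmp(param, prefix, sizeof(prefix) - 1) == 0)
		return param + sizeof(prefix) - 1;	/* strip "sysctl." */
	return sysctl_find_alias(param);		/* legacy name or NULL */
}
```

For example, `sysctl_name_for("numa_zonelist_order")` yields "vm.numa_zonelist_order", and an unrelated parameter such as "quiet" yields NULL, so it is left to other handlers.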



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-25 22:20 ` Eric W. Biederman
@ 2020-03-25 22:54   ` Kees Cook
  2020-03-26  6:58   ` Michal Hocko
  2020-03-26 13:29   ` Vlastimil Babka
  2 siblings, 0 replies; 18+ messages in thread
From: Kees Cook @ 2020-03-25 22:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vlastimil Babka, Luis Chamberlain, Iurii Zaikin, linux-kernel,
	linux-api, linux-mm, Ivan Teterevkov, Michal Hocko,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

On Wed, Mar 25, 2020 at 05:20:40PM -0500, Eric W. Biederman wrote:
> Hmm.  There is a big gotcha in here and I think it should be mentioned.
> This code only works because no one has done set_fs(KERNEL_DS).  Which
> means this only works with strings that are kernel addresses essentially
> by mistake.  A big fat comment documenting why it is safe to pass in
> kernel addresses to a function that takes a "char __user*" pointer
> would be very good.

Yeah, I was going to mention this too just now as I went looking through
one of the handlers and was reminded that the args are marked __user. :P

I suspect we might need to add some __force __user markings or something
(with a comment as you say above), to keep sparse from going crazy. :)

-- 
Kees Cook
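Where the `__force __user` cast and its explanatory comment would sit can be sketched in user space. The sketch assumes `__user` and `__force` expand to nothing (as they do outside a sparse run), and the handler type and helper names are hypothetical, reduced to the write-side arguments of a 2020-era proc_handler. It only illustrates the annotation. It does not show whether copy_from_user() accepts a kernel pointer at that point in boot, which is the concern raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * sparse annotations. In the kernel these come from
 * <linux/compiler_types.h>; outside a sparse run they expand to nothing.
 */
#define __user
#define __force

/* Hypothetical handler type: write-side arguments of a proc_handler. */
typedef int demo_handler_fn(int write, void __user *buffer, size_t *lenp);

/* Records the last value written, standing in for a real sysctl variable. */
static char demo_last_value[64];

static int demo_handler(int write, void __user *buffer, size_t *lenp)
{
	if (!write || *lenp >= sizeof(demo_last_value))
		return -1;
	/* A real handler would use copy_from_user() on this pointer. */
	memcpy(demo_last_value, (const char __force *)buffer, *lenp);
	demo_last_value[*lenp] = '\0';
	return 0;
}

static int apply_cmdline_value(demo_handler_fn *handler, char *val)
{
	size_t count = strlen(val);

	/*
	 * val points into a kernel-side copy of the command line, not into
	 * user memory. The __force cast tells sparse this is deliberate;
	 * this comment is where the reason it is safe should be documented.
	 */
	return handler(1, (void __force __user *)val, &count);
}
```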



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-25 22:20 ` Eric W. Biederman
  2020-03-25 22:54   ` Kees Cook
@ 2020-03-26  6:58   ` Michal Hocko
  2020-03-26  7:21     ` Kees Cook
                       ` (2 more replies)
  2020-03-26 13:29   ` Vlastimil Babka
  2 siblings, 3 replies; 18+ messages in thread
From: Michal Hocko @ 2020-03-26  6:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vlastimil Babka, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	linux-kernel, linux-api, linux-mm, Ivan Teterevkov,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

On Wed 25-03-20 17:20:40, Eric W. Biederman wrote:
> Vlastimil Babka <vbabka@suse.cz> writes:
[...]
> > +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> > +		return 0;
> 
> Is there any way we can use a slash separated path.  I know
> in practice there are not any sysctl names that have
> a '.' in them but why should we artificially limit ourselves?

Because this is the normal userspace interface? Why should it be any
different from calling sysctl?
[...]

> Further it will be faster to lookup the sysctls using the code from
> proc_sysctl.c as it constructs an rbtree of all of the entries in
> a directory.  The code might as well take advantage of that for large
> directories.

Sounds like a good fit for a follow up patch to me. Let's make this
as simple as possible for the initial version. But up to Vlastimil of
course.

[...]

> Hmm.  There is a big gotcha in here and I think it should be mentioned.
> This code only works because no one has done set_fs(KERNEL_DS).  Which
> means this only works with strings that are kernel addresses essentially
> by mistake.  A big fat comment documenting why it is safe to pass in
> kernel addresses to a function that takes a "char __user*" pointer
> would be very good.

Agreed

-- 
Michal Hocko
SUSE Labs



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-26  6:58   ` Michal Hocko
@ 2020-03-26  7:21     ` Kees Cook
  2020-03-26 12:45     ` Eric W. Biederman
  2020-03-26 13:30     ` Christian Brauner
  2 siblings, 0 replies; 18+ messages in thread
From: Kees Cook @ 2020-03-26  7:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Eric W. Biederman, Vlastimil Babka, Luis Chamberlain,
	Iurii Zaikin, linux-kernel, linux-api, linux-mm, Ivan Teterevkov,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

On Thu, Mar 26, 2020 at 07:58:29AM +0100, Michal Hocko wrote:
> On Wed 25-03-20 17:20:40, Eric W. Biederman wrote:
> > Vlastimil Babka <vbabka@suse.cz> writes:
> [...]
> > > +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> > > +		return 0;
> > 
> > Is there any way we can use a slash separated path.  I know
> > in practice there are not any sysctl names that have
> > a '.' in them but why should we artificially limit ourselves?
> 
> Because this is the normal userspace interface? Why should it be any
> different from calling sysctl?

Right. The common method from userspace is dot-separated (which I agree
is weird, but it's been like this for ages: see manpages sysctl(8) and
sysctl.conf(5) for the details and examples). While "/" is accepted by
sysctl, the files shipped in /etc/sysctl.d/ are all using "."  separators.

-- 
Kees Cook



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-25 21:21 ` [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Kees Cook
@ 2020-03-26  9:30   ` Vlastimil Babka
  0 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2020-03-26  9:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis Chamberlain, Iurii Zaikin, linux-kernel, linux-api,
	linux-mm, Ivan Teterevkov, Michal Hocko, David Rientjes,
	Matthew Wilcox, Eric W . Biederman, Guilherme G . Piccoli

On 3/25/20 10:21 PM, Kees Cook wrote:
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -1345,6 +1345,25 @@ void __weak free_initmem(void)
>>  	free_initmem_default(POISON_FREE_INITMEM);
>>  }
>>  
>> +static void do_sysctl_args(void)
>> +{
>> +#ifdef CONFIG_SYSCTL
>> +	size_t len = strlen(saved_command_line) + 1;
>> +	char *command_line;
>> +
>> +	command_line = kzalloc(len, GFP_KERNEL);
>> +	if (!command_line)
>> +		panic("%s: Failed to allocate %zu bytes\n", __func__, len);
>> +
>> +	strcpy(command_line, saved_command_line);
> 
> No need to open-code this:
> 
> 	char *command_line;
> 
> 	command_line = kstrdup(saved_command_line, GFP_KERNEL);
> 	if (!command_line)
> 		panic("%s: Failed to allocate copy of command line\n", __func__);
> 

Ah, right. I admit I basically copy_pasted some other parse_args user.
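The overall flow of do_sysctl_args() (duplicate the read-only saved command line, then hand each param=value pair to a callback) can be sketched in user space. This is a simplified stand-in, not parse_args(): `strdup()` plays the role of kstrdup(), splitting is on plain spaces, and the kernel's quoting rules are ignored. `for_each_cmdline_param()` and `count_sysctl()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*param_cb)(char *param, char *val, void *arg);

/*
 * Duplicate the command line (the original must stay intact), then split
 * the copy on spaces into "name=value" pairs and pass each to cb.
 * Simplification: no quote handling, unlike the kernel's parse_args().
 */
static int for_each_cmdline_param(const char *saved_cmdline, param_cb cb,
				  void *arg)
{
	char *copy, *tok, *save = NULL;

	copy = strdup(saved_cmdline);	/* user-space analogue of kstrdup() */
	if (!copy)
		return -1;

	for (tok = strtok_r(copy, " ", &save); tok;
	     tok = strtok_r(NULL, " ", &save)) {
		char *eq = strchr(tok, '=');

		if (eq)
			*eq = '\0';	/* split "name=value" in place */
		cb(tok, eq ? eq + 1 : NULL, arg);
	}
	free(copy);
	return 0;
}

/* Example callback: count parameters carrying the "sysctl." prefix. */
static int count_sysctl(char *param, char *val, void *arg)
{
	(void)val;
	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1) == 0)
		(*(int *)arg)++;
	return 0;
}
```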

>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index ad5b88a53c5a..18c7f5606d55 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1980,6 +1980,68 @@ int __init sysctl_init(void)
>>  	return 0;
>>  }
>>  
>> +/* Set sysctl value passed on kernel command line. */
>> +int process_sysctl_arg(char *param, char *val,
>> +			       const char *unused, void *arg)
>> +{
>> +	size_t count;
>> +	char *remaining;
>> +	int err;
>> +	loff_t ppos = 0;
>> +	struct ctl_table *ctl, *found = NULL;
>> +
>> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
>> +		return 0;
>> +
>> +	param += sizeof("sysctl.") - 1;
>> +
>> +	remaining = param;
>> +	ctl = &sysctl_base_table[0];
>> +
>> +	while(ctl->procname != 0) {
>> +		int len = strlen(ctl->procname);
>> +		if (strncmp(remaining, ctl->procname, len)) {
>> +			ctl++;
>> +			continue;
>> +		}
> 
> I think you need to validate that "len" is within "remaining" here
> first.

My reasoning was that if remaining terminates too early, the null byte would be
different from non-null byte in ctl->procname and thus strncmp will return it as
different?
And the reason I used len in strncmp there is only so it doesn't compare the
terminating null, because remaining can continue with ".foo" instead.

>> +		if (ctl->child) {
>> +			if (remaining[len] == '.') {
>> +				remaining += len + 1;
> 
> And that "len + 1" is still valid.

And since we passed strncmp(..., len), remaining[len] might be null byte, but
then we can still compare it with '.'.

But C strings are full of landmines.
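The bounds argument above can be checked with a small user-space helper (the enum and `match_component()` are hypothetical names). strncmp() stops at the first differing byte, and a NUL in `remaining` differs from the non-NUL byte at the same offset in `procname`, so a zero result implies `strlen(remaining) >= len`. Reading `remaining[len]` therefore touches at most the terminating NUL, and when it is '.', `remaining + len + 1` still points inside the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum component_match { NO_MATCH, LEAF_MATCH, DIR_MATCH, PREFIX_ONLY };

/* Classify how `remaining` relates to one path component `procname`. */
static enum component_match match_component(const char *remaining,
					    const char *procname)
{
	size_t len = strlen(procname);

	if (strncmp(remaining, procname, len) != 0)
		return NO_MATCH;

	/* Holds whenever strncmp() returned 0, per the argument above. */
	assert(strlen(remaining) >= len);

	if (remaining[len] == '\0')
		return LEAF_MATCH;	/* whole name consumed */
	if (remaining[len] == '.')
		return DIR_MATCH;	/* next component starts at len + 1 */
	return PREFIX_ONLY;		/* e.g. "swap" vs "swappiness" */
}
```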



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-26  6:58   ` Michal Hocko
  2020-03-26  7:21     ` Kees Cook
@ 2020-03-26 12:45     ` Eric W. Biederman
  2020-03-30 22:09       ` Luis Chamberlain
  2020-03-26 13:30     ` Christian Brauner
  2 siblings, 1 reply; 18+ messages in thread
From: Eric W. Biederman @ 2020-03-26 12:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	linux-kernel, linux-api, linux-mm, Ivan Teterevkov,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

Michal Hocko <mhocko@kernel.org> writes:

> On Wed 25-03-20 17:20:40, Eric W. Biederman wrote:
>> Vlastimil Babka <vbabka@suse.cz> writes:
> [...]
>> > +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
>> > +		return 0;
>> 
>> Is there any way we can use a slash separated path.  I know
>> in practice there are not any sysctl names that have
>> a '.' in them but why should we artificially limit ourselves?
>
> Because this is the normal userspace interface? Why should it be any
> different from calling sysctl?
> [...]

Why should the kernel command line implement userspace whims?
I was thinking something like: "sysctl/kernel/max_lock_depth=2048"
doesn't look too bad and it makes things like reusing our
kernel internal helpers much easier.

Plus it suggest that we could do the same for sysfs files:
	"sysfs/kernel/fscaps=1"

And the code could be same for both cases except for the filesystem
prefix.

>> Further it will be faster to lookup the sysctls using the code from
>> proc_sysctl.c as it constructs an rbtree of all of the entries in
>> a directory.  The code might as well take advantage of that for large
>> directories.
>
> Sounds like a good fit for a follow up patch to me. Let's make this
> as simple as possible for the initial version. But up to Vlastimil of course.

I would argue that reusing proc_sysctl.c:lookup_entry() should make the
code simpler, and easier to reason about.

Especially given the bugs in the first version with a sysctl path.
A clean separation between splitting the path into pieces and
looking up those pieces should make the code more robust.

That plus I want to get very far away from the incorrect idea that you
can have sysctls without compiling in proc support.  That is not how
the code works, that is not how the code is tested.

It is also worth pointing out that:

	proc_mnt = kern_mount(proc_fs_type);
        for_each_sysctl_cmdline() {
        	...
		file = file_open_root(proc_mnt->mnt_root, proc_mnt, sysctl_path, O_WRONLY, 0);
		kernel_write(file, value, value_len);
        }
        kern_umount(proc_mnt);

Is not an unreasonable implementation.

There are problems with a persistent mount of proc in that it forces
userspace not to use any proc mount options.  But a temporary mount of
proc to deal with command line options is not at all unreasonable.
Plus it looks like we can have kernel_write() do all of the kernel/user
buffer silliness.
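Eric's open-and-write loop has a familiar user-space counterpart: roughly what a tool like sysctl(8) does with a dotted name, mapping it to a file under /proc/sys and writing the value. The sketch below is not the in-kernel kern_mount()/file_open_root()/kernel_write() code outlined above. It assumes every '.' is a separator (names given with '/' are not handled), and `sysctl_path()` and `write_sysctl()` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Turn "kernel.max_lock_depth" into "/proc/sys/kernel/max_lock_depth".
 * Every '.' is treated as a separator, which is why components that
 * contain a '.' cannot be expressed in this syntax.
 */
static int sysctl_path(const char *name, char *buf, size_t size)
{
	int n = snprintf(buf, size, "/proc/sys/%s", name);
	char *p;

	if (n < 0 || (size_t)n >= size)
		return -1;	/* error or truncated */
	for (p = buf + sizeof("/proc/sys/") - 1; *p; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/* User-space analogue of one iteration of the open-and-write loop. */
static int write_sysctl(const char *name, const char *value)
{
	char path[256];
	ssize_t len = (ssize_t)strlen(value);
	ssize_t ret;
	int fd;

	if (sysctl_path(name, path, sizeof(path)))
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;	/* unknown sysctl or no permission */
	ret = write(fd, value, (size_t)len);
	close(fd);
	return ret == len ? 0 : -1;
}
```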

> [...]
>
>> Hmm.  There is a big gotcha in here and I think it should be mentioned.
>> This code only works because no one has done set_fs(KERNEL_DS).  Which
>> means this only works with strings that are kernel addresses essentially
>> by mistake.  A big fat comment documenting why it is safe to pass in
>> kernel addresses to a function that takes a "char __user*" pointer
>> would be very good.
>
> Agreed

Eric



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-25 22:20 ` Eric W. Biederman
  2020-03-25 22:54   ` Kees Cook
  2020-03-26  6:58   ` Michal Hocko
@ 2020-03-26 13:29   ` Vlastimil Babka
  2 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2020-03-26 13:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-api, linux-mm, Ivan Teterevkov, Michal Hocko,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

On 3/25/20 11:20 PM, Eric W. Biederman wrote:
> Vlastimil Babka <vbabka@suse.cz> writes:
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1980,6 +1980,68 @@ int __init sysctl_init(void)
>>  	return 0;
>>  }
>>  
>> +/* Set sysctl value passed on kernel command line. */
>> +int process_sysctl_arg(char *param, char *val,
>> +			       const char *unused, void *arg)
>> +{
>> +	size_t count;
>> +	char *remaining;
>> +	int err;
>> +	loff_t ppos = 0;
>> +	struct ctl_table *ctl, *found = NULL;
>> +
>> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
>> +		return 0;
> 
> Is there any way we can use a slash separated path.  I know

We could, but as others explained, people and tools are used to the dot
separation, so I think the only sensible options are supporting only dot, or
both dot and slash.

> in practice there are not any sysctl names that have
> a '.' in them but why should we artificially limit ourselves?

Existing tools would probably break (or perhaps sysctl(8) is smarter than I
think, dunno).

> I guess as long as we don't mind not being able to set sysctls
> that have a '.' in them it doesn't matter.

Right.

>> +
>> +	param += sizeof("sysctl.") - 1;
>> +
>> +	remaining = param;
>> +	ctl = &sysctl_base_table[0];
>> +
>> +	while(ctl->procname != 0) {
>               ^^^^^^^^^^^^^^^^^^
> 
> Please either test "while(ctl->procname)" or
> "while(ctl->procname != NULL)" testing against 0 makes it look like
> procname is an integer.  The style in the kernel is to test against
> NULL, to make it clear when something is a pointer.

OK

>> +		int len = strlen(ctl->procname);
> 
> You should have done "strchr(remaining)" and figured out if there is
> another '.' and only compared up to that dot.  Probably skipping this
> entry entirely if the two lengths don't match.

That's also possible, but AFAICS my code works as intended, as I explained in a
reply to Kees, and also below:

>> +		if (strncmp(remaining, ctl->procname, len)) {
>> +			ctl++;
>> +			continue;
>> +		}
>> +		if (ctl->child) {
>> +			if (remaining[len] == '.') {
>> +				remaining += len + 1;
>> +				ctl = ctl->child;
>> +				continue;
>> +			}
>> +		} else {
>> +			if (remaining[len] == '\0') {
>> +				found = ctl;
>> +				break;
>> +			}
>> +		}
>> +		ctl++;
> 
> There should be exactly one match for a name in a table.
> If you get here the code should break, not continue on.

If there existed e.g. both "vm.swap" and "vm.swappiness" options and user passed
"vm.swappiness=10", but the "swap" ctl entry was encountered first, it will
succeed the strncmp(), but then realize "swap" was just a prefix of what user
specified (remaining[len] is not '\0') and hence continue searching for other
matches.
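The swap/swappiness case can be reproduced with a user-space port of the RFC's loop, using a hypothetical `struct demo_ctl` and a hypothetical "vm.swap" entry listed first on purpose. The `stop_on_prefix` flag (not in the patch) makes the loop give up when a procname turns out to be only a prefix, which is what an early break would do while the comparison is still prefix-based. It suggests the break is only safe together with exact-length comparison of each component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_ctl {
	const char *procname;            /* NULL terminates a table */
	const struct demo_ctl *child;    /* non-NULL for a directory */
};

/* Port of the RFC lookup loop; stop_on_prefix models an early break. */
static const struct demo_ctl *find_like_rfc(const struct demo_ctl *ctl,
					    const char *param,
					    bool stop_on_prefix)
{
	const char *remaining = param;

	while (ctl->procname) {
		size_t len = strlen(ctl->procname);

		if (strncmp(remaining, ctl->procname, len)) {
			ctl++;
			continue;
		}
		if (ctl->child) {
			if (remaining[len] == '.') {
				remaining += len + 1;	/* descend */
				ctl = ctl->child;
				continue;
			}
		} else if (remaining[len] == '\0') {
			return ctl;			/* exact leaf match */
		}
		/* procname was only a prefix of the current component */
		if (stop_on_prefix)
			return NULL;
		ctl++;
	}
	return NULL;
}

static const struct demo_ctl demo_vm[] = {
	{ "swap", NULL },	/* hypothetical entry, listed first on purpose */
	{ "swappiness", NULL },
	{ NULL, NULL },
};

static const struct demo_ctl demo_root[] = {
	{ "vm", demo_vm },
	{ NULL, NULL },
};
```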

>> +	}
>> +
>> +	if (!found) {
>> +		pr_warn("Unknown sysctl param '%s' on command line", param);
>> +		return 0;
>> +	}
>> +
>> +	if (!(found->mode & 0200)) {
>> +		pr_warn("Cannot set sysctl '%s=%s' from command line - not writable",
>> +			param, val);
>> +		return 0;
>> +	}
>> +
>> +	count = strlen(val);
>> +	err = found->proc_handler(found, 1, val, &count, &ppos);
>> +
>> +	if (err)
>> +		pr_warn("Error %d setting sysctl '%s=%s' from command line",
>> +			err, param, val);
>> +
>> +	pr_debug("Set sysctl '%s=%s' from command line", param, val);
>> +
>> +	return 0;
>> +}
> 
> You really should be able to have this code live in
> fs/proc/proc_sysctl.c and utilize lookup_entry.
> 
> That should give you the ability to lookup any sysctl.  If
> kernel/sysctl.c is compiled into the kernel proc_sysctl.c is compiled
> into the kernel.  Systems that don't select CONFIG_PROC_SYSCTL won't
> have any sysctl tables installed at all so they do not make sense to
> consider or design for.

I see. In fact one reason why I tried to avoid the proc stuff was your commit
61a47c1ad3a4 ("sysctl: Remove the sysctl system call") and this part:

> As this removes one of the few uses of the internal kernel mount
> of proc I hope this allows for even more simplifications of the
> proc filesystem.

But if you now suggest using the kernel mount, then sure, I don't object if it makes
the code simpler and handles all sysctls.

> Further it will be faster to lookup the sysctls using the code from
> proc_sysctl.c as it constructs an rbtree of all of the entries in
> a directory.  The code might as well take advantage of that for large
> directories.
> 
> Arguably the main sysctl tables in kernel/sysctl.c should be split up so
> that things are more localized and there is less global state exported
> throughout the kernel.  I certainly don't want to discourage anyone from
> doing that just so their sysctl can be used on the command line.

Fair point.

> Hmm.  There is a big gotcha in here and I think it should be mentioned.
> This code only works because no one has done set_fs(KERNEL_DS).  Which
> means this only works with strings that are kernel addresses essentially
> by mistake.  A big fat comment documenting why it is safe to pass in
> kernel addresses to a function that takes a "char __user*" pointer
> would be very good.

Thanks, didn't realize that.

> Eric
> 




* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-26  6:58   ` Michal Hocko
  2020-03-26  7:21     ` Kees Cook
  2020-03-26 12:45     ` Eric W. Biederman
@ 2020-03-26 13:30     ` Christian Brauner
  2020-03-26 13:39       ` Michal Hocko
  2 siblings, 1 reply; 18+ messages in thread
From: Christian Brauner @ 2020-03-26 13:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Eric W. Biederman, Vlastimil Babka, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-api, linux-mm, Ivan Teterevkov,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

On Thu, Mar 26, 2020 at 07:58:29AM +0100, Michal Hocko wrote:
> On Wed 25-03-20 17:20:40, Eric W. Biederman wrote:
> > Vlastimil Babka <vbabka@suse.cz> writes:
> [...]
> > > +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> > > +		return 0;
> > 
> > Is there any way we can use a slash separated path.  I know
> > in practice there are not any sysctl names that have
> > a '.' in them but why should we artificially limit ourselves?
> 
> Because this is the normal userspace interface? Why should it be any
> different from calling sysctl?
> [...]

Imho, we should use ".". Kernel developers aren't the ones setting
these options, admins are and if I think back to the times doing that as
a job at uni I'd be very confused if I learned that I get to set sysctl
options through the kernel command line but need to use yet another format
than what I usually do to set those from the shell. Consistency is most
of the times to be preferred imho.

Also, the kernel docs illustrate that the "." syntax is used for other
keys as well (e.g. acpi.<option>) and userspace options passed via the
kernel command line have standardized on the "." format as well, e.g.
systemd appends in the same format (e.g.
systemd.unified_cgroup_hierarchy, systemd.unit what have you).

Christian



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-26 13:30     ` Christian Brauner
@ 2020-03-26 13:39       ` Michal Hocko
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2020-03-26 13:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, Vlastimil Babka, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, linux-kernel, linux-api, linux-mm, Ivan Teterevkov,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

On Thu 26-03-20 14:30:41, Christian Brauner wrote:
> On Thu, Mar 26, 2020 at 07:58:29AM +0100, Michal Hocko wrote:
> > On Wed 25-03-20 17:20:40, Eric W. Biederman wrote:
> > > Vlastimil Babka <vbabka@suse.cz> writes:
> > [...]
> > > > +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> > > > +		return 0;
> > > 
> > > Is there any way we can use a slash separated path.  I know
> > > in practice there are not any sysctl names that have
> > > a '.' in them but why should we artificially limit ourselves?
> > 
> > Because this is the normal userspace interface? Why should it be any
> > different from calling sysctl?
> > [...]
> 
> Imho, we should use ".". Kernel developers aren't the ones setting
> these options, admins are and if I think back to the times doing that as
> a job at uni I'd be very confused if I learned that I get to set sysctl
> options through the kernel command line but need to use yet another format
> than what I usually do to set those from the shell. Consistency is most
> of the times to be preferred imho.

Absolutely agreed! Even if sysctl can consume / instead of ., which was
news to me btw, the majority of the usage is with `.'
-- 
Michal Hocko
SUSE Labs



* Re: [RFC v2 2/2] kernel/sysctl: support handling command line aliases
  2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
  2020-03-25 14:29   ` Michal Hocko
  2020-03-25 22:42   ` Kees Cook
@ 2020-03-29 15:00   ` Arvind Sankar
  2 siblings, 0 replies; 18+ messages in thread
From: Arvind Sankar @ 2020-03-29 15:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Luis Chamberlain, Kees Cook, Iurii Zaikin, linux-kernel,
	linux-api, linux-mm, Ivan Teterevkov, Michal Hocko,
	David Rientjes, Matthew Wilcox, Eric W . Biederman,
	Guilherme G . Piccoli

On Wed, Mar 25, 2020 at 01:03:45PM +0100, Vlastimil Babka wrote:
> We can now handle sysctl parameters on kernel command line, but historically
> some parameters introduced their own command line equivalent, which we don't
> want to remove for compatibility reasons. We can however convert them to the
> generic infrastructure with a table translating the legacy command line
> parameters to their sysctl names, and removing the one-off param handlers.
> 
> This patch adds the support and makes the first conversion to demonstrate it,
> on the (deprecated) numa_zonelist_order parameter.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  kernel/sysctl.c | 39 +++++++++++++++++++++++++++++++++++----
>  mm/page_alloc.c |  9 ---------
>  2 files changed, 35 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 18c7f5606d55..fd72853396f9 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1971,6 +1971,22 @@ static struct ctl_table dev_table[] = {
>  	{ }
>  };
>  
> +struct sysctl_alias {
> +	char *kernel_param;
> +	char *sysctl_param;
> +};
> +
> +/*
> + * Historically some settings had both sysctl and a command line parameter.
> + * With the generic sysctl. parameter support, we can handle them at a single
> + * place and only keep the historical name for compatibility. This is not meant
> + * to add brand new aliases.
> + */
> +static struct sysctl_alias sysctl_aliases[] = {
> +	{"numa_zonelist_order",		"vm.numa_zonelist_order" },
> +	{ }
> +};
> +
>  int __init sysctl_init(void)
>  {
>  	struct ctl_table_header *hdr;
> @@ -1980,6 +1996,18 @@ int __init sysctl_init(void)
>  	return 0;
>  }
>  
> +char *sysctl_find_alias(char *param)

This function should probably be declared static?

> +{
> +	struct sysctl_alias *alias;
> +
> +	for (alias = &sysctl_aliases[0]; alias->kernel_param != NULL; alias++) {
> +		if (strcmp(alias->kernel_param, param) == 0)
> +			return alias->sysctl_param;
> +	}
> +
> +	return NULL;
> +}
> +
>  /* Set sysctl value passed on kernel command line. */
>  int process_sysctl_arg(char *param, char *val,
>  			       const char *unused, void *arg)
> @@ -1990,10 +2018,13 @@ int process_sysctl_arg(char *param, char *val,
>  	loff_t ppos = 0;
>  	struct ctl_table *ctl, *found = NULL;
>  
> -	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> -		return 0;
> -
> -	param += sizeof("sysctl.") - 1;
> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1) == 0) {
> +		param += sizeof("sysctl.") - 1;
> +	} else {
> +		param = sysctl_find_alias(param);
> +		if (!param)
> +			return 0;
> +	}
>  
>  	remaining = param;
>  	ctl = &sysctl_base_table[0];
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c4eb750a199..de7a134b1b8a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5460,15 +5460,6 @@ static int __parse_numa_zonelist_order(char *s)
>  	return 0;
>  }
>  
> -static __init int setup_numa_zonelist_order(char *s)
> -{
> -	if (!s)
> -		return 0;
> -
> -	return __parse_numa_zonelist_order(s);
> -}
> -early_param("numa_zonelist_order", setup_numa_zonelist_order);
> -
>  char numa_zonelist_order[] = "Node";
>  
>  /*
> -- 
> 2.25.1
> 



* Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
  2020-03-26 12:45     ` Eric W. Biederman
@ 2020-03-30 22:09       ` Luis Chamberlain
  0 siblings, 0 replies; 18+ messages in thread
From: Luis Chamberlain @ 2020-03-30 22:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michal Hocko, Vlastimil Babka, Kees Cook, Iurii Zaikin,
	linux-kernel, linux-api, linux-mm, Ivan Teterevkov,
	David Rientjes, Matthew Wilcox, Guilherme G . Piccoli

On Thu, Mar 26, 2020 at 07:45:13AM -0500, Eric W. Biederman wrote:
> > On Wed 25-03-20 17:20:40, Eric W. Biederman wrote:
> plus I want to get very far away from the incorrect idea that you
> can have sysctls without compiling in proc support.  That is not how
> the code works, that is not how the code is tested.

Agreed.

> It is also worth pointing out that:
> 
> 	proc_mnt = kern_mount(proc_fs_type);	/* temporary internal proc mount */
> 	for_each_sysctl_cmdline() {
> 		loff_t pos = 0;
> 		...
> 		file = file_open_root(proc_mnt->mnt_root, proc_mnt, sysctl_path, O_WRONLY, 0);
> 		kernel_write(file, value, value_len, &pos);
> 		filp_close(file, NULL);
> 	}
> 	kern_unmount(proc_mnt);
> 
> Is not an unreasonable implementation.

This:

> There are problems with a persistent mount of proc in that it forces
> userspace not to use any proc mount options.  But a temporary mount of
> proc to deal with command line options is not at all unreasonable.
> Plus it looks like we can have kernel_write do all of the kernel/user
> buffer silliness.

Is a bit of tribal knowledge worth documenting for the approach taken
going forward. Vlastimil, can you add a short comment explaining some of
this logic?

  Luis
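Eric's pseudocode maps onto familiar userspace calls: file_open_root() opens a path relative to a mount's root, much like openat(2) relative to a directory descriptor, and kernel_write() plays the role of write(2). The following is a userspace illustration only. It assumes a dotted sysctl name corresponds to a path under /proc/sys, and sysctl_name_to_path() and write_sysctl_at() are hypothetical helpers, not kernel functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn a dotted sysctl name such as "vm.swappiness" into a path relative
 * to the proc root, "sys/vm/swappiness". Returns 0 on success, -1 if the
 * result would not fit in @out. */
static int sysctl_name_to_path(const char *name, char *out, size_t out_size)
{
	int n = snprintf(out, out_size, "sys/%s", name);
	char *p;

	if (n < 0 || (size_t)n >= out_size)
		return -1;
	for (p = out; *p; p++)		/* every '.' becomes a path separator */
		if (*p == '.')
			*p = '/';
	return 0;
}

/* Userspace analogue of file_open_root() + kernel_write(): open the path
 * relative to an already-open proc root directory and write the value. */
static int write_sysctl_at(int proc_root_fd, const char *name, const char *value)
{
	char path[256];
	size_t len;
	ssize_t ret;
	int fd;

	assert(name != NULL && value != NULL);
	len = strlen(value);
	if (sysctl_name_to_path(name, path, sizeof(path)) < 0)
		return -1;
	fd = openat(proc_root_fd, path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = write(fd, value, len);
	/* Report failure if either the write was short or close() failed. */
	if (close(fd) < 0 || ret != (ssize_t)len)
		return -1;
	return 0;
}
```

On a real system, proc_root_fd could come from open("/proc", O_RDONLY | O_DIRECTORY), and writing most entries requires root. Because this sketch converts every dot, it cannot address an entry whose own name contains a dot.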


Thread overview: 18+ messages
2020-03-25 12:03 [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Vlastimil Babka
2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
2020-03-25 14:29   ` Michal Hocko
2020-03-25 14:36     ` Vlastimil Babka
2020-03-25 14:44       ` Michal Hocko
2020-03-25 22:42   ` Kees Cook
2020-03-29 15:00   ` Arvind Sankar
2020-03-25 21:21 ` [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Kees Cook
2020-03-26  9:30   ` Vlastimil Babka
2020-03-25 22:20 ` Eric W. Biederman
2020-03-25 22:54   ` Kees Cook
2020-03-26  6:58   ` Michal Hocko
2020-03-26  7:21     ` Kees Cook
2020-03-26 12:45     ` Eric W. Biederman
2020-03-30 22:09       ` Luis Chamberlain
2020-03-26 13:30     ` Christian Brauner
2020-03-26 13:39       ` Michal Hocko
2020-03-26 13:29   ` Vlastimil Babka
