Linux-api Archive on lore.kernel.org
From: ebiederm@xmission.com (Eric W. Biederman)
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-mm@kvack.org, Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Michal Hocko <mhocko@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Guilherme G . Piccoli" <gpiccoli@canonical.com>
Subject: Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
Date: Wed, 25 Mar 2020 17:20:40 -0500
Message-ID: <874kuc5b5z.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <20200325120345.12946-1-vbabka@suse.cz> (Vlastimil Babka's message of "Wed, 25 Mar 2020 13:03:44 +0100")

Vlastimil Babka <vbabka@suse.cz> writes:

> A recently proposed patch to add vm_swappiness command line parameter in
> addition to existing sysctl [1] made me wonder why we don't have a general
> support for passing sysctl parameters via command line. Googling found only
> somebody else wondering the same [2], but I haven't found any prior discussion
> with reasons why not to do this.
>
> Setting the vm_swappiness issue aside (the underlying issue might be solved in
> a different way), quick search of kernel-parameters.txt shows there are already
> some that exist as both sysctl and kernel parameter - hung_task_panic,
> nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism
> would remove the need to add more of those one-offs and might be handy in
> situations where configuration by e.g. /etc/sysctl.d/ is impractical.
> Also after 61a47c1ad3a4 ("sysctl: Remove the sysctl system call") the only way
> to set sysctl is via procfs, so this would eventually allow small systems to be
> built without CONFIG_PROC_SYSCTL and still be able to change sysctl parameters.
>
> Hence, this patch adds a new parse_args() pass that looks for parameters
> prefixed by 'sysctl.' and searches for them in the sysctl ctl_tables. When
> found, the respective proc handler is invoked. The search is just a naive
> linear one, to avoid using the whole procfs layer. It should be acceptable,
> as the cost depends on number of sysctl. parameters passed.
>
> The main limitation of avoiding the procfs layer is however that sysctls
> dynamically registered by register_sysctl_table() or register_sysctl_paths()
> cannot currently be set by this method.
>
> The processing is hooked right before the init process is loaded, as some
> handlers might be more complicated than simple setters and might need some
> subsystems to be initialized. At the moment the init process can be started and
> eventually execute a process writing to /proc/sys/ then it should be also fine
> to do that from the kernel.
>
> [1] https://lore.kernel.org/linux-doc/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
> [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> v2: - handle any nesting level of parameter name
>  - add Documentation/admin-guide/kernel-parameters.txt blurb
>  - alias support for legacy one-off parameters, with first conversion (patch 2)
>  - still no support for dynamically registered sysctls
>
>  .../admin-guide/kernel-parameters.txt         |  9 +++
>  include/linux/sysctl.h                        |  1 +
>  init/main.c                                   | 21 +++++++
>  kernel/sysctl.c                               | 62 +++++++++++++++++++
>  4 files changed, 93 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c07815d230bc..5076e288f93f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4793,6 +4793,15 @@
>  
>  	switches=	[HW,M68k]
>  
> +	sysctl.*=	[KNL]
> +			Set a sysctl parameter right before loading the init
> +			process, as if the value was written to the respective
> +			/proc/sys/... file. Currently a subset of sysctl
> +			parameters is supported that is not registered
> +			dynamically. Unrecognized parameters and invalid values
> +			are reported in the kernel log.
> +			Example: sysctl.vm.swappiness=40
> +
>  	sysfs.deprecated=0|1 [KNL]
>  			Enable/disable old style sysfs layout for old udev
>  			on older distributions. When this option is enabled
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 02fa84493f23..62ae963a5c0c 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -206,6 +206,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
>  void unregister_sysctl_table(struct ctl_table_header * table);
>  
>  extern int sysctl_init(void);
> +int process_sysctl_arg(char *param, char *val, const char *unused, void *arg);
>  
>  extern struct ctl_table sysctl_mount_point[];
>  
> diff --git a/init/main.c b/init/main.c
> index ee4947af823f..74a094c6b8b9 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1345,6 +1345,25 @@ void __weak free_initmem(void)
>  	free_initmem_default(POISON_FREE_INITMEM);
>  }
>  
> +static void do_sysctl_args(void)
> +{
> +#ifdef CONFIG_SYSCTL
> +	size_t len = strlen(saved_command_line) + 1;
> +	char *command_line;
> +
> +	command_line = kzalloc(len, GFP_KERNEL);
> +	if (!command_line)
> +		panic("%s: Failed to allocate %zu bytes\n", __func__, len);
> +
> +	strcpy(command_line, saved_command_line);
> +
> +	parse_args("Setting sysctl args", command_line,
> +		   NULL, 0, -1, -1, NULL, process_sysctl_arg);
> +
> +	kfree(command_line);
> +#endif
> +}
> +
>  static int __ref kernel_init(void *unused)
>  {
>  	int ret;
> @@ -1367,6 +1386,8 @@ static int __ref kernel_init(void *unused)
>  
>  	rcu_end_inkernel_boot();
>  
> +	do_sysctl_args();
> +
>  	if (ramdisk_execute_command) {
>  		ret = run_init_process(ramdisk_execute_command);
>  		if (!ret)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ad5b88a53c5a..18c7f5606d55 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1980,6 +1980,68 @@ int __init sysctl_init(void)
>  	return 0;
>  }
>  
> +/* Set sysctl value passed on kernel command line. */
> +int process_sysctl_arg(char *param, char *val,
> +			       const char *unused, void *arg)
> +{
> +	size_t count;
> +	char *remaining;
> +	int err;
> +	loff_t ppos = 0;
> +	struct ctl_table *ctl, *found = NULL;
> +
> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
> +		return 0;

Is there any way we can use a slash separated path?  I know
in practice there are not any sysctl names that have
a '.' in them, but why should we artificially limit ourselves?

I guess as long as we don't mind not being able to set sysctls
that have a '.' in them it doesn't matter.
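
To make the limitation concrete, here is a rough userspace sketch, not
kernel code.  The helper count_components() and the entry name "a.b"
are made up for illustration; nothing by those names comes from the
patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the path components of name when it is
 * split on the separator character sep.
 */
static size_t count_components(const char *name, char sep)
{
	size_t n = 1;

	while ((name = strchr(name, sep)) != NULL) {
		name++;		/* step past the separator */
		n++;
	}
	return n;
}
```

With '.' as the separator, "vm.a.b" splits into three components, so an
entry literally named "a.b" under vm could never be addressed.  With
'/', "vm/a.b" splits into two and the entry stays reachable.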

> +
> +	param += sizeof("sysctl.") - 1;
> +
> +	remaining = param;
> +	ctl = &sysctl_base_table[0];
> +
> +	while(ctl->procname != 0) {
              ^^^^^^^^^^^^^^^^^^

Please test either "while (ctl->procname)" or
"while (ctl->procname != NULL)".  Testing against 0 makes it look like
procname is an integer.  The style in the kernel is to test against
NULL, to make it clear when something is a pointer.

> +		int len = strlen(ctl->procname);

You should have called "strchr(remaining, '.')" to figure out if there
is another '.' and only compared up to that dot, probably skipping this
entry entirely if the two lengths don't match.
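
Roughly what I have in mind, as an untested userspace sketch.  The
helper name component_matches() and its signature are my own invention
for illustration, not something that exists in the tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: does the first '.'-delimited component of
 * remaining equal procname exactly?  On success *complen is set to the
 * component length so the caller can step past it.
 */
static bool component_matches(const char *remaining, const char *procname,
			      size_t *complen)
{
	const char *dot = strchr(remaining, '.');
	size_t len = dot ? (size_t)(dot - remaining) : strlen(remaining);

	/* Different lengths can never match, so skip the entry. */
	if (len != strlen(procname))
		return false;
	if (memcmp(remaining, procname, len) != 0)
		return false;

	*complen = len;
	return true;
}
```

Comparing the lengths first means "vmx.swappiness" is rejected against
the entry "vm" right away instead of matching as a prefix.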

> +		if (strncmp(remaining, ctl->procname, len)) {
> +			ctl++;
> +			continue;
> +		}
> +		if (ctl->child) {
> +			if (remaining[len] == '.') {
> +				remaining += len + 1;
> +				ctl = ctl->child;
> +				continue;
> +			}
> +		} else {
> +			if (remaining[len] == '\0') {
> +				found = ctl;
> +				break;
> +			}
> +		}
> +		ctl++;

There should be exactly one match for a name in a table.
If you get here the code should break, not continue on.
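
As an illustration only, the whole walk could look something like this
untested userspace sketch.  struct mock_ctl, mock_lookup() and the two
tables are stand-ins I made up; the real struct ctl_table has more
fields and the real tables live in kernel/sysctl.c.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for struct ctl_table; only what the walk needs. */
struct mock_ctl {
	const char *procname;		/* NULL terminates a table */
	const struct mock_ctl *child;	/* non-NULL for a directory */
};

/* Return the leaf entry named by a '.'-separated path, or NULL. */
static const struct mock_ctl *mock_lookup(const struct mock_ctl *table,
					  const char *path)
{
	const struct mock_ctl *ctl = table;

	while (ctl->procname != NULL) {
		const char *dot = strchr(path, '.');
		size_t len = dot ? (size_t)(dot - path) : strlen(path);

		if (len != strlen(ctl->procname) ||
		    memcmp(path, ctl->procname, len) != 0) {
			ctl++;		/* not this entry, try the next */
			continue;
		}

		/* The only possible match in this table: decide here. */
		if (ctl->child && dot) {
			path = dot + 1;
			ctl = ctl->child;	/* descend into directory */
			continue;
		}
		if (!ctl->child && !dot)
			return ctl;	/* leaf, and the name is used up */
		break;			/* directory/leaf mismatch */
	}
	return NULL;
}

/* Made-up tables, loosely modelled on the vm directory. */
static const struct mock_ctl mock_vm[] = {
	{ "swappiness", NULL },
	{ "overcommit_memory", NULL },
	{ NULL, NULL },
};

static const struct mock_ctl mock_root[] = {
	{ "vm", mock_vm },
	{ NULL, NULL },
};
```

Once the name has matched an entry there is nothing further to scan in
that table, so a directory/leaf mismatch ends the search with a break.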

> +	}
> +
> +	if (!found) {
> +		pr_warn("Unknown sysctl param '%s' on command line", param);
> +		return 0;
> +	}
> +
> +	if (!(found->mode & 0200)) {
> +		pr_warn("Cannot set sysctl '%s=%s' from command line - not writable",
> +			param, val);
> +		return 0;
> +	}
> +
> +	count = strlen(val);
> +	err = found->proc_handler(found, 1, val, &count, &ppos);
> +
> +	if (err)
> +		pr_warn("Error %d setting sysctl '%s=%s' from command line",
> +			err, param, val);
> +
> +	pr_debug("Set sysctl '%s=%s' from command line", param, val);
> +
> +	return 0;
> +}

You really should be able to have this code live in
fs/proc/proc_sysctl.c and utilize lookup_entry.

That should give you the ability to look up any sysctl.  If
kernel/sysctl.c is compiled into the kernel, proc_sysctl.c is compiled
into the kernel.  Systems that don't select CONFIG_PROC_SYSCTL won't
have any sysctl tables installed at all, so they do not make sense to
consider or design for.

Further, it will be faster to look up the sysctls using the code from
proc_sysctl.c, as it constructs an rbtree of all of the entries in
a directory.  The code might as well take advantage of that for large
directories.

Arguably the main sysctl tables in kernel/sysctl.c should be split up so
that things are more localized and there is less global state exported
throughout the kernel.  I certainly don't want to discourage anyone from
doing that just so their sysctl can be used on the command line.


Hmm.  There is a big gotcha in here and I think it should be mentioned.
This code only works because no one has done set_fs(KERNEL_DS).  Which
means this only works with strings that are kernel addresses essentially
by mistake.  A big fat comment documenting why it is safe to pass in
kernel addresses to a function that takes a "char __user*" pointer
would be very good.

Eric
