All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-mm@kvack.org, Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Michal Hocko <mhocko@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Guilherme G . Piccoli" <gpiccoli@canonical.com>
Subject: Re: [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line
Date: Thu, 26 Mar 2020 14:29:50 +0100	[thread overview]
Message-ID: <a551765c-829c-187a-efe5-31caab1d3ac1@suse.cz> (raw)
In-Reply-To: <874kuc5b5z.fsf@x220.int.ebiederm.org>

On 3/25/20 11:20 PM, Eric W. Biederman wrote:
> Vlastimil Babka <vbabka@suse.cz> writes:
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -1980,6 +1980,68 @@ int __init sysctl_init(void)
>>  	return 0;
>>  }
>>  
>> +/* Set sysctl value passed on kernel command line. */
>> +int process_sysctl_arg(char *param, char *val,
>> +			       const char *unused, void *arg)
>> +{
>> +	size_t count;
>> +	char *remaining;
>> +	int err;
>> +	loff_t ppos = 0;
>> +	struct ctl_table *ctl, *found = NULL;
>> +
>> +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
>> +		return 0;
> 
> Is there any way we can use a slash separated path.  I know

We could, but as others explained, people and tools are used to the dot
separation, so I think the only sensible options are supporting only dot, or
both dot and slash.

> in practice there are not any sysctl names that don't have
> a '.' in them but why should we artifically limit ourselves?

Existing tools would probably break (or perhaps sysctl(8) is smarter than I
think, dunno).

> I guess as long as we don't mind not being able to set sysctls
> that have a '.' in them it doesn't matter.

Right.

>> +
>> +	param += sizeof("sysctl.") - 1;
>> +
>> +	remaining = param;
>> +	ctl = &sysctl_base_table[0];
>> +
>> +	while(ctl->procname != 0) {
>               ^^^^^^^^^^^^^^^^^^
> 
> Please either test "while(ctl->procname)" or
> "while(ctl->procname != NULL)" testing against 0 makes it look like
> procname is an integer.  The style in the kernel is to test against
> NULL, to make it clear when something is a pointer.

OK

>> +		int len = strlen(ctl->procname);
> 
> You should have done "strchr(remaining)" and figured out if there is
> another '.' and only compared up to that dot.  Probably skipping this
> entry entirely if the two lengths don't match.

That's also possible, but AFAICS my code works as intended, as I explained in a
reply to Kees, and also below:

>> +		if (strncmp(remaining, ctl->procname, len)) {
>> +			ctl++;
>> +			continue;
>> +		}
>> +		if (ctl->child) {
>> +			if (remaining[len] == '.') {
>> +				remaining += len + 1;
>> +				ctl = ctl->child;
>> +				continue;
>> +			}
>> +		} else {
>> +			if (remaining[len] == '\0') {
>> +				found = ctl;
>> +				break;
>> +			}
>> +		}
>> +		ctl++;
> 
> There should be exactly one match for a name a table.
> If you get here the code should break, not continue on.

If there existed e.g. both "vm.swap" and "vm.swappiness" options and user passed
"vm.swappiness=10", but the "swap" ctl entry was encountered first, it will
succeed the strncmp(), but then realize "swap" was just a prefix of what user
specified (remaining[len] is not '\0') and hence continue serching for other
matches.

>> +	}
>> +
>> +	if (!found) {
>> +		pr_warn("Unknown sysctl param '%s' on command line", param);
>> +		return 0;
>> +	}
>> +
>> +	if (!(found->mode & 0200)) {
>> +		pr_warn("Cannot set sysctl '%s=%s' from command line - not writable",
>> +			param, val);
>> +		return 0;
>> +	}
>> +
>> +	count = strlen(val);
>> +	err = found->proc_handler(found, 1, val, &count, &ppos);
>> +
>> +	if (err)
>> +		pr_warn("Error %d setting sysctl '%s=%s' from command line",
>> +			err, param, val);
>> +
>> +	pr_debug("Set sysctl '%s=%s' from command line", param, val);
>> +
>> +	return 0;
>> +}
> 
> You really should be able to have this code live in
> fs/proc/proc_sysctl.c and utilize lookup_entry.
> 
> That should give you the ability to lookup any sysctl.  If
> kernel/sysctl.c is compiled into the kernel proc_sysctl.c is compiled
> into the kernel.  Systems that don't select CONFIG_PROC_SYSCTL won't
> have any sysctl tables installed at all so they do not make sense to
> consider or design for.

I see. In fact one reason why I tried to avoid the proc stuff was your commit
61a47c1ad3a4 ("sysctl: Remove the sysctl system call") and this part:

> As this removes one of the few uses of the internal kernel mount
> of proc I hope this allows for even more simplifications of the
> proc filesystem.

But if you now suggest using the kernel mount then sure, it I don't object make
the code simpler and handle all sysctls.

> Further it will be faster to lookup the sysctls using the code from
> proc_sysctl.c as it constructs an rbtree of all of the entries in
> a directory.  The code might as well take advantage of that for large
> directories.
> 
> Arguably the main sysctl tables in kernel/sysctl.c should be split up so
> that things are more localized and there is less global state exported
> throughout the kernel.  I certainly don't want to discourage anyone from
> doing that just so their sysctl can be used on the command line.

Fair point.

> Hmm.  There is a big gotcha in here and I think it should be mentioned.
> This code only works because no one has done set_fs(KERNEL_DS).  Which
> means this only works with strings that are kernel addresses essentially
> by mistake.  A big fat comment documenting why it is safe to pass in
> kernel addresses to a function that takes a "char __user*" pointer
> would be very good.

Thanks, didn't realize that.

> Eric
> 


      parent reply	other threads:[~2020-03-26 13:29 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-25 12:03 [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Vlastimil Babka
2020-03-25 12:03 ` [RFC v2 2/2] kernel/sysctl: support handling command line aliases Vlastimil Babka
2020-03-25 14:29   ` Michal Hocko
2020-03-25 14:36     ` Vlastimil Babka
2020-03-25 14:44       ` Michal Hocko
2020-03-25 22:42   ` Kees Cook
2020-03-29 15:00   ` Arvind Sankar
2020-03-25 21:21 ` [RFC v2 1/2] kernel/sysctl: support setting sysctl parameters from kernel command line Kees Cook
2020-03-26  9:30   ` Vlastimil Babka
2020-03-25 22:20 ` Eric W. Biederman
2020-03-25 22:20   ` Eric W. Biederman
2020-03-25 22:54   ` Kees Cook
2020-03-26  6:58   ` Michal Hocko
2020-03-26  7:21     ` Kees Cook
2020-03-26 12:45     ` Eric W. Biederman
2020-03-26 12:45       ` Eric W. Biederman
2020-03-30 22:09       ` Luis Chamberlain
2020-03-26 13:30     ` Christian Brauner
2020-03-26 13:39       ` Michal Hocko
2020-03-26 13:29   ` Vlastimil Babka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a551765c-829c-187a-efe5-31caab1d3ac1@suse.cz \
    --to=vbabka@suse.cz \
    --cc=ebiederm@xmission.com \
    --cc=gpiccoli@canonical.com \
    --cc=ivan.teterevkov@nutanix.com \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=willy@infradead.org \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.