From: Andrey Ignatov <rdna@fb.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"ast@kernel.org" <ast@kernel.org>, Roman Gushchin <guro@fb.com>,
	Kernel Team <Kernel-team@fb.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v2 bpf-next 05/21] bpf: Introduce bpf_sysctl_{get,set}_new_value helpers
Date: Fri, 5 Apr 2019 00:20:41 +0000
Message-ID: <20190405002032.GA89960@rdna-mbp.dhcp.thefacebook.com>
In-Reply-To: <368fcbf5-4144-c95b-d39a-d756546a67d5@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> [Thu, 2019-04-04 07:38 -0700]:
> On 03/26/2019 01:43 AM, Andrey Ignatov wrote:
> > Add helpers to work with new value being written to sysctl by user
> > space.
> > 
> > bpf_sysctl_get_new_value() copies value being written to sysctl into
> > provided buffer.
> > 
> > bpf_sysctl_set_new_value() overrides new value being written by user
> > space with a one from provided buffer. Buffer should contain string
> > representation of the value, similar to what can be seen in /proc/sys/.
> > 
> > Both helpers can be used only on sysctl write.
> > 
> > File position matters and can be managed by an interface that will be
> > introduced separately. E.g. if user space calls sys_write to a file in
> > /proc/sys/ at file position = X, where X > 0, then the value set by
> > bpf_sysctl_set_new_value() will be written starting from X. If program
> > wants to override whole value with specified buffer, file position has
> > to be set to zero.
> > 
> > Documentation for the new helpers is provided in bpf.h UAPI.
> > 
> > Signed-off-by: Andrey Ignatov <rdna@fb.com>
> > ---
> >  fs/proc/proc_sysctl.c      | 22 ++++++++---
> >  include/linux/bpf-cgroup.h |  8 ++--
> >  include/linux/filter.h     |  3 ++
> >  include/uapi/linux/bpf.h   | 38 +++++++++++++++++-
> >  kernel/bpf/cgroup.c        | 81 +++++++++++++++++++++++++++++++++++++-
> >  5 files changed, 142 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> > index 72f4a096c146..4d1ab22774f7 100644
> > --- a/fs/proc/proc_sysctl.c
> > +++ b/fs/proc/proc_sysctl.c
> > @@ -570,8 +570,8 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
> >  	struct inode *inode = file_inode(filp);
> >  	struct ctl_table_header *head = grab_header(inode);
> >  	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
> > +	void *new_buf = NULL;
> >  	ssize_t error;
> > -	size_t res;
> >  
> >  	if (IS_ERR(head))
> >  		return PTR_ERR(head);
> > @@ -589,15 +589,27 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
> >  	if (!table->proc_handler)
> >  		goto out;
> >  
> > -	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write);
> > +	error = BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, &count,
> > +					   &new_buf);
> >  	if (error)
> >  		goto out;
> >  
> >  	/* careful: calling conventions are nasty here */
> > -	res = count;
> > -	error = table->proc_handler(table, write, buf, &res, ppos);
> > +	if (new_buf) {
> > +		mm_segment_t old_fs;
> > +
> > +		old_fs = get_fs();
> > +		set_fs(KERNEL_DS);
> > +		error = table->proc_handler(table, write, (void __user *)new_buf,
> > +					    &count, ppos);
> > +		set_fs(old_fs);
> 
> From quick glance on the set, the above stood out. Afaik, there is an ongoing
> effort by Al and other fs/core folks (as visible in the git log) to get rid of
> set_fs() calls in the tree with the goal of eliminating this interface /entirely/
> (more context on 'why' here: https://lwn.net/Articles/722267/). Is there a better
> way to achieve the above w/o needing it?

That's a good question. I've spent quite a lot of time looking for a
better way, and the only one I'm aware of so far is to change the
proc_handler signature so that it accepts a kernel 'buffer' and the
copying between user and kernel space happens outside of proc_handler.

But that would require changing all proc_handler implementations as
well, so that they accept a kernel 'buffer' and don't copy data from/to
user space themselves, and there are just too many sysctl proc_handler
implementations:

  % git grep -E '\.proc_handler\s+=\s+' | \
  	sed -Ee 's/^.*\.proc_handler\s+=\s+//' | sort -u | wc -l
  179
  % git grep -lE '\.proc_handler\s+=\s+' | wc -l
  103


That is 179 distinct handler values assigned in 103 files, i.e. it's a
huge refactoring that could be really hard to upstream.
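
To make the alternative concrete, below is a minimal user-space sketch
(not kernel code) of the calling convention I mean: the caller does the
single user-to-kernel copy, and the handler only ever sees a kernel
buffer. All names in it (kbuf_handler_t, fake_copy_from_user,
int_handler, call_handler) are hypothetical and exist only for
illustration; fake_copy_from_user stands in for copy_from_user(), and
new_val stands in for a buffer a BPF program might supply.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: it receives a kernel-side buffer only. */
typedef int (*kbuf_handler_t)(int *data, int write,
			      const char *kbuf, size_t len);

/* Stand-in for copy_from_user(); in user space this is just memcpy(). */
static int fake_copy_from_user(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Example handler: parses a decimal integer from the kernel buffer. */
static int int_handler(int *data, int write, const char *kbuf, size_t len)
{
	char *end;
	long val;

	if (!write || len == 0)
		return -1;	/* reads are out of scope for this sketch */
	val = strtol(kbuf, &end, 10);
	if (end == kbuf)
		return -1;	/* no digits were parsed */
	*data = (int)val;
	return 0;
}

/*
 * The caller owns the copy. If new_val is non-NULL it replaces the
 * value written by user space, so the handler never needs to be told
 * whether its buffer came from user space or from the kernel.
 */
static int call_handler(kbuf_handler_t handler, int *data,
			const char *ubuf, size_t count,
			const char *new_val)
{
	const char *src = new_val ? new_val : ubuf;
	size_t len = new_val ? strlen(new_val) : count;
	char *kbuf;
	int ret;

	assert(handler && data && src);

	kbuf = malloc(len + 1);
	if (!kbuf)
		return -1;
	if (new_val) {
		memcpy(kbuf, src, len);
	} else if (fake_copy_from_user(kbuf, src, len)) {
		free(kbuf);
		return -1;
	}
	kbuf[len] = '\0';

	ret = handler(data, 1, kbuf, len);
	free(kbuf);
	return ret;
}
```

With this shape, call_handler(int_handler, &value, "42\n", 3, NULL)
stores 42, and passing "7" as new_val stores 7 instead, with no
set_fs()-style switch anywhere. The cost is exactly the problem above:
every existing handler would have to be converted to the new signature.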

Also I looked at the LWN article you mentioned and found this branch
that cleans up set_fs use cases:
http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/setfs-elimination

As far as I can see, it uses one of two approaches. The first is
similar: introduce a separate function (or use an available one) that
accepts a kernel buffer, then copy the data from user space and pass it
to that function. The second approach I see in the branch is to use
iovec iterators (mentioned in the LWN article as well), but that again
would require changing all proc_handler implementations.

That being said, I don't know a better way to do this w/o a huge
refactoring.


> > +		kfree(new_buf);
> > +	} else {
> > +		error = table->proc_handler(table, write, buf, &count, ppos);
> > +	}
> > +
> >  	if (!error)
> > -		error = res;
> > +		error = count;
> >  out:
> >  	sysctl_head_finish(head);
> >  
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index b1c45da20a26..1e97271f9a10 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -113,7 +113,8 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
> >  
> >  int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
> >  				   struct ctl_table *table, int write,
> > -				   enum bpf_attach_type type);
> > +				   void __user *buf, size_t *pcount,
> > +				   void **new_buf, enum bpf_attach_type type);
> >  
> >  static inline enum bpf_cgroup_storage_type cgroup_storage_type(
> >  	struct bpf_map *map)
> > @@ -261,11 +262,12 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> >  })
> >  
> >  
> > -#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write)			       \
> > +#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write, buf, count, nbuf)       \
> >  ({									       \
> >  	int __ret = 0;							       \
> >  	if (cgroup_bpf_enabled)						       \
> >  		__ret = __cgroup_bpf_run_filter_sysctl(head, table, write,     \
> > +						       buf, count, nbuf,       \
> >  						       BPF_CGROUP_SYSCTL);     \
> >  	__ret;								       \
> >  })
> > @@ -338,7 +340,7 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
> >  #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
> >  #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
> >  #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
> > -#define BPF_CGROUP_RUN_PROG_SYSCTL(head, table, write) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,nbuf) ({ 0; })
> >  
> >  #define for_each_cgroup_storage_type(stype) for (; false; )
> >  

-- 
Andrey Ignatov
