Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
       [not found] ` <20180503043604.1604587-2-ast@kernel.org>
@ 2018-05-04 19:56   ` Luis R. Rodriguez
  2018-05-05  1:37     ` Alexei Starovoitov
  2018-05-10 22:27     ` Kees Cook
  0 siblings, 2 replies; 8+ messages in thread
From: Luis R. Rodriguez @ 2018-05-04 19:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, torvalds, gregkh, luto, netdev, linux-kernel,
	kernel-team, Al Viro, David Howells, Mimi Zohar, Kees Cook,
	Andrew Morton, Dominik Brodowski, Hugh Dickins, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, Rafael J. Wysocki,
	Luis R. Rodriguez, Linux FS Devel, pjones, mjg59,
	linux-security-module, linux-integrity

What a mighty short list of reviewers. Adding some more. My review below.
I'd appreciate a Cc on future versions of these patches.

On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> Introduce helper:
> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> struct umh_info {
>        struct file *pipe_to_umh;
>        struct file *pipe_from_umh;
>        pid_t pid;
> };
> 
> that GPLed kernel modules (signed or unsigned) can use it to execute part
> of its own data as swappable user mode process.
> 
> The kernel will do:
> - mount "tmpfs"

Actually its a *shared* vfsmount tmpfs for all umh blobs.

> - allocate a unique file in tmpfs
> - populate that file with [data, data + len] bytes
> - user-mode-helper code will do_execve that file and, before the process
>   starts, the kernel will create two unix pipes for bidirectional
>   communication between kernel module and umh
> - close tmpfs file, effectively deleting it
> - the fork_usermode_blob will return zero on success and populate
>   'struct umh_info' with two unix pipes and the pid of the user process

But since its using UMH_WAIT_EXEC, all we can guarantee currently is the
inception point was intended, well though out, and will run, but the return
value in no way reflects the success or not of the execution. More below.

> As the first step in the development of the bpfilter project
> the fork_usermode_blob() helper is introduced to allow user mode code
> to be invoked from a kernel module. The idea is that user mode code plus
> normal kernel module code are built as part of the kernel build
> and installed as traditional kernel module into distro specified location,
> such that from a distribution point of view, there is
> no difference between regular kernel modules and kernel modules + umh code.
> Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> by a kernel module doesn't make it any special from kernel and user space
> tooling point of view.
> 
> Such approach enables kernel to delegate functionality traditionally done
> by the kernel modules into the user space processes (either root or !root) and
> reduces security attack surface of the new code. The buggy umh code would crash
> the user process, but not the kernel. Another advantage is that umh code
> of the kernel module can be debugged and tested out of user space
> (e.g. opening the possibility to run clang sanitizers, fuzzers or
> user space test suites on the umh code).
> In case of the bpfilter project such architecture allows complex control plane
> to be done in the user space while bpf based data plane stays in the kernel.
> 
> Since umh can crash, can be oom-ed by the kernel, killed by the admin,
> the kernel module that uses them (like bpfilter) needs to manage life
> time of umh on its own via two unix pipes and the pid of umh.
> 
> The exit code of such kernel module should kill the umh it started,
> so that rmmod of the kernel module will cleanup the corresponding umh.
> Just like if the kernel module does kmalloc() it should kfree() it in the exit code.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  fs/exec.c               |  38 ++++++++---
>  include/linux/binfmts.h |   1 +
>  include/linux/umh.h     |  12 ++++
>  kernel/umh.c            | 176 +++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 215 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 183059c427b9..30a36c2a39bf 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
>  /*
>   * sys_execve() executes a new program.
>   */
> -static int do_execveat_common(int fd, struct filename *filename,
> -			      struct user_arg_ptr argv,
> -			      struct user_arg_ptr envp,
> -			      int flags)
> +static int __do_execve_file(int fd, struct filename *filename,
> +			    struct user_arg_ptr argv,
> +			    struct user_arg_ptr envp,
> +			    int flags, struct file *file)
>  {
>  	char *pathbuf = NULL;
>  	struct linux_binprm *bprm;
> -	struct file *file;
>  	struct files_struct *displaced;
>  	int retval;

Keeping in mind a fuzzer...

Note, right below this, and not shown here in the hunk, is:

        if (IS_ERR(filename))                                                   
                return PTR_ERR(filename)
>  
> @@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	check_unsafe_exec(bprm);
>  	current->in_execve = 1;
>  
> -	file = do_open_execat(fd, filename, flags);
> +	if (!file)
> +		file = do_open_execat(fd, filename, flags);


Here we now seem to allow !file and open the file with the passed fd as in
the old days. This is an expected change.

>  	retval = PTR_ERR(file);
>  	if (IS_ERR(file))
>  		goto out_unmark;
> @@ -1760,7 +1760,9 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	sched_exec();
>  
>  	bprm->file = file;
> -	if (fd == AT_FDCWD || filename->name[0] == '/') {
> +	if (!filename) {

If anything shouldn't this be:

	if (IS_ERR(filename))

But, wouldn't the above first branch in the routine catch this?

> +		bprm->filename = "none";

Given this seems like a desirable branch which was tested, wonder how this
ever got set if the above branch in the first hunk I noted hit true?

In any case, we seem to have two cases, can we rule out the exact requirements
at the top so we can bail out with an error code if one or the other way to
call this function does not align with expectations?

> +	} else if (fd == AT_FDCWD || filename->name[0] == '/') {
>  		bprm->filename = filename->name;
>  	} else {
>  		if (filename->name[0] == '\0')
> @@ -1826,7 +1828,8 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	task_numa_free(current);
>  	free_bprm(bprm);
>  	kfree(pathbuf);
> -	putname(filename);
> +	if (filename)
> +		putname(filename);
>  	if (displaced)
>  		put_files_struct(displaced);
>  	return retval;
> @@ -1849,10 +1852,27 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	if (displaced)
>  		reset_files_struct(displaced);
>  out_ret:
> -	putname(filename);
> +	if (filename)
> +		putname(filename);
>  	return retval;
>  }
>  
> +static int do_execveat_common(int fd, struct filename *filename,

Further signs the filename is now optional. But I don't understand how these
branches ever be true, but perhaps I'm missing something?

> +			      struct user_arg_ptr argv,
> +			      struct user_arg_ptr envp,
> +			      int flags)
> +{
> +	return __do_execve_file(fd, filename, argv, envp, flags, NULL);
> +}
> +
> +int do_execve_file(struct file *file, void *__argv, void *__envp)
> +{
> +	struct user_arg_ptr argv = { .ptr.native = __argv };
> +	struct user_arg_ptr envp = { .ptr.native = __envp };
> +
> +	return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
> +}

Or maybe do the semantics expectations checks here, so we don't clutter
do_execveat_common() with them?

> +
>  int do_execve(struct filename *filename,
>  	const char __user *const __user *__argv,
>  	const char __user *const __user *__envp)
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 4955e0863b83..c05f24fac4f6 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -150,5 +150,6 @@ extern int do_execveat(int, struct filename *,
>  		       const char __user * const __user *,
>  		       const char __user * const __user *,
>  		       int);
> +int do_execve_file(struct file *file, void *__argv, void *__envp);
>  
>  #endif /* _LINUX_BINFMTS_H */
> diff --git a/include/linux/umh.h b/include/linux/umh.h
> index 244aff638220..5c812acbb80a 100644
> --- a/include/linux/umh.h
> +++ b/include/linux/umh.h
> @@ -22,8 +22,10 @@ struct subprocess_info {
>  	const char *path;
>  	char **argv;
>  	char **envp;
> +	struct file *file;
>  	int wait;
>  	int retval;
> +	pid_t pid;
>  	int (*init)(struct subprocess_info *info, struct cred *new);
>  	void (*cleanup)(struct subprocess_info *info);
>  	void *data;

While at it, can you kdocify struct subprocess_info and add new docs for at
least these two entires you are adding ?

> @@ -38,6 +40,16 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp,
>  			  int (*init)(struct subprocess_info *info, struct cred *new),
>  			  void (*cleanup)(struct subprocess_info *), void *data);
>  
> +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> +			  int (*init)(struct subprocess_info *info, struct cred *new),
> +			  void (*cleanup)(struct subprocess_info *), void *data);

Likewise but on the umc.c file.

> +struct umh_info {
> +	struct file *pipe_to_umh;
> +	struct file *pipe_from_umh;
> +	pid_t pid;
> +};

Likewise.

> +int fork_usermode_blob(void *data, size_t len, struct umh_info *info);

Likewise but on the umc.c files.

> +
>  extern int
>  call_usermodehelper_exec(struct subprocess_info *info, int wait);
>  
> diff --git a/kernel/umh.c b/kernel/umh.c
> index f76b3ff876cf..c3f418d7d51a 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -25,6 +25,8 @@
>  #include <linux/ptrace.h>
>  #include <linux/async.h>
>  #include <linux/uaccess.h>
> +#include <linux/shmem_fs.h>
> +#include <linux/pipe_fs_i.h>
>  
>  #include <trace/events/module.h>
>  
> @@ -97,9 +99,13 @@ static int call_usermodehelper_exec_async(void *data)
>  
>  	commit_creds(new);
>  
> -	retval = do_execve(getname_kernel(sub_info->path),
> -			   (const char __user *const __user *)sub_info->argv,
> -			   (const char __user *const __user *)sub_info->envp);
> +	if (sub_info->file)
> +		retval = do_execve_file(sub_info->file,
> +					sub_info->argv, sub_info->envp);
> +	else
> +		retval = do_execve(getname_kernel(sub_info->path),
> +				   (const char __user *const __user *)sub_info->argv,
> +				   (const char __user *const __user *)sub_info->envp);
>  out:
>  	sub_info->retval = retval;
>  	/*
> @@ -185,6 +191,8 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
>  		if (pid < 0) {
>  			sub_info->retval = pid;
>  			umh_complete(sub_info);
> +		} else {
> +			sub_info->pid = pid;
>  		}
>  	}
>  }
> @@ -393,6 +401,168 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
>  }
>  EXPORT_SYMBOL(call_usermodehelper_setup);
>  
> +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> +		int (*init)(struct subprocess_info *info, struct cred *new),
> +		void (*cleanup)(struct subprocess_info *info), void *data)

Should be static, no other users outside of this file.
Please use umh_setup_file().

> +{
> +	struct subprocess_info *sub_info;

Considering a possible fuzzer triggering random data we should probably
return NULL early and avoid the kzalloc if:

	if (!file || !init || !cleanup)
		return NULL;

Is data optional? The kdoc could clarify this.


> +
> +	sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
> +	if (!sub_info)
> +		return NULL;
> +
> +	INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
> +	sub_info->path = "none";
> +	sub_info->file = file;
> +	sub_info->init = init;
> +	sub_info->cleanup = cleanup;
> +	sub_info->data = data;
> +	return sub_info;
> +}
> +
> +static struct vfsmount *umh_fs;
> +
> +static int init_tmpfs(void)

Please use umh_init_tmpfs(). Also see init/main.c do_basic_setup() which calls
usermodehelper_enable() prior to do_initcalls(). Now, fortunately TMPFS is only
bool, saving us from some races and we do call tmpfs's init first shmem_init():

static void __init do_basic_setup(void)
{
	cpuset_init_smp();
	shmem_init();
	driver_init();
	init_irq_proc();
	do_ctors();
	usermodehelper_enable();
	do_initcalls();
}

But it also means we're enabling your new call call fork_usermode_blob() on
early init code even if we're not setup. Since this umh tmpfs vfsmount is
shared I'd say just call this init right before usermodehelper_enable()
on do_basic_setup().

> +{
> +	struct file_system_type *type;
> +
> +	if (umh_fs)
> +		return 0;
> +	type = get_fs_type("tmpfs");
> +	if (!type)
> +		return -ENODEV;
> +	umh_fs = kern_mount(type);
> +	if (IS_ERR(umh_fs)) {
> +		int err = PTR_ERR(umh_fs);
> +
> +		put_filesystem(type);
> +		umh_fs = NULL;
> +		return err;
> +	}
> +	return 0;
> +}
> +
> +static int alloc_tmpfs_file(size_t size, struct file **filp)

Please use umh_alloc_tmpfs_file()

> +{
> +	struct file *file;
> +	int err;
> +
> +	err = init_tmpfs();
> +	if (err)
> +		return err;
> +	file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +	*filp = file;
> +	return 0;
> +}
> +
> +static int populate_file(struct file *file, const void *data, size_t size)

Please use umh_populate_file()

> +{
> +	size_t offset = 0;
> +	int err;
> +
> +	do {
> +		unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> +		struct page *page;
> +		void *pgdata, *vaddr;
> +
> +		err = pagecache_write_begin(file, file->f_mapping, offset, len,
> +					    0, &page, &pgdata);
> +		if (err < 0)
> +			goto fail;
> +
> +		vaddr = kmap(page);
> +		memcpy(vaddr, data, len);
> +		kunmap(page);
> +
> +		err = pagecache_write_end(file, file->f_mapping, offset, len,
> +					  len, page, pgdata);
> +		if (err < 0)
> +			goto fail;
> +
> +		size -= len;
> +		data += len;
> +		offset += len;
> +	} while (size);

Character for character, this looks like a wonderful copy and paste from
i915_gem_object_create_from_data()'s own loop which does the same exact
thing. Perhaps its time for a helper on mm/filemap.c with an export so
if a bug is fixed in one place its fixed in both places.

> +	return 0;
> +fail:
> +	return err;
> +}
> +
> +static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)

The function name umh_pipe_setup() is also used on fs/coredump.c, with the same
prototype, perhaps rename that before we take this on, even if both are static.

> +{
> +	struct umh_info *umh_info = info->data;
> +	struct file *from_umh[2];
> +	struct file *to_umh[2];
> +	int err;
> +
> +	/* create pipe to send data to umh */
> +	err = create_pipe_files(to_umh, 0);
> +	if (err)
> +		return err;
> +	err = replace_fd(0, to_umh[0], 0);
> +	fput(to_umh[0]);
> +	if (err < 0) {
> +		fput(to_umh[1]);
> +		return err;
> +	}
> +
> +	/* create pipe to receive data from umh */
> +	err = create_pipe_files(from_umh, 0);
> +	if (err) {
> +		fput(to_umh[1]);
> +		replace_fd(0, NULL, 0);
> +		return err;
> +	}
> +	err = replace_fd(1, from_umh[1], 0);
> +	fput(from_umh[1]);
> +	if (err < 0) {
> +		fput(to_umh[1]);
> +		replace_fd(0, NULL, 0);
> +		fput(from_umh[0]);
> +		return err;
> +	}
> +
> +	umh_info->pipe_to_umh = to_umh[1];
> +	umh_info->pipe_from_umh = from_umh[0];
> +	return 0;
> +}
> +
> +static void umh_save_pid(struct subprocess_info *info)
> +{
> +	struct umh_info *umh_info = info->data;
> +
> +	umh_info->pid = info->pid;
> +}
> +
> +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)

Please use umh_fork_blob()

> +{
> +	struct subprocess_info *sub_info;
> +	struct file *file = NULL;
> +	int err;
> +
> +	err = alloc_tmpfs_file(len, &file);
> +	if (err)
> +		return err;
> +
> +	err = populate_file(file, data, len);
> +	if (err)
> +		goto out;
> +
> +	err = -ENOMEM;
> +	sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
> +						  umh_save_pid, info);
> +	if (!sub_info)
> +		goto out;
> +
> +	err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);

Alright, neat, so to be clear, we're just glad to try inception, we have no
clue or idea what the real return value would be, its up to the caller to track
the progress somehow?

Can you add a kdoc entry for this and clarify requirements?

Also, can you extend lib/test_kmod.c with a test case for this with its own
demo and try to blow it up?

I hadn't tried suspend/resume during a kmod test, but since we're using a
kernel_thread() I wouldn't be surprised if we barf while stress testing the
module loader. Its surely a corner case, but better mention that now than cry
later if we get heavy umh modules and all of a sudden we start using this for
whatever reason close to suspend.

  Luis

> +out:
> +	fput(file);
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(fork_usermode_blob);
> +
>  /**
>   * call_usermodehelper_exec - start a usermode application
>   * @sub_info: information about the subprocessa
> -- 
> 2.9.5


-- 
Do not panic

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
  2018-05-04 19:56   ` [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper Luis R. Rodriguez
@ 2018-05-05  1:37     ` Alexei Starovoitov
  2018-05-07 18:39       ` Luis R. Rodriguez
  2018-05-10 22:27     ` Kees Cook
  1 sibling, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2018-05-05  1:37 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, davem, daniel, torvalds, gregkh, luto,
	netdev, linux-kernel, kernel-team, Al Viro, David Howells,
	Mimi Zohar, Kees Cook, Andrew Morton, Dominik Brodowski,
	Hugh Dickins, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Rafael J. Wysocki, Linux FS Devel, pjones, mjg59,
	linux-security-module, linux-integrity

On Fri, May 04, 2018 at 07:56:43PM +0000, Luis R. Rodriguez wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.

sure.

> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> > Introduce helper:
> > int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> > struct umh_info {
> >        struct file *pipe_to_umh;
> >        struct file *pipe_from_umh;
> >        pid_t pid;
> > };
> > 
> > that GPLed kernel modules (signed or unsigned) can use it to execute part
> > of its own data as swappable user mode process.
> > 
> > The kernel will do:
> > - mount "tmpfs"
> 
> Actually its a *shared* vfsmount tmpfs for all umh blobs.

yep

> > - allocate a unique file in tmpfs
> > - populate that file with [data, data + len] bytes
> > - user-mode-helper code will do_execve that file and, before the process
> >   starts, the kernel will create two unix pipes for bidirectional
> >   communication between kernel module and umh
> > - close tmpfs file, effectively deleting it
> > - the fork_usermode_blob will return zero on success and populate
> >   'struct umh_info' with two unix pipes and the pid of the user process
> 
> But since its using UMH_WAIT_EXEC, all we can guarantee currently is the
> inception point was intended, well though out, and will run, but the return
> value in no way reflects the success or not of the execution. More below.

yep

> > As the first step in the development of the bpfilter project
> > the fork_usermode_blob() helper is introduced to allow user mode code
> > to be invoked from a kernel module. The idea is that user mode code plus
> > normal kernel module code are built as part of the kernel build
> > and installed as traditional kernel module into distro specified location,
> > such that from a distribution point of view, there is
> > no difference between regular kernel modules and kernel modules + umh code.
> > Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
> > by a kernel module doesn't make it any special from kernel and user space
> > tooling point of view.
> > 
> > Such approach enables kernel to delegate functionality traditionally done
> > by the kernel modules into the user space processes (either root or !root) and
> > reduces security attack surface of the new code. The buggy umh code would crash
> > the user process, but not the kernel. Another advantage is that umh code
> > of the kernel module can be debugged and tested out of user space
> > (e.g. opening the possibility to run clang sanitizers, fuzzers or
> > user space test suites on the umh code).
> > In case of the bpfilter project such architecture allows complex control plane
> > to be done in the user space while bpf based data plane stays in the kernel.
> > 
> > Since umh can crash, can be oom-ed by the kernel, killed by the admin,
> > the kernel module that uses them (like bpfilter) needs to manage life
> > time of umh on its own via two unix pipes and the pid of umh.
> > 
> > The exit code of such kernel module should kill the umh it started,
> > so that rmmod of the kernel module will cleanup the corresponding umh.
> > Just like if the kernel module does kmalloc() it should kfree() it in the exit code.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  fs/exec.c               |  38 ++++++++---
> >  include/linux/binfmts.h |   1 +
> >  include/linux/umh.h     |  12 ++++
> >  kernel/umh.c            | 176 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  4 files changed, 215 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 183059c427b9..30a36c2a39bf 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1706,14 +1706,13 @@ static int exec_binprm(struct linux_binprm *bprm)
> >  /*
> >   * sys_execve() executes a new program.
> >   */
> > -static int do_execveat_common(int fd, struct filename *filename,
> > -			      struct user_arg_ptr argv,
> > -			      struct user_arg_ptr envp,
> > -			      int flags)
> > +static int __do_execve_file(int fd, struct filename *filename,
> > +			    struct user_arg_ptr argv,
> > +			    struct user_arg_ptr envp,
> > +			    int flags, struct file *file)
> >  {
> >  	char *pathbuf = NULL;
> >  	struct linux_binprm *bprm;
> > -	struct file *file;
> >  	struct files_struct *displaced;
> >  	int retval;
> 
> Keeping in mind a fuzzer...
> 
> Note, right below this, and not shown here in the hunk, is:
> 
>         if (IS_ERR(filename))                                                   
>                 return PTR_ERR(filename)
> >  
> > @@ -1752,7 +1751,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	check_unsafe_exec(bprm);
> >  	current->in_execve = 1;
> >  
> > -	file = do_open_execat(fd, filename, flags);
> > +	if (!file)
> > +		file = do_open_execat(fd, filename, flags);
> 
> 
> Here we now seem to allow !file and open the file with the passed fd as in
> the old days. This is an expected change.
> 
> >  	retval = PTR_ERR(file);
> >  	if (IS_ERR(file))
> >  		goto out_unmark;
> > @@ -1760,7 +1760,9 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	sched_exec();
> >  
> >  	bprm->file = file;
> > -	if (fd == AT_FDCWD || filename->name[0] == '/') {
> > +	if (!filename) {
> 
> If anything shouldn't this be:
> 
> 	if (IS_ERR(filename))

nope. it should be !filename as do_execve_file() passes NULL.
IS_ERR != IS_ERR_OR_NULL

> But, wouldn't the above first branch in the routine catch this?
> 
> > +		bprm->filename = "none";
> 
> Given this seems like a desirable branch which was tested, wonder how this
> ever got set if the above branch in the first hunk I noted hit true?
> 
> In any case, we seem to have two cases, can we rule out the exact requirements
> at the top so we can bail out with an error code if one or the other way to
> call this function does not align with expectations?

I think you're misreading the code or I don't understand the concern at all.

> > +	} else if (fd == AT_FDCWD || filename->name[0] == '/') {
> >  		bprm->filename = filename->name;
> >  	} else {
> >  		if (filename->name[0] == '\0')
> > @@ -1826,7 +1828,8 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	task_numa_free(current);
> >  	free_bprm(bprm);
> >  	kfree(pathbuf);
> > -	putname(filename);
> > +	if (filename)
> > +		putname(filename);
> >  	if (displaced)
> >  		put_files_struct(displaced);
> >  	return retval;
> > @@ -1849,10 +1852,27 @@ static int do_execveat_common(int fd, struct filename *filename,
> >  	if (displaced)
> >  		reset_files_struct(displaced);
> >  out_ret:
> > -	putname(filename);
> > +	if (filename)
> > +		putname(filename);
> >  	return retval;
> >  }
> >  
> > +static int do_execveat_common(int fd, struct filename *filename,
> 
> Further signs the filename is now optional. But I don't understand how these
> branches ever be true, but perhaps I'm missing something?
> 
> > +			      struct user_arg_ptr argv,
> > +			      struct user_arg_ptr envp,
> > +			      int flags)
> > +{
> > +	return __do_execve_file(fd, filename, argv, envp, flags, NULL);
> > +}
> > +
> > +int do_execve_file(struct file *file, void *__argv, void *__envp)
> > +{
> > +	struct user_arg_ptr argv = { .ptr.native = __argv };
> > +	struct user_arg_ptr envp = { .ptr.native = __envp };
> > +
> > +	return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
> > +}
> 
> Or maybe do the semantics expectations checks here, so we don't clutter
> do_execveat_common() with them?

specifically ?

> > +
> >  int do_execve(struct filename *filename,
> >  	const char __user *const __user *__argv,
> >  	const char __user *const __user *__envp)
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index 4955e0863b83..c05f24fac4f6 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -150,5 +150,6 @@ extern int do_execveat(int, struct filename *,
> >  		       const char __user * const __user *,
> >  		       const char __user * const __user *,
> >  		       int);
> > +int do_execve_file(struct file *file, void *__argv, void *__envp);
> >  
> >  #endif /* _LINUX_BINFMTS_H */
> > diff --git a/include/linux/umh.h b/include/linux/umh.h
> > index 244aff638220..5c812acbb80a 100644
> > --- a/include/linux/umh.h
> > +++ b/include/linux/umh.h
> > @@ -22,8 +22,10 @@ struct subprocess_info {
> >  	const char *path;
> >  	char **argv;
> >  	char **envp;
> > +	struct file *file;
> >  	int wait;
> >  	int retval;
> > +	pid_t pid;
> >  	int (*init)(struct subprocess_info *info, struct cred *new);
> >  	void (*cleanup)(struct subprocess_info *info);
> >  	void *data;
> 
> While at it, can you kdocify struct subprocess_info and add new docs for at
> least these two entires you are adding ?

Sorry 'while at it' doesn't sound as a good reason to
add kdoc now instead of later.

> > @@ -38,6 +40,16 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp,
> >  			  int (*init)(struct subprocess_info *info, struct cred *new),
> >  			  void (*cleanup)(struct subprocess_info *), void *data);
> >  
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > +			  int (*init)(struct subprocess_info *info, struct cred *new),
> > +			  void (*cleanup)(struct subprocess_info *), void *data);
> 
> Likewise but on the umc.c file.
> 
> > +struct umh_info {
> > +	struct file *pipe_to_umh;
> > +	struct file *pipe_from_umh;
> > +	pid_t pid;
> > +};
> 
> Likewise.

what 'likewise' ? The kdoc ?

> 
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> 
> Likewise but on the umc.c files.
> 
> > +
> >  extern int
> >  call_usermodehelper_exec(struct subprocess_info *info, int wait);
> >  
> > diff --git a/kernel/umh.c b/kernel/umh.c
> > index f76b3ff876cf..c3f418d7d51a 100644
> > --- a/kernel/umh.c
> > +++ b/kernel/umh.c
> > @@ -25,6 +25,8 @@
> >  #include <linux/ptrace.h>
> >  #include <linux/async.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/shmem_fs.h>
> > +#include <linux/pipe_fs_i.h>
> >  
> >  #include <trace/events/module.h>
> >  
> > @@ -97,9 +99,13 @@ static int call_usermodehelper_exec_async(void *data)
> >  
> >  	commit_creds(new);
> >  
> > -	retval = do_execve(getname_kernel(sub_info->path),
> > -			   (const char __user *const __user *)sub_info->argv,
> > -			   (const char __user *const __user *)sub_info->envp);
> > +	if (sub_info->file)
> > +		retval = do_execve_file(sub_info->file,
> > +					sub_info->argv, sub_info->envp);
> > +	else
> > +		retval = do_execve(getname_kernel(sub_info->path),
> > +				   (const char __user *const __user *)sub_info->argv,
> > +				   (const char __user *const __user *)sub_info->envp);
> >  out:
> >  	sub_info->retval = retval;
> >  	/*
> > @@ -185,6 +191,8 @@ static void call_usermodehelper_exec_work(struct work_struct *work)
> >  		if (pid < 0) {
> >  			sub_info->retval = pid;
> >  			umh_complete(sub_info);
> > +		} else {
> > +			sub_info->pid = pid;
> >  		}
> >  	}
> >  }
> > @@ -393,6 +401,168 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
> >  }
> >  EXPORT_SYMBOL(call_usermodehelper_setup);
> >  
> > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
> > +		int (*init)(struct subprocess_info *info, struct cred *new),
> > +		void (*cleanup)(struct subprocess_info *info), void *data)
> 
> Should be static, no other users outside of this file.

good catch. will change to static.

> Please use umh_setup_file().

sorry. makes no sense.
There is call_usermodehelper_setup() right above it.
call_usermodehelper_setup_file() just follows the naming convention.
If you prefer shorter names, both have to be renamed in the separate patch series.

> > +{
> > +	struct subprocess_info *sub_info;
> 
> Considering a possible fuzzer triggering random data we should probably
> return NULL early and avoid the kzalloc if:

I missing 'fuzzer' point here and earlier.
'fuzzer' cannot reach here. It's all internal api.

> 	if (!file || !init || !cleanup)
> 		return NULL;

sorry, nope. in kernel we don't do defensive programming like this.

> Is data optional? The kdoc could clarify this.

No. Should be obvious from this patch.
The only caller of call_usermodehelper_setup_file() is fork_usermode_blob()
and it passes 'struct umh_info *info'.

> 
> > +
> > +	sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
> > +	if (!sub_info)
> > +		return NULL;
> > +
> > +	INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
> > +	sub_info->path = "none";
> > +	sub_info->file = file;
> > +	sub_info->init = init;
> > +	sub_info->cleanup = cleanup;
> > +	sub_info->data = data;
> > +	return sub_info;
> > +}
> > +
> > +static struct vfsmount *umh_fs;
> > +
> > +static int init_tmpfs(void)
> 
> Please use umh_init_tmpfs(). 

ok

> Also see init/main.c do_basic_setup() which calls
> usermodehelper_enable() prior to do_initcalls(). Now, fortunately TMPFS is only
> bool, saving us from some races and we do call tmpfs's init first shmem_init():
> 
> static void __init do_basic_setup(void)
> {
> 	cpuset_init_smp();
> 	shmem_init();
> 	driver_init();
> 	init_irq_proc();
> 	do_ctors();
> 	usermodehelper_enable();
> 	do_initcalls();
> }
> 
> But it also means we're enabling your new call call fork_usermode_blob() on
> early init code even if we're not setup. Since this umh tmpfs vfsmount is
> shared I'd say just call this init right before usermodehelper_enable()
> on do_basic_setup().

Not following.
Why init_tmpfs() should be called by __init function?
Are you saying make 'static struct vfsmount *shm_mnt;'
global and use it here? so no init_tmpfs() necessary?
I think that can work, but feels that having two
tmpfs mounts (one for shmem and one for umh) is cleaner.

> 
> > +{
> > +	struct file_system_type *type;
> > +
> > +	if (umh_fs)
> > +		return 0;
> > +	type = get_fs_type("tmpfs");
> > +	if (!type)
> > +		return -ENODEV;
> > +	umh_fs = kern_mount(type);
> > +	if (IS_ERR(umh_fs)) {
> > +		int err = PTR_ERR(umh_fs);
> > +
> > +		put_filesystem(type);
> > +		umh_fs = NULL;
> > +		return err;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int alloc_tmpfs_file(size_t size, struct file **filp)
> 
> Please use umh_alloc_tmpfs_file()

ok

> > +{
> > +	struct file *file;
> > +	int err;
> > +
> > +	err = init_tmpfs();
> > +	if (err)
> > +		return err;
> > +	file = shmem_file_setup_with_mnt(umh_fs, "umh", size, VM_NORESERVE);
> > +	if (IS_ERR(file))
> > +		return PTR_ERR(file);
> > +	*filp = file;
> > +	return 0;
> > +}
> > +
> > +static int populate_file(struct file *file, const void *data, size_t size)
> 
> Please use umh_populate_file()

ok

> > +{
> > +	size_t offset = 0;
> > +	int err;
> > +
> > +	do {
> > +		unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> > +		struct page *page;
> > +		void *pgdata, *vaddr;
> > +
> > +		err = pagecache_write_begin(file, file->f_mapping, offset, len,
> > +					    0, &page, &pgdata);
> > +		if (err < 0)
> > +			goto fail;
> > +
> > +		vaddr = kmap(page);
> > +		memcpy(vaddr, data, len);
> > +		kunmap(page);
> > +
> > +		err = pagecache_write_end(file, file->f_mapping, offset, len,
> > +					  len, page, pgdata);
> > +		if (err < 0)
> > +			goto fail;
> > +
> > +		size -= len;
> > +		data += len;
> > +		offset += len;
> > +	} while (size);
> 
> Character for character, this looks like a wonderful copy and paste from
> i915_gem_object_create_from_data()'s own loop which does the same exact
> thing. Perhaps its time for a helper on mm/filemap.c with an export so
> if a bug is fixed in one place its fixed in both places.

yes, of course, but not right now.
Once it all lands that will be the time to create common helper.
It's not a good idea to mess with i915 in one patch set.

> > +	return 0;
> > +fail:
> > +	return err;
> > +}
> > +
> > +static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> 
> The function name umh_pipe_setup() is also used on fs/coredump.c, with the same
> prototype, perhaps rename that before we take this on, even if both are static.

hmm. why?
These are two static functions with the same name, so?
tags get confusing?

> > +{
> > +	struct umh_info *umh_info = info->data;
> > +	struct file *from_umh[2];
> > +	struct file *to_umh[2];
> > +	int err;
> > +
> > +	/* create pipe to send data to umh */
> > +	err = create_pipe_files(to_umh, 0);
> > +	if (err)
> > +		return err;
> > +	err = replace_fd(0, to_umh[0], 0);
> > +	fput(to_umh[0]);
> > +	if (err < 0) {
> > +		fput(to_umh[1]);
> > +		return err;
> > +	}
> > +
> > +	/* create pipe to receive data from umh */
> > +	err = create_pipe_files(from_umh, 0);
> > +	if (err) {
> > +		fput(to_umh[1]);
> > +		replace_fd(0, NULL, 0);
> > +		return err;
> > +	}
> > +	err = replace_fd(1, from_umh[1], 0);
> > +	fput(from_umh[1]);
> > +	if (err < 0) {
> > +		fput(to_umh[1]);
> > +		replace_fd(0, NULL, 0);
> > +		fput(from_umh[0]);
> > +		return err;
> > +	}
> > +
> > +	umh_info->pipe_to_umh = to_umh[1];
> > +	umh_info->pipe_from_umh = from_umh[0];
> > +	return 0;
> > +}
> > +
> > +static void umh_save_pid(struct subprocess_info *info)
> > +{
> > +	struct umh_info *umh_info = info->data;
> > +
> > +	umh_info->pid = info->pid;
> > +}
> > +
> > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> 
> Please use umh_fork_blob()

sorry, no. fork_usermode_blob() is much more descriptive name.

> > +{
> > +	struct subprocess_info *sub_info;
> > +	struct file *file = NULL;
> > +	int err;
> > +
> > +	err = alloc_tmpfs_file(len, &file);
> > +	if (err)
> > +		return err;
> > +
> > +	err = populate_file(file, data, len);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = -ENOMEM;
> > +	sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
> > +						  umh_save_pid, info);
> > +	if (!sub_info)
> > +		goto out;
> > +
> > +	err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
> 
> Alright, neat, so to be clear, we're just glad to try inception, we have no
> clue or idea what the real return value would be, its up to the caller to track
> the progress somehow?

yep.

> Can you add a kdoc entry for this and clarify requirements?

ok. I'll add a comment to this helper.

> Also, can you extend lib/test_kmod.c with a test case for this with its own
> demo and try to blow it up?

in what sense? bpfilter is the test and the driving component for it.
I'm expecting that folks who want to use this helper to do usb drivers
in user space may want to extend this helper further, but that's their job.

> I hadn't tried suspend/resume during a kmod test, but since we're using a
> kernel_thread() I wouldn't be surprised if we barf while stress testing the
> module loader. Its surely a corner case, but better mention that now than cry
> later if we get heavy umh modules and all of a sudden we start using this for
> whatever reason close to suspend.

folks that care about suspend/resume should do that.
I'm happy to gate this helper for !CONFIG_SUSPEND, since I have
no idea what issues can be uncovered, how to fix them and no desire to do so.

Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
  2018-05-05  1:37     ` Alexei Starovoitov
@ 2018-05-07 18:39       ` Luis R. Rodriguez
  2018-05-09  2:25         ` Alexei Starovoitov
  0 siblings, 1 reply; 8+ messages in thread
From: Luis R. Rodriguez @ 2018-05-07 18:39 UTC (permalink / raw)
  To: Alexei Starovoitov, Eric Paris, Matthew Auld, Josh Triplett,
	Kirill A. Shutemov, Joonas Lahtinen, Chris Wilson,
	Stephen Smalley, Eric W. Biederman, Mimi Zohar
  Cc: Luis R. Rodriguez, Alexei Starovoitov, davem, daniel, torvalds,
	gregkh, luto, netdev, linux-kernel, kernel-team, Al Viro,
	David Howells, Kees Cook, Andrew Morton, Dominik Brodowski,
	Hugh Dickins, Jann Horn, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Rafael J. Wysocki, Linux FS Devel,
	pjones, mjg59, linux-security-module, linux-integrity

On Fri, May 04, 2018 at 06:37:11PM -0700, Alexei Starovoitov wrote:
> On Fri, May 04, 2018 at 07:56:43PM +0000, Luis R. Rodriguez wrote:
> > What a mighty short list of reviewers. Adding some more. My review below.
> > I'd appreciate a Cc on future versions of these patches.
> 
> sure.
> 
> > On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> > > Introduce helper:
> > > int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> > > struct umh_info {
> > >        struct file *pipe_to_umh;
> > >        struct file *pipe_from_umh;
> > >        pid_t pid;
> > > };
> > > 
> > > that GPLed kernel modules (signed or unsigned) can use it to execute part
> > > of its own data as swappable user mode process.
> > > 
> > > The kernel will do:
> > > - mount "tmpfs"
> > 
> > Actually its a *shared* vfsmount tmpfs for all umh blobs.
> 
> yep

OK just note CONFIG_TMPFS can be disabled, and likewise for CONFIG_SHMEM,
in which case tmpfs and shmem are replaced by a simple ramfs code, more
appropriate for systems without swap.

> > > +static struct vfsmount *umh_fs;
> > > +
> > > +static int init_tmpfs(void)
> > 
> > Please use umh_init_tmpfs(). 
> 
> ok
> 
> > Also see init/main.c do_basic_setup() which calls
> > usermodehelper_enable() prior to do_initcalls(). Now, fortunately TMPFS is only
> > bool, saving us from some races and we do call tmpfs's init first shmem_init():
> > 
> > static void __init do_basic_setup(void)
> > {
> > 	cpuset_init_smp();
> > 	shmem_init();
> > 	driver_init();
> > 	init_irq_proc();
> > 	do_ctors();
> > 	usermodehelper_enable();
> > 	do_initcalls();
> > }
> > 
> > But it also means we're enabling your new call call fork_usermode_blob() on
> > early init code even if we're not setup. Since this umh tmpfs vfsmount is
> > shared I'd say just call this init right before usermodehelper_enable()
> > on do_basic_setup().
> 
> Not following.
> Why init_tmpfs() should be called by __init function?

Nope, not at all, I was suggesting:

diff --git a/init/main.c b/init/main.c
index 0697284a28ee..67a48fbd96ca 100644
--- a/init/main.c
+++ b/init/main.c
@@ -973,6 +973,7 @@ static void __init do_basic_setup(void)
 	driver_init();
 	init_irq_proc();
 	do_ctors();
+	umh_init_tmpfs();
 	usermodehelper_enable();
 	do_initcalls();
 }

Mainly to avoid the locking situation Jann Horn noted, and also provide
proper kernel ordering expectations.

> Are you saying make 'static struct vfsmount *shm_mnt;'
> global and use it here? so no init_tmpfs() necessary?
> I think that can work, but feels that having two
> tmpfs mounts (one for shmem and one for umh) is cleaner.

No, but now that you mention it... if a shared vfsmount is not used the
/sys/kernel/mm/transparent_hugepage/shmem_enabled knob for using huge pages
would not be followed for umh modules. For the i915 driver this was *why*
they ended up adding shmem_file_setup_with_mnt(), they wanted huge pages to
support huge-gtt-pages. What is the rationale behind umh.c using it for
umh modules?

Users of shmem_kernel_file_setup() spawned later out of the desire to
*avoid* LSMs since it didn't make sense in their case as their inodes
are never exposed to userspace. Such is the case for ipc/shm.c and
security/keys/big_key.c. Refer to commit c7277090927a5 ("security: shmem:
implement kernel private shmem inodes") and then commit e1832f2923ec9
("ipc: use private shmem or hugetlbfs inodes for shm segments").

In this new umh usermode modules case we are:

  a) actually mapping data already extracted by the kernel somehow from
     a file somehow, presumably from /lib/modules/ path somewhere, but
     again this is not visible to umc.c, as it just gets called with:

	fork_usermode_blob(void *data, size_t len, struct umh_info *info)

  b) Creating the respective tmpfs file with shmem_file_setup_with_mnt()
     with our on our own shared struct vfsmount *umh_fs.

  c) Populating the file created and stuffing it with our data passed

  d) Calling do_execve_file() on it.

Its not clear to me why you used shmem_file_setup_with_mnt() in this case. What
are the gains? It would make sense to use shmem_kernel_file_setup() to avoid an
LSM check on step b) *but* only if we already had a proper LSM check on step
a).

I checked how you use fork_usermode_blob() in a) and in this case the kernel
module bpfilter would be loaded first, and for that we already have an LSM
check / hook for that module. From my review then, the magic done on your
second patch to stuff the userspace application into the module should be
irrelevant to us from an LSM perspective.

So, I can see a reason to use shmem_kernel_file_setup() but not I cannot
see a reason to be using      shmem_file_setup_with_mnt() at the moment.

I Cc'd tmpfs and LSM folks for further feedback.

> > > +{
> > > +	size_t offset = 0;
> > > +	int err;
> > > +
> > > +	do {
> > > +		unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> > > +		struct page *page;
> > > +		void *pgdata, *vaddr;
> > > +
> > > +		err = pagecache_write_begin(file, file->f_mapping, offset, len,
> > > +					    0, &page, &pgdata);
> > > +		if (err < 0)
> > > +			goto fail;
> > > +
> > > +		vaddr = kmap(page);
> > > +		memcpy(vaddr, data, len);
> > > +		kunmap(page);
> > > +
> > > +		err = pagecache_write_end(file, file->f_mapping, offset, len,
> > > +					  len, page, pgdata);
> > > +		if (err < 0)
> > > +			goto fail;
> > > +
> > > +		size -= len;
> > > +		data += len;
> > > +		offset += len;
> > > +	} while (size);
> > 
> > Character for character, this looks like a wonderful copy and paste from
> > i915_gem_object_create_from_data()'s own loop which does the same exact
> > thing. Perhaps its time for a helper on mm/filemap.c with an export so
> > if a bug is fixed in one place its fixed in both places.
> 
> yes, of course, but not right now.
> Once it all lands that will be the time to create common helper.
> It's not a good idea to mess with i915 in one patch set.

Either way works with me, so long as its done.

> > > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> > 
> > Please use umh_fork_blob()
> 
> sorry, no. fork_usermode_blob() is much more descriptive name.

Prefixing new umh.c symbols with umh_*() makes it very clear this came from
kernel/umh.c functionality. I've been dealing with other places in the kernel
that have conflated their own use of kernel/umh.c functionality what they
expose in both code and documentation, and correcting this has taken a long
time. Best avoid future confusion and be consistent with new exported symbols
for umc.c functionality.

Also, descriptive as fork_usermode_blob() may seem there is a possible future
clash with a more generic call.

> > Also, can you extend lib/test_kmod.c with a test case for this with its own
> > demo and try to blow it up?
> 
> in what sense? bpfilter is the test and the driving component for it.

That's the thing, it shouldn't be.

We are adding *new* functionality here, I don't want to require enabling
bpfitler or its dependencies to test generic umh user module loading
functionality.  For instance, we have lib/test_module.c to help test generic
module loading, regardless of the functionality or requirements for other
modules.

> I'm expecting that folks who want to use this helper to do usb drivers
> in user space may want to extend this helper further, but that's their job.

I don't even want to test USB, I am just interesting in the *very* *very*
basic aspects of it. A simple lib/test_umh_module.c would do with a respective
check that its loaded, given this is a requirement from the API. It also helps
folks understand the core basic knobs without having to look at bfilter code.

If we're going to get this merged I'd be interested in doing ongoing testing
with 0day with  simple UMH module with and without CONFIG_SHMEM for instance
and check that it works in both cases.

Testing this may seem irrelevant to you but keep in mind we're also already
testing just general kernel module loading. As silly as it may seem, adding new
functionality and a respective test case lets us try to avoid regressions, and
provide small unit tests to help reproduce issues and corner case situations
you may not be considering.

  Luis

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
       [not found] ` <20180503043604.1604587-3-ast@kernel.org>
@ 2018-05-07 18:51   ` Luis R. Rodriguez
  2018-05-09  2:29     ` Alexei Starovoitov
  0 siblings, 1 reply; 8+ messages in thread
From: Luis R. Rodriguez @ 2018-05-07 18:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, torvalds, gregkh, luto, netdev, linux-kernel,
	kernel-team, Juergen Gross, Eric Paris, Matthew Auld,
	Josh Triplett, Kirill A. Shutemov, Joonas Lahtinen, Chris Wilson,
	Stephen Smalley, Eric W. Biederman, Mimi Zohar, David Howells,
	Kees Cook, Andrew Morton, Dominik Brodowski, Hugh Dickins,
	Jann Horn, Jani Nikula, Rodrigo Vivi, David Airlie,
	Rafael J. Wysocki, Linux FS Devel, pjones, Michael Matz, mjg59,
	linux-security-module, linux-integrity

On Wed, May 02, 2018 at 09:36:02PM -0700, Alexei Starovoitov wrote:
> bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
> and user mode helper code that is embedded into bpfilter.ko
> 
> The steps to build bpfilter.ko are the following:
> - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
> - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
>   is converted into bpfilter_umh.o object file
>   with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
>   Example:
>   $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
>   0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
>   0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
>   0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
> - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
> 
> bpfilter_kern.c is a normal kernel module code that calls
> the fork_usermode_blob() helper to execute part of its own data
> as a user mode process.
> 
> Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> is placed into .init.rodata section, so it's freed as soon as __init
> function of bpfilter.ko is finished.
> As part of __init the bpfilter.ko does first request/reply action
> via two unix pipe provided by fork_usermode_blob() helper to
> make sure that umh is healthy. If not it will kill it via pid.

It does this very fast, right away. On a really slow system how are you sure
that this won't race and the execution of the check happens early on prior to
letting the actual setup trigger? After all, we're calling the userpsace
process in async mode. We could preempt it now.

> Later bpfilter_process_sockopt() will be called from bpfilter hooks
> in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
> 
> If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
> kill umh as well.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpfilter.h      | 15 +++++++
>  include/uapi/linux/bpfilter.h | 21 ++++++++++
>  net/Kconfig                   |  2 +
>  net/Makefile                  |  1 +
>  net/bpfilter/Kconfig          | 17 ++++++++
>  net/bpfilter/Makefile         | 24 +++++++++++
>  net/bpfilter/bpfilter_kern.c  | 93 +++++++++++++++++++++++++++++++++++++++++++
>  net/bpfilter/main.c           | 63 +++++++++++++++++++++++++++++
>  net/bpfilter/msgfmt.h         | 17 ++++++++
>  net/ipv4/Makefile             |  2 +
>  net/ipv4/bpfilter/Makefile    |  2 +
>  net/ipv4/bpfilter/sockopt.c   | 42 +++++++++++++++++++
>  net/ipv4/ip_sockglue.c        | 17 ++++++++
>  13 files changed, 316 insertions(+)
>  create mode 100644 include/linux/bpfilter.h
>  create mode 100644 include/uapi/linux/bpfilter.h
>  create mode 100644 net/bpfilter/Kconfig
>  create mode 100644 net/bpfilter/Makefile
>  create mode 100644 net/bpfilter/bpfilter_kern.c
>  create mode 100644 net/bpfilter/main.c
>  create mode 100644 net/bpfilter/msgfmt.h
>  create mode 100644 net/ipv4/bpfilter/Makefile
>  create mode 100644 net/ipv4/bpfilter/sockopt.c
> 
> diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h
> new file mode 100644
> index 000000000000..687b1760bb9f
> --- /dev/null
> +++ b/include/linux/bpfilter.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_BPFILTER_H
> +#define _LINUX_BPFILTER_H
> +
> +#include <uapi/linux/bpfilter.h>
> +
> +struct sock;
> +int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char *optval,
> +			    unsigned int optlen);
> +int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char *optval,
> +			    int *optlen);
> +extern int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
> +				       char __user *optval,
> +				       unsigned int optlen, bool is_set);
> +#endif
> diff --git a/include/uapi/linux/bpfilter.h b/include/uapi/linux/bpfilter.h
> new file mode 100644
> index 000000000000..2ec3cc99ea4c
> --- /dev/null
> +++ b/include/uapi/linux/bpfilter.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _UAPI_LINUX_BPFILTER_H
> +#define _UAPI_LINUX_BPFILTER_H
> +
> +#include <linux/if.h>
> +
> +enum {
> +	BPFILTER_IPT_SO_SET_REPLACE = 64,
> +	BPFILTER_IPT_SO_SET_ADD_COUNTERS = 65,
> +	BPFILTER_IPT_SET_MAX,
> +};
> +
> +enum {
> +	BPFILTER_IPT_SO_GET_INFO = 64,
> +	BPFILTER_IPT_SO_GET_ENTRIES = 65,
> +	BPFILTER_IPT_SO_GET_REVISION_MATCH = 66,
> +	BPFILTER_IPT_SO_GET_REVISION_TARGET = 67,
> +	BPFILTER_IPT_GET_MAX,
> +};
> +
> +#endif /* _UAPI_LINUX_BPFILTER_H */
> diff --git a/net/Kconfig b/net/Kconfig
> index b62089fb1332..ed6368b306fa 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -201,6 +201,8 @@ source "net/bridge/netfilter/Kconfig"
>  
>  endif
>  
> +source "net/bpfilter/Kconfig"
> +
>  source "net/dccp/Kconfig"
>  source "net/sctp/Kconfig"
>  source "net/rds/Kconfig"
> diff --git a/net/Makefile b/net/Makefile
> index a6147c61b174..7f982b7682bd 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_TLS)		+= tls/
>  obj-$(CONFIG_XFRM)		+= xfrm/
>  obj-$(CONFIG_UNIX)		+= unix/
>  obj-$(CONFIG_NET)		+= ipv6/
> +obj-$(CONFIG_BPFILTER)		+= bpfilter/
>  obj-$(CONFIG_PACKET)		+= packet/
>  obj-$(CONFIG_NET_KEY)		+= key/
>  obj-$(CONFIG_BRIDGE)		+= bridge/
> diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
> new file mode 100644
> index 000000000000..782a732b9a5c
> --- /dev/null
> +++ b/net/bpfilter/Kconfig
> @@ -0,0 +1,17 @@
> +menuconfig BPFILTER
> +	bool "BPF based packet filtering framework (BPFILTER)"
> +	default n
> +	depends on NET && BPF
> +	help
> +	  This builds experimental bpfilter framework that is aiming to
> +	  provide netfilter compatible functionality via BPF
> +
> +if BPFILTER
> +config BPFILTER_UMH
> +	tristate "bpftiler kernel module with user mode helper"
> +	default m
> +	depends on m
> +	help
> +	  This builds bpfilter kernel module with embedded user mode helper
> +endif
> +
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> new file mode 100644
> index 000000000000..897eedae523e
> --- /dev/null
> +++ b/net/bpfilter/Makefile
> @@ -0,0 +1,24 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for the Linux BPFILTER layer.
> +#
> +
> +hostprogs-y := bpfilter_umh
> +bpfilter_umh-objs := main.o
> +HOSTCFLAGS += -I. -Itools/include/
> +
> +# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> +# inside bpfilter_umh.o elf file referenced by
> +# _binary_net_bpfilter_bpfilter_umh_start symbol
> +# which bpfilter_kern.c passes further into umh blob loader at run-time
> +quiet_cmd_copy_umh = GEN $@
> +      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> +      $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
> +      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> +      --rename-section .data=.init.rodata $< $@

Cool, but so our expectation is that the compiler sets this symbol, how
are we sure it will always be set?

> +
> +$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
> +	$(call cmd,copy_umh)
> +
> +obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
> +bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
> new file mode 100644
> index 000000000000..e0a6fdd5842b
> --- /dev/null
> +++ b/net/bpfilter/bpfilter_kern.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/umh.h>
> +#include <linux/bpfilter.h>
> +#include <linux/sched.h>
> +#include <linux/sched/signal.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include "msgfmt.h"
> +
> +#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
> +#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
> +
> +extern char UMH_start;
> +extern char UMH_end;
> +
> +static struct umh_info info;
> +
> +static void shutdown_umh(struct umh_info *info)
> +{
> +	struct task_struct *tsk;
> +
> +	tsk = pid_task(find_vpid(info->pid), PIDTYPE_PID);
> +	if (tsk)
> +		force_sig(SIGKILL, tsk);
> +	fput(info->pipe_to_umh);
> +	fput(info->pipe_from_umh);
> +}
> +
> +static void stop_umh(void)
> +{
> +	if (bpfilter_process_sockopt) {
> +		bpfilter_process_sockopt = NULL;
> +		shutdown_umh(&info);
> +	}
> +}
> +
> +static int __bpfilter_process_sockopt(struct sock *sk, int optname,
> +				      char __user *optval,
> +				      unsigned int optlen, bool is_set)
> +{
> +	struct mbox_request req;
> +	struct mbox_reply reply;
> +	loff_t pos;
> +	ssize_t n;
> +
> +	req.is_set = is_set;
> +	req.pid = current->pid;
> +	req.cmd = optname;
> +	req.addr = (long)optval;
> +	req.len = optlen;
> +	n = __kernel_write(info.pipe_to_umh, &req, sizeof(req), &pos);
> +	if (n != sizeof(req)) {
> +		pr_err("write fail %zd\n", n);
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	pos = 0;
> +	n = kernel_read(info.pipe_from_umh, &reply, sizeof(reply), &pos);
> +	if (n != sizeof(reply)) {
> +		pr_err("read fail %zd\n", n);
> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	return reply.status;
> +}
> +
> +static int __init load_umh(void)
> +{
> +	int err;
> +
> +	err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
> +	if (err)
> +		return err;
> +	pr_info("Loaded umh pid %d\n", info.pid);
> +	bpfilter_process_sockopt = &__bpfilter_process_sockopt;
> +
> +	if (__bpfilter_process_sockopt(NULL, 0, 0, 0, 0) != 0) {

See, here, what if the userspace process gets preemtped and we run this
check afterwards? Is that possible?

  Luis

> +		stop_umh();
> +		return -EFAULT;
> +	}
> +	return 0;
> +}
> +
> +static void __exit fini_umh(void)
> +{
> +	stop_umh();
> +}
> +module_init(load_umh);
> +module_exit(fini_umh);
> +MODULE_LICENSE("GPL");
> diff --git a/net/bpfilter/main.c b/net/bpfilter/main.c
> new file mode 100644
> index 000000000000..81bbc1684896
> --- /dev/null
> +++ b/net/bpfilter/main.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <sys/uio.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <sys/socket.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include "include/uapi/linux/bpf.h"
> +#include <asm/unistd.h>
> +#include "msgfmt.h"
> +
> +int debug_fd;
> +
> +static int handle_get_cmd(struct mbox_request *cmd)
> +{
> +	switch (cmd->cmd) {
> +	case 0:
> +		return 0;
> +	default:
> +		break;
> +	}
> +	return -ENOPROTOOPT;
> +}
> +
> +static int handle_set_cmd(struct mbox_request *cmd)
> +{
> +	return -ENOPROTOOPT;
> +}
> +
> +static void loop(void)
> +{
> +	while (1) {
> +		struct mbox_request req;
> +		struct mbox_reply reply;
> +		int n;
> +
> +		n = read(0, &req, sizeof(req));
> +		if (n != sizeof(req)) {
> +			dprintf(debug_fd, "invalid request %d\n", n);
> +			return;
> +		}
> +
> +		reply.status = req.is_set ?
> +			handle_set_cmd(&req) :
> +			handle_get_cmd(&req);
> +
> +		n = write(1, &reply, sizeof(reply));
> +		if (n != sizeof(reply)) {
> +			dprintf(debug_fd, "reply failed %d\n", n);
> +			return;
> +		}
> +	}
> +}
> +
> +int main(void)
> +{
> +	debug_fd = open("/dev/console", 00000002 | 00000100);
> +	dprintf(debug_fd, "Started bpfilter\n");
> +	loop();
> +	close(debug_fd);
> +	return 0;
> +}
> diff --git a/net/bpfilter/msgfmt.h b/net/bpfilter/msgfmt.h
> new file mode 100644
> index 000000000000..94b9ac9e5114
> --- /dev/null
> +++ b/net/bpfilter/msgfmt.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _NET_BPFILTER_MSGFMT_H
> +#define _NET_BPFTILER_MSGFMT_H
> +
> +struct mbox_request {
> +	__u64 addr;
> +	__u32 len;
> +	__u32 is_set;
> +	__u32 cmd;
> +	__u32 pid;
> +};
> +
> +struct mbox_reply {
> +	__u32 status;
> +};
> +
> +#endif
> diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
> index b379520f9133..7018f91c5a39 100644
> --- a/net/ipv4/Makefile
> +++ b/net/ipv4/Makefile
> @@ -16,6 +16,8 @@ obj-y     := route.o inetpeer.o protocol.o \
>  	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
>  	     metrics.o
>  
> +obj-$(CONFIG_BPFILTER) += bpfilter/
> +
>  obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
>  obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
>  obj-$(CONFIG_PROC_FS) += proc.o
> diff --git a/net/ipv4/bpfilter/Makefile b/net/ipv4/bpfilter/Makefile
> new file mode 100644
> index 000000000000..ce262d76cc48
> --- /dev/null
> +++ b/net/ipv4/bpfilter/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_BPFILTER) += sockopt.o
> +
> diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c
> new file mode 100644
> index 000000000000..42a96d2d8d05
> --- /dev/null
> +++ b/net/ipv4/bpfilter/sockopt.c
> @@ -0,0 +1,42 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/uaccess.h>
> +#include <linux/bpfilter.h>
> +#include <uapi/linux/bpf.h>
> +#include <linux/wait.h>
> +#include <linux/kmod.h>
> +
> +int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
> +				char __user *optval,
> +				unsigned int optlen, bool is_set);
> +EXPORT_SYMBOL_GPL(bpfilter_process_sockopt);
> +
> +int bpfilter_mbox_request(struct sock *sk, int optname, char __user *optval,
> +			  unsigned int optlen, bool is_set)
> +{
> +	if (!bpfilter_process_sockopt) {
> +		int err = request_module("bpfilter");
> +
> +		if (err)
> +			return err;
> +		if (!bpfilter_process_sockopt)
> +			return -ECHILD;
> +	}
> +	return bpfilter_process_sockopt(sk, optname, optval, optlen, is_set);
> +}
> +
> +int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval,
> +			    unsigned int optlen)
> +{
> +	return bpfilter_mbox_request(sk, optname, optval, optlen, true);
> +}
> +
> +int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval,
> +			    int __user *optlen)
> +{
> +	int len;
> +
> +	if (get_user(len, optlen))
> +		return -EFAULT;
> +
> +	return bpfilter_mbox_request(sk, optname, optval, len, false);
> +}
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 5ad2d8ed3a3f..e0791faacb24 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -47,6 +47,8 @@
>  #include <linux/errqueue.h>
>  #include <linux/uaccess.h>
>  
> +#include <linux/bpfilter.h>
> +
>  /*
>   *	SOL_IP control messages.
>   */
> @@ -1244,6 +1246,11 @@ int ip_setsockopt(struct sock *sk, int level,
>  		return -ENOPROTOOPT;
>  
>  	err = do_ip_setsockopt(sk, level, optname, optval, optlen);
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_SET_REPLACE &&
> +	    optname < BPFILTER_IPT_SET_MAX)
> +		err = bpfilter_ip_set_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
> @@ -1552,6 +1559,11 @@ int ip_getsockopt(struct sock *sk, int level,
>  	int err;
>  
>  	err = do_ip_getsockopt(sk, level, optname, optval, optlen, 0);
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_GET_INFO &&
> +	    optname < BPFILTER_IPT_GET_MAX)
> +		err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
> @@ -1584,6 +1596,11 @@ int compat_ip_getsockopt(struct sock *sk, int level, int optname,
>  	err = do_ip_getsockopt(sk, level, optname, optval, optlen,
>  		MSG_CMSG_COMPAT);
>  
> +#ifdef CONFIG_BPFILTER
> +	if (optname >= BPFILTER_IPT_SO_GET_INFO &&
> +	    optname < BPFILTER_IPT_GET_MAX)
> +		err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
> +#endif
>  #ifdef CONFIG_NETFILTER
>  	/* we need to exclude all possible ENOPROTOOPTs except default case */
>  	if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
> -- 
> 2.9.5

-- 
Do not panic

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
  2018-05-07 18:39       ` Luis R. Rodriguez
@ 2018-05-09  2:25         ` Alexei Starovoitov
  0 siblings, 0 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2018-05-09  2:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Eric Paris, Matthew Auld, Josh Triplett, Kirill A. Shutemov,
	Joonas Lahtinen, Chris Wilson, Stephen Smalley,
	Eric W. Biederman, Mimi Zohar, Alexei Starovoitov, davem, daniel,
	torvalds, gregkh, luto, netdev, linux-kernel, kernel-team,
	Al Viro, David Howells, Kees Cook, Andrew Morton,
	Dominik Brodowski, Hugh Dickins, Jann Horn, Jani Nikula,
	Rodrigo Vivi, David Airlie, Rafael J. Wysocki, Linux FS Devel,
	pjones, mjg59, linux-security-module, linux-integrity

On Mon, May 07, 2018 at 06:39:31PM +0000, Luis R. Rodriguez wrote:
> 
> > Are you saying make 'static struct vfsmount *shm_mnt;'
> > global and use it here? so no init_tmpfs() necessary?
> > I think that can work, but feels that having two
> > tmpfs mounts (one for shmem and one for umh) is cleaner.
> 
> No, but now that you mention it... if a shared vfsmount is not used the
> /sys/kernel/mm/transparent_hugepage/shmem_enabled knob for using huge pages
> would not be followed for umh modules. For the i915 driver this was *why*
> they ended up adding shmem_file_setup_with_mnt(), they wanted huge pages to
> support huge-gtt-pages. What is the rationale behind umh.c using it for
> umh modules?
> 
> Users of shmem_kernel_file_setup() spawned later out of the desire to
> *avoid* LSMs since it didn't make sense in their case as their inodes
> are never exposed to userspace. Such is the case for ipc/shm.c and
> security/keys/big_key.c. Refer to commit c7277090927a5 ("security: shmem:
> implement kernel private shmem inodes") and then commit e1832f2923ec9
> ("ipc: use private shmem or hugetlbfs inodes for shm segments").
> 
> In this new umh usermode modules case we are:
> 
>   a) actually mapping data already extracted by the kernel somehow from
>      a file somehow, presumably from /lib/modules/ path somewhere, but
>      again this is not visible to umc.c, as it just gets called with:
> 
> 	fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> 
>   b) Creating the respective tmpfs file with shmem_file_setup_with_mnt()
>      with our on our own shared struct vfsmount *umh_fs.
> 
>   c) Populating the file created and stuffing it with our data passed
> 
>   d) Calling do_execve_file() on it.
> 
> Its not clear to me why you used shmem_file_setup_with_mnt() in this case. What
> are the gains? It would make sense to use shmem_kernel_file_setup() to avoid an
> LSM check on step b) *but* only if we already had a proper LSM check on step
> a).
> 
> I checked how you use fork_usermode_blob() in a) and in this case the kernel
> module bpfilter would be loaded first, and for that we already have an LSM
> check / hook for that module. From my review then, the magic done on your
> second patch to stuff the userspace application into the module should be
> irrelevant to us from an LSM perspective.
> 
> So, I can see a reason to use shmem_kernel_file_setup() but not I cannot
> see a reason to be using      shmem_file_setup_with_mnt() at the moment.

That's a good idea. I will switch to using shmem_kernel_file_setup().
I guess I can use kernel_write() as well instead of populate_file().
I wonder why i915 had to use pagecache_write_begin() and the loop
instead of kernel_write()...
The only reason to copy into tmpfs file is to make that memory swappable.
All standard shmem knobs should apply.

> > > > +{
> > > > +	size_t offset = 0;
> > > > +	int err;
> > > > +
> > > > +	do {
> > > > +		unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> > > > +		struct page *page;
> > > > +		void *pgdata, *vaddr;
> > > > +
> > > > +		err = pagecache_write_begin(file, file->f_mapping, offset, len,
> > > > +					    0, &page, &pgdata);
> > > > +		if (err < 0)
> > > > +			goto fail;
> > > > +
> > > > +		vaddr = kmap(page);
> > > > +		memcpy(vaddr, data, len);
> > > > +		kunmap(page);
> > > > +
> > > > +		err = pagecache_write_end(file, file->f_mapping, offset, len,
> > > > +					  len, page, pgdata);
> > > > +		if (err < 0)
> > > > +			goto fail;
> > > > +
> > > > +		size -= len;
> > > > +		data += len;
> > > > +		offset += len;
> > > > +	} while (size);
> > > 
> > > Character for character, this looks like a wonderful copy and paste from
> > > i915_gem_object_create_from_data()'s own loop which does the same exact
> > > thing. Perhaps its time for a helper on mm/filemap.c with an export so
> > > if a bug is fixed in one place its fixed in both places.
> > 
> > yes, of course, but not right now.
> > Once it all lands that will be the time to create common helper.
> > It's not a good idea to mess with i915 in one patch set.
> 
> Either way works with me, so long as its done.

Will be gone due to switch to kernel_write().

> 
> > > > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> > > 
> > > Please use umh_fork_blob()
> > 
> > sorry, no. fork_usermode_blob() is much more descriptive name.
> 
> Prefixing new umh.c symbols with umh_*() makes it very clear this came from
> kernel/umh.c functionality. I've been dealing with other places in the kernel
> that have conflated their own use of kernel/umh.c functionality what they
> expose in both code and documentation, and correcting this has taken a long
> time. Best avoid future confusion and be consistent with new exported symbols
> for umc.c functionality.

There is no confusion today. The most known umh api is a family of
call_usermodehelper*()
In this case it's not a 'call', it's a 'fork', since part of kernel module
being forked as user mode process.
I considered naming this function fork_usermodehelper(),
but it's also not quite correct, since 'user mode helper' has predefined
meaning of something that has the path whereas here it's a blob of bytes.
Hence fork_usermode_blob() is more accurate and semantically correct name,
whereas umh_fork_blob() is not.

Notice I no longer call these new kernel modules as 'umh modules',
since that's the wrong name for the same reasons.
They are good ol' kernel modules.
The new functionality allowed by this patch is:
forking part of kernel module data as user mode process.
A lot of umh logic is reused, but 'user mode helpers' and
'user mode blobs' are distinct kernel features.

> I don't even want to test USB, I am just interesting in the *very* *very*
> basic aspects of it. A simple lib/test_umh_module.c would do with a respective
> check that its loaded, given this is a requirement from the API. It also helps
> folks understand the core basic knobs without having to look at bfilter code.

I agree that lib/test_usermode_blob.c must be available eventually.
Right now we cannot add it to the tree, since we need to figure how interface
between kernel and usermode_blob will work based on real world use case
of bpfilter. Once it gets further along that would be the time to say:
"look, here is the test for fork_usermode_blob() and here how others (usb drivers)
can and should use it".
Today is not the right time to fix the api. Such lib/test_usermode_blob.c
would have to be constantly adjusted as we tweak bpfilter side becoming
unnecessary burden of us bpfilter developers.
Everyone really need to think of these patches as work in progress
and internal details and api of fork_usermode_blob() will change.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
  2018-05-07 18:51   ` [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module Luis R. Rodriguez
@ 2018-05-09  2:29     ` Alexei Starovoitov
  0 siblings, 0 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2018-05-09  2:29 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, davem, daniel, torvalds, gregkh, luto,
	netdev, linux-kernel, kernel-team, Juergen Gross, Eric Paris,
	Matthew Auld, Josh Triplett, Kirill A. Shutemov, Joonas Lahtinen,
	Chris Wilson, Stephen Smalley, Eric W. Biederman, Mimi Zohar,
	David Howells, Kees Cook, Andrew Morton, Dominik Brodowski,
	Hugh Dickins, Jann Horn, Jani Nikula, Rodrigo Vivi, David Airlie,
	Rafael J. Wysocki, Linux FS Devel, pjones, Michael Matz, mjg59,
	linux-security-module, linux-integrity

On Mon, May 07, 2018 at 06:51:24PM +0000, Luis R. Rodriguez wrote:
> > Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> > is placed into .init.rodata section, so it's freed as soon as __init
> > function of bpfilter.ko is finished.
> > As part of __init the bpfilter.ko does first request/reply action
> > via two unix pipe provided by fork_usermode_blob() helper to
> > make sure that umh is healthy. If not it will kill it via pid.
> 
> It does this very fast, right away. On a really slow system how are you sure
> that this won't race and the execution of the check happens early on prior to
> letting the actual setup trigger? After all, we're calling the userpsace
> process in async mode. We could preempt it now.

I don't see an issue.
the kernel synchronously writes into a pipe. User space process reads.
Exactly the same as coredump logic with pipes.

> > +# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> > +# inside bpfilter_umh.o elf file referenced by
> > +# _binary_net_bpfilter_bpfilter_umh_start symbol
> > +# which bpfilter_kern.c passes further into umh blob loader at run-time
> > +quiet_cmd_copy_umh = GEN $@
> > +      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> > +      $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
> > +      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> > +      --rename-section .data=.init.rodata $< $@
> 
> Cool, but so our expectation is that the compiler sets this symbol, how
> are we sure it will always be set?

Compiler doesn't set it. objcopy does.

> > +
> > +	if (__bpfilter_process_sockopt(NULL, 0, 0, 0, 0) != 0) {
> 
> See, here, what if the userspace process gets preemtped and we run this
> check afterwards? Is that possible?

User space is a normal task. It can sleep and can be single stepped with GDB.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
  2018-05-04 19:56   ` [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper Luis R. Rodriguez
  2018-05-05  1:37     ` Alexei Starovoitov
@ 2018-05-10 22:27     ` Kees Cook
  2018-05-10 23:16       ` Alexei Starovoitov
  1 sibling, 1 reply; 8+ messages in thread
From: Kees Cook @ 2018-05-10 22:27 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Greg KH, Andy Lutomirski, Network Development,
	LKML, kernel-team, Al Viro, David Howells, Mimi Zohar,
	Andrew Morton, Dominik Brodowski, Hugh Dickins, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, Rafael J. Wysocki,
	Linux FS Devel, Peter Jones, Matthew Garrett,
	linux-security-module, linux-integrity, Jessica Yu

On Fri, May 4, 2018 at 12:56 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.

Me too, please. And likely linux-security-module@ and Jessica too.

> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
>> Introduce helper:
>> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
>> struct umh_info {
>>        struct file *pipe_to_umh;
>>        struct file *pipe_from_umh;
>>        pid_t pid;
>> };
>>
>> that GPLed kernel modules (signed or unsigned) can use it to execute part
>> of its own data as swappable user mode process.
>>
>> The kernel will do:
>> - mount "tmpfs"
>> - allocate a unique file in tmpfs
>> - populate that file with [data, data + len] bytes
>> - user-mode-helper code will do_execve that file and, before the process
>>   starts, the kernel will create two unix pipes for bidirectional
>>   communication between kernel module and umh
>> - close tmpfs file, effectively deleting it
>> - the fork_usermode_blob will return zero on success and populate
>>   'struct umh_info' with two unix pipes and the pid of the user process

I'm trying to think how LSMs can successfully reason about the
resulting exec(). In the past, we've replaced "blob" style interfaces
with file-based interfaces (e.g. init_module() -> finit_module(),
kexec_load() -> kexec_file_load()) to better let the kernel understand
the origin of executable content. Here the intent is fine: we're
getting the exec from an already-loaded module, etc, etc. I'm trying
to think specifically about the interface.

How can the ultimate exec get tied back to the kernel module in a way
that the LSM can query? Right now the hooks hit during exec are:
kernel_read_file() and kernel_post_read_file() of tmpfs file,
bprm_set_creds(), bprm_check(), bprm_commiting_creds(),
bprm_commited_creds(). It seems silly to me for an LSM to perform
these checks at all since I would expect the _meaningful_ check to be
finit_module() of the module itself. Having a way for an LSM to know
the exec is tied to a kernel module would let them skip the nonsense
checks.

Since the process for doing the usermode_blob is defined by the kernel
module build/link/objcopy process, could we tighten the
fork_usermode_blob() interface to point to the kernel module itself,
rather than leaving it an open-ended "blob" interface? Given our
history of needing to replace blob interfaces with file interfaces,
I'm cautious to add a new blob interface. Maybe just pull all the
blob-finding/loading into the interface, and just make it something
like fork_usermode_kmod(struct module *mod, struct umh_info *info) ?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
  2018-05-10 22:27     ` Kees Cook
@ 2018-05-10 23:16       ` Alexei Starovoitov
  0 siblings, 0 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2018-05-10 23:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Luis R. Rodriguez, Alexei Starovoitov, David S. Miller,
	Daniel Borkmann, Linus Torvalds, Greg KH, Andy Lutomirski,
	Network Development, LKML, kernel-team, Al Viro, David Howells,
	Mimi Zohar, Andrew Morton, Dominik Brodowski, Hugh Dickins,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	Rafael J. Wysocki, Linux FS Devel, Peter Jones, Matthew Garrett,
	linux-security-module, linux-integrity, Jessica Yu

On Thu, May 10, 2018 at 03:27:24PM -0700, Kees Cook wrote:
> On Fri, May 4, 2018 at 12:56 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > What a mighty short list of reviewers. Adding some more. My review below.
> > I'd appreciate a Cc on future versions of these patches.
> 
> Me too, please. And likely linux-security-module@ and Jessica too.
> 
> > On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> >> Introduce helper:
> >> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> >> struct umh_info {
> >>        struct file *pipe_to_umh;
> >>        struct file *pipe_from_umh;
> >>        pid_t pid;
> >> };
> >>
> >> that GPLed kernel modules (signed or unsigned) can use it to execute part
> >> of its own data as swappable user mode process.
> >>
> >> The kernel will do:
> >> - mount "tmpfs"
> >> - allocate a unique file in tmpfs
> >> - populate that file with [data, data + len] bytes
> >> - user-mode-helper code will do_execve that file and, before the process
> >>   starts, the kernel will create two unix pipes for bidirectional
> >>   communication between kernel module and umh
> >> - close tmpfs file, effectively deleting it
> >> - the fork_usermode_blob will return zero on success and populate
> >>   'struct umh_info' with two unix pipes and the pid of the user process
> 
> I'm trying to think how LSMs can successfully reason about the
> resulting exec(). In the past, we've replaced "blob" style interfaces
> with file-based interfaces (e.g. init_module() -> finit_module(),
> kexec_load() -> kexec_file_load()) to better let the kernel understand
> the origin of executable content. Here the intent is fine: we're
> getting the exec from an already-loaded module, etc, etc. I'm trying
> to think specifically about the interface.
> 
> How can the ultimate exec get tied back to the kernel module in a way
> that the LSM can query? Right now the hooks hit during exec are:
> kernel_read_file() and kernel_post_read_file() of tmpfs file,
> bprm_set_creds(), bprm_check(), bprm_commiting_creds(),
> bprm_commited_creds(). It seems silly to me for an LSM to perform
> these checks at all since I would expect the _meaningful_ check to be
> finit_module() of the module itself. Having a way for an LSM to know
> the exec is tied to a kernel module would let them skip the nonsense
> checks.
> 
> Since the process for doing the usermode_blob is defined by the kernel
> module build/link/objcopy process, could we tighten the
> fork_usermode_blob() interface to point to the kernel module itself,
> rather than leaving it an open-ended "blob" interface? Given our
> history of needing to replace blob interfaces with file interfaces,
> I'm cautious to add a new blob interface. Maybe just pull all the
> blob-finding/loading into the interface, and just make it something
> like fork_usermode_kmod(struct module *mod, struct umh_info *info) ?

I don't think it will work, since Andy and others pointed out that
bpfilter needs to work as builtin as well. There is no 'struct module'
in such case, but fork-ing of the user process still needs to happen.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20180503043604.1604587-1-ast@kernel.org>
     [not found] ` <20180503043604.1604587-2-ast@kernel.org>
2018-05-04 19:56   ` [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper Luis R. Rodriguez
2018-05-05  1:37     ` Alexei Starovoitov
2018-05-07 18:39       ` Luis R. Rodriguez
2018-05-09  2:25         ` Alexei Starovoitov
2018-05-10 22:27     ` Kees Cook
2018-05-10 23:16       ` Alexei Starovoitov
     [not found] ` <20180503043604.1604587-3-ast@kernel.org>
2018-05-07 18:51   ` [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module Luis R. Rodriguez
2018-05-09  2:29     ` Alexei Starovoitov

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git