Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
@ 2017-12-01  0:57 Kees Cook
  2017-12-01  1:33 ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Kees Cook @ 2017-12-01  0:57 UTC (permalink / raw)
  To: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann, Pablo Neira Ayuso
  Cc: Linus Torvalds, David Miller, LKML, Network Development, Al Viro,
	Christoph Hellwig, Thomas Garnier, Jann Horn

On Mon, Oct 9, 2017 at 4:10 PM, David Miller <davem@davemloft.net> wrote:
> Shmulik Ladkani (1):
>       netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'

This adds a new user of set_fs(), which we're trying to eliminate (or
at least not expand):

+       set_fs(KERNEL_DS);
+       fd = bpf_obj_get_user(path);
+       set_fs(oldfs);

Can you please adjust this to not make set_fs() changes?

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-01  0:57 netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Kees Cook
@ 2017-12-01  1:33 ` Al Viro
  2017-12-01  3:48   ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2017-12-01  1:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann,
	Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML,
	Network Development, Christoph Hellwig, Thomas Garnier,
	Jann Horn

On Thu, Nov 30, 2017 at 04:57:30PM -0800, Kees Cook wrote:
> On Mon, Oct 9, 2017 at 4:10 PM, David Miller <davem@davemloft.net> wrote:
> > Shmulik Ladkani (1):
> >       netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
> 
> This adds a new user of set_fs(), which we're trying to eliminate (or
> at least not expand):
> 
> +       set_fs(KERNEL_DS);
> +       fd = bpf_obj_get_user(path);
> +       set_fs(oldfs);
> 
> Can you please adjust this to not make set_fs() changes?

That's not the worst problem there.  Messing with descriptor table is much
worse.  It can be shared between threads; by the time you get to fdget()
the damn thing might have nothing to do with what bpf_obj_get_user() has
put there, ditto for sys_close().

Use of file descriptors should be limited to "got a number from userland,
convert to struct file *" on the way in and "install struct file * into
descriptor table and return the descriptor to userland" on the way out.
And the latter - *ONLY* after the last possible point of failure.  Once
a file reference is inserted into descriptor table, that's it - you
can't undo that.

The only way to use bpf_obj_get_user() is to pass its return value to
userland.  As return value of syscall - not even put_user() (for that
you'd need to reserve the descriptor, copy it to userland and only
then attach struct file * to it).

The whole approach stinks - what it needs is something that would
take struct filename * and return struct bpf_prog * or struct file *
reference.  With bpf_obj_get_user() and this thing implemented
via that.

I'm looking into that thing...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-01  1:33 ` Al Viro
@ 2017-12-01  3:48   ` Al Viro
  2017-12-01  4:54     ` Al Viro
  2017-12-01 21:34     ` Daniel Borkmann
  0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2017-12-01  3:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann,
	Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML,
	Network Development, Christoph Hellwig, Thomas Garnier,
	Jann Horn

On Fri, Dec 01, 2017 at 01:33:04AM +0000, Al Viro wrote:

> Use of file descriptors should be limited to "got a number from userland,
> convert to struct file *" on the way in and "install struct file * into
> descriptor table and return the descriptor to userland" on the way out.
> And the latter - *ONLY* after the last possible point of failure.  Once
> a file reference is inserted into descriptor table, that's it - you
> can't undo that.
> 
> The only way to use bpf_obj_get_user() is to pass its return value to
> userland.  As return value of syscall - not even put_user() (for that
> you'd need to reserve the descriptor, copy it to userland and only
> then attach struct file * to it).
> 
> The whole approach stinks - what it needs is something that would
> take struct filename * and return struct bpf_prog * or struct file *
> reference.  With bpf_obj_get_user() and this thing implemented
> via that.
> 
> I'm looking into that thing...

What it tries to pull off is something not far from

static struct bpf_prog *__get_prog(struct inode *inode, enum bpf_prog_type type)
{
	struct bpf_prog *prog;
	int err = inode_permission(inode, FMODE_READ | FMODE_WRITE);
	if (err)
		return ERR_PTR(err);

	if (inode->i_op == &bpf_map_iops)
		return ERR_PTR(-EINVAL);

	if (inode->i_op != &bpf_prog_iops)
		return ERR_PTR(-EACCES);

	prog = inode->i_private;
	err = security_bpf_prog(prog);
	if (err < 0)
		return ERR_PTR(err);

	if (!bpf_prog_get_ok(prog, &type, false))
		return ERR_PTR(-EINVAL);

	return bpf_prog_inc(prog);
}

struct bpf_prog *get_prog_path_type(const char *name, enum bpf_prog_type type)
{
	struct path path;
	struct bpf_prog *prog;
	int err = kern_path(name, LOOKUP_FOLLOW, &path);
	if (err)
		return ERR_PTR(err);
	prog = __get_prog(d_backing_inode(path.dentry), type);
	if (!IS_ERR(prog))
		touch_atime(&path);
	path_put(&path);
	return prog;
}

static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
{
	*ret = get_prog_path_type(path, BPF_PROG_TYPE_SOCKET_FILTER);
        return PTR_ERR_OR_ZERO(*ret);
}

That skips all tracepoint random shite (pardon the triple redundance) and makes
a somewhat arbitrary change for touch_atime() logics.  And, of course, it is
not even compile-tested.

Something similar to get_prog_path_type() above might make for a usable
primitive, IMO...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-01  3:48   ` Al Viro
@ 2017-12-01  4:54     ` Al Viro
  2017-12-01 17:39       ` Al Viro
       [not found]       ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com>
  2017-12-01 21:34     ` Daniel Borkmann
  1 sibling, 2 replies; 12+ messages in thread
From: Al Viro @ 2017-12-01  4:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann,
	Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML,
	Network Development, Christoph Hellwig, Thomas Garnier,
	Jann Horn

On Fri, Dec 01, 2017 at 03:48:59AM +0000, Al Viro wrote:

> Something similar to get_prog_path_type() above might make for a usable
> primitive, IMO...

Incidentally, bpf_obj_get_user()/bpf_obj_do_get() should just use
user_path(), rather than wanking with getname()+kern_path(pname->name)+putname().
Note that kern_path() will do getname_kernel() to get struct pathname...

Would cause problems for tracepoints in there, though.  And that, BTW,
is precisely why I don't want tracepoints in core VFS, TYVM - makes
restructuring the code harder...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-01  4:54     ` Al Viro
@ 2017-12-01 17:39       ` Al Viro
  2017-12-01 20:47         ` Daniel Borkmann
       [not found]       ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com>
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2017-12-01 17:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann,
	Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML,
	Network Development, Christoph Hellwig, Thomas Garnier,
	Jann Horn

On Fri, Dec 01, 2017 at 04:54:39AM +0000, Al Viro wrote:
> On Fri, Dec 01, 2017 at 03:48:59AM +0000, Al Viro wrote:
> 
> > Something similar to get_prog_path_type() above might make for a usable
> > primitive, IMO...
> 
> Incidentally, bpf_obj_get_user()/bpf_obj_do_get() should just use
> user_path(), rather than wanking with getname()+kern_path(pname->name)+putname().
> Note that kern_path() will do getname_kernel() to get struct pathname...
> 
> Would cause problems for tracepoints in there, though.  And that, BTW,
> is precisely why I don't want tracepoints in core VFS, TYVM - makes
> restructuring the code harder...

Egads...  Contortions in bpf ->mknod() are really obnoxious.

First of all, it checks that ->d_fsdata is non-NULL and fails otherwise.
The only time ->d_fsdata gets non-NULL on that fs?  In bpf_obj_do_pin(), this:
        dentry->d_fsdata = raw;
        ret = vfs_mknod(dir, dentry, mode, devt);
        dentry->d_fsdata = NULL;
In other words, it's *not* going to work from normal mknod(2).  Why go through
->mknod(), then, especially since it requires that kind of contortions to
pass the data in?

devt is 0:1 or 0:2 here.  mode?  Character or block device, right?  Like hell -
it's a regular file.  And devt is a cute way to pass a flag down into bpf_mkobj()
(aka. ->mknod()) through vfs_mknod().  No, it doesn't go into ->i_rdev...
And to make the things even more fun, the damn thing is passed to a couple
of Linux S&M hooks - security_path_mknod() and security_inode_mknod().  Oh, sorry -
three hooks.  There's devcgroup_inode_mknod() as well, but that thing sees S_IFREG
in mode and buggers off quietly.  Our esteemed sadomaso^Wsecurity community gets
to play, though.  Without any way to see _what_ are we attaching to that place in
the bpf fs tree, but hey - it's security, it doesn't need to make sense...

What the hell?  If you need a clean way to do something, why don't you describe
(on fsdevel, or in off-list mail to relevant people) what do you really want?
Sure, you can "work around" anything, but doesn't that level of perversion
strike you as a clear sign of something being not right?

For crying out loud, you are trying to pass a tagged pointer to one or another
kind of object into your own function.  For that you
	* use a field in a globally visible data structure as a temporary storage
for a pointer
	* encode your tag (essentially a boolean) into a fucking _device_ _number_,
of all things, and shove it through, hoping that no LSM module gets weirded out by
non-zero device number combined with regular file for mode.

If that does not scream "wrong or missing primitive", I don't know what would.
You want something along the lines of "create a filesystem object at given
location, calling this function with this argument for actual object creation"?
Fair enough, but then let's add a primitive that would do just that.

And grepping around for similar sick tricks catches a slightly milder example -
mq_open(2) doesn't play with encoding stuff into dev_t, but otherwise it's very
similar and could also benefit from the same primitive.

How about something like this:
int vfs_mkobj(struct dentry *dentry, umode_t mode,
                int (*f)(struct dentry *, umode_t, void *),
		void *arg)
{
	struct inode *dir = dentry->d_parent->d_inode;
        int error = may_create(dir, dentry);
        if (error)
                return error;

        mode &= S_IALLUGO;
        mode |= S_IFREG;
        error = security_inode_create(dir, dentry, mode);
        if (error)
                return error;
        error = f(dentry, mode, arg);
        if (!error)
                fsnotify_create(dir, dentry);
        return error;
}

exported by fs/namei.c, with your code doing

	switch (type) {
	case BPF_TYPE_PROG:
		error = vfs_mkobj(path.dentry, mode, bpf_mkprog, raw);
		break;
	case BPF_TYPE_MAP:
		error = vfs_mkobj(path.dentry, mode, bpf_mkmap, raw);
		break;
	default:
		error = -EPERM;
	}
instead that vfs_mknod() hack, with

static int bpf_mkprog(struct inode *dir, struct dentry *dentry,
		 umode_t mode, void *raw)
{
	return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_prog_iops);
}

static int bpf_mkmap(struct inode *dir, struct dentry *dentry,
		 umode_t mode, void *raw)
{
	return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_map_iops);
}

static int bpf_mkobj_ops(struct inode *dir, struct dentry *dentry,
		 umode_t mode, void *raw, struct inode_operations *iops)
{
        struct inode *inode;

        inode = bpf_get_inode(dir->i_sb, dir, mode);
        if (IS_ERR(inode))
                return PTR_ERR(inode);

        inode->i_op = iops;
        inode->i_private = raw;

        bpf_dentry_finalize(dentry, inode, dir);
        return 0;
}

And to hell with messing with dev_t, ->d_fsdata or having ->mknod() there at all...
Might want to replace security_path_mknod() with something saner, while we are
at it.

Objections?

PS: mqueue.c would also benefit from such primitive - do_create() there would
simply pass attr as callback's argument into vfs_mkobj(), with callback being
the guts of mqueue_create()...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-01 17:39       ` Al Viro
@ 2017-12-01 20:47         ` Daniel Borkmann
  2017-12-02 18:48           ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2017-12-01 20:47 UTC (permalink / raw)
  To: Al Viro, Kees Cook
  Cc: Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso,
	Linus Torvalds, David Miller, LKML, Network Development,
	Christoph Hellwig, Thomas Garnier, Jann Horn

On 12/01/2017 06:39 PM, Al Viro wrote:
[...]
> If that does not scream "wrong or missing primitive", I don't know what would.
> You want something along the lines of "create a filesystem object at given
> location, calling this function with this argument for actual object creation"?
> Fair enough, but then let's add a primitive that would do just that.
> 
> And grepping around for similar sick tricks catches a slightly milder example -
> mq_open(2) doesn't play with encoding stuff into dev_t, but otherwise it's very
> similar and could also benefit from the same primitive.
> 
> How about something like this:
> int vfs_mkobj(struct dentry *dentry, umode_t mode,
>                 int (*f)(struct dentry *, umode_t, void *),
> 		void *arg)
> {
> 	struct inode *dir = dentry->d_parent->d_inode;
>         int error = may_create(dir, dentry);
>         if (error)
>                 return error;
> 
>         mode &= S_IALLUGO;
>         mode |= S_IFREG;
>         error = security_inode_create(dir, dentry, mode);
>         if (error)
>                 return error;
>         error = f(dentry, mode, arg);
>         if (!error)
>                 fsnotify_create(dir, dentry);
>         return error;
> }
> 
> exported by fs/namei.c, with your code doing
> 
> 	switch (type) {
> 	case BPF_TYPE_PROG:
> 		error = vfs_mkobj(path.dentry, mode, bpf_mkprog, raw);
> 		break;
> 	case BPF_TYPE_MAP:
> 		error = vfs_mkobj(path.dentry, mode, bpf_mkmap, raw);
> 		break;
> 	default:
> 		error = -EPERM;
> 	}
> instead that vfs_mknod() hack, with
> 
> static int bpf_mkprog(struct inode *dir, struct dentry *dentry,
> 		 umode_t mode, void *raw)
> {
> 	return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_prog_iops);
> }
> 
> static int bpf_mkmap(struct inode *dir, struct dentry *dentry,
> 		 umode_t mode, void *raw)
> {
> 	return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_map_iops);
> }
> 
> static int bpf_mkobj_ops(struct inode *dir, struct dentry *dentry,
> 		 umode_t mode, void *raw, struct inode_operations *iops)
> {
>         struct inode *inode;
> 
>         inode = bpf_get_inode(dir->i_sb, dir, mode);
>         if (IS_ERR(inode))
>                 return PTR_ERR(inode);
> 
>         inode->i_op = iops;
>         inode->i_private = raw;
> 
>         bpf_dentry_finalize(dentry, inode, dir);
>         return 0;
> }
> 
> And to hell with messing with dev_t, ->d_fsdata or having ->mknod() there at all...
> Might want to replace security_path_mknod() with something saner, while we are
> at it.
> 
> Objections?

No, thanks for looking into this, and sorry for this fugly hack! :( Not
that this doesn't make it any better, but I think back then I took it
over from mqueue implementation ... should have known better and looking
into making this generic instead, sigh. The above looks good to me, so
no objections from my side and thanks for working on it!

> PS: mqueue.c would also benefit from such primitive - do_create() there would
> simply pass attr as callback's argument into vfs_mkobj(), with callback being
> the guts of mqueue_create()...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-01 20:47         ` Daniel Borkmann
@ 2017-12-02 18:48           ` Al Viro
  2017-12-02 22:08             ` Al Viro
  2017-12-04  9:57             ` Daniel Borkmann
  0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2017-12-02 18:48 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso,
	Linus Torvalds, David Miller, LKML, Network Development,
	Christoph Hellwig, Thomas Garnier, Jann Horn

On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote:

> > Might want to replace security_path_mknod() with something saner, while we are
> > at it.
> > 
> > Objections?
> 
> No, thanks for looking into this, and sorry for this fugly hack! :( Not
> that this doesn't make it any better, but I think back then I took it
> over from mqueue implementation ... should have known better and looking
> into making this generic instead, sigh. The above looks good to me, so
> no objections from my side and thanks for working on it!
> 
> > PS: mqueue.c would also benefit from such primitive - do_create() there would
> > simply pass attr as callback's argument into vfs_mkobj(), with callback being
> > the guts of mqueue_create()...

OK...  See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
passes LTP tests, but that's not saying much, and BPF side is completely
untested.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-02 18:48           ` Al Viro
@ 2017-12-02 22:08             ` Al Viro
  2017-12-03  4:22               ` Willem de Bruijn
  2017-12-04  9:57             ` Daniel Borkmann
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2017-12-02 22:08 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso,
	Linus Torvalds, David Miller, LKML, Network Development,
	Christoph Hellwig, Thomas Garnier, Jann Horn

On Sat, Dec 02, 2017 at 06:48:50PM +0000, Al Viro wrote:
> On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote:
> 
> > > Might want to replace security_path_mknod() with something saner, while we are
> > > at it.
> > > 
> > > Objections?
> > 
> > No, thanks for looking into this, and sorry for this fugly hack! :( Not
> > that this doesn't make it any better, but I think back then I took it
> > over from mqueue implementation ... should have known better and looking
> > into making this generic instead, sigh. The above looks good to me, so
> > no objections from my side and thanks for working on it!
> > 
> > > PS: mqueue.c would also benefit from such primitive - do_create() there would
> > > simply pass attr as callback's argument into vfs_mkobj(), with callback being
> > > the guts of mqueue_create()...
> 
> OK...  See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
> passes LTP tests, but that's not saying much, and BPF side is completely
> untested.

... and FWIW, completely untested patch for net/netfilter/xt_bpf.c follows:

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e55e4255a210..a7000e4775e7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -514,6 +514,9 @@ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
 	return bpf_prog_get_type_dev(ufd, type, false);
 }
 
+struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type);
+bool bpf_prog_get_ok(struct bpf_prog *, enum bpf_prog_type *, bool);
+
 int bpf_prog_offload_compile(struct bpf_prog *prog);
 void bpf_prog_offload_destroy(struct bpf_prog *prog);
 
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 2b75faccc771..9d1050dc2a7a 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -364,6 +364,45 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
 }
 EXPORT_SYMBOL_GPL(bpf_obj_get_user);
 
+static struct bpf_prog *__get_prog_inode(struct inode *inode, enum bpf_prog_type type)
+{
+	struct bpf_prog *prog;
+	int ret = inode_permission(inode, MAY_READ | MAY_WRITE);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (inode->i_op == &bpf_map_iops)
+		return ERR_PTR(-EINVAL);
+	if (inode->i_op != &bpf_prog_iops)
+		return ERR_PTR(-EACCES);
+
+	prog = inode->i_private;
+
+	ret = security_bpf_prog(prog);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	if (!bpf_prog_get_ok(prog, &type, false))
+		return ERR_PTR(-EINVAL);
+
+	return bpf_prog_inc(prog);
+}
+
+struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type)
+{
+	struct bpf_prog *prog;
+	struct path path;
+	int ret = kern_path(name, LOOKUP_FOLLOW, &path);
+	if (ret)
+		return ERR_PTR(ret);
+	prog = __get_prog_inode(d_backing_inode(path.dentry), type);
+	if (!IS_ERR(prog))
+		touch_atime(&path);
+	path_put(&path);
+	return prog;
+}
+EXPORT_SYMBOL(bpf_prog_get_type_path);
+
 static void bpf_evict_inode(struct inode *inode)
 {
 	enum bpf_type type;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2c4cfeaa8d5e..5cb783fc8224 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1057,7 +1057,7 @@ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_inc_not_zero);
 
-static bool bpf_prog_get_ok(struct bpf_prog *prog,
+bool bpf_prog_get_ok(struct bpf_prog *prog,
 			    enum bpf_prog_type *attach_type, bool attach_drv)
 {
 	/* not an attachment, just a refcount inc, always allow */
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index 041da0d9c06f..fa2ca0a13619 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -52,18 +52,8 @@ static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
 
 static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
 {
-	mm_segment_t oldfs = get_fs();
-	int retval, fd;
-
-	set_fs(KERNEL_DS);
-	fd = bpf_obj_get_user(path, 0);
-	set_fs(oldfs);
-	if (fd < 0)
-		return fd;
-
-	retval = __bpf_mt_check_fd(fd, ret);
-	sys_close(fd);
-	return retval;
+	*ret = bpf_prog_get_type_path(path, BPF_PROG_TYPE_SOCKET_FILTER);
+	return PTR_ERR_OR_ZERO(*ret);
 }
 
 static int bpf_mt_check(const struct xt_mtchk_param *par)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-02 22:08             ` Al Viro
@ 2017-12-03  4:22               ` Willem de Bruijn
  0 siblings, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2017-12-03  4:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Daniel Borkmann, Kees Cook, Shmulik Ladkani, Willem de Bruijn,
	Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML,
	Network Development, Christoph Hellwig, Thomas Garnier,
	Jann Horn

>> OK...  See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
>> passes LTP tests, but that's not saying much, and BPF side is completely
>> untested.
>
> ... and FWIW, completely untested patch for net/netfilter/xt_bpf.c follows:

Thanks a lot for this fix.

The tree including the bpf fix passes this basic xt_bpf test:

  mount -t bpf bpf /sys/fs/bpf
  ./pin /sys/fs/bpf/pass
  iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/five -j LOG
  iptables -L INPUT
  iptables -F INPUT

where pin is as follows:

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index adeaa1302f34..0cd2bb8d634b 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -41,6 +41,7 @@ hostprogs-y += xdp_redirect_map
 hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += syscall_tp
+hostprogs-y += pin

 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -89,6 +90,7 @@ xdp_redirect_map-objs := bpf_load.o $(LIBBPF)
xdp_redirect_map_user.o
 xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+pin-objs := $(LIBBPF) pin.o

 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
diff --git a/samples/bpf/pin.c b/samples/bpf/pin.c
new file mode 100644
index 000000000000..826e86784edf
--- /dev/null
+++ b/samples/bpf/pin.c
@@ -0,0 +1,41 @@
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <error.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "libbpf.h"
+#include "bpf_load.h"
+
+static char log_buf[1 << 16];
+
+int main(int argc, char **argv)
+{
+       struct bpf_insn prog[] = {
+               BPF_MOV64_IMM(BPF_REG_0, 1),
+               BPF_EXIT_INSN(),
+       };
+       int fd;
+
+       if (argc != 2)
+               error(1, 0, "Usage: %s <filepath>\n", argv[0]);
+
+       fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog,
+                             sizeof(prog) / sizeof(prog[0]),
+                             "GPL", 0, log_buf, sizeof(log_buf));
+       if (fd == -1)
+               error(1, errno, "load: %s", log_buf);
+
+       if (bpf_obj_pin(fd, argv[1]))
+               error(1, errno, "pin");
+
+       if (close(fd))
+               error(1, errno, "close");
+
+       return 0;
+}

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-02 18:48           ` Al Viro
  2017-12-02 22:08             ` Al Viro
@ 2017-12-04  9:57             ` Daniel Borkmann
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2017-12-04  9:57 UTC (permalink / raw)
  To: Al Viro
  Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso,
	Linus Torvalds, David Miller, LKML, Network Development,
	Christoph Hellwig, Thomas Garnier, Jann Horn

On 12/02/2017 07:48 PM, Al Viro wrote:
> On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote:
>>> Might want to replace security_path_mknod() with something saner, while we are
>>> at it.
>>>
>>> Objections?
>>
>> No, thanks for looking into this, and sorry for this fugly hack! :( Not
>> that this doesn't make it any better, but I think back then I took it
>> over from mqueue implementation ... should have known better and looking
>> into making this generic instead, sigh. The above looks good to me, so
>> no objections from my side and thanks for working on it!
>>
>>> PS: mqueue.c would also benefit from such primitive - do_create() there would
>>> simply pass attr as callback's argument into vfs_mkobj(), with callback being
>>> the guts of mqueue_create()...
> 
> OK...  See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2)
> passes LTP tests, but that's not saying much, and BPF side is completely
> untested.

I pulled vfs.git#untested.mkobj into my local tree and ran tests for both
progs and maps on it, all went fine and the patch looks good to me.

For 'bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod()' when
you push the fix to Linus, feel free to add:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

Thanks for your help, Al!

^ permalink raw reply	[flat|nested] 12+ messages in thread

[parent not found: <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com>]

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
       [not found]       ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com>
@ 2017-12-01 20:13         ` Daniel Borkmann
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2017-12-01 20:13 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso,
	David Miller, LKML, Network Development, Christoph Hellwig,
	Thomas Garnier, Jann Horn

On 12/01/2017 07:28 PM, Linus Torvalds wrote:
> [ Sorry for HTML email crud - traveling and on mobile right now ]
> 
> On Nov 30, 2017 23:54, "Al Viro" <viro@zeniv.linux.org.uk> wrote:
> 
> Would cause problems for tracepoints in there, though.  And that, BTW,
> is precisely why I don't want tracepoints in core VFS, TYVM - makes
> restructuring the code harder...
> 
> Just ignore them, see if anybody notices, and then they can add them back.
> Tracepoints shouldn't hold up kernel development, and I doubt these are
> ones that could be noticed by normal users.

Yep, agree, if it really gets in the way, then lets remove them for
now. After all, that was what was decided anyway.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  2017-12-01  3:48   ` Al Viro
  2017-12-01  4:54     ` Al Viro
@ 2017-12-01 21:34     ` Daniel Borkmann
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2017-12-01 21:34 UTC (permalink / raw)
  To: Al Viro, Kees Cook
  Cc: Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso,
	Linus Torvalds, David Miller, LKML, Network Development,
	Christoph Hellwig, Thomas Garnier, Jann Horn

On 12/01/2017 04:48 AM, Al Viro wrote:
> On Fri, Dec 01, 2017 at 01:33:04AM +0000, Al Viro wrote:
> 
>> Use of file descriptors should be limited to "got a number from userland,
>> convert to struct file *" on the way in and "install struct file * into
>> descriptor table and return the descriptor to userland" on the way out.
>> And the latter - *ONLY* after the last possible point of failure.  Once
>> a file reference is inserted into descriptor table, that's it - you
>> can't undo that.
>>
>> The only way to use bpf_obj_get_user() is to pass its return value to
>> userland.  As return value of syscall - not even put_user() (for that
>> you'd need to reserve the descriptor, copy it to userland and only
>> then attach struct file * to it).
>>
>> The whole approach stinks - what it needs is something that would
>> take struct filename * and return struct bpf_prog * or struct file *
>> reference.  With bpf_obj_get_user() and this thing implemented
>> via that.

Agree, the "fix" is completely buggy due to fd being exposed to user
space during that period of time ...

>> I'm looking into that thing...
> 
> What it tries to pull off is something not far from
> 
> static struct bpf_prog *__get_prog(struct inode *inode, enum bpf_prog_type type)
> {
> 	struct bpf_prog *prog;
> 	int err = inode_permission(inode, FMODE_READ | FMODE_WRITE);
> 	if (err)
> 		return ERR_PTR(err);
> 
> 	if (inode->i_op == &bpf_map_iops)
> 		return ERR_PTR(-EINVAL);
> 
> 	if (inode->i_op != &bpf_prog_iops)
> 		return ERR_PTR(-EACCES);
> 
> 	prog = inode->i_private;
> 	err = security_bpf_prog(prog);
> 	if (err < 0)
> 		return ERR_PTR(err);
> 
> 	if (!bpf_prog_get_ok(prog, &type, false))
> 		return ERR_PTR(-EINVAL);
> 
> 	return bpf_prog_inc(prog);
> }
> 
> struct bpf_prog *get_prog_path_type(const char *name, enum bpf_prog_type type)
> {
> 	struct path path;
> 	struct bpf_prog *prog;
> 	int err = kern_path(name, LOOKUP_FOLLOW, &path);
> 	if (err)
> 		return ERR_PTR(err);
> 	prog = __get_prog(d_backing_inode(path.dentry), type);
> 	if (!IS_ERR(prog))
> 		touch_atime(&path);
> 	path_put(&path);
> 	return prog;
> }
> 
> static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
> {
> 	*ret = get_prog_path_type(path, BPF_PROG_TYPE_SOCKET_FILTER);
>         return PTR_ERR_OR_ZERO(*ret);
> }
> 
> That skips all tracepoint random shite (pardon the triple redundance) and makes
> a somewhat arbitrary change for touch_atime() logics.  And, of course, it is
> not even compile-tested.
> 
> Something similar to get_prog_path_type() above might make for a usable
> primitive, IMO...

The above looks good to me!

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-12-04  9:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-01  0:57 netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Kees Cook
2017-12-01  1:33 ` Al Viro
2017-12-01  3:48   ` Al Viro
2017-12-01  4:54     ` Al Viro
2017-12-01 17:39       ` Al Viro
2017-12-01 20:47         ` Daniel Borkmann
2017-12-02 18:48           ` Al Viro
2017-12-02 22:08             ` Al Viro
2017-12-03  4:22               ` Willem de Bruijn
2017-12-04  9:57             ` Daniel Borkmann
     [not found]       ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com>
2017-12-01 20:13         ` Daniel Borkmann
2017-12-01 21:34     ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).