* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
@ 2017-12-01 0:57 Kees Cook
2017-12-01 1:33 ` Al Viro
0 siblings, 1 reply; 12+ messages in thread
From: Kees Cook @ 2017-12-01 0:57 UTC (permalink / raw)
To: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann, Pablo Neira Ayuso
Cc: Linus Torvalds, David Miller, LKML, Network Development, Al Viro,
Christoph Hellwig, Thomas Garnier, Jann Horn
On Mon, Oct 9, 2017 at 4:10 PM, David Miller <davem@davemloft.net> wrote:
> Shmulik Ladkani (1):
> netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
This adds a new user of set_fs(), which we're trying to eliminate (or
at least not expand):
+ set_fs(KERNEL_DS);
+ fd = bpf_obj_get_user(path);
+ set_fs(oldfs);
Can you please adjust this to not make set_fs() changes?
Thanks!
-Kees
--
Kees Cook
Pixel Security
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-01 0:57 netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Kees Cook @ 2017-12-01 1:33 ` Al Viro 2017-12-01 3:48 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2017-12-01 1:33 UTC (permalink / raw) To: Kees Cook Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On Thu, Nov 30, 2017 at 04:57:30PM -0800, Kees Cook wrote: > On Mon, Oct 9, 2017 at 4:10 PM, David Miller <davem@davemloft.net> wrote: > > Shmulik Ladkani (1): > > netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' > > This adds a new user of set_fs(), which we're trying to eliminate (or > at least not expand): > > + set_fs(KERNEL_DS); > + fd = bpf_obj_get_user(path); > + set_fs(oldfs); > > Can you please adjust this to not make set_fs() changes? That's not the worst problem there. Messing with descriptor table is much worse. It can be shared between threads; by the time you get to fdget() the damn thing might have nothing to do with what bpf_obj_get_user() has put there, ditto for sys_close(). Use of file descriptors should be limited to "got a number from userland, convert to struct file *" on the way in and "install struct file * into descriptor table and return the descriptor to userland" on the way out. And the latter - *ONLY* after the last possible point of failure. Once a file reference is inserted into descriptor table, that's it - you can't undo that. The only way to use bpf_obj_get_user() is to pass its return value to userland. As return value of syscall - not even put_user() (for that you'd need to reserve the descriptor, copy it to userland and only then attach struct file * to it). The whole approach stinks - what it needs is something that would take struct filename * and return struct bpf_prog * or struct file * reference. With bpf_obj_get_user() and this thing implemented via that. I'm looking into that thing... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-01 1:33 ` Al Viro @ 2017-12-01 3:48 ` Al Viro 2017-12-01 4:54 ` Al Viro 2017-12-01 21:34 ` Daniel Borkmann 0 siblings, 2 replies; 12+ messages in thread From: Al Viro @ 2017-12-01 3:48 UTC (permalink / raw) To: Kees Cook Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On Fri, Dec 01, 2017 at 01:33:04AM +0000, Al Viro wrote: > Use of file descriptors should be limited to "got a number from userland, > convert to struct file *" on the way in and "install struct file * into > descriptor table and return the descriptor to userland" on the way out. > And the latter - *ONLY* after the last possible point of failure. Once > a file reference is inserted into descriptor table, that's it - you > can't undo that. > > The only way to use bpf_obj_get_user() is to pass its return value to > userland. As return value of syscall - not even put_user() (for that > you'd need to reserve the descriptor, copy it to userland and only > then attach struct file * to it). > > The whole approach stinks - what it needs is something that would > take struct filename * and return struct bpf_prog * or struct file * > reference. With bpf_obj_get_user() and this thing implemented > via that. > > I'm looking into that thing... What it tries to pull off is something not far from static struct bpf_prog *__get_prog(struct inode *inode, enum bpf_prog_type type) { struct bpf_prog *prog; int err = inode_permission(inode, FMODE_READ | FMODE_WRITE); if (err) return ERR_PTR(err); if (inode->i_op == &bpf_map_iops) return ERR_PTR(-EINVAL); if (inode->i_op != &bpf_prog_iops) return ERR_PTR(-EACCES); prog = inode->i_private; err = security_bpf_prog(prog); if (err < 0) return ERR_PTR(err); if (!bpf_prog_get_ok(prog, &type, false)) return ERR_PTR(-EINVAL); return bpf_prog_inc(prog); } struct bpf_prog *get_prog_path_type(const char *name, enum bpf_prog_type type) { struct path path; struct bpf_prog *prog; int err = kern_path(name, LOOKUP_FOLLOW, &path); if (err) return ERR_PTR(err); prog = __get_prog(d_backing_inode(path.dentry), type); if (!IS_ERR(prog)) touch_atime(&path); path_put(&path); return prog; } static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret) { *ret = get_prog_path_type(path, BPF_PROG_TYPE_SOCKET_FILTER); return PTR_ERR_OR_ZERO(*ret); } That skips all tracepoint random shite (pardon the triple redundance) and makes a somewhat arbitrary change for touch_atime() logics. And, of course, it is not even compile-tested. Something similar to get_prog_path_type() above might make for a usable primitive, IMO... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-01 3:48 ` Al Viro @ 2017-12-01 4:54 ` Al Viro 2017-12-01 17:39 ` Al Viro [not found] ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com> 2017-12-01 21:34 ` Daniel Borkmann 1 sibling, 2 replies; 12+ messages in thread From: Al Viro @ 2017-12-01 4:54 UTC (permalink / raw) To: Kees Cook Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On Fri, Dec 01, 2017 at 03:48:59AM +0000, Al Viro wrote: > Something similar to get_prog_path_type() above might make for a usable > primitive, IMO... Incidentally, bpf_obj_get_user()/bpf_obj_do_get() should just use user_path(), rather than wanking with getname()+kern_path(pname->name)+putname(). Note that kern_path() will do getname_kernel() to get struct pathname... Would cause problems for tracepoints in there, though. And that, BTW, is precisely why I don't want tracepoints in core VFS, TYVM - makes restructuring the code harder... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-01 4:54 ` Al Viro @ 2017-12-01 17:39 ` Al Viro 2017-12-01 20:47 ` Daniel Borkmann [not found] ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com> 1 sibling, 1 reply; 12+ messages in thread From: Al Viro @ 2017-12-01 17:39 UTC (permalink / raw) To: Kees Cook Cc: Shmulik Ladkani, Willem de Bruijn, Daniel Borkmann, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On Fri, Dec 01, 2017 at 04:54:39AM +0000, Al Viro wrote: > On Fri, Dec 01, 2017 at 03:48:59AM +0000, Al Viro wrote: > > > Something similar to get_prog_path_type() above might make for a usable > > primitive, IMO... > > Incidentally, bpf_obj_get_user()/bpf_obj_do_get() should just use > user_path(), rather than wanking with getname()+kern_path(pname->name)+putname(). > Note that kern_path() will do getname_kernel() to get struct pathname... > > Would cause problems for tracepoints in there, though. And that, BTW, > is precisely why I don't want tracepoints in core VFS, TYVM - makes > restructuring the code harder... Egads... Contortions in bpf ->mknod() are really obnoxious. First of all, it checks that ->d_fsdata is non-NULL and fails otherwise. The only time ->d_fsdata gets non-NULL on that fs? In bpf_obj_do_pin(), this: dentry->d_fsdata = raw; ret = vfs_mknod(dir, dentry, mode, devt); dentry->d_fsdata = NULL; In other words, it's *not* going to work from normal mknod(2). Why go through ->mknod(), then, especially since it requires that kind of contortions to pass the data in? devt is 0:1 or 0:2 here. mode? Character or block device, right? Like hell - it's a regular file. And devt is a cute way to pass a flag down into bpf_mkobj() (aka. ->mknod()) through vfs_mknod(). No, it doesn't go into ->i_rdev... And to make the things even more fun, the damn thing is passed to a couple of Linux S&M hooks - security_path_mknod() and security_inode_mknod(). Oh, sorry - three hooks. There's devcgroup_inode_mknod() as well, but that thing sees S_IFREG in mode and buggers off quietly. Our esteemed sadomaso^Wsecurity community gets to play, though. Without any way to see _what_ are we attaching to that place in the bpf fs tree, but hey - it's security, it doesn't need to make sense... What the hell? If you need a clean way to do something, why don't you describe (on fsdevel, or in off-list mail to relevant people) what do you really want? Sure, you can "work around" anything, but doesn't that level of perversion strike you as a clear sign of something being not right? For crying out loud, you are trying to pass a tagged pointer to one or another kind of object into your own function. For that you * use a field in a globally visible data structure as a temporary storage for a pointer * encode your tag (essentially a boolean) into a fucking _device_ _number_, of all things, and shove it through, hoping that no LSM module gets weirded out by non-zero device number combined with regular file for mode. If that does not scream "wrong or missing primitive", I don't know what would. You want something along the lines of "create a filesystem object at given location, calling this function with this argument for actual object creation"? Fair enough, but then let's add a primitive that would do just that. And grepping around for similar sick tricks catches a slightly milder example - mq_open(2) doesn't play with encoding stuff into dev_t, but otherwise it's very similar and could also benefit from the same primitive. How about something like this: int vfs_mkobj(struct dentry *dentry, umode_t mode, int (*f)(struct dentry *, umode_t, void *), void *arg) { struct inode *dir = dentry->d_parent->d_inode; int error = may_create(dir, dentry); if (error) return error; mode &= S_IALLUGO; mode |= S_IFREG; error = security_inode_create(dir, dentry, mode); if (error) return error; error = f(dentry, mode, arg); if (!error) fsnotify_create(dir, dentry); return error; } exported by fs/namei.c, with your code doing switch (type) { case BPF_TYPE_PROG: error = vfs_mkobj(path.dentry, mode, bpf_mkprog, raw); break; case BPF_TYPE_MAP: error = vfs_mkobj(path.dentry, mode, bpf_mkmap, raw); break; default: error = -EPERM; } instead that vfs_mknod() hack, with static int bpf_mkprog(struct inode *dir, struct dentry *dentry, umode_t mode, void *raw) { return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_prog_iops); } static int bpf_mkmap(struct inode *dir, struct dentry *dentry, umode_t mode, void *raw) { return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_map_iops); } static int bpf_mkobj_ops(struct inode *dir, struct dentry *dentry, umode_t mode, void *raw, struct inode_operations *iops) { struct inode *inode; inode = bpf_get_inode(dir->i_sb, dir, mode); if (IS_ERR(inode)) return PTR_ERR(inode); inode->i_op = iops; inode->i_private = raw; bpf_dentry_finalize(dentry, inode, dir); return 0; } And to hell with messing with dev_t, ->d_fsdata or having ->mknod() there at all... Might want to replace security_path_mknod() with something saner, while we are at it. Objections? PS: mqueue.c would also benefit from such primitive - do_create() there would simply pass attr as callback's argument into vfs_mkobj(), with callback being the guts of mqueue_create()... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-01 17:39 ` Al Viro @ 2017-12-01 20:47 ` Daniel Borkmann 2017-12-02 18:48 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Daniel Borkmann @ 2017-12-01 20:47 UTC (permalink / raw) To: Al Viro, Kees Cook Cc: Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On 12/01/2017 06:39 PM, Al Viro wrote: [...] > If that does not scream "wrong or missing primitive", I don't know what would. > You want something along the lines of "create a filesystem object at given > location, calling this function with this argument for actual object creation"? > Fair enough, but then let's add a primitive that would do just that. > > And grepping around for similar sick tricks catches a slightly milder example - > mq_open(2) doesn't play with encoding stuff into dev_t, but otherwise it's very > similar and could also benefit from the same primitive. > > How about something like this: > int vfs_mkobj(struct dentry *dentry, umode_t mode, > int (*f)(struct dentry *, umode_t, void *), > void *arg) > { > struct inode *dir = dentry->d_parent->d_inode; > int error = may_create(dir, dentry); > if (error) > return error; > > mode &= S_IALLUGO; > mode |= S_IFREG; > error = security_inode_create(dir, dentry, mode); > if (error) > return error; > error = f(dentry, mode, arg); > if (!error) > fsnotify_create(dir, dentry); > return error; > } > > exported by fs/namei.c, with your code doing > > switch (type) { > case BPF_TYPE_PROG: > error = vfs_mkobj(path.dentry, mode, bpf_mkprog, raw); > break; > case BPF_TYPE_MAP: > error = vfs_mkobj(path.dentry, mode, bpf_mkmap, raw); > break; > default: > error = -EPERM; > } > instead that vfs_mknod() hack, with > > static int bpf_mkprog(struct inode *dir, struct dentry *dentry, > umode_t mode, void *raw) > { > return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_prog_iops); > } > > static int bpf_mkmap(struct inode *dir, struct dentry *dentry, > umode_t mode, void *raw) > { > return bpf_mkobj_ops(dir, dentry, mode, raw, &bpf_map_iops); > } > > static int bpf_mkobj_ops(struct inode *dir, struct dentry *dentry, > umode_t mode, void *raw, struct inode_operations *iops) > { > struct inode *inode; > > inode = bpf_get_inode(dir->i_sb, dir, mode); > if (IS_ERR(inode)) > return PTR_ERR(inode); > > inode->i_op = iops; > inode->i_private = raw; > > bpf_dentry_finalize(dentry, inode, dir); > return 0; > } > > And to hell with messing with dev_t, ->d_fsdata or having ->mknod() there at all... > Might want to replace security_path_mknod() with something saner, while we are > at it. > > Objections? No, thanks for looking into this, and sorry for this fugly hack! :( Not that this doesn't make it any better, but I think back then I took it over from mqueue implementation ... should have known better and looking into making this generic instead, sigh. The above looks good to me, so no objections from my side and thanks for working on it! > PS: mqueue.c would also benefit from such primitive - do_create() there would > simply pass attr as callback's argument into vfs_mkobj(), with callback being > the guts of mqueue_create()... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-01 20:47 ` Daniel Borkmann @ 2017-12-02 18:48 ` Al Viro 2017-12-02 22:08 ` Al Viro 2017-12-04 9:57 ` Daniel Borkmann 0 siblings, 2 replies; 12+ messages in thread From: Al Viro @ 2017-12-02 18:48 UTC (permalink / raw) To: Daniel Borkmann Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote: > > Might want to replace security_path_mknod() with something saner, while we are > > at it. > > > > Objections? > > No, thanks for looking into this, and sorry for this fugly hack! :( Not > that this doesn't make it any better, but I think back then I took it > over from mqueue implementation ... should have known better and looking > into making this generic instead, sigh. The above looks good to me, so > no objections from my side and thanks for working on it! > > > PS: mqueue.c would also benefit from such primitive - do_create() there would > > simply pass attr as callback's argument into vfs_mkobj(), with callback being > > the guts of mqueue_create()... OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2) passes LTP tests, but that's not saying much, and BPF side is completely untested. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-02 18:48 ` Al Viro @ 2017-12-02 22:08 ` Al Viro 2017-12-03 4:22 ` Willem de Bruijn 2017-12-04 9:57 ` Daniel Borkmann 1 sibling, 1 reply; 12+ messages in thread From: Al Viro @ 2017-12-02 22:08 UTC (permalink / raw) To: Daniel Borkmann Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On Sat, Dec 02, 2017 at 06:48:50PM +0000, Al Viro wrote: > On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote: > > > > Might want to replace security_path_mknod() with something saner, while we are > > > at it. > > > > > > Objections? > > > > No, thanks for looking into this, and sorry for this fugly hack! :( Not > > that this doesn't make it any better, but I think back then I took it > > over from mqueue implementation ... should have known better and looking > > into making this generic instead, sigh. The above looks good to me, so > > no objections from my side and thanks for working on it! > > > > > PS: mqueue.c would also benefit from such primitive - do_create() there would > > > simply pass attr as callback's argument into vfs_mkobj(), with callback being > > > the guts of mqueue_create()... > > OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2) > passes LTP tests, but that's not saying much, and BPF side is completely > untested. ... and FWIW, completely untested patch for net/netfilter/xt_bpf.c follows: diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e55e4255a210..a7000e4775e7 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -514,6 +514,9 @@ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd, return bpf_prog_get_type_dev(ufd, type, false); } +struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type); +bool bpf_prog_get_ok(struct bpf_prog *, enum bpf_prog_type *, bool); + int bpf_prog_offload_compile(struct bpf_prog *prog); void bpf_prog_offload_destroy(struct bpf_prog *prog); diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 2b75faccc771..9d1050dc2a7a 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -364,6 +364,45 @@ int bpf_obj_get_user(const char __user *pathname, int flags) } EXPORT_SYMBOL_GPL(bpf_obj_get_user); +static struct bpf_prog *__get_prog_inode(struct inode *inode, enum bpf_prog_type type) +{ + struct bpf_prog *prog; + int ret = inode_permission(inode, MAY_READ | MAY_WRITE); + if (ret) + return ERR_PTR(ret); + + if (inode->i_op == &bpf_map_iops) + return ERR_PTR(-EINVAL); + if (inode->i_op != &bpf_prog_iops) + return ERR_PTR(-EACCES); + + prog = inode->i_private; + + ret = security_bpf_prog(prog); + if (ret < 0) + return ERR_PTR(ret); + + if (!bpf_prog_get_ok(prog, &type, false)) + return ERR_PTR(-EINVAL); + + return bpf_prog_inc(prog); +} + +struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type) +{ + struct bpf_prog *prog; + struct path path; + int ret = kern_path(name, LOOKUP_FOLLOW, &path); + if (ret) + return ERR_PTR(ret); + prog = __get_prog_inode(d_backing_inode(path.dentry), type); + if (!IS_ERR(prog)) + touch_atime(&path); + path_put(&path); + return prog; +} +EXPORT_SYMBOL(bpf_prog_get_type_path); + static void bpf_evict_inode(struct inode *inode) { enum bpf_type type; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 2c4cfeaa8d5e..5cb783fc8224 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1057,7 +1057,7 @@ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog) } EXPORT_SYMBOL_GPL(bpf_prog_inc_not_zero); -static bool bpf_prog_get_ok(struct bpf_prog *prog, +bool bpf_prog_get_ok(struct bpf_prog *prog, enum bpf_prog_type *attach_type, bool attach_drv) { /* not an attachment, just a refcount inc, always allow */ diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c index 041da0d9c06f..fa2ca0a13619 100644 --- a/net/netfilter/xt_bpf.c +++ b/net/netfilter/xt_bpf.c @@ -52,18 +52,8 @@ static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret) static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret) { - mm_segment_t oldfs = get_fs(); - int retval, fd; - - set_fs(KERNEL_DS); - fd = bpf_obj_get_user(path, 0); - set_fs(oldfs); - if (fd < 0) - return fd; - - retval = __bpf_mt_check_fd(fd, ret); - sys_close(fd); - return retval; + *ret = bpf_prog_get_type_path(path, BPF_PROG_TYPE_SOCKET_FILTER); + return PTR_ERR_OR_ZERO(*ret); } static int bpf_mt_check(const struct xt_mtchk_param *par) ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-02 22:08 ` Al Viro @ 2017-12-03 4:22 ` Willem de Bruijn 0 siblings, 0 replies; 12+ messages in thread From: Willem de Bruijn @ 2017-12-03 4:22 UTC (permalink / raw) To: Al Viro Cc: Daniel Borkmann, Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn >> OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2) >> passes LTP tests, but that's not saying much, and BPF side is completely >> untested. > > ... and FWIW, completely untested patch for net/netfilter/xt_bpf.c follows: Thanks a lot for this fix. The tree including the bpf fix passes this basic xt_bpf test: mount -t bpf bpf /sys/fs/bpf ./pin /sys/fs/bpf/pass iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/five -j LOG iptables -L INPUT iptables -F INPUT where pin is as follows: diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index adeaa1302f34..0cd2bb8d634b 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -41,6 +41,7 @@ hostprogs-y += xdp_redirect_map hostprogs-y += xdp_redirect_cpu hostprogs-y += xdp_monitor hostprogs-y += syscall_tp +hostprogs-y += pin # Libbpf dependencies LIBBPF := ../../tools/lib/bpf/bpf.o @@ -89,6 +90,7 @@ xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o +pin-objs := $(LIBBPF) pin.o # Tell kbuild to always build the programs always := $(hostprogs-y) diff --git a/samples/bpf/pin.c b/samples/bpf/pin.c new file mode 100644 index 000000000000..826e86784edf --- /dev/null +++ b/samples/bpf/pin.c @@ -0,0 +1,41 @@ +#define _GNU_SOURCE + +#include <errno.h> +#include <error.h> +#include <linux/bpf.h> +#include <linux/filter.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include "libbpf.h" +#include "bpf_load.h" + +static char log_buf[1 << 16]; + +int main(int argc, char **argv) +{ + struct bpf_insn prog[] = { + BPF_MOV64_IMM(BPF_REG_0, 1), + BPF_EXIT_INSN(), + }; + int fd; + + if (argc != 2) + error(1, 0, "Usage: %s <filepath>\n", argv[0]); + + fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, + sizeof(prog) / sizeof(prog[0]), + "GPL", 0, log_buf, sizeof(log_buf)); + if (fd == -1) + error(1, errno, "load: %s", log_buf); + + if (bpf_obj_pin(fd, argv[1])) + error(1, errno, "pin"); + + if (close(fd)) + error(1, errno, "close"); + + return 0; +} ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-02 18:48 ` Al Viro 2017-12-02 22:08 ` Al Viro @ 2017-12-04 9:57 ` Daniel Borkmann 1 sibling, 0 replies; 12+ messages in thread From: Daniel Borkmann @ 2017-12-04 9:57 UTC (permalink / raw) To: Al Viro Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On 12/02/2017 07:48 PM, Al Viro wrote: > On Fri, Dec 01, 2017 at 09:47:00PM +0100, Daniel Borkmann wrote: >>> Might want to replace security_path_mknod() with something saner, while we are >>> at it. >>> >>> Objections? >> >> No, thanks for looking into this, and sorry for this fugly hack! :( Not >> that this doesn't make it any better, but I think back then I took it >> over from mqueue implementation ... should have known better and looking >> into making this generic instead, sigh. The above looks good to me, so >> no objections from my side and thanks for working on it! >> >>> PS: mqueue.c would also benefit from such primitive - do_create() there would >>> simply pass attr as callback's argument into vfs_mkobj(), with callback being >>> the guts of mqueue_create()... > > OK... See vfs.git#untested.mkobj; it really needs testing, though - mq_open(2) > passes LTP tests, but that's not saying much, and BPF side is completely > untested. I pulled vfs.git#untested.mkobj into my local tree and ran tests for both progs and maps on it, all went fine and the patch looks good to me. For 'bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod()' when you push the fix to Linus, feel free to add: Acked-by: Daniel Borkmann <daniel@iogearbox.net> Thanks for your help, Al! ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com>]
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' [not found] ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com> @ 2017-12-01 20:13 ` Daniel Borkmann 0 siblings, 0 replies; 12+ messages in thread From: Daniel Borkmann @ 2017-12-01 20:13 UTC (permalink / raw) To: Linus Torvalds, Al Viro Cc: Kees Cook, Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On 12/01/2017 07:28 PM, Linus Torvalds wrote: > [ Sorry for HTML email crud - traveling and on mobile right now ] > > On Nov 30, 2017 23:54, "Al Viro" <viro@zeniv.linux.org.uk> wrote: > > Would cause problems for tracepoints in there, though. And that, BTW, > is precisely why I don't want tracepoints in core VFS, TYVM - makes > restructuring the code harder... > > Just ignore them, see if anybody notices, and then they can add them back. > Tracepoints shouldn't hold up kernel development, and I doubt these are > ones that could be noticed by normal users. Yep, agree, if it really gets in the way, then lets remove them for now. After all, that was what was decided anyway. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' 2017-12-01 3:48 ` Al Viro 2017-12-01 4:54 ` Al Viro @ 2017-12-01 21:34 ` Daniel Borkmann 1 sibling, 0 replies; 12+ messages in thread From: Daniel Borkmann @ 2017-12-01 21:34 UTC (permalink / raw) To: Al Viro, Kees Cook Cc: Shmulik Ladkani, Willem de Bruijn, Pablo Neira Ayuso, Linus Torvalds, David Miller, LKML, Network Development, Christoph Hellwig, Thomas Garnier, Jann Horn On 12/01/2017 04:48 AM, Al Viro wrote: > On Fri, Dec 01, 2017 at 01:33:04AM +0000, Al Viro wrote: > >> Use of file descriptors should be limited to "got a number from userland, >> convert to struct file *" on the way in and "install struct file * into >> descriptor table and return the descriptor to userland" on the way out. >> And the latter - *ONLY* after the last possible point of failure. Once >> a file reference is inserted into descriptor table, that's it - you >> can't undo that. >> >> The only way to use bpf_obj_get_user() is to pass its return value to >> userland. As return value of syscall - not even put_user() (for that >> you'd need to reserve the descriptor, copy it to userland and only >> then attach struct file * to it). >> >> The whole approach stinks - what it needs is something that would >> take struct filename * and return struct bpf_prog * or struct file * >> reference. With bpf_obj_get_user() and this thing implemented >> via that. Agree, the "fix" is completely buggy due to fd being exposed to user space during that period of time ... >> I'm looking into that thing... > > What it tries to pull off is something not far from > > static struct bpf_prog *__get_prog(struct inode *inode, enum bpf_prog_type type) > { > struct bpf_prog *prog; > int err = inode_permission(inode, FMODE_READ | FMODE_WRITE); > if (err) > return ERR_PTR(err); > > if (inode->i_op == &bpf_map_iops) > return ERR_PTR(-EINVAL); > > if (inode->i_op != &bpf_prog_iops) > return ERR_PTR(-EACCES); > > prog = inode->i_private; > err = security_bpf_prog(prog); > if (err < 0) > return ERR_PTR(err); > > if (!bpf_prog_get_ok(prog, &type, false)) > return ERR_PTR(-EINVAL); > > return bpf_prog_inc(prog); > } > > struct bpf_prog *get_prog_path_type(const char *name, enum bpf_prog_type type) > { > struct path path; > struct bpf_prog *prog; > int err = kern_path(name, LOOKUP_FOLLOW, &path); > if (err) > return ERR_PTR(err); > prog = __get_prog(d_backing_inode(path.dentry), type); > if (!IS_ERR(prog)) > touch_atime(&path); > path_put(&path); > return prog; > } > > static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret) > { > *ret = get_prog_path_type(path, BPF_PROG_TYPE_SOCKET_FILTER); > return PTR_ERR_OR_ZERO(*ret); > } > > That skips all tracepoint random shite (pardon the triple redundance) and makes > a somewhat arbitrary change for touch_atime() logics. And, of course, it is > not even compile-tested. > > Something similar to get_prog_path_type() above might make for a usable > primitive, IMO... The above looks good to me! ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-12-04 9:58 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-12-01 0:57 netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Kees Cook 2017-12-01 1:33 ` Al Viro 2017-12-01 3:48 ` Al Viro 2017-12-01 4:54 ` Al Viro 2017-12-01 17:39 ` Al Viro 2017-12-01 20:47 ` Daniel Borkmann 2017-12-02 18:48 ` Al Viro 2017-12-02 22:08 ` Al Viro 2017-12-03 4:22 ` Willem de Bruijn 2017-12-04 9:57 ` Daniel Borkmann [not found] ` <CA+55aFx4WEm5Feu7S8Z_73Gfsym6aBFpT3iGZXS5QyMQvgkWtA@mail.gmail.com> 2017-12-01 20:13 ` Daniel Borkmann 2017-12-01 21:34 ` Daniel Borkmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).