From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 342DEC28EB7 for ; Thu, 6 Jun 2019 21:50:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 003C4208C0 for ; Thu, 6 Jun 2019 21:50:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="vLIj4Dzc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728617AbfFFVuh (ORCPT ); Thu, 6 Jun 2019 17:50:37 -0400 Received: from mail-qk1-f196.google.com ([209.85.222.196]:45259 "EHLO mail-qk1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726941AbfFFVuh (ORCPT ); Thu, 6 Jun 2019 17:50:37 -0400 Received: by mail-qk1-f196.google.com with SMTP id s22so32462qkj.12; Thu, 06 Jun 2019 14:50:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Jan8m2cOIMygeS4TLXlPGZKGiE1V/r1D2arKcybM6lU=; b=vLIj4DzcQgmOTcI07xzGw0+Se1Irsx8fBJidw/7BFcf4w3TPiMAilqc5chyuMZJXZw MwSACwnS0vNfZlyi1gzIQog39dfAWB8tKKgWKrtrhQQX7HBgna2n7pvStMqeOSfg/tsG vN044MLUyscYgXbyA5XEOYrxO+zdqlIEwyDo/vQ7P4A4wU1YdUFKKACz+7JT9VanI3L/ 9te1uF1DopsSpZ/U5sDvxRxLrVedkApmV1QPCEMt9YUbgXK2W/mAZqZPRwk3watAGxwd 6ql4gJJqs5N0vc5uQXkmHYJtrjxvN+s4pLPqcasoi4hG+3as+PzqBYzSGulDS6OTqlU7 MhOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Jan8m2cOIMygeS4TLXlPGZKGiE1V/r1D2arKcybM6lU=; b=WmfxVhc9qsaYdrsY+pirowQom9/ru1sjXqbELfU3pUs1czKSL1q1Bm4/CZQcJNIouE MOyM1yPdTfXnxLLnMuMsNiUbBNSx8PDV3Il57UgTYzWYU1Ik+Y+csiyzxdG+Stft6tDQ ShdzcYrPoYFzMr1uxMeH+7Ec4iTlpZ+V8yyLghq6RUY6SWBtNmlePubcLV/Vc6BPH/mG Xuq46CtXZs8XBmixN0wK6K1BIp2LElVNjhsAdDIYIaM9GTJunGijajVOPbvr6YKrdL3q 7ip2VtXDeRSJ1z/Xs2G8R5r2jrRsvRATW7CMR3/odYYe+GrKq/HnpEoRec7ViFiqcyiP 6/hQ== X-Gm-Message-State: APjAAAVd2VZGYD3cM3Bkxy8KuOzLy18S8icJblbRz8Y+z8c4IHuMdgHY 811IaaFPm0iTAaqNusihRaie6q0qaygkhOB1flXSzZIfdzU= X-Google-Smtp-Source: APXvYqzPqxRKBP2DiQr9eAHjiWUmIC3K9zGQnHyEGcEI5KfyabVJ4iLTm+DEvbeVpGapYL8kRXaGGqp4okHPuDD1Nhs= X-Received: by 2002:a05:620a:147:: with SMTP id e7mr40572613qkn.247.1559857835137; Thu, 06 Jun 2019 14:50:35 -0700 (PDT) MIME-Version: 1.0 References: <20190606175146.205269-1-sdf@google.com> <20190606175146.205269-2-sdf@google.com> In-Reply-To: <20190606175146.205269-2-sdf@google.com> From: Andrii Nakryiko Date: Thu, 6 Jun 2019 14:50:24 -0700 Message-ID: Subject: Re: [PATCH bpf-next v2 1/8] bpf: implement getsockopt and setsockopt hooks To: Stanislav Fomichev Cc: Networking , bpf , davem@davemloft.net, Alexei Starovoitov , Daniel Borkmann Content-Type: text/plain; charset="UTF-8" Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Thu, Jun 6, 2019 at 12:03 PM Stanislav Fomichev wrote: > > Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and > BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks. > > BPF_CGROUP_SETSOCKOPT get a read-only view of the setsockopt arguments. > BPF_CGROUP_GETSOCKOPT can modify the supplied buffer. > Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure. > > The buffer memory is pre-allocated (because I don't think there is > a precedent for working with __user memory from bpf). This might be > slow to do for each {s,g}etsockopt call, that's why I've added > __cgroup_bpf_prog_array_is_empty that exits early if there is nothing > attached to a cgroup. Note, however, that there is a race between > __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup > program layout might have changed; this should not be a problem > because in general there is a race between multiple calls to > {s,g}etsocktop and user adding/removing bpf progs from a cgroup. > > The return code of the BPF program is handled as follows: > * 0: EPERM > * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits > * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF > prog exits > > v2: > * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau) > * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau) > * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau) > * added [0,2] return code check to verifier (Martin Lau) > * dropped unused buf[64] from the stack (Martin Lau) > * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau) > * dropped bpf_target_off from ctx rewrites (Martin Lau) > * use return code for kernel bypass (Martin Lau & Andrii Nakryiko) > > Signed-off-by: Stanislav Fomichev > --- > include/linux/bpf-cgroup.h | 29 ++++ > include/linux/bpf.h | 46 ++++++ > include/linux/bpf_types.h | 1 + > include/linux/filter.h | 13 ++ > include/uapi/linux/bpf.h | 14 ++ > kernel/bpf/cgroup.c | 277 +++++++++++++++++++++++++++++++++++++ > kernel/bpf/core.c | 9 ++ > kernel/bpf/syscall.c | 19 +++ > kernel/bpf/verifier.c | 15 ++ > net/core/filter.c | 4 +- > net/socket.c | 18 +++ > 11 files changed, 443 insertions(+), 2 deletions(-) > > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h > index b631ee75762d..406f1ba82531 100644 > --- a/include/linux/bpf-cgroup.h > +++ b/include/linux/bpf-cgroup.h > @@ -124,6 +124,13 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, > loff_t *ppos, void **new_buf, > enum bpf_attach_type type); > > +int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int level, > + int optname, char __user *optval, > + unsigned int optlen); > +int __cgroup_bpf_run_filter_getsockopt(struct sock *sock, int level, > + int optname, char __user *optval, > + int __user *optlen); > + > static inline enum bpf_cgroup_storage_type cgroup_storage_type( > struct bpf_map *map) > { > @@ -280,6 +287,26 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, > __ret; \ > }) > > +#define BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock, level, optname, optval, optlen) \ > +({ \ > + int __ret = 0; \ > + if (cgroup_bpf_enabled) \ > + __ret = __cgroup_bpf_run_filter_setsockopt(sock, level, \ > + optname, optval, \ > + optlen); \ > + __ret; \ > +}) > + > +#define BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock, level, optname, optval, optlen) \ > +({ \ > + int __ret = 0; \ > + if (cgroup_bpf_enabled) \ > + __ret = __cgroup_bpf_run_filter_getsockopt(sock, level, \ > + optname, optval, \ > + optlen); \ > + __ret; \ > +}) > + > int cgroup_bpf_prog_attach(const union bpf_attr *attr, > enum bpf_prog_type ptype, struct bpf_prog *prog); > int cgroup_bpf_prog_detach(const union bpf_attr *attr, > @@ -349,6 +376,8 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map, > #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; }) > #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; }) > #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos,nbuf) ({ 0; }) > +#define BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock, level, optname, optval, optlen) ({ 0; }) > +#define BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock, level, optname, optval, optlen) ({ 0; }) > > #define for_each_cgroup_storage_type(stype) for (; false; ) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index e5a309e6a400..883a190bc0b8 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -520,6 +520,7 @@ struct bpf_prog_array { > struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags); > void bpf_prog_array_free(struct bpf_prog_array *progs); > int bpf_prog_array_length(struct bpf_prog_array *progs); > +bool bpf_prog_array_is_empty(struct bpf_prog_array *array); > int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs, > __u32 __user *prog_ids, u32 cnt); > > @@ -606,6 +607,49 @@ _out: \ > _ret; \ > }) > > +/* To be used by BPF_PROG_TYPE_CGROUP_SOCKOPT program type. > + * > + * Expected BPF program return values are: > + * 0: return -EPERM to the userspace > + * 1: sockopt was not handled by BPF, kernel should do it > + * 2: sockopt was handled by BPF, kernel not should do it and return typo: should not do it? > + * to the userspace instead > + * > + * Note, that return '0' takes precedence over everything else. In other > + * words, if any single program in the prog array has returned 0, > + * the userspace will get -EPERM (regardless of what other programs > + * return). > + * > + * The macro itself returns: > + * 0: sockopt was not handled by BPF, kernel should do it > + * 1: sockopt was handled by BPF, kernel snot hould do it typo: "snot hould do it" -> "shouldn't do it"? > + * -EPERM: return error back to userspace > + */ > +#define BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY(array, ctx, func) \ > + ({ \ > + struct bpf_prog_array_item *_item; \ > + struct bpf_prog *_prog; \ > + struct bpf_prog_array *_array; \ > + u32 ret; \ > + u32 _success = 1; \ > + u32 _bypass = 0; \ reverse Christmas tree? or it's not enforced in a macro? > + preempt_disable(); \ > + rcu_read_lock(); \ > + _array = rcu_dereference(array); \ > + _item = &_array->items[0]; \ > + while ((_prog = READ_ONCE(_item->prog))) { \ > + bpf_cgroup_storage_set(_item->cgroup_storage); \ > + ret = func(_prog, ctx); \ > + _success &= (ret > 0); \ > + _bypass |= (ret == 2); \ > + _item++; \ > + } \ > + rcu_read_unlock(); \ > + preempt_enable(); \ > + ret = _success ? _bypass : -EPERM; \ > + ret; \ > + }) > + > #define BPF_PROG_RUN_ARRAY(array, ctx, func) \ > __BPF_PROG_RUN_ARRAY(array, ctx, func, false) > > @@ -1054,6 +1098,8 @@ extern const struct bpf_func_proto bpf_spin_unlock_proto; > extern const struct bpf_func_proto bpf_get_local_storage_proto; > extern const struct bpf_func_proto bpf_strtol_proto; > extern const struct bpf_func_proto bpf_strtoul_proto; > +extern const struct bpf_func_proto bpf_sk_fullsock_proto; > +extern const struct bpf_func_proto bpf_tcp_sock_proto; > > /* Shared helpers among cBPF and eBPF. */ > void bpf_user_rnd_init_once(void); > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h > index 5a9975678d6f..eec5aeeeaf92 100644 > --- a/include/linux/bpf_types.h > +++ b/include/linux/bpf_types.h > @@ -30,6 +30,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, raw_tracepoint_writable) > #ifdef CONFIG_CGROUP_BPF > BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev) > BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl) > +BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt) > #endif > #ifdef CONFIG_BPF_LIRC_MODE2 > BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2) > diff --git a/include/linux/filter.h b/include/linux/filter.h > index 43b45d6db36d..6e64d01e4e36 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -1199,4 +1199,17 @@ struct bpf_sysctl_kern { > u64 tmp_reg; > }; > > +struct bpf_sockopt_kern { > + struct sock *sk; > + u8 *optval; > + u8 *optval_end; > + s32 level; > + s32 optname; > + u32 optlen; > + > + /* Small on-stack optval buffer to avoid small allocations. > + */ > + u8 buf[64] __aligned(8); > +}; > + > #endif /* __LINUX_FILTER_H__ */ > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 7c6aef253173..310b6bbfded8 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -170,6 +170,7 @@ enum bpf_prog_type { > BPF_PROG_TYPE_FLOW_DISSECTOR, > BPF_PROG_TYPE_CGROUP_SYSCTL, > BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, > + BPF_PROG_TYPE_CGROUP_SOCKOPT, > }; > > enum bpf_attach_type { > @@ -192,6 +193,8 @@ enum bpf_attach_type { > BPF_LIRC_MODE2, > BPF_FLOW_DISSECTOR, > BPF_CGROUP_SYSCTL, > + BPF_CGROUP_GETSOCKOPT, > + BPF_CGROUP_SETSOCKOPT, > __MAX_BPF_ATTACH_TYPE > }; > > @@ -3533,4 +3536,15 @@ struct bpf_sysctl { > */ > }; > > +struct bpf_sockopt { > + __bpf_md_ptr(struct bpf_sock *, sk); > + > + __s32 level; > + __s32 optname; > + > + __u32 optlen; > + __u32 optval; > + __u32 optval_end; > +}; > + > #endif /* _UAPI__LINUX_BPF_H__ */ > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c > index 1b65ab0df457..04bc1a09464e 100644 > --- a/kernel/bpf/cgroup.c > +++ b/kernel/bpf/cgroup.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > > DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key); > EXPORT_SYMBOL(cgroup_bpf_enabled_key); > @@ -924,6 +925,142 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, > } > EXPORT_SYMBOL(__cgroup_bpf_run_filter_sysctl); > > +static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp, > + enum bpf_attach_type attach_type) > +{ > + struct bpf_prog_array *prog_array; > + bool empty; > + > + rcu_read_lock(); > + prog_array = rcu_dereference(cgrp->bpf.effective[attach_type]); > + empty = bpf_prog_array_is_empty(prog_array); > + rcu_read_unlock(); > + > + return empty; > +} > + > +static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen) > +{ > + if (unlikely(max_optlen > PAGE_SIZE)) > + return -EINVAL; > + > + if (likely(max_optlen <= sizeof(ctx->buf))) { > + ctx->optval = ctx->buf; > + } else { > + ctx->optval = kzalloc(max_optlen, GFP_USER); > + if (!ctx->optval) > + return -ENOMEM; > + } > + > + ctx->optval_end = ctx->optval + max_optlen; > + ctx->optlen = max_optlen; > + > + return 0; > +} > + > +static void sockopt_free_buf(struct bpf_sockopt_kern *ctx) > +{ > + if (unlikely(ctx->optval != ctx->buf)) > + kfree(ctx->optval); > +} > + > +int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int level, > + int optname, char __user *optval, > + unsigned int optlen) > +{ > + struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); > + struct bpf_sockopt_kern ctx = { > + .sk = sk, > + .level = level, > + .optname = optname, > + }; > + int ret; > + > + /* Opportunistic check to see whether we have any BPF program > + * attached to the hook so we don't waste time allocating > + * memory and locking the socket. > + */ > + if (__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_SETSOCKOPT)) > + return 0; > + > + ret = sockopt_alloc_buf(&ctx, optlen); > + if (ret) > + return ret; > + > + if (copy_from_user(ctx.optval, optval, optlen) != 0) { > + sockopt_free_buf(&ctx); > + return -EFAULT; > + } > + > + lock_sock(sk); > + ret = BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY( > + cgrp->bpf.effective[BPF_CGROUP_SETSOCKOPT], > + &ctx, BPF_PROG_RUN); > + release_sock(sk); > + > + sockopt_free_buf(&ctx); > + > + return ret; > +} > +EXPORT_SYMBOL(__cgroup_bpf_run_filter_setsockopt); > + > +int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, > + int optname, char __user *optval, > + int __user *optlen) > +{ > + struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); > + struct bpf_sockopt_kern ctx = { > + .sk = sk, > + .level = level, > + .optname = optname, > + }; > + int max_optlen; > + int ret; > + > + /* Opportunistic check to see whether we have any BPF program > + * attached to the hook so we don't waste time allocating > + * memory and locking the socket. > + */ > + if (__cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_GETSOCKOPT)) > + return 0; > + > + if (get_user(max_optlen, optlen)) > + return -EFAULT; > + > + ret = sockopt_alloc_buf(&ctx, max_optlen); > + if (ret) > + return ret; > + > + lock_sock(sk); > + ret = BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY( > + cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT], > + &ctx, BPF_PROG_RUN); > + release_sock(sk); > + > + if (ret < 0) { > + sockopt_free_buf(&ctx); > + return ret; > + } > + > + if (ctx.optlen > max_optlen) { > + sockopt_free_buf(&ctx); > + return -EFAULT; > + } So this is the case where BPF program returns value that's bigger than a buffer provided by users? Existing code in sock_getsockopt handles that by filling out only first N bytes, instead of failing. Should the behavior be the same here? > + > + if (copy_to_user(optval, ctx.optval, ctx.optlen) != 0) { > + sockopt_free_buf(&ctx); > + return -EFAULT; > + } > + > + sockopt_free_buf(&ctx); > + > + if (put_user(ctx.optlen, optlen)) > + return -EFAULT; > + > + return ret; > +} > +EXPORT_SYMBOL(__cgroup_bpf_run_filter_getsockopt); > + > static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp, > size_t *lenp) > { > @@ -1184,3 +1321,143 @@ const struct bpf_verifier_ops cg_sysctl_verifier_ops = { > > const struct bpf_prog_ops cg_sysctl_prog_ops = { > }; > + > +static const struct bpf_func_proto * > +cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) > +{ > + switch (func_id) { > + case BPF_FUNC_sk_fullsock: > + return &bpf_sk_fullsock_proto; > + case BPF_FUNC_sk_storage_get: > + return &bpf_sk_storage_get_proto; > + case BPF_FUNC_sk_storage_delete: > + return &bpf_sk_storage_delete_proto; > +#ifdef CONFIG_INET > + case BPF_FUNC_tcp_sock: > + return &bpf_tcp_sock_proto; > +#endif > + default: > + return cgroup_base_func_proto(func_id, prog); > + } > +} > + > +static bool cg_sockopt_is_valid_access(int off, int size, > + enum bpf_access_type type, > + const struct bpf_prog *prog, > + struct bpf_insn_access_aux *info) > +{ > + const int size_default = sizeof(__u32); > + > + if (off < 0 || off >= sizeof(struct bpf_sockopt)) > + return false; > + > + if (off % size != 0) > + return false; > + > + if (type == BPF_WRITE) { > + switch (off) { > + case offsetof(struct bpf_sockopt, optlen): > + if (size != size_default) > + return false; > + return prog->expected_attach_type == > + BPF_CGROUP_GETSOCKOPT; > + default: > + return false; > + } > + } > + > + switch (off) { > + case offsetof(struct bpf_sockopt, sk): > + if (size != sizeof(__u64)) > + return false; > + info->reg_type = PTR_TO_SOCKET; > + break; > + case bpf_ctx_range(struct bpf_sockopt, optval): > + if (size != size_default) > + return false; > + info->reg_type = PTR_TO_PACKET; > + break; > + case bpf_ctx_range(struct bpf_sockopt, optval_end): > + if (size != size_default) > + return false; > + info->reg_type = PTR_TO_PACKET_END; > + break; > + default: > + if (size != size_default) > + return false; > + break; > + } > + return true; > +} > + > +static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type, > + const struct bpf_insn *si, > + struct bpf_insn *insn_buf, > + struct bpf_prog *prog, > + u32 *target_size) > +{ > + struct bpf_insn *insn = insn_buf; > + > + switch (si->off) { > + case offsetof(struct bpf_sockopt, sk): > + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct > + bpf_sockopt_kern, sk), > + si->dst_reg, si->src_reg, > + offsetof(struct bpf_sockopt_kern, sk)); > + break; > + case offsetof(struct bpf_sockopt, level): > + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, > + offsetof(struct bpf_sockopt_kern, level)); > + break; > + case offsetof(struct bpf_sockopt, optname): > + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, > + offsetof(struct bpf_sockopt_kern, > + optname)); > + break; > + case offsetof(struct bpf_sockopt, optlen): > + if (type == BPF_WRITE) > + *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, > + offsetof(struct bpf_sockopt_kern, > + optlen)); > + else > + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, > + offsetof(struct bpf_sockopt_kern, > + optlen)); > + break; > + case offsetof(struct bpf_sockopt, optval): > + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, > + optval), > + si->dst_reg, si->src_reg, > + offsetof(struct bpf_sockopt_kern, > + optval)); > + break; > + case offsetof(struct bpf_sockopt, optval_end): > + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, > + optval_end), > + si->dst_reg, si->src_reg, > + offsetof(struct bpf_sockopt_kern, > + optval_end)); > + break; > + } > + > + return insn - insn_buf; > +} > + > +static int cg_sockopt_get_prologue(struct bpf_insn *insn_buf, > + bool direct_write, > + const struct bpf_prog *prog) > +{ > + /* Nothing to do for sockopt argument. The data is kzalloc'ated. > + */ > + return 0; > +} > + > +const struct bpf_verifier_ops cg_sockopt_verifier_ops = { > + .get_func_proto = cg_sockopt_func_proto, > + .is_valid_access = cg_sockopt_is_valid_access, > + .convert_ctx_access = cg_sockopt_convert_ctx_access, > + .gen_prologue = cg_sockopt_get_prologue, > +}; > + > +const struct bpf_prog_ops cg_sockopt_prog_ops = { > +}; > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index 33fb292f2e30..e9152ebd66bc 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -1813,6 +1813,15 @@ int bpf_prog_array_length(struct bpf_prog_array *array) > return cnt; > } > > +bool bpf_prog_array_is_empty(struct bpf_prog_array *array) > +{ > + struct bpf_prog_array_item *item; > + > + for (item = array->items; item->prog; item++) > + if (item->prog != &dummy_bpf_prog.prog) > + return false; > + return true; > +} > > static bool bpf_prog_array_copy_core(struct bpf_prog_array *array, > u32 *prog_ids, > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 4c53cbd3329d..4ad2b5f1905f 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -1596,6 +1596,14 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type, > default: > return -EINVAL; > } > + case BPF_PROG_TYPE_CGROUP_SOCKOPT: > + switch (expected_attach_type) { > + case BPF_CGROUP_SETSOCKOPT: > + case BPF_CGROUP_GETSOCKOPT: > + return 0; > + default: > + return -EINVAL; > + } > default: > return 0; > } > @@ -1846,6 +1854,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog, > switch (prog->type) { > case BPF_PROG_TYPE_CGROUP_SOCK: > case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: > + case BPF_PROG_TYPE_CGROUP_SOCKOPT: > return attach_type == prog->expected_attach_type ? 0 : -EINVAL; > case BPF_PROG_TYPE_CGROUP_SKB: > return prog->enforce_expected_attach_type && > @@ -1916,6 +1925,10 @@ static int bpf_prog_attach(const union bpf_attr *attr) > case BPF_CGROUP_SYSCTL: > ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; > break; > + case BPF_CGROUP_GETSOCKOPT: > + case BPF_CGROUP_SETSOCKOPT: > + ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT; > + break; > default: > return -EINVAL; > } > @@ -1997,6 +2010,10 @@ static int bpf_prog_detach(const union bpf_attr *attr) > case BPF_CGROUP_SYSCTL: > ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; > break; > + case BPF_CGROUP_GETSOCKOPT: > + case BPF_CGROUP_SETSOCKOPT: > + ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT; > + break; > default: > return -EINVAL; > } > @@ -2031,6 +2048,8 @@ static int bpf_prog_query(const union bpf_attr *attr, > case BPF_CGROUP_SOCK_OPS: > case BPF_CGROUP_DEVICE: > case BPF_CGROUP_SYSCTL: > + case BPF_CGROUP_GETSOCKOPT: > + case BPF_CGROUP_SETSOCKOPT: > break; > case BPF_LIRC_MODE2: > return lirc_prog_query(attr, uattr); > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 5c2cb5bd84ce..fffc668ef536 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -1717,6 +1717,18 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env, > > env->seen_direct_write = true; > return true; > + > + case BPF_PROG_TYPE_CGROUP_SOCKOPT: > + if (t == BPF_WRITE) { > + if (env->prog->expected_attach_type == > + BPF_CGROUP_GETSOCKOPT) { > + env->seen_direct_write = true; > + return true; > + } > + return false; > + } > + return true; > + > default: > return false; > } > @@ -5524,6 +5536,9 @@ static int check_return_code(struct bpf_verifier_env *env) > case BPF_PROG_TYPE_CGROUP_DEVICE: > case BPF_PROG_TYPE_CGROUP_SYSCTL: > break; > + case BPF_PROG_TYPE_CGROUP_SOCKOPT: > + range = tnum_range(0, 2); > + break; > default: > return 0; > } > diff --git a/net/core/filter.c b/net/core/filter.c > index 55bfc941d17a..4652c0a005ca 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -1835,7 +1835,7 @@ BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk) > return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL; > } > > -static const struct bpf_func_proto bpf_sk_fullsock_proto = { > +const struct bpf_func_proto bpf_sk_fullsock_proto = { > .func = bpf_sk_fullsock, > .gpl_only = false, > .ret_type = RET_PTR_TO_SOCKET_OR_NULL, > @@ -5636,7 +5636,7 @@ BPF_CALL_1(bpf_tcp_sock, struct sock *, sk) > return (unsigned long)NULL; > } > > -static const struct bpf_func_proto bpf_tcp_sock_proto = { > +const struct bpf_func_proto bpf_tcp_sock_proto = { > .func = bpf_tcp_sock, > .gpl_only = false, > .ret_type = RET_PTR_TO_TCP_SOCK_OR_NULL, > diff --git a/net/socket.c b/net/socket.c > index 72372dc5dd70..e8654f1f70e6 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -2069,6 +2069,15 @@ static int __sys_setsockopt(int fd, int level, int optname, > if (err) > goto out_put; > > + err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, level, optname, > + optval, optlen); > + if (err < 0) { > + goto out_put; > + } else if (err > 0) { > + err = 0; > + goto out_put; > + } > + > if (level == SOL_SOCKET) > err = > sock_setsockopt(sock, level, optname, optval, > @@ -2106,6 +2115,15 @@ static int __sys_getsockopt(int fd, int level, int optname, > if (err) > goto out_put; > > + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, > + optval, optlen); > + if (err < 0) { > + goto out_put; > + } else if (err > 0) { > + err = 0; > + goto out_put; > + } > + > if (level == SOL_SOCKET) > err = > sock_getsockopt(sock, level, optname, optval, > -- > 2.22.0.rc1.311.g5d7573a151-goog >