* [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage
@ 2022-03-16 19:54 Joanne Koong
2022-03-17 2:23 ` Martin KaFai Lau
2022-03-17 4:06 ` kernel test robot
0 siblings, 2 replies; 6+ messages in thread
From: Joanne Koong @ 2022-03-16 19:54 UTC (permalink / raw)
To: bpf
Cc: kafai, kpsingh, memxor, ast, daniel, andrii, tj, davemarchevsky,
Joanne Koong
From: Joanne Koong <joannelkoong@gmail.com>
Currently, local storage memory can only be allocated atomically
(GFP_ATOMIC). This restriction is too strict for sleepable bpf
programs.
In this patch, the verifier detects whether the program is sleepable,
and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
down to the local storage functions that allocate memory.
Please note that bpf_task/sk/inode_storage_update_elem functions are
invoked by userspace applications through syscalls. Preemption is
disabled before bpf_task/sk/inode_storage_update_elem is called, which
means they will always have to allocate memory atomically.
The existing local storage selftests cover both the GFP_ATOMIC and the
GFP_KERNEL cases in bpf_local_storage_update.
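As an illustration (not part of this patch; program and map names are
made up), a sleepable LSM program along these lines would have its
bpf_task_storage_get() call rewritten by the verifier to pass GFP_KERNEL
as the hidden 5th argument, while the same call in a non-sleepable
program would keep getting GFP_ATOMIC:

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	char _license[] SEC("license") = "GPL";

	struct {
		__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
		__uint(map_flags, BPF_F_NO_PREALLOC);
		__type(key, int);
		__type(value, long);
	} task_map SEC(".maps");

	SEC("lsm.s/bprm_committed_creds") /* sleepable -> GFP_KERNEL */
	int BPF_PROG(exec_hook, struct linux_binprm *bprm)
	{
		long *v;

		/* may allocate storage; with this patch the allocation
		 * can now sleep instead of requiring GFP_ATOMIC
		 */
		v = bpf_task_storage_get(&task_map, bpf_get_current_task_btf(),
					 0, BPF_LOCAL_STORAGE_GET_F_CREATE);
		return 0;
	}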
v2 <- v1:
* Allocate the memory before/after the raw_spin_lock_irqsave, depending
on the gfp flags
* Rename mem_flags to gfp_flags
* Reword the comment "*mem_flags* is set by the bpf verifier" to
"*gfp_flags* is a hidden argument provided by the verifier"
* Add a sentence to the commit message about existing local storage
selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in
bpf_local_storage_update.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: KP Singh <kpsingh@kernel.org>
---
include/linux/bpf_local_storage.h | 7 ++--
kernel/bpf/bpf_inode_storage.c | 9 ++---
kernel/bpf/bpf_local_storage.c | 58 ++++++++++++++++++++-----------
kernel/bpf/bpf_task_storage.c | 10 +++---
kernel/bpf/verifier.c | 20 +++++++++++
net/core/bpf_sk_storage.c | 21 ++++++-----
6 files changed, 84 insertions(+), 41 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 37b3906af8b1..493e63258497 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -154,16 +154,17 @@ void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
- bool charge_mem);
+ bool charge_mem, gfp_t gfp_flags);
int
bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *first_selem);
+ struct bpf_local_storage_elem *first_selem,
+ gfp_t gfp_flags);
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
- void *value, u64 map_flags);
+ void *value, u64 map_flags, gfp_t gfp_flags);
void bpf_local_storage_free_rcu(struct rcu_head *rcu);
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index e29d9e3d853e..96be8d518885 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -136,7 +136,7 @@ static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
sdata = bpf_local_storage_update(f->f_inode,
(struct bpf_local_storage_map *)map,
- value, map_flags);
+ value, map_flags, GFP_ATOMIC);
fput(f);
return PTR_ERR_OR_ZERO(sdata);
}
@@ -169,8 +169,9 @@ static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
return err;
}
-BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
- void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+ void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
@@ -196,7 +197,7 @@ BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
sdata = bpf_local_storage_update(
inode, (struct bpf_local_storage_map *)map, value,
- BPF_NOEXIST);
+ BPF_NOEXIST, gfp_flags);
return IS_ERR(sdata) ? (unsigned long)NULL :
(unsigned long)sdata->data;
}
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 092a1ac772d7..01aa2b51ec4d 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -63,7 +63,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
- void *value, bool charge_mem)
+ void *value, bool charge_mem, gfp_t gfp_flags)
{
struct bpf_local_storage_elem *selem;
@@ -71,7 +71,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
return NULL;
selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
- GFP_ATOMIC | __GFP_NOWARN);
+ gfp_flags | __GFP_NOWARN);
if (selem) {
if (value)
memcpy(SDATA(selem)->data, value, smap->map.value_size);
@@ -282,7 +282,8 @@ static int check_flags(const struct bpf_local_storage_data *old_sdata,
int bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *first_selem)
+ struct bpf_local_storage_elem *first_selem,
+ gfp_t gfp_flags)
{
struct bpf_local_storage *prev_storage, *storage;
struct bpf_local_storage **owner_storage_ptr;
@@ -293,7 +294,7 @@ int bpf_local_storage_alloc(void *owner,
return err;
storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
- GFP_ATOMIC | __GFP_NOWARN);
+ gfp_flags | __GFP_NOWARN);
if (!storage) {
err = -ENOMEM;
goto uncharge;
@@ -350,10 +351,10 @@ int bpf_local_storage_alloc(void *owner,
*/
struct bpf_local_storage_data *
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
- void *value, u64 map_flags)
+ void *value, u64 map_flags, gfp_t gfp_flags)
{
struct bpf_local_storage_data *old_sdata = NULL;
- struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage_elem *selem = NULL;
struct bpf_local_storage *local_storage;
unsigned long flags;
int err;
@@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
!map_value_has_spin_lock(&smap->map)))
return ERR_PTR(-EINVAL);
+ if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
+ return ERR_PTR(-EINVAL);
+
local_storage = rcu_dereference_check(*owner_storage(smap, owner),
bpf_rcu_lock_held());
if (!local_storage || hlist_empty(&local_storage->list)) {
@@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (err)
return ERR_PTR(err);
- selem = bpf_selem_alloc(smap, owner, value, true);
+ selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
if (!selem)
return ERR_PTR(-ENOMEM);
- err = bpf_local_storage_alloc(owner, smap, selem);
+ err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
if (err) {
kfree(selem);
mem_uncharge(smap, owner, smap->elem_size);
@@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
}
}
+ if (gfp_flags == GFP_KERNEL) {
+ selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
+ if (!selem)
+ return ERR_PTR(-ENOMEM);
+ }
+
raw_spin_lock_irqsave(&local_storage->lock, flags);
/* Recheck local_storage->list under local_storage->lock */
@@ -429,19 +439,21 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
goto unlock;
}
- /* local_storage->lock is held. Hence, we are sure
- * we can unlink and uncharge the old_sdata successfully
- * later. Hence, instead of charging the new selem now
- * and then uncharge the old selem later (which may cause
- * a potential but unnecessary charge failure), avoid taking
- * a charge at all here (the "!old_sdata" check) and the
- * old_sdata will not be uncharged later during
- * bpf_selem_unlink_storage_nolock().
- */
- selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
- if (!selem) {
- err = -ENOMEM;
- goto unlock_err;
+ if (gfp_flags != GFP_KERNEL) {
+ /* local_storage->lock is held. Hence, we are sure
+ * we can unlink and uncharge the old_sdata successfully
+ * later. Hence, instead of charging the new selem now
+ * and then uncharge the old selem later (which may cause
+ * a potential but unnecessary charge failure), avoid taking
+ * a charge at all here (the "!old_sdata" check) and the
+ * old_sdata will not be uncharged later during
+ * bpf_selem_unlink_storage_nolock().
+ */
+ selem = bpf_selem_alloc(smap, owner, value, !old_sdata, gfp_flags);
+ if (!selem) {
+ err = -ENOMEM;
+ goto unlock_err;
+ }
}
/* First, link the new selem to the map */
@@ -463,6 +475,10 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
unlock_err:
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+ if (selem) {
+ mem_uncharge(smap, owner, smap->elem_size);
+ kfree(selem);
+ }
return ERR_PTR(err);
}
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 5da7bed0f5f6..6638a0ecc3d2 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -174,7 +174,8 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
bpf_task_storage_lock();
sdata = bpf_local_storage_update(
- task, (struct bpf_local_storage_map *)map, value, map_flags);
+ task, (struct bpf_local_storage_map *)map, value, map_flags,
+ GFP_ATOMIC);
bpf_task_storage_unlock();
err = PTR_ERR_OR_ZERO(sdata);
@@ -226,8 +227,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
return err;
}
-BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
- task, void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
+ task, void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
@@ -250,7 +252,7 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
(flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value,
- BPF_NOEXIST);
+ BPF_NOEXIST, gfp_flags);
unlock:
bpf_task_storage_unlock();
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0db6cd8dcb35..392fdaabedbd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13491,6 +13491,26 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
goto patch_call_imm;
}
+ if (insn->imm == BPF_FUNC_task_storage_get ||
+ insn->imm == BPF_FUNC_sk_storage_get ||
+ insn->imm == BPF_FUNC_inode_storage_get) {
+ if (env->prog->aux->sleepable)
+ insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_KERNEL);
+ else
+ insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_ATOMIC);
+ insn_buf[1] = *insn;
+ cnt = 2;
+
+ new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ goto patch_call_imm;
+ }
+
/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
* and other inlining handlers are currently limited to 64 bit
* only.
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index d9c37fd10809..7aff1206a851 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -141,7 +141,7 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
if (sock) {
sdata = bpf_local_storage_update(
sock->sk, (struct bpf_local_storage_map *)map, value,
- map_flags);
+ map_flags, GFP_ATOMIC);
sockfd_put(sock);
return PTR_ERR_OR_ZERO(sdata);
}
@@ -172,7 +172,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
{
struct bpf_local_storage_elem *copy_selem;
- copy_selem = bpf_selem_alloc(smap, newsk, NULL, true);
+ copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, GFP_ATOMIC);
if (!copy_selem)
return NULL;
@@ -230,7 +230,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
bpf_selem_link_map(smap, copy_selem);
bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
} else {
- ret = bpf_local_storage_alloc(newsk, smap, copy_selem);
+ ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC);
if (ret) {
kfree(copy_selem);
atomic_sub(smap->elem_size,
@@ -255,8 +255,9 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
return ret;
}
-BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
- void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
+ void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
@@ -277,7 +278,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
refcount_inc_not_zero(&sk->sk_refcnt)) {
sdata = bpf_local_storage_update(
sk, (struct bpf_local_storage_map *)map, value,
- BPF_NOEXIST);
+ BPF_NOEXIST, gfp_flags);
/* sk must be a fullsock (guaranteed by verifier),
* so sock_gen_put() is unnecessary.
*/
@@ -417,14 +418,16 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog)
return false;
}
-BPF_CALL_4(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk,
- void *, value, u64, flags)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_sk_storage_get_tracing, struct bpf_map *, map, struct sock *, sk,
+ void *, value, u64, flags, gfp_t, gfp_flags)
{
WARN_ON_ONCE(!bpf_rcu_lock_held());
if (in_hardirq() || in_nmi())
return (unsigned long)NULL;
- return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags);
+ return (unsigned long)____bpf_sk_storage_get(map, sk, value, flags,
+ gfp_flags);
}
BPF_CALL_2(bpf_sk_storage_delete_tracing, struct bpf_map *, map,
--
2.30.2
* Re: [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage
2022-03-16 19:54 [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage Joanne Koong
@ 2022-03-17 2:23 ` Martin KaFai Lau
2022-03-17 5:26 ` Joanne Koong
2022-03-17 4:06 ` kernel test robot
1 sibling, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2022-03-17 2:23 UTC (permalink / raw)
To: Joanne Koong
Cc: bpf, kpsingh, memxor, ast, daniel, andrii, tj, davemarchevsky,
Joanne Koong
On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote:
> From: Joanne Koong <joannelkoong@gmail.com>
>
> Currently, local storage memory can only be allocated atomically
> (GFP_ATOMIC). This restriction is too strict for sleepable bpf
> programs.
>
> In this patch, the verifier detects whether the program is sleepable,
> and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
> 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
> down to the local storage functions that allocate memory.
>
> Please note that bpf_task/sk/inode_storage_update_elem functions are
> invoked by userspace applications through syscalls. Preemption is
> disabled before bpf_task/sk/inode_storage_update_elem is called, which
> means they will always have to allocate memory atomically.
>
> The existing local storage selftests cover both the GFP_ATOMIC and the
> GFP_KERNEL cases in bpf_local_storage_update.
>
> v2 <- v1:
> * Allocate the memory before/after the raw_spin_lock_irqsave, depending
> on the gfp flags
> * Rename mem_flags to gfp_flags
> * Reword the comment "*mem_flags* is set by the bpf verifier" to
> "*gfp_flags* is a hidden argument provided by the verifier"
> * Add a sentence to the commit message about existing local storage
> selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in
> bpf_local_storage_update.
[ ... ]
> struct bpf_local_storage_data *
> bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> - void *value, u64 map_flags)
> + void *value, u64 map_flags, gfp_t gfp_flags)
> {
> struct bpf_local_storage_data *old_sdata = NULL;
> - struct bpf_local_storage_elem *selem;
> + struct bpf_local_storage_elem *selem = NULL;
> struct bpf_local_storage *local_storage;
> unsigned long flags;
> int err;
> @@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> !map_value_has_spin_lock(&smap->map)))
> return ERR_PTR(-EINVAL);
>
> + if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
> + return ERR_PTR(-EINVAL);
> +
> local_storage = rcu_dereference_check(*owner_storage(smap, owner),
> bpf_rcu_lock_held());
> if (!local_storage || hlist_empty(&local_storage->list)) {
> @@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> if (err)
> return ERR_PTR(err);
>
> - selem = bpf_selem_alloc(smap, owner, value, true);
> + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> if (!selem)
> return ERR_PTR(-ENOMEM);
>
> - err = bpf_local_storage_alloc(owner, smap, selem);
> + err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
> if (err) {
> kfree(selem);
> mem_uncharge(smap, owner, smap->elem_size);
> @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> }
> }
>
> + if (gfp_flags == GFP_KERNEL) {
> + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
I think this new path is not executed by the existing
"progs/local_storage.c" test which has sleepable lsm prog. At least a second
BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed?
or there is other selftest covering this new path that I missed?
Others lgtm.
Acked-by: Martin KaFai Lau <kafai@fb.com>
> + if (!selem)
> + return ERR_PTR(-ENOMEM);
> + }
> +
> raw_spin_lock_irqsave(&local_storage->lock, flags);
>
> /* Recheck local_storage->list under local_storage->lock */
* Re: [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage
2022-03-16 19:54 [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage Joanne Koong
2022-03-17 2:23 ` Martin KaFai Lau
@ 2022-03-17 4:06 ` kernel test robot
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2022-03-17 4:06 UTC (permalink / raw)
To: Joanne Koong, bpf
Cc: kbuild-all, kafai, kpsingh, memxor, ast, daniel, andrii, tj,
davemarchevsky, Joanne Koong
Hi Joanne,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Joanne-Koong/bpf-Enable-non-atomic-allocations-in-local-storage/20220317-035639
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: s390-randconfig-s031-20220313 (https://download.01.org/0day-ci/archive/20220317/202203171134.RcEuYPsA-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/022ee8bbd2bfdefff36373ab838326754fe2bb55
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Joanne-Koong/bpf-Enable-non-atomic-allocations-in-local-storage/20220317-035639
git checkout 022ee8bbd2bfdefff36373ab838326754fe2bb55
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=s390 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> kernel/bpf/verifier.c:13498:47: sparse: sparse: incorrect type in initializer (different base types) @@ expected signed int [usertype] imm @@ got restricted gfp_t @@
kernel/bpf/verifier.c:13498:47: sparse: expected signed int [usertype] imm
kernel/bpf/verifier.c:13498:47: sparse: got restricted gfp_t
kernel/bpf/verifier.c:13500:47: sparse: sparse: incorrect type in initializer (different base types) @@ expected signed int [usertype] imm @@ got restricted gfp_t @@
kernel/bpf/verifier.c:13500:47: sparse: expected signed int [usertype] imm
kernel/bpf/verifier.c:13500:47: sparse: got restricted gfp_t
kernel/bpf/verifier.c:13726:38: sparse: sparse: subtraction of functions? Share your drugs
kernel/bpf/verifier.c: note: in included file (through include/linux/bpf.h, include/linux/bpf-cgroup.h):
include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:63:40: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast to non-scalar
include/linux/bpfptr.h:52:47: sparse: sparse: cast from non-scalar
vim +13498 kernel/bpf/verifier.c
13234
13235 /* Do various post-verification rewrites in a single program pass.
13236 * These rewrites simplify JIT and interpreter implementations.
13237 */
13238 static int do_misc_fixups(struct bpf_verifier_env *env)
13239 {
13240 struct bpf_prog *prog = env->prog;
13241 enum bpf_attach_type eatype = prog->expected_attach_type;
13242 bool expect_blinding = bpf_jit_blinding_enabled(prog);
13243 enum bpf_prog_type prog_type = resolve_prog_type(prog);
13244 struct bpf_insn *insn = prog->insnsi;
13245 const struct bpf_func_proto *fn;
13246 const int insn_cnt = prog->len;
13247 const struct bpf_map_ops *ops;
13248 struct bpf_insn_aux_data *aux;
13249 struct bpf_insn insn_buf[16];
13250 struct bpf_prog *new_prog;
13251 struct bpf_map *map_ptr;
13252 int i, ret, cnt, delta = 0;
13253
13254 for (i = 0; i < insn_cnt; i++, insn++) {
13255 /* Make divide-by-zero exceptions impossible. */
13256 if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
13257 insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
13258 insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
13259 insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
13260 bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
13261 bool isdiv = BPF_OP(insn->code) == BPF_DIV;
13262 struct bpf_insn *patchlet;
13263 struct bpf_insn chk_and_div[] = {
13264 /* [R,W]x div 0 -> 0 */
13265 BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
13266 BPF_JNE | BPF_K, insn->src_reg,
13267 0, 2, 0),
13268 BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
13269 BPF_JMP_IMM(BPF_JA, 0, 0, 1),
13270 *insn,
13271 };
13272 struct bpf_insn chk_and_mod[] = {
13273 /* [R,W]x mod 0 -> [R,W]x */
13274 BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
13275 BPF_JEQ | BPF_K, insn->src_reg,
13276 0, 1 + (is64 ? 0 : 1), 0),
13277 *insn,
13278 BPF_JMP_IMM(BPF_JA, 0, 0, 1),
13279 BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
13280 };
13281
13282 patchlet = isdiv ? chk_and_div : chk_and_mod;
13283 cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
13284 ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
13285
13286 new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
13287 if (!new_prog)
13288 return -ENOMEM;
13289
13290 delta += cnt - 1;
13291 env->prog = prog = new_prog;
13292 insn = new_prog->insnsi + i + delta;
13293 continue;
13294 }
13295
13296 /* Implement LD_ABS and LD_IND with a rewrite, if supported by the program type. */
13297 if (BPF_CLASS(insn->code) == BPF_LD &&
13298 (BPF_MODE(insn->code) == BPF_ABS ||
13299 BPF_MODE(insn->code) == BPF_IND)) {
13300 cnt = env->ops->gen_ld_abs(insn, insn_buf);
13301 if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) {
13302 verbose(env, "bpf verifier is misconfigured\n");
13303 return -EINVAL;
13304 }
13305
13306 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13307 if (!new_prog)
13308 return -ENOMEM;
13309
13310 delta += cnt - 1;
13311 env->prog = prog = new_prog;
13312 insn = new_prog->insnsi + i + delta;
13313 continue;
13314 }
13315
13316 /* Rewrite pointer arithmetic to mitigate speculation attacks. */
13317 if (insn->code == (BPF_ALU64 | BPF_ADD | BPF_X) ||
13318 insn->code == (BPF_ALU64 | BPF_SUB | BPF_X)) {
13319 const u8 code_add = BPF_ALU64 | BPF_ADD | BPF_X;
13320 const u8 code_sub = BPF_ALU64 | BPF_SUB | BPF_X;
13321 struct bpf_insn *patch = &insn_buf[0];
13322 bool issrc, isneg, isimm;
13323 u32 off_reg;
13324
13325 aux = &env->insn_aux_data[i + delta];
13326 if (!aux->alu_state ||
13327 aux->alu_state == BPF_ALU_NON_POINTER)
13328 continue;
13329
13330 isneg = aux->alu_state & BPF_ALU_NEG_VALUE;
13331 issrc = (aux->alu_state & BPF_ALU_SANITIZE) ==
13332 BPF_ALU_SANITIZE_SRC;
13333 isimm = aux->alu_state & BPF_ALU_IMMEDIATE;
13334
13335 off_reg = issrc ? insn->src_reg : insn->dst_reg;
13336 if (isimm) {
13337 *patch++ = BPF_MOV32_IMM(BPF_REG_AX, aux->alu_limit);
13338 } else {
13339 if (isneg)
13340 *patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1);
13341 *patch++ = BPF_MOV32_IMM(BPF_REG_AX, aux->alu_limit);
13342 *patch++ = BPF_ALU64_REG(BPF_SUB, BPF_REG_AX, off_reg);
13343 *patch++ = BPF_ALU64_REG(BPF_OR, BPF_REG_AX, off_reg);
13344 *patch++ = BPF_ALU64_IMM(BPF_NEG, BPF_REG_AX, 0);
13345 *patch++ = BPF_ALU64_IMM(BPF_ARSH, BPF_REG_AX, 63);
13346 *patch++ = BPF_ALU64_REG(BPF_AND, BPF_REG_AX, off_reg);
13347 }
13348 if (!issrc)
13349 *patch++ = BPF_MOV64_REG(insn->dst_reg, insn->src_reg);
13350 insn->src_reg = BPF_REG_AX;
13351 if (isneg)
13352 insn->code = insn->code == code_add ?
13353 code_sub : code_add;
13354 *patch++ = *insn;
13355 if (issrc && isneg && !isimm)
13356 *patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1);
13357 cnt = patch - insn_buf;
13358
13359 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13360 if (!new_prog)
13361 return -ENOMEM;
13362
13363 delta += cnt - 1;
13364 env->prog = prog = new_prog;
13365 insn = new_prog->insnsi + i + delta;
13366 continue;
13367 }
13368
13369 if (insn->code != (BPF_JMP | BPF_CALL))
13370 continue;
13371 if (insn->src_reg == BPF_PSEUDO_CALL)
13372 continue;
13373 if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
13374 ret = fixup_kfunc_call(env, insn);
13375 if (ret)
13376 return ret;
13377 continue;
13378 }
13379
13380 if (insn->imm == BPF_FUNC_get_route_realm)
13381 prog->dst_needed = 1;
13382 if (insn->imm == BPF_FUNC_get_prandom_u32)
13383 bpf_user_rnd_init_once();
13384 if (insn->imm == BPF_FUNC_override_return)
13385 prog->kprobe_override = 1;
13386 if (insn->imm == BPF_FUNC_tail_call) {
13387 /* If we tail call into other programs, we
13388 * cannot make any assumptions since they can
13389 * be replaced dynamically during runtime in
13390 * the program array.
13391 */
13392 prog->cb_access = 1;
13393 if (!allow_tail_call_in_subprogs(env))
13394 prog->aux->stack_depth = MAX_BPF_STACK;
13395 prog->aux->max_pkt_offset = MAX_PACKET_OFF;
13396
13397 /* mark bpf_tail_call as different opcode to avoid
13398 * conditional branch in the interpreter for every normal
13399 * call and to prevent accidental JITing by JIT compiler
13400 * that doesn't support bpf_tail_call yet
13401 */
13402 insn->imm = 0;
13403 insn->code = BPF_JMP | BPF_TAIL_CALL;
13404
13405 aux = &env->insn_aux_data[i + delta];
13406 if (env->bpf_capable && !expect_blinding &&
13407 prog->jit_requested &&
13408 !bpf_map_key_poisoned(aux) &&
13409 !bpf_map_ptr_poisoned(aux) &&
13410 !bpf_map_ptr_unpriv(aux)) {
13411 struct bpf_jit_poke_descriptor desc = {
13412 .reason = BPF_POKE_REASON_TAIL_CALL,
13413 .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
13414 .tail_call.key = bpf_map_key_immediate(aux),
13415 .insn_idx = i + delta,
13416 };
13417
13418 ret = bpf_jit_add_poke_descriptor(prog, &desc);
13419 if (ret < 0) {
13420 verbose(env, "adding tail call poke descriptor failed\n");
13421 return ret;
13422 }
13423
13424 insn->imm = ret + 1;
13425 continue;
13426 }
13427
13428 if (!bpf_map_ptr_unpriv(aux))
13429 continue;
13430
13431 /* instead of changing every JIT dealing with tail_call
13432 * emit two extra insns:
13433 * if (index >= max_entries) goto out;
13434 * index &= array->index_mask;
13435 * to avoid out-of-bounds cpu speculation
13436 */
13437 if (bpf_map_ptr_poisoned(aux)) {
13438 verbose(env, "tail_call abusing map_ptr\n");
13439 return -EINVAL;
13440 }
13441
13442 map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
13443 insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3,
13444 map_ptr->max_entries, 2);
13445 insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3,
13446 container_of(map_ptr,
13447 struct bpf_array,
13448 map)->index_mask);
13449 insn_buf[2] = *insn;
13450 cnt = 3;
13451 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13452 if (!new_prog)
13453 return -ENOMEM;
13454
13455 delta += cnt - 1;
13456 env->prog = prog = new_prog;
13457 insn = new_prog->insnsi + i + delta;
13458 continue;
13459 }
13460
13461 if (insn->imm == BPF_FUNC_timer_set_callback) {
13462 /* The verifier will process callback_fn as many times as necessary
13463 * with different maps and the register states prepared by
13464 * set_timer_callback_state will be accurate.
13465 *
13466 * The following use case is valid:
13467 * map1 is shared by prog1, prog2, prog3.
13468 * prog1 calls bpf_timer_init for some map1 elements
13469 * prog2 calls bpf_timer_set_callback for some map1 elements.
13470 * Those that were not bpf_timer_init-ed will return -EINVAL.
13471 * prog3 calls bpf_timer_start for some map1 elements.
13472 * Those that were not both bpf_timer_init-ed and
13473 * bpf_timer_set_callback-ed will return -EINVAL.
13474 */
13475 struct bpf_insn ld_addrs[2] = {
13476 BPF_LD_IMM64(BPF_REG_3, (long)prog->aux),
13477 };
13478
13479 insn_buf[0] = ld_addrs[0];
13480 insn_buf[1] = ld_addrs[1];
13481 insn_buf[2] = *insn;
13482 cnt = 3;
13483
13484 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13485 if (!new_prog)
13486 return -ENOMEM;
13487
13488 delta += cnt - 1;
13489 env->prog = prog = new_prog;
13490 insn = new_prog->insnsi + i + delta;
13491 goto patch_call_imm;
13492 }
13493
13494 if (insn->imm == BPF_FUNC_task_storage_get ||
13495 insn->imm == BPF_FUNC_sk_storage_get ||
13496 insn->imm == BPF_FUNC_inode_storage_get) {
13497 if (env->prog->aux->sleepable)
13498 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_KERNEL);
13499 else
13500 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, GFP_ATOMIC);
13501 insn_buf[1] = *insn;
13502 cnt = 2;
13503
13504 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13505 if (!new_prog)
13506 return -ENOMEM;
13507
13508 delta += cnt - 1;
13509 env->prog = prog = new_prog;
13510 insn = new_prog->insnsi + i + delta;
13511 goto patch_call_imm;
13512 }
13513
13514 /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
13515 * and other inlining handlers are currently limited to 64 bit
13516 * only.
13517 */
13518 if (prog->jit_requested && BITS_PER_LONG == 64 &&
13519 (insn->imm == BPF_FUNC_map_lookup_elem ||
13520 insn->imm == BPF_FUNC_map_update_elem ||
13521 insn->imm == BPF_FUNC_map_delete_elem ||
13522 insn->imm == BPF_FUNC_map_push_elem ||
13523 insn->imm == BPF_FUNC_map_pop_elem ||
13524 insn->imm == BPF_FUNC_map_peek_elem ||
13525 insn->imm == BPF_FUNC_redirect_map ||
13526 insn->imm == BPF_FUNC_for_each_map_elem)) {
13527 aux = &env->insn_aux_data[i + delta];
13528 if (bpf_map_ptr_poisoned(aux))
13529 goto patch_call_imm;
13530
13531 map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
13532 ops = map_ptr->ops;
13533 if (insn->imm == BPF_FUNC_map_lookup_elem &&
13534 ops->map_gen_lookup) {
13535 cnt = ops->map_gen_lookup(map_ptr, insn_buf);
13536 if (cnt == -EOPNOTSUPP)
13537 goto patch_map_ops_generic;
13538 if (cnt <= 0 || cnt >= ARRAY_SIZE(insn_buf)) {
13539 verbose(env, "bpf verifier is misconfigured\n");
13540 return -EINVAL;
13541 }
13542
13543 new_prog = bpf_patch_insn_data(env, i + delta,
13544 insn_buf, cnt);
13545 if (!new_prog)
13546 return -ENOMEM;
13547
13548 delta += cnt - 1;
13549 env->prog = prog = new_prog;
13550 insn = new_prog->insnsi + i + delta;
13551 continue;
13552 }
13553
13554 BUILD_BUG_ON(!__same_type(ops->map_lookup_elem,
13555 (void *(*)(struct bpf_map *map, void *key))NULL));
13556 BUILD_BUG_ON(!__same_type(ops->map_delete_elem,
13557 (int (*)(struct bpf_map *map, void *key))NULL));
13558 BUILD_BUG_ON(!__same_type(ops->map_update_elem,
13559 (int (*)(struct bpf_map *map, void *key, void *value,
13560 u64 flags))NULL));
13561 BUILD_BUG_ON(!__same_type(ops->map_push_elem,
13562 (int (*)(struct bpf_map *map, void *value,
13563 u64 flags))NULL));
13564 BUILD_BUG_ON(!__same_type(ops->map_pop_elem,
13565 (int (*)(struct bpf_map *map, void *value))NULL));
13566 BUILD_BUG_ON(!__same_type(ops->map_peek_elem,
13567 (int (*)(struct bpf_map *map, void *value))NULL));
13568 BUILD_BUG_ON(!__same_type(ops->map_redirect,
13569 (int (*)(struct bpf_map *map, u32 ifindex, u64 flags))NULL));
13570 BUILD_BUG_ON(!__same_type(ops->map_for_each_callback,
13571 (int (*)(struct bpf_map *map,
13572 bpf_callback_t callback_fn,
13573 void *callback_ctx,
13574 u64 flags))NULL));
13575
13576 patch_map_ops_generic:
13577 switch (insn->imm) {
13578 case BPF_FUNC_map_lookup_elem:
13579 insn->imm = BPF_CALL_IMM(ops->map_lookup_elem);
13580 continue;
13581 case BPF_FUNC_map_update_elem:
13582 insn->imm = BPF_CALL_IMM(ops->map_update_elem);
13583 continue;
13584 case BPF_FUNC_map_delete_elem:
13585 insn->imm = BPF_CALL_IMM(ops->map_delete_elem);
13586 continue;
13587 case BPF_FUNC_map_push_elem:
13588 insn->imm = BPF_CALL_IMM(ops->map_push_elem);
13589 continue;
13590 case BPF_FUNC_map_pop_elem:
13591 insn->imm = BPF_CALL_IMM(ops->map_pop_elem);
13592 continue;
13593 case BPF_FUNC_map_peek_elem:
13594 insn->imm = BPF_CALL_IMM(ops->map_peek_elem);
13595 continue;
13596 case BPF_FUNC_redirect_map:
13597 insn->imm = BPF_CALL_IMM(ops->map_redirect);
13598 continue;
13599 case BPF_FUNC_for_each_map_elem:
13600 insn->imm = BPF_CALL_IMM(ops->map_for_each_callback);
13601 continue;
13602 }
13603
13604 goto patch_call_imm;
13605 }
13606
13607 /* Implement bpf_jiffies64 inline. */
13608 if (prog->jit_requested && BITS_PER_LONG == 64 &&
13609 insn->imm == BPF_FUNC_jiffies64) {
13610 struct bpf_insn ld_jiffies_addr[2] = {
13611 BPF_LD_IMM64(BPF_REG_0,
13612 (unsigned long)&jiffies),
13613 };
13614
13615 insn_buf[0] = ld_jiffies_addr[0];
13616 insn_buf[1] = ld_jiffies_addr[1];
13617 insn_buf[2] = BPF_LDX_MEM(BPF_DW, BPF_REG_0,
13618 BPF_REG_0, 0);
13619 cnt = 3;
13620
13621 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf,
13622 cnt);
13623 if (!new_prog)
13624 return -ENOMEM;
13625
13626 delta += cnt - 1;
13627 env->prog = prog = new_prog;
13628 insn = new_prog->insnsi + i + delta;
13629 continue;
13630 }
13631
13632 /* Implement bpf_get_func_arg inline. */
13633 if (prog_type == BPF_PROG_TYPE_TRACING &&
13634 insn->imm == BPF_FUNC_get_func_arg) {
13635 /* Load nr_args from ctx - 8 */
13636 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
13637 insn_buf[1] = BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 6);
13638 insn_buf[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 3);
13639 insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
13640 insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
13641 insn_buf[5] = BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, 0);
13642 insn_buf[6] = BPF_MOV64_IMM(BPF_REG_0, 0);
13643 insn_buf[7] = BPF_JMP_A(1);
13644 insn_buf[8] = BPF_MOV64_IMM(BPF_REG_0, -EINVAL);
13645 cnt = 9;
13646
13647 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13648 if (!new_prog)
13649 return -ENOMEM;
13650
13651 delta += cnt - 1;
13652 env->prog = prog = new_prog;
13653 insn = new_prog->insnsi + i + delta;
13654 continue;
13655 }
13656
13657 /* Implement bpf_get_func_ret inline. */
13658 if (prog_type == BPF_PROG_TYPE_TRACING &&
13659 insn->imm == BPF_FUNC_get_func_ret) {
13660 if (eatype == BPF_TRACE_FEXIT ||
13661 eatype == BPF_MODIFY_RETURN) {
13662 /* Load nr_args from ctx - 8 */
13663 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
13664 insn_buf[1] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
13665 insn_buf[2] = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
13666 insn_buf[3] = BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, 0);
13667 insn_buf[4] = BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_3, 0);
13668 insn_buf[5] = BPF_MOV64_IMM(BPF_REG_0, 0);
13669 cnt = 6;
13670 } else {
13671 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_0, -EOPNOTSUPP);
13672 cnt = 1;
13673 }
13674
13675 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
13676 if (!new_prog)
13677 return -ENOMEM;
13678
13679 delta += cnt - 1;
13680 env->prog = prog = new_prog;
13681 insn = new_prog->insnsi + i + delta;
13682 continue;
13683 }
13684
13685 /* Implement get_func_arg_cnt inline. */
13686 if (prog_type == BPF_PROG_TYPE_TRACING &&
13687 insn->imm == BPF_FUNC_get_func_arg_cnt) {
13688 /* Load nr_args from ctx - 8 */
13689 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
13690
13691 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
13692 if (!new_prog)
13693 return -ENOMEM;
13694
13695 env->prog = prog = new_prog;
13696 insn = new_prog->insnsi + i + delta;
13697 continue;
13698 }
13699
13700 /* Implement bpf_get_func_ip inline. */
13701 if (prog_type == BPF_PROG_TYPE_TRACING &&
13702 insn->imm == BPF_FUNC_get_func_ip) {
13703 /* Load IP address from ctx - 16 */
13704 insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -16);
13705
13706 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
13707 if (!new_prog)
13708 return -ENOMEM;
13709
13710 env->prog = prog = new_prog;
13711 insn = new_prog->insnsi + i + delta;
13712 continue;
13713 }
13714
13715 patch_call_imm:
13716 fn = env->ops->get_func_proto(insn->imm, env->prog);
13717 /* all functions that have prototype and verifier allowed
13718 * programs to call them, must be real in-kernel functions
13719 */
13720 if (!fn->func) {
13721 verbose(env,
13722 "kernel subsystem misconfigured func %s#%d\n",
13723 func_id_name(insn->imm), insn->imm);
13724 return -EFAULT;
13725 }
13726 insn->imm = fn->func - __bpf_call_base;
13727 }
13728
13729 /* Since poke tab is now finalized, publish aux to tracker. */
13730 for (i = 0; i < prog->aux->size_poke_tab; i++) {
13731 map_ptr = prog->aux->poke_tab[i].tail_call.map;
13732 if (!map_ptr->ops->map_poke_track ||
13733 !map_ptr->ops->map_poke_untrack ||
13734 !map_ptr->ops->map_poke_run) {
13735 verbose(env, "bpf verifier is misconfigured\n");
13736 return -EINVAL;
13737 }
13738
13739 ret = map_ptr->ops->map_poke_track(map_ptr, prog->aux);
13740 if (ret < 0) {
13741 verbose(env, "tracking tail call prog failed\n");
13742 return ret;
13743 }
13744 }
13745
13746 sort_kfunc_descs_by_imm(env->prog);
13747
13748 return 0;
13749 }
13750
---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* Re: [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage
2022-03-17 2:23 ` Martin KaFai Lau
@ 2022-03-17 5:26 ` Joanne Koong
2022-03-17 18:04 ` Martin KaFai Lau
0 siblings, 1 reply; 6+ messages in thread
From: Joanne Koong @ 2022-03-17 5:26 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Joanne Koong, bpf, KP Singh, Kumar Kartikeya Dwivedi,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Tejun Heo,
Dave Marchevsky
On Wed, Mar 16, 2022 at 7:23 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote:
> > From: Joanne Koong <joannelkoong@gmail.com>
> >
> > Currently, local storage memory can only be allocated atomically
> > (GFP_ATOMIC). This restriction is too strict for sleepable bpf
> > programs.
> >
> > In this patch, the verifier detects whether the program is sleepable,
> > and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
> > 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
> > down to the local storage functions that allocate memory.
> >
> > Please note that bpf_task/sk/inode_storage_update_elem functions are
> > invoked by userspace applications through syscalls. Preemption is
> > disabled before bpf_task/sk/inode_storage_update_elem is called, which
> > means they will always have to allocate memory atomically.
> >
> > The existing local storage selftests cover both the GFP_ATOMIC and the
> > GFP_KERNEL cases in bpf_local_storage_update.
> >
> > v2 <- v1:
> > * Allocate the memory before/after the raw_spin_lock_irqsave, depending
> > on the gfp flags
> > * Rename mem_flags to gfp_flags
> > * Reword the comment "*mem_flags* is set by the bpf verifier" to
> > "*gfp_flags* is a hidden argument provided by the verifier"
> > * Add a sentence to the commit message about existing local storage
> > selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in
> > bpf_local_storage_update.
>
> [ ... ]
>
> > struct bpf_local_storage_data *
> > bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > - void *value, u64 map_flags)
> > + void *value, u64 map_flags, gfp_t gfp_flags)
> > {
> > struct bpf_local_storage_data *old_sdata = NULL;
> > - struct bpf_local_storage_elem *selem;
> > + struct bpf_local_storage_elem *selem = NULL;
> > struct bpf_local_storage *local_storage;
> > unsigned long flags;
> > int err;
> > @@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > !map_value_has_spin_lock(&smap->map)))
> > return ERR_PTR(-EINVAL);
> >
> > + if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
> > + return ERR_PTR(-EINVAL);
> > +
> > local_storage = rcu_dereference_check(*owner_storage(smap, owner),
> > bpf_rcu_lock_held());
> > if (!local_storage || hlist_empty(&local_storage->list)) {
> > @@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > if (err)
> > return ERR_PTR(err);
> >
> > - selem = bpf_selem_alloc(smap, owner, value, true);
> > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> > if (!selem)
> > return ERR_PTR(-ENOMEM);
> >
> > - err = bpf_local_storage_alloc(owner, smap, selem);
> > + err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
> > if (err) {
> > kfree(selem);
> > mem_uncharge(smap, owner, smap->elem_size);
> > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > }
> > }
> >
> > + if (gfp_flags == GFP_KERNEL) {
> > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> I think this new path is not executed by the existing
> "progs/local_storage.c" test which has sleepable lsm prog. At least a second
> BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed?
> or there is other selftest covering this new path that I missed?
Thanks for your feedback. I think I'm misunderstanding how the
progs/local_storage.c test and/or local storage works then. Would you
mind explaining why a second map is needed?
This is my (faulty) understanding of what is happening:
1) In "local_storage.c" in "SEC("lsm.s/inode_rename")" there is a call
to bpf_inode_storage_get with the BPF_LOCAL_STORAGE_GET_F_CREATE flag
set, which will call into bpf_local_storage_update (which will create
the local storage + the selem, and put it in the RCU for that
inode_storage_map)
2) Then, further down in the "local_storage.c" file in
"SEC("lsm.s/bprm_committed_creds")", there is another call to
bpf_inode_storage_get on the same inode_storage_map but on a different
inode, also with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set. This
will also call into bpf_local_storage_update.
3) In bpf_local_storage_update from the call in #2, it sees that there
is a local storage associated with this map in the RCU, it tries to
look for the inode but doesn't find it, so it needs to allocate with
GFP_KERNEL a new selem and then update with the new selem.
I just tried out some debug statements locally to test, and it looks
like my analysis of #3 is wrong. I will debug this some more tomorrow.
>
> Others lgtm.
>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
>
> > + if (!selem)
> > + return ERR_PTR(-ENOMEM);
> > + }
> > +
> > raw_spin_lock_irqsave(&local_storage->lock, flags);
> >
> > /* Recheck local_storage->list under local_storage->lock */
* Re: [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage
2022-03-17 5:26 ` Joanne Koong
@ 2022-03-17 18:04 ` Martin KaFai Lau
2022-03-17 22:08 ` Joanne Koong
0 siblings, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2022-03-17 18:04 UTC (permalink / raw)
To: Joanne Koong
Cc: Joanne Koong, bpf, KP Singh, Kumar Kartikeya Dwivedi,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Tejun Heo,
Dave Marchevsky
On Wed, Mar 16, 2022 at 10:26:57PM -0700, Joanne Koong wrote:
> On Wed, Mar 16, 2022 at 7:23 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote:
> > > From: Joanne Koong <joannelkoong@gmail.com>
> > >
> > > Currently, local storage memory can only be allocated atomically
> > > (GFP_ATOMIC). This restriction is too strict for sleepable bpf
> > > programs.
> > >
> > > In this patch, the verifier detects whether the program is sleepable,
> > > and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
> > > 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
> > > down to the local storage functions that allocate memory.
> > >
> > > Please note that bpf_task/sk/inode_storage_update_elem functions are
> > > invoked by userspace applications through syscalls. Preemption is
> > > disabled before bpf_task/sk/inode_storage_update_elem is called, which
> > > means they will always have to allocate memory atomically.
> > >
> > > The existing local storage selftests cover both the GFP_ATOMIC and the
> > > GFP_KERNEL cases in bpf_local_storage_update.
> > >
> > > v2 <- v1:
> > > * Allocate the memory before/after the raw_spin_lock_irqsave, depending
> > > on the gfp flags
> > > * Rename mem_flags to gfp_flags
> > > * Reword the comment "*mem_flags* is set by the bpf verifier" to
> > > "*gfp_flags* is a hidden argument provided by the verifier"
> > > * Add a sentence to the commit message about existing local storage
> > > selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in
> > > bpf_local_storage_update.
> >
> > [ ... ]
> >
> > > struct bpf_local_storage_data *
> > > bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > - void *value, u64 map_flags)
> > > + void *value, u64 map_flags, gfp_t gfp_flags)
> > > {
> > > struct bpf_local_storage_data *old_sdata = NULL;
> > > - struct bpf_local_storage_elem *selem;
> > > + struct bpf_local_storage_elem *selem = NULL;
> > > struct bpf_local_storage *local_storage;
> > > unsigned long flags;
> > > int err;
> > > @@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > !map_value_has_spin_lock(&smap->map)))
> > > return ERR_PTR(-EINVAL);
> > >
> > > + if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
> > > + return ERR_PTR(-EINVAL);
> > > +
> > > local_storage = rcu_dereference_check(*owner_storage(smap, owner),
> > > bpf_rcu_lock_held());
> > > if (!local_storage || hlist_empty(&local_storage->list)) {
> > > @@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > if (err)
> > > return ERR_PTR(err);
> > >
> > > - selem = bpf_selem_alloc(smap, owner, value, true);
> > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> > > if (!selem)
> > > return ERR_PTR(-ENOMEM);
> > >
> > > - err = bpf_local_storage_alloc(owner, smap, selem);
> > > + err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
> > > if (err) {
> > > kfree(selem);
> > > mem_uncharge(smap, owner, smap->elem_size);
> > > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > }
> > > }
> > >
> > > + if (gfp_flags == GFP_KERNEL) {
> > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> > I think this new path is not executed by the existing
> > "progs/local_storage.c" test which has sleepable lsm prog. At least a second
> > BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed?
> > or there is other selftest covering this new path that I missed?
> Thanks for your feedback. I think I'm misunderstanding how the
> progs/local_storage.c test and/or local storage works then. Would you
> mind explaining why a second map is needed?
>
> This is my (faulty) understanding of what is happening:
> 1) In "local_storage.c" in "SEC("lsm.s/inode_rename")" there is a call
> to bpf_inode_storage_get with the BPF_LOCAL_STORAGE_GET_F_CREATE flag
> set, which will call into bpf_local_storage_update (which will create
> the local storage + the selem, and put it in the RCU for that
> inode_storage_map)
From reading the comment above the bpf_inode_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE):
"new_dentry->d_inode can be NULL", so it is expected to fail.
Meaning no storage is created.
>
> 2) Then, further down in the "local_storage.c" file in
> "SEC("lsm.s/bprm_committed_creds")", there is another call to
> bpf_inode_storage_get on the same inode_storage_map but on a different
> inode, also with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set. This
> will also call into bpf_local_storage_update.
I believe this is the inode and the storage that the second
bpf_inode_storage_get(..., 0) in the "inode_rename" bpf-prog is supposed
to get. Otherwise, I don't see how the test can pass.
>
> 3) In bpf_local_storage_update from the call in #2, it sees that there
> is a local storage associated with this map in the RCU, it tries to
> look for the inode but doesn't find it, so it needs to allocate with
> GFP_KERNEL a new selem and then update with the new selem.
Correct, that will be the very first storage created for this inode
and it will go through the "if (!local_storage || hlist_empty(&local_storage->list))"
allocation code path in bpf_local_storage_update() which is
an existing code path.
I was talking specifically about the "if (gfp_flags == GFP_KERNEL)"
allocation code path. Thus, it needs a second inode local storage (i.e.
a second inode map) for the same inode. A second inode storage map
and another "bpf_inode_storage_get(&second_inode_storage_map, ...
BPF_LOCAL_STORAGE_GET_F_CREATE)" should be enough.
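Roughly something like this in progs/local_storage.c should be enough to
hit it (untested sketch; the value type is just for illustration):

	struct {
		__uint(type, BPF_MAP_TYPE_INODE_STORAGE);
		__uint(map_flags, BPF_F_NO_PREALLOC);
		__type(key, int);
		__type(value, __u32);
	} second_inode_storage_map SEC(".maps");

	/* In the existing lsm.s/bprm_committed_creds prog, after the
	 * bpf_inode_storage_get() on the first map has created the inode's
	 * local_storage, a get(F_CREATE) on a second map for the same inode
	 * takes the new "gfp_flags == GFP_KERNEL" allocation path:
	 */
	bpf_inode_storage_get(&second_inode_storage_map, bprm->file->f_inode,
			      0, BPF_LOCAL_STORAGE_GET_F_CREATE);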
It seems it needs a re-spin because of the sparse warning.
I don't see an issue from the code, just thinking it will
be useful to have a test to exercise this path. It
could be a follow up as an individual patch if not in v3.
* Re: [PATCH bpf-next v2] bpf: Enable non-atomic allocations in local storage
2022-03-17 18:04 ` Martin KaFai Lau
@ 2022-03-17 22:08 ` Joanne Koong
0 siblings, 0 replies; 6+ messages in thread
From: Joanne Koong @ 2022-03-17 22:08 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Joanne Koong, bpf, KP Singh, Kumar Kartikeya Dwivedi,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Tejun Heo,
Dave Marchevsky
On Thu, Mar 17, 2022 at 11:04 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Mar 16, 2022 at 10:26:57PM -0700, Joanne Koong wrote:
> > On Wed, Mar 16, 2022 at 7:23 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Wed, Mar 16, 2022 at 12:54:00PM -0700, Joanne Koong wrote:
> > > > From: Joanne Koong <joannelkoong@gmail.com>
> > > >
> > > > Currently, local storage memory can only be allocated atomically
> > > > (GFP_ATOMIC). This restriction is too strict for sleepable bpf
> > > > programs.
> > > >
> > > > In this patch, the verifier detects whether the program is sleepable,
> > > > and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
> > > > 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
> > > > down to the local storage functions that allocate memory.
> > > >
> > > > Please note that bpf_task/sk/inode_storage_update_elem functions are
> > > > invoked by userspace applications through syscalls. Preemption is
> > > > disabled before bpf_task/sk/inode_storage_update_elem is called, which
> > > > means they will always have to allocate memory atomically.
> > > >
> > > > The existing local storage selftests cover both the GFP_ATOMIC and the
> > > > GFP_KERNEL cases in bpf_local_storage_update.
> > > >
> > > > v2 <- v1:
> > > > * Allocate the memory before/after the raw_spin_lock_irqsave, depending
> > > > on the gfp flags
> > > > * Rename mem_flags to gfp_flags
> > > > * Reword the comment "*mem_flags* is set by the bpf verifier" to
> > > > "*gfp_flags* is a hidden argument provided by the verifier"
> > > > * Add a sentence to the commit message about existing local storage
> > > > selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in
> > > > bpf_local_storage_update.
> > >
> > > [ ... ]
> > >
> > > > struct bpf_local_storage_data *
> > > > bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > - void *value, u64 map_flags)
> > > > + void *value, u64 map_flags, gfp_t gfp_flags)
> > > > {
> > > > struct bpf_local_storage_data *old_sdata = NULL;
> > > > - struct bpf_local_storage_elem *selem;
> > > > + struct bpf_local_storage_elem *selem = NULL;
> > > > struct bpf_local_storage *local_storage;
> > > > unsigned long flags;
> > > > int err;
> > > > @@ -365,6 +366,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > !map_value_has_spin_lock(&smap->map)))
> > > > return ERR_PTR(-EINVAL);
> > > >
> > > > + if (gfp_flags == GFP_KERNEL && (map_flags & ~BPF_F_LOCK) != BPF_NOEXIST)
> > > > + return ERR_PTR(-EINVAL);
> > > > +
> > > > local_storage = rcu_dereference_check(*owner_storage(smap, owner),
> > > > bpf_rcu_lock_held());
> > > > if (!local_storage || hlist_empty(&local_storage->list)) {
> > > > @@ -373,11 +377,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > if (err)
> > > > return ERR_PTR(err);
> > > >
> > > > - selem = bpf_selem_alloc(smap, owner, value, true);
> > > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> > > > if (!selem)
> > > > return ERR_PTR(-ENOMEM);
> > > >
> > > > - err = bpf_local_storage_alloc(owner, smap, selem);
> > > > + err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
> > > > if (err) {
> > > > kfree(selem);
> > > > mem_uncharge(smap, owner, smap->elem_size);
> > > > @@ -404,6 +408,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > > > }
> > > > }
> > > >
> > > > + if (gfp_flags == GFP_KERNEL) {
> > > > + selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
> > > I think this new path is not executed by the existing
> > > "progs/local_storage.c" test which has sleepable lsm prog. At least a second
> > > BPF_MAP_TYPE_TASK_STORAGE map (or SK_STORAGE) is needed?
> > > or there is other selftest covering this new path that I missed?
> > Thanks for your feedback. I think I'm misunderstanding how the
> > progs/local_storage.c test and/or local storage works then. Would you
> > mind explaining why a second map is needed?
> >
> > This is my (faulty) understanding of what is happening:
> > 1) In "local_storage.c" in "SEC("lsm.s/inode_rename")" there is a call
> > to bpf_inode_storage_get with the BPF_LOCAL_STORAGE_GET_F_CREATE flag
> > set, which will call into bpf_local_storage_update (which will create
> > the local storage + the selem, and put it in the RCU for that
> > inode_storage_map)
> From reading the comment above the bpf_inode_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE):
> "new_dentry->d_inode can be NULL", so it is expected to fail.
> Meaning no storage is created.
>
> >
> > 2) Then, further down in the "local_storage.c" file in
> > "SEC("lsm.s/bprm_committed_creds")", there is another call to
> > bpf_inode_storage_get on the same inode_storage_map but on a different
> > inode, also with the BPF_LOCAL_STORAGE_GET_F_CREATE flag set. This
> > will also call into bpf_local_storage_update.
> I believe this is the inode and the storage that the second
> bpf_inode_storage_get(..., 0) in the "inode_rename" bpf-prog is supposed
> to get. Otherwise, I don't see how the test can pass.
>
> >
> > 3) In bpf_local_storage_update from the call in #2, it sees that there
> > is a local storage associated with this map in the RCU, it tries to
> > look for the inode but doesn't find it, so it needs to allocate with
> > GFP_KERNEL a new selem and then update with the new selem.
> Correct, that will be the very first storage created for this inode
> and it will go through the "if (!local_storage || hlist_empty(&local_storage->list))"
> allocation code path in bpf_local_storage_update() which is
> an existing code path.
>
Ah, I see. I mistakenly thought inodes shared local storages if you
passed in the same map.
Thanks for the clarification!
> I was talking specifically about the "if (gfp_flags == GFP_KERNEL)"
> allocation code path. Thus, it needs a second inode local storage (i.e.
> a second inode map) for the same inode. A second inode storage map
> and another "bpf_inode_storage_get(&second_inode_storage_map, ...
> BPF_LOCAL_STORAGE_GET_F_CREATE)" should be enough.
>
> It seems it needs a re-spin because of the sparse warning.
> I don't see an issue from the code, just thinking it will
> be useful to have a test to exercise this path. It
> could be a follow up as an individual patch if not in v3.
I will submit a v3 that fixes the sparse warning and adds a case to
exercise this path.
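For the sparse warning, I'm thinking of just adding a __force cast when
building the hidden argument, roughly (untested):

	if (env->prog->aux->sleepable)
		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
	else
		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);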
Thanks for reviewing this, Martin!