bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>, bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH bpf 2/2] bpf, sockhash: Synchronize delete from bucket list on map free
Date: Tue, 9 Jun 2020 11:04:54 -0700	[thread overview]
Message-ID: <CAADnVQLe-vks4TpJzcWc+HPa1jzO2z6yZg-M8iD2JNAmfm9dXQ@mail.gmail.com> (raw)
In-Reply-To: <5edfc23e1bb3e_5cca2af6a27f45b8df@john-XPS-13-9370.notmuch>

On Tue, Jun 9, 2020 at 10:51 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Jakub Sitnicki wrote:
> > We can end up modifying the sockhash bucket list from two CPUs when a
> > sockhash is being destroyed (sock_hash_free) on one CPU, while a socket
> > that is in the sockhash is unlinking itself from it on another CPU
> > it (sock_hash_delete_from_link).
> >
> > This results in accessing a list element that is in an undefined state as
> > reported by KASAN:
> >
> > | ==================================================================
> > | BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280
> > | Write of size 8 at addr dead000000000122 by task kworker/2:1/95
> > |
> > | CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691
> > | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
> > | Workqueue: events bpf_map_free_deferred
> > | Call Trace:
> > |  dump_stack+0x97/0xe0
> > |  ? sock_hash_free+0x13c/0x280
> > |  __kasan_report.cold+0x5/0x40
> > |  ? mark_lock+0xbc1/0xc00
> > |  ? sock_hash_free+0x13c/0x280
> > |  kasan_report+0x38/0x50
> > |  ? sock_hash_free+0x152/0x280
> > |  sock_hash_free+0x13c/0x280
> > |  bpf_map_free_deferred+0xb2/0xd0
> > |  ? bpf_map_charge_finish+0x50/0x50
> > |  ? rcu_read_lock_sched_held+0x81/0xb0
> > |  ? rcu_read_lock_bh_held+0x90/0x90
> > |  process_one_work+0x59a/0xac0
> > |  ? lock_release+0x3b0/0x3b0
> > |  ? pwq_dec_nr_in_flight+0x110/0x110
> > |  ? rwlock_bug.part.0+0x60/0x60
> > |  worker_thread+0x7a/0x680
> > |  ? _raw_spin_unlock_irqrestore+0x4c/0x60
> > |  kthread+0x1cc/0x220
> > |  ? process_one_work+0xac0/0xac0
> > |  ? kthread_create_on_node+0xa0/0xa0
> > |  ret_from_fork+0x24/0x30
> > | ==================================================================
> >
> > Fix it by reintroducing spin-lock protected critical section around the
> > code that removes the elements from the bucket on sockhash free.
> >
> > To do that we also need to defer processing of removed elements, until out
> > of atomic context so that we can unlink the socket from the map when
> > holding the sock lock.
> >
> > Fixes: 90db6d772f74 ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free")
> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---
> >  net/core/sock_map.c | 23 +++++++++++++++++++++--
> >  1 file changed, 21 insertions(+), 2 deletions(-)
>
> Thanks.
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Applied both to bpf tree.

FYI I see this splat:
 ./test_sockmap
# 1/ 6  sockmap::txmsg test passthrough:OK
# 2/ 6  sockmap::txmsg test redirect:OK
# 3/ 6  sockmap::txmsg test drop:OK
# 4/ 6  sockmap::txmsg test ingress redirect:OK
[   19.180397]
[   19.180633] =============================
[   19.181042] WARNING: suspicious RCU usage
[   19.181517] 5.7.0-07177-g75e68e5bf2c7 #688 Not tainted
[   19.182048] -----------------------------
[   19.182570] include/linux/skmsg.h:284 suspicious
rcu_dereference_check() usage!
[   19.183341]
[   19.183341] other info that might help us debug this:
[   19.183341]
[   19.184215]
[   19.184215] rcu_scheduler_active = 2, debug_locks = 1
[   19.184875] 1 lock held by test_sockmap/291:
[   19.185356]  #0: ffff8881315b5b20 (sk_lock-AF_INET){+.+.}-{0:0},
at: tls_sw_recvmsg+0x128/0x810
[   19.186315]
[   19.186315] stack backtrace:
[   19.186774] CPU: 0 PID: 291 Comm: test_sockmap Not tainted
5.7.0-07177-g75e68e5bf2c7 #688
[   19.188497] Call Trace:
[   19.188757]  dump_stack+0x71/0xa0
[   19.189140]  sk_psock_skb_redirect.isra.0+0xa6/0xf0
[   19.189651]  sk_psock_tls_strp_read+0x1cc/0x250
[   19.190142]  tls_sw_recvmsg+0x6e4/0x810
[   19.190584]  inet_recvmsg+0x55/0x1d0
[   19.190963]  ____sys_recvmsg+0x73/0x130
[   19.191365]  ? import_iovec+0x27/0xd0
[   19.191800]  ? copy_msghdr_from_user+0x4c/0x70
[   19.192271]  ___sys_recvmsg+0x68/0x90
[   19.192682]  ? __might_fault+0x36/0x80
[   19.193078]  ? __audit_syscall_exit+0x242/0x2b0
[   19.193576]  __sys_recvmsg+0x46/0x80
[   19.193964]  do_syscall_64+0x4a/0x1b0
[   19.194355]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[   19.194924] RIP: 0033:0x7fc915cc4490

  reply	other threads:[~2020-06-09 18:05 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-07 20:52 [PATCH bpf 0/2] Fixes for sock_hash_free Jakub Sitnicki
2020-06-07 20:52 ` [PATCH bpf 1/2] bpf, sockhash: Fix memory leak when unlinking sockets in sock_hash_free Jakub Sitnicki
2020-06-09 16:56   ` John Fastabend
2020-06-07 20:52 ` [PATCH bpf 2/2] bpf, sockhash: Synchronize delete from bucket list on map free Jakub Sitnicki
2020-06-09 17:09   ` John Fastabend
2020-06-09 18:04     ` Alexei Starovoitov [this message]
2020-06-09 20:29       ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQLe-vks4TpJzcWc+HPa1jzO2z6yZg-M8iD2JNAmfm9dXQ@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=eric.dumazet@gmail.com \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).