From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Andrii Nakryiko <andriin@fb.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, ast@fb.com,
	daniel@iogearbox.net, andrii.nakryiko@gmail.com,
	kernel-team@fb.com, "Paul E . McKenney" <paulmck@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>
Subject: Re: [PATCH bpf-next 1/6] bpf: implement BPF ring buffer and verifier support for it
Date: Thu, 14 May 2020 12:06:49 -0700
Message-ID: <20200514190649.ca4qugueh5sp32ax@ast-mbp.dhcp.thefacebook.com>
In-Reply-To: <20200513192532.4058934-2-andriin@fb.com>

On Wed, May 13, 2020 at 12:25:27PM -0700, Andrii Nakryiko wrote:
> +
> +/* Given pointer to ring buffer record metadata, restore pointer to struct
> + * bpf_ringbuf itself by using page offset stored at offset 4
> + */
> +static struct bpf_ringbuf *bpf_ringbuf_restore_from_rec(void *meta_ptr)
> +{
> +	unsigned long addr = (unsigned long)meta_ptr;
> +	unsigned long off = *(u32 *)(meta_ptr + 4) << PAGE_SHIFT;

Judging by the code further down, this load should be READ_ONCE, but...

> +
> +	return (void*)((addr & PAGE_MASK) - off);
> +}
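
For reference, the page-offset trick can be tried out in user space. The
following is a minimal sketch, not the patch's code: it assumes 4 KiB pages,
a hypothetical struct demo_rb standing in for struct bpf_ringbuf, and a
layout of one control page followed by two data pages. All demo_* names are
made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumptions for illustration only: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Hypothetical stand-in for struct bpf_ringbuf; must be page-aligned. */
struct demo_rb {
	unsigned long mask;
};

/* Distance, in pages, from the start of the struct to the record's page. */
static uint32_t demo_rec_pg_off(struct demo_rb *rb, void *meta_ptr)
{
	return (uint32_t)(((uintptr_t)meta_ptr - (uintptr_t)rb) >> DEMO_PAGE_SHIFT);
}

/* Inverse: round the record address down to its page, then step back
 * by the page offset stored at byte offset 4 of the record header.
 */
static struct demo_rb *demo_restore_from_rec(void *meta_ptr)
{
	uintptr_t addr = (uintptr_t)meta_ptr;
	uint32_t pg_off;

	memcpy(&pg_off, (char *)meta_ptr + 4, sizeof(pg_off));
	return (struct demo_rb *)((addr & DEMO_PAGE_MASK) -
				  ((uintptr_t)pg_off << DEMO_PAGE_SHIFT));
}

/* Place a record header at data offset `off` (a multiple of 8, within the
 * two data pages) and check that the struct pointer is recovered.
 * Returns 1 on success, 0 on mismatch, -1 if allocation failed.
 */
static int demo_round_trip(size_t off)
{
	char *base = aligned_alloc(DEMO_PAGE_SIZE, 3 * DEMO_PAGE_SIZE);
	struct demo_rb *rb = (struct demo_rb *)base;
	void *meta_ptr;
	uint32_t pg_off;
	int ok;

	if (!base)
		return -1;
	meta_ptr = base + DEMO_PAGE_SIZE + off;
	pg_off = demo_rec_pg_off(rb, meta_ptr);
	memcpy((char *)meta_ptr + 4, &pg_off, sizeof(pg_off));
	ok = demo_restore_from_rec(meta_ptr) == rb;
	free(base);
	return ok;
}
```

The round trip only works because the struct itself starts on a page
boundary; the sketch relies on aligned_alloc() for that.
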
> +
> +static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
> +{
> +	unsigned long cons_pos, prod_pos, new_prod_pos, flags;
> +	u32 len, pg_off;
> +	void *meta_ptr;
> +
> +	if (unlikely(size > UINT_MAX))
> +		return NULL;
> +
> +	len = round_up(size + RINGBUF_META_SZ, 8);

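This may overflow despite the check above: size can be as large as UINT_MAX,
so size + RINGBUF_META_SZ rounded up to 8 can exceed UINT_MAX and is then
truncated when stored in the u32 len.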
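
The truncation is easy to reproduce outside the kernel. A minimal sketch,
assuming RINGBUF_META_SZ is 8 (two u32 fields); the demo_* helpers are
hypothetical, and the checked variant is only one possible guard, not
necessarily what the patch should adopt:

```c
#include <stdint.h>

#define DEMO_META_SZ 8u /* assumed value of RINGBUF_META_SZ */

/* Round x up to a multiple of 8, in 64-bit arithmetic. */
static uint64_t demo_round_up8(uint64_t x)
{
	return (x + 7) & ~(uint64_t)7;
}

/* Mirrors `len = round_up(size + RINGBUF_META_SZ, 8)` with a u32 len:
 * the 64-bit result is silently truncated on assignment.
 */
static uint32_t demo_len_truncating(uint64_t size)
{
	return (uint32_t)demo_round_up8(size + DEMO_META_SZ);
}

/* One possible guard: reject any size whose padded length does not fit
 * in 32 bits. Returns 0 and sets *len on success, -1 otherwise.
 */
static int demo_len_checked(uint64_t size, uint32_t *len)
{
	uint64_t padded;

	if (size > UINT32_MAX)
		return -1;
	padded = demo_round_up8(size + DEMO_META_SZ);
	if (padded > UINT32_MAX)
		return -1;
	*len = (uint32_t)padded;
	return 0;
}
```

With size == UINT32_MAX the padded length is 0x100000008, which truncates
to 8, so a huge request would be accounted as an 8-byte record.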

> +	cons_pos = READ_ONCE(rb->consumer_pos);
> +
> +	if (in_nmi()) {
> +		if (!spin_trylock_irqsave(&rb->spinlock, flags))
> +			return NULL;
> +	} else {
> +		spin_lock_irqsave(&rb->spinlock, flags);
> +	}
> +
> +	prod_pos = rb->producer_pos;
> +	new_prod_pos = prod_pos + len;
> +
> +	/* check for out of ringbuf space by ensuring producer position
> +	 * doesn't advance more than (ringbuf_size - 1) ahead
> +	 */
> +	if (new_prod_pos - cons_pos > rb->mask) {
> +		spin_unlock_irqrestore(&rb->spinlock, flags);
> +		return NULL;
> +	}
> +
> +	meta_ptr = rb->data + (prod_pos & rb->mask);
> +	pg_off = bpf_ringbuf_rec_pg_off(rb, meta_ptr);
> +
> +	WRITE_ONCE(*(u32 *)meta_ptr, RINGBUF_BUSY_BIT | size);
> +	WRITE_ONCE(*(u32 *)(meta_ptr + 4), pg_off);

This doesn't match the few other places where a plain read is done.
But why WRITE_ONCE here?
What does it race with?
producer_pos is only updated later.

> +
> +	/* ensure length prefix is written before updating producer positions */
> +	smp_wmb();

This barrier is enough to make sure the initialization of meta_ptr and
meta_ptr + 4 is visible before producer_pos is updated below.
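
The ordering argument can be sketched in user space with C11 atomics. This
is only an analogy under stated assumptions: a release store stands in for
smp_wmb() followed by WRITE_ONCE(), an acquire load stands in for the
consumer side, and struct demo_ring, the demo_* functions and the busy-bit
position are hypothetical, not taken from the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_BUSY_BIT (1u << 31) /* assumed bit layout, for illustration */

struct demo_ring {
	_Atomic unsigned long producer_pos;
	uint32_t hdr[2]; /* one record header: length word, page offset */
};

/* Producer: plain stores initialize the header, then a release store
 * publishes the new position. Nothing can observe the header through
 * producer_pos before that store.
 */
static void demo_publish(struct demo_ring *r, uint32_t size, uint32_t pg_off,
			 unsigned long new_pos)
{
	r->hdr[0] = DEMO_BUSY_BIT | size;
	r->hdr[1] = pg_off;
	atomic_store_explicit(&r->producer_pos, new_pos, memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above, so a
 * changed position implies the header writes are visible.
 * Returns 1 and fills *len_word if a new record was published, else 0.
 */
static int demo_peek(struct demo_ring *r, unsigned long seen_pos,
		     uint32_t *len_word)
{
	unsigned long pos = atomic_load_explicit(&r->producer_pos,
						 memory_order_acquire);

	if (pos == seen_pos)
		return 0;
	*len_word = r->hdr[0];
	return 1;
}
```

The sketch leaves out the later commit step, which rewrites the length word
while a consumer may be reading it; that access is a separate question from
the initialization shown here.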

> +	WRITE_ONCE(rb->producer_pos, new_prod_pos);
> +
> +	spin_unlock_irqrestore(&rb->spinlock, flags);
> +
> +	return meta_ptr + RINGBUF_META_SZ;
> +}
> +
> +BPF_CALL_3(bpf_ringbuf_reserve, struct bpf_map *, map, u64, size, u64, flags)
> +{
> +	struct bpf_ringbuf_map *rb_map;
> +
> +	if (unlikely(flags))
> +		return -EINVAL;
> +
> +	rb_map = container_of(map, struct bpf_ringbuf_map, map);
> +	return (unsigned long)__bpf_ringbuf_reserve(rb_map->rb, size);
> +}
> +
> +const struct bpf_func_proto bpf_ringbuf_reserve_proto = {
> +	.func		= bpf_ringbuf_reserve,
> +	.ret_type	= RET_PTR_TO_ALLOC_MEM_OR_NULL,
> +	.arg1_type	= ARG_CONST_MAP_PTR,
> +	.arg2_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
> +	.arg3_type	= ARG_ANYTHING,
> +};
> +
> +static void bpf_ringbuf_commit(void *sample, bool discard)
> +{
> +	unsigned long rec_pos, cons_pos;
> +	u32 new_meta, old_meta;
> +	void *meta_ptr;
> +	struct bpf_ringbuf *rb;
> +
> +	meta_ptr = sample - RINGBUF_META_SZ;
> +	rb = bpf_ringbuf_restore_from_rec(meta_ptr);
> +	old_meta = *(u32 *)meta_ptr;

I think this one will race with user space and should be READ_ONCE.
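
To illustrate, here is a minimal user-space sketch of the commit step in
which the header word is only ever accessed atomically. It assumes C11
atomics as a rough analog of READ_ONCE() and xchg(), and the bit positions
and the demo_commit() name are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define DEMO_BUSY_BIT    (1u << 31)
#define DEMO_DISCARD_BIT (1u << 30)

/* Clear the busy bit (and optionally mark the record discarded) in a
 * header word that another party may be reading concurrently.
 * Returns the value that was written.
 */
static uint32_t demo_commit(_Atomic uint32_t *hdr, bool discard)
{
	/* Relaxed atomic load: a single, untorn read of the shared word. */
	uint32_t old_meta = atomic_load_explicit(hdr, memory_order_relaxed);
	uint32_t new_meta = old_meta ^ DEMO_BUSY_BIT;

	if (discard)
		new_meta |= DEMO_DISCARD_BIT;

	/* Full-barrier exchange publishes the final length word. */
	atomic_exchange(hdr, new_meta);
	return new_meta;
}
```

The XOR only clears the busy bit if it was set on entry, which is the case
for a record that has been reserved and not yet committed.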

> +	new_meta = old_meta ^ RINGBUF_BUSY_BIT;
> +	if (discard)
> +		new_meta |= RINGBUF_DISCARD_BIT;
> +
> +	/* update metadata header with correct final size prefix */
> +	xchg((u32 *)meta_ptr, new_meta);
> +
> +	/* if consumer caught up and is waiting for our record, notify about
> +	 * new data availability
> +	 */
> +	rec_pos = (void *)meta_ptr - (void *)rb->data;
> +	cons_pos = smp_load_acquire(&rb->consumer_pos) & rb->mask;

Hmm. Earlier WRITE_ONCE(rb->producer_pos) is used, but here it's smp_load_acquire.
Please be consistent with pairing.

> +	if (cons_pos == rec_pos)
> +		wake_up_all(&rb->waitq);

Is it legal to do this from a preempt-disabled region?
