All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Jamal Hadi Salim <jhs@mojatatu.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>,
	Cong Wang <cong.wang@bytedance.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Dongdong Wang <wangdongdong.6@bytedance.com>
Subject: Re: [Patch bpf-next v5 1/3] bpf: introduce timeout hash map
Date: Thu, 28 Jan 2021 18:54:35 -0800	[thread overview]
Message-ID: <20210129025435.a34ydsgmwzrnwjlg@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CAM_iQpXAQ7AMz34=o5E=81RFGFsQB5jCDTCCaVdHokU6kaJQsQ@mail.gmail.com>

On Wed, Jan 27, 2021 at 10:28:15PM -0800, Cong Wang wrote:
> On Wed, Jan 27, 2021 at 10:00 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Jan 26, 2021 at 11:00 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > > >               ret = PTR_ERR(l_new);
> > > > > +             if (ret == -EAGAIN) {
> > > > > +                     htab_unlock_bucket(htab, b, hash, flags);
> > > > > +                     htab_gc_elem(htab, l_old);
> > > > > +                     mod_delayed_work(system_unbound_wq, &htab->gc_work, 0);
> > > > > +                     goto again;
> > > >
> > > > Also this one looks rather worrying, so the BPF prog is stalled here, loop-waiting
> > > > in (e.g. XDP) hot path for system_unbound_wq to kick in to make forward progress?
> > >
> > > In this case, the old one is scheduled for removal in GC, we just wait for GC
> > > to finally remove it. It won't stall unless GC itself or the worker scheduler is
> > > wrong, both of which should be kernel bugs.
> > >
> > > If we don't do this, users would get a -E2BIG when it is not too big. I don't
> > > know a better way to handle this sad situation, maybe returning -EBUSY
> > > to users and let them call again?
> >
> > I think using wq for timers is a non-starter.
> > Tying a hash/lru map with a timer is not a good idea either.
> 
> Both xt_hashlimit and nf_conntrack_core use delayed/deferrable
> works, probably since their beginnings. They seem to have started
> well. ;)

That code was written when network speed was in Mbits and DDoS abbreviation
wasn't invented. Things are different now.

> > I'm proposing a timer map where each object will go through
> > bpf_timer_setup(timer, callback, flags);
> > where "callback" is a bpf subprogram.
> > Corresponding bpf_del_timer and bpf_mod_timer would work the same way
> > they are in the kernel.
> > The tricky part is kernel style of using from_timer() to access the
> > object with additional info.
> > I think bpf timer map can model it the same way.
> > At map creation time the value_size will specify the amount of extra
> > bytes necessary.
> > Another alternative is to pass an extra data argument to a callback.
> 
> Hmm, this idea is very interesting. I still think arming a timer,
> whether a kernel timer or a bpf timer, for each entry is overkill,
> but we can arm one for each map, something like:
> 
> bpf_timer_run(interval, bpf_prog, &any_map);
> 
> so we run 'bpf_prog' on any map every 'interval', but the 'bpf_prog'
> would have to iterate the whole map during each interval to delete
> the expired ones. This is probably doable: the timestamps can be
> stored either as a part of key or value, and bpf_jiffies64() is already
> available, users would have to discard expired ones after lookup
> when they are faster than the timer GC.

I meant it would look like:

noinline per_elem_callback(map, key, value, ...)
{
  if (value->foo > ...)
    bpf_delete_map_elem(map, key);
}

noinline timer_callback(timer, ctx)
{
  map = ctx->map;
  bpf_for_each_map_elem(map, per_elem_callback, ...);
}

int main_bpf_prog(skb)
{
  bpf_timer_setup(my_timer, timer_callback, ...);
  bpf_mod_timer(my_timer, HZ);
}

The bpf_for_each_map_elem() work is already in progress. Expect patches to hit
mailing list soon.
If you can work on patches for bpf_timer_*() it would be awesome.

  reply	other threads:[~2021-01-29  2:55 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-22 20:54 [Patch bpf-next v5 0/3] bpf: introduce timeout hash map Cong Wang
2021-01-22 20:54 ` [Patch bpf-next v5 1/3] " Cong Wang
2021-01-26 22:04   ` Daniel Borkmann
2021-01-27  6:59     ` Cong Wang
2021-01-27 18:00       ` Alexei Starovoitov
2021-01-27 22:48         ` Daniel Borkmann
2021-01-28  2:45           ` Alexei Starovoitov
2021-01-28  6:28         ` Cong Wang
2021-01-29  2:54           ` Alexei Starovoitov [this message]
2021-01-29  5:57             ` Cong Wang
2021-01-29  6:21               ` Yonghong Song
     [not found]             ` <f7bc5873-7722-e359-b450-4db7dc3656d6@mojatatu.com>
     [not found]               ` <dc5ddf32-2d65-15a9-9448-5f2d3a10d227@mojatatu.com>
2021-01-30  3:14                 ` Alexei Starovoitov
2021-01-31 20:35                   ` Jamal Hadi Salim
2021-01-22 20:54 ` [Patch bpf-next v5 2/3] selftests/bpf: add test cases for bpf timeout map Cong Wang
2021-01-22 20:54 ` [Patch bpf-next v5 3/3] selftests/bpf: add timeout map check in map_ptr tests Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210129025435.a34ydsgmwzrnwjlg@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=jhs@mojatatu.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=wangdongdong.6@bytedance.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.