bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joe Stringer <joe@cilium.io>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	Xiongchun Duan <duanxiongchun@bytedance.com>,
	Dongdong Wang <wangdongdong.6@bytedance.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Cong Wang <cong.wang@bytedance.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	Pedro Tammela <pctammela@mojatatu.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>
Subject: Re: [RFC Patch bpf-next] bpf: introduce bpf timer
Date: Mon, 10 May 2021 22:05:59 -0700	[thread overview]
Message-ID: <CAOftzPh0cj_XRES8mrNWnyKFZDLpRez09NAofmu1F1JAZf43Cw@mail.gmail.com> (raw)
In-Reply-To: <CAM_iQpWBrxuT=Y3CbhxYpE5a+QSk-O=Vj4euegggXAAKTHRBqw@mail.gmail.com>

Hi Cong,

On Sat, May 8, 2021 at 10:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Apr 27, 2021 at 11:34 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Apr 27, 2021 at 9:36 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > We don't go back to why user-space cleanup is inefficient again,
> > > do we? ;)
> >
> > I remain unconvinced that cilium conntrack _needs_ timer apis.
> > It works fine in production and I don't hear any complaints
> > from cilium users. So 'user space cleanup inefficiencies' is
> > very subjective and cannot be the reason to add timer apis.
>
> I am pretty sure I showed the original report to you when I sent
> timeout hashmap patch, in case you forgot here it is again:
> https://github.com/cilium/cilium/issues/5048
>
> and let me quote the original report here:
>
> "The current implementation (as of v1.2) for managing the contents of
> the datapath connection tracking map leaves something to be desired:
> Once per minute, the userspace cilium-agent makes a series of calls to
> the bpf() syscall to fetch all of the entries in the map to determine
> whether they should be deleted. For each entry in the map, 2-3 calls
> must be made: One to fetch the next key, one to fetch the value, and
> perhaps one to delete the entry. The maximum size of the map is 1
> million entries, and if the current count approaches this size then
> the garbage collection goroutine may spend a significant number of CPU
> cycles iterating and deleting elements from the conntrack map."

I'm also curious to hear more details as I haven't seen any recent
discussion in the common Cilium community channels (GitHub / Slack)
around deficiencies in the conntrack garbage collection since we
addressed the LRU issues upstream and switched back to LRU maps.
There's an update to the report quoted from the same link above:

"In recent releases, we've moved back to LRU for management of the CT
maps so the core problem is not as bad; furthermore we have
implemented a backoff for GC depending on the size and number of
entries in the conntrack table, so that in active environments the
userspace GC is frequent enough to prevent issues but in relatively
passive environments the userspace GC is only rarely run (to minimize
CPU impact)."

By "core problem is not as bad", I would have been referring to the
way that failing to garbage collect hashtables in a timely manner can
lead to rejecting new connections due to lack of available map space.
Switching back to LRU mitigated this concern. With a reduced frequency
of running the garbage collection logic, the CPU impact is lower as
well. I don't think we've explored batched map ops for this use case
yet either, which would already serve to improve the CPU usage
situation without extending the kernel.

The main outstanding issue I'm aware of is that we will often have a
1:1 mapping of entries in the CT map and the NAT map, and ideally we'd
like them to have tied fates but currently we have no mechanism to do
this with LRU. When LRU eviction occurs, the entries can get out of
sync until the next GC. I could imagine timers helping with this if we
were to switch back to hash maps since we could handle this problem in
custom eviction logic, but that would reintroduce the entry management
problem above. So then we'd still need more work to figure out how to
address that with a timers approach. If I were to guess right now, the
right solution for this particular problem is probably associating
programs with map entry lifecycle events (like LRU eviction) rather
than adding timers to trigger the logic we want, but that's a whole
different discussion.

Cheers,
Joe

  parent reply	other threads:[~2021-05-11  5:06 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01  4:26 [RFC Patch bpf-next] bpf: introduce bpf timer Cong Wang
2021-04-01  6:38 ` Song Liu
2021-04-01 17:28   ` Cong Wang
2021-04-01 20:17     ` Song Liu
2021-04-02 17:34       ` Cong Wang
2021-04-02 17:57         ` Song Liu
2021-04-02 19:08           ` Cong Wang
2021-04-02 19:43             ` Song Liu
2021-04-02 20:57               ` Cong Wang
2021-04-02 23:31                 ` Song Liu
2021-04-05 23:49                   ` Cong Wang
2021-04-06  1:07                     ` Song Liu
2021-04-06  1:24                       ` Cong Wang
2021-04-06  6:17                         ` Song Liu
2021-04-06 16:48                           ` Cong Wang
2021-04-06 23:36                             ` Song Liu
2021-04-08 22:45                               ` Cong Wang
2021-04-02 19:28 ` Alexei Starovoitov
2021-04-02 21:24   ` Cong Wang
2021-04-02 23:45     ` Alexei Starovoitov
2021-04-06  0:36       ` Cong Wang
2021-04-12 23:01         ` Alexei Starovoitov
2021-04-15  4:02           ` Cong Wang
2021-04-15  4:25             ` Alexei Starovoitov
2021-04-15 15:51               ` Cong Wang
2021-04-26 23:00               ` Cong Wang
2021-04-26 23:05                 ` Alexei Starovoitov
2021-04-26 23:37                   ` Cong Wang
2021-04-27  2:01                     ` Alexei Starovoitov
2021-04-27 11:52                       ` Jamal Hadi Salim
2021-04-27 16:36                       ` Cong Wang
2021-04-27 18:33                         ` Alexei Starovoitov
2021-05-09  5:37                           ` Cong Wang
2021-05-10 20:55                             ` Jamal Hadi Salim
2021-05-11 21:29                               ` Cong Wang
2021-05-12 22:56                                 ` Jamal Hadi Salim
2021-05-11  5:05                             ` Joe Stringer [this message]
2021-05-11 21:08                               ` Cong Wang
2021-05-12 22:43                               ` Jamal Hadi Salim
2021-05-13 18:45                                 ` Jamal Hadi Salim
2021-05-14  2:53                                   ` Cong Wang
2021-08-11 21:03                                     ` Joe Stringer
2021-05-20 18:55 [RFC PATCH bpf-next] bpf: Introduce bpf_timer Alexei Starovoitov
2021-05-21 14:38 ` Alexei Starovoitov
2021-05-21 21:37 ` Cong Wang
2021-05-23 16:01   ` Alexei Starovoitov
2021-05-24  8:45     ` Lorenz Bauer
2021-05-25  3:16     ` Cong Wang
2021-05-25  4:59       ` Cong Wang
2021-05-25 18:21         ` Alexei Starovoitov
2021-05-25 19:35           ` Jamal Hadi Salim
2021-05-25 19:57             ` Alexei Starovoitov
2021-05-25 21:09               ` Jamal Hadi Salim
2021-05-25 22:08                 ` Alexei Starovoitov
2021-05-26 15:34                   ` Jamal Hadi Salim
2021-05-26 16:58                     ` Alexei Starovoitov
2021-05-26 18:25                       ` Jamal Hadi Salim
2021-05-30  6:36           ` Cong Wang
2021-06-02  2:00             ` Alexei Starovoitov
2021-06-02  8:48               ` Toke Høiland-Jørgensen
2021-06-02 17:54                 ` Martin KaFai Lau
2021-06-02 18:13                   ` Kumar Kartikeya Dwivedi
2021-06-02 18:26                     ` Alexei Starovoitov
2021-06-02 18:30                       ` Kumar Kartikeya Dwivedi
2021-06-02 18:46                     ` John Fastabend
2021-05-23 11:48 ` Toke Høiland-Jørgensen
2021-05-23 15:58   ` Alexei Starovoitov
2021-05-24  8:42     ` Lorenz Bauer
2021-05-24 14:48       ` Alexei Starovoitov
2021-05-24 17:33     ` Alexei Starovoitov
2021-05-24 18:39       ` Toke Høiland-Jørgensen
2021-05-24 18:38     ` Toke Høiland-Jørgensen
2021-05-24 11:49 ` Lorenz Bauer
2021-05-24 14:56   ` Alexei Starovoitov
2021-05-24 19:13     ` Andrii Nakryiko
2021-05-25  5:22       ` Cong Wang
2021-05-25 19:47         ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOftzPh0cj_XRES8mrNWnyKFZDLpRez09NAofmu1F1JAZf43Cw@mail.gmail.com \
    --to=joe@cilium.io \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=duanxiongchun@bytedance.com \
    --cc=jhs@mojatatu.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=pctammela@mojatatu.com \
    --cc=songliubraving@fb.com \
    --cc=songmuchun@bytedance.com \
    --cc=wangdongdong.6@bytedance.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).