From: Song Liu <song@kernel.org>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mike Rapoport <rppt@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"hch@lst.de" <hch@lst.de>,
	"rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>,
	"aaron.lu@intel.com" <aaron.lu@intel.com>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>
Subject: Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs
Date: Tue, 8 Nov 2022 13:40:04 -0800
Message-ID: <CAPhsuW6w04_zgX=aXFVrXVYX1nnie1KN4oZZBrBNdL32-L1-qg@mail.gmail.com>
In-Reply-To: <d0c60ab6-e618-425a-4279-454901a60235@csgroup.eu>

On Tue, Nov 8, 2022 at 11:43 AM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> On 08/11/2022 at 19:41, Song Liu wrote:
> > On Tue, Nov 8, 2022 at 3:27 AM Mike Rapoport <rppt@kernel.org> wrote:
> >>
> >> Hi Song,
> >>
> >> On Mon, Nov 07, 2022 at 02:39:16PM -0800, Song Liu wrote:
> >>> This patchset tries to address the following issues:
> >>>
> >>> 1. Direct map fragmentation
> >>>
> >>> On x86, STRICT_*_RWX requires the direct map of any RO+X memory to also be
> >>> RO+X. The resulting set_memory_* calls cause 1GB page table entries to be
> >>> split into 2MB and 4kB ones. This fragmentation of the direct map results
> >>> in a bigger and slower page table, and pressure on both the instruction
> >>> and data TLBs.
> >>>
> >>> Our previous work in bpf_prog_pack tries to address this issue from the BPF
> >>> program side. Based on the experiments by Aaron Lu [4], bpf_prog_pack has
> >>> greatly reduced direct map fragmentation from BPF programs.
> >>
> >> Usage of set_memory_* APIs with memory allocated from the vmalloc/modules
> >> virtual range does not change the direct map; it only updates the
> >> permissions in the vmalloc range. The direct map splits occur in
> >> vm_remove_mappings() when the memory is *freed*.
> >>
> >> That said, both bpf_prog_pack and these patches do reduce the
> >> fragmentation, but this happens because the memory is freed to the system
> >> in 2M chunks and there are no splits of 2M pages. Besides, since the same
> >> 2M page is used for many BPF programs, there should be far fewer vfree()
> >> calls.
> >>
> >>> 2. iTLB pressure from BPF program
> >>>
> >>> Dynamic kernel text such as modules and BPF programs (even with the current
> >>> bpf_prog_pack) uses 4kB pages on x86. When the total size of modules and
> >>> BPF programs is large, we can see a visible performance drop caused by a
> >>> high iTLB miss rate.
> >>
> >> As Luis has mentioned several times already, it would be nice to see numbers.
> >>
> >>> 3. TLB shootdown for short-living BPF programs
> >>>
> >>> Before bpf_prog_pack, loading and unloading BPF programs required a global
> >>> TLB shootdown. This patchset (and bpf_prog_pack) replaces it with a local
> >>> TLB flush.
> >>>
> >>> 4. Reduce memory usage by BPF programs (in some cases)
> >>>
> >>> Most BPF programs and various trampolines are small, yet each often
> >>> occupies a whole page. On a random server in our fleet, 50% of the
> >>> loaded BPF programs are less than 500 bytes in size, and 75% of them are
> >>> less than 2kB in size. Allowing these BPF programs to share 2MB pages
> >>> would yield some memory savings for systems with many BPF programs. For
> >>> systems with only a small number of BPF programs, this patch may waste a
> >>> little memory by allocating one 2MB page but using only part of it.
> >>
> >> I'm not convinced there are memory savings here. Unless you have hundreds
> >> of BPF programs, most of the 2M page will be wasted, won't it?
> >> So for systems that have moderate use of BPF, most of the 2M page will be
> >> unused, right?
> >
> > There will be some memory waste in such cases. But it will get better because:
> > 1) with 4/5 and 5/5, BPF programs will share this 2MB page with the kernel
> > .text section (_stext to _etext);
> > 2) modules, ftrace and kprobe will also share this 2MB page;
> > 3) many use cases have bigger BPF programs.
>
> And what I love about this series (for powerpc/32) is that we will likely
> now be able to have bpf, ftrace and kprobe without the performance cost of
> CONFIG_MODULES.

Yeah, I remember reading emails about using tracing tools without
CONFIG_MODULES. We still need more work (beyond this set) to make it
happen for powerpc/32. For example, the current powerpc bpf_jit doesn't
support JITing into ROX memory.

Song


>
> Today, CONFIG_MODULES means page mapping, which means handling kernel
> pages in the ITLB miss handlers.
>
> By using some of the space between the end of rodata and the start of
> inittext, we are able to use ROX linear memory, which is mapped by blocks.
> This means there is no need to handle kernel text in the ITLB miss handler
> (you can look at
> https://elixir.bootlin.com/linux/v6.1-rc3/source/arch/powerpc/kernel/head_8xx.S#L191
> to better understand what I'm talking about).
>
> Thanks
> Christophe
