linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Song Liu <song@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, peterz@infradead.org,
	 akpm@linux-foundation.org, x86@kernel.org, hch@lst.de,
	 rick.p.edgecombe@intel.com, aaron.lu@intel.com, rppt@kernel.org,
	 mcgrof@kernel.org
Subject: Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs
Date: Wed, 30 Nov 2022 08:18:31 -0800	[thread overview]
Message-ID: <CAPhsuW4Fy4kdTqK0rHXrPprUqiab4LgcTUG6YhDQaPrWkgZjwQ@mail.gmail.com> (raw)
In-Reply-To: <871qpluxfu.ffs@tglx>

Hi Thomas,

On Tue, Nov 29, 2022 at 3:56 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Song,
>
> On Tue, Nov 29 2022 at 09:26, Song Liu wrote:
> > On Tue, Nov 29, 2022 at 2:23 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Modules are the obvious starting point. Once that is solved pretty much
> >> everything else falls into place including BPF.
> >>
> >> Without modules support this whole exercise is pointless and not going
> >> anywhere near x86.
> >
> > I am not sure I fully understand your point here. Do you mean
> >
> > 1) There is something wrong with this solution, that makes it not suitable
> > for modules;
> >    or
> > 2) The solution is in the right direction and it will very likely work
> > for modules.
> > But we haven't finished module support. ?
>
> As I'm obviously unable to express myself coherently, let me try again:
>
>  A solution which solves the BPF problem, but does not solve the
>  underlying problem of module_alloc() is not acceptable.
>
> Is that clear enough?

While I sincerely want to provide a solution not just for BPF but also
for modules and others, I don't think I fully understand the underlying
problem of module_alloc(). I sincerely would like to learn more about it.

>
> > If it is 1), I would like to understand what are the issues that make it not
> > suitable for modules. If it is 2), I think a solid, mostly like working small
> > step toward the right direction is the better way as it makes code reviews
> > a lot easier and has much lower risks. Does this make sense?
>
> No. Because all you are interested in is to get your BPF itch scratched
> instead of actually sitting down and solving the underlying problem and
> thereby creating a benefit for everyone.

TBH, until your reply, I thought I was working on something that would
benefit everyone. It is indeed not just for BPF itch, as bpf_prog_pack
already scratched it for BPF.

>
> You are not making anything easier. You are violating the basic
> engineering principle of "Fix the root cause, not the symptom".
>

I am not sure what is the root cause and the symptom here. I
understand ideas referred in this lwn article:

   https://lwn.net/Articles/894557/

But I don't know which one of them (if any) would fix the root cause.

> By doing that you are actually creating more problems than you
> solve. Why?
>
>   Clearly your "solution" does not cover the full requirements of the
>   module space because you solely focus on executable memory allocations
>   which somehow magically go into the module address space.
>
>   Can you coherently explain how this results in a consistent solution
>   for the rest of the module requirements?
>
>   Can you coherently explain why this wont create problems down the road
>   for anyone who actually would be willing to solve the root cause?
>
> No, you can't answer any of these questions simply because you never
> explored the problem space sufficiently.

I was thinking, for modules, we only need something new for module text,
and module data will just use vmalloc(). I guess this is probably not the
right solution?

>
> I'm not the first one to point this out. Quite some people in the
> various threads regarding this issue have been pointing that out to you
> before. They even provided you hints on how this can be solved properly
> once and forever and for everyones benefits.

I tried to review various threads. Unfortunately, I am not able to identify
the proper hints and construct a solution.

>
> > I would also highlight that part of the benefit of this work comes from
> > reducing direct map fragmentations. While BPF programs consume less
> > memory, they are more dynamic and can cause more direct map
> > fragmentations. bpf_prog_pack in upstream kernel already covers this
> > part, but this set is a better solution than bpf_prog_pack.
> >
> > Finally, I would like to point out that 5/6 and 6/6 of (v5) the set let BPF
> > programs share a 2MB page with static kernel text. Therefore, even
> > for systems without many BPF programs, we should already see some
> > reduction in iTLB misses.
>
> Can you please stop this marketing nonsense? As I pointed out to you in
> the very mail which your are replying to, the influence of BPF on the
> system I picked randomly out of the pool is pretty close to ZERO.
>
> Ergo, the reduction of iTLB misses is going to be equally close to
> ZERO. What is the benefit you are trying to sell me?
>
> I'm happy to run perf on this machine and provide the numbers which put
> your 'we should already see some reduction' handwaving into perspective.
>
> But the above is just a distraction. The real point is:
>
> You can highlight and point out the benefits of your BPF specific
> solution as much as you want, it does not make the fact that you are
> "fixing" the symptom instead of the root cause magically go away.
>
> Again for the record:
>
>   The iTLB pressure problem, which affects modules, kprobes, tracing and
>   BPF, is caused by the  way how module_alloc() is implemented.

TBH, I don't think I understand this...

Do you mean the problem with  module_alloc() is that it is not aware of
desired permissions (W or X or neither)? If so, is permission vmalloc [1]
the right direction for this?

[1] https://lwn.net/ml/linux-mm/20201120202426.18009-1-rick.p.edgecombe@intel.com/

>
> That's the root cause and this needs to be solved for _ALL_ of the users
> of this infrastructure and not worked around by adding something which
> makes BPF shiny and handwaves about that it solves the underlying
> problem.

While I did plan to enable 2MB pages for module text, I didn't plan to
solve it in the first set. However, since you think it is possible and would
like to provide directions, I am up for the challenge and will give it a try.
Please share more details about the right direction. Otherwise, I am
still lost...

Thanks,
Song


  reply	other threads:[~2022-11-30 16:18 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-07 22:39 [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 1/5] vmalloc: introduce execmem_alloc, execmem_free, and execmem_fill Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 2/5] x86/alternative: support execmem_alloc() and execmem_free() Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 3/5] bpf: use execmem_alloc for bpf program and bpf dispatcher Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 4/5] vmalloc: introduce register_text_tail_vm() Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 5/5] x86: use register_text_tail_vm Song Liu
2022-11-08 19:04   ` Edgecombe, Rick P
2022-11-08 22:15     ` Song Liu
2022-11-15 17:28       ` Edgecombe, Rick P
2022-11-07 22:55 ` [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs Luis Chamberlain
2022-11-07 23:13   ` Song Liu
2022-11-07 23:39     ` Luis Chamberlain
2022-11-08  0:13       ` Edgecombe, Rick P
2022-11-08  2:45         ` Luis Chamberlain
2022-11-08 18:20         ` Song Liu
2022-11-08 18:12       ` Song Liu
2022-11-08 11:27 ` Mike Rapoport
2022-11-08 12:38   ` Aaron Lu
2022-11-09  6:55     ` Christoph Hellwig
2022-11-09 11:05       ` Peter Zijlstra
2022-11-08 16:51   ` Edgecombe, Rick P
2022-11-08 18:50     ` Song Liu
2022-11-09 11:17     ` Mike Rapoport
2022-11-09 17:04       ` Edgecombe, Rick P
2022-11-09 17:53         ` Song Liu
2022-11-13 10:34         ` Mike Rapoport
2022-11-14 20:30           ` Song Liu
2022-11-15 21:18             ` Luis Chamberlain
2022-11-15 21:39               ` Edgecombe, Rick P
2022-11-16 22:34                 ` Luis Chamberlain
2022-11-17  8:50             ` Mike Rapoport
2022-11-17 18:36               ` Song Liu
2022-11-20 10:41                 ` Mike Rapoport
2022-11-21 14:52                   ` Song Liu
2022-11-30  9:39                     ` Mike Rapoport
2022-11-09 17:43       ` Song Liu
2022-11-09 21:23         ` Christophe Leroy
2022-11-10  1:50           ` Song Liu
2022-11-13 10:42         ` Mike Rapoport
2022-11-14 20:45           ` Song Liu
2022-11-15 20:51             ` Luis Chamberlain
2022-11-20 10:44             ` Mike Rapoport
2022-11-08 18:41   ` Song Liu
2022-11-08 19:43     ` Christophe Leroy
2022-11-08 21:40       ` Song Liu
2022-11-13  9:58     ` Mike Rapoport
2022-11-14 20:13       ` Song Liu
2022-11-08 11:44 ` Christophe Leroy
2022-11-08 18:47   ` Song Liu
2022-11-08 19:32     ` Christophe Leroy
2022-11-08 11:48 ` Mike Rapoport
2022-11-15  1:30 ` Song Liu
2022-11-15 17:34   ` Edgecombe, Rick P
2022-11-15 21:54     ` Song Liu
2022-11-15 22:14       ` Edgecombe, Rick P
2022-11-15 22:32         ` Song Liu
2022-11-16  1:20         ` Song Liu
2022-11-16 21:22           ` Edgecombe, Rick P
2022-11-16 22:03             ` Song Liu
2022-11-15 21:09   ` Luis Chamberlain
2022-11-15 21:32     ` Luis Chamberlain
2022-11-15 22:48     ` Song Liu
2022-11-16 22:33       ` Luis Chamberlain
2022-11-16 22:47         ` Edgecombe, Rick P
2022-11-16 23:53           ` Luis Chamberlain
2022-11-17  1:17             ` Song Liu
2022-11-17  9:37         ` Mike Rapoport
2022-11-29 10:23   ` Thomas Gleixner
2022-11-29 17:26     ` Song Liu
2022-11-29 23:56       ` Thomas Gleixner
2022-11-30 16:18         ` Song Liu [this message]
2022-12-01  9:08           ` Thomas Gleixner
2022-12-01 19:31             ` Song Liu
2022-12-02  1:38               ` Thomas Gleixner
2022-12-02  8:38                 ` Song Liu
2022-12-02  9:22                   ` Thomas Gleixner
2022-12-06 20:25                     ` Song Liu
2022-12-07 15:36                       ` Thomas Gleixner
2022-12-07 16:53                         ` Christophe Leroy
2022-12-07 19:29                           ` Song Liu
2022-12-07 21:04                           ` Thomas Gleixner
2022-12-07 21:48                             ` Christophe Leroy
2022-12-07 19:26                         ` Song Liu
2022-12-07 20:57                           ` Thomas Gleixner
2022-12-07 23:17                             ` Song Liu
2022-12-02 10:46                 ` Christophe Leroy
2022-12-02 17:43                   ` Thomas Gleixner
2022-12-01 20:23             ` Mike Rapoport
2022-12-01 22:34               ` Thomas Gleixner
2022-12-03 14:46                 ` Mike Rapoport
2022-12-03 20:58                   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPhsuW4Fy4kdTqK0rHXrPprUqiab4LgcTUG6YhDQaPrWkgZjwQ@mail.gmail.com \
    --to=song@kernel.org \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bpf@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).