From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "rppt@kernel.org" <rppt@kernel.org>
Cc: "peterz@infradead.org" <peterz@infradead.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"song@kernel.org" <song@kernel.org>, "hch@lst.de" <hch@lst.de>,
	"x86@kernel.org" <x86@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"Lu, Aaron" <aaron.lu@intel.com>
Subject: Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs
Date: Wed, 9 Nov 2022 17:04:25 +0000
Message-ID: <bcdc5a31570f87267183496f06963ac58b41bfe1.camel@intel.com>
In-Reply-To: <Y2uMWvmiPlaNXlZz@kernel.org>

On Wed, 2022-11-09 at 13:17 +0200, Mike Rapoport wrote:
> On Tue, Nov 08, 2022 at 04:51:12PM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2022-11-08 at 13:27 +0200, Mike Rapoport wrote:
> > > > Based on our experiments [5], we measured 0.5% performance
> > > > improvement from bpf_prog_pack. This patchset further boosts the
> > > > improvement to 0.7%. The difference is because bpf_prog_pack uses
> > > > 512x 4kB pages instead of 1x 2MB page; bpf_prog_pack as-is doesn't
> > > > resolve #2 above.
> > > > 
> > > > This patchset replaces bpf_prog_pack with a better API and makes
> > > > it available for other dynamic kernel text, such as modules,
> > > > ftrace, and kprobes.
> > > 
> > >   
The proposed execmem_alloc() looks to me very much tailored for x86
> > > to be used as a replacement for module_alloc(). Some architectures
> > > have a module_alloc() that is quite different from the default or
> > > x86 version, so I'd expect at least some explanation of how modules
> > > etc. can use the execmem_ APIs without breaking !x86 architectures.
> > 
> > I think this is fair, but I think we should ask ourselves - how much
> > should we do in one step?
> 
> I think that at least we need evidence that execmem_alloc() etc. can
> actually be used by modules/ftrace/kprobes. Luis said that RFC v2
> didn't work for him at all, so having a core MM API for code
> allocation that only works with BPF on x86 seems not right to me.

Those module changes wouldn't work on non-x86 either. Most of the
modules code is cross-arch, so this kind of has to work for
non-text_poke() architectures, or the modules code needs to be
refactored.

>  
> > For non-text_poke() architectures, the way you can make it work is
> > to have the API look like:
> > execmem_alloc()  <- Does the allocation, but not necessarily usable yet
> > execmem_write()  <- Loads the mapping, doesn't work after finish()
> > execmem_finish() <- Makes the mapping live (loaded, executable, ready)
> > 
> > So for text_poke():
> > execmem_alloc()  <- reserves the mapping
> > execmem_write()  <- text_poke()s to the mapping
> > execmem_finish() <- does nothing
> > 
> > And for non-text_poke():
> > execmem_alloc()  <- Allocates a regular RW vmalloc allocation
> > execmem_write()  <- Writes normally to it
> > execmem_finish() <- does set_memory_ro()/set_memory_x() on it
> > 
> > Non-text_poke() only gets the benefits of centralized logic, but the
> > interface works for both. This is pretty much what the perm_alloc()
> > RFC did to make it work with other arches and modules. But to fit
> > with the existing modules code (which is actually spread all over)
> > and also handle RO sections, it also needed some additional bells
> > and whistles.
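
To make that concrete, here is a rough sketch of the two flavors.
This is not from the posted patches: the config symbol is made up,
and the real allocator would hand out ranges carved from the shared
huge pages rather than a plain vmalloc():

#include <linux/mm.h>
#include <linux/string.h>
#include <linux/vmalloc.h>
#include <linux/set_memory.h>

/* Reserve space; on text_poke() arches this would come out RO+X. */
void *execmem_alloc(size_t size)
{
	return vmalloc(size);	/* stand-in for the real allocator */
}

#ifdef CONFIG_ARCH_HAS_TEXT_POKE	/* hypothetical symbol */

/* Writes always go through the temporary text_poke() mapping. */
int execmem_write(void *dst, const void *src, size_t len)
{
	text_poke_copy(dst, src, len);	/* x86: asm/text-patching.h */
	return 0;
}

/* Nothing to do: the mapping was never writable in the first place. */
int execmem_finish(void *addr, size_t size)
{
	return 0;
}

#else /* !CONFIG_ARCH_HAS_TEXT_POKE */

/* The allocation is ordinary RW memory until finish(). */
int execmem_write(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Flip the whole allocation to RO+X; no writes allowed after this. */
int execmem_finish(void *addr, size_t size)
{
	int nr = PAGE_ALIGN(size) >> PAGE_SHIFT;

	set_memory_ro((unsigned long)addr, nr);
	return set_memory_x((unsigned long)addr, nr);
}

#endif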
> 
> I'm less concerned about the non-text_poke() part, but rather about
> restrictions on where code and data can live on different
> architectures, and whether these restrictions won't lead to an
> inability to use the centralized logic on, say, arm64 and powerpc.
> 
> For instance, if we use execmem_alloc() for modules, it means that
> data sections should be allocated separately with plain vmalloc().
> Will this work universally? Or will this require special care with
> additional complexity in the modules code?

Good point. If the module data were still in the modules range, I
would hope it would still work, but there are a lot of architectures
to check. Some might care whether the data is really close to the
text. I'm not sure.

The perm_alloc() stuff did some hacks to force the allocations close
to each other out of paranoia about this. It basically started with
one allocation, but then tracked the pieces separately so arches
could separate them if they wanted. But I wondered whether that was
really needed.

>  
> > So the question I'm trying to ask is: how much should we target for
> > the next step? I first thought that this functionality was so
> > intertwined that it would be too hard to do iteratively. So if we
> > want to try iteratively, I'm OK if it doesn't solve everything.
> 
>  
> With execmem_alloc() as the first step, I'm failing to see the big
> picture. If we want to use it for modules, how will we allocate RO
> data?

Similar to the perm_alloc() hacks?

> with similar rodata_alloc() that uses yet another tree in vmalloc? 

It would have to group them together, at least. I'm not sure whether
it needs a separate tree or not. I would think permission flags would
be better than a new function for each memory type.
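
Roughly what I mean, as a hypothetical sketch (the flag names are
made up, not from any posted patch):

#include <linux/bits.h>

#define EXECMEM_RO	BIT(0)	/* read-only once finish() runs */
#define EXECMEM_X	BIT(1)	/* executable once finish() runs */

void *execmem_alloc(size_t size, unsigned long flags);

Then inside a loader, something like:

	text   = execmem_alloc(text_size, EXECMEM_RO | EXECMEM_X);
	rodata = execmem_alloc(ro_size,   EXECMEM_RO);
	data   = execmem_alloc(data_size, 0);	/* plain RW */

The allocator could keep the pieces grouped near each other (or not,
per arch) without growing a separate entry point for every kind of
memory.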

> How can the caching of large pages in vmalloc be made useful for use
> cases like secretmem and PKS?

This part is easy, I think. If we had an unmapped page allocator, it
could just feed this. Do you have any idea when you might pick that
stuff up again?
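
Something like this, say (unmapped_pages_alloc() standing in for
whatever your unmapped page allocator ends up exporting):

/* Refill the executable large-page cache with pages that have no
 * linear-map alias, so the same cache could serve secretmem/PKS
 * style users as well. */
static struct page *execmem_cache_grow(unsigned int order)
{
	return unmapped_pages_alloc(GFP_KERNEL, order);
}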

To answer my own question, I think a good first step would be to make
the interface also work for non-text_poke() so it could really be
cross-arch, then use it for everything except modules. The benefit to
the other arches at that point is centralized handling of loading
text.

Thread overview: 91+ messages
2022-11-07 22:39 [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 1/5] vmalloc: introduce execmem_alloc, execmem_free, and execmem_fill Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 2/5] x86/alternative: support execmem_alloc() and execmem_free() Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 3/5] bpf: use execmem_alloc for bpf program and bpf dispatcher Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 4/5] vmalloc: introduce register_text_tail_vm() Song Liu
2022-11-07 22:39 ` [PATCH bpf-next v2 5/5] x86: use register_text_tail_vm Song Liu
2022-11-08 19:04   ` Edgecombe, Rick P
2022-11-08 22:15     ` Song Liu
2022-11-15 17:28       ` Edgecombe, Rick P
2022-11-07 22:55 ` [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs Luis Chamberlain
2022-11-07 23:13   ` Song Liu
2022-11-07 23:39     ` Luis Chamberlain
2022-11-08  0:13       ` Edgecombe, Rick P
2022-11-08  2:45         ` Luis Chamberlain
2022-11-08 18:20         ` Song Liu
2022-11-08 18:12       ` Song Liu
2022-11-08 11:27 ` Mike Rapoport
2022-11-08 12:38   ` Aaron Lu
2022-11-09  6:55     ` Christoph Hellwig
2022-11-09 11:05       ` Peter Zijlstra
2022-11-08 16:51   ` Edgecombe, Rick P
2022-11-08 18:50     ` Song Liu
2022-11-09 11:17     ` Mike Rapoport
2022-11-09 17:04       ` Edgecombe, Rick P [this message]
2022-11-09 17:53         ` Song Liu
2022-11-13 10:34         ` Mike Rapoport
2022-11-14 20:30           ` Song Liu
2022-11-15 21:18             ` Luis Chamberlain
2022-11-15 21:39               ` Edgecombe, Rick P
2022-11-16 22:34                 ` Luis Chamberlain
2022-11-17  8:50             ` Mike Rapoport
2022-11-17 18:36               ` Song Liu
2022-11-20 10:41                 ` Mike Rapoport
2022-11-21 14:52                   ` Song Liu
2022-11-30  9:39                     ` Mike Rapoport
2022-11-09 17:43       ` Song Liu
2022-11-09 21:23         ` Christophe Leroy
2022-11-10  1:50           ` Song Liu
2022-11-13 10:42         ` Mike Rapoport
2022-11-14 20:45           ` Song Liu
2022-11-15 20:51             ` Luis Chamberlain
2022-11-20 10:44             ` Mike Rapoport
2022-11-08 18:41   ` Song Liu
2022-11-08 19:43     ` Christophe Leroy
2022-11-08 21:40       ` Song Liu
2022-11-13  9:58     ` Mike Rapoport
2022-11-14 20:13       ` Song Liu
2022-11-08 11:44 ` Christophe Leroy
2022-11-08 18:47   ` Song Liu
2022-11-08 19:32     ` Christophe Leroy
2022-11-08 11:48 ` Mike Rapoport
2022-11-15  1:30 ` Song Liu
2022-11-15 17:34   ` Edgecombe, Rick P
2022-11-15 21:54     ` Song Liu
2022-11-15 22:14       ` Edgecombe, Rick P
2022-11-15 22:32         ` Song Liu
2022-11-16  1:20         ` Song Liu
2022-11-16 21:22           ` Edgecombe, Rick P
2022-11-16 22:03             ` Song Liu
2022-11-15 21:09   ` Luis Chamberlain
2022-11-15 21:32     ` Luis Chamberlain
2022-11-15 22:48     ` Song Liu
2022-11-16 22:33       ` Luis Chamberlain
2022-11-16 22:47         ` Edgecombe, Rick P
2022-11-16 23:53           ` Luis Chamberlain
2022-11-17  1:17             ` Song Liu
2022-11-17  9:37         ` Mike Rapoport
2022-11-29 10:23   ` Thomas Gleixner
2022-11-29 17:26     ` Song Liu
2022-11-29 23:56       ` Thomas Gleixner
2022-11-30 16:18         ` Song Liu
2022-12-01  9:08           ` Thomas Gleixner
2022-12-01 19:31             ` Song Liu
2022-12-02  1:38               ` Thomas Gleixner
2022-12-02  8:38                 ` Song Liu
2022-12-02  9:22                   ` Thomas Gleixner
2022-12-06 20:25                     ` Song Liu
2022-12-07 15:36                       ` Thomas Gleixner
2022-12-07 16:53                         ` Christophe Leroy
2022-12-07 19:29                           ` Song Liu
2022-12-07 21:04                           ` Thomas Gleixner
2022-12-07 21:48                             ` Christophe Leroy
2022-12-07 19:26                         ` Song Liu
2022-12-07 20:57                           ` Thomas Gleixner
2022-12-07 23:17                             ` Song Liu
2022-12-02 10:46                 ` Christophe Leroy
2022-12-02 17:43                   ` Thomas Gleixner
2022-12-01 20:23             ` Mike Rapoport
2022-12-01 22:34               ` Thomas Gleixner
2022-12-03 14:46                 ` Mike Rapoport
2022-12-03 20:58                   ` Thomas Gleixner
