From: Aaron Lu <aaron.lu@intel.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: Song Liu <song@kernel.org>, <bpf@vger.kernel.org>,
	<linux-mm@kvack.org>, <akpm@linux-foundation.org>,
	<x86@kernel.org>, <peterz@infradead.org>, <hch@lst.de>,
	<rick.p.edgecombe@intel.com>, <mcgrof@kernel.org>
Subject: Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs
Date: Tue, 8 Nov 2022 20:38:32 +0800
Message-ID: <Y2pNyKmMnOEeongp@ziqianlu-desk2>
In-Reply-To: <Y2o9Iz30A3Nruqs4@kernel.org>

Hi Mike,

On Tue, Nov 08, 2022 at 01:27:31PM +0200, Mike Rapoport wrote:
> Hi Song,
>  
> On Mon, Nov 07, 2022 at 02:39:16PM -0800, Song Liu wrote:
> > This patchset tries to address the following issues:
> > 
> > 1. Direct map fragmentation
> > 
> > On x86, STRICT_*_RWX requires the direct map of any RO+X memory to be also
> > RO+X. These set_memory_* calls cause 1GB page table entries to be split
> > into 2MB and 4kB ones. This fragmentation in direct map results in bigger
> > and slower page table, and pressure for both instruction and data TLB.
> >
> > Our previous work in bpf_prog_pack tries to address this issue from BPF
> > program side. Based on the experiments by Aaron Lu [4], bpf_prog_pack has
> > greatly reduced direct map fragmentation from BPF programs.
> 
> Usage of set_memory_* APIs with memory allocated from vmalloc/modules
> virtual range does not change the direct map, but only updates the
> permissions in vmalloc range. The direct map splits occur in
> vm_remove_mappings() when the memory is *freed*.

set_memory_nx/x() on a vmalloc'ed range does not affect the direct map,
but set_memory_ro/rw() does. set_memory_ro/rw() checks for other alias
mappings of the same pages and applies the same permission change to
them, e.g. to the direct mapping.
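
To make the distinction above concrete, here is a minimal user-space sketch
of the alias decision. It assumes the check in change_page_attr_set_clr()
compares the combined set/clear mask against _PAGE_NX, as in kernels around
5.17. The PAGE_RW and PAGE_NX macros and the cpa_checks_alias() helper are
names made up for this illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 PTE bit positions (bit 1 = writable, bit 63 = no-execute). */
#define PAGE_RW (1ULL << 1)
#define PAGE_NX (1ULL << 63)

/*
 * Hypothetical helper modelling the kernel's decision: alias mappings
 * (such as the direct map) are skipped only when NX is the sole bit
 * being set or cleared.
 */
bool cpa_checks_alias(uint64_t mask_set, uint64_t mask_clr)
{
    return (mask_set | mask_clr) != PAGE_NX;
}

/* Which bits each set_memory_*() call changes, per the text above. */
bool ro_checks_alias(void) { return cpa_checks_alias(0, PAGE_RW); } /* clear RW */
bool rw_checks_alias(void) { return cpa_checks_alias(PAGE_RW, 0); } /* set RW   */
bool nx_checks_alias(void) { return cpa_checks_alias(PAGE_NX, 0); } /* set NX   */
bool x_checks_alias(void)  { return cpa_checks_alias(0, PAGE_NX); } /* clear NX */
```

Under this model, ro_checks_alias() and rw_checks_alias() return true, while
nx_checks_alias() and x_checks_alias() return false.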

For this reason, a BPF program load can trigger a direct map split. A
sample call stack from an x86_64 VM looks like this:

[   40.602450] address=0xffffffffc01e2000 numpages=1 set=0x0 clr=0x2 alias=1
[   40.614566] address=0xffff88816ee1e000 numpages=1 set=0x0 clr=0x2 alias=0
[   40.627641] split: address=0xffff88816ee1e000, level=2
[   40.627981] CPU: 15 PID: 534 Comm: sockex1 Not tainted 5.17.0-dirty #28
[   40.628421] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[   40.628996] Call Trace:
[   40.629161]  <TASK>
[   40.629304]  dump_stack_lvl+0x45/0x59
[   40.629550]  __change_page_attr_set_clr+0x718/0x8d4
[   40.629872]  ? static_protections+0x1c8/0x1fd
[   40.630160]  ? dump_stack_lvl+0x54/0x59
[   40.630418]  __change_page_attr_set_clr+0x7ff/0x8d4
[   40.630739]  ? _raw_spin_unlock+0x14/0x30
[   40.631004]  ? __purge_vmap_area_lazy+0x323/0x720
[   40.631316]  ? _raw_spin_unlock_irqrestore+0x18/0x40
[   40.631646]  change_page_attr_set_clr.cold+0x2f/0x164
[   40.631979]  set_memory_ro+0x26/0x30
[   40.632215]  bpf_int_jit_compile+0x4a1/0x4e0
[   40.632502]  bpf_prog_select_runtime+0xad/0xf0
[   40.632794]  bpf_prog_load+0x6a1/0xa20
[   40.633044]  ? _raw_spin_trylock_bh+0x1/0x50
[   40.633328]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[   40.633656]  ? free_debug_processing+0x1f8/0x2c0
[   40.633964]  ? __slab_free+0x2f0/0x4f0
[   40.634214]  ? trace_hardirqs_on+0x2b/0xf0
[   40.634492]  __sys_bpf+0xb20/0x2750
[   40.634726]  ? __might_fault+0x1e/0x20
[   40.634978]  __x64_sys_bpf+0x1c/0x20
[   40.635216]  do_syscall_64+0x38/0x90
[   40.635457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   40.635792] RIP: 0033:0x7fd4f2cacfbd
[   40.636030] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 ce 0e 00 f7 d8 64 89 01 48
[   40.637253] RSP: 002b:00007ffddf20b2d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[   40.637752] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fd4f2cacfbd
[   40.638220] RDX: 0000000000000080 RSI: 00007ffddf20b360 RDI: 0000000000000005
[   40.638689] RBP: 0000000000436c48 R08: 0000000000000000 R09: 0000000000000000
[   40.639156] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000080
[   40.639627] R13: 00007ffddf20b360 R14: 0000000000000000 R15: 0000000000000000
[   40.640096]  </TASK>
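
For readers decoding the three debug lines at the top of the log, a small
sketch follows. It assumes the default (non-KASLR) 4-level address layout
described in the kernel's x86_64 mm documentation, and it assumes that
"level" follows enum pg_level (1 = 4K, 2 = 2M, 3 = 1G). The function and
macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Default x86-64 4-level ranges; KASLR may move the direct map base. */
#define DIRECT_MAP_START 0xffff888000000000ULL
#define DIRECT_MAP_END   0xffffc87fffffffffULL
#define MODULES_START    0xffffffffa0000000ULL
#define MODULES_END      0xfffffffffeffffffULL

/* Classify a kernel virtual address by the range it falls in. */
const char *region_name(uint64_t addr)
{
    if (addr >= DIRECT_MAP_START && addr <= DIRECT_MAP_END)
        return "direct map";
    if (addr >= MODULES_START && addr <= MODULES_END)
        return "modules";
    return "other";
}

/* Size of the mapping being split, assuming enum pg_level numbering. */
const char *split_level_name(int level)
{
    switch (level) {
    case 1: return "4K";
    case 2: return "2M";
    case 3: return "1G";
    default: return "unknown";
    }
}
```

With these assumptions, the first address (0xffffffffc01e2000) is in the
modules range, the second (0xffff88816ee1e000) is in the direct map, and
clr=0x2 corresponds to the RW bit. level=2 then reads as a 2M direct map
entry being split.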

Regards,
Aaron
