BPF Archive on lore.kernel.org
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: ast@kernel.org, daniel@iogearbox.net
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	bjorn.topel@intel.com, magnus.karlsson@intel.com,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Subject: [RFC PATCH bpf-next 0/1] bpf, x64: optimize JIT prologue/epilogue generation
Date: Mon, 11 May 2020 16:39:11 +0200
Message-ID: <20200511143912.34086-1-maciej.fijalkowski@intel.com>


Today, the BPF x86-64 JIT preserves all of the callee-saved registers
for each BPF program being JITed, even when none of the R6-R9 registers
are used by the BPF program. Furthermore, the tail call counter is always
pushed/popped to/from the stack even when the BPF program being JITed
makes no tail calls. An optimization can be introduced that detects the
usage of R6-R9 and, based on that, pushes/pops to/from the stack only
what is needed. The same goes for the tail call counter.
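
To make the detection step concrete, here is a minimal sketch of how the
usage of R6-R9 could be found by scanning a program's instructions. It is
not taken from the patch: struct sketch_insn and both helper names are
hypothetical, and a real JIT would also have to account for the tail call
counter and for helper-implied register usage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction: only the register fields matter. */
struct sketch_insn {
	uint8_t dst_reg;
	uint8_t src_reg;
};

enum { SKETCH_R6 = 6, SKETCH_R9 = 9 };

/* Return a 4-bit mask: bit 0 is set if R6 appears, ..., bit 3 if R9 does. */
static unsigned int sketch_callee_regs_used(const struct sketch_insn *insns,
					    size_t len)
{
	unsigned int mask = 0;
	size_t i;

	assert(insns != NULL || len == 0);
	for (i = 0; i < len; i++) {
		if (insns[i].dst_reg >= SKETCH_R6 && insns[i].dst_reg <= SKETCH_R9)
			mask |= 1u << (insns[i].dst_reg - SKETCH_R6);
		if (insns[i].src_reg >= SKETCH_R6 && insns[i].src_reg <= SKETCH_R9)
			mask |= 1u << (insns[i].src_reg - SKETCH_R6);
	}
	return mask;
}

/* Number of push (and matching pop) instructions the mask calls for. */
static unsigned int sketch_push_count(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask; mask >>= 1)
		n += mask & 1u;
	return n;
}
```

A program that touches none of R6-R9 yields a mask of 0, so under this
sketch the prologue and epilogue would carry no callee-saved push/pop
instructions at all.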

Results look promising for such instruction reduction. Below are the
numbers for xdp1 sample on FVL 40G NIC receiving traffic from pktgen:

* With optimization: 22.3 Mpps
* Without:           19.0 Mpps

That is a gain of 3.3 Mpps, or roughly 17% over the unoptimized
baseline. Note that xdp1 uses neither the callee-saved registers nor
tail calls, hence such an improvement.

There is one detail that needs to be handled though.

Currently, the x86-64 JIT tail call implementation skips the prologue
of the target BPF program, which has a constant size. With the mentioned
optimization implemented, each BPF program that might be inserted into
the prog array map, and therefore be the target of a tail call, could
have a different prologue size.

Let's have a pseudo-code example:

func1:
pro
code
epi

func2:
pro
code'
epi

Today, pro and epi are always the same (9/7) instructions. So a tail
call from func1 to func2 is just a:

jump func2 + sizeof pro in bytes (PROLOGUE_SIZE)

With the optimization:

func1:
pro
code
epi

func2:
pro'
code'
epi'

To keep tail calls working with the mentioned optimization in place,
the x86-64 JIT should emit, before the actual jump, pop instructions for
the registers that were pushed in the prologue. The jump offset should
skip only the instructions that handle rbp/rsp, not the whole prologue.

A tail call within func1 would then need to be:
epi -> pop what pro pushed, but no leave/ret instructions
jump func2 + 16 // first push insn of pro'; if no push, then this would
                // be a direct jump to code'
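
The sequence above could be sketched roughly as follows. This is an
illustration only, not the code from the patch: the helper names and the
mask layout are hypothetical, the tail call counter is not modelled, and
the mapping of R6-R9 to rbx/r13/r14/r15 is an assumption based on the
register assignment used by the x86-64 JIT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes skipped at the target: nopl, push %rbp, mov %rsp,%rbp, sub $0x8,%rsp */
#define SKETCH_TAIL_CALL_SKIP 16

/*
 * Write pop instructions for the callee-saved registers selected by mask
 * (bit 0 = R6/rbx, bit 1 = R7/r13, bit 2 = R8/r14, bit 3 = R9/r15),
 * in the reverse of the push order. Returns the number of bytes written.
 */
static size_t sketch_emit_pops(uint8_t *buf, size_t cap, unsigned int mask)
{
	size_t n = 0;

	assert(buf != NULL && cap >= 7); /* worst case: 2 + 2 + 2 + 1 bytes */
	if (mask & 0x8) { buf[n++] = 0x41; buf[n++] = 0x5f; } /* pop %r15 */
	if (mask & 0x4) { buf[n++] = 0x41; buf[n++] = 0x5e; } /* pop %r14 */
	if (mask & 0x2) { buf[n++] = 0x41; buf[n++] = 0x5d; } /* pop %r13 */
	if (mask & 0x1) { buf[n++] = 0x5b; }                  /* pop %rbx */
	return n;
}

/* Address inside the target image that the tail call would jump to. */
static const uint8_t *sketch_tail_call_target(const uint8_t *target_image)
{
	return target_image + SKETCH_TAIL_CALL_SKIP;
}
```

With an empty mask no pop bytes are emitted, which matches the case where
the jump lands directly on code'.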

The magic value of 16 is the number of bytes occupied by the
instructions that are skipped:
0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
55                      push   %rbp
48 89 e5                mov    %rsp,%rbp
48 81 ec 08 00 00 00    sub    $0x8,%rsp

Popping before the jump would in many cases add *more* instructions for
tail calls. If none of the callee-saved registers are used, then there
would be no overhead with such an optimization in place.
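
As a quick sanity check of the 16-byte offset, the encodings listed above
can be summed. The array and function names below are made up for this
sketch; only the byte values come from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodings of the four skipped instructions, copied from the listing. */
static const uint8_t sketch_nopl[]     = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const uint8_t sketch_push_rbp[] = { 0x55 };
static const uint8_t sketch_mov_rbp[]  = { 0x48, 0x89, 0xe5 };
static const uint8_t sketch_sub_rsp[]  = { 0x48, 0x81, 0xec, 0x08,
					   0x00, 0x00, 0x00 };

/* The 5-byte nop is what makes the first instruction patchable in size. */
static_assert(sizeof(sketch_nopl) == 5, "nopl is a 5-byte nop");

/* Total number of bytes a tail call would skip at the target. */
static size_t sketch_skipped_bytes(void)
{
	return sizeof(sketch_nopl) + sizeof(sketch_push_rbp) +
	       sizeof(sketch_mov_rbp) + sizeof(sketch_sub_rsp);
}
```

The sum is 5 + 1 + 3 + 7 bytes, which gives the offset of 16 used in the
jump.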

I'm not sure how to properly measure the impact on BPF programs that
use tail calls. Any suggestions?

Daniel, Alexei, what is your view on this?

For implementation details, see commit message of included patch.

Thank you,

Maciej Fijalkowski (1):
  bpf, x64: optimize JIT prologue/epilogue generation

 arch/x86/net/bpf_jit_comp.c | 190 ++++++++++++++++++++++++++++--------
 1 file changed, 148 insertions(+), 42 deletions(-)


