* [PATCH bpf-next 0/2] bpf, x64: optimize JIT's pro/epilogue
@ 2020-09-29 20:46 Maciej Fijalkowski
  2020-09-29 20:46 ` [PATCH bpf-next 1/2] bpf, x64: drop "pop %rcx" instruction on BPF JIT epilogue Maciej Fijalkowski
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Maciej Fijalkowski @ 2020-09-29 20:46 UTC (permalink / raw)
  To: ast, daniel; +Cc: bpf, netdev, bjorn.topel, magnus.karlsson, Maciej Fijalkowski

Hi!

This small set is a follow-up to the recent addition of support for
tail calls in BPF subprograms, and focuses on optimizing the x64 JIT
prologue and epilogue sections.

It turns out that popping the tail call counter is no longer needed,
and the %rsp handling can be skipped when the stack depth is 0.

For longer explanations, please see commit messages.

Thank you,
Maciej


Maciej Fijalkowski (2):
  bpf, x64: drop "pop %rcx" instruction on BPF JIT epilogue
  bpf: x64: do not emit sub/add 0, %rsp when !stack_depth

 arch/x86/net/bpf_jit_comp.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

-- 
2.20.1



* [PATCH bpf-next 1/2] bpf, x64: drop "pop %rcx" instruction on BPF JIT epilogue
  2020-09-29 20:46 [PATCH bpf-next 0/2] bpf, x64: optimize JIT's pro/epilogue Maciej Fijalkowski
@ 2020-09-29 20:46 ` Maciej Fijalkowski
  2020-09-29 20:46 ` [PATCH bpf-next 2/2] bpf: x64: do not emit sub/add 0, %rsp when !stack_depth Maciej Fijalkowski
  2020-09-30  0:00 ` [PATCH bpf-next 0/2] bpf, x64: optimize JIT's pro/epilogue patchwork-bot+bpf
  2 siblings, 0 replies; 4+ messages in thread
From: Maciej Fijalkowski @ 2020-09-29 20:46 UTC (permalink / raw)
  To: ast, daniel; +Cc: bpf, netdev, bjorn.topel, magnus.karlsson, Maciej Fijalkowski

Back when all of the callee-saved registers were always pushed to the
stack in the x64 JIT prologue, the tail call counter was placed at the
bottom of the BPF program's stack frame, which had the following
layout:

+-------------+
|  ret addr   |
+-------------+
|     rbp     | <- rbp
+-------------+
|             |
| free space  |
| from:       |
| sub $x,%rsp |
|             |
+-------------+
|     rbx     |
+-------------+
|     r13     |
+-------------+
|     r14     |
+-------------+
|     r15     |
+-------------+
|  tail call  | <- rsp
|   counter   |
+-------------+

In order to restore the callee-saved registers, the epilogue needed to
explicitly discard the tail call counter via a "pop %rbx" insn, so that
%rsp would land back at the place where %r15 was stored.
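
For illustration, here is a sketch (mine, not from the tree) of the old
epilogue under that layout, assuming all callee-saved registers were
used. Note that the first "pop %rbx" merely discards the counter, since
the real %rbx value is restored by a later pop:

pop    %rbx    /* toss tail call counter */
pop    %r15
pop    %r14
pop    %r13
pop    %rbx    /* restore %rbx proper */
leaveq
retq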

Currently, the tail call counter is placed on the stack *before* the
callee-saved registers (the parentheses around rbx through r15 mean
that these registers are now pushed to the stack only if they are
used):

+-------------+
|  ret addr   |
+-------------+
|     rbp     | <- rbp
+-------------+
|             |
| free space  |
| from:       |
| sub $x,%rsp |
|             |
+-------------+
|  tail call  |
|   counter   |
+-------------+
(     rbx     )
+-------------+
(     r13     )
+-------------+
(     r14     )
+-------------+
(     r15     ) <- rsp
+-------------+

For the record, the epilogue insns consist of (assuming all of the
callee-saved registers are used by the program):
pop    %r15
pop    %r14
pop    %r13
pop    %rbx
pop    %rcx
leaveq
retq

"pop %rbx" for getting rid of tail call counter was not an option
anymore as it would overwrite the restored value of %rbx register, so it
was changed to use the %rcx register.

Since the epilogue can start popping the callee-saved registers right
away without any additional work, the "pop %rcx" can be dropped
altogether, as the "leave" insn will simply move %rbp into %rsp. IOW,
the tail call counter does not need any explicit handling.

With the above in mind, let's piggyback on the "leave" insn to discard
the tail call counter from the stack and remove the "pop %rcx" from the
epilogue.
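
For reference, a sketch of why "leave" alone suffices - architecturally
it is equivalent to the following two insns:

mov    %rbp,%rsp    /* rsp back to frame base; the tail call counter
                       below it is implicitly discarded */
pop    %rbp         /* restore caller's frame pointer */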

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 arch/x86/net/bpf_jit_comp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 26f43279b78b..a263918043ce 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1441,8 +1441,6 @@ xadd:			if (is_imm8(insn->off))
 			/* Update cleanup_addr */
 			ctx->cleanup_addr = proglen;
 			pop_callee_regs(&prog, callee_regs_used);
-			if (tail_call_reachable)
-				EMIT1(0x59); /* pop rcx, get rid of tail_call_cnt */
 			EMIT1(0xC9);         /* leave */
 			EMIT1(0xC3);         /* ret */
 			break;
-- 
2.20.1



* [PATCH bpf-next 2/2] bpf: x64: do not emit sub/add 0, %rsp when !stack_depth
  2020-09-29 20:46 [PATCH bpf-next 0/2] bpf, x64: optimize JIT's pro/epilogue Maciej Fijalkowski
  2020-09-29 20:46 ` [PATCH bpf-next 1/2] bpf, x64: drop "pop %rcx" instruction on BPF JIT epilogue Maciej Fijalkowski
@ 2020-09-29 20:46 ` Maciej Fijalkowski
  2020-09-30  0:00 ` [PATCH bpf-next 0/2] bpf, x64: optimize JIT's pro/epilogue patchwork-bot+bpf
  2 siblings, 0 replies; 4+ messages in thread
From: Maciej Fijalkowski @ 2020-09-29 20:46 UTC (permalink / raw)
  To: ast, daniel; +Cc: bpf, netdev, bjorn.topel, magnus.karlsson, Maciej Fijalkowski

There is no particular reason for keeping the "sub 0, %rsp" insn within
the x64 BPF JIT's prologue.

When the tail call code was skipping the whole prologue section, these
7 bytes that represent the %rsp subtraction could not simply be
discarded, as the jump target address would have been broken. One
option to address that would have been to substitute the insn with a
nop7.

Right now the tail call skips only the first 11 bytes of the target
program's prologue, and "sub X, %rsp" is the first insn that gets
processed, so if the stack depth is zero this insn can be omitted
without the need for the nop7 swap.

Therefore, do not emit the "sub 0, %rsp" insn in the prologue when the
program is not making use of the R10 register. Also, make the emission
of "add X, %rsp" conditional in the tail call code logic and take the
presence of the mentioned insn into account when calculating the jump
offsets.
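
For reference, a standalone sketch (mine, not part of the patch) of the
jump offset bookkeeping for the indirect tail call: "sub $imm32, %rsp"
encodes as REX.W + 81 /5 id (48 81 EC imm32), i.e. 7 bytes, so each
offset grows by 7 exactly when that insn is emitted. The base value of
42 and the +7 rule are taken from the diff below; pop_bytes is 1 byte
per callee-saved register the program actually uses:

#include <stdio.h>

static int off1_for(int pop_bytes, int stack_depth)
{
	int off1 = 42;		/* base: no pops, no sub insn */

	off1 += pop_bytes;	/* 1 byte per emitted "pop" */
	if (stack_depth)
		off1 += 7;	/* "sub $imm32,%rsp" is 7 bytes */
	return off1;
}

int main(void)
{
	/* all four callee-saved regs used, 512-byte stack: 42 + 4 + 7 */
	printf("off1 = %d\n", off1_for(4, 512));
	return 0;
}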

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 arch/x86/net/bpf_jit_comp.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a263918043ce..796506dcfc42 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -281,7 +281,8 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 	EMIT1(0x55);             /* push rbp */
 	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
 	/* sub rsp, rounded_stack_depth */
-	EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
+	if (stack_depth)
+		EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
 	if (tail_call_reachable)
 		EMIT1(0x50);         /* push rax */
 	*pprog = prog;
@@ -407,9 +408,9 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	int tcc_off = -4 - round_up(stack_depth, 8);
 	u8 *prog = *pprog;
 	int pop_bytes = 0;
-	int off1 = 49;
-	int off2 = 38;
-	int off3 = 16;
+	int off1 = 42;
+	int off2 = 31;
+	int off3 = 9;
 	int cnt = 0;
 
 	/* count the additional bytes used for popping callee regs from stack
@@ -421,6 +422,12 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	off2 += pop_bytes;
 	off3 += pop_bytes;
 
+	if (stack_depth) {
+		off1 += 7;
+		off2 += 7;
+		off3 += 7;
+	}
+
 	/*
 	 * rdi - pointer to ctx
 	 * rsi - pointer to bpf_array
@@ -465,8 +472,9 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
 	prog = *pprog;
 
 	EMIT1(0x58);                              /* pop rax */
-	EMIT3_off32(0x48, 0x81, 0xC4,             /* add rsp, sd */
-		    round_up(stack_depth, 8));
+	if (stack_depth)
+		EMIT3_off32(0x48, 0x81, 0xC4,     /* add rsp, sd */
+			    round_up(stack_depth, 8));
 
 	/* goto *(prog->bpf_func + X86_TAIL_CALL_OFFSET); */
 	EMIT4(0x48, 0x8B, 0x49,                   /* mov rcx, qword ptr [rcx + 32] */
@@ -491,7 +499,7 @@ static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
 	int tcc_off = -4 - round_up(stack_depth, 8);
 	u8 *prog = *pprog;
 	int pop_bytes = 0;
-	int off1 = 27;
+	int off1 = 20;
 	int poke_off;
 	int cnt = 0;
 
@@ -506,10 +514,14 @@ static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
 	 * total bytes for:
 	 * - nop5/ jmpq $off
 	 * - pop callee regs
-	 * - sub rsp, $val
+	 * - sub rsp, $val if depth > 0
 	 * - pop rax
 	 */
-	poke_off = X86_PATCH_SIZE + pop_bytes + 7 + 1;
+	poke_off = X86_PATCH_SIZE + pop_bytes + 1;
+	if (stack_depth) {
+		poke_off += 7;
+		off1 += 7;
+	}
 
 	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -533,7 +545,8 @@ static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
 	pop_callee_regs(pprog, callee_regs_used);
 	prog = *pprog;
 	EMIT1(0x58);                                  /* pop rax */
-	EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
+	if (stack_depth)
+		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
 	memcpy(prog, ideal_nops[NOP_ATOMIC5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
-- 
2.20.1



* Re: [PATCH bpf-next 0/2] bpf, x64: optimize JIT's pro/epilogue
  2020-09-29 20:46 [PATCH bpf-next 0/2] bpf, x64: optimize JIT's pro/epilogue Maciej Fijalkowski
  2020-09-29 20:46 ` [PATCH bpf-next 1/2] bpf, x64: drop "pop %rcx" instruction on BPF JIT epilogue Maciej Fijalkowski
  2020-09-29 20:46 ` [PATCH bpf-next 2/2] bpf: x64: do not emit sub/add 0, %rsp when !stack_depth Maciej Fijalkowski
@ 2020-09-30  0:00 ` patchwork-bot+bpf
  2 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+bpf @ 2020-09-30  0:00 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: bpf

Hello:

This series was applied to bpf/bpf-next.git (refs/heads/master):

On Tue, 29 Sep 2020 22:46:51 +0200 you wrote:
> Hi!
> 
> This small set is a follow-up to the recent addition of support for
> tail calls in BPF subprograms, and focuses on optimizing the x64 JIT
> prologue and epilogue sections.
> 
> It turns out that popping the tail call counter is no longer needed,
> and the %rsp handling can be skipped when the stack depth is 0.
> 
> [...]

Here is the summary with links:
  - [bpf-next,1/2] bpf, x64: drop "pop %rcx" instruction on BPF JIT epilogue
    https://git.kernel.org/bpf/bpf-next/c/d207929d97ea
  - [bpf-next,2/2] bpf: x64: do not emit sub/add 0, %rsp when !stack_depth
    https://git.kernel.org/bpf/bpf-next/c/4d0b8c0b46a5

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



