* assembler mnemonics for call/tailcall plus maps...
@ 2017-04-27 20:42 David Miller
  2017-04-28  2:06   ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2017-04-27 20:42 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, xdp-newbies


Can you guys give me some kind of idea of how it might be nice to
represent calls and tailcalls in assembler files?

And also the emission of maps.

Right now I just have the assembler looking for 32-bit immediate
values for call and tailcall instructions.

Looking at samples/bpf/sockex3_kern.c we have:

struct bpf_map_def SEC("maps") jmp_table = {
	.type = BPF_MAP_TYPE_PROG_ARRAY,
	.key_size = sizeof(u32),
	.value_size = sizeof(u32),
	.max_entries = 8,
};

#define PARSE_VLAN 1
#define PARSE_MPLS 2
#define PARSE_IP 3
#define PARSE_IPV6 4

/* protocol dispatch routine.
 * It tail-calls next BPF program depending on eth proto
 * Note, we could have used:
 * bpf_tail_call(skb, &jmp_table, proto);
 * but it would need large prog_array
 */
static inline void parse_eth_proto(struct __sk_buff *skb, u32 proto)
{
	switch (proto) {
	case ETH_P_8021Q:
	case ETH_P_8021AD:
		bpf_tail_call(skb, &jmp_table, PARSE_VLAN);
		break;
	case ETH_P_MPLS_UC:
	case ETH_P_MPLS_MC:
		bpf_tail_call(skb, &jmp_table, PARSE_MPLS);
		break;
	case ETH_P_IP:
		bpf_tail_call(skb, &jmp_table, PARSE_IP);
		break;
	case ETH_P_IPV6:
		bpf_tail_call(skb, &jmp_table, PARSE_IPV6);
		break;
	}
}

and these bpf_tail_call() invocations seem to expand to something like:

	call	1 ! PARSE_VLAN
	call	2 ! PARSE_MPLS
	call	3 ! PARSE_IP
	call	4 ! PARSE_IPV6

in the resultant ELF file.

Thanks.

* Re: assembler mnemonics for call/tailcall plus maps...
  2017-04-27 20:42 assembler mnemonics for call/tailcall plus maps David Miller
@ 2017-04-28  2:06   ` Alexei Starovoitov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2017-04-28  2:06 UTC (permalink / raw)
  To: David Miller, daniel; +Cc: netdev, xdp-newbies

On 4/27/17 1:42 PM, David Miller wrote:
>
> Can you guys give me some kind of idea of how it might be nice to
> represent calls and tailcalls in assembler files?

llvm prints them as 'call 1' for bpf_map_lookup
and 'call 12' for bpf_tail_call.
In the instruction stream bpf_tail_call is no different
from a regular call; it only becomes a special instruction
after the verifier has run.
A bare 'call 1234' is obviously not readable.
We should probably allow both forms: the numeric 'call 123',
so that the assembler/compiler doesn't need to change with
every new helper that gets added, and the more readable
'call bpf_map_lookup_elem'
for helpers with a known func_id.
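
To make that concrete: a helper call is just a BPF_JMP|BPF_CALL
instruction whose 32-bit immediate holds the helper's func_id, so both
spellings would assemble to the same bytes. A minimal sketch against
the uapi linux/bpf.h encoding (the function name here is purely
illustrative):

#include <linux/bpf.h>	/* struct bpf_insn, BPF_JMP, BPF_CALL, enum bpf_func_id */

/* Sketch: what "call bpf_map_lookup_elem" has to assemble into.
 * Only the imm field distinguishes one helper call from another. */
static struct bpf_insn encode_helper_call(__s32 func_id)
{
	struct bpf_insn insn = {
		.code = BPF_JMP | BPF_CALL,
		.imm  = func_id,	/* e.g. BPF_FUNC_map_lookup_elem == 1 */
	};

	return insn;
}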

> And also the emission of maps.
>
> Right now I just have the assembler looking for 32-bit immediate
> values for call and tailcall instructions.
>
> Looking at samples/bpf/sockex3_kern.c we have:
>
> struct bpf_map_def SEC("maps") jmp_table = {
> 	.type = BPF_MAP_TYPE_PROG_ARRAY,
> 	.key_size = sizeof(u32),
> 	.value_size = sizeof(u32),
> 	.max_entries = 8,
> };

yeah. this one is tricky.
Currently there is a protocol between the C file's section name
and the user-space bpf loader.
The above 'struct bpf_map_def' is recognized by
samples/bpf/bpf_load.c
and by tools/lib/bpf/libbpf.c, which is used by perf.
iproute2 allows an extended format with two extra fields
for pinning.
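
For reference, the core layout those loaders match on is just four
32-bit fields, mirroring the initializer above (a sketch; the real
definitions in samples/bpf and iproute2 append further fields such as
flags and the pinning bits):

struct bpf_map_def {
	unsigned int type;		/* e.g. BPF_MAP_TYPE_PROG_ARRAY */
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};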

So if we write the same stuff in assembler, the existing
loaders will understand it.
The tricky part is the relocations.
To do bpf_map_lookup_elem() R1 needs to point to the map,
so in assembler we need to be able to say something like:

ldimm64 r1, jmp_table

where jmp_table will live in a data section named 'maps'.

So in asm the map lookup will look like:
	.section        maps,"aw",@progbits
         .globl  hashmap_def
hashmap_def:
         .long   1  # type
         .long   24 # key_size
         .long   40 # value_size
         .long   256 # max_entries

         .text
         .section        xdp_tx_iptunnel,"ax",@progbits
         .globl  _xdp_prog
_xdp_prog:
          ldimm64 r1, hashmap_def
          mov r2, r10
          add r2, -8
          call bpf_map_lookup_elem

this is 64-bit relo for ldimm64 insn

This is how it's defined in llvm:
ELF_RELOC(R_BPF_NONE,        0)
ELF_RELOC(R_BPF_64_64,       1)
ELF_RELOC(R_BPF_64_32,      10)

The R_BPF_64_64 is the only relocation against .text.
The other one is used for relo into dwarf sections.

* Re: assembler mnemonics for call/tailcall plus maps...
  2017-04-28  2:06   ` Alexei Starovoitov
@ 2017-04-29 18:38   ` David Miller
  2017-04-30  6:35       ` Alexei Starovoitov
  -1 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2017-04-29 18:38 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xdp-newbies

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 27 Apr 2017 19:06:14 -0700

> So in asm the map lookup will look like:
> 	.section        maps,"aw",@progbits
>         .globl  hashmap_def
> hashmap_def:
>         .long   1  # type
>         .long   24 # key_size
>         .long   40 # value_size
>         .long   256 # max_entries
> 
>         .text
>         .section        xdp_tx_iptunnel,"ax",@progbits
>         .globl  _xdp_prog
> _xdp_prog:
>          ldimm64 r1, hashmap_def
>          mov r2, r10
>          add r2, -8
>          call bpf_map_lookup_elem
> 
> this is 64-bit relo for ldimm64 insn
> 
> This is how it's defined in llvm:
> ELF_RELOC(R_BPF_NONE,        0)
> ELF_RELOC(R_BPF_64_64,       1)
> ELF_RELOC(R_BPF_64_32,      10)
> 
> The R_BPF_64_64 is the only relocation against .text.
> The other one is used for relo into dwarf sections.

There is another missing element here: we don't use the relocation to
write the actual address of "&hashmap_def" into the ldimm64
instruction.

Instead, we use the relocation to write the file descriptor number for
that map into the instruction.  The only parts of the relocation that
are used are the offset (to find the BPF instruction) and the symbol
reference (to find out which map we are referencing).

The BPF loader also doesn't even check the relocation number to make
sure it's of the correct type.  It just assumes that any relocation it
sees is exactly the kind used for maps.
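
In the spirit of samples/bpf/bpf_load.c, applying one of these map
relocations amounts to roughly the following sketch (the function and
parameter names are illustrative, not the loader's actual ones):

#include <elf.h>
#include <linux/bpf.h>		/* struct bpf_insn, BPF_PSEUDO_MAP_FD */

/* Sketch: apply one map relocation.  Only the relocation's offset
 * (which BPF instruction) and its symbol (which map, hence which fd)
 * are consulted; the relocation type itself is never checked. */
static void apply_map_reloc(struct bpf_insn *prog, const Elf64_Rel *rel,
			    int map_fd)
{
	unsigned int insn_idx = rel->r_offset / sizeof(struct bpf_insn);

	prog[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
	prog[insn_idx].imm = map_fd;
}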

My relocations in binutils currently look like this:

START_RELOC_NUMBERS (elf_bpf_reloc_type)
  RELOC_NUMBER (R_BPF_NONE, 0)
  RELOC_NUMBER (R_BPF_INSN_16, 1)
  RELOC_NUMBER (R_BPF_INSN_32, 2)
  RELOC_NUMBER (R_BPF_INSN_64, 3)
  RELOC_NUMBER (R_BPF_WDISP16, 4)
  RELOC_NUMBER (R_BPF_DATA_8,  5)
  RELOC_NUMBER (R_BPF_DATA_16, 6)
  RELOC_NUMBER (R_BPF_DATA_32, 7)
  RELOC_NUMBER (R_BPF_DATA_64, 8)
END_RELOC_NUMBERS (R_BPF_max)

There is of course R_BPF_NONE, and then we have 4 relocations for
instructions.  One for the 16-bit offset field (non-pc-relative), one
for the 32-bit immediate field (again, non-pc-relative), one for
ldimm64's immediate field, and finally R_BPF_WDISP16 for doing a
pc-relative relocation to the offset field of a jump instruction.

Then we have the usual data relocations, for 8, 16, 32, and 64-bit.

I guess LLVM's R_BPF_64_64 is the one used for the ldimm64 reference
to a map, and is equivalent to R_BPF_INSN_64 above.

We just need to sort out how we want all of this to fit together
semantically, etc.

In binutils we can make the MAP reference special and explicit, so we
would for example add:

	RELOC_NUMBER (R_BPF_MAP, 9)

or whatever.  And then for assembler syntax, use something like:

	%map(SYMBOL)

So you would go:

	ldimm64	r1, %map(hash_map)

or, taking it one step further, do the following since we know this
maps to a 32-bit FD:

	mov32	r1, %map(hash_map)

and this way avoid eating up a 16-byte instruction just for this.

In GCC it will be simple to get the backend to emit this; various
options exist.  We can make it a special "__attribute__((map))", or
use address spaces to annotate the map object.  Then, when the ldimm64
or whatever instruction is emitted and the backend sees that the
referenced symbol has this special type, it will emit "%%map(%s)"
instead of just "%s" for the symbol name in the assembler output.

That in turn will lead to the correct relocation.

But I guess for now what I could do is just make R_BPF_INSN_64 have
the same number as LLVM's R_BPF_64_64 and it should "just work" using
tooling.

I think we should spend serious time properly designing the
relocations and thinking ahead about people perhaps wanting to link
multiple objects together, call functions in other objects, and
perhaps even doing dynamic relocations.  Nothing fundamentally in
eBPF prevents this.

* Re: assembler mnemonics for call/tailcall plus maps...
  2017-04-29 18:38   ` David Miller
@ 2017-04-30  6:35       ` Alexei Starovoitov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2017-04-30  6:35 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev, xdp-newbies

On 4/29/17 11:38 AM, David Miller wrote:
> or whatever.  And then for assembler syntax, use something like:
>
> 	%map(SYMBOL)
>
> So you would go:
>
> 	ldimm64	r1, %map(hash_map)

sure. that works.
The elf loaders should have been checking the relo code, of course.
I guess the above ldimm64 should probably be a special one with
insn->src_reg == BPF_PSEUDO_MAP_FD == 1.
This is how the kernel knows that the ldimm64 carries a map_fd and not
just an arbitrary 64-bit constant.
The idea was to use constants in the src_reg field to mark
different address spaces.
In particular, tracing needs per-task storage to
associate multiple events.
Right now the programs do it like:
u32 pid = (u32)bpf_get_current_pid_tgid();
struct scratch_space *value = bpf_map_lookup_elem(&hashmap, &pid);
// access value->var1, value->var2
The C code could have been much simpler if we could use normal global
variables var1 and var2 marked as being in a 'per-task' address space.
I can imagine such per-task variables would be code=2,
per-cpu variables code=3, and so on.
That was never implemented, unfortunately.

Currently llvm doesn't do any special marking.
It generates a normal ldimm64 with a relocation into the 'maps' section;
the elf loader recognizes that, creates the map, stores the FD via
insn->imm = map_fd and marks it with insn->src_reg = BPF_PSEUDO_MAP_FD
before sending the whole program into the kernel.

> or, taking it one step further, do the following since we know this
> maps to a 32-bit FD:
>
> 	mov32	r1, %map(hash_map)

hence this approach won't work without serious elf loader hacks.
The kernel needs to see ldimm64 because after it validated map_fd,
it will store real 'struct bpf_map *' pointer into this ldimm64
instruction and it will clear 'src_reg' markings.
So from interpreter and from JITs point of view there are no
special ldimm64 instructions. All ldimm64 are moving 64-bit
constant into a register. It's only verifier that knows that
some of these constants are real pointers.
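
In the spirit of replace_map_fd_with_map_ptr() and
convert_pseudo_ld_imm64() in kernel/bpf/verifier.c, that rewrite looks
roughly like this sketch (the function name is made up; 'map' is the
pointer obtained from the validated fd):

#include <linux/bpf.h>		/* struct bpf_insn */

struct bpf_map;			/* kernel-internal type, opaque here */

/* Sketch: split the 64-bit 'struct bpf_map *' across the two slots of
 * the ldimm64 and drop the BPF_PSEUDO_MAP_FD mark, so the interpreter
 * and JITs only ever see a plain 64-bit constant load. */
static void patch_ldimm64_with_map_ptr(struct bpf_insn *insn,
				       struct bpf_map *map)
{
	insn[0].imm = (__u32)(unsigned long)map;
	insn[1].imm = (__u64)(unsigned long)map >> 32;
	insn[0].src_reg = 0;	/* clear BPF_PSEUDO_MAP_FD */
}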

> In GCC it will be simple to get the backend to emit this; various
> options exist.  We can make it a special "__attribute__((map))", or
> use address spaces to annotate the map object.  Then, when the ldimm64
> or whatever instruction is emitted and the backend sees that the
> referenced symbol has this special type, it will emit "%%map(%s)"
> instead of just "%s" for the symbol name in the assembler output.

I like the %map(symbol) idea.
I think it fits the whole thing quite well.
Not sure though how gcc will know that it needs to emit %map(..)

> But I guess for now what I could do is just make R_BPF_INSN_64 have
> the same number as LLVM's R_BPF_64_64 and it should "just work" using
> tooling.

yeah. I don't even remember why the current llvm relo codes are 1 and 10.
There was probably something else in between that was later removed
because it wasn't used, but the numbers stuck.

> I think we should spend serious time properly designing the
> relocations and thinking ahead about people perhaps wanting to link
> multiple objects together, call functions in other objects, and
> perhaps even doing dynamic relocations.  Nothing fundamentally in
> eBPF prevents this.

Yes! Completely agree.

I think we need to treat the kernel<->user encoding of address space
in the ldimm64 insn and the elf relo codes as two different things.
Today BPF_PSEUDO_MAP_FD == 1 and the relo code for an ldimm64 into the
'maps' section is also == 1, which is probably very confusing.
The former is a user->kernel protocol and the latter is a
compiler->loader convention.

The relo 10 thingy is never seen by the elf loader. It's only there
because the generated dwarf data needs to convey info about the
program, so llvm emits relocation sections into the dwarf data with
code=1 and code=10; that's simply how dwarf works.
The only relocation the elf loader cares about is code=1,
and the only src_reg mark the kernel cares about is BPF_PSEUDO_MAP_FD.

I take all the blame for not documenting this thing properly.
The elf loader in samples/bpf/bpf_load.c should have been temporary.
Its only purpose was to be a minimal demo of parsing an elf file and
loading it.
I didn't expect the .o approach to get this far.
My bet was on the iovisor/bcc approach, where an elf file is never
generated: C->bpf is compiled in memory and loaded into the kernel
completely without elf and without relocations.

* Re: assembler mnemonics for call/tailcall plus maps...
  2017-04-30  6:35       ` Alexei Starovoitov
@ 2017-04-30 15:27       ` David Miller
  -1 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2017-04-30 15:27 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, xdp-newbies

From: Alexei Starovoitov <ast@fb.com>
Date: Sat, 29 Apr 2017 23:35:30 -0700

> On 4/29/17 11:38 AM, David Miller wrote:
>> or, taking it one step further, do the following since we know this
>> maps to a 32-bit FD:
>>
>> 	mov32	r1, %map(hash_map)
> 
> hence this approach won't work without serious elf loader hacks.
> The kernel needs to see ldimm64 because after it validated map_fd,
> it will store real 'struct bpf_map *' pointer into this ldimm64
> instruction and it will clear 'src_reg' markings.

I didn't see this part; now it all makes sense why ldimm64 is used,
and I therefore think we should keep it this way.

> So from interpreter and from JITs point of view there are no
> special ldimm64 instructions. All ldimm64 are moving 64-bit
> constant into a register. It's only verifier that knows that
> some of these constants are real pointers.
> 
> In GCC it will be simple to get the backend to emit this; various
> options exist.  We can make it a special "__attribute__((map))", or
> use address spaces to annotate the map object.  Then, when the ldimm64
> or whatever instruction is emitted and the backend sees that the
> referenced symbol has this special type, it will emit "%%map(%s)"
> instead of just "%s" for the symbol name in the assembler output.
> 
> I like the %map(symbol) idea.
> I think it fits the whole thing quite well.
> Not sure though how gcc will know that it needs to emit %map(..)

I just explained it in that paragraph above :-)

struct bpf_map_def SEC("maps") jmp_table __attribute__((map)) = {

And when it is referenced by an instruction, the bpf gcc backend can
see that the "map" attribute is set and emit the appropriate %map()
string into the assembler output.

We can even make the special map attribute do the SEC("") part too.
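
Purely as a sketch of the shape being discussed (the "map" attribute
does not exist yet and the BPF_MAP_DEF macro name is made up), it
could look something like:

/* Hypothetical: one annotation does both the section placement and
 * the marking that makes the backend emit %map() for references. */
#define BPF_MAP_DEF(name)					\
	struct bpf_map_def name					\
	__attribute__((section("maps"), map))

BPF_MAP_DEF(jmp_table) = {
	.type        = BPF_MAP_TYPE_PROG_ARRAY,
	.key_size    = sizeof(u32),
	.value_size  = sizeof(u32),
	.max_entries = 8,
};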

> I take all the blame for not documenting this thing properly.
> The elf loader in samples/bpf/bpf_load.c should have been temporary.
> Its only purpose was to be a minimal demo of parsing an elf file and
> loading it.
> I didn't expect the .o approach to get this far.
> My bet was on the iovisor/bcc approach, where an elf file is never
> generated: C->bpf is compiled in memory and loaded into the kernel
> completely without elf and without relocations.

I think it is better to have real objects, both for introspection
(even after the session is complete) and for testing under simulators
(one of which I plan to write).

And if we linked a real, final static object, the elf header would be
all that is needed to find the execution entry point.
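
For instance (a sketch; 'image' stands for the mapped object file and
the function name is made up):

#include <elf.h>
#include <stdint.h>

/* Sketch: with a fully linked static BPF object, the entry point is
 * read straight out of the ELF header. */
static uint64_t bpf_entry_point(const void *image)
{
	const Elf64_Ehdr *ehdr = image;

	return ehdr->e_entry;
}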
