* assembler mnenomics for call/tailcall plus maps... @ 2017-04-27 20:42 David Miller 2017-04-28 2:06 ` Alexei Starovoitov 0 siblings, 1 reply; 7+ messages in thread From: David Miller @ 2017-04-27 20:42 UTC (permalink / raw) To: ast, daniel; +Cc: netdev, xdp-newbies Can you guys give me some kind of idea of how it might be nice to represent calls and tailcalls in assembler files? And also the emission of maps. Right now I just have the assembler looking for 32-bit immediate values for call and tailcall instructions. Looking at samples/bpf/sockex3_kern.c we have: struct bpf_map_def SEC("maps") jmp_table = { .type = BPF_MAP_TYPE_PROG_ARRAY, .key_size = sizeof(u32), .value_size = sizeof(u32), .max_entries = 8, }; #define PARSE_VLAN 1 #define PARSE_MPLS 2 #define PARSE_IP 3 #define PARSE_IPV6 4 /* protocol dispatch routine. * It tail-calls next BPF program depending on eth proto * Note, we could have used: * bpf_tail_call(skb, &jmp_table, proto); * but it would need large prog_array */ static inline void parse_eth_proto(struct __sk_buff *skb, u32 proto) { switch (proto) { case ETH_P_8021Q: case ETH_P_8021AD: bpf_tail_call(skb, &jmp_table, PARSE_VLAN); break; case ETH_P_MPLS_UC: case ETH_P_MPLS_MC: bpf_tail_call(skb, &jmp_table, PARSE_MPLS); break; case ETH_P_IP: bpf_tail_call(skb, &jmp_table, PARSE_IP); break; case ETH_P_IPV6: bpf_tail_call(skb, &jmp_table, PARSE_IPV6); break; } } and these bpf_tail_call() invocations seem to expand to something like: call 1 ! PARSE_VLAN call 2 ! PARSE_MPLS call 3 ! PARSE_IP call 4 ! PARSE_IPV6 in the resultant ELF file. Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: assembler mnenomics for call/tailcall plus maps... 2017-04-27 20:42 assembler mnenomics for call/tailcall plus maps David Miller @ 2017-04-28 2:06 ` Alexei Starovoitov 0 siblings, 0 replies; 7+ messages in thread From: Alexei Starovoitov @ 2017-04-28 2:06 UTC (permalink / raw) To: David Miller, daniel; +Cc: netdev, xdp-newbies On 4/27/17 1:42 PM, David Miller wrote: > > Can you guys give me some kind of idea of how it might be nice to > represent calls and tailcalls in assembler files? llvm prints them as 'call 1' for bpf_map_lookup and 'call 12' for bpf_tail_call. In the instruction stream bpf_tail_call is no different than regular call. Only after verifier it becomes special instruction. This 'call 1234' is obviously not readable. We probably should allow both 'call 123' to make sure that we don't need to change assembler/compiler with every new helper added and allow more sane 'call bpf_map_lookup_elem' for helpers with known func_id. > And also the emission of maps. > > Right now I just have the assembler looking for 32-bit immediate > values for call and tailcall instructions. > > Looking at samples/bpf/sockex3_kern.c we have: > > struct bpf_map_def SEC("maps") jmp_table = { > .type = BPF_MAP_TYPE_PROG_ARRAY, > .key_size = sizeof(u32), > .value_size = sizeof(u32), > .max_entries = 8, > }; yeah. this one is tricky. Currently there is a protocol between C file section name and user space bpf_loader. The above 'struct bpf_map_def' is recognized by samples/bpf/bpf_load.c and by tools/lib/bpf/libbpf.c which is used by perf. iproute2 allows extended format with two extra fields for pinning. So if we write the same stuff in assembler, the existing loaders will understand it. The tricky part is in relocations. To do bpf_map_lookup_elem() the R1 needs to point to map, so in assembler we need to be able to say something like: ldimm64 r1, jmp_table where jmp_table will be in data section named 'maps'. So in asm the map lookup will look like: .section maps,"aw",@progbits .globl hashmap_def hashmap_def: .long 1 # type .long 24 # key_size .long 40 # value_size .long 256 # max_entries .text .section xdp_tx_iptunnel,"ax",@progbits .globl _xdp_prog _xdp_prog: ldimm64 r1, hashmap_def mov r2, r10 add r2, -8 call bpf_map_lookup_elem this is 64-bit relo for ldimm64 insn This is how it's defined in llvm: ELF_RELOC(R_BPF_NONE, 0) ELF_RELOC(R_BPF_64_64, 1) ELF_RELOC(R_BPF_64_32, 10) The R_BPF_64_64 is the only relocation against .text The other one is used for relo into dwarf sections. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: assembler mnenomics for call/tailcall plus maps... @ 2017-04-28 2:06 ` Alexei Starovoitov 0 siblings, 0 replies; 7+ messages in thread From: Alexei Starovoitov @ 2017-04-28 2:06 UTC (permalink / raw) To: David Miller, daniel; +Cc: netdev, xdp-newbies On 4/27/17 1:42 PM, David Miller wrote: > > Can you guys give me some kind of idea of how it might be nice to > represent calls and tailcalls in assembler files? llvm prints them as 'call 1' for bpf_map_lookup and 'call 12' for bpf_tail_call. In the instruction stream bpf_tail_call is no different than regular call. Only after verifier it becomes special instruction. This 'call 1234' is obviously not readable. We probably should allow both 'call 123' to make sure that we don't need to change assembler/compiler with every new helper added and allow more sane 'call bpf_map_lookup_elem' for helpers with known func_id. > And also the emission of maps. > > Right now I just have the assembler looking for 32-bit immediate > values for call and tailcall instructions. > > Looking at samples/bpf/sockex3_kern.c we have: > > struct bpf_map_def SEC("maps") jmp_table = { > .type = BPF_MAP_TYPE_PROG_ARRAY, > .key_size = sizeof(u32), > .value_size = sizeof(u32), > .max_entries = 8, > }; yeah. this one is tricky. Currently there is a protocol between C file section name and user space bpf_loader. The above 'struct bpf_map_def' is recognized by samples/bpf/bpf_load.c and by tools/lib/bpf/libbpf.c which is used by perf. iproute2 allows extended format with two extra fields for pinning. So if we write the same stuff in assembler, the existing loaders will understand it. The tricky part is in relocations. To do bpf_map_lookup_elem() the R1 needs to point to map, so in assembler we need to be able to say something like: ldimm64 r1, jmp_table where jmp_table will be in data section named 'maps'. So in asm the map lookup will look like: .section maps,"aw",@progbits .globl hashmap_def hashmap_def: .long 1 # type .long 24 # key_size .long 40 # value_size .long 256 # max_entries .text .section xdp_tx_iptunnel,"ax",@progbits .globl _xdp_prog _xdp_prog: ldimm64 r1, hashmap_def mov r2, r10 add r2, -8 call bpf_map_lookup_elem this is 64-bit relo for ldimm64 insn This is how it's defined in llvm: ELF_RELOC(R_BPF_NONE, 0) ELF_RELOC(R_BPF_64_64, 1) ELF_RELOC(R_BPF_64_32, 10) The R_BPF_64_64 is the only relocation against .text The other one is used for relo into dwarf sections. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: assembler mnenomics for call/tailcall plus maps... 2017-04-28 2:06 ` Alexei Starovoitov (?) @ 2017-04-29 18:38 ` David Miller 2017-04-30 6:35 ` Alexei Starovoitov -1 siblings, 1 reply; 7+ messages in thread From: David Miller @ 2017-04-29 18:38 UTC (permalink / raw) To: ast; +Cc: daniel, netdev, xdp-newbies From: Alexei Starovoitov <ast@fb.com> Date: Thu, 27 Apr 2017 19:06:14 -0700 > So in asm the map lookup will look like: > .section maps,"aw",@progbits > .globl hashmap_def > hashmap_def: > .long 1 # type > .long 24 # key_size > .long 40 # value_size > .long 256 # max_entries > > .text > .section xdp_tx_iptunnel,"ax",@progbits > .globl _xdp_prog > _xdp_prog: > ldimm64 r1, hashmap_def > mov r2, r10 > add r2, -8 > call bpf_map_lookup_elem > > this is 64-bit relo for ldimm64 insn > > This is how it's defined in llvm: > ELF_RELOC(R_BPF_NONE, 0) > ELF_RELOC(R_BPF_64_64, 1) > ELF_RELOC(R_BPF_64_32, 10) > > The R_BPF_64_64 is the only relocation against .text > The other one is used for relo into dwarf sections. There is another missing element here, we don't use the relocation to write the actual address of "&hashmap_def" into the ldimm64 instruction. Instead, we use the relocation to write the file descriptor number for that map into the instruction. The only parts of the relocation that are used are the offset (to find the BPF instruction) and the symbol reference (to find out which map we are referencing). The BPF loader also doesn't even check the relocation number to make sure it's of the correct type. It just assumes that any relocation it sees is exactly this kind used for maps. My relocations in binutils currently look like this: START_RELOC_NUMBERS (elf_bpf_reloc_type) RELOC_NUMBER (R_BPF_NONE, 0) RELOC_NUMBER (R_BPF_INSN_16, 1) RELOC_NUMBER (R_BPF_INSN_32, 2) RELOC_NUMBER (R_BPF_INSN_64, 3) RELOC_NUMBER (R_BPF_WDISP16, 4) RELOC_NUMBER (R_BPF_DATA_8, 5) RELOC_NUMBER (R_BPF_DATA_16, 6) RELOC_NUMBER (R_BPF_DATA_32, 7) RELOC_NUMBER (R_BPF_DATA_64, 8) END_RELOC_NUMBERS (R_BPF_max) There is of course R_BPF_NONE, and then we have 4 relocations for instructions. One for the 16-bit offset field (non-pc-relative), one for the 32-bit immediate field (again, non-pc-relative), one for ldimm64's immediate field, and finally R_BPF_WDISP16 for doing a pc-relative relocation to the offset field of a jump instruction. Then we have the usual data relocations, for 8, 16, 32, and 64-bit. I guess LLVM's R_BPF_64_64 is the one used for the ldimm64 reference to a map, and is equivalent to R_BPF_INSN_64 above. We just need to sort out how we want this semantically to interconnect etc. In binutils we can make the MAP reference special and explicit, so we would for example add: RELOC_NUMBER (R_BPF_MAP, 9) or whatever. And then for assembler syntax, use something like: %map(SYMBOL) So you would go: ldimm64 r1, %map(hash_map) or, taking it one step further, do the following since we know this maps to a 32-bit FD: mov32 r1, %map(hash_map) and this way avoid eating up a 16-byte opcode just for this. In GCC it will be simple to get the backend to emit this, various options exist. We can make it a special "__attribute__((map))", or use address spaces to annotate the map object. And then when the ldimm64 or whatever instruction is emitted, and it sees the symbol referenced has this special type, it will emit "%%map(%s)" instead of just "%s" for the symbol name in the asembler output. That in turn will lead to the correct relocation. But I guess for now what I could do is just make R_BPF_INSN_64 have the same number as LLVM's R_BPF_64_64 and it should "just work" using tooling. I think we should spend serious time properly designing the relocations and thinking ahead about people perhaps wanting to link multiple objects together, call functions in other objects, and perhaps even doing dynamic relocations. Nothing fundamentally in eBPF prevents this. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: assembler mnenomics for call/tailcall plus maps... 2017-04-29 18:38 ` David Miller @ 2017-04-30 6:35 ` Alexei Starovoitov 0 siblings, 0 replies; 7+ messages in thread From: Alexei Starovoitov @ 2017-04-30 6:35 UTC (permalink / raw) To: David Miller; +Cc: daniel, netdev, xdp-newbies On 4/29/17 11:38 AM, David Miller wrote: > or whatever. And then for assembler syntax, use something like: > > %map(SYMBOL) > > So you would go: > > ldimm64 r1, %map(hash_map) sure. that works. The elf loaders should have checked relo code, of course. I guess the above ldimm64 should probably be a special one with insn->src_reg == BPF_PSEUDO_MAP_FD == 1 This is how kernel knows that ldimm64 carries map_fd and not just arbitrary 64-bit constant. The idea was to use constants in src_reg field to mark different address spaces. In particular tracing needs per-task storage space to associate multiple events. Right now the programs do it like: u32 pid = (u32)bpf_get_current_pid_tgid(); struct scratch_space *value = bpf_map_lookup_elem(&hashmap, &pid); // access value->var1, value->var2 The C code could have been much simpler if we could use normal global variables var1 and var2 marked as 'per-task' address space. I can imagine such per-task variables would be code=2, per-cpu variables code=3 and so on. That was never implemented, unfortunately. Currently llvm doesn't do any special markings. It generates normal ldimm64 with relocation into 'maps' section then elf loader recognizes that, it creates a map, stores FD into insn->imm = map_fd and marks it insn->src_reg = BPF_PSEUDO_MAP_FD before sending the whole program into the kernel. > or, taking it one step further, do the following since we know this > maps to a 32-bit FD: > > mov32 r1, %map(hash_map) hence this approach won't work without serious elf loader hacks. The kernel needs to see ldimm64 because after it validated map_fd, it will store real 'struct bpf_map *' pointer into this ldimm64 instruction and it will clear 'src_reg' markings. So from interpreter and from JITs point of view there are no special ldimm64 instructions. All ldimm64 are moving 64-bit constant into a register. It's only verifier that knows that some of these constants are real pointers. > In GCC it will be simple to get the backend to emit this, various > options exist. We can make it a special "__attribute__((map))", or > use address spaces to annotate the map object. And then when the > ldimm64 or whatever instruction is emitted, and it sees the symbol > referenced has this special type, it will emit "%%map(%s)" instead of > just "%s" for the symbol name in the asembler output. I like the %map(symbol) idea. I think it fits the whole thing quite well. Not sure though how gcc will know that it needs to emit %map(..) > But I guess for now what I could do is just make R_BPF_INSN_64 have > the same number as LLVM's R_BPF_64_64 and it should "just work" using > tooling. yeah. I don't even remember why current llvm relo codes are 1 and 10. Probably had something else in between, but then removed, because it wasn't used, but the numbers stuck. > I think we should spend serious time properly designing the > relocations and thinking ahead about people perhaps wanting to link > multiple objects together, call functions in other objects, and > perhaps even doing dynamic relocations. Nothing fundamentally in > eBPF prevents this. Yes! Completely agree. I think we need to treat kernel<->user encoding of address space for ldimm64 insn and elf relo codes differently. Today BPF_PSEUDO_MAP_FD == 1 and relo code for ldimm64 into map section is also == 1. These two are probably very confusing. The former is user->kernel protocol and the latter is compiler->loader convention. The relo 10 thingy is never seen by elf loader. It's only there because generated dwarf data need to convey info about the program, so llvm emits .relo section into dwarf data with code=1 and code=10. It's only there because this is how dwarf works. The only relocation that elf loader cares about is code=1 and the only src_reg mark that kernel cares about is BPF_PSEUDO_MAP_FD. I take all the blame for not documenting this thing properly. The elf loader in samples/bpf/bpf_load.c should have been temporary. Its only purpose was to have minimal demo to parse elf and load it. I didn't expect the .o approach to come that far. My bet was on iovisor/bcc approach where elf file is never generated. C->bpf is compiled in memory and loaded into the kernel completely without elf and without relocations. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: assembler mnenomics for call/tailcall plus maps... @ 2017-04-30 6:35 ` Alexei Starovoitov 0 siblings, 0 replies; 7+ messages in thread From: Alexei Starovoitov @ 2017-04-30 6:35 UTC (permalink / raw) To: David Miller; +Cc: daniel, netdev, xdp-newbies On 4/29/17 11:38 AM, David Miller wrote: > or whatever. And then for assembler syntax, use something like: > > %map(SYMBOL) > > So you would go: > > ldimm64 r1, %map(hash_map) sure. that works. The elf loaders should have checked relo code, of course. I guess the above ldimm64 should probably be a special one with insn->src_reg == BPF_PSEUDO_MAP_FD == 1 This is how kernel knows that ldimm64 carries map_fd and not just arbitrary 64-bit constant. The idea was to use constants in src_reg field to mark different address spaces. In particular tracing needs per-task storage space to associate multiple events. Right now the programs do it like: u32 pid = (u32)bpf_get_current_pid_tgid(); struct scratch_space *value = bpf_map_lookup_elem(&hashmap, &pid); // access value->var1, value->var2 The C code could have been much simpler if we could use normal global variables var1 and var2 marked as 'per-task' address space. I can imagine such per-task variables would be code=2, per-cpu variables code=3 and so on. That was never implemented, unfortunately. Currently llvm doesn't do any special markings. It generates normal ldimm64 with relocation into 'maps' section then elf loader recognizes that, it creates a map, stores FD into insn->imm = map_fd and marks it insn->src_reg = BPF_PSEUDO_MAP_FD before sending the whole program into the kernel. > or, taking it one step further, do the following since we know this > maps to a 32-bit FD: > > mov32 r1, %map(hash_map) hence this approach won't work without serious elf loader hacks. The kernel needs to see ldimm64 because after it validated map_fd, it will store real 'struct bpf_map *' pointer into this ldimm64 instruction and it will clear 'src_reg' markings. So from interpreter and from JITs point of view there are no special ldimm64 instructions. All ldimm64 are moving 64-bit constant into a register. It's only verifier that knows that some of these constants are real pointers. > In GCC it will be simple to get the backend to emit this, various > options exist. We can make it a special "__attribute__((map))", or > use address spaces to annotate the map object. And then when the > ldimm64 or whatever instruction is emitted, and it sees the symbol > referenced has this special type, it will emit "%%map(%s)" instead of > just "%s" for the symbol name in the asembler output. I like the %map(symbol) idea. I think it fits the whole thing quite well. Not sure though how gcc will know that it needs to emit %map(..) > But I guess for now what I could do is just make R_BPF_INSN_64 have > the same number as LLVM's R_BPF_64_64 and it should "just work" using > tooling. yeah. I don't even remember why current llvm relo codes are 1 and 10. Probably had something else in between, but then removed, because it wasn't used, but the numbers stuck. > I think we should spend serious time properly designing the > relocations and thinking ahead about people perhaps wanting to link > multiple objects together, call functions in other objects, and > perhaps even doing dynamic relocations. Nothing fundamentally in > eBPF prevents this. Yes! Completely agree. I think we need to treat kernel<->user encoding of address space for ldimm64 insn and elf relo codes differently. Today BPF_PSEUDO_MAP_FD == 1 and relo code for ldimm64 into map section is also == 1. These two are probably very confusing. The former is user->kernel protocol and the latter is compiler->loader convention. The relo 10 thingy is never seen by elf loader. It's only there because generated dwarf data need to convey info about the program, so llvm emits .relo section into dwarf data with code=1 and code=10. It's only there because this is how dwarf works. The only relocation that elf loader cares about is code=1 and the only src_reg mark that kernel cares about is BPF_PSEUDO_MAP_FD. I take all the blame for not documenting this thing properly. The elf loader in samples/bpf/bpf_load.c should have been temporary. Its only purpose was to have minimal demo to parse elf and load it. I didn't expect the .o approach to come that far. My bet was on iovisor/bcc approach where elf file is never generated. C->bpf is compiled in memory and loaded into the kernel completely without elf and without relocations. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: assembler mnenomics for call/tailcall plus maps... 2017-04-30 6:35 ` Alexei Starovoitov (?) @ 2017-04-30 15:27 ` David Miller -1 siblings, 0 replies; 7+ messages in thread From: David Miller @ 2017-04-30 15:27 UTC (permalink / raw) To: ast; +Cc: daniel, netdev, xdp-newbies From: Alexei Starovoitov <ast@fb.com> Date: Sat, 29 Apr 2017 23:35:30 -0700 > On 4/29/17 11:38 AM, David Miller wrote: >> or, taking it one step further, do the following since we know this >> maps to a 32-bit FD: >> >> mov32 r1, %map(hash_map) > > hence this approach won't work without serious elf loader hacks. > The kernel needs to see ldimm64 because after it validated map_fd, > it will store real 'struct bpf_map *' pointer into this ldimm64 > instruction and it will clear 'src_reg' markings. I didn't see this part, now it all makes sense why ldimm64 is used and I therefore think we should keep it this way. > So from interpreter and from JITs point of view there are no > special ldimm64 instructions. All ldimm64 are moving 64-bit > constant into a register. It's only verifier that knows that > some of these constants are real pointers. > >> In GCC it will be simple to get the backend to emit this, various >> options exist. We can make it a special "__attribute__((map))", or >> use address spaces to annotate the map object. And then when the >> ldimm64 or whatever instruction is emitted, and it sees the symbol >> referenced has this special type, it will emit "%%map(%s)" instead of >> just "%s" for the symbol name in the asembler output. > > I like the %map(symbol) idea. > I think it fits the whole thing quite well. > Not sure though how gcc will know that it needs to emit %map(..) I just explained it in that paragraph above :-) struct bpf_map_def SEC("maps") jmp_table __attribute__((map)) = { And when referenced by an instruction the bpf gcc backend can see that the "map" attribute is set and emit the appropriate %map() string into the assembler. We can even make the special map attribute do the SEC("") part too. > I take all the blame for not documenting this thing properly. > The elf loader in samples/bpf/bpf_load.c should have been temporary. > Its only purpose was to have minimal demo to parse elf and load it. > I didn't expect the .o approach to come that far. > My bet was on iovisor/bcc approach where elf file is never generated. > C->bpf is compiled in memory and loaded into the kernel completely > without elf and without relocations. I think it is better to have real objects for introspection (even after session is complete) and for testing under simulators (one of which I plan to write). And if we linked a real final static object, elf header would be all that would be needed to find execution entry point. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-04-30 15:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-27 20:42 assembler mnenomics for call/tailcall plus maps David Miller 2017-04-28 2:06 ` Alexei Starovoitov 2017-04-28 2:06 ` Alexei Starovoitov 2017-04-29 18:38 ` David Miller 2017-04-30 6:35 ` Alexei Starovoitov 2017-04-30 6:35 ` Alexei Starovoitov 2017-04-30 15:27 ` David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.