* [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

Hi All,

enough RFCs, let's finalize it...

Andy, Kees, please take a look at verifier and syscall API once again.
I hope I addressed all of your comments.

Peter, I played with 'lea [%rip+off]' tricks in the JIT as suggested by Andy, but
there was no performance gain and the JIT main loop got quite complicated, so
I left it as-is with movabsq. I hope you're ok with that for now; let's revisit it later.

Steven, Namhyung, please review the way it attaches to tracing
(tracepoints, syscalls, kprobes).

Brendan, I've added example#3, which is heavily influenced by your heatmap graphs
and ACM paper. It measures disk IO latency and prints a heatmap in a text terminal
using shades of gray. Looks cool. I've wasted a day trying to make the heatmap
as fancy as the one on your slides, but I guess my SSDs are too predictable :)

Fully tested on x64 and i386.
Build/boot tested on arm/sparc with NET and NET-less configs.
(There are warnings regarding the unimplemented syscall, of course.)

V4->V5:
- while playing with tracing examples Brendan noticed that the x64 JIT missed
  'shift by register' support. Fixed and added to the testsuite (patch 0001)
- fixed BPF_LD_IMM64 encoding (as suggested by Andy) (patch 0002)
  and added verifier tests for this insn (patch 0024)
- enabled bpf syscall on i386 as well (patch 0005)
- added more comments to verifier around bounds checking,
  since the logic was confusing to Kees earlier (patch 0014)
- split eBPF out of NET via hidden CONFIG_BPF (patch 0017)
- added bpf_ktime_get_ns() and tracing example#3 (as suggested by Brendan)
  (patch 0021 and 0029)
- added a bunch more verifier tests. 42 tests so far (patch 0024)
- dropped the ebpf+sockets patch and examples; there was not enough public
  discussion on it, whereas ebpf+tracing got a lot of mileage

I still owe Brendan better strings support and map[stack()]++ so that
'flame graphs' can work :)
Right now I'd like to focus on getting the current set in, since it's
very useful for performance analysis already.

Steven, if you don't like access to trace_printk() from eBPF programs,
I can drop patch 0019 for now, but we'd need to think of a way to print
things from programs and trace_printk() looks like the best fit.

Netdev folks, this patch set doesn't affect any networking bits, but
obviously I would like to apply the technology in the networking space.
Where and how is tbd. We're still discussing ovs+bpf.

All, most of the diff is the LLVM backend (patch 0025); I think it makes
sense to keep it in the kernel tree for now. Once the backend is upstreamed
into LLVM, we can remove it from here.

IMO this is a pretty solid base for all sorts of things.

previous V4 discussion:
https://lkml.org/lkml/2014/8/13/111

V3->V4:
- introduced 'load 64-bit immediate' eBPF instruction
- use BPF_LD_IMM64 in LLVM, verifier, programs
- got rid of 'fixup' section in eBPF programs
- got rid of map IDR and internal map_id
- split verifier into 6 patches and added verifier testsuite
- added verifier check for reserved instruction fields
- fixed bug in LLVM eBPF backend (it was miscompiling __builtin_expect)
- fixed race condition in htab_map_update_elem()
- tracing filters can now attach to tracepoint, syscall, kprobe events
- improved C examples 

V2->V3:
- fixed verifier register range bug and addressed other comments (Thanks Kees!)
- re-added LLVM eBPF backend
- added two examples in C
- user space ELF parser and loader example

V1->V2:
- got rid of global id, everything now FD based (Thanks Andy!)
- split type enum in verifier (as suggested by Andy and Namhyung)
- switched gpl enforcement to be kmod like (as suggested by Andy and David)
- addressed feedback from Namhyung, Chema, Joe
- added more comments to verifier
- renamed sock_filter_int -> bpf_insn
- rebased on net-next

As always all patches are available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf master

------

Alexei Starovoitov (29):
  bpf: x86: add missing 'shift by register' instructions to x64 eBPF
    JIT
  net: filter: add "load 64-bit immediate" eBPF instruction
  net: filter: split filter.h and expose eBPF to user space
  bpf: introduce syscall(BPF, ...) and BPF maps
  bpf: enable bpf syscall on x64 and i386
  bpf: add lookup/update/delete/iterate methods to BPF maps
  bpf: add hashtable type of BPF maps
  bpf: expand BPF syscall with program load/unload
  bpf: handle pseudo BPF_CALL insn
  bpf: verifier (add docs)
  bpf: verifier (add ability to receive verification log)
  bpf: handle pseudo BPF_LD_IMM64 insn
  bpf: verifier (add branch/goto checks)
  bpf: verifier (add verifier core)
  bpf: verifier (add state pruning optimization)
  bpf: allow eBPF programs to use maps
  bpf: split eBPF out of NET
  tracing: allow eBPF programs to be attached to events
  tracing: allow eBPF programs call printk()
  tracing: allow eBPF programs to be attached to kprobe/kretprobe
  tracing: allow eBPF programs to call ktime_get_ns() and get_current()
  samples: bpf: add mini eBPF library to manipulate maps and programs
  samples: bpf: example of tracing filters with eBPF
  bpf: verifier test
  bpf: llvm backend
  samples: bpf: elf file loader
  samples: bpf: eBPF example in C
  samples: bpf: counting eBPF example in C
  samples: bpf: IO latency analysis (iosnoop/heatmap)

 Documentation/networking/filter.txt                |  309 +++-
 arch/Kconfig                                       |    4 +
 arch/x86/net/bpf_jit_comp.c                        |   59 +
 arch/x86/syscalls/syscall_32.tbl                   |    1 +
 arch/x86/syscalls/syscall_64.tbl                   |    1 +
 fs/btrfs/super.c                                   |    3 +
 include/linux/bpf.h                                |  140 ++
 include/linux/filter.h                             |  303 +---
 include/linux/ftrace_event.h                       |    5 +
 include/linux/syscalls.h                           |    3 +-
 include/trace/bpf_trace.h                          |   23 +
 include/trace/ftrace.h                             |   25 +
 include/uapi/asm-generic/unistd.h                  |    4 +-
 include/uapi/linux/Kbuild                          |    1 +
 include/uapi/linux/bpf.h                           |  424 +++++
 kernel/Makefile                                    |    2 +-
 kernel/bpf/Makefile                                |    2 +-
 kernel/bpf/core.c                                  |   17 +
 kernel/bpf/hashtab.c                               |  372 ++++
 kernel/bpf/syscall.c                               |  658 +++++++
 kernel/bpf/verifier.c                              | 1911 ++++++++++++++++++++
 kernel/sys_ni.c                                    |    3 +
 kernel/trace/Kconfig                               |    1 +
 kernel/trace/Makefile                              |    1 +
 kernel/trace/bpf_trace.c                           |  264 +++
 kernel/trace/trace.h                               |    3 +
 kernel/trace/trace_events.c                        |   41 +-
 kernel/trace/trace_events_filter.c                 |   72 +-
 kernel/trace/trace_kprobe.c                        |   28 +
 kernel/trace/trace_syscalls.c                      |   32 +
 lib/test_bpf.c                                     |   59 +
 net/Kconfig                                        |    1 +
 net/core/filter.c                                  |    2 +
 samples/bpf/Makefile                               |   28 +
 samples/bpf/bpf_helpers.h                          |   27 +
 samples/bpf/bpf_load.c                             |  234 +++
 samples/bpf/bpf_load.h                             |   26 +
 samples/bpf/dropmon.c                              |  122 ++
 samples/bpf/ex1_kern.c                             |   27 +
 samples/bpf/ex1_user.c                             |   24 +
 samples/bpf/ex2_kern.c                             |   73 +
 samples/bpf/ex2_user.c                             |   94 +
 samples/bpf/ex3_kern.c                             |  104 ++
 samples/bpf/ex3_user.c                             |  149 ++
 samples/bpf/libbpf.c                               |  138 ++
 samples/bpf/libbpf.h                               |   21 +
 samples/bpf/test_verifier.c                        |  599 ++++++
 tools/bpf/llvm/.gitignore                          |   54 +
 tools/bpf/llvm/LICENSE.TXT                         |   70 +
 tools/bpf/llvm/Makefile.rules                      |  641 +++++++
 tools/bpf/llvm/README.txt                          |   23 +
 tools/bpf/llvm/bld/Makefile                        |   27 +
 tools/bpf/llvm/bld/Makefile.common                 |   14 +
 tools/bpf/llvm/bld/Makefile.config                 |  124 ++
 .../llvm/bld/include/llvm/Config/AsmParsers.def    |    8 +
 .../llvm/bld/include/llvm/Config/AsmPrinters.def   |    9 +
 .../llvm/bld/include/llvm/Config/Disassemblers.def |    8 +
 tools/bpf/llvm/bld/include/llvm/Config/Targets.def |    9 +
 .../bpf/llvm/bld/include/llvm/Support/DataTypes.h  |   96 +
 tools/bpf/llvm/bld/lib/Makefile                    |   11 +
 .../llvm/bld/lib/Target/BPF/InstPrinter/Makefile   |   10 +
 .../llvm/bld/lib/Target/BPF/MCTargetDesc/Makefile  |   11 +
 tools/bpf/llvm/bld/lib/Target/BPF/Makefile         |   17 +
 .../llvm/bld/lib/Target/BPF/TargetInfo/Makefile    |   10 +
 tools/bpf/llvm/bld/lib/Target/Makefile             |   11 +
 tools/bpf/llvm/bld/tools/Makefile                  |   12 +
 tools/bpf/llvm/bld/tools/llc/Makefile              |   15 +
 tools/bpf/llvm/lib/Target/BPF/BPF.h                |   28 +
 tools/bpf/llvm/lib/Target/BPF/BPF.td               |   29 +
 tools/bpf/llvm/lib/Target/BPF/BPFAsmPrinter.cpp    |  100 +
 tools/bpf/llvm/lib/Target/BPF/BPFCallingConv.td    |   24 +
 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.cpp |   36 +
 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.h   |   35 +
 tools/bpf/llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp  |  182 ++
 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.cpp  |  683 +++++++
 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.h    |  105 ++
 tools/bpf/llvm/lib/Target/BPF/BPFInstrFormats.td   |   29 +
 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.cpp     |  162 ++
 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.h       |   53 +
 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.td      |  498 +++++
 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.cpp   |   77 +
 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.h     |   40 +
 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.cpp  |  122 ++
 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.h    |   65 +
 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.td   |   39 +
 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.cpp     |   23 +
 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.h       |   33 +
 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.cpp |   66 +
 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.h   |   69 +
 .../lib/Target/BPF/InstPrinter/BPFInstPrinter.cpp  |   81 +
 .../lib/Target/BPF/InstPrinter/BPFInstPrinter.h    |   34 +
 .../lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp  |   89 +
 .../llvm/lib/Target/BPF/MCTargetDesc/BPFBaseInfo.h |   33 +
 .../Target/BPF/MCTargetDesc/BPFELFObjectWriter.cpp |   56 +
 .../lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h     |   34 +
 .../Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp   |  167 ++
 .../Target/BPF/MCTargetDesc/BPFMCTargetDesc.cpp    |  115 ++
 .../lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.h  |   56 +
 .../lib/Target/BPF/TargetInfo/BPFTargetInfo.cpp    |   13 +
 tools/bpf/llvm/tools/llc/llc.cpp                   |  381 ++++
 100 files changed, 10875 insertions(+), 302 deletions(-)
 create mode 100644 include/linux/bpf.h
 create mode 100644 include/trace/bpf_trace.h
 create mode 100644 include/uapi/linux/bpf.h
 create mode 100644 kernel/bpf/hashtab.c
 create mode 100644 kernel/bpf/syscall.c
 create mode 100644 kernel/bpf/verifier.c
 create mode 100644 kernel/trace/bpf_trace.c
 create mode 100644 samples/bpf/Makefile
 create mode 100644 samples/bpf/bpf_helpers.h
 create mode 100644 samples/bpf/bpf_load.c
 create mode 100644 samples/bpf/bpf_load.h
 create mode 100644 samples/bpf/dropmon.c
 create mode 100644 samples/bpf/ex1_kern.c
 create mode 100644 samples/bpf/ex1_user.c
 create mode 100644 samples/bpf/ex2_kern.c
 create mode 100644 samples/bpf/ex2_user.c
 create mode 100644 samples/bpf/ex3_kern.c
 create mode 100644 samples/bpf/ex3_user.c
 create mode 100644 samples/bpf/libbpf.c
 create mode 100644 samples/bpf/libbpf.h
 create mode 100644 samples/bpf/test_verifier.c
 create mode 100644 tools/bpf/llvm/.gitignore
 create mode 100644 tools/bpf/llvm/LICENSE.TXT
 create mode 100644 tools/bpf/llvm/Makefile.rules
 create mode 100644 tools/bpf/llvm/README.txt
 create mode 100644 tools/bpf/llvm/bld/Makefile
 create mode 100644 tools/bpf/llvm/bld/Makefile.common
 create mode 100644 tools/bpf/llvm/bld/Makefile.config
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/AsmParsers.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/AsmPrinters.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/Disassemblers.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/Targets.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Support/DataTypes.h
 create mode 100644 tools/bpf/llvm/bld/lib/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/InstPrinter/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/MCTargetDesc/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/TargetInfo/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/Makefile
 create mode 100644 tools/bpf/llvm/bld/tools/Makefile
 create mode 100644 tools/bpf/llvm/bld/tools/llc/Makefile
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPF.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPF.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFAsmPrinter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFCallingConv.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrFormats.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFBaseInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFELFObjectWriter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/TargetInfo/BPFTargetInfo.cpp
 create mode 100644 tools/bpf/llvm/tools/llc/llc.cpp

-- 
1.7.9.5



* [PATCH v5 net-next 01/29] bpf: x86: add missing 'shift by register' instructions to x64 eBPF JIT
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

'shift by register' operations are supported by the eBPF interpreter, but were
accidentally left out of the x64 JIT compiler. Fix it and add a testcase.
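
For context: x86 encodes a variable shift count only in the %cl register
(opcode 0xD3 with /4, /5, /7 for shl/shr/sar), so the JIT has to move the
source register into rcx and preserve rcx around the shift. A rough sketch of
the emitted sequence for the common case (neither operand lives in rcx),
in the same 'dst, src' notation the JIT comments use:

  push rcx            /* rcx may hold a live eBPF register */
  mov  rcx, src_reg   /* shift count into cl */
  shl  dst_reg, cl    /* or shr/sar depending on BPF_OP */
  pop  rcx            /* restore rcx */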

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 arch/x86/net/bpf_jit_comp.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 lib/test_bpf.c              |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 5c8cb8043c5a..b08a98c59530 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -515,6 +515,48 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			EMIT3(0xC1, add_1reg(b3, dst_reg), imm32);
 			break;
 
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_ARSH | BPF_X:
+		case BPF_ALU64 | BPF_LSH | BPF_X:
+		case BPF_ALU64 | BPF_RSH | BPF_X:
+		case BPF_ALU64 | BPF_ARSH | BPF_X:
+
+			/* check for bad case when dst_reg == rcx */
+			if (dst_reg == BPF_REG_4) {
+				/* mov r11, dst_reg */
+				EMIT_mov(AUX_REG, dst_reg);
+				dst_reg = AUX_REG;
+			}
+
+			if (src_reg != BPF_REG_4) { /* common case */
+				EMIT1(0x51); /* push rcx */
+
+				/* mov rcx, src_reg */
+				EMIT_mov(BPF_REG_4, src_reg);
+			}
+
+			/* shl %rax, %cl | shr %rax, %cl | sar %rax, %cl */
+			if (BPF_CLASS(insn->code) == BPF_ALU64)
+				EMIT1(add_1mod(0x48, dst_reg));
+			else if (is_ereg(dst_reg))
+				EMIT1(add_1mod(0x40, dst_reg));
+
+			switch (BPF_OP(insn->code)) {
+			case BPF_LSH: b3 = 0xE0; break;
+			case BPF_RSH: b3 = 0xE8; break;
+			case BPF_ARSH: b3 = 0xF8; break;
+			}
+			EMIT2(0xD3, add_1reg(b3, dst_reg));
+
+			if (src_reg != BPF_REG_4)
+				EMIT1(0x59); /* pop rcx */
+
+			if (insn->dst_reg == BPF_REG_4)
+				/* mov dst_reg, r11 */
+				EMIT_mov(insn->dst_reg, AUX_REG);
+			break;
+
 		case BPF_ALU | BPF_END | BPF_FROM_BE:
 			switch (imm32) {
 			case 16:
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 89e0345733bd..8c66c6aace04 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -1342,6 +1342,44 @@ static struct bpf_test tests[] = {
 		{ { 0, -1 } }
 	},
 	{
+		"INT: shifts by register",
+		.u.insns_int = {
+			BPF_MOV64_IMM(R0, -1234),
+			BPF_MOV64_IMM(R1, 1),
+			BPF_ALU32_REG(BPF_RSH, R0, R1),
+			BPF_JMP_IMM(BPF_JEQ, R0, 0x7ffffd97, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(R2, 1),
+			BPF_ALU64_REG(BPF_LSH, R0, R2),
+			BPF_MOV32_IMM(R4, -1234),
+			BPF_JMP_REG(BPF_JEQ, R0, R4, 1),
+			BPF_EXIT_INSN(),
+			BPF_ALU64_IMM(BPF_AND, R4, 63),
+			BPF_ALU64_REG(BPF_LSH, R0, R4), /* R0 <<= 46 */
+			BPF_MOV64_IMM(R3, 47),
+			BPF_ALU64_REG(BPF_ARSH, R0, R3),
+			BPF_JMP_IMM(BPF_JEQ, R0, -617, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(R2, 1),
+			BPF_ALU64_REG(BPF_LSH, R4, R2), /* R4 = 46 << 1 */
+			BPF_JMP_IMM(BPF_JEQ, R4, 92, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(R4, 4),
+			BPF_ALU64_REG(BPF_LSH, R4, R4), /* R4 = 4 << 4 */
+			BPF_JMP_IMM(BPF_JEQ, R4, 64, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(R4, 5),
+			BPF_ALU32_REG(BPF_LSH, R4, R4), /* R4 = 5 << 5 */
+			BPF_JMP_IMM(BPF_JEQ, R4, 160, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(R0, -1),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, -1 } }
+	},
+	{
 		"INT: DIV + ABS",
 		.u.insns_int = {
 			BPF_ALU64_REG(BPF_MOV, R6, R1),
-- 
1.7.9.5



* [PATCH v5 net-next 02/29] net: filter: add "load 64-bit immediate" eBPF instruction
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

add BPF_LD_IMM64 instruction to load a 64-bit immediate value into a register.
All previous instructions were 8 bytes. This is the first 16-byte instruction.
Two consecutive 'struct bpf_insn' blocks are interpreted as a single instruction:
insn[0].code = BPF_LD | BPF_DW | BPF_IMM
insn[0].dst_reg = destination register
insn[0].imm = lower 32 bits
insn[1].code = 0
insn[1].imm = upper 32 bits
All unused fields must be zero.
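
For illustration, the interpreter (kernel/bpf/core.c below) reassembles the
value from the two halves like this:

  /* sketch: reconstruct the 64-bit immediate from the insn pair */
  u64 imm64 = (u64) (u32) insn[0].imm | ((u64) (u32) insn[1].imm) << 32;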

Classic BPF has a similar instruction: BPF_LD | BPF_W | BPF_IMM,
which loads a 32-bit immediate value into a register.

x64 JITs it as a single 'movabsq %rax, imm64'.
arm64 may JIT it as a sequence of four 'movk x0, #imm16, lsl #shift' insns.

Note that old eBPF programs remain binary compatible with the new interpreter.

In the following patches this instruction is used to store eBPF map pointers:
 BPF_LD_IMM64(R1, const_imm_map_ptr)
 BPF_CALL(BPF_FUNC_map_lookup_elem)
and a verifier is introduced to check the validity of such programs.

Later the LLVM compiler uses this insn as a generic load of a 64-bit immediate
constant and as a load of a map pointer with relocation.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 Documentation/networking/filter.txt |    8 +++++++-
 arch/x86/net/bpf_jit_comp.c         |   17 +++++++++++++++++
 include/linux/filter.h              |   18 ++++++++++++++++++
 kernel/bpf/core.c                   |    5 +++++
 lib/test_bpf.c                      |   21 +++++++++++++++++++++
 5 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index c48a9704bda8..81916ab5d96f 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -951,7 +951,7 @@ Size modifier is one of ...
 
 Mode modifier is one of:
 
-  BPF_IMM  0x00  /* classic BPF only, reserved in eBPF */
+  BPF_IMM  0x00  /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
   BPF_ABS  0x20
   BPF_IND  0x40
   BPF_MEM  0x60
@@ -995,6 +995,12 @@ BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
 Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
 2 byte atomic increments are not supported.
 
+eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM, which consists
+of two consecutive 'struct bpf_insn' 8-byte blocks and is interpreted as a
+single instruction that loads a 64-bit immediate value into dst_reg.
+Classic BPF has a similar instruction: BPF_LD | BPF_W | BPF_IMM, which loads
+a 32-bit immediate value into a register.
+
 Testing
 -------
 
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index b08a98c59530..98837147ee57 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -393,6 +393,23 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			EMIT1_off32(add_1reg(0xB8, dst_reg), imm32);
 			break;
 
+		case BPF_LD | BPF_IMM | BPF_DW:
+			if (insn[1].code != 0 || insn[1].src_reg != 0 ||
+			    insn[1].dst_reg != 0 || insn[1].off != 0) {
+				/* verifier must catch invalid insns */
+				pr_err("invalid BPF_LD_IMM64 insn\n");
+				return -EINVAL;
+			}
+
+			/* movabsq %rax, imm64 */
+			EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
+			EMIT(insn[0].imm, 4);
+			EMIT(insn[1].imm, 4);
+
+			insn++;
+			i++;
+			break;
+
 			/* dst %= src, dst /= src, dst %= imm32, dst /= imm32 */
 		case BPF_ALU | BPF_MOD | BPF_X:
 		case BPF_ALU | BPF_DIV | BPF_X:
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a5227ab8ccb1..f3262b598262 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -161,6 +161,24 @@ enum {
 		.off   = 0,					\
 		.imm   = IMM })
 
+/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
+#define BPF_LD_IMM64(DST, IMM)					\
+	BPF_LD_IMM64_RAW(DST, 0, IMM)
+
+#define BPF_LD_IMM64_RAW(DST, SRC, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_DW | BPF_IMM,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = (__u32) (IMM) }),			\
+	((struct bpf_insn) {					\
+		.code  = 0, /* zero is reserved opcode */	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = ((__u64) (IMM)) >> 32 })
+
 /* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
 
 #define BPF_MOV64_RAW(TYPE, DST, SRC, IMM)			\
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 7f0dbcbb34af..0434c2170f2b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -180,6 +180,7 @@ static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
 		[BPF_LD | BPF_IND | BPF_W] = &&LD_IND_W,
 		[BPF_LD | BPF_IND | BPF_H] = &&LD_IND_H,
 		[BPF_LD | BPF_IND | BPF_B] = &&LD_IND_B,
+		[BPF_LD | BPF_IMM | BPF_DW] = &&LD_IMM_DW,
 	};
 	void *ptr;
 	int off;
@@ -239,6 +240,10 @@ select_insn:
 	ALU64_MOV_K:
 		DST = IMM;
 		CONT;
+	LD_IMM_DW:
+		DST = (u64) (u32) insn[0].imm | ((u64) (u32) insn[1].imm) << 32;
+		insn++;
+		CONT;
 	ALU64_ARSH_X:
 		(*(s64 *) &DST) >>= SRC;
 		CONT;
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 8c66c6aace04..46ab1a7ef135 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -1735,6 +1735,27 @@ static struct bpf_test tests[] = {
 		{ },
 		{ { 1, 0 } },
 	},
+	{
+		"load 64-bit immediate",
+		.u.insns_int = {
+			BPF_LD_IMM64(R1, 0x567800001234L),
+			BPF_MOV64_REG(R2, R1),
+			BPF_MOV64_REG(R3, R2),
+			BPF_ALU64_IMM(BPF_RSH, R2, 32),
+			BPF_ALU64_IMM(BPF_LSH, R3, 32),
+			BPF_ALU64_IMM(BPF_RSH, R3, 32),
+			BPF_ALU64_IMM(BPF_MOV, R0, 0),
+			BPF_JMP_IMM(BPF_JEQ, R2, 0x5678, 1),
+			BPF_EXIT_INSN(),
+			BPF_JMP_IMM(BPF_JEQ, R3, 0x1234, 1),
+			BPF_EXIT_INSN(),
+			BPF_ALU64_IMM(BPF_MOV, R0, 1),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL,
+		{ },
+		{ { 0, 1 } }
+	},
 };
 
 static struct net_device dev;
-- 
1.7.9.5



* [PATCH v5 net-next 03/29] net: filter: split filter.h and expose eBPF to user space
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

eBPF can be used from user space.

uapi/linux/bpf.h: eBPF instruction set definition

linux/filter.h: the rest

This patch only moves macro definitions, but practically it freezes the
existing eBPF instruction set, though new instructions can still be added
in the future.

These eBPF definitions cannot go into uapi/linux/filter.h, since the names
may conflict with existing applications.
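
With the header split, a userspace tool can construct eBPF programs directly
from the exported uapi header. A minimal sketch (after 'make headers_install'):

  #include <linux/bpf.h>

  /* trivial program: return 0 */
  struct bpf_insn prog[] = {
  	BPF_MOV64_IMM(BPF_REG_0, 0),
  	BPF_EXIT_INSN(),
  };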

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/linux/filter.h    |  312 +------------------------------------------
 include/uapi/linux/Kbuild |    1 +
 include/uapi/linux/bpf.h  |  321 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 323 insertions(+), 311 deletions(-)
 create mode 100644 include/uapi/linux/bpf.h

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f3262b598262..f04793474d16 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -9,322 +9,12 @@
 #include <linux/skbuff.h>
 #include <linux/workqueue.h>
 #include <uapi/linux/filter.h>
-
-/* Internally used and optimized filter representation with extended
- * instruction set based on top of classic BPF.
- */
-
-/* instruction classes */
-#define BPF_ALU64	0x07	/* alu mode in double word width */
-
-/* ld/ldx fields */
-#define BPF_DW		0x18	/* double word */
-#define BPF_XADD	0xc0	/* exclusive add */
-
-/* alu/jmp fields */
-#define BPF_MOV		0xb0	/* mov reg to reg */
-#define BPF_ARSH	0xc0	/* sign extending arithmetic shift right */
-
-/* change endianness of a register */
-#define BPF_END		0xd0	/* flags for endianness conversion: */
-#define BPF_TO_LE	0x00	/* convert to little-endian */
-#define BPF_TO_BE	0x08	/* convert to big-endian */
-#define BPF_FROM_LE	BPF_TO_LE
-#define BPF_FROM_BE	BPF_TO_BE
-
-#define BPF_JNE		0x50	/* jump != */
-#define BPF_JSGT	0x60	/* SGT is signed '>', GT in x86 */
-#define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
-#define BPF_CALL	0x80	/* function call */
-#define BPF_EXIT	0x90	/* function return */
-
-/* Register numbers */
-enum {
-	BPF_REG_0 = 0,
-	BPF_REG_1,
-	BPF_REG_2,
-	BPF_REG_3,
-	BPF_REG_4,
-	BPF_REG_5,
-	BPF_REG_6,
-	BPF_REG_7,
-	BPF_REG_8,
-	BPF_REG_9,
-	BPF_REG_10,
-	__MAX_BPF_REG,
-};
-
-/* BPF has 10 general purpose 64-bit registers and stack frame. */
-#define MAX_BPF_REG	__MAX_BPF_REG
-
-/* ArgX, context and stack frame pointer register positions. Note,
- * Arg1, Arg2, Arg3, etc are used as argument mappings of function
- * calls in BPF_CALL instruction.
- */
-#define BPF_REG_ARG1	BPF_REG_1
-#define BPF_REG_ARG2	BPF_REG_2
-#define BPF_REG_ARG3	BPF_REG_3
-#define BPF_REG_ARG4	BPF_REG_4
-#define BPF_REG_ARG5	BPF_REG_5
-#define BPF_REG_CTX	BPF_REG_6
-#define BPF_REG_FP	BPF_REG_10
-
-/* Additional register mappings for converted user programs. */
-#define BPF_REG_A	BPF_REG_0
-#define BPF_REG_X	BPF_REG_7
-#define BPF_REG_TMP	BPF_REG_8
-
-/* BPF program can access up to 512 bytes of stack space. */
-#define MAX_BPF_STACK	512
-
-/* Helper macros for filter block array initializers. */
-
-/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
-
-#define BPF_ALU64_REG(OP, DST, SRC)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_X,	\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = 0 })
-
-#define BPF_ALU32_REG(OP, DST, SRC)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = 0 })
-
-/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
-
-#define BPF_ALU64_IMM(OP, DST, IMM)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_K,	\
-		.dst_reg = DST,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-#define BPF_ALU32_IMM(OP, DST, IMM)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
-		.dst_reg = DST,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-/* Endianess conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */
-
-#define BPF_ENDIAN(TYPE, DST, LEN)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU | BPF_END | BPF_SRC(TYPE),	\
-		.dst_reg = DST,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = LEN })
-
-/* Short form of mov, dst_reg = src_reg */
-
-#define BPF_MOV64_REG(DST, SRC)					\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU64 | BPF_MOV | BPF_X,		\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = 0 })
-
-#define BPF_MOV32_REG(DST, SRC)					\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU | BPF_MOV | BPF_X,		\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = 0 })
-
-/* Short form of mov, dst_reg = imm32 */
-
-#define BPF_MOV64_IMM(DST, IMM)					\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU64 | BPF_MOV | BPF_K,		\
-		.dst_reg = DST,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-#define BPF_MOV32_IMM(DST, IMM)					\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU | BPF_MOV | BPF_K,		\
-		.dst_reg = DST,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
-#define BPF_LD_IMM64(DST, IMM)					\
-	BPF_LD_IMM64_RAW(DST, 0, IMM)
-
-#define BPF_LD_IMM64_RAW(DST, SRC, IMM)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_LD | BPF_DW | BPF_IMM,		\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = (__u32) (IMM) }),			\
-	((struct bpf_insn) {					\
-		.code  = 0, /* zero is reserved opcode */	\
-		.dst_reg = 0,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = ((__u64) (IMM)) >> 32 })
-
-/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
-
-#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM)			\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE),	\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
-	((struct bpf_insn) {					\
-		.code  = BPF_ALU | BPF_MOV | BPF_SRC(TYPE),	\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
-
-#define BPF_LD_ABS(SIZE, IMM)					\
-	((struct bpf_insn) {					\
-		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
-		.dst_reg = 0,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-/* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */
-
-#define BPF_LD_IND(SIZE, SRC, IMM)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_IND,	\
-		.dst_reg = 0,					\
-		.src_reg = SRC,					\
-		.off   = 0,					\
-		.imm   = IMM })
-
-/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
-
-#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
-	((struct bpf_insn) {					\
-		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = OFF,					\
-		.imm   = 0 })
-
-/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
-
-#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
-	((struct bpf_insn) {					\
-		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = OFF,					\
-		.imm   = 0 })
-
-/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
-
-#define BPF_ST_MEM(SIZE, DST, OFF, IMM)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM,	\
-		.dst_reg = DST,					\
-		.src_reg = 0,					\
-		.off   = OFF,					\
-		.imm   = IMM })
-
-/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
-
-#define BPF_JMP_REG(OP, DST, SRC, OFF)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_JMP | BPF_OP(OP) | BPF_X,		\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = OFF,					\
-		.imm   = 0 })
-
-/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
-
-#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
-	((struct bpf_insn) {					\
-		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
-		.dst_reg = DST,					\
-		.src_reg = 0,					\
-		.off   = OFF,					\
-		.imm   = IMM })
-
-/* Function call */
-
-#define BPF_EMIT_CALL(FUNC)					\
-	((struct bpf_insn) {					\
-		.code  = BPF_JMP | BPF_CALL,			\
-		.dst_reg = 0,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = ((FUNC) - __bpf_call_base) })
-
-/* Raw code statement block */
-
-#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
-	((struct bpf_insn) {					\
-		.code  = CODE,					\
-		.dst_reg = DST,					\
-		.src_reg = SRC,					\
-		.off   = OFF,					\
-		.imm   = IMM })
-
-/* Program exit */
-
-#define BPF_EXIT_INSN()						\
-	((struct bpf_insn) {					\
-		.code  = BPF_JMP | BPF_EXIT,			\
-		.dst_reg = 0,					\
-		.src_reg = 0,					\
-		.off   = 0,					\
-		.imm   = 0 })
-
-#define bytes_to_bpf_size(bytes)				\
-({								\
-	int bpf_size = -EINVAL;					\
-								\
-	if (bytes == sizeof(u8))				\
-		bpf_size = BPF_B;				\
-	else if (bytes == sizeof(u16))				\
-		bpf_size = BPF_H;				\
-	else if (bytes == sizeof(u32))				\
-		bpf_size = BPF_W;				\
-	else if (bytes == sizeof(u64))				\
-		bpf_size = BPF_DW;				\
-								\
-	bpf_size;						\
-})
+#include <uapi/linux/bpf.h>
 
 /* Macro to invoke filter function. */
 #define SK_RUN_FILTER(filter, ctx) \
 	(*filter->prog->bpf_func)(ctx, filter->prog->insnsi)
 
-struct bpf_insn {
-	__u8	code;		/* opcode */
-	__u8	dst_reg:4;	/* dest register */
-	__u8	src_reg:4;	/* source register */
-	__s16	off;		/* signed offset */
-	__s32	imm;		/* signed immediate constant */
-};
-
 #ifdef CONFIG_COMPAT
 /* A struct sock_filter is architecture independent. */
 struct compat_sock_fprog {
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 24e9033f8b3f..fb3f7b675229 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -67,6 +67,7 @@ header-y += bfs_fs.h
 header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
+header-y += bpf.h
 header-y += bpqether.h
 header-y += bsg.h
 header-y += btrfs.h
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
new file mode 100644
index 000000000000..45a09b46c578
--- /dev/null
+++ b/include/uapi/linux/bpf.h
@@ -0,0 +1,321 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _UAPI__LINUX_BPF_H__
+#define _UAPI__LINUX_BPF_H__
+
+#include <linux/types.h>
+
+/* Extended instruction set based on top of classic BPF */
+
+/* instruction classes */
+#define BPF_ALU64	0x07	/* alu mode in double word width */
+
+/* ld/ldx fields */
+#define BPF_DW		0x18	/* double word */
+#define BPF_XADD	0xc0	/* exclusive add */
+
+/* alu/jmp fields */
+#define BPF_MOV		0xb0	/* mov reg to reg */
+#define BPF_ARSH	0xc0	/* sign extending arithmetic shift right */
+
+/* change endianness of a register */
+#define BPF_END		0xd0	/* flags for endianness conversion: */
+#define BPF_TO_LE	0x00	/* convert to little-endian */
+#define BPF_TO_BE	0x08	/* convert to big-endian */
+#define BPF_FROM_LE	BPF_TO_LE
+#define BPF_FROM_BE	BPF_TO_BE
+
+#define BPF_JNE		0x50	/* jump != */
+#define BPF_JSGT	0x60	/* SGT is signed '>', GT in x86 */
+#define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
+#define BPF_CALL	0x80	/* function call */
+#define BPF_EXIT	0x90	/* function return */
+
+/* Register numbers */
+enum {
+	BPF_REG_0 = 0,
+	BPF_REG_1,
+	BPF_REG_2,
+	BPF_REG_3,
+	BPF_REG_4,
+	BPF_REG_5,
+	BPF_REG_6,
+	BPF_REG_7,
+	BPF_REG_8,
+	BPF_REG_9,
+	BPF_REG_10,
+	__MAX_BPF_REG,
+};
+
+/* BPF has 10 general purpose 64-bit registers and stack frame. */
+#define MAX_BPF_REG	__MAX_BPF_REG
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	BPF_REG_1
+#define BPF_REG_ARG2	BPF_REG_2
+#define BPF_REG_ARG3	BPF_REG_3
+#define BPF_REG_ARG4	BPF_REG_4
+#define BPF_REG_ARG5	BPF_REG_5
+#define BPF_REG_CTX	BPF_REG_6
+#define BPF_REG_FP	BPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	BPF_REG_0
+#define BPF_REG_X	BPF_REG_7
+#define BPF_REG_TMP	BPF_REG_8
+
+/* BPF program can access up to 512 bytes of stack space. */
+#define MAX_BPF_STACK	512
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define BPF_ALU64_REG(OP, DST, SRC)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU64_IMM(OP, DST, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_K,	\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Endianess conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */
+
+#define BPF_ENDIAN(TYPE, DST, LEN)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_END | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = LEN })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV64_IMM(DST, IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
+#define BPF_LD_IMM64(DST, IMM)					\
+	BPF_LD_IMM64_RAW(DST, 0, IMM)
+
+#define BPF_LD_IMM64_RAW(DST, SRC, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_DW | BPF_IMM,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = (__u32) (IMM) }),			\
+	((struct bpf_insn) {					\
+		.code  = 0, /* zero is reserved opcode */	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = ((__u64) (IMM)) >> 32 })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM)			\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct bpf_insn) {					\
+		.code  = BPF_ALU | BPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */
+
+#define BPF_LD_IND(SIZE, SRC, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_IND,	\
+		.dst_reg = 0,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct bpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct bpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
+
+#define BPF_ST_MEM(SIZE, DST, OFF, IMM)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
+
+#define BPF_JMP_REG(OP, DST, SRC, OFF)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Function call */
+
+#define BPF_EMIT_CALL(FUNC)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_CALL,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = ((FUNC) - __bpf_call_base) })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct bpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define bytes_to_bpf_size(bytes)				\
+({								\
+	int bpf_size = -EINVAL;					\
+								\
+	if (bytes == sizeof(u8))				\
+		bpf_size = BPF_B;				\
+	else if (bytes == sizeof(u16))				\
+		bpf_size = BPF_H;				\
+	else if (bytes == sizeof(u32))				\
+		bpf_size = BPF_W;				\
+	else if (bytes == sizeof(u64))				\
+		bpf_size = BPF_DW;				\
+								\
+	bpf_size;						\
+})
+
+struct bpf_insn {
+	__u8	code;		/* opcode */
+	__u8	dst_reg:4;	/* dest register */
+	__u8	src_reg:4;	/* source register */
+	__s16	off;		/* signed offset */
+	__s32	imm;		/* signed immediate constant */
+};
+
+#endif /* _UAPI__LINUX_BPF_H__ */
-- 
1.7.9.5



* [PATCH v5 net-next 04/29] bpf: introduce syscall(BPF, ...) and BPF maps
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

The BPF syscall is a demux for different BPF-related commands.

'maps' are generic storage of different types for sharing data between the
kernel and userspace.

The maps can be created from user space via BPF syscall:
- create a map with given type and attributes
  fd = bpf_map_create(map_type, struct nlattr *attr, int len)
  returns fd or negative error

- close(fd) deletes the map

The next patch allows userspace programs to populate/read maps that eBPF
programs are concurrently updating.

maps can have different types: hash, bloom filter, radix-tree, etc.

The map is defined by:
  . type
  . max number of elements
  . key size in bytes
  . value size in bytes

This patch establishes core infrastructure for BPF maps.
Next patches implement lookup/update and hashtable type.
More map types can be added in the future.

The syscall uses the type-length-value style of passing arguments to be
backwards compatible with future extensions to map attributes. Different map
types may use different attributes as well.
The concept of type-length-value is borrowed from netlink, but netlink itself
is not applicable here, since BPF programs and maps can be used in NET-less
configurations.
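
For illustration, a userspace wrapper might pack the TLV attributes like this
(a hypothetical sketch of the bpf_map_create() call described above; the real
helper lives in samples/bpf/libbpf.c, and __NR_bpf is the syscall number
wired up in patch 0005):

  #include <linux/bpf.h>
  #include <linux/netlink.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  /* append one NLA_U32 type-length-value attribute to buf */
  static void add_u32_attr(char *buf, int *off, int type, __u32 val)
  {
  	struct nlattr *na = (struct nlattr *)(buf + *off);

  	na->nla_type = type;
  	na->nla_len = NLA_HDRLEN + sizeof(val);
  	memcpy((char *)na + NLA_HDRLEN, &val, sizeof(val));
  	*off += NLA_ALIGN(na->nla_len);
  }

  static int bpf_map_create(enum bpf_map_type map_type, __u32 key_size,
  			  __u32 value_size, __u32 max_entries)
  {
  	char buf[64] __attribute__((aligned(NLA_ALIGNTO))) = {};
  	int off = 0;

  	add_u32_attr(buf, &off, BPF_MAP_KEY_SIZE, key_size);
  	add_u32_attr(buf, &off, BPF_MAP_VALUE_SIZE, value_size);
  	add_u32_attr(buf, &off, BPF_MAP_MAX_ENTRIES, max_entries);

  	/* returns a map fd or a negative error; close(fd) deletes the map */
  	return syscall(__NR_bpf, BPF_MAP_CREATE, map_type, buf, off, 0UL);
  }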

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 Documentation/networking/filter.txt |   71 ++++++++++++++++
 include/linux/bpf.h                 |   42 ++++++++++
 include/uapi/linux/bpf.h            |   24 ++++++
 kernel/bpf/Makefile                 |    2 +-
 kernel/bpf/syscall.c                |  156 +++++++++++++++++++++++++++++++++++
 5 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/bpf.h
 create mode 100644 kernel/bpf/syscall.c

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 81916ab5d96f..27a0a6c6acb4 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -1001,6 +1001,77 @@ instruction that loads 64-bit immediate value into a dst_reg.
 Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
 32-bit immediate value into a register.
 
+eBPF maps
+---------
+'maps' are generic storage of different types for sharing data between the
+kernel and userspace.
+
+The maps are accessed from user space via BPF syscall, which has commands:
+- create a map with given type and attributes
+  map_fd = bpf_map_create(map_type, struct nlattr *attr, int len)
+  returns process-local file descriptor or negative error
+
+- lookup key in a given map
+  err = bpf_map_lookup_elem(int fd, void *key, void *value)
+  returns zero and stores the found elem into value, or returns a negative error
+
+- create or update key/value pair in a given map
+  err = bpf_map_update_elem(int fd, void *key, void *value)
+  returns zero or negative error
+
+- find and delete element by key in a given map
+  err = bpf_map_delete_elem(int fd, void *key)
+
+- to delete a map: close(fd)
+  An exiting process will delete its maps automatically
+
+userspace programs use this API to create/populate/read maps that eBPF programs
+are concurrently updating.
+
+maps can have different types: hash, array, bloom filter, radix-tree, etc.
+
+The map is defined by:
+  . type
+  . max number of elements
+  . key size in bytes
+  . value size in bytes
+
+The maps are accessible from an eBPF program with this API:
+  void * bpf_map_lookup_elem(u32 map_fd, void *key);
+  int bpf_map_update_elem(u32 map_fd, void *key, void *value);
+  int bpf_map_delete_elem(u32 map_fd, void *key);
+
+The kernel replaces the process-local map_fd with an internal map pointer
+while loading the eBPF program.
+
+If the eBPF verifier is configured to recognize the extra calls
+bpf_map_lookup_elem() and bpf_map_update_elem(), then access to maps looks like:
+  ...
+  ptr_to_value = bpf_map_lookup_elem(map_fd, key)
+  access memory range [ptr_to_value, ptr_to_value + value_size_in_bytes)
+  ...
+  prepare key2 and value2 on stack of key_size and value_size
+  err = bpf_map_update_elem(map_fd, key2, value2)
+  ...
+
+An eBPF program cannot create or delete maps
+(such calls will be unknown to the verifier).
+
+During program loading the refcnt of used maps is incremented, so they don't
+get deleted while the program is running.
+
+bpf_map_update_elem() can fail if the maximum number of elements is reached.
+If key2 already exists, bpf_map_update_elem() replaces it with value2 atomically.
+
+bpf_map_lookup_elem() returns NULL or ptr_to_value, so the program must do
+an if (ptr_to_value != NULL) check before accessing it.
+NULL means that the element with the given 'key' was not found.
+
+The verifier will check that the program accesses map elements within specified
+size. It will not let programs pass junk values to bpf_map_*_elem() functions,
+so these functions (implemented in C inside kernel) can safely access
+the pointers in all cases.
+
 Testing
 -------
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
new file mode 100644
index 000000000000..607ca53fe2af
--- /dev/null
+++ b/include/linux/bpf.h
@@ -0,0 +1,42 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _LINUX_BPF_H
+#define _LINUX_BPF_H 1
+
+#include <uapi/linux/bpf.h>
+#include <linux/workqueue.h>
+
+struct bpf_map;
+struct nlattr;
+
+/* map is generic key/value storage optionally accessible by eBPF programs */
+struct bpf_map_ops {
+	/* funcs callable from userspace (via syscall) */
+	struct bpf_map *(*map_alloc)(struct nlattr *attrs[BPF_MAP_ATTR_MAX + 1]);
+	void (*map_free)(struct bpf_map *);
+};
+
+struct bpf_map {
+	atomic_t refcnt;
+	enum bpf_map_type map_type;
+	u32 key_size;
+	u32 value_size;
+	u32 max_entries;
+	struct bpf_map_ops *ops;
+	struct work_struct work;
+};
+
+struct bpf_map_type_list {
+	struct list_head list_node;
+	struct bpf_map_ops *ops;
+	enum bpf_map_type type;
+};
+
+void bpf_register_map_type(struct bpf_map_type_list *tl);
+void bpf_map_put(struct bpf_map *map);
+
+#endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 45a09b46c578..de21c8ecf0bb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -318,4 +318,28 @@ struct bpf_insn {
 	__s32	imm;		/* signed immediate constant */
 };
 
+/* BPF syscall commands */
+enum bpf_cmd {
+	/* create a map with given type and attributes
+	 * fd = bpf_map_create(bpf_map_type, struct nlattr *attr, int len)
+	 * returns fd or negative error
+	 * map is deleted when fd is closed
+	 */
+	BPF_MAP_CREATE,
+};
+
+enum bpf_map_attributes {
+	BPF_MAP_UNSPEC,
+	BPF_MAP_KEY_SIZE,	/* size of key in bytes */
+	BPF_MAP_VALUE_SIZE,	/* size of value in bytes */
+	BPF_MAP_MAX_ENTRIES,	/* maximum number of entries in a map */
+	__BPF_MAP_ATTR_MAX,
+};
+#define BPF_MAP_ATTR_MAX (__BPF_MAP_ATTR_MAX - 1)
+#define BPF_MAP_MAX_ATTR_SIZE 65535
+
+enum bpf_map_type {
+	BPF_MAP_TYPE_UNSPEC,
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 6a71145e2769..e9f7334ed07a 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1 +1 @@
-obj-y := core.o
+obj-y := core.o syscall.o
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
new file mode 100644
index 000000000000..04cdf7948f8f
--- /dev/null
+++ b/kernel/bpf/syscall.c
@@ -0,0 +1,156 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/bpf.h>
+#include <linux/syscalls.h>
+#include <net/netlink.h>
+#include <linux/anon_inodes.h>
+
+static LIST_HEAD(bpf_map_types);
+
+static struct bpf_map *find_and_alloc_map(enum bpf_map_type type,
+					  struct nlattr *tb[BPF_MAP_ATTR_MAX + 1])
+{
+	struct bpf_map_type_list *tl;
+	struct bpf_map *map;
+
+	list_for_each_entry(tl, &bpf_map_types, list_node) {
+		if (tl->type == type) {
+			map = tl->ops->map_alloc(tb);
+			if (IS_ERR(map))
+				return map;
+			map->ops = tl->ops;
+			map->map_type = type;
+			return map;
+		}
+	}
+	return ERR_PTR(-EINVAL);
+}
+
+/* boot time registration of different map implementations */
+void bpf_register_map_type(struct bpf_map_type_list *tl)
+{
+	list_add(&tl->list_node, &bpf_map_types);
+}
+
+/* called from workqueue */
+static void bpf_map_free_deferred(struct work_struct *work)
+{
+	struct bpf_map *map = container_of(work, struct bpf_map, work);
+
+	/* implementation dependent freeing */
+	map->ops->map_free(map);
+}
+
+/* decrement map refcnt and schedule it for freeing via workqueue
+ * (underlying map implementation ops->map_free() might sleep)
+ */
+void bpf_map_put(struct bpf_map *map)
+{
+	if (atomic_dec_and_test(&map->refcnt)) {
+		INIT_WORK(&map->work, bpf_map_free_deferred);
+		schedule_work(&map->work);
+	}
+}
+
+static int bpf_map_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_map *map = filp->private_data;
+
+	bpf_map_put(map);
+	return 0;
+}
+
+static const struct file_operations bpf_map_fops = {
+	.release = bpf_map_release,
+};
+
+static const struct nla_policy map_policy[BPF_MAP_ATTR_MAX + 1] = {
+	[BPF_MAP_KEY_SIZE]    = { .type = NLA_U32 },
+	[BPF_MAP_VALUE_SIZE]  = { .type = NLA_U32 },
+	[BPF_MAP_MAX_ENTRIES] = { .type = NLA_U32 },
+};
+
+/* called via syscall */
+static int map_create(enum bpf_map_type type, struct nlattr __user *uattr, int len)
+{
+	struct nlattr *tb[BPF_MAP_ATTR_MAX + 1];
+	struct bpf_map *map;
+	struct nlattr *attr;
+	int err;
+
+	if (len <= 0 || len > BPF_MAP_MAX_ATTR_SIZE)
+		return -EINVAL;
+
+	attr = kmalloc(len, GFP_USER);
+	if (!attr)
+		return -ENOMEM;
+
+	/* copy map attributes from user space */
+	err = -EFAULT;
+	if (copy_from_user(attr, uattr, len) != 0)
+		goto free_attr;
+
+	/* perform basic validation */
+	err = nla_parse(tb, BPF_MAP_ATTR_MAX, attr, len, map_policy);
+	if (err < 0)
+		goto free_attr;
+
+	/* find map type and init map: hashtable vs rbtree vs bloom vs ... */
+	map = find_and_alloc_map(type, tb);
+	if (IS_ERR(map)) {
+		err = PTR_ERR(map);
+		goto free_attr;
+	}
+
+	atomic_set(&map->refcnt, 1);
+
+	err = anon_inode_getfd("bpf-map", &bpf_map_fops, map, O_RDWR | O_CLOEXEC);
+
+	if (err < 0)
+		/* failed to allocate fd */
+		goto free_map;
+
+	/* user supplied array of map attributes is no longer needed */
+	kfree(attr);
+
+	return err;
+
+free_map:
+	map->ops->map_free(map);
+free_attr:
+	kfree(attr);
+	return err;
+}
+
+SYSCALL_DEFINE5(bpf, int, cmd, unsigned long, arg2, unsigned long, arg3,
+		unsigned long, arg4, unsigned long, arg5)
+{
+	/* eBPF syscall is limited to root temporarily. This restriction will
+	 * be lifted when the verifier has enough mileage and the security audit
+	 * is clean. Note that tracing/networking analytics use cases will be
+	 * turning off the 'secure' mode of the verifier, since they need to pass
+	 * kernel data back to user space
+	 */
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (arg5 != 0)
+		return -EINVAL;
+
+	switch (cmd) {
+	case BPF_MAP_CREATE:
+		return map_create((enum bpf_map_type) arg2,
+				  (struct nlattr __user *) arg3, (int) arg4);
+	default:
+		return -EINVAL;
+	}
+}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 05/29] bpf: enable bpf syscall on x64 and i386
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (3 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 04/29] bpf: introduce syscall(BPF, ...) and BPF maps Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps Alexei Starovoitov
                   ` (24 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

done as separate commit to ease conflict resolution

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 arch/x86/syscalls/syscall_32.tbl  |    1 +
 arch/x86/syscalls/syscall_64.tbl  |    1 +
 include/linux/syscalls.h          |    3 ++-
 include/uapi/asm-generic/unistd.h |    4 +++-
 kernel/sys_ni.c                   |    3 +++
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 028b78168d85..9fe1b5d002f0 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -363,3 +363,4 @@
 354	i386	seccomp			sys_seccomp
 355	i386	getrandom		sys_getrandom
 356	i386	memfd_create		sys_memfd_create
+357	i386	bpf			sys_bpf
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 35dd922727b9..281150b539a2 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -327,6 +327,7 @@
 318	common	getrandom		sys_getrandom
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
+321	common	bpf			sys_bpf
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 0f86d85a9ce4..61bc112b9fa5 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -875,5 +875,6 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
 			    const char __user *uargs);
 asmlinkage long sys_getrandom(char __user *buf, size_t count,
 			      unsigned int flags);
-
+asmlinkage long sys_bpf(int cmd, unsigned long arg2, unsigned long arg3,
+			unsigned long arg4, unsigned long arg5);
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 11d11bc5c78f..22749c134117 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp)
 __SYSCALL(__NR_getrandom, sys_getrandom)
 #define __NR_memfd_create 279
 __SYSCALL(__NR_memfd_create, sys_memfd_create)
+#define __NR_bpf 280
+__SYSCALL(__NR_bpf, sys_bpf)
 
 #undef __NR_syscalls
-#define __NR_syscalls 280
+#define __NR_syscalls 281
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 391d4ddb6f4b..b4b5083f5f5e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -218,3 +218,6 @@ cond_syscall(sys_kcmp);
 
 /* operate on Secure Computing state */
 cond_syscall(sys_seccomp);
+
+/* access BPF programs and maps */
+cond_syscall(sys_bpf);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (4 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 05/29] bpf: enable bpf syscall on x64 and i386 Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-25 21:33   ` Cong Wang
  2014-08-24 20:21 ` [PATCH v5 net-next 07/29] bpf: add hashtable type of " Alexei Starovoitov
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

'maps' is generic storage of different types for sharing data between the
kernel and userspace.

The maps are accessed from user space via the BPF syscall, which has commands:

- create a map with given type and attributes
  fd = bpf_map_create(map_type, struct nlattr *attr, int len)
  returns fd or negative error

- lookup key in a given map referenced by fd
  err = bpf_map_lookup_elem(int fd, void *key, void *value)
  returns zero and stores found elem into value or negative error

- create or update key/value pair in a given map
  err = bpf_map_update_elem(int fd, void *key, void *value)
  returns zero or negative error

- find and delete element by key in a given map
  err = bpf_map_delete_elem(int fd, void *key)

- iterate map elements (based on input key return next_key)
  err = bpf_map_get_next_key(int fd, void *key, void *next_key)

- close(fd) deletes the map
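
A hedged userspace sketch of the iterate+lookup flow (illustrative only,
not part of this patch; it assumes 'fd' came from a successful
BPF_MAP_CREATE with 4-byte keys and 8-byte values, and passes an explicit
trailing 0 because the syscall rejects non-zero unused arguments):

  __u32 key = 0, next_key;  /* any key not in the map yields the first key */
  __u64 value;

  /* each call returns the key after 'key'; -ENOENT ends the walk */
  while (syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, fd,
                 &key, &next_key, 0) == 0) {
          if (syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, fd,
                      &next_key, &value, 0) == 0)
                  /* use (next_key, value) */;
          key = next_key;
  }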

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/linux/bpf.h      |    8 ++
 include/uapi/linux/bpf.h |   25 ++++++
 kernel/bpf/syscall.c     |  198 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 231 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 607ca53fe2af..fd1ac4b5ba8b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -9,6 +9,7 @@
 
 #include <uapi/linux/bpf.h>
 #include <linux/workqueue.h>
+#include <linux/file.h>
 
 struct bpf_map;
 struct nlattr;
@@ -18,6 +19,12 @@ struct bpf_map_ops {
 	/* funcs callable from userspace (via syscall) */
 	struct bpf_map *(*map_alloc)(struct nlattr *attrs[BPF_MAP_ATTR_MAX + 1]);
 	void (*map_free)(struct bpf_map *);
+	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
+
+	/* funcs callable from userspace and from eBPF programs */
+	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
+	int (*map_update_elem)(struct bpf_map *map, void *key, void *value);
+	int (*map_delete_elem)(struct bpf_map *map, void *key);
 };
 
 struct bpf_map {
@@ -38,5 +45,6 @@ struct bpf_map_type_list {
 
 void bpf_register_map_type(struct bpf_map_type_list *tl);
 void bpf_map_put(struct bpf_map *map);
+struct bpf_map *bpf_map_get(struct fd f);
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index de21c8ecf0bb..f68edb2681f8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -326,6 +326,31 @@ enum bpf_cmd {
 	 * map is deleted when fd is closed
 	 */
 	BPF_MAP_CREATE,
+
+	/* lookup key in a given map
+	 * err = bpf_map_lookup_elem(int fd, void *key, void *value)
+	 * returns zero and stores found elem into value
+	 * or negative error
+	 */
+	BPF_MAP_LOOKUP_ELEM,
+
+	/* create or update key/value pair in a given map
+	 * err = bpf_map_update_elem(int fd, void *key, void *value)
+	 * returns zero or negative error
+	 */
+	BPF_MAP_UPDATE_ELEM,
+
+	/* find and delete elem by key in a given map
+	 * err = bpf_map_delete_elem(int fd, void *key)
+	 * returns zero or negative error
+	 */
+	BPF_MAP_DELETE_ELEM,
+
+	/* lookup key in a given map and return next key
+	 * err = bpf_map_get_next_key(int fd, void *key, void *next_key)
+	 * returns zero and stores next key or negative error
+	 */
+	BPF_MAP_GET_NEXT_KEY,
 };
 
 enum bpf_map_attributes {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 04cdf7948f8f..45e100ece1b7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -13,6 +13,7 @@
 #include <linux/syscalls.h>
 #include <net/netlink.h>
 #include <linux/anon_inodes.h>
+#include <linux/file.h>
 
 static LIST_HEAD(bpf_map_types);
 
@@ -131,6 +132,189 @@ free_attr:
 	return err;
 }
 
+/* if error is returned, fd is released.
+ * On success, the caller should complete fd access with a matching fdput()
+ */
+struct bpf_map *bpf_map_get(struct fd f)
+{
+	struct bpf_map *map;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	if (f.file->f_op != &bpf_map_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	map = f.file->private_data;
+
+	return map;
+}
+
+static int map_lookup_elem(int ufd, void __user *ukey, void __user *uvalue)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_map *map;
+	void *key, *value;
+	int err;
+
+	map = bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	err = -ENOMEM;
+	key = kmalloc(map->key_size, GFP_USER);
+	if (!key)
+		goto err_put;
+
+	err = -EFAULT;
+	if (copy_from_user(key, ukey, map->key_size) != 0)
+		goto free_key;
+
+	err = -ESRCH;
+	rcu_read_lock();
+	value = map->ops->map_lookup_elem(map, key);
+	if (!value)
+		goto err_unlock;
+
+	err = -EFAULT;
+	if (copy_to_user(uvalue, value, map->value_size) != 0)
+		goto err_unlock;
+
+	err = 0;
+
+err_unlock:
+	rcu_read_unlock();
+free_key:
+	kfree(key);
+err_put:
+	fdput(f);
+	return err;
+}
+
+static int map_update_elem(int ufd, void __user *ukey, void __user *uvalue)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_map *map;
+	void *key, *value;
+	int err;
+
+	map = bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	err = -ENOMEM;
+	key = kmalloc(map->key_size, GFP_USER);
+	if (!key)
+		goto err_put;
+
+	err = -EFAULT;
+	if (copy_from_user(key, ukey, map->key_size) != 0)
+		goto free_key;
+
+	err = -ENOMEM;
+	value = kmalloc(map->value_size, GFP_USER);
+	if (!value)
+		goto free_key;
+
+	err = -EFAULT;
+	if (copy_from_user(value, uvalue, map->value_size) != 0)
+		goto free_value;
+
+	/* eBPF programs that use maps are running under rcu_read_lock(),
+	 * therefore all map accessors rely on this fact, so do the same here
+	 */
+	rcu_read_lock();
+	err = map->ops->map_update_elem(map, key, value);
+	rcu_read_unlock();
+
+free_value:
+	kfree(value);
+free_key:
+	kfree(key);
+err_put:
+	fdput(f);
+	return err;
+}
+
+static int map_delete_elem(int ufd, void __user *ukey)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_map *map;
+	void *key;
+	int err;
+
+	map = bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	err = -ENOMEM;
+	key = kmalloc(map->key_size, GFP_USER);
+	if (!key)
+		goto err_put;
+
+	err = -EFAULT;
+	if (copy_from_user(key, ukey, map->key_size) != 0)
+		goto free_key;
+
+	rcu_read_lock();
+	err = map->ops->map_delete_elem(map, key);
+	rcu_read_unlock();
+
+free_key:
+	kfree(key);
+err_put:
+	fdput(f);
+	return err;
+}
+
+static int map_get_next_key(int ufd, void __user *ukey, void __user *unext_key)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_map *map;
+	void *key, *next_key;
+	int err;
+
+	map = bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	err = -ENOMEM;
+	key = kmalloc(map->key_size, GFP_USER);
+	if (!key)
+		goto err_put;
+
+	err = -EFAULT;
+	if (copy_from_user(key, ukey, map->key_size) != 0)
+		goto free_key;
+
+	err = -ENOMEM;
+	next_key = kmalloc(map->key_size, GFP_USER);
+	if (!next_key)
+		goto free_key;
+
+	rcu_read_lock();
+	err = map->ops->map_get_next_key(map, key, next_key);
+	rcu_read_unlock();
+	if (err)
+		goto free_next_key;
+
+	err = -EFAULT;
+	if (copy_to_user(unext_key, next_key, map->key_size) != 0)
+		goto free_next_key;
+
+	err = 0;
+
+free_next_key:
+	kfree(next_key);
+free_key:
+	kfree(key);
+err_put:
+	fdput(f);
+	return err;
+}
+
 SYSCALL_DEFINE5(bpf, int, cmd, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -150,6 +334,20 @@ SYSCALL_DEFINE5(bpf, int, cmd, unsigned long, arg2, unsigned long, arg3,
 	case BPF_MAP_CREATE:
 		return map_create((enum bpf_map_type) arg2,
 				  (struct nlattr __user *) arg3, (int) arg4);
+	case BPF_MAP_LOOKUP_ELEM:
+		return map_lookup_elem((int) arg2, (void __user *) arg3,
+				       (void __user *) arg4);
+	case BPF_MAP_UPDATE_ELEM:
+		return map_update_elem((int) arg2, (void __user *) arg3,
+				       (void __user *) arg4);
+	case BPF_MAP_DELETE_ELEM:
+		if (arg4 != 0)
+			return -EINVAL;
+		return map_delete_elem((int) arg2, (void __user *) arg3);
+
+	case BPF_MAP_GET_NEXT_KEY:
+		return map_get_next_key((int) arg2, (void __user *) arg3,
+					(void __user *) arg4);
 	default:
 		return -EINVAL;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 07/29] bpf: add hashtable type of BPF maps
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (5 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 08/29] bpf: expand BPF syscall with program load/unload Alexei Starovoitov
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

add new map type: BPF_MAP_TYPE_HASH
and its simple (not auto-resizable) hash table implementation
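
As a rough illustration (not part of this patch), userspace can build the
attribute block that map_create() parses by concatenating three u32 netlink
attributes, each with nla_len = NLA_HDRLEN (4) plus 4 bytes of payload:

  struct {
          struct nlattr hdr;
          __u32 val;
  } attrs[3] = {
          { { 8, BPF_MAP_KEY_SIZE },    sizeof(__u32) },
          { { 8, BPF_MAP_VALUE_SIZE },  sizeof(__u64) },
          { { 8, BPF_MAP_MAX_ENTRIES }, 1024 },
  };

  int fd = syscall(__NR_bpf, BPF_MAP_CREATE, BPF_MAP_TYPE_HASH,
                   attrs, sizeof(attrs), 0);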

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/bpf.h |    1 +
 kernel/bpf/Makefile      |    2 +-
 kernel/bpf/hashtab.c     |  372 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 374 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/hashtab.c

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f68edb2681f8..8069ab7b64cf 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -365,6 +365,7 @@ enum bpf_map_attributes {
 
 enum bpf_map_type {
 	BPF_MAP_TYPE_UNSPEC,
+	BPF_MAP_TYPE_HASH,
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index e9f7334ed07a..558e12712ebc 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1 +1 @@
-obj-y := core.o syscall.o
+obj-y := core.o syscall.o hashtab.o
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
new file mode 100644
index 000000000000..bc8d32f0f720
--- /dev/null
+++ b/kernel/bpf/hashtab.c
@@ -0,0 +1,372 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/bpf.h>
+#include <net/netlink.h>
+#include <linux/jhash.h>
+
+struct bpf_htab {
+	struct bpf_map map;
+	struct hlist_head *buckets;
+	struct kmem_cache *elem_cache;
+	spinlock_t lock;
+	u32 count; /* number of elements in this hashtable */
+	u32 n_buckets; /* number of hash buckets */
+	u32 elem_size; /* size of each element in bytes */
+};
+
+/* each htab element is struct htab_elem + key + value */
+struct htab_elem {
+	struct hlist_node hash_node;
+	struct rcu_head rcu;
+	struct bpf_htab *htab;
+	u32 hash;
+	u32 pad;
+	char key[0];
+};
+
+#define HASH_MAX_BUCKETS 1024
+#define BPF_MAP_MAX_KEY_SIZE 256
+static struct bpf_map *htab_map_alloc(struct nlattr *attr[BPF_MAP_ATTR_MAX + 1])
+{
+	struct bpf_htab *htab;
+	int err, i;
+
+	htab = kzalloc(sizeof(*htab), GFP_USER);
+	if (!htab)
+		return ERR_PTR(-ENOMEM);
+
+	/* look for mandatory map attributes */
+	err = -EINVAL;
+	if (!attr[BPF_MAP_KEY_SIZE])
+		goto free_htab;
+	htab->map.key_size = nla_get_u32(attr[BPF_MAP_KEY_SIZE]);
+
+	if (!attr[BPF_MAP_VALUE_SIZE])
+		goto free_htab;
+	htab->map.value_size = nla_get_u32(attr[BPF_MAP_VALUE_SIZE]);
+
+	if (!attr[BPF_MAP_MAX_ENTRIES])
+		goto free_htab;
+	htab->map.max_entries = nla_get_u32(attr[BPF_MAP_MAX_ENTRIES]);
+
+	htab->n_buckets = (htab->map.max_entries <= HASH_MAX_BUCKETS) ?
+			  htab->map.max_entries : HASH_MAX_BUCKETS;
+
+	/* hash table size must be power of 2 */
+	if ((htab->n_buckets & (htab->n_buckets - 1)) != 0)
+		goto free_htab;
+
+	err = -E2BIG;
+	if (htab->map.key_size > BPF_MAP_MAX_KEY_SIZE)
+		goto free_htab;
+
+	err = -ENOMEM;
+	htab->buckets = kmalloc_array(htab->n_buckets,
+				      sizeof(struct hlist_head), GFP_USER);
+
+	if (!htab->buckets)
+		goto free_htab;
+
+	for (i = 0; i < htab->n_buckets; i++)
+		INIT_HLIST_HEAD(&htab->buckets[i]);
+
+	spin_lock_init(&htab->lock);
+	htab->count = 0;
+
+	htab->elem_size = sizeof(struct htab_elem) +
+			  round_up(htab->map.key_size, 8) +
+			  htab->map.value_size;
+
+	htab->elem_cache = kmem_cache_create("bpf_htab", htab->elem_size, 0, 0,
+					     NULL);
+	if (!htab->elem_cache)
+		goto free_buckets;
+
+	return &htab->map;
+
+free_buckets:
+	kfree(htab->buckets);
+free_htab:
+	kfree(htab);
+	return ERR_PTR(err);
+}
+
+static inline u32 htab_map_hash(const void *key, u32 key_len)
+{
+	return jhash(key, key_len, 0);
+}
+
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+	return &htab->buckets[hash & (htab->n_buckets - 1)];
+}
+
+static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32 hash,
+					 void *key, u32 key_size)
+{
+	struct htab_elem *l;
+
+	hlist_for_each_entry_rcu(l, head, hash_node) {
+		if (l->hash == hash && !memcmp(&l->key, key, key_size))
+			return l;
+	}
+	return NULL;
+}
+
+/* Must be called with rcu_read_lock. */
+static void *htab_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	struct hlist_head *head;
+	struct htab_elem *l;
+	u32 hash, key_size;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	key_size = map->key_size;
+
+	hash = htab_map_hash(key, key_size);
+
+	head = select_bucket(htab, hash);
+
+	l = lookup_elem_raw(head, hash, key, key_size);
+
+	if (l)
+		return l->key + round_up(map->key_size, 8);
+	else
+		return NULL;
+}
+
+/* Must be called with rcu_read_lock. */
+static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	struct hlist_head *head;
+	struct htab_elem *l, *next_l;
+	u32 hash, key_size;
+	int i;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	key_size = map->key_size;
+
+	hash = htab_map_hash(key, key_size);
+
+	head = select_bucket(htab, hash);
+
+	/* lookup the key */
+	l = lookup_elem_raw(head, hash, key, key_size);
+
+	if (!l) {
+		i = 0;
+		goto find_first_elem;
+	}
+
+	/* key was found, get next key in the same bucket */
+	next_l = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(&l->hash_node)),
+				  struct htab_elem, hash_node);
+
+	if (next_l) {
+		/* if next elem in this hash list is not NULL, just return it */
+		memcpy(next_key, next_l->key, key_size);
+		return 0;
+	} else {
+		/* no more elements in this hash list, go to the next bucket */
+		i = hash & (htab->n_buckets - 1);
+		i++;
+	}
+
+find_first_elem:
+	/* iterate over buckets */
+	for (; i < htab->n_buckets; i++) {
+		head = select_bucket(htab, i);
+
+		/* pick first element in the bucket */
+		next_l = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),
+					  struct htab_elem, hash_node);
+		if (next_l) {
+			/* if it's not empty, just return it */
+			memcpy(next_key, next_l->key, key_size);
+			return 0;
+		}
+	}
+
+	/* iterated over all buckets and all elements */
+	return -ENOENT;
+}
+
+static struct htab_elem *htab_alloc_elem(struct bpf_htab *htab)
+{
+	void *l;
+
+	l = kmem_cache_alloc(htab->elem_cache, GFP_ATOMIC);
+	if (!l)
+		return ERR_PTR(-ENOMEM);
+	return l;
+}
+
+static void free_htab_elem_rcu(struct rcu_head *rcu)
+{
+	struct htab_elem *l = container_of(rcu, struct htab_elem, rcu);
+
+	kmem_cache_free(l->htab->elem_cache, l);
+}
+
+static void release_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
+{
+	l->htab = htab;
+	call_rcu(&l->rcu, free_htab_elem_rcu);
+}
+
+/* Must be called with rcu_read_lock. */
+static int htab_map_update_elem(struct bpf_map *map, void *key, void *value)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	struct htab_elem *l_new, *l_old;
+	struct hlist_head *head;
+	unsigned long flags;
+	u32 key_size;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	l_new = htab_alloc_elem(htab);
+	if (IS_ERR(l_new))
+		return -ENOMEM;
+
+	key_size = map->key_size;
+
+	memcpy(l_new->key, key, key_size);
+	memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
+
+	l_new->hash = htab_map_hash(l_new->key, key_size);
+
+	/* bpf_map_update_elem() can be called in_irq() as well, so
+	 * spin_lock() or spin_lock_bh() cannot be used
+	 */
+	spin_lock_irqsave(&htab->lock, flags);
+
+	head = select_bucket(htab, l_new->hash);
+
+	l_old = lookup_elem_raw(head, l_new->hash, key, key_size);
+
+	if (!l_old && unlikely(htab->count >= map->max_entries)) {
+		/* if elem with this 'key' doesn't exist and we've reached
+		 * max_entries limit, fail insertion of new elem
+		 */
+		spin_unlock_irqrestore(&htab->lock, flags);
+		kmem_cache_free(htab->elem_cache, l_new);
+		return -EFBIG;
+	}
+
+	/* add new element to the head of the list, so that concurrent
+	 * search will find it before old elem
+	 */
+	hlist_add_head_rcu(&l_new->hash_node, head);
+	if (l_old) {
+		hlist_del_rcu(&l_old->hash_node);
+		release_htab_elem(htab, l_old);
+	} else {
+		htab->count++;
+	}
+	spin_unlock_irqrestore(&htab->lock, flags);
+
+	return 0;
+}
+
+/* Must be called with rcu_read_lock. */
+static int htab_map_delete_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	struct hlist_head *head;
+	struct htab_elem *l;
+	unsigned long flags;
+	u32 hash, key_size;
+	int ret = -ESRCH;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	key_size = map->key_size;
+
+	hash = htab_map_hash(key, key_size);
+
+	spin_lock_irqsave(&htab->lock, flags);
+
+	head = select_bucket(htab, hash);
+
+	l = lookup_elem_raw(head, hash, key, key_size);
+
+	if (l) {
+		hlist_del_rcu(&l->hash_node);
+		htab->count--;
+		release_htab_elem(htab, l);
+		ret = 0;
+	}
+
+	spin_unlock_irqrestore(&htab->lock, flags);
+	return ret;
+}
+
+static void delete_all_elements(struct bpf_htab *htab)
+{
+	int i;
+
+	for (i = 0; i < htab->n_buckets; i++) {
+		struct hlist_head *head = select_bucket(htab, i);
+		struct hlist_node *n;
+		struct htab_elem *l;
+
+		hlist_for_each_entry_safe(l, n, head, hash_node) {
+			hlist_del_rcu(&l->hash_node);
+			htab->count--;
+			kmem_cache_free(htab->elem_cache, l);
+		}
+	}
+}
+
+/* called when map->refcnt goes to zero */
+static void htab_map_free(struct bpf_map *map)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+	/* wait for all outstanding updates to complete */
+	synchronize_rcu();
+
+	/* kmem_cache_free all htab elements */
+	delete_all_elements(htab);
+
+	/* and destroy cache, which might sleep */
+	kmem_cache_destroy(htab->elem_cache);
+
+	kfree(htab->buckets);
+	kfree(htab);
+}
+
+static struct bpf_map_ops htab_ops = {
+	.map_alloc = htab_map_alloc,
+	.map_free = htab_map_free,
+	.map_get_next_key = htab_map_get_next_key,
+	.map_lookup_elem = htab_map_lookup_elem,
+	.map_update_elem = htab_map_update_elem,
+	.map_delete_elem = htab_map_delete_elem,
+};
+
+static struct bpf_map_type_list tl = {
+	.ops = &htab_ops,
+	.type = BPF_MAP_TYPE_HASH,
+};
+
+static int __init register_htab_map(void)
+{
+	bpf_register_map_type(&tl);
+	return 0;
+}
+late_initcall(register_htab_map);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 08/29] bpf: expand BPF syscall with program load/unload
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (6 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 07/29] bpf: add hashtable type of " Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 09/29] bpf: handle pseudo BPF_CALL insn Alexei Starovoitov
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

eBPF programs are safe run-to-completion functions with load/unload
methods from userspace, similar to kernel modules.

User space API:

- load eBPF program
  fd = bpf_prog_load(bpf_prog_type, struct nlattr *prog, int len)

  where 'prog' is a sequence of sections (TEXT, LICENSE)
  TEXT - array of eBPF instructions
  LICENSE - must be GPL compatible to call helper functions marked gpl_only

- unload eBPF program
  close(fd)

A user space example of syscall(__NR_bpf, BPF_PROG_LOAD, prog_type, ...)
follows in later patches.
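
Until those examples land, a minimal sketch of the load (illustrative only;
with just this patch applied it fails in find_prog_type(), since no program
types are registered yet):

  struct bpf_insn insns[] = {
          BPF_MOV64_IMM(BPF_REG_0, 0),    /* r0 = 0 */
          BPF_EXIT_INSN(),                /* return r0 */
  };
  char license[] = "GPL";
  char buf[256], *p = buf;
  struct nlattr *nla;

  /* TEXT section: the array of eBPF instructions */
  nla = (struct nlattr *)p;
  nla->nla_type = BPF_PROG_TEXT;
  nla->nla_len = NLA_HDRLEN + sizeof(insns);
  memcpy(p + NLA_HDRLEN, insns, sizeof(insns));
  p += NLA_ALIGN(nla->nla_len);

  /* LICENSE section: NUL-terminated string */
  nla = (struct nlattr *)p;
  nla->nla_type = BPF_PROG_LICENSE;
  nla->nla_len = NLA_HDRLEN + sizeof(license);
  memcpy(p + NLA_HDRLEN, license, sizeof(license));
  p += NLA_ALIGN(nla->nla_len);

  int fd = syscall(__NR_bpf, BPF_PROG_LOAD, BPF_PROG_TYPE_UNSPEC,
                   buf, p - buf, 0);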

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/linux/bpf.h      |   36 +++++++++
 include/linux/filter.h   |    9 ++-
 include/uapi/linux/bpf.h |   28 +++++++
 kernel/bpf/syscall.c     |  196 ++++++++++++++++++++++++++++++++++++++++++++++
 net/core/filter.c        |    2 +
 5 files changed, 269 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fd1ac4b5ba8b..ac6320f44812 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -47,4 +47,40 @@ void bpf_register_map_type(struct bpf_map_type_list *tl);
 void bpf_map_put(struct bpf_map *map);
 struct bpf_map *bpf_map_get(struct fd f);
 
+/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
+ * to in-kernel helper functions and for adjusting imm32 field in BPF_CALL
+ * instructions after verifying
+ */
+struct bpf_func_proto {
+	u64 (*func)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+	bool gpl_only;
+};
+
+struct bpf_verifier_ops {
+	/* return eBPF function prototype for verification */
+	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+};
+
+struct bpf_prog_type_list {
+	struct list_head list_node;
+	struct bpf_verifier_ops *ops;
+	enum bpf_prog_type type;
+};
+
+void bpf_register_prog_type(struct bpf_prog_type_list *tl);
+
+struct bpf_prog_info {
+	atomic_t refcnt;
+	bool is_gpl_compatible;
+	enum bpf_prog_type prog_type;
+	struct bpf_verifier_ops *ops;
+	struct bpf_map **used_maps;
+	u32 used_map_cnt;
+};
+
+struct bpf_prog;
+
+void bpf_prog_put(struct bpf_prog *prog);
+struct bpf_prog *bpf_prog_get(u32 ufd);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index f04793474d16..f06913b29861 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -31,11 +31,16 @@ struct sock_fprog_kern {
 struct sk_buff;
 struct sock;
 struct seccomp_data;
+struct bpf_prog_info;
 
 struct bpf_prog {
 	u32			jited:1,	/* Is our filter JIT'ed? */
-				len:31;		/* Number of filter blocks */
-	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
+				has_info:1,	/* whether 'info' is valid */
+				len:30;		/* Number of filter blocks */
+	union {
+		struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
+		struct bpf_prog_info	*info;
+	};
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct bpf_insn *filter);
 	union {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8069ab7b64cf..7468fe55db7b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -351,6 +351,13 @@ enum bpf_cmd {
 	 * returns zero and stores next key or negative error
 	 */
 	BPF_MAP_GET_NEXT_KEY,
+
+	/* verify and load eBPF program
+	 * prog_id = bpf_prog_load(bpf_prog_type, struct nlattr *prog, int len)
+	 * prog is a sequence of sections
+	 * returns fd or negative error
+	 */
+	BPF_PROG_LOAD,
 };
 
 enum bpf_map_attributes {
@@ -368,4 +375,25 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_HASH,
 };
 
+enum bpf_prog_attributes {
+	BPF_PROG_UNSPEC,
+	BPF_PROG_TEXT,		/* array of eBPF instructions */
+	BPF_PROG_LICENSE,	/* license string */
+	__BPF_PROG_ATTR_MAX,
+};
+#define BPF_PROG_ATTR_MAX (__BPF_PROG_ATTR_MAX - 1)
+#define BPF_PROG_MAX_ATTR_SIZE 65535
+
+enum bpf_prog_type {
+	BPF_PROG_TYPE_UNSPEC,
+};
+
+/* integer value in 'imm' field of BPF_CALL instruction selects which helper
+ * function eBPF program intends to call
+ */
+enum bpf_func_id {
+	BPF_FUNC_unspec,
+	__BPF_FUNC_MAX_ID,
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 45e100ece1b7..4c5f5169f6fc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -14,6 +14,8 @@
 #include <net/netlink.h>
 #include <linux/anon_inodes.h>
 #include <linux/file.h>
+#include <linux/license.h>
+#include <linux/filter.h>
 
 static LIST_HEAD(bpf_map_types);
 
@@ -315,6 +317,197 @@ err_put:
 	return err;
 }
 
+static LIST_HEAD(bpf_prog_types);
+
+static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
+{
+	struct bpf_prog_type_list *tl;
+
+	list_for_each_entry(tl, &bpf_prog_types, list_node) {
+		if (tl->type == type) {
+			prog->info->ops = tl->ops;
+			prog->info->prog_type = type;
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+void bpf_register_prog_type(struct bpf_prog_type_list *tl)
+{
+	list_add(&tl->list_node, &bpf_prog_types);
+}
+
+/* drop refcnt on maps used by eBPF program and free auxiliary data */
+static void free_bpf_prog_info(struct bpf_prog_info *info)
+{
+	int i;
+
+	for (i = 0; i < info->used_map_cnt; i++)
+		bpf_map_put(info->used_maps[i]);
+
+	kfree(info->used_maps);
+	kfree(info);
+}
+
+void bpf_prog_put(struct bpf_prog *prog)
+{
+	BUG_ON(!prog->has_info);
+	if (atomic_dec_and_test(&prog->info->refcnt)) {
+		free_bpf_prog_info(prog->info);
+		bpf_prog_free(prog);
+	}
+}
+
+static int bpf_prog_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_prog *prog = filp->private_data;
+
+	bpf_prog_put(prog);
+	return 0;
+}
+
+static const struct file_operations bpf_prog_fops = {
+	.release = bpf_prog_release,
+};
+
+static struct bpf_prog *get_prog(struct fd f)
+{
+	struct bpf_prog *prog;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	if (f.file->f_op != &bpf_prog_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	prog = f.file->private_data;
+
+	return prog;
+}
+
+/* called by sockets/tracing/seccomp before attaching program to an event
+ * pairs with bpf_prog_put()
+ */
+struct bpf_prog *bpf_prog_get(u32 ufd)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_prog *prog;
+
+	prog = get_prog(f);
+
+	if (IS_ERR(prog))
+		return prog;
+
+	atomic_inc(&prog->info->refcnt);
+	fdput(f);
+	return prog;
+}
+
+static const struct nla_policy prog_policy[BPF_PROG_ATTR_MAX + 1] = {
+	[BPF_PROG_TEXT]      = { .type = NLA_BINARY },
+	[BPF_PROG_LICENSE]   = { .type = NLA_NUL_STRING },
+};
+
+static int bpf_prog_load(enum bpf_prog_type type, struct nlattr __user *uattr,
+			 int len)
+{
+	struct nlattr *tb[BPF_PROG_ATTR_MAX + 1];
+	struct bpf_prog *prog;
+	struct nlattr *attr;
+	size_t insn_len;
+	int err;
+	bool is_gpl;
+
+	if (len <= 0 || len > BPF_PROG_MAX_ATTR_SIZE)
+		return -EINVAL;
+
+	attr = kmalloc(len, GFP_USER);
+	if (!attr)
+		return -ENOMEM;
+
+	/* copy eBPF program from user space */
+	err = -EFAULT;
+	if (copy_from_user(attr, uattr, len) != 0)
+		goto free_attr;
+
+	/* perform basic validation */
+	err = nla_parse(tb, BPF_PROG_ATTR_MAX, attr, len, prog_policy);
+	if (err < 0)
+		goto free_attr;
+
+	err = -EINVAL;
+	/* look for mandatory license string */
+	if (!tb[BPF_PROG_LICENSE])
+		goto free_attr;
+
+	/* eBPF programs must be GPL compatible to use GPL-ed functions */
+	is_gpl = license_is_gpl_compatible(nla_data(tb[BPF_PROG_LICENSE]));
+
+	/* look for mandatory array of eBPF instructions */
+	if (!tb[BPF_PROG_TEXT])
+		goto free_attr;
+
+	insn_len = nla_len(tb[BPF_PROG_TEXT]);
+	if (insn_len % sizeof(struct bpf_insn) != 0 || insn_len <= 0)
+		goto free_attr;
+
+	/* plain bpf_prog allocation */
+	err = -ENOMEM;
+	prog = kmalloc(bpf_prog_size(insn_len), GFP_USER);
+	if (!prog)
+		goto free_attr;
+
+	prog->len = insn_len / sizeof(struct bpf_insn);
+	memcpy(prog->insns, nla_data(tb[BPF_PROG_TEXT]), insn_len);
+	prog->orig_prog = NULL;
+	prog->jited = 0;
+	prog->has_info = 0;
+
+	/* allocate eBPF-related auxiliary data */
+	prog->info = kzalloc(sizeof(struct bpf_prog_info), GFP_USER);
+	if (!prog->info)
+		goto free_prog;
+	prog->has_info = 1;
+	atomic_set(&prog->info->refcnt, 1);
+	prog->info->is_gpl_compatible = is_gpl;
+
+	/* find program type: socket_filter vs tracing_filter */
+	err = find_prog_type(type, prog);
+	if (err < 0)
+		goto free_prog_info;
+
+	/* run eBPF verifier */
+	/* err = bpf_check(prog, tb); */
+
+	if (err < 0)
+		goto free_prog_info;
+
+	/* eBPF program is ready to be JITed */
+	bpf_prog_select_runtime(prog);
+
+	err = anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog, O_RDWR | O_CLOEXEC);
+
+	if (err < 0)
+		/* failed to allocate fd */
+		goto free_prog_info;
+
+	/* user supplied eBPF prog attributes are no longer needed */
+	kfree(attr);
+
+	return err;
+
+free_prog_info:
+	free_bpf_prog_info(prog->info);
+free_prog:
+	bpf_prog_free(prog);
+free_attr:
+	kfree(attr);
+	return err;
+}
+
 SYSCALL_DEFINE5(bpf, int, cmd, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -348,6 +541,9 @@ SYSCALL_DEFINE5(bpf, int, cmd, unsigned long, arg2, unsigned long, arg3,
 	case BPF_MAP_GET_NEXT_KEY:
 		return map_get_next_key((int) arg2, (void __user *) arg3,
 					(void __user *) arg4);
+	case BPF_PROG_LOAD:
+		return bpf_prog_load((enum bpf_prog_type) arg2,
+				     (struct nlattr __user *) arg3, (int) arg4);
 	default:
 		return -EINVAL;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index d814b8a89d0f..ed15874a9beb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -835,6 +835,7 @@ static void bpf_release_orig_filter(struct bpf_prog *fp)
 {
 	struct sock_fprog_kern *fprog = fp->orig_prog;
 
+	BUG_ON(fp->has_info);
 	if (fprog) {
 		kfree(fprog->filter);
 		kfree(fprog);
@@ -973,6 +974,7 @@ static struct bpf_prog *bpf_prepare_filter(struct bpf_prog *fp)
 
 	fp->bpf_func = NULL;
 	fp->jited = 0;
+	fp->has_info = 0;
 
 	err = bpf_check_classic(fp->insns, fp->len);
 	if (err) {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 09/29] bpf: handle pseudo BPF_CALL insn
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (7 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 08/29] bpf: expand BPF syscall with program load/unload Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 10/29] bpf: verifier (add docs) Alexei Starovoitov
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

in native eBPF programs userspace uses pseudo BPF_CALL instructions
which encode one of 'enum bpf_func_id' inside the insn->imm field.
The verifier checks that the program is using correct function arguments for
the given func_id. If all checks pass, the kernel fixes up the BPF_CALL->imm
fields by replacing each func_id with an in-kernel function pointer.
The eBPF interpreter then just calls the function.

In-kernel eBPF users continue to use generic BPF_CALL.
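
Schematically (with illustrative names; the real helper ids land in later
patches), the fixup rewrites a call such as

  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem)

so that insn->imm = bpf_map_lookup_elem - __bpf_call_base, letting the
interpreter reach the helper via __bpf_call_base + insn->imm.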

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 kernel/bpf/syscall.c |   37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4c5f5169f6fc..5a336af61858 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -338,6 +338,40 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl)
 	list_add(&tl->list_node, &bpf_prog_types);
 }
 
+/* fixup insn->imm field of bpf_call instructions:
+ * if (insn->imm == BPF_FUNC_map_lookup_elem)
+ *      insn->imm = bpf_map_lookup_elem - __bpf_call_base;
+ * else if (insn->imm == BPF_FUNC_map_update_elem)
+ *      insn->imm = bpf_map_update_elem - __bpf_call_base;
+ * else ...
+ *
+ * this function is called after eBPF program passed verification
+ */
+static void fixup_bpf_calls(struct bpf_prog *prog)
+{
+	const struct bpf_func_proto *fn;
+	int i;
+
+	for (i = 0; i < prog->len; i++) {
+		struct bpf_insn *insn = &prog->insnsi[i];
+
+		if (insn->code == (BPF_JMP | BPF_CALL)) {
+			/* we reach here when program has bpf_call instructions
+			 * and it passed bpf_check(), which means that
+			 * ops->get_func_proto must have been supplied, check it
+			 */
+			BUG_ON(!prog->info->ops->get_func_proto);
+
+			fn = prog->info->ops->get_func_proto(insn->imm);
+			/* all functions that have a prototype and that the verifier
+			 * allowed programs to call must be real in-kernel functions
+			 */
+			BUG_ON(!fn->func);
+			insn->imm = fn->func - __bpf_call_base;
+		}
+	}
+}
+
 /* drop refcnt on maps used by eBPF program and free auxiliary data */
 static void free_bpf_prog_info(struct bpf_prog_info *info)
 {
@@ -485,6 +519,9 @@ static int bpf_prog_load(enum bpf_prog_type type, struct nlattr __user *uattr,
 	if (err < 0)
 		goto free_prog_info;
 
+	/* fixup BPF_CALL->imm field */
+	fixup_bpf_calls(prog);
+
 	/* eBPF program is ready to be JITed */
 	bpf_prog_select_runtime(prog);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 10/29] bpf: verifier (add docs)
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (8 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 09/29] bpf: handle pseudo BPF_CALL insn Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 11/29] bpf: verifier (add ability to receive verification log) Alexei Starovoitov
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

this patch adds all of the eBPF verifier documentation and an empty bpf_check()

The end goal for the verifier is to statically check the safety of the program.

Verifier will catch:
- loops
- out of range jumps
- unreachable instructions
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention

More details in Documentation/networking/filter.txt
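
For instance (an illustrative snippet in the style of the examples added to
filter.txt below), a trivial back-edge that the first-pass CFG check rejects
as a loop:

  BPF_MOV64_IMM(BPF_REG_0, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_JA, 0, 0, -2, 0), /* jumps back to insn 0 */
  BPF_EXIT_INSN(),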

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 Documentation/networking/filter.txt |  230 +++++++++++++++++++++++++++++++++++
 include/linux/bpf.h                 |    2 +
 kernel/bpf/Makefile                 |    2 +-
 kernel/bpf/syscall.c                |    2 +-
 kernel/bpf/verifier.c               |  152 +++++++++++++++++++++++
 5 files changed, 386 insertions(+), 2 deletions(-)
 create mode 100644 kernel/bpf/verifier.c

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 27a0a6c6acb4..b121b01c3af4 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -1001,6 +1001,105 @@ instruction that loads 64-bit immediate value into a dst_reg.
 Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
 32-bit immediate value into a register.
 
+eBPF verifier
+-------------
+The safety of the eBPF program is determined in two steps.
+
+The first step does a DAG check to disallow loops, plus other CFG validation.
+In particular, it will detect programs that have unreachable instructions
+(though the classic BPF checker allows them).
+
+The second step starts from the first insn and descends all possible paths.
+It simulates execution of every insn and observes the state change of
+registers and stack.
+
+At the start of the program the register R1 contains a pointer to context
+and has type PTR_TO_CTX.
+If the verifier sees an insn that does R2=R1, then R2 now has type
+PTR_TO_CTX as well and can be used on the right hand side of an expression.
+If R1=PTR_TO_CTX and the insn is R2=R1+R1, then R2=UNKNOWN_VALUE,
+since addition of two valid pointers makes an invalid pointer.
+(In 'secure' mode the verifier will reject any type of pointer arithmetic to
+make sure that kernel addresses don't leak to unprivileged users.)
+
+If a register was never written to, it's not readable:
+  bpf_mov R0 = R2
+  bpf_exit
+will be rejected, since R2 is unreadable at the start of the program.
+
+After a kernel function call, R1-R5 are reset to unreadable and
+R0 has the return type of the function.
+
+Since R6-R9 are callee saved, their state is preserved across the call.
+  bpf_mov R6 = 1
+  bpf_call foo
+  bpf_mov R0 = R6
+  bpf_exit
+is a correct program. If there was R1 instead of R6, it would have
+been rejected.
+
+Classic BPF register X is mapped to eBPF register R7 inside sk_convert_filter(),
+so that its state is preserved across calls.
+
+load/store instructions are allowed only with registers of valid types, which
+are PTR_TO_CTX, PTR_TO_MAP, FRAME_PTR. They are bounds and alignment checked.
+For example:
+ bpf_mov R1 = 1
+ bpf_mov R2 = 2
+ bpf_xadd *(u32 *)(R1 + 3) += R2
+ bpf_exit
+will be rejected, since R1 doesn't have a valid pointer type at the time of
+execution of instruction bpf_xadd.
+
+At the start R1 contains a pointer to ctx and R1's type is PTR_TO_CTX.
+ctx is generic. The verifier is configured to know what the context is for a
+particular class of bpf programs. For example, context == skb (for socket
+filters) and ctx == seccomp_data for seccomp filters.
+A callback is used to customize verifier to restrict eBPF program access to only
+certain fields within ctx structure with specified size and alignment.
+
+For example, the following insn:
+  bpf_ld R0 = *(u32 *)(R6 + 8)
+intends to load a word from address R6 + 8 and store it into R0.
+If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
+that offset 8 of size 4 bytes can be accessed for reading, otherwise
+the verifier will reject the program.
+If R6=FRAME_PTR, then access should be aligned and be within
+stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
+so it will fail verification, since it's out of bounds.
+
+The verifier will allow an eBPF program to read data from the stack only
+after it wrote into it.
+The classic BPF verifier does a similar check with the M[0-15] memory slots.
+For example:
+  bpf_ld R0 = *(u32 *)(R10 - 4)
+  bpf_exit
+is an invalid program.
+Though R10 is a correct read-only register and has type FRAME_PTR,
+and R10 - 4 is within stack bounds, there were no stores into that location.
+
+Pointer register spill/fill is tracked as well, since four (R6-R9)
+callee saved registers may not be enough for some programs.
+
+Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
+The eBPF verifier will check that registers match argument constraints.
+After the call, register R0 will be set to the return type of the function.
+
+Function calls are the main mechanism to extend the functionality of eBPF
+programs. Socket filters may let programs call one set of functions, whereas
+tracing filters may allow a completely different set.
+
+If a function is made accessible to eBPF programs, it needs to be thought
+through from a security point of view. The verifier will guarantee that the
+function is called with valid arguments.
+
+seccomp vs socket filters have different security restrictions for classic BPF.
+Seccomp solves this with a two-stage verifier: the classic BPF verifier is
+followed by the seccomp verifier. In the case of eBPF, one configurable
+verifier is shared for all use cases.
+
+See details of eBPF verifier in kernel/bpf/verifier.c
+
 eBPF maps
 ---------
 'maps' is a generic storage of different types for sharing data between kernel
@@ -1072,6 +1171,137 @@ size. It will not let programs pass junk values to bpf_map_*_elem() functions,
 so these functions (implemented in C inside kernel) can safely access
 the pointers in all cases.
 
+Understanding eBPF verifier messages
+------------------------------------
+
+The following are a few examples of invalid eBPF programs and verifier error
+messages as seen in the log:
+
+Program with unreachable instructions:
+static struct bpf_insn prog[] = {
+  BPF_EXIT_INSN(),
+  BPF_EXIT_INSN(),
+};
+Error:
+  unreachable insn 1
+
+Program that reads uninitialized register:
+  BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+  BPF_EXIT_INSN(),
+Error:
+  0: (bf) r0 = r2
+  R2 !read_ok
+
+Program that doesn't initialize R0 before exiting:
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+  BPF_EXIT_INSN(),
+Error:
+  0: (bf) r2 = r1
+  1: (95) exit
+  R0 !read_ok
+
+Program that accesses stack out of bounds:
+  BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
+  BPF_EXIT_INSN(),
+Error:
+  0: (7a) *(u64 *)(r10 +8) = 0
+  invalid stack off=8 size=8
+
+Program that doesn't initialize stack before passing its address into function:
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_LD_MAP_FD(BPF_REG_1, 0),
+  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+  BPF_EXIT_INSN(),
+Error:
+  0: (bf) r2 = r10
+  1: (07) r2 += -8
+  2: (b7) r1 = 0x0
+  3: (85) call 1
+  invalid indirect read from stack off -8+0 size 8
+
+Program that uses invalid map_fd=0 while calling the map_lookup_elem() function:
+  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_LD_MAP_FD(BPF_REG_1, 0),
+  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+  BPF_EXIT_INSN(),
+Error:
+  0: (7a) *(u64 *)(r10 -8) = 0
+  1: (bf) r2 = r10
+  2: (07) r2 += -8
+  3: (b7) r1 = 0x0
+  4: (85) call 1
+  fd 0 is not pointing to valid bpf_map
+
+Program that doesn't check return value of map_lookup_elem() before accessing
+map element:
+  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_LD_MAP_FD(BPF_REG_1, 0),
+  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
+  BPF_EXIT_INSN(),
+Error:
+  0: (7a) *(u64 *)(r10 -8) = 0
+  1: (bf) r2 = r10
+  2: (07) r2 += -8
+  3: (b7) r1 = 0x0
+  4: (85) call 1
+  5: (7a) *(u64 *)(r0 +0) = 0
+  R0 invalid mem access 'map_value_or_null'
+
+Program that correctly checks the map_lookup_elem() return value for NULL, but
+accesses the memory with incorrect alignment:
+  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_LD_MAP_FD(BPF_REG_1, 0),
+  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+  BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
+  BPF_EXIT_INSN(),
+Error:
+  0: (7a) *(u64 *)(r10 -8) = 0
+  1: (bf) r2 = r10
+  2: (07) r2 += -8
+  3: (b7) r1 = 1
+  4: (85) call 1
+  5: (15) if r0 == 0x0 goto pc+1
+   R0=map_ptr R10=fp
+  6: (7a) *(u64 *)(r0 +4) = 0
+  misaligned access off 4 size 8
+
+Program that correctly checks the map_lookup_elem() return value for NULL and
+accesses memory with correct alignment in one branch of the 'if', but fails
+to do so in the other branch:
+  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+  BPF_LD_MAP_FD(BPF_REG_1, 0),
+  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
+  BPF_EXIT_INSN(),
+  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
+  BPF_EXIT_INSN(),
+Error:
+  0: (7a) *(u64 *)(r10 -8) = 0
+  1: (bf) r2 = r10
+  2: (07) r2 += -8
+  3: (b7) r1 = 1
+  4: (85) call 1
+  5: (15) if r0 == 0x0 goto pc+2
+   R0=map_ptr R10=fp
+  6: (7a) *(u64 *)(r0 +0) = 0
+  7: (95) exit
+
+  from 5 to 8: R0=imm0 R10=fp
+  8: (7a) *(u64 *)(r0 +0) = 1
+  R0 invalid mem access 'imm'
+
 Testing
 -------
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ac6320f44812..d818e473d12c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -82,5 +82,7 @@ struct bpf_prog;
 
 void bpf_prog_put(struct bpf_prog *prog);
 struct bpf_prog *bpf_prog_get(u32 ufd);
+/* verify correctness of eBPF program */
+int bpf_check(struct bpf_prog *fp, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1]);
 
 #endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 558e12712ebc..95a9035e0f29 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1 +1 @@
-obj-y := core.o syscall.o hashtab.o
+obj-y := core.o syscall.o hashtab.o verifier.o
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5a336af61858..a3581646ee11 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -514,7 +514,7 @@ static int bpf_prog_load(enum bpf_prog_type type, struct nlattr __user *uattr,
 		goto free_prog_info;
 
 	/* run eBPF verifier */
-	/* err = bpf_check(prog, tb); */
+	err = bpf_check(prog, tb);
 
 	if (err < 0)
 		goto free_prog_info;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
new file mode 100644
index 000000000000..c1dc2441994a
--- /dev/null
+++ b/kernel/bpf/verifier.c
@@ -0,0 +1,152 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <net/netlink.h>
+#include <linux/file.h>
+#include <linux/vmalloc.h>
+
+/* bpf_check() is a static code analyzer that walks eBPF program
+ * instruction by instruction and updates register/stack state.
+ * All paths of conditional branches are analyzed until 'bpf_exit' insn.
+ *
+ * In the first pass a depth-first search verifies that the BPF program is a DAG.
+ * It rejects the following programs:
+ * - larger than BPF_MAXINSNS insns
+ * - if loop is present (detected via back-edge)
+ * - unreachable insns exist (shouldn't be a forest. program = one function)
+ * - out of bounds or malformed jumps
+ * The second pass is all possible path descent from the 1st insn.
+ * Conditional branch target insns keep a linked list of verifier states.
+ * If the state was already visited, this path can be pruned.
+ * If it wasn't a DAG, such state pruning would be incorrect, since it would
+ * skip cycles. Since it's analyzing all paths through the program,
+ * the length of the analysis is limited to 32k insn, which may be hit even
+ * if insn_cnt < 4K, but there are too many branches that change stack/regs.
+ * Number of 'branches to be analyzed' is limited to 1k
+ *
+ * On entry to each instruction, each register has a type, and the instruction
+ * changes the types of the registers depending on instruction semantics.
+ * If instruction is BPF_MOV64_REG(BPF_REG_1, BPF_REG_5), then type of R5 is
+ * copied to R1.
+ *
+ * All registers are 64-bit (even on 32-bit arch)
+ * R0 - return register
+ * R1-R5 argument passing registers
+ * R6-R9 callee saved registers
+ * R10 - frame pointer read-only
+ *
+ * At the start of BPF program the register R1 contains a pointer to bpf_context
+ * and has type PTR_TO_CTX.
+ *
+ * Most of the time the registers have UNKNOWN_VALUE type, which
+ * means the register has some value, but it's not a valid pointer.
+ * Verifier doesn't attempt to track all arithmetic operations on pointers.
+ * The only special case is the sequence:
+ *    BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+ *    BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -20),
+ * 1st insn copies R10 (which has FRAME_PTR) type into R1
+ * and 2nd arithmetic instruction is pattern matched to recognize
+ * that it wants to construct a pointer to some element within stack.
+ * So after 2nd insn, the register R1 has type PTR_TO_STACK
+ * (and -20 constant is saved for further stack bounds checking).
+ * Meaning that this reg is a pointer to stack plus known immediate constant.
+ *
+ * When program is doing load or store insns the type of base register can be:
+ * PTR_TO_MAP_VALUE, PTR_TO_CTX, FRAME_PTR. These are three pointer types recognized
+ * by check_mem_access() function.
+ *
+ * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value'
+ * and the range of [ptr, ptr + map's value_size) is accessible.
+ *
+ * registers used to pass pointers to function calls are verified against
+ * function prototypes
+ *
+ * ARG_PTR_TO_MAP_KEY is a function argument constraint.
+ * It means that the register type passed to this function must be
+ * PTR_TO_STACK and it will be used inside the function as
+ * 'pointer to map element key'
+ *
+ * For example the argument constraints for bpf_map_lookup_elem():
+ *   .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ *   .arg1_type = ARG_CONST_MAP_ID,
+ *   .arg2_type = ARG_PTR_TO_MAP_KEY,
+ *
+ * ret_type says that this function returns 'pointer to map elem value or null'
+ * 1st argument is a 'const immediate' value which must be one of valid map_ids.
+ * 2nd argument is a pointer to stack, which will be used inside the function as
+ * a pointer to map element key.
+ *
+ * On the kernel side the helper function looks like:
+ * u64 bpf_map_lookup_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+ * {
+ *    struct bpf_map *map;
+ *    int map_id = r1;
+ *    void *key = (void *) (unsigned long) r2;
+ *    void *value;
+ *
+ *    here kernel can access 'key' pointer safely, knowing that
+ *    [key, key + map->key_size) bytes are valid and were initialized on
+ *    the stack of eBPF program.
+ * }
+ *
+ * The corresponding eBPF program looks like:
+ *    BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),  // after this insn R2 type is FRAME_PTR
+ *    BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), // after this insn R2 type is PTR_TO_STACK
+ *    BPF_MOV64_IMM(BPF_REG_1, MAP_ID),      // after this insn R1 type is CONST_ARG
+ *    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ * here verifier looks at the prototype of map_lookup_elem and sees:
+ * .arg1_type == ARG_CONST_MAP_ID and R1->type == CONST_ARG, which is ok so far,
+ * then it goes and finds a map with map_id equal to R1->imm value.
+ * Now verifier knows that this map has a key of key_size bytes
+ *
+ * Then .arg2_type == ARG_PTR_TO_MAP_KEY and R2->type == PTR_TO_STACK, ok so far,
+ * Now verifier checks that [R2, R2 + map's key_size) are within stack limits
+ * and were initialized prior to this call.
+ * If it's ok, then verifier allows this BPF_CALL insn and looks at
+ * .ret_type which is RET_PTR_TO_MAP_VALUE_OR_NULL, so it sets
+ * R0->type = PTR_TO_MAP_VALUE_OR_NULL which means bpf_map_lookup_elem() function
+ * returns either a pointer to map value or NULL.
+ *
+ * When type PTR_TO_MAP_VALUE_OR_NULL passes through 'if (reg != 0) goto +off'
+ * insn, the register holding that pointer in the true branch changes state to
+ * PTR_TO_MAP_VALUE and the same register changes state to CONST_IMM in the false
+ * branch. See check_cond_jmp_op().
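+ *
+ * For example, the equivalent JEQ pattern (a minimal sketch; it assumes the
+ * usual BPF_JMP_IMM/BPF_ST_MEM/BPF_EXIT_INSN insn macros are available):
+ *    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+ *    BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), // R0 is PTR_TO_MAP_VALUE_OR_NULL
+ *    BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),   // fall-through: R0 is PTR_TO_MAP_VALUE
+ *    BPF_EXIT_INSN(),                       // jump target: R0 is CONST_IMM 0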
+ *
+ * After the call R0 is set to return type of the function and registers R1-R5
+ * are set to NOT_INIT to indicate that they are no longer readable.
+ *
+ * load/store alignment is checked:
+ *    BPF_STX_MEM(BPF_DW, dest_reg, src_reg, 3)
+ * is rejected, because it's misaligned
+ *
+ * load/store to stack are bounds checked and register spill is tracked
+ *    BPF_STX_MEM(BPF_B, BPF_REG_10, src_reg, 0)
+ * is rejected, because it's out of bounds
+ *
+ * load/store to map are bounds checked:
+ *    BPF_STX_MEM(BPF_H, dest_reg, src_reg, 8)
+ * is ok, if dest_reg->type == PTR_TO_MAP_VALUE and
+ * 8 + sizeof(u16) <= map_info->value_size
+ *
+ * load/store to bpf_context are checked against known fields
+ */
+
+int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
+{
+	int ret = -EINVAL;
+
+	return ret;
+}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 11/29] bpf: verifier (add ability to receive verification log)
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (9 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 10/29] bpf: verifier (add docs) Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 12/29] bpf: handle pseudo BPF_LD_IMM64 insn Alexei Starovoitov
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

add optional attributes for BPF_PROG_LOAD syscall:
  BPF_PROG_LOG_LEVEL,     /* verbosity level of eBPF verifier */
  BPF_PROG_LOG_BUF,       /* user supplied buffer */
  BPF_PROG_LOG_SIZE,      /* size of user buffer */

In that case the verifier will return its verification log in the user
supplied buffer, which humans can use to analyze why the verifier
rejected a given program.
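
For illustration, a rough userspace sketch of requesting the log (not part
of this patch; the attribute-packing helper add_attr() is hypothetical,
standing in for whatever netlink helpers a loader uses):

	static char vlog[8192];
	__u32 level = 1, size = sizeof(vlog);
	void *buf = vlog;

	add_attr(msg, BPF_PROG_LOG_LEVEL, &level, sizeof(level)); /* NLA_U32 */
	add_attr(msg, BPF_PROG_LOG_BUF, &buf, sizeof(void *));    /* pointer value */
	add_attr(msg, BPF_PROG_LOG_SIZE, &size, sizeof(size));    /* NLA_U32 */
	/* ... BPF_PROG_LOAD; on failure, vlog holds the verification trace */

Note that the payload of BPF_PROG_LOG_BUF is the pointer value itself (the
kernel reads it via *(void __user **) nla_data()), and the sanity checks
below require log_size >= 128 and a non-zero log_level.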

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/bpf.h |    4 +
 kernel/bpf/syscall.c     |    3 +
 kernel/bpf/verifier.c    |  236 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 243 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7468fe55db7b..d280acad4b28 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -379,6 +379,10 @@ enum bpf_prog_attributes {
 	BPF_PROG_UNSPEC,
 	BPF_PROG_TEXT,		/* array of eBPF instructions */
 	BPF_PROG_LICENSE,	/* license string */
+	/* optional program attributes */
+	BPF_PROG_LOG_LEVEL,	/* verbosity level of eBPF verifier */
+	BPF_PROG_LOG_BUF,	/* user supplied buffer */
+	BPF_PROG_LOG_SIZE,	/* size of user buffer */
 	__BPF_PROG_ATTR_MAX,
 };
 #define BPF_PROG_ATTR_MAX (__BPF_PROG_ATTR_MAX - 1)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a3581646ee11..60cb760cb423 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -443,6 +443,9 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
 static const struct nla_policy prog_policy[BPF_PROG_ATTR_MAX + 1] = {
 	[BPF_PROG_TEXT]      = { .type = NLA_BINARY },
 	[BPF_PROG_LICENSE]   = { .type = NLA_NUL_STRING },
+	[BPF_PROG_LOG_LEVEL] = { .type = NLA_U32 },
+	[BPF_PROG_LOG_BUF]   = { .len = sizeof(void *) },
+	[BPF_PROG_LOG_SIZE]  = { .type = NLA_U32 },
 };
 
 static int bpf_prog_load(enum bpf_prog_type type, struct nlattr __user *uattr,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c1dc2441994a..2484a3387ca2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -144,9 +144,245 @@
  * load/store to bpf_context are checked against known fields
  */
 
+/* single container for all structs
+ * one verifier_env per bpf_check() call
+ */
+struct verifier_env {
+};
+
+/* verbose verifier prints what it's seeing
+ * bpf_check() is called under lock, so no race to access these global vars
+ */
+static u32 log_level, log_size, log_len;
+static void *log_buf;
+
+static DEFINE_MUTEX(bpf_verifier_lock);
+
+/* log_level controls verbosity level of eBPF verifier.
+ * verbose() is used to dump the verification trace to the log, so the user
+ * can figure out what's wrong with the program
+ */
+static void verbose(const char *fmt, ...)
+{
+	va_list args;
+
+	if (log_level == 0 || log_len >= log_size - 1)
+		return;
+
+	va_start(args, fmt);
+	log_len += vscnprintf(log_buf + log_len, log_size - log_len, fmt, args);
+	va_end(args);
+}
+
+static const char *const bpf_class_string[] = {
+	[BPF_LD]    = "ld",
+	[BPF_LDX]   = "ldx",
+	[BPF_ST]    = "st",
+	[BPF_STX]   = "stx",
+	[BPF_ALU]   = "alu",
+	[BPF_JMP]   = "jmp",
+	[BPF_RET]   = "BUG",
+	[BPF_ALU64] = "alu64",
+};
+
+static const char *const bpf_alu_string[] = {
+	[BPF_ADD >> 4]  = "+=",
+	[BPF_SUB >> 4]  = "-=",
+	[BPF_MUL >> 4]  = "*=",
+	[BPF_DIV >> 4]  = "/=",
+	[BPF_OR  >> 4]  = "|=",
+	[BPF_AND >> 4]  = "&=",
+	[BPF_LSH >> 4]  = "<<=",
+	[BPF_RSH >> 4]  = ">>=",
+	[BPF_NEG >> 4]  = "neg",
+	[BPF_MOD >> 4]  = "%=",
+	[BPF_XOR >> 4]  = "^=",
+	[BPF_MOV >> 4]  = "=",
+	[BPF_ARSH >> 4] = "s>>=",
+	[BPF_END >> 4]  = "endian",
+};
+
+static const char *const bpf_ldst_string[] = {
+	[BPF_W >> 3]  = "u32",
+	[BPF_H >> 3]  = "u16",
+	[BPF_B >> 3]  = "u8",
+	[BPF_DW >> 3] = "u64",
+};
+
+static const char *const bpf_jmp_string[] = {
+	[BPF_JA >> 4]   = "jmp",
+	[BPF_JEQ >> 4]  = "==",
+	[BPF_JGT >> 4]  = ">",
+	[BPF_JGE >> 4]  = ">=",
+	[BPF_JSET >> 4] = "&",
+	[BPF_JNE >> 4]  = "!=",
+	[BPF_JSGT >> 4] = "s>",
+	[BPF_JSGE >> 4] = "s>=",
+	[BPF_CALL >> 4] = "call",
+	[BPF_EXIT >> 4] = "exit",
+};
+
+static void print_bpf_insn(struct bpf_insn *insn)
+{
+	u8 class = BPF_CLASS(insn->code);
+
+	if (class == BPF_ALU || class == BPF_ALU64) {
+		if (BPF_SRC(insn->code) == BPF_X)
+			verbose("(%02x) %sr%d %s %sr%d\n",
+				insn->code, class == BPF_ALU ? "(u32) " : "",
+				insn->dst_reg,
+				bpf_alu_string[BPF_OP(insn->code) >> 4],
+				class == BPF_ALU ? "(u32) " : "",
+				insn->src_reg);
+		else
+			verbose("(%02x) %sr%d %s %s%d\n",
+				insn->code, class == BPF_ALU ? "(u32) " : "",
+				insn->dst_reg,
+				bpf_alu_string[BPF_OP(insn->code) >> 4],
+				class == BPF_ALU ? "(u32) " : "",
+				insn->imm);
+	} else if (class == BPF_STX) {
+		if (BPF_MODE(insn->code) == BPF_MEM)
+			verbose("(%02x) *(%s *)(r%d %+d) = r%d\n",
+				insn->code,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->dst_reg,
+				insn->off, insn->src_reg);
+		else if (BPF_MODE(insn->code) == BPF_XADD)
+			verbose("(%02x) lock *(%s *)(r%d %+d) += r%d\n",
+				insn->code,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->dst_reg, insn->off,
+				insn->src_reg);
+		else
+			verbose("BUG_%02x\n", insn->code);
+	} else if (class == BPF_ST) {
+		if (BPF_MODE(insn->code) != BPF_MEM) {
+			verbose("BUG_st_%02x\n", insn->code);
+			return;
+		}
+		verbose("(%02x) *(%s *)(r%d %+d) = %d\n",
+			insn->code,
+			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+			insn->dst_reg,
+			insn->off, insn->imm);
+	} else if (class == BPF_LDX) {
+		if (BPF_MODE(insn->code) != BPF_MEM) {
+			verbose("BUG_ldx_%02x\n", insn->code);
+			return;
+		}
+		verbose("(%02x) r%d = *(%s *)(r%d %+d)\n",
+			insn->code, insn->dst_reg,
+			bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+			insn->src_reg, insn->off);
+	} else if (class == BPF_LD) {
+		if (BPF_MODE(insn->code) == BPF_ABS) {
+			verbose("(%02x) r0 = *(%s *)skb[%d]\n",
+				insn->code,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->imm);
+		} else if (BPF_MODE(insn->code) == BPF_IND) {
+			verbose("(%02x) r0 = *(%s *)skb[r%d + %d]\n",
+				insn->code,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->src_reg, insn->imm);
+		} else if (BPF_MODE(insn->code) == BPF_IMM) {
+			verbose("(%02x) r%d = 0x%x\n",
+				insn->code, insn->dst_reg, insn->imm);
+		} else {
+			verbose("BUG_ld_%02x\n", insn->code);
+			return;
+		}
+	} else if (class == BPF_JMP) {
+		u8 opcode = BPF_OP(insn->code);
+
+		if (opcode == BPF_CALL) {
+			verbose("(%02x) call %d\n", insn->code, insn->imm);
+		} else if (insn->code == (BPF_JMP | BPF_JA)) {
+			verbose("(%02x) goto pc%+d\n",
+				insn->code, insn->off);
+		} else if (insn->code == (BPF_JMP | BPF_EXIT)) {
+			verbose("(%02x) exit\n", insn->code);
+		} else if (BPF_SRC(insn->code) == BPF_X) {
+			verbose("(%02x) if r%d %s r%d goto pc%+d\n",
+				insn->code, insn->dst_reg,
+				bpf_jmp_string[BPF_OP(insn->code) >> 4],
+				insn->src_reg, insn->off);
+		} else {
+			verbose("(%02x) if r%d %s 0x%x goto pc%+d\n",
+				insn->code, insn->dst_reg,
+				bpf_jmp_string[BPF_OP(insn->code) >> 4],
+				insn->imm, insn->off);
+		}
+	} else {
+		verbose("(%02x) %s\n", insn->code, bpf_class_string[class]);
+	}
+}
+
 int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 {
+	void __user *log_ubuf = NULL;
+	struct verifier_env *env;
 	int ret = -EINVAL;
 
+	if (prog->len <= 0 || prog->len > BPF_MAXINSNS)
+		return -E2BIG;
+
+	/* 'struct verifier_env' can be global, but since it's not small,
+	 * allocate/free it every time bpf_check() is called
+	 */
+	env = kzalloc(sizeof(struct verifier_env), GFP_KERNEL);
+	if (!env)
+		return -ENOMEM;
+
+	/* grab the mutex to protect few globals used by verifier */
+	mutex_lock(&bpf_verifier_lock);
+
+	if (tb[BPF_PROG_LOG_LEVEL] && tb[BPF_PROG_LOG_BUF] &&
+	    tb[BPF_PROG_LOG_SIZE]) {
+		/* user requested verbose verifier output
+		 * and supplied buffer to store the verification trace
+		 */
+		log_level = nla_get_u32(tb[BPF_PROG_LOG_LEVEL]);
+		log_ubuf = *(void __user **) nla_data(tb[BPF_PROG_LOG_BUF]);
+		log_size = nla_get_u32(tb[BPF_PROG_LOG_SIZE]);
+		log_len = 0;
+
+		ret = -EINVAL;
+		/* log_* values have to be sane */
+		if (log_size < 128 || log_size > UINT_MAX >> 8 ||
+		    log_level == 0 || log_ubuf == NULL)
+			goto free_env;
+
+		ret = -ENOMEM;
+		log_buf = vmalloc(log_size);
+		if (!log_buf)
+			goto free_env;
+	} else {
+		log_level = 0;
+	}
+
+	/* ret = do_check(env); */
+
+	if (log_level && log_len >= log_size - 1) {
+		BUG_ON(log_len >= log_size);
+		/* verifier log exceeded user supplied buffer */
+		ret = -ENOSPC;
+		/* fall through to return what was recorded */
+	}
+
+	/* copy verifier log back to user space including trailing zero */
+	if (log_level && copy_to_user(log_ubuf, log_buf, log_len + 1) != 0) {
+		ret = -EFAULT;
+		goto free_log_buf;
+	}
+
+
+free_log_buf:
+	if (log_level)
+		vfree(log_buf);
+free_env:
+	kfree(env);
+	mutex_unlock(&bpf_verifier_lock);
 	return ret;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 12/29] bpf: handle pseudo BPF_LD_IMM64 insn
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (10 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 11/29] bpf: verifier (add ability to receive verification log) Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 13/29] bpf: verifier (add branch/goto checks) Alexei Starovoitov
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

eBPF programs passed from userspace use pseudo BPF_LD_IMM64 instructions
to refer to a process-local map_fd. Scan the program for such instructions and,
if the FDs are valid, convert them to 'struct bpf_map' pointers, which the
verifier will use to check access to maps in bpf_map_lookup/update() calls.
If the program passes the verifier, convert the pseudo BPF_LD_IMM64 into a
generic one by dropping the BPF_PSEUDO_MAP_FD flag.

Note that the eBPF interpreter is generic and knows nothing about pseudo insns.
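
As a sketch, a program embeds the fd using the BPF_LD_MAP_FD macro added
below (map_fd here is assumed to be a file descriptor obtained from the bpf
syscall; the other insn macros are the usual ones):

	BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 0),    /* key = 0 on stack */
	BPF_LD_MAP_FD(BPF_REG_1, map_fd),        /* pseudo ld_imm64 */
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),   /* r2 = fp - 4 = &key */
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),

replace_map_fd_with_map_ptr() then rewrites the imm fields of the insn pair
so the register receives the 'struct bpf_map *' instead of the fd.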

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/bpf.h |    6 ++
 kernel/bpf/verifier.c    |  147 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d280acad4b28..2f4d92df4fbe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -176,6 +176,12 @@ enum {
 		.off   = 0,					\
 		.imm   = ((__u64) (IMM)) >> 32 })
 
+#define BPF_PSEUDO_MAP_FD	1
+
+/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
+#define BPF_LD_MAP_FD(DST, MAP_FD)				\
+	BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+
 /* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
 
 #define BPF_MOV64_RAW(TYPE, DST, SRC, IMM)			\
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2484a3387ca2..28ce2cfa49db 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -144,10 +144,15 @@
  * load/store to bpf_context are checked against known fields
  */
 
+#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
+
 /* single container for all structs
  * one verifier_env per bpf_check() call
  */
 struct verifier_env {
+	struct bpf_prog *prog;		/* eBPF program being verified */
+	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of maps used by eBPF program */
+	u32 used_map_cnt;		/* number of used maps */
 };
 
 /* verbose verifier prints what it's seeing
@@ -319,6 +324,115 @@ static void print_bpf_insn(struct bpf_insn *insn)
 	}
 }
 
+/* return the map pointer stored inside BPF_LD_IMM64 instruction */
+static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
+{
+	u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
+
+	return (struct bpf_map *) (unsigned long) imm64;
+}
+
+/* look for pseudo eBPF instructions that access map FDs and
+ * replace them with actual map pointers
+ */
+static int replace_map_fd_with_map_ptr(struct verifier_env *env)
+{
+	struct bpf_insn *insn = env->prog->insnsi;
+	int insn_cnt = env->prog->len;
+	int i, j;
+
+	for (i = 0; i < insn_cnt; i++, insn++) {
+		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+			struct bpf_map *map;
+			struct fd f;
+
+			if (i == insn_cnt - 1 || insn[1].code != 0 ||
+			    insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
+			    insn[1].off != 0) {
+				verbose("invalid bpf_ld_imm64 insn\n");
+				return -EINVAL;
+			}
+
+			if (insn->src_reg == 0)
+				/* valid generic load 64-bit imm */
+				goto next_insn;
+
+			if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
+				verbose("unrecognized bpf_ld_imm64 insn\n");
+				return -EINVAL;
+			}
+
+			f = fdget(insn->imm);
+
+			map = bpf_map_get(f);
+			if (IS_ERR(map)) {
+				verbose("fd %d is not pointing to valid bpf_map\n",
+					insn->imm);
+				fdput(f);
+				return PTR_ERR(map);
+			}
+
+			/* store map pointer inside BPF_LD_IMM64 instruction */
+			insn[0].imm = (u32) (unsigned long) map;
+			insn[1].imm = ((u64) (unsigned long) map) >> 32;
+
+			/* check whether we recorded this map already */
+			for (j = 0; j < env->used_map_cnt; j++)
+				if (env->used_maps[j] == map) {
+					fdput(f);
+					goto next_insn;
+				}
+
+			if (env->used_map_cnt >= MAX_USED_MAPS) {
+				fdput(f);
+				return -E2BIG;
+			}
+
+			/* remember this map */
+			env->used_maps[env->used_map_cnt++] = map;
+
+			/* hold the map. If the program is rejected by verifier,
+			 * the map will be released by release_maps() or it
+			 * will be used by the valid program until it's unloaded
+			 * and all maps are released in free_bpf_prog_info()
+			 */
+			atomic_inc(&map->refcnt);
+
+			fdput(f);
+next_insn:
+			insn++;
+			i++;
+		}
+	}
+
+	/* now all pseudo BPF_LD_IMM64 instructions load valid
+	 * 'struct bpf_map *' into a register instead of user map_fd.
+	 * These pointers will be used later by verifier to validate map access.
+	 */
+	return 0;
+}
+
+/* drop refcnt of maps used by the rejected program */
+static void release_maps(struct verifier_env *env)
+{
+	int i;
+
+	for (i = 0; i < env->used_map_cnt; i++)
+		bpf_map_put(env->used_maps[i]);
+}
+
+/* convert pseudo BPF_LD_IMM64 into generic BPF_LD_IMM64 */
+static void convert_pseudo_ld_imm64(struct verifier_env *env)
+{
+	struct bpf_insn *insn = env->prog->insnsi;
+	int insn_cnt = env->prog->len;
+	int i;
+
+	for (i = 0; i < insn_cnt; i++, insn++)
+		if (insn->code == (BPF_LD | BPF_IMM | BPF_DW))
+			insn->src_reg = 0;
+}
+
 int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 {
 	void __user *log_ubuf = NULL;
@@ -335,6 +449,8 @@ int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 	if (!env)
 		return -ENOMEM;
 
+	env->prog = prog;
+
 	/* grab the mutex to protect few globals used by verifier */
 	mutex_lock(&bpf_verifier_lock);
 
@@ -362,8 +478,14 @@ int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 		log_level = 0;
 	}
 
+	ret = replace_map_fd_with_map_ptr(env);
+	if (ret < 0)
+		goto skip_full_check;
+
 	/* ret = do_check(env); */
 
+skip_full_check:
+
 	if (log_level && log_len >= log_size - 1) {
 		BUG_ON(log_len >= log_size);
 		/* verifier log exceeded user supplied buffer */
@@ -377,11 +499,36 @@ int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 		goto free_log_buf;
 	}
 
+	if (ret == 0 && env->used_map_cnt) {
+		/* if program passed verifier, update used_maps in bpf_prog_info */
+		prog->info->used_maps = kmalloc_array(env->used_map_cnt,
+						      sizeof(env->used_maps[0]),
+						      GFP_KERNEL);
+
+		if (!prog->info->used_maps) {
+			ret = -ENOMEM;
+			goto free_log_buf;
+		}
+
+		memcpy(prog->info->used_maps, env->used_maps,
+		       sizeof(env->used_maps[0]) * env->used_map_cnt);
+		prog->info->used_map_cnt = env->used_map_cnt;
+
+		/* program is valid. Convert pseudo bpf_ld_imm64 into generic
+		 * bpf_ld_imm64 instructions
+		 */
+		convert_pseudo_ld_imm64(env);
+	}
 
 free_log_buf:
 	if (log_level)
 		vfree(log_buf);
 free_env:
+	if (!prog->info->used_maps)
+		/* if we didn't copy map pointers into bpf_prog_info, release
+		 * them now. Otherwise free_bpf_prog_info() will release them.
+		 */
+		release_maps(env);
 	kfree(env);
 	mutex_unlock(&bpf_verifier_lock);
 	return ret;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 13/29] bpf: verifier (add branch/goto checks)
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (11 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 12/29] bpf: handle pseudo BPF_LD_IMM64 insn Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 14/29] bpf: verifier (add verifier core) Alexei Starovoitov
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

check that the control flow graph of an eBPF program is a directed acyclic graph

check_cfg() does:
- detect loops
- detect unreachable instructions
- check that program terminates with BPF_EXIT insn
- check that all branches are within program boundary
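
For example, a two-insn loop is caught by the back-edge check (a sketch,
assuming the usual insn macros):

	BPF_MOV64_IMM(BPF_REG_0, 0),      /* insn 0 */
	BPF_JMP_IMM(BPF_JA, 0, 0, -2),    /* insn 1: target = 1 - 2 + 1 = insn 0 */
	BPF_EXIT_INSN(),                  /* insn 2 */

check_cfg() prints "back-edge from insn 1 to 0" and rejects the program
with -EINVAL.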

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 kernel/bpf/verifier.c |  183 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 28ce2cfa49db..454079b145b2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -332,6 +332,185 @@ static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
 	return (struct bpf_map *) (unsigned long) imm64;
 }
 
+/* non-recursive DFS pseudo code
+ * 1  procedure DFS-iterative(G,v):
+ * 2      label v as discovered
+ * 3      let S be a stack
+ * 4      S.push(v)
+ * 5      while S is not empty
+ * 6            t <- S.pop()
+ * 7            if t is what we're looking for:
+ * 8                return t
+ * 9            for all edges e in G.adjacentEdges(t) do
+ * 10               if edge e is already labelled
+ * 11                   continue with the next edge
+ * 12               w <- G.adjacentVertex(t,e)
+ * 13               if vertex w is not discovered and not explored
+ * 14                   label e as tree-edge
+ * 15                   label w as discovered
+ * 16                   S.push(w)
+ * 17                   continue at 5
+ * 18               else if vertex w is discovered
+ * 19                   label e as back-edge
+ * 20               else
+ * 21                   // vertex w is explored
+ * 22                   label e as forward- or cross-edge
+ * 23           label t as explored
+ * 24           S.pop()
+ *
+ * convention:
+ * 0x10 - discovered
+ * 0x11 - discovered and fall-through edge labelled
+ * 0x12 - discovered and fall-through and branch edges labelled
+ * 0x20 - explored
+ */
+
+enum {
+	DISCOVERED = 0x10,
+	EXPLORED = 0x20,
+	FALLTHROUGH = 1,
+	BRANCH = 2,
+};
+
+#define PUSH_INT(I) \
+	do { \
+		if (cur_stack >= insn_cnt) { \
+			ret = -E2BIG; \
+			goto free_st; \
+		} \
+		stack[cur_stack++] = I; \
+	} while (0)
+
+#define PEEK_INT() \
+	({ \
+		int _ret; \
+		if (cur_stack == 0) \
+			_ret = -1; \
+		else \
+			_ret = stack[cur_stack - 1]; \
+		_ret; \
+	 })
+
+#define POP_INT() \
+	({ \
+		int _ret; \
+		if (cur_stack == 0) \
+			_ret = -1; \
+		else \
+			_ret = stack[--cur_stack]; \
+		_ret; \
+	 })
+
+#define PUSH_INSN(T, W, E) \
+	do { \
+		int w = W; \
+		if (E == FALLTHROUGH && st[T] >= (DISCOVERED | FALLTHROUGH)) \
+			break; \
+		if (E == BRANCH && st[T] >= (DISCOVERED | BRANCH)) \
+			break; \
+		if (w < 0 || w >= insn_cnt) { \
+			verbose("jump out of range from insn %d to %d\n", T, w); \
+			ret = -EINVAL; \
+			goto free_st; \
+		} \
+		if (st[w] == 0) { \
+			/* tree-edge */ \
+			st[T] = DISCOVERED | E; \
+			st[w] = DISCOVERED; \
+			PUSH_INT(w); \
+			goto peek_stack; \
+		} else if ((st[w] & 0xF0) == DISCOVERED) { \
+			verbose("back-edge from insn %d to %d\n", T, w); \
+			ret = -EINVAL; \
+			goto free_st; \
+		} else if (st[w] == EXPLORED) { \
+			/* forward- or cross-edge */ \
+			st[T] = DISCOVERED | E; \
+		} else { \
+			verbose("insn state internal bug\n"); \
+			ret = -EFAULT; \
+			goto free_st; \
+		} \
+	} while (0)
+
+/* non-recursive depth-first-search to detect loops in BPF program
+ * loop == back-edge in directed graph
+ */
+static int check_cfg(struct verifier_env *env)
+{
+	struct bpf_insn *insns = env->prog->insnsi;
+	int insn_cnt = env->prog->len;
+	int cur_stack = 0;
+	int *stack;
+	int ret = 0;
+	int *st;
+	int i, t;
+
+	st = kzalloc(sizeof(int) * insn_cnt, GFP_KERNEL);
+	if (!st)
+		return -ENOMEM;
+
+	stack = kzalloc(sizeof(int) * insn_cnt, GFP_KERNEL);
+	if (!stack) {
+		kfree(st);
+		return -ENOMEM;
+	}
+
+	st[0] = DISCOVERED; /* mark 1st insn as discovered */
+	PUSH_INT(0);
+
+peek_stack:
+	while ((t = PEEK_INT()) != -1) {
+		if (BPF_CLASS(insns[t].code) == BPF_JMP) {
+			u8 opcode = BPF_OP(insns[t].code);
+
+			if (opcode == BPF_EXIT) {
+				goto mark_explored;
+			} else if (opcode == BPF_CALL) {
+				PUSH_INSN(t, t + 1, FALLTHROUGH);
+			} else if (opcode == BPF_JA) {
+				if (BPF_SRC(insns[t].code) != BPF_K) {
+					ret = -EINVAL;
+					goto free_st;
+				}
+				/* unconditional jump with single edge */
+				PUSH_INSN(t, t + insns[t].off + 1, FALLTHROUGH);
+			} else {
+				/* conditional jump with two edges */
+				PUSH_INSN(t, t + 1, FALLTHROUGH);
+				PUSH_INSN(t, t + insns[t].off + 1, BRANCH);
+			}
+		} else {
+			/* all other non-branch instructions with single
+			 * fall-through edge
+			 */
+			PUSH_INSN(t, t + 1, FALLTHROUGH);
+		}
+
+mark_explored:
+		st[t] = EXPLORED;
+		if (POP_INT() == -1) {
+			verbose("pop_int internal bug\n");
+			ret = -EFAULT;
+			goto free_st;
+		}
+	}
+
+	for (i = 0; i < insn_cnt; i++) {
+		if (st[i] != EXPLORED) {
+			verbose("unreachable insn %d\n", i);
+			ret = -EINVAL;
+			goto free_st;
+		}
+	}
+
+free_st:
+	kfree(st);
+	kfree(stack);
+	return ret;
+}
+
 /* look for pseudo eBPF instructions that access map FDs and
  * replace them with actual map pointers
  */
@@ -482,6 +661,10 @@ int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 	if (ret < 0)
 		goto skip_full_check;
 
+	ret = check_cfg(env);
+	if (ret < 0)
+		goto skip_full_check;
+
 	/* ret = do_check(env); */
 
 skip_full_check:
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v5 net-next 14/29] bpf: verifier (add verifier core)
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (12 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 13/29] bpf: verifier (add branch/goto checks) Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 15/29] bpf: verifier (add state prunning optimization) Alexei Starovoitov
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

This patch adds the verifier core, which simulates execution of every insn and
records the state of registers and program stack. Every branch instruction seen
during simulation is pushed onto the state stack. When the verifier reaches BPF_EXIT,
it pops the state from the stack and continues until it reaches BPF_EXIT again.
For program:
1: bpf_mov r1, xxx
2: if (r1 == 0) goto 5
3: bpf_mov r0, 1
4: goto 6
5: bpf_mov r0, 2
6: bpf_exit
The verifier will walk insns: 1, 2, 3, 4, 6
then it will pop the state recorded at insn#2 and will continue: 5, 6

This way it walks all possible paths through the program and checks all
possible values of registers. While doing so, it checks for:
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention
- BPF_LD_ABS|IND instructions are only used in socket filters
- instruction encoding is not using reserved fields
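
For instance, a use of a never-written register is caught on its first
simulated read (a sketch, assuming the usual insn macros):

	BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),   /* R2 is still NOT_INIT */
	BPF_EXIT_INSN(),

and is rejected with "R2 !read_ok".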

Kernel subsystem configures the verifier with two callbacks:

- bool (*is_valid_access)(int off, int size, enum bpf_access_type type);
  that provides information to the verifier about which fields of 'ctx'
  are accessible (remember 'ctx' is the first argument to eBPF program)

- const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
  returns argument constraints of kernel helper functions that eBPF program
  may call, so that the verifier can check that R1-R5 types match the prototype

More details in Documentation/networking/filter.txt and in kernel/bpf/verifier.c
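
A minimal sketch of how a subsystem could wire these up (the ctx layout and
the my_* names are illustrative only, not part of this patch):

	/* allow 4-byte reads of the first two u32 fields of ctx */
	static bool my_is_valid_access(int off, int size,
				       enum bpf_access_type type)
	{
		if (type != BPF_READ)
			return false;
		return off >= 0 && off + size <= 8 && size == sizeof(u32);
	}

	static const struct bpf_func_proto *
	my_get_func_proto(enum bpf_func_id func_id)
	{
		return NULL; /* this program type allows no helper calls */
	}

	static struct bpf_verifier_ops my_verifier_ops = {
		.get_func_proto  = my_get_func_proto,
		.is_valid_access = my_is_valid_access,
	};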

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/linux/bpf.h      |   47 ++
 include/uapi/linux/bpf.h |    1 +
 kernel/bpf/verifier.c    | 1061 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 1108 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d818e473d12c..4d99c62c6cea 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -47,6 +47,31 @@ void bpf_register_map_type(struct bpf_map_type_list *tl);
 void bpf_map_put(struct bpf_map *map);
 struct bpf_map *bpf_map_get(struct fd f);
 
+/* function argument constraints */
+enum bpf_arg_type {
+	ARG_ANYTHING = 0,	/* any argument is ok */
+
+	/* the following constraints used to prototype
+	 * bpf_map_lookup/update/delete_elem() functions
+	 */
+	ARG_CONST_MAP_PTR,	/* const argument used as pointer to bpf_map */
+	ARG_PTR_TO_MAP_KEY,	/* pointer to stack used as map key */
+	ARG_PTR_TO_MAP_VALUE,	/* pointer to stack used as map value */
+
+	/* the following constraints used to prototype bpf_memcmp() and other
+	 * functions that access data on eBPF program stack
+	 */
+	ARG_PTR_TO_STACK,	/* any pointer to eBPF program stack */
+	ARG_CONST_STACK_SIZE,	/* number of bytes accessed from stack */
+};
+
+/* type of values returned from helper functions */
+enum bpf_return_type {
+	RET_INTEGER,			/* function returns integer */
+	RET_VOID,			/* function doesn't return anything */
+	RET_PTR_TO_MAP_VALUE_OR_NULL,	/* returns a pointer to map elem value or NULL */
+};
+
 /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
  * to in-kernel helper functions and for adjusting imm32 field in BPF_CALL
  * instructions after verifying
@@ -54,11 +79,33 @@ struct bpf_map *bpf_map_get(struct fd f);
 struct bpf_func_proto {
 	u64 (*func)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 	bool gpl_only;
+	enum bpf_return_type ret_type;
+	enum bpf_arg_type arg1_type;
+	enum bpf_arg_type arg2_type;
+	enum bpf_arg_type arg3_type;
+	enum bpf_arg_type arg4_type;
+	enum bpf_arg_type arg5_type;
+};
+
+/* bpf_context is intentionally undefined structure. Pointer to bpf_context is
+ * the first argument to eBPF programs.
+ * For socket filters: 'struct bpf_context *' == 'struct sk_buff *'
+ */
+struct bpf_context;
+
+enum bpf_access_type {
+	BPF_READ = 1,
+	BPF_WRITE = 2
 };
 
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
 	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+
+	/* return true if 'size' wide access at offset 'off' within bpf_context
+	 * with 'type' (read or write) is allowed
+	 */
+	bool (*is_valid_access)(int off, int size, enum bpf_access_type type);
 };
 
 struct bpf_prog_type_list {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2f4d92df4fbe..63284088aa48 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -396,6 +396,7 @@ enum bpf_prog_attributes {
 
 enum bpf_prog_type {
 	BPF_PROG_TYPE_UNSPEC,
+	BPF_PROG_TYPE_SOCKET_FILTER,
 };
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 454079b145b2..14a0262b101c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -144,6 +144,72 @@
  * load/store to bpf_context are checked against known fields
  */
 
+/* evaluate OP; if it returns a negative error, return it from the caller */
+#define _(OP) ({ int ret = (OP); if (ret < 0) return ret; })
+
+/* types of values stored in eBPF registers */
+enum bpf_reg_type {
+	NOT_INIT = 0,		 /* nothing was written into register */
+	UNKNOWN_VALUE,		 /* reg doesn't contain a valid pointer */
+	PTR_TO_CTX,		 /* reg points to bpf_context */
+	CONST_PTR_TO_MAP,	 /* reg points to struct bpf_map */
+	PTR_TO_MAP_VALUE,	 /* reg points to map element value */
+	PTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */
+	FRAME_PTR,		 /* reg == frame_pointer */
+	PTR_TO_STACK,		 /* reg == frame_pointer + imm */
+	CONST_IMM,		 /* constant integer value */
+};
+
+struct reg_state {
+	enum bpf_reg_type type;
+	union {
+		/* valid when type == CONST_IMM | PTR_TO_STACK */
+		int imm;
+
+		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
+		 *   PTR_TO_MAP_VALUE_OR_NULL
+		 */
+		struct bpf_map *map_ptr;
+	};
+};
+
+enum bpf_stack_slot_type {
+	STACK_INVALID,    /* nothing was stored in this stack slot */
+	STACK_SPILL,      /* 1st byte of register spilled into stack */
+	STACK_SPILL_PART, /* other 7 bytes of register spill */
+	STACK_MISC	  /* BPF program wrote some data into this slot */
+};
+
+struct bpf_stack_slot {
+	enum bpf_stack_slot_type stype;
+	struct reg_state reg_st;
+};
+
+/* state of the program:
+ * type of all registers and stack info
+ */
+struct verifier_state {
+	struct reg_state regs[MAX_BPF_REG];
+	struct bpf_stack_slot stack[MAX_BPF_STACK];
+};
+
+/* linked list of verifier states used to prune search */
+struct verifier_state_list {
+	struct verifier_state state;
+	struct verifier_state_list *next;
+};
+
+/* verifier_state + insn_idx are pushed to stack when branch is encountered */
+struct verifier_stack_elem {
+	/* verifier state is 'st'
+	 * before processing instruction 'insn_idx'
+	 * and after processing instruction 'prev_insn_idx'
+	 */
+	struct verifier_state st;
+	int insn_idx;
+	int prev_insn_idx;
+	struct verifier_stack_elem *next;
+};
+
 #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
 
 /* single container for all structs
@@ -151,6 +217,9 @@
  */
 struct verifier_env {
 	struct bpf_prog *prog;		/* eBPF program being verified */
+	struct verifier_stack_elem *head; /* stack of verifier states to be processed */
+	int stack_size;			/* number of states to be processed */
+	struct verifier_state cur_state; /* current verifier state */
 	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of maps used by eBPF program */
 	u32 used_map_cnt;		/* number of used maps */
 };
@@ -179,6 +248,45 @@ static void verbose(const char *fmt, ...)
 	va_end(args);
 }
 
+/* string representation of 'enum bpf_reg_type' */
+static const char * const reg_type_str[] = {
+	[NOT_INIT]		= "?",
+	[UNKNOWN_VALUE]		= "inv",
+	[PTR_TO_CTX]		= "ctx",
+	[CONST_PTR_TO_MAP]	= "map_ptr",
+	[PTR_TO_MAP_VALUE]	= "map_value",
+	[PTR_TO_MAP_VALUE_OR_NULL] = "map_value_or_null",
+	[FRAME_PTR]		= "fp",
+	[PTR_TO_STACK]		= "fp",
+	[CONST_IMM]		= "imm",
+};
+
+static void print_verifier_state(struct verifier_env *env)
+{
+	enum bpf_reg_type t;
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++) {
+		t = env->cur_state.regs[i].type;
+		if (t == NOT_INIT)
+			continue;
+		verbose(" R%d=%s", i, reg_type_str[t]);
+		if (t == CONST_IMM || t == PTR_TO_STACK)
+			verbose("%d", env->cur_state.regs[i].imm);
+		else if (t == CONST_PTR_TO_MAP || t == PTR_TO_MAP_VALUE ||
+			 t == PTR_TO_MAP_VALUE_OR_NULL)
+			verbose("(ks=%d,vs=%d)",
+				env->cur_state.regs[i].map_ptr->key_size,
+				env->cur_state.regs[i].map_ptr->value_size);
+	}
+	for (i = 0; i < MAX_BPF_STACK; i++) {
+		if (env->cur_state.stack[i].stype == STACK_SPILL)
+			verbose(" fp%d=%s", -MAX_BPF_STACK + i,
+				reg_type_str[env->cur_state.stack[i].reg_st.type]);
+	}
+	verbose("\n");
+}
+
 static const char *const bpf_class_string[] = {
 	[BPF_LD]    = "ld",
 	[BPF_LDX]   = "ldx",
@@ -324,6 +432,695 @@ static void print_bpf_insn(struct bpf_insn *insn)
 	}
 }
 
+static int pop_stack(struct verifier_env *env, int *prev_insn_idx)
+{
+	struct verifier_stack_elem *elem;
+	int insn_idx;
+
+	if (env->head == NULL)
+		return -1;
+
+	memcpy(&env->cur_state, &env->head->st, sizeof(env->cur_state));
+	insn_idx = env->head->insn_idx;
+	if (prev_insn_idx)
+		*prev_insn_idx = env->head->prev_insn_idx;
+	elem = env->head->next;
+	kfree(env->head);
+	env->head = elem;
+	env->stack_size--;
+	return insn_idx;
+}
+
+static struct verifier_state *push_stack(struct verifier_env *env, int insn_idx,
+					 int prev_insn_idx)
+{
+	struct verifier_stack_elem *elem;
+
+	elem = kmalloc(sizeof(struct verifier_stack_elem), GFP_KERNEL);
+	if (!elem)
+		goto err;
+
+	memcpy(&elem->st, &env->cur_state, sizeof(env->cur_state));
+	elem->insn_idx = insn_idx;
+	elem->prev_insn_idx = prev_insn_idx;
+	elem->next = env->head;
+	env->head = elem;
+	env->stack_size++;
+	if (env->stack_size > 1024) {
+		verbose("BPF program is too complex\n");
+		goto err;
+	}
+	return &elem->st;
+err:
+	/* pop all elements and return */
+	while (pop_stack(env, NULL) >= 0);
+	return NULL;
+}
+
+#define CALLER_SAVED_REGS 6
+static const int caller_saved[CALLER_SAVED_REGS] = {
+	BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5
+};
+
+static void init_reg_state(struct reg_state *regs)
+{
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++) {
+		regs[i].type = NOT_INIT;
+		regs[i].imm = 0;
+		regs[i].map_ptr = NULL;
+	}
+
+	/* frame pointer */
+	regs[BPF_REG_FP].type = FRAME_PTR;
+
+	/* 1st arg to a function */
+	regs[BPF_REG_1].type = PTR_TO_CTX;
+}
+
+static void mark_reg_unknown_value(struct reg_state *regs, u32 regno)
+{
+	BUG_ON(regno >= MAX_BPF_REG);
+	regs[regno].type = UNKNOWN_VALUE;
+	regs[regno].imm = 0;
+	regs[regno].map_ptr = NULL;
+}
+
+enum reg_arg_type {
+	SRC_OP,		/* register is used as source operand */
+	DST_OP,		/* register is used as destination operand */
+	DST_OP_NO_MARK	/* same as above, check only, don't mark */
+};
+
+static int check_reg_arg(struct reg_state *regs, u32 regno,
+			 enum reg_arg_type t)
+{
+	if (regno >= MAX_BPF_REG) {
+		verbose("R%d is invalid\n", regno);
+		return -EINVAL;
+	}
+
+	if (t == SRC_OP) {
+		/* check whether register used as source operand can be read */
+		if (regs[regno].type == NOT_INIT) {
+			verbose("R%d !read_ok\n", regno);
+			return -EACCES;
+		}
+	} else {
+		/* check whether register used as dest operand can be written to */
+		if (regno == BPF_REG_FP) {
+			verbose("frame pointer is read only\n");
+			return -EACCES;
+		}
+		if (t == DST_OP)
+			mark_reg_unknown_value(regs, regno);
+	}
+	return 0;
+}
+
+static int bpf_size_to_bytes(int bpf_size)
+{
+	if (bpf_size == BPF_W)
+		return 4;
+	else if (bpf_size == BPF_H)
+		return 2;
+	else if (bpf_size == BPF_B)
+		return 1;
+	else if (bpf_size == BPF_DW)
+		return 8;
+	else
+		return -EINVAL;
+}
+
+/* check_stack_read/write functions track spill/fill of registers,
+ * stack boundary and alignment are checked in check_mem_access()
+ */
+static int check_stack_write(struct verifier_state *state, int off, int size,
+			     int value_regno)
+{
+	struct bpf_stack_slot *slot;
+	int i;
+
+	if (value_regno >= 0 &&
+	    (state->regs[value_regno].type == PTR_TO_MAP_VALUE ||
+	     state->regs[value_regno].type == PTR_TO_STACK ||
+	     state->regs[value_regno].type == PTR_TO_CTX)) {
+
+		/* register containing pointer is being spilled into stack */
+		if (size != 8) {
+			verbose("invalid size of register spill\n");
+			return -EACCES;
+		}
+
+		slot = &state->stack[MAX_BPF_STACK + off];
+		slot->stype = STACK_SPILL;
+		/* save register state */
+		slot->reg_st = state->regs[value_regno];
+		for (i = 1; i < 8; i++) {
+			slot = &state->stack[MAX_BPF_STACK + off + i];
+			slot->stype = STACK_SPILL_PART;
+			slot->reg_st.type = UNKNOWN_VALUE;
+			slot->reg_st.map_ptr = NULL;
+		}
+	} else {
+
+		/* regular write of data into stack */
+		for (i = 0; i < size; i++) {
+			slot = &state->stack[MAX_BPF_STACK + off + i];
+			slot->stype = STACK_MISC;
+			slot->reg_st.type = UNKNOWN_VALUE;
+			slot->reg_st.map_ptr = NULL;
+		}
+	}
+	return 0;
+}
+
+static int check_stack_read(struct verifier_state *state, int off, int size,
+			    int value_regno)
+{
+	int i;
+	struct bpf_stack_slot *slot;
+
+	slot = &state->stack[MAX_BPF_STACK + off];
+
+	if (slot->stype == STACK_SPILL) {
+		if (size != 8) {
+			verbose("invalid size of register spill\n");
+			return -EACCES;
+		}
+		for (i = 1; i < 8; i++) {
+			if (state->stack[MAX_BPF_STACK + off + i].stype !=
+			    STACK_SPILL_PART) {
+				verbose("corrupted spill memory\n");
+				return -EACCES;
+			}
+		}
+
+		if (value_regno >= 0)
+			/* restore register state from stack */
+			state->regs[value_regno] = slot->reg_st;
+		return 0;
+	} else {
+		for (i = 0; i < size; i++) {
+			if (state->stack[MAX_BPF_STACK + off + i].stype !=
+			    STACK_MISC) {
+				verbose("invalid read from stack off %d+%d size %d\n",
+					off, i, size);
+				return -EACCES;
+			}
+		}
+		if (value_regno >= 0)
+			/* have read misc data from the stack */
+			mark_reg_unknown_value(state->regs, value_regno);
+		return 0;
+	}
+}
+
+/* check read/write into map element returned by bpf_map_lookup_elem() */
+static int check_map_access(struct verifier_env *env, u32 regno, int off,
+			    int size)
+{
+	struct bpf_map *map = env->cur_state.regs[regno].map_ptr;
+
+	if (off < 0 || off + size > map->value_size) {
+		verbose("invalid access to map value, value_size=%d off=%d size=%d\n",
+			map->value_size, off, size);
+		return -EACCES;
+	}
+	return 0;
+}
+
+/* check access to 'struct bpf_context' fields */
+static int check_ctx_access(struct verifier_env *env, int off, int size,
+			    enum bpf_access_type t)
+{
+	if (env->prog->info->ops->is_valid_access &&
+	    env->prog->info->ops->is_valid_access(off, size, t))
+		return 0;
+
+	verbose("invalid bpf_context access off=%d size=%d\n", off, size);
+	return -EACCES;
+}
+
+/* check whether memory at (regno + off) is accessible for t = (read | write)
+ * if t==write, value_regno is the register whose value is stored into memory
+ * if t==read, value_regno is the register that will receive the value from memory
+ * if t==write && value_regno==-1, some unknown value is stored into memory
+ * if t==read && value_regno==-1, don't care what we read from memory
+ */
+static int check_mem_access(struct verifier_env *env, u32 regno, int off,
+			    int bpf_size, enum bpf_access_type t,
+			    int value_regno)
+{
+	struct verifier_state *state = &env->cur_state;
+	int size;
+
+	_(size = bpf_size_to_bytes(bpf_size));
+
+	if (off % size != 0) {
+		verbose("misaligned access off %d size %d\n", off, size);
+		return -EACCES;
+	}
+
+	if (state->regs[regno].type == PTR_TO_MAP_VALUE) {
+		_(check_map_access(env, regno, off, size));
+		if (t == BPF_READ && value_regno >= 0)
+			mark_reg_unknown_value(state->regs, value_regno);
+
+	} else if (state->regs[regno].type == PTR_TO_CTX) {
+		_(check_ctx_access(env, off, size, t));
+		if (t == BPF_READ && value_regno >= 0)
+			mark_reg_unknown_value(state->regs, value_regno);
+
+	} else if (state->regs[regno].type == FRAME_PTR) {
+		if (off >= 0 || off < -MAX_BPF_STACK) {
+			verbose("invalid stack off=%d size=%d\n", off, size);
+			return -EACCES;
+		}
+		if (t == BPF_WRITE)
+			_(check_stack_write(state, off, size, value_regno));
+		else
+			_(check_stack_read(state, off, size, value_regno));
+	} else {
+		verbose("R%d invalid mem access '%s'\n",
+			regno, reg_type_str[state->regs[regno].type]);
+		return -EACCES;
+	}
+	return 0;
+}
+
+static int check_xadd(struct verifier_env *env, struct bpf_insn *insn)
+{
+	struct reg_state *regs = env->cur_state.regs;
+
+	if ((BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) ||
+	    insn->imm != 0) {
+		verbose("BPF_XADD uses reserved fields\n");
+		return -EINVAL;
+	}
+
+	/* check src1 operand */
+	_(check_reg_arg(regs, insn->src_reg, SRC_OP));
+	/* check src2 operand */
+	_(check_reg_arg(regs, insn->dst_reg, SRC_OP));
+
+	/* check whether atomic_add can read the memory */
+	_(check_mem_access(env, insn->dst_reg, insn->off,
+			   BPF_SIZE(insn->code), BPF_READ, -1));
+
+	/* check whether atomic_add can write into the same memory */
+	_(check_mem_access(env, insn->dst_reg, insn->off,
+			   BPF_SIZE(insn->code), BPF_WRITE, -1));
+	return 0;
+}
+
+/* when register 'regno' is passed into function that will read 'access_size'
+ * bytes from that pointer, make sure that it's within stack boundary
+ * and all elements of stack are initialized
+ */
+static int check_stack_boundary(struct verifier_env *env,
+				int regno, int access_size)
+{
+	struct verifier_state *state = &env->cur_state;
+	struct reg_state *regs = state->regs;
+	int off, i;
+
+	if (regs[regno].type != PTR_TO_STACK)
+		return -EACCES;
+
+	off = regs[regno].imm;
+	if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 ||
+	    access_size <= 0) {
+		verbose("invalid stack type R%d off=%d access_size=%d\n",
+			regno, off, access_size);
+		return -EACCES;
+	}
+
+	for (i = 0; i < access_size; i++) {
+		if (state->stack[MAX_BPF_STACK + off + i].stype != STACK_MISC) {
+			verbose("invalid indirect read from stack off %d+%d size %d\n",
+				off, i, access_size);
+			return -EACCES;
+		}
+	}
+	return 0;
+}
+
+static int check_func_arg(struct verifier_env *env, u32 regno,
+			  enum bpf_arg_type arg_type, struct bpf_map **mapp)
+{
+	struct reg_state *reg = env->cur_state.regs + regno;
+	enum bpf_reg_type expected_type;
+
+	if (arg_type == ARG_ANYTHING)
+		return 0;
+
+	if (reg->type == NOT_INIT) {
+		verbose("R%d !read_ok\n", regno);
+		return -EACCES;
+	}
+
+	if (arg_type == ARG_PTR_TO_STACK || arg_type == ARG_PTR_TO_MAP_KEY ||
+	    arg_type == ARG_PTR_TO_MAP_VALUE) {
+		expected_type = PTR_TO_STACK;
+	} else if (arg_type == ARG_CONST_STACK_SIZE) {
+		expected_type = CONST_IMM;
+	} else if (arg_type == ARG_CONST_MAP_PTR) {
+		expected_type = CONST_PTR_TO_MAP;
+	} else {
+		verbose("unsupported arg_type %d\n", arg_type);
+		return -EFAULT;
+	}
+
+	if (reg->type != expected_type) {
+		verbose("R%d type=%s expected=%s\n", regno,
+			reg_type_str[reg->type], reg_type_str[expected_type]);
+		return -EACCES;
+	}
+
+	if (arg_type == ARG_CONST_MAP_PTR) {
+		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
+		*mapp = reg->map_ptr;
+
+	} else if (arg_type == ARG_PTR_TO_MAP_KEY) {
+		/* bpf_map_xxx(..., map_ptr, ..., key) call:
+		 * check that [key, key + map->key_size) are within
+		 * stack limits and initialized
+		 */
+		if (!*mapp) {
+			/* in function declaration map_ptr must come before
+			 * map_key or map_elem, so that it's verified
+			 * and known before we have to check map_key here.
+			 * It means that kernel subsystem misconfigured verifier
+			 */
+			verbose("invalid map_ptr to access map->key\n");
+			return -EACCES;
+		}
+		_(check_stack_boundary(env, regno, (*mapp)->key_size));
+
+	} else if (arg_type == ARG_PTR_TO_MAP_VALUE) {
+		/* bpf_map_xxx(..., map_ptr, ..., value) call:
+		 * check [value, value + map->value_size) validity
+		 */
+		if (!*mapp) {
+			/* kernel subsystem misconfigured verifier */
+			verbose("invalid map_ptr to access map->elem\n");
+			return -EACCES;
+		}
+		_(check_stack_boundary(env, regno, (*mapp)->value_size));
+
+	} else if (arg_type == ARG_CONST_STACK_SIZE) {
+		/* bpf_xxx(..., buf, len) call will access 'len' bytes
+		 * from stack pointer 'buf'. Check it
+		 * note: regno == len, regno - 1 == buf
+		 */
+		if (regno == 0) {
+			/* kernel subsystem misconfigured verifier */
+			verbose("ARG_CONST_STACK_SIZE cannot be first argument\n");
+			return -EACCES;
+		}
+		_(check_stack_boundary(env, regno - 1, reg->imm));
+	}
+
+	return 0;
+}
+
+static int check_call(struct verifier_env *env, int func_id)
+{
+	struct verifier_state *state = &env->cur_state;
+	const struct bpf_func_proto *fn = NULL;
+	struct reg_state *regs = state->regs;
+	struct bpf_map *map = NULL;
+	struct reg_state *reg;
+	int i;
+
+	/* find function prototype */
+	if (func_id <= 0 || func_id >= __BPF_FUNC_MAX_ID) {
+		verbose("invalid func %d\n", func_id);
+		return -EINVAL;
+	}
+
+	if (env->prog->info->ops->get_func_proto)
+		fn = env->prog->info->ops->get_func_proto(func_id);
+
+	if (!fn) {
+		verbose("unknown func %d\n", func_id);
+		return -EINVAL;
+	}
+
+	/* eBPF programs must be GPL compatible to use GPL-ed functions */
+	if (!env->prog->info->is_gpl_compatible && fn->gpl_only) {
+		verbose("cannot call GPL only function from proprietary program\n");
+		return -EINVAL;
+	}
+
+	/* check args */
+	_(check_func_arg(env, BPF_REG_1, fn->arg1_type, &map));
+	_(check_func_arg(env, BPF_REG_2, fn->arg2_type, &map));
+	_(check_func_arg(env, BPF_REG_3, fn->arg3_type, &map));
+	_(check_func_arg(env, BPF_REG_4, fn->arg4_type, &map));
+	_(check_func_arg(env, BPF_REG_5, fn->arg5_type, &map));
+
+	/* reset caller saved regs */
+	for (i = 0; i < CALLER_SAVED_REGS; i++) {
+		reg = regs + caller_saved[i];
+		reg->type = NOT_INIT;
+		reg->imm = 0;
+	}
+
+	/* update return register */
+	if (fn->ret_type == RET_INTEGER) {
+		regs[BPF_REG_0].type = UNKNOWN_VALUE;
+	} else if (fn->ret_type == RET_VOID) {
+		regs[BPF_REG_0].type = NOT_INIT;
+	} else if (fn->ret_type == RET_PTR_TO_MAP_VALUE_OR_NULL) {
+		regs[BPF_REG_0].type = PTR_TO_MAP_VALUE_OR_NULL;
+		/* remember map_ptr, so that check_map_access()
+		 * can check 'value_size' boundary of memory access
+		 * to map element returned from bpf_map_lookup_elem()
+		 */
+		if (map == NULL) {
+			verbose("kernel subsystem misconfigured verifier\n");
+			return -EINVAL;
+		}
+		regs[BPF_REG_0].map_ptr = map;
+	} else {
+		verbose("unknown return type %d of func %d\n",
+			fn->ret_type, func_id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* check validity of 32-bit and 64-bit arithmetic operations */
+static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
+{
+	u8 opcode = BPF_OP(insn->code);
+
+	if (opcode == BPF_END || opcode == BPF_NEG) {
+		if (opcode == BPF_NEG) {
+			if (BPF_SRC(insn->code) != 0 ||
+			    insn->src_reg != BPF_REG_0 ||
+			    insn->off != 0 || insn->imm != 0) {
+				verbose("BPF_NEG uses reserved fields\n");
+				return -EINVAL;
+			}
+		} else {
+			if (insn->src_reg != BPF_REG_0 || insn->off != 0 ||
+			    (insn->imm != 16 && insn->imm != 32 && insn->imm != 64)) {
+				verbose("BPF_END uses reserved fields\n");
+				return -EINVAL;
+			}
+		}
+
+		/* check src operand */
+		_(check_reg_arg(regs, insn->dst_reg, SRC_OP));
+
+		/* check dest operand */
+		_(check_reg_arg(regs, insn->dst_reg, DST_OP));
+
+	} else if (opcode == BPF_MOV) {
+
+		if (BPF_SRC(insn->code) == BPF_X) {
+			if (insn->imm != 0 || insn->off != 0) {
+				verbose("BPF_MOV uses reserved fields\n");
+				return -EINVAL;
+			}
+
+			/* check src operand */
+			_(check_reg_arg(regs, insn->src_reg, SRC_OP));
+		} else {
+			if (insn->src_reg != BPF_REG_0 || insn->off != 0) {
+				verbose("BPF_MOV uses reserved fields\n");
+				return -EINVAL;
+			}
+		}
+
+		/* check dest operand */
+		_(check_reg_arg(regs, insn->dst_reg, DST_OP));
+
+		if (BPF_SRC(insn->code) == BPF_X) {
+			if (BPF_CLASS(insn->code) == BPF_ALU64) {
+				/* case: R1 = R2
+				 * copy register state to dest reg
+				 */
+				regs[insn->dst_reg] = regs[insn->src_reg];
+			} else {
+				regs[insn->dst_reg].type = UNKNOWN_VALUE;
+				regs[insn->dst_reg].map_ptr = NULL;
+			}
+		} else {
+			/* case: R = imm
+			 * remember the value we stored into this reg
+			 */
+			regs[insn->dst_reg].type = CONST_IMM;
+			regs[insn->dst_reg].imm = insn->imm;
+		}
+
+	} else if (opcode > BPF_END) {
+		verbose("invalid BPF_ALU opcode %x\n", opcode);
+		return -EINVAL;
+
+	} else {	/* all other ALU ops: and, sub, xor, add, ... */
+
+		bool stack_relative = false;
+
+		if (BPF_SRC(insn->code) == BPF_X) {
+			if (insn->imm != 0 || insn->off != 0) {
+				verbose("BPF_ALU uses reserved fields\n");
+				return -EINVAL;
+			}
+			/* check src1 operand */
+			_(check_reg_arg(regs, insn->src_reg, SRC_OP));
+		} else {
+			if (insn->src_reg != BPF_REG_0 || insn->off != 0) {
+				verbose("BPF_ALU uses reserved fields\n");
+				return -EINVAL;
+			}
+		}
+
+		/* check src2 operand */
+		_(check_reg_arg(regs, insn->dst_reg, SRC_OP));
+
+		if ((opcode == BPF_MOD || opcode == BPF_DIV) &&
+		    BPF_SRC(insn->code) == BPF_K && insn->imm == 0) {
+			verbose("div by zero\n");
+			return -EINVAL;
+		}
+
+		/* pattern match 'bpf_add Rx, imm' instruction */
+		if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
+		    regs[insn->dst_reg].type == FRAME_PTR &&
+		    BPF_SRC(insn->code) == BPF_K)
+			stack_relative = true;
+
+		/* check dest operand */
+		_(check_reg_arg(regs, insn->dst_reg, DST_OP));
+
+		if (stack_relative) {
+			regs[insn->dst_reg].type = PTR_TO_STACK;
+			regs[insn->dst_reg].imm = insn->imm;
+		}
+	}
+
+	return 0;
+}
+
+static int check_cond_jmp_op(struct verifier_env *env,
+			     struct bpf_insn *insn, int *insn_idx)
+{
+	struct reg_state *regs = env->cur_state.regs;
+	struct verifier_state *other_branch;
+	u8 opcode = BPF_OP(insn->code);
+
+	if (opcode > BPF_EXIT) {
+		verbose("invalid BPF_JMP opcode %x\n", opcode);
+		return -EINVAL;
+	}
+
+	if (BPF_SRC(insn->code) == BPF_X) {
+		if (insn->imm != 0) {
+			verbose("BPF_JMP uses reserved fields\n");
+			return -EINVAL;
+		}
+
+		/* check src1 operand */
+		_(check_reg_arg(regs, insn->src_reg, SRC_OP));
+	} else {
+		if (insn->src_reg != BPF_REG_0) {
+			verbose("BPF_JMP uses reserved fields\n");
+			return -EINVAL;
+		}
+	}
+
+	/* check src2 operand */
+	_(check_reg_arg(regs, insn->dst_reg, SRC_OP));
+
+	/* detect if R == imm where R holds a constant known to equal imm */
+	if (BPF_SRC(insn->code) == BPF_K &&
+	    (opcode == BPF_JEQ || opcode == BPF_JNE) &&
+	    regs[insn->dst_reg].type == CONST_IMM &&
+	    regs[insn->dst_reg].imm == insn->imm) {
+		if (opcode == BPF_JEQ) {
+			/* if (imm == imm) goto pc+off;
+			 * only follow the goto, ignore fall-through
+			 */
+			*insn_idx += insn->off;
+			return 0;
+		} else {
+			/* if (imm != imm) goto pc+off;
+			 * only follow fall-through branch, since
+			 * that's where the program will go
+			 */
+			return 0;
+		}
+	}
+
+	other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx);
+	if (!other_branch)
+		return -EFAULT;
+
+	/* detect if R == 0 where R is returned value from bpf_map_lookup_elem() */
+	if (BPF_SRC(insn->code) == BPF_K &&
+	    insn->imm == 0 && (opcode == BPF_JEQ ||
+			       opcode == BPF_JNE) &&
+	    regs[insn->dst_reg].type == PTR_TO_MAP_VALUE_OR_NULL) {
+		if (opcode == BPF_JEQ) {
+			/* next fallthrough insn can access memory via
+			 * this register
+			 */
+			regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
+			/* branch target cannot access it, since reg == 0 */
+			other_branch->regs[insn->dst_reg].type = CONST_IMM;
+			other_branch->regs[insn->dst_reg].imm = 0;
+		} else {
+			other_branch->regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
+			regs[insn->dst_reg].type = CONST_IMM;
+			regs[insn->dst_reg].imm = 0;
+		}
+	} else if (BPF_SRC(insn->code) == BPF_K &&
+		   (opcode == BPF_JEQ || opcode == BPF_JNE)) {
+
+		if (opcode == BPF_JEQ) {
+			/* detect if (R == imm) goto
+			 * and in the target state recognize that R = imm
+			 */
+			other_branch->regs[insn->dst_reg].type = CONST_IMM;
+			other_branch->regs[insn->dst_reg].imm = insn->imm;
+		} else {
+			/* detect if (R != imm) goto
+			 * and in the fall-through state recognize that R = imm
+			 */
+			regs[insn->dst_reg].type = CONST_IMM;
+			regs[insn->dst_reg].imm = insn->imm;
+		}
+	}
+	if (log_level)
+		print_verifier_state(env);
+	return 0;
+}
+
 /* return the map pointer stored inside BPF_LD_IMM64 instruction */
 static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
 {
@@ -332,6 +1129,93 @@ static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
 	return (struct bpf_map *) (unsigned long) imm64;
 }
 
+/* verify BPF_LD_IMM64 instruction */
+static int check_ld_imm(struct verifier_env *env, struct bpf_insn *insn)
+{
+	struct reg_state *regs = env->cur_state.regs;
+
+	if (BPF_SIZE(insn->code) != BPF_DW) {
+		verbose("invalid BPF_LD_IMM insn\n");
+		return -EINVAL;
+	}
+	if (insn->off != 0) {
+		verbose("BPF_LD_IMM64 uses reserved fields\n");
+		return -EINVAL;
+	}
+
+	_(check_reg_arg(regs, insn->dst_reg, DST_OP));
+
+	if (insn->src_reg == 0)
+		/* generic move 64-bit immediate into a register */
+		return 0;
+
+	/* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
+	BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
+
+	regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
+	regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
+	return 0;
+}
+
+/* verify safety of LD_ABS|LD_IND instructions:
+ * - they can only appear in the programs where ctx == skb
+ * - since they are wrappers of function calls, they scratch R1-R5 registers,
+ *   preserve R6-R9, and store return value into R0
+ *
+ * Implicit input:
+ *   ctx == skb == R6 == CTX
+ *
+ * Explicit input:
+ *   SRC == any register
+ *   IMM == 32-bit immediate
+ *
+ * Output:
+ *   R0 - 8/16/32-bit skb data converted to cpu endianness
+ */
+static int check_ld_abs(struct verifier_env *env, struct bpf_insn *insn)
+{
+	struct reg_state *regs = env->cur_state.regs;
+	u8 mode = BPF_MODE(insn->code);
+	struct reg_state *reg;
+	int i;
+
+	if (env->prog->info->prog_type != BPF_PROG_TYPE_SOCKET_FILTER) {
+		verbose("BPF_LD_ABS|IND instructions are only allowed in socket filters\n");
+		return -EINVAL;
+	}
+
+	if (insn->dst_reg != BPF_REG_0 || insn->off != 0 ||
+	    (mode == BPF_ABS && insn->src_reg != BPF_REG_0)) {
+		verbose("BPF_LD_ABS uses reserved fields\n");
+		return -EINVAL;
+	}
+
+	/* check whether implicit source operand (register R6) is readable */
+	_(check_reg_arg(regs, BPF_REG_6, SRC_OP));
+
+	if (regs[BPF_REG_6].type != PTR_TO_CTX) {
+		verbose("at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
+		return -EINVAL;
+	}
+
+	if (mode == BPF_IND)
+		/* check explicit source operand */
+		_(check_reg_arg(regs, insn->src_reg, SRC_OP));
+
+	/* reset caller saved regs to unreadable */
+	for (i = 0; i < CALLER_SAVED_REGS; i++) {
+		reg = regs + caller_saved[i];
+		reg->type = NOT_INIT;
+		reg->imm = 0;
+	}
+
+	/* mark destination R0 register as readable, since it contains
+	 * the value fetched from the packet
+	 */
+	regs[BPF_REG_0].type = UNKNOWN_VALUE;
+	return 0;
+}
+
 /* non-recursive DFS pseudo code
  * 1  procedure DFS-iterative(G,v):
  * 2      label v as discovered
@@ -511,6 +1395,180 @@ free_st:
 	return ret;
 }
 
+static int do_check(struct verifier_env *env)
+{
+	struct verifier_state *state = &env->cur_state;
+	struct bpf_insn *insns = env->prog->insnsi;
+	struct reg_state *regs = state->regs;
+	int insn_cnt = env->prog->len;
+	int insn_idx, prev_insn_idx = 0;
+	int insn_processed = 0;
+	bool do_print_state = false;
+
+	init_reg_state(regs);
+	insn_idx = 0;
+	for (;;) {
+		struct bpf_insn *insn;
+		u8 class;
+
+		if (insn_idx >= insn_cnt) {
+			verbose("invalid insn idx %d insn_cnt %d\n",
+				insn_idx, insn_cnt);
+			return -EFAULT;
+		}
+
+		insn = &insns[insn_idx];
+		class = BPF_CLASS(insn->code);
+
+		if (++insn_processed > 32768) {
+			verbose("BPF program is too large. Processed %d insn\n",
+				insn_processed);
+			return -E2BIG;
+		}
+
+		if (log_level && do_print_state) {
+			verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx);
+			print_verifier_state(env);
+			do_print_state = false;
+		}
+
+		if (log_level) {
+			verbose("%d: ", insn_idx);
+			print_bpf_insn(insn);
+		}
+
+		if (class == BPF_ALU || class == BPF_ALU64) {
+			_(check_alu_op(regs, insn));
+
+		} else if (class == BPF_LDX) {
+			if (BPF_MODE(insn->code) != BPF_MEM ||
+			    insn->imm != 0) {
+				verbose("BPF_LDX uses reserved fields\n");
+				return -EINVAL;
+			}
+			/* check src operand */
+			_(check_reg_arg(regs, insn->src_reg, SRC_OP));
+
+			_(check_reg_arg(regs, insn->dst_reg, DST_OP_NO_MARK));
+
+			/* check that memory (src_reg + off) is readable,
+			 * the state of dst_reg will be updated by this func
+			 */
+			_(check_mem_access(env, insn->src_reg, insn->off,
+					   BPF_SIZE(insn->code), BPF_READ,
+					   insn->dst_reg));
+
+		} else if (class == BPF_STX) {
+			if (BPF_MODE(insn->code) == BPF_XADD) {
+				_(check_xadd(env, insn));
+				insn_idx++;
+				continue;
+			}
+
+			if (BPF_MODE(insn->code) != BPF_MEM ||
+			    insn->imm != 0) {
+				verbose("BPF_STX uses reserved fields\n");
+				return -EINVAL;
+			}
+			/* check src1 operand */
+			_(check_reg_arg(regs, insn->src_reg, SRC_OP));
+			/* check src2 operand */
+			_(check_reg_arg(regs, insn->dst_reg, SRC_OP));
+
+			/* check that memory (dst_reg + off) is writeable */
+			_(check_mem_access(env, insn->dst_reg, insn->off,
+					   BPF_SIZE(insn->code), BPF_WRITE,
+					   insn->src_reg));
+
+		} else if (class == BPF_ST) {
+			if (BPF_MODE(insn->code) != BPF_MEM ||
+			    insn->src_reg != BPF_REG_0) {
+				verbose("BPF_ST uses reserved fields\n");
+				return -EINVAL;
+			}
+			/* check src operand */
+			_(check_reg_arg(regs, insn->dst_reg, SRC_OP));
+
+			/* check that memory (dst_reg + off) is writeable */
+			_(check_mem_access(env, insn->dst_reg, insn->off,
+					   BPF_SIZE(insn->code), BPF_WRITE,
+					   -1));
+
+		} else if (class == BPF_JMP) {
+			u8 opcode = BPF_OP(insn->code);
+
+			if (opcode == BPF_CALL) {
+				if (BPF_SRC(insn->code) != BPF_K ||
+				    insn->off != 0 ||
+				    insn->src_reg != BPF_REG_0 ||
+				    insn->dst_reg != BPF_REG_0) {
+					verbose("BPF_CALL uses reserved fields\n");
+					return -EINVAL;
+				}
+
+				_(check_call(env, insn->imm));
+
+			} else if (opcode == BPF_JA) {
+				if (BPF_SRC(insn->code) != BPF_K ||
+				    insn->imm != 0 ||
+				    insn->src_reg != BPF_REG_0 ||
+				    insn->dst_reg != BPF_REG_0) {
+					verbose("BPF_JA uses reserved fields\n");
+					return -EINVAL;
+				}
+
+				insn_idx += insn->off + 1;
+				continue;
+
+			} else if (opcode == BPF_EXIT) {
+				if (BPF_SRC(insn->code) != BPF_K ||
+				    insn->imm != 0 ||
+				    insn->src_reg != BPF_REG_0 ||
+				    insn->dst_reg != BPF_REG_0) {
+					verbose("BPF_EXIT uses reserved fields\n");
+					return -EINVAL;
+				}
+
+				/* eBPF calling convention is such that R0 is used
+				 * to return the value from eBPF program.
+				 * Make sure that it's readable at this time
+				 * of bpf_exit, which means that program wrote
+				 * something into it earlier
+				 */
+				_(check_reg_arg(regs, BPF_REG_0, SRC_OP));
+				insn_idx = pop_stack(env, &prev_insn_idx);
+				if (insn_idx < 0) {
+					break;
+				} else {
+					do_print_state = true;
+					continue;
+				}
+			} else {
+				_(check_cond_jmp_op(env, insn, &insn_idx));
+			}
+		} else if (class == BPF_LD) {
+			u8 mode = BPF_MODE(insn->code);
+
+			if (mode == BPF_ABS || mode == BPF_IND) {
+				_(check_ld_abs(env, insn));
+			} else if (mode == BPF_IMM) {
+				_(check_ld_imm(env, insn));
+				insn_idx++;
+			} else {
+				verbose("invalid BPF_LD mode\n");
+				return -EINVAL;
+			}
+		} else {
+			verbose("unknown insn class %d\n", class);
+			return -EINVAL;
+		}
+
+		insn_idx++;
+	}
+
+	return 0;
+}
+
 /* look for pseudo eBPF instructions that access map FDs and
  * replace them with actual map pointers
  */
@@ -665,9 +1723,10 @@ int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 	if (ret < 0)
 		goto skip_full_check;
 
-	/* ret = do_check(env); */
+	ret = do_check(env);
 
 skip_full_check:
+	while (pop_stack(env, NULL) >= 0);
 
 	if (log_level && log_len >= log_size - 1) {
 		BUG_ON(log_len >= log_size);
-- 
1.7.9.5



* [PATCH v5 net-next 15/29] bpf: verifier (add state pruning optimization)
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (13 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 14/29] bpf: verifier (add verifier core) Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 16/29] bpf: allow eBPF programs to use maps Alexei Starovoitov
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

since the verifier walks all possible paths, it's important to recognize
equivalent verifier states to speed up verification.

If one of the previously explored states is stricter than the current state,
the current state doesn't need to be explored further, since the verifier
already concluded that the stricter state leads to a valid bpf_exit.
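
As a rough illustration (a hypothetical instruction stream, not taken from
this series), both paths below reach insn 3 with identical register state,
so whichever path arrives second is recognized as equivalent and pruned:

  0: r0 = 0                /* r0 becomes CONST_IMM 0 */
  1: if r1 != 0 goto +1    /* paths diverge; insn 3 is a branch target */
  2: r0 = 0                /* fall-through re-creates the same state */
  3: exit                  /* first visit is verified and its state saved;
                              the second visit matches it and is pruned */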

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 kernel/bpf/verifier.c |  134 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 14a0262b101c..49374066a5bd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -220,6 +220,7 @@ struct verifier_env {
 	struct verifier_stack_elem *head; /* stack of verifier states to be processed */
 	int stack_size;			/* number of states to be processed */
 	struct verifier_state cur_state; /* current verifier state */
+	struct verifier_state_list **branch_landing; /* search pruning optimization */
 	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
 	u32 used_map_cnt;		/* number of used maps */
 };
@@ -1256,6 +1257,8 @@ enum {
 	BRANCH = 2,
 };
 
+#define STATE_END ((struct verifier_state_list *) -1L)
+
 #define PUSH_INT(I) \
 	do { \
 		if (cur_stack >= insn_cnt) { \
@@ -1297,6 +1300,9 @@ enum {
 			ret = -EINVAL; \
 			goto free_st; \
 		} \
+		if (E == BRANCH) \
+			/* mark branch target for state pruning */ \
+			env->branch_landing[w] = STATE_END; \
 		if (st[w] == 0) { \
 			/* tree-edge */ \
 			st[T] = DISCOVERED | E; \
@@ -1364,6 +1370,10 @@ peek_stack:
 				PUSH_INSN(t, t + 1, FALLTHROUGH);
 				PUSH_INSN(t, t + insns[t].off + 1, BRANCH);
 			}
+			/* tell verifier to check for equivalent verifier states
+			 * after every call and jump
+			 */
+			env->branch_landing[t + 1] = STATE_END;
 		} else {
 			/* all other non-branch instructions with single
 			 * fall-through edge
@@ -1395,6 +1405,88 @@ free_st:
 	return ret;
 }
 
+/* compare two verifier states
+ *
+ * all states stored in state_list are known to be valid, since
+ * verifier reached 'bpf_exit' instruction through them
+ *
+ * this function is called when the verifier explores different branches of
+ * execution popped from the state stack. If it sees an old state that has
+ * a stricter register state and a stricter stack state, this execution
+ * branch doesn't need to be explored further, since the verifier already
+ * concluded that the stricter state leads to a valid finish.
+ *
+ * Therefore two states are equivalent if the explored register state and
+ * the explored stack state are more conservative than the current one.
+ * Example:
+ *       explored                   current
+ * (slot1=INV slot2=MISC) == (slot1=MISC slot2=MISC)
+ * (slot1=MISC slot2=MISC) != (slot1=INV slot2=MISC)
+ *
+ * In other words, if the current stack state (the one being explored) has
+ * more valid slots than an old one that already passed validation, the
+ * verifier can stop exploring and conclude that the current state is
+ * valid too.
+ *
+ * Similarly with registers. If the explored state has a register marked
+ * invalid where the register type in the current state is meaningful, the
+ * current state will reach the 'bpf_exit' instruction safely.
+ */
+static bool states_equal(struct verifier_state *old, struct verifier_state *cur)
+{
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++) {
+		if (memcmp(&old->regs[i], &cur->regs[i],
+			   sizeof(old->regs[0])) != 0) {
+			if (old->regs[i].type == NOT_INIT ||
+			    old->regs[i].type == UNKNOWN_VALUE)
+				continue;
+			return false;
+		}
+	}
+
+	for (i = 0; i < MAX_BPF_STACK; i++) {
+		if (memcmp(&old->stack[i], &cur->stack[i],
+			   sizeof(old->stack[0])) != 0) {
+			if (old->stack[i].stype == STACK_INVALID)
+				continue;
+			return false;
+		}
+	}
+	return true;
+}
+
+static int is_state_visited(struct verifier_env *env, int insn_idx)
+{
+	struct verifier_state_list *new_sl;
+	struct verifier_state_list *sl;
+
+	sl = env->branch_landing[insn_idx];
+	if (!sl)
+		/* no branch jump to this insn, ignore it */
+		return 0;
+
+	while (sl != STATE_END) {
+		if (states_equal(&sl->state, &env->cur_state))
+			/* reached equivalent register/stack state,
+			 * prune the search
+			 */
+			return 1;
+		sl = sl->next;
+	}
+	new_sl = kmalloc(sizeof(struct verifier_state_list), GFP_KERNEL);
+
+	if (!new_sl)
+		/* ignore ENOMEM, it doesn't affect correctness */
+		return 0;
+
+	/* add new state to the head of linked list */
+	memcpy(&new_sl->state, &env->cur_state, sizeof(env->cur_state));
+	new_sl->next = env->branch_landing[insn_idx];
+	env->branch_landing[insn_idx] = new_sl;
+	return 0;
+}
+
 static int do_check(struct verifier_env *env)
 {
 	struct verifier_state *state = &env->cur_state;
@@ -1426,6 +1518,17 @@ static int do_check(struct verifier_env *env)
 			return -E2BIG;
 		}
 
+		if (is_state_visited(env, insn_idx)) {
+			if (log_level) {
+				if (do_print_state)
+					verbose("\nfrom %d to %d: safe\n",
+						prev_insn_idx, insn_idx);
+				else
+					verbose("%d: safe\n", insn_idx);
+			}
+			goto process_bpf_exit;
+		}
+
 		if (log_level && do_print_state) {
 			verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx);
 			print_verifier_state(env);
@@ -1536,6 +1639,7 @@ static int do_check(struct verifier_env *env)
 				 * something into it earlier
 				 */
 				_(check_reg_arg(regs, BPF_REG_0, SRC_OP));
+process_bpf_exit:
 				insn_idx = pop_stack(env, &prev_insn_idx);
 				if (insn_idx < 0) {
 					break;
@@ -1670,6 +1774,28 @@ static void convert_pseudo_ld_imm64(struct verifier_env *env)
 			insn->src_reg = 0;
 }
 
+static void free_states(struct verifier_env *env)
+{
+	struct verifier_state_list *sl, *sln;
+	int i;
+
+	if (!env->branch_landing)
+		return;
+
+	for (i = 0; i < env->prog->len; i++) {
+		sl = env->branch_landing[i];
+
+		if (sl)
+			while (sl != STATE_END) {
+				sln = sl->next;
+				kfree(sl);
+				sl = sln;
+			}
+	}
+
+	kfree(env->branch_landing);
+}
+
 int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 {
 	void __user *log_ubuf = NULL;
@@ -1719,6 +1845,13 @@ int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 	if (ret < 0)
 		goto skip_full_check;
 
+	env->branch_landing = kcalloc(prog->len,
+				      sizeof(struct verifier_state_list *),
+				      GFP_KERNEL);
+	ret = -ENOMEM;
+	if (!env->branch_landing)
+		goto skip_full_check;
+
 	ret = check_cfg(env);
 	if (ret < 0)
 		goto skip_full_check;
@@ -1727,6 +1860,7 @@ int bpf_check(struct bpf_prog *prog, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1])
 
 skip_full_check:
 	while (pop_stack(env, NULL) >= 0);
+	free_states(env);
 
 	if (log_level && log_len >= log_size - 1) {
 		BUG_ON(log_len >= log_size);
-- 
1.7.9.5



* [PATCH v5 net-next 16/29] bpf: allow eBPF programs to use maps
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (14 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 15/29] bpf: verifier (add state pruning optimization) Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 17/29] bpf: split eBPF out of NET Alexei Starovoitov
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

expose bpf_map_lookup_elem(), bpf_map_update_elem(), bpf_map_delete_elem()
map accessors to eBPF programs
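
From the program side, a lookup is roughly the sketch below (eBPF
pseudo-assembly; it assumes the map fd was embedded with BPF_LD_IMM64 and
src_reg == BPF_PSEUDO_MAP_FD, which the loader rewrites to a map pointer):

  r1 = ld_imm64 map_fd        /* CONST_PTR_TO_MAP in the verifier */
  *(u32 *)(fp - 4) = 123      /* key prepared on the program stack */
  r2 = fp
  r2 += -4                    /* r2 = &key, a PTR_TO_STACK */
  call bpf_map_lookup_elem    /* r0 is PTR_TO_MAP_VALUE_OR_NULL */
  if r0 == 0 goto +N          /* only after this NULL check will the
                                 verifier let the program dereference r0 */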

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/linux/bpf.h      |    5 ++++
 include/uapi/linux/bpf.h |    3 ++
 kernel/bpf/syscall.c     |   68 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4d99c62c6cea..e811e294b59a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -132,4 +132,9 @@ struct bpf_prog *bpf_prog_get(u32 ufd);
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog *fp, struct nlattr *tb[BPF_PROG_ATTR_MAX + 1]);
 
+/* in-kernel helper functions called from eBPF programs */
+u64 bpf_map_lookup_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+u64 bpf_map_update_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+u64 bpf_map_delete_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 63284088aa48..8e4fd3f1c3ec 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -404,6 +404,9 @@ enum bpf_prog_type {
  */
 enum bpf_func_id {
 	BPF_FUNC_unspec,
+	BPF_FUNC_map_lookup_elem, /* void *map_lookup_elem(&map, &key) */
+	BPF_FUNC_map_update_elem, /* int map_update_elem(&map, &key, &value) */
+	BPF_FUNC_map_delete_elem, /* int map_delete_elem(&map, &key) */
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 60cb760cb423..009fe8c77a0b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -588,3 +588,71 @@ SYSCALL_DEFINE5(bpf, int, cmd, unsigned long, arg2, unsigned long, arg3,
 		return -EINVAL;
 	}
 }
+
+/* called from eBPF program under rcu lock
+ *
+ * if kernel subsystem is allowing eBPF programs to call this function,
+ * inside its own verifier_ops->get_func_proto() callback it should return
+ * (struct bpf_func_proto) {
+ *    .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ *    .arg1_type = ARG_CONST_MAP_PTR,
+ *    .arg2_type = ARG_PTR_TO_MAP_KEY,
+ * }
+ * so that eBPF verifier properly checks the arguments
+ */
+u64 bpf_map_lookup_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	void *key = (void *) (unsigned long) r2;
+	void *value;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	value = map->ops->map_lookup_elem(map, key);
+
+	return (unsigned long) value;
+}
+
+/* called from eBPF program under rcu lock
+ *
+ * if kernel subsystem is allowing eBPF programs to call this function,
+ * inside its own verifier_ops->get_func_proto() callback it should return
+ * (struct bpf_func_proto) {
+ *    .ret_type = RET_INTEGER,
+ *    .arg1_type = ARG_CONST_MAP_PTR,
+ *    .arg2_type = ARG_PTR_TO_MAP_KEY,
+ *    .arg3_type = ARG_PTR_TO_MAP_VALUE,
+ * }
+ * so that eBPF verifier properly checks the arguments
+ */
+u64 bpf_map_update_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	void *key = (void *) (unsigned long) r2;
+	void *value = (void *) (unsigned long) r3;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	return map->ops->map_update_elem(map, key, value);
+}
+
+/* called from eBPF program under rcu lock
+ *
+ * if kernel subsystem is allowing eBPF programs to call this function,
+ * inside its own verifier_ops->get_func_proto() callback it should return
+ * (struct bpf_func_proto) {
+ *    .ret_type = RET_INTEGER,
+ *    .arg1_type = ARG_CONST_MAP_PTR,
+ *    .arg2_type = ARG_PTR_TO_MAP_KEY,
+ * }
+ * so that eBPF verifier properly checks the arguments
+ */
+u64 bpf_map_delete_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	void *key = (void *) (unsigned long) r2;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	return map->ops->map_delete_elem(map, key);
+}
-- 
1.7.9.5



* [PATCH v5 net-next 17/29] bpf: split eBPF out of NET
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (15 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 16/29] bpf: allow eBPF programs to use maps Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 18/29] tracing: allow eBPF programs to be attached to events Alexei Starovoitov
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

let eBPF have its own CONFIG_BPF, so that tracing and other subsystems don't
need to depend on all of NET

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 arch/Kconfig      |    4 ++++
 kernel/Makefile   |    2 +-
 kernel/bpf/core.c |   12 ++++++++++++
 net/Kconfig       |    1 +
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 0eae9df35b88..98cb1838c8ff 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -333,6 +333,10 @@ config SECCOMP_FILTER
 
 	  See Documentation/prctl/seccomp_filter.txt for details.
 
+config BPF
+	select NLATTR
+	boolean
+
 config HAVE_CC_STACKPROTECTOR
 	bool
 	help
diff --git a/kernel/Makefile b/kernel/Makefile
index dc5c77544fd6..17ea6d4a9a24 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -86,7 +86,7 @@ obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
 obj-$(CONFIG_CPU_PM) += cpu_pm.o
-obj-$(CONFIG_NET) += bpf/
+obj-$(CONFIG_BPF) += bpf/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0434c2170f2b..c17ba0ef3dcf 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -537,3 +537,15 @@ void bpf_prog_free(struct bpf_prog *fp)
 	bpf_jit_free(fp);
 }
 EXPORT_SYMBOL_GPL(bpf_prog_free);
+
+/* To emulate LD_ABS/LD_IND instructions __sk_run_filter() may call
+ * skb_copy_bits(), so provide a weak definition for it in NET-less config.
+ * seccomp_check_filter() verifies that seccomp filters are not using
+ * LD_ABS/LD_IND instructions. Other BPF users (like tracing filters)
+ * must not use these instructions unless ctx==skb
+ */
+int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
+			 int len)
+{
+	return -EFAULT;
+}
diff --git a/net/Kconfig b/net/Kconfig
index 4051fdfa4367..9a99e16d6f28 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -6,6 +6,7 @@ menuconfig NET
 	bool "Networking support"
 	select NLATTR
 	select GENERIC_NET_UTILS
+	select BPF
 	---help---
 	  Unless you really know what you are doing, you should say Y here.
 	  The reason is that some programs need kernel networking support even
-- 
1.7.9.5



* [PATCH v5 net-next 18/29] tracing: allow eBPF programs to be attached to events
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (16 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 17/29] bpf: split eBPF out of NET Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 19/29] tracing: allow eBPF programs call printk() Alexei Starovoitov
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

User interface:
fd = open("/sys/kernel/debug/tracing/__event__/filter")

write(fd, "bpf_123")

where 123 is a process-local FD associated with a previously loaded eBPF
program. __event__ is a static tracepoint event or syscall.
(kprobe support is in the next patch)
Once the program is successfully attached to a tracepoint event, the
tracepoint will be auto-enabled.

close(fd)
auto-disables the tracepoint event and detaches the eBPF program from it.
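
A minimal user-space sketch of that flow (the event path and prog_fd are
illustrative assumptions; includes and error handling omitted):

	char cmd[32];
	int fd;

	fd = open("/sys/kernel/debug/tracing/events/skb/kfree_skb/filter",
		  O_WRONLY);
	snprintf(cmd, sizeof(cmd), "bpf_%d", prog_fd);
	write(fd, cmd, strlen(cmd));	/* attaches program, auto-enables event */
	/* ... tracepoint fires, eBPF program runs ... */
	close(fd);			/* detaches program, auto-disables event */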

eBPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- memcmp
- dump_stack
- fetch_ptr/u64/u32/u16/u8 values from an unsafe address via probe_kernel_read(),
  so that an eBPF program can walk any kernel data structures

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 fs/btrfs/super.c                   |    3 +
 include/linux/ftrace_event.h       |    5 +
 include/trace/bpf_trace.h          |   23 +++++
 include/trace/ftrace.h             |   25 +++++
 include/uapi/linux/bpf.h           |    8 ++
 kernel/trace/Kconfig               |    1 +
 kernel/trace/Makefile              |    1 +
 kernel/trace/bpf_trace.c           |  183 ++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.h               |    3 +
 kernel/trace/trace_events.c        |   41 +++++++-
 kernel/trace/trace_events_filter.c |   72 +++++++++++++-
 kernel/trace/trace_syscalls.c      |   32 +++++++
 12 files changed, 395 insertions(+), 2 deletions(-)
 create mode 100644 include/trace/bpf_trace.h
 create mode 100644 kernel/trace/bpf_trace.c

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c4124de4435b..274d20752e07 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -61,6 +61,9 @@
 #include "tests/btrfs-tests.h"
 
 #define CREATE_TRACE_POINTS
+/* dummy definition to make compiler happy */
+struct __btrfs_workqueue {
+};
 #include <trace/events/btrfs.h>
 
 static const struct super_operations btrfs_super_ops;
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 28672e87e910..10602bfa07fe 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -237,6 +237,7 @@ enum {
 	TRACE_EVENT_FL_WAS_ENABLED_BIT,
 	TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
 	TRACE_EVENT_FL_TRACEPOINT_BIT,
+	TRACE_EVENT_FL_BPF_BIT,
 };
 
 /*
@@ -259,6 +260,7 @@ enum {
 	TRACE_EVENT_FL_WAS_ENABLED	= (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
 	TRACE_EVENT_FL_USE_CALL_FILTER	= (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
 	TRACE_EVENT_FL_TRACEPOINT	= (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
+	TRACE_EVENT_FL_BPF		= (1 << TRACE_EVENT_FL_BPF_BIT),
 };
 
 struct ftrace_event_call {
@@ -533,6 +535,9 @@ event_trigger_unlock_commit_regs(struct ftrace_event_file *file,
 		event_triggers_post_call(file, tt);
 }
 
+struct bpf_context;
+void trace_filter_call_bpf(struct event_filter *filter, struct bpf_context *ctx);
+
 enum {
 	FILTER_OTHER = 0,
 	FILTER_STATIC_STRING,
diff --git a/include/trace/bpf_trace.h b/include/trace/bpf_trace.h
new file mode 100644
index 000000000000..6074cc917800
--- /dev/null
+++ b/include/trace/bpf_trace.h
@@ -0,0 +1,23 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _LINUX_KERNEL_BPF_TRACE_H
+#define _LINUX_KERNEL_BPF_TRACE_H
+
+/* For tracing filters save first six arguments of tracepoint events.
+ * argN fields match one to one to arguments passed to tracepoint events.
+ */
+struct bpf_context {
+	u64 arg1;
+	u64 arg2;
+	u64 arg3;
+	u64 arg4;
+	u64 arg5;
+	u64 arg6;
+	u64 ret;
+};
+
+#endif /* _LINUX_KERNEL_BPF_TRACE_H */
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 26b4f2e13275..4c7d59bdc24a 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -17,6 +17,7 @@
  */
 
 #include <linux/ftrace_event.h>
+#include <trace/bpf_trace.h>
 
 /*
  * DECLARE_EVENT_CLASS can be used to add a generic function
@@ -619,6 +620,20 @@ static inline notrace int ftrace_get_offsets_##call(			\
 #undef __perf_task
 #define __perf_task(t)	(t)
 
+/* cast any integer or pointer type to u64 without warnings */
+#define __CAST_TO_U64(expr) \
+	 __builtin_choose_expr(sizeof(long) < sizeof(expr), \
+			       (u64) (expr - ((typeof(expr))0)), \
+			       (u64) (long) expr)
+
+#define __BPF_CAST1(a,...) __CAST_TO_U64(a)
+#define __BPF_CAST2(a,...) __CAST_TO_U64(a), __BPF_CAST1(__VA_ARGS__)
+#define __BPF_CAST3(a,...) __CAST_TO_U64(a), __BPF_CAST2(__VA_ARGS__)
+#define __BPF_CAST4(a,...) __CAST_TO_U64(a), __BPF_CAST3(__VA_ARGS__)
+#define __BPF_CAST5(a,...) __CAST_TO_U64(a), __BPF_CAST4(__VA_ARGS__)
+#define __BPF_CAST6(a,...) __CAST_TO_U64(a), __BPF_CAST5(__VA_ARGS__)
+#define __BPF_CAST(a,...)  __CAST_TO_U64(a), __BPF_CAST6(__VA_ARGS__)
+
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
 									\
@@ -634,6 +649,16 @@ ftrace_raw_event_##call(void *__data, proto)				\
 	if (ftrace_trigger_soft_disabled(ftrace_file))			\
 		return;							\
 									\
+	if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) &&	\
+	    unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
+		struct bpf_context __ctx = ((struct bpf_context) {	\
+				__BPF_CAST(args, 0, 0, 0, 0, 0, 0)	\
+			});						\
+									\
+		trace_filter_call_bpf(ftrace_file->filter, &__ctx);	\
+		return;							\
+	}								\
+									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
 									\
 	entry = ftrace_event_buffer_reserve(&fbuffer, ftrace_file,	\
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8e4fd3f1c3ec..9a633c5b6b0a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -397,6 +397,7 @@ enum bpf_prog_attributes {
 enum bpf_prog_type {
 	BPF_PROG_TYPE_UNSPEC,
 	BPF_PROG_TYPE_SOCKET_FILTER,
+	BPF_PROG_TYPE_TRACING_FILTER,
 };
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -407,6 +408,13 @@ enum bpf_func_id {
 	BPF_FUNC_map_lookup_elem, /* void *map_lookup_elem(&map, &key) */
 	BPF_FUNC_map_update_elem, /* int map_update_elem(&map, &key, &value) */
 	BPF_FUNC_map_delete_elem, /* int map_delete_elem(&map, &key) */
+	BPF_FUNC_fetch_ptr,       /* void *bpf_fetch_ptr(void *unsafe_ptr) */
+	BPF_FUNC_fetch_u64,       /* u64 bpf_fetch_u64(void *unsafe_ptr) */
+	BPF_FUNC_fetch_u32,       /* u32 bpf_fetch_u32(void *unsafe_ptr) */
+	BPF_FUNC_fetch_u16,       /* u16 bpf_fetch_u16(void *unsafe_ptr) */
+	BPF_FUNC_fetch_u8,        /* u8 bpf_fetch_u8(void *unsafe_ptr) */
+	BPF_FUNC_memcmp,          /* int bpf_memcmp(void *unsafe_ptr, void *safe_ptr, int size) */
+	BPF_FUNC_dump_stack,      /* void bpf_dump_stack(void) */
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a5da09c899dd..038424e54443 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -75,6 +75,7 @@ config FTRACE_NMI_ENTER
 
 config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
+	select BPF
 	bool
 
 config CONTEXT_SWITCH_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 67d6369ddf83..fe897168a19e 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -53,6 +53,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
 endif
 obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
 obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
+obj-$(CONFIG_EVENT_TRACING) += bpf_trace.o
 obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
 obj-$(CONFIG_TRACEPOINTS) += power-traces.o
 ifeq ($(CONFIG_PM_RUNTIME),y)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
new file mode 100644
index 000000000000..b4751e2c0d52
--- /dev/null
+++ b/kernel/trace/bpf_trace.c
@@ -0,0 +1,183 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/uaccess.h>
+#include <trace/bpf_trace.h>
+#include "trace.h"
+
+static u64 bpf_fetch_ptr(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	void *unsafe_ptr = (void *) (long) r1;
+	void *ptr = NULL;
+
+	probe_kernel_read(&ptr, unsafe_ptr, sizeof(ptr));
+	return (u64) (unsigned long) ptr;
+}
+
+#define FETCH(SIZE) \
+static u64 bpf_fetch_##SIZE(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)	\
+{									\
+	void *unsafe_ptr = (void *) (long) r1;				\
+	SIZE val = 0;							\
+									\
+	probe_kernel_read(&val, unsafe_ptr, sizeof(val));		\
+	return (u64) (SIZE) val;					\
+}
+FETCH(u64)
+FETCH(u32)
+FETCH(u16)
+FETCH(u8)
+#undef FETCH
+
+static u64 bpf_memcmp(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	void *unsafe_ptr = (void *) (long) r1;
+	void *safe_ptr = (void *) (long) r2;
+	u32 size = (u32) r3;
+	char buf[64];
+	int err;
+
+	if (size < 64) {
+		err = probe_kernel_read(buf, unsafe_ptr, size);
+		if (err)
+			return err;
+		return memcmp(buf, safe_ptr, size);
+	}
+	return -1;
+}
+
+static u64 bpf_dump_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	trace_dump_stack(0);
+	return 0;
+}
+
+static struct bpf_func_proto tracing_filter_funcs[] = {
+#define FETCH(SIZE)				\
+	[BPF_FUNC_fetch_##SIZE] = {		\
+		.func = bpf_fetch_##SIZE,	\
+		.gpl_only = false,		\
+		.ret_type = RET_INTEGER,	\
+	},
+	FETCH(ptr)
+	FETCH(u64)
+	FETCH(u32)
+	FETCH(u16)
+	FETCH(u8)
+#undef FETCH
+	[BPF_FUNC_memcmp] = {
+		.func = bpf_memcmp,
+		.gpl_only = false,
+		.ret_type = RET_INTEGER,
+		.arg1_type = ARG_ANYTHING,
+		.arg2_type = ARG_PTR_TO_STACK,
+		.arg3_type = ARG_CONST_STACK_SIZE,
+	},
+	[BPF_FUNC_dump_stack] = {
+		.func = bpf_dump_stack,
+		.gpl_only = false,
+		.ret_type = RET_VOID,
+	},
+	[BPF_FUNC_map_lookup_elem] = {
+		.func = bpf_map_lookup_elem,
+		.gpl_only = false,
+		.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+		.arg1_type = ARG_CONST_MAP_PTR,
+		.arg2_type = ARG_PTR_TO_MAP_KEY,
+	},
+	[BPF_FUNC_map_update_elem] = {
+		.func = bpf_map_update_elem,
+		.gpl_only = false,
+		.ret_type = RET_INTEGER,
+		.arg1_type = ARG_CONST_MAP_PTR,
+		.arg2_type = ARG_PTR_TO_MAP_KEY,
+		.arg3_type = ARG_PTR_TO_MAP_VALUE,
+	},
+	[BPF_FUNC_map_delete_elem] = {
+		.func = bpf_map_delete_elem,
+		.gpl_only = false,
+		.ret_type = RET_INTEGER,
+		.arg1_type = ARG_CONST_MAP_PTR,
+		.arg2_type = ARG_PTR_TO_MAP_KEY,
+	},
+};
+
+static const struct bpf_func_proto *tracing_filter_func_proto(enum bpf_func_id func_id)
+{
+	if (func_id < 0 || func_id >= ARRAY_SIZE(tracing_filter_funcs))
+		return NULL;
+	return &tracing_filter_funcs[func_id];
+}
+
+static const struct bpf_context_access {
+	int size;
+	enum bpf_access_type type;
+} tracing_filter_ctx_access[] = {
+	[offsetof(struct bpf_context, arg1)] = {
+		FIELD_SIZEOF(struct bpf_context, arg1),
+		BPF_READ
+	},
+	[offsetof(struct bpf_context, arg2)] = {
+		FIELD_SIZEOF(struct bpf_context, arg2),
+		BPF_READ
+	},
+	[offsetof(struct bpf_context, arg3)] = {
+		FIELD_SIZEOF(struct bpf_context, arg3),
+		BPF_READ
+	},
+	[offsetof(struct bpf_context, arg4)] = {
+		FIELD_SIZEOF(struct bpf_context, arg4),
+		BPF_READ
+	},
+	[offsetof(struct bpf_context, arg5)] = {
+		FIELD_SIZEOF(struct bpf_context, arg5),
+		BPF_READ
+	},
+	[offsetof(struct bpf_context, arg6)] = {
+		FIELD_SIZEOF(struct bpf_context, arg6),
+		BPF_READ
+	},
+	[offsetof(struct bpf_context, ret)] = {
+		FIELD_SIZEOF(struct bpf_context, ret),
+		BPF_READ
+	},
+};
+
+static bool tracing_filter_is_valid_access(int off, int size, enum bpf_access_type type)
+{
+	const struct bpf_context_access *access;
+
+	if (off < 0 || off >= ARRAY_SIZE(tracing_filter_ctx_access))
+		return false;
+
+	access = &tracing_filter_ctx_access[off];
+	if (access->size == size && (access->type & type))
+		return true;
+
+	return false;
+}
+
+static struct bpf_verifier_ops tracing_filter_ops = {
+	.get_func_proto = tracing_filter_func_proto,
+	.is_valid_access = tracing_filter_is_valid_access,
+};
+
+static struct bpf_prog_type_list tl = {
+	.ops = &tracing_filter_ops,
+	.type = BPF_PROG_TYPE_TRACING_FILTER,
+};
+
+static int __init register_tracing_filter_ops(void)
+{
+	bpf_register_prog_type(&tl);
+	return 0;
+}
+late_initcall(register_tracing_filter_ops);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 385391fb1d3b..f0b7caa71b9d 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -986,12 +986,15 @@ struct ftrace_event_field {
 	int			is_signed;
 };
 
+struct bpf_prog;
+
 struct event_filter {
 	int			n_preds;	/* Number assigned */
 	int			a_preds;	/* allocated */
 	struct filter_pred	*preds;
 	struct filter_pred	*root;
 	char			*filter_string;
+	struct bpf_prog		*prog;
 };
 
 struct event_subsystem {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index ef06ce7e9cf8..181820f3a571 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1051,6 +1051,26 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
 	return r;
 }
 
+static int event_filter_release(struct inode *inode, struct file *filp)
+{
+	struct ftrace_event_file *file;
+	char buf[2] = "0";
+
+	mutex_lock(&event_mutex);
+	file = event_file_data(filp);
+	if (file) {
+		if (file->event_call->flags & TRACE_EVENT_FL_BPF) {
+			/* auto-disable the filter */
+			ftrace_event_enable_disable(file, 0);
+
+			/* if BPF filter was used, clear it on fd close */
+			apply_event_filter(file, buf);
+		}
+	}
+	mutex_unlock(&event_mutex);
+	return 0;
+}
+
 static ssize_t
 event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
 		   loff_t *ppos)
@@ -1074,10 +1094,28 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
 
 	mutex_lock(&event_mutex);
 	file = event_file_data(filp);
-	if (file)
+	if (file) {
+		/*
+		 * note to user space tools:
+		 * write() into debugfs/tracing/events/xxx/filter file
+		 * must be done with the same privilege level as open()
+		 */
 		err = apply_event_filter(file, buf);
+		if (!err && file->event_call->flags & TRACE_EVENT_FL_BPF)
+			/* once filter is applied, auto-enable it */
+			ftrace_event_enable_disable(file, 1);
+	}
+
 	mutex_unlock(&event_mutex);
 
+	if (file && file->event_call->flags & TRACE_EVENT_FL_BPF) {
+		/*
+		 * allocate per-cpu printk buffers, since eBPF program
+		 * might be calling bpf_trace_printk
+		 */
+		trace_printk_init_buffers();
+	}
+
 	free_page((unsigned long) buf);
 	if (err < 0)
 		return err;
@@ -1328,6 +1366,7 @@ static const struct file_operations ftrace_event_filter_fops = {
 	.open = tracing_open_generic,
 	.read = event_filter_read,
 	.write = event_filter_write,
+	.release = event_filter_release,
 	.llseek = default_llseek,
 };
 
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 7a8c1528e141..401fca436054 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -23,6 +23,9 @@
 #include <linux/mutex.h>
 #include <linux/perf_event.h>
 #include <linux/slab.h>
+#include <linux/bpf.h>
+#include <trace/bpf_trace.h>
+#include <linux/filter.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -535,6 +538,16 @@ static int filter_match_preds_cb(enum move_type move, struct filter_pred *pred,
 	return WALK_PRED_DEFAULT;
 }
 
+void trace_filter_call_bpf(struct event_filter *filter, struct bpf_context *ctx)
+{
+	BUG_ON(!filter || !filter->prog);
+
+	rcu_read_lock();
+	BPF_PROG_RUN(filter->prog, (void *) ctx);
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(trace_filter_call_bpf);
+
 /* return 1 if event matches, 0 otherwise (discard) */
 int filter_match_preds(struct event_filter *filter, void *rec)
 {
@@ -789,6 +802,8 @@ static void __free_filter(struct event_filter *filter)
 	if (!filter)
 		return;
 
+	if (filter->prog)
+		bpf_prog_put(filter->prog);
 	__free_preds(filter);
 	kfree(filter->filter_string);
 	kfree(filter);
@@ -1857,6 +1872,48 @@ static int create_filter_start(char *filter_str, bool set_str,
 	return err;
 }
 
+static int create_filter_bpf(char *filter_str, struct event_filter **filterp)
+{
+	struct event_filter *filter;
+	struct bpf_prog *prog;
+	long ufd;
+	int err = 0;
+
+	*filterp = NULL;
+
+	filter = __alloc_filter();
+	if (!filter)
+		return -ENOMEM;
+
+	err = replace_filter_string(filter, filter_str);
+	if (err)
+		goto free_filter;
+
+	err = kstrtol(filter_str + 4, 0, &ufd);
+	if (err)
+		goto free_filter;
+
+	err = -ESRCH;
+	prog = bpf_prog_get(ufd);
+	if (!prog)
+		goto free_filter;
+
+	filter->prog = prog;
+
+	err = -EINVAL;
+	if (prog->info->prog_type != BPF_PROG_TYPE_TRACING_FILTER)
+		/* prog fd is valid, but it's not a tracing filter program */
+		goto free_filter;
+
+	*filterp = filter;
+
+	return 0;
+
+free_filter:
+	__free_filter(filter);
+	return err;
+}
+
 static void create_filter_finish(struct filter_parse_state *ps)
 {
 	if (ps) {
@@ -1966,7 +2023,20 @@ int apply_event_filter(struct ftrace_event_file *file, char *filter_string)
 		return 0;
 	}
 
-	err = create_filter(call, filter_string, true, &filter);
+	/*
+	 * 'bpf_123' string is a request to attach eBPF program with fd == 123
+	 * also accept 'bpf 123', 'bpf.123', 'bpf-123' variants
+	 */
+	if (memcmp(filter_string, "bpf", 3) == 0 && filter_string[3] != 0 &&
+	    filter_string[4] != 0) {
+		err = create_filter_bpf(filter_string, &filter);
+		if (!err)
+			call->flags |= TRACE_EVENT_FL_BPF;
+	} else {
+		err = create_filter(call, filter_string, true, &filter);
+		if (!err)
+			call->flags &= ~TRACE_EVENT_FL_BPF;
+	}
 
 	/*
 	 * Always swap the call filter with the new filter
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 759d5e004517..7a3d0623763f 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -7,6 +7,7 @@
 #include <linux/ftrace.h>
 #include <linux/perf_event.h>
 #include <asm/syscall.h>
+#include <trace/bpf_trace.h>
 
 #include "trace_output.h"
 #include "trace.h"
@@ -299,6 +300,21 @@ static int __init syscall_exit_define_fields(struct ftrace_event_call *call)
 	return ret;
 }
 
+static void populate_bpf_ctx(struct bpf_context *ctx, struct pt_regs *regs)
+{
+	struct task_struct *task = current;
+	unsigned long args[6];
+
+	syscall_get_arguments(task, regs, 0, 6, args);
+	ctx->arg1 = args[0];
+	ctx->arg2 = args[1];
+	ctx->arg3 = args[2];
+	ctx->arg4 = args[3];
+	ctx->arg5 = args[4];
+	ctx->arg6 = args[5];
+	ctx->ret = syscall_get_return_value(task, regs);
+}
+
 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
 	struct trace_array *tr = data;
@@ -328,6 +344,14 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (!sys_data)
 		return;
 
+	if (ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF) {
+		struct bpf_context ctx;
+
+		populate_bpf_ctx(&ctx, regs);
+		trace_filter_call_bpf(ftrace_file->filter, &ctx);
+		return;
+	}
+
 	size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
 
 	local_save_flags(irq_flags);
@@ -375,6 +399,14 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
+	if (ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF) {
+		struct bpf_context ctx;
+
+		populate_bpf_ctx(&ctx, regs);
+		trace_filter_call_bpf(ftrace_file->filter, &ctx);
+		return;
+	}
+
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
-- 
1.7.9.5



* [PATCH v5 net-next 19/29] tracing: allow eBPF programs call printk()
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (17 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 18/29] tracing: allow eBPF programs to be attached to events Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 20/29] tracing: allow eBPF programs to be attached to kprobe/kretprobe Alexei Starovoitov
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

limited printk(): only %d %u %x %p conversion specifiers (and their
%l/%ll variants) are allowed
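
The format string has to live on the eBPF program stack and its size must
be a constant (ARG_PTR_TO_STACK plus ARG_CONST_STACK_SIZE in the proto
below). A restricted-C sketch, where the helper declaration follows the
convention of the sample programs and is an assumption here:

	static int (*bpf_printk)(const char *fmt, int fmt_size, ...) =
		(void *) BPF_FUNC_printk;

	char fmt[] = "ptr %p len %d\n";

	bpf_printk(fmt, sizeof(fmt), (long) ptr, len);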

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/bpf.h |    1 +
 kernel/trace/bpf_trace.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9a633c5b6b0a..5adaad826471 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -415,6 +415,7 @@ enum bpf_func_id {
 	BPF_FUNC_fetch_u8,        /* u8 bpf_fetch_u8(void *unsafe_ptr) */
 	BPF_FUNC_memcmp,          /* int bpf_memcmp(void *unsafe_ptr, void *safe_ptr, int size) */
 	BPF_FUNC_dump_stack,      /* void bpf_dump_stack(void) */
+	BPF_FUNC_printk,          /* int bpf_printk(const char *fmt, int fmt_size, ...) */
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b4751e2c0d52..ff98be5a24d6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -60,6 +60,60 @@ static u64 bpf_dump_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 	return 0;
 }
 
+/* limited printk()
+ * only %d %u %x %ld %lu %lx %lld %llu %llx %p conversion specifiers allowed
+ */
+static u64 bpf_printk(u64 r1, u64 fmt_size, u64 r3, u64 r4, u64 r5)
+{
+	char *fmt = (char *) (long) r1;
+	int fmt_cnt = 0;
+	bool mod_l[3] = {};
+	int i;
+
+	/* bpf_check() guarantees that fmt points to bpf program stack and
+	 * fmt_size bytes of it were initialized by bpf program
+	 */
+	if (fmt[fmt_size - 1] != 0)
+		return -EINVAL;
+
+	/* check format string for allowed specifiers */
+	for (i = 0; i < fmt_size; i++)
+		if (fmt[i] == '%') {
+			if (fmt_cnt >= 3)
+				return -EINVAL;
+			i++;
+			if (i >= fmt_size)
+				return -EINVAL;
+
+			if (fmt[i] == 'l') {
+				mod_l[fmt_cnt] = true;
+				i++;
+				if (i >= fmt_size)
+					return -EINVAL;
+			} else if (fmt[i] == 'p') {
+				mod_l[fmt_cnt] = true;
+				fmt_cnt++;
+				continue;
+			}
+
+			if (fmt[i] == 'l') {
+				mod_l[fmt_cnt] = true;
+				i++;
+				if (i >= fmt_size)
+					return -EINVAL;
+			}
+
+			if (fmt[i] != 'd' && fmt[i] != 'u' && fmt[i] != 'x')
+				return -EINVAL;
+			fmt_cnt++;
+		}
+
+	return __trace_printk((unsigned long) __builtin_return_address(3), fmt,
+			      mod_l[0] ? r3 : (u32) r3,
+			      mod_l[1] ? r4 : (u32) r4,
+			      mod_l[2] ? r5 : (u32) r5);
+}
+
 static struct bpf_func_proto tracing_filter_funcs[] = {
 #define FETCH(SIZE)				\
 	[BPF_FUNC_fetch_##SIZE] = {		\
@@ -86,6 +140,13 @@ static struct bpf_func_proto tracing_filter_funcs[] = {
 		.gpl_only = false,
 		.ret_type = RET_VOID,
 	},
+	[BPF_FUNC_printk] = {
+		.func = bpf_printk,
+		.gpl_only = true,
+		.ret_type = RET_INTEGER,
+		.arg1_type = ARG_PTR_TO_STACK,
+		.arg2_type = ARG_CONST_STACK_SIZE,
+	},
 	[BPF_FUNC_map_lookup_elem] = {
 		.func = bpf_map_lookup_elem,
 		.gpl_only = false,
-- 
1.7.9.5



* [PATCH v5 net-next 20/29] tracing: allow eBPF programs to be attached to kprobe/kretprobe
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (18 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 19/29] tracing: allow eBPF programs call printk() Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 21/29] tracing: allow eBPF programs to call ktime_get_ns() and get_current() Alexei Starovoitov
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 kernel/trace/trace_kprobe.c |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 282f6e4e5539..b6db92207c99 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -19,6 +19,7 @@
 
 #include <linux/module.h>
 #include <linux/uaccess.h>
+#include <trace/bpf_trace.h>
 
 #include "trace_probe.h"
 
@@ -930,6 +931,22 @@ __kprobe_trace_func(struct trace_kprobe *tk, struct pt_regs *regs,
 	if (ftrace_trigger_soft_disabled(ftrace_file))
 		return;
 
+	if (call->flags & TRACE_EVENT_FL_BPF) {
+		struct bpf_context ctx = {};
+		unsigned long args[3];
+		/* get first 3 arguments of the function. x64 syscall ABI uses
+		 * the same 3 registers as x64 calling convention.
+		 * todo: implement it cleanly via arch specific
+		 * regs_get_argument_nth() helper
+		 */
+		syscall_get_arguments(current, regs, 0, 3, args);
+		ctx.arg1 = args[0];
+		ctx.arg2 = args[1];
+		ctx.arg3 = args[2];
+		trace_filter_call_bpf(ftrace_file->filter, &ctx);
+		return;
+	}
+
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
@@ -978,6 +995,17 @@ __kretprobe_trace_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
 	if (ftrace_trigger_soft_disabled(ftrace_file))
 		return;
 
+	if (call->flags & TRACE_EVENT_FL_BPF) {
+		struct bpf_context ctx = {};
+		/* assume that register used to return a value from syscall is
+		 * the same as register used to return a value from a function
+		 * todo: provide arch specific helper
+		 */
+		ctx.ret = syscall_get_return_value(current, regs);
+		trace_filter_call_bpf(ftrace_file->filter, &ctx);
+		return;
+	}
+
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
-- 
1.7.9.5



* [PATCH v5 net-next 21/29] tracing: allow eBPF programs to call ktime_get_ns() and get_current()
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (19 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 20/29] tracing: allow eBPF programs to be attached to kprobe/kretprobe Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 22/29] samples: bpf: add mini eBPF library to manipulate maps and programs Alexei Starovoitov
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/bpf.h |    2 ++
 kernel/trace/bpf_trace.c |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5adaad826471..e866a2fcba8b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -416,6 +416,8 @@ enum bpf_func_id {
 	BPF_FUNC_memcmp,          /* int bpf_memcmp(void *unsafe_ptr, void *safe_ptr, int size) */
 	BPF_FUNC_dump_stack,      /* void bpf_dump_stack(void) */
 	BPF_FUNC_printk,          /* int bpf_printk(const char *fmt, int fmt_size, ...) */
+	BPF_FUNC_ktime_get_ns,    /* u64 bpf_ktime_get_ns(void) */
+	BPF_FUNC_get_current,     /* struct task_struct *bpf_get_current(void) */
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ff98be5a24d6..a98e13e1131b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -114,6 +114,16 @@ static u64 bpf_printk(u64 r1, u64 fmt_size, u64 r3, u64 r4, u64 r5)
 			      mod_l[2] ? r5 : (u32) r5);
 }
 
+static u64 bpf_ktime_get_ns(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	return ktime_get_ns();
+}
+
+static u64 bpf_get_current(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	return (u64) (long) current;
+}
+
 static struct bpf_func_proto tracing_filter_funcs[] = {
 #define FETCH(SIZE)				\
 	[BPF_FUNC_fetch_##SIZE] = {		\
@@ -169,6 +179,16 @@ static struct bpf_func_proto tracing_filter_funcs[] = {
 		.arg1_type = ARG_CONST_MAP_PTR,
 		.arg2_type = ARG_PTR_TO_MAP_KEY,
 	},
+	[BPF_FUNC_ktime_get_ns] = {
+		.func = bpf_ktime_get_ns,
+		.gpl_only = true,
+		.ret_type = RET_INTEGER,
+	},
+	[BPF_FUNC_get_current] = {
+		.func = bpf_get_current,
+		.gpl_only = true,
+		.ret_type = RET_INTEGER,
+	},
 };
 
 static const struct bpf_func_proto *tracing_filter_func_proto(enum bpf_func_id func_id)
-- 
1.7.9.5



* [PATCH v5 net-next 22/29] samples: bpf: add mini eBPF library to manipulate maps and programs
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (20 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 21/29] tracing: allow eBPF programs to call ktime_get_ns() and get_current() Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 23/29] samples: bpf: example of tracing filters with eBPF Alexei Starovoitov
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

the library includes a trivial set of BPF syscall wrappers:

int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
		   int max_entries);

int bpf_update_elem(int fd, void *key, void *value);

int bpf_lookup_elem(int fd, void *key, void *value);

int bpf_delete_elem(int fd, void *key);

int bpf_get_next_key(int fd, void *key, void *next_key);

int bpf_prog_load(enum bpf_prog_type prog_type,
		  const struct bpf_insn *insns, int insn_len,
		  const char *license);

bpf_prog_load() stores verifier log into global bpf_log_buf[] array
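
A minimal usage sketch built only from the wrappers above (the hash map
type comes from the earlier map patches; error handling omitted):

	int map_fd, key = 1;
	long long value = 1234, out;

	map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key),
				sizeof(value), 256);
	bpf_update_elem(map_fd, &key, &value);
	bpf_lookup_elem(map_fd, &key, &out);	/* out == 1234 */
	bpf_delete_elem(map_fd, &key);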

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 samples/bpf/libbpf.c |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/libbpf.h |   21 ++++++++
 2 files changed, 159 insertions(+)
 create mode 100644 samples/bpf/libbpf.c
 create mode 100644 samples/bpf/libbpf.h

diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
new file mode 100644
index 000000000000..81396dab5caa
--- /dev/null
+++ b/samples/bpf/libbpf.c
@@ -0,0 +1,138 @@
+/* eBPF mini library */
+#include <stdlib.h>
+#include <stdio.h>
+#include <linux/unistd.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/netlink.h>
+#include <linux/bpf.h>
+#include <errno.h>
+#include "libbpf.h"
+
+struct nlattr_u32 {
+	__u16 nla_len;
+	__u16 nla_type;
+	__u32 val;
+};
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+		   int max_entries)
+{
+	struct nlattr_u32 attr[] = {
+		{
+			.nla_len = sizeof(struct nlattr_u32),
+			.nla_type = BPF_MAP_KEY_SIZE,
+			.val = key_size,
+		},
+		{
+			.nla_len = sizeof(struct nlattr_u32),
+			.nla_type = BPF_MAP_VALUE_SIZE,
+			.val = value_size,
+		},
+		{
+			.nla_len = sizeof(struct nlattr_u32),
+			.nla_type = BPF_MAP_MAX_ENTRIES,
+			.val = max_entries,
+		},
+	};
+
+	return syscall(__NR_bpf, BPF_MAP_CREATE, map_type, attr, sizeof(attr), 0);
+}
+
+
+int bpf_update_elem(int fd, void *key, void *value)
+{
+	return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, fd, key, value, 0);
+}
+
+int bpf_lookup_elem(int fd, void *key, void *value)
+{
+	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, fd, key, value, 0);
+}
+
+int bpf_delete_elem(int fd, void *key)
+{
+	return syscall(__NR_bpf, BPF_MAP_DELETE_ELEM, fd, key, 0, 0);
+}
+
+int bpf_get_next_key(int fd, void *key, void *next_key)
+{
+	return syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, fd, key, next_key, 0);
+}
+
+#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
+
+char bpf_log_buf[LOG_BUF_SIZE];
+
+int bpf_prog_load(enum bpf_prog_type prog_type,
+		  const struct bpf_insn *insns, int prog_len,
+		  const char *license)
+{
+	int nlattr_size, license_len, err;
+	void *nlattr, *ptr;
+	char *log_buf = bpf_log_buf;
+	int log_size = LOG_BUF_SIZE;
+	int log_level = 1;
+
+	log_buf[0] = 0;
+
+	license_len = strlen(license) + 1;
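+
+	/* netlink-style attribute layout, in order: BPF_PROG_TEXT,
+	 * BPF_PROG_LICENSE, BPF_PROG_LOG_LEVEL, BPF_PROG_LOG_BUF,
+	 * BPF_PROG_LOG_SIZE
+	 */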
+	nlattr_size = sizeof(struct nlattr) + prog_len + sizeof(struct nlattr) +
+		ROUND_UP(license_len, 4) +
+		sizeof(struct nlattr) + sizeof(log_level) +
+		sizeof(struct nlattr) + sizeof(log_buf) +
+		sizeof(struct nlattr) + sizeof(log_size);
+
+	ptr = nlattr = malloc(nlattr_size);
+	if (!ptr) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	*(struct nlattr *) ptr = (struct nlattr) {
+		.nla_len = prog_len + sizeof(struct nlattr),
+		.nla_type = BPF_PROG_TEXT,
+	};
+	ptr += sizeof(struct nlattr);
+
+	memcpy(ptr, insns, prog_len);
+	ptr += prog_len;
+
+	*(struct nlattr *) ptr = (struct nlattr) {
+		.nla_len = ROUND_UP(license_len, 4) + sizeof(struct nlattr),
+		.nla_type = BPF_PROG_LICENSE,
+	};
+	ptr += sizeof(struct nlattr);
+
+	memcpy(ptr, license, license_len);
+	ptr += ROUND_UP(license_len, 4);
+
+	*(struct nlattr *) ptr = (struct nlattr) {
+		.nla_len = sizeof(log_level) + sizeof(struct nlattr),
+		.nla_type = BPF_PROG_LOG_LEVEL,
+	};
+	ptr += sizeof(struct nlattr);
+	memcpy(ptr, &log_level, sizeof(log_level));
+	ptr += sizeof(log_level);
+
+	*(struct nlattr *) ptr = (struct nlattr) {
+		.nla_len = sizeof(log_buf) + sizeof(struct nlattr),
+		.nla_type = BPF_PROG_LOG_BUF,
+	};
+	ptr += sizeof(struct nlattr);
+	memcpy(ptr, &log_buf, sizeof(log_buf));
+	ptr += sizeof(log_buf);
+
+	*(struct nlattr *) ptr = (struct nlattr) {
+		.nla_len = sizeof(log_size) + sizeof(struct nlattr),
+		.nla_type = BPF_PROG_LOG_SIZE,
+	};
+	ptr += sizeof(struct nlattr);
+	memcpy(ptr, &log_size, sizeof(log_size));
+	ptr += sizeof(log_size);
+
+	err = syscall(__NR_bpf, BPF_PROG_LOAD, prog_type, nlattr, nlattr_size, 0);
+
+	free(nlattr);
+	return err;
+}
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
new file mode 100644
index 000000000000..b19e39794291
--- /dev/null
+++ b/samples/bpf/libbpf.h
@@ -0,0 +1,21 @@
+/* eBPF mini library */
+#ifndef __LIBBPF_H
+#define __LIBBPF_H
+
+struct bpf_insn;
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+		   int max_entries);
+int bpf_update_elem(int fd, void *key, void *value);
+int bpf_lookup_elem(int fd, void *key, void *value);
+int bpf_delete_elem(int fd, void *key);
+int bpf_get_next_key(int fd, void *key, void *next_key);
+
+int bpf_prog_load(enum bpf_prog_type prog_type,
+		  const struct bpf_insn *insns, int insn_len,
+		  const char *license);
+
+#define LOG_BUF_SIZE 8192
+extern char bpf_log_buf[LOG_BUF_SIZE];
+
+#endif
-- 
1.7.9.5



* [PATCH v5 net-next 23/29] samples: bpf: example of tracing filters with eBPF
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (21 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 22/29] samples: bpf: add mini eBPF library to manipulate maps and programs Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 24/29] bpf: verifier test Alexei Starovoitov
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

simple packet drop monitor:
- in-kernel eBPF program attaches to the kfree_skb() event and records the
  number of packet drops at each call location
- userspace iterates over the map every second and prints stats

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 samples/bpf/Makefile  |   12 +++++
 samples/bpf/dropmon.c |  122 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 134 insertions(+)
 create mode 100644 samples/bpf/Makefile
 create mode 100644 samples/bpf/dropmon.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
new file mode 100644
index 000000000000..8dfbacaef9ba
--- /dev/null
+++ b/samples/bpf/Makefile
@@ -0,0 +1,12 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+
+# List of programs to build
+hostprogs-y := dropmon
+
+dropmon-objs := dropmon.o libbpf.o
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS += -I$(objtree)/usr/include
diff --git a/samples/bpf/dropmon.c b/samples/bpf/dropmon.c
new file mode 100644
index 000000000000..3b921e2f4c8f
--- /dev/null
+++ b/samples/bpf/dropmon.c
@@ -0,0 +1,122 @@
+/* simple packet drop monitor:
+ * - in-kernel eBPF program attaches to the kfree_skb() event and records
+ *   the number of packet drops at each call location
+ * - userspace iterates over the map every second and prints stats
+ */
+#include <stdio.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+#include <errno.h>
+#include <linux/unistd.h>
+#include <string.h>
+#include <linux/filter.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include "libbpf.h"
+
+#define TRACEPOINT "/sys/kernel/debug/tracing/events/skb/kfree_skb/"
+
+static int write_to_file(const char *file, const char *str, bool keep_open)
+{
+	int fd, err;
+
+	fd = open(file, O_WRONLY);
+	if (fd < 0)
+		return -1;
+
+	err = write(fd, str, strlen(str));
+	(void) err;
+
+	if (keep_open) {
+		return fd;
+	} else {
+		close(fd);
+		return -1;
+	}
+}
+
+static int dropmon(void)
+{
+	/* the following eBPF program is equivalent to C:
+	 * void filter(struct bpf_context *ctx)
+	 * {
+	 *   long loc = ctx->arg2;
+	 *   long init_val = 1;
+	 *   void *value;
+	 *
+	 *   value = bpf_map_lookup_elem(MAP_ID, &loc);
+	 *   if (value) {
+	 *      (*(long *) value) += 1;
+	 *   } else {
+	 *      bpf_map_update_elem(MAP_ID, &loc, &init_val);
+	 *   }
+	 * }
+	 */
+	static struct bpf_insn prog[] = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), /* r2 = *(u64 *)(r1 + 8) */
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -8), /* *(u64 *)(fp - 8) = r2 */
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), /* r2 = fp - 8 */
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+		BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
+		BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+		BPF_EXIT_INSN(),
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 1), /* *(u64 *)(fp - 16) = 1 */
+		BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -16), /* r3 = fp - 16 */
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), /* r2 = fp - 8 */
+		BPF_LD_MAP_FD(BPF_REG_1, 0),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_update_elem),
+		BPF_EXIT_INSN(),
+	};
+
+	long long key, next_key, value = 0;
+	int prog_fd, map_fd, i;
+	char fmt[32];
+
+	map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 1024);
+	if (map_fd < 0) {
+		printf("failed to create map '%s'\n", strerror(errno));
+		goto cleanup;
+	}
+
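+	/* patch both BPF_LD_MAP_FD instructions with the real map fd */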
+	prog[4].imm = map_fd;
+	prog[16].imm = map_fd;
+
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACING_FILTER, prog,
+				sizeof(prog), "GPL");
+	if (prog_fd < 0) {
+		printf("failed to load prog '%s'\n", strerror(errno));
+		return -1;
+	}
+
+	sprintf(fmt, "bpf_%d", prog_fd);
+
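+	/* the kernel treats "bpf_<prog_fd>" written into a tracepoint's
+	 * filter file as a request to attach that eBPF program
+	 */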
+	write_to_file(TRACEPOINT "filter", fmt, true);
+
+	for (i = 0; i < 10; i++) {
+		key = 0;
+		while (bpf_get_next_key(map_fd, &key, &next_key) == 0) {
+			bpf_lookup_elem(map_fd, &next_key, &value);
+			printf("location 0x%llx count %lld\n", next_key, value);
+			key = next_key;
+		}
+		if (key)
+			printf("\n");
+		sleep(1);
+	}
+
+cleanup:
+	/* maps, programs, tracepoint filters will auto cleanup on process exit */
+
+	return 0;
+}
+
+int main(void)
+{
+	dropmon();
+	return 0;
+}
-- 
1.7.9.5



* [PATCH v5 net-next 24/29] bpf: verifier test
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (22 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 23/29] samples: bpf: example of tracing filters with eBPF Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 25/29] bpf: llvm backend Alexei Starovoitov
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

verifier testsuite run from user space. It loads both valid and invalid
programs and expects predefined error messages from the kernel's verifier log.
42 tests so far.

$ sudo ./test_verifier
 #0 add+sub+mul OK
 #1 dropmon OK
 #2 dropmon with non-looping jump back OK
 #3 unreachable OK
 ...
 #40 access memory with incorrect alignment OK
 #41 sometimes access memory with incorrect alignment OK

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 samples/bpf/Makefile        |    3 +-
 samples/bpf/test_verifier.c |  599 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 601 insertions(+), 1 deletion(-)
 create mode 100644 samples/bpf/test_verifier.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 8dfbacaef9ba..7264ba51d496 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -2,9 +2,10 @@
 obj- := dummy.o
 
 # List of programs to build
-hostprogs-y := dropmon
+hostprogs-y := dropmon test_verifier
 
 dropmon-objs := dropmon.o libbpf.o
+test_verifier-objs := test_verifier.o libbpf.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
diff --git a/samples/bpf/test_verifier.c b/samples/bpf/test_verifier.c
new file mode 100644
index 000000000000..aa70aa33d84c
--- /dev/null
+++ b/samples/bpf/test_verifier.c
@@ -0,0 +1,599 @@
+/*
+ * Testsuite for eBPF verifier
+ *
+ * Copyright (c) 2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <stdio.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+#include <errno.h>
+#include <linux/unistd.h>
+#include <string.h>
+#include <linux/filter.h>
+#include "libbpf.h"
+
+#define MAX_INSNS 512
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+
+struct bpf_test {
+	const char *descr;
+	struct bpf_insn	insns[MAX_INSNS];
+	int fixup[32]; /* insn indices to patch with a map fd; 0 terminates */
+	const char *errstr;
+	enum {
+		ACCEPT,
+		REJECT
+	} result;
+};
+
+static struct bpf_test tests[] = {
+	{
+		"add+sub+mul",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_1, 1),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 2),
+			BPF_MOV64_IMM(BPF_REG_2, 3),
+			BPF_ALU64_REG(BPF_SUB, BPF_REG_1, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -1),
+			BPF_ALU64_IMM(BPF_MUL, BPF_REG_1, 3),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+	},
+	{
+		"dropmon",
+		.insns = {
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), /* r2 = *(u64 *)(r1 + 8) */
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -8), /* *(u64 *)(fp - 8) = r2 */
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), /* r2 = fp - 8 */
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
+			BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+			BPF_EXIT_INSN(),
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 1), /* *(u64 *)(fp - 16) = 1 */
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -16), /* r3 = fp - 16 */
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), /* r2 = fp - 8 */
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_update_elem),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {4, 16},
+		.result = ACCEPT,
+	},
+	{
+		"dropmon with non-looping jump back",
+		.insns = {
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), /* r2 = *(u64 *)(r1 + 8) */
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -8), /* *(u64 *)(fp - 8) = r2 */
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), /* r2 = fp - 8 */
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
+			BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+			BPF_EXIT_INSN(),
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 1), /* *(u64 *)(fp - 16) = 1 */
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -16), /* r3 = fp - 16 */
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), /* r2 = fp - 8 */
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_update_elem),
+			BPF_JMP_IMM(BPF_JA, 0, 0, -10),
+		},
+		.fixup = {4, 16},
+		.result = ACCEPT,
+	},
+	{
+		"unreachable",
+		.insns = {
+			BPF_EXIT_INSN(),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "unreachable",
+		.result = REJECT,
+	},
+	{
+		"unreachable2",
+		.insns = {
+			BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "unreachable",
+		.result = REJECT,
+	},
+	{
+		"out of range jump",
+		.insns = {
+			BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "jump out of range",
+		.result = REJECT,
+	},
+	{
+		"out of range jump2",
+		.insns = {
+			BPF_JMP_IMM(BPF_JA, 0, 0, -2),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "jump out of range",
+		.result = REJECT,
+	},
+	{
+		"test1 ld_imm64",
+		.insns = {
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
+			BPF_LD_IMM64(BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 1),
+			BPF_LD_IMM64(BPF_REG_0, 1),
+			BPF_MOV64_IMM(BPF_REG_0, 2),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid BPF_LD_IMM insn",
+		.result = REJECT,
+	},
+	{
+		"test2 ld_imm64",
+		.insns = {
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
+			BPF_LD_IMM64(BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 1),
+			BPF_LD_IMM64(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid BPF_LD_IMM insn",
+		.result = REJECT,
+	},
+	{
+		"test3 ld_imm64",
+		.insns = {
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
+			BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 0),
+			BPF_LD_IMM64(BPF_REG_0, 1),
+			BPF_LD_IMM64(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid bpf_ld_imm64 insn",
+		.result = REJECT,
+	},
+	{
+		"test4 ld_imm64",
+		.insns = {
+			BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid bpf_ld_imm64 insn",
+		.result = REJECT,
+	},
+	{
+		"test5 ld_imm64",
+		.insns = {
+			BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, 0, 0, 0),
+		},
+		.errstr = "invalid bpf_ld_imm64 insn",
+		.result = REJECT,
+	},
+	{
+		"no bpf_exit",
+		.insns = {
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_0, BPF_REG_2),
+		},
+		.errstr = "jump out of range",
+		.result = REJECT,
+	},
+	{
+		"loop (back-edge)",
+		.insns = {
+			BPF_JMP_IMM(BPF_JA, 0, 0, -1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "back-edge",
+		.result = REJECT,
+	},
+	{
+		"loop2 (back-edge)",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JA, 0, 0, -4),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "back-edge",
+		.result = REJECT,
+	},
+	{
+		"conditional loop",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+			BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, -3),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "back-edge",
+		.result = REJECT,
+	},
+	{
+		"read uninitialized register",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R2 !read_ok",
+		.result = REJECT,
+	},
+	{
+		"read invalid register",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_0, -1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R15 is invalid",
+		.result = REJECT,
+	},
+	{
+		"program doesn't init R0 before exit",
+		.insns = {
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_2, BPF_REG_1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R0 !read_ok",
+		.result = REJECT,
+	},
+	{
+		"stack out of bounds",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid stack",
+		.result = REJECT,
+	},
+	{
+		"invalid call insn1",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL | BPF_X, 0, 0, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "BPF_CALL uses reserved",
+		.result = REJECT,
+	},
+	{
+		"invalid call insn2",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 1, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "BPF_CALL uses reserved",
+		.result = REJECT,
+	},
+	{
+		"invalid function call",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 1234567),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid func 1234567",
+		.result = REJECT,
+	},
+	{
+		"uninitialized stack1",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {2},
+		.errstr = "invalid indirect read from stack",
+		.result = REJECT,
+	},
+	{
+		"uninitialized stack2",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, -8),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid read from stack",
+		.result = REJECT,
+	},
+	{
+		"check valid spill/fill",
+		.insns = {
+			/* spill R1(ctx) into stack */
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+
+			/* fill it back into R2 */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -8),
+
+			/* should be able to access R0 = *(R2 + 8) */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 8),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+	},
+	{
+		"check corrupted spill/fill",
+		.insns = {
+			/* spill R1(ctx) into stack */
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+
+			/* mess up with R1 pointer on stack */
+			BPF_ST_MEM(BPF_B, BPF_REG_10, -7, 0x23),
+
+			/* fill back into R0 should fail */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "corrupted spill",
+		.result = REJECT,
+	},
+	{
+		"invalid src register in STX",
+		.insns = {
+			BPF_STX_MEM(BPF_B, BPF_REG_10, -1, -1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R15 is invalid",
+		.result = REJECT,
+	},
+	{
+		"invalid dst register in STX",
+		.insns = {
+			BPF_STX_MEM(BPF_B, 14, BPF_REG_10, -1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R14 is invalid",
+		.result = REJECT,
+	},
+	{
+		"invalid dst register in ST",
+		.insns = {
+			BPF_ST_MEM(BPF_B, 14, -1, -1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R14 is invalid",
+		.result = REJECT,
+	},
+	{
+		"invalid src register in LDX",
+		.insns = {
+			BPF_LDX_MEM(BPF_B, BPF_REG_0, 12, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R12 is invalid",
+		.result = REJECT,
+	},
+	{
+		"invalid dst register in LDX",
+		.insns = {
+			BPF_LDX_MEM(BPF_B, 11, BPF_REG_1, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R11 is invalid",
+		.result = REJECT,
+	},
+	{
+		"junk insn",
+		.insns = {
+			BPF_RAW_INSN(0, 0, 0, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid BPF_LD_IMM",
+		.result = REJECT,
+	},
+	{
+		"junk insn2",
+		.insns = {
+			BPF_RAW_INSN(1, 0, 0, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "BPF_LDX uses reserved fields",
+		.result = REJECT,
+	},
+	{
+		"junk insn3",
+		.insns = {
+			BPF_RAW_INSN(-1, 0, 0, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid BPF_ALU opcode f0",
+		.result = REJECT,
+	},
+	{
+		"junk insn4",
+		.insns = {
+			BPF_RAW_INSN(-1, -1, -1, -1, -1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid BPF_ALU opcode f0",
+		.result = REJECT,
+	},
+	{
+		"junk insn5",
+		.insns = {
+			BPF_RAW_INSN(0x7f, -1, -1, -1, -1),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "BPF_ALU uses reserved fields",
+		.result = REJECT,
+	},
+	{
+		"misaligned read from stack",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, -4),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "misaligned access",
+		.result = REJECT,
+	},
+	{
+		"invalid map_fd for function call",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_ALU64_REG(BPF_MOV, BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "fd 0 is not pointing to valid bpf_map",
+		.result = REJECT,
+	},
+	{
+		"don't check return value before access",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {3},
+		.errstr = "R0 invalid mem access 'map_value_or_null'",
+		.result = REJECT,
+	},
+	{
+		"access memory with incorrect alignment",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {3},
+		.errstr = "misaligned access",
+		.result = REJECT,
+	},
+	{
+		"sometimes access memory with incorrect alignment",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
+			BPF_EXIT_INSN(),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.fixup = {3},
+		.errstr = "R0 invalid mem access",
+		.result = REJECT,
+	},
+};
+
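+/* scan backwards past trailing all-zero insns to find the program length */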
+static int probe_filter_length(struct bpf_insn *fp)
+{
+	int len = 0;
+
+	for (len = MAX_INSNS - 1; len > 0; --len)
+		if (fp[len].code != 0 || fp[len].imm != 0)
+			break;
+
+	return len + 1;
+}
+
+static int create_map(void)
+{
+	long long key, value = 0;
+	int map_fd;
+
+	map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 1024);
+	if (map_fd < 0) {
+		printf("failed to create map '%s'\n", strerror(errno));
+	}
+
+	return map_fd;
+}
+
+static int test(void)
+{
+	int prog_fd, i;
+
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		struct bpf_insn *prog = tests[i].insns;
+		int prog_len = probe_filter_length(prog);
+		int *fixup = tests[i].fixup;
+		int map_fd = -1;
+
+		if (*fixup) {
+
+			map_fd = create_map();
+
+			do {
+				prog[*fixup].imm = map_fd;
+				fixup++;
+			} while (*fixup);
+		}
+		printf("#%d %s ", i, tests[i].descr);
+
+		prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACING_FILTER, prog,
+					prog_len * sizeof(struct bpf_insn),
+					"GPL");
+
+		if (tests[i].result == ACCEPT) {
+			if (prog_fd < 0) {
+				printf("FAIL\nfailed to load prog '%s'\n",
+				       strerror(errno));
+				printf("%s", bpf_log_buf);
+				goto fail;
+			}
+		} else {
+			if (prog_fd >= 0) {
+				printf("FAIL\nunexpected success to load\n");
+				printf("%s", bpf_log_buf);
+				goto fail;
+			}
+			if (strstr(bpf_log_buf, tests[i].errstr) == 0) {
+				printf("FAIL\nunexpected error message: %s",
+				       bpf_log_buf);
+				goto fail;
+			}
+		}
+
+		printf("OK\n");
+fail:
+		if (map_fd >= 0)
+			close(map_fd);
+		if (prog_fd >= 0)
+			close(prog_fd);
+
+	}
+
+	return 0;
+}
+
+int main(void)
+{
+	return test();
+}
-- 
1.7.9.5



* [PATCH v5 net-next 25/29] bpf: llvm backend
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (23 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 24/29] bpf: verifier test Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 26/29] samples: bpf: elf file loader Alexei Starovoitov
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

llvm backend that generates eBPF and emits either
a binary ELF object or human-readable eBPF assembler
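
For example, with a hypothetical filter.c (usage is detailed in the README
added below):

$clang -O2 -emit-llvm -c filter.c -o -|llc -o filter.o
$clang -O2 -emit-llvm -c filter.c -o -|llc -filetype=asm -o filter.s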

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 tools/bpf/llvm/.gitignore                          |   54 ++
 tools/bpf/llvm/LICENSE.TXT                         |   70 ++
 tools/bpf/llvm/Makefile.rules                      |  641 ++++++++++++++++++
 tools/bpf/llvm/README.txt                          |   23 +
 tools/bpf/llvm/bld/Makefile                        |   27 +
 tools/bpf/llvm/bld/Makefile.common                 |   14 +
 tools/bpf/llvm/bld/Makefile.config                 |  124 ++++
 .../llvm/bld/include/llvm/Config/AsmParsers.def    |    8 +
 .../llvm/bld/include/llvm/Config/AsmPrinters.def   |    9 +
 .../llvm/bld/include/llvm/Config/Disassemblers.def |    8 +
 tools/bpf/llvm/bld/include/llvm/Config/Targets.def |    9 +
 .../bpf/llvm/bld/include/llvm/Support/DataTypes.h  |   96 +++
 tools/bpf/llvm/bld/lib/Makefile                    |   11 +
 .../llvm/bld/lib/Target/BPF/InstPrinter/Makefile   |   10 +
 .../llvm/bld/lib/Target/BPF/MCTargetDesc/Makefile  |   11 +
 tools/bpf/llvm/bld/lib/Target/BPF/Makefile         |   17 +
 .../llvm/bld/lib/Target/BPF/TargetInfo/Makefile    |   10 +
 tools/bpf/llvm/bld/lib/Target/Makefile             |   11 +
 tools/bpf/llvm/bld/tools/Makefile                  |   12 +
 tools/bpf/llvm/bld/tools/llc/Makefile              |   15 +
 tools/bpf/llvm/lib/Target/BPF/BPF.h                |   28 +
 tools/bpf/llvm/lib/Target/BPF/BPF.td               |   29 +
 tools/bpf/llvm/lib/Target/BPF/BPFAsmPrinter.cpp    |  100 +++
 tools/bpf/llvm/lib/Target/BPF/BPFCallingConv.td    |   24 +
 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.cpp |   36 ++
 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.h   |   35 +
 tools/bpf/llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp  |  182 ++++++
 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.cpp  |  683 ++++++++++++++++++++
 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.h    |  105 +++
 tools/bpf/llvm/lib/Target/BPF/BPFInstrFormats.td   |   29 +
 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.cpp     |  162 +++++
 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.h       |   53 ++
 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.td      |  498 ++++++++++++++
 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.cpp   |   77 +++
 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.h     |   40 ++
 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.cpp  |  122 ++++
 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.h    |   65 ++
 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.td   |   39 ++
 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.cpp     |   23 +
 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.h       |   33 +
 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.cpp |   66 ++
 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.h   |   69 ++
 .../lib/Target/BPF/InstPrinter/BPFInstPrinter.cpp  |   81 +++
 .../lib/Target/BPF/InstPrinter/BPFInstPrinter.h    |   34 +
 .../lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp  |   89 +++
 .../llvm/lib/Target/BPF/MCTargetDesc/BPFBaseInfo.h |   33 +
 .../Target/BPF/MCTargetDesc/BPFELFObjectWriter.cpp |   56 ++
 .../lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h     |   34 +
 .../Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp   |  167 +++++
 .../Target/BPF/MCTargetDesc/BPFMCTargetDesc.cpp    |  115 ++++
 .../lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.h  |   56 ++
 .../lib/Target/BPF/TargetInfo/BPFTargetInfo.cpp    |   13 +
 tools/bpf/llvm/tools/llc/llc.cpp                   |  381 +++++++++++
 53 files changed, 4737 insertions(+)
 create mode 100644 tools/bpf/llvm/.gitignore
 create mode 100644 tools/bpf/llvm/LICENSE.TXT
 create mode 100644 tools/bpf/llvm/Makefile.rules
 create mode 100644 tools/bpf/llvm/README.txt
 create mode 100644 tools/bpf/llvm/bld/Makefile
 create mode 100644 tools/bpf/llvm/bld/Makefile.common
 create mode 100644 tools/bpf/llvm/bld/Makefile.config
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/AsmParsers.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/AsmPrinters.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/Disassemblers.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Config/Targets.def
 create mode 100644 tools/bpf/llvm/bld/include/llvm/Support/DataTypes.h
 create mode 100644 tools/bpf/llvm/bld/lib/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/InstPrinter/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/MCTargetDesc/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/BPF/TargetInfo/Makefile
 create mode 100644 tools/bpf/llvm/bld/lib/Target/Makefile
 create mode 100644 tools/bpf/llvm/bld/tools/Makefile
 create mode 100644 tools/bpf/llvm/bld/tools/llc/Makefile
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPF.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPF.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFAsmPrinter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFCallingConv.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrFormats.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.td
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFBaseInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFELFObjectWriter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.cpp
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.h
 create mode 100644 tools/bpf/llvm/lib/Target/BPF/TargetInfo/BPFTargetInfo.cpp
 create mode 100644 tools/bpf/llvm/tools/llc/llc.cpp

diff --git a/tools/bpf/llvm/.gitignore b/tools/bpf/llvm/.gitignore
new file mode 100644
index 000000000000..7d6c5117e9a3
--- /dev/null
+++ b/tools/bpf/llvm/.gitignore
@@ -0,0 +1,54 @@
+#==============================================================================#
+# This file specifies intentionally untracked files that git should ignore.
+# See: http://www.kernel.org/pub/software/scm/git/docs/gitignore.html
+#
+# This file is intentionally different from the output of `git svn show-ignore`,
+# as most of those are useless.
+#==============================================================================#
+
+#==============================================================================#
+# File extensions to be ignored anywhere in the tree.
+#==============================================================================#
+# Temp files created by most text editors.
+*~
+# Merge files created by git.
+*.orig
+# Byte compiled python modules.
+*.pyc
+# vim swap files
+.*.swp
+*.inc
+Debug+Asserts
+Release+Asserts
+
+#==============================================================================#
+# Explicit files to ignore (only matches one).
+#==============================================================================#
+.cproject
+.gitusers
+.project
+autom4te.cache
+cscope.files
+cscope.out
+autoconf/aclocal.m4
+autoconf/autom4te.cache
+compile_commands.json
+
+#==============================================================================#
+# Directories to ignore (do not add trailing '/'s, they skip symlinks).
+#==============================================================================#
+# External projects that are tracked independently.
+projects/*
+!projects/sample
+!projects/CMakeLists.txt
+!projects/Makefile
+# Clang, which is tracked independently.
+tools/clang
+# LLDB, which is tracked independently.
+tools/lldb
+# lld, which is tracked independently.
+tools/lld
+# Sphinx build tree, if building in-source dir.
+docs/_build
+# build directory
+build
diff --git a/tools/bpf/llvm/LICENSE.TXT b/tools/bpf/llvm/LICENSE.TXT
new file mode 100644
index 000000000000..00cf60116941
--- /dev/null
+++ b/tools/bpf/llvm/LICENSE.TXT
@@ -0,0 +1,70 @@
+==============================================================================
+LLVM Release License
+==============================================================================
+University of Illinois/NCSA
+Open Source License
+
+Copyright (c) 2003-2012 University of Illinois at Urbana-Champaign.
+All rights reserved.
+
+Developed by:
+
+    LLVM Team
+
+    University of Illinois at Urbana-Champaign
+
+    http://llvm.org
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal with
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+    * Redistributions of source code must retain the above copyright notice,
+      this list of conditions and the following disclaimers.
+
+    * Redistributions in binary form must reproduce the above copyright notice,
+      this list of conditions and the following disclaimers in the
+      documentation and/or other materials provided with the distribution.
+
+    * Neither the names of the LLVM Team, University of Illinois at
+      Urbana-Champaign, nor the names of its contributors may be used to
+      endorse or promote products derived from this Software without specific
+      prior written permission.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
+SOFTWARE.
+
+==============================================================================
+Copyrights and Licenses for Third Party Software Distributed with LLVM:
+==============================================================================
+The LLVM software contains code written by third parties.  Such software will
+have its own individual LICENSE.TXT file in the directory in which it appears.
+This file will describe the copyrights, license, and restrictions which apply
+to that code.
+
+The disclaimer of warranty in the University of Illinois Open Source License
+applies to all code in the LLVM Distribution, and nothing in any of the
+other licenses gives permission to use the names of the LLVM Team or the
+University of Illinois to endorse or promote products derived from this
+Software.
+
+The following pieces of software have additional or alternate copyrights,
+licenses, and/or restrictions:
+
+Program             Directory
+-------             ---------
+Autoconf            llvm/autoconf
+                    llvm/projects/ModuleMaker/autoconf
+                    llvm/projects/sample/autoconf
+CellSPU backend     llvm/lib/Target/CellSPU/README.txt
+Google Test         llvm/utils/unittest/googletest
+OpenBSD regex       llvm/lib/Support/{reg*, COPYRIGHT.regex}
+pyyaml tests        llvm/test/YAMLParser/{*.data, LICENSE.TXT}
diff --git a/tools/bpf/llvm/Makefile.rules b/tools/bpf/llvm/Makefile.rules
new file mode 100644
index 000000000000..968952712d43
--- /dev/null
+++ b/tools/bpf/llvm/Makefile.rules
@@ -0,0 +1,641 @@
+#===-- Makefile.rules - Common make rules for LLVM ---------*- Makefile -*--===#
+#
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+#
+# This file is included by all of the LLVM makefiles.  For details on how to use
+# it properly, please see the document MakefileGuide.html in the docs directory.
+
+# TARGETS: Define standard targets that can be invoked
+
+# Define the various target sets
+RecursiveTargets := all clean clean-all install uninstall
+LocalTargets     := all-local clean-local clean-all-local check-local \
+                    install-local uninstall-local
+TopLevelTargets  := check dist-clean
+UserTargets      := $(RecursiveTargets) $(LocalTargets) $(TopLevelTargets)
+InternalTargets  := preconditions
+
+# INITIALIZATION: Basic things the makefile needs
+
+# Set the VPATH so that we can find source files.
+VPATH=$(PROJ_SRC_DIR)
+
+# Reset the list of suffixes we know how to build.
+.SUFFIXES:
+.SUFFIXES: .c .cpp .cc .h .hpp .o .a
+.SUFFIXES: $(SHLIBEXT) $(SUFFIXES)
+
+# Mark all of these targets as phony to avoid implicit rule search
+.PHONY: $(UserTargets) $(InternalTargets)
+
+# Make sure all the user-target rules are double colon rules and
+# they are defined first.
+
+$(UserTargets)::
+
+# PRECONDITIONS: that which must be built/checked first
+
+SrcMakefiles       := $(filter %Makefile %Makefile.tests,\
+                      $(wildcard $(PROJ_SRC_DIR)/Makefile*))
+ObjMakefiles       := $(subst $(PROJ_SRC_DIR),$(PROJ_OBJ_DIR),$(SrcMakefiles))
+MakefileConfig     := $(PROJ_OBJ_ROOT)/Makefile.config
+MakefileCommon     := $(PROJ_OBJ_ROOT)/Makefile.common
+PreConditions      := $(ObjMakefiles)
+PreConditions      += $(MakefileCommon)
+PreConditions      += $(MakefileConfig)
+
+preconditions: $(PreConditions)
+
+# Make sure the BUILT_SOURCES are built first
+$(filter-out clean clean-local,$(UserTargets)):: $(BUILT_SOURCES)
+
+clean-all-local::
+ifneq ($(strip $(BUILT_SOURCES)),)
+	-$(Verb) $(RM) -f $(BUILT_SOURCES)
+endif
+
+$(BUILT_SOURCES) : $(ObjMakefiles)
+
+ifndef PROJ_MAKEFILE
+PROJ_MAKEFILE := $(PROJ_OBJ_DIR)/Makefile
+endif
+
+# Set up the basic dependencies
+$(UserTargets):: $(PreConditions)
+
+all:: all-local
+clean:: clean-local
+clean-all:: clean-local clean-all-local
+install:: install-local
+uninstall:: uninstall-local
+install-local:: all-local
+
+# VARIABLES: Set up various variables based on configuration data
+
+# Variable for if this make is for a "cleaning" target
+ifneq ($(strip $(filter clean clean-local dist-clean,$(MAKECMDGOALS))),)
+  IS_CLEANING_TARGET=1
+endif
+
+# Variables derived from configuration we are building
+
+CPP.Defines :=
+ifeq ($(ENABLE_OPTIMIZED),1)
+  BuildMode := Release
+  OmitFramePointer := -fomit-frame-pointer
+
+  CXX.Flags += $(OPTIMIZE_OPTION) $(OmitFramePointer)
+  C.Flags   += $(OPTIMIZE_OPTION) $(OmitFramePointer)
+  LD.Flags  += $(OPTIMIZE_OPTION)
+  ifdef DEBUG_SYMBOLS
+    BuildMode := $(BuildMode)+Debug
+    CXX.Flags += -g
+    C.Flags   += -g
+    LD.Flags  += -g
+    KEEP_SYMBOLS := 1
+  endif
+else
+  ifdef NO_DEBUG_SYMBOLS
+    BuildMode := Unoptimized
+    CXX.Flags +=
+    C.Flags   +=
+    LD.Flags  +=
+    KEEP_SYMBOLS := 1
+  else
+    BuildMode := Debug
+    CXX.Flags += -g
+    C.Flags   += -g
+    LD.Flags  += -g
+    KEEP_SYMBOLS := 1
+  endif
+endif
+
+ifeq ($(ENABLE_WERROR),1)
+  CXX.Flags += -Werror
+  C.Flags += -Werror
+endif
+
+ifeq ($(ENABLE_VISIBILITY_INLINES_HIDDEN),1)
+    CXX.Flags += -fvisibility-inlines-hidden
+endif
+
+CXX.Flags += -fno-exceptions
+
+CXX.Flags += -fno-rtti
+
+# If DISABLE_ASSERTIONS=1 is specified (make command line or configured),
+# then disable assertions by defining the appropriate preprocessor symbols.
+ifeq ($(DISABLE_ASSERTIONS),1)
+  CPP.Defines += -DNDEBUG
+else
+  BuildMode := $(BuildMode)+Asserts
+  CPP.Defines += -D_DEBUG
+endif
+
+# If ENABLE_EXPENSIVE_CHECKS=1 is specified (make command line or
+# configured), then enable expensive checks by defining the
+# appropriate preprocessor symbols.
+ifeq ($(ENABLE_EXPENSIVE_CHECKS),1)
+  BuildMode := $(BuildMode)+Checks
+  CPP.Defines += -DXDEBUG
+endif
+
+DOTDIR_TIMESTAMP_COMMAND := $(DATE)
+
+CXX.Flags     += -Woverloaded-virtual
+CPP.BaseFlags += $(CPP.Defines)
+AR.Flags      := cru
+
+# Directory locations
+
+ObjRootDir  := $(PROJ_OBJ_DIR)/$(BuildMode)
+ObjDir      := $(ObjRootDir)
+LibDir      := $(PROJ_OBJ_ROOT)/$(BuildMode)/lib
+ToolDir     := $(PROJ_OBJ_ROOT)/$(BuildMode)/bin
+ExmplDir    := $(PROJ_OBJ_ROOT)/$(BuildMode)/examples
+LLVMLibDir  := $(LLVM_OBJ_ROOT)/$(BuildMode)/lib
+LLVMToolDir := $(LLVM_OBJ_ROOT)/$(BuildMode)/bin
+LLVMExmplDir:= $(LLVM_OBJ_ROOT)/$(BuildMode)/examples
+
+# Locations of shared libraries
+SharedPrefix     := lib
+SharedLibDir     := $(LibDir)
+LLVMSharedLibDir := $(LLVMLibDir)
+
+# Full Paths To Compiled Tools and Utilities
+EchoCmd  := $(ECHO) llvm[$(MAKELEVEL)]:
+
+Echo     := @$(EchoCmd)
+LLVMToolDir := $(shell $(LLVM_CONFIG) --bindir)
+LLVMLibDir := $(shell $(LLVM_CONFIG) --libdir)
+LLVMIncludeDir := $(shell $(LLVM_CONFIG) --includedir)
+ifndef LLVM_TBLGEN
+LLVM_TBLGEN   := $(LLVMToolDir)/llvm-tblgen$(EXEEXT)
+endif
+
+SharedLinkOptions=-shared
+
+ifdef TOOL_VERBOSE
+  C.Flags += -v
+  CXX.Flags += -v
+  LD.Flags += -v
+  VERBOSE := 1
+endif
+
+# Adjust settings for verbose mode
+ifndef VERBOSE
+  Verb := @
+  AR.Flags += >/dev/null 2>/dev/null
+endif
+
+# By default, strip symbol information from executable
+ifndef KEEP_SYMBOLS
+  Strip := $(PLATFORMSTRIPOPTS)
+  StripWarnMsg := "(without symbols)"
+  Install.StripFlag += -s
+endif
+
+ifdef TOOL_NO_EXPORTS
+  DynamicFlag :=
+else
+  DynamicFlag := $(RDYNAMIC)
+endif
+
+# Adjust linker flags for building an executable
+ifdef TOOLNAME
+  LD.Flags += $(RPATH) -Wl,'$$ORIGIN/../lib'
+  LD.Flags += $(RPATH) -Wl,$(ToolDir) $(DynamicFlag)
+endif
+
+# Options To Invoke Tools
+ifdef EXTRA_LD_OPTIONS
+LD.Flags += $(EXTRA_LD_OPTIONS)
+endif
+
+ifndef NO_PEDANTIC
+CompileCommonOpts += -pedantic -Wno-long-long
+endif
+CompileCommonOpts += -Wall -W -Wno-unused-parameter -Wwrite-strings \
+                     $(EXTRA_OPTIONS)
+# Enable cast-qual for C++; the workaround is to use const_cast.
+CXX.Flags += -Wcast-qual
+
+LD.Flags    += -L$(LibDir) -L$(LLVMLibDir)
+
+CPP.BaseFlags += -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
+# All -I flags should go here, so that they don't confuse llvm-config.
+CPP.Flags     += $(sort -I$(PROJ_OBJ_DIR) -I$(PROJ_SRC_DIR) \
+	         $(patsubst %,-I%/include,\
+	         $(PROJ_OBJ_ROOT) $(PROJ_SRC_ROOT) \
+	         $(LLVM_OBJ_ROOT) $(LLVM_SRC_ROOT))) \
+	         -I$(LLVMIncludeDir) $(CPP.BaseFlags)
+
+Compile.Wrapper :=
+
+Compile.C     = $(Compile.Wrapper) \
+	          $(CC) $(CPP.Flags) $(C.Flags) $(CFLAGS) $(CPPFLAGS) \
+                $(TargetCommonOpts) $(CompileCommonOpts) -c
+Compile.CXX   = $(Compile.Wrapper) \
+	          $(CXX) $(CPP.Flags) $(CXX.Flags) $(CXXFLAGS) $(CPPFLAGS) \
+                $(TargetCommonOpts) $(CompileCommonOpts) -c
+Link          = $(Compile.Wrapper) \
+	          $(CXX) $(CPP.Flags) $(CXX.Flags) $(CXXFLAGS) $(LD.Flags) \
+                $(LDFLAGS) $(TargetCommonOpts)  $(CompileCommonOpts) $(Strip)
+
+ProgInstall   = $(INSTALL) $(Install.StripFlag) -m 0755
+ScriptInstall = $(INSTALL) -m 0755
+DataInstall   = $(INSTALL) -m 0644
+
+TableGen.Flags= -I $(call SYSPATH, $(PROJ_SRC_DIR)) \
+                -I $(call SYSPATH, $(LLVMIncludeDir)) \
+                -I $(call SYSPATH, $(PROJ_SRC_ROOT)/include) \
+                -I $(call SYSPATH, $(PROJ_SRC_ROOT)/lib/Target)
+LLVMTableGen  = $(LLVM_TBLGEN) $(TableGen.Flags)
+
+Archive       = $(AR) $(AR.Flags)
+ifdef RANLIB
+Ranlib        = $(RANLIB)
+else
+Ranlib        = ranlib
+endif
+
+AliasTool     = ln -s
+
+# Get the list of source files and compute object file
+# names from them.
+ifndef SOURCES
+  Sources := $(notdir $(wildcard $(PROJ_SRC_DIR)/*.cpp \
+             $(PROJ_SRC_DIR)/*.cc $(PROJ_SRC_DIR)/*.c))
+else
+  Sources := $(SOURCES)
+endif
+
+ifdef BUILT_SOURCES
+Sources += $(filter %.cpp %.c %.cc,$(BUILT_SOURCES))
+endif
+
+BaseNameSources := $(sort $(basename $(Sources)))
+
+ObjectsO  := $(BaseNameSources:%=$(ObjDir)/%.o)
+
+ECHOPATH := $(Verb)$(ECHO)
+
+# DIRECTORIES: Handle recursive descent of directory structure
+
+# Provide rules to make install dirs. This must be early
+# in the file so they get built before dependencies
+
+$(DESTDIR)$(PROJ_bindir)::
+	$(Verb) $(MKDIR) $@
+
+# To create other directories, as needed, and timestamp their creation
+%/.dir:
+	$(Verb) $(MKDIR) $* > /dev/null
+	$(Verb) $(DOTDIR_TIMESTAMP_COMMAND) > $@
+
+.PRECIOUS: $(ObjDir)/.dir $(LibDir)/.dir $(ToolDir)/.dir $(ExmplDir)/.dir
+.PRECIOUS: $(LLVMLibDir)/.dir $(LLVMToolDir)/.dir $(LLVMExmplDir)/.dir
+
+# Handle the DIRS options for sequential construction
+
+SubDirs :=
+ifdef DIRS
+SubDirs += $(DIRS)
+
+ifneq ($(PROJ_SRC_ROOT),$(PROJ_OBJ_ROOT))
+$(RecursiveTargets)::
+	$(Verb) for dir in $(DIRS); do \
+	  if ([ ! -f $$dir/Makefile ] || \
+	      command test $$dir/Makefile -ot $(PROJ_SRC_DIR)/$$dir/Makefile ); then \
+	    $(MKDIR) $$dir; \
+	    $(CP) $(PROJ_SRC_DIR)/$$dir/Makefile $$dir/Makefile; \
+	  fi; \
+	  ($(MAKE) -C $$dir $@ ) || exit 1; \
+	done
+else
+$(RecursiveTargets)::
+	$(Verb) for dir in $(DIRS); do \
+	  ($(MAKE) -C $$dir $@ ) || exit 1; \
+	done
+endif
+
+endif
+
+# Handle the PARALLEL_DIRS options for parallel construction
+ifdef PARALLEL_DIRS
+
+SubDirs += $(PARALLEL_DIRS)
+
+# Unfortunately, this list must be maintained if new recursive targets are added
+all      :: $(addsuffix /.makeall      ,$(PARALLEL_DIRS))
+clean    :: $(addsuffix /.makeclean    ,$(PARALLEL_DIRS))
+clean-all:: $(addsuffix /.makeclean-all,$(PARALLEL_DIRS))
+install  :: $(addsuffix /.makeinstall  ,$(PARALLEL_DIRS))
+uninstall:: $(addsuffix /.makeuninstall,$(PARALLEL_DIRS))
+
+ParallelTargets := $(foreach T,$(RecursiveTargets),%/.make$(T))
+
+$(ParallelTargets) :
+	$(Verb) \
+	  SD=$(PROJ_SRC_DIR)/$(@D); \
+	  DD=$(@D); \
+	  if [ ! -f $$SD/Makefile ]; then \
+	    SD=$(@D); \
+	    DD=$(notdir $(@D)); \
+	  fi; \
+	  if ([ ! -f $$DD/Makefile ] || \
+	            command test $$DD/Makefile -ot \
+                      $$SD/Makefile ); then \
+	  $(MKDIR) $$DD; \
+	  $(CP) $$SD/Makefile $$DD/Makefile; \
+	fi; \
+	$(MAKE) -C $$DD $(subst $(@D)/.make,,$@)
+endif
+
+# Set up variables for building libraries
+
+# Define various command line options pertaining to the
+# libraries needed when linking. There are "Proj" libs
+# (defined by the user's project) and "LLVM" libs (defined
+# by the LLVM project).
+
+ifdef USEDLIBS
+ProjLibsOptions := $(patsubst %.a.o, -l%, $(addsuffix .o, $(USEDLIBS)))
+ProjLibsOptions := $(patsubst %.o, $(LibDir)/%.o,  $(ProjLibsOptions))
+ProjUsedLibs    := $(patsubst %.a.o, lib%.a, $(addsuffix .o, $(USEDLIBS)))
+ProjLibsPaths   := $(addprefix $(LibDir)/,$(ProjUsedLibs))
+endif
+
+ifdef LLVMLIBS
+LLVMLibsOptions := $(patsubst %.a.o, -l%, $(addsuffix .o, $(LLVMLIBS)))
+LLVMLibsOptions := $(patsubst %.o, $(LLVMLibDir)/%.o, $(LLVMLibsOptions))
+LLVMUsedLibs    := $(patsubst %.a.o, lib%.a, $(addsuffix .o, $(LLVMLIBS)))
+LLVMLibsPaths   := $(addprefix $(LLVMLibDir)/,$(LLVMUsedLibs))
+endif
+
+ifndef IS_CLEANING_TARGET
+ifdef LINK_COMPONENTS
+
+LLVMConfigLibs := $(shell $(LLVM_CONFIG) --libs $(LINK_COMPONENTS) || echo Error)
+ifeq ($(LLVMConfigLibs),Error)
+$(error llvm-config --libs failed)
+endif
+LLVMLibsOptions += $(LLVMConfigLibs)
+LLVMConfigLibfiles := $(shell $(LLVM_CONFIG) --libfiles $(LINK_COMPONENTS) || echo Error)
+ifeq ($(LLVMConfigLibfiles),Error)
+$(error llvm-config --libfiles failed)
+endif
+LLVMLibsPaths += $(LLVMConfigLibfiles)
+
+endif
+endif
+
+# Library Build Rules: Four ways to build a library
+
+# if we're building a library ...
+ifdef LIBRARYNAME
+
+# Make sure there isn't any extraneous whitespace on the LIBRARYNAME option
+LIBRARYNAME := $(strip $(LIBRARYNAME))
+BaseLibName.A  := lib$(LIBRARYNAME).a
+BaseLibName.SO := $(SharedPrefix)$(LIBRARYNAME)$(SHLIBEXT)
+LibName.A  := $(LibDir)/$(BaseLibName.A)
+LibName.SO := $(SharedLibDir)/$(BaseLibName.SO)
+LibName.O  := $(LibDir)/$(LIBRARYNAME).o
+
+# Library Targets:
+#   If neither BUILD_ARCHIVE nor LOADABLE_MODULE is specified, default to
+#   building an archive.
+ifndef NO_BUILD_ARCHIVE
+ifndef BUILD_ARCHIVE
+ifndef LOADABLE_MODULE
+BUILD_ARCHIVE = 1
+endif
+endif
+endif
+
+# Archive Library Targets:
+#   If the user wanted a regular archive library built,
+#   then we provide targets for building them.
+ifdef BUILD_ARCHIVE
+
+all-local:: $(LibName.A)
+
+$(LibName.A): $(ObjectsO) $(LibDir)/.dir
+	$(Echo) Building $(BuildMode) Archive Library $(notdir $@)
+	-$(Verb) $(RM) -f $@
+	$(Verb) $(Archive) $@ $(ObjectsO)
+	$(Verb) $(Ranlib) $@
+
+clean-local::
+ifneq ($(strip $(LibName.A)),)
+	-$(Verb) $(RM) -f $(LibName.A)
+endif
+
+install-local::
+	$(Echo) Install circumvented with NO_INSTALL
+uninstall-local::
+	$(Echo) Uninstall circumvented with NO_INSTALL
+endif
+
+# endif LIBRARYNAME
+endif
+
+# Tool Build Rules: Build executable tool based on TOOLNAME option
+
+ifdef TOOLNAME
+
+# Set up variables for building a tool.
+TOOLEXENAME := $(strip $(TOOLNAME))$(EXEEXT)
+ToolBuildPath   := $(ToolDir)/$(TOOLEXENAME)
+
+# Provide targets for building the tools
+all-local:: $(ToolBuildPath)
+
+clean-local::
+ifneq ($(strip $(ToolBuildPath)),)
+	-$(Verb) $(RM) -f $(ToolBuildPath)
+endif
+
+$(ToolBuildPath): $(ToolDir)/.dir
+
+$(ToolBuildPath): $(ObjectsO) $(ProjLibsPaths) $(LLVMLibsPaths)
+	$(Echo) Linking $(BuildMode) executable $(TOOLNAME) $(StripWarnMsg)
+	$(Verb) $(Link) -o $@ $(TOOLLINKOPTS) $(ObjectsO) $(ProjLibsOptions) \
+	$(LLVMLibsOptions) $(ExtraLibs) $(TOOLLINKOPTSB) $(LIBS)
+	$(Echo) ======= Finished Linking $(BuildMode) Executable $(TOOLNAME) \
+          $(StripWarnMsg)
+
+ifdef NO_INSTALL
+install-local::
+	$(Echo) Install circumvented with NO_INSTALL
+uninstall-local::
+	$(Echo) Uninstall circumvented with NO_INSTALL
+else
+
+ToolBinDir = $(DESTDIR)$(PROJ_bindir)
+DestTool = $(ToolBinDir)/$(program_prefix)$(TOOLEXENAME)
+
+install-local:: $(DestTool)
+
+$(DestTool): $(ToolBuildPath)
+	$(Echo) Installing $(BuildMode) $(DestTool)
+	$(Verb) $(MKDIR) $(ToolBinDir)
+	$(Verb) $(ProgInstall) $(ToolBuildPath) $(DestTool)
+
+uninstall-local::
+	$(Echo) Uninstalling $(BuildMode) $(DestTool)
+	-$(Verb) $(RM) -f $(DestTool)
+
+endif
+endif
+
+# Create .o files in the ObjDir directory from the .cpp and .c files...
+
+DEPEND_OPTIONS = -MMD -MP -MF "$(ObjDir)/$*.d.tmp" \
+         -MT "$(ObjDir)/$*.o" -MT "$(ObjDir)/$*.d"
+
+# If the build succeeded, move the dependency file over, otherwise
+# remove it.
+DEPEND_MOVEFILE = then $(MV) -f "$(ObjDir)/$*.d.tmp" "$(ObjDir)/$*.d"; \
+                  else $(RM) "$(ObjDir)/$*.d.tmp"; exit 1; fi
+
+$(ObjDir)/%.o: %.cpp $(ObjDir)/.dir $(BUILT_SOURCES) $(PROJ_MAKEFILE)
+	$(Echo) "Compiling $*.cpp for $(BuildMode) build" $(PIC_FLAG)
+	$(Verb) if $(Compile.CXX) $(DEPEND_OPTIONS) $< -o $(ObjDir)/$*.o ; \
+	        $(DEPEND_MOVEFILE)
+
+$(ObjDir)/%.o: %.cc $(ObjDir)/.dir $(BUILT_SOURCES) $(PROJ_MAKEFILE)
+	$(Echo) "Compiling $*.cc for $(BuildMode) build" $(PIC_FLAG)
+	$(Verb) if $(Compile.CXX) $(DEPEND_OPTIONS) $< -o $(ObjDir)/$*.o ; \
+	        $(DEPEND_MOVEFILE)
+
+$(ObjDir)/%.o: %.c $(ObjDir)/.dir $(BUILT_SOURCES) $(PROJ_MAKEFILE)
+	$(Echo) "Compiling $*.c for $(BuildMode) build" $(PIC_FLAG)
+	$(Verb) if $(Compile.C) $(DEPEND_OPTIONS) $< -o $(ObjDir)/$*.o ; \
+	        $(DEPEND_MOVEFILE)
+
+# TABLEGEN: Provide rules for running tblgen to produce *.inc files
+
+ifdef TARGET
+TABLEGEN_INC_FILES_COMMON = 1
+endif
+
+ifdef TABLEGEN_INC_FILES_COMMON
+
+INCFiles := $(filter %.inc,$(BUILT_SOURCES))
+INCTMPFiles := $(INCFiles:%=$(ObjDir)/%.tmp)
+.PRECIOUS: $(INCTMPFiles) $(INCFiles)
+
+# INCFiles rule: All of the tblgen generated files are emitted to
+# $(ObjDir)/%.inc.tmp, instead of emitting them directly to %.inc.  This allows
+# us to only "touch" the real file if the contents of it change.  IOW, if
+# tblgen is modified, all of the .inc.tmp files are regenerated, but no
+# dependencies of the .inc files are, unless the contents of the .inc file
+# changes.
+$(INCFiles) : %.inc : $(ObjDir)/%.inc.tmp
+	$(Verb) $(CMP) -s $@ $< || $(CP) $< $@
+
+endif # TABLEGEN_INC_FILES_COMMON
+
+ifdef TARGET
+
+TDFiles := $(strip $(wildcard $(PROJ_SRC_DIR)/*.td) \
+           $(LLVMIncludeDir)/llvm/Target/Target.td \
+           $(LLVMIncludeDir)/llvm/Target/TargetCallingConv.td \
+           $(LLVMIncludeDir)/llvm/Target/TargetSchedule.td \
+           $(LLVMIncludeDir)/llvm/Target/TargetSelectionDAG.td \
+           $(LLVMIncludeDir)/llvm/CodeGen/ValueTypes.td) \
+           $(wildcard $(LLVMIncludeDir)/llvm/Intrinsics*.td)
+
+# All .inc.tmp files depend on the .td files.
+$(INCTMPFiles) : $(TDFiles)
+
+$(TARGET:%=$(ObjDir)/%GenRegisterInfo.inc.tmp): \
+$(ObjDir)/%GenRegisterInfo.inc.tmp : %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) register info implementation with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-register-info -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenInstrInfo.inc.tmp): \
+$(ObjDir)/%GenInstrInfo.inc.tmp : %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) instruction information with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-instr-info -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenAsmWriter.inc.tmp): \
+$(ObjDir)/%GenAsmWriter.inc.tmp : %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) assembly writer with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-asm-writer -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenAsmMatcher.inc.tmp): \
+$(ObjDir)/%GenAsmMatcher.inc.tmp : %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) assembly matcher with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-asm-matcher -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenMCCodeEmitter.inc.tmp): \
+$(ObjDir)/%GenMCCodeEmitter.inc.tmp: %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) MC code emitter with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-emitter -mc-emitter -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenCodeEmitter.inc.tmp): \
+$(ObjDir)/%GenCodeEmitter.inc.tmp: %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) code emitter with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-emitter -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenDAGISel.inc.tmp): \
+$(ObjDir)/%GenDAGISel.inc.tmp : %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) DAG instruction selector implementation with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-dag-isel -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenSubtargetInfo.inc.tmp): \
+$(ObjDir)/%GenSubtargetInfo.inc.tmp : %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) subtarget information with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-subtarget -o $(call SYSPATH, $@) $<
+
+$(TARGET:%=$(ObjDir)/%GenCallingConv.inc.tmp): \
+$(ObjDir)/%GenCallingConv.inc.tmp : %.td $(ObjDir)/.dir $(LLVM_TBLGEN)
+	$(Echo) "Building $(<F) calling convention information with tblgen"
+	$(Verb) $(LLVMTableGen) -gen-callingconv -o $(call SYSPATH, $@) $<
+
+clean-local::
+	-$(Verb) $(RM) -f $(INCFiles)
+
+endif # TARGET
+
+# This rule ensures that header files that are removed still have a rule for
+# which they can be "generated."  This allows make to ignore them and
+# reproduce the dependency lists.
+%.h:: ;
+%.hpp:: ;
+
+# Define clean-local to clean the current directory. Note that this uses a
+# very conservative approach ensuring that empty variables do not cause
+# errors or disastrous removal.
+clean-local::
+ifneq ($(strip $(ObjRootDir)),)
+	-$(Verb) $(RM) -rf $(ObjRootDir)
+endif
+ifneq ($(strip $(SHLIBEXT)),) # Extra paranoia - make real sure SHLIBEXT is set
+	-$(Verb) $(RM) -f *$(SHLIBEXT)
+endif
+
+clean-all-local::
+	-$(Verb) $(RM) -rf Debug Release Profile
+
+
+# DEPENDENCIES: Include the dependency files if we should
+ifndef DISABLE_AUTO_DEPENDENCIES
+
+# If it's not one of the cleaning targets
+ifndef IS_CLEANING_TARGET
+
+# Get the list of dependency files
+DependSourceFiles := $(basename $(filter %.cpp %.c %.cc %.m %.mm, $(Sources)))
+DependFiles := $(DependSourceFiles:%=$(PROJ_OBJ_DIR)/$(BuildMode)/%.d)
+
+-include $(DependFiles) ""
+
+endif
+
+endif
+
diff --git a/tools/bpf/llvm/README.txt b/tools/bpf/llvm/README.txt
new file mode 100644
index 000000000000..971207b33ff7
--- /dev/null
+++ b/tools/bpf/llvm/README.txt
@@ -0,0 +1,23 @@
+LLVM BPF backend:
+lib/Target/BPF/*.cpp
+
+Links with LLVM 3.2, 3.3 and 3.4
+
+Prerequisites:
+apt-get install clang llvm-3.[234]-dev
+
+To build:
+$cd bld
+$make
+if 'llvm-config-3.2' is not found in PATH, build with:
+$make -j4 LLVM_CONFIG=/path_to/llvm-config
+
+To run:
+$clang -O2 -emit-llvm -c file.c -o -|./bld/Debug+Asserts/bin/llc -o file.o
+
+'clang' - unmodified clang, the same compiler used to build x86 code
+'llc' - LLVM bitcode to BPF compiler
+file.o - BPF binary image, see include/uapi/linux/bpf.h
+
+$clang -O2 -emit-llvm -c file.c -o -|llc -filetype=asm -o file.s
+will emit human-readable BPF assembler instead.
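+
+For example, a trivial file.c (a minimal sketch):
+  $cat file.c
+  int func(int a, int b)
+  {
+          return a + b;
+  }
+  $clang -O2 -emit-llvm -c file.c -o -|llc -filetype=asm -o -
+prints the BPF assembler for 'func' on stdout.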
diff --git a/tools/bpf/llvm/bld/Makefile b/tools/bpf/llvm/bld/Makefile
new file mode 100644
index 000000000000..7ac0938906f6
--- /dev/null
+++ b/tools/bpf/llvm/bld/Makefile
@@ -0,0 +1,27 @@
+#===- ./Makefile -------------------------------------------*- Makefile -*--===#
+#
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+
+ifndef LLVM_CONFIG
+LLVM_CONFIG := llvm-config-3.2
+export LLVM_CONFIG
+endif
+
+LEVEL := .
+
+DIRS := lib tools
+
+include $(LEVEL)/Makefile.config
+
+# Include the main makefile machinery.
+include $(LLVM_SRC_ROOT)/Makefile.rules
+
+# NOTE: This needs to remain as the last target definition in this file so
+# that it gets executed last.
+all::
+	$(Echo) '*****' Completed $(BuildMode) Build
+
+# declare all targets at this level to be serial:
+.NOTPARALLEL:
+
diff --git a/tools/bpf/llvm/bld/Makefile.common b/tools/bpf/llvm/bld/Makefile.common
new file mode 100644
index 000000000000..624f7d3096a5
--- /dev/null
+++ b/tools/bpf/llvm/bld/Makefile.common
@@ -0,0 +1,14 @@
+#===-- Makefile.common - Common make rules for LLVM --------*- Makefile -*--===#
+#
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+
+# Configuration file to set paths specific to local installation of LLVM
+ifndef LLVM_OBJ_ROOT
+include $(LEVEL)/Makefile.config
+else
+include $(LLVM_OBJ_ROOT)/Makefile.config
+endif
+
+# Include all of the build rules used for making LLVM
+include $(LLVM_SRC_ROOT)/Makefile.rules
diff --git a/tools/bpf/llvm/bld/Makefile.config b/tools/bpf/llvm/bld/Makefile.config
new file mode 100644
index 000000000000..d8eda0551161
--- /dev/null
+++ b/tools/bpf/llvm/bld/Makefile.config
@@ -0,0 +1,124 @@
+#===-- Makefile.config - Local configuration for LLVM ------*- Makefile -*--===#
+#
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+#
+# This file is included by Makefile.common.  It defines paths and other
+# values specific to a particular installation of LLVM.
+#
+
+# Directory Configuration
+#	This section of the Makefile determines what is where.  To be
+#	specific, there are several locations that need to be defined:
+#
+#	o LLVM_SRC_ROOT  : The root directory of the LLVM source code.
+#	o LLVM_OBJ_ROOT  : The root directory containing the built LLVM code.
+#
+#	o PROJ_SRC_DIR  : The directory containing the code to build.
+#	o PROJ_SRC_ROOT : The root directory of the code to build.
+#
+#	o PROJ_OBJ_DIR  : The directory in which compiled code will be placed.
+#	o PROJ_OBJ_ROOT : The root directory in which compiled code is placed.
+
+PWD := /bin/pwd
+
+# The macro below is expanded when 'realpath' is not built-in.
+# A built-in 'realpath' is available in GNU Make 3.81 and later.
+realpath = $(shell cd $(1); $(PWD))
+
+PROJ_OBJ_DIR  := $(call realpath, .)
+PROJ_OBJ_ROOT := $(call realpath, $(PROJ_OBJ_DIR)/$(LEVEL))
+
+LLVM_SRC_ROOT   := $(call realpath, $(PROJ_OBJ_DIR)/$(LEVEL)/..)
+LLVM_OBJ_ROOT   := $(call realpath, $(PROJ_OBJ_DIR)/$(LEVEL))
+PROJ_SRC_ROOT   := $(LLVM_SRC_ROOT)
+PROJ_SRC_DIR    := $(LLVM_SRC_ROOT)$(patsubst $(PROJ_OBJ_ROOT)%,%,$(PROJ_OBJ_DIR))
+
+prefix          := /usr/local
+PROJ_prefix     := $(prefix)
+program_prefix  := 
+
+PROJ_bindir     := $(PROJ_prefix)/bin
+
+# Extra options to compile LLVM with
+EXTRA_OPTIONS=
+
+# Extra options to link LLVM with
+EXTRA_LD_OPTIONS=
+
+# Path to the C++ compiler to use.  This is an optional setting, which defaults
+# to whatever your gmake defaults to.
+CXX = g++
+
+# Path to the CC binary, which is used by testcases for native builds.
+CC := gcc
+
+# Linker flags.
+LDFLAGS+=
+
+# Path to the library archiver program.
+AR_PATH = ar
+AR = ar
+
+# The pathnames of the programs we require to build
+CMP        := /usr/bin/cmp
+CP         := /bin/cp
+DATE       := /bin/date
+INSTALL    := /usr/bin/install -c
+MKDIR      := mkdir -p
+MV         := /bin/mv
+RANLIB     := ranlib
+RM         := /bin/rm
+
+LIBS       := -lncurses -lpthread -ldl -lm
+
+# Targets that we should build
+TARGETS_TO_BUILD=BPF 
+
+# What to pass as rpath flag to g++
+RPATH := -Wl,-R
+
+# What to pass as -rdynamic flag to g++
+RDYNAMIC := -Wl,-export-dynamic
+
+# When ENABLE_WERROR is enabled, we'll pass -Werror on the command line
+ENABLE_WERROR = 0
+
+# When ENABLE_OPTIMIZED is enabled, LLVM code is optimized and output is put
+# into the "Release" directories. Otherwise, LLVM code is not optimized and
+# output is put in the "Debug" directories.
+#ENABLE_OPTIMIZED = 1
+
+# When DISABLE_ASSERTIONS is enabled, builds of all of the LLVM code will
+# exclude assertion checks, otherwise they are included.
+#DISABLE_ASSERTIONS = 1
+
+# When DEBUG_SYMBOLS is enabled, the compiler libraries will retain debug
+# symbols.
+#DEBUG_SYMBOLS = 1
+
+# When KEEP_SYMBOLS is enabled, installed executables will never have their
+# symbols stripped.
+#KEEP_SYMBOLS = 1
+
+# The compiler flags to use for optimized builds.
+OPTIMIZE_OPTION := -O3
+
+# Use -fvisibility-inlines-hidden?
+ENABLE_VISIBILITY_INLINES_HIDDEN := 1
+
+# This option tells the Makefiles to produce verbose output.
+# It essentially prints the commands that make is executing
+#VERBOSE = 1
+
+# Shared library extension for host platform.
+SHLIBEXT = .so
+
+# Executable file extension for host platform.
+EXEEXT = 
+
+# Things we just assume are "there"
+ECHO := echo
+
+SYSPATH = $(1)
+
diff --git a/tools/bpf/llvm/bld/include/llvm/Config/AsmParsers.def b/tools/bpf/llvm/bld/include/llvm/Config/AsmParsers.def
new file mode 100644
index 000000000000..9efd8f45526c
--- /dev/null
+++ b/tools/bpf/llvm/bld/include/llvm/Config/AsmParsers.def
@@ -0,0 +1,8 @@
+/*===- llvm/Config/AsmParsers.def - LLVM Assembly Parsers -------*- C++ -*-===*\
+|* This file is distributed under the University of Illinois Open Source      *|
+|* License. See LICENSE.TXT for details.                                      *|
+\*===----------------------------------------------------------------------===*/
+#ifndef LLVM_ASM_PARSER
+#  error Please define the macro LLVM_ASM_PARSER(TargetName)
+#endif
+#undef LLVM_ASM_PARSER
diff --git a/tools/bpf/llvm/bld/include/llvm/Config/AsmPrinters.def b/tools/bpf/llvm/bld/include/llvm/Config/AsmPrinters.def
new file mode 100644
index 000000000000..f212afa62b37
--- /dev/null
+++ b/tools/bpf/llvm/bld/include/llvm/Config/AsmPrinters.def
@@ -0,0 +1,9 @@
+/*===- llvm/Config/AsmPrinters.def - LLVM Assembly Printers -----*- C++ -*-===*\
+|* This file is distributed under the University of Illinois Open Source      *|
+|* License. See LICENSE.TXT for details.                                      *|
+\*===----------------------------------------------------------------------===*/
+#ifndef LLVM_ASM_PRINTER
+#  error Please define the macro LLVM_ASM_PRINTER(TargetName)
+#endif
+LLVM_ASM_PRINTER(BPF) 
+#undef LLVM_ASM_PRINTER
diff --git a/tools/bpf/llvm/bld/include/llvm/Config/Disassemblers.def b/tools/bpf/llvm/bld/include/llvm/Config/Disassemblers.def
new file mode 100644
index 000000000000..527473fb24e5
--- /dev/null
+++ b/tools/bpf/llvm/bld/include/llvm/Config/Disassemblers.def
@@ -0,0 +1,8 @@
+/*===- llvm/Config/Disassemblers.def - LLVM Disassemblers -------*- C++ -*-===*\
+|* This file is distributed under the University of Illinois Open Source      *|
+|* License. See LICENSE.TXT for details.                                      *|
+\*===----------------------------------------------------------------------===*/
+#ifndef LLVM_DISASSEMBLER
+#  error Please define the macro LLVM_DISASSEMBLER(TargetName)
+#endif
+#undef LLVM_DISASSEMBLER
diff --git a/tools/bpf/llvm/bld/include/llvm/Config/Targets.def b/tools/bpf/llvm/bld/include/llvm/Config/Targets.def
new file mode 100644
index 000000000000..cb2852cb19cf
--- /dev/null
+++ b/tools/bpf/llvm/bld/include/llvm/Config/Targets.def
@@ -0,0 +1,9 @@
+/*===- llvm/Config/Targets.def - LLVM Target Architectures ------*- C++ -*-===*\
+|* This file is distributed under the University of Illinois Open Source      *|
+|* License. See LICENSE.TXT for details.                                      *|
+\*===----------------------------------------------------------------------===*/
+#ifndef LLVM_TARGET
+#  error Please define the macro LLVM_TARGET(TargetName)
+#endif
+LLVM_TARGET(BPF) 
+#undef LLVM_TARGET
diff --git a/tools/bpf/llvm/bld/include/llvm/Support/DataTypes.h b/tools/bpf/llvm/bld/include/llvm/Support/DataTypes.h
new file mode 100644
index 000000000000..81328a6c0679
--- /dev/null
+++ b/tools/bpf/llvm/bld/include/llvm/Support/DataTypes.h
@@ -0,0 +1,96 @@
+/* include/llvm/Support/DataTypes.h.  Generated from DataTypes.h.in by configure.  */
+/*===-- include/Support/DataTypes.h - Define fixed size types -----*- C -*-===*\
+|*                                                                            *|
+|*                     The LLVM Compiler Infrastructure                       *|
+|*                                                                            *|
+|* This file is distributed under the University of Illinois Open Source      *|
+|* License. See LICENSE.TXT for details.                                      *|
+|*                                                                            *|
+|*===----------------------------------------------------------------------===*|
+|*                                                                            *|
+|* This file contains definitions to figure out the size of _HOST_ data types.*|
+|* This file is important because different host OS's define different macros,*|
+|* which makes portability tough.  This file exports the following            *|
+|* definitions:                                                               *|
+|*                                                                            *|
+|*   [u]int(32|64)_t : typedefs for signed and unsigned 32/64 bit system types*|
+|*   [U]INT(8|16|32|64)_(MIN|MAX) : Constants for the min and max values.     *|
+|*                                                                            *|
+|* No library is required when using these functions.                         *|
+|*                                                                            *|
+|*===----------------------------------------------------------------------===*/
+
+/* Please leave this file C-compatible. */
+
+#ifndef SUPPORT_DATATYPES_H
+#define SUPPORT_DATATYPES_H
+
+#define HAVE_SYS_TYPES_H 1
+#define HAVE_INTTYPES_H 1
+#define HAVE_STDINT_H 1
+#define HAVE_UINT64_T 1
+/* #undef HAVE_U_INT64_T */
+
+#ifdef __cplusplus
+#include <cmath>
+#else
+#include <math.h>
+#endif
+
+/* Note that this header's correct operation depends on __STDC_LIMIT_MACROS
+   being defined.  We would define it here, but in order to prevent Bad Things
+   happening when system headers or C++ STL headers include stdint.h before we
+   define it here, we define it on the g++ command line (in Makefile.rules). */
+#if !defined(__STDC_LIMIT_MACROS)
+# error "Must #define __STDC_LIMIT_MACROS before #including Support/DataTypes.h"
+#endif
+
+#if !defined(__STDC_CONSTANT_MACROS)
+# error "Must #define __STDC_CONSTANT_MACROS before " \
+        "#including Support/DataTypes.h"
+#endif
+
+/* Note that <inttypes.h> includes <stdint.h>, if this is a C99 system. */
+#ifdef HAVE_SYS_TYPES_H
+#include <sys/types.h>
+#endif
+
+#ifdef HAVE_INTTYPES_H
+#include <inttypes.h>
+#endif
+
+#ifdef HAVE_STDINT_H
+#include <stdint.h>
+#endif
+
+/* Handle incorrect definition of uint64_t as u_int64_t */
+#ifndef HAVE_UINT64_T
+#ifdef HAVE_U_INT64_T
+typedef u_int64_t uint64_t;
+#else
+# error "Don't have a definition for uint64_t on this platform"
+#endif
+#endif
+
+/* Set defaults for constants which we cannot find. */
+#if !defined(INT64_MAX)
+# define INT64_MAX 9223372036854775807LL
+#endif
+#if !defined(INT64_MIN)
+# define INT64_MIN ((-INT64_MAX)-1)
+#endif
+#if !defined(UINT64_MAX)
+# define UINT64_MAX 0xffffffffffffffffULL
+#endif
+
+#if __GNUC__ > 3
+#define END_WITH_NULL __attribute__((sentinel))
+#else
+#define END_WITH_NULL
+#endif
+
+#ifndef HUGE_VALF
+#define HUGE_VALF (float)HUGE_VAL
+#endif
+
+#endif  /* SUPPORT_DATATYPES_H */
diff --git a/tools/bpf/llvm/bld/lib/Makefile b/tools/bpf/llvm/bld/lib/Makefile
new file mode 100644
index 000000000000..5c7e219ad9bf
--- /dev/null
+++ b/tools/bpf/llvm/bld/lib/Makefile
@@ -0,0 +1,11 @@
+##===- lib/Makefile ----------------------------------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+LEVEL = ..
+
+include $(LEVEL)/Makefile.config
+
+PARALLEL_DIRS := Target
+
+include $(LEVEL)/Makefile.common
+
diff --git a/tools/bpf/llvm/bld/lib/Target/BPF/InstPrinter/Makefile b/tools/bpf/llvm/bld/lib/Target/BPF/InstPrinter/Makefile
new file mode 100644
index 000000000000..d9a4522d4320
--- /dev/null
+++ b/tools/bpf/llvm/bld/lib/Target/BPF/InstPrinter/Makefile
@@ -0,0 +1,10 @@
+##===- lib/Target/BPF/InstPrinter/Makefile ----------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+LEVEL = ../../../..
+LIBRARYNAME = LLVMBPFAsmPrinter
+
+# Hack: we need to include 'main' BPF target directory to grab private headers
+CPP.Flags += -I$(PROJ_OBJ_DIR)/.. -I$(PROJ_SRC_DIR)/..
+
+include $(LEVEL)/Makefile.common
diff --git a/tools/bpf/llvm/bld/lib/Target/BPF/MCTargetDesc/Makefile b/tools/bpf/llvm/bld/lib/Target/BPF/MCTargetDesc/Makefile
new file mode 100644
index 000000000000..5f2e2090d449
--- /dev/null
+++ b/tools/bpf/llvm/bld/lib/Target/BPF/MCTargetDesc/Makefile
@@ -0,0 +1,11 @@
+##===- lib/Target/BPF/MCTargetDesc/Makefile --------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+
+LEVEL = ../../../..
+LIBRARYNAME = LLVMBPFDesc
+
+# Hack: we need to include 'main' target directory to grab private headers
+CPP.Flags += -I$(PROJ_OBJ_DIR)/.. -I$(PROJ_SRC_DIR)/..
+
+include $(LEVEL)/Makefile.common
diff --git a/tools/bpf/llvm/bld/lib/Target/BPF/Makefile b/tools/bpf/llvm/bld/lib/Target/BPF/Makefile
new file mode 100644
index 000000000000..14dea1a3851b
--- /dev/null
+++ b/tools/bpf/llvm/bld/lib/Target/BPF/Makefile
@@ -0,0 +1,17 @@
+##===- lib/Target/BPF/Makefile ---------------------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+
+LEVEL = ../../..
+LIBRARYNAME = LLVMBPFCodeGen
+TARGET = BPF
+
+# Make sure that tblgen is run, first thing.
+BUILT_SOURCES = BPFGenRegisterInfo.inc BPFGenInstrInfo.inc \
+		BPFGenAsmWriter.inc BPFGenAsmMatcher.inc BPFGenDAGISel.inc \
+		BPFGenMCCodeEmitter.inc BPFGenSubtargetInfo.inc BPFGenCallingConv.inc
+
+DIRS = InstPrinter TargetInfo MCTargetDesc
+
+include $(LEVEL)/Makefile.common
+
diff --git a/tools/bpf/llvm/bld/lib/Target/BPF/TargetInfo/Makefile b/tools/bpf/llvm/bld/lib/Target/BPF/TargetInfo/Makefile
new file mode 100644
index 000000000000..fdf9056d37f6
--- /dev/null
+++ b/tools/bpf/llvm/bld/lib/Target/BPF/TargetInfo/Makefile
@@ -0,0 +1,10 @@
+##===- lib/Target/BPF/TargetInfo/Makefile ----------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+LEVEL = ../../../..
+LIBRARYNAME = LLVMBPFInfo
+
+# Hack: we need to include 'main' target directory to grab private headers
+CPP.Flags += -I$(PROJ_OBJ_DIR)/.. -I$(PROJ_SRC_DIR)/..
+
+include $(LEVEL)/Makefile.common
diff --git a/tools/bpf/llvm/bld/lib/Target/Makefile b/tools/bpf/llvm/bld/lib/Target/Makefile
new file mode 100644
index 000000000000..06e518533407
--- /dev/null
+++ b/tools/bpf/llvm/bld/lib/Target/Makefile
@@ -0,0 +1,11 @@
+#===- lib/Target/Makefile ----------------------------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+
+LEVEL = ../..
+
+include $(LEVEL)/Makefile.config
+
+PARALLEL_DIRS := $(TARGETS_TO_BUILD)
+
+include $(LLVM_SRC_ROOT)/Makefile.rules
diff --git a/tools/bpf/llvm/bld/tools/Makefile b/tools/bpf/llvm/bld/tools/Makefile
new file mode 100644
index 000000000000..6613681389d5
--- /dev/null
+++ b/tools/bpf/llvm/bld/tools/Makefile
@@ -0,0 +1,12 @@
+##===- tools/Makefile --------------------------------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+
+LEVEL := ..
+
+include $(LEVEL)/Makefile.config
+
+DIRS :=
+PARALLEL_DIRS := llc
+
+include $(LEVEL)/Makefile.common
diff --git a/tools/bpf/llvm/bld/tools/llc/Makefile b/tools/bpf/llvm/bld/tools/llc/Makefile
new file mode 100644
index 000000000000..499feb0ac113
--- /dev/null
+++ b/tools/bpf/llvm/bld/tools/llc/Makefile
@@ -0,0 +1,15 @@
+#===- tools/llc/Makefile -----------------------------------*- Makefile -*-===##
+# This file is distributed under the University of Illinois Open Source
+# License. See LICENSE.TXT for details.
+
+LEVEL := ../..
+TOOLNAME := llc
+ifneq (,$(filter $(shell $(LLVM_CONFIG) --version),3.3 3.4))
+LINK_COMPONENTS := asmparser asmprinter codegen bitreader core mc selectiondag support target irreader
+else
+LINK_COMPONENTS := asmparser asmprinter codegen bitreader core mc selectiondag support target
+endif
+USEDLIBS := LLVMBPFCodeGen.a LLVMBPFDesc.a LLVMBPFInfo.a LLVMBPFAsmPrinter.a
+
+include $(LEVEL)/Makefile.common
+
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPF.h b/tools/bpf/llvm/lib/Target/BPF/BPF.h
new file mode 100644
index 000000000000..53fe49a9c4ea
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPF.h
@@ -0,0 +1,28 @@
+//===-- BPF.h - Top-level interface for BPF representation ----*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#ifndef TARGET_BPF_H
+#define TARGET_BPF_H
+#include "llvm/Config/config.h"
+#undef LLVM_NATIVE_TARGET
+#undef LLVM_NATIVE_ASMPRINTER
+#undef LLVM_NATIVE_ASMPARSER
+#undef LLVM_NATIVE_DISASSEMBLER
+#include "MCTargetDesc/BPFBaseInfo.h"
+#include "MCTargetDesc/BPFMCTargetDesc.h"
+#include "llvm/Target/TargetMachine.h"
+
+namespace llvm {
+class FunctionPass;
+class TargetMachine;
+class BPFTargetMachine;
+
+/// createBPFISelDag - This pass converts a legalized DAG into a
+/// BPF-specific DAG, ready for instruction scheduling.
+FunctionPass *createBPFISelDag(BPFTargetMachine &TM);
+
+extern Target TheBPFTarget;
+}
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPF.td b/tools/bpf/llvm/lib/Target/BPF/BPF.td
new file mode 100644
index 000000000000..867c7f8134de
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPF.td
@@ -0,0 +1,29 @@
+//===- BPF.td - Describe the BPF Target Machine --------*- tablegen -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+// Target-independent interfaces which we are implementing
+include "llvm/Target/Target.td"
+
+// BPF Subtarget features.
+include "BPFRegisterInfo.td"
+include "BPFCallingConv.td"
+include "BPFInstrInfo.td"
+
+def BPFInstrInfo : InstrInfo;
+
+class Proc<string Name, list<SubtargetFeature> Features>
+ : Processor<Name, NoItineraries, Features>;
+
+def : Proc<"generic", []>;
+
+def BPFInstPrinter : AsmWriter {
+  string AsmWriterClassName  = "InstPrinter";
+  bit isMCAsmWriter = 1;
+}
+
+// Declare the target which we are implementing
+def BPF : Target {
+  let InstructionSet = BPFInstrInfo;
+  let AssemblyWriters = [BPFInstPrinter];
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFAsmPrinter.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFAsmPrinter.cpp
new file mode 100644
index 000000000000..9740d87a593e
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFAsmPrinter.cpp
@@ -0,0 +1,100 @@
+//===-- BPFAsmPrinter.cpp - BPF LLVM assembly writer --------------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file contains a printer that converts from our internal representation
+// of machine-dependent LLVM code to the BPF assembly language.
+
+#define DEBUG_TYPE "asm-printer"
+#include "BPF.h"
+#include "BPFInstrInfo.h"
+#include "BPFMCInstLower.h"
+#include "BPFTargetMachine.h"
+#include "InstPrinter/BPFInstPrinter.h"
+#include "llvm/Assembly/Writer.h"
+#include "llvm/CodeGen/AsmPrinter.h"
+#include "llvm/CodeGen/MachineModuleInfo.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineConstantPool.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/MC/MCAsmInfo.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCStreamer.h"
+#include "llvm/MC/MCSymbol.h"
+#include "llvm/Target/Mangler.h"
+#include "llvm/Support/TargetRegistry.h"
+#include "llvm/Support/raw_ostream.h"
+using namespace llvm;
+
+namespace {
+  class BPFAsmPrinter : public AsmPrinter {
+  public:
+    explicit BPFAsmPrinter(TargetMachine &TM, MCStreamer &Streamer)
+      : AsmPrinter(TM, Streamer) {}
+
+    virtual const char *getPassName() const {
+      return "BPF Assembly Printer";
+    }
+
+    void printOperand(const MachineInstr *MI, int OpNum,
+                      raw_ostream &O, const char* Modifier = 0);
+    void EmitInstruction(const MachineInstr *MI);
+  private:
+    void customEmitInstruction(const MachineInstr *MI);
+  };
+}
+
+void BPFAsmPrinter::printOperand(const MachineInstr *MI, int OpNum,
+                                  raw_ostream &O, const char *Modifier) {
+  const MachineOperand &MO = MI->getOperand(OpNum);
+
+  switch (MO.getType()) {
+  case MachineOperand::MO_Register:
+    O << BPFInstPrinter::getRegisterName(MO.getReg());
+    break;
+
+  case MachineOperand::MO_Immediate:
+    O << MO.getImm();
+    break;
+
+  case MachineOperand::MO_MachineBasicBlock:
+    O << *MO.getMBB()->getSymbol();
+    break;
+
+  case MachineOperand::MO_GlobalAddress:
+#if LLVM_VERSION_MINOR==4
+      O << *getSymbol(MO.getGlobal());
+#else
+      O << *Mang->getSymbol(MO.getGlobal());
+#endif
+    break;
+
+  default:
+    llvm_unreachable("<unknown operand type>");
+    O << "bug";
+    return;
+  }
+}
+
+void BPFAsmPrinter::customEmitInstruction(const MachineInstr *MI) {
+  BPFMCInstLower MCInstLowering(OutContext, *Mang, *this);
+
+  MCInst TmpInst;
+  MCInstLowering.Lower(MI, TmpInst);
+  OutStreamer.EmitInstruction(TmpInst);
+}
+
+void BPFAsmPrinter::EmitInstruction(const MachineInstr *MI) {
+
+  MachineBasicBlock::const_instr_iterator I = MI;
+  MachineBasicBlock::const_instr_iterator E = MI->getParent()->instr_end();
+
+  do {
+    customEmitInstruction(I++);
+  } while ((I != E) && I->isInsideBundle());
+}
+
+// Force static initialization.
+extern "C" void LLVMInitializeBPFAsmPrinter() {
+  RegisterAsmPrinter<BPFAsmPrinter> X(TheBPFTarget);
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFCallingConv.td b/tools/bpf/llvm/lib/Target/BPF/BPFCallingConv.td
new file mode 100644
index 000000000000..27c327e5f5e9
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFCallingConv.td
@@ -0,0 +1,24 @@
+//===- BPFCallingConv.td - Calling Conventions BPF -------*- tablegen -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This describes the calling conventions for the BPF architecture.
+
+// BPF 64-bit C return-value convention.
+def RetCC_BPF64 : CallingConv<[
+  CCIfType<[i64], CCAssignToReg<[R0]>>
+]>;
+
+// BPF 64-bit C Calling convention.
+def CC_BPF64 : CallingConv<[
+  // Promote i8/i16/i32 args to i64
+  CCIfType<[i8, i16, i32], CCPromoteToType<i64>>,
+
+  // All arguments get passed in integer registers if there is space.
+  CCIfType<[i64], CCAssignToReg<[R1, R2, R3, R4, R5]>>,
+
+  // Alternatively, they are assigned to the stack in 8-byte aligned units.
+  CCAssignToStack<8, 8>
+]>;
+
+def CSR: CalleeSavedRegs<(add R6, R7, R8, R9, R10)>;
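+
+// Illustrative example (not from a spec): for
+//   long f(long a, long b, long c);
+// a lands in R1, b in R2, c in R3, and the result is returned in R0.
+// The CCAssignToStack rule above is effectively dead: LowerFormalArguments
+// and LowerCall report a fatal error for calls with more than five args.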
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.cpp
new file mode 100644
index 000000000000..b263b5fd0494
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.cpp
@@ -0,0 +1,36 @@
+//===-- BPFFrameLowering.cpp - BPF Frame Information --------------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file contains the BPF implementation of TargetFrameLowering class.
+
+#include "BPFFrameLowering.h"
+#include "BPFInstrInfo.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+
+using namespace llvm;
+
+bool BPFFrameLowering::hasFP(const MachineFunction &MF) const {
+  return true;
+}
+
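+// eBPF programs run on a small fixed-size stack provided by the kernel and
+// address it through a read-only frame pointer, so there is no stack-pointer
+// adjustment to emit: prologue and epilogue are intentionally empty.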
+void BPFFrameLowering::emitPrologue(MachineFunction &MF) const {
+}
+
+void BPFFrameLowering::emitEpilogue(MachineFunction &MF,
+                                    MachineBasicBlock &MBB) const {
+}
+
+void BPFFrameLowering::
+processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
+                                     RegScavenger *RS) const {
+  MachineRegisterInfo& MRI = MF.getRegInfo();
+
+  MRI.setPhysRegUnused(BPF::R6);
+  MRI.setPhysRegUnused(BPF::R7);
+  MRI.setPhysRegUnused(BPF::R8);
+  MRI.setPhysRegUnused(BPF::R9);
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.h b/tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.h
new file mode 100644
index 000000000000..3e3d9adb8dc9
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFFrameLowering.h
@@ -0,0 +1,35 @@
+//===-- BPFFrameLowering.h - Define frame lowering for BPF ---*- C++ -*--===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#ifndef BPF_FRAMEINFO_H
+#define BPF_FRAMEINFO_H
+
+#include "BPF.h"
+#include "BPFSubtarget.h"
+#include "llvm/Target/TargetFrameLowering.h"
+
+namespace llvm {
+class BPFSubtarget;
+
+class BPFFrameLowering : public TargetFrameLowering {
+public:
+  explicit BPFFrameLowering(const BPFSubtarget &sti)
+    : TargetFrameLowering(TargetFrameLowering::StackGrowsDown, 8, 0) {
+  }
+
+  void emitPrologue(MachineFunction &MF) const;
+  void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
+
+  bool hasFP(const MachineFunction &MF) const;
+  virtual void processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
+                                                    RegScavenger *RS) const;
+
+  // llvm 3.3 defines it here
+  void eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
+                                     MachineBasicBlock::iterator MI) const {
+    MBB.erase(MI);
+  }
+};
+}
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp
new file mode 100644
index 000000000000..85f905bae574
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFISelDAGToDAG.cpp
@@ -0,0 +1,182 @@
+//===-- BPFISelDAGToDAG.cpp - A dag to dag inst selector for BPF --------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file defines an instruction selector for the BPF target.
+
+#define DEBUG_TYPE "bpf-isel"
+#include "BPF.h"
+#include "BPFRegisterInfo.h"
+#include "BPFSubtarget.h"
+#include "BPFTargetMachine.h"
+#include "llvm/Support/CFG.h"
+#include "llvm/CodeGen/MachineConstantPool.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/SelectionDAGISel.h"
+#include "llvm/Target/TargetMachine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/raw_ostream.h"
+using namespace llvm;
+
+// Instruction Selector Implementation
+namespace {
+
+class BPFDAGToDAGISel : public SelectionDAGISel {
+
+  /// TM - Keep a reference to BPFTargetMachine.
+  BPFTargetMachine &TM;
+
+  /// Subtarget - Keep a pointer to the BPFSubtarget around so that we can
+  /// make the right decision when generating code for different targets.
+  const BPFSubtarget &Subtarget;
+
+public:
+  explicit BPFDAGToDAGISel(BPFTargetMachine &tm) :
+  SelectionDAGISel(tm),
+  TM(tm), Subtarget(tm.getSubtarget<BPFSubtarget>()) {}
+
+  // Pass Name
+  virtual const char *getPassName() const {
+    return "BPF DAG->DAG Pattern Instruction Selection";
+  }
+
+private:
+  // Include the pieces autogenerated from the target description.
+  #include "BPFGenDAGISel.inc"
+
+  /// getTargetMachine - Return a reference to the TargetMachine, casted
+  /// to the target-specific type.
+  const BPFTargetMachine &getTargetMachine() {
+    return static_cast<const BPFTargetMachine &>(TM);
+  }
+
+  /// getInstrInfo - Return a reference to the TargetInstrInfo, casted
+  /// to the target-specific type.
+  const BPFInstrInfo *getInstrInfo() {
+    return getTargetMachine().getInstrInfo();
+  }
+
+  SDNode *Select(SDNode *N);
+
+  // Complex Pattern for address selection.
+  bool SelectAddr(SDValue Addr, SDValue &Base, SDValue &Offset);
+
+  // getI32Imm - Return a target constant with the specified value.  Note
+  // that, despite the name, the constant is created with type i64.
+  inline SDValue getI32Imm(unsigned Imm) {
+    return CurDAG->getTargetConstant(Imm, MVT::i64);
+  }
+};
+
+}
+
+/// ComplexPattern used in BPFInstrInfo.td
+/// by the BPF load/store instructions.
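+/// For example (illustrative): a load from stack slot FI#0 at offset 8 is
+/// matched with Base = TargetFrameIndex(0) and Offset = 8, the base+offset
+/// form that the BPF load/store instructions expect.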
+bool BPFDAGToDAGISel::
+SelectAddr(SDValue Addr, SDValue &Base, SDValue &Offset) {
+  // if Address is FI, get the TargetFrameIndex.
+  if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
+    Base   = CurDAG->getTargetFrameIndex(FIN->getIndex(), MVT::i64);
+    Offset = CurDAG->getTargetConstant(0, MVT::i64);
+    return true;
+  }
+
+  if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
+      Addr.getOpcode() == ISD::TargetGlobalAddress)
+    return false;
+
+  // Addresses of the form FI+const or FI|const
+  if (CurDAG->isBaseWithConstantOffset(Addr)) {
+    ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Addr.getOperand(1));
+    if (isInt<32>(CN->getSExtValue())) {
+
+      // If the first operand is a FI, get the TargetFI Node
+      if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>
+                                  (Addr.getOperand(0)))
+        Base = CurDAG->getTargetFrameIndex(FIN->getIndex(), MVT::i64);
+      else
+        Base = Addr.getOperand(0);
+
+      Offset = CurDAG->getTargetConstant(CN->getSExtValue(), MVT::i64);
+      return true;
+    }
+  }
+
+  // Operand is a result from an ADD.
+  if (Addr.getOpcode() == ISD::ADD) {
+    if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Addr.getOperand(1))) {
+      if (isInt<32>(CN->getSExtValue())) {
+
+        // If the first operand is a FI, get the TargetFI Node
+        if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>
+                                    (Addr.getOperand(0))) {
+          Base = CurDAG->getTargetFrameIndex(FIN->getIndex(), MVT::i64);
+        } else {
+          Base = Addr.getOperand(0);
+        }
+
+        Offset = CurDAG->getTargetConstant(CN->getSExtValue(), MVT::i64);
+        return true;
+      }
+    }
+  }
+
+  Base   = Addr;
+  Offset = CurDAG->getTargetConstant(0, MVT::i64);
+  return true;
+}
+
+/// Select instructions that are not custom-lowered. Used for
+/// expanded, promoted and normal instructions.
+SDNode* BPFDAGToDAGISel::Select(SDNode *Node) {
+  unsigned Opcode = Node->getOpcode();
+
+  // Dump information about the Node being selected
+  DEBUG(errs() << "Selecting: "; Node->dump(CurDAG); errs() << "\n");
+
+  // If we have a custom node, we already have selected!
+  if (Node->isMachineOpcode()) {
+    DEBUG(errs() << "== "; Node->dump(CurDAG); errs() << "\n");
+    return NULL;
+  }
+
+  // Handle nodes that need custom selection before falling back to tablegen.
+  switch(Opcode) {
+    default: break;
+
+    case ISD::FrameIndex: {
+        int FI = cast<FrameIndexSDNode>(Node)->getIndex();
+        EVT VT = Node->getValueType(0);
+        SDValue TFI = CurDAG->getTargetFrameIndex(FI, VT);
+        unsigned Opc = BPF::MOV_rr;
+        if (Node->hasOneUse())
+          return CurDAG->SelectNodeTo(Node, Opc, VT, TFI);
+#if LLVM_VERSION_MINOR==4
+        return CurDAG->getMachineNode(Opc, SDLoc(Node), VT, TFI);
+#else
+        return CurDAG->getMachineNode(Opc, Node->getDebugLoc(), VT, TFI);
+#endif
+
+    }
+  }
+
+  // Select the default instruction
+  SDNode *ResNode = SelectCode(Node);
+
+  DEBUG(errs() << "=> ");
+  if (ResNode == NULL || ResNode == Node)
+    DEBUG(Node->dump(CurDAG));
+  else
+    DEBUG(ResNode->dump(CurDAG));
+  DEBUG(errs() << "\n");
+  return ResNode;
+}
+
+/// createBPFISelDag - This pass converts a legalized DAG into a
+/// BPF-specific DAG, ready for instruction scheduling.
+FunctionPass *llvm::createBPFISelDag(BPFTargetMachine &TM) {
+  return new BPFDAGToDAGISel(TM);
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.cpp
new file mode 100644
index 000000000000..aee0c72af7fa
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -0,0 +1,683 @@
+//===-- BPFISelLowering.cpp - BPF DAG Lowering Implementation  ----------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file implements the BPFTargetLowering class.
+
+#define DEBUG_TYPE "bpf-lower"
+
+#include "BPFISelLowering.h"
+#include "BPF.h"
+#include "BPFTargetMachine.h"
+#include "BPFSubtarget.h"
+#include "llvm/CodeGen/CallingConvLower.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/SelectionDAGISel.h"
+#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
+#include "llvm/CodeGen/ValueTypes.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/raw_ostream.h"
+using namespace llvm;
+
+BPFTargetLowering::BPFTargetLowering(BPFTargetMachine &tm) :
+  TargetLowering(tm, new TargetLoweringObjectFileELF()),
+  Subtarget(*tm.getSubtargetImpl()), TM(tm) {
+
+  // Set up the register classes.
+  addRegisterClass(MVT::i64, &BPF::GPRRegClass);
+
+  // Compute derived properties from the register classes
+  computeRegisterProperties();
+
+  setStackPointerRegisterToSaveRestore(BPF::R11);
+
+  setOperationAction(ISD::BR_CC,             MVT::i64, Custom);
+  setOperationAction(ISD::BR_JT,             MVT::Other, Expand);
+  setOperationAction(ISD::BRCOND,            MVT::Other, Expand);
+  setOperationAction(ISD::SETCC,             MVT::i64, Expand);
+  setOperationAction(ISD::SELECT,            MVT::i64, Expand);
+  setOperationAction(ISD::SELECT_CC,         MVT::i64, Custom);
+
+//  setCondCodeAction(ISD::SETLT,             MVT::i64, Expand);
+
+  setOperationAction(ISD::GlobalAddress,     MVT::i64, Custom);
+  /*setOperationAction(ISD::BlockAddress,      MVT::i64, Custom);
+  setOperationAction(ISD::JumpTable,         MVT::i64, Custom);
+  setOperationAction(ISD::ConstantPool,      MVT::i64, Custom);*/
+
+  setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i64,   Custom);
+  setOperationAction(ISD::STACKSAVE,          MVT::Other, Expand);
+  setOperationAction(ISD::STACKRESTORE,       MVT::Other, Expand);
+
+/*  setOperationAction(ISD::VASTART,            MVT::Other, Custom);
+  setOperationAction(ISD::VAARG,              MVT::Other, Expand);
+  setOperationAction(ISD::VACOPY,             MVT::Other, Expand);
+  setOperationAction(ISD::VAEND,              MVT::Other, Expand);*/
+
+//    setOperationAction(ISD::SDIV,            MVT::i64, Expand);
+//  setOperationAction(ISD::UDIV,            MVT::i64, Expand);
+
+  setOperationAction(ISD::SDIVREM,           MVT::i64, Expand);
+  setOperationAction(ISD::UDIVREM,           MVT::i64, Expand);
+  setOperationAction(ISD::SREM,              MVT::i64, Expand);
+  setOperationAction(ISD::UREM,              MVT::i64, Expand);
+
+//  setOperationAction(ISD::MUL,             MVT::i64, Expand);
+
+  setOperationAction(ISD::MULHU,             MVT::i64, Expand);
+  setOperationAction(ISD::MULHS,             MVT::i64, Expand);
+  setOperationAction(ISD::UMUL_LOHI,         MVT::i64, Expand);
+  setOperationAction(ISD::SMUL_LOHI,         MVT::i64, Expand);
+
+  setOperationAction(ISD::ADDC, MVT::i64, Expand);
+  setOperationAction(ISD::ADDE, MVT::i64, Expand);
+  setOperationAction(ISD::SUBC, MVT::i64, Expand);
+  setOperationAction(ISD::SUBE, MVT::i64, Expand);
+
+  setOperationAction(ISD::ROTR,              MVT::i64, Expand);
+  setOperationAction(ISD::ROTL,              MVT::i64, Expand);
+  setOperationAction(ISD::SHL_PARTS,         MVT::i64, Expand);
+  setOperationAction(ISD::SRL_PARTS,         MVT::i64, Expand);
+  setOperationAction(ISD::SRA_PARTS,         MVT::i64, Expand);
+
+  setOperationAction(ISD::BSWAP,             MVT::i64, Expand);
+  setOperationAction(ISD::CTTZ,              MVT::i64, Custom);
+  setOperationAction(ISD::CTLZ,              MVT::i64, Custom);
+  setOperationAction(ISD::CTTZ_ZERO_UNDEF,   MVT::i64, Custom);
+  setOperationAction(ISD::CTLZ_ZERO_UNDEF,   MVT::i64, Custom);
+  setOperationAction(ISD::CTPOP,             MVT::i64, Expand);
+
+
+  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1,   Expand);
+  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8,   Expand);
+  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16,  Expand);
+  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32,  Expand);
+
+  // Extended load operations for i1 types must be promoted
+  setLoadExtAction(ISD::EXTLOAD,             MVT::i1,   Promote);
+  setLoadExtAction(ISD::ZEXTLOAD,            MVT::i1,   Promote);
+  setLoadExtAction(ISD::SEXTLOAD,            MVT::i1,   Promote);
+
+  setLoadExtAction(ISD::SEXTLOAD,            MVT::i8,   Expand);
+  setLoadExtAction(ISD::SEXTLOAD,            MVT::i16,   Expand);
+  setLoadExtAction(ISD::SEXTLOAD,            MVT::i32,   Expand);
+
+  // Function alignments (log2)
+  setMinFunctionAlignment(3);
+  setPrefFunctionAlignment(3);
+
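+  // A BPF program has no libc to call, so memcpy/memset cannot be lowered to
+  // library calls; raising these limits lets even large copies be expanded
+  // inline as load/store sequences.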
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+  MaxStoresPerMemcpy = 128;
+  MaxStoresPerMemcpyOptSize = 128;
+  MaxStoresPerMemset = 128;
+#else
+  maxStoresPerMemcpy = 128;
+  maxStoresPerMemcpyOptSize = 128;
+  maxStoresPerMemset = 128;
+#endif
+}
+
+SDValue BPFTargetLowering::LowerOperation(SDValue Op,
+                                          SelectionDAG &DAG) const {
+  switch (Op.getOpcode()) {
+  case ISD::BR_CC:              return LowerBR_CC(Op, DAG);
+  case ISD::GlobalAddress:      return LowerGlobalAddress(Op, DAG);
+  case ISD::SELECT_CC:          return LowerSELECT_CC(Op, DAG);
+  default:
+    llvm_unreachable("unimplemented operand");
+  }
+}
+
+//                      Calling Convention Implementation
+#include "BPFGenCallingConv.inc"
+
+SDValue
+BPFTargetLowering::LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
+                                        bool isVarArg,
+                                        const SmallVectorImpl<ISD::InputArg>
+                                        &Ins,
+#if LLVM_VERSION_MINOR==4
+                                        SDLoc dl,
+#else
+                                        DebugLoc dl,
+#endif
+                                        SelectionDAG &DAG,
+                                        SmallVectorImpl<SDValue> &InVals)
+                                          const {
+  switch (CallConv) {
+  default:
+    llvm_unreachable("Unsupported calling convention");
+  case CallingConv::C:
+  case CallingConv::Fast:
+    break;
+  }
+
+/// LowerCCCArguments - transform physical registers into virtual registers and
+/// generate load operations for arguments placed on the stack.
+  MachineFunction &MF = DAG.getMachineFunction();
+  MachineRegisterInfo &RegInfo = MF.getRegInfo();
+
+  // Assign locations to all of the incoming arguments.
+  SmallVector<CCValAssign, 16> ArgLocs;
+  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
+                 getTargetMachine(), ArgLocs, *DAG.getContext());
+  CCInfo.AnalyzeFormalArguments(Ins, CC_BPF64);
+
+  for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
+    CCValAssign &VA = ArgLocs[i];
+    if (VA.isRegLoc()) {
+      // Arguments passed in registers
+      EVT RegVT = VA.getLocVT();
+      switch (RegVT.getSimpleVT().SimpleTy) {
+      default:
+        {
+#ifndef NDEBUG
+          errs() << "LowerFormalArguments Unhandled argument type: "
+               << RegVT.getSimpleVT().SimpleTy << "\n";
+#endif
+          llvm_unreachable(0);
+        }
+      case MVT::i64:
+        unsigned VReg = RegInfo.createVirtualRegister(&BPF::GPRRegClass);
+        RegInfo.addLiveIn(VA.getLocReg(), VReg);
+        SDValue ArgValue = DAG.getCopyFromReg(Chain, dl, VReg, RegVT);
+
+        // If this is an 8/16/32-bit value, it is really passed promoted to 64
+        // bits. Insert an assert[sz]ext to capture this, then truncate to the
+        // right size.
+        if (VA.getLocInfo() == CCValAssign::SExt)
+          ArgValue = DAG.getNode(ISD::AssertSext, dl, RegVT, ArgValue,
+                                 DAG.getValueType(VA.getValVT()));
+        else if (VA.getLocInfo() == CCValAssign::ZExt)
+          ArgValue = DAG.getNode(ISD::AssertZext, dl, RegVT, ArgValue,
+                                 DAG.getValueType(VA.getValVT()));
+
+        if (VA.getLocInfo() != CCValAssign::Full)
+          ArgValue = DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), ArgValue);
+
+        InVals.push_back(ArgValue);
+      }
+    } else {
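+      // CC_BPF64 assigns at most five i64 arguments to R1-R5; a memory
+      // location here means the function takes more register arguments
+      // than eBPF provides.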
+      assert(VA.isMemLoc());
+      errs() << "Function: " << MF.getName() << " ";
+      MF.getFunction()->getFunctionType()->dump();
+      errs() << "\n";
+      report_fatal_error("too many function args");
+    }
+  }
+
+  if (isVarArg || MF.getFunction()->hasStructRetAttr()) {
+    errs() << "Function: " << MF.getName() << " ";
+    MF.getFunction()->getFunctionType()->dump();
+    errs() << "\n";
+    report_fatal_error("functions with VarArgs or StructRet are not supported");
+  }
+
+  return Chain;
+}
+
+SDValue
+BPFTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
+                              SmallVectorImpl<SDValue> &InVals) const {
+  SelectionDAG &DAG                     = CLI.DAG;
+  SmallVector<ISD::OutputArg, 32> &Outs = CLI.Outs;
+  SmallVector<SDValue, 32> &OutVals     = CLI.OutVals;
+  SmallVector<ISD::InputArg, 32> &Ins   = CLI.Ins;
+  SDValue Chain                         = CLI.Chain;
+  SDValue Callee                        = CLI.Callee;
+  bool &isTailCall                      = CLI.IsTailCall;
+  CallingConv::ID CallConv              = CLI.CallConv;
+  bool isVarArg                         = CLI.IsVarArg;
+
+  // BPF target does not support tail call optimization.
+  isTailCall = false;
+
+  switch (CallConv) {
+  default:
+    report_fatal_error("Unsupported calling convention");
+  case CallingConv::Fast:
+  case CallingConv::C:
+    break;
+  }
+
+/// LowerCCCCallTo - function arguments are copied from virtual regs to
+/// (physical regs)/(stack frame), CALLSEQ_START and CALLSEQ_END are emitted.
+
+  // Analyze operands of the call, assigning locations to each operand.
+  SmallVector<CCValAssign, 16> ArgLocs;
+  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
+                 getTargetMachine(), ArgLocs, *DAG.getContext());
+
+  CCInfo.AnalyzeCallOperands(Outs, CC_BPF64);
+
+  // Get a count of how many bytes are to be pushed on the stack.
+  unsigned NumBytes = CCInfo.getNextStackOffset();
+
+  // Create local copies for byval args.
+  SmallVector<SDValue, 8> ByValArgs;
+
+  if (Outs.size() >= 6) {
+    errs() << "too many arguments to a function ";
+    Callee.dump();
+    report_fatal_error("too many args\n");
+  }
+
+  for (unsigned i = 0,  e = Outs.size(); i != e; ++i) {
+    ISD::ArgFlagsTy Flags = Outs[i].Flags;
+    if (!Flags.isByVal())
+      continue;
+
+    Callee.dump();
+    report_fatal_error("cannot pass by value");
+  }
+
+  Chain = DAG.getCALLSEQ_START(Chain, DAG.getConstant(NumBytes,
+                                                      getPointerTy(), true)
+#if LLVM_VERSION_MINOR==4
+                                                      , CLI.DL
+#endif
+                                                      );
+
+  SmallVector<std::pair<unsigned, SDValue>, 4> RegsToPass;
+  SDValue StackPtr;
+
+  // Walk the register/memloc assignments, inserting copies/loads.
+  for (unsigned i = 0, j = 0, e = ArgLocs.size(); i != e; ++i) {
+    CCValAssign &VA = ArgLocs[i];
+    SDValue Arg = OutVals[i];
+    ISD::ArgFlagsTy Flags = Outs[i].Flags;
+
+
+    // Promote the value if needed.
+    switch (VA.getLocInfo()) {
+      default: llvm_unreachable("Unknown loc info!");
+      case CCValAssign::Full: break;
+      case CCValAssign::SExt:
+        Arg = DAG.getNode(ISD::SIGN_EXTEND, CLI.DL, VA.getLocVT(), Arg);
+        break;
+      case CCValAssign::ZExt:
+        Arg = DAG.getNode(ISD::ZERO_EXTEND, CLI.DL, VA.getLocVT(), Arg);
+        break;
+      case CCValAssign::AExt:
+        Arg = DAG.getNode(ISD::ANY_EXTEND, CLI.DL, VA.getLocVT(), Arg);
+        break;
+    }
+
+    // Use local copy if it is a byval arg.
+    if (Flags.isByVal())
+      Arg = ByValArgs[j++];
+
+    // Arguments that can be passed in a register must be kept in the
+    // RegsToPass vector.
+    if (VA.isRegLoc()) {
+      RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));
+    } else {
+      llvm_unreachable("call arg pass bug");
+    }
+  }
+
+  SDValue InFlag;
+
+  // Build a sequence of copy-to-reg nodes chained together with token chain and
+  // flag operands which copy the outgoing args into registers.  The InFlag is
+  // necessary since all emitted instructions must be stuck together.
+  for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i) {
+    Chain = DAG.getCopyToReg(Chain, CLI.DL, RegsToPass[i].first,
+                             RegsToPass[i].second, InFlag);
+    InFlag = Chain.getValue(1);
+  }
+
+  // If the callee is a GlobalAddress node (quite common, every direct call is)
+  // turn it into a TargetGlobalAddress node so that legalize doesn't hack it.
+  // Likewise ExternalSymbol -> TargetExternalSymbol.
+  if (GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee)) {
+    Callee = DAG.getTargetGlobalAddress(G->getGlobal(), CLI.DL, getPointerTy(), G->getOffset(),
+                                        0);
+  } else if (ExternalSymbolSDNode *E = dyn_cast<ExternalSymbolSDNode>(Callee)) {
+    Callee = DAG.getTargetExternalSymbol(E->getSymbol(), getPointerTy(),
+                                         0);
+  }
+
+  // Returns a chain & a flag for retval copy to use.
+  SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
+  SmallVector<SDValue, 8> Ops;
+  Ops.push_back(Chain);
+  Ops.push_back(Callee);
+
+  // Add argument registers to the end of the list so that they are
+  // known live into the call.
+  for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i)
+    Ops.push_back(DAG.getRegister(RegsToPass[i].first,
+                                  RegsToPass[i].second.getValueType()));
+
+  if (InFlag.getNode())
+    Ops.push_back(InFlag);
+
+  Chain = DAG.getNode(BPFISD::CALL, CLI.DL, NodeTys, &Ops[0], Ops.size());
+  InFlag = Chain.getValue(1);
+
+  // Create the CALLSEQ_END node.
+  Chain = DAG.getCALLSEQ_END(Chain,
+                             DAG.getConstant(NumBytes, getPointerTy(), true),
+                             DAG.getConstant(0, getPointerTy(), true),
+                             InFlag
+#if LLVM_VERSION_MINOR==4
+                             , CLI.DL
+#endif
+                             );
+  InFlag = Chain.getValue(1);
+
+  // Handle result values, copying them out of physregs into vregs that we
+  // return.
+  return LowerCallResult(Chain, InFlag, CallConv, isVarArg, Ins, CLI.DL,
+                         DAG, InVals);
+}
+
+SDValue
+BPFTargetLowering::LowerReturn(SDValue Chain,
+                               CallingConv::ID CallConv, bool isVarArg,
+                               const SmallVectorImpl<ISD::OutputArg> &Outs,
+                               const SmallVectorImpl<SDValue> &OutVals,
+#if LLVM_VERSION_MINOR==4
+                               SDLoc dl,
+#else
+                               DebugLoc dl,
+#endif
+                               SelectionDAG &DAG) const {
+
+  // CCValAssign - represent the assignment of the return value to a location
+  SmallVector<CCValAssign, 16> RVLocs;
+
+  // CCState - Info about the registers and stack slot.
+  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
+                 getTargetMachine(), RVLocs, *DAG.getContext());
+
+  // Analyze return values.
+  CCInfo.AnalyzeReturn(Outs, RetCC_BPF64);
+
+  // If this is the first return lowered for this function, add the regs to the
+  // liveout set for the function.
+#if LLVM_VERSION_MINOR==2
+  if (DAG.getMachineFunction().getRegInfo().liveout_empty()) {
+    for (unsigned i = 0; i != RVLocs.size(); ++i)
+      if (RVLocs[i].isRegLoc())
+        DAG.getMachineFunction().getRegInfo().addLiveOut(RVLocs[i].getLocReg());
+  }
+#endif
+
+  SDValue Flag;
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+  SmallVector<SDValue, 4> RetOps(1, Chain);
+#endif
+
+  // Copy the result values into the output registers.
+  for (unsigned i = 0; i != RVLocs.size(); ++i) {
+    CCValAssign &VA = RVLocs[i];
+    assert(VA.isRegLoc() && "Can only return in registers!");
+
+    Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(),
+                             OutVals[i], Flag);
+
+    // Guarantee that all emitted copies are stuck together,
+    // so that nothing can be scheduled between them.
+    Flag = Chain.getValue(1);
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+    RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
+#endif
+  }
+
+  if (DAG.getMachineFunction().getFunction()->hasStructRetAttr()) {
+    errs() << "Function: " << DAG.getMachineFunction().getName() << " ";
+    DAG.getMachineFunction().getFunction()->getFunctionType()->dump();
+    errs() << "\n";
+    report_fatal_error("BPF doesn't support struct return");
+  }
+
+  unsigned Opc = BPFISD::RET_FLAG;
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+  RetOps[0] = Chain;  // Update chain.
+
+  // Add the flag if we have it.
+  if (Flag.getNode())
+    RetOps.push_back(Flag);
+
+  return DAG.getNode(Opc, dl, MVT::Other, &RetOps[0], RetOps.size());
+#else
+  if (Flag.getNode())
+    return DAG.getNode(Opc, dl, MVT::Other, Chain, Flag);
+
+  // Return Void
+  return DAG.getNode(Opc, dl, MVT::Other, Chain);
+#endif
+}
+
+/// LowerCallResult - Lower the result values of a call into the
+/// appropriate copies out of appropriate physical registers.
+SDValue
+BPFTargetLowering::LowerCallResult(SDValue Chain, SDValue InFlag,
+                                   CallingConv::ID CallConv, bool isVarArg,
+                                   const SmallVectorImpl<ISD::InputArg> &Ins,
+#if LLVM_VERSION_MINOR==4
+                                   SDLoc dl,
+#else
+                                   DebugLoc dl,
+#endif
+                                   SelectionDAG &DAG,
+                                   SmallVectorImpl<SDValue> &InVals) const {
+
+  // Assign locations to each value returned by this call.
+  SmallVector<CCValAssign, 16> RVLocs;
+  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(),
+                 getTargetMachine(), RVLocs, *DAG.getContext());
+
+  CCInfo.AnalyzeCallResult(Ins, RetCC_BPF64);
+
+  // Copy all of the result registers out of their specified physreg.
+  for (unsigned i = 0; i != RVLocs.size(); ++i) {
+    Chain = DAG.getCopyFromReg(Chain, dl, RVLocs[i].getLocReg(),
+                               RVLocs[i].getValVT(), InFlag).getValue(1);
+    InFlag = Chain.getValue(2);
+    InVals.push_back(Chain.getValue(0));
+  }
+
+  return Chain;
+}
+
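+// Only the signed and unsigned '>' and '>=' jump forms exist in this
+// instruction set, so '<' and '<=' compares are canonicalized by swapping
+// the operands: for example, 'a < b' becomes 'b > a'.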
+static bool NegateCC(SDValue &LHS, SDValue &RHS, ISD::CondCode &CC)
+{
+  switch (CC) {
+  default:
+    return false;
+  case ISD::SETULT:
+    CC = ISD::SETUGT;
+    std::swap(LHS, RHS);
+    return true;
+  case ISD::SETULE:
+    CC = ISD::SETUGE;
+    std::swap(LHS, RHS);
+    return true;
+  case ISD::SETLT:
+    CC = ISD::SETGT;
+    std::swap(LHS, RHS);
+    return true;
+  case ISD::SETLE:
+    CC = ISD::SETGE;
+    std::swap(LHS, RHS);
+    return true;
+  }
+}
+
+SDValue BPFTargetLowering::LowerBR_CC(SDValue Op,
+                                      SelectionDAG &DAG) const {
+  SDValue Chain  = Op.getOperand(0);
+  ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
+  SDValue LHS   = Op.getOperand(2);
+  SDValue RHS   = Op.getOperand(3);
+  SDValue Dest  = Op.getOperand(4);
+#if LLVM_VERSION_MINOR==4
+  SDLoc    dl(Op);
+#else
+  DebugLoc dl   = Op.getDebugLoc();
+#endif
+
+  NegateCC(LHS, RHS, CC);
+
+  return DAG.getNode(BPFISD::BR_CC, dl, Op.getValueType(),
+                     Chain, LHS, RHS, DAG.getConstant(CC, MVT::i64), Dest);
+}
+
+SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op,
+                                          SelectionDAG &DAG) const {
+  SDValue LHS    = Op.getOperand(0);
+  SDValue RHS    = Op.getOperand(1);
+  SDValue TrueV  = Op.getOperand(2);
+  SDValue FalseV = Op.getOperand(3);
+  ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(4))->get();
+#if LLVM_VERSION_MINOR==4
+  SDLoc    dl(Op);
+#else
+  DebugLoc dl    = Op.getDebugLoc();
+#endif
+
+  NegateCC(LHS, RHS, CC);
+
+  SDValue TargetCC = DAG.getConstant(CC, MVT::i64);
+
+  SDVTList VTs = DAG.getVTList(Op.getValueType(), MVT::Glue);
+  SmallVector<SDValue, 5> Ops;
+  Ops.push_back(LHS);
+  Ops.push_back(RHS);
+  Ops.push_back(TargetCC);
+  Ops.push_back(TrueV);
+  Ops.push_back(FalseV);
+
+  SDValue sel = DAG.getNode(BPFISD::SELECT_CC, dl, VTs, &Ops[0], Ops.size());
+  DEBUG(errs() << "LowerSELECT_CC:\n"; sel.dumpr(); errs() << "\n");
+  return sel;
+}
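+// (The SELECT_CC node built here is matched by the Select pseudo in
+// BPFInstrInfo.td and expanded into real control flow by
+// EmitInstrWithCustomInserter below.)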
+
+const char *BPFTargetLowering::getTargetNodeName(unsigned Opcode) const {
+  switch (Opcode) {
+  default: return NULL;
+  case BPFISD::ADJDYNALLOC:        return "BPFISD::ADJDYNALLOC";
+  case BPFISD::RET_FLAG:           return "BPFISD::RET_FLAG";
+  case BPFISD::CALL:               return "BPFISD::CALL";
+  case BPFISD::SELECT_CC:          return "BPFISD::SELECT_CC";
+  case BPFISD::BR_CC:              return "BPFISD::BR_CC";
+  case BPFISD::Wrapper:            return "BPFISD::Wrapper";
+  }
+}
+
+SDValue BPFTargetLowering::LowerGlobalAddress(SDValue Op,
+                                              SelectionDAG &DAG) const {
+//  Op.dump();
+#if LLVM_VERSION_MINOR==4
+  SDLoc    dl(Op);
+#else
+  DebugLoc dl = Op.getDebugLoc();
+#endif
+  const GlobalValue *GV = cast<GlobalAddressSDNode>(Op)->getGlobal();
+  SDValue GA = DAG.getTargetGlobalAddress(GV, dl, MVT::i64);
+
+  return DAG.getNode(BPFISD::Wrapper, dl, MVT::i64, GA);
+}
+
+MachineBasicBlock*
+BPFTargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI,
+                                               MachineBasicBlock *BB) const {
+  unsigned Opc = MI->getOpcode();
+
+  const TargetInstrInfo &TII = *getTargetMachine().getInstrInfo();
+  DebugLoc dl = MI->getDebugLoc();
+
+  assert(Opc == BPF::Select && "Unexpected instr type to insert");
+
+  // To "insert" a SELECT instruction, we actually have to insert the diamond
+  // control-flow pattern.  The incoming instruction knows the destination vreg
+  // to set, the condition code register to branch on, the true/false values to
+  // select between, and a branch opcode to use.
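+  //
+  // A sketch of the CFG built below:
+  //
+  //    thisMBB --(cond)--> copy1MBB
+  //       |                    ^
+  //       v                    |
+  //    copy0MBB ---------------+
+  //
+  // where copy1MBB starts with the PHI merging the two values.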
+  const BasicBlock *LLVM_BB = BB->getBasicBlock();
+  MachineFunction::iterator I = BB;
+  ++I;
+
+  //  thisMBB:
+  //  ...
+  //   TrueVal = ...
+  //   jmp_XX r1, r2 goto copy1MBB
+  //   fallthrough --> copy0MBB
+  MachineBasicBlock *thisMBB = BB;
+  MachineFunction *F = BB->getParent();
+  MachineBasicBlock *copy0MBB = F->CreateMachineBasicBlock(LLVM_BB);
+  MachineBasicBlock *copy1MBB = F->CreateMachineBasicBlock(LLVM_BB);
+
+  F->insert(I, copy0MBB);
+  F->insert(I, copy1MBB);
+  // Update machine-CFG edges by transferring all successors of the current
+  // block to the new block which will contain the Phi node for the select.
+  copy1MBB->splice(copy1MBB->begin(), BB,
+                   llvm::next(MachineBasicBlock::iterator(MI)),
+                   BB->end());
+  copy1MBB->transferSuccessorsAndUpdatePHIs(BB);
+  // Next, add the true and fallthrough blocks as its successors.
+  BB->addSuccessor(copy0MBB);
+  BB->addSuccessor(copy1MBB);
+
+  // Insert Branch if Flag
+  unsigned LHS = MI->getOperand(1).getReg();
+  unsigned RHS = MI->getOperand(2).getReg();
+  int CC  = MI->getOperand(3).getImm();
+  switch (CC) {
+  case ISD::SETGT:
+    BuildMI(BB, dl, TII.get(BPF::JSGT_rr))
+      .addReg(LHS).addReg(RHS).addMBB(copy1MBB);
+    break;
+  case ISD::SETUGT:
+    BuildMI(BB, dl, TII.get(BPF::JUGT_rr))
+      .addReg(LHS).addReg(RHS).addMBB(copy1MBB);
+    break;
+  case ISD::SETGE:
+    BuildMI(BB, dl, TII.get(BPF::JSGE_rr))
+      .addReg(LHS).addReg(RHS).addMBB(copy1MBB);
+    break;
+  case ISD::SETUGE:
+    BuildMI(BB, dl, TII.get(BPF::JUGE_rr))
+      .addReg(LHS).addReg(RHS).addMBB(copy1MBB);
+    break;
+  case ISD::SETEQ:
+    BuildMI(BB, dl, TII.get(BPF::JEQ_rr))
+      .addReg(LHS).addReg(RHS).addMBB(copy1MBB);
+    break;
+  case ISD::SETNE:
+    BuildMI(BB, dl, TII.get(BPF::JNE_rr))
+      .addReg(LHS).addReg(RHS).addMBB(copy1MBB);
+    break;
+  default:
+    report_fatal_error("unimplemented select CondCode " + Twine(CC));
+  }
+
+  //  copy0MBB:
+  //   %FalseValue = ...
+  //   # fallthrough to copy1MBB
+  BB = copy0MBB;
+
+  // Update machine-CFG edges
+  BB->addSuccessor(copy1MBB);
+
+  //  copy1MBB:
+  //   %Result = phi [ %FalseValue, copy0MBB ], [ %TrueValue, thisMBB ]
+  //  ...
+  BB = copy1MBB;
+  BuildMI(*BB, BB->begin(), dl, TII.get(BPF::PHI),
+          MI->getOperand(0).getReg())
+    .addReg(MI->getOperand(5).getReg()).addMBB(copy0MBB)
+    .addReg(MI->getOperand(4).getReg()).addMBB(thisMBB);
+
+  MI->eraseFromParent();   // The pseudo instruction is gone now.
+  return BB;
+}
+
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.h b/tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.h
new file mode 100644
index 000000000000..0850a9e2dee4
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -0,0 +1,105 @@
+//===-- BPFISelLowering.h - BPF DAG Lowering Interface ---------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file defines the interfaces that BPF uses to lower LLVM code into a
+// selection DAG.
+
+#ifndef LLVM_TARGET_BPF_ISELLOWERING_H
+#define LLVM_TARGET_BPF_ISELLOWERING_H
+
+#include "BPF.h"
+#include "llvm/CodeGen/SelectionDAG.h"
+#include "llvm/Target/TargetLowering.h"
+
+namespace llvm {
+  namespace BPFISD {
+    enum {
+      FIRST_NUMBER = ISD::BUILTIN_OP_END,
+
+      ADJDYNALLOC,
+
+      /// Return with a flag operand. Operand 0 is the chain operand.
+      RET_FLAG,
+
+      /// CALL - These operations represent an abstract call instruction, which
+      /// includes a bunch of information.
+      CALL,
+
+      /// SELECT_CC - Operands 0 and 1 are the values to compare, operand 2
+      /// is the condition code, and operands 3 and 4 are the true/false
+      /// values to select between.
+      SELECT_CC,
+
+      // BR_CC - Conditional branch. Operands 0 and 1 are the values to
+      // compare, operand 2 is the condition code, operand 3 the target.
+      BR_CC,
+
+      /// Wrapper - A wrapper node for TargetConstantPool, TargetExternalSymbol,
+      /// and TargetGlobalAddress.
+      Wrapper
+    };
+  }
+
+  class BPFSubtarget;
+  class BPFTargetMachine;
+
+  class BPFTargetLowering : public TargetLowering {
+  public:
+    explicit BPFTargetLowering(BPFTargetMachine &TM);
+
+    /// LowerOperation - Provide custom lowering hooks for some operations.
+    virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
+
+    /// getTargetNodeName - This method returns the name of a target specific
+    /// DAG node.
+    virtual const char *getTargetNodeName(unsigned Opcode) const;
+
+    SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
+    SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
+    SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
+
+    MachineBasicBlock* EmitInstrWithCustomInserter(MachineInstr *MI,
+                                                   MachineBasicBlock *BB) const;
+
+  private:
+    const BPFSubtarget &Subtarget;
+    const BPFTargetMachine &TM;
+
+    SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
+                            CallingConv::ID CallConv, bool isVarArg,
+                            const SmallVectorImpl<ISD::InputArg> &Ins,
+#if LLVM_VERSION_MINOR==4
+                            SDLoc dl,
+#else
+                            DebugLoc dl,
+#endif
+                            SelectionDAG &DAG,
+                            SmallVectorImpl<SDValue> &InVals) const;
+
+    SDValue LowerCall(TargetLowering::CallLoweringInfo &CLI,
+                      SmallVectorImpl<SDValue> &InVals) const;
+
+    SDValue LowerFormalArguments(SDValue Chain,
+                                 CallingConv::ID CallConv, bool isVarArg,
+                                 const SmallVectorImpl<ISD::InputArg> &Ins,
+#if LLVM_VERSION_MINOR==4
+                                 SDLoc dl,
+#else
+                                 DebugLoc dl,
+#endif
+                                 SelectionDAG &DAG,
+                                 SmallVectorImpl<SDValue> &InVals) const;
+
+    SDValue LowerReturn(SDValue Chain,
+                        CallingConv::ID CallConv, bool isVarArg,
+                        const SmallVectorImpl<ISD::OutputArg> &Outs,
+                        const SmallVectorImpl<SDValue> &OutVals,
+#if LLVM_VERSION_MINOR==4
+                        SDLoc dl,
+#else
+                        DebugLoc dl,
+#endif
+                        SelectionDAG &DAG) const;
+  };
+}
+
+#endif // LLVM_TARGET_BPF_ISELLOWERING_H
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFInstrFormats.td b/tools/bpf/llvm/lib/Target/BPF/BPFInstrFormats.td
new file mode 100644
index 000000000000..122ff19e8562
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFInstrFormats.td
@@ -0,0 +1,29 @@
+//===- BPFInstrFormats.td - BPF Instruction Formats ----*- tablegen -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+class InstBPF<dag outs, dag ins, string asmstr, list<dag> pattern>
+  : Instruction {
+  field bits<64> Inst;
+  field bits<64> SoftFail = 0;
+  let Size = 8;
+
+  let Namespace = "BPF";
+  let DecoderNamespace = "BPF";
+
+  bits<3> bpfClass;
+  let Inst{58-56} = bpfClass;
+
+  dag OutOperandList = outs;
+  dag InOperandList = ins;
+  let AsmString = asmstr;
+  let Pattern = pattern;
+}
+
+// Pseudo instructions
+class Pseudo<dag outs, dag ins, string asmstr, list<dag> pattern>
+  : InstBPF<outs, ins, asmstr, pattern> {
+  let Inst{63-0} = 0;
+  let isPseudo = 1;
+}
+
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.cpp
new file mode 100644
index 000000000000..943de85ed952
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.cpp
@@ -0,0 +1,162 @@
+//===-- BPFInstrInfo.cpp - BPF Instruction Information --------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file contains the BPF implementation of the TargetInstrInfo class.
+
+#include "BPF.h"
+#include "BPFInstrInfo.h"
+#include "BPFSubtarget.h"
+#include "BPFTargetMachine.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/TargetRegistry.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SmallVector.h"
+
+#define GET_INSTRINFO_CTOR_DTOR /* for 3.4 */
+#define GET_INSTRINFO_CTOR /* for 3.2, 3.3 */
+#include "BPFGenInstrInfo.inc"
+
+using namespace llvm;
+
+BPFInstrInfo::BPFInstrInfo()
+  : BPFGenInstrInfo(BPF::ADJCALLSTACKDOWN, BPF::ADJCALLSTACKUP),
+    RI(*this) {
+}
+
+void BPFInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
+                                MachineBasicBlock::iterator I, DebugLoc DL,
+                                unsigned DestReg, unsigned SrcReg,
+                                bool KillSrc) const {
+  if (BPF::GPRRegClass.contains(DestReg, SrcReg))
+    BuildMI(MBB, I, DL, get(BPF::MOV_rr), DestReg)
+      .addReg(SrcReg, getKillRegState(KillSrc));
+  else
+    llvm_unreachable("Impossible reg-to-reg copy");
+}
+
+void BPFInstrInfo::
+storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
+                    unsigned SrcReg, bool isKill, int FI,
+                    const TargetRegisterClass *RC,
+                    const TargetRegisterInfo *TRI) const {
+  DebugLoc DL;
+  if (I != MBB.end()) DL = I->getDebugLoc();
+
+  if (RC == &BPF::GPRRegClass)
+    BuildMI(MBB, I, DL, get(BPF::STD)).addReg(SrcReg, getKillRegState(isKill))
+      .addFrameIndex(FI).addImm(0);
+  else
+    llvm_unreachable("Can't store this register to stack slot");
+}
+
+void BPFInstrInfo::
+loadRegFromStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
+                     unsigned DestReg, int FI,
+                     const TargetRegisterClass *RC,
+                     const TargetRegisterInfo *TRI) const {
+  DebugLoc DL;
+  if (I != MBB.end()) DL = I->getDebugLoc();
+
+  if (RC == &BPF::GPRRegClass)
+    BuildMI(MBB, I, DL, get(BPF::LDD), DestReg).addFrameIndex(FI).addImm(0);
+  else
+    llvm_unreachable("Can't load this register from stack slot");
+}
+
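+// This backend only understands unconditional JMPs: AnalyzeBranch
+// returns false (success) for blocks ending in JMP or a fall-through,
+// and true (cannot analyze) as soon as it sees a conditional branch,
+// so later passes leave such blocks alone.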
+bool BPFInstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
+                                  MachineBasicBlock *&TBB,
+                                  MachineBasicBlock *&FBB,
+                                  SmallVectorImpl<MachineOperand> &Cond,
+                                  bool AllowModify) const {
+  // Start from the bottom of the block and work up, examining the
+  // terminator instructions.
+  MachineBasicBlock::iterator I = MBB.end();
+  while (I != MBB.begin()) {
+    --I;
+    if (I->isDebugValue())
+      continue;
+
+    // Working from the bottom, when we see a non-terminator
+    // instruction, we're done.
+    if (!isUnpredicatedTerminator(I))
+      break;
+
+    // A terminator that isn't a branch can't easily be handled
+    // by this analysis.
+    if (!I->isBranch())
+      return true;
+
+    // Handle unconditional branches.
+    if (I->getOpcode() == BPF::JMP) {
+      if (!AllowModify) {
+        TBB = I->getOperand(0).getMBB();
+        continue;
+      }
+
+      // If the block has any instructions after a J, delete them.
+      while (llvm::next(I) != MBB.end())
+        llvm::next(I)->eraseFromParent();
+      Cond.clear();
+      FBB = 0;
+
+      // Delete the J if it's equivalent to a fall-through.
+      if (MBB.isLayoutSuccessor(I->getOperand(0).getMBB())) {
+        TBB = 0;
+        I->eraseFromParent();
+        I = MBB.end();
+        continue;
+      }
+
+      // TBB is used to indicate the unconditional destination.
+      TBB = I->getOperand(0).getMBB();
+      continue;
+    }
+    // Cannot handle conditional branches
+    return true;
+  }
+
+  return false;
+}
+
+unsigned
+BPFInstrInfo::InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
+                            MachineBasicBlock *FBB,
+                            const SmallVectorImpl<MachineOperand> &Cond,
+                            DebugLoc DL) const {
+  // Shouldn't be a fall through.
+  assert(TBB && "InsertBranch must not be told to insert a fallthrough");
+
+  if (Cond.empty()) {
+    // Unconditional branch
+    assert(!FBB && "Unconditional branch with multiple successors!");
+    BuildMI(&MBB, DL, get(BPF::JMP)).addMBB(TBB);
+    return 1;
+  }
+
+  llvm_unreachable("Unexpected conditional branch");
+  return 0;
+}
+
+unsigned BPFInstrInfo::RemoveBranch(MachineBasicBlock &MBB) const {
+  MachineBasicBlock::iterator I = MBB.end();
+  unsigned Count = 0;
+
+  while (I != MBB.begin()) {
+    --I;
+    if (I->isDebugValue())
+      continue;
+    if (I->getOpcode() != BPF::JMP)
+      break;
+    // Remove the branch.
+    I->eraseFromParent();
+    I = MBB.end();
+    ++Count;
+  }
+
+  return Count;
+}
+
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.h b/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.h
new file mode 100644
index 000000000000..911387d0a06a
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.h
@@ -0,0 +1,53 @@
+//===- BPFInstrInfo.h - BPF Instruction Information ---------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#ifndef BPFINSTRUCTIONINFO_H
+#define BPFINSTRUCTIONINFO_H
+
+#include "BPFRegisterInfo.h"
+#include "BPFSubtarget.h"
+#include "llvm/Target/TargetInstrInfo.h"
+
+#define GET_INSTRINFO_HEADER
+#include "BPFGenInstrInfo.inc"
+
+namespace llvm {
+
+class BPFInstrInfo : public BPFGenInstrInfo {
+  const BPFRegisterInfo RI;
+public:
+  BPFInstrInfo();
+
+  virtual const BPFRegisterInfo &getRegisterInfo() const { return RI; }
+
+  virtual void copyPhysReg(MachineBasicBlock &MBB,
+                           MachineBasicBlock::iterator I, DebugLoc DL,
+                           unsigned DestReg, unsigned SrcReg,
+                           bool KillSrc) const;
+
+  virtual void storeRegToStackSlot(MachineBasicBlock &MBB,
+                                   MachineBasicBlock::iterator MBBI,
+                                   unsigned SrcReg, bool isKill, int FrameIndex,
+                                   const TargetRegisterClass *RC,
+                                   const TargetRegisterInfo *TRI) const;
+
+  virtual void loadRegFromStackSlot(MachineBasicBlock &MBB,
+                                    MachineBasicBlock::iterator MBBI,
+                                    unsigned DestReg, int FrameIndex,
+                                    const TargetRegisterClass *RC,
+                                    const TargetRegisterInfo *TRI) const;
+  bool AnalyzeBranch(MachineBasicBlock &MBB,
+                     MachineBasicBlock *&TBB, MachineBasicBlock *&FBB,
+                     SmallVectorImpl<MachineOperand> &Cond,
+                     bool AllowModify) const;
+
+  unsigned RemoveBranch(MachineBasicBlock &MBB) const;
+  unsigned InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
+                        MachineBasicBlock *FBB,
+                        const SmallVectorImpl<MachineOperand> &Cond,
+                        DebugLoc DL) const;
+};
+}
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.td b/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.td
new file mode 100644
index 000000000000..cce46b187592
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -0,0 +1,498 @@
+//===-- BPFInstrInfo.td - Target Description for BPF Target -------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file describes the BPF instructions in TableGen format.
+
+include "BPFInstrFormats.td"
+
+// Instruction Operands and Patterns
+
+//  These are target-independent nodes, but have target-specific formats.
+def SDT_BPFCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, iPTR> ]>;
+def SDT_BPFCallSeqEnd   : SDCallSeqEnd<[ SDTCisVT<0, iPTR>,
+                                          SDTCisVT<1, iPTR> ]>;
+def SDT_BPFCall         : SDTypeProfile<0, -1, [SDTCisVT<0, iPTR>]>;
+def SDT_BPFSetFlag      : SDTypeProfile<0, 3, [SDTCisSameAs<0, 1>]>;
+def SDT_BPFSelectCC     : SDTypeProfile<1, 5, [SDTCisSameAs<1, 2>, SDTCisSameAs<0, 4>,
+                                                SDTCisSameAs<4, 5>]>;
+def SDT_BPFBrCC         : SDTypeProfile<0, 4, [SDTCisSameAs<0, 1>, SDTCisVT<3, OtherVT>]>;
+
+def SDT_BPFWrapper      : SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>,
+                                                SDTCisPtrTy<0>]>;
+//def SDT_BPFAdjDynAlloc  : SDTypeProfile<1, 1, [SDTCisVT<0, i64>,
+//                                                SDTCisVT<1, i64>]>;
+
+def call            : SDNode<"BPFISD::CALL", SDT_BPFCall,
+                             [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue,
+                              SDNPVariadic]>;
+def retflag         : SDNode<"BPFISD::RET_FLAG", SDTNone,
+                             [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
+def callseq_start   : SDNode<"ISD::CALLSEQ_START", SDT_BPFCallSeqStart,
+                             [SDNPHasChain, SDNPOutGlue]>;
+def callseq_end     : SDNode<"ISD::CALLSEQ_END",   SDT_BPFCallSeqEnd,
+                             [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+def BPFbrcc        : SDNode<"BPFISD::BR_CC", SDT_BPFBrCC,
+                              [SDNPHasChain, SDNPOutGlue, SDNPInGlue]>;
+
+def BPFselectcc    : SDNode<"BPFISD::SELECT_CC", SDT_BPFSelectCC, [SDNPInGlue]>;
+def BPFWrapper     : SDNode<"BPFISD::Wrapper", SDT_BPFWrapper>;
+
+//def BPFadjdynalloc : SDNode<"BPFISD::ADJDYNALLOC", SDT_BPFAdjDynAlloc>;
+
+// helper macros to produce 64-bit constant
+// 0x11223344 55667788 ->
+// reg = 0x11223344
+// reg <<= 32
+// reg += 0x55667788
+//
+// 0x11223344 FF667788 ->
+// reg = 0x11223345
+// reg <<= 32
+// reg += (long long)(int)0xFF667788
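+//
+// i.e. LO32 sign-extends the low 32 bits and HI32 is the matching
+// correction, so that (HI32 << 32) + sext(LO32) == original constant.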
+def LO32 : SDNodeXForm<imm, [{
+  return CurDAG->getTargetConstant((int64_t)(int32_t)(uint64_t)N->getZExtValue(),
+                                   MVT::i64);
+}]>;
+def HI32 : SDNodeXForm<imm, [{
+  return CurDAG->getTargetConstant(((int64_t)N->getZExtValue() -
+         (int64_t)(int32_t)(uint64_t)N->getZExtValue()) >> 32, MVT::i64);
+}]>;
+
+
+def brtarget : Operand<OtherVT>;
+def calltarget : Operand<i64>;
+
+def s32imm   : Operand<i64> {
+}
+
+def u64imm   : Operand<i64> {
+  let PrintMethod = "printImm64Operand";
+}
+
+def immSExt32 : PatLeaf<(imm),
+                [{return isInt<32>(N->getSExtValue()); }]>;
+
+// Addressing modes.
+def ADDRri : ComplexPattern<i64, 2, "SelectAddr", [frameindex], []>;
+
+// Address operands
+def MEMri : Operand<i64> {
+  let PrintMethod = "printMemOperand";
+  let EncoderMethod = "getMemoryOpValue";
+  let DecoderMethod = "todo_decode_memri";
+  let MIOperandInfo = (ops GPR, i16imm);
+}
+
+// Conditional code predicates - used for pattern matching for SF instructions
+def BPF_CC_EQ  : PatLeaf<(imm),
+                  [{return (N->getZExtValue() == ISD::SETEQ);}]>;
+def BPF_CC_NE  : PatLeaf<(imm),
+                  [{return (N->getZExtValue() == ISD::SETNE);}]>;
+def BPF_CC_GE  : PatLeaf<(imm),
+                  [{return (N->getZExtValue() == ISD::SETGE);}]>;
+def BPF_CC_GT  : PatLeaf<(imm),
+                  [{return (N->getZExtValue() == ISD::SETGT);}]>;
+def BPF_CC_GTU : PatLeaf<(imm),
+                  [{return (N->getZExtValue() == ISD::SETUGT);}]>;
+def BPF_CC_GEU : PatLeaf<(imm),
+                  [{return (N->getZExtValue() == ISD::SETUGE);}]>;
+
+// jump instructions
+class JMP_RR<bits<4> br_op, string asmstr, PatLeaf Cond>
+  : InstBPF<(outs), (ins GPR:$rA, GPR:$rX, brtarget:$dst),
+           !strconcat(asmstr, "\t$rA, $rX goto $dst"),
+           [(BPFbrcc (i64 GPR:$rA), (i64 GPR:$rX), Cond, bb:$dst)]> {
+  bits<4> op;
+  bits<1> src;
+  bits<4> rA;
+  bits<4> rX;
+  bits<16> dst;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{55-52} = rX;
+  let Inst{51-48} = rA;
+  let Inst{47-32} = dst;
+
+  let op = br_op;
+  let src = 1;
+  let bpfClass = 5; // BPF_JMP
+}
+
+class JMP_RI<bits<4> br_op, string asmstr, PatLeaf Cond>
+  : InstBPF<(outs), (ins GPR:$rA, s32imm:$imm, brtarget:$dst),
+           !strconcat(asmstr, "i\t$rA, $imm goto $dst"),
+           [(BPFbrcc (i64 GPR:$rA), immSExt32:$imm, Cond, bb:$dst)]> {
+  bits<4> op;
+  bits<1> src;
+  bits<4> rA;
+  bits<16> dst;
+  bits<32> imm;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{51-48} = rA;
+  let Inst{47-32} = dst;
+  let Inst{31-0} = imm;
+
+  let op = br_op;
+  let src = 0;
+  let bpfClass = 5; // BPF_JMP
+}
+
+multiclass J<bits<4> op2Val, string asmstr, PatLeaf Cond> {
+  def _rr : JMP_RR<op2Val, asmstr, Cond>;
+  def _ri : JMP_RI<op2Val, asmstr, Cond>;
+}
+
+let isBranch = 1, isTerminator = 1, hasDelaySlot=0 in {
+// cmp+goto instructions
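+// Note: there are deliberately no less-than forms here; <, <= conditions
+// are canonicalized to >, >= with swapped operands by NegateCC in
+// BPFISelLowering.cpp.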
+defm JEQ  : J<0x1, "jeq",  BPF_CC_EQ>;
+defm JUGT : J<0x2, "jgt", BPF_CC_GTU>;
+defm JUGE : J<0x3, "jge", BPF_CC_GEU>;
+defm JNE  : J<0x5, "jne",  BPF_CC_NE>;
+defm JSGT : J<0x6, "jsgt", BPF_CC_GT>;
+defm JSGE : J<0x7, "jsge", BPF_CC_GE>;
+}
+
+// ALU instructions
+class ALU_RI<bits<4> aluOp, string asmstr, SDNode OpNode>
+  : InstBPF<(outs GPR:$rA), (ins GPR:$rS, s32imm:$imm),
+            !strconcat(asmstr, "i\t$rA, $imm"),
+            [(set GPR:$rA, (OpNode GPR:$rS, immSExt32:$imm))]> {
+  bits<4> op;
+  bits<1> src;
+  bits<4> rA;
+  bits<32> imm;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{51-48} = rA;
+  let Inst{31-0} = imm;
+
+  let op = aluOp;
+  let src = 0;
+  let bpfClass = 7; // BPF_ALU64
+}
+
+class ALU_RR<bits<4> aluOp, string asmstr, SDNode OpNode>
+  : InstBPF<(outs GPR:$rA), (ins GPR:$rS, GPR:$rX),
+            !strconcat(asmstr, "\t$rA, $rX"),
+            [(set GPR:$rA, (OpNode (i64 GPR:$rS), (i64 GPR:$rX)))]> {
+  bits<4> op;
+  bits<1> src;
+  bits<4> rA;
+  bits<4> rX;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{55-52} = rX;
+  let Inst{51-48} = rA;
+
+  let op = aluOp;
+  let src = 1;
+  let bpfClass = 7; // BPF_ALU64
+}
+
+multiclass ALU<bits<4> opVal, string asmstr, SDNode OpNode> {
+  def _rr : ALU_RR<opVal, asmstr, OpNode>;
+  def _ri : ALU_RI<opVal, asmstr, OpNode>;
+}
+
+let Constraints = "$rA = $rS" in {
+let isAsCheapAsAMove = 1 in {
+  defm ADD : ALU<0x0, "add", add>;
+  defm SUB : ALU<0x1, "sub", sub>;
+  defm OR  : ALU<0x4, "or", or>;
+  defm AND : ALU<0x5, "and", and>;
+  defm SLL : ALU<0x6, "sll", shl>;
+  defm SRL : ALU<0x7, "srl", srl>;
+  defm XOR : ALU<0xa, "xor", xor>;
+  defm SRA : ALU<0xc, "sra", sra>;
+}
+  defm MUL : ALU<0x2, "mul", mul>;
+  defm DIV : ALU<0x3, "div", udiv>;
+}
+
+class MOV_RR<string asmstr>
+  : InstBPF<(outs GPR:$rA), (ins GPR:$rX),
+            !strconcat(asmstr, "\t$rA, $rX"),
+            []> {
+  bits<4> op;
+  bits<1> src;
+  bits<4> rA;
+  bits<4> rX;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{55-52} = rX;
+  let Inst{51-48} = rA;
+
+  let op = 0xb;     // BPF_MOV
+  let src = 1;      // BPF_X
+  let bpfClass = 7; // BPF_ALU64
+}
+
+class MOV_RI<string asmstr>
+  : InstBPF<(outs GPR:$rA), (ins s32imm:$imm),
+            !strconcat(asmstr, "\t$rA, $imm"),
+            [(set GPR:$rA, (i64 immSExt32:$imm))]> {
+  bits<4> op;
+  bits<1> src;
+  bits<4> rA;
+  bits<32> imm;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{51-48} = rA;
+  let Inst{31-0} = imm;
+
+  let op = 0xb;     // BPF_MOV
+  let src = 0;      // BPF_K
+  let bpfClass = 7; // BPF_ALU64
+}
+def MOV_rr : MOV_RR<"mov">;
+def MOV_ri : MOV_RI<"mov">;
+
+class LD_IMM64<bits<4> pseudo, string asmstr>
+  : InstBPF<(outs GPR:$rA), (ins u64imm:$imm),
+            !strconcat(asmstr, "\t$rA, $imm"),
+            [(set GPR:$rA, (i64 imm:$imm))]> {
+
+  bits<3> mode;
+  bits<2> size;
+  bits<4> rA;
+  bits<64> imm;
+
+  let Inst{63-61} = mode;
+  let Inst{60-59} = size;
+  let Inst{51-48} = rA;
+  let Inst{55-52} = pseudo;
+  let Inst{47-32} = 0;
+  let Inst{31-0} = imm{31-0};
+
+  let mode = 0;     // BPF_IMM
+  let size = 3;     // BPF_DW
+  let bpfClass = 0; // BPF_LD
+}
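+// Only imm{31-0} is encoded in this 8-byte slot; ld_64 is a double-size
+// instruction, and the high 32 bits presumably land in the immediate
+// field of the trailing slot when the instruction is emitted.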
+def LD_imm64 : LD_IMM64<0, "ld_64">;
+
+// STORE instructions
+class STORE<bits<2> sizeOp, string asmstring, list<dag> pattern>
+  : InstBPF<(outs), (ins GPR:$rX, MEMri:$addr),
+          !strconcat(asmstring, "\t$addr, $rX"), pattern> {
+  bits<3> mode;
+  bits<2> size;
+  bits<4> rX;
+  bits<20> addr;
+
+  let Inst{63-61} = mode;
+  let Inst{60-59} = size;
+  let Inst{51-48} = addr{19-16}; // base reg
+  let Inst{55-52} = rX;
+  let Inst{47-32} = addr{15-0}; // offset
+
+  let mode = 3;     // BPF_MEM
+  let size = sizeOp;
+  let bpfClass = 3; // BPF_STX
+}
+
+class STOREi64<bits<2> subOp, string asmstring, PatFrag opNode>
+  : STORE<subOp, asmstring, [(opNode (i64 GPR:$rX), ADDRri:$addr)]>;
+
+def STW : STOREi64<0x0, "stw", truncstorei32>;
+def STH : STOREi64<0x1, "sth", truncstorei16>;
+def STB : STOREi64<0x2, "stb", truncstorei8>;
+def STD : STOREi64<0x3, "std", store>;
+
+// LOAD instructions
+class LOAD<bits<2> sizeOp, string asmstring, list<dag> pattern>
+  : InstBPF<(outs GPR:$rA), (ins MEMri:$addr),
+           !strconcat(asmstring, "\t$rA, $addr"), pattern> {
+  bits<3> mode;
+  bits<2> size;
+  bits<4> rA;
+  bits<20> addr;
+
+  let Inst{63-61} = mode;
+  let Inst{60-59} = size;
+  let Inst{51-48} = rA;
+  let Inst{55-52} = addr{19-16};
+  let Inst{47-32} = addr{15-0};
+
+  let mode = 3;     // BPF_MEM
+  let size = sizeOp;
+  let bpfClass = 1; // BPF_LDX
+}
+
+class LOADi64<bits<2> sizeOp, string asmstring, PatFrag opNode>
+  : LOAD<sizeOp, asmstring, [(set (i64 GPR:$rA), (opNode ADDRri:$addr))]>;
+
+def LDW : LOADi64<0x0, "ldw", zextloadi32>;
+def LDH : LOADi64<0x1, "ldh", zextloadi16>;
+def LDB : LOADi64<0x2, "ldb", zextloadi8>;
+def LDD : LOADi64<0x3, "ldd", load>;
+
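+// Narrow loads zero-extend into the 64-bit register (they match the
+// zextloadi* fragments above); sign-extending variants are left
+// disabled below.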
+//def LDBS : LOADi64<0x2, "ldbs", sextloadi8>;
+//def LDHS : LOADi64<0x1, "ldhs", sextloadi16>;
+//def LDWS : LOADi64<0x0, "ldws", sextloadi32>;
+
+class BRANCH<bits<4> subOp, string asmstring, list<dag> pattern>
+  : InstBPF<(outs), (ins brtarget:$dst),
+           !strconcat(asmstring, "\t$dst"), pattern> {
+  bits<4> op;
+  bits<16> dst;
+  bits<1> src;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{47-32} = dst;
+
+  let op = subOp;
+  let src = 0;
+  let bpfClass = 5; // BPF_JMP
+}
+
+class CALL<string asmstring>
+  : InstBPF<(outs), (ins calltarget:$dst),
+           !strconcat(asmstring, "\t$dst"), []> {
+  bits<4> op;
+  bits<32> dst;
+  bits<1> src;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{31-0} = dst;
+
+  let op = 8;       // BPF_CALL
+  let src = 0;
+  let bpfClass = 5; // BPF_JMP
+}
+
+// Jump always
+let isBranch = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1 in {
+  def JMP : BRANCH<0x0, "jmp", [(br bb:$dst)]>;
+}
+
+// Jump and link
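+// R0-R5 appear as Defs because, by the eBPF convention, R1-R5 carry
+// arguments and are clobbered across calls while R0 carries the return
+// value; R11 (the stack pointer, per BPFRegisterInfo.td) is marked used.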
+let isCall=1, hasDelaySlot=0,
+    Uses = [R11],
+    // Potentially clobbered registers
+    Defs = [R0, R1, R2, R3, R4, R5] in {
+  def JAL  : CALL<"call">;
+}
+
+class NOP_I<string asmstr>
+  : InstBPF<(outs), (ins i32imm:$imm),
+           !strconcat(asmstr, "\t$imm"), []> {
+  // mov r0, r0 == nop
+  bits<4> op;
+  bits<1> src;
+  bits<4> rA;
+  bits<4> rX;
+
+  let Inst{63-60} = op;
+  let Inst{59} = src;
+  let Inst{55-52} = rX;
+  let Inst{51-48} = rA;
+
+  let op = 0xb;     // BPF_MOV
+  let src = 1;      // BPF_X
+  let bpfClass = 7; // BPF_ALU64
+  let rX = 0;       // R0
+  let rA = 0;       // R0
+}
+
+let neverHasSideEffects = 1 in
+  def NOP : NOP_I<"nop">;
+
+class RET<string asmstring>
+  : InstBPF<(outs), (ins),
+           !strconcat(asmstring, ""), [(retflag)]> {
+  bits<4> op;
+
+  let Inst{63-60} = op;
+  let Inst{59} = 0;
+  let Inst{31-0} = 0;
+
+  let op = 9;       // BPF_EXIT
+  let bpfClass = 5; // BPF_JMP
+}
+
+let isReturn = 1, isTerminator = 1, hasDelaySlot=0, isBarrier = 1, isNotDuplicable = 1 in {
+  def RET : RET<"ret">;
+}
+
+// ADJCALLSTACKDOWN/UP pseudo insns
+let Defs = [R11], Uses = [R11] in {
+def ADJCALLSTACKDOWN : Pseudo<(outs), (ins i64imm:$amt),
+                              "#ADJCALLSTACKDOWN $amt",
+                              [(callseq_start timm:$amt)]>;
+def ADJCALLSTACKUP   : Pseudo<(outs), (ins i64imm:$amt1, i64imm:$amt2),
+                              "#ADJCALLSTACKUP $amt1 $amt2",
+                              [(callseq_end timm:$amt1, timm:$amt2)]>;
+}
+
+
+//let Defs = [R11], Uses = [R11] in {
+//  def ADJDYNALLOC : Pseudo<(outs GPR:$dst), (ins GPR:$src),
+//                    "#ADJDYNALLOC $dst $src",
+//                    [(set GPR:$dst, (BPFadjdynalloc GPR:$src))]>;
+//}
+
+
+let usesCustomInserter = 1 in {
+  def Select : Pseudo<(outs GPR:$dst), (ins GPR:$lhs, GPR:$rhs, s32imm:$imm, GPR:$src, GPR:$src2),
+                       "# Select PSEUDO $dst = $lhs $imm $rhs ? $src : $src2",
+                       [(set (i64 GPR:$dst),
+                        (BPFselectcc (i64 GPR:$lhs), (i64 GPR:$rhs), (i64 imm:$imm), (i64 GPR:$src), (i64 GPR:$src2)))]>;
+}
+
+// load 64-bit global addr into register
+def : Pat<(BPFWrapper tglobaladdr:$in), (LD_imm64 tglobaladdr:$in)>;
+
+// arbitrary immediate
+//def : Pat<(i64 imm:$imm), (ADD_ri (SLL_ri (MOV_ri (HI32 imm:$imm)), 32), (LO32 imm:$imm))>;
+
+// 0xffffFFFF doesn't fit into simm32, optimize common case
+def : Pat<(i64 (and (i64 GPR:$src), 0xffffFFFF)), (SRL_ri (SLL_ri (i64 GPR:$src), 32), 32)>;
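+// (0xffffFFFF is not a valid immSExt32, so AND_ri cannot match here;
+// the shift pair clears the upper 32 bits instead.)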
+
+// Calls
+def : Pat<(call tglobaladdr:$dst), (JAL tglobaladdr:$dst)>;
+//def : Pat<(call texternalsym:$dst), (JAL texternalsym:$dst)>;
+//def : Pat<(call (i32 imm:$dst)), (JAL (i32 imm:$dst))>;
+def : Pat<(call imm:$dst), (JAL imm:$dst)>;
+
+// Loads
+def : Pat<(extloadi8  ADDRri:$src), (i64 (LDB ADDRri:$src))>;
+def : Pat<(extloadi16 ADDRri:$src), (i64 (LDH ADDRri:$src))>;
+def : Pat<(extloadi32 ADDRri:$src), (i64 (LDW ADDRri:$src))>;
+
+// Atomics
+class XADD<bits<2> sizeOp, string asmstr, PatFrag opNode>
+  : InstBPF<(outs GPR:$dst), (ins MEMri:$addr, GPR:$val),
+            !strconcat(asmstr, "\t$dst, $addr, $val"),
+            [(set GPR:$dst, (opNode ADDRri:$addr, GPR:$val))]> {
+  bits<3> mode;
+  bits<2> size;
+  bits<4> rX;
+  bits<20> addr;
+
+  let Inst{63-61} = mode;
+  let Inst{60-59} = size;
+  let Inst{51-48} = addr{19-16}; // base reg
+  let Inst{55-52} = rX;
+  let Inst{47-32} = addr{15-0}; // offset
+
+  let mode = 6;     // BPF_XADD
+  let size = sizeOp;
+  let bpfClass = 3; // BPF_STX
+}
+
+let Constraints = "$dst = $val" in {
+def XADD32 : XADD<0, "xadd32", atomic_load_add_32>;
+def XADD16 : XADD<1, "xadd16", atomic_load_add_16>;
+def XADD8  : XADD<2, "xadd8", atomic_load_add_8>;
+def XADD64 : XADD<3, "xadd64", atomic_load_add_64>;
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.cpp
new file mode 100644
index 000000000000..5c15ed76b27b
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.cpp
@@ -0,0 +1,77 @@
+//=-- BPFMCInstLower.cpp - Convert BPF MachineInstr to an MCInst ----------=//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file contains code to lower BPF MachineInstrs to their corresponding
+// MCInst records.
+
+#include "BPFMCInstLower.h"
+#include "MCTargetDesc/BPFBaseInfo.h"
+#include "llvm/CodeGen/AsmPrinter.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/MC/MCAsmInfo.h"
+#include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCExpr.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/Target/Mangler.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/ADT/SmallString.h"
+using namespace llvm;
+
+MCSymbol *BPFMCInstLower::
+GetGlobalAddressSymbol(const MachineOperand &MO) const {
+#if LLVM_VERSION_MINOR==4
+  return Printer.getSymbol(MO.getGlobal());
+#else
+  return Printer.Mang->getSymbol(MO.getGlobal());
+#endif
+}
+
+MCOperand BPFMCInstLower::
+LowerSymbolOperand(const MachineOperand &MO, MCSymbol *Sym) const {
+
+  const MCExpr *Expr = MCSymbolRefExpr::Create(Sym, Ctx);
+
+  if (!MO.isJTI() && MO.getOffset())
+    llvm_unreachable("unknown symbol op");
+//    Expr = MCBinaryExpr::CreateAdd(Expr,
+//                                   MCConstantExpr::Create(MO.getOffset(), Ctx),
+//                                   Ctx);
+  return MCOperand::CreateExpr(Expr);
+}
+
+void BPFMCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
+  OutMI.setOpcode(MI->getOpcode());
+
+  for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
+    const MachineOperand &MO = MI->getOperand(i);
+
+    MCOperand MCOp;
+    switch (MO.getType()) {
+    default:
+      MI->dump();
+      llvm_unreachable("unknown operand type");
+    case MachineOperand::MO_Register:
+      // Ignore all implicit register operands.
+      if (MO.isImplicit()) continue;
+      MCOp = MCOperand::CreateReg(MO.getReg());
+      break;
+    case MachineOperand::MO_Immediate:
+      MCOp = MCOperand::CreateImm(MO.getImm());
+      break;
+    case MachineOperand::MO_MachineBasicBlock:
+      MCOp = MCOperand::CreateExpr(MCSymbolRefExpr::Create(
+                                   MO.getMBB()->getSymbol(), Ctx));
+      break;
+    case MachineOperand::MO_RegisterMask:
+      continue;
+    case MachineOperand::MO_GlobalAddress:
+      MCOp = LowerSymbolOperand(MO, GetGlobalAddressSymbol(MO));
+      break;
+    }
+
+    OutMI.addOperand(MCOp);
+  }
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.h b/tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.h
new file mode 100644
index 000000000000..aaff0c3e8a68
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFMCInstLower.h
@@ -0,0 +1,40 @@
+//===-- BPFMCInstLower.h - Lower MachineInstr to MCInst --------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#ifndef BPF_MCINSTLOWER_H
+#define BPF_MCINSTLOWER_H
+
+#include "llvm/Support/Compiler.h"
+
+namespace llvm {
+  class AsmPrinter;
+  class MCContext;
+  class MCInst;
+  class MCOperand;
+  class MCSymbol;
+  class MachineInstr;
+  class MachineModuleInfoMachO;
+  class MachineOperand;
+  class Mangler;
+
+  /// BPFMCInstLower - This class is used to lower a MachineInstr
+  /// into an MCInst.
+class LLVM_LIBRARY_VISIBILITY BPFMCInstLower {
+  MCContext &Ctx;
+  Mangler &Mang;
+
+  AsmPrinter &Printer;
+public:
+  BPFMCInstLower(MCContext &ctx, Mangler &mang, AsmPrinter &printer)
+    : Ctx(ctx), Mang(mang), Printer(printer) {}
+  void Lower(const MachineInstr *MI, MCInst &OutMI) const;
+
+  MCOperand LowerSymbolOperand(const MachineOperand &MO, MCSymbol *Sym) const;
+
+  MCSymbol *GetGlobalAddressSymbol(const MachineOperand &MO) const;
+};
+
+}
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
new file mode 100644
index 000000000000..7d4604153d50
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.cpp
@@ -0,0 +1,122 @@
+//===-- BPFRegisterInfo.cpp - BPF Register Information --------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file contains the BPF implementation of the TargetRegisterInfo class.
+
+#include "BPF.h"
+#include "BPFRegisterInfo.h"
+#include "BPFSubtarget.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/RegisterScavenging.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Target/TargetFrameLowering.h"
+#include "llvm/Target/TargetInstrInfo.h"
+
+#define GET_REGINFO_TARGET_DESC
+#include "BPFGenRegisterInfo.inc"
+using namespace llvm;
+
+BPFRegisterInfo::BPFRegisterInfo(const TargetInstrInfo &tii)
+  : BPFGenRegisterInfo(BPF::R0), TII(tii) {
+}
+
+const uint16_t*
+BPFRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
+  return CSR_SaveList;
+}
+
+BitVector BPFRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
+  BitVector Reserved(getNumRegs());
+  Reserved.set(BPF::R10);
+  Reserved.set(BPF::R11);
+  return Reserved;
+}
+
+bool
+BPFRegisterInfo::requiresRegisterScavenging(const MachineFunction &MF) const {
+  return true;
+}
+
+void
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+BPFRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
+                           int SPAdj, unsigned FIOperandNum,
+                           RegScavenger *RS) const {
+#else
+BPFRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
+                                       int SPAdj, RegScavenger *RS) const {
+#endif
+  assert(SPAdj == 0 && "Unexpected");
+
+  unsigned i = 0;
+  MachineInstr &MI = *II;
+  MachineFunction &MF = *MI.getParent()->getParent();
+  DebugLoc dl = MI.getDebugLoc();
+
+  while (!MI.getOperand(i).isFI()) {
+    ++i;
+    assert(i < MI.getNumOperands() && "Instr doesn't have FrameIndex operand!");
+  }
+
+  unsigned FrameReg = getFrameRegister(MF);
+  int FrameIndex = MI.getOperand(i).getIndex();
+
+  if (MI.getOpcode() == BPF::MOV_rr) {
+    int Offset = MF.getFrameInfo()->getObjectOffset(FrameIndex);
+
+    MI.getOperand(i).ChangeToRegister(FrameReg, false);
+
+    MachineBasicBlock &MBB = *MI.getParent();
+    unsigned reg = MI.getOperand(i - 1).getReg();
+    BuildMI(MBB, ++ II, dl, TII.get(BPF::ADD_ri), reg)
+       .addReg(reg).addImm(Offset);
+    return;
+  }
+
+  int Offset = MF.getFrameInfo()->getObjectOffset(FrameIndex) +
+               MI.getOperand(i+1).getImm();
+
+  if (!isInt<32>(Offset)) {
+    llvm_unreachable("bug in frame offset");
+  }
+
+  MI.getOperand(i).ChangeToRegister(FrameReg, false);
+  MI.getOperand(i+1).ChangeToImmediate(Offset);
+}
+
+void BPFRegisterInfo::
+processFunctionBeforeFrameFinalized(MachineFunction &MF) const {}
+
+bool BPFRegisterInfo::hasBasePointer(const MachineFunction &MF) const {
+   return false;
+}
+
+bool BPFRegisterInfo::needsStackRealignment(const MachineFunction &MF) const {
+  return false;
+}
+
+unsigned BPFRegisterInfo::getRARegister() const {
+  return BPF::R0;
+}
+
+unsigned BPFRegisterInfo::getFrameRegister(const MachineFunction &MF) const {
+  return BPF::R10;
+}
+
+unsigned BPFRegisterInfo::getBaseRegister() const {
+  llvm_unreachable("What is the base register");
+  return 0;
+}
+
+unsigned BPFRegisterInfo::getEHExceptionRegister() const {
+  llvm_unreachable("What is the exception register");
+  return 0;
+}
+
+unsigned BPFRegisterInfo::getEHHandlerRegister() const {
+  llvm_unreachable("What is the exception handler register");
+  return 0;
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.h b/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.h
new file mode 100644
index 000000000000..8aeb341fbb30
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.h
@@ -0,0 +1,65 @@
+//===- BPFRegisterInfo.h - BPF Register Information Impl ------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file contains the BPF implementation of the TargetRegisterInfo class.
+
+#ifndef BPFREGISTERINFO_H
+#define BPFREGISTERINFO_H
+
+#include "llvm/Target/TargetRegisterInfo.h"
+
+#define GET_REGINFO_HEADER
+#include "BPFGenRegisterInfo.inc"
+
+namespace llvm {
+
+class TargetInstrInfo;
+class Type;
+
+struct BPFRegisterInfo : public BPFGenRegisterInfo {
+  const TargetInstrInfo &TII;
+
+  BPFRegisterInfo(const TargetInstrInfo &tii);
+
+  /// Code Generation virtual methods...
+  const uint16_t *getCalleeSavedRegs(const MachineFunction *MF = 0) const;
+
+  BitVector getReservedRegs(const MachineFunction &MF) const;
+
+  bool requiresRegisterScavenging(const MachineFunction &MF) const;
+
+  // llvm 3.2 defines it here
+  void eliminateCallFramePseudoInstr(MachineFunction &MF,
+                                     MachineBasicBlock &MBB,
+                                     MachineBasicBlock::iterator I) const {
+    // Discard ADJCALLSTACKDOWN, ADJCALLSTACKUP instructions.
+    MBB.erase(I);
+  }
+
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+  void eliminateFrameIndex(MachineBasicBlock::iterator MI,
+                           int SPAdj, unsigned FIOperandNum,
+                           RegScavenger *RS = NULL) const;
+#else
+  void eliminateFrameIndex(MachineBasicBlock::iterator II,
+                           int SPAdj, RegScavenger *RS = NULL) const;
+#endif
+
+  void processFunctionBeforeFrameFinalized(MachineFunction &MF) const;
+
+  bool hasBasePointer(const MachineFunction &MF) const;
+  bool needsStackRealignment(const MachineFunction &MF) const;
+
+  // Debug information queries.
+  unsigned getRARegister() const;
+  unsigned getFrameRegister(const MachineFunction &MF) const;
+  unsigned getBaseRegister() const;
+
+  // Exception handling queries.
+  unsigned getEHExceptionRegister() const;
+  unsigned getEHHandlerRegister() const;
+  int getDwarfRegNum(unsigned RegNum, bool isEH) const;
+};
+}
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.td b/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.td
new file mode 100644
index 000000000000..fac081788be1
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFRegisterInfo.td
@@ -0,0 +1,39 @@
+//===- BPFRegisterInfo.td - BPF Register defs ------------*- tablegen -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//  Declarations that describe the BPF register file
+
+class BPFReg<string n> : Register<n> {
+  field bits<4> Num;
+  let Namespace = "BPF";
+}
+
+// Registers are identified with 4-bit ID numbers.
+// Ri - 64-bit integer registers
+class Ri<bits<4> num, string n> : BPFReg<n> {
+  let Num = num;
+}
+
+// Integer registers
+def R0 : Ri< 0, "r0">, DwarfRegNum<[0]>;
+def R1 : Ri< 1, "r1">, DwarfRegNum<[1]>;
+def R2 : Ri< 2, "r2">, DwarfRegNum<[2]>;
+def R3 : Ri< 3, "r3">, DwarfRegNum<[3]>;
+def R4 : Ri< 4, "r4">, DwarfRegNum<[4]>;
+def R5 : Ri< 5, "r5">, DwarfRegNum<[5]>;
+def R6 : Ri< 6, "r6">, DwarfRegNum<[6]>;
+def R7 : Ri< 7, "r7">, DwarfRegNum<[7]>;
+def R8 : Ri< 8, "r8">, DwarfRegNum<[8]>;
+def R9 : Ri< 9, "r9">, DwarfRegNum<[9]>;
+def R10 : Ri<10, "r10">, DwarfRegNum<[10]>;
+def R11 : Ri<11, "r11">, DwarfRegNum<[11]>;
+
+// Register classes.
+def GPR : RegisterClass<"BPF", [i64], 64, (add R1, R2, R3, R4, R5,
+                                           R6, R7, R8, R9, // callee saved
+                                           R0, // return value
+                                           R11,  // stack ptr
+                                           R10  // frame ptr
+                                          )>;
+
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.cpp
new file mode 100644
index 000000000000..6e98f6d1a514
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -0,0 +1,23 @@
+//===- BPFSubtarget.cpp - BPF Subtarget Information -----------*- C++ -*-=//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#include "BPF.h"
+#include "BPFSubtarget.h"
+#define GET_SUBTARGETINFO_TARGET_DESC
+#define GET_SUBTARGETINFO_CTOR
+#include "BPFGenSubtargetInfo.inc"
+using namespace llvm;
+
+void BPFSubtarget::anchor() { }
+
+BPFSubtarget::BPFSubtarget(const std::string &TT,
+                           const std::string &CPU, const std::string &FS)
+  : BPFGenSubtargetInfo(TT, CPU, FS)
+{
+  std::string CPUName = CPU;
+  if (CPUName.empty())
+    CPUName = "generic";
+
+  ParseSubtargetFeatures(CPUName, FS);
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.h b/tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.h
new file mode 100644
index 000000000000..cd5d8759c20f
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFSubtarget.h
@@ -0,0 +1,33 @@
+//=====-- BPFSubtarget.h - Define Subtarget for the BPF -----*- C++ -*--==//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#ifndef BPFSUBTARGET_H
+#define BPFSUBTARGET_H
+
+#include "llvm/Target/TargetSubtargetInfo.h"
+#include "llvm/Target/TargetMachine.h"
+
+#include <string>
+
+#define GET_SUBTARGETINFO_HEADER
+#include "BPFGenSubtargetInfo.inc"
+
+namespace llvm {
+
+class BPFSubtarget : public BPFGenSubtargetInfo {
+  virtual void anchor();
+public:
+  /// This constructor initializes the data members to match those
+  /// of the specified triple.
+  ///
+  BPFSubtarget(const std::string &TT, const std::string &CPU,
+                 const std::string &FS);
+
+  /// ParseSubtargetFeatures - Parses the feature string, setting the
+  /// specified subtarget options. The definition of this function is
+  /// auto-generated by tblgen.
+  void ParseSubtargetFeatures(StringRef CPU, StringRef FS);
+};
+}
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.cpp b/tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.cpp
new file mode 100644
index 000000000000..73b927d67510
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.cpp
@@ -0,0 +1,66 @@
+//===-- BPFTargetMachine.cpp - Define TargetMachine for BPF ---------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// Implements the info about BPF target spec.
+
+#include "BPF.h"
+#include "BPFTargetMachine.h"
+#include "llvm/PassManager.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/Support/FormattedStream.h"
+#include "llvm/Support/TargetRegistry.h"
+#include "llvm/Target/TargetOptions.h"
+using namespace llvm;
+
+extern "C" void LLVMInitializeBPFTarget() {
+  // Register the target.
+  RegisterTargetMachine<BPFTargetMachine> X(TheBPFTarget);
+}
+
+// DataLayout --> Little-endian, 64-bit pointer/ABI/alignment
+// The stack is always 8 byte aligned
+// On function prologue, the stack is created by decrementing
+// its pointer. Once decremented, all references are done with positive
+// offset from the stack/frame pointer.
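+//
+// A rough decoding of the layout string (standard LLVM DataLayout
+// syntax): "e" = little-endian, "p:64:64" = 64-bit pointers with 64-bit
+// alignment, "i64:64:64"/"f64:64:64" = 64-bit ints/doubles aligned to
+// 64 bits, "n8:16:32:64" = native integer widths, "S128" = 128-bit
+// natural stack alignment.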
+BPFTargetMachine::
+BPFTargetMachine(const Target &T, StringRef TT,
+                    StringRef CPU, StringRef FS, const TargetOptions &Options,
+                    Reloc::Model RM, CodeModel::Model CM,
+                    CodeGenOpt::Level OL)
+  : LLVMTargetMachine(T, TT, CPU, FS, Options, RM, CM, OL),
+  Subtarget(TT, CPU, FS),
+  // x86-64 like
+  DL("e-p:64:64-s:64-f64:64:64-i64:64:64-n8:16:32:64-S128"),
+  InstrInfo(), TLInfo(*this), TSInfo(*this),
+  FrameLowering(Subtarget) {
+#if LLVM_VERSION_MINOR==4
+  initAsmInfo();
+#endif
+}
+namespace {
+/// BPF Code Generator Pass Configuration Options.
+class BPFPassConfig : public TargetPassConfig {
+public:
+  BPFPassConfig(BPFTargetMachine *TM, PassManagerBase &PM)
+    : TargetPassConfig(TM, PM) {}
+
+  BPFTargetMachine &getBPFTargetMachine() const {
+    return getTM<BPFTargetMachine>();
+  }
+
+  virtual bool addInstSelector();
+};
+}
+
+TargetPassConfig *BPFTargetMachine::createPassConfig(PassManagerBase &PM) {
+  return new BPFPassConfig(this, PM);
+}
+
+// Install an instruction selector pass using
+// the ISelDag to gen BPF code.
+bool BPFPassConfig::addInstSelector() {
+  addPass(createBPFISelDag(getBPFTargetMachine()));
+
+  return false;
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.h b/tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.h
new file mode 100644
index 000000000000..1d6b07084260
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/BPFTargetMachine.h
@@ -0,0 +1,69 @@
+//===-- BPFTargetMachine.h - Define TargetMachine for BPF --- C++ ---===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file declares the BPF specific subclass of TargetMachine.
+
+#ifndef BPF_TARGETMACHINE_H
+#define BPF_TARGETMACHINE_H
+
+#include "BPFSubtarget.h"
+#include "BPFInstrInfo.h"
+#include "BPFISelLowering.h"
+#include "llvm/Target/TargetSelectionDAGInfo.h"
+#include "BPFFrameLowering.h"
+#include "llvm/Target/TargetMachine.h"
+#if !defined(LLVM_VERSION_MINOR)
+#error "Unknown version"
+#endif
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+#include "llvm/IR/DataLayout.h"
+#else
+#include "llvm/DataLayout.h"
+#endif
+#include "llvm/Target/TargetFrameLowering.h"
+
+namespace llvm {
+  class formatted_raw_ostream;
+
+  class BPFTargetMachine : public LLVMTargetMachine {
+    BPFSubtarget       Subtarget;
+    const DataLayout   DL; // Calculates type size & alignment
+    BPFInstrInfo       InstrInfo;
+    BPFTargetLowering  TLInfo;
+    TargetSelectionDAGInfo TSInfo;
+    BPFFrameLowering   FrameLowering;
+  public:
+    BPFTargetMachine(const Target &T, StringRef TT,
+                        StringRef CPU, StringRef FS,
+                        const TargetOptions &Options,
+                        Reloc::Model RM, CodeModel::Model CM,
+                        CodeGenOpt::Level OL);
+
+    virtual const BPFInstrInfo *getInstrInfo() const
+    { return &InstrInfo; }
+
+    virtual const TargetFrameLowering *getFrameLowering() const
+    { return &FrameLowering; }
+
+    virtual const BPFSubtarget *getSubtargetImpl() const
+    { return &Subtarget; }
+
+    virtual const DataLayout *getDataLayout() const
+    { return &DL;}
+
+    virtual const BPFRegisterInfo *getRegisterInfo() const
+    { return &InstrInfo.getRegisterInfo(); }
+
+    virtual const BPFTargetLowering *getTargetLowering() const
+    { return &TLInfo; }
+
+    virtual const TargetSelectionDAGInfo* getSelectionDAGInfo() const
+    { return &TSInfo; }
+
+    // Pass Pipeline Configuration
+    virtual TargetPassConfig *createPassConfig(PassManagerBase &PM);
+  };
+}
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.cpp b/tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.cpp
new file mode 100644
index 000000000000..3ed0ea1981fe
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.cpp
@@ -0,0 +1,81 @@
+//===-- BPFInstPrinter.cpp - Convert BPF MCInst to asm syntax -----------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This class prints a BPF MCInst to a .s file.
+
+#define DEBUG_TYPE "asm-printer"
+#include "BPF.h"
+#include "BPFInstPrinter.h"
+#include "llvm/MC/MCAsmInfo.h"
+#include "llvm/MC/MCExpr.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCSymbol.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormattedStream.h"
+using namespace llvm;
+
+
+// Include the auto-generated portion of the assembly writer.
+#include "BPFGenAsmWriter.inc"
+
+void BPFInstPrinter::printInst(const MCInst *MI, raw_ostream &O,
+                                StringRef Annot) {
+  printInstruction(MI, O);
+  printAnnotation(O, Annot);
+}
+
+static void printExpr(const MCExpr *Expr, raw_ostream &O) {
+  const MCSymbolRefExpr *SRE;
+
+  if (const MCBinaryExpr *BE = dyn_cast<MCBinaryExpr>(Expr))
+    SRE = dyn_cast<MCSymbolRefExpr>(BE->getLHS());
+  else
+    SRE = dyn_cast<MCSymbolRefExpr>(Expr);
+  assert(SRE && "Unexpected MCExpr type.");
+
+  MCSymbolRefExpr::VariantKind Kind = SRE->getKind();
+
+  assert(Kind == MCSymbolRefExpr::VK_None);
+  O << *Expr;
+}
+
+void BPFInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
+                                   raw_ostream &O, const char *Modifier) {
+  assert((Modifier == 0 || Modifier[0] == 0) && "No modifiers supported");
+  const MCOperand &Op = MI->getOperand(OpNo);
+  if (Op.isReg()) {
+    O << getRegisterName(Op.getReg());
+  } else if (Op.isImm()) {
+    O << (int32_t)Op.getImm();
+  } else {
+    assert(Op.isExpr() && "Expected an expression");
+    printExpr(Op.getExpr(), O);
+  }
+}
+
+void BPFInstPrinter::printMemOperand(const MCInst *MI, int OpNo,
+                                      raw_ostream &O, const char *Modifier) {
+  const MCOperand &RegOp = MI->getOperand(OpNo);
+  const MCOperand &OffsetOp = MI->getOperand(OpNo+1);
+  // offset
+  if (OffsetOp.isImm()) {
+    O << OffsetOp.getImm();
+  } else {
+    assert(0 && "Expected an immediate");
+//    assert(OffsetOp.isExpr() && "Expected an expression");
+//    printExpr(OffsetOp.getExpr(), O);
+  }
+  // register
+  assert(RegOp.isReg() && "Register operand not a register");
+  O  << "(" << getRegisterName(RegOp.getReg()) << ")";
+}
+
+void BPFInstPrinter::printImm64Operand(const MCInst *MI, unsigned OpNo,
+                                         raw_ostream &O) {
+  const MCOperand &Op = MI->getOperand(OpNo);
+  if (Op.isImm())
+    O << (uint64_t)Op.getImm();
+  else
+    O << Op;
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.h b/tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.h
new file mode 100644
index 000000000000..77dd30dca040
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/InstPrinter/BPFInstPrinter.h
@@ -0,0 +1,34 @@
+//= BPFInstPrinter.h - Convert BPF MCInst to asm syntax ---------*- C++ -*--//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This class prints a BPF MCInst to a .s file.
+
+#ifndef BPFINSTPRINTER_H
+#define BPFINSTPRINTER_H
+
+#include "llvm/MC/MCInstPrinter.h"
+
+namespace llvm {
+  class MCOperand;
+
+  class BPFInstPrinter : public MCInstPrinter {
+  public:
+    BPFInstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,
+                    const MCRegisterInfo &MRI)
+      : MCInstPrinter(MAI, MII, MRI) {}
+
+    void printInst(const MCInst *MI, raw_ostream &O, StringRef Annot);
+    void printOperand(const MCInst *MI, unsigned OpNo,
+                      raw_ostream &O, const char *Modifier = 0);
+    void printMemOperand(const MCInst *MI, int OpNo,raw_ostream &O,
+                         const char *Modifier = 0);
+    void printImm64Operand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
+
+    // Autogenerated by tblgen.
+    void printInstruction(const MCInst *MI, raw_ostream &O);
+    static const char *getRegisterName(unsigned RegNo);
+  };
+}
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp
new file mode 100644
index 000000000000..4bf62f46d631
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp
@@ -0,0 +1,89 @@
+//===-- BPFAsmBackend.cpp - BPF Assembler Backend -----------------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#include "MCTargetDesc/BPFMCTargetDesc.h"
+#include "llvm/MC/MCAsmBackend.h"
+#include "llvm/MC/MCAssembler.h"
+#include "llvm/MC/MCDirectives.h"
+#include "llvm/MC/MCELFObjectWriter.h"
+#include "llvm/MC/MCFixupKindInfo.h"
+#include "llvm/MC/MCObjectWriter.h"
+#include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/MC/MCExpr.h"
+#include "llvm/MC/MCSymbol.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace llvm;
+
+namespace {
+class BPFAsmBackend : public MCAsmBackend {
+public:
+  BPFAsmBackend(): MCAsmBackend() {}
+  virtual ~BPFAsmBackend() {}
+
+  void applyFixup(const MCFixup &Fixup, char *Data, unsigned DataSize,
+                  uint64_t Value) const;
+
+  MCObjectWriter *createObjectWriter(raw_ostream &OS) const;
+
+  // No instruction requires relaxation
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+  bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,
+                            const MCRelaxableFragment *DF,
+                            const MCAsmLayout &Layout) const { return false; }
+#else
+  bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value, 
+                            const MCInstFragment *DF,
+                            const MCAsmLayout &Layout) const { return false; }
+#endif
+  
+  unsigned getNumFixupKinds() const { return 1; }
+
+  bool mayNeedRelaxation(const MCInst &Inst) const { return false; }
+
+  void relaxInstruction(const MCInst &Inst, MCInst &Res) const {}
+
+  bool writeNopData(uint64_t Count, MCObjectWriter *OW) const;
+};
+
+bool BPFAsmBackend::writeNopData(uint64_t Count, MCObjectWriter *OW) const {
+  if ((Count % 8) != 0)
+    return false;
+
+  for (uint64_t i = 0; i < Count; i += 8)
+    OW->Write64(0x15000000);
+
+  return true;
+}
+
+void BPFAsmBackend::applyFixup(const MCFixup &Fixup, char *Data,
+                               unsigned DataSize, uint64_t Value) const {
+
+  if (0)
+   errs() << "<MCFixup" << " Offset:" << Fixup.getOffset() << " Value:" <<
+     *(Fixup.getValue()) << " Kind:" << Fixup.getKind() <<
+     " val " << Value << ">\n";
+
+  if (Fixup.getKind() == FK_SecRel_4 || Fixup.getKind() == FK_SecRel_8) {
+    assert (Value == 0);
+    return;
+  }
+  assert (Fixup.getKind() == FK_PCRel_2);
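+  // branch targets live in the 16-bit 'off' field as a count of 8-byte
+  // insns relative to the next instruction, hence the '- 8' and '/ 8'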
+  *(uint16_t*)&Data[Fixup.getOffset() + 2] = (uint16_t) ((Value - 8) / 8);
+}
+
+MCObjectWriter *BPFAsmBackend::createObjectWriter(raw_ostream &OS) const {
+  return createBPFELFObjectWriter(OS, 0);
+}
+
+}
+
+MCAsmBackend *llvm::createBPFAsmBackend(const Target &T,
+#if LLVM_VERSION_MINOR==4
+                                        const MCRegisterInfo &MRI,
+#endif
+                                        StringRef TT, StringRef CPU) {
+  return new BPFAsmBackend();
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFBaseInfo.h b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFBaseInfo.h
new file mode 100644
index 000000000000..9d03073668da
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFBaseInfo.h
@@ -0,0 +1,33 @@
+//===-- BPFBaseInfo.h - Top level definitions for BPF MC ------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+#ifndef BPFBASEINFO_H
+#define BPFBASEINFO_H
+
+#include "BPFMCTargetDesc.h"
+#include "llvm/MC/MCExpr.h"
+#include "llvm/Support/DataTypes.h"
+#include "llvm/Support/ErrorHandling.h"
+
+namespace llvm {
+
+static inline unsigned getBPFRegisterNumbering(unsigned Reg) {
+  switch(Reg) {
+    case BPF::R0  : return 0;
+    case BPF::R1  : return 1;
+    case BPF::R2  : return 2;
+    case BPF::R3  : return 3;
+    case BPF::R4  : return 4;
+    case BPF::R5  : return 5;
+    case BPF::R6  : return 6;
+    case BPF::R7  : return 7;
+    case BPF::R8  : return 8;
+    case BPF::R9  : return 9;
+    case BPF::R10 : return 10;
+    case BPF::R11 : return 11;
+    default: llvm_unreachable("Unknown register number!");
+  }
+}
+
+}
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFELFObjectWriter.cpp b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFELFObjectWriter.cpp
new file mode 100644
index 000000000000..7170e878c805
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFELFObjectWriter.cpp
@@ -0,0 +1,56 @@
+//===-- BPFELFObjectWriter.cpp - BPF Writer -------------------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#include "MCTargetDesc/BPFBaseInfo.h"
+#include "MCTargetDesc/BPFMCTargetDesc.h"
+#include "llvm/MC/MCELFObjectWriter.h"
+#include "llvm/MC/MCFixup.h"
+#include "llvm/Support/ErrorHandling.h"
+
+using namespace llvm;
+
+namespace {
+  class BPFELFObjectWriter : public MCELFObjectTargetWriter {
+  public:
+    BPFELFObjectWriter(uint8_t OSABI);
+
+    virtual ~BPFELFObjectWriter();
+  protected:
+    virtual unsigned GetRelocType(const MCValue &Target, const MCFixup &Fixup,
+                                  bool IsPCRel, bool IsRelocWithSymbol,
+                                  int64_t Addend) const;
+  };
+}
+
+BPFELFObjectWriter::BPFELFObjectWriter(uint8_t OSABI)
+  : MCELFObjectTargetWriter(/*Is64Bit*/ true, OSABI, ELF::EM_NONE,
+                            /*HasRelocationAddend*/ false) {}
+
+BPFELFObjectWriter::~BPFELFObjectWriter() {
+}
+
+unsigned BPFELFObjectWriter::GetRelocType(const MCValue &Target,
+                                          const MCFixup &Fixup,
+                                          bool IsPCRel,
+                                          bool IsRelocWithSymbol,
+                                          int64_t Addend) const {
+  // determine the type of the relocation
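+  // BPF has no ELF machine or relocation numbers of its own yet (note
+  // EM_NONE above), so borrow x86-64 values: 64-bit absolute for
+  // LD_imm64 references, 32-bit pc-relative for calls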
+  unsigned Type;
+  switch ((unsigned)Fixup.getKind()) {
+  default: llvm_unreachable("invalid fixup kind!");
+  case FK_SecRel_8:
+    Type = ELF::R_X86_64_64;
+    break;
+  case FK_SecRel_4:
+    Type = ELF::R_X86_64_PC32;
+    break;
+  }
+  return Type;
+}
+
+MCObjectWriter *llvm::createBPFELFObjectWriter(raw_ostream &OS,
+                                               uint8_t OSABI) {
+  MCELFObjectTargetWriter *MOTW = new BPFELFObjectWriter(OSABI);
+  return createELFObjectWriter(MOTW, OS,  /*IsLittleEndian=*/ true);
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h
new file mode 100644
index 000000000000..99132ee9230c
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCAsmInfo.h
@@ -0,0 +1,34 @@
+//=====-- BPFMCAsmInfo.h - BPF asm properties -----------*- C++ -*--====//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#ifndef BPF_MCASM_INFO_H
+#define BPF_MCASM_INFO_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/MC/MCAsmInfo.h"
+
+namespace llvm {
+  class Target;
+  
+  class BPFMCAsmInfo : public MCAsmInfo {
+  public:
+#if LLVM_VERSION_MINOR==4
+    explicit BPFMCAsmInfo(StringRef TT) {
+#else
+    explicit BPFMCAsmInfo(const Target &T, StringRef TT) {
+#endif
+      PrivateGlobalPrefix         = ".L";
+      WeakRefDirective            = "\t.weak\t";
+
+      // BPF assembly requires ".section" before ".bss"
+      UsesELFSectionDirectiveForBSS = true;
+
+      HasSingleParameterDotFile = false;
+      HasDotTypeDotSizeDirective = false;
+    }
+  };
+
+}
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp
new file mode 100644
index 000000000000..a215bb7b365f
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp
@@ -0,0 +1,167 @@
+//===-- BPFMCCodeEmitter.cpp - Convert BPF code to machine code ---------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#define DEBUG_TYPE "mccodeemitter"
+#include "MCTargetDesc/BPFBaseInfo.h"
+#include "MCTargetDesc/BPFMCTargetDesc.h"
+#include "llvm/MC/MCCodeEmitter.h"
+#include "llvm/MC/MCFixup.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCInstrInfo.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/MC/MCSymbol.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Support/raw_ostream.h"
+using namespace llvm;
+
+namespace {
+class BPFMCCodeEmitter : public MCCodeEmitter {
+  BPFMCCodeEmitter(const BPFMCCodeEmitter &);
+  void operator=(const BPFMCCodeEmitter &);
+  const MCInstrInfo &MCII;
+  const MCSubtargetInfo &STI;
+  MCContext &Ctx;
+
+public:
+  BPFMCCodeEmitter(const MCInstrInfo &mcii, const MCSubtargetInfo &sti,
+                    MCContext &ctx)
+    : MCII(mcii), STI(sti), Ctx(ctx) {
+    }
+
+  ~BPFMCCodeEmitter() {}
+
+  // getBinaryCodeForInstr - TableGen'erated function for getting the
+  // binary encoding for an instruction.
+  uint64_t getBinaryCodeForInstr(const MCInst &MI,
+                                 SmallVectorImpl<MCFixup> &Fixups) const;
+
+  // getMachineOpValue - Return binary encoding of operand. If the machine
+  // operand requires relocation, record the relocation and return zero.
+  unsigned getMachineOpValue(const MCInst &MI,const MCOperand &MO,
+                             SmallVectorImpl<MCFixup> &Fixups) const;
+
+  uint64_t getMemoryOpValue(const MCInst &MI, unsigned Op,
+                            SmallVectorImpl<MCFixup> &Fixups) const;
+
+  void EncodeInstruction(const MCInst &MI, raw_ostream &OS,
+                         SmallVectorImpl<MCFixup> &Fixups) const;
+};
+}
+
+MCCodeEmitter *llvm::createBPFMCCodeEmitter(const MCInstrInfo &MCII,
+                                             const MCRegisterInfo &MRI,
+                                             const MCSubtargetInfo &STI,
+                                             MCContext &Ctx) {
+  return new BPFMCCodeEmitter(MCII, STI, Ctx);
+}
+
+/// getMachineOpValue - Return binary encoding of operand. If the machine
+/// operand requires relocation, record the relocation and return zero.
+unsigned BPFMCCodeEmitter::
+getMachineOpValue(const MCInst &MI, const MCOperand &MO,
+                  SmallVectorImpl<MCFixup> &Fixups) const {
+  if (MO.isReg())
+    return getBPFRegisterNumbering(MO.getReg());
+  if (MO.isImm())
+    return static_cast<unsigned>(MO.getImm());
+  
+  assert(MO.isExpr());
+
+  const MCExpr *Expr = MO.getExpr();
+  MCExpr::ExprKind Kind = Expr->getKind();
+
+/*  if (Kind == MCExpr::Binary) {
+    Expr = static_cast<const MCBinaryExpr*>(Expr)->getLHS();
+    Kind = Expr->getKind();
+  }*/
+
+  assert (Kind == MCExpr::SymbolRef);
+
+  if (MI.getOpcode() == BPF::JAL) {
+    /* func call name */
+    Fixups.push_back(MCFixup::Create(0, Expr, FK_SecRel_4));
+//    const MCSymbolRefExpr *SRE = dyn_cast<MCSymbolRefExpr>(Expr);
+//    return getStrtabIndex(SRE->getSymbol().getName());
+
+  } else if (MI.getOpcode() == BPF::LD_imm64) {
+    Fixups.push_back(MCFixup::Create(0, Expr, FK_SecRel_8));
+  } else {
+    /* bb label */
+    Fixups.push_back(MCFixup::Create(0, Expr, FK_PCRel_2));
+  }
+  return 0;
+}
+
+// Emit one byte through output stream
+void EmitByte(unsigned char C, unsigned &CurByte, raw_ostream &OS) {
+  OS << (char)C;
+  ++CurByte;
+}
+
+// Emit a series of bytes (little endian)
+void EmitLEConstant(uint64_t Val, unsigned Size, unsigned &CurByte,
+                    raw_ostream &OS) {
+  assert(Size <= 8 && "size too big in emit constant");
+
+  for (unsigned i = 0; i != Size; ++i) {
+    EmitByte(Val & 255, CurByte, OS);
+    Val >>= 8;
+  }
+}
+
+// Emit a series of bytes (big endian)
+void EmitBEConstant(uint64_t Val, unsigned Size, unsigned &CurByte,
+                    raw_ostream &OS) {
+  assert(Size <= 8 && "size too big in emit constant");
+
+  for (int i = (Size-1)*8; i >= 0; i-=8)
+    EmitByte((Val >> i) & 255, CurByte, OS);
+}
+
+void BPFMCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS,
+                                         SmallVectorImpl<MCFixup> &Fixups) const {
+  unsigned Opcode = MI.getOpcode();
+//  const MCInstrDesc &Desc = MCII.get(Opcode);
+  // Keep track of the current byte being emitted
+  unsigned CurByte = 0;
+
+  if (Opcode == BPF::LD_imm64) {
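+    // LD_imm64 spans two 8-byte insn slots: the first carries opcode,
+    // regs and the low 32 bits of the immediate; the second is a zeroed
+    // pseudo-insn carrying the high 32 bits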
+    uint64_t Value = getBinaryCodeForInstr(MI, Fixups);
+    EmitByte(Value >> 56, CurByte, OS);
+    EmitByte(((Value >> 48) & 0xff), CurByte, OS);
+    EmitLEConstant(0, 2, CurByte, OS);
+    EmitLEConstant(Value & 0xffffFFFF, 4, CurByte, OS);
+
+    const MCOperand &MO = MI.getOperand(1);
+    uint64_t Imm = MO.isImm() ? MO.getImm() : 0;
+    EmitByte(0, CurByte, OS);
+    EmitByte(0, CurByte, OS);
+    EmitLEConstant(0, 2, CurByte, OS);
+    EmitLEConstant(Imm >> 32, 4, CurByte, OS);
+  } else {
+    // Get instruction encoding and emit it
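+    // 8-byte insn layout: opcode byte, regs byte, 16-bit little-endian
+    // offset, 32-bit little-endian immediate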
+    uint64_t Value = getBinaryCodeForInstr(MI, Fixups);
+    EmitByte(Value >> 56, CurByte, OS);
+    EmitByte((Value >> 48) & 0xff, CurByte, OS);
+    EmitLEConstant((Value >> 32) & 0xffff, 2, CurByte, OS);
+    EmitLEConstant(Value & 0xffffFFFF, 4, CurByte, OS);
+  }
+}
+
+// Encode BPF Memory Operand
+uint64_t BPFMCCodeEmitter::getMemoryOpValue(const MCInst &MI, unsigned Op,
+                                            SmallVectorImpl<MCFixup> &Fixups) const {
+  uint64_t encoding;
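+  // layout of the returned value: base register number in bits [23:16],
+  // signed 16-bit offset in bits [15:0]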
+  const MCOperand op1 = MI.getOperand(1);
+  assert(op1.isReg() && "First operand is not register.");
+  encoding = getBPFRegisterNumbering(op1.getReg());
+  encoding <<= 16;
+  MCOperand op2 = MI.getOperand(2);
+  assert(op2.isImm() && "Second operand is not immediate.");
+  encoding |= op2.getImm() & 0xffff;
+  return encoding;
+}
+
+#include "BPFGenMCCodeEmitter.inc"
diff --git a/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.cpp b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.cpp
new file mode 100644
index 000000000000..db043d77f518
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.cpp
@@ -0,0 +1,115 @@
+//===-- BPFMCTargetDesc.cpp - BPF Target Descriptions -----------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file provides BPF specific target descriptions.
+
+#include "BPF.h"
+#include "BPFMCTargetDesc.h"
+#include "BPFMCAsmInfo.h"
+#include "InstPrinter/BPFInstPrinter.h"
+#include "llvm/MC/MCCodeGenInfo.h"
+#include "llvm/MC/MCInstrInfo.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#include "llvm/MC/MCStreamer.h"
+#include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/TargetRegistry.h"
+
+#define GET_INSTRINFO_MC_DESC
+#include "BPFGenInstrInfo.inc"
+
+#define GET_SUBTARGETINFO_MC_DESC
+#include "BPFGenSubtargetInfo.inc"
+
+#define GET_REGINFO_MC_DESC
+#include "BPFGenRegisterInfo.inc"
+
+using namespace llvm;
+
+static MCInstrInfo *createBPFMCInstrInfo() {
+  MCInstrInfo *X = new MCInstrInfo();
+  InitBPFMCInstrInfo(X);
+  return X;
+}
+
+static MCRegisterInfo *createBPFMCRegisterInfo(StringRef TT) {
+  MCRegisterInfo *X = new MCRegisterInfo();
+  InitBPFMCRegisterInfo(X, BPF::R9);
+  return X;
+}
+
+static MCSubtargetInfo *createBPFMCSubtargetInfo(StringRef TT, StringRef CPU,
+                                                   StringRef FS) {
+  MCSubtargetInfo *X = new MCSubtargetInfo();
+  InitBPFMCSubtargetInfo(X, TT, CPU, FS);
+  return X;
+}
+
+static MCCodeGenInfo *createBPFMCCodeGenInfo(StringRef TT, Reloc::Model RM,
+                                               CodeModel::Model CM,
+                                               CodeGenOpt::Level OL) {
+  MCCodeGenInfo *X = new MCCodeGenInfo();
+  X->InitMCCodeGenInfo(RM, CM, OL);
+  return X;
+}
+
+static MCStreamer *createBPFMCStreamer(const Target &T, StringRef TT,
+                                    MCContext &Ctx, MCAsmBackend &MAB,
+                                    raw_ostream &_OS,
+                                    MCCodeEmitter *_Emitter,
+                                    bool RelaxAll,
+                                    bool NoExecStack) {
+#if LLVM_VERSION_MINOR==4
+  return createELFStreamer(Ctx, 0, MAB, _OS, _Emitter, RelaxAll, NoExecStack);
+#else
+  return createELFStreamer(Ctx, MAB, _OS, _Emitter, RelaxAll, NoExecStack);
+#endif
+}
+
+static MCInstPrinter *createBPFMCInstPrinter(const Target &T,
+                                              unsigned SyntaxVariant,
+                                              const MCAsmInfo &MAI,
+                                              const MCInstrInfo &MII,
+                                              const MCRegisterInfo &MRI,
+                                              const MCSubtargetInfo &STI) {
+  if (SyntaxVariant == 0)
+    return new BPFInstPrinter(MAI, MII, MRI);
+  return 0;
+}
+
+extern "C" void LLVMInitializeBPFTargetMC() {
+  // Register the MC asm info.
+  RegisterMCAsmInfo<BPFMCAsmInfo> X(TheBPFTarget);
+
+  // Register the MC codegen info.
+  TargetRegistry::RegisterMCCodeGenInfo(TheBPFTarget,
+                                       createBPFMCCodeGenInfo);
+
+  // Register the MC instruction info.
+  TargetRegistry::RegisterMCInstrInfo(TheBPFTarget, createBPFMCInstrInfo);
+
+  // Register the MC register info.
+  TargetRegistry::RegisterMCRegInfo(TheBPFTarget, createBPFMCRegisterInfo);
+
+  // Register the MC subtarget info.
+  TargetRegistry::RegisterMCSubtargetInfo(TheBPFTarget,
+                                          createBPFMCSubtargetInfo);
+
+  // Register the MC code emitter
+  TargetRegistry::RegisterMCCodeEmitter(TheBPFTarget,
+                                        llvm::createBPFMCCodeEmitter);
+
+  // Register the ASM Backend
+  TargetRegistry::RegisterMCAsmBackend(TheBPFTarget,
+                                       createBPFAsmBackend);
+
+  // Register the object streamer
+  TargetRegistry::RegisterMCObjectStreamer(TheBPFTarget,
+                                           createBPFMCStreamer);
+
+
+  // Register the MCInstPrinter.
+  TargetRegistry::RegisterMCInstPrinter(TheBPFTarget,
+                                        createBPFMCInstPrinter);
+}
diff --git a/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.h b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.h
new file mode 100644
index 000000000000..b337a00abf5b
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/MCTargetDesc/BPFMCTargetDesc.h
@@ -0,0 +1,56 @@
+//===-- BPFMCTargetDesc.h - BPF Target Descriptions -----------*- C++ -*-===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This file provides BPF specific target descriptions.
+
+#ifndef BPFMCTARGETDESC_H
+#define BPFMCTARGETDESC_H
+
+#include "llvm/Support/DataTypes.h"
+#include "llvm/Config/config.h"
+
+namespace llvm {
+class MCAsmBackend;
+class MCCodeEmitter;
+class MCContext;
+class MCInstrInfo;
+class MCObjectWriter;
+class MCRegisterInfo;
+class MCSubtargetInfo;
+class Target;
+class StringRef;
+class raw_ostream;
+
+extern Target TheBPFTarget;
+
+MCCodeEmitter *createBPFMCCodeEmitter(const MCInstrInfo &MCII,
+                                       const MCRegisterInfo &MRI,
+                                       const MCSubtargetInfo &STI,
+                                       MCContext &Ctx);
+
+MCAsmBackend *createBPFAsmBackend(const Target &T,
+#if LLVM_VERSION_MINOR==4
+                                  const MCRegisterInfo &MRI,
+#endif
+                                  StringRef TT, StringRef CPU);
+
+
+MCObjectWriter *createBPFELFObjectWriter(raw_ostream &OS, uint8_t OSABI);
+}
+
+// Defines symbolic names for BPF registers.  This defines a mapping from
+// register name to register number.
+//
+#define GET_REGINFO_ENUM
+#include "BPFGenRegisterInfo.inc"
+
+// Defines symbolic names for the BPF instructions.
+//
+#define GET_INSTRINFO_ENUM
+#include "BPFGenInstrInfo.inc"
+
+#define GET_SUBTARGETINFO_ENUM
+#include "BPFGenSubtargetInfo.inc"
+
+#endif
diff --git a/tools/bpf/llvm/lib/Target/BPF/TargetInfo/BPFTargetInfo.cpp b/tools/bpf/llvm/lib/Target/BPF/TargetInfo/BPFTargetInfo.cpp
new file mode 100644
index 000000000000..4d16305e2b86
--- /dev/null
+++ b/tools/bpf/llvm/lib/Target/BPF/TargetInfo/BPFTargetInfo.cpp
@@ -0,0 +1,13 @@
+//===-- BPFTargetInfo.cpp - BPF Target Implementation -----------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+
+#include "BPF.h"
+#include "llvm/Support/TargetRegistry.h"
+using namespace llvm;
+
+Target llvm::TheBPFTarget;
+
+extern "C" void LLVMInitializeBPFTargetInfo() {
+  RegisterTarget<Triple::x86_64> X(TheBPFTarget, "bpf", "BPF");
+}
diff --git a/tools/bpf/llvm/tools/llc/llc.cpp b/tools/bpf/llvm/tools/llc/llc.cpp
new file mode 100644
index 000000000000..517a7a82ce49
--- /dev/null
+++ b/tools/bpf/llvm/tools/llc/llc.cpp
@@ -0,0 +1,381 @@
+//===-- llc.cpp - Implement the LLVM Native Code Generator ----------------===//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// This is the llc code generator driver. It provides a convenient
+// command-line interface for generating native assembly-language code
+// or C code, given LLVM bitcode.
+
+#include "llvm/Config/config.h"
+#undef LLVM_NATIVE_TARGET
+#undef LLVM_NATIVE_ASMPRINTER
+#undef LLVM_NATIVE_ASMPARSER
+#undef LLVM_NATIVE_DISASSEMBLER
+#if LLVM_VERSION_MINOR==3 || LLVM_VERSION_MINOR==4
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/DataLayout.h"
+#include "llvm/IRReader/IRReader.h"
+#include "llvm/Support/SourceMgr.h"
+#else
+#include "llvm/LLVMContext.h"
+#include "llvm/Module.h"
+#include "llvm/DataLayout.h"
+#include "llvm/Support/IRReader.h"
+#endif
+#include "llvm/PassManager.h"
+#include "llvm/Pass.h"
+#include "llvm/ADT/Triple.h"
+#include "llvm/Assembly/PrintModulePass.h"
+#include "llvm/CodeGen/LinkAllAsmWriterComponents.h"
+#include "llvm/CodeGen/LinkAllCodegenComponents.h"
+#include "llvm/MC/SubtargetFeature.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/FormattedStream.h"
+#include "llvm/Support/ManagedStatic.h"
+#include "llvm/Support/PluginLoader.h"
+#include "llvm/Support/PrettyStackTrace.h"
+#include "llvm/Support/ToolOutputFile.h"
+#include "llvm/Support/Host.h"
+#include "llvm/Support/Signals.h"
+#include "llvm/Support/TargetRegistry.h"
+#include "llvm/Support/TargetSelect.h"
+#include "llvm/Target/TargetLibraryInfo.h"
+#include "llvm/Target/TargetMachine.h"
+#include <memory>
+using namespace llvm;
+
+extern "C" {
+void AnnotateHappensBefore(const char *file, int line,
+                           const volatile void *cv) {}
+void AnnotateHappensAfter(const char *file, int line,
+                          const volatile void *cv) {}
+void AnnotateIgnoreWritesBegin(const char *file, int line) {}
+void AnnotateIgnoreWritesEnd(const char *file, int line) {}
+}
+
+__attribute__((weak)) bool llvm::DebugFlag;
+
+__attribute__((weak)) bool llvm::isCurrentDebugType(const char *Type) {
+ return false;
+}
+
+// General options for llc.  Other pass-specific options are specified
+// within the corresponding llc passes, and target-specific options
+// and back-end code generation options are specified with the target machine.
+//
+static cl::opt<std::string>
+InputFilename(cl::Positional, cl::desc("<input bitcode>"), cl::init("-"));
+
+static cl::opt<std::string>
+OutputFilename("o", cl::desc("Output filename"), cl::value_desc("filename"));
+
+// Determine optimization level.
+static cl::opt<char>
+OptLevel("O",
+         cl::desc("Optimization level. [-O0, -O1, -O2, or -O3] "
+                  "(default = '-O2')"),
+         cl::Prefix,
+         cl::ZeroOrMore,
+         cl::init(' '));
+
+static cl::opt<std::string>
+TargetTriple("mtriple", cl::desc("Override target triple for module"));
+
+static cl::list<std::string>
+MAttrs("mattr",
+  cl::CommaSeparated,
+  cl::desc("Target specific attributes (-mattr=help for details)"),
+  cl::value_desc("a1,+a2,-a3,..."));
+
+cl::opt<TargetMachine::CodeGenFileType>
+FileType("filetype", cl::init(TargetMachine::CGFT_ObjectFile),
+  cl::desc("Choose a file type (not all types are supported by all targets):"),
+  cl::values(
+       clEnumValN(TargetMachine::CGFT_AssemblyFile, "asm",
+                  "Emit an assembly ('.s') file"),
+       clEnumValN(TargetMachine::CGFT_ObjectFile, "obj",
+                  "Emit a native object ('.o') file"),
+       clEnumValN(TargetMachine::CGFT_Null, "null",
+                  "Emit nothing, for performance testing"),
+       clEnumValEnd));
+
+cl::opt<bool> NoVerify("disable-verify", cl::Hidden,
+                       cl::desc("Do not verify input module"));
+
+static cl::opt<bool>
+DontPlaceZerosInBSS("nozero-initialized-in-bss",
+  cl::desc("Don't place zero-initialized symbols into bss section"),
+  cl::init(false));
+
+static cl::opt<bool>
+DisableSimplifyLibCalls("disable-simplify-libcalls",
+  cl::desc("Disable simplify-libcalls"),
+  cl::init(false));
+
+static cl::opt<bool>
+EnableGuaranteedTailCallOpt("tailcallopt",
+  cl::desc("Turn fastcc calls into tail calls by (potentially) changing ABI."),
+  cl::init(false));
+
+static cl::opt<bool>
+DisableTailCalls("disable-tail-calls",
+  cl::desc("Never emit tail calls"),
+  cl::init(false));
+
+static cl::opt<std::string> StopAfter("stop-after",
+  cl::desc("Stop compilation after a specific pass"),
+  cl::value_desc("pass-name"),
+  cl::init(""));
+static cl::opt<std::string> StartAfter("start-after",
+  cl::desc("Resume compilation after a specific pass"),
+  cl::value_desc("pass-name"),
+  cl::init(""));
+
+// GetFileNameRoot - Helper function to get the basename of a filename.
+static inline std::string
+GetFileNameRoot(const std::string &InputFilename) {
+  std::string IFN = InputFilename;
+  std::string outputFilename;
+  int Len = IFN.length();
+  if ((Len > 2) &&
+      IFN[Len-3] == '.' &&
+      ((IFN[Len-2] == 'b' && IFN[Len-1] == 'c') ||
+       (IFN[Len-2] == 'l' && IFN[Len-1] == 'l'))) {
+    outputFilename = std::string(IFN.begin(), IFN.end()-3); // s/.bc/.s/
+  } else {
+    outputFilename = IFN;
+  }
+  return outputFilename;
+}
+
+static tool_output_file *GetOutputStream(const char *TargetName,
+                                         Triple::OSType OS,
+                                         const char *ProgName) {
+  // If we don't yet have an output filename, make one.
+  if (OutputFilename.empty()) {
+    if (InputFilename == "-")
+      OutputFilename = "-";
+    else {
+      OutputFilename = GetFileNameRoot(InputFilename);
+
+      switch (FileType) {
+      case TargetMachine::CGFT_AssemblyFile:
+        if (TargetName[0] == 'c') {
+          if (TargetName[1] == 0)
+            OutputFilename += ".cbe.c";
+          else if (TargetName[1] == 'p' && TargetName[2] == 'p')
+            OutputFilename += ".cpp";
+          else
+            OutputFilename += ".s";
+        } else
+          OutputFilename += ".s";
+        break;
+      case TargetMachine::CGFT_ObjectFile:
+        OutputFilename += ".o";
+        break;
+      case TargetMachine::CGFT_Null:
+        OutputFilename += ".null";
+        break;
+      }
+    }
+  }
+
+  // Decide if we need "binary" output.
+  bool Binary = false;
+  switch (FileType) {
+  case TargetMachine::CGFT_AssemblyFile:
+    break;
+  case TargetMachine::CGFT_ObjectFile:
+  case TargetMachine::CGFT_Null:
+    Binary = true;
+    break;
+  }
+
+  // Open the file.
+  std::string error;
+#if LLVM_VERSION_MINOR==4
+  sys::fs::OpenFlags OpenFlags = sys::fs::F_None;
+  if (Binary)
+    OpenFlags |= sys::fs::F_Binary;
+#else
+  unsigned OpenFlags = 0;
+  if (Binary) OpenFlags |= raw_fd_ostream::F_Binary;
+#endif
+  tool_output_file *FDOut = new tool_output_file(OutputFilename.c_str(), error,
+                                                 OpenFlags);
+  if (!error.empty()) {
+    errs() << error << '\n';
+    delete FDOut;
+    return 0;
+  }
+
+  return FDOut;
+}
+
+// main - Entry point for the llc compiler.
+//
+int main(int argc, char **argv) {
+  sys::PrintStackTraceOnErrorSignal();
+  PrettyStackTraceProgram X(argc, argv);
+
+  // Enable debug stream buffering.
+  EnableDebugBuffering = true;
+
+  LLVMContext &Context = getGlobalContext();
+  llvm_shutdown_obj Y;  // Call llvm_shutdown() on exit.
+
+  // Initialize targets first, so that --version shows registered targets.
+  InitializeAllTargets();
+  InitializeAllTargetMCs();
+  InitializeAllAsmPrinters();
+  InitializeAllAsmParsers();
+
+  // Initialize codegen and IR passes used by llc so that the -print-after,
+  // -print-before, and -stop-after options work.
+  PassRegistry *Registry = PassRegistry::getPassRegistry();
+  initializeCore(*Registry);
+  initializeCodeGen(*Registry);
+  initializeLoopStrengthReducePass(*Registry);
+  initializeLowerIntrinsicsPass(*Registry);
+  initializeUnreachableBlockElimPass(*Registry);
+
+  // Register the target printer for --version.
+  cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);
+
+  cl::ParseCommandLineOptions(argc, argv, "llvm system compiler\n");
+
+  // Load the module to be compiled...
+  SMDiagnostic Err;
+  std::auto_ptr<Module> M;
+  Module *mod = 0;
+  Triple TheTriple;
+
+  M.reset(ParseIRFile(InputFilename, Err, Context));
+  mod = M.get();
+  if (mod == 0) {
+    Err.print(argv[0], errs());
+    return 1;
+  }
+
+  // If we are supposed to override the target triple, do so now.
+  if (!TargetTriple.empty())
+    mod->setTargetTriple(Triple::normalize(TargetTriple));
+  TheTriple = Triple(mod->getTargetTriple());
+
+  if (TheTriple.getTriple().empty())
+    TheTriple.setTriple(sys::getDefaultTargetTriple());
+
+  // Get the target specific parser.
+  std::string Error;
+  const Target *TheTarget = TargetRegistry::lookupTarget("bpf", TheTriple,
+                                                         Error);
+  if (!TheTarget) {
+    errs() << argv[0] << ": " << Error;
+    return 1;
+  }
+
+  // Package up features to be passed to target/subtarget
+  std::string FeaturesStr;
+  if (MAttrs.size()) {
+    SubtargetFeatures Features;
+    for (unsigned i = 0; i != MAttrs.size(); ++i)
+      Features.AddFeature(MAttrs[i]);
+    FeaturesStr = Features.getString();
+  }
+
+  CodeGenOpt::Level OLvl = CodeGenOpt::Default;
+  switch (OptLevel) {
+  default:
+    errs() << argv[0] << ": invalid optimization level.\n";
+    return 1;
+  case ' ': break;
+  case '0': OLvl = CodeGenOpt::None; break;
+  case '1': OLvl = CodeGenOpt::Less; break;
+  case '2': OLvl = CodeGenOpt::Default; break;
+  case '3': OLvl = CodeGenOpt::Aggressive; break;
+  }
+
+  TargetOptions Options;
+  Options.NoZerosInBSS = DontPlaceZerosInBSS;
+  Options.GuaranteedTailCallOpt = EnableGuaranteedTailCallOpt;
+  Options.DisableTailCalls = DisableTailCalls;
+
+  std::auto_ptr<TargetMachine>
+    target(TheTarget->createTargetMachine(TheTriple.getTriple(),
+                                          "", FeaturesStr, Options,
+                                          Reloc::Default, CodeModel::Default, OLvl));
+  assert(target.get() && "Could not allocate target machine!");
+  assert(mod && "Should have exited after outputting help!");
+  TargetMachine &Target = *target.get();
+
+  Target.setMCUseLoc(false);
+
+  Target.setMCUseCFI(false);
+
+  // Figure out where we are going to send the output.
+  OwningPtr<tool_output_file> Out
+    (GetOutputStream(TheTarget->getName(), TheTriple.getOS(), argv[0]));
+  if (!Out) return 1;
+
+  // Build up all of the passes that we want to do to the module.
+  PassManager PM;
+
+  // Add an appropriate TargetLibraryInfo pass for the module's triple.
+  TargetLibraryInfo *TLI = new TargetLibraryInfo(TheTriple);
+  if (DisableSimplifyLibCalls)
+    TLI->disableAllFunctions();
+  PM.add(TLI);
+
+  // Add the target data from the target machine, if it exists, or the module.
+  if (const DataLayout *TD = Target.getDataLayout())
+    PM.add(new DataLayout(*TD));
+  else
+    PM.add(new DataLayout(mod));
+
+  // Override default to generate verbose assembly.
+  Target.setAsmVerbosityDefault(true);
+
+  {
+    formatted_raw_ostream FOS(Out->os());
+
+    AnalysisID StartAfterID = 0;
+    AnalysisID StopAfterID = 0;
+    const PassRegistry *PR = PassRegistry::getPassRegistry();
+    if (!StartAfter.empty()) {
+      const PassInfo *PI = PR->getPassInfo(StartAfter);
+      if (!PI) {
+        errs() << argv[0] << ": start-after pass is not registered.\n";
+        return 1;
+      }
+      StartAfterID = PI->getTypeInfo();
+    }
+    if (!StopAfter.empty()) {
+      const PassInfo *PI = PR->getPassInfo(StopAfter);
+      if (!PI) {
+        errs() << argv[0] << ": stop-after pass is not registered.\n";
+        return 1;
+      }
+      StopAfterID = PI->getTypeInfo();
+    }
+
+    // Ask the target to add backend passes as necessary.
+    if (Target.addPassesToEmitFile(PM, FOS, FileType, NoVerify,
+                                   StartAfterID, StopAfterID)) {
+      errs() << argv[0] << ": target does not support generation of this"
+             << " file type!\n";
+      return 1;
+    }
+
+    // Before executing passes, print the final values of the LLVM options.
+    cl::PrintOptionValues();
+
+    PM.run(*mod);
+  }
+
+  // Declare success.
+  Out->keep();
+
+  return 0;
+}
-- 
1.7.9.5



* [PATCH v5 net-next 26/29] samples: bpf: elf file loader
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (24 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 25/29] bpf: llvm backend Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 27/29] samples: bpf: eBPF example in C Alexei Starovoitov
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

Simple .o parser and loader using the BPF syscall.
The .o is a standard ELF file generated by the LLVM backend.

It parses an ELF file compiled by LLVM (.c -> .o):
- parses 'maps' section and creates maps via BPF syscall
- parses 'license' section and passes it to syscall
- parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns
  by storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD
- loads eBPF program via BPF syscall
- attaches program FD to tracepoint events

One ELF file can contain multiple BPF programs attached to multiple
tracepoint events

int load_bpf_file(char *path);

bpf_helpers.h is a set of in-kernel helper functions available to eBPF programs
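
A minimal sketch of a caller (illustrative only -- the object file name
here is made up; the ex1 example in a later patch does exactly this):

  #include <stdio.h>
  #include "libbpf.h"
  #include "bpf_load.h"

  int main(int argc, char **argv)
  {
          if (load_bpf_file("ex_kern.o")) {
                  printf("%s", bpf_log_buf); /* verifier log */
                  return 1;
          }
          read_trace_pipe(); /* dump trace_pipe forever */
          return 0;
  }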

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 samples/bpf/bpf_helpers.h |   27 ++++++
 samples/bpf/bpf_load.c    |  234 +++++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/bpf_load.h    |   26 +++++
 3 files changed, 287 insertions(+)
 create mode 100644 samples/bpf/bpf_helpers.h
 create mode 100644 samples/bpf/bpf_load.c
 create mode 100644 samples/bpf/bpf_load.h

diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
new file mode 100644
index 000000000000..08752fad472d
--- /dev/null
+++ b/samples/bpf/bpf_helpers.h
@@ -0,0 +1,27 @@
+#ifndef __BPF_HELPERS_H
+#define __BPF_HELPERS_H
+
+#define SEC(NAME) __attribute__((section(NAME), used))
+
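+/* helper functions are declared as function pointers whose values are
+ * the BPF_FUNC_* ids; llvm compiles each use into a 'call <id>' insn
+ * that the kernel resolves to the in-kernel helper at load time
+ */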
+static void *(*bpf_fetch_ptr)(void *unsafe_ptr) = (void *) BPF_FUNC_fetch_ptr;
+static unsigned long long (*bpf_fetch_u64)(void *unsafe_ptr) = (void *) BPF_FUNC_fetch_u64;
+static unsigned int (*bpf_fetch_u32)(void *unsafe_ptr) = (void *) BPF_FUNC_fetch_u32;
+static unsigned short (*bpf_fetch_u16)(void *unsafe_ptr) = (void *) BPF_FUNC_fetch_u16;
+static unsigned char (*bpf_fetch_u8)(void *unsafe_ptr) = (void *) BPF_FUNC_fetch_u8;
+static int (*bpf_printk)(const char *fmt, int fmt_size, ...) = (void *) BPF_FUNC_printk;
+static int (*bpf_memcmp)(void *unsafe_ptr, void *safe_ptr, int size) = (void *) BPF_FUNC_memcmp;
+static void *(*bpf_map_lookup_elem)(void *map, void *key) = (void*) BPF_FUNC_map_lookup_elem;
+static int (*bpf_map_update_elem)(void *map, void *key, void *value) = (void*) BPF_FUNC_map_update_elem;
+static int (*bpf_map_delete_elem)(void *map, void *key) = (void *) BPF_FUNC_map_delete_elem;
+static void (*bpf_dump_stack)(void) = (void *) BPF_FUNC_dump_stack;
+static unsigned long long (*bpf_ktime_get_ns)(void) = (void *) BPF_FUNC_ktime_get_ns;
+static void *(*bpf_get_current)(void) = (void *) BPF_FUNC_get_current;
+
+struct bpf_map_def {
+	int type;
+	int key_size;
+	int value_size;
+	int max_entries;
+};
+
+#endif
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
new file mode 100644
index 000000000000..a37a5cd25121
--- /dev/null
+++ b/samples/bpf/bpf_load.c
@@ -0,0 +1,234 @@
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <libelf.h>
+#include <gelf.h>
+#include <errno.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include "libbpf.h"
+#include "bpf_helpers.h"
+#include "bpf_load.h"
+
+#define DEBUGFS "/sys/kernel/debug/tracing/"
+
+static char license[128];
+static bool processed_sec[128];
+int map_fd[MAX_MAPS];
+
+static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
+{
+	int fd, event_fd, err;
+	char fmt[32];
+	char path[256] = DEBUGFS;
+
+	fd = bpf_prog_load(BPF_PROG_TYPE_TRACING_FILTER, prog, size, license);
+
+	if (fd < 0) {
+		printf("bpf_prog_load() err=%d\n%s", errno, bpf_log_buf);
+		return fd;
+	}
+
+	snprintf(fmt, sizeof(fmt), "bpf-%d", fd);
+
+	strcat(path, event);
+	strcat(path, "/filter");
+
+	printf("writing %s -> %s\n", fmt, path);
+
+	event_fd = open(path, O_WRONLY, 0);
+	if (event_fd < 0) {
+		printf("failed to open event %s\n", event);
+		return event_fd;
+	}
+
+	err = write(event_fd, fmt, strlen(fmt));
+	(void) err;
+
+	return 0;
+}
+
+static int load_maps(struct bpf_map_def *maps, int len)
+{
+	int i;
+
+	for (i = 0; i < len / sizeof(struct bpf_map_def); i++) {
+
+		map_fd[i] = bpf_create_map(maps[i].type,
+					   maps[i].key_size,
+					   maps[i].value_size,
+					   maps[i].max_entries);
+		if (map_fd[i] < 0)
+			return 1;
+	}
+	return 0;
+}
+
+static int get_sec(Elf *elf, int i, GElf_Ehdr *ehdr, char **shname,
+		   GElf_Shdr *shdr, Elf_Data **data)
+{
+	Elf_Scn *scn;
+
+	scn = elf_getscn(elf, i);
+	if (!scn)
+		return 1;
+
+	if (gelf_getshdr(scn, shdr) != shdr)
+		return 2;
+
+	*shname = elf_strptr(elf, ehdr->e_shstrndx, shdr->sh_name);
+	if (!*shname || !shdr->sh_size)
+		return 3;
+
+	*data = elf_getdata(scn, 0);
+	if (!*data || elf_getdata(scn, *data) != NULL)
+		return 4;
+
+	return 0;
+}
+
+static int parse_relo_and_apply(Elf_Data *data, Elf_Data *symbols,
+				GElf_Shdr *shdr, struct bpf_insn *insn)
+{
+	int i, nrels;
+
+	nrels = shdr->sh_size / shdr->sh_entsize;
+
+	for (i = 0; i < nrels; i++) {
+		GElf_Sym sym;
+		GElf_Rel rel;
+		unsigned int insn_idx;
+
+		gelf_getrel(data, i, &rel);
+
+		insn_idx = rel.r_offset / sizeof(struct bpf_insn);
+
+		gelf_getsym(symbols, GELF_R_SYM(rel.r_info), &sym);
+
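+		/* each relocation points at a BPF_LD_IMM64 insn; patch its
+		 * imm with the fd of the referenced map. st_value is the
+		 * map's offset within the 'maps' section.
+		 */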
+		if (insn[insn_idx].code != (BPF_LD | BPF_IMM | BPF_DW)) {
+			printf("invalid relo for insn[%d].code 0x%x\n",
+			       insn_idx, insn[insn_idx].code);
+			return 1;
+		}
+		insn[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+		insn[insn_idx].imm = map_fd[sym.st_value / sizeof(struct bpf_map_def)];
+	}
+
+	return 0;
+}
+
+int load_bpf_file(char *path)
+{
+	int fd, i;
+	Elf *elf;
+	GElf_Ehdr ehdr;
+	GElf_Shdr shdr, shdr_prog;
+	Elf_Data *data, *data_prog, *symbols = NULL;
+	char *shname, *shname_prog;
+
+	if (elf_version(EV_CURRENT) == EV_NONE)
+		return 1;
+
+	fd = open(path, O_RDONLY, 0);
+	if (fd < 0)
+		return 1;
+
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	if (!elf)
+		return 1;
+
+	if (gelf_getehdr(elf, &ehdr) != &ehdr)
+		return 1;
+
+	/* scan over all elf sections to get license and map info */
+	for (i = 1; i < ehdr.e_shnum; i++) {
+
+		if (get_sec(elf, i, &ehdr, &shname, &shdr, &data))
+			continue;
+
+		if (0)
+			printf("section %d:%s data %p size %zd link %d flags %d\n",
+			       i, shname, data->d_buf, data->d_size,
+			       shdr.sh_link, (int) shdr.sh_flags);
+
+		if (strcmp(shname, "license") == 0) {
+			processed_sec[i] = true;
+			memcpy(license, data->d_buf, data->d_size);
+		} else if (strcmp(shname, "maps") == 0) {
+			processed_sec[i] = true;
+			if (load_maps(data->d_buf, data->d_size))
+				return 1;
+		} else if (shdr.sh_type == SHT_SYMTAB) {
+			symbols = data;
+		}
+	}
+
+	/* load programs that need map fixup (relocations) */
+	for (i = 1; i < ehdr.e_shnum; i++) {
+
+		if (get_sec(elf, i, &ehdr, &shname, &shdr, &data))
+			continue;
+		if (shdr.sh_type == SHT_REL) {
+			struct bpf_insn *insns;
+
+			if (get_sec(elf, shdr.sh_info, &ehdr, &shname_prog,
+				    &shdr_prog, &data_prog))
+				continue;
+
+			if (0)
+				printf("relo %s into %s\n", shname, shname_prog);
+
+			insns = (struct bpf_insn *) data_prog->d_buf;
+
+			processed_sec[shdr.sh_info] = true;
+			processed_sec[i] = true;
+
+			if (parse_relo_and_apply(data, symbols, &shdr, insns))
+				continue;
+
+			if (memcmp(shname_prog, "events/", sizeof("events/") - 1) == 0)
+				load_and_attach(shname_prog, insns, data_prog->d_size);
+		}
+	}
+
+	/* load programs that don't use maps */
+	for (i = 1; i < ehdr.e_shnum; i++) {
+
+		if (processed_sec[i])
+			continue;
+
+		if (get_sec(elf, i, &ehdr, &shname, &shdr, &data))
+			continue;
+
+		if (memcmp(shname, "events/", sizeof("events/") - 1) == 0)
+			load_and_attach(shname, data->d_buf, data->d_size);
+	}
+
+	close(fd);
+	return 0;
+}
+
+void read_trace_pipe(void)
+{
+	int trace_fd;
+
+	trace_fd = open(DEBUGFS "trace_pipe", O_RDONLY, 0);
+	if (trace_fd < 0)
+		return;
+
+	while (1) {
+		static char buf[4096];
+		ssize_t sz;
+
+		sz = read(trace_fd, buf, sizeof(buf) - 1);
+		if (sz > 0) {
+			buf[sz] = 0;
+			puts(buf);
+		}
+	}
+}
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
new file mode 100644
index 000000000000..209190d793ff
--- /dev/null
+++ b/samples/bpf/bpf_load.h
@@ -0,0 +1,26 @@
+#ifndef __BPF_LOAD_H
+#define __BPF_LOAD_H
+
+#define MAX_MAPS 64
+
+extern int map_fd[MAX_MAPS];
+
+/* parses elf file compiled by llvm .c->.o
+ * . parses 'maps' section and creates maps via BPF syscall
+ * . parses 'license' section and passes it to syscall
+ * . parses elf relocations for BPF maps and adjusts BPF_LD_IMM64 insns by
+ *   storing map_fd into insn->imm and marking such insns as BPF_PSEUDO_MAP_FD
+ * . loads eBPF program via BPF syscall
+ * . attaches program FD to tracepoint events
+ *
+ * One ELF file can contain multiple BPF programs attached to multiple
+ * tracepoint events
+ *
+ * returns zero on success
+ */
+int load_bpf_file(char *path);
+
+/* forever reads /sys/.../trace_pipe */
+void read_trace_pipe(void);
+
+#endif
-- 
1.7.9.5



* [PATCH v5 net-next 27/29] samples: bpf: eBPF example in C
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (25 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 26/29] samples: bpf: elf file loader Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 28/29] samples: bpf: counting " Alexei Starovoitov
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

ex1_kern.c - C program which will be compiled into eBPF
to filter netif_receive_skb events on skb->dev->name == "lo"

ex1_user.c - corresponding user space component that
forever reads /sys/.../trace_pipe

Usage:
$ sudo ex1

should see:
writing bpf-4 -> /sys/kernel/debug/tracing/events/net/netif_receive_skb/filter
  ping-25476 [001] ..s3  5639.718218: __netif_receive_skb_core: skb 4d51700 dev b9e6000
  ping-25476 [001] ..s3  5639.718262: __netif_receive_skb_core: skb 4d51400 dev b9e6000

  ping-25476 [002] ..s3  5640.716233: __netif_receive_skb_core: skb 5d06500 dev b9e6000
  ping-25476 [002] ..s3  5640.716272: __netif_receive_skb_core: skb 5d06300 dev b9e6000

Ctrl-C at any time; the kernel will clean up automatically

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 samples/bpf/Makefile   |   15 +++++++++++++--
 samples/bpf/ex1_kern.c |   27 +++++++++++++++++++++++++++
 samples/bpf/ex1_user.c |   24 ++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 2 deletions(-)
 create mode 100644 samples/bpf/ex1_kern.c
 create mode 100644 samples/bpf/ex1_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 7264ba51d496..7ce9e6b0d3d0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -2,12 +2,23 @@
 obj- := dummy.o
 
 # List of programs to build
-hostprogs-y := dropmon test_verifier
+hostprogs-y := dropmon test_verifier ex1
 
 dropmon-objs := dropmon.o libbpf.o
 test_verifier-objs := test_verifier.o libbpf.o
+ex1-objs := bpf_load.o libbpf.o ex1_user.o
 
 # Tell kbuild to always build the programs
-always := $(hostprogs-y)
+always := $(hostprogs-y) ex1_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
+
+HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
+HOSTLOADLIBES_ex1 += -lelf
+
+LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
+
+%.o: %.c
+	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
+		-D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
+		-O2 -emit-llvm -c $< -o -| $(LLC) -o $@
diff --git a/samples/bpf/ex1_kern.c b/samples/bpf/ex1_kern.c
new file mode 100644
index 000000000000..05a06cccaadb
--- /dev/null
+++ b/samples/bpf/ex1_kern.c
@@ -0,0 +1,27 @@
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <uapi/linux/bpf.h>
+#include <trace/bpf_trace.h>
+#include "bpf_helpers.h"
+
+SEC("events/net/netif_receive_skb")
+int bpf_prog1(struct bpf_context *ctx)
+{
+	/*
+	 * attaches to /sys/kernel/debug/tracing/events/net/netif_receive_skb
+	 * prints events for the loopback device only
+	 */
+	char devname[] = "lo";
+	struct net_device *dev;
+	struct sk_buff *skb = 0;
+
+	skb = (struct sk_buff *) ctx->arg1;
+	dev = bpf_fetch_ptr(&skb->dev);
+	if (bpf_memcmp(dev->name, devname, 2) == 0) {
+		char fmt[] = "skb %x dev %x\n";
+		bpf_printk(fmt, sizeof(fmt), skb, dev);
+	}
+	return 0;
+}
+
+char license[] SEC("license") = "GPL";
diff --git a/samples/bpf/ex1_user.c b/samples/bpf/ex1_user.c
new file mode 100644
index 000000000000..e85c1b483f57
--- /dev/null
+++ b/samples/bpf/ex1_user.c
@@ -0,0 +1,24 @@
+#include <stdio.h>
+#include <linux/bpf.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+int main(int ac, char **argv)
+{
+	FILE *f;
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	f = popen("ping -c5 localhost", "r");
+	(void) f;
+
+	read_trace_pipe();
+
+	return 0;
+}
-- 
1.7.9.5



* [PATCH v5 net-next 28/29] samples: bpf: counting eBPF example in C
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (26 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 27/29] samples: bpf: eBPF example in C Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-24 20:21 ` [PATCH v5 net-next 29/29] samples: bpf: IO latency analysis (iosnoop/heatmap) Alexei Starovoitov
  2014-08-25  0:39 ` [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm David Miller
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

this example has two probes in C that use two different maps.

1st probe is similar to dropmon.c. It attaches to the kfree_skb tracepoint
and counts the number of packet drops at different locations

2nd probe attaches to kprobe/sys_write and computes a histogram of different
write sizes

Usage:
$ sudo ex2

Should see:
writing bpf-5 -> /sys/kernel/debug/tracing/events/skb/kfree_skb/filter
writing bpf-8 -> /sys/kernel/debug/tracing/events/kprobes/sys_write/filter
location 0xffffffff816efc67 count 1

location 0xffffffff815d8030 count 1
location 0xffffffff816efc67 count 3

location 0xffffffff815d8030 count 4
location 0xffffffff816efc67 count 9

           syscall write() stats
     byte_size       : count     distribution
       1 -> 1        : 3141     |****                                  |
       2 -> 3        : 2        |                                      |
       4 -> 7        : 14       |                                      |
       8 -> 15       : 3268     |*****                                 |
      16 -> 31       : 732      |                                      |
      32 -> 63       : 20042    |************************************* |
      64 -> 127      : 12154    |**********************                |
     128 -> 255      : 2215     |***                                   |
     256 -> 511      : 9        |                                      |
     512 -> 1023     : 0        |                                      |
    1024 -> 2047     : 1        |                                      |

Ctrl-C at any time. The kernel will automatically clean up maps and programs.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 samples/bpf/Makefile   |    6 ++--
 samples/bpf/ex2_kern.c |   73 +++++++++++++++++++++++++++++++++++++
 samples/bpf/ex2_user.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 171 insertions(+), 2 deletions(-)
 create mode 100644 samples/bpf/ex2_kern.c
 create mode 100644 samples/bpf/ex2_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 7ce9e6b0d3d0..d2de86188925 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -2,19 +2,21 @@
 obj- := dummy.o
 
 # List of programs to build
-hostprogs-y := dropmon test_verifier ex1
+hostprogs-y := dropmon test_verifier ex1 ex2
 
 dropmon-objs := dropmon.o libbpf.o
 test_verifier-objs := test_verifier.o libbpf.o
 ex1-objs := bpf_load.o libbpf.o ex1_user.o
+ex2-objs := bpf_load.o libbpf.o ex2_user.o
 
 # Tell kbuild to always build the programs
-always := $(hostprogs-y) ex1_kern.o
+always := $(hostprogs-y) ex1_kern.o ex2_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
 HOSTLOADLIBES_ex1 += -lelf
+HOSTLOADLIBES_ex2 += -lelf
 
 LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
 
diff --git a/samples/bpf/ex2_kern.c b/samples/bpf/ex2_kern.c
new file mode 100644
index 000000000000..2daa50b27ce5
--- /dev/null
+++ b/samples/bpf/ex2_kern.c
@@ -0,0 +1,73 @@
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <uapi/linux/bpf.h>
+#include <trace/bpf_trace.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") my_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(long),
+	.value_size = sizeof(long),
+	.max_entries = 1024,
+};
+
+SEC("events/skb/kfree_skb")
+int bpf_prog2(struct bpf_context *ctx)
+{
+	long loc = ctx->arg2;
+	long init_val = 1;
+	void *value;
+
+	value = bpf_map_lookup_elem(&my_map, &loc);
+	if (value)
+		(*(long *) value) += 1;
+	else
+		bpf_map_update_elem(&my_map, &loc, &init_val);
+	return 0;
+}
+
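+/* branchless integer log2: test successively smaller halves
+ * (16, 8, 4, 2, 1 bits) and accumulate the shift amounts
+ */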
+static unsigned int log2(unsigned int v)
+{
+	unsigned int r;
+	unsigned int shift;
+
+	r = (v > 0xFFFF) << 4; v >>= r;
+	shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
+	shift = (v > 0xF) << 2; v >>= shift; r |= shift;
+	shift = (v > 0x3) << 1; v >>= shift; r |= shift;
+	r |= (v >> 1);
+	return r;
+}
+
+static unsigned int log2l(unsigned long v)
+{
+	unsigned int hi = v >> 32;
+	if (hi)
+		return log2(hi) + 32;
+	else
+		return log2(v);
+}
+
+struct bpf_map_def SEC("maps") my_hist_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(long),
+	.max_entries = 64,
+};
+
+SEC("events/kprobes/sys_write")
+int bpf_prog3(struct bpf_context *ctx)
+{
+	long write_size = ctx->arg3;
+	long init_val = 1;
+	void *value;
+	u32 index = log2l(write_size);
+
+	value = bpf_map_lookup_elem(&my_hist_map, &index);
+	if (value)
+		__sync_fetch_and_add((long *)value, 1);
+	else
+		bpf_map_update_elem(&my_hist_map, &index, &init_val);
+	return 0;
+}
+char license[] SEC("license") = "GPL";
diff --git a/samples/bpf/ex2_user.c b/samples/bpf/ex2_user.c
new file mode 100644
index 000000000000..fd5ce21ae60a
--- /dev/null
+++ b/samples/bpf/ex2_user.c
@@ -0,0 +1,94 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <linux/bpf.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+#define MAX_INDEX	64
+#define MAX_STARS	38
+
+static void stars(char *str, long val, long max, int width)
+{
+	int i;
+
+	for (i = 0; i < (width * val / max) - 1 && i < width - 1; i++)
+		str[i] = '*';
+	if (val > max)
+		str[i - 1] = '+';
+	str[i] = '\0';
+}
+
+static void print_hist(int fd)
+{
+	int key, next_key;
+	long value;
+	long data[MAX_INDEX] = {};
+	char starstr[MAX_STARS];
+	int i;
+	int max_ind = -1;
+	long max_value = 0;
+
+	key = -1; /* some unknown key */
+	while (bpf_get_next_key(fd, &key, &next_key) == 0) {
+		bpf_lookup_elem(fd, &next_key, &value);
+		if (next_key >= MAX_INDEX) {
+			printf("BUG: invalid index %d\n", next_key);
+		} else {
+			data[next_key] = value;
+			if (next_key > max_ind)
+				max_ind = next_key;
+			if (value > max_value)
+				max_value = value;
+		}
+		key = next_key;
+	}
+
+	printf("           syscall write() stats\n");
+	printf("     byte_size       : count     distribution\n");
+	for (i = 1; i <= max_ind + 1; i++) {
+		stars(starstr, data[i - 1], max_value, MAX_STARS);
+		printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
+		       (1l << i) >> 1, (1l << i) - 1, data[i - 1],
+		       MAX_STARS, starstr);
+	}
+}
+static void int_exit(int sig)
+{
+	print_hist(map_fd[1]);
+	exit(0);
+}
+
+int main(int ac, char **argv)
+{
+	char filename[256];
+	long key, next_key, value;
+	int i;
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	signal(SIGINT, int_exit);
+
+	i = system("echo 'p:sys_write sys_write' > /sys/kernel/debug/tracing/kprobe_events");
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	for (i = 0; i < 5; i++) {
+		key = 0;
+		while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
+			bpf_lookup_elem(map_fd[0], &next_key, &value);
+			printf("location 0x%lx count %ld\n", next_key, value);
+			key = next_key;
+		}
+		if (key)
+			printf("\n");
+		sleep(1);
+	}
+	print_hist(map_fd[1]);
+
+	return 0;
+}
-- 
1.7.9.5



* [PATCH v5 net-next 29/29] samples: bpf: IO latency analysis (iosnoop/heatmap)
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (27 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 28/29] samples: bpf: counting " Alexei Starovoitov
@ 2014-08-24 20:21 ` Alexei Starovoitov
  2014-08-25  0:39 ` [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm David Miller
  29 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-24 20:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Ingo Molnar, Linus Torvalds, Andy Lutomirski, Steven Rostedt,
	Daniel Borkmann, Chema Gonzalez, Eric Dumazet, Peter Zijlstra,
	Brendan Gregg, Namhyung Kim, H. Peter Anvin, Andrew Morton,
	Kees Cook, linux-api, netdev, linux-kernel

The eBPF C program attaches to block_rq_issue/block_rq_complete events to
calculate IO latency. It waits for the first 100 events to compute the average
latency, then uses the range [0 .. ave_lat * 2] to record a histogram of
events in that latency range.
User space reads this histogram map every 2 seconds and prints it as a
'heatmap' using gray shades of the text terminal. Black spaces have many
events and white spaces have very few. The leftmost space is the smallest
latency and the rightmost space is the largest latency in the range.
If the kernel sees too many events falling outside the histogram range, user
space adjusts the range upward, so the heatmap for the next 2 seconds is more
accurate.
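
Once the average is known, each completion is bucketed roughly like this
(a sketch, not the exact code -- see ex3_kern.c for the real arithmetic
and for how out-of-range samples are counted):

  u64 delta = bpf_ktime_get_ns() - issue_ts;   /* latency of this request */
  u32 slot = delta * MAX_SLOT / (2 * ave_lat); /* scale into [0..MAX_SLOT) */
  if (slot >= MAX_SLOT)
          slot = MAX_SLOT - 1; /* overflow bucket; lets user space rescale */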

Usage:
$ sudo ./ex3
and run 'sudo dd if=/dev/sda of=/dev/null' in another terminal.
Observe IO latencies and how different activity (like 'make kernel') affects them.

Similar experiments can be done for network transmit latencies, syscalls, etc.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 samples/bpf/Makefile   |    6 +-
 samples/bpf/ex3_kern.c |  104 +++++++++++++++++++++++++++++++++
 samples/bpf/ex3_user.c |  149 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 257 insertions(+), 2 deletions(-)
 create mode 100644 samples/bpf/ex3_kern.c
 create mode 100644 samples/bpf/ex3_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index d2de86188925..9e7a9bc2194d 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -2,21 +2,23 @@
 obj- := dummy.o
 
 # List of programs to build
-hostprogs-y := dropmon test_verifier ex1 ex2
+hostprogs-y := dropmon test_verifier ex1 ex2 ex3
 
 dropmon-objs := dropmon.o libbpf.o
 test_verifier-objs := test_verifier.o libbpf.o
 ex1-objs := bpf_load.o libbpf.o ex1_user.o
 ex2-objs := bpf_load.o libbpf.o ex2_user.o
+ex3-objs := bpf_load.o libbpf.o ex3_user.o
 
 # Tell kbuild to always build the programs
-always := $(hostprogs-y) ex1_kern.o ex2_kern.o
+always := $(hostprogs-y) ex1_kern.o ex2_kern.o ex3_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
 HOSTLOADLIBES_ex1 += -lelf
 HOSTLOADLIBES_ex2 += -lelf
+HOSTLOADLIBES_ex3 += -lelf
 
 LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
 
diff --git a/samples/bpf/ex3_kern.c b/samples/bpf/ex3_kern.c
new file mode 100644
index 000000000000..45ff40ff1077
--- /dev/null
+++ b/samples/bpf/ex3_kern.c
@@ -0,0 +1,104 @@
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <uapi/linux/bpf.h>
+#include <trace/bpf_trace.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") my_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(long),
+	.value_size = sizeof(u64),
+	.max_entries = 4096,
+};
+
+/* alternative events:
+ * SEC("events/syscalls/sys_enter_write")
+ * SEC("events/net/net_dev_start_xmit")
+ */
+SEC("events/block/block_rq_issue")
+int bpf_prog1(struct bpf_context *ctx)
+{
+	long rq = ctx->arg2; /* long rq = bpf_get_current(); */
+	u64 val = bpf_ktime_get_ns();
+
+	bpf_map_update_elem(&my_map, &rq, &val);
+	return 0;
+}
+
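+/* running stats shared with user space through global_map */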
+struct globals {
+	u64 lat_ave;
+	u64 lat_sum;
+	u64 missed;
+	u64 max_lat;
+	int num_samples;
+};
+
+struct bpf_map_def SEC("maps") global_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct globals),
+	.max_entries = 1,
+};
+
+#define MAX_SLOT 32
+
+struct bpf_map_def SEC("maps") lat_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(int),
+	.value_size = sizeof(u64),
+	.max_entries = MAX_SLOT,
+};
+
+/* alternative events:
+ * SEC("events/syscalls/sys_exit_write")
+ * SEC("events/net/net_dev_xmit")
+ */
+SEC("events/block/block_rq_complete")
+int bpf_prog2(struct bpf_context *ctx)
+{
+	long rq = ctx->arg2;
+	void *value;
+
+	value = bpf_map_lookup_elem(&my_map, &rq);
+	if (!value)
+		return 0;
+
+	u64 cur_time = bpf_ktime_get_ns();
+	u64 delta = (cur_time - *(u64 *)value) / 1000;
+
+	bpf_map_delete_elem(&my_map, &rq);
+
+	int ind = 1;
+	struct globals *g = bpf_map_lookup_elem(&global_map, &ind);
+	if (!g)
+		return 0;
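+	/* the first 100 samples establish the average latency; after that,
+	 * each sample is bucketed into lat_map over [0 .. 2 * average]
+	 */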
+	if (g->lat_ave == 0) {
+		g->num_samples++;
+		g->lat_sum += delta;
+		if (g->num_samples >= 100) {
+			g->lat_ave = g->lat_sum / g->num_samples;
+			if (0/* debug */) {
+				char fmt[] = "after %d samples average latency %ld usec\n";
+				bpf_printk(fmt, sizeof(fmt), g->num_samples,
+					   g->lat_ave);
+			}
+		}
+	} else {
+		u64 max_lat = g->lat_ave * 2;
+		if (delta > max_lat) {
+			g->missed++;
+			if (delta > g->max_lat)
+				g->max_lat = delta;
+			return 0;
+		}
+
+		ind = delta * MAX_SLOT / max_lat;
+		value = bpf_map_lookup_elem(&lat_map, &ind);
+		if (!value)
+			return 0;
+		(*(u64 *)value)++;
+	}
+
+	return 0;
+}
+
+char license[] SEC("license") = "GPL";
diff --git a/samples/bpf/ex3_user.c b/samples/bpf/ex3_user.c
new file mode 100644
index 000000000000..508a7c3b61c5
--- /dev/null
+++ b/samples/bpf/ex3_user.c
@@ -0,0 +1,149 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+
+struct globals {
+	__u64 lat_ave;
+	__u64 lat_sum;
+	__u64 missed;
+	__u64 max_lat;
+	int num_samples;
+};
+
+static void clear_stats(int fd)
+{
+	int key;
+	__u64 value = 0;
+	for (key = 0; key < 32; key++)
+		bpf_update_elem(fd, &key, &value);
+}
+
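+/* ANSI 256-color background escapes:
+ * grayscale ramp from lightest (255) down to darkest (232)
+ */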
+const char *color[] = {
+	"\033[48;5;255m",
+	"\033[48;5;252m",
+	"\033[48;5;250m",
+	"\033[48;5;248m",
+	"\033[48;5;246m",
+	"\033[48;5;244m",
+	"\033[48;5;242m",
+	"\033[48;5;240m",
+	"\033[48;5;238m",
+	"\033[48;5;236m",
+	"\033[48;5;234m",
+	"\033[48;5;232m",
+};
+const int num_colors = ARRAY_SIZE(color);
+
+const char nocolor[] = "\033[00m";
+
+static void print_banner(__u64 max_lat)
+{
+	printf("0 usec     ...          %lld usec\n", max_lat);
+}
+
+static void print_hist(int fd)
+{
+	int key;
+	__u64 value;
+	__u64 cnt[32];
+	__u64 max_cnt = 0;
+	__u64 total_events = 0;
+	int max_bucket = 0;
+
+	for (key = 0; key < 32; key++) {
+		value = 0;
+		bpf_lookup_elem(fd, &key, &value);
+		if (value > 0)
+			max_bucket = key;
+		cnt[key] = value;
+		total_events += value;
+		if (value > max_cnt)
+			max_cnt = value;
+	}
+	clear_stats(fd);
+	for (key = 0; key < 32; key++) {
+		int c = num_colors * cnt[key] / (max_cnt + 1);
+		printf("%s %s", color[c], nocolor);
+	}
+	printf(" captured=%lld", total_events);
+
+	key = 1;
+	struct globals g = {};
+	bpf_lookup_elem(map_fd[1], &key, &g);
+
+	printf(" missed=%lld max_lat=%lld usec\n",
+	       g.missed, g.max_lat);
+
+	if (g.missed > 10 && g.missed > total_events / 10) {
+		printf("adjusting range UP...\n");
+		g.lat_ave = g.max_lat / 2;
+		print_banner(g.lat_ave * 2);
+	} else if (max_bucket < 4 && total_events > 100) {
+		printf("adjusting range DOWN...\n");
+		g.lat_ave = g.lat_ave / 4;
+		print_banner(g.lat_ave * 2);
+	}
+	/* clear some globals */
+	g.missed = 0;
+	g.max_lat = 0;
+	bpf_update_elem(map_fd[1], &key, &g);
+}
+
+static void int_exit(int sig)
+{
+	print_hist(map_fd[2]);
+	exit(0);
+}
+
+int main(int ac, char **argv)
+{
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	clear_stats(map_fd[2]);
+
+	int key = 1;
+	struct globals g = {};
+	if (bpf_update_elem(map_fd[1], &key, &g) != 0) {
+		printf("bug\n");
+		return 1;
+	}
+	signal(SIGINT, int_exit);
+
+	if (fork() == 0) {
+		read_trace_pipe();
+	} else {
+		printf("waiting for events to determine average latency...\n");
+		for (;;) {
+			bpf_lookup_elem(map_fd[1], &key, &g);
+			if (g.lat_ave)
+				break;
+			sleep(1);
+		}
+
+		printf("  IO latency in usec\n"
+		       "  %s %s - many events with this latency\n"
+		       "  %s %s - few events\n",
+		       color[num_colors - 1], nocolor,
+		       color[0], nocolor);
+		print_banner(g.lat_ave * 2);
+		for (;;) {
+			print_hist(map_fd[2]);
+			sleep(2);
+		}
+	}
+
+	return 0;
+}
-- 
1.7.9.5



* Re: [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm
  2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
                   ` (28 preceding siblings ...)
  2014-08-24 20:21 ` [PATCH v5 net-next 29/29] samples: bpf: IO latency analysis (iosnoop/heatmap) Alexei Starovoitov
@ 2014-08-25  0:39 ` David Miller
  29 siblings, 0 replies; 33+ messages in thread
From: David Miller @ 2014-08-25  0:39 UTC (permalink / raw)
  To: ast
  Cc: mingo, torvalds, luto, rostedt, dborkman, chema, edumazet,
	a.p.zijlstra, brendan.d.gregg, namhyung, hpa, akpm, keescook,
	linux-api, netdev, linux-kernel

From: Alexei Starovoitov <ast@plumgrid.com>
Date: Sun, 24 Aug 2014 13:21:01 -0700

> enough RFCs, let's finalize it...

Please break this down into smaller, easier to review, sets
of changes.

Asking people to review nearly 30 patches at once isn't reasonable.

Shoot for something like about 10 at a time, at most.

Thank you.


* Re: [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps
  2014-08-24 20:21 ` [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps Alexei Starovoitov
@ 2014-08-25 21:33   ` Cong Wang
  2014-08-25 22:07     ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Cong Wang @ 2014-08-25 21:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski,
	Steven Rostedt, Daniel Borkmann, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Brendan Gregg, Namhyung Kim, H. Peter Anvin,
	Andrew Morton, Kees Cook, linux-api, netdev, linux-kernel

On Sun, Aug 24, 2014 at 1:21 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> 'maps' is a generic storage of different types for sharing data between kernel
> and userspace.
>
> The maps are accessed from user space via BPF syscall, which has commands:
>
> - create a map with given type and attributes
>   fd = bpf_map_create(map_type, struct nlattr *attr, int len)
>   returns fd or negative error
>
> - lookup key in a given map referenced by fd
>   err = bpf_map_lookup_elem(int fd, void *key, void *value)
>   returns zero and stores found elem into value or negative error
>
> - create or update key/value pair in a given map
>   err = bpf_map_update_elem(int fd, void *key, void *value)
>   returns zero or negative error
>
> - find and delete element by key in a given map
>   err = bpf_map_delete_elem(int fd, void *key)
>
> - iterate map elements (based on input key return next_key)
>   err = bpf_map_get_next_key(int fd, void *key, void *next_key)


I think you need to document the bpf() syscall itself rather than the
wrappers around it, from a developer's point of view. As a general rule,
you will need to document a new syscall with a man page anyway.

In the changelog I mean something like:

err = bpf(BPF_MAP_LOOKUP_ELEM, ...);


* Re: [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps
  2014-08-25 21:33   ` Cong Wang
@ 2014-08-25 22:07     ` Alexei Starovoitov
  0 siblings, 0 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2014-08-25 22:07 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Ingo Molnar, Linus Torvalds, Andy Lutomirski,
	Steven Rostedt, Daniel Borkmann, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Brendan Gregg, Namhyung Kim, H. Peter Anvin,
	Andrew Morton, Kees Cook, Linux API, netdev, linux-kernel

On Mon, Aug 25, 2014 at 2:33 PM, Cong Wang <cwang@twopensource.com> wrote:
> On Sun, Aug 24, 2014 at 1:21 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> 'maps' is a generic storage of different types for sharing data between kernel
>> and userspace.
>>
>> The maps are accessed from user space via BPF syscall, which has commands:
>>
>> - create a map with given type and attributes
>>   fd = bpf_map_create(map_type, struct nlattr *attr, int len)
>>   returns fd or negative error
>>
>> - lookup key in a given map referenced by fd
>>   err = bpf_map_lookup_elem(int fd, void *key, void *value)
>>   returns zero and stores found elem into value or negative error
>>
>> - create or update key/value pair in a given map
>>   err = bpf_map_update_elem(int fd, void *key, void *value)
>>   returns zero or negative error
>>
>> - find and delete element by key in a given map
>>   err = bpf_map_delete_elem(int fd, void *key)
>>
>> - iterate map elements (based on input key return next_key)
>>   err = bpf_map_get_next_key(int fd, void *key, void *next_key)
>
>
> I think you need to document the bpf() syscall instead of wrappers on it,
> from a developer's point of view. You will anyway need to document a new
> syscall with a man page as a general rule.

yep. I've mentioned before that the man page is on the todo list. I'm
delaying writing it, because it's the most difficult part and I don't want
to keep rewriting it when the interface changes (like it did from global id
to fd). Once the implementation lands, the man page will be the highest
priority.

> In the changelog I mean something like:
>
> err = bpf(BPF_MAP_LOOKUP_ELEM, ...);

Are you saying instead of:
err = bpf_map_lookup_elem(int fd, void *key, void *value)
write
err = bpf(BPF_MAP_LOOKUP_ELEM, fd, key, value)
in the commit log?
I think that style carries less information per line.

For the man page I'll document the syscall in the traditional way, but
for commit logs I like to have maximum info in the fewest lines.
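
For what it's worth, the two spellings describe the same call. A minimal
sketch of such a wrapper (assuming the positional-argument form of bpf()
proposed in this series and that __NR_bpf is available from the installed
kernel headers; an illustration, not the final API):

	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/bpf.h>

	/* thin wrapper: hides the syscall number and argument order */
	static int bpf_map_lookup_elem(int fd, void *key, void *value)
	{
		return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, fd, key, value);
	}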


Thread overview: 33+ messages
2014-08-24 20:21 [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 01/29] bpf: x86: add missing 'shift by register' instructions to x64 eBPF JIT Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 02/29] net: filter: add "load 64-bit immediate" eBPF instruction Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 03/29] net: filter: split filter.h and expose eBPF to user space Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 04/29] bpf: introduce syscall(BPF, ...) and BPF maps Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 05/29] bpf: enable bpf syscall on x64 and i386 Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 06/29] bpf: add lookup/update/delete/iterate methods to BPF maps Alexei Starovoitov
2014-08-25 21:33   ` Cong Wang
2014-08-25 22:07     ` Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 07/29] bpf: add hashtable type of " Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 08/29] bpf: expand BPF syscall with program load/unload Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 09/29] bpf: handle pseudo BPF_CALL insn Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 10/29] bpf: verifier (add docs) Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 11/29] bpf: verifier (add ability to receive verification log) Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 12/29] bpf: handle pseudo BPF_LD_IMM64 insn Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 13/29] bpf: verifier (add branch/goto checks) Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 14/29] bpf: verifier (add verifier core) Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 15/29] bpf: verifier (add state prunning optimization) Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 16/29] bpf: allow eBPF programs to use maps Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 17/29] bpf: split eBPF out of NET Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 18/29] tracing: allow eBPF programs to be attached to events Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 19/29] tracing: allow eBPF programs call printk() Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 20/29] tracing: allow eBPF programs to be attached to kprobe/kretprobe Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 21/29] tracing: allow eBPF programs to call ktime_get_ns() and get_current() Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 22/29] samples: bpf: add mini eBPF library to manipulate maps and programs Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 23/29] samples: bpf: example of tracing filters with eBPF Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 24/29] bpf: verifier test Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 25/29] bpf: llvm backend Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 26/29] samples: bpf: elf file loader Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 27/29] samples: bpf: eBPF example in C Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 28/29] samples: bpf: counting " Alexei Starovoitov
2014-08-24 20:21 ` [PATCH v5 net-next 29/29] samples: bpf: IO latency analysis (iosnoop/heatmap) Alexei Starovoitov
2014-08-25  0:39 ` [PATCH v5 net-next 00/29] BPF syscall, maps, verifier, samples, llvm David Miller
