* [PATCH v0] bpf: BPF based latency tracing
From: Daniel Wagner @ 2015-06-18 11:40 UTC
  To: Alexei Starovoitov; +Cc: linux-kernel, Daniel Wagner

BPF offers another way to generate latency histograms. We attach
kprobes at trace_preempt_off and trace_preempt_on and measure the
time between seeing the off and on transitions.

The first array stores the start time stamp, keyed by CPU id. The
second array stores the log2 of the time difference, i.e. the
histogram bucket; a 3 us preempt-off section, for example, lands in
bucket log2(3000) = 11, the 2048 -> 4095 row below. We need static
allocation here (arrays, not hash tables): the kprobes hooking into
trace_preempt_on|off must not call into any dynamic memory allocation
or free path, since we have to avoid getting called recursively.
Besides that, static allocation reduces jitter in the measurement.

CPU 0
      latency        : count     distribution
       1 -> 1        : 0        |                                        |
       2 -> 3        : 0        |                                        |
       4 -> 7        : 0        |                                        |
       8 -> 15       : 0        |                                        |
      16 -> 31       : 0        |                                        |
      32 -> 63       : 0        |                                        |
      64 -> 127      : 0        |                                        |
     128 -> 255      : 0        |                                        |
     256 -> 511      : 0        |                                        |
     512 -> 1023     : 0        |                                        |
    1024 -> 2047     : 0        |                                        |
    2048 -> 4095     : 166723   |*************************************** |
    4096 -> 8191     : 19870    |***                                     |
    8192 -> 16383    : 6324     |                                        |
   16384 -> 32767    : 1098     |                                        |
   32768 -> 65535    : 190      |                                        |
   65536 -> 131071   : 179      |                                        |
  131072 -> 262143   : 18       |                                        |
  262144 -> 524287   : 4        |                                        |
  524288 -> 1048575  : 1363     |                                        |
CPU 1
      latency        : count     distribution
       1 -> 1        : 0        |                                        |
       2 -> 3        : 0        |                                        |
       4 -> 7        : 0        |                                        |
       8 -> 15       : 0        |                                        |
      16 -> 31       : 0        |                                        |
      32 -> 63       : 0        |                                        |
      64 -> 127      : 0        |                                        |
     128 -> 255      : 0        |                                        |
     256 -> 511      : 0        |                                        |
     512 -> 1023     : 0        |                                        |
    1024 -> 2047     : 0        |                                        |
    2048 -> 4095     : 114042   |*************************************** |
    4096 -> 8191     : 9587     |**                                      |
    8192 -> 16383    : 4140     |                                        |
   16384 -> 32767    : 673      |                                        |
   32768 -> 65535    : 179      |                                        |
   65536 -> 131071   : 29       |                                        |
  131072 -> 262143   : 4        |                                        |
  262144 -> 524287   : 1        |                                        |
  524288 -> 1048575  : 364      |                                        |
CPU 2
      latency        : count     distribution
       1 -> 1        : 0        |                                        |
       2 -> 3        : 0        |                                        |
       4 -> 7        : 0        |                                        |
       8 -> 15       : 0        |                                        |
      16 -> 31       : 0        |                                        |
      32 -> 63       : 0        |                                        |
      64 -> 127      : 0        |                                        |
     128 -> 255      : 0        |                                        |
     256 -> 511      : 0        |                                        |
     512 -> 1023     : 0        |                                        |
    1024 -> 2047     : 0        |                                        |
    2048 -> 4095     : 40147    |*************************************** |
    4096 -> 8191     : 2300     |*                                       |
    8192 -> 16383    : 828      |                                        |
   16384 -> 32767    : 178      |                                        |
   32768 -> 65535    : 59       |                                        |
   65536 -> 131071   : 2        |                                        |
  131072 -> 262143   : 0        |                                        |
  262144 -> 524287   : 1        |                                        |
  524288 -> 1048575  : 174      |                                        |
CPU 3
      latency        : count     distribution
       1 -> 1        : 0        |                                        |
       2 -> 3        : 0        |                                        |
       4 -> 7        : 0        |                                        |
       8 -> 15       : 0        |                                        |
      16 -> 31       : 0        |                                        |
      32 -> 63       : 0        |                                        |
      64 -> 127      : 0        |                                        |
     128 -> 255      : 0        |                                        |
     256 -> 511      : 0        |                                        |
     512 -> 1023     : 0        |                                        |
    1024 -> 2047     : 0        |                                        |
    2048 -> 4095     : 29626    |*************************************** |
    4096 -> 8191     : 2704     |**                                      |
    8192 -> 16383    : 1090     |                                        |
   16384 -> 32767    : 160      |                                        |
   32768 -> 65535    : 72       |                                        |
   65536 -> 131071   : 32       |                                        |
  131072 -> 262143   : 26       |                                        |
  262144 -> 524287   : 12       |                                        |
  524288 -> 1048575  : 298      |                                        |

All this is based on the tracex3 example written by
Alexei Starovoitov <ast@plumgrid.com>.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
Hi Alexei,

This version is working nicely for me. No hiccups so far.
There are a couple of dependencies which are still missing
from my tree, but they are already on their way to mainline:

 - rcu_is_watching fix
 - bpf_get_smp_processor_id() export (including bpf_helpers.h)

I am not sure it is really worth spending more time on getting
the hash tables working for the trace_preempt_[on|off] kprobes.
There are so many things that could go wrong, so going with
a static version seems to me the right choice.

cheers,
daniel

changes in v0:
	- renamed to lathist since there is no direct hw latency involved
	- use arrays instead of hash tables

 samples/bpf/Makefile       |   4 ++
 samples/bpf/lathist_kern.c | 107 +++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/lathist_user.c | 109 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 220 insertions(+)
 create mode 100644 samples/bpf/lathist_kern.c
 create mode 100644 samples/bpf/lathist_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f4e0bb2..ba02b6f 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -10,6 +10,7 @@ hostprogs-y += tracex1
 hostprogs-y += tracex2
 hostprogs-y += tracex3
 hostprogs-y += tracex4
+hostprogs-y += lathist
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -20,6 +21,7 @@ tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
 tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
 tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
 tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
+lathist-objs := bpf_load.o libbpf.o lathist_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -30,6 +32,7 @@ always += tracex2_kern.o
 always += tracex3_kern.o
 always += tracex4_kern.o
 always += tcbpf1_kern.o
+always += lathist_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -40,6 +43,7 @@ HOSTLOADLIBES_tracex1 += -lelf
 HOSTLOADLIBES_tracex2 += -lelf
 HOSTLOADLIBES_tracex3 += -lelf
 HOSTLOADLIBES_tracex4 += -lelf -lrt
+HOSTLOADLIBES_lathist += -lelf
 
 # point this to your LLVM backend with bpf support
 LLC = /opt/llvm/bin/llc
diff --git a/samples/bpf/lathist_kern.c b/samples/bpf/lathist_kern.c
new file mode 100644
index 0000000..18fa088
--- /dev/null
+++ b/samples/bpf/lathist_kern.c
@@ -0,0 +1,107 @@
+/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
+ * Copyright (c) 2015 BMW Car IT GmbH
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/version.h>
+#include <linux/ptrace.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+#define MAX_ENTRIES	20
+#define MAX_CPU		4
+
+/* We need to stick to statically allocated memory (an array instead
+ * of a hash table) because managing dynamic memory from the
+ * trace_preempt_[on|off] tracepoint hooks is not supported.
+ */
+
+struct bpf_map_def SEC("maps") my_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(u64),
+	.max_entries = MAX_CPU,
+};
+
+SEC("kprobe/trace_preempt_off")
+int bpf_prog1(struct pt_regs *ctx)
+{
+	int cpu = bpf_get_smp_processor_id();
+	u64 *ts = bpf_map_lookup_elem(&my_map, &cpu);
+
+	if (ts)
+		*ts = bpf_ktime_get_ns();
+
+	return 0;
+}
+
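+/* Branchless floor(log2(v)): each step tests the upper half of the
+ * remaining bit range and accumulates the shift amounts in r.
+ */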
+static unsigned int log2(unsigned int v)
+{
+	unsigned int r;
+	unsigned int shift;
+
+	r = (v > 0xFFFF) << 4; v >>= r;
+	shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
+	shift = (v > 0xF) << 2; v >>= shift; r |= shift;
+	shift = (v > 0x3) << 1; v >>= shift; r |= shift;
+	r |= (v >> 1);
+
+	return r;
+}
+
+static unsigned int log2l(unsigned long v)
+{
+	unsigned int hi = v >> 32;
+
+	if (hi)
+		return log2(hi) + 32;
+	else
+		return log2(v);
+}
+
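+/* Flat histogram storage: one slot per (CPU, bucket) pair, indexed
+ * as cpu * MAX_ENTRIES + bucket. User space reads the same layout.
+ */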
+struct bpf_map_def SEC("maps") my_lat = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(long),
+	.max_entries = MAX_CPU * MAX_ENTRIES,
+};
+
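+/* Convert the ns delta since the matching trace_preempt_off into a
+ * log2 bucket (clamped to MAX_ENTRIES - 1) and bump its counter.
+ */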
+SEC("kprobe/trace_preempt_on")
+int bpf_prog2(struct pt_regs *ctx)
+{
+	u64 *ts, cur_ts, delta;
+	int key, cpu;
+	long *val;
+
+	cpu = bpf_get_smp_processor_id();
+	ts = bpf_map_lookup_elem(&my_map, &cpu);
+	if (!ts)
+		return 0;
+
+	cur_ts = bpf_ktime_get_ns();
+	delta = log2l(cur_ts - *ts);
+
+	if (delta > MAX_ENTRIES - 1)
+		delta = MAX_ENTRIES - 1;
+
+	key = cpu * MAX_ENTRIES + delta;
+	val = bpf_map_lookup_elem(&my_lat, &key);
+	if (val)
+		__sync_fetch_and_add(val, 1);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/lathist_user.c b/samples/bpf/lathist_user.c
new file mode 100644
index 0000000..65da8c1
--- /dev/null
+++ b/samples/bpf/lathist_user.c
@@ -0,0 +1,109 @@
+/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
+ * Copyright (c) 2015 BMW Car IT GmbH
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <linux/bpf.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+#define MAX_ENTRIES	20
+#define MAX_CPU		4
+#define MAX_STARS	40
+
+struct cpu_hist {
+	long data[MAX_ENTRIES];
+	long max;
+};
+
+static struct cpu_hist cpu_hist[MAX_CPU];
+
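+/* Render val as a bar of '*' scaled to width columns relative to
+ * max; a '+' in the last column would flag a value exceeding max.
+ */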
+static void stars(char *str, long val, long max, int width)
+{
+	int i;
+
+	for (i = 0; i < (width * val / max) - 1 && i < width - 1; i++)
+		str[i] = '*';
+	if (val > max)
+		str[i - 1] = '+';
+	str[i] = '\0';
+}
+
+static void print_hist(void)
+{
+	char starstr[MAX_STARS];
+	struct cpu_hist *hist;
+	int i, j;
+
+	/* clear screen */
+	printf("\033[2J");
+
+	for (j = 0; j < MAX_CPU; j++) {
+		hist = &cpu_hist[j];
+
+		/* ignore CPUs without data (maybe offline?) */
+		if (hist->max == 0)
+			continue;
+
+		printf("CPU %d\n", j);
+		printf("      latency        : count     distribution\n");
+		for (i = 1; i <= MAX_ENTRIES; i++) {
+			stars(starstr, hist->data[i - 1], hist->max, MAX_STARS);
+			printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
+				(1l << i) >> 1, (1l << i) - 1,
+				hist->data[i - 1], MAX_STARS, starstr);
+		}
+	}
+}
+
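+/* Pull the current counters out of the BPF map and track the
+ * per-CPU maximum, which print_hist() uses to scale the bars.
+ */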
+static void get_data(int fd)
+{
+	long key, value;
+	int c, i;
+
+	for (i = 0; i < MAX_CPU; i++)
+		cpu_hist[i].max = 0;
+
+	for (c = 0; c < MAX_CPU; c++) {
+		for (i = 0; i < MAX_ENTRIES; i++) {
+			key = c * MAX_ENTRIES + i;
+			bpf_lookup_elem(fd, &key, &value);
+
+			cpu_hist[c].data[i] = value;
+			if (value > cpu_hist[c].max)
+				cpu_hist[c].max = value;
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	while (1) {
+		get_data(map_fd[1]);
+		print_hist();
+		sleep(5);
+	}
+
+	return 0;
+}
-- 
2.1.0



* Re: [PATCH v0] bpf: BPF based latency tracing
From: Alexei Starovoitov @ 2015-06-18 17:06 UTC
  To: Daniel Wagner; +Cc: linux-kernel

On 6/18/15 4:40 AM, Daniel Wagner wrote:
> BPF offers another way to generate latency histograms. We attach
> kprobes at trace_preempt_off and trace_preempt_on and measure the
> time between seeing the off and on transitions.
>
> The first array stores the start time stamp, keyed by CPU id. The
> second array stores the log2 of the time difference, i.e. the
> histogram bucket; a 3 us preempt-off section, for example, lands in
> bucket log2(3000) = 11, the 2048 -> 4095 row below. We need static
> allocation here (arrays, not hash tables): the kprobes hooking into
> trace_preempt_on|off must not call into any dynamic memory allocation
> or free path, since we have to avoid getting called recursively.
> Besides that, static allocation reduces jitter in the measurement.
>
> CPU 0
>        latency        : count     distribution
>         1 -> 1        : 0        |                                        |
>         2 -> 3        : 0        |                                        |
>         4 -> 7        : 0        |                                        |
>         8 -> 15       : 0        |                                        |
>        16 -> 31       : 0        |                                        |
>        32 -> 63       : 0        |                                        |
>        64 -> 127      : 0        |                                        |
>       128 -> 255      : 0        |                                        |
>       256 -> 511      : 0        |                                        |
>       512 -> 1023     : 0        |                                        |
>      1024 -> 2047     : 0        |                                        |
>      2048 -> 4095     : 166723   |*************************************** |
>      4096 -> 8191     : 19870    |***                                     |
>      8192 -> 16383    : 6324     |                                        |
>     16384 -> 32767    : 1098     |                                        |

nice useful sample indeed!
The numbers are non-JITed, right?
JIT should reduce the measurement cost 2-3x, but the preempt_on/off
latency will probably stay in the 2k range.

> I am not sure it is really worth spending more time on getting
> the hash tables working for the trace_preempt_[on|off] kprobes.
> There are so many things that could go wrong, so going with
> a static version seems to me the right choice.

agree. for this use case arrays are the better choice anyway.
But I'll keep working on getting hash tables to work even
under these extreme conditions. bpf should always be rock solid.

I'm only a bit suspicious of kprobes, since we have:
NOKPROBE_SYMBOL(preempt_count_sub)
but trace_preempt_on(), called by preempt_count_sub(),
doesn't have this mark...

> +SEC("kprobe/trace_preempt_off")
> +int bpf_prog1(struct pt_regs *ctx)
> +{
> +	int cpu = bpf_get_smp_processor_id();
> +	u64 *ts = bpf_map_lookup_elem(&my_map, &cpu);
> +
> +	if (ts)
> +		*ts = bpf_ktime_get_ns();

btw, I'm planning to add native per-cpu maps which will
speed up things more and reduce measurement overhead.
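
Something like this, as a sketch only (the map type doesn't exist
yet, so BPF_MAP_TYPE_PERCPU_ARRAY is a made-up name here). Each CPU
would transparently get its own copy of the value, so the CPU id no
longer needs to be the key and there is no cache line bouncing
between CPUs:

struct bpf_map_def SEC("maps") my_map = {
	.type = BPF_MAP_TYPE_PERCPU_ARRAY,	/* hypothetical */
	.key_size = sizeof(int),
	.value_size = sizeof(u64),
	.max_entries = 1,
};

SEC("kprobe/trace_preempt_off")
int bpf_prog1(struct pt_regs *ctx)
{
	int key = 0;
	/* would return a pointer to this CPU's private slot 0 */
	u64 *ts = bpf_map_lookup_elem(&my_map, &key);

	if (ts)
		*ts = bpf_ktime_get_ns();
	return 0;
}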

I think you can retarget this patch to net-next and send
it to netdev. It's not too late for this merge window.



* Re: [PATCH v0] bpf: BPF based latency tracing
From: Daniel Wagner @ 2015-06-19  6:07 UTC
  To: Alexei Starovoitov; +Cc: linux-kernel

On 06/18/2015 07:06 PM, Alexei Starovoitov wrote:
> On 6/18/15 4:40 AM, Daniel Wagner wrote:
>> BPF offers another way to generate latency histograms. We attach
>> kprobes at trace_preempt_off and trace_preempt_on and measure the
>> time between seeing the off and on transitions.
>>
>> The first array stores the start time stamp, keyed by CPU id. The
>> second array stores the log2 of the time difference, i.e. the
>> histogram bucket; a 3 us preempt-off section, for example, lands in
>> bucket log2(3000) = 11, the 2048 -> 4095 row below. We need static
>> allocation here (arrays, not hash tables): the kprobes hooking into
>> trace_preempt_on|off must not call into any dynamic memory allocation
>> or free path, since we have to avoid getting called recursively.
>> Besides that, static allocation reduces jitter in the measurement.
>>
>> CPU 0
>>        latency        : count     distribution
>>         1 -> 1        : 0        |                                        |
>>         2 -> 3        : 0        |                                        |
>>         4 -> 7        : 0        |                                        |
>>         8 -> 15       : 0        |                                        |
>>        16 -> 31       : 0        |                                        |
>>        32 -> 63       : 0        |                                        |
>>        64 -> 127      : 0        |                                        |
>>       128 -> 255      : 0        |                                        |
>>       256 -> 511      : 0        |                                        |
>>       512 -> 1023     : 0        |                                        |
>>      1024 -> 2047     : 0        |                                        |
>>      2048 -> 4095     : 166723   |*************************************** |
>>      4096 -> 8191     : 19870    |***                                     |
>>      8192 -> 16383    : 6324     |                                        |
>>     16384 -> 32767    : 1098     |                                        |
> 
> nice useful sample indeed!
> The numbers are non-JITed, right?

You are right. I forgot to enable the JIT compiler. 

> JIT should reduce the measurement cost 2-3x, but the preempt_on/off
> latency will probably stay in the 2k range.

Here is an example output with JIT enabled after a few seconds.

      latency        : count     distribution
       1 -> 1        : 0        |                                        |
       2 -> 3        : 0        |                                        |
       4 -> 7        : 0        |                                        |
       8 -> 15       : 0        |                                        |
      16 -> 31       : 0        |                                        |
      32 -> 63       : 0        |                                        |
      64 -> 127      : 0        |                                        |
     128 -> 255      : 0        |                                        |
     256 -> 511      : 0        |                                        |
     512 -> 1023     : 0        |                                        |
    1024 -> 2047     : 0        |                                        |
    2048 -> 4095     : 11595    |*************************************** |
    4096 -> 8191     : 3086     |*********                               |
    8192 -> 16383    : 392      |                                        |
   16384 -> 32767    : 114      |                                        |
   32768 -> 65535    : 16       |                                        |
   65536 -> 131071   : 1        |                                        |
  131072 -> 262143   : 0        |                                        |
  262144 -> 524287   : 0        |                                        |
  524288 -> 1048575  : 34       |                                        |

As you predicted, it won't go below the 2k range.

>> I am not sure it is really worth spending more time on getting
>> the hash tables working for the trace_preempt_[on|off] kprobes.
>> There are so many things that could go wrong, so going with
>> a static version seems to me the right choice.
> 
> agree. for this use case arrays are the better choice anyway.
> But I'll keep working on getting hash tables to work even
> under these extreme conditions. bpf should always be rock solid.
> 
> I'm only a bit suspicious of kprobes, since we have:
> NOKPROBE_SYMBOL(preempt_count_sub)
> but trace_preempt_on(), called by preempt_count_sub(),
> doesn't have this mark...

The original commit indicates that anything called from
preempt_disable() should also be marked as NOKPROBE_SYMBOL:

	commit 43627582799db317e966ecb0002c2c3c9805ec0f
	Author:	Srinivasa Ds <srinivasa@in.ibm.com>  Sun Feb 24 00:24:04 2008
	Committer:	Linus Torvalds <torvalds@woody.linux-foundation.org>  Sun Feb 24 02:13:24 2008
	Original File:	kernel/sched.c

	kprobes: refuse kprobe insertion on add/sub_preempt_counter()

	Kprobes makes use of preempt_disable(),preempt_enable_noresched() and these
	functions inturn call add/sub_preempt_count().  So we need to refuse user from
	inserting probe in to these functions.

	This patch disallows user from probing add/sub_preempt_count().


Obviously, this would render this patch useless. 

>> +SEC("kprobe/trace_preempt_off")

BTW, is there a reason for not supporting built-in
tracepoints/events? It looks like it is only an artificial
limitation of bpf_helpers.

>> +int bpf_prog1(struct pt_regs *ctx)
>> +{
>> +    int cpu = bpf_get_smp_processor_id();
>> +    u64 *ts = bpf_map_lookup_elem(&my_map, &cpu);
>> +
>> +    if (ts)
>> +        *ts = bpf_ktime_get_ns();
> 
> btw, I'm planning to add native per-cpu maps which will
> speed up things more and reduce measurement overhead.

Funny I was about to suggest something like this :)

> I think you can retarget this patch to net-next and send
> it to netdev. It's not too late for this merge window.

I'll rebase it to net-next.

cheers,
daniel



* Re: [PATCH v0] bpf: BPF based latency tracing
From: Alexei Starovoitov @ 2015-06-19  7:06 UTC
  To: Daniel Wagner; +Cc: linux-kernel

On 6/18/15 11:07 PM, Daniel Wagner wrote:
>> I'm only a bit suspicious of kprobes, since we have:
>> NOKPROBE_SYMBOL(preempt_count_sub)
>> but trace_preempt_on(), called by preempt_count_sub(),
>> doesn't have this mark...
> The original commit indicates that anything called from
> preempt_disable() should also be marked as NOKPROBE_SYMBOL:
>
> 	commit 43627582799db317e966ecb0002c2c3c9805ec0f
> 	Author:	Srinivasa Ds <srinivasa@in.ibm.com>  Sun Feb 24 00:24:04 2008
> 	Committer:	Linus Torvalds <torvalds@woody.linux-foundation.org>  Sun Feb 24 02:13:24 2008
> 	Original File:	kernel/sched.c
>
> 	kprobes: refuse kprobe insertion on add/sub_preempt_counter()
...
> Obviously, this would render this patch useless.

well, I've tracked it down to that commit as well, but I couldn't find
any discussion about kprobe crashes that led to that patch.
kprobes have their own mechanism to prevent recursion.

>>> +SEC("kprobe/trace_preempt_off")
> BTW, is there a reason why not supporting build-in
> tracepoints/events? It looks like it is only an artificial
> limitation of bpf_helpers.

The original bpf+tracing patch attached programs to both
tracepoints and kprobes, but there was a concern that it
promotes tracepoint arguments to a stable ABI, since tracepoints
in general are considered stable by most maintainers.
So we decided to go with bpf+kprobe for now: kprobes
are unstable, so no one can complain that scripts suddenly
break because a probed function disappears or its arguments change.
Since then we've discussed attaching to the trace marker, debug
tracepoints and other things. So hopefully soon it will be ready.

>>> +int bpf_prog1(struct pt_regs *ctx)
>>> +{
>>> +    int cpu = bpf_get_smp_processor_id();
>>> +    u64 *ts = bpf_map_lookup_elem(&my_map, &cpu);
>>> +
>>> +    if (ts)
>>> +        *ts = bpf_ktime_get_ns();
>>
>> btw, I'm planning to add native per-cpu maps which will
>> speed up things more and reduce measurement overhead.
> Funny I was about to suggest something like this :)
>
>> I think you can retarget this patch to net-next and send
>> it to netdev. It's not too late for this merge window.
> I'll rebase it to net-next.

Great :)


