All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 bpf-next 00/10] bpf, tracing: introduce bpf raw tracepoints
@ 2018-03-28  2:10 ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

v6->v7:
- adopted Steven's bpf_raw_tp_map section approach to find tracepoint
  and corresponding bpf probe function instead of kallsyms approach.
  dropped kernel_tracepoint_find_by_name() patch

v5->v6:
- avoid changing semantics of for_each_kernel_tracepoint() function, instead
  introduce kernel_tracepoint_find_by_name() helper

v4->v5:
- adopted Daniel's fancy REPEAT macro in bpf_trace.c in patch 7
  
v3->v4:
- adopted Linus's CAST_TO_U64 macro to cast any integer, pointer, or small
  struct to u64. That nicely reduced the size of patch 1

v2->v3:
- with Linus's suggestion introduced generic COUNT_ARGS and CONCATENATE macros
  (or rather moved them from apparmor)
  that cleaned up patches 6 and 7
- added patch 4 to refactor trace_iwlwifi_dev_ucode_error() from 17 args to 4
  Now any tracepoint with >12 args will have build error

v1->v2:
- simplified api by combing bpf_raw_tp_open(name) + bpf_attach(prog_fd) into
  bpf_raw_tp_open(name, prog_fd) as suggested by Daniel.
  That simplifies bpf_detach as well which is now simple close() of fd.
- fixed memory leak in error path which was spotted by Daniel.
- fixed bpf_get_stackid(), bpf_perf_event_output() called from raw tracepoints
- added more tests
- fixed allyesconfig build caught by buildbot

v1:
This patch set is a different way to address the pressing need to access
task_struct pointers in sched tracepoints from bpf programs.

The first approach simply added these pointers to sched tracepoints:
https://lkml.org/lkml/2017/12/14/753
which Peter nacked.
Few options were discussed and eventually the discussion converged on
doing bpf specific tracepoint_probe_register() probe functions.
Details here:
https://lkml.org/lkml/2017/12/20/929

Patch 1 is kernel wide cleanup of pass-struct-by-value into
pass-struct-by-reference into tracepoints.

Patches 2 and 3 are minor cleanups to address allyesconfig build

Patch 4 refactor trace_iwlwifi_dev_ucode_error from 17 to 4 args

Patch 5 introduces COUNT_ARGS macro

Patch 6 minor prep work to expose number of arguments passed
into tracepoints.

Patch 7 introduces BPF_RAW_TRACEPOINT api.
the auto-cleanup and multiple concurrent users are must have
features of tracing api. For bpf raw tracepoints it looks like:
  // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
  prog_fd = bpf_prog_load(...);

  // receive anon_inode fd for given bpf_raw_tracepoint
  // and attach bpf program to it
  raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool will automatically
detach bpf program, unload it and unregister tracepoint probe.
More details in patch 7.

Patch 8 - trivial support in libbpf
Patches 9, 10 - user space tests

samples/bpf/test_overhead performance on 1 cpu:

tracepoint    base  kprobe+bpf tracepoint+bpf raw_tracepoint+bpf
task_rename   1.1M   769K        947K            1.0M
urandom_read  789K   697K        750K            755K

Alexei Starovoitov (10):
  treewide: remove large struct-pass-by-value from tracepoint arguments
  net/mediatek: disambiguate mt76 vs mt7601u trace events
  net/mac802154: disambiguate mac80215 vs mac802154 trace events
  net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
  macro: introduce COUNT_ARGS() macro
  tracepoint: compute num_args at build time
  bpf: introduce BPF_RAW_TRACEPOINT
  libbpf: add bpf_raw_tracepoint_open helper
  samples/bpf: raw tracepoint test
  selftests/bpf: test for bpf_get_stackid() from raw tracepoints

 drivers/infiniband/hw/hfi1/file_ops.c              |   2 +-
 drivers/infiniband/hw/hfi1/trace_ctxts.h           |  12 +-
 drivers/net/wireless/intel/iwlwifi/dvm/main.c      |   7 +-
 .../wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h  |  39 ++---
 drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c  |   1 +
 drivers/net/wireless/intel/iwlwifi/mvm/utils.c     |   7 +-
 drivers/net/wireless/mediatek/mt7601u/trace.h      |   6 +-
 include/asm-generic/vmlinux.lds.h                  |  10 ++
 include/linux/bpf_types.h                          |   1 +
 include/linux/kernel.h                             |   7 +
 include/linux/trace_events.h                       |  42 +++++
 include/linux/tracepoint-defs.h                    |   6 +
 include/linux/tracepoint.h                         |  12 +-
 include/trace/bpf_probe.h                          |  91 ++++++++++
 include/trace/define_trace.h                       |  15 +-
 include/trace/events/f2fs.h                        |   2 +-
 include/uapi/linux/bpf.h                           |  11 ++
 kernel/bpf/syscall.c                               |  78 +++++++++
 kernel/trace/bpf_trace.c                           | 183 +++++++++++++++++++++
 net/mac802154/trace.h                              |   8 +-
 net/wireless/trace.h                               |   2 +-
 samples/bpf/Makefile                               |   1 +
 samples/bpf/bpf_load.c                             |  14 ++
 samples/bpf/test_overhead_raw_tp_kern.c            |  17 ++
 samples/bpf/test_overhead_user.c                   |  12 ++
 security/apparmor/include/path.h                   |   7 +-
 sound/firewire/amdtp-stream-trace.h                |   2 +-
 tools/include/uapi/linux/bpf.h                     |  11 ++
 tools/lib/bpf/bpf.c                                |  11 ++
 tools/lib/bpf/bpf.h                                |   1 +
 tools/testing/selftests/bpf/test_progs.c           |  91 +++++++---
 31 files changed, 619 insertions(+), 90 deletions(-)
 create mode 100644 include/trace/bpf_probe.h
 create mode 100644 samples/bpf/test_overhead_raw_tp_kern.c

-- 
2.9.5

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 00/10] bpf, tracing: introduce bpf raw tracepoints
@ 2018-03-28  2:10 ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

v6->v7:
- adopted Steven's bpf_raw_tp_map section approach to find tracepoint
  and corresponding bpf probe function instead of kallsyms approach.
  dropped kernel_tracepoint_find_by_name() patch

v5->v6:
- avoid changing semantics of for_each_kernel_tracepoint() function, instead
  introduce kernel_tracepoint_find_by_name() helper

v4->v5:
- adopted Daniel's fancy REPEAT macro in bpf_trace.c in patch 7
  
v3->v4:
- adopted Linus's CAST_TO_U64 macro to cast any integer, pointer, or small
  struct to u64. That nicely reduced the size of patch 1

v2->v3:
- with Linus's suggestion introduced generic COUNT_ARGS and CONCATENATE macros
  (or rather moved them from apparmor)
  that cleaned up patches 6 and 7
- added patch 4 to refactor trace_iwlwifi_dev_ucode_error() from 17 args to 4
  Now any tracepoint with >12 args will have build error

v1->v2:
- simplified api by combing bpf_raw_tp_open(name) + bpf_attach(prog_fd) into
  bpf_raw_tp_open(name, prog_fd) as suggested by Daniel.
  That simplifies bpf_detach as well which is now simple close() of fd.
- fixed memory leak in error path which was spotted by Daniel.
- fixed bpf_get_stackid(), bpf_perf_event_output() called from raw tracepoints
- added more tests
- fixed allyesconfig build caught by buildbot

v1:
This patch set is a different way to address the pressing need to access
task_struct pointers in sched tracepoints from bpf programs.

The first approach simply added these pointers to sched tracepoints:
https://lkml.org/lkml/2017/12/14/753
which Peter nacked.
Few options were discussed and eventually the discussion converged on
doing bpf specific tracepoint_probe_register() probe functions.
Details here:
https://lkml.org/lkml/2017/12/20/929

Patch 1 is kernel wide cleanup of pass-struct-by-value into
pass-struct-by-reference into tracepoints.

Patches 2 and 3 are minor cleanups to address allyesconfig build

Patch 4 refactor trace_iwlwifi_dev_ucode_error from 17 to 4 args

Patch 5 introduces COUNT_ARGS macro

Patch 6 minor prep work to expose number of arguments passed
into tracepoints.

Patch 7 introduces BPF_RAW_TRACEPOINT api.
the auto-cleanup and multiple concurrent users are must have
features of tracing api. For bpf raw tracepoints it looks like:
  // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
  prog_fd = bpf_prog_load(...);

  // receive anon_inode fd for given bpf_raw_tracepoint
  // and attach bpf program to it
  raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool will automatically
detach bpf program, unload it and unregister tracepoint probe.
More details in patch 7.

Patch 8 - trivial support in libbpf
Patches 9, 10 - user space tests

samples/bpf/test_overhead performance on 1 cpu:

tracepoint    base  kprobe+bpf tracepoint+bpf raw_tracepoint+bpf
task_rename   1.1M   769K        947K            1.0M
urandom_read  789K   697K        750K            755K

Alexei Starovoitov (10):
  treewide: remove large struct-pass-by-value from tracepoint arguments
  net/mediatek: disambiguate mt76 vs mt7601u trace events
  net/mac802154: disambiguate mac80215 vs mac802154 trace events
  net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
  macro: introduce COUNT_ARGS() macro
  tracepoint: compute num_args at build time
  bpf: introduce BPF_RAW_TRACEPOINT
  libbpf: add bpf_raw_tracepoint_open helper
  samples/bpf: raw tracepoint test
  selftests/bpf: test for bpf_get_stackid() from raw tracepoints

 drivers/infiniband/hw/hfi1/file_ops.c              |   2 +-
 drivers/infiniband/hw/hfi1/trace_ctxts.h           |  12 +-
 drivers/net/wireless/intel/iwlwifi/dvm/main.c      |   7 +-
 .../wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h  |  39 ++---
 drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c  |   1 +
 drivers/net/wireless/intel/iwlwifi/mvm/utils.c     |   7 +-
 drivers/net/wireless/mediatek/mt7601u/trace.h      |   6 +-
 include/asm-generic/vmlinux.lds.h                  |  10 ++
 include/linux/bpf_types.h                          |   1 +
 include/linux/kernel.h                             |   7 +
 include/linux/trace_events.h                       |  42 +++++
 include/linux/tracepoint-defs.h                    |   6 +
 include/linux/tracepoint.h                         |  12 +-
 include/trace/bpf_probe.h                          |  91 ++++++++++
 include/trace/define_trace.h                       |  15 +-
 include/trace/events/f2fs.h                        |   2 +-
 include/uapi/linux/bpf.h                           |  11 ++
 kernel/bpf/syscall.c                               |  78 +++++++++
 kernel/trace/bpf_trace.c                           | 183 +++++++++++++++++++++
 net/mac802154/trace.h                              |   8 +-
 net/wireless/trace.h                               |   2 +-
 samples/bpf/Makefile                               |   1 +
 samples/bpf/bpf_load.c                             |  14 ++
 samples/bpf/test_overhead_raw_tp_kern.c            |  17 ++
 samples/bpf/test_overhead_user.c                   |  12 ++
 security/apparmor/include/path.h                   |   7 +-
 sound/firewire/amdtp-stream-trace.h                |   2 +-
 tools/include/uapi/linux/bpf.h                     |  11 ++
 tools/lib/bpf/bpf.c                                |  11 ++
 tools/lib/bpf/bpf.h                                |   1 +
 tools/testing/selftests/bpf/test_progs.c           |  91 +++++++---
 31 files changed, 619 insertions(+), 90 deletions(-)
 create mode 100644 include/trace/bpf_probe.h
 create mode 100644 samples/bpf/test_overhead_raw_tp_kern.c

-- 
2.9.5

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 01/10] treewide: remove large struct-pass-by-value from tracepoint arguments
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:10   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

- fix trace_hfi1_ctxt_info() to pass large struct by reference instead of by value
- convert 'type array[]' tracepoint arguments into 'type *array',
  since compiler will warn that sizeof('type array[]') == sizeof('type *array')
  and later should be used instead

The CAST_TO_U64 macro in the later patch will enforce that tracepoint
arguments can only be integers, pointers, or less than 8 byte structures.
Larger structures should be passed by reference.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/infiniband/hw/hfi1/file_ops.c    |  2 +-
 drivers/infiniband/hw/hfi1/trace_ctxts.h | 12 ++++++------
 include/trace/events/f2fs.h              |  2 +-
 net/wireless/trace.h                     |  2 +-
 sound/firewire/amdtp-stream-trace.h      |  2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 41fafebe3b0d..da4aa1a95b11 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1153,7 +1153,7 @@ static int get_ctxt_info(struct hfi1_filedata *fd, unsigned long arg, u32 len)
 	cinfo.sdma_ring_size = fd->cq->nentries;
 	cinfo.rcvegr_size = uctxt->egrbufs.rcvtid_size;
 
-	trace_hfi1_ctxt_info(uctxt->dd, uctxt->ctxt, fd->subctxt, cinfo);
+	trace_hfi1_ctxt_info(uctxt->dd, uctxt->ctxt, fd->subctxt, &cinfo);
 	if (copy_to_user((void __user *)arg, &cinfo, len))
 		return -EFAULT;
 
diff --git a/drivers/infiniband/hw/hfi1/trace_ctxts.h b/drivers/infiniband/hw/hfi1/trace_ctxts.h
index 4eb4cc798035..e00c8a7d559c 100644
--- a/drivers/infiniband/hw/hfi1/trace_ctxts.h
+++ b/drivers/infiniband/hw/hfi1/trace_ctxts.h
@@ -106,7 +106,7 @@ TRACE_EVENT(hfi1_uctxtdata,
 TRACE_EVENT(hfi1_ctxt_info,
 	    TP_PROTO(struct hfi1_devdata *dd, unsigned int ctxt,
 		     unsigned int subctxt,
-		     struct hfi1_ctxt_info cinfo),
+		     struct hfi1_ctxt_info *cinfo),
 	    TP_ARGS(dd, ctxt, subctxt, cinfo),
 	    TP_STRUCT__entry(DD_DEV_ENTRY(dd)
 			     __field(unsigned int, ctxt)
@@ -120,11 +120,11 @@ TRACE_EVENT(hfi1_ctxt_info,
 	    TP_fast_assign(DD_DEV_ASSIGN(dd);
 			    __entry->ctxt = ctxt;
 			    __entry->subctxt = subctxt;
-			    __entry->egrtids = cinfo.egrtids;
-			    __entry->rcvhdrq_cnt = cinfo.rcvhdrq_cnt;
-			    __entry->rcvhdrq_size = cinfo.rcvhdrq_entsize;
-			    __entry->sdma_ring_size = cinfo.sdma_ring_size;
-			    __entry->rcvegr_size = cinfo.rcvegr_size;
+			    __entry->egrtids = cinfo->egrtids;
+			    __entry->rcvhdrq_cnt = cinfo->rcvhdrq_cnt;
+			    __entry->rcvhdrq_size = cinfo->rcvhdrq_entsize;
+			    __entry->sdma_ring_size = cinfo->sdma_ring_size;
+			    __entry->rcvegr_size = cinfo->rcvegr_size;
 			    ),
 	    TP_printk("[%s] ctxt %u:%u " CINFO_FMT,
 		      __get_str(dev),
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 06c87f9f720c..795698925d20 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -491,7 +491,7 @@ DEFINE_EVENT(f2fs__truncate_node, f2fs_truncate_node,
 
 TRACE_EVENT(f2fs_truncate_partial_nodes,
 
-	TP_PROTO(struct inode *inode, nid_t nid[], int depth, int err),
+	TP_PROTO(struct inode *inode, nid_t *nid, int depth, int err),
 
 	TP_ARGS(inode, nid, depth, err),
 
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index 5152938b358d..018c81fa72fb 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -3137,7 +3137,7 @@ TRACE_EVENT(rdev_start_radar_detection,
 
 TRACE_EVENT(rdev_set_mcast_rate,
 	TP_PROTO(struct wiphy *wiphy, struct net_device *netdev,
-		 int mcast_rate[NUM_NL80211_BANDS]),
+		 int *mcast_rate),
 	TP_ARGS(wiphy, netdev, mcast_rate),
 	TP_STRUCT__entry(
 		WIPHY_ENTRY
diff --git a/sound/firewire/amdtp-stream-trace.h b/sound/firewire/amdtp-stream-trace.h
index ea0d486652c8..54cdd4ffa9ce 100644
--- a/sound/firewire/amdtp-stream-trace.h
+++ b/sound/firewire/amdtp-stream-trace.h
@@ -14,7 +14,7 @@
 #include <linux/tracepoint.h>
 
 TRACE_EVENT(in_packet,
-	TP_PROTO(const struct amdtp_stream *s, u32 cycles, u32 cip_header[2], unsigned int payload_length, unsigned int index),
+	TP_PROTO(const struct amdtp_stream *s, u32 cycles, u32 *cip_header, unsigned int payload_length, unsigned int index),
 	TP_ARGS(s, cycles, cip_header, payload_length, index),
 	TP_STRUCT__entry(
 		__field(unsigned int, second)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 01/10] treewide: remove large struct-pass-by-value from tracepoint arguments
@ 2018-03-28  2:10   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

- fix trace_hfi1_ctxt_info() to pass large struct by reference instead of by value
- convert 'type array[]' tracepoint arguments into 'type *array',
  since compiler will warn that sizeof('type array[]') == sizeof('type *array')
  and later should be used instead

The CAST_TO_U64 macro in the later patch will enforce that tracepoint
arguments can only be integers, pointers, or less than 8 byte structures.
Larger structures should be passed by reference.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/infiniband/hw/hfi1/file_ops.c    |  2 +-
 drivers/infiniband/hw/hfi1/trace_ctxts.h | 12 ++++++------
 include/trace/events/f2fs.h              |  2 +-
 net/wireless/trace.h                     |  2 +-
 sound/firewire/amdtp-stream-trace.h      |  2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 41fafebe3b0d..da4aa1a95b11 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1153,7 +1153,7 @@ static int get_ctxt_info(struct hfi1_filedata *fd, unsigned long arg, u32 len)
 	cinfo.sdma_ring_size = fd->cq->nentries;
 	cinfo.rcvegr_size = uctxt->egrbufs.rcvtid_size;
 
-	trace_hfi1_ctxt_info(uctxt->dd, uctxt->ctxt, fd->subctxt, cinfo);
+	trace_hfi1_ctxt_info(uctxt->dd, uctxt->ctxt, fd->subctxt, &cinfo);
 	if (copy_to_user((void __user *)arg, &cinfo, len))
 		return -EFAULT;
 
diff --git a/drivers/infiniband/hw/hfi1/trace_ctxts.h b/drivers/infiniband/hw/hfi1/trace_ctxts.h
index 4eb4cc798035..e00c8a7d559c 100644
--- a/drivers/infiniband/hw/hfi1/trace_ctxts.h
+++ b/drivers/infiniband/hw/hfi1/trace_ctxts.h
@@ -106,7 +106,7 @@ TRACE_EVENT(hfi1_uctxtdata,
 TRACE_EVENT(hfi1_ctxt_info,
 	    TP_PROTO(struct hfi1_devdata *dd, unsigned int ctxt,
 		     unsigned int subctxt,
-		     struct hfi1_ctxt_info cinfo),
+		     struct hfi1_ctxt_info *cinfo),
 	    TP_ARGS(dd, ctxt, subctxt, cinfo),
 	    TP_STRUCT__entry(DD_DEV_ENTRY(dd)
 			     __field(unsigned int, ctxt)
@@ -120,11 +120,11 @@ TRACE_EVENT(hfi1_ctxt_info,
 	    TP_fast_assign(DD_DEV_ASSIGN(dd);
 			    __entry->ctxt = ctxt;
 			    __entry->subctxt = subctxt;
-			    __entry->egrtids = cinfo.egrtids;
-			    __entry->rcvhdrq_cnt = cinfo.rcvhdrq_cnt;
-			    __entry->rcvhdrq_size = cinfo.rcvhdrq_entsize;
-			    __entry->sdma_ring_size = cinfo.sdma_ring_size;
-			    __entry->rcvegr_size = cinfo.rcvegr_size;
+			    __entry->egrtids = cinfo->egrtids;
+			    __entry->rcvhdrq_cnt = cinfo->rcvhdrq_cnt;
+			    __entry->rcvhdrq_size = cinfo->rcvhdrq_entsize;
+			    __entry->sdma_ring_size = cinfo->sdma_ring_size;
+			    __entry->rcvegr_size = cinfo->rcvegr_size;
 			    ),
 	    TP_printk("[%s] ctxt %u:%u " CINFO_FMT,
 		      __get_str(dev),
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 06c87f9f720c..795698925d20 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -491,7 +491,7 @@ DEFINE_EVENT(f2fs__truncate_node, f2fs_truncate_node,
 
 TRACE_EVENT(f2fs_truncate_partial_nodes,
 
-	TP_PROTO(struct inode *inode, nid_t nid[], int depth, int err),
+	TP_PROTO(struct inode *inode, nid_t *nid, int depth, int err),
 
 	TP_ARGS(inode, nid, depth, err),
 
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index 5152938b358d..018c81fa72fb 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -3137,7 +3137,7 @@ TRACE_EVENT(rdev_start_radar_detection,
 
 TRACE_EVENT(rdev_set_mcast_rate,
 	TP_PROTO(struct wiphy *wiphy, struct net_device *netdev,
-		 int mcast_rate[NUM_NL80211_BANDS]),
+		 int *mcast_rate),
 	TP_ARGS(wiphy, netdev, mcast_rate),
 	TP_STRUCT__entry(
 		WIPHY_ENTRY
diff --git a/sound/firewire/amdtp-stream-trace.h b/sound/firewire/amdtp-stream-trace.h
index ea0d486652c8..54cdd4ffa9ce 100644
--- a/sound/firewire/amdtp-stream-trace.h
+++ b/sound/firewire/amdtp-stream-trace.h
@@ -14,7 +14,7 @@
 #include <linux/tracepoint.h>
 
 TRACE_EVENT(in_packet,
-	TP_PROTO(const struct amdtp_stream *s, u32 cycles, u32 cip_header[2], unsigned int payload_length, unsigned int index),
+	TP_PROTO(const struct amdtp_stream *s, u32 cycles, u32 *cip_header, unsigned int payload_length, unsigned int index),
 	TP_ARGS(s, cycles, cip_header, payload_length, index),
 	TP_STRUCT__entry(
 		__field(unsigned int, second)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 02/10] net/mediatek: disambiguate mt76 vs mt7601u trace events
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:10   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

two trace events defined with the same name and both unused.
They conflict in allyesconfig build. Rename one of them.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/net/wireless/mediatek/mt7601u/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt7601u/trace.h b/drivers/net/wireless/mediatek/mt7601u/trace.h
index 289897300ef0..82c8898b9076 100644
--- a/drivers/net/wireless/mediatek/mt7601u/trace.h
+++ b/drivers/net/wireless/mediatek/mt7601u/trace.h
@@ -34,7 +34,7 @@
 #define REG_PR_FMT	"%04x=%08x"
 #define REG_PR_ARG	__entry->reg, __entry->val
 
-DECLARE_EVENT_CLASS(dev_reg_evt,
+DECLARE_EVENT_CLASS(dev_reg_evtu,
 	TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
 	TP_ARGS(dev, reg, val),
 	TP_STRUCT__entry(
@@ -51,12 +51,12 @@ DECLARE_EVENT_CLASS(dev_reg_evt,
 	)
 );
 
-DEFINE_EVENT(dev_reg_evt, reg_read,
+DEFINE_EVENT(dev_reg_evtu, reg_read,
 	TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
 	TP_ARGS(dev, reg, val)
 );
 
-DEFINE_EVENT(dev_reg_evt, reg_write,
+DEFINE_EVENT(dev_reg_evtu, reg_write,
 	TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
 	TP_ARGS(dev, reg, val)
 );
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 02/10] net/mediatek: disambiguate mt76 vs mt7601u trace events
@ 2018-03-28  2:10   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

two trace events defined with the same name and both unused.
They conflict in allyesconfig build. Rename one of them.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/net/wireless/mediatek/mt7601u/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt7601u/trace.h b/drivers/net/wireless/mediatek/mt7601u/trace.h
index 289897300ef0..82c8898b9076 100644
--- a/drivers/net/wireless/mediatek/mt7601u/trace.h
+++ b/drivers/net/wireless/mediatek/mt7601u/trace.h
@@ -34,7 +34,7 @@
 #define REG_PR_FMT	"%04x=%08x"
 #define REG_PR_ARG	__entry->reg, __entry->val
 
-DECLARE_EVENT_CLASS(dev_reg_evt,
+DECLARE_EVENT_CLASS(dev_reg_evtu,
 	TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
 	TP_ARGS(dev, reg, val),
 	TP_STRUCT__entry(
@@ -51,12 +51,12 @@ DECLARE_EVENT_CLASS(dev_reg_evt,
 	)
 );
 
-DEFINE_EVENT(dev_reg_evt, reg_read,
+DEFINE_EVENT(dev_reg_evtu, reg_read,
 	TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
 	TP_ARGS(dev, reg, val)
 );
 
-DEFINE_EVENT(dev_reg_evt, reg_write,
+DEFINE_EVENT(dev_reg_evtu, reg_write,
 	TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
 	TP_ARGS(dev, reg, val)
 );
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 03/10] net/mac802154: disambiguate mac80215 vs mac802154 trace events
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:10   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

two trace events defined with the same name and both unused.
They conflict in allyesconfig build. Rename one of them.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 net/mac802154/trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/mac802154/trace.h b/net/mac802154/trace.h
index 2c8a43d3607f..df855c33daf2 100644
--- a/net/mac802154/trace.h
+++ b/net/mac802154/trace.h
@@ -33,7 +33,7 @@
 
 /* Tracing for driver callbacks */
 
-DECLARE_EVENT_CLASS(local_only_evt,
+DECLARE_EVENT_CLASS(local_only_evt4,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local),
 	TP_STRUCT__entry(
@@ -45,7 +45,7 @@ DECLARE_EVENT_CLASS(local_only_evt,
 	TP_printk(LOCAL_PR_FMT, LOCAL_PR_ARG)
 );
 
-DEFINE_EVENT(local_only_evt, 802154_drv_return_void,
+DEFINE_EVENT(local_only_evt4, 802154_drv_return_void,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local)
 );
@@ -65,12 +65,12 @@ TRACE_EVENT(802154_drv_return_int,
 		  __entry->ret)
 );
 
-DEFINE_EVENT(local_only_evt, 802154_drv_start,
+DEFINE_EVENT(local_only_evt4, 802154_drv_start,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local)
 );
 
-DEFINE_EVENT(local_only_evt, 802154_drv_stop,
+DEFINE_EVENT(local_only_evt4, 802154_drv_stop,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local)
 );
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 03/10] net/mac802154: disambiguate mac80215 vs mac802154 trace events
@ 2018-03-28  2:10   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

two trace events defined with the same name and both unused.
They conflict in allyesconfig build. Rename one of them.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 net/mac802154/trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/mac802154/trace.h b/net/mac802154/trace.h
index 2c8a43d3607f..df855c33daf2 100644
--- a/net/mac802154/trace.h
+++ b/net/mac802154/trace.h
@@ -33,7 +33,7 @@
 
 /* Tracing for driver callbacks */
 
-DECLARE_EVENT_CLASS(local_only_evt,
+DECLARE_EVENT_CLASS(local_only_evt4,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local),
 	TP_STRUCT__entry(
@@ -45,7 +45,7 @@ DECLARE_EVENT_CLASS(local_only_evt,
 	TP_printk(LOCAL_PR_FMT, LOCAL_PR_ARG)
 );
 
-DEFINE_EVENT(local_only_evt, 802154_drv_return_void,
+DEFINE_EVENT(local_only_evt4, 802154_drv_return_void,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local)
 );
@@ -65,12 +65,12 @@ TRACE_EVENT(802154_drv_return_int,
 		  __entry->ret)
 );
 
-DEFINE_EVENT(local_only_evt, 802154_drv_start,
+DEFINE_EVENT(local_only_evt4, 802154_drv_start,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local)
 );
 
-DEFINE_EVENT(local_only_evt, 802154_drv_stop,
+DEFINE_EVENT(local_only_evt4, 802154_drv_stop,
 	TP_PROTO(struct ieee802154_local *local),
 	TP_ARGS(local)
 );
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 04/10] net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:10   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

fix iwlwifi_dev_ucode_error tracepoint to pass pointer to a table
instead of all 17 arguments by value.
dvm/main.c and mvm/utils.c have 'struct iwl_error_event_table'
defined with very similar yet subtly different fields and offsets.
tracepoint is still common and using definition of 'struct iwl_error_event_table'
from dvm/commands.h while copying fields.
Long term this tracepoint probably should be split into two.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/net/wireless/intel/iwlwifi/dvm/main.c      |  7 +---
 .../wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h  | 39 ++++++++++------------
 drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c  |  1 +
 drivers/net/wireless/intel/iwlwifi/mvm/utils.c     |  7 +---
 4 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/main.c b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
index d11d72615de2..e68254e12764 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/main.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
@@ -1651,12 +1651,7 @@ static void iwl_dump_nic_error_log(struct iwl_priv *priv)
 			priv->status, table.valid);
 	}
 
-	trace_iwlwifi_dev_ucode_error(trans->dev, table.error_id, table.tsf_low,
-				      table.data1, table.data2, table.line,
-				      table.blink2, table.ilink1, table.ilink2,
-				      table.bcon_time, table.gp1, table.gp2,
-				      table.gp3, table.ucode_ver, table.hw_ver,
-				      0, table.brd_ver);
+	trace_iwlwifi_dev_ucode_error(trans->dev, &table, 0, table.brd_ver);
 	IWL_ERR(priv, "0x%08X | %-28s\n", table.error_id,
 		desc_lookup(table.error_id));
 	IWL_ERR(priv, "0x%08X | uPc\n", table.pc);
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
index 9518a82f44c2..27e3e4e96aa2 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
@@ -126,14 +126,11 @@ TRACE_EVENT(iwlwifi_dev_tx,
 		  __entry->framelen, __entry->skbaddr)
 );
 
+struct iwl_error_event_table;
 TRACE_EVENT(iwlwifi_dev_ucode_error,
-	TP_PROTO(const struct device *dev, u32 desc, u32 tsf_low,
-		 u32 data1, u32 data2, u32 line, u32 blink2, u32 ilink1,
-		 u32 ilink2, u32 bcon_time, u32 gp1, u32 gp2, u32 rev_type,
-		 u32 major, u32 minor, u32 hw_ver, u32 brd_ver),
-	TP_ARGS(dev, desc, tsf_low, data1, data2, line,
-		 blink2, ilink1, ilink2, bcon_time, gp1, gp2,
-		 rev_type, major, minor, hw_ver, brd_ver),
+	TP_PROTO(const struct device *dev, const struct iwl_error_event_table *table,
+		 u32 hw_ver, u32 brd_ver),
+	TP_ARGS(dev, table, hw_ver, brd_ver),
 	TP_STRUCT__entry(
 		DEV_ENTRY
 		__field(u32, desc)
@@ -155,20 +152,20 @@ TRACE_EVENT(iwlwifi_dev_ucode_error,
 	),
 	TP_fast_assign(
 		DEV_ASSIGN;
-		__entry->desc = desc;
-		__entry->tsf_low = tsf_low;
-		__entry->data1 = data1;
-		__entry->data2 = data2;
-		__entry->line = line;
-		__entry->blink2 = blink2;
-		__entry->ilink1 = ilink1;
-		__entry->ilink2 = ilink2;
-		__entry->bcon_time = bcon_time;
-		__entry->gp1 = gp1;
-		__entry->gp2 = gp2;
-		__entry->rev_type = rev_type;
-		__entry->major = major;
-		__entry->minor = minor;
+		__entry->desc = table->error_id;
+		__entry->tsf_low = table->tsf_low;
+		__entry->data1 = table->data1;
+		__entry->data2 = table->data2;
+		__entry->line = table->line;
+		__entry->blink2 = table->blink2;
+		__entry->ilink1 = table->ilink1;
+		__entry->ilink2 = table->ilink2;
+		__entry->bcon_time = table->bcon_time;
+		__entry->gp1 = table->gp1;
+		__entry->gp2 = table->gp2;
+		__entry->rev_type = table->gp3;
+		__entry->major = table->ucode_ver;
+		__entry->minor = table->hw_ver;
 		__entry->hw_ver = hw_ver;
 		__entry->brd_ver = brd_ver;
 	),
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
index 50510fb6ab8c..6aa719865a58 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
@@ -30,6 +30,7 @@
 #ifndef __CHECKER__
 #include "iwl-trans.h"
 
+#include "dvm/commands.h"
 #define CREATE_TRACE_POINTS
 #include "iwl-devtrace.h"
 
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
index d65e1db7c097..5442ead876eb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
@@ -549,12 +549,7 @@ static void iwl_mvm_dump_lmac_error_log(struct iwl_mvm *mvm, u32 base)
 
 	IWL_ERR(mvm, "Loaded firmware version: %s\n", mvm->fw->fw_version);
 
-	trace_iwlwifi_dev_ucode_error(trans->dev, table.error_id, table.tsf_low,
-				      table.data1, table.data2, table.data3,
-				      table.blink2, table.ilink1,
-				      table.ilink2, table.bcon_time, table.gp1,
-				      table.gp2, table.fw_rev_type, table.major,
-				      table.minor, table.hw_ver, table.brd_ver);
+	trace_iwlwifi_dev_ucode_error(trans->dev, &table, table.hw_ver, table.brd_ver);
 	IWL_ERR(mvm, "0x%08X | %-28s\n", table.error_id,
 		desc_lookup(table.error_id));
 	IWL_ERR(mvm, "0x%08X | trm_hw_status0\n", table.trm_hw_status0);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 04/10] net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
@ 2018-03-28  2:10   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:10 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

fix iwlwifi_dev_ucode_error tracepoint to pass pointer to a table
instead of all 17 arguments by value.
dvm/main.c and mvm/utils.c have 'struct iwl_error_event_table'
defined with very similar yet subtly different fields and offsets.
tracepoint is still common and using definition of 'struct iwl_error_event_table'
from dvm/commands.h while copying fields.
Long term this tracepoint probably should be split into two.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 drivers/net/wireless/intel/iwlwifi/dvm/main.c      |  7 +---
 .../wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h  | 39 ++++++++++------------
 drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c  |  1 +
 drivers/net/wireless/intel/iwlwifi/mvm/utils.c     |  7 +---
 4 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/main.c b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
index d11d72615de2..e68254e12764 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/main.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
@@ -1651,12 +1651,7 @@ static void iwl_dump_nic_error_log(struct iwl_priv *priv)
 			priv->status, table.valid);
 	}
 
-	trace_iwlwifi_dev_ucode_error(trans->dev, table.error_id, table.tsf_low,
-				      table.data1, table.data2, table.line,
-				      table.blink2, table.ilink1, table.ilink2,
-				      table.bcon_time, table.gp1, table.gp2,
-				      table.gp3, table.ucode_ver, table.hw_ver,
-				      0, table.brd_ver);
+	trace_iwlwifi_dev_ucode_error(trans->dev, &table, 0, table.brd_ver);
 	IWL_ERR(priv, "0x%08X | %-28s\n", table.error_id,
 		desc_lookup(table.error_id));
 	IWL_ERR(priv, "0x%08X | uPc\n", table.pc);
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
index 9518a82f44c2..27e3e4e96aa2 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
@@ -126,14 +126,11 @@ TRACE_EVENT(iwlwifi_dev_tx,
 		  __entry->framelen, __entry->skbaddr)
 );
 
+struct iwl_error_event_table;
 TRACE_EVENT(iwlwifi_dev_ucode_error,
-	TP_PROTO(const struct device *dev, u32 desc, u32 tsf_low,
-		 u32 data1, u32 data2, u32 line, u32 blink2, u32 ilink1,
-		 u32 ilink2, u32 bcon_time, u32 gp1, u32 gp2, u32 rev_type,
-		 u32 major, u32 minor, u32 hw_ver, u32 brd_ver),
-	TP_ARGS(dev, desc, tsf_low, data1, data2, line,
-		 blink2, ilink1, ilink2, bcon_time, gp1, gp2,
-		 rev_type, major, minor, hw_ver, brd_ver),
+	TP_PROTO(const struct device *dev, const struct iwl_error_event_table *table,
+		 u32 hw_ver, u32 brd_ver),
+	TP_ARGS(dev, table, hw_ver, brd_ver),
 	TP_STRUCT__entry(
 		DEV_ENTRY
 		__field(u32, desc)
@@ -155,20 +152,20 @@ TRACE_EVENT(iwlwifi_dev_ucode_error,
 	),
 	TP_fast_assign(
 		DEV_ASSIGN;
-		__entry->desc = desc;
-		__entry->tsf_low = tsf_low;
-		__entry->data1 = data1;
-		__entry->data2 = data2;
-		__entry->line = line;
-		__entry->blink2 = blink2;
-		__entry->ilink1 = ilink1;
-		__entry->ilink2 = ilink2;
-		__entry->bcon_time = bcon_time;
-		__entry->gp1 = gp1;
-		__entry->gp2 = gp2;
-		__entry->rev_type = rev_type;
-		__entry->major = major;
-		__entry->minor = minor;
+		__entry->desc = table->error_id;
+		__entry->tsf_low = table->tsf_low;
+		__entry->data1 = table->data1;
+		__entry->data2 = table->data2;
+		__entry->line = table->line;
+		__entry->blink2 = table->blink2;
+		__entry->ilink1 = table->ilink1;
+		__entry->ilink2 = table->ilink2;
+		__entry->bcon_time = table->bcon_time;
+		__entry->gp1 = table->gp1;
+		__entry->gp2 = table->gp2;
+		__entry->rev_type = table->gp3;
+		__entry->major = table->ucode_ver;
+		__entry->minor = table->hw_ver;
 		__entry->hw_ver = hw_ver;
 		__entry->brd_ver = brd_ver;
 	),
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
index 50510fb6ab8c..6aa719865a58 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
@@ -30,6 +30,7 @@
 #ifndef __CHECKER__
 #include "iwl-trans.h"
 
+#include "dvm/commands.h"
 #define CREATE_TRACE_POINTS
 #include "iwl-devtrace.h"
 
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
index d65e1db7c097..5442ead876eb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
@@ -549,12 +549,7 @@ static void iwl_mvm_dump_lmac_error_log(struct iwl_mvm *mvm, u32 base)
 
 	IWL_ERR(mvm, "Loaded firmware version: %s\n", mvm->fw->fw_version);
 
-	trace_iwlwifi_dev_ucode_error(trans->dev, table.error_id, table.tsf_low,
-				      table.data1, table.data2, table.data3,
-				      table.blink2, table.ilink1,
-				      table.ilink2, table.bcon_time, table.gp1,
-				      table.gp2, table.fw_rev_type, table.major,
-				      table.minor, table.hw_ver, table.brd_ver);
+	trace_iwlwifi_dev_ucode_error(trans->dev, &table, table.hw_ver, table.brd_ver);
 	IWL_ERR(mvm, "0x%08X | %-28s\n", table.error_id,
 		desc_lookup(table.error_id));
 	IWL_ERR(mvm, "0x%08X | trm_hw_status0\n", table.trm_hw_status0);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 05/10] macro: introduce COUNT_ARGS() macro
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:11   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

move COUNT_ARGS() macro from apparmor to generic header and extend it
to count till twelve.

COUNT() was an alternative name for this logic, but it's used for
different purpose in many other places.

Similarly for CONCATENATE() macro.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/kernel.h           | 7 +++++++
 security/apparmor/include/path.h | 7 +------
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..293fa0677fba 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -919,6 +919,13 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
 #define swap(a, b) \
 	do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
 
+/* This counts to 12. Any more, it will return 13th argument. */
+#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n
+#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+#define __CONCAT(a, b) a ## b
+#define CONCATENATE(a, b) __CONCAT(a, b)
+
 /**
  * container_of - cast a member of a structure out to the containing structure
  * @ptr:	the pointer to the member.
diff --git a/security/apparmor/include/path.h b/security/apparmor/include/path.h
index 05fb3305671e..e042b994f2b8 100644
--- a/security/apparmor/include/path.h
+++ b/security/apparmor/include/path.h
@@ -43,15 +43,10 @@ struct aa_buffers {
 
 DECLARE_PER_CPU(struct aa_buffers, aa_buffers);
 
-#define COUNT_ARGS(X...) COUNT_ARGS_HELPER(, ##X, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
-#define COUNT_ARGS_HELPER(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, n, X...) n
-#define CONCAT(X, Y) X ## Y
-#define CONCAT_AFTER(X, Y) CONCAT(X, Y)
-
 #define ASSIGN(FN, X, N) ((X) = FN(N))
 #define EVAL1(FN, X) ASSIGN(FN, X, 0) /*X = FN(0)*/
 #define EVAL2(FN, X, Y...) do { ASSIGN(FN, X, 1);  EVAL1(FN, Y); } while (0)
-#define EVAL(FN, X...) CONCAT_AFTER(EVAL, COUNT_ARGS(X))(FN, X)
+#define EVAL(FN, X...) CONCATENATE(EVAL, COUNT_ARGS(X))(FN, X)
 
 #define for_each_cpu_buffer(I) for ((I) = 0; (I) < MAX_PATH_BUFFERS; (I)++)
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 05/10] macro: introduce COUNT_ARGS() macro
@ 2018-03-28  2:11   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

move COUNT_ARGS() macro from apparmor to generic header and extend it
to count till twelve.

COUNT() was an alternative name for this logic, but it's used for
different purpose in many other places.

Similarly for CONCATENATE() macro.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/kernel.h           | 7 +++++++
 security/apparmor/include/path.h | 7 +------
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..293fa0677fba 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -919,6 +919,13 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
 #define swap(a, b) \
 	do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
 
+/* This counts to 12. Any more, it will return 13th argument. */
+#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n
+#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+#define __CONCAT(a, b) a ## b
+#define CONCATENATE(a, b) __CONCAT(a, b)
+
 /**
  * container_of - cast a member of a structure out to the containing structure
  * @ptr:	the pointer to the member.
diff --git a/security/apparmor/include/path.h b/security/apparmor/include/path.h
index 05fb3305671e..e042b994f2b8 100644
--- a/security/apparmor/include/path.h
+++ b/security/apparmor/include/path.h
@@ -43,15 +43,10 @@ struct aa_buffers {
 
 DECLARE_PER_CPU(struct aa_buffers, aa_buffers);
 
-#define COUNT_ARGS(X...) COUNT_ARGS_HELPER(, ##X, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
-#define COUNT_ARGS_HELPER(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, n, X...) n
-#define CONCAT(X, Y) X ## Y
-#define CONCAT_AFTER(X, Y) CONCAT(X, Y)
-
 #define ASSIGN(FN, X, N) ((X) = FN(N))
 #define EVAL1(FN, X) ASSIGN(FN, X, 0) /*X = FN(0)*/
 #define EVAL2(FN, X, Y...) do { ASSIGN(FN, X, 1);  EVAL1(FN, Y); } while (0)
-#define EVAL(FN, X...) CONCAT_AFTER(EVAL, COUNT_ARGS(X))(FN, X)
+#define EVAL(FN, X...) CONCATENATE(EVAL, COUNT_ARGS(X))(FN, X)
 
 #define for_each_cpu_buffer(I) for ((I) = 0; (I) < MAX_PATH_BUFFERS; (I)++)
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:11   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

compute number of arguments passed into tracepoint
at compile time and store it as part of 'struct tracepoint'.
The number is necessary to check safety of bpf program access that
is coming in subsequent patch.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/tracepoint-defs.h |  1 +
 include/linux/tracepoint.h      | 12 ++++++------
 include/trace/define_trace.h    | 14 +++++++-------
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 64ed7064f1fa..39a283c61c51 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -33,6 +33,7 @@ struct tracepoint {
 	int (*regfunc)(void);
 	void (*unregfunc)(void);
 	struct tracepoint_func __rcu *funcs;
+	u32 num_args;
 };
 
 #endif
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..c92f4adbc0d7 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -230,18 +230,18 @@ extern void syscall_unregfunc(void);
  * structures, so we create an array of pointers that will be used for iteration
  * on the tracepoints.
  */
-#define DEFINE_TRACE_FN(name, reg, unreg)				 \
+#define DEFINE_TRACE_FN(name, reg, unreg, num_args)			 \
 	static const char __tpstrtab_##name[]				 \
 	__attribute__((section("__tracepoints_strings"))) = #name;	 \
 	struct tracepoint __tracepoint_##name				 \
 	__attribute__((section("__tracepoints"))) =			 \
-		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\
+		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL, num_args };\
 	static struct tracepoint * const __tracepoint_ptr_##name __used	 \
 	__attribute__((section("__tracepoints_ptrs"))) =		 \
 		&__tracepoint_##name;
 
-#define DEFINE_TRACE(name)						\
-	DEFINE_TRACE_FN(name, NULL, NULL);
+#define DEFINE_TRACE(name, num_args)					\
+	DEFINE_TRACE_FN(name, NULL, NULL, num_args);
 
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)				\
 	EXPORT_SYMBOL_GPL(__tracepoint_##name)
@@ -275,8 +275,8 @@ extern void syscall_unregfunc(void);
 		return false;						\
 	}
 
-#define DEFINE_TRACE_FN(name, reg, unreg)
-#define DEFINE_TRACE(name)
+#define DEFINE_TRACE_FN(name, reg, unreg, num_args)
+#define DEFINE_TRACE(name, num_args)
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
 #define EXPORT_TRACEPOINT_SYMBOL(name)
 
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index d9e3d4aa3f6e..96b22ace9ae7 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -25,7 +25,7 @@
 
 #undef TRACE_EVENT
 #define TRACE_EVENT(name, proto, args, tstruct, assign, print)	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef TRACE_EVENT_CONDITION
 #define TRACE_EVENT_CONDITION(name, proto, args, cond, tstruct, assign, print) \
@@ -39,24 +39,24 @@
 #undef TRACE_EVENT_FN
 #define TRACE_EVENT_FN(name, proto, args, tstruct,		\
 		assign, print, reg, unreg)			\
-	DEFINE_TRACE_FN(name, reg, unreg)
+	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
 
 #undef TRACE_EVENT_FN_COND
 #define TRACE_EVENT_FN_COND(name, proto, args, cond, tstruct,		\
 		assign, print, reg, unreg)			\
-	DEFINE_TRACE_FN(name, reg, unreg)
+	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args) \
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT_FN
 #define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \
-	DEFINE_TRACE_FN(name, reg, unreg)
+	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT_CONDITION
 #define DEFINE_EVENT_CONDITION(template, name, proto, args, cond) \
@@ -64,7 +64,7 @@
 
 #undef DECLARE_TRACE
 #define DECLARE_TRACE(name, proto, args)	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef TRACE_INCLUDE
 #undef __TRACE_INCLUDE
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
@ 2018-03-28  2:11   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

compute number of arguments passed into tracepoint
at compile time and store it as part of 'struct tracepoint'.
The number is necessary to check safety of bpf program access that
is coming in subsequent patch.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/tracepoint-defs.h |  1 +
 include/linux/tracepoint.h      | 12 ++++++------
 include/trace/define_trace.h    | 14 +++++++-------
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 64ed7064f1fa..39a283c61c51 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -33,6 +33,7 @@ struct tracepoint {
 	int (*regfunc)(void);
 	void (*unregfunc)(void);
 	struct tracepoint_func __rcu *funcs;
+	u32 num_args;
 };
 
 #endif
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..c92f4adbc0d7 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -230,18 +230,18 @@ extern void syscall_unregfunc(void);
  * structures, so we create an array of pointers that will be used for iteration
  * on the tracepoints.
  */
-#define DEFINE_TRACE_FN(name, reg, unreg)				 \
+#define DEFINE_TRACE_FN(name, reg, unreg, num_args)			 \
 	static const char __tpstrtab_##name[]				 \
 	__attribute__((section("__tracepoints_strings"))) = #name;	 \
 	struct tracepoint __tracepoint_##name				 \
 	__attribute__((section("__tracepoints"))) =			 \
-		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\
+		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL, num_args };\
 	static struct tracepoint * const __tracepoint_ptr_##name __used	 \
 	__attribute__((section("__tracepoints_ptrs"))) =		 \
 		&__tracepoint_##name;
 
-#define DEFINE_TRACE(name)						\
-	DEFINE_TRACE_FN(name, NULL, NULL);
+#define DEFINE_TRACE(name, num_args)					\
+	DEFINE_TRACE_FN(name, NULL, NULL, num_args);
 
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)				\
 	EXPORT_SYMBOL_GPL(__tracepoint_##name)
@@ -275,8 +275,8 @@ extern void syscall_unregfunc(void);
 		return false;						\
 	}
 
-#define DEFINE_TRACE_FN(name, reg, unreg)
-#define DEFINE_TRACE(name)
+#define DEFINE_TRACE_FN(name, reg, unreg, num_args)
+#define DEFINE_TRACE(name, num_args)
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
 #define EXPORT_TRACEPOINT_SYMBOL(name)
 
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index d9e3d4aa3f6e..96b22ace9ae7 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -25,7 +25,7 @@
 
 #undef TRACE_EVENT
 #define TRACE_EVENT(name, proto, args, tstruct, assign, print)	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef TRACE_EVENT_CONDITION
 #define TRACE_EVENT_CONDITION(name, proto, args, cond, tstruct, assign, print) \
@@ -39,24 +39,24 @@
 #undef TRACE_EVENT_FN
 #define TRACE_EVENT_FN(name, proto, args, tstruct,		\
 		assign, print, reg, unreg)			\
-	DEFINE_TRACE_FN(name, reg, unreg)
+	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
 
 #undef TRACE_EVENT_FN_COND
 #define TRACE_EVENT_FN_COND(name, proto, args, cond, tstruct,		\
 		assign, print, reg, unreg)			\
-	DEFINE_TRACE_FN(name, reg, unreg)
+	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args) \
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT_FN
 #define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \
-	DEFINE_TRACE_FN(name, reg, unreg)
+	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef DEFINE_EVENT_CONDITION
 #define DEFINE_EVENT_CONDITION(template, name, proto, args, cond) \
@@ -64,7 +64,7 @@
 
 #undef DECLARE_TRACE
 #define DECLARE_TRACE(name, proto, args)	\
-	DEFINE_TRACE(name)
+	DEFINE_TRACE(name, COUNT_ARGS(args))
 
 #undef TRACE_INCLUDE
 #undef __TRACE_INCLUDE
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 07/10] bpf: introduce BPF_RAW_TRACEPOINT
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:11   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

>From bpf program point of view the access to the arguments look like:
struct bpf_raw_tracepoint_args {
       __u64 args[0];
};

int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
  // program can read args[N] where N depends on tracepoint
  // and statically verified at program load+attach time
}

kprobe+bpf infrastructure allows programs access function arguments.
This feature allows programs access raw tracepoint arguments.

Similar to proposed 'dynamic ftrace events' there are no abi guarantees
to what the tracepoints arguments are and what their meaning is.
The program needs to type cast args properly and use bpf_probe_read()
helper to access struct fields when argument is a pointer.

For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
   0xffffffff81132080 <+0>:     mov    %ecx,%ecx
   0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
        TP_PROTO(const struct net_device *dev,
                 const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second
this is valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.

The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.

On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.

Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

__bpf_raw_tp_map section logic was contributed by Steven Rostedt

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/asm-generic/vmlinux.lds.h |  10 +++
 include/linux/bpf_types.h         |   1 +
 include/linux/trace_events.h      |  42 +++++++++
 include/linux/tracepoint-defs.h   |   5 ++
 include/trace/bpf_probe.h         |  91 +++++++++++++++++++
 include/trace/define_trace.h      |   1 +
 include/uapi/linux/bpf.h          |  11 +++
 kernel/bpf/syscall.c              |  78 ++++++++++++++++
 kernel/trace/bpf_trace.c          | 183 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 422 insertions(+)
 create mode 100644 include/trace/bpf_probe.h

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 1ab0e520d6fc..8add3493a202 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -178,6 +178,15 @@
 #define TRACE_SYSCALLS()
 #endif
 
+#ifdef CONFIG_BPF_EVENTS
+#define BPF_RAW_TP() STRUCT_ALIGN();					\
+			 VMLINUX_SYMBOL(__start__bpf_raw_tp) = .;	\
+			 KEEP(*(__bpf_raw_tp_map))			\
+			 VMLINUX_SYMBOL(__stop__bpf_raw_tp) = .;
+#else
+#define BPF_RAW_TP()
+#endif
+
 #ifdef CONFIG_SERIAL_EARLYCON
 #define EARLYCON_TABLE() STRUCT_ALIGN();			\
 			 VMLINUX_SYMBOL(__earlycon_table) = .;	\
@@ -249,6 +258,7 @@
 	LIKELY_PROFILE()		       				\
 	BRANCH_PROFILE()						\
 	TRACE_PRINTKS()							\
+	BPF_RAW_TP()							\
 	TRACEPOINT_STR()
 
 /*
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 5e2e8a49fb21..6d7243bfb0ff 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -19,6 +19,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
+BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 #endif
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a1442c4e513..b0357cd198b0 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -468,6 +468,9 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -487,6 +490,18 @@ perf_event_query_prog_array(struct perf_event *event, void __user *info)
 {
 	return -EOPNOTSUPP;
 }
+static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *p)
+{
+	return -EOPNOTSUPP;
+}
+static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *p)
+{
+	return -EOPNOTSUPP;
+}
+static inline struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name)
+{
+	return NULL;
+}
 #endif
 
 enum {
@@ -546,6 +561,33 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
+void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
+void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
+void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3);
+void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4);
+void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5);
+void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6);
+void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
+void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		    u64 arg8);
+void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		    u64 arg8, u64 arg9);
+void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		     u64 arg8, u64 arg9, u64 arg10);
+void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		     u64 arg8, u64 arg9, u64 arg10, u64 arg11);
+void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		     u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
 void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
 			       struct trace_event_call *call, u64 count,
 			       struct pt_regs *regs, struct hlist_head *head,
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 39a283c61c51..35db8dd48c4c 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -36,4 +36,9 @@ struct tracepoint {
 	u32 num_args;
 };
 
+struct bpf_raw_event_map {
+	struct tracepoint	*tp;
+	void			*bpf_func;
+};
+
 #endif
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
new file mode 100644
index 000000000000..c374ed1f2eee
--- /dev/null
+++ b/include/trace/bpf_probe.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#undef TRACE_SYSTEM_VAR
+
+#ifdef CONFIG_BPF_EVENTS
+
+#undef __entry
+#define __entry entry
+
+#undef __get_dynamic_array
+#define __get_dynamic_array(field)	\
+		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
+
+#undef __get_dynamic_array_len
+#define __get_dynamic_array_len(field)	\
+		((__entry->__data_loc_##field >> 16) & 0xffff)
+
+#undef __get_str
+#define __get_str(field) ((char *)__get_dynamic_array(field))
+
+#undef __get_bitmask
+#define __get_bitmask(field) (char *)__get_dynamic_array(field)
+
+#undef __perf_count
+#define __perf_count(c)	(c)
+
+#undef __perf_task
+#define __perf_task(t)	(t)
+
+/* cast any integer, pointer, or small struct to u64 */
+#define UINTTYPE(size) \
+	__typeof__(__builtin_choose_expr(size == 1,  (u8)1, \
+		   __builtin_choose_expr(size == 2, (u16)2, \
+		   __builtin_choose_expr(size == 4, (u32)3, \
+		   __builtin_choose_expr(size == 8, (u64)4, \
+					 (void)5)))))
+#define __CAST_TO_U64(x) ({ \
+	typeof(x) __src = (x); \
+	UINTTYPE(sizeof(x)) __dst; \
+	memcpy(&__dst, &__src, sizeof(__dst)); \
+	(u64)__dst; })
+
+#define __CAST1(a,...) __CAST_TO_U64(a)
+#define __CAST2(a,...) __CAST_TO_U64(a), __CAST1(__VA_ARGS__)
+#define __CAST3(a,...) __CAST_TO_U64(a), __CAST2(__VA_ARGS__)
+#define __CAST4(a,...) __CAST_TO_U64(a), __CAST3(__VA_ARGS__)
+#define __CAST5(a,...) __CAST_TO_U64(a), __CAST4(__VA_ARGS__)
+#define __CAST6(a,...) __CAST_TO_U64(a), __CAST5(__VA_ARGS__)
+#define __CAST7(a,...) __CAST_TO_U64(a), __CAST6(__VA_ARGS__)
+#define __CAST8(a,...) __CAST_TO_U64(a), __CAST7(__VA_ARGS__)
+#define __CAST9(a,...) __CAST_TO_U64(a), __CAST8(__VA_ARGS__)
+#define __CAST10(a,...) __CAST_TO_U64(a), __CAST9(__VA_ARGS__)
+#define __CAST11(a,...) __CAST_TO_U64(a), __CAST10(__VA_ARGS__)
+#define __CAST12(a,...) __CAST_TO_U64(a), __CAST11(__VA_ARGS__)
+/* tracepoints with more than 12 arguments will hit build error */
+#define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
+
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+static notrace void							\
+__bpf_trace_##call(void *__data, proto)					\
+{									\
+	struct bpf_prog *prog = __data;					\
+	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+}
+
+/*
+ * This part is compiled out, it is only here as a build time check
+ * to make sure that if the tracepoint handling changes, the
+ * bpf probe will fail to compile unless it too is updated.
+ */
+#undef DEFINE_EVENT
+#define DEFINE_EVENT(template, call, proto, args)			\
+static inline void bpf_test_probe_##call(void)				\
+{									\
+	check_trace_callback_type_##call(__bpf_trace_##template);	\
+}									\
+static struct bpf_raw_event_map	__used					\
+	__attribute__((section("__bpf_raw_tp_map")))			\
+__bpf_trace_tp_map_##call = {						\
+	.tp		= &__tracepoint_##call,				\
+	.bpf_func	= (void *)__bpf_trace_##template,		\
+};
+
+
+#undef DEFINE_EVENT_PRINT
+#define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
+	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
+
+#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+#endif /* CONFIG_BPF_EVENTS */
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 96b22ace9ae7..5f8216bc261f 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -95,6 +95,7 @@
 #ifdef TRACEPOINTS_ENABLED
 #include <trace/trace_events.h>
 #include <trace/perf.h>
+#include <trace/bpf_probe.h>
 #endif
 
 #undef TRACE_EVENT
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 18b7c510c511..1878201c2d77 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_cmd {
 	BPF_MAP_GET_FD_BY_ID,
 	BPF_OBJ_GET_INFO_BY_FD,
 	BPF_PROG_QUERY,
+	BPF_RAW_TRACEPOINT_OPEN,
 };
 
 enum bpf_map_type {
@@ -134,6 +135,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
 	BPF_PROG_TYPE_SK_MSG,
+	BPF_PROG_TYPE_RAW_TRACEPOINT,
 };
 
 enum bpf_attach_type {
@@ -344,6 +346,11 @@ union bpf_attr {
 		__aligned_u64	prog_ids;
 		__u32		prog_cnt;
 	} query;
+
+	struct {
+		__u64 name;
+		__u32 prog_fd;
+	} raw_tracepoint;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -1152,4 +1159,8 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_raw_tracepoint_args {
+	__u64 args[0];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3aeb4ea2a93a..1894537dcada 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1311,6 +1311,81 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
+struct bpf_raw_tracepoint {
+	struct bpf_raw_event_map *btp;
+	struct bpf_prog *prog;
+};
+
+static int bpf_raw_tracepoint_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_raw_tracepoint *raw_tp = filp->private_data;
+
+	if (raw_tp->prog) {
+		bpf_probe_unregister(raw_tp->btp, raw_tp->prog);
+		bpf_prog_put(raw_tp->prog);
+	}
+	kfree(raw_tp);
+	return 0;
+}
+
+static const struct file_operations bpf_raw_tp_fops = {
+	.release	= bpf_raw_tracepoint_release,
+	.read		= bpf_dummy_read,
+	.write		= bpf_dummy_write,
+};
+
+#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
+
+static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
+{
+	struct bpf_raw_tracepoint *raw_tp;
+	struct bpf_raw_event_map *btp;
+	struct bpf_prog *prog;
+	char tp_name[128];
+	int tp_fd, err;
+
+	if (strncpy_from_user(tp_name, u64_to_user_ptr(attr->raw_tracepoint.name),
+			      sizeof(tp_name) - 1) < 0)
+		return -EFAULT;
+	tp_name[sizeof(tp_name) - 1] = 0;
+
+	btp = bpf_find_raw_tracepoint(tp_name);
+	if (!btp)
+		return -ENOENT;
+
+	raw_tp = kzalloc(sizeof(*raw_tp), GFP_USER);
+	if (!raw_tp)
+		return -ENOMEM;
+	raw_tp->btp = btp;
+
+	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
+				 BPF_PROG_TYPE_RAW_TRACEPOINT);
+	if (IS_ERR(prog)) {
+		err = PTR_ERR(prog);
+		goto out_free_tp;
+	}
+
+	err = bpf_probe_register(raw_tp->btp, prog);
+	if (err)
+		goto out_put_prog;
+
+	raw_tp->prog = prog;
+	tp_fd = anon_inode_getfd("bpf-raw-tracepoint", &bpf_raw_tp_fops, raw_tp,
+				 O_CLOEXEC);
+	if (tp_fd < 0) {
+		bpf_probe_unregister(raw_tp->btp, prog);
+		err = tp_fd;
+		goto out_put_prog;
+	}
+	return tp_fd;
+
+out_put_prog:
+	bpf_prog_put(prog);
+out_free_tp:
+	kfree(raw_tp);
+	return err;
+}
+
 #ifdef CONFIG_CGROUP_BPF
 
 #define BPF_PROG_ATTACH_LAST_FIELD attach_flags
@@ -1921,6 +1996,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_OBJ_GET_INFO_BY_FD:
 		err = bpf_obj_get_info_by_fd(&attr, uattr);
 		break;
+	case BPF_RAW_TRACEPOINT_OPEN:
+		err = bpf_raw_tracepoint_open(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c634e093951f..7d0009e712f5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -723,6 +723,86 @@ const struct bpf_verifier_ops tracepoint_verifier_ops = {
 const struct bpf_prog_ops tracepoint_prog_ops = {
 };
 
+/*
+ * bpf_raw_tp_regs are separate from bpf_pt_regs used from skb/xdp
+ * to avoid potential recursive reuse issue when/if tracepoints are added
+ * inside bpf_*_event_output and/or bpf_get_stack_id
+ */
+static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs);
+BPF_CALL_5(bpf_perf_event_output_raw_tp, struct bpf_raw_tracepoint_args *, args,
+	   struct bpf_map *, map, u64, flags, void *, data, u64, size)
+{
+	struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+	perf_fetch_caller_regs(regs);
+	return ____bpf_perf_event_output(regs, map, flags, data, size);
+}
+
+static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
+	.func		= bpf_perf_event_output_raw_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
+BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
+	   struct bpf_map *, map, u64, flags)
+{
+	struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+	perf_fetch_caller_regs(regs);
+	/* similar to bpf_perf_event_output_tp, but pt_regs fetched differently */
+	return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
+			       flags, 0, 0);
+}
+
+static const struct bpf_func_proto bpf_get_stackid_proto_raw_tp = {
+	.func		= bpf_get_stackid_raw_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *raw_tp_prog_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_perf_event_output:
+		return &bpf_perf_event_output_proto_raw_tp;
+	case BPF_FUNC_get_stackid:
+		return &bpf_get_stackid_proto_raw_tp;
+	default:
+		return tracing_func_proto(func_id);
+	}
+}
+
+static bool raw_tp_prog_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					struct bpf_insn_access_aux *info)
+{
+	/* largest tracepoint in the kernel has 12 args */
+	if (off < 0 || off >= sizeof(__u64) * 12)
+		return false;
+	if (type != BPF_READ)
+		return false;
+	if (off % size != 0)
+		return false;
+	return true;
+}
+
+const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
+	.get_func_proto  = raw_tp_prog_func_proto,
+	.is_valid_access = raw_tp_prog_is_valid_access,
+};
+
+const struct bpf_prog_ops raw_tracepoint_prog_ops = {
+};
+
 static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
 				    struct bpf_insn_access_aux *info)
 {
@@ -896,3 +976,106 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 
 	return ret;
 }
+
+extern struct bpf_raw_event_map __start__bpf_raw_tp[];
+extern struct bpf_raw_event_map __stop__bpf_raw_tp[];
+
+struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name)
+{
+	struct bpf_raw_event_map *btp = __start__bpf_raw_tp;
+
+	for (; btp < __stop__bpf_raw_tp; btp++) {
+		if (!strcmp(btp->tp->name, name))
+			return btp;
+	}
+	return NULL;
+}
+
+static __always_inline
+void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
+{
+	rcu_read_lock();
+	preempt_disable();
+	(void) BPF_PROG_RUN(prog, args);
+	preempt_enable();
+	rcu_read_unlock();
+}
+
+#define UNPACK(...)			__VA_ARGS__
+#define REPEAT_1(FN, DL, X, ...)	FN(X)
+#define REPEAT_2(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
+#define REPEAT_3(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
+#define REPEAT_4(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
+#define REPEAT_5(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
+#define REPEAT_6(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
+#define REPEAT_7(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
+#define REPEAT_8(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
+#define REPEAT_9(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
+#define REPEAT_10(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_9(FN, DL, __VA_ARGS__)
+#define REPEAT_11(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_10(FN, DL, __VA_ARGS__)
+#define REPEAT_12(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_11(FN, DL, __VA_ARGS__)
+#define REPEAT(X, FN, DL, ...)		REPEAT_##X(FN, DL, __VA_ARGS__)
+
+#define SARG(X)		u64 arg##X
+#define COPY(X)		args[X] = arg##X
+
+#define __DL_COM	(,)
+#define __DL_SEM	(;)
+
+#define __SEQ_0_11	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
+
+#define BPF_TRACE_DEFN_x(x)						\
+	void bpf_trace_run##x(struct bpf_prog *prog,			\
+			      REPEAT(x, SARG, __DL_COM, __SEQ_0_11))	\
+	{								\
+		u64 args[x];						\
+		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
+		__bpf_trace_run(prog, args);				\
+	}								\
+	EXPORT_SYMBOL_GPL(bpf_trace_run##x)
+BPF_TRACE_DEFN_x(1);
+BPF_TRACE_DEFN_x(2);
+BPF_TRACE_DEFN_x(3);
+BPF_TRACE_DEFN_x(4);
+BPF_TRACE_DEFN_x(5);
+BPF_TRACE_DEFN_x(6);
+BPF_TRACE_DEFN_x(7);
+BPF_TRACE_DEFN_x(8);
+BPF_TRACE_DEFN_x(9);
+BPF_TRACE_DEFN_x(10);
+BPF_TRACE_DEFN_x(11);
+BPF_TRACE_DEFN_x(12);
+
+static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+{
+	struct tracepoint *tp = btp->tp;
+
+	/*
+	 * check that program doesn't access arguments beyond what's
+	 * available in this tracepoint
+	 */
+	if (prog->aux->max_ctx_offset > tp->num_args * sizeof(u64))
+		return -EINVAL;
+
+	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
+}
+
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+{
+	int err;
+
+	mutex_lock(&bpf_event_mutex);
+	err = __bpf_probe_register(btp, prog);
+	mutex_unlock(&bpf_event_mutex);
+	return err;
+}
+
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+{
+	int err;
+
+	mutex_lock(&bpf_event_mutex);
+	err = tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
+	mutex_unlock(&bpf_event_mutex);
+	return err;
+}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 07/10] bpf: introduce BPF_RAW_TRACEPOINT
@ 2018-03-28  2:11   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

>From bpf program point of view the access to the arguments look like:
struct bpf_raw_tracepoint_args {
       __u64 args[0];
};

int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
  // program can read args[N] where N depends on tracepoint
  // and statically verified at program load+attach time
}

kprobe+bpf infrastructure allows programs access function arguments.
This feature allows programs access raw tracepoint arguments.

Similar to proposed 'dynamic ftrace events' there are no abi guarantees
to what the tracepoints arguments are and what their meaning is.
The program needs to type cast args properly and use bpf_probe_read()
helper to access struct fields when argument is a pointer.

For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
   0xffffffff81132080 <+0>:     mov    %ecx,%ecx
   0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
        TP_PROTO(const struct net_device *dev,
                 const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second
this is valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.

The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.

On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.

Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

__bpf_raw_tp_map section logic was contributed by Steven Rostedt

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/asm-generic/vmlinux.lds.h |  10 +++
 include/linux/bpf_types.h         |   1 +
 include/linux/trace_events.h      |  42 +++++++++
 include/linux/tracepoint-defs.h   |   5 ++
 include/trace/bpf_probe.h         |  91 +++++++++++++++++++
 include/trace/define_trace.h      |   1 +
 include/uapi/linux/bpf.h          |  11 +++
 kernel/bpf/syscall.c              |  78 ++++++++++++++++
 kernel/trace/bpf_trace.c          | 183 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 422 insertions(+)
 create mode 100644 include/trace/bpf_probe.h

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 1ab0e520d6fc..8add3493a202 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -178,6 +178,15 @@
 #define TRACE_SYSCALLS()
 #endif
 
+#ifdef CONFIG_BPF_EVENTS
+#define BPF_RAW_TP() STRUCT_ALIGN();					\
+			 VMLINUX_SYMBOL(__start__bpf_raw_tp) = .;	\
+			 KEEP(*(__bpf_raw_tp_map))			\
+			 VMLINUX_SYMBOL(__stop__bpf_raw_tp) = .;
+#else
+#define BPF_RAW_TP()
+#endif
+
 #ifdef CONFIG_SERIAL_EARLYCON
 #define EARLYCON_TABLE() STRUCT_ALIGN();			\
 			 VMLINUX_SYMBOL(__earlycon_table) = .;	\
@@ -249,6 +258,7 @@
 	LIKELY_PROFILE()		       				\
 	BRANCH_PROFILE()						\
 	TRACE_PRINTKS()							\
+	BPF_RAW_TP()							\
 	TRACEPOINT_STR()
 
 /*
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 5e2e8a49fb21..6d7243bfb0ff 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -19,6 +19,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
 BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
+BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 #endif
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a1442c4e513..b0357cd198b0 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -468,6 +468,9 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -487,6 +490,18 @@ perf_event_query_prog_array(struct perf_event *event, void __user *info)
 {
 	return -EOPNOTSUPP;
 }
+static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *p)
+{
+	return -EOPNOTSUPP;
+}
+static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *p)
+{
+	return -EOPNOTSUPP;
+}
+static inline struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name)
+{
+	return NULL;
+}
 #endif
 
 enum {
@@ -546,6 +561,33 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
+void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
+void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
+void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3);
+void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4);
+void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5);
+void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6);
+void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
+void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		    u64 arg8);
+void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		    u64 arg8, u64 arg9);
+void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		     u64 arg8, u64 arg9, u64 arg10);
+void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		     u64 arg8, u64 arg9, u64 arg10, u64 arg11);
+void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
+		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+		     u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
 void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
 			       struct trace_event_call *call, u64 count,
 			       struct pt_regs *regs, struct hlist_head *head,
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 39a283c61c51..35db8dd48c4c 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -36,4 +36,9 @@ struct tracepoint {
 	u32 num_args;
 };
 
+struct bpf_raw_event_map {
+	struct tracepoint	*tp;
+	void			*bpf_func;
+};
+
 #endif
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
new file mode 100644
index 000000000000..c374ed1f2eee
--- /dev/null
+++ b/include/trace/bpf_probe.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#undef TRACE_SYSTEM_VAR
+
+#ifdef CONFIG_BPF_EVENTS
+
+#undef __entry
+#define __entry entry
+
+#undef __get_dynamic_array
+#define __get_dynamic_array(field)	\
+		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
+
+#undef __get_dynamic_array_len
+#define __get_dynamic_array_len(field)	\
+		((__entry->__data_loc_##field >> 16) & 0xffff)
+
+#undef __get_str
+#define __get_str(field) ((char *)__get_dynamic_array(field))
+
+#undef __get_bitmask
+#define __get_bitmask(field) (char *)__get_dynamic_array(field)
+
+#undef __perf_count
+#define __perf_count(c)	(c)
+
+#undef __perf_task
+#define __perf_task(t)	(t)
+
+/* cast any integer, pointer, or small struct to u64 */
+#define UINTTYPE(size) \
+	__typeof__(__builtin_choose_expr(size == 1,  (u8)1, \
+		   __builtin_choose_expr(size == 2, (u16)2, \
+		   __builtin_choose_expr(size == 4, (u32)3, \
+		   __builtin_choose_expr(size == 8, (u64)4, \
+					 (void)5)))))
+#define __CAST_TO_U64(x) ({ \
+	typeof(x) __src = (x); \
+	UINTTYPE(sizeof(x)) __dst; \
+	memcpy(&__dst, &__src, sizeof(__dst)); \
+	(u64)__dst; })
+
+#define __CAST1(a,...) __CAST_TO_U64(a)
+#define __CAST2(a,...) __CAST_TO_U64(a), __CAST1(__VA_ARGS__)
+#define __CAST3(a,...) __CAST_TO_U64(a), __CAST2(__VA_ARGS__)
+#define __CAST4(a,...) __CAST_TO_U64(a), __CAST3(__VA_ARGS__)
+#define __CAST5(a,...) __CAST_TO_U64(a), __CAST4(__VA_ARGS__)
+#define __CAST6(a,...) __CAST_TO_U64(a), __CAST5(__VA_ARGS__)
+#define __CAST7(a,...) __CAST_TO_U64(a), __CAST6(__VA_ARGS__)
+#define __CAST8(a,...) __CAST_TO_U64(a), __CAST7(__VA_ARGS__)
+#define __CAST9(a,...) __CAST_TO_U64(a), __CAST8(__VA_ARGS__)
+#define __CAST10(a,...) __CAST_TO_U64(a), __CAST9(__VA_ARGS__)
+#define __CAST11(a,...) __CAST_TO_U64(a), __CAST10(__VA_ARGS__)
+#define __CAST12(a,...) __CAST_TO_U64(a), __CAST11(__VA_ARGS__)
+/* tracepoints with more than 12 arguments will hit build error */
+#define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
+
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+static notrace void							\
+__bpf_trace_##call(void *__data, proto)					\
+{									\
+	struct bpf_prog *prog = __data;					\
+	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+}
+
+/*
+ * This part is compiled out, it is only here as a build time check
+ * to make sure that if the tracepoint handling changes, the
+ * bpf probe will fail to compile unless it too is updated.
+ */
+#undef DEFINE_EVENT
+#define DEFINE_EVENT(template, call, proto, args)			\
+static inline void bpf_test_probe_##call(void)				\
+{									\
+	check_trace_callback_type_##call(__bpf_trace_##template);	\
+}									\
+static struct bpf_raw_event_map	__used					\
+	__attribute__((section("__bpf_raw_tp_map")))			\
+__bpf_trace_tp_map_##call = {						\
+	.tp		= &__tracepoint_##call,				\
+	.bpf_func	= (void *)__bpf_trace_##template,		\
+};
+
+
+#undef DEFINE_EVENT_PRINT
+#define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
+	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
+
+#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+#endif /* CONFIG_BPF_EVENTS */
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 96b22ace9ae7..5f8216bc261f 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -95,6 +95,7 @@
 #ifdef TRACEPOINTS_ENABLED
 #include <trace/trace_events.h>
 #include <trace/perf.h>
+#include <trace/bpf_probe.h>
 #endif
 
 #undef TRACE_EVENT
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 18b7c510c511..1878201c2d77 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_cmd {
 	BPF_MAP_GET_FD_BY_ID,
 	BPF_OBJ_GET_INFO_BY_FD,
 	BPF_PROG_QUERY,
+	BPF_RAW_TRACEPOINT_OPEN,
 };
 
 enum bpf_map_type {
@@ -134,6 +135,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
 	BPF_PROG_TYPE_SK_MSG,
+	BPF_PROG_TYPE_RAW_TRACEPOINT,
 };
 
 enum bpf_attach_type {
@@ -344,6 +346,11 @@ union bpf_attr {
 		__aligned_u64	prog_ids;
 		__u32		prog_cnt;
 	} query;
+
+	struct {
+		__u64 name;
+		__u32 prog_fd;
+	} raw_tracepoint;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -1152,4 +1159,8 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_raw_tracepoint_args {
+	__u64 args[0];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3aeb4ea2a93a..1894537dcada 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1311,6 +1311,81 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
+struct bpf_raw_tracepoint {
+	struct bpf_raw_event_map *btp;
+	struct bpf_prog *prog;
+};
+
+static int bpf_raw_tracepoint_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_raw_tracepoint *raw_tp = filp->private_data;
+
+	if (raw_tp->prog) {
+		bpf_probe_unregister(raw_tp->btp, raw_tp->prog);
+		bpf_prog_put(raw_tp->prog);
+	}
+	kfree(raw_tp);
+	return 0;
+}
+
+static const struct file_operations bpf_raw_tp_fops = {
+	.release	= bpf_raw_tracepoint_release,
+	.read		= bpf_dummy_read,
+	.write		= bpf_dummy_write,
+};
+
+#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
+
+static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
+{
+	struct bpf_raw_tracepoint *raw_tp;
+	struct bpf_raw_event_map *btp;
+	struct bpf_prog *prog;
+	char tp_name[128];
+	int tp_fd, err;
+
+	if (strncpy_from_user(tp_name, u64_to_user_ptr(attr->raw_tracepoint.name),
+			      sizeof(tp_name) - 1) < 0)
+		return -EFAULT;
+	tp_name[sizeof(tp_name) - 1] = 0;
+
+	btp = bpf_find_raw_tracepoint(tp_name);
+	if (!btp)
+		return -ENOENT;
+
+	raw_tp = kzalloc(sizeof(*raw_tp), GFP_USER);
+	if (!raw_tp)
+		return -ENOMEM;
+	raw_tp->btp = btp;
+
+	prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
+				 BPF_PROG_TYPE_RAW_TRACEPOINT);
+	if (IS_ERR(prog)) {
+		err = PTR_ERR(prog);
+		goto out_free_tp;
+	}
+
+	err = bpf_probe_register(raw_tp->btp, prog);
+	if (err)
+		goto out_put_prog;
+
+	raw_tp->prog = prog;
+	tp_fd = anon_inode_getfd("bpf-raw-tracepoint", &bpf_raw_tp_fops, raw_tp,
+				 O_CLOEXEC);
+	if (tp_fd < 0) {
+		bpf_probe_unregister(raw_tp->btp, prog);
+		err = tp_fd;
+		goto out_put_prog;
+	}
+	return tp_fd;
+
+out_put_prog:
+	bpf_prog_put(prog);
+out_free_tp:
+	kfree(raw_tp);
+	return err;
+}
+
 #ifdef CONFIG_CGROUP_BPF
 
 #define BPF_PROG_ATTACH_LAST_FIELD attach_flags
@@ -1921,6 +1996,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_OBJ_GET_INFO_BY_FD:
 		err = bpf_obj_get_info_by_fd(&attr, uattr);
 		break;
+	case BPF_RAW_TRACEPOINT_OPEN:
+		err = bpf_raw_tracepoint_open(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c634e093951f..7d0009e712f5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -723,6 +723,86 @@ const struct bpf_verifier_ops tracepoint_verifier_ops = {
 const struct bpf_prog_ops tracepoint_prog_ops = {
 };
 
+/*
+ * bpf_raw_tp_regs are separate from bpf_pt_regs used from skb/xdp
+ * to avoid potential recursive reuse issue when/if tracepoints are added
+ * inside bpf_*_event_output and/or bpf_get_stack_id
+ */
+static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs);
+BPF_CALL_5(bpf_perf_event_output_raw_tp, struct bpf_raw_tracepoint_args *, args,
+	   struct bpf_map *, map, u64, flags, void *, data, u64, size)
+{
+	struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+	perf_fetch_caller_regs(regs);
+	return ____bpf_perf_event_output(regs, map, flags, data, size);
+}
+
+static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
+	.func		= bpf_perf_event_output_raw_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
+BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
+	   struct bpf_map *, map, u64, flags)
+{
+	struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+	perf_fetch_caller_regs(regs);
+	/* similar to bpf_perf_event_output_tp, but pt_regs fetched differently */
+	return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
+			       flags, 0, 0);
+}
+
+static const struct bpf_func_proto bpf_get_stackid_proto_raw_tp = {
+	.func		= bpf_get_stackid_raw_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *raw_tp_prog_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_perf_event_output:
+		return &bpf_perf_event_output_proto_raw_tp;
+	case BPF_FUNC_get_stackid:
+		return &bpf_get_stackid_proto_raw_tp;
+	default:
+		return tracing_func_proto(func_id);
+	}
+}
+
+static bool raw_tp_prog_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					struct bpf_insn_access_aux *info)
+{
+	/* largest tracepoint in the kernel has 12 args */
+	if (off < 0 || off >= sizeof(__u64) * 12)
+		return false;
+	if (type != BPF_READ)
+		return false;
+	if (off % size != 0)
+		return false;
+	return true;
+}
+
+const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
+	.get_func_proto  = raw_tp_prog_func_proto,
+	.is_valid_access = raw_tp_prog_is_valid_access,
+};
+
+const struct bpf_prog_ops raw_tracepoint_prog_ops = {
+};
+
 static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
 				    struct bpf_insn_access_aux *info)
 {
@@ -896,3 +976,106 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
 
 	return ret;
 }
+
+extern struct bpf_raw_event_map __start__bpf_raw_tp[];
+extern struct bpf_raw_event_map __stop__bpf_raw_tp[];
+
+struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name)
+{
+	struct bpf_raw_event_map *btp = __start__bpf_raw_tp;
+
+	for (; btp < __stop__bpf_raw_tp; btp++) {
+		if (!strcmp(btp->tp->name, name))
+			return btp;
+	}
+	return NULL;
+}
+
+static __always_inline
+void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
+{
+	rcu_read_lock();
+	preempt_disable();
+	(void) BPF_PROG_RUN(prog, args);
+	preempt_enable();
+	rcu_read_unlock();
+}
+
+#define UNPACK(...)			__VA_ARGS__
+#define REPEAT_1(FN, DL, X, ...)	FN(X)
+#define REPEAT_2(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
+#define REPEAT_3(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
+#define REPEAT_4(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
+#define REPEAT_5(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
+#define REPEAT_6(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
+#define REPEAT_7(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
+#define REPEAT_8(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
+#define REPEAT_9(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
+#define REPEAT_10(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_9(FN, DL, __VA_ARGS__)
+#define REPEAT_11(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_10(FN, DL, __VA_ARGS__)
+#define REPEAT_12(FN, DL, X, ...)	FN(X) UNPACK DL REPEAT_11(FN, DL, __VA_ARGS__)
+#define REPEAT(X, FN, DL, ...)		REPEAT_##X(FN, DL, __VA_ARGS__)
+
+#define SARG(X)		u64 arg##X
+#define COPY(X)		args[X] = arg##X
+
+#define __DL_COM	(,)
+#define __DL_SEM	(;)
+
+#define __SEQ_0_11	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
+
+#define BPF_TRACE_DEFN_x(x)						\
+	void bpf_trace_run##x(struct bpf_prog *prog,			\
+			      REPEAT(x, SARG, __DL_COM, __SEQ_0_11))	\
+	{								\
+		u64 args[x];						\
+		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
+		__bpf_trace_run(prog, args);				\
+	}								\
+	EXPORT_SYMBOL_GPL(bpf_trace_run##x)
+BPF_TRACE_DEFN_x(1);
+BPF_TRACE_DEFN_x(2);
+BPF_TRACE_DEFN_x(3);
+BPF_TRACE_DEFN_x(4);
+BPF_TRACE_DEFN_x(5);
+BPF_TRACE_DEFN_x(6);
+BPF_TRACE_DEFN_x(7);
+BPF_TRACE_DEFN_x(8);
+BPF_TRACE_DEFN_x(9);
+BPF_TRACE_DEFN_x(10);
+BPF_TRACE_DEFN_x(11);
+BPF_TRACE_DEFN_x(12);
+
+static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+{
+	struct tracepoint *tp = btp->tp;
+
+	/*
+	 * check that program doesn't access arguments beyond what's
+	 * available in this tracepoint
+	 */
+	if (prog->aux->max_ctx_offset > tp->num_args * sizeof(u64))
+		return -EINVAL;
+
+	return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog);
+}
+
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+{
+	int err;
+
+	mutex_lock(&bpf_event_mutex);
+	err = __bpf_probe_register(btp, prog);
+	mutex_unlock(&bpf_event_mutex);
+	return err;
+}
+
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+{
+	int err;
+
+	mutex_lock(&bpf_event_mutex);
+	err = tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
+	mutex_unlock(&bpf_event_mutex);
+	return err;
+}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 08/10] libbpf: add bpf_raw_tracepoint_open helper
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:11   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

add bpf_raw_tracepoint_open(const char *name, int prog_fd) api to libbpf

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/include/uapi/linux/bpf.h | 11 +++++++++++
 tools/lib/bpf/bpf.c            | 11 +++++++++++
 tools/lib/bpf/bpf.h            |  1 +
 3 files changed, 23 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d245c41213ac..58060bec999d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_cmd {
 	BPF_MAP_GET_FD_BY_ID,
 	BPF_OBJ_GET_INFO_BY_FD,
 	BPF_PROG_QUERY,
+	BPF_RAW_TRACEPOINT_OPEN,
 };
 
 enum bpf_map_type {
@@ -134,6 +135,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
 	BPF_PROG_TYPE_SK_MSG,
+	BPF_PROG_TYPE_RAW_TRACEPOINT,
 };
 
 enum bpf_attach_type {
@@ -344,6 +346,11 @@ union bpf_attr {
 		__aligned_u64	prog_ids;
 		__u32		prog_cnt;
 	} query;
+
+	struct {
+		__u64 name;
+		__u32 prog_fd;
+	} raw_tracepoint;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -1151,4 +1158,8 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_raw_tracepoint_args {
+	__u64 args[0];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 592a58a2b681..e0500055f1a6 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -428,6 +428,17 @@ int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
 	return err;
 }
 
+int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+{
+	union bpf_attr attr;
+
+	bzero(&attr, sizeof(attr));
+	attr.raw_tracepoint.name = ptr_to_u64(name);
+	attr.raw_tracepoint.prog_fd = prog_fd;
+
+	return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
+}
+
 int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 {
 	struct sockaddr_nl sa;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 8d18fb73d7fb..ee59342c6f42 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -79,4 +79,5 @@ int bpf_map_get_fd_by_id(__u32 id);
 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
 		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt);
+int bpf_raw_tracepoint_open(const char *name, int prog_fd);
 #endif
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 08/10] libbpf: add bpf_raw_tracepoint_open helper
@ 2018-03-28  2:11   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

add bpf_raw_tracepoint_open(const char *name, int prog_fd) api to libbpf

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/include/uapi/linux/bpf.h | 11 +++++++++++
 tools/lib/bpf/bpf.c            | 11 +++++++++++
 tools/lib/bpf/bpf.h            |  1 +
 3 files changed, 23 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d245c41213ac..58060bec999d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_cmd {
 	BPF_MAP_GET_FD_BY_ID,
 	BPF_OBJ_GET_INFO_BY_FD,
 	BPF_PROG_QUERY,
+	BPF_RAW_TRACEPOINT_OPEN,
 };
 
 enum bpf_map_type {
@@ -134,6 +135,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_SKB,
 	BPF_PROG_TYPE_CGROUP_DEVICE,
 	BPF_PROG_TYPE_SK_MSG,
+	BPF_PROG_TYPE_RAW_TRACEPOINT,
 };
 
 enum bpf_attach_type {
@@ -344,6 +346,11 @@ union bpf_attr {
 		__aligned_u64	prog_ids;
 		__u32		prog_cnt;
 	} query;
+
+	struct {
+		__u64 name;
+		__u32 prog_fd;
+	} raw_tracepoint;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -1151,4 +1158,8 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_raw_tracepoint_args {
+	__u64 args[0];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 592a58a2b681..e0500055f1a6 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -428,6 +428,17 @@ int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
 	return err;
 }
 
+int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+{
+	union bpf_attr attr;
+
+	bzero(&attr, sizeof(attr));
+	attr.raw_tracepoint.name = ptr_to_u64(name);
+	attr.raw_tracepoint.prog_fd = prog_fd;
+
+	return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
+}
+
 int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 {
 	struct sockaddr_nl sa;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 8d18fb73d7fb..ee59342c6f42 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -79,4 +79,5 @@ int bpf_map_get_fd_by_id(__u32 id);
 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
 		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt);
+int bpf_raw_tracepoint_open(const char *name, int prog_fd);
 #endif
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 09/10] samples/bpf: raw tracepoint test
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:11   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

add empty raw_tracepoint bpf program to test overhead similar
to kprobe and traditional tracepoint tests

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile                    |  1 +
 samples/bpf/bpf_load.c                  | 14 ++++++++++++++
 samples/bpf/test_overhead_raw_tp_kern.c | 17 +++++++++++++++++
 samples/bpf/test_overhead_user.c        | 12 ++++++++++++
 4 files changed, 44 insertions(+)
 create mode 100644 samples/bpf/test_overhead_raw_tp_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 2c2a587e0942..4d6a6edd4bf6 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -119,6 +119,7 @@ always += offwaketime_kern.o
 always += spintest_kern.o
 always += map_perf_test_kern.o
 always += test_overhead_tp_kern.o
+always += test_overhead_raw_tp_kern.o
 always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
 always += test_cgrp2_tc_kern.o
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index b1a310c3ae89..bebe4188b4b3 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -61,6 +61,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
 	bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
 	bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
+	bool is_raw_tracepoint = strncmp(event, "raw_tracepoint/", 15) == 0;
 	bool is_xdp = strncmp(event, "xdp", 3) == 0;
 	bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
 	bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
@@ -85,6 +86,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_KPROBE;
 	} else if (is_tracepoint) {
 		prog_type = BPF_PROG_TYPE_TRACEPOINT;
+	} else if (is_raw_tracepoint) {
+		prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT;
 	} else if (is_xdp) {
 		prog_type = BPF_PROG_TYPE_XDP;
 	} else if (is_perf_event) {
@@ -131,6 +134,16 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		return populate_prog_array(event, fd);
 	}
 
+	if (is_raw_tracepoint) {
+		efd = bpf_raw_tracepoint_open(event + 15, fd);
+		if (efd < 0) {
+			printf("tracepoint %s %s\n", event + 15, strerror(errno));
+			return -1;
+		}
+		event_fd[prog_cnt - 1] = efd;
+		return 0;
+	}
+
 	if (is_kprobe || is_kretprobe) {
 		if (is_kprobe)
 			event += 7;
@@ -587,6 +600,7 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		if (memcmp(shname, "kprobe/", 7) == 0 ||
 		    memcmp(shname, "kretprobe/", 10) == 0 ||
 		    memcmp(shname, "tracepoint/", 11) == 0 ||
+		    memcmp(shname, "raw_tracepoint/", 15) == 0 ||
 		    memcmp(shname, "xdp", 3) == 0 ||
 		    memcmp(shname, "perf_event", 10) == 0 ||
 		    memcmp(shname, "socket", 6) == 0 ||
diff --git a/samples/bpf/test_overhead_raw_tp_kern.c b/samples/bpf/test_overhead_raw_tp_kern.c
new file mode 100644
index 000000000000..d2af8bc1c805
--- /dev/null
+++ b/samples/bpf/test_overhead_raw_tp_kern.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+SEC("raw_tracepoint/task_rename")
+int prog(struct bpf_raw_tracepoint_args *ctx)
+{
+	return 0;
+}
+
+SEC("raw_tracepoint/urandom_read")
+int prog2(struct bpf_raw_tracepoint_args *ctx)
+{
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_overhead_user.c b/samples/bpf/test_overhead_user.c
index d291167fd3c7..e1d35e07a10e 100644
--- a/samples/bpf/test_overhead_user.c
+++ b/samples/bpf/test_overhead_user.c
@@ -158,5 +158,17 @@ int main(int argc, char **argv)
 		unload_progs();
 	}
 
+	if (test_flags & 0xC0) {
+		snprintf(filename, sizeof(filename),
+			 "%s_raw_tp_kern.o", argv[0]);
+		if (load_bpf_file(filename)) {
+			printf("%s", bpf_log_buf);
+			return 1;
+		}
+		printf("w/RAW_TRACEPOINT\n");
+		run_perf_test(num_cpu, test_flags >> 6);
+		unload_progs();
+	}
+
 	return 0;
 }
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 09/10] samples/bpf: raw tracepoint test
@ 2018-03-28  2:11   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

add empty raw_tracepoint bpf program to test overhead similar
to kprobe and traditional tracepoint tests

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile                    |  1 +
 samples/bpf/bpf_load.c                  | 14 ++++++++++++++
 samples/bpf/test_overhead_raw_tp_kern.c | 17 +++++++++++++++++
 samples/bpf/test_overhead_user.c        | 12 ++++++++++++
 4 files changed, 44 insertions(+)
 create mode 100644 samples/bpf/test_overhead_raw_tp_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 2c2a587e0942..4d6a6edd4bf6 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -119,6 +119,7 @@ always += offwaketime_kern.o
 always += spintest_kern.o
 always += map_perf_test_kern.o
 always += test_overhead_tp_kern.o
+always += test_overhead_raw_tp_kern.o
 always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
 always += test_cgrp2_tc_kern.o
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index b1a310c3ae89..bebe4188b4b3 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -61,6 +61,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 	bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
 	bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
 	bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
+	bool is_raw_tracepoint = strncmp(event, "raw_tracepoint/", 15) == 0;
 	bool is_xdp = strncmp(event, "xdp", 3) == 0;
 	bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
 	bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
@@ -85,6 +86,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		prog_type = BPF_PROG_TYPE_KPROBE;
 	} else if (is_tracepoint) {
 		prog_type = BPF_PROG_TYPE_TRACEPOINT;
+	} else if (is_raw_tracepoint) {
+		prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT;
 	} else if (is_xdp) {
 		prog_type = BPF_PROG_TYPE_XDP;
 	} else if (is_perf_event) {
@@ -131,6 +134,16 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		return populate_prog_array(event, fd);
 	}
 
+	if (is_raw_tracepoint) {
+		efd = bpf_raw_tracepoint_open(event + 15, fd);
+		if (efd < 0) {
+			printf("tracepoint %s %s\n", event + 15, strerror(errno));
+			return -1;
+		}
+		event_fd[prog_cnt - 1] = efd;
+		return 0;
+	}
+
 	if (is_kprobe || is_kretprobe) {
 		if (is_kprobe)
 			event += 7;
@@ -587,6 +600,7 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
 		if (memcmp(shname, "kprobe/", 7) == 0 ||
 		    memcmp(shname, "kretprobe/", 10) == 0 ||
 		    memcmp(shname, "tracepoint/", 11) == 0 ||
+		    memcmp(shname, "raw_tracepoint/", 15) == 0 ||
 		    memcmp(shname, "xdp", 3) == 0 ||
 		    memcmp(shname, "perf_event", 10) == 0 ||
 		    memcmp(shname, "socket", 6) == 0 ||
diff --git a/samples/bpf/test_overhead_raw_tp_kern.c b/samples/bpf/test_overhead_raw_tp_kern.c
new file mode 100644
index 000000000000..d2af8bc1c805
--- /dev/null
+++ b/samples/bpf/test_overhead_raw_tp_kern.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+SEC("raw_tracepoint/task_rename")
+int prog(struct bpf_raw_tracepoint_args *ctx)
+{
+	return 0;
+}
+
+SEC("raw_tracepoint/urandom_read")
+int prog2(struct bpf_raw_tracepoint_args *ctx)
+{
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_overhead_user.c b/samples/bpf/test_overhead_user.c
index d291167fd3c7..e1d35e07a10e 100644
--- a/samples/bpf/test_overhead_user.c
+++ b/samples/bpf/test_overhead_user.c
@@ -158,5 +158,17 @@ int main(int argc, char **argv)
 		unload_progs();
 	}
 
+	if (test_flags & 0xC0) {
+		snprintf(filename, sizeof(filename),
+			 "%s_raw_tp_kern.o", argv[0]);
+		if (load_bpf_file(filename)) {
+			printf("%s", bpf_log_buf);
+			return 1;
+		}
+		printf("w/RAW_TRACEPOINT\n");
+		run_perf_test(num_cpu, test_flags >> 6);
+		unload_progs();
+	}
+
 	return 0;
 }
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 10/10] selftests/bpf: test for bpf_get_stackid() from raw tracepoints
  2018-03-28  2:10 ` Alexei Starovoitov
@ 2018-03-28  2:11   ` Alexei Starovoitov
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

similar to traditional traceopint test add bpf_get_stackid() test
from raw tracepoints
and reduce verbosity of existing stackmap test

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/test_progs.c | 91 ++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index e9df48b306df..faadbe233966 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -877,7 +877,7 @@ static void test_stacktrace_map()
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
 	if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
-		goto out;
+		return;
 
 	/* Get the ID for the sched/sched_switch tracepoint */
 	snprintf(buf, sizeof(buf),
@@ -888,8 +888,7 @@ static void test_stacktrace_map()
 
 	bytes = read(efd, buf, sizeof(buf));
 	close(efd);
-	if (CHECK(bytes <= 0 || bytes >= sizeof(buf),
-		  "read", "bytes %d errno %d\n", bytes, errno))
+	if (bytes <= 0 || bytes >= sizeof(buf))
 		goto close_prog;
 
 	/* Open the perf event and attach bpf progrram */
@@ -906,29 +905,24 @@ static void test_stacktrace_map()
 		goto close_prog;
 
 	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
-	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
-		  err, errno))
-		goto close_pmu;
+	if (err)
+		goto disable_pmu;
 
 	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
-	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
-		  err, errno))
+	if (err)
 		goto disable_pmu;
 
 	/* find map fds */
 	control_map_fd = bpf_find_map(__func__, obj, "control_map");
-	if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
-		  "err %d errno %d\n", err, errno))
+	if (control_map_fd < 0)
 		goto disable_pmu;
 
 	stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
-	if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
-		  "err %d errno %d\n", err, errno))
+	if (stackid_hmap_fd < 0)
 		goto disable_pmu;
 
 	stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
-	if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
-		  err, errno))
+	if (stackmap_fd < 0)
 		goto disable_pmu;
 
 	/* give some time for bpf program run */
@@ -945,24 +939,78 @@ static void test_stacktrace_map()
 	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
 	if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
 		  "err %d errno %d\n", err, errno))
-		goto disable_pmu;
+		goto disable_pmu_noerr;
 
 	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
 	if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
 		  "err %d errno %d\n", err, errno))
-		; /* fall through */
+		goto disable_pmu_noerr;
 
+	goto disable_pmu_noerr;
 disable_pmu:
+	error_cnt++;
+disable_pmu_noerr:
 	ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
-
-close_pmu:
 	close(pmu_fd);
-
 close_prog:
 	bpf_object__close(obj);
+}
 
-out:
-	return;
+static void test_stacktrace_map_raw_tp()
+{
+	int control_map_fd, stackid_hmap_fd, stackmap_fd;
+	const char *file = "./test_stacktrace_map.o";
+	int efd, err, prog_fd;
+	__u32 key, val, duration = 0;
+	struct bpf_object *obj;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
+		return;
+
+	efd = bpf_raw_tracepoint_open("sched_switch", prog_fd);
+	if (CHECK(efd < 0, "raw_tp_open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+
+	/* find map fds */
+	control_map_fd = bpf_find_map(__func__, obj, "control_map");
+	if (control_map_fd < 0)
+		goto close_prog;
+
+	stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
+	if (stackid_hmap_fd < 0)
+		goto close_prog;
+
+	stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
+	if (stackmap_fd < 0)
+		goto close_prog;
+
+	/* give some time for bpf program run */
+	sleep(1);
+
+	/* disable stack trace collection */
+	key = 0;
+	val = 1;
+	bpf_map_update_elem(control_map_fd, &key, &val, 0);
+
+	/* for every element in stackid_hmap, we can find a corresponding one
+	 * in stackmap, and vise versa.
+	 */
+	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
+	if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
+		  "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
+	if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
+		  "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	goto close_prog_noerr;
+close_prog:
+	error_cnt++;
+close_prog_noerr:
+	bpf_object__close(obj);
 }
 
 static int extract_build_id(char *build_id, size_t size)
@@ -1138,6 +1186,7 @@ int main(void)
 	test_tp_attach_query();
 	test_stacktrace_map();
 	test_stacktrace_build_id();
+	test_stacktrace_map_raw_tp();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v7 bpf-next 10/10] selftests/bpf: test for bpf_get_stackid() from raw tracepoints
@ 2018-03-28  2:11   ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28  2:11 UTC (permalink / raw)
  To: davem
  Cc: daniel, torvalds, peterz, rostedt, mathieu.desnoyers, netdev,
	kernel-team, linux-api

From: Alexei Starovoitov <ast@kernel.org>

similar to traditional traceopint test add bpf_get_stackid() test
from raw tracepoints
and reduce verbosity of existing stackmap test

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/test_progs.c | 91 ++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index e9df48b306df..faadbe233966 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -877,7 +877,7 @@ static void test_stacktrace_map()
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
 	if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
-		goto out;
+		return;
 
 	/* Get the ID for the sched/sched_switch tracepoint */
 	snprintf(buf, sizeof(buf),
@@ -888,8 +888,7 @@ static void test_stacktrace_map()
 
 	bytes = read(efd, buf, sizeof(buf));
 	close(efd);
-	if (CHECK(bytes <= 0 || bytes >= sizeof(buf),
-		  "read", "bytes %d errno %d\n", bytes, errno))
+	if (bytes <= 0 || bytes >= sizeof(buf))
 		goto close_prog;
 
 	/* Open the perf event and attach bpf progrram */
@@ -906,29 +905,24 @@ static void test_stacktrace_map()
 		goto close_prog;
 
 	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
-	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
-		  err, errno))
-		goto close_pmu;
+	if (err)
+		goto disable_pmu;
 
 	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
-	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
-		  err, errno))
+	if (err)
 		goto disable_pmu;
 
 	/* find map fds */
 	control_map_fd = bpf_find_map(__func__, obj, "control_map");
-	if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
-		  "err %d errno %d\n", err, errno))
+	if (control_map_fd < 0)
 		goto disable_pmu;
 
 	stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
-	if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
-		  "err %d errno %d\n", err, errno))
+	if (stackid_hmap_fd < 0)
 		goto disable_pmu;
 
 	stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
-	if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
-		  err, errno))
+	if (stackmap_fd < 0)
 		goto disable_pmu;
 
 	/* give some time for bpf program run */
@@ -945,24 +939,78 @@ static void test_stacktrace_map()
 	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
 	if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
 		  "err %d errno %d\n", err, errno))
-		goto disable_pmu;
+		goto disable_pmu_noerr;
 
 	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
 	if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
 		  "err %d errno %d\n", err, errno))
-		; /* fall through */
+		goto disable_pmu_noerr;
 
+	goto disable_pmu_noerr;
 disable_pmu:
+	error_cnt++;
+disable_pmu_noerr:
 	ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
-
-close_pmu:
 	close(pmu_fd);
-
 close_prog:
 	bpf_object__close(obj);
+}
 
-out:
-	return;
+static void test_stacktrace_map_raw_tp()
+{
+	int control_map_fd, stackid_hmap_fd, stackmap_fd;
+	const char *file = "./test_stacktrace_map.o";
+	int efd, err, prog_fd;
+	__u32 key, val, duration = 0;
+	struct bpf_object *obj;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
+		return;
+
+	efd = bpf_raw_tracepoint_open("sched_switch", prog_fd);
+	if (CHECK(efd < 0, "raw_tp_open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+
+	/* find map fds */
+	control_map_fd = bpf_find_map(__func__, obj, "control_map");
+	if (control_map_fd < 0)
+		goto close_prog;
+
+	stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
+	if (stackid_hmap_fd < 0)
+		goto close_prog;
+
+	stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
+	if (stackmap_fd < 0)
+		goto close_prog;
+
+	/* give some time for bpf program run */
+	sleep(1);
+
+	/* disable stack trace collection */
+	key = 0;
+	val = 1;
+	bpf_map_update_elem(control_map_fd, &key, &val, 0);
+
+	/* for every element in stackid_hmap, we can find a corresponding one
+	 * in stackmap, and vise versa.
+	 */
+	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
+	if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
+		  "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
+	if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
+		  "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	goto close_prog_noerr;
+close_prog:
+	error_cnt++;
+close_prog_noerr:
+	bpf_object__close(obj);
 }
 
 static int extract_build_id(char *build_id, size_t size)
@@ -1138,6 +1186,7 @@ int main(void)
 	test_tp_attach_query();
 	test_stacktrace_map();
 	test_stacktrace_build_id();
+	test_stacktrace_map_raw_tp();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28  2:11   ` Alexei Starovoitov
  (?)
@ 2018-03-28 13:49   ` Mathieu Desnoyers
  2018-03-28 16:43     ` Alexei Starovoitov
  -1 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2018-03-28 13:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Linus Torvalds, Peter Zijlstra,
	rostedt, netdev, kernel-team, linux-api

----- On Mar 27, 2018, at 10:11 PM, Alexei Starovoitov ast@fb.com wrote:

> From: Alexei Starovoitov <ast@kernel.org>
> 
> compute number of arguments passed into tracepoint
> at compile time and store it as part of 'struct tracepoint'.
> The number is necessary to check safety of bpf program access that
> is coming in subsequent patch.


Hi Alexei,

Given that only eBPF needs this parameter count, we can move
it to the struct bpf_raw_event_map newly introduced by Steven,
right ? This would reduce bloat of struct tracepoint. For instance,
we don't need to keep this count around when eBPF is configured
out.

Thanks,

Mathieu

> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> include/linux/tracepoint-defs.h |  1 +
> include/linux/tracepoint.h      | 12 ++++++------
> include/trace/define_trace.h    | 14 +++++++-------
> 3 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> index 64ed7064f1fa..39a283c61c51 100644
> --- a/include/linux/tracepoint-defs.h
> +++ b/include/linux/tracepoint-defs.h
> @@ -33,6 +33,7 @@ struct tracepoint {
> 	int (*regfunc)(void);
> 	void (*unregfunc)(void);
> 	struct tracepoint_func __rcu *funcs;
> +	u32 num_args;
> };
> 
> #endif
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..c92f4adbc0d7 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -230,18 +230,18 @@ extern void syscall_unregfunc(void);
>  * structures, so we create an array of pointers that will be used for iteration
>  * on the tracepoints.
>  */
> -#define DEFINE_TRACE_FN(name, reg, unreg)				 \
> +#define DEFINE_TRACE_FN(name, reg, unreg, num_args)			 \
> 	static const char __tpstrtab_##name[]				 \
> 	__attribute__((section("__tracepoints_strings"))) = #name;	 \
> 	struct tracepoint __tracepoint_##name				 \
> 	__attribute__((section("__tracepoints"))) =			 \
> -		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\
> +		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL, num_args };\
> 	static struct tracepoint * const __tracepoint_ptr_##name __used	 \
> 	__attribute__((section("__tracepoints_ptrs"))) =		 \
> 		&__tracepoint_##name;
> 
> -#define DEFINE_TRACE(name)						\
> -	DEFINE_TRACE_FN(name, NULL, NULL);
> +#define DEFINE_TRACE(name, num_args)					\
> +	DEFINE_TRACE_FN(name, NULL, NULL, num_args);
> 
> #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)				\
> 	EXPORT_SYMBOL_GPL(__tracepoint_##name)
> @@ -275,8 +275,8 @@ extern void syscall_unregfunc(void);
> 		return false;						\
> 	}
> 
> -#define DEFINE_TRACE_FN(name, reg, unreg)
> -#define DEFINE_TRACE(name)
> +#define DEFINE_TRACE_FN(name, reg, unreg, num_args)
> +#define DEFINE_TRACE(name, num_args)
> #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
> #define EXPORT_TRACEPOINT_SYMBOL(name)
> 
> diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
> index d9e3d4aa3f6e..96b22ace9ae7 100644
> --- a/include/trace/define_trace.h
> +++ b/include/trace/define_trace.h
> @@ -25,7 +25,7 @@
> 
> #undef TRACE_EVENT
> #define TRACE_EVENT(name, proto, args, tstruct, assign, print)	\
> -	DEFINE_TRACE(name)
> +	DEFINE_TRACE(name, COUNT_ARGS(args))
> 
> #undef TRACE_EVENT_CONDITION
> #define TRACE_EVENT_CONDITION(name, proto, args, cond, tstruct, assign, print) \
> @@ -39,24 +39,24 @@
> #undef TRACE_EVENT_FN
> #define TRACE_EVENT_FN(name, proto, args, tstruct,		\
> 		assign, print, reg, unreg)			\
> -	DEFINE_TRACE_FN(name, reg, unreg)
> +	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
> 
> #undef TRACE_EVENT_FN_COND
> #define TRACE_EVENT_FN_COND(name, proto, args, cond, tstruct,		\
> 		assign, print, reg, unreg)			\
> -	DEFINE_TRACE_FN(name, reg, unreg)
> +	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
> 
> #undef DEFINE_EVENT
> #define DEFINE_EVENT(template, name, proto, args) \
> -	DEFINE_TRACE(name)
> +	DEFINE_TRACE(name, COUNT_ARGS(args))
> 
> #undef DEFINE_EVENT_FN
> #define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \
> -	DEFINE_TRACE_FN(name, reg, unreg)
> +	DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
> 
> #undef DEFINE_EVENT_PRINT
> #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
> -	DEFINE_TRACE(name)
> +	DEFINE_TRACE(name, COUNT_ARGS(args))
> 
> #undef DEFINE_EVENT_CONDITION
> #define DEFINE_EVENT_CONDITION(template, name, proto, args, cond) \
> @@ -64,7 +64,7 @@
> 
> #undef DECLARE_TRACE
> #define DECLARE_TRACE(name, proto, args)	\
> -	DEFINE_TRACE(name)
> +	DEFINE_TRACE(name, COUNT_ARGS(args))
> 
> #undef TRACE_INCLUDE
> #undef __TRACE_INCLUDE
> --
> 2.9.5

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 13:49   ` Mathieu Desnoyers
@ 2018-03-28 16:43     ` Alexei Starovoitov
  2018-03-28 17:04       ` Steven Rostedt
  2018-03-28 17:14       ` Mathieu Desnoyers
  0 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28 16:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: David S. Miller, Daniel Borkmann, Linus Torvalds, Peter Zijlstra,
	rostedt, netdev, kernel-team, linux-api

On 3/28/18 6:49 AM, Mathieu Desnoyers wrote:
> ----- On Mar 27, 2018, at 10:11 PM, Alexei Starovoitov ast@fb.com wrote:
>
>> From: Alexei Starovoitov <ast@kernel.org>
>>
>> compute number of arguments passed into tracepoint
>> at compile time and store it as part of 'struct tracepoint'.
>> The number is necessary to check safety of bpf program access that
>> is coming in subsequent patch.
>
>
> Hi Alexei,
>
> Given that only eBPF needs this parameter count, we can move
> it to the struct bpf_raw_event_map newly introduced by Steven,
> right ? This would reduce bloat of struct tracepoint. For instance,
> we don't need to keep this count around when eBPF is configured
> out.

Makes sense. That is indeed cleaner. Will respin.

What I don't like though is 'bloat' argument.
'u32 num_args' padded to 8-byte takes exactly the same amount
of space in 'struct tracepoint' and in 'struct bpf_raw_event_map'
The number of these structures is the same as well
and chances that tracepoints are on while bpf is off are slim.
More so few emails ago you said:
"I'm perfectly fine with adding the "num_args" stuff. I think it's
really useful. It's only the for_each_kernel_tracepoint change for
which I'm trying to understand the rationale."

I'm guessing now you found out that num_args is not useful to lttng
and it bloats its data structures?
It's ok to change an opinion and I'm completely fine using that as
a real reason.
Will repsin, as I said. No problem.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 16:43     ` Alexei Starovoitov
@ 2018-03-28 17:04       ` Steven Rostedt
  2018-03-28 17:10         ` Alexei Starovoitov
  2018-03-28 17:14       ` Mathieu Desnoyers
  1 sibling, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 17:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mathieu Desnoyers, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On Wed, 28 Mar 2018 09:43:56 -0700
Alexei Starovoitov <ast@fb.com> wrote:
> >
> > Given that only eBPF needs this parameter count, we can move
> > it to the struct bpf_raw_event_map newly introduced by Steven,
> > right ? This would reduce bloat of struct tracepoint. For instance,
> > we don't need to keep this count around when eBPF is configured
> > out.  
> 
> Makes sense. That is indeed cleaner. Will respin.
> 
> What I don't like though is 'bloat' argument.
> 'u32 num_args' padded to 8-byte takes exactly the same amount
> of space in 'struct tracepoint' and in 'struct bpf_raw_event_map'
> The number of these structures is the same as well
> and chances that tracepoints are on while bpf is off are slim.
> More so few emails ago you said:
> "I'm perfectly fine with adding the "num_args" stuff. I think it's
> really useful. It's only the for_each_kernel_tracepoint change for
> which I'm trying to understand the rationale."

I don't really care which one it goes in. The padding bloat is the same
for both :-/  But I wonder if we can shrink it by doing a trick that
Josh did in one of his patches. That is, to use a 32bit offset instead
of a direct pointer. Since you are only accessing core kernel
tracepoints.

Thus, we could have

struct bpf_raw_event_map {
	u32			tp_offset;
	u32			num_args;
	void			*bpf_func;
};

and have:

	u64 tp_offset = (u64)tp - (u64)_sdata;

	if (WARN_ON(tp_offset > UINT_MAX)
		return -EINVAL;

	 btp->tp_offset = (u32)tp_offset;

And to get the tp, all you need to do is:

	tp = (struct tracepoint *)(btp->tp_offset + (unsigned long)_sdata);

I've been thinking of doing this for other parts of the tracepoints and
ftrace code.

BTW, thanks for changing your code. I really appreciate it.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 17:04       ` Steven Rostedt
@ 2018-03-28 17:10         ` Alexei Starovoitov
  2018-03-28 17:38           ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28 17:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On 3/28/18 10:04 AM, Steven Rostedt wrote:
> On Wed, 28 Mar 2018 09:43:56 -0700
> Alexei Starovoitov <ast@fb.com> wrote:
>>>
>>> Given that only eBPF needs this parameter count, we can move
>>> it to the struct bpf_raw_event_map newly introduced by Steven,
>>> right ? This would reduce bloat of struct tracepoint. For instance,
>>> we don't need to keep this count around when eBPF is configured
>>> out.
>>
>> Makes sense. That is indeed cleaner. Will respin.
>>
>> What I don't like though is 'bloat' argument.
>> 'u32 num_args' padded to 8-byte takes exactly the same amount
>> of space in 'struct tracepoint' and in 'struct bpf_raw_event_map'
>> The number of these structures is the same as well
>> and chances that tracepoints are on while bpf is off are slim.
>> More so few emails ago you said:
>> "I'm perfectly fine with adding the "num_args" stuff. I think it's
>> really useful. It's only the for_each_kernel_tracepoint change for
>> which I'm trying to understand the rationale."
>
> I don't really care which one it goes in. The padding bloat is the same
> for both :-/  But I wonder if we can shrink it by doing a trick that
> Josh did in one of his patches. That is, to use a 32bit offset instead
> of a direct pointer. Since you are only accessing core kernel
> tracepoints.
>
> Thus, we could have
>
> struct bpf_raw_event_map {
> 	u32			tp_offset;
> 	u32			num_args;
> 	void			*bpf_func;
> };
>
> and have:
>
> 	u64 tp_offset = (u64)tp - (u64)_sdata;
>
> 	if (WARN_ON(tp_offset > UINT_MAX)
> 		return -EINVAL;
>
> 	 btp->tp_offset = (u32)tp_offset;

above math has to be build time constant, so warn_on likely
won't work.
imo the whole thing is too fragile and obscure.
I suggest to compress this 8 bytes * num_of_tracepoints later.
Especially would be good to do it in one way for
bpf_raw_event_map, ftrace and other places.

> And to get the tp, all you need to do is:
>
> 	tp = (struct tracepoint *)(btp->tp_offset + (unsigned long)_sdata);
>
> I've been thinking of doing this for other parts of the tracepoints and
> ftrace code.
>
> BTW, thanks for changing your code. I really appreciate it.
>
> -- Steve
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 16:43     ` Alexei Starovoitov
  2018-03-28 17:04       ` Steven Rostedt
@ 2018-03-28 17:14       ` Mathieu Desnoyers
  1 sibling, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2018-03-28 17:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Linus Torvalds, Peter Zijlstra,
	rostedt, netdev, kernel-team, linux-api

----- On Mar 28, 2018, at 12:43 PM, Alexei Starovoitov ast@fb.com wrote:

> On 3/28/18 6:49 AM, Mathieu Desnoyers wrote:
>> ----- On Mar 27, 2018, at 10:11 PM, Alexei Starovoitov ast@fb.com wrote:
>>
>>> From: Alexei Starovoitov <ast@kernel.org>
>>>
>>> compute number of arguments passed into tracepoint
>>> at compile time and store it as part of 'struct tracepoint'.
>>> The number is necessary to check safety of bpf program access that
>>> is coming in subsequent patch.
>>
>>
>> Hi Alexei,
>>
>> Given that only eBPF needs this parameter count, we can move
>> it to the struct bpf_raw_event_map newly introduced by Steven,
>> right ? This would reduce bloat of struct tracepoint. For instance,
>> we don't need to keep this count around when eBPF is configured
>> out.
> 
> Makes sense. That is indeed cleaner. Will respin.
> 
> What I don't like though is 'bloat' argument.
> 'u32 num_args' padded to 8-byte takes exactly the same amount
> of space in 'struct tracepoint' and in 'struct bpf_raw_event_map'
> The number of these structures is the same as well
> and chances that tracepoints are on while bpf is off are slim.
> More so few emails ago you said:
> "I'm perfectly fine with adding the "num_args" stuff. I think it's
> really useful. It's only the for_each_kernel_tracepoint change for
> which I'm trying to understand the rationale."
> 
> I'm guessing now you found out that num_args is not useful to lttng
> and it bloats its data structures?

I'm concerned about kernel configurations with only ftrace and
perf, with eBPF disabled. In those configurations, adding ebpf-specific
fields to struct tracepoint increases the kernel image size needlessly.
LTTng is not my concern in this discussion.

I'm also concerned about adding more and more elements to struct tracepoint
in general. For instance, I'd really like to see the regfunc/unregfunc
fields go away into a different section, allowing us to reduce the size of
struct tracepoint, so we can do a better use of the CPU caches when
tracing is enabled. The "funcs" field needs to be loaded each time a
tracepoint is fired, and therefore needs to be cache-hot. Adding more
cache-cold fields in there degrades cache locality, because "hot"
cache-lines then need to hold more cache-cold data.

> It's ok to change an opinion and I'm completely fine using that as
> a real reason.

Seeing the code of the new eBPF section provided by Steven made me realize
that we could use it to hold the num_args as well, keeping all the eBPF
data nice and tidy together. So yes, this is new information that changed
my opinion on the matter.

> Will repsin, as I said. No problem.

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 17:10         ` Alexei Starovoitov
@ 2018-03-28 17:38           ` Steven Rostedt
  2018-03-28 18:03             ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 17:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mathieu Desnoyers, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On Wed, 28 Mar 2018 10:10:34 -0700
Alexei Starovoitov <ast@fb.com> wrote:


> > and have:
> >
> > 	u64 tp_offset = (u64)tp - (u64)_sdata;
> >
> > 	if (WARN_ON(tp_offset > UINT_MAX)
> > 		return -EINVAL;
> >
> > 	 btp->tp_offset = (u32)tp_offset;  
> 
> above math has to be build time constant, so warn_on likely
> won't work.

Right, it would require a BUILD_BUG_ON.

> imo the whole thing is too fragile and obscure.
> I suggest to compress this 8 bytes * num_of_tracepoints later.
> Especially would be good to do it in one way for
> bpf_raw_event_map, ftrace and other places.

Fair enough. We can defer this shrinkage to another time. I only
suggested it here over your concern for the added bloat.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 07/10] bpf: introduce BPF_RAW_TRACEPOINT
  2018-03-28  2:11   ` Alexei Starovoitov
@ 2018-03-28 17:41     ` Steven Rostedt
  -1 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 17:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, torvalds, peterz, mathieu.desnoyers, netdev,
	kernel-team, linux-api

On Tue, 27 Mar 2018 19:11:02 -0700
Alexei Starovoitov <ast@fb.com> wrote:

> From: Alexei Starovoitov <ast@kernel.org>
> 
> Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
> kernel internal arguments of the tracepoints in their raw form.
> 
> >From bpf program point of view the access to the arguments look like:  
> struct bpf_raw_tracepoint_args {
>        __u64 args[0];
> };
> 
> int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
> {
>   // program can read args[N] where N depends on tracepoint
>   // and statically verified at program load+attach time
> }
> 
> kprobe+bpf infrastructure allows programs access function arguments.
> This feature allows programs access raw tracepoint arguments.
> 
> Similar to proposed 'dynamic ftrace events' there are no abi guarantees
> to what the tracepoints arguments are and what their meaning is.
> The program needs to type cast args properly and use bpf_probe_read()
> helper to access struct fields when argument is a pointer.
> 
> For every tracepoint __bpf_trace_##call function is prepared.
> In assembler it looks like:
> (gdb) disassemble __bpf_trace_xdp_exception
> Dump of assembler code for function __bpf_trace_xdp_exception:
>    0xffffffff81132080 <+0>:     mov    %ecx,%ecx
>    0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>
> 
> where
> 
> TRACE_EVENT(xdp_exception,
>         TP_PROTO(const struct net_device *dev,
>                  const struct bpf_prog *xdp, u32 act),
> 
> The above assembler snippet is casting 32-bit 'act' field into 'u64'
> to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
> All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
> and in total this approach adds 7k bytes to .text.
> 
> This approach gives the lowest possible overhead
> while calling trace_xdp_exception() from kernel C code and
> transitioning into bpf land.
> Since tracepoint+bpf are used at speeds of 1M+ events per second
> this is valuable optimization.
> 
> The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
> that returns anon_inode FD of 'bpf-raw-tracepoint' object.
> 
> The user space looks like:
> // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
> prog_fd = bpf_prog_load(...);
> // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
> raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
> 
> Ctrl-C of tracing daemon or cmdline tool that uses this feature
> will automatically detach bpf program, unload it and
> unregister tracepoint probe.
> 
> On the kernel side the __bpf_raw_tp_map section of pointers to
> tracepoint definition and to __bpf_trace_*() probe function is used
> to find a tracepoint with "xdp_exception" name and
> corresponding __bpf_trace_xdp_exception() probe function
> which are passed to tracepoint_probe_register() to connect probe
> with tracepoint.
> 
> Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
> tracepoint mechanisms. perf_event_open() can be used in parallel
> on the same tracepoint.
> Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
> Each with its own bpf program. The kernel will execute
> all tracepoint probes and all attached bpf programs.
> 
> In the future bpf_raw_tracepoints can be extended with
> query/introspection logic.
> 
> __bpf_raw_tp_map section logic was contributed by Steven Rostedt
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---

Just an FYI, I applied all the patches up to and including this one
(made sure BPF_EVENTS was enabled in my config this time), built and
booted the kernel and ran a bunch of tests (not my full suite, but
enough).

It didn't affect any other tracing features that I can see.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 07/10] bpf: introduce BPF_RAW_TRACEPOINT
@ 2018-03-28 17:41     ` Steven Rostedt
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 17:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, torvalds, peterz, mathieu.desnoyers, netdev,
	kernel-team, linux-api

On Tue, 27 Mar 2018 19:11:02 -0700
Alexei Starovoitov <ast@fb.com> wrote:

> From: Alexei Starovoitov <ast@kernel.org>
> 
> Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
> kernel internal arguments of the tracepoints in their raw form.
> 
> >From bpf program point of view the access to the arguments look like:  
> struct bpf_raw_tracepoint_args {
>        __u64 args[0];
> };
> 
> int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
> {
>   // program can read args[N] where N depends on tracepoint
>   // and statically verified at program load+attach time
> }
> 
> kprobe+bpf infrastructure allows programs access function arguments.
> This feature allows programs access raw tracepoint arguments.
> 
> Similar to proposed 'dynamic ftrace events' there are no abi guarantees
> to what the tracepoints arguments are and what their meaning is.
> The program needs to type cast args properly and use bpf_probe_read()
> helper to access struct fields when argument is a pointer.
> 
> For every tracepoint __bpf_trace_##call function is prepared.
> In assembler it looks like:
> (gdb) disassemble __bpf_trace_xdp_exception
> Dump of assembler code for function __bpf_trace_xdp_exception:
>    0xffffffff81132080 <+0>:     mov    %ecx,%ecx
>    0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>
> 
> where
> 
> TRACE_EVENT(xdp_exception,
>         TP_PROTO(const struct net_device *dev,
>                  const struct bpf_prog *xdp, u32 act),
> 
> The above assembler snippet is casting 32-bit 'act' field into 'u64'
> to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
> All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
> and in total this approach adds 7k bytes to .text.
> 
> This approach gives the lowest possible overhead
> while calling trace_xdp_exception() from kernel C code and
> transitioning into bpf land.
> Since tracepoint+bpf are used at speeds of 1M+ events per second
> this is valuable optimization.
> 
> The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
> that returns anon_inode FD of 'bpf-raw-tracepoint' object.
> 
> The user space looks like:
> // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
> prog_fd = bpf_prog_load(...);
> // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
> raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
> 
> Ctrl-C of tracing daemon or cmdline tool that uses this feature
> will automatically detach bpf program, unload it and
> unregister tracepoint probe.
> 
> On the kernel side the __bpf_raw_tp_map section of pointers to
> tracepoint definition and to __bpf_trace_*() probe function is used
> to find a tracepoint with "xdp_exception" name and
> corresponding __bpf_trace_xdp_exception() probe function
> which are passed to tracepoint_probe_register() to connect probe
> with tracepoint.
> 
> Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
> tracepoint mechanisms. perf_event_open() can be used in parallel
> on the same tracepoint.
> Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
> Each with its own bpf program. The kernel will execute
> all tracepoint probes and all attached bpf programs.
> 
> In the future bpf_raw_tracepoints can be extended with
> query/introspection logic.
> 
> __bpf_raw_tp_map section logic was contributed by Steven Rostedt
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---

Just an FYI, I applied all the patches up to and including this one
(made sure BPF_EVENTS was enabled in my config this time), built and
booted the kernel and ran a bunch of tests (not my full suite, but
enough).

It didn't affect any other tracing features that I can see.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 17:38           ` Steven Rostedt
@ 2018-03-28 18:03             ` Alexei Starovoitov
  2018-03-28 18:10               ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28 18:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On 3/28/18 10:38 AM, Steven Rostedt wrote:
> On Wed, 28 Mar 2018 10:10:34 -0700
> Alexei Starovoitov <ast@fb.com> wrote:
>
>
>>> and have:
>>>
>>> 	u64 tp_offset = (u64)tp - (u64)_sdata;
>>>
>>> 	if (WARN_ON(tp_offset > UINT_MAX)
>>> 		return -EINVAL;
>>>
>>> 	 btp->tp_offset = (u32)tp_offset;
>>
>> above math has to be build time constant, so warn_on likely
>> won't work.
>
> Right, it would require a BUILD_BUG_ON.
>
>> imo the whole thing is too fragile and obscure.
>> I suggest to compress this 8 bytes * num_of_tracepoints later.
>> Especially would be good to do it in one way for
>> bpf_raw_event_map, ftrace and other places.
>
> Fair enough. We can defer this shrinkage to another time. I only
> suggested it here over your concern for the added bloat.

Actually, I will take it back.
I think the current shape of the patch is better.
struct tracepoint is aligned to 32-bytes by linker
Though sizeof(struct tracepoint) == 48 it actually consumes 64 bytes
of memory.
(gdb) p (void*)&__tracepoint_sys_enter - (void*)&__tracepoint_sys_exit
$3 = 64

Adding num_args to 'struct tracepoint' makes it sizeof==56,
but it still takes 64-bytes in memory.

In this patch sizeof(struct bpf_raw_tp_map) == 16 and
(gdb) p (void*)&__bpf_trace_tp_map_sys_enter - 
(void*)&__bpf_trace_tp_map_sys_exit
$3 = 16

so it consumes exactly the same 16-bytes.
If we add 'u32 num_args' to it, it will have
sizeof(struct bpf_raw_tp_map) == 24 and will consume 32-bytes
of memory.

So clearly adding num_args to struct tracepont gives zero additional
bloat vs before this patch set, whereas moving it to
struct bpf_raw_tp_map will add 16 * num_of_tracepoints bytes of memory.

I can live with this overhead if Mathieu insists,
but I prefer to keep it in 'struct tracepoint'.

Thoughts?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 18:03             ` Alexei Starovoitov
@ 2018-03-28 18:10               ` Steven Rostedt
  2018-03-28 18:19                 ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 18:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mathieu Desnoyers, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On Wed, 28 Mar 2018 11:03:24 -0700
Alexei Starovoitov <ast@fb.com> wrote:

> I can live with this overhead if Mathieu insists,
> but I prefer to keep it in 'struct tracepoint'.
> 
> Thoughts?

I'm fine with keeping it as is. We could probably use it for future
enhancements in perf and ftrace.

Perhaps, we should just add a:

#ifdef CONFIG_BPF_EVENTS

Around the use cases of num_args.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 18:10               ` Steven Rostedt
@ 2018-03-28 18:19                 ` Alexei Starovoitov
  2018-03-28 18:54                   ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28 18:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On 3/28/18 11:10 AM, Steven Rostedt wrote:
> On Wed, 28 Mar 2018 11:03:24 -0700
> Alexei Starovoitov <ast@fb.com> wrote:
>
>> I can live with this overhead if Mathieu insists,
>> but I prefer to keep it in 'struct tracepoint'.
>>
>> Thoughts?
>
> I'm fine with keeping it as is. We could probably use it for future
> enhancements in perf and ftrace.
>
> Perhaps, we should just add a:
>
> #ifdef CONFIG_BPF_EVENTS
>
> Around the use cases of num_args.

it sounds like a good idea, but implementation wise
it will be ifdef CONFIG_BPF_EVENTS around u32 num_args;
in struct tracepoint and similar double definition of
DEFINE_TRACE_FN. One that uses num_args to init
struct tracepoint and one that doesn't ?
Feels like serious uglification of already macros heavy code.
Also what it will address?
cache hot/cold argument clearly doesn't apply.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 18:19                 ` Alexei Starovoitov
@ 2018-03-28 18:54                   ` Steven Rostedt
  2018-03-28 19:22                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 18:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mathieu Desnoyers, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On Wed, 28 Mar 2018 11:19:34 -0700
Alexei Starovoitov <ast@fb.com> wrote:

> On 3/28/18 11:10 AM, Steven Rostedt wrote:
> > On Wed, 28 Mar 2018 11:03:24 -0700
> > Alexei Starovoitov <ast@fb.com> wrote:
> >  
> >> I can live with this overhead if Mathieu insists,
> >> but I prefer to keep it in 'struct tracepoint'.
> >>
> >> Thoughts?  
> >
> > I'm fine with keeping it as is. We could probably use it for future
> > enhancements in perf and ftrace.
> >
> > Perhaps, we should just add a:
> >
> > #ifdef CONFIG_BPF_EVENTS
> >
> > Around the use cases of num_args.  
> 
> it sounds like a good idea, but implementation wise
> it will be ifdef CONFIG_BPF_EVENTS around u32 num_args;
> in struct tracepoint and similar double definition of
> DEFINE_TRACE_FN. One that uses num_args to init
> struct tracepoint and one that doesn't ?
> Feels like serious uglification of already macros heavy code.
> Also what it will address?

32bit bloat ;-)

But I agree, it's not worth uglifying it.

-- Steve

> cache hot/cold argument clearly doesn't apply.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 18:54                   ` Steven Rostedt
@ 2018-03-28 19:22                     ` Mathieu Desnoyers
  2018-03-28 19:25                       ` Alexei Starovoitov
  2018-03-28 19:32                       ` Steven Rostedt
  0 siblings, 2 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2018-03-28 19:22 UTC (permalink / raw)
  To: rostedt
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf


----- On Mar 28, 2018, at 2:54 PM, rostedt rostedt@goodmis.org wrote:

> On Wed, 28 Mar 2018 11:19:34 -0700
> Alexei Starovoitov <ast@fb.com> wrote:
> 
>> On 3/28/18 11:10 AM, Steven Rostedt wrote:
>> > On Wed, 28 Mar 2018 11:03:24 -0700
>> > Alexei Starovoitov <ast@fb.com> wrote:
>> >  
>> >> I can live with this overhead if Mathieu insists,
>> >> but I prefer to keep it in 'struct tracepoint'.
>> >>
>> >> Thoughts?
>> >
>> > I'm fine with keeping it as is. We could probably use it for future
>> > enhancements in perf and ftrace.
>> >
>> > Perhaps, we should just add a:
>> >
>> > #ifdef CONFIG_BPF_EVENTS
>> >
>> > Around the use cases of num_args.
>> 
>> it sounds like a good idea, but implementation wise
>> it will be ifdef CONFIG_BPF_EVENTS around u32 num_args;
>> in struct tracepoint and similar double definition of
>> DEFINE_TRACE_FN. One that uses num_args to init
>> struct tracepoint and one that doesn't ?
>> Feels like serious uglification of already macros heavy code.
>> Also what it will address?
> 
> 32bit bloat ;-)
> 
> But I agree, it's not worth uglifying it.
> 
> -- Steve
> 
> > cache hot/cold argument clearly doesn't apply.

In the current situation I'm fine with adding this extra field
to struct tracepoint. However, we should keep in mind to move
all non-required cache-cold fields to a separate section at
some point. Clearly just this single field won't make a difference
due to other fields and padding.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 19:22                     ` Mathieu Desnoyers
@ 2018-03-28 19:25                       ` Alexei Starovoitov
  2018-03-28 19:32                       ` Steven Rostedt
  1 sibling, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2018-03-28 19:25 UTC (permalink / raw)
  To: Mathieu Desnoyers, rostedt
  Cc: David S. Miller, Daniel Borkmann, Linus Torvalds, Peter Zijlstra,
	netdev, kernel-team, linux-api, Josh Poimboeuf

On 3/28/18 12:22 PM, Mathieu Desnoyers wrote:
>
> ----- On Mar 28, 2018, at 2:54 PM, rostedt rostedt@goodmis.org wrote:
>
>> On Wed, 28 Mar 2018 11:19:34 -0700
>> Alexei Starovoitov <ast@fb.com> wrote:
>>
>>> On 3/28/18 11:10 AM, Steven Rostedt wrote:
>>>> On Wed, 28 Mar 2018 11:03:24 -0700
>>>> Alexei Starovoitov <ast@fb.com> wrote:
>>>>
>>>>> I can live with this overhead if Mathieu insists,
>>>>> but I prefer to keep it in 'struct tracepoint'.
>>>>>
>>>>> Thoughts?
>>>>
>>>> I'm fine with keeping it as is. We could probably use it for future
>>>> enhancements in perf and ftrace.
>>>>
>>>> Perhaps, we should just add a:
>>>>
>>>> #ifdef CONFIG_BPF_EVENTS
>>>>
>>>> Around the use cases of num_args.
>>>
>>> it sounds like a good idea, but implementation wise
>>> it will be ifdef CONFIG_BPF_EVENTS around u32 num_args;
>>> in struct tracepoint and similar double definition of
>>> DEFINE_TRACE_FN. One that uses num_args to init
>>> struct tracepoint and one that doesn't ?
>>> Feels like serious uglification of already macros heavy code.
>>> Also what it will address?
>>
>> 32bit bloat ;-)
>>
>> But I agree, it's not worth uglifying it.
>>
>> -- Steve
>>
>>> cache hot/cold argument clearly doesn't apply.
>
> In the current situation I'm fine with adding this extra field
> to struct tracepoint. However, we should keep in mind to move
> all non-required cache-cold fields to a separate section at
> some point. Clearly just this single field won't make a difference
> due to other fields and padding.

Submitted v8 where num_args is moved to bpf side.
Please ack it.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 19:22                     ` Mathieu Desnoyers
  2018-03-28 19:25                       ` Alexei Starovoitov
@ 2018-03-28 19:32                       ` Steven Rostedt
  2018-03-28 19:38                         ` Steven Rostedt
  1 sibling, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 19:32 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On Wed, 28 Mar 2018 15:22:24 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> > > cache hot/cold argument clearly doesn't apply.  
> 
> In the current situation I'm fine with adding this extra field
> to struct tracepoint. However, we should keep in mind to move
> all non-required cache-cold fields to a separate section at
> some point. Clearly just this single field won't make a difference
> due to other fields and padding.

funcs is the only part of the tracepoint structure that needs hot
cache. Thus, instead of trying to keep the tracepoint structure small
for cache reasons, pull funcs out of the tracepoint structure.

What about this patch?

Of course, this just adds another 8 bytes per tracepoint :-/ But it
keeps the functions away from the tracepoint structures. We could even
add a section for them to be together, but I'm not sure that will help
much more that just letting the linker place them, as these function
pointers of the same system will probably be grouped together.

-- Steve

diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 35db8dd48c4c..8d100163e9af 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -32,7 +32,7 @@ struct tracepoint {
 	struct static_key key;
 	int (*regfunc)(void);
 	void (*unregfunc)(void);
-	struct tracepoint_func __rcu *funcs;
+	struct tracepoint_func **funcs;
 	u32 num_args;
 };
 
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c92f4adbc0d7..b55282202f71 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -129,7 +129,7 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(name, proto, args, cond, rcucheck)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
@@ -140,7 +140,7 @@ extern void syscall_unregfunc(void);
 		if (rcucheck)						\
 			rcu_irq_enter_irqson();				\
 		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+		it_func_ptr = rcu_dereference_sched(__trace_##name##_funcs); \
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -158,7 +158,7 @@ extern void syscall_unregfunc(void);
 	static inline void trace_##name##_rcuidle(proto)		\
 	{								\
 		if (static_key_false(&__tracepoint_##name.key))		\
-			__DO_TRACE(&__tracepoint_##name,		\
+			__DO_TRACE(name,				\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
 				TP_CONDITION(cond), 1);			\
@@ -181,10 +181,11 @@ extern void syscall_unregfunc(void);
  */
 #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
 	extern struct tracepoint __tracepoint_##name;			\
+	extern struct tracepoint_func __rcu *__trace_##name##_funcs;	\
 	static inline void trace_##name(proto)				\
 	{								\
 		if (static_key_false(&__tracepoint_##name.key))		\
-			__DO_TRACE(&__tracepoint_##name,		\
+			__DO_TRACE(name,				\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
 				TP_CONDITION(cond), 0);			\
@@ -233,9 +234,11 @@ extern void syscall_unregfunc(void);
 #define DEFINE_TRACE_FN(name, reg, unreg, num_args)			 \
 	static const char __tpstrtab_##name[]				 \
 	__attribute__((section("__tracepoints_strings"))) = #name;	 \
+	struct tracepoint_func __rcu *__trace_##name##_funcs;		 \
 	struct tracepoint __tracepoint_##name				 \
 	__attribute__((section("__tracepoints"))) =			 \
-		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL, num_args };\
+		{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg,	 \
+		  &__trace_##name##_funcs, num_args };			 \
 	static struct tracepoint * const __tracepoint_ptr_##name __used	 \
 	__attribute__((section("__tracepoints_ptrs"))) =		 \
 		&__tracepoint_##name;
@@ -244,9 +247,12 @@ extern void syscall_unregfunc(void);
 	DEFINE_TRACE_FN(name, NULL, NULL, num_args);
 
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)				\
-	EXPORT_SYMBOL_GPL(__tracepoint_##name)
+	EXPORT_SYMBOL_GPL(__tracepoint_##name);				\
+	EXPORT_SYMBOL_GPL(__trace_##name##_funcs)
+
 #define EXPORT_TRACEPOINT_SYMBOL(name)					\
-	EXPORT_SYMBOL(__tracepoint_##name)
+	EXPORT_SYMBOL(__tracepoint_##name);				\
+	EXPORT_SYMBOL(__trace_##name##_funcs)
 
 #else /* !TRACEPOINTS_ENABLED */
 #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index a02bc09d765a..47b884809f22 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -809,6 +809,7 @@ static void free_synth_tracepoint(struct tracepoint *tp)
 	if (!tp)
 		return;
 
+	kfree(tp->funcs);
 	kfree(tp->name);
 	kfree(tp);
 }
@@ -827,6 +828,13 @@ static struct tracepoint *alloc_synth_tracepoint(char *name)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	tp->funcs = kzalloc(sizeof(*tp->funcs), GFP_KERNEL);
+	if (!tp->funcs) {
+		kfree(tp->name);
+		kfree(tp);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	return tp;
 }
 
@@ -846,7 +854,7 @@ static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals,
 		if (!(cpu_online(raw_smp_processor_id())))
 			return;
 
-		probe_func_ptr = rcu_dereference_sched((tp)->funcs);
+		probe_func_ptr = rcu_dereference_sched(*(tp)->funcs);
 		if (probe_func_ptr) {
 			do {
 				probe_func = probe_func_ptr->func;
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 671b13457387..638a35f77841 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -203,7 +203,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
 			return ret;
 	}
 
-	tp_funcs = rcu_dereference_protected(tp->funcs,
+	tp_funcs = rcu_dereference_protected(*tp->funcs,
 			lockdep_is_held(&tracepoints_mutex));
 	old = func_add(&tp_funcs, func, prio);
 	if (IS_ERR(old)) {
@@ -217,7 +217,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
 	 * a pointer to it.  This array is referenced by __DO_TRACE from
 	 * include/linux/tracepoint.h using rcu_dereference_sched().
 	 */
-	rcu_assign_pointer(tp->funcs, tp_funcs);
+	rcu_assign_pointer(*tp->funcs, tp_funcs);
 	if (!static_key_enabled(&tp->key))
 		static_key_slow_inc(&tp->key);
 	release_probes(old);
@@ -235,7 +235,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
 {
 	struct tracepoint_func *old, *tp_funcs;
 
-	tp_funcs = rcu_dereference_protected(tp->funcs,
+	tp_funcs = rcu_dereference_protected(*tp->funcs,
 			lockdep_is_held(&tracepoints_mutex));
 	old = func_remove(&tp_funcs, func);
 	if (IS_ERR(old)) {
@@ -251,7 +251,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
 		if (static_key_enabled(&tp->key))
 			static_key_slow_dec(&tp->key);
 	}
-	rcu_assign_pointer(tp->funcs, tp_funcs);
+	rcu_assign_pointer(*tp->funcs, tp_funcs);
 	release_probes(old);
 	return 0;
 }
@@ -398,7 +398,7 @@ static void tp_module_going_check_quiescent(struct tracepoint * const *begin,
 	if (!begin)
 		return;
 	for (iter = begin; iter < end; iter++)
-		WARN_ON_ONCE((*iter)->funcs);
+		WARN_ON_ONCE(*(*iter)->funcs);
 }
 
 static int tracepoint_module_coming(struct module *mod)

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 19:32                       ` Steven Rostedt
@ 2018-03-28 19:38                         ` Steven Rostedt
  2018-03-28 19:47                           ` Mathieu Desnoyers
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2018-03-28 19:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

On Wed, 28 Mar 2018 15:32:20 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
> +#define __DO_TRACE(name, proto, args, cond, rcucheck)			\
>  	do {								\
>  		struct tracepoint_func *it_func_ptr;			\
>  		void *it_func;						\
> @@ -140,7 +140,7 @@ extern void syscall_unregfunc(void);
>  		if (rcucheck)						\
>  			rcu_irq_enter_irqson();				\
>  		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +		it_func_ptr = rcu_dereference_sched(__trace_##name##_funcs); \

What we lose in data size, we may make up for in text (which is even
more important). This will remove a dereference in the hot path.

I'll make a few builds and run size on the vmlinux images to see how
this pans out.

-- Steve


>  		if (it_func_ptr) {					\
>  			do {						\

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time
  2018-03-28 19:38                         ` Steven Rostedt
@ 2018-03-28 19:47                           ` Mathieu Desnoyers
  0 siblings, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2018-03-28 19:47 UTC (permalink / raw)
  To: rostedt
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
	Linus Torvalds, Peter Zijlstra, netdev, kernel-team, linux-api,
	Josh Poimboeuf

----- On Mar 28, 2018, at 3:38 PM, rostedt rostedt@goodmis.org wrote:

> On Wed, 28 Mar 2018 15:32:20 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
>> +#define __DO_TRACE(name, proto, args, cond, rcucheck)			\
>>  	do {								\
>>  		struct tracepoint_func *it_func_ptr;			\
>>  		void *it_func;						\
>> @@ -140,7 +140,7 @@ extern void syscall_unregfunc(void);
>>  		if (rcucheck)						\
>>  			rcu_irq_enter_irqson();				\
>>  		rcu_read_lock_sched_notrace();				\
>> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
>> +		it_func_ptr = rcu_dereference_sched(__trace_##name##_funcs); \
> 
> What we lose in data size, we may make up for in text (which is even
> more important). This will remove a dereference in the hot path.
> 
> I'll make a few builds and run size on the vmlinux images to see how
> this pans out.

I like the approach. If it passes testing, I think it's a valuable improvement
to lessen cache footprint of tracepoint when tracing is active.

Thanks!

Mathieu


> 
> -- Steve
> 
> 
>>  		if (it_func_ptr) {					\
> >  			do {						\

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2018-03-28 19:47 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-28  2:10 [PATCH v7 bpf-next 00/10] bpf, tracing: introduce bpf raw tracepoints Alexei Starovoitov
2018-03-28  2:10 ` Alexei Starovoitov
2018-03-28  2:10 ` [PATCH v7 bpf-next 01/10] treewide: remove large struct-pass-by-value from tracepoint arguments Alexei Starovoitov
2018-03-28  2:10   ` Alexei Starovoitov
2018-03-28  2:10 ` [PATCH v7 bpf-next 02/10] net/mediatek: disambiguate mt76 vs mt7601u trace events Alexei Starovoitov
2018-03-28  2:10   ` Alexei Starovoitov
2018-03-28  2:10 ` [PATCH v7 bpf-next 03/10] net/mac802154: disambiguate mac80215 vs mac802154 " Alexei Starovoitov
2018-03-28  2:10   ` Alexei Starovoitov
2018-03-28  2:10 ` [PATCH v7 bpf-next 04/10] net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint Alexei Starovoitov
2018-03-28  2:10   ` Alexei Starovoitov
2018-03-28  2:11 ` [PATCH v7 bpf-next 05/10] macro: introduce COUNT_ARGS() macro Alexei Starovoitov
2018-03-28  2:11   ` Alexei Starovoitov
2018-03-28  2:11 ` [PATCH v7 bpf-next 06/10] tracepoint: compute num_args at build time Alexei Starovoitov
2018-03-28  2:11   ` Alexei Starovoitov
2018-03-28 13:49   ` Mathieu Desnoyers
2018-03-28 16:43     ` Alexei Starovoitov
2018-03-28 17:04       ` Steven Rostedt
2018-03-28 17:10         ` Alexei Starovoitov
2018-03-28 17:38           ` Steven Rostedt
2018-03-28 18:03             ` Alexei Starovoitov
2018-03-28 18:10               ` Steven Rostedt
2018-03-28 18:19                 ` Alexei Starovoitov
2018-03-28 18:54                   ` Steven Rostedt
2018-03-28 19:22                     ` Mathieu Desnoyers
2018-03-28 19:25                       ` Alexei Starovoitov
2018-03-28 19:32                       ` Steven Rostedt
2018-03-28 19:38                         ` Steven Rostedt
2018-03-28 19:47                           ` Mathieu Desnoyers
2018-03-28 17:14       ` Mathieu Desnoyers
2018-03-28  2:11 ` [PATCH v7 bpf-next 07/10] bpf: introduce BPF_RAW_TRACEPOINT Alexei Starovoitov
2018-03-28  2:11   ` Alexei Starovoitov
2018-03-28 17:41   ` Steven Rostedt
2018-03-28 17:41     ` Steven Rostedt
2018-03-28  2:11 ` [PATCH v7 bpf-next 08/10] libbpf: add bpf_raw_tracepoint_open helper Alexei Starovoitov
2018-03-28  2:11   ` Alexei Starovoitov
2018-03-28  2:11 ` [PATCH v7 bpf-next 09/10] samples/bpf: raw tracepoint test Alexei Starovoitov
2018-03-28  2:11   ` Alexei Starovoitov
2018-03-28  2:11 ` [PATCH v7 bpf-next 10/10] selftests/bpf: test for bpf_get_stackid() from raw tracepoints Alexei Starovoitov
2018-03-28  2:11   ` Alexei Starovoitov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.