* [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
@ 2017-02-08 17:24 Tom Zanussi
  2017-02-08 17:24 ` [RFC][PATCH 01/21] tracing: Add hist_field_name() accessor Tom Zanussi
                   ` (23 more replies)
  0 siblings, 24 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:24 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

This patchset adds support for 'inter-event' quantities to the trace
event subsystem.  The most important example of an inter-event
quantity is a latency, i.e. the time difference between two events.

One of the main motivations for adding this capability is to provide a
general-purpose base that existing tools such as the -RT latency_hist
patchset can be built upon, while at the same time providing a simple
way for users to track latencies (or any other inter-event quantity)
generically between any two events.

Previous -RT latency_hist patchsets that take advantage of the trace
event subsystem have been submitted, but they essentially hard-code
special-case tracepoints and application logic in ways that can't be
reused.  It seemed to me that rather than providing a one-off patchset
devoted specifically to generating the specific histograms in the
latency_hist patchset, it should be possible to build the same
functionality on top of a generic layer allowing users to do similar
things for other non-latency_hist applications.

In addition to preliminary patches that add some basic missing
functionality such as a common ringbuffer-derived timestamp and
dynamically-creatable tracepoints, the overall patchset is divided up
into a few different areas that combine to produce the overall goal
(The Documentation patch explains all the details):

  - variables and simple expressions required to calculate a latency

    In order to calculate a latency or any other inter-event value,
    something from one event needs to be saved and later retrieved,
    and some operation such as subtraction or addition performed on
    it.  This requires some minimal form of variables and expressions,
    which the first set of patches implements.  Saving and retrieving
    values to use in a latency calculation is normally done using a
    hash table, and that's exactly what we have with trace event hist
    triggers, so that's where variables are instantiated, set, and
    retrieved.  Basically, variables are set on one entry and
    retrieved and used by a 'matching' event.

  - 'synthetic' events, combining variables from other events

    The trace event interface is based on pseudo-files associated with
    individual events, so it wouldn't really make sense to have
    quantities derived from multiple events attached to any one of
    those events.  For that reason, the patchset implements a means of
    combining variables from other events into a separate 'synthetic'
    event, which can be treated just like any other trace event in
    the system.

  - 'actions' generating synthetic events, among other things

    Variables and synthetic events provide the data and data structure
    for new events, but something still needs to actually generate an
    event using that data.  'Actions' are expanded to provide that
    capability.  Though it hasn't been explicitly called that before,
    the current default 'action' for a hist trigger is to update the
    matching histogram entry's sum values.  This patchset essentially
    expands that to provide a new 'onmatch().trace(event)' action that
    can be used to have one event generate another.  The mechanism is
    extensible to other actions, and in fact the patchset also
    includes another, 'onmax(var).save(field,...)', that can be
    used to save context whenever a value exceeds the previous maximum
    (something also needed by latency_hist).
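
In schematic form (condensed from the examples below, which show the
actual working commands - the <angle-bracket> names here are just
placeholders, not literal syntax), the pieces fit together like this:

  Define a synthetic event whose fields come from variables on other
  events:

    # echo '<synth_event> <field>=<event>:<var> ...' >> synthetic_events

  Save a variable when the 'start' event is hit:

    # echo 'hist:keys=<key>:<var>=<field>' >> events/.../<start_event>/trigger

  On the matching 'end' event, compute a value and generate the
  synthetic event:

    # echo 'hist:keys=<key>:<var2>=<expr>:onmatch().trace(<synth_event>)' >> \
            events/.../<end_event>/trigger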

I'm submitting the patchset (based on tracing/for-next) as an RFC not
only to get comments, but because there are still some problems I
haven't fixed yet...

Here are some examples that should make things less abstract.

  ====
  Example - wakeup latency
  ====

  This basically implements the -RT latency_hist 'wakeup_latency'
  histogram using the synthetic events, variables, and actions
  described.  The output below is from a run of cyclictest using the
  following command:

    # rt-tests/cyclictest -p 80 -n -s -t 2

  The latency we're measuring is the time between when a cyclictest
  thread is awakened and when it's scheduled in.  To
  do that we add triggers to sched_wakeup and sched_switch with the
  appropriate variables, and on a matching sched_switch event,
  generate a synthetic 'wakeup_latency' event.  Since it's just
  another trace event like any other, we can also define a histogram
  on that event, the output of which is what we see displayed when
  reading the wakeup_latency 'hist' file.

  First, we create a synthetic event called wakeup_latency that
  references three variables from other events:

    # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
                           pid=sched_switch:woken_pid \
                           prio=sched_switch:woken_prio' >> \
            /sys/kernel/debug/tracing/synthetic_events

  Next we add a trigger to sched_wakeup which, when that event is hit,
  saves the value of 'common_timestamp' in a variable, ts0.  Note that
  this happens only when 'comm==cyclictest'.

  Also, 'common_timestamp' is a new field defined on every event, but
  only if needed - if nothing in a trace uses timestamps, they aren't
  saved and there's no additional overhead from the field.

    #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
             comm=="cyclictest"' >> \
             /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger

  Next, we add a trigger to sched_switch.  When the pid being switched
  to matches the pid woken up by a previous sched_wakeup event, this
  event grabs the ts0 saved on that event, takes the difference
  between it and the current sched_switch's common_timestamp, and
  assigns the result to a new 'wakeup_lat' variable.  It also saves a
  couple of other variables and then invokes the onmatch().trace()
  action, which generates a new wakeup_latency event using those
  variables.

    # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
       wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
            if next_comm=="cyclictest"' >> \
            /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

  Finally, all we have left to do is create a standard histogram
  simply naming the fields of the wakeup_latency synthetic event:

    # echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \
            /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

  At any time, we can see the histogram output by simply reading the
  synthetic/wakeup_latency/hist file:

    # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist

    # event histogram
    #
    # trigger info: hist:keys=pid,prio,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
    #

    { pid:       2519, prio:        120, lat:          1 } hitcount:         12
    { pid:       2519, prio:        120, lat:          2 } hitcount:        671
    { pid:       2519, prio:        120, lat:          3 } hitcount:        588
    { pid:       2519, prio:        120, lat:          4 } hitcount:        202
    { pid:       2519, prio:        120, lat:          5 } hitcount:         28
    { pid:       2519, prio:        120, lat:          6 } hitcount:         13
    { pid:       2519, prio:        120, lat:          7 } hitcount:         12
    { pid:       2519, prio:        120, lat:          8 } hitcount:          7
    { pid:       2519, prio:        120, lat:          9 } hitcount:         12
    { pid:       2519, prio:        120, lat:         10 } hitcount:         11
    { pid:       2519, prio:        120, lat:         11 } hitcount:          7
    { pid:       2519, prio:        120, lat:         12 } hitcount:          6
    { pid:       2519, prio:        120, lat:         13 } hitcount:          1
    { pid:       2519, prio:        120, lat:         17 } hitcount:          1
    { pid:       2519, prio:        120, lat:         18 } hitcount:          3
    { pid:       2519, prio:        120, lat:         19 } hitcount:          2
    { pid:       2519, prio:        120, lat:         22 } hitcount:          2
    { pid:       2519, prio:        120, lat:         23 } hitcount:          1
    { pid:       2519, prio:        120, lat:         24 } hitcount:          1
    { pid:       2519, prio:        120, lat:         27 } hitcount:          1
    { pid:       2519, prio:        120, lat:         34 } hitcount:          1
    { pid:       2519, prio:        120, lat:         53 } hitcount:          1
    { pid:       2519, prio:        120, lat:         67 } hitcount:          1
    { pid:       2519, prio:        120, lat:         69 } hitcount:          1
    { pid:       2521, prio:         19, lat:          1 } hitcount:        735
    { pid:       2521, prio:         19, lat:          2 } hitcount:       8978
    { pid:       2521, prio:         19, lat:          3 } hitcount:       4798
    { pid:       2521, prio:         19, lat:          4 } hitcount:        716
    { pid:       2521, prio:         19, lat:          5 } hitcount:        298
    { pid:       2521, prio:         19, lat:          6 } hitcount:        136
    { pid:       2521, prio:         19, lat:          7 } hitcount:         93
    { pid:       2521, prio:         19, lat:          8 } hitcount:         51
    { pid:       2521, prio:         19, lat:          9 } hitcount:         20
    { pid:       2521, prio:         19, lat:         10 } hitcount:         18
    { pid:       2521, prio:         19, lat:         11 } hitcount:          3
    { pid:       2521, prio:         19, lat:         12 } hitcount:          1
    { pid:       2521, prio:         19, lat:         13 } hitcount:          3
    { pid:       2521, prio:         19, lat:         15 } hitcount:          1
    { pid:       2521, prio:         19, lat:         17 } hitcount:          1
    { pid:       2521, prio:         19, lat:         18 } hitcount:          1
    { pid:       2521, prio:         19, lat:         20 } hitcount:          2
    { pid:       2521, prio:         19, lat:         22 } hitcount:          1
    { pid:       2521, prio:         19, lat:         25 } hitcount:          1
    { pid:       2521, prio:         19, lat:         26 } hitcount:          1
    { pid:       2522, prio:         19, lat:          1 } hitcount:        392
    { pid:       2522, prio:         19, lat:          2 } hitcount:       5376
    { pid:       2522, prio:         19, lat:          3 } hitcount:       3982
    { pid:       2522, prio:         19, lat:          4 } hitcount:        500
    { pid:       2522, prio:         19, lat:          5 } hitcount:        202
    { pid:       2522, prio:         19, lat:          6 } hitcount:         67
    { pid:       2522, prio:         19, lat:          7 } hitcount:         35
    { pid:       2522, prio:         19, lat:          8 } hitcount:         12
    { pid:       2522, prio:         19, lat:          9 } hitcount:          9
    { pid:       2522, prio:         19, lat:         10 } hitcount:          4
    { pid:       2522, prio:         19, lat:         11 } hitcount:          3
    { pid:       2522, prio:         19, lat:         12 } hitcount:          1
    { pid:       2522, prio:         19, lat:         13 } hitcount:          1
    { pid:       2522, prio:         19, lat:         16 } hitcount:          1
    { pid:       2522, prio:         19, lat:         18 } hitcount:          2
    { pid:       2522, prio:         19, lat:         19 } hitcount:          1
    { pid:       2522, prio:         19, lat:         21 } hitcount:          2
    { pid:       2522, prio:         19, lat:         22 } hitcount:          1
    { pid:       2522, prio:         19, lat:         23 } hitcount:          1
    { pid:       2522, prio:         19, lat:         45 } hitcount:          1
    { pid:       2522, prio:         19, lat:         82 } hitcount:          1

    Totals:
        Hits: 28037
        Entries: 65
        Dropped: 0

  The above output uses the .usecs modifier to common_timestamp, so
  the latencies are reported in microseconds.  The default, without
  the modifier, is nanoseconds, which is too fine-grained to put
  directly into a histogram - for that case we can instead use the
  .log2 modifier on the 'lat' key.  Otherwise the rest is the same.
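
  For reference, the trigger behind the output below (reconstructed
  from its 'trigger info' line - the previous wakeup_latency trigger
  would first have to be removed) would look something like:

    # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat.log2' >> \
            /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger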

    # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist

    # event histogram
    #
    # trigger info: hist:keys=pid,prio,lat.log2:vals=hitcount:sort=pid,lat.log2:size=2048 [active]
    #

    { pid:       2585, prio:        120, lat: ~ 2^10 } hitcount:          1
    { pid:       2585, prio:        120, lat: ~ 2^11 } hitcount:        379
    { pid:       2585, prio:        120, lat: ~ 2^12 } hitcount:       1008
    { pid:       2585, prio:        120, lat: ~ 2^13 } hitcount:         42
    { pid:       2585, prio:        120, lat: ~ 2^14 } hitcount:         18
    { pid:       2585, prio:        120, lat: ~ 2^15 } hitcount:          3
    { pid:       2585, prio:        120, lat: ~ 2^16 } hitcount:          1
    { pid:       2586, prio:         19, lat: ~ 2^11 } hitcount:       4715
    { pid:       2586, prio:         19, lat: ~ 2^12 } hitcount:       9161
    { pid:       2586, prio:         19, lat: ~ 2^13 } hitcount:        632
    { pid:       2586, prio:         19, lat: ~ 2^14 } hitcount:         47
    { pid:       2586, prio:         19, lat: ~ 2^15 } hitcount:          3
    { pid:       2586, prio:         19, lat: ~ 2^17 } hitcount:          1
    { pid:       2587, prio:         19, lat: ~ 2^11 } hitcount:       3398
    { pid:       2587, prio:         19, lat: ~ 2^12 } hitcount:       5762
    { pid:       2587, prio:         19, lat: ~ 2^13 } hitcount:        505
    { pid:       2587, prio:         19, lat: ~ 2^14 } hitcount:         58
    { pid:       2587, prio:         19, lat: ~ 2^15 } hitcount:          3
    { pid:       2587, prio:         19, lat: ~ 2^17 } hitcount:          1

    Totals:
        Hits: 25738
        Entries: 19
        Dropped: 0


  ====
  Example - wakeup latency with onmax()
  ====

  This example is the same as the previous ones, but here we're using
  the onmax() action to save some context (several fields of the
  sched_switch event) whenever the latency (wakeup_lat) exceeds the
  previous maximum.

  As with the similar functionality of the -RT latency_hist
  histograms, it's useful to be able to capture information about the
  previous process, which potentially could have contributed to the
  maximum latency that was saved.

    # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio: \
            wakeup_lat=common_timestamp.usecs-ts0:\
            onmax(wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
            if next_comm=="cyclictest"' >> \
            /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

    # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

    # event histogram
    #

    { next_pid:       3519 } hitcount:       2453  next_prio:          0 \
                             common_timestamp-ts0: 12785539665130611
	max:         79  next_comm: cyclictest \
        prev_pid:          0  prev_prio:        120  prev_comm: swapper/1

    { next_pid:       3521 } hitcount:      16425  next_prio:          0  \
                             common_timestamp-ts0: 12785539665130611
	max:         84  next_comm: cyclictest \
        prev_pid:          0  prev_prio:        120  prev_comm: swapper/2

    { next_pid:       3520 } hitcount:      24593  next_prio:          0 \
                             common_timestamp-ts0: 12785539665130611
	max:         98  next_comm: cyclictest \
        prev_pid:          0  prev_prio:        120  prev_comm: swapper/0
                                    
    Totals:
        Hits: 217355
        Entries: 3
        Dropped: 0


  And, verifying, we can see that the max latencies captured above
  match the highest latencies for each thread in the wakeup_latency
  histogram:

    # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist

    # event histogram
    #
    # trigger info: hist:keys=pid,prio,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
    #

    { pid:       3519, prio:        120, lat:          0 } hitcount:          3
    { pid:       3519, prio:        120, lat:          1 } hitcount:        176
    { pid:       3519, prio:        120, lat:          2 } hitcount:       1284
    { pid:       3519, prio:        120, lat:          3 } hitcount:        709
    ...
    { pid:       3519, prio:        120, lat:         79 } hitcount:          1

    { pid:       3520, prio:         19, lat:          1 } hitcount:       3372
    { pid:       3520, prio:         19, lat:          2 } hitcount:      14777
    { pid:       3520, prio:         19, lat:          3 } hitcount:       4678
    { pid:       3520, prio:         19, lat:          4 } hitcount:        926
    ...
    { pid:       3520, prio:         19, lat:         98 } hitcount:          1

    { pid:       3521, prio:         19, lat:          1 } hitcount:       1551
    { pid:       3521, prio:         19, lat:          2 } hitcount:       8827
    { pid:       3521, prio:         19, lat:          3 } hitcount:       4620
    { pid:       3521, prio:         19, lat:          4 } hitcount:        876
    { pid:       3521, prio:         19, lat:         84 } hitcount:          1

    Totals:
        Hits: 43471
        Entries: 71
        Dropped: 0


  ====
  Example - combined wakeup and switchtime (wakeupswitch) latency
  ====

  Finally, this example is quite a bit more involved, but that's
  because it implements 3 latencies, one of which is a combination of
  the other two.  This, too, is something the -RT latency_hist
  patchset does and which this patchset adds generic support for.

  The latency_hist patchset creates a few individual latency
  histograms but also combines them into larger overall combined
  histograms.  For example, the time between when a thread is awakened
  and when it actually continues executing in userspace is covered by
  one histogram, but it's also broken down into two sub-histograms:
  one covering the time between sched_wakeup and the time the thread
  is scheduled in (wakeup_latency as above), and the other covering
  the time between when the thread is scheduled in and the time it
  actually begins executing again (the return from sys_nanosleep),
  covered by a separate switchtime_latency histogram.
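
  Schematically, with the combined latency being the sum of the other
  two:

    sched_wakeup ----wakeup_latency----> sched_switch
    sched_switch ----switchtime_latency----> sys_exit_nanosleep (return)

    wakeupswitch_latency = wakeup_latency + switchtime_latency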

  The below combines the wakeup_latency histogram from before, adds a
  new switchtime_latency histogram, and adds a third,
  wakeupswitch_latency, that combines the other two.

  There isn't anything really new here, other than the use of the
  addition operator to add two latencies to produce the
  wakeupswitch_latency.

    # wakeup latency

    # echo 'wakeup_latency lat=sched_switch:ss_lat pid=sched_switch:ss_pid' >> /sys/kernel/debug/tracing/synthetic_events

    # echo 'hist:keys=pid:wakeup_ts=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger

    # echo 'hist:keys=ss_pid=next_pid:ss_lat=common_timestamp.usecs-wakeup_ts:onmatch().trace(wakeup_latency) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

    # echo 'hist:keys=pid:wakeup_latency=lat:sort=pid' >> /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

    # echo 'hist:keys=pid,lat:sort=pid,lat' >> /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger


    # switchtime latency

    # echo 'switchtime_latency lat=sys_exit_nanosleep:ns_lat pid=sys_exit_nanosleep:ns_pid' >> /sys/kernel/debug/tracing/synthetic_events

    # echo 'hist:keys=next_pid:ss_ts=common_timestamp.usecs if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

    # echo 'hist:key=ns_pid=common_pid:ns_lat=common_timestamp.usecs-ss_ts:onmatch().trace(switchtime_latency)' >> /sys/kernel/debug/tracing/events/syscalls/sys_exit_nanosleep/trigger

    # echo 'hist:keys=pid:switchtime_latency=lat:sort=pid' >> /sys/kernel/debug/tracing/events/synthetic/switchtime_latency/trigger

    # echo 'hist:keys=pid,lat:sort=pid,lat' >> /sys/kernel/debug/tracing/events/synthetic/switchtime_latency/trigger

    # wakeupswitch latency

    # echo 'wakeupswitch_latency pid=sys_exit_nanosleep:ns_pid lat=sys_exit_nanosleep:wakeupswitch_lat' >> /sys/kernel/debug/tracing/synthetic_events

    # echo 'hist:key=common_pid:wakeupswitch_lat=wakeup_latency+switchtime_latency:onmatch().trace(wakeupswitch_latency)' >> /sys/kernel/debug/tracing/events/syscalls/sys_exit_nanosleep/trigger

    # echo 'hist:keys=pid,lat:sort=pid,lat' >> /sys/kernel/debug/tracing/events/synthetic/wakeupswitch_latency/trigger


    # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist

    # event histogram
    #
    # trigger info: hist:keys=pid,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
    #

    { pid:       3015, lat:          1 } hitcount:          2
    { pid:       3015, lat:          2 } hitcount:        167
    { pid:       3015, lat:          3 } hitcount:        348
    { pid:       3015, lat:          4 } hitcount:        139
    { pid:       3015, lat:          5 } hitcount:         14
    { pid:       3015, lat:          6 } hitcount:          6
    { pid:       3015, lat:          7 } hitcount:          7
    { pid:       3015, lat:          8 } hitcount:          8
    { pid:       3015, lat:          9 } hitcount:          3
    { pid:       3015, lat:         10 } hitcount:          2
    { pid:       3015, lat:         11 } hitcount:          2
    { pid:       3015, lat:         12 } hitcount:          1
    { pid:       3015, lat:         13 } hitcount:          1
    { pid:       3015, lat:         21 } hitcount:          1
    { pid:       3016, lat:          1 } hitcount:        294
    { pid:       3016, lat:          2 } hitcount:       3641
    { pid:       3016, lat:          3 } hitcount:       2535
    { pid:       3016, lat:          4 } hitcount:        311
    { pid:       3016, lat:          5 } hitcount:        109
    { pid:       3016, lat:          6 } hitcount:         59
    { pid:       3016, lat:          7 } hitcount:         27
    { pid:       3016, lat:          8 } hitcount:         13
    { pid:       3016, lat:          9 } hitcount:          5
    { pid:       3016, lat:         10 } hitcount:          1
    { pid:       3016, lat:         11 } hitcount:          1
    { pid:       3016, lat:         12 } hitcount:          1
    { pid:       3016, lat:         13 } hitcount:          2
    { pid:       3016, lat:         15 } hitcount:          1
    { pid:       3016, lat:         17 } hitcount:          2
    { pid:       3016, lat:         21 } hitcount:          1
    { pid:       3017, lat:          1 } hitcount:         85
    { pid:       3017, lat:          2 } hitcount:       1752
    { pid:       3017, lat:          3 } hitcount:       2334
    { pid:       3017, lat:          4 } hitcount:        308
    { pid:       3017, lat:          5 } hitcount:         96
    { pid:       3017, lat:          6 } hitcount:         46
    { pid:       3017, lat:          7 } hitcount:         31
    { pid:       3017, lat:          8 } hitcount:         12
    { pid:       3017, lat:          9 } hitcount:         11
    { pid:       3017, lat:         10 } hitcount:          3
    { pid:       3017, lat:         12 } hitcount:          2
    { pid:       3017, lat:         16 } hitcount:          1
    { pid:       3017, lat:         21 } hitcount:          1

    Totals:
        Hits: 12386
        Entries: 43
        Dropped: 0


    # cat /sys/kernel/debug/tracing/events/synthetic/switchtime_latency/hist

    # event histogram
    #
    # trigger info: hist:keys=pid,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
    #

    { pid:       3015, lat:          1 } hitcount:          2
    { pid:       3015, lat:          2 } hitcount:         46
    { pid:       3015, lat:          3 } hitcount:        284
    { pid:       3015, lat:          4 } hitcount:        164
    { pid:       3015, lat:          5 } hitcount:        116
    { pid:       3015, lat:          6 } hitcount:         61
    { pid:       3015, lat:          7 } hitcount:          3
    { pid:       3015, lat:          8 } hitcount:          2
    { pid:       3015, lat:          9 } hitcount:          2
    { pid:       3015, lat:         11 } hitcount:          3
    { pid:       3015, lat:         12 } hitcount:          4
    { pid:       3015, lat:         13 } hitcount:          5
    { pid:       3015, lat:         14 } hitcount:          3
    { pid:       3015, lat:         15 } hitcount:          5
    { pid:       3015, lat:         20 } hitcount:          1
    { pid:       3016, lat:          1 } hitcount:        469
    { pid:       3016, lat:          2 } hitcount:       5068
    { pid:       3016, lat:          3 } hitcount:        920
    { pid:       3016, lat:          4 } hitcount:        257
    { pid:       3016, lat:          5 } hitcount:         97
    { pid:       3016, lat:          6 } hitcount:         50
    { pid:       3016, lat:          7 } hitcount:         32
    { pid:       3016, lat:          8 } hitcount:         26
    { pid:       3016, lat:          9 } hitcount:         24
    { pid:       3016, lat:         10 } hitcount:         21
    { pid:       3016, lat:         11 } hitcount:         10
    { pid:       3016, lat:         12 } hitcount:         12
    { pid:       3016, lat:         13 } hitcount:          6
    { pid:       3016, lat:         14 } hitcount:          3
    { pid:       3016, lat:         15 } hitcount:          1
    { pid:       3016, lat:         17 } hitcount:          1
    { pid:       3016, lat:         19 } hitcount:          1
    { pid:       3016, lat:         21 } hitcount:          2
    { pid:       3016, lat:         25 } hitcount:          1
    { pid:       3016, lat:         29 } hitcount:          1
    { pid:       3016, lat:         61 } hitcount:          1
    { pid:       3017, lat:          1 } hitcount:        101
    { pid:       3017, lat:          2 } hitcount:       3278
    { pid:       3017, lat:          3 } hitcount:        877
    { pid:       3017, lat:          4 } hitcount:        207
    { pid:       3017, lat:          5 } hitcount:         66
    { pid:       3017, lat:          6 } hitcount:         52
    { pid:       3017, lat:          7 } hitcount:         27
    { pid:       3017, lat:          8 } hitcount:         19
    { pid:       3017, lat:          9 } hitcount:         20
    { pid:       3017, lat:         10 } hitcount:         16
    { pid:       3017, lat:         11 } hitcount:          8
    { pid:       3017, lat:         12 } hitcount:          4
    { pid:       3017, lat:         13 } hitcount:          2
    { pid:       3017, lat:         14 } hitcount:          2
    { pid:       3017, lat:         15 } hitcount:          1
    { pid:       3017, lat:         16 } hitcount:          1
    { pid:       3017, lat:         24 } hitcount:          1

    Totals:
        Hits: 12386
        Entries: 53
        Dropped: 0


    # cat /sys/kernel/debug/tracing/events/synthetic/wakeupswitch_latency/hist

    # event histogram
    #
    # trigger info: hist:keys=pid,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
    #

    { pid:       3015, lat:          3 } hitcount:          2
    { pid:       3015, lat:          4 } hitcount:         21
    { pid:       3015, lat:          5 } hitcount:         77
    { pid:       3015, lat:          6 } hitcount:        174
    { pid:       3015, lat:          7 } hitcount:        180
    { pid:       3015, lat:          8 } hitcount:        123
    { pid:       3015, lat:          9 } hitcount:         65
    { pid:       3015, lat:         10 } hitcount:         13
    { pid:       3015, lat:         11 } hitcount:          9
    { pid:       3015, lat:         12 } hitcount:          5
    { pid:       3015, lat:         13 } hitcount:          5
    { pid:       3015, lat:         14 } hitcount:          5
    { pid:       3015, lat:         15 } hitcount:          3
    { pid:       3015, lat:         16 } hitcount:          4
    { pid:       3015, lat:         17 } hitcount:          3
    { pid:       3015, lat:         18 } hitcount:          7
    { pid:       3015, lat:         19 } hitcount:          2
    { pid:       3015, lat:         24 } hitcount:          1
    { pid:       3015, lat:         25 } hitcount:          1
    { pid:       3016, lat:          2 } hitcount:          3
    { pid:       3016, lat:          3 } hitcount:        472
    { pid:       3016, lat:          4 } hitcount:       3149
    { pid:       3016, lat:          5 } hitcount:       2148
    { pid:       3016, lat:          6 } hitcount:        516
    { pid:       3016, lat:          7 } hitcount:        250
    { pid:       3016, lat:          8 } hitcount:        174
    { pid:       3016, lat:          9 } hitcount:         90
    { pid:       3016, lat:         10 } hitcount:         58
    { pid:       3016, lat:         11 } hitcount:         40
    { pid:       3016, lat:         12 } hitcount:         31
    { pid:       3016, lat:         13 } hitcount:         16
    { pid:       3016, lat:         14 } hitcount:         12
    { pid:       3016, lat:         15 } hitcount:          9
    { pid:       3016, lat:         16 } hitcount:         10
    { pid:       3016, lat:         17 } hitcount:          6
    { pid:       3016, lat:         18 } hitcount:          7
    { pid:       3016, lat:         19 } hitcount:          2
    { pid:       3016, lat:         20 } hitcount:          2
    { pid:       3016, lat:         21 } hitcount:          1
    { pid:       3016, lat:         23 } hitcount:          2
    { pid:       3016, lat:         28 } hitcount:          2
    { pid:       3016, lat:         34 } hitcount:          1
    { pid:       3016, lat:         63 } hitcount:          1
    { pid:       3017, lat:          2 } hitcount:          2
    { pid:       3017, lat:          3 } hitcount:         89
    { pid:       3017, lat:          4 } hitcount:       1417
    { pid:       3017, lat:          5 } hitcount:       2026
    { pid:       3017, lat:          6 } hitcount:        549
    { pid:       3017, lat:          7 } hitcount:        169
    { pid:       3017, lat:          8 } hitcount:        148
    { pid:       3017, lat:          9 } hitcount:        104
    { pid:       3017, lat:         10 } hitcount:         63
    { pid:       3017, lat:         11 } hitcount:         36
    { pid:       3017, lat:         12 } hitcount:         34
    { pid:       3017, lat:         13 } hitcount:         19
    { pid:       3017, lat:         14 } hitcount:          9
    { pid:       3017, lat:         15 } hitcount:          6
    { pid:       3017, lat:         16 } hitcount:          3
    { pid:       3017, lat:         17 } hitcount:          3
    { pid:       3017, lat:         19 } hitcount:          1
    { pid:       3017, lat:         23 } hitcount:          1
    { pid:       3017, lat:         26 } hitcount:          2

    Totals:
        Hits: 12383
        Entries: 62
        Dropped: 0


The following changes since commit e704eff3ff5138a462443dcd64d071165df18782:

  ftrace: Have set_graph_function handle multiple functions in one write (2017-02-03 10:59:52 -0500)

are available in the git repository at:

  git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/inter-event-v0
  http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/inter-event-v0

Tom Zanussi (21):
  tracing: Add hist_field_name() accessor
  tracing: Reimplement log2
  ring-buffer: Add TIME_EXTEND_ABS ring buffer type
  tracing: Give event triggers access to ring_buffer_event
  tracing: Add ring buffer event param to hist field functions
  tracing: Increase tracing map KEYS_MAX size
  tracing: Break out hist trigger assignment parsing
  tracing: Make traceprobe parsing code reusable
  tracing: Add hist trigger timestamp support
  tracing: Add per-element variable support to tracing_map
  tracing: Add variable support to hist triggers
  tracing: Account for variables in named trigger compatibility
  tracing: Add simple expression support to hist triggers
  tracing: Add variable reference handling to hist triggers
  tracing: Add usecs modifier for hist trigger timestamps
  tracing: Add support for dynamic tracepoints
  tracing: Add hist trigger action hook
  tracing: Add support for 'synthetic' events
  tracing: Add 'onmatch' hist trigger action support
  tracing: Add 'onmax' hist trigger action support
  tracing: Add inter-event hist trigger Documentation

 Documentation/trace/events.txt      |  330 +++++
 include/linux/ring_buffer.h         |   12 +-
 include/linux/trace_events.h        |   14 +-
 include/linux/tracepoint.h          |   11 +-
 kernel/trace/ring_buffer.c          |  109 +-
 kernel/trace/trace.c                |  108 +-
 kernel/trace/trace.h                |   20 +-
 kernel/trace/trace_events.c         |    4 +-
 kernel/trace/trace_events_hist.c    | 2687 ++++++++++++++++++++++++++++++++---
 kernel/trace/trace_events_trigger.c |   47 +-
 kernel/trace/trace_kprobe.c         |   18 +-
 kernel/trace/trace_probe.c          |   75 -
 kernel/trace/trace_probe.h          |    7 -
 kernel/trace/trace_uprobe.c         |    2 +-
 kernel/trace/tracing_map.c          |  113 ++
 kernel/trace/tracing_map.h          |   13 +-
 kernel/tracepoint.c                 |   42 +-
 17 files changed, 3244 insertions(+), 368 deletions(-)

-- 
1.9.3


* [RFC][PATCH 01/21] tracing: Add hist_field_name() accessor
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
@ 2017-02-08 17:24 ` Tom Zanussi
  2017-02-08 20:09   ` Steven Rostedt
  2017-02-08 17:24 ` [RFC][PATCH 02/21] tracing: Reimplement log2 Tom Zanussi
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:24 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

In preparation for hist_fields that won't be strictly based on
trace_event_fields, add a new hist_field_name() accessor to allow that
flexibility and update associated users.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 59 +++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index f3a960e..37347d7 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -145,6 +145,16 @@ struct hist_trigger_data {
 	struct tracing_map		*map;
 };
 
+static const char *hist_field_name(struct hist_field *field)
+{
+	const char *field_name = NULL;
+
+	if (field->field)
+		field_name = field->field->name;
+
+	return field_name;
+}
+
 static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
 {
 	hist_field_fn_t fn = NULL;
@@ -652,7 +662,6 @@ static int is_descending(const char *str)
 static int create_sort_keys(struct hist_trigger_data *hist_data)
 {
 	char *fields_str = hist_data->attrs->sort_key_str;
-	struct ftrace_event_field *field = NULL;
 	struct tracing_map_sort_key *sort_key;
 	int descending, ret = 0;
 	unsigned int i, j;
@@ -669,7 +678,9 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
 	}
 
 	for (i = 0; i < TRACING_MAP_SORT_KEYS_MAX; i++) {
+		struct hist_field *hist_field;
 		char *field_str, *field_name;
+		const char *test_name;
 
 		sort_key = &hist_data->sort_keys[i];
 
@@ -702,8 +713,9 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
 		}
 
 		for (j = 1; j < hist_data->n_fields; j++) {
-			field = hist_data->fields[j]->field;
-			if (field && (strcmp(field_name, field->name) == 0)) {
+			hist_field = hist_data->fields[j];
+			test_name = hist_field_name(hist_field);
+			if (strcmp(field_name, test_name) == 0) {
 				sort_key->field_idx = j;
 				descending = is_descending(field_str);
 				if (descending < 0) {
@@ -951,6 +963,7 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
 	struct hist_field *key_field;
 	char str[KSYM_SYMBOL_LEN];
 	bool multiline = false;
+	const char *field_name;
 	unsigned int i;
 	u64 uval;
 
@@ -962,26 +975,27 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
 		if (i > hist_data->n_vals)
 			seq_puts(m, ", ");
 
+		field_name = hist_field_name(key_field);
+
 		if (key_field->flags & HIST_FIELD_FL_HEX) {
 			uval = *(u64 *)(key + key_field->offset);
-			seq_printf(m, "%s: %llx",
-				   key_field->field->name, uval);
+			seq_printf(m, "%s: %llx", field_name, uval);
 		} else if (key_field->flags & HIST_FIELD_FL_SYM) {
 			uval = *(u64 *)(key + key_field->offset);
 			sprint_symbol_no_offset(str, uval);
-			seq_printf(m, "%s: [%llx] %-45s",
-				   key_field->field->name, uval, str);
+			seq_printf(m, "%s: [%llx] %-45s", field_name,
+				   uval, str);
 		} else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) {
 			uval = *(u64 *)(key + key_field->offset);
 			sprint_symbol(str, uval);
-			seq_printf(m, "%s: [%llx] %-55s",
-				   key_field->field->name, uval, str);
+			seq_printf(m, "%s: [%llx] %-55s", field_name,
+				   uval, str);
 		} else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
 			char *comm = elt->private_data;
 
 			uval = *(u64 *)(key + key_field->offset);
-			seq_printf(m, "%s: %-16s[%10llu]",
-				   key_field->field->name, comm, uval);
+			seq_printf(m, "%s: %-16s[%10llu]", field_name,
+				   comm, uval);
 		} else if (key_field->flags & HIST_FIELD_FL_SYSCALL) {
 			const char *syscall_name;
 
@@ -990,8 +1004,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
 			if (!syscall_name)
 				syscall_name = "unknown_syscall";
 
-			seq_printf(m, "%s: %-30s[%3llu]",
-				   key_field->field->name, syscall_name, uval);
+			seq_printf(m, "%s: %-30s[%3llu]", field_name,
+				   syscall_name, uval);
 		} else if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
 			seq_puts(m, "stacktrace:\n");
 			hist_trigger_stacktrace_print(m,
@@ -999,15 +1013,14 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
 						      HIST_STACKTRACE_DEPTH);
 			multiline = true;
 		} else if (key_field->flags & HIST_FIELD_FL_LOG2) {
-			seq_printf(m, "%s: ~ 2^%-2llu", key_field->field->name,
+			seq_printf(m, "%s: ~ 2^%-2llu", field_name,
 				   *(u64 *)(key + key_field->offset));
 		} else if (key_field->flags & HIST_FIELD_FL_STRING) {
-			seq_printf(m, "%s: %-50s", key_field->field->name,
+			seq_printf(m, "%s: %-50s", field_name,
 				   (char *)(key + key_field->offset));
 		} else {
 			uval = *(u64 *)(key + key_field->offset);
-			seq_printf(m, "%s: %10llu", key_field->field->name,
-				   uval);
+			seq_printf(m, "%s: %10llu", field_name, uval);
 		}
 	}
 
@@ -1020,13 +1033,13 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
 		   tracing_map_read_sum(elt, HITCOUNT_IDX));
 
 	for (i = 1; i < hist_data->n_vals; i++) {
+		field_name = hist_field_name(hist_data->fields[i]);
+
 		if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
-			seq_printf(m, "  %s: %10llx",
-				   hist_data->fields[i]->field->name,
+			seq_printf(m, "  %s: %10llx", field_name,
 				   tracing_map_read_sum(elt, i));
 		} else {
-			seq_printf(m, "  %s: %10llu",
-				   hist_data->fields[i]->field->name,
+			seq_printf(m, "  %s: %10llu", field_name,
 				   tracing_map_read_sum(elt, i));
 		}
 	}
@@ -1141,7 +1154,9 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
 
 static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
 {
-	seq_printf(m, "%s", hist_field->field->name);
+	const char *field_name = hist_field_name(hist_field);
+
+	seq_printf(m, "%s", field_name);
 	if (hist_field->flags) {
 		const char *flags_str = get_hist_field_flags(hist_field);
 
-- 
1.9.3


* [RFC][PATCH 02/21] tracing: Reimplement log2
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
  2017-02-08 17:24 ` [RFC][PATCH 01/21] tracing: Add hist_field_name() accessor Tom Zanussi
@ 2017-02-08 17:24 ` Tom Zanussi
  2017-02-08 20:13   ` Steven Rostedt
  2017-02-08 17:24 ` [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type Tom Zanussi
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:24 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

log2 as currently implemented applies only to u64 trace_event_field
derived fields, and assumes that anything it's applied to is a u64
field.

To prepare for synthetic fields like latencies, log2 should be
applicable to those as well, so take the opportunity now to fix the
current problems as well as expand to more general uses.

log2 should be thought of as a chaining function rather than a field
type.  To enable this as well as possible future function
implementations, add a hist_field operand array into the hist_field
definition for this purpose, and make use of it to implement the log2
'function'.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 37347d7..afd766a 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -27,12 +27,16 @@
 
 typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event);
 
+#define HIST_FIELD_OPERANDS_MAX	2
+
 struct hist_field {
 	struct ftrace_event_field	*field;
 	unsigned long			flags;
 	hist_field_fn_t			fn;
 	unsigned int			size;
 	unsigned int			offset;
+	unsigned int                    is_signed;
+	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
 };
 
 static u64 hist_field_none(struct hist_field *field, void *event)
@@ -70,7 +74,9 @@ static u64 hist_field_pstring(struct hist_field *hist_field, void *event)
 
 static u64 hist_field_log2(struct hist_field *hist_field, void *event)
 {
-	u64 val = *(u64 *)(event + hist_field->field->offset);
+	struct hist_field *operand = hist_field->operands[0];
+
+	u64 val = operand->fn(operand, event);
 
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
@@ -151,6 +157,8 @@ static const char *hist_field_name(struct hist_field *field)
 
 	if (field->field)
 		field_name = field->field->name;
+	else if (field->flags & HIST_FIELD_FL_LOG2)
+		field_name = hist_field_name(field->operands[0]);
 
 	return field_name;
 }
@@ -351,6 +359,14 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
 
 static void destroy_hist_field(struct hist_field *hist_field)
 {
+	unsigned int i;
+
+	if (!hist_field)
+		return;
+
+	for (i = 0; i < HIST_FIELD_OPERANDS_MAX; i++)
+		destroy_hist_field(hist_field->operands[i]);
+
 	kfree(hist_field);
 }
 
@@ -377,7 +393,10 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 	}
 
 	if (flags & HIST_FIELD_FL_LOG2) {
+		unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
 		hist_field->fn = hist_field_log2;
+		hist_field->operands[0] = create_hist_field(field, fl);
+		hist_field->size = hist_field->operands[0]->size;
 		goto out;
 	}
 
-- 
1.9.3


* [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
  2017-02-08 17:24 ` [RFC][PATCH 01/21] tracing: Add hist_field_name() accessor Tom Zanussi
  2017-02-08 17:24 ` [RFC][PATCH 02/21] tracing: Reimplement log2 Tom Zanussi
@ 2017-02-08 17:24 ` Tom Zanussi
  2017-02-08 20:32   ` Steven Rostedt
  2017-02-08 17:25 ` [RFC][PATCH 04/21] tracing: Give event triggers access to ring_buffer_event Tom Zanussi
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:24 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Replace the unused RINGBUF_TYPE_TIME_STAMP ring buffer type with
RINGBUF_TYPE_TIME_EXTEND_ABS, which forces extended time_deltas for
all events.

Having time_deltas that aren't dependent on previous events in the
ring buffer makes it feasible to use the ring_buffer_event timestamps
in a more random-access way, for purposes other than serial event
printing.

To set/reset this mode, use tracing_set_time_stamp_abs().

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 include/linux/ring_buffer.h |  12 ++++-
 kernel/trace/ring_buffer.c  | 109 ++++++++++++++++++++++++++++++++------------
 kernel/trace/trace.c        |  25 +++++++++-
 kernel/trace/trace.h        |   2 +
 4 files changed, 117 insertions(+), 31 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b6d4568..c3a1064 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -36,6 +36,12 @@ struct ring_buffer_event {
  *				 array[0] = time delta (28 .. 59)
  *				 size = 8 bytes
  *
+ * @RINGBUF_TYPE_TIME_EXTEND_ABS:
+ *				 Extend the time delta, but interpret it as
+ *				 absolute, not relative
+ *				 array[0] = time delta (28 .. 59)
+ *				 size = 8 bytes
+ *
  * @RINGBUF_TYPE_TIME_STAMP:	Sync time stamp with external clock
  *				 array[0]    = tv_nsec
  *				 array[1..2] = tv_sec
@@ -56,12 +62,12 @@ enum ring_buffer_type {
 	RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
 	RINGBUF_TYPE_PADDING,
 	RINGBUF_TYPE_TIME_EXTEND,
-	/* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
-	RINGBUF_TYPE_TIME_STAMP,
+	RINGBUF_TYPE_TIME_EXTEND_ABS,
 };
 
 unsigned ring_buffer_event_length(struct ring_buffer_event *event);
 void *ring_buffer_event_data(struct ring_buffer_event *event);
+u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event);
 
 /*
  * ring_buffer_discard_commit will remove an event that has not
@@ -180,6 +186,8 @@ void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
 				      int cpu, u64 *ts);
 void ring_buffer_set_clock(struct ring_buffer *buffer,
 			   u64 (*clock)(void));
+void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs);
+bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);
 
 size_t ring_buffer_page_len(void *page);
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a85739e..c9c9a83 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -41,6 +41,8 @@ int ring_buffer_print_entry_header(struct trace_seq *s)
 			 RINGBUF_TYPE_PADDING);
 	trace_seq_printf(s, "\ttime_extend : type == %d\n",
 			 RINGBUF_TYPE_TIME_EXTEND);
+	trace_seq_printf(s, "\ttime_extend_abs : type == %d\n",
+			 RINGBUF_TYPE_TIME_EXTEND_ABS);
 	trace_seq_printf(s, "\tdata max type_len  == %d\n",
 			 RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
 
@@ -186,11 +188,9 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
 		return  event->array[0] + RB_EVNT_HDR_SIZE;
 
 	case RINGBUF_TYPE_TIME_EXTEND:
+	case RINGBUF_TYPE_TIME_EXTEND_ABS:
 		return RB_LEN_TIME_EXTEND;
 
-	case RINGBUF_TYPE_TIME_STAMP:
-		return RB_LEN_TIME_STAMP;
-
 	case RINGBUF_TYPE_DATA:
 		return rb_event_data_length(event);
 	default:
@@ -209,7 +209,8 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
 {
 	unsigned len = 0;
 
-	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
+	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS) {
 		/* time extends include the data event after it */
 		len = RB_LEN_TIME_EXTEND;
 		event = skip_time_extend(event);
@@ -231,7 +232,8 @@ unsigned ring_buffer_event_length(struct ring_buffer_event *event)
 {
 	unsigned length;
 
-	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
+	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
 		event = skip_time_extend(event);
 
 	length = rb_event_length(event);
@@ -248,7 +250,8 @@ unsigned ring_buffer_event_length(struct ring_buffer_event *event)
 static __always_inline void *
 rb_event_data(struct ring_buffer_event *event)
 {
-	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
+	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
 		event = skip_time_extend(event);
 	BUG_ON(event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
 	/* If length is in len field, then array[0] has the data */
@@ -483,6 +486,7 @@ struct ring_buffer {
 	u64				(*clock)(void);
 
 	struct rb_irq_work		irq_work;
+	bool				time_stamp_abs;
 };
 
 struct ring_buffer_iter {
@@ -1377,6 +1381,16 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
 	buffer->clock = clock;
 }
 
+void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs)
+{
+	buffer->time_stamp_abs = abs;
+}
+
+bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer)
+{
+	return buffer->time_stamp_abs;
+}
+
 static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
 
 static inline unsigned long rb_page_entries(struct buffer_page *bpage)
@@ -2207,13 +2221,16 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
 }
 
 /* Slow path, do not inline */
-static noinline struct ring_buffer_event *
-rb_add_time_stamp(struct ring_buffer_event *event, u64 delta)
+static noinline struct ring_buffer_event *
+rb_add_time_stamp(struct ring_buffer_event *event, u64 delta, bool abs)
 {
-	event->type_len = RINGBUF_TYPE_TIME_EXTEND;
+	if (abs)
+		event->type_len = RINGBUF_TYPE_TIME_EXTEND_ABS;
+	else
+		event->type_len = RINGBUF_TYPE_TIME_EXTEND;
 
-	/* Not the first event on the page? */
-	if (rb_event_index(event)) {
+	/* Not the first event on the page, or not delta? */
+	if (abs || rb_event_index(event)) {
 		event->time_delta = delta & TS_MASK;
 		event->array[0] = delta >> TS_SHIFT;
 	} else {
@@ -2256,7 +2273,9 @@ static inline bool rb_event_is_commit(struct ring_buffer_per_cpu *cpu_buffer,
 	 * add it to the start of the resevered space.
 	 */
 	if (unlikely(info->add_timestamp)) {
-		event = rb_add_time_stamp(event, delta);
+		bool abs = ring_buffer_time_stamp_abs(cpu_buffer->buffer);
+
+		event = rb_add_time_stamp(event, info->delta, abs);
 		length -= RB_LEN_TIME_EXTEND;
 		delta = 0;
 	}
@@ -2444,7 +2463,8 @@ static __always_inline void rb_end_commit(struct ring_buffer_per_cpu *cpu_buffer
 
 static inline void rb_event_discard(struct ring_buffer_event *event)
 {
-	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
+	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
 		event = skip_time_extend(event);
 
 	/* array[0] holds the actual length for the discarded event */
@@ -2475,6 +2495,10 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
 {
 	u64 delta;
 
+	/* Ignore write_stamp if TIME_EXTEND_ABS */
+	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
+		return;
+
 	/*
 	 * The event first in the commit queue updates the
 	 * time stamp.
@@ -2492,8 +2516,7 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
 			delta <<= TS_SHIFT;
 			delta += event->time_delta;
 			cpu_buffer->write_stamp += delta;
-		} else
-			cpu_buffer->write_stamp += event->time_delta;
+		}
 	}
 }
 
@@ -2674,7 +2697,7 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
 	 * If this is the first commit on the page, then it has the same
 	 * timestamp as the page itself.
 	 */
-	if (!tail)
+	if (!tail && !ring_buffer_time_stamp_abs(cpu_buffer->buffer))
 		info->delta = 0;
 
 	/* See if we shot pass the end of this buffer page */
@@ -2752,8 +2775,11 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
 	/* make sure this diff is calculated here */
 	barrier();
 
-	/* Did the write stamp get updated already? */
-	if (likely(info.ts >= cpu_buffer->write_stamp)) {
+	if (ring_buffer_time_stamp_abs(buffer)) {
+		info.delta = info.ts;
+		rb_handle_timestamp(cpu_buffer, &info);
+	} else /* Did the write stamp get updated already? */
+		if (likely(info.ts >= cpu_buffer->write_stamp)) {
 		info.delta = diff;
 		if (unlikely(test_time_stamp(info.delta)))
 			rb_handle_timestamp(cpu_buffer, &info);
@@ -3429,8 +3455,8 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
 		cpu_buffer->read_stamp += delta;
 		return;
 
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
+	case RINGBUF_TYPE_TIME_EXTEND_ABS:
+		/* Ignore read_stamp if TIME_EXTEND_ABS */
 		return;
 
 	case RINGBUF_TYPE_DATA:
@@ -3460,8 +3486,8 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
 		iter->read_stamp += delta;
 		return;
 
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
+	case RINGBUF_TYPE_TIME_EXTEND_ABS:
+		/* Ignore read_stamp if TIME_EXTEND_ABS */
 		return;
 
 	case RINGBUF_TYPE_DATA:
@@ -3677,6 +3703,17 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
 	return cpu_buffer->lost_events;
 }
 
+u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event)
+{
+	u64 ts;
+
+	ts = event->array[0];
+	ts <<= TS_SHIFT;
+	ts += event->time_delta;
+
+	return ts;
+}
+
 static struct ring_buffer_event *
 rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts,
 	       unsigned long *lost_events)
@@ -3685,6 +3722,9 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
 	struct buffer_page *reader;
 	int nr_loops = 0;
 
+	if (ts)
+		*ts = 0;
+
  again:
 	/*
 	 * We repeat when a time extend is encountered.
@@ -3720,13 +3760,18 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
 		rb_advance_reader(cpu_buffer);
 		goto again;
 
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
+	case RINGBUF_TYPE_TIME_EXTEND_ABS:
+		if (ts) {
+			*ts = ring_buffer_event_time_stamp(event);
+			ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
+							 cpu_buffer->cpu, ts);
+		}
+		/* Internal data, OK to advance */
 		rb_advance_reader(cpu_buffer);
 		goto again;
 
 	case RINGBUF_TYPE_DATA:
-		if (ts) {
+		if (ts && !(*ts)) {
 			*ts = cpu_buffer->read_stamp + event->time_delta;
 			ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
 							 cpu_buffer->cpu, ts);
@@ -3751,6 +3796,9 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
 	struct ring_buffer_event *event;
 	int nr_loops = 0;
 
+	if (ts)
+		*ts = 0;
+
 	cpu_buffer = iter->cpu_buffer;
 	buffer = cpu_buffer->buffer;
 
@@ -3802,13 +3850,18 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
 		rb_advance_iter(iter);
 		goto again;
 
-	case RINGBUF_TYPE_TIME_STAMP:
-		/* FIXME: not implemented */
+	case RINGBUF_TYPE_TIME_EXTEND_ABS:
+		if (ts) {
+			*ts = ring_buffer_event_time_stamp(event);
+			ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
+							 cpu_buffer->cpu, ts);
+		}
+		/* Internal data, OK to advance */
 		rb_advance_iter(iter);
 		goto again;
 
 	case RINGBUF_TYPE_DATA:
-		if (ts) {
+		if (ts && !(*ts)) {
 			*ts = iter->read_stamp + event->time_delta;
 			ring_buffer_normalize_time_stamp(buffer,
 							 cpu_buffer->cpu, ts);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4589b67..5868656 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2090,7 +2090,7 @@ struct ring_buffer_event *
 
 	*current_rb = trace_file->tr->trace_buffer.buffer;
 
-	if ((trace_file->flags &
+	if (!ring_buffer_time_stamp_abs(*current_rb) && (trace_file->flags &
 	     (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
 	    (entry = this_cpu_read(trace_buffered_event))) {
 		/* Try to use the per cpu buffer first */
@@ -5967,6 +5967,29 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
 	return ret;
 }
 
+int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs)
+{
+	mutex_lock(&trace_types_lock);
+
+	ring_buffer_set_time_stamp_abs(tr->trace_buffer.buffer, abs);
+
+	/*
+	 * New timestamps may not be consistent with the previous setting.
+	 * Reset the buffer so that it doesn't have incomparable timestamps.
+	 */
+	tracing_reset_online_cpus(&tr->trace_buffer);
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+	if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
+		ring_buffer_set_time_stamp_abs(tr->max_buffer.buffer, abs);
+	tracing_reset_online_cpus(&tr->max_buffer);
+#endif
+
+	mutex_unlock(&trace_types_lock);
+
+	return 0;
+}
+
 struct ftrace_buffer_info {
 	struct trace_iterator	iter;
 	void			*spare;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index afbec96..12bc7fa 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -278,6 +278,8 @@ enum {
 extern int trace_array_get(struct trace_array *tr);
 extern void trace_array_put(struct trace_array *tr);
 
+extern int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs);
+
 /*
  * The global tracer (top) should be the first trace array added,
  * but we check the flag anyway.
-- 
1.9.3

* [RFC][PATCH 04/21] tracing: Give event triggers access to ring_buffer_event
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (2 preceding siblings ...)
  2017-02-08 17:24 ` [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 05/21] tracing: Add ring buffer event param to hist field functions Tom Zanussi
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

The ring_buffer_event can provide a timestamp that may be useful to
various triggers - pass it into the handlers for that purpose.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 include/linux/trace_events.h        | 14 ++++++-----
 kernel/trace/trace.h                |  9 +++----
 kernel/trace/trace_events_hist.c    | 11 +++++----
 kernel/trace/trace_events_trigger.c | 47 +++++++++++++++++++++++--------------
 4 files changed, 49 insertions(+), 32 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index be00761..06a5d2d 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -395,11 +395,13 @@ enum event_trigger_type {
 
 extern int filter_match_preds(struct event_filter *filter, void *rec);
 
-extern enum event_trigger_type event_triggers_call(struct trace_event_file *file,
-						   void *rec);
-extern void event_triggers_post_call(struct trace_event_file *file,
-				     enum event_trigger_type tt,
-				     void *rec);
+extern enum event_trigger_type
+event_triggers_call(struct trace_event_file *file, void *rec,
+		    struct ring_buffer_event *event);
+extern void
+event_triggers_post_call(struct trace_event_file *file,
+			 enum event_trigger_type tt,
+			 void *rec, struct ring_buffer_event *event);
 
 bool trace_event_ignore_this_pid(struct trace_event_file *trace_file);
 
@@ -419,7 +421,7 @@ extern void event_triggers_post_call(struct trace_event_file *file,
 
 	if (!(eflags & EVENT_FILE_FL_TRIGGER_COND)) {
 		if (eflags & EVENT_FILE_FL_TRIGGER_MODE)
-			event_triggers_call(file, NULL);
+			event_triggers_call(file, NULL, NULL);
 		if (eflags & EVENT_FILE_FL_SOFT_DISABLED)
 			return true;
 		if (eflags & EVENT_FILE_FL_PID_FILTER)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 12bc7fa..ac55fa1 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1187,7 +1187,7 @@ static inline void trace_buffer_unlock_commit(struct trace_array *tr,
 	unsigned long eflags = file->flags;
 
 	if (eflags & EVENT_FILE_FL_TRIGGER_COND)
-		*tt = event_triggers_call(file, entry);
+		*tt = event_triggers_call(file, entry, event);
 
 	if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags) ||
 	    (unlikely(file->flags & EVENT_FILE_FL_FILTERED) &&
@@ -1224,7 +1224,7 @@ static inline void trace_buffer_unlock_commit(struct trace_array *tr,
 		trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc);
 
 	if (tt)
-		event_triggers_post_call(file, tt, entry);
+		event_triggers_post_call(file, tt, entry, event);
 }
 
 /**
@@ -1257,7 +1257,7 @@ static inline void trace_buffer_unlock_commit(struct trace_array *tr,
 						irq_flags, pc, regs);
 
 	if (tt)
-		event_triggers_post_call(file, tt, entry);
+		event_triggers_post_call(file, tt, entry, event);
 }
 
 #define FILTER_PRED_INVALID	((unsigned short)-1)
@@ -1479,7 +1479,8 @@ extern void set_named_trigger_data(struct event_trigger_data *data,
  */
 struct event_trigger_ops {
 	void			(*func)(struct event_trigger_data *data,
-					void *rec);
+					void *rec,
+					struct ring_buffer_event *rbe);
 	int			(*init)(struct event_trigger_ops *ops,
 					struct event_trigger_data *data);
 	void			(*free)(struct event_trigger_ops *ops,
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index afd766a..902df2c 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -907,7 +907,8 @@ static inline void add_to_key(char *compound_key, void *key,
 	memcpy(compound_key + key_field->offset, key, size);
 }
 
-static void event_hist_trigger(struct event_trigger_data *data, void *rec)
+static void event_hist_trigger(struct event_trigger_data *data, void *rec,
+			       struct ring_buffer_event *event)
 {
 	struct hist_trigger_data *hist_data = data->private_data;
 	bool use_compound_key = (hist_data->n_keys > 1);
@@ -1658,7 +1659,8 @@ __init int register_trigger_hist_cmd(void)
 }
 
 static void
-hist_enable_trigger(struct event_trigger_data *data, void *rec)
+hist_enable_trigger(struct event_trigger_data *data, void *rec,
+		    struct ring_buffer_event *event)
 {
 	struct enable_trigger_data *enable_data = data->private_data;
 	struct event_trigger_data *test;
@@ -1674,7 +1676,8 @@ __init int register_trigger_hist_cmd(void)
 }
 
 static void
-hist_enable_count_trigger(struct event_trigger_data *data, void *rec)
+hist_enable_count_trigger(struct event_trigger_data *data, void *rec,
+			  struct ring_buffer_event *event)
 {
 	if (!data->count)
 		return;
@@ -1682,7 +1685,7 @@ __init int register_trigger_hist_cmd(void)
 	if (data->count != -1)
 		(data->count)--;
 
-	hist_enable_trigger(data, rec);
+	hist_enable_trigger(data, rec, event);
 }
 
 static struct event_trigger_ops hist_enable_trigger_ops = {
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 6721a1e8..152548f 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -62,7 +62,8 @@ void trigger_data_free(struct event_trigger_data *data)
  * any trigger that should be deferred, ETT_NONE if nothing to defer.
  */
 enum event_trigger_type
-event_triggers_call(struct trace_event_file *file, void *rec)
+event_triggers_call(struct trace_event_file *file, void *rec,
+		    struct ring_buffer_event *event)
 {
 	struct event_trigger_data *data;
 	enum event_trigger_type tt = ETT_NONE;
@@ -75,7 +76,7 @@ enum event_trigger_type
 		if (data->paused)
 			continue;
 		if (!rec) {
-			data->ops->func(data, rec);
+			data->ops->func(data, rec, event);
 			continue;
 		}
 		filter = rcu_dereference_sched(data->filter);
@@ -85,7 +86,7 @@ enum event_trigger_type
 			tt |= data->cmd_ops->trigger_type;
 			continue;
 		}
-		data->ops->func(data, rec);
+		data->ops->func(data, rec, event);
 	}
 	return tt;
 }
@@ -107,7 +108,7 @@ enum event_trigger_type
 void
 event_triggers_post_call(struct trace_event_file *file,
 			 enum event_trigger_type tt,
-			 void *rec)
+			 void *rec, struct ring_buffer_event *event)
 {
 	struct event_trigger_data *data;
 
@@ -115,7 +116,7 @@ enum event_trigger_type
 		if (data->paused)
 			continue;
 		if (data->cmd_ops->trigger_type & tt)
-			data->ops->func(data, rec);
+			data->ops->func(data, rec, event);
 	}
 }
 EXPORT_SYMBOL_GPL(event_triggers_post_call);
@@ -908,7 +909,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
 }
 
 static void
-traceon_trigger(struct event_trigger_data *data, void *rec)
+traceon_trigger(struct event_trigger_data *data, void *rec,
+		struct ring_buffer_event *event)
 {
 	if (tracing_is_on())
 		return;
@@ -917,7 +919,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
 }
 
 static void
-traceon_count_trigger(struct event_trigger_data *data, void *rec)
+traceon_count_trigger(struct event_trigger_data *data, void *rec,
+		      struct ring_buffer_event *event)
 {
 	if (tracing_is_on())
 		return;
@@ -932,7 +935,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
 }
 
 static void
-traceoff_trigger(struct event_trigger_data *data, void *rec)
+traceoff_trigger(struct event_trigger_data *data, void *rec,
+		 struct ring_buffer_event *event)
 {
 	if (!tracing_is_on())
 		return;
@@ -941,7 +945,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
 }
 
 static void
-traceoff_count_trigger(struct event_trigger_data *data, void *rec)
+traceoff_count_trigger(struct event_trigger_data *data, void *rec,
+		       struct ring_buffer_event *event)
 {
 	if (!tracing_is_on())
 		return;
@@ -1038,13 +1043,15 @@ void set_named_trigger_data(struct event_trigger_data *data,
 
 #ifdef CONFIG_TRACER_SNAPSHOT
 static void
-snapshot_trigger(struct event_trigger_data *data, void *rec)
+snapshot_trigger(struct event_trigger_data *data, void *rec,
+		 struct ring_buffer_event *event)
 {
 	tracing_snapshot();
 }
 
 static void
-snapshot_count_trigger(struct event_trigger_data *data, void *rec)
+snapshot_count_trigger(struct event_trigger_data *data, void *rec,
+		       struct ring_buffer_event *event)
 {
 	if (!data->count)
 		return;
@@ -1052,7 +1059,7 @@ void set_named_trigger_data(struct event_trigger_data *data,
 	if (data->count != -1)
 		(data->count)--;
 
-	snapshot_trigger(data, rec);
+	snapshot_trigger(data, rec, event);
 }
 
 static int
@@ -1131,13 +1138,15 @@ static __init int register_trigger_snapshot_cmd(void)
 #define STACK_SKIP 3
 
 static void
-stacktrace_trigger(struct event_trigger_data *data, void *rec)
+stacktrace_trigger(struct event_trigger_data *data, void *rec,
+		   struct ring_buffer_event *event)
 {
 	trace_dump_stack(STACK_SKIP);
 }
 
 static void
-stacktrace_count_trigger(struct event_trigger_data *data, void *rec)
+stacktrace_count_trigger(struct event_trigger_data *data, void *rec,
+			 struct ring_buffer_event *event)
 {
 	if (!data->count)
 		return;
@@ -1145,7 +1154,7 @@ static __init int register_trigger_snapshot_cmd(void)
 	if (data->count != -1)
 		(data->count)--;
 
-	stacktrace_trigger(data, rec);
+	stacktrace_trigger(data, rec, event);
 }
 
 static int
@@ -1207,7 +1216,8 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
 }
 
 static void
-event_enable_trigger(struct event_trigger_data *data, void *rec)
+event_enable_trigger(struct event_trigger_data *data, void *rec,
+		     struct ring_buffer_event *event)
 {
 	struct enable_trigger_data *enable_data = data->private_data;
 
@@ -1218,7 +1228,8 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
 }
 
 static void
-event_enable_count_trigger(struct event_trigger_data *data, void *rec)
+event_enable_count_trigger(struct event_trigger_data *data, void *rec,
+			   struct ring_buffer_event *event)
 {
 	struct enable_trigger_data *enable_data = data->private_data;
 
@@ -1232,7 +1243,7 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
 	if (data->count != -1)
 		(data->count)--;
 
-	event_enable_trigger(data, rec);
+	event_enable_trigger(data, rec, event);
 }
 
 int event_enable_trigger_print(struct seq_file *m,
-- 
1.9.3

* [RFC][PATCH 05/21] tracing: Add ring buffer event param to hist field functions
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (3 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 04/21] tracing: Give event triggers access to ring_buffer_event Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 06/21] tracing: Increase tracing map KEYS_MAX size Tom Zanussi
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Some hist field values, such as timestamps, require access to the
corresponding ring_buffer_event struct; add a param so that hist field
functions can access it.
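
For reference, the updated callback signature, and a sketch of a field
function that makes use of the new param (the actual timestamp field
function is only introduced by a later patch in this series):

	typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
					struct ring_buffer_event *rbe);

	/* sketch: a field function deriving its value from the rbe */
	static u64 hist_field_timestamp(struct hist_field *hist_field,
					void *event,
					struct ring_buffer_event *rbe)
	{
		return ring_buffer_event_time_stamp(rbe);
	}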

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 902df2c..38faa08 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -25,7 +25,8 @@
 
 struct hist_field;
 
-typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event);
+typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
+				struct ring_buffer_event *rbe);
 
 #define HIST_FIELD_OPERANDS_MAX	2
 
@@ -39,24 +40,28 @@ struct hist_field {
 	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
 };
 
-static u64 hist_field_none(struct hist_field *field, void *event)
+static u64 hist_field_none(struct hist_field *field, void *event,
+			   struct ring_buffer_event *rbe)
 {
 	return 0;
 }
 
-static u64 hist_field_counter(struct hist_field *field, void *event)
+static u64 hist_field_counter(struct hist_field *field, void *event,
+			      struct ring_buffer_event *rbe)
 {
 	return 1;
 }
 
-static u64 hist_field_string(struct hist_field *hist_field, void *event)
+static u64 hist_field_string(struct hist_field *hist_field, void *event,
+			     struct ring_buffer_event *rbe)
 {
 	char *addr = (char *)(event + hist_field->field->offset);
 
 	return (u64)(unsigned long)addr;
 }
 
-static u64 hist_field_dynstring(struct hist_field *hist_field, void *event)
+static u64 hist_field_dynstring(struct hist_field *hist_field, void *event,
+				struct ring_buffer_event *rbe)
 {
 	u32 str_item = *(u32 *)(event + hist_field->field->offset);
 	int str_loc = str_item & 0xffff;
@@ -65,24 +70,28 @@ static u64 hist_field_dynstring(struct hist_field *hist_field, void *event)
 	return (u64)(unsigned long)addr;
 }
 
-static u64 hist_field_pstring(struct hist_field *hist_field, void *event)
+static u64 hist_field_pstring(struct hist_field *hist_field, void *event,
+			      struct ring_buffer_event *rbe)
 {
 	char **addr = (char **)(event + hist_field->field->offset);
 
 	return (u64)(unsigned long)*addr;
 }
 
-static u64 hist_field_log2(struct hist_field *hist_field, void *event)
+static u64 hist_field_log2(struct hist_field *hist_field, void *event,
+			   struct ring_buffer_event *rbe)
 {
 	struct hist_field *operand = hist_field->operands[0];
 
-	u64 val = operand->fn(operand, event);
+	u64 val = operand->fn(operand, event, rbe);
 
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
 
 #define DEFINE_HIST_FIELD_FN(type)					\
-static u64 hist_field_##type(struct hist_field *hist_field, void *event)\
+	static u64 hist_field_##type(struct hist_field *hist_field,	\
+				     void *event,			\
+				     struct ring_buffer_event *rbe)	\
 {									\
 	type *addr = (type *)(event + hist_field->field->offset);	\
 									\
@@ -869,8 +878,8 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
 }
 
 static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
-				    struct tracing_map_elt *elt,
-				    void *rec)
+				    struct tracing_map_elt *elt, void *rec,
+				    struct ring_buffer_event *rbe)
 {
 	struct hist_field *hist_field;
 	unsigned int i;
@@ -878,7 +887,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
 
 	for_each_hist_val_field(i, hist_data) {
 		hist_field = hist_data->fields[i];
-		hist_val = hist_field->fn(hist_field, rec);
+		hist_val = hist_field->fn(hist_field, rec, rbe);
 		tracing_map_update_sum(elt, i, hist_val);
 	}
 }
@@ -908,7 +917,7 @@ static inline void add_to_key(char *compound_key, void *key,
 }
 
 static void event_hist_trigger(struct event_trigger_data *data, void *rec,
-			       struct ring_buffer_event *event)
+			       struct ring_buffer_event *rbe)
 {
 	struct hist_trigger_data *hist_data = data->private_data;
 	bool use_compound_key = (hist_data->n_keys > 1);
@@ -937,7 +946,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 
 			key = entries;
 		} else {
-			field_contents = key_field->fn(key_field, rec);
+			field_contents = key_field->fn(key_field, rec, rbe);
 			if (key_field->flags & HIST_FIELD_FL_STRING) {
 				key = (void *)(unsigned long)field_contents;
 				use_compound_key = true;
@@ -954,7 +963,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 
 	elt = tracing_map_insert(hist_data->map, key);
 	if (elt)
-		hist_trigger_elt_update(hist_data, elt, rec);
+		hist_trigger_elt_update(hist_data, elt, rec, rbe);
 }
 
 static void hist_trigger_stacktrace_print(struct seq_file *m,
-- 
1.9.3

* [RFC][PATCH 06/21] tracing: Increase tracing map KEYS_MAX size
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (4 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 05/21] tracing: Add ring buffer event param to hist field functions Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 07/21] tracing: Break out hist trigger assignment parsing Tom Zanussi
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

The current maximum number of subkeys in a compound key is 2, which is
too restrictive.  Increase it to a more realistic value of 3.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/tracing_map.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
index 618838f..f097511 100644
--- a/kernel/trace/tracing_map.h
+++ b/kernel/trace/tracing_map.h
@@ -5,7 +5,7 @@
 #define TRACING_MAP_BITS_MAX		17
 #define TRACING_MAP_BITS_MIN		7
 
-#define TRACING_MAP_KEYS_MAX		2
+#define TRACING_MAP_KEYS_MAX		3
 #define TRACING_MAP_VALS_MAX		3
 #define TRACING_MAP_FIELDS_MAX		(TRACING_MAP_KEYS_MAX + \
 					 TRACING_MAP_VALS_MAX)
-- 
1.9.3

* [RFC][PATCH 07/21] tracing: Break out hist trigger assignment parsing
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (5 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 06/21] tracing: Increase tracing map KEYS_MAX size Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 08/21] tracing: Make traceprobe parsing code reusable Tom Zanussi
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Break out the hist trigger assignment parsing into a separate
function.  This will make it easier to add variables, and makes the
parsing code cleaner regardless.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 56 +++++++++++++++++++++++++---------------
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 38faa08..4e70872 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -243,6 +243,35 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
 	kfree(attrs);
 }
 
+static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
+{
+	int ret = 0;
+
+	if ((strncmp(str, "key=", strlen("key=")) == 0) ||
+	    (strncmp(str, "keys=", strlen("keys=")) == 0))
+		attrs->keys_str = kstrdup(str, GFP_KERNEL);
+	else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
+		 (strncmp(str, "vals=", strlen("vals=")) == 0) ||
+		 (strncmp(str, "values=", strlen("values=")) == 0))
+		attrs->vals_str = kstrdup(str, GFP_KERNEL);
+	else if (strncmp(str, "sort=", strlen("sort=")) == 0)
+		attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
+	else if (strncmp(str, "name=", strlen("name=")) == 0)
+		attrs->name = kstrdup(str, GFP_KERNEL);
+	else if (strncmp(str, "size=", strlen("size=")) == 0) {
+		int map_bits = parse_map_size(str);
+
+		if (map_bits < 0) {
+			ret = map_bits;
+			goto out;
+		}
+		attrs->map_bits = map_bits;
+	} else
+		ret = -EINVAL;
+ out:
+	return ret;
+}
+
 static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
 {
 	struct hist_trigger_attrs *attrs;
@@ -255,33 +284,18 @@ static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
 	while (trigger_str) {
 		char *str = strsep(&trigger_str, ":");
 
-		if ((strncmp(str, "key=", strlen("key=")) == 0) ||
-		    (strncmp(str, "keys=", strlen("keys=")) == 0))
-			attrs->keys_str = kstrdup(str, GFP_KERNEL);
-		else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
-			 (strncmp(str, "vals=", strlen("vals=")) == 0) ||
-			 (strncmp(str, "values=", strlen("values=")) == 0))
-			attrs->vals_str = kstrdup(str, GFP_KERNEL);
-		else if (strncmp(str, "sort=", strlen("sort=")) == 0)
-			attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
-		else if (strncmp(str, "name=", strlen("name=")) == 0)
-			attrs->name = kstrdup(str, GFP_KERNEL);
-		else if (strcmp(str, "pause") == 0)
+		if (strchr(str, '=')) {
+			ret = parse_assignment(str, attrs);
+			if (ret)
+				goto free;
+		} else if (strcmp(str, "pause") == 0)
 			attrs->pause = true;
 		else if ((strcmp(str, "cont") == 0) ||
 			 (strcmp(str, "continue") == 0))
 			attrs->cont = true;
 		else if (strcmp(str, "clear") == 0)
 			attrs->clear = true;
-		else if (strncmp(str, "size=", strlen("size=")) == 0) {
-			int map_bits = parse_map_size(str);
-
-			if (map_bits < 0) {
-				ret = map_bits;
-				goto free;
-			}
-			attrs->map_bits = map_bits;
-		} else {
+		else {
 			ret = -EINVAL;
 			goto free;
 		}
-- 
1.9.3

* [RFC][PATCH 08/21] tracing: Make traceprobe parsing code reusable
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (6 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 07/21] tracing: Break out hist trigger assignment parsing Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-09 20:40   ` Steven Rostedt
  2017-02-08 17:25 ` [RFC][PATCH 09/21] tracing: Add hist trigger timestamp support Tom Zanussi
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

traceprobe_probes_write() and traceprobe_command() actually contain
nothing that ties them to kprobes - the code is generically useful for
similar types of parsing elsewhere, so separate it out and move it to
trace.c/trace.h.

Other than moving it, the only change is in naming:
traceprobe_probes_write() becomes trace_parse_run_command() and
traceprobe_command() becomes trace_run_command().
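
A caller is expected to use the renamed helpers exactly as the
kprobe/uprobe code does; a minimal sketch with hypothetical names
(my_probes_write(), create_my_probe()):

	static ssize_t my_probes_write(struct file *file,
				       const char __user *buffer,
				       size_t count, loff_t *ppos)
	{
		/* create_my_probe() has the int (*)(int argc, char **argv) form */
		return trace_parse_run_command(file, buffer, count, ppos,
					       create_my_probe);
	}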

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace.c        | 75 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.h        |  7 +++++
 kernel/trace/trace_kprobe.c | 18 +++++------
 kernel/trace/trace_probe.c  | 75 ---------------------------------------------
 kernel/trace/trace_probe.h  |  7 -----
 kernel/trace/trace_uprobe.c |  2 +-
 6 files changed, 92 insertions(+), 92 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5868656..78dff2f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7912,6 +7912,81 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
 }
 EXPORT_SYMBOL_GPL(ftrace_dump);
 
+int trace_run_command(const char *buf, int (*createfn)(int, char **))
+{
+	char **argv;
+	int argc, ret;
+
+	argc = 0;
+	ret = 0;
+	argv = argv_split(GFP_KERNEL, buf, &argc);
+	if (!argv)
+		return -ENOMEM;
+
+	if (argc)
+		ret = createfn(argc, argv);
+
+	argv_free(argv);
+
+	return ret;
+}
+
+#define WRITE_BUFSIZE  4096
+
+ssize_t trace_parse_run_command(struct file *file, const char __user *buffer,
+				size_t count, loff_t *ppos,
+				int (*createfn)(int, char **))
+{
+	char *kbuf, *tmp;
+	int ret = 0;
+	size_t done = 0;
+	size_t size;
+
+	kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	while (done < count) {
+		size = count - done;
+
+		if (size >= WRITE_BUFSIZE)
+			size = WRITE_BUFSIZE - 1;
+
+		if (copy_from_user(kbuf, buffer + done, size)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		kbuf[size] = '\0';
+		tmp = strchr(kbuf, '\n');
+
+		if (tmp) {
+			*tmp = '\0';
+			size = tmp - kbuf + 1;
+		} else if (done + size < count) {
+			pr_warn("Line length is too long: Should be less than %d\n",
+				WRITE_BUFSIZE);
+			ret = -EINVAL;
+			goto out;
+		}
+		done += size;
+		/* Remove comments */
+		tmp = strchr(kbuf, '#');
+
+		if (tmp)
+			*tmp = '\0';
+
+		ret = trace_run_command(kbuf, createfn);
+		if (ret)
+			goto out;
+	}
+	ret = done;
+
+out:
+	kfree(kbuf);
+
+	return ret;
+}
+
 __init static int tracer_alloc_buffers(void)
 {
 	int ring_buf_size;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index ac55fa1..f2af21b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1647,6 +1647,13 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
 int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);
 
+#define MAX_EVENT_NAME_LEN	64
+
+extern int trace_run_command(const char *buf, int (*createfn)(int, char**));
+extern ssize_t trace_parse_run_command(struct file *file,
+		const char __user *buffer, size_t count, loff_t *ppos,
+		int (*createfn)(int, char**));
+
 /*
  * Normal trace_printk() and friends allocates special buffers
  * to do the manipulation, as well as saves the print formats
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index a133ecd..8f3b4d9 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -876,8 +876,8 @@ static int probes_open(struct inode *inode, struct file *file)
 static ssize_t probes_write(struct file *file, const char __user *buffer,
 			    size_t count, loff_t *ppos)
 {
-	return traceprobe_probes_write(file, buffer, count, ppos,
-			create_trace_kprobe);
+	return trace_parse_run_command(file, buffer, count, ppos,
+				       create_trace_kprobe);
 }
 
 static const struct file_operations kprobe_events_ops = {
@@ -1402,9 +1402,9 @@ static __init int kprobe_trace_self_tests_init(void)
 
 	pr_info("Testing kprobe tracing: ");
 
-	ret = traceprobe_command("p:testprobe kprobe_trace_selftest_target "
-				  "$stack $stack0 +0($stack)",
-				  create_trace_kprobe);
+	ret = trace_run_command("p:testprobe kprobe_trace_selftest_target "
+				"$stack $stack0 +0($stack)",
+				create_trace_kprobe);
 	if (WARN_ON_ONCE(ret)) {
 		pr_warn("error on probing function entry.\n");
 		warn++;
@@ -1424,8 +1424,8 @@ static __init int kprobe_trace_self_tests_init(void)
 		}
 	}
 
-	ret = traceprobe_command("r:testprobe2 kprobe_trace_selftest_target "
-				  "$retval", create_trace_kprobe);
+	ret = trace_run_command("r:testprobe2 kprobe_trace_selftest_target "
+				"$retval", create_trace_kprobe);
 	if (WARN_ON_ONCE(ret)) {
 		pr_warn("error on probing function return.\n");
 		warn++;
@@ -1495,13 +1495,13 @@ static __init int kprobe_trace_self_tests_init(void)
 			disable_trace_kprobe(tk, file);
 	}
 
-	ret = traceprobe_command("-:testprobe", create_trace_kprobe);
+	ret = trace_run_command("-:testprobe", create_trace_kprobe);
 	if (WARN_ON_ONCE(ret)) {
 		pr_warn("error on deleting a probe.\n");
 		warn++;
 	}
 
-	ret = traceprobe_command("-:testprobe2", create_trace_kprobe);
+	ret = trace_run_command("-:testprobe2", create_trace_kprobe);
 	if (WARN_ON_ONCE(ret)) {
 		pr_warn("error on deleting a probe.\n");
 		warn++;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 8c0553d..b7de026 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -622,81 +622,6 @@ void traceprobe_free_probe_arg(struct probe_arg *arg)
 	kfree(arg->comm);
 }
 
-int traceprobe_command(const char *buf, int (*createfn)(int, char **))
-{
-	char **argv;
-	int argc, ret;
-
-	argc = 0;
-	ret = 0;
-	argv = argv_split(GFP_KERNEL, buf, &argc);
-	if (!argv)
-		return -ENOMEM;
-
-	if (argc)
-		ret = createfn(argc, argv);
-
-	argv_free(argv);
-
-	return ret;
-}
-
-#define WRITE_BUFSIZE  4096
-
-ssize_t traceprobe_probes_write(struct file *file, const char __user *buffer,
-				size_t count, loff_t *ppos,
-				int (*createfn)(int, char **))
-{
-	char *kbuf, *tmp;
-	int ret = 0;
-	size_t done = 0;
-	size_t size;
-
-	kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
-	if (!kbuf)
-		return -ENOMEM;
-
-	while (done < count) {
-		size = count - done;
-
-		if (size >= WRITE_BUFSIZE)
-			size = WRITE_BUFSIZE - 1;
-
-		if (copy_from_user(kbuf, buffer + done, size)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		kbuf[size] = '\0';
-		tmp = strchr(kbuf, '\n');
-
-		if (tmp) {
-			*tmp = '\0';
-			size = tmp - kbuf + 1;
-		} else if (done + size < count) {
-			pr_warn("Line length is too long: Should be less than %d\n",
-				WRITE_BUFSIZE);
-			ret = -EINVAL;
-			goto out;
-		}
-		done += size;
-		/* Remove comments */
-		tmp = strchr(kbuf, '#');
-
-		if (tmp)
-			*tmp = '\0';
-
-		ret = traceprobe_command(kbuf, createfn);
-		if (ret)
-			goto out;
-	}
-	ret = done;
-
-out:
-	kfree(kbuf);
-
-	return ret;
-}
-
 static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
 			   bool is_return)
 {
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 0c0ae54..37ab38c 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -42,7 +42,6 @@
 
 #define MAX_TRACE_ARGS		128
 #define MAX_ARGSTR_LEN		63
-#define MAX_EVENT_NAME_LEN	64
 #define MAX_STRING_SIZE		PATH_MAX
 
 /* Reserved field names */
@@ -356,12 +355,6 @@ extern int traceprobe_conflict_field_name(const char *name,
 
 extern int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset);
 
-extern ssize_t traceprobe_probes_write(struct file *file,
-		const char __user *buffer, size_t count, loff_t *ppos,
-		int (*createfn)(int, char**));
-
-extern int traceprobe_command(const char *buf, int (*createfn)(int, char**));
-
 /* Sum up total data length for dynamic arraies (strings) */
 static nokprobe_inline int
 __get_data_size(struct trace_probe *tp, struct pt_regs *regs)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 4f2ba2b..10e3ec8 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -649,7 +649,7 @@ static int probes_open(struct inode *inode, struct file *file)
 static ssize_t probes_write(struct file *file, const char __user *buffer,
 			    size_t count, loff_t *ppos)
 {
-	return traceprobe_probes_write(file, buffer, count, ppos, create_trace_uprobe);
+	return trace_parse_run_command(file, buffer, count, ppos, create_trace_uprobe);
 }
 
 static const struct file_operations uprobe_events_ops = {
-- 
1.9.3

* [RFC][PATCH 09/21] tracing: Add hist trigger timestamp support
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (7 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 08/21] tracing: Make traceprobe parsing code reusable Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-10  6:14   ` Namhyung Kim
  2017-02-08 17:25 ` [RFC][PATCH 10/21] tracing: Add per-element variable support to tracing_map Tom Zanussi
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add support for a timestamp event field.  This is actually a 'pseudo-'
event field in that it behaves like it's part of the event record, but
is really part of the corresponding ring buffer event.

To make use of the timestamp field, users can specify "common_timestamp"
as a field name for any histogram.  Note that this doesn't make much
sense on its own as either a key or a value, but it needs to be
supported even so, since follow-on patches will add support for making
use of this field in time deltas.

Note that the use of this field requires the ring buffer be put into
TIME_EXTEND_ABS mode, which saves the complete timestamp for each
event rather than an offset.  This mode will be enabled if and only if
a histogram makes use of the "common_timestamp" field.
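
For example (the event and key chosen here are purely illustrative),
the new pseudo-field can be referenced like any other field:

    # echo 'hist:keys=next_pid:vals=common_timestamp' >> \
          /sys/kernel/debug/tracing/events/sched/sched_switch/trigger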

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 85 +++++++++++++++++++++++++++++-----------
 1 file changed, 62 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 4e70872..8d7f7dd 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -88,6 +88,12 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event,
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
 
+static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
+				struct ring_buffer_event *rbe)
+{
+	return ring_buffer_event_time_stamp(rbe);
+}
+
 #define DEFINE_HIST_FIELD_FN(type)					\
 	static u64 hist_field_##type(struct hist_field *hist_field,	\
 				     void *event,			\
@@ -134,6 +140,7 @@ enum hist_field_flags {
 	HIST_FIELD_FL_SYSCALL		= 128,
 	HIST_FIELD_FL_STACKTRACE	= 256,
 	HIST_FIELD_FL_LOG2		= 512,
+	HIST_FIELD_FL_TIMESTAMP		= 1024,
 };
 
 struct hist_trigger_attrs {
@@ -158,6 +165,7 @@ struct hist_trigger_data {
 	struct trace_event_file		*event_file;
 	struct hist_trigger_attrs	*attrs;
 	struct tracing_map		*map;
+	bool				enable_timestamps;
 };
 
 static const char *hist_field_name(struct hist_field *field)
@@ -404,6 +412,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 	hist_field = kzalloc(sizeof(struct hist_field), GFP_KERNEL);
 	if (!hist_field)
 		return NULL;
+	hist_field->is_signed = false;
 
 	if (flags & HIST_FIELD_FL_HITCOUNT) {
 		hist_field->fn = hist_field_counter;
@@ -423,6 +432,12 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 		goto out;
 	}
 
+	if (flags & HIST_FIELD_FL_TIMESTAMP) {
+		hist_field->fn = hist_field_timestamp;
+		hist_field->size = sizeof(u64);
+		goto out;
+	}
+
 	if (WARN_ON_ONCE(!field))
 		goto out;
 
@@ -500,10 +515,15 @@ static int create_val_field(struct hist_trigger_data *hist_data,
 		}
 	}
 
-	field = trace_find_event_field(file->event_call, field_name);
-	if (!field) {
-		ret = -EINVAL;
-		goto out;
+	if (strcmp(field_name, "common_timestamp") == 0) {
+		flags |= HIST_FIELD_FL_TIMESTAMP;
+		hist_data->enable_timestamps = true;
+	} else {
+		field = trace_find_event_field(file->event_call, field_name);
+		if (!field) {
+			ret = -EINVAL;
+			goto out;
+		}
 	}
 
 	hist_data->fields[val_idx] = create_hist_field(field, flags);
@@ -598,16 +618,22 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 			}
 		}
 
-		field = trace_find_event_field(file->event_call, field_name);
-		if (!field) {
-			ret = -EINVAL;
-			goto out;
-		}
+		if (strcmp(field_name, "common_timestamp") == 0) {
+			flags |= HIST_FIELD_FL_TIMESTAMP;
+			hist_data->enable_timestamps = true;
+			key_size = sizeof(u64);
+		} else {
+			field = trace_find_event_field(file->event_call, field_name);
+			if (!field) {
+				ret = -EINVAL;
+				goto out;
+			}
 
-		if (is_string_field(field))
-			key_size = MAX_FILTER_STR_VAL;
-		else
-			key_size = field->size;
+			if (is_string_field(field))
+				key_size = MAX_FILTER_STR_VAL;
+			else
+				key_size = field->size;
+		}
 	}
 
 	hist_data->fields[key_idx] = create_hist_field(field, flags);
@@ -744,7 +770,7 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
 			break;
 		}
 
-		if (strcmp(field_name, "hitcount") == 0) {
+		if ((strcmp(field_name, "hitcount") == 0)) {
 			descending = is_descending(field_str);
 			if (descending < 0) {
 				ret = descending;
@@ -802,6 +828,9 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
 
 			if (hist_field->flags & HIST_FIELD_FL_STACKTRACE)
 				cmp_fn = tracing_map_cmp_none;
+			else if (!field)
+				cmp_fn = tracing_map_cmp_num(hist_field->size,
+							     hist_field->is_signed);
 			else if (is_string_field(field))
 				cmp_fn = tracing_map_cmp_string;
 			else
@@ -1231,6 +1260,8 @@ static int event_hist_trigger_print(struct seq_file *m,
 
 		if (key_field->flags & HIST_FIELD_FL_STACKTRACE)
 			seq_puts(m, "stacktrace");
+		else if (key_field->flags & HIST_FIELD_FL_TIMESTAMP)
+			seq_puts(m, "common_timestamp");
 		else
 			hist_field_print(m, key_field);
 	}
@@ -1240,6 +1271,8 @@ static int event_hist_trigger_print(struct seq_file *m,
 	for_each_hist_val_field(i, hist_data) {
 		if (i == HITCOUNT_IDX)
 			seq_puts(m, "hitcount");
+		else if (hist_data->fields[i]->flags & HIST_FIELD_FL_TIMESTAMP)
+			seq_puts(m, "common_timestamp");
 		else {
 			seq_puts(m, ",");
 			hist_field_print(m, hist_data->fields[i]);
@@ -1250,27 +1283,27 @@ static int event_hist_trigger_print(struct seq_file *m,
 
 	for (i = 0; i < hist_data->n_sort_keys; i++) {
 		struct tracing_map_sort_key *sort_key;
+		unsigned int idx;
 
 		sort_key = &hist_data->sort_keys[i];
+		idx = sort_key->field_idx;
+
+		if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX))
+			return -EINVAL;
 
 		if (i > 0)
 			seq_puts(m, ",");
 
-		if (sort_key->field_idx == HITCOUNT_IDX)
+		if (idx == HITCOUNT_IDX)
 			seq_puts(m, "hitcount");
-		else {
-			unsigned int idx = sort_key->field_idx;
-
-			if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX))
-				return -EINVAL;
-
+		else if (hist_data->fields[idx]->flags & HIST_FIELD_FL_TIMESTAMP)
+			seq_puts(m, "common_timestamp");
+		else
 			hist_field_print(m, hist_data->fields[idx]);
-		}
 
 		if (sort_key->descending)
 			seq_puts(m, ".descending");
 	}
-
 	seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));
 
 	if (data->filter_str)
@@ -1438,6 +1471,10 @@ static bool hist_trigger_match(struct event_trigger_data *data,
 			return false;
 		if (key_field->offset != key_field_test->offset)
 			return false;
+		if (key_field->size != key_field_test->size)
+			return false;
+		if (key_field->is_signed != key_field_test->is_signed)
+			return false;
 	}
 
 	for (i = 0; i < hist_data->n_sort_keys; i++) {
@@ -1520,6 +1557,8 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
 
 	update_cond_flag(file);
 
+	tracing_set_time_stamp_abs(file->tr, true);
+
 	if (trace_event_trigger_enable_disable(file, 1) < 0) {
 		list_del_rcu(&data->list);
 		update_cond_flag(file);
-- 
1.9.3

* [RFC][PATCH 10/21] tracing: Add per-element variable support to tracing_map
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (8 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 09/21] tracing: Add hist trigger timestamp support Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 11/21] tracing: Add variable support to hist triggers Tom Zanussi
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

In order to allow information to be passed between trace events, add
support for per-element variables to tracing_map.  This provides a
means for histograms to associate a value or values with an entry when
it's saved or updated, and to have those values retrieved by a
subsequent event occurrence.

Variables can be set using tracing_map_set_var() and read using
tracing_map_read_var().  tracing_map_var_set() returns true or false
depending on whether or not the variable has been set, which is
important for event-matching applications.

tracing_map_read_var_once() reads the variable and resets it to the
'unset' state, implementing read-once variables, which are also
important for event-matching uses.
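
A rough usage sketch (the names and surrounding setup are hypothetical;
'map' and 'elt' are assumed to come from the usual tracing_map setup
and tracing_map_insert(), error handling omitted):

	int ts_var;
	u64 val = 0;

	/* at map setup time: reserve a per-element variable slot */
	ts_var = tracing_map_add_var(map);

	/* on one event: save a value in the element's variable */
	tracing_map_set_var(elt, ts_var, timestamp);

	/* on a later, matching event: consume it exactly once */
	if (tracing_map_var_set(elt, ts_var))
		val = tracing_map_read_var_once(elt, ts_var);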

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/tracing_map.c | 113 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/tracing_map.h |  11 +++++
 2 files changed, 124 insertions(+)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index 0a689bb..f987069 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -66,6 +66,73 @@ u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i)
 	return (u64)atomic64_read(&elt->fields[i].sum);
 }
 
+/**
+ * tracing_map_set_var - Assign a tracing_map_elt's variable field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ * @n: The value to assign
+ *
+ * Assign n to variable i associated with the specified tracing_map_elt
+ * instance.  The index i is the index returned by the call to
+ * tracing_map_add_var() when the tracing map was set up.
+ */
+void tracing_map_set_var(struct tracing_map_elt *elt, unsigned int i, u64 n)
+{
+	atomic64_set(&elt->vars[i], n);
+	elt->var_set[i] = true;
+}
+
+/**
+ * tracing_map_var_set - Return whether or not a variable has been set
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ *
+ * Return true if the variable has been set, false otherwise.  The
+ * index i is the index returned by the call to tracing_map_add_var()
+ * when the tracing map was set up.
+ */
+bool tracing_map_var_set(struct tracing_map_elt *elt, unsigned int i)
+{
+	return elt->var_set[i];
+}
+
+/**
+ * tracing_map_read_var - Return the value of a tracing_map_elt's variable field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ *
+ * Retrieve the value of the variable i associated with the specified
+ * tracing_map_elt instance.  The index i is the index returned by the
+ * call to tracing_map_add_var() when the tracing map was set
+ * up.
+ *
+ * Return: The variable value associated with field i for elt.
+ */
+u64 tracing_map_read_var(struct tracing_map_elt *elt, unsigned int i)
+{
+	return (u64)atomic64_read(&elt->vars[i]);
+}
+
+/**
+ * tracing_map_read_var_once - Return and reset a tracing_map_elt's variable field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ *
+ * Retrieve the value of the variable i associated with the specified
+ * tracing_map_elt instance, and reset the variable to the 'not set'
+ * state.  The index i is the index returned by the call to
+ * tracing_map_add_var() when the tracing map was set up.  The reset
+ * essentially makes the variable a read-once variable if it's only
+ * accessed using this function.
+ *
+ * Return: The variable value associated with field i for elt.
+ */
+u64 tracing_map_read_var_once(struct tracing_map_elt *elt, unsigned int i)
+{
+	elt->var_set[i] = false;
+	return (u64)atomic64_read(&elt->vars[i]);
+}
+
 int tracing_map_cmp_string(void *val_a, void *val_b)
 {
 	char *a = val_a;
@@ -171,6 +238,28 @@ int tracing_map_add_sum_field(struct tracing_map *map)
 }
 
 /**
+ * tracing_map_add_var - Add a field describing a tracing_map var
+ * @map: The tracing_map
+ *
+ * Add a var to the map and return the index identifying it in the map
+ * and associated tracing_map_elts.  This is the index used for
+ * instance to update a var for a particular tracing_map_elt using
+ * tracing_map_update_var() or reading it via tracing_map_read_var().
+ *
+ * Return: The index identifying the var in the map and associated
+ * tracing_map_elts, or -EINVAL on error.
+ */
+int tracing_map_add_var(struct tracing_map *map)
+{
+	int ret = -EINVAL;
+
+	if (map->n_vars < TRACING_MAP_VARS_MAX)
+		ret = map->n_vars++;
+
+	return ret;
+}
+
+/**
  * tracing_map_add_key_field - Add a field describing a tracing_map key
  * @map: The tracing_map
  * @offset: The offset within the key
@@ -277,6 +366,11 @@ static void tracing_map_elt_clear(struct tracing_map_elt *elt)
 		if (elt->fields[i].cmp_fn == tracing_map_cmp_atomic64)
 			atomic64_set(&elt->fields[i].sum, 0);
 
+	for (i = 0; i < elt->map->n_vars; i++) {
+		atomic64_set(&elt->vars[i], 0);
+		elt->var_set[i] = false;
+	}
+
 	if (elt->map->ops && elt->map->ops->elt_clear)
 		elt->map->ops->elt_clear(elt);
 }
@@ -303,6 +397,8 @@ static void tracing_map_elt_free(struct tracing_map_elt *elt)
 	if (elt->map->ops && elt->map->ops->elt_free)
 		elt->map->ops->elt_free(elt);
 	kfree(elt->fields);
+	kfree(elt->vars);
+	kfree(elt->var_set);
 	kfree(elt->key);
 	kfree(elt);
 }
@@ -330,6 +426,18 @@ static struct tracing_map_elt *tracing_map_elt_alloc(struct tracing_map *map)
 		goto free;
 	}
 
+	elt->vars = kcalloc(map->n_vars, sizeof(*elt->vars), GFP_KERNEL);
+	if (!elt->vars) {
+		err = -ENOMEM;
+		goto free;
+	}
+
+	elt->var_set = kcalloc(map->n_vars, sizeof(*elt->var_set), GFP_KERNEL);
+	if (!elt->var_set) {
+		err = -ENOMEM;
+		goto free;
+	}
+
 	tracing_map_elt_init_fields(elt);
 
 	if (map->ops && map->ops->elt_alloc) {
@@ -833,6 +941,11 @@ static struct tracing_map_elt *copy_elt(struct tracing_map_elt *elt)
 		dup_elt->fields[i].cmp_fn = elt->fields[i].cmp_fn;
 	}
 
+	for (i = 0; i < elt->map->n_vars; i++) {
+		atomic64_set(&dup_elt->vars[i], atomic64_read(&elt->vars[i]));
+		dup_elt->var_set[i] = elt->var_set[i];
+	}
+
 	return dup_elt;
 }
 
diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
index f097511..303bc4f 100644
--- a/kernel/trace/tracing_map.h
+++ b/kernel/trace/tracing_map.h
@@ -9,6 +9,7 @@
 #define TRACING_MAP_VALS_MAX		3
 #define TRACING_MAP_FIELDS_MAX		(TRACING_MAP_KEYS_MAX + \
 					 TRACING_MAP_VALS_MAX)
+#define TRACING_MAP_VARS_MAX		16
 #define TRACING_MAP_SORT_KEYS_MAX	2
 
 typedef int (*tracing_map_cmp_fn_t) (void *val_a, void *val_b);
@@ -136,6 +137,8 @@ struct tracing_map_field {
 struct tracing_map_elt {
 	struct tracing_map		*map;
 	struct tracing_map_field	*fields;
+	atomic64_t			*vars;
+	bool				*var_set;
 	void				*key;
 	void				*private_data;
 };
@@ -191,6 +194,7 @@ struct tracing_map {
 	int				key_idx[TRACING_MAP_KEYS_MAX];
 	unsigned int			n_keys;
 	struct tracing_map_sort_key	sort_key;
+	unsigned int			n_vars;
 	atomic64_t			hits;
 	atomic64_t			drops;
 };
@@ -247,6 +251,7 @@ struct tracing_map_ops {
 extern int tracing_map_init(struct tracing_map *map);
 
 extern int tracing_map_add_sum_field(struct tracing_map *map);
+extern int tracing_map_add_var(struct tracing_map *map);
 extern int tracing_map_add_key_field(struct tracing_map *map,
 				     unsigned int offset,
 				     tracing_map_cmp_fn_t cmp_fn);
@@ -266,7 +271,13 @@ extern tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size,
 
 extern void tracing_map_update_sum(struct tracing_map_elt *elt,
 				   unsigned int i, u64 n);
+extern void tracing_map_set_var(struct tracing_map_elt *elt,
+				unsigned int i, u64 n);
+extern bool tracing_map_var_set(struct tracing_map_elt *elt, unsigned int i);
 extern u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i);
+extern u64 tracing_map_read_var(struct tracing_map_elt *elt, unsigned int i);
+extern u64 tracing_map_read_var_once(struct tracing_map_elt *elt, unsigned int i);
+
 extern void tracing_map_set_field_descr(struct tracing_map *map,
 					unsigned int i,
 					unsigned int key_offset,
-- 
1.9.3

* [RFC][PATCH 11/21] tracing: Add variable support to hist triggers
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (9 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 10/21] tracing: Add per-element variable support to tracing_map Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-13  6:03   ` Namhyung Kim
  2017-02-08 17:25 ` [RFC][PATCH 12/21] tracing: Account for variables in named trigger compatibility Tom Zanussi
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add support for saving the value of a current event's event field by
assigning it to a variable that can be read by a subsequent event.

The basic syntax for saving a variable is to prefix any event field
with a unique variable name (one not corresponding to any existing
keyword) followed by an '=' sign.

Both keys and values can be saved and retrieved in this way:

    # echo 'hist:keys=next_pid:vals=ts0=common_timestamp ...
    # echo 'hist:key=timer_pid=common_pid ...'

If a variable isn't a key variable or prefixed with 'vals=', the
associated event field will be saved in a variable but won't be summed
as a value:

    # echo 'hist:keys=next_pid:ts1=common_timestamp:...

Multiple variables can be assigned at the same time:

    # echo 'hist:keys=pid:vals=ts0=common_timestamp,b=field1,field2 ...

Variables set as above can be used by being referenced from another
event, as described in a subsequent patch.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 160 ++++++++++++++++++++++++++++++++-------
 1 file changed, 131 insertions(+), 29 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 8d7f7dd..e707577 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -29,6 +29,7 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
 				struct ring_buffer_event *rbe);
 
 #define HIST_FIELD_OPERANDS_MAX	2
+#define HIST_ASSIGNMENT_MAX	4
 
 struct hist_field {
 	struct ftrace_event_field	*field;
@@ -36,8 +37,10 @@ struct hist_field {
 	hist_field_fn_t			fn;
 	unsigned int			size;
 	unsigned int			offset;
-	unsigned int                    is_signed;
+	unsigned int			is_signed;
 	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
+	u64				var_val;
+	char				*var_name;
 };
 
 static u64 hist_field_none(struct hist_field *field, void *event,
@@ -140,12 +143,16 @@ enum hist_field_flags {
 	HIST_FIELD_FL_SYSCALL		= 128,
 	HIST_FIELD_FL_STACKTRACE	= 256,
 	HIST_FIELD_FL_LOG2		= 512,
-	HIST_FIELD_FL_TIMESTAMP		= 1024,
+	HIST_FIELD_FL_VAR		= 1024,
+	HIST_FIELD_FL_VAR_ONLY		= 2048,
+	HIST_FIELD_FL_TIMESTAMP		= 4096,
 };
 
 struct hist_trigger_attrs {
 	char		*keys_str;
 	char		*vals_str;
+	char		*assignment_str[HIST_ASSIGNMENT_MAX];
+	unsigned int	n_assignments;
 	char		*sort_key_str;
 	char		*name;
 	bool		pause;
@@ -241,9 +248,14 @@ static int parse_map_size(char *str)
 
 static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
 {
+	unsigned int i;
+
 	if (!attrs)
 		return;
 
+	for (i = 0; i < attrs->n_assignments; i++)
+		kfree(attrs->assignment_str[i]);
+
 	kfree(attrs->name);
 	kfree(attrs->sort_key_str);
 	kfree(attrs->keys_str);
@@ -258,9 +270,9 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
 	if ((strncmp(str, "key=", strlen("key=")) == 0) ||
 	    (strncmp(str, "keys=", strlen("keys=")) == 0))
 		attrs->keys_str = kstrdup(str, GFP_KERNEL);
-	else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
-		 (strncmp(str, "vals=", strlen("vals=")) == 0) ||
-		 (strncmp(str, "values=", strlen("values=")) == 0))
+	else if (((strncmp(str, "val=", strlen("val=")) == 0) ||
+		  (strncmp(str, "vals=", strlen("vals=")) == 0) ||
+		  (strncmp(str, "values=", strlen("values=")) == 0)))
 		attrs->vals_str = kstrdup(str, GFP_KERNEL);
 	else if (strncmp(str, "sort=", strlen("sort=")) == 0)
 		attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
@@ -274,8 +286,22 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
 			goto out;
 		}
 		attrs->map_bits = map_bits;
-	} else
-		ret = -EINVAL;
+	} else {
+		char *assignment;
+
+		if (attrs->n_assignments == HIST_ASSIGNMENT_MAX) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		assignment = kstrdup(str, GFP_KERNEL);
+		if (!assignment) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		attrs->assignment_str[attrs->n_assignments++] = assignment;
+	}
  out:
 	return ret;
 }
@@ -398,11 +424,14 @@ static void destroy_hist_field(struct hist_field *hist_field)
 	for (i = 0; i < HIST_FIELD_OPERANDS_MAX; i++)
 		destroy_hist_field(hist_field->operands[i]);
 
+	kfree(hist_field->var_name);
+
 	kfree(hist_field);
 }
 
 static struct hist_field *create_hist_field(struct ftrace_event_field *field,
-					    unsigned long flags)
+					    unsigned long flags,
+					    char *var_name)
 {
 	struct hist_field *hist_field;
 
@@ -427,7 +456,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 	if (flags & HIST_FIELD_FL_LOG2) {
 		unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
 		hist_field->fn = hist_field_log2;
-		hist_field->operands[0] = create_hist_field(field, fl);
+		hist_field->operands[0] = create_hist_field(field, fl, NULL);
 		hist_field->size = hist_field->operands[0]->size;
 		goto out;
 	}
@@ -461,6 +490,8 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
  out:
 	hist_field->field = field;
 	hist_field->flags = flags;
+	if (var_name)
+		hist_field->var_name = kstrdup(var_name, GFP_KERNEL);
 
 	return hist_field;
 }
@@ -480,7 +511,7 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
 static int create_hitcount_val(struct hist_trigger_data *hist_data)
 {
 	hist_data->fields[HITCOUNT_IDX] =
-		create_hist_field(NULL, HIST_FIELD_FL_HITCOUNT);
+		create_hist_field(NULL, HIST_FIELD_FL_HITCOUNT, NULL);
 	if (!hist_data->fields[HITCOUNT_IDX])
 		return -ENOMEM;
 
@@ -495,16 +526,29 @@ static int create_hitcount_val(struct hist_trigger_data *hist_data)
 static int create_val_field(struct hist_trigger_data *hist_data,
 			    unsigned int val_idx,
 			    struct trace_event_file *file,
-			    char *field_str)
+			    char *field_str, char *var_name)
 {
 	struct ftrace_event_field *field = NULL;
+	char *field_name, *token;
 	unsigned long flags = 0;
-	char *field_name;
 	int ret = 0;
 
 	if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX))
 		return -EINVAL;
 
+	if (var_name)
+		flags |= HIST_FIELD_FL_VAR | HIST_FIELD_FL_VAR_ONLY;
+
+	token = strsep(&field_str, "=");
+	if (field_str) {
+		if (var_name) {
+			ret = -EINVAL;
+			goto out;
+		}
+		var_name = token;
+		flags |= HIST_FIELD_FL_VAR;
+	}
+
 	field_name = strsep(&field_str, ".");
 	if (field_str) {
 		if (strcmp(field_str, "hex") == 0)
@@ -526,7 +570,7 @@ static int create_val_field(struct hist_trigger_data *hist_data,
 		}
 	}
 
-	hist_data->fields[val_idx] = create_hist_field(field, flags);
+	hist_data->fields[val_idx] = create_hist_field(field, flags, var_name);
 	if (!hist_data->fields[val_idx]) {
 		ret = -ENOMEM;
 		goto out;
@@ -544,7 +588,7 @@ static int create_val_fields(struct hist_trigger_data *hist_data,
 			     struct trace_event_file *file)
 {
 	char *fields_str, *field_str;
-	unsigned int i, j;
+	unsigned int i, j = 1;
 	int ret;
 
 	ret = create_hitcount_val(hist_data);
@@ -555,10 +599,6 @@ static int create_val_fields(struct hist_trigger_data *hist_data,
 	if (!fields_str)
 		goto out;
 
-	strsep(&fields_str, "=");
-	if (!fields_str)
-		goto out;
-
 	for (i = 0, j = 1; i < TRACING_MAP_VALS_MAX &&
 		     j < TRACING_MAP_VALS_MAX; i++) {
 		field_str = strsep(&fields_str, ",");
@@ -566,7 +606,7 @@ static int create_val_fields(struct hist_trigger_data *hist_data,
 			break;
 		if (strcmp(field_str, "hitcount") == 0)
 			continue;
-		ret = create_val_field(hist_data, j++, file, field_str);
+		ret = create_val_field(hist_data, j++, file, field_str, NULL);
 		if (ret)
 			goto out;
 	}
@@ -585,6 +625,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 	struct ftrace_event_field *field = NULL;
 	unsigned long flags = 0;
 	unsigned int key_size;
+	char *var_name;
 	int ret = 0;
 
 	if (WARN_ON(key_idx >= TRACING_MAP_FIELDS_MAX))
@@ -592,6 +633,10 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 
 	flags |= HIST_FIELD_FL_KEY;
 
+	var_name = strsep(&field_str, "=");
+	if (field_str)
+		flags |= HIST_FIELD_FL_VAR;
+
 	if (strcmp(field_str, "stacktrace") == 0) {
 		flags |= HIST_FIELD_FL_STACKTRACE;
 		key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
@@ -636,7 +681,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 		}
 	}
 
-	hist_data->fields[key_idx] = create_hist_field(field, flags);
+	hist_data->fields[key_idx] = create_hist_field(field, flags, var_name);
 	if (!hist_data->fields[key_idx]) {
 		ret = -ENOMEM;
 		goto out;
@@ -695,6 +740,31 @@ static int create_key_fields(struct hist_trigger_data *hist_data,
 	return ret;
 }
 
+static int create_var_fields(struct hist_trigger_data *hist_data,
+			     struct trace_event_file *file)
+{
+	unsigned int i, j = hist_data->n_vals;
+	char *str, *var;
+	int ret = 0;
+
+	for (i = 0; i < hist_data->attrs->n_assignments; i++) {
+
+		str = hist_data->attrs->assignment_str[i];
+
+		var = strsep(&str, "=");
+		if (!str) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		ret = create_val_field(hist_data, j++, file, str, var);
+		if (ret)
+			goto out;
+	}
+ out:
+	return ret;
+}
+
 static int create_hist_fields(struct hist_trigger_data *hist_data,
 			      struct trace_event_file *file)
 {
@@ -704,6 +774,10 @@ static int create_hist_fields(struct hist_trigger_data *hist_data,
 	if (ret)
 		goto out;
 
+	ret = create_var_fields(hist_data, file);
+	if (ret)
+		goto out;
+
 	ret = create_key_fields(hist_data, file);
 	if (ret)
 		goto out;
@@ -839,8 +913,7 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
 			idx = tracing_map_add_key_field(map,
 							hist_field->offset,
 							cmp_fn);
-
-		} else
+		} else if (!(hist_field->flags & HIST_FIELD_FL_VAR))
 			idx = tracing_map_add_sum_field(map);
 
 		if (idx < 0)
@@ -931,6 +1004,11 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
 	for_each_hist_val_field(i, hist_data) {
 		hist_field = hist_data->fields[i];
 		hist_val = hist_field->fn(hist_field, rec, rbe);
+		if (hist_field->flags & HIST_FIELD_FL_VAR) {
+			hist_field->var_val = hist_val;
+			if (hist_field->flags & HIST_FIELD_FL_VAR_ONLY)
+				continue;
+		}
 		tracing_map_update_sum(elt, i, hist_val);
 	}
 }
@@ -996,17 +1074,21 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 			} else
 				key = (void *)&field_contents;
 		}
-
 		if (use_compound_key)
 			add_to_key(compound_key, key, key_field, rec);
+
+		if (key_field->flags & HIST_FIELD_FL_VAR)
+			key_field->var_val = (u64)key;
 	}
 
 	if (use_compound_key)
 		key = compound_key;
 
 	elt = tracing_map_insert(hist_data->map, key);
-	if (elt)
-		hist_trigger_elt_update(hist_data, elt, rec, rbe);
+	if (!elt)
+		return;
+
+	hist_trigger_elt_update(hist_data, elt, rec, rbe);
 }
 
 static void hist_trigger_stacktrace_print(struct seq_file *m,
@@ -1228,7 +1310,12 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
 {
 	const char *field_name = hist_field_name(hist_field);
 
-	seq_printf(m, "%s", field_name);
+	if (hist_field->var_name)
+		seq_printf(m, "%s=", hist_field->var_name);
+
+	if (field_name)
+		seq_printf(m, "%s", field_name);
+
 	if (hist_field->flags) {
 		const char *flags_str = get_hist_field_flags(hist_field);
 
@@ -1237,6 +1324,16 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
 	}
 }
 
+static bool var_only(struct hist_trigger_data *hist_data)
+{
+	unsigned int i;
+
+	for_each_hist_val_field(i, hist_data)
+		if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR_ONLY)
+			return true;
+	return false;
+}
+
 static int event_hist_trigger_print(struct seq_file *m,
 				    struct event_trigger_ops *ops,
 				    struct event_trigger_data *data)
@@ -1266,15 +1363,19 @@ static int event_hist_trigger_print(struct seq_file *m,
 			hist_field_print(m, key_field);
 	}
 
-	seq_puts(m, ":vals=");
+	if (!var_only(hist_data))
+		seq_puts(m, ":vals=");
+	else
+		seq_puts(m, ":");
 
 	for_each_hist_val_field(i, hist_data) {
-		if (i == HITCOUNT_IDX)
+		if (i == HITCOUNT_IDX && !var_only(hist_data))
 			seq_puts(m, "hitcount");
 		else if (hist_data->fields[i]->flags & HIST_FIELD_FL_TIMESTAMP)
 			seq_puts(m, "common_timestamp");
 		else {
-			seq_puts(m, ",");
+			if (!var_only(hist_data))
+				seq_puts(m, ",");
 			hist_field_print(m, hist_data->fields[i]);
 		}
 	}
@@ -1673,6 +1774,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
 	}
 
 	ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+
 	/*
 	 * The above returns on success the # of triggers registered,
 	 * but if it didn't register any it returns zero.  Consider no
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 12/21] tracing: Account for variables in named trigger compatibility
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (10 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 11/21] tracing: Add variable support to hist triggers Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-13  6:04   ` Namhyung Kim
  2017-02-08 17:25 ` [RFC][PATCH 13/21] tracing: Add simple expression support to hist triggers Tom Zanussi
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Named triggers must also have the same set of variables in order to be
considered compatible - update the trigger match test to account for
that.

The reason for this requirement is that named triggers with variables
are meant to allow one or more events to set the same variable.
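
For example, two events could share a named trigger that sets the same
variable (this is only a sketch; the event names and trigger paths
below are illustrative):

    # echo 'hist:name=foo:keys=pid:ts0=common_timestamp' >> \
          events/sched/sched_waking/trigger
    # echo 'hist:name=foo:keys=pid:ts0=common_timestamp' >> \
          events/sched/sched_wakeup/trigger

With this change, two such triggers only match, and can therefore share
the name, if they also define the same set of variables (here ts0).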

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index e707577..889455e 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1576,6 +1576,10 @@ static bool hist_trigger_match(struct event_trigger_data *data,
 			return false;
 		if (key_field->is_signed != key_field_test->is_signed)
 			return false;
+		if ((key_field->var_name && !key_field_test->var_name) ||
+		    (!key_field->var_name && key_field_test->var_name) ||
+		    strcmp(key_field->var_name, key_field_test->var_name) != 0)
+			return false;
 	}
 
 	for (i = 0; i < hist_data->n_sort_keys; i++) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 13/21] tracing: Add simple expression support to hist triggers
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (11 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 12/21] tracing: Account for variables in named trigger compatibility Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-14  2:37   ` Namhyung Kim
  2017-02-08 17:25 ` [RFC][PATCH 14/21] tracing: Add variable reference handling " Tom Zanussi
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add support for simple addition, subtraction, and unary minus
expressions (-(expr) and expr, where expr can be b-a, a+b, or a+b+c) to
hist triggers, in order to support a minimal set of useful inter-event
calculations.

These operations are needed for calculating latencies between events
(timestamp1-timestamp0) and for combined latencies (latencies over 3
or more events).
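
As a sketch of the resulting syntax (somesys/someevent and the numeric
fields x and y are just placeholders):

    # echo 'hist:keys=common_pid:vals=x+y,y-x:delta=-(x-y)' >> \
          events/somesys/someevent/trigger

Here x+y and y-x are accumulated as histogram values, while the unary
minus expression assigned to 'delta' is only saved as a variable.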

In the process, factor out some common code from key and value
parsing.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 445 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 387 insertions(+), 58 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 889455e..cea95b6 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -31,6 +31,13 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
 #define HIST_FIELD_OPERANDS_MAX	2
 #define HIST_ASSIGNMENT_MAX	4
 
+enum field_op_id {
+	FIELD_OP_NONE,
+	FIELD_OP_PLUS,
+	FIELD_OP_MINUS,
+	FIELD_OP_UNARY_MINUS,
+};
+
 struct hist_field {
 	struct ftrace_event_field	*field;
 	unsigned long			flags;
@@ -41,6 +48,8 @@ struct hist_field {
 	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
 	u64				var_val;
 	char				*var_name;
+	enum field_op_id		operator;
+	char				*name;
 };
 
 static u64 hist_field_none(struct hist_field *field, void *event,
@@ -81,6 +90,47 @@ static u64 hist_field_pstring(struct hist_field *hist_field, void *event,
 	return (u64)(unsigned long)*addr;
 }
 
+static u64 hist_field_var_val(struct hist_field *hist_field, void *event,
+			      struct ring_buffer_event *rbe)
+{
+	return hist_field->var_val;
+}
+
+static u64 hist_field_plus(struct hist_field *hist_field, void *event,
+			   struct ring_buffer_event *rbe)
+{
+	struct hist_field *operand1 = hist_field->operands[0];
+	struct hist_field *operand2 = hist_field->operands[1];
+
+	u64 val1 = operand1->fn(operand1, event, rbe);
+	u64 val2 = operand2->fn(operand2, event, rbe);
+
+	return val1 + val2;
+}
+
+static u64 hist_field_minus(struct hist_field *hist_field, void *event,
+			    struct ring_buffer_event *rbe)
+{
+	struct hist_field *operand1 = hist_field->operands[0];
+	struct hist_field *operand2 = hist_field->operands[1];
+
+	u64 val1 = operand1->fn(operand1, event, rbe);
+	u64 val2 = operand2->fn(operand2, event, rbe);
+
+	return val1 - val2;
+}
+
+static u64 hist_field_unary_minus(struct hist_field *hist_field, void *event,
+				  struct ring_buffer_event *rbe)
+{
+	struct hist_field *operand = hist_field->operands[0];
+
+	s64 sval = (s64)operand->fn(operand, event, rbe);
+	u64 val = (u64)-sval;
+
+	return val;
+}
+
 static u64 hist_field_log2(struct hist_field *hist_field, void *event,
 			   struct ring_buffer_event *rbe)
 {
@@ -145,7 +195,8 @@ enum hist_field_flags {
 	HIST_FIELD_FL_LOG2		= 512,
 	HIST_FIELD_FL_VAR		= 1024,
 	HIST_FIELD_FL_VAR_ONLY		= 2048,
-	HIST_FIELD_FL_TIMESTAMP		= 4096,
+	HIST_FIELD_FL_EXPR		= 4096,
+	HIST_FIELD_FL_TIMESTAMP		= 8192,
 };
 
 struct hist_trigger_attrs {
@@ -183,6 +234,10 @@ static const char *hist_field_name(struct hist_field *field)
 		field_name = field->field->name;
 	else if (field->flags & HIST_FIELD_FL_LOG2)
 		field_name = hist_field_name(field->operands[0]);
+	else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
+		field_name = "common_timestamp";
+	else if (field->flags & HIST_FIELD_FL_EXPR)
+		field_name = field->name;
 
 	return field_name;
 }
@@ -407,6 +462,44 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
 		save_comm(comm, current);
 }
 
+static struct ftrace_event_field *
+parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
+	    char *field_str, unsigned long *flags)
+{
+	struct ftrace_event_field *field = NULL;
+	char *field_name;
+
+	field_name = strsep(&field_str, ".");
+	if (field_str) {
+		if (strcmp(field_str, "hex") == 0)
+			*flags |= HIST_FIELD_FL_HEX;
+		else if (strcmp(field_str, "sym") == 0)
+			*flags |= HIST_FIELD_FL_SYM;
+		else if (strcmp(field_str, "sym-offset") == 0)
+			*flags |= HIST_FIELD_FL_SYM_OFFSET;
+		else if ((strcmp(field_str, "execname") == 0) &&
+			 (strcmp(field_name, "common_pid") == 0))
+			*flags |= HIST_FIELD_FL_EXECNAME;
+		else if (strcmp(field_str, "syscall") == 0)
+			*flags |= HIST_FIELD_FL_SYSCALL;
+		else if (strcmp(field_str, "log2") == 0)
+			*flags |= HIST_FIELD_FL_LOG2;
+		else
+			return ERR_PTR(-EINVAL);
+	}
+
+	if (strcmp(field_name, "common_timestamp") == 0) {
+		*flags |= HIST_FIELD_FL_TIMESTAMP;
+		hist_data->enable_timestamps = true;
+	} else {
+		field = trace_find_event_field(file->event_call, field_name);
+		if (!field)
+			return ERR_PTR(-EINVAL);
+	}
+
+	return field;
+}
+
 static const struct tracing_map_ops hist_trigger_elt_comm_ops = {
 	.elt_alloc	= hist_trigger_elt_comm_alloc,
 	.elt_copy	= hist_trigger_elt_comm_copy,
@@ -414,6 +507,73 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
 	.elt_init	= hist_trigger_elt_comm_init,
 };
 
+static char *expr_str(struct hist_field *field)
+{
+	char *expr = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
+
+	if (!expr)
+		return NULL;
+
+	if (field->operator == FIELD_OP_UNARY_MINUS) {
+		char *subexpr;
+
+		strcat(expr, "-(");
+		subexpr = expr_str(field->operands[0]);
+		if (!subexpr) {
+			kfree(expr);
+			return NULL;
+		}
+		strcat(expr, subexpr);
+		strcat(expr, ")");
+
+		return expr;
+	}
+
+	strcat(expr, hist_field_name(field->operands[0]));
+
+	switch (field->operator) {
+	case FIELD_OP_MINUS:
+		strcat(expr, "-");
+		break;
+	case FIELD_OP_PLUS:
+		strcat(expr, "+");
+		break;
+	default:
+		kfree(expr);
+		return NULL;
+	}
+
+	strcat(expr, hist_field_name(field->operands[1]));
+
+	return expr;
+}
+
+static int contains_operator(char *str)
+{
+	enum field_op_id field_op = FIELD_OP_NONE;
+	char *op;
+
+	op = strpbrk(str, "+-");
+	if (!op)
+		return FIELD_OP_NONE;
+
+	switch (*op) {
+	case '-':
+		if (*str == '-')
+			field_op = FIELD_OP_UNARY_MINUS;
+		else
+			field_op = FIELD_OP_MINUS;
+		break;
+	case '+':
+		field_op = FIELD_OP_PLUS;
+		break;
+	default:
+		break;
+	}
+
+	return field_op;
+}
+
 static void destroy_hist_field(struct hist_field *hist_field)
 {
 	unsigned int i;
@@ -425,6 +585,7 @@ static void destroy_hist_field(struct hist_field *hist_field)
 		destroy_hist_field(hist_field->operands[i]);
 
 	kfree(hist_field->var_name);
+	kfree(hist_field->name);
 
 	kfree(hist_field);
 }
@@ -443,6 +604,9 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 		return NULL;
 	hist_field->is_signed = false;
 
+	if (flags & HIST_FIELD_FL_EXPR)
+		goto out; /* caller will populate */
+
 	if (flags & HIST_FIELD_FL_HITCOUNT) {
 		hist_field->fn = hist_field_counter;
 		goto out;
@@ -508,6 +672,187 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
 	}
 }
 
+static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
+				     struct trace_event_file *file,
+				     char *str, unsigned long flags,
+				     char *var_name);
+
+static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
+				      struct trace_event_file *file,
+				      char *str, unsigned long flags,
+				      char *var_name)
+{
+	struct hist_field *operand1, *expr = NULL;
+	struct ftrace_event_field *field = NULL;
+	unsigned long operand_flags;
+	char *operand1_str;
+	int ret = 0;
+	char *s;
+
+	// we support only -(xxx) i.e. explicit parens required
+
+	str++; // skip leading '-'
+
+	s = strchr(str, '(');
+	if (s)
+		str++;
+	else {
+		ret = -EINVAL;
+		goto free;
+	}
+
+	s = strchr(str, ')');
+	if (s)
+		*s = '\0';
+	else {
+		ret = -EINVAL; // no closing ')'
+		goto free;
+	}
+
+	operand1_str = strsep(&str, "(");
+	if (!operand1_str)
+		goto free;
+
+	flags |= HIST_FIELD_FL_EXPR;
+	expr = create_hist_field(NULL, flags, var_name);
+	if (!expr) {
+		ret = -ENOMEM;
+		goto free;
+	}
+
+	operand_flags = 0;
+	operand1 = parse_expr(hist_data, file, str, operand_flags, NULL);
+	if (IS_ERR(operand1)) {
+		ret = PTR_ERR(operand1);
+		goto free;
+	}
+
+	if (operand1 == NULL) {
+		operand_flags = 0;
+		field = parse_field(hist_data, file, operand1_str,
+				    &operand_flags);
+		if (IS_ERR(field)) {
+			ret = PTR_ERR(field);
+			goto free;
+		}
+		operand1 = create_hist_field(field, operand_flags, NULL);
+		if (!operand1) {
+			ret = -ENOMEM;
+			goto free;
+		}
+	}
+
+	expr->fn = hist_field_unary_minus;
+	expr->operands[0] = operand1;
+	expr->operator = FIELD_OP_UNARY_MINUS;
+	expr->name = expr_str(expr);
+
+	return expr;
+ free:
+	return ERR_PTR(ret);
+}
+
+static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
+				     struct trace_event_file *file,
+				     char *str, unsigned long flags,
+				     char *var_name)
+{
+	struct hist_field *operand1, *operand2, *expr = NULL;
+	struct ftrace_event_field *field = NULL;
+	unsigned long operand_flags;
+	int field_op, ret = -EINVAL;
+	char *sep, *operand1_str;
+
+	field_op = contains_operator(str);
+	if (field_op == FIELD_OP_NONE)
+		return NULL;
+
+	if (field_op == FIELD_OP_UNARY_MINUS)
+		return parse_unary(hist_data, file, str, flags, var_name);
+
+	switch (field_op) {
+	case FIELD_OP_MINUS:
+		sep = "-";
+		break;
+	case FIELD_OP_PLUS:
+		sep = "+";
+		break;
+	default:
+		goto free;
+	}
+
+	operand1_str = strsep(&str, sep);
+	if (!operand1_str || !str)
+		goto free;
+
+	operand_flags = 0;
+	field = parse_field(hist_data, file, operand1_str, &operand_flags);
+	if (IS_ERR(field)) {
+		ret = PTR_ERR(field);
+		goto free;
+	}
+	operand1 = create_hist_field(field, operand_flags, NULL);
+	if (!operand1) {
+		ret = -ENOMEM;
+		operand1 = NULL;
+		goto free;
+	}
+
+	// rest of string could be another expression e.g. b+c in a+b+c
+	operand_flags = 0;
+	operand2 = parse_expr(hist_data, file, str, operand_flags, NULL);
+	if (IS_ERR(operand2)) {
+		ret = PTR_ERR(operand2);
+		operand2 = NULL;
+		goto free;
+	}
+	if (!operand2) {
+		operand_flags = 0;
+		field = parse_field(hist_data, file, str, &operand_flags);
+		if (IS_ERR(field)) {
+			ret = PTR_ERR(field);
+			goto free;
+		}
+		operand2 = create_hist_field(field, operand_flags, NULL);
+		if (!operand2) {
+			ret = -ENOMEM;
+			operand2 = NULL;
+			goto free;
+		}
+	}
+
+	flags |= HIST_FIELD_FL_EXPR;
+	expr = create_hist_field(NULL, flags, var_name);
+	if (!expr) {
+		ret = -ENOMEM;
+		goto free;
+	}
+
+	expr->operands[0] = operand1;
+	expr->operands[1] = operand2;
+	expr->operator = field_op;
+	expr->name = expr_str(expr);
+
+	switch (field_op) {
+	case FIELD_OP_MINUS:
+		expr->fn = hist_field_minus;
+		break;
+	case FIELD_OP_PLUS:
+		expr->fn = hist_field_plus;
+		break;
+	default:
+		goto free;
+	}
+
+	return expr;
+ free:
+	destroy_hist_field(operand1);
+	destroy_hist_field(operand2);
+	destroy_hist_field(expr);
+
+	return ERR_PTR(ret);
+}
+
 static int create_hitcount_val(struct hist_trigger_data *hist_data)
 {
 	hist_data->fields[HITCOUNT_IDX] =
@@ -529,8 +874,9 @@ static int create_val_field(struct hist_trigger_data *hist_data,
 			    char *field_str, char *var_name)
 {
 	struct ftrace_event_field *field = NULL;
-	char *field_name, *token;
+	struct hist_field *hist_field;
 	unsigned long flags = 0;
+	char *token;
 	int ret = 0;
 
 	if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX))
@@ -549,32 +895,27 @@ static int create_val_field(struct hist_trigger_data *hist_data,
 		flags |= HIST_FIELD_FL_VAR;
 	}
 
-	field_name = strsep(&field_str, ".");
-	if (field_str) {
-		if (strcmp(field_str, "hex") == 0)
-			flags |= HIST_FIELD_FL_HEX;
-		else {
-			ret = -EINVAL;
+	hist_field = parse_expr(hist_data, file, field_str, flags, var_name);
+	if (IS_ERR(hist_field)) {
+		ret = PTR_ERR(hist_field);
+		goto out;
+	}
+
+	if (!hist_field) {
+		field = parse_field(hist_data, file, field_str, &flags);
+		if (IS_ERR(field)) {
+			ret = PTR_ERR(field);
 			goto out;
 		}
-	}
 
-	if (strcmp(field_name, "common_timestamp") == 0) {
-		flags |= HIST_FIELD_FL_TIMESTAMP;
-		hist_data->enable_timestamps = true;
-	} else {
-		field = trace_find_event_field(file->event_call, field_name);
-		if (!field) {
-			ret = -EINVAL;
+		hist_field = create_hist_field(field, flags, var_name);
+		if (!hist_field) {
+			ret = -ENOMEM;
 			goto out;
 		}
 	}
 
-	hist_data->fields[val_idx] = create_hist_field(field, flags, var_name);
-	if (!hist_data->fields[val_idx]) {
-		ret = -ENOMEM;
-		goto out;
-	}
+	hist_data->fields[val_idx] = hist_field;
 
 	++hist_data->n_vals;
 
@@ -623,6 +964,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 			    char *field_str)
 {
 	struct ftrace_event_field *field = NULL;
+	struct hist_field *hist_field;
 	unsigned long flags = 0;
 	unsigned int key_size;
 	char *var_name;
@@ -640,53 +982,40 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 	if (strcmp(field_str, "stacktrace") == 0) {
 		flags |= HIST_FIELD_FL_STACKTRACE;
 		key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
+		hist_field = create_hist_field(field, flags, var_name);
 	} else {
-		char *field_name = strsep(&field_str, ".");
-
-		if (field_str) {
-			if (strcmp(field_str, "hex") == 0)
-				flags |= HIST_FIELD_FL_HEX;
-			else if (strcmp(field_str, "sym") == 0)
-				flags |= HIST_FIELD_FL_SYM;
-			else if (strcmp(field_str, "sym-offset") == 0)
-				flags |= HIST_FIELD_FL_SYM_OFFSET;
-			else if ((strcmp(field_str, "execname") == 0) &&
-				 (strcmp(field_name, "common_pid") == 0))
-				flags |= HIST_FIELD_FL_EXECNAME;
-			else if (strcmp(field_str, "syscall") == 0)
-				flags |= HIST_FIELD_FL_SYSCALL;
-			else if (strcmp(field_str, "log2") == 0)
-				flags |= HIST_FIELD_FL_LOG2;
-			else {
-				ret = -EINVAL;
-				goto out;
-			}
+		hist_field = parse_expr(hist_data, file, field_str, flags,
+					var_name);
+		if (IS_ERR(hist_field)) {
+			ret = PTR_ERR(hist_field);
+			goto out;
 		}
 
-		if (strcmp(field_name, "common_timestamp") == 0) {
-			flags |= HIST_FIELD_FL_TIMESTAMP;
-			hist_data->enable_timestamps = true;
-			key_size = sizeof(u64);
-		} else {
-			field = trace_find_event_field(file->event_call, field_name);
-			if (!field) {
-				ret = -EINVAL;
+		if (!hist_field) {
+			field = parse_field(hist_data, file, field_str,
+					    &flags);
+			if (IS_ERR(field)) {
+				ret = PTR_ERR(field);
 				goto out;
 			}
 
-			if (is_string_field(field))
-				key_size = MAX_FILTER_STR_VAL;
-			else
-				key_size = field->size;
+			hist_field = create_hist_field(field, flags, var_name);
+			if (!hist_field) {
+				ret = -ENOMEM;
+				goto out;
+			}
 		}
-	}
 
-	hist_data->fields[key_idx] = create_hist_field(field, flags, var_name);
-	if (!hist_data->fields[key_idx]) {
-		ret = -ENOMEM;
-		goto out;
+		if (flags & HIST_FIELD_FL_TIMESTAMP)
+			key_size = sizeof(u64);
+		else if (is_string_field(field))
+			key_size = MAX_FILTER_STR_VAL;
+		else
+			key_size = field->size;
 	}
 
+	hist_data->fields[key_idx] = hist_field;
+
 	key_size = ALIGN(key_size, sizeof(u64));
 	hist_data->fields[key_idx]->size = key_size;
 	hist_data->fields[key_idx]->offset = key_offset;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 14/21] tracing: Add variable reference handling to hist triggers
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (12 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 13/21] tracing: Add simple expression support to hist triggers Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 15/21] tracing: Add usecs modifier for hist trigger timestamps Tom Zanussi
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add the necessary infrastructure to allow the variables defined on one
event to be referenced in another.  This allows variables set by a
previous event to be referenced and used in expressions combining the
variable values saved by that previous event and the event fields of
the current event.  For example, here's how a latency can be
calculated and saved into yet another variable named 'wakeup_lat':

    # echo 'hist:keys=pid,prio:ts0=common_timestamp ...
    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-ts0 ...

In the first event, the event's timestamp is saved into the variable
ts0.  In the next line, ts0 is subtracted from the second event's
timestamp to produce the latency.
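
A complete pair along those lines might look like the following sketch
(the sched event names, trigger paths and single-pid keys are
illustrative choices, not requirements):

    # echo 'hist:keys=pid:ts0=common_timestamp' >> \
          events/sched/sched_waking/trigger
    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-ts0' >> \
          events/sched/sched_switch/trigger

For the reference to resolve, the key used by the second trigger needs
to select the same histogram entry the first trigger used when saving
ts0 (here, the pid of the woken task).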

Further uses of variable references will be described in subsequent
patches, for instance how the 'wakeup_lat' variable above can be
displayed in a latency histogram.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 742 +++++++++++++++++++++++++++++----------
 1 file changed, 556 insertions(+), 186 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index cea95b6..c0a2ce8 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -25,8 +25,10 @@
 
 struct hist_field;
 
-typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
-				struct ring_buffer_event *rbe);
+typedef u64 (*hist_field_fn_t) (struct hist_field *field,
+				struct tracing_map_elt *elt,
+				struct ring_buffer_event *rbe,
+				void *event);
 
 #define HIST_FIELD_OPERANDS_MAX	2
 #define HIST_ASSIGNMENT_MAX	4
@@ -38,6 +40,11 @@ enum field_op_id {
 	FIELD_OP_UNARY_MINUS,
 };
 
+struct hist_var_ref {
+	struct hist_trigger_data	*hist_data;
+	unsigned int			idx;
+};
+
 struct hist_field {
 	struct ftrace_event_field	*field;
 	unsigned long			flags;
@@ -45,35 +52,46 @@ struct hist_field {
 	unsigned int			size;
 	unsigned int			offset;
 	unsigned int			is_signed;
-	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
-	u64				var_val;
+	struct hist_var_ref		var_ref;
+	unsigned int			var_ref_idx;
 	char				*var_name;
+	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
 	enum field_op_id		operator;
 	char				*name;
+	u64				var_val;
+	unsigned int			var_idx;
 };
 
-static u64 hist_field_none(struct hist_field *field, void *event,
-			   struct ring_buffer_event *rbe)
+static u64 hist_field_none(struct hist_field *field,
+			   struct tracing_map_elt *elt,
+			   struct ring_buffer_event *rbe,
+			   void *event)
 {
 	return 0;
 }
 
-static u64 hist_field_counter(struct hist_field *field, void *event,
-			      struct ring_buffer_event *rbe)
+static u64 hist_field_counter(struct hist_field *field,
+			      struct tracing_map_elt *elt,
+			      struct ring_buffer_event *rbe,
+			      void *event)
 {
 	return 1;
 }
 
-static u64 hist_field_string(struct hist_field *hist_field, void *event,
-			     struct ring_buffer_event *rbe)
+static u64 hist_field_string(struct hist_field *hist_field,
+			     struct tracing_map_elt *elt,
+			     struct ring_buffer_event *rbe,
+			     void *event)
 {
 	char *addr = (char *)(event + hist_field->field->offset);
 
 	return (u64)(unsigned long)addr;
 }
 
-static u64 hist_field_dynstring(struct hist_field *hist_field, void *event,
-				struct ring_buffer_event *rbe)
+static u64 hist_field_dynstring(struct hist_field *hist_field,
+				struct tracing_map_elt *elt,
+				struct ring_buffer_event *rbe,
+				void *event)
 {
 	u32 str_item = *(u32 *)(event + hist_field->field->offset);
 	int str_loc = str_item & 0xffff;
@@ -82,75 +100,82 @@ static u64 hist_field_dynstring(struct hist_field *hist_field, void *event,
 	return (u64)(unsigned long)addr;
 }
 
-static u64 hist_field_pstring(struct hist_field *hist_field, void *event,
-			      struct ring_buffer_event *rbe)
+static u64 hist_field_pstring(struct hist_field *hist_field,
+			      struct tracing_map_elt *elt,
+			      struct ring_buffer_event *rbe,
+			      void *event)
 {
 	char **addr = (char **)(event + hist_field->field->offset);
 
 	return (u64)(unsigned long)*addr;
 }
 
-static u64 hist_field_var_val(struct hist_field *hist_field, void *event,
-			      struct ring_buffer_event *rbe)
-{
-	return hist_field->var_val;
-}
-
-static u64 hist_field_plus(struct hist_field *hist_field, void *event,
-			   struct ring_buffer_event *rbe)
+static u64 hist_field_plus(struct hist_field *hist_field,
+			   struct tracing_map_elt *elt,
+			   struct ring_buffer_event *rbe,
+			   void *event)
 {
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, event, rbe);
-	u64 val2 = operand2->fn(operand2, event, rbe);
+	u64 val1 = operand1->fn(operand1, elt, rbe, event);
+	u64 val2 = operand2->fn(operand2, elt, rbe, event);
 
 	return val1 + val2;
 }
 
-static u64 hist_field_minus(struct hist_field *hist_field, void *event,
-			    struct ring_buffer_event *rbe)
+static u64 hist_field_minus(struct hist_field *hist_field,
+			    struct tracing_map_elt *elt,
+			    struct ring_buffer_event *rbe,
+			    void *event)
 {
 	struct hist_field *operand1 = hist_field->operands[0];
 	struct hist_field *operand2 = hist_field->operands[1];
 
-	u64 val1 = operand1->fn(operand1, event, rbe);
-	u64 val2 = operand2->fn(operand2, event, rbe);
+	u64 val1 = operand1->fn(operand1, elt, rbe, event);
+	u64 val2 = operand2->fn(operand2, elt, rbe, event);
 
 	return val1 - val2;
 }
 
-static u64 hist_field_unary_minus(struct hist_field *hist_field, void *event,
-				  struct ring_buffer_event *rbe)
+static u64 hist_field_unary_minus(struct hist_field *hist_field,
+				  struct tracing_map_elt *elt,
+				  struct ring_buffer_event *rbe,
+				  void *event)
 {
 	struct hist_field *operand = hist_field->operands[0];
 
-	s64 sval = (s64)operand->fn(operand, event, rbe);
+	s64 sval = (s64)operand->fn(operand, elt, rbe, event);
 	u64 val = (u64)-sval;
 
 	return val;
 }
 
-static u64 hist_field_log2(struct hist_field *hist_field, void *event,
-			   struct ring_buffer_event *rbe)
+static u64 hist_field_log2(struct hist_field *hist_field,
+			   struct tracing_map_elt *elt,
+			   struct ring_buffer_event *rbe,
+			   void *event)
 {
 	struct hist_field *operand = hist_field->operands[0];
 
-	u64 val = operand->fn(operand, event, rbe);
+	u64 val = operand->fn(operand, elt, rbe, event);
 
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
 
-static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
-				struct ring_buffer_event *rbe)
+static u64 hist_field_timestamp(struct hist_field *hist_field,
+				struct tracing_map_elt *elt,
+				struct ring_buffer_event *rbe,
+				void *event)
 {
 	return ring_buffer_event_time_stamp(rbe);
 }
 
 #define DEFINE_HIST_FIELD_FN(type)					\
 	static u64 hist_field_##type(struct hist_field *hist_field,	\
-				     void *event,			\
-				     struct ring_buffer_event *rbe)	\
+				     struct tracing_map_elt *elt,	\
+				     struct ring_buffer_event *rbe,	\
+				     void *event)			\
 {									\
 	type *addr = (type *)(event + hist_field->field->offset);	\
 									\
@@ -195,8 +220,9 @@ enum hist_field_flags {
 	HIST_FIELD_FL_LOG2		= 512,
 	HIST_FIELD_FL_VAR		= 1024,
 	HIST_FIELD_FL_VAR_ONLY		= 2048,
-	HIST_FIELD_FL_EXPR		= 4096,
-	HIST_FIELD_FL_TIMESTAMP		= 8192,
+	HIST_FIELD_FL_VAR_REF		= 4096,
+	HIST_FIELD_FL_EXPR		= 8192,
+	HIST_FIELD_FL_TIMESTAMP		= 16384,
 };
 
 struct hist_trigger_attrs {
@@ -214,9 +240,12 @@ struct hist_trigger_attrs {
 
 struct hist_trigger_data {
 	struct hist_field               *fields[TRACING_MAP_FIELDS_MAX];
+	struct hist_field               *var_refs[TRACING_MAP_VARS_MAX];
 	unsigned int			n_vals;
 	unsigned int			n_keys;
 	unsigned int			n_fields;
+	unsigned int			n_vars;
+	unsigned int			n_var_refs;
 	unsigned int			key_size;
 	struct tracing_map_sort_key	sort_keys[TRACING_MAP_SORT_KEYS_MAX];
 	unsigned int			n_sort_keys;
@@ -226,6 +255,224 @@ struct hist_trigger_data {
 	bool				enable_timestamps;
 };
 
+static LIST_HEAD(hist_var_list);
+
+struct hist_var_data {
+	struct list_head list;
+	struct hist_trigger_data *hist_data;
+};
+
+static struct hist_field *check_var_ref(struct hist_field *hist_field,
+					struct hist_trigger_data *var_data,
+					unsigned int var_idx)
+{
+	struct hist_field *found = NULL;
+
+	if (hist_field && hist_field->flags & HIST_FIELD_FL_VAR_REF) {
+		if (hist_field->var_ref.idx == var_idx &&
+		    hist_field->var_ref.hist_data == var_data) {
+			found = hist_field;
+		}
+	}
+
+	return found;
+}
+
+static struct hist_field *find_var_ref(struct hist_trigger_data *hist_data,
+				       struct hist_trigger_data *var_data,
+				       unsigned int var_idx)
+{
+	struct hist_field *hist_field, *found = NULL;
+	unsigned int i, j;
+
+	for_each_hist_field(i, hist_data) {
+		hist_field = hist_data->fields[i];
+		found = check_var_ref(hist_field, var_data, var_idx);
+		if (found)
+			return found;
+
+		for (j = 0; j < HIST_FIELD_OPERANDS_MAX; j++) {
+			struct hist_field *operand;
+			operand = hist_field->operands[j];
+			found = check_var_ref(operand, var_data, var_idx);
+			if (found)
+				return found;
+		}
+	}
+
+	return found;
+}
+
+static struct hist_field *find_any_var_ref(struct hist_trigger_data *hist_data,
+					   unsigned int var_idx)
+{
+	struct hist_field *found = NULL;
+	struct hist_var_data *var_data;
+
+	list_for_each_entry(var_data, &hist_var_list, list) {
+		found = find_var_ref(var_data->hist_data, hist_data, var_idx);
+		if (found)
+			break;
+	}
+
+	return found;
+}
+
+static bool check_var_refs(struct hist_trigger_data *hist_data)
+{
+	struct hist_field *field;
+	bool found = false;
+	int i;
+
+	for_each_hist_field(i, hist_data) {
+		field = hist_data->fields[i];
+		if (field && field->flags & HIST_FIELD_FL_VAR) {
+			if (find_any_var_ref(hist_data, field->var_ref.idx)) {
+				found = true;
+				break;
+			}
+		}
+	}
+
+	return found;
+}
+
+static struct hist_var_data *find_hist_vars(struct hist_trigger_data *hist_data)
+{
+	struct hist_var_data *var_data, *found = NULL;
+
+	list_for_each_entry(var_data, &hist_var_list, list) {
+		if (var_data->hist_data == hist_data) {
+			found = var_data;
+			break;
+		}
+	}
+
+	return found;
+}
+
+static bool has_hist_vars(struct hist_trigger_data *hist_data)
+{
+	struct hist_field *hist_field;
+	bool found = false;
+	int i;
+
+	for_each_hist_field(i, hist_data) {
+		hist_field = hist_data->fields[i];
+		if (hist_field && hist_field->flags & HIST_FIELD_FL_VAR) {
+			found = true;
+			break;
+		}
+	}
+
+	return found;
+}
+
+static int save_hist_vars(struct hist_trigger_data *hist_data)
+{
+	struct hist_var_data *var_data;
+
+	var_data = find_hist_vars(hist_data);
+	if (var_data)
+		return 0;
+
+	var_data = kzalloc(sizeof(*var_data), GFP_KERNEL);
+	if (!var_data)
+		return -ENOMEM;
+
+	var_data->hist_data = hist_data;
+	list_add(&var_data->list, &hist_var_list);
+
+	return 0;
+}
+
+static int remove_hist_vars(struct hist_trigger_data *hist_data)
+{
+	struct hist_var_data *var_data;
+
+	var_data = find_hist_vars(hist_data);
+	if (!var_data)
+		return -EINVAL;
+
+	if (check_var_refs(hist_data))
+		return -EINVAL;
+
+	list_del(&var_data->list);
+
+	return 0;
+}
+
+static struct hist_field *find_var_field(struct hist_trigger_data *hist_data,
+					 char *var_name)
+{
+	struct hist_field *hist_field, *found = NULL;
+	int i;
+
+	for_each_hist_field(i, hist_data) {
+		hist_field = hist_data->fields[i];
+		if (hist_field && hist_field->flags & HIST_FIELD_FL_VAR) {
+			if (strcmp(hist_field->var_name, var_name) == 0) {
+				found = hist_field;
+				break;
+			}
+		}
+	}
+
+	return found;
+}
+
+struct hist_elt_data {
+	char *comm;
+	u64 *var_ref_vals;
+};
+
+static u64 hist_field_var_ref(struct hist_field *hist_field,
+			      struct tracing_map_elt *elt,
+			      struct ring_buffer_event *rbe,
+			      void *event)
+{
+	struct hist_elt_data *elt_data;
+	u64 var_val = 0;
+
+	elt_data = elt->private_data;
+	var_val = elt_data->var_ref_vals[hist_field->var_ref_idx];
+
+	return var_val;
+}
+
+static bool resolve_var_refs(struct hist_trigger_data *hist_data,
+			     void *key,
+			     u64 *var_ref_vals)
+{
+	struct hist_trigger_data *var_data;
+	struct tracing_map_elt *var_elt;
+	struct hist_field *hist_field;
+	bool resolved = true;
+	unsigned int i, var_idx;
+	u64 var_val = 0;
+
+	for (i = 0; i < hist_data->n_var_refs; i++) {
+		hist_field = hist_data->var_refs[i];
+		var_idx = hist_field->var_ref.idx;
+		var_data = hist_field->var_ref.hist_data;
+
+		var_elt = tracing_map_lookup(var_data->map, key);
+		if (!var_elt) {
+			resolved = false;
+			break;
+		}
+		if (!tracing_map_var_set(var_elt, var_idx)) {
+			resolved = false;
+			break;
+		}
+
+		var_val = tracing_map_read_var(var_elt, var_idx);
+		var_ref_vals[i] = var_val;
+	}
+
+	return resolved;
+}
+
 static const char *hist_field_name(struct hist_field *field)
 {
 	const char *field_name = NULL;
@@ -236,7 +483,8 @@ static const char *hist_field_name(struct hist_field *field)
 		field_name = hist_field_name(field->operands[0]);
 	else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
 		field_name = "common_timestamp";
-	else if (field->flags & HIST_FIELD_FL_EXPR)
+	else if (field->flags & HIST_FIELD_FL_EXPR ||
+		 field->flags & HIST_FIELD_FL_VAR_REF)
 		field_name = field->name;
 
 	return field_name;
@@ -417,26 +665,36 @@ static inline void save_comm(char *comm, struct task_struct *task)
 	memcpy(comm, task->comm, TASK_COMM_LEN);
 }
 
-static void hist_trigger_elt_comm_free(struct tracing_map_elt *elt)
+static void hist_trigger_elt_data_free(struct tracing_map_elt *elt)
 {
-	kfree((char *)elt->private_data);
+	struct hist_elt_data *private_data = elt->private_data;
+
+	kfree(private_data->comm);
+	kfree(private_data);
 }
 
-static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
+static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
 {
 	struct hist_trigger_data *hist_data = elt->map->private_data;
+	unsigned int size = TASK_COMM_LEN + 1;
+	struct hist_elt_data *elt_data;
 	struct hist_field *key_field;
 	unsigned int i;
 
+	elt->private_data = elt_data = kzalloc(sizeof(*elt_data), GFP_KERNEL);
+	if (!elt_data)
+		return -ENOMEM;
+
 	for_each_hist_key_field(i, hist_data) {
 		key_field = hist_data->fields[i];
 
 		if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
-			unsigned int size = TASK_COMM_LEN + 1;
-
-			elt->private_data = kzalloc(size, GFP_KERNEL);
-			if (!elt->private_data)
+			elt_data->comm = kzalloc(size, GFP_KERNEL);
+			if (!elt_data->comm) {
+				kfree(elt_data);
+				elt->private_data = NULL;
 				return -ENOMEM;
+			}
 			break;
 		}
 	}
@@ -444,67 +702,31 @@ static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
 	return 0;
 }
 
-static void hist_trigger_elt_comm_copy(struct tracing_map_elt *to,
+static void hist_trigger_elt_data_copy(struct tracing_map_elt *to,
 				       struct tracing_map_elt *from)
 {
-	char *comm_from = from->private_data;
-	char *comm_to = to->private_data;
+	struct hist_elt_data *from_data = from->private_data;
+	struct hist_elt_data *to_data = to->private_data;
 
-	if (comm_from)
-		memcpy(comm_to, comm_from, TASK_COMM_LEN + 1);
-}
+	memcpy(to_data, from_data, sizeof(*to));
 
-static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
-{
-	char *comm = elt->private_data;
-
-	if (comm)
-		save_comm(comm, current);
+	if (from_data->comm)
+		memcpy(to_data->comm, from_data->comm, TASK_COMM_LEN + 1);
 }
 
-static struct ftrace_event_field *
-parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
-	    char *field_str, unsigned long *flags)
+static void hist_trigger_elt_data_init(struct tracing_map_elt *elt)
 {
-	struct ftrace_event_field *field = NULL;
-	char *field_name;
-
-	field_name = strsep(&field_str, ".");
-	if (field_str) {
-		if (strcmp(field_str, "hex") == 0)
-			*flags |= HIST_FIELD_FL_HEX;
-		else if (strcmp(field_str, "sym") == 0)
-			*flags |= HIST_FIELD_FL_SYM;
-		else if (strcmp(field_str, "sym-offset") == 0)
-			*flags |= HIST_FIELD_FL_SYM_OFFSET;
-		else if ((strcmp(field_str, "execname") == 0) &&
-			 (strcmp(field_name, "common_pid") == 0))
-			*flags |= HIST_FIELD_FL_EXECNAME;
-		else if (strcmp(field_str, "syscall") == 0)
-			*flags |= HIST_FIELD_FL_SYSCALL;
-		else if (strcmp(field_str, "log2") == 0)
-			*flags |= HIST_FIELD_FL_LOG2;
-		else
-			return ERR_PTR(-EINVAL);
-	}
+	struct hist_elt_data *private_data = elt->private_data;
 
-	if (strcmp(field_name, "common_timestamp") == 0) {
-		*flags |= HIST_FIELD_FL_TIMESTAMP;
-		hist_data->enable_timestamps = true;
-	} else {
-		field = trace_find_event_field(file->event_call, field_name);
-		if (!field)
-			return ERR_PTR(-EINVAL);
-	}
-
-	return field;
+	if (private_data->comm)
+		save_comm(private_data->comm, current);
 }
 
-static const struct tracing_map_ops hist_trigger_elt_comm_ops = {
-	.elt_alloc	= hist_trigger_elt_comm_alloc,
-	.elt_copy	= hist_trigger_elt_comm_copy,
-	.elt_free	= hist_trigger_elt_comm_free,
-	.elt_init	= hist_trigger_elt_comm_init,
+static const struct tracing_map_ops hist_trigger_elt_data_ops = {
+	.elt_alloc	= hist_trigger_elt_data_alloc,
+	.elt_copy	= hist_trigger_elt_data_copy,
+	.elt_free	= hist_trigger_elt_data_free,
+	.elt_init	= hist_trigger_elt_data_init,
 };
 
 static char *expr_str(struct hist_field *field)
@@ -607,6 +829,11 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 	if (flags & HIST_FIELD_FL_EXPR)
 		goto out; /* caller will populate */
 
+	if (flags & HIST_FIELD_FL_VAR_REF) {
+		hist_field->fn = hist_field_var_ref;
+		goto out;
+	}
+
 	if (flags & HIST_FIELD_FL_HITCOUNT) {
 		hist_field->fn = hist_field_counter;
 		goto out;
@@ -636,6 +863,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 
 	if (is_string_field(field)) {
 		flags |= HIST_FIELD_FL_STRING;
+		hist_field->size = MAX_FILTER_STR_VAL;
 
 		if (field->filter_type == FILTER_STATIC_STRING)
 			hist_field->fn = hist_field_string;
@@ -644,6 +872,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
 		else
 			hist_field->fn = hist_field_pstring;
 	} else {
+		hist_field->size = field->size;
 		hist_field->fn = select_value_fn(field->size,
 						 field->is_signed);
 		if (!hist_field->fn) {
@@ -672,6 +901,101 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
 	}
 }
 
+static struct hist_field *parse_var_ref(char *var_name)
+{
+	struct hist_var_data *var_data;
+	struct hist_field *var_field = NULL, *ref_field = NULL;
+
+	list_for_each_entry(var_data, &hist_var_list, list) {
+		var_field = find_var_field(var_data->hist_data, var_name);
+		if (var_field)
+			break;
+	}
+
+	if (var_field) {
+		unsigned long flags = HIST_FIELD_FL_VAR_REF;
+
+		ref_field = create_hist_field(NULL, flags, NULL);
+		if (ref_field) {
+			ref_field->var_ref.idx = var_field->var_ref.idx;
+			ref_field->var_ref.hist_data = var_data->hist_data;
+			ref_field->name = kstrdup(var_field->var_name, GFP_KERNEL);
+		}
+	}
+
+	return ref_field;
+}
+
+static struct ftrace_event_field *
+parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
+	    char *field_str, unsigned long *flags)
+{
+	struct ftrace_event_field *field = NULL;
+	char *field_name;
+
+	field_name = strsep(&field_str, ".");
+	if (field_str) {
+		if (strcmp(field_str, "hex") == 0)
+			*flags |= HIST_FIELD_FL_HEX;
+		else if (strcmp(field_str, "sym") == 0)
+			*flags |= HIST_FIELD_FL_SYM;
+		else if (strcmp(field_str, "sym-offset") == 0)
+			*flags |= HIST_FIELD_FL_SYM_OFFSET;
+		else if ((strcmp(field_str, "execname") == 0) &&
+			 (strcmp(field_name, "common_pid") == 0))
+			*flags |= HIST_FIELD_FL_EXECNAME;
+		else if (strcmp(field_str, "syscall") == 0)
+			*flags |= HIST_FIELD_FL_SYSCALL;
+		else if (strcmp(field_str, "log2") == 0)
+			*flags |= HIST_FIELD_FL_LOG2;
+		else
+			return ERR_PTR(-EINVAL);
+	}
+
+	if (strcmp(field_name, "common_timestamp") == 0) {
+		*flags |= HIST_FIELD_FL_TIMESTAMP;
+		hist_data->enable_timestamps = true;
+	} else {
+		field = trace_find_event_field(file->event_call, field_name);
+		if (!field)
+			return ERR_PTR(-EINVAL);
+	}
+
+	return field;
+}
+
+struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
+			      struct trace_event_file *file, char *str,
+			      unsigned long *flags, char *var_name)
+{
+	struct ftrace_event_field *field = NULL;
+	struct hist_field *hist_field = NULL;
+	int ret = 0;
+
+	hist_field = parse_var_ref(str);
+	if (hist_field) {
+		hist_data->var_refs[hist_data->n_var_refs] = hist_field;
+		hist_field->var_ref_idx = hist_data->n_var_refs++;
+		return hist_field;
+	}
+
+	field = parse_field(hist_data, file, str, flags);
+	if (IS_ERR(field)) {
+		ret = PTR_ERR(field);
+		goto out;
+	}
+
+	hist_field = create_hist_field(field, *flags, var_name);
+	if (!hist_field) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	return hist_field;
+ out:
+	return ERR_PTR(ret);
+}
+
 static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 				     struct trace_event_file *file,
 				     char *str, unsigned long flags,
@@ -683,7 +1007,6 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
 				      char *var_name)
 {
 	struct hist_field *operand1, *expr = NULL;
-	struct ftrace_event_field *field = NULL;
 	unsigned long operand_flags;
 	char *operand1_str;
 	int ret = 0;
@@ -729,15 +1052,10 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
 
 	if (operand1 == NULL) {
 		operand_flags = 0;
-		field = parse_field(hist_data, file, operand1_str,
-				    &operand_flags);
-		if (IS_ERR(field)) {
-			ret = PTR_ERR(field);
-			goto free;
-		}
-		operand1 = create_hist_field(field, operand_flags, NULL);
-		if (!operand1) {
-			ret = -ENOMEM;
+		operand1 = parse_atom(hist_data, file, operand1_str,
+				      &operand_flags, NULL);
+		if (IS_ERR(operand1)) {
+			ret = PTR_ERR(operand1);
 			goto free;
 		}
 	}
@@ -757,8 +1075,7 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 				     char *str, unsigned long flags,
 				     char *var_name)
 {
-	struct hist_field *operand1, *operand2, *expr = NULL;
-	struct ftrace_event_field *field = NULL;
+	struct hist_field *operand1 = NULL, *operand2 = NULL, *expr = NULL;
 	unsigned long operand_flags;
 	int field_op, ret = -EINVAL;
 	char *sep, *operand1_str;
@@ -786,14 +1103,10 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 		goto free;
 
 	operand_flags = 0;
-	field = parse_field(hist_data, file, operand1_str, &operand_flags);
-	if (IS_ERR(field)) {
-		ret = PTR_ERR(field);
-		goto free;
-	}
-	operand1 = create_hist_field(field, operand_flags, NULL);
-	if (!operand1) {
-		ret = -ENOMEM;
+	operand1 = parse_atom(hist_data, file, operand1_str,
+			      &operand_flags, NULL);
+	if (IS_ERR(operand1)) {
+		ret = PTR_ERR(operand1);
 		operand1 = NULL;
 		goto free;
 	}
@@ -808,14 +1121,10 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 	}
 	if (!operand2) {
 		operand_flags = 0;
-		field = parse_field(hist_data, file, str, &operand_flags);
-		if (IS_ERR(field)) {
-			ret = PTR_ERR(field);
-			goto free;
-		}
-		operand2 = create_hist_field(field, operand_flags, NULL);
-		if (!operand2) {
-			ret = -ENOMEM;
+		operand2 = parse_atom(hist_data, file, str,
+				      &operand_flags, NULL);
+		if (IS_ERR(operand2)) {
+			ret = PTR_ERR(operand2);
 			operand2 = NULL;
 			goto free;
 		}
@@ -873,7 +1182,6 @@ static int create_val_field(struct hist_trigger_data *hist_data,
 			    struct trace_event_file *file,
 			    char *field_str, char *var_name)
 {
-	struct ftrace_event_field *field = NULL;
 	struct hist_field *hist_field;
 	unsigned long flags = 0;
 	char *token;
@@ -893,7 +1201,8 @@ static int create_val_field(struct hist_trigger_data *hist_data,
 		}
 		var_name = token;
 		flags |= HIST_FIELD_FL_VAR;
-	}
+	} else
+		field_str = token;
 
 	hist_field = parse_expr(hist_data, file, field_str, flags, var_name);
 	if (IS_ERR(hist_field)) {
@@ -902,15 +1211,10 @@ static int create_val_field(struct hist_trigger_data *hist_data,
 	}
 
 	if (!hist_field) {
-		field = parse_field(hist_data, file, field_str, &flags);
-		if (IS_ERR(field)) {
-			ret = PTR_ERR(field);
-			goto out;
-		}
-
-		hist_field = create_hist_field(field, flags, var_name);
-		if (!hist_field) {
-			ret = -ENOMEM;
+		hist_field = parse_atom(hist_data, file, field_str,
+					&flags, var_name);
+		if (IS_ERR(hist_field)) {
+			ret = PTR_ERR(hist_field);
 			goto out;
 		}
 	}
@@ -945,12 +1249,15 @@ static int create_val_fields(struct hist_trigger_data *hist_data,
 		field_str = strsep(&fields_str, ",");
 		if (!field_str)
 			break;
+
 		if (strcmp(field_str, "hitcount") == 0)
 			continue;
+
 		ret = create_val_field(hist_data, j++, file, field_str, NULL);
 		if (ret)
 			goto out;
 	}
+
 	if (fields_str && (strcmp(fields_str, "hitcount") != 0))
 		ret = -EINVAL;
  out:
@@ -963,7 +1270,6 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 			    struct trace_event_file *file,
 			    char *field_str)
 {
-	struct ftrace_event_field *field = NULL;
 	struct hist_field *hist_field;
 	unsigned long flags = 0;
 	unsigned int key_size;
@@ -978,11 +1284,15 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 	var_name = strsep(&field_str, "=");
 	if (field_str)
 		flags |= HIST_FIELD_FL_VAR;
+	else {
+		field_str = var_name;
+		var_name = NULL;
+	}
 
 	if (strcmp(field_str, "stacktrace") == 0) {
 		flags |= HIST_FIELD_FL_STACKTRACE;
 		key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
-		hist_field = create_hist_field(field, flags, var_name);
+		hist_field = create_hist_field(NULL, flags, var_name);
 	} else {
 		hist_field = parse_expr(hist_data, file, field_str, flags,
 					var_name);
@@ -992,26 +1302,21 @@ static int create_key_field(struct hist_trigger_data *hist_data,
 		}
 
 		if (!hist_field) {
-			field = parse_field(hist_data, file, field_str,
-					    &flags);
-			if (IS_ERR(field)) {
-				ret = PTR_ERR(field);
+			hist_field = parse_atom(hist_data, file, field_str,
+						&flags, var_name);
+			if (IS_ERR(hist_field)) {
+				ret = PTR_ERR(hist_field);
 				goto out;
 			}
+		}
 
-			hist_field = create_hist_field(field, flags, var_name);
-			if (!hist_field) {
-				ret = -ENOMEM;
-				goto out;
-			}
+		if (hist_field->flags & HIST_FIELD_FL_VAR_REF) {
+			destroy_hist_field(hist_field);
+			ret = -EINVAL;
+			goto out;
 		}
 
-		if (flags & HIST_FIELD_FL_TIMESTAMP)
-			key_size = sizeof(u64);
-		else if (is_string_field(field))
-			key_size = MAX_FILTER_STR_VAL;
-		else
-			key_size = field->size;
+		key_size = hist_field->size;
 	}
 
 	hist_data->fields[key_idx] = hist_field;
@@ -1054,16 +1359,20 @@ static int create_key_fields(struct hist_trigger_data *hist_data,
 		field_str = strsep(&fields_str, ",");
 		if (!field_str)
 			break;
+
 		ret = create_key_field(hist_data, i, key_offset,
 				       file, field_str);
 		if (ret < 0)
 			goto out;
+
 		key_offset += ret;
 	}
+
 	if (fields_str) {
 		ret = -EINVAL;
 		goto out;
 	}
+
 	ret = 0;
  out:
 	return ret;
@@ -1247,24 +1556,17 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
 
 		if (idx < 0)
 			return idx;
-	}
-
-	return 0;
-}
 
-static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
-{
-	struct hist_field *key_field;
-	unsigned int i;
-
-	for_each_hist_key_field(i, hist_data) {
-		key_field = hist_data->fields[i];
-
-		if (key_field->flags & HIST_FIELD_FL_EXECNAME)
-			return true;
+		if (hist_field->flags & HIST_FIELD_FL_VAR) {
+			idx = tracing_map_add_var(map);
+			if (idx < 0)
+				return idx;
+			hist_field->var_ref.idx = idx;
+			hist_data->n_vars++;
+		}
 	}
 
-	return false;
+	return 0;
 }
 
 static struct hist_trigger_data *
@@ -1290,8 +1592,7 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
 	if (ret)
 		goto free;
 
-	if (need_tracing_map_ops(hist_data))
-		map_ops = &hist_trigger_elt_comm_ops;
+	map_ops = &hist_trigger_elt_data_ops;
 
 	hist_data->map = tracing_map_create(map_bits, hist_data->key_size,
 					    map_ops, hist_data);
@@ -1324,22 +1625,37 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
 
 static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
 				    struct tracing_map_elt *elt, void *rec,
-				    struct ring_buffer_event *rbe)
+				    struct ring_buffer_event *rbe,
+				    u64 *var_ref_vals)
 {
 	struct hist_field *hist_field;
-	unsigned int i;
+	unsigned int i, var_idx;
 	u64 hist_val;
+	struct hist_elt_data *elt_data;
+
+	elt_data = elt->private_data;
+	elt_data->var_ref_vals = var_ref_vals;
 
 	for_each_hist_val_field(i, hist_data) {
 		hist_field = hist_data->fields[i];
-		hist_val = hist_field->fn(hist_field, rec, rbe);
+		hist_val = hist_field->fn(hist_field, elt, rbe, rec);
 		if (hist_field->flags & HIST_FIELD_FL_VAR) {
-			hist_field->var_val = hist_val;
+			var_idx = hist_field->var_ref.idx;
+			tracing_map_set_var(elt, var_idx, hist_val);
 			if (hist_field->flags & HIST_FIELD_FL_VAR_ONLY)
 				continue;
 		}
 		tracing_map_update_sum(elt, i, hist_val);
 	}
+
+	for_each_hist_key_field(i, hist_data) {
+		hist_field = hist_data->fields[i];
+		if (hist_field->flags & HIST_FIELD_FL_VAR) {
+			hist_val = hist_field->fn(hist_field, elt, rbe, rec);
+			var_idx = hist_field->var_ref.idx;
+			tracing_map_set_var(elt, var_idx, hist_val);
+		}
+	}
 }
 
 static inline void add_to_key(char *compound_key, void *key,
@@ -1372,10 +1688,11 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 	struct hist_trigger_data *hist_data = data->private_data;
 	bool use_compound_key = (hist_data->n_keys > 1);
 	unsigned long entries[HIST_STACKTRACE_DEPTH];
+	u64 var_ref_vals[TRACING_MAP_VARS_MAX];
 	char compound_key[HIST_KEY_SIZE_MAX];
 	struct stack_trace stacktrace;
 	struct hist_field *key_field;
-	struct tracing_map_elt *elt;
+	struct tracing_map_elt *elt = NULL;
 	u64 field_contents;
 	void *key = NULL;
 	unsigned int i;
@@ -1396,7 +1713,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 
 			key = entries;
 		} else {
-			field_contents = key_field->fn(key_field, rec, rbe);
+			field_contents = key_field->fn(key_field, elt, rbe, rec);
 			if (key_field->flags & HIST_FIELD_FL_STRING) {
 				key = (void *)(unsigned long)field_contents;
 				use_compound_key = true;
@@ -1405,19 +1722,20 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 		}
 		if (use_compound_key)
 			add_to_key(compound_key, key, key_field, rec);
-
-		if (key_field->flags & HIST_FIELD_FL_VAR)
-			key_field->var_val = (u64)key;
 	}
 
 	if (use_compound_key)
 		key = compound_key;
 
+	if (hist_data->n_var_refs &&
+	    !resolve_var_refs(hist_data, key, var_ref_vals))
+		return;
+
 	elt = tracing_map_insert(hist_data->map, key);
 	if (!elt)
 		return;
 
-	hist_trigger_elt_update(hist_data, elt, rec, rbe);
+	hist_trigger_elt_update(hist_data, elt, rec, rbe, var_ref_vals);
 }
 
 static void hist_trigger_stacktrace_print(struct seq_file *m,
@@ -1774,7 +2092,12 @@ static void event_hist_trigger_free(struct event_trigger_ops *ops,
 	if (!data->ref) {
 		if (data->name)
 			del_named_trigger(data);
+
 		trigger_data_free(data);
+
+		if (remove_hist_vars(hist_data))
+			return;
+
 		destroy_hist_data(hist_data);
 	}
 }
@@ -1992,16 +2315,47 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
 	update_cond_flag(file);
 
 	tracing_set_time_stamp_abs(file->tr, true);
+ out:
+	return ret;
+}
+
+static int hist_trigger_enable(struct event_trigger_data *data,
+			       struct trace_event_file *file)
+{
+	int ret = 0;
 
 	if (trace_event_trigger_enable_disable(file, 1) < 0) {
 		list_del_rcu(&data->list);
 		update_cond_flag(file);
 		ret--;
 	}
- out:
+
 	return ret;
 }
 
+static bool hist_trigger_check_refs(struct event_trigger_data *data,
+				    struct trace_event_file *file)
+{
+	struct hist_trigger_data *hist_data = data->private_data;
+	struct event_trigger_data *test, *named_data = NULL;
+
+	if (hist_data->attrs->name)
+		named_data = find_named_trigger(hist_data->attrs->name);
+
+	list_for_each_entry_rcu(test, &file->triggers, list) {
+		if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+			if (!hist_trigger_match(data, test, named_data, false))
+				continue;
+			hist_data = test->private_data;
+			if (check_var_refs(hist_data))
+				return true;
+			break;
+		}
+	}
+
+	return false;
+}
+
 static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
 				    struct event_trigger_data *data,
 				    struct trace_event_file *file)
@@ -2012,7 +2366,6 @@ static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
 
 	if (hist_data->attrs->name)
 		named_data = find_named_trigger(hist_data->attrs->name);
-
 	list_for_each_entry_rcu(test, &file->triggers, list) {
 		if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
 			if (!hist_trigger_match(data, test, named_data, false))
@@ -2100,6 +2453,11 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
 			goto out_free;
 	}
 
+	if (hist_trigger_check_refs(trigger_data, file)) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
 	if (glob[0] == '!') {
 		cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
 		ret = 0;
@@ -2119,16 +2477,28 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
 		goto out_free;
 	} else if (ret < 0)
 		goto out_free;
+
+	if (has_hist_vars(hist_data))
+		save_hist_vars(hist_data);
+
+	ret = tracing_map_init(hist_data->map);
+	if (ret)
+		goto out_unreg;
+
+	ret = hist_trigger_enable(trigger_data, file);
+	if (ret)
+		goto out_unreg;
+
 	/* Just return zero, not the number of registered triggers */
 	ret = 0;
  out:
 	return ret;
+ out_unreg:
+	cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
  out_free:
 	if (cmd_ops->set_filter)
 		cmd_ops->set_filter(NULL, trigger_data, NULL);
-
 	kfree(trigger_data);
-
 	destroy_hist_data(hist_data);
 	goto out;
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 15/21] tracing: Add usecs modifier for hist trigger timestamps
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (13 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 14/21] tracing: Add variable reference handling " Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 16/21] tracing: Add support for dynamic tracepoints Tom Zanussi
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Appending .usecs onto a common_timestamp field will cause the
timestamp value to be in microseconds instead of the default
nanoseconds.  A typical latency histogram using usecs would look like
this:

   # echo 'hist:keys=pid,prio:ts0=common_timestamp.usecs ...
   # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-ts0 ...

This also adds a trace_clock_in_ns() helper to trace.c (declared
extern in trace.h), so the conversion to microseconds is only applied
when the current trace clock actually counts in nanoseconds.
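
For reference, a minimal sketch of the intended conversion pattern,
mirroring what hist_field_timestamp() does below; the function name and
the 'tr', 'rbe' and 'ts_in_usecs' arguments are assumptions supplied
here only for illustration:

    /* sketch only, not part of the patch */
    static u64 hist_ts_in_units(struct trace_array *tr,
                                struct ring_buffer_event *rbe,
                                bool ts_in_usecs)
    {
            u64 ts = ring_buffer_event_time_stamp(rbe);

            /* only convert when the current clock counts in nanoseconds */
            if (ts_in_usecs && trace_clock_in_ns(tr))
                    ts = ns2usecs(ts);

            return ts;
    }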

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace.c             |  8 ++++++++
 kernel/trace/trace.h             |  2 ++
 kernel/trace/trace_events_hist.c | 33 ++++++++++++++++++++++++---------
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 78dff2f..f2a3b9c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1172,6 +1172,14 @@ unsigned long nsecs_to_usecs(unsigned long nsecs)
 	ARCH_TRACE_CLOCKS
 };
 
+bool trace_clock_in_ns(struct trace_array *tr)
+{
+	if (trace_clocks[tr->clock_id].in_ns)
+		return true;
+
+	return false;
+}
+
 /*
  * trace_parser_get_init - gets the buffer for trace parser
  */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f2af21b..1d0991b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -280,6 +280,8 @@ enum {
 
 extern int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs);
 
+extern bool trace_clock_in_ns(struct trace_array *tr);
+
 /*
  * The global tracer (top) should be the first trace array added,
  * but we check the flag anyway.
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c0a2ce8..13d67d3 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -163,14 +163,6 @@ static u64 hist_field_log2(struct hist_field *hist_field,
 	return (u64) ilog2(roundup_pow_of_two(val));
 }
 
-static u64 hist_field_timestamp(struct hist_field *hist_field,
-				struct tracing_map_elt *elt,
-				struct ring_buffer_event *rbe,
-				void *event)
-{
-	return ring_buffer_event_time_stamp(rbe);
-}
-
 #define DEFINE_HIST_FIELD_FN(type)					\
 	static u64 hist_field_##type(struct hist_field *hist_field,	\
 				     struct tracing_map_elt *elt,	\
@@ -223,6 +215,7 @@ enum hist_field_flags {
 	HIST_FIELD_FL_VAR_REF		= 4096,
 	HIST_FIELD_FL_EXPR		= 8192,
 	HIST_FIELD_FL_TIMESTAMP		= 16384,
+	HIST_FIELD_FL_TIMESTAMP_USECS	= 32768,
 };
 
 struct hist_trigger_attrs {
@@ -235,6 +228,7 @@ struct hist_trigger_attrs {
 	bool		pause;
 	bool		cont;
 	bool		clear;
+	bool		ts_in_usecs;
 	unsigned int	map_bits;
 };
 
@@ -255,6 +249,22 @@ struct hist_trigger_data {
 	bool				enable_timestamps;
 };
 
+static u64 hist_field_timestamp(struct hist_field *hist_field,
+				struct tracing_map_elt *elt,
+				struct ring_buffer_event *rbe,
+				void *event)
+{
+	struct hist_trigger_data *hist_data = elt->map->private_data;
+	struct trace_array *tr = hist_data->event_file->tr;
+
+	u64 ts = ring_buffer_event_time_stamp(rbe);
+
+	if (hist_data->attrs->ts_in_usecs && trace_clock_in_ns(tr))
+		ts = ns2usecs(ts);
+
+	return ts;
+}
+
 static LIST_HEAD(hist_var_list);
 
 struct hist_var_data {
@@ -620,7 +630,6 @@ static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
 
 	while (trigger_str) {
 		char *str = strsep(&trigger_str, ":");
-
 		if (strchr(str, '=')) {
 			ret = parse_assignment(str, attrs);
 			if (ret)
@@ -948,6 +957,8 @@ static struct hist_field *parse_var_ref(char *var_name)
 			*flags |= HIST_FIELD_FL_SYSCALL;
 		else if (strcmp(field_str, "log2") == 0)
 			*flags |= HIST_FIELD_FL_LOG2;
+		else if (strcmp(field_str, "usecs") == 0)
+			*flags |= HIST_FIELD_FL_TIMESTAMP_USECS;
 		else
 			return ERR_PTR(-EINVAL);
 	}
@@ -955,6 +966,8 @@ static struct hist_field *parse_var_ref(char *var_name)
 	if (strcmp(field_name, "common_timestamp") == 0) {
 		*flags |= HIST_FIELD_FL_TIMESTAMP;
 		hist_data->enable_timestamps = true;
+		if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
+			hist_data->attrs->ts_in_usecs = true;
 	} else {
 		field = trace_find_event_field(file->event_call, field_name);
 		if (!field)
@@ -1949,6 +1962,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
 		flags_str = "syscall";
 	else if (hist_field->flags & HIST_FIELD_FL_LOG2)
 		flags_str = "log2";
+	else if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP_USECS)
+		flags_str = "usecs";
 
 	return flags_str;
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 16/21] tracing: Add support for dynamic tracepoints
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (14 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 15/21] tracing: Add usecs modifier for hist trigger timestamps Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 17/21] tracing: Add hist trigger action hook Tom Zanussi
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

The tracepoint infrastructure assumes statically-defined tracepoints
and uses static_keys for tracepoint enablement.  In order to define
tracepoints on the fly, we need to have a dynamic counterpart.

Add a dynamic_tracepoint_probe_register() and a 'dynamic' param to
tracepoint_probe_unregister() for this purpose.  Dynamically-created
tracepoints can't rely on static_key patching, so their enablement is
tracked via the key's atomic 'enabled' count instead.
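
A minimal sketch of how a caller might use the new API follows; the
tracepoint pointer, the probe prototype and the data argument are
illustrative assumptions (the real user is the synthetic event code
added in a later patch):

    /* sketch only; probe prototype borrowed from the later synthetic use */
    static void my_dyn_probe(void *__data, u64 *vals, unsigned int idx)
    {
            /* consume the values handed to the dynamic tracepoint */
    }

    static int attach_dyn_probe(struct tracepoint *tp, void *data)
    {
            /* enablement tracked via tp->key.enabled, not a static_key */
            return dynamic_tracepoint_probe_register(tp, (void *)my_dyn_probe,
                                                     data);
    }

    static void detach_dyn_probe(struct tracepoint *tp, void *data)
    {
            tracepoint_probe_unregister(tp, (void *)my_dyn_probe, data, true);
    }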

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 include/linux/tracepoint.h  | 11 +++++++----
 kernel/trace/trace_events.c |  4 ++--
 kernel/tracepoint.c         | 42 ++++++++++++++++++++++++++++++------------
 3 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index f72fcfe..72438cb 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -37,9 +37,12 @@ struct trace_enum_map {
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
 tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data,
-			       int prio);
+			       int prio, bool dynamic);
+extern int dynamic_tracepoint_probe_register(struct tracepoint *tp,
+					     void *probe, void *data);
 extern int
-tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data);
+tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data,
+			    bool dynamic);
 extern void
 for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv),
 		void *priv);
@@ -206,13 +209,13 @@ static inline void tracepoint_synchronize_unregister(void)
 				   int prio)				\
 	{								\
 		return tracepoint_probe_register_prio(&__tracepoint_##name, \
-					      (void *)probe, data, prio); \
+				      (void *)probe, data, prio, false); \
 	}								\
 	static inline int						\
 	unregister_trace_##name(void (*probe)(data_proto), void *data)	\
 	{								\
 		return tracepoint_probe_unregister(&__tracepoint_##name,\
-						(void *)probe, data);	\
+					   (void *)probe, data, false); \
 	}								\
 	static inline void						\
 	check_trace_callback_type_##name(void (*cb)(data_proto))	\
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 9311654..abba9ac 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -297,7 +297,7 @@ int trace_event_reg(struct trace_event_call *call,
 	case TRACE_REG_UNREGISTER:
 		tracepoint_probe_unregister(call->tp,
 					    call->class->probe,
-					    file);
+					    file, false);
 		return 0;
 
 #ifdef CONFIG_PERF_EVENTS
@@ -308,7 +308,7 @@ int trace_event_reg(struct trace_event_call *call,
 	case TRACE_REG_PERF_UNREGISTER:
 		tracepoint_probe_unregister(call->tp,
 					    call->class->perf_probe,
-					    call);
+					    call, false);
 		return 0;
 	case TRACE_REG_PERF_OPEN:
 	case TRACE_REG_PERF_CLOSE:
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 1f9a31f..6c55edd 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -191,12 +191,15 @@ static void *func_remove(struct tracepoint_func **funcs,
  * Add the probe function to a tracepoint.
  */
 static int tracepoint_add_func(struct tracepoint *tp,
-			       struct tracepoint_func *func, int prio)
+			       struct tracepoint_func *func, int prio,
+			       bool dynamic)
 {
 	struct tracepoint_func *old, *tp_funcs;
 	int ret;
 
-	if (tp->regfunc && !static_key_enabled(&tp->key)) {
+	if (tp->regfunc &&
+	    ((dynamic && !(atomic_read(&tp->key.enabled) > 0)) ||
+	     !static_key_enabled(&tp->key))) {
 		ret = tp->regfunc();
 		if (ret < 0)
 			return ret;
@@ -218,7 +221,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
 	 * is used.
 	 */
 	rcu_assign_pointer(tp->funcs, tp_funcs);
-	if (!static_key_enabled(&tp->key))
+	if (dynamic && !(atomic_read(&tp->key.enabled) > 0))
+		atomic_inc(&tp->key.enabled);
+	else if (!dynamic && !static_key_enabled(&tp->key))
 		static_key_slow_inc(&tp->key);
 	release_probes(old);
 	return 0;
@@ -231,7 +236,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
  * by preempt_disable around the call site.
  */
 static int tracepoint_remove_func(struct tracepoint *tp,
-		struct tracepoint_func *func)
+				  struct tracepoint_func *func, bool dynamic)
 {
 	struct tracepoint_func *old, *tp_funcs;
 
@@ -245,10 +250,14 @@ static int tracepoint_remove_func(struct tracepoint *tp,
 
 	if (!tp_funcs) {
 		/* Removed last function */
-		if (tp->unregfunc && static_key_enabled(&tp->key))
+		if (tp->unregfunc &&
+		    ((dynamic && (atomic_read(&tp->key.enabled) > 0)) ||
+		     static_key_enabled(&tp->key)))
 			tp->unregfunc();
 
-		if (static_key_enabled(&tp->key))
+		if (dynamic && (atomic_read(&tp->key.enabled) > 0))
+			atomic_dec(&tp->key.enabled);
+		else if (!dynamic && static_key_enabled(&tp->key))
 			static_key_slow_dec(&tp->key);
 	}
 	rcu_assign_pointer(tp->funcs, tp_funcs);
@@ -257,7 +266,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
 }
 
 /**
- * tracepoint_probe_register -  Connect a probe to a tracepoint
+ * tracepoint_probe_register_prio -  Connect a probe to a tracepoint
  * @tp: tracepoint
  * @probe: probe handler
  * @data: tracepoint data
@@ -270,7 +279,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
  * within module exit functions.
  */
 int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
-				   void *data, int prio)
+				   void *data, int prio, bool dynamic)
 {
 	struct tracepoint_func tp_func;
 	int ret;
@@ -279,7 +288,7 @@ int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
 	tp_func.func = probe;
 	tp_func.data = data;
 	tp_func.prio = prio;
-	ret = tracepoint_add_func(tp, &tp_func, prio);
+	ret = tracepoint_add_func(tp, &tp_func, prio, dynamic);
 	mutex_unlock(&tracepoints_mutex);
 	return ret;
 }
@@ -300,10 +309,18 @@ int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
  */
 int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data)
 {
-	return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO);
+	return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO, false);
 }
 EXPORT_SYMBOL_GPL(tracepoint_probe_register);
 
+int dynamic_tracepoint_probe_register(struct tracepoint *tp, void *probe,
+				      void *data)
+{
+	return tracepoint_probe_register_prio(tp, probe, data,
+					      TRACEPOINT_DEFAULT_PRIO, true);
+}
+EXPORT_SYMBOL_GPL(dynamic_tracepoint_probe_register);
+
 /**
  * tracepoint_probe_unregister -  Disconnect a probe from a tracepoint
  * @tp: tracepoint
@@ -312,7 +329,8 @@ int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data)
  *
  * Returns 0 if ok, error value on error.
  */
-int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data)
+int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data,
+				bool dynamic)
 {
 	struct tracepoint_func tp_func;
 	int ret;
@@ -320,7 +338,7 @@ int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data)
 	mutex_lock(&tracepoints_mutex);
 	tp_func.func = probe;
 	tp_func.data = data;
-	ret = tracepoint_remove_func(tp, &tp_func);
+	ret = tracepoint_remove_func(tp, &tp_func, dynamic);
 	mutex_unlock(&tracepoints_mutex);
 	return ret;
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 17/21] tracing: Add hist trigger action hook
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (15 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 16/21] tracing: Add support for dynamic tracepoints Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 18/21] tracing: Add support for 'synthetic' events Tom Zanussi
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add a hook for executing extra actions whenever a histogram entry is
added or updated.

The default 'action' when a hist entry is added to a histogram is to
update the set of values associated with it.  Some applications may
want to perform additional actions at that point, such as generating
another event or comparing and saving a maximum.

Add a simple framework for doing that; specific actions will be
implemented on top of it in later patches.
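
To illustrate the shape of the hook, the sketch below shows what an
action plugged into it looks like; the callback body and the
registration helper are placeholders, not code from this patch (real
actions arrive in later patches):

    /* sketch only: an action_fn_t-compatible callback */
    static void example_action(struct hist_trigger_data *hist_data,
                               struct tracing_map_elt *elt, void *rec,
                               struct ring_buffer_event *rbe,
                               struct action_data *data, u64 *var_ref_vals)
    {
            /* e.g. generate another event or compare and save a maximum */
    }

    /* sketch only: how an action would be attached to a hist trigger */
    static int attach_example_action(struct hist_trigger_data *hist_data)
    {
            struct action_data *data;

            if (hist_data->n_actions == HIST_ACTIONS_MAX)
                    return -EINVAL;

            data = kzalloc(sizeof(*data), GFP_KERNEL);
            if (!data)
                    return -ENOMEM;

            data->fn = example_action;
            hist_data->actions[hist_data->n_actions++] = data;

            return 0;
    }

Every action attached this way is then run by hist_trigger_actions()
whenever the corresponding histogram entry is updated.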

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 155 +++++++++++++++++++++++++++++++++++----
 1 file changed, 139 insertions(+), 16 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 13d67d3..0147141 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -32,6 +32,7 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field,
 
 #define HIST_FIELD_OPERANDS_MAX	2
 #define HIST_ASSIGNMENT_MAX	4
+#define HIST_ACTIONS_MAX	8
 
 enum field_op_id {
 	FIELD_OP_NONE,
@@ -60,6 +61,7 @@ struct hist_field {
 	char				*name;
 	u64				var_val;
 	unsigned int			var_idx;
+	bool                            read_once;
 };
 
 static u64 hist_field_none(struct hist_field *field,
@@ -230,6 +232,8 @@ struct hist_trigger_attrs {
 	bool		clear;
 	bool		ts_in_usecs;
 	unsigned int	map_bits;
+	char		*action_str[HIST_ACTIONS_MAX];
+	unsigned int	n_actions;
 };
 
 struct hist_trigger_data {
@@ -247,6 +251,8 @@ struct hist_trigger_data {
 	struct hist_trigger_attrs	*attrs;
 	struct tracing_map		*map;
 	bool				enable_timestamps;
+	struct action_data		*actions[HIST_ACTIONS_MAX];
+	unsigned int			n_actions;
 };
 
 static u64 hist_field_timestamp(struct hist_field *hist_field,
@@ -452,7 +458,7 @@ static u64 hist_field_var_ref(struct hist_field *hist_field,
 
 static bool resolve_var_refs(struct hist_trigger_data *hist_data,
 			     void *key,
-			     u64 *var_ref_vals)
+			     u64 *var_ref_vals, bool self)
 {
 	struct hist_trigger_data *var_data;
 	struct tracing_map_elt *var_elt;
@@ -466,17 +472,26 @@ static bool resolve_var_refs(struct hist_trigger_data *hist_data,
 		var_idx = hist_field->var_ref.idx;
 		var_data = hist_field->var_ref.hist_data;
 
+		if ((self && var_data != hist_data) ||
+		    (!self && var_data == hist_data))
+			continue;
+
 		var_elt = tracing_map_lookup(var_data->map, key);
 		if (!var_elt) {
 			resolved = false;
 			break;
 		}
+
 		if (!tracing_map_var_set(var_elt, var_idx)) {
 			resolved = false;
 			break;
 		}
 
-		var_val = tracing_map_read_var(var_elt, var_idx);
+		if (self || !hist_field->read_once)
+			var_val = tracing_map_read_var(var_elt, var_idx);
+		else
+			var_val = tracing_map_read_var_once(var_elt, var_idx);
+
 		var_ref_vals[i] = var_val;
 	}
 
@@ -569,6 +584,9 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
 	for (i = 0; i < attrs->n_assignments; i++)
 		kfree(attrs->assignment_str[i]);
 
+	for (i = 0; i < attrs->n_actions; i++)
+		kfree(attrs->action_str[i]);
+
 	kfree(attrs->name);
 	kfree(attrs->sort_key_str);
 	kfree(attrs->keys_str);
@@ -576,6 +594,16 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
 	kfree(attrs);
 }
 
+static int parse_action(char *str, struct hist_trigger_attrs *attrs)
+{
+	int ret = 0;
+
+	if (attrs->n_actions == HIST_ACTIONS_MAX)
+		return -EINVAL;
+
+	return ret;
+}
+
 static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
 {
 	int ret = 0;
@@ -642,8 +670,13 @@ static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
 		else if (strcmp(str, "clear") == 0)
 			attrs->clear = true;
 		else {
-			ret = -EINVAL;
-			goto free;
+			ret = parse_action(str, attrs);
+			if (ret < 0)
+				goto free;
+			if (!ret) {
+				ret = -EINVAL;
+				goto free;
+			}
 		}
 	}
 
@@ -910,6 +943,18 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
 	}
 }
 
+struct action_data;
+
+typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
+			     struct tracing_map_elt *elt, void *rec,
+			     struct ring_buffer_event *rbe,
+			     struct action_data *data, u64 *var_ref_vals);
+
+struct action_data {
+	action_fn_t	fn;
+	unsigned int	var_ref_idx;
+};
+
 static struct hist_field *parse_var_ref(char *var_name)
 {
 	struct hist_var_data *var_data;
@@ -1150,6 +1195,9 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
 		goto free;
 	}
 
+	operand1->read_once = true;
+	operand2->read_once = true;
+
 	expr->operands[0] = operand1;
 	expr->operands[1] = operand2;
 	expr->operator = field_op;
@@ -1529,14 +1577,6 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
 	return ret;
 }
 
-static void destroy_hist_data(struct hist_trigger_data *hist_data)
-{
-	destroy_hist_trigger_attrs(hist_data->attrs);
-	destroy_hist_fields(hist_data);
-	tracing_map_destroy(hist_data->map);
-	kfree(hist_data);
-}
-
 static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
 {
 	struct tracing_map *map = hist_data->map;
@@ -1582,6 +1622,63 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
 	return 0;
 }
 
+static void destroy_actions(struct hist_trigger_data *hist_data)
+{
+	unsigned int i;
+
+	for (i = 0; i < hist_data->n_actions; i++) {
+		struct action_data *data = hist_data->actions[i];
+
+		kfree(data);
+	}
+}
+
+static int create_actions(struct hist_trigger_data *hist_data)
+{
+	unsigned int i;
+	int ret = 0;
+	char *str;
+
+	for (i = 0; i < hist_data->attrs->n_actions; i++) {
+		str = hist_data->attrs->action_str[i];
+	}
+
+	return ret;
+}
+
+static void print_actions(struct seq_file *m,
+			  struct hist_trigger_data *hist_data,
+			  struct tracing_map_elt *elt)
+{
+	unsigned int i;
+
+	for (i = 0; i < hist_data->n_actions; i++) {
+		struct action_data *data = hist_data->actions[i];
+	}
+}
+
+static void print_actions_spec(struct seq_file *m,
+			       struct hist_trigger_data *hist_data)
+{
+	unsigned int i;
+
+	for (i = 0; i < hist_data->n_actions; i++) {
+		struct action_data *data = hist_data->actions[i];
+	}
+}
+
+static void destroy_hist_data(struct hist_trigger_data *hist_data)
+{
+	if (!hist_data)
+		return;
+
+	destroy_hist_trigger_attrs(hist_data->attrs);
+	destroy_hist_fields(hist_data);
+	tracing_map_destroy(hist_data->map);
+	destroy_actions(hist_data);
+	kfree(hist_data);
+}
+
 static struct hist_trigger_data *
 create_hist_data(unsigned int map_bits,
 		 struct hist_trigger_attrs *attrs,
@@ -1695,6 +1792,20 @@ static inline void add_to_key(char *compound_key, void *key,
 	memcpy(compound_key + key_field->offset, key, size);
 }
 
+static void
+hist_trigger_actions(struct hist_trigger_data *hist_data,
+		     struct tracing_map_elt *elt, void *rec,
+		     struct ring_buffer_event *rbe, u64 *var_ref_vals)
+{
+	struct action_data *data;
+	unsigned int i;
+
+	for (i = 0; i < hist_data->n_actions; i++) {
+		data = hist_data->actions[i];
+		data->fn(hist_data, elt, rec, rbe, data, var_ref_vals);
+	}
+}
+
 static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 			       struct ring_buffer_event *rbe)
 {
@@ -1741,7 +1852,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 		key = compound_key;
 
 	if (hist_data->n_var_refs &&
-	    !resolve_var_refs(hist_data, key, var_ref_vals))
+	    !resolve_var_refs(hist_data, key, var_ref_vals, false))
 		return;
 
 	elt = tracing_map_insert(hist_data->map, key);
@@ -1749,6 +1860,9 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
 		return;
 
 	hist_trigger_elt_update(hist_data, elt, rec, rbe, var_ref_vals);
+
+	if (resolve_var_refs(hist_data, key, var_ref_vals, true))
+		hist_trigger_actions(hist_data, elt, rec, rbe, var_ref_vals);
 }
 
 static void hist_trigger_stacktrace_print(struct seq_file *m,
@@ -2069,6 +2183,8 @@ static int event_hist_trigger_print(struct seq_file *m,
 	}
 	seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));
 
+	print_actions_spec(m, hist_data);
+
 	if (data->filter_str)
 		seq_printf(m, " if %s", data->filter_str);
 
@@ -2421,6 +2537,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
 	struct hist_trigger_attrs *attrs;
 	struct event_trigger_ops *trigger_ops;
 	struct hist_trigger_data *hist_data;
+	bool unreg_self = false;
 	char *trigger;
 	int ret = 0;
 
@@ -2496,6 +2613,10 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
 	if (has_hist_vars(hist_data))
 		save_hist_vars(hist_data);
 
+	ret = create_actions(hist_data);
+	if (ret)
+		goto out_unreg;
+
 	ret = tracing_map_init(hist_data->map);
 	if (ret)
 		goto out_unreg;
@@ -2503,18 +2624,20 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
 	ret = hist_trigger_enable(trigger_data, file);
 	if (ret)
 		goto out_unreg;
-
 	/* Just return zero, not the number of registered triggers */
 	ret = 0;
  out:
 	return ret;
  out_unreg:
 	cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+	unreg_self = true;
  out_free:
 	if (cmd_ops->set_filter)
 		cmd_ops->set_filter(NULL, trigger_data, NULL);
-	kfree(trigger_data);
-	destroy_hist_data(hist_data);
+	if (!unreg_self) {
+		kfree(trigger_data);
+		destroy_hist_data(hist_data);
+	}
 	goto out;
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 18/21] tracing: Add support for 'synthetic' events
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (16 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 17/21] tracing: Add hist trigger action hook Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 19/21] tracing: Add 'onmatch' hist trigger action support Tom Zanussi
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Synthetic events are user-defined events generated from hist trigger
variables saved from one or more other events.

To define a synthetic event, the user writes a simple specification
consisting of the name of the new event along with one or more
variables defined on other events, to the tracing/synthetic_events
file.

For instance, the following creates a new event named 'wakeup_latency'
with 3 fields: lat, pid, and prio.  Each of those fields is simply a
reference to a variable defined on another event:

    # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
      pid=sched_switch:woken_pid prio=sched_switch:woken_prio' >> \
      /sys/kernel/debug/tracing/synthetic_events

Reading the tracing/synthetic_events file lists all the
currently-defined synthetic events, in this case the event we defined
above:

    # cat /sys/kernel/debug/tracing/synthetic_events
    wakeup_latency lat=sched_switch:wakeup_lat*, pid=sched_switch:woken_pid*, \
                   prio=sched_switch:woken_prio*

Any event field that hasn't yet been 'resolved' is shown with an
asterisk following it.  A field remains unresolved as long as no other
event has defined the variable it refers to.  Once we add the hist
triggers below, those variables are defined:

    # echo 'hist:keys=pid,prio:ts0=common_timestamp.usecs' >> \
      /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger

    # echo 'hist:keys=woken_pid=next_pid,woken_prio=next_prio:\
      wakeup_lat=common_timestamp.usecs-ts0:' >> \
      /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

At that point the variables are defined, and reading the
synthetic_events file no longer shows the asterisks:

    # cat /sys/kernel/debug/tracing/synthetic_events
    wakeup_latency lat=sched:sched_switch:wakeup_lat, \
    pid=sched:sched_switch:woken_pid, prio=sched:sched_switch:woken_prio

At this point, the synthetic event is ready to use, and a histogram
can be defined using it:

    # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
    /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

The new event is created under the tracing/events/synthetic/ directory
and looks and behaves just like any other event:

    # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
      enable  filter  format  hist  id  trigger

Although a histogram can be defined for it, nothing will be recorded
until something, such as a hist trigger action, actually generates the
event via the trace_synthetic() function.  trace_synthetic() is very
similar to all the other trace_* invocations spread throughout the
kernel, except that in this case the trace_ function and its
corresponding tracepoint aren't statically generated but are defined
by the user at run-time.

How this can be automatically hooked up via a hist trigger 'action' is
discussed in a subsequent patch.
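
For orientation, the sketch below shows what generating the event
looks like from the kernel side; the caller here is hypothetical (the
'onmatch' action in a later patch is the real one), but the
trace_synthetic() signature is the one added by this patch:

    /* sketch only: hypothetical caller of a user-defined event */
    static void emit_synthetic(struct synthetic_event *event,
                               u64 *var_ref_vals, unsigned int var_ref_idx)
    {
            /*
             * var_ref_vals[var_ref_idx..] hold the resolved variable
             * values in the synthetic event's field order; the registered
             * probe copies each one into a u64 field of the new event.
             */
            trace_synthetic(event, var_ref_vals, var_ref_idx);
    }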

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 822 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 814 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0147141..46da09f 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -19,10 +19,13 @@
 #include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/stacktrace.h>
+#include <linux/tracefs.h>
 
 #include "tracing_map.h"
 #include "trace.h"
 
+#define SYNTHETIC_EVENT_SYSTEM "synthetic"
+
 struct hist_field;
 
 typedef u64 (*hist_field_fn_t) (struct hist_field *field,
@@ -44,6 +47,10 @@ enum field_op_id {
 struct hist_var_ref {
 	struct hist_trigger_data	*hist_data;
 	unsigned int			idx;
+	bool				pending;
+	char				*pending_system;
+	char				*pending_event_name;
+	char				*pending_var_name;
 };
 
 struct hist_field {
@@ -419,7 +426,9 @@ static int remove_hist_vars(struct hist_trigger_data *hist_data)
 }
 
 static struct hist_field *find_var_field(struct hist_trigger_data *hist_data,
-					 char *var_name)
+					 const char *system,
+					 const char *event_name,
+					 const char *var_name)
 {
 	struct hist_field *hist_field, *found = NULL;
 	int i;
@@ -943,6 +952,22 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
 	}
 }
 
+struct synthetic_event_field {
+	char *name;
+	struct hist_field *var_ref;
+};
+
+struct synthetic_event {
+	struct list_head		list;
+	char				*name;
+	struct synthetic_event_field	*fields;
+	unsigned int			n_fields;
+	u64				*var_ref_vals;
+	struct trace_event_class	class;
+	struct trace_event_call		call;
+	struct tracepoint		*tp;
+};
+
 struct action_data;
 
 typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
@@ -953,15 +978,158 @@ typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
 struct action_data {
 	action_fn_t	fn;
 	unsigned int	var_ref_idx;
+	struct synthetic_event *synthetic_event;
 };
 
-static struct hist_field *parse_var_ref(char *var_name)
+static LIST_HEAD(synthetic_events_list);
+static DEFINE_MUTEX(synthetic_event_mutex);
+
+static void free_synthetic_tracepoint(struct tracepoint *tp)
+{
+	if (!tp)
+		return;
+
+	kfree(tp->name);
+	kfree(tp);
+}
+
+static struct tracepoint *alloc_synthetic_tracepoint(char *name)
+{
+	struct tracepoint *tp;
+	int ret = 0;
+
+	tp = kzalloc(sizeof(*tp), GFP_KERNEL);
+	if (!tp) {
+		ret = -ENOMEM;
+		goto free;
+	}
+
+	tp->name = kstrdup(name, GFP_KERNEL);
+	if (!tp->name) {
+		ret = -ENOMEM;
+		goto free;
+	}
+
+	return tp;
+ free:
+	free_synthetic_tracepoint(tp);
+
+	return ERR_PTR(ret);
+}
+
+static inline void trace_synthetic(struct synthetic_event *event,
+				   u64 *var_ref_vals,
+				   unsigned int var_ref_idx)
+{
+	struct tracepoint *tp = event->tp;
+
+	if (unlikely(atomic_read(&tp->key.enabled) > 0)) {
+		struct tracepoint_func *it_func_ptr;
+		void *it_func;
+		void *__data;
+
+		if (!(cpu_online(raw_smp_processor_id())))
+			return;
+
+		it_func_ptr = rcu_dereference_sched((tp)->funcs);
+		if (it_func_ptr) {
+			do {
+				it_func = (it_func_ptr)->func;
+				__data = (it_func_ptr)->data;
+				((void(*)(void *__data, u64* var_ref_vals, unsigned int var_ref_idx))(it_func))(__data, var_ref_vals, var_ref_idx);
+			} while ((++it_func_ptr)->func);
+		}
+	}
+}
+
+static struct synthetic_event *find_synthetic_event(char *name);
+
+static void reset_pending_var_refs(struct hist_trigger_data *hist_data,
+				   struct synthetic_event *event)
+{
+	const char *system, *event_name, *pending_system, *pending_event_name;
+	struct synthetic_event_field *se_field;
+	struct trace_event_call *call;
+	struct hist_field *ref_field;
+	unsigned int i;
+
+	call = hist_data->event_file->event_call;
+	system = call->class->system;
+	event_name = trace_event_name(call);
+
+	for (i = 0; i < event->n_fields; i++) {
+		se_field = &event->fields[i];
+		ref_field = se_field->var_ref;
+
+		pending_system = ref_field->var_ref.pending_system;
+		if ((pending_system) && (strcmp(system, pending_system) != 0))
+			continue;
+
+		pending_event_name = ref_field->var_ref.pending_event_name;
+		if (pending_event_name &&
+		    (strcmp(event_name, pending_event_name) == 0))
+			ref_field->var_ref.pending = true;
+	}
+}
+
+static void unresolve_pending_var_refs(struct hist_trigger_data *hist_data)
+{
+	struct synthetic_event *event;
+
+	mutex_lock(&synthetic_event_mutex);
+	list_for_each_entry(event, &synthetic_events_list, list)
+		reset_pending_var_refs(hist_data, event);
+	mutex_unlock(&synthetic_event_mutex);
+}
+
+static bool resolve_pending_var_refs(struct synthetic_event *event)
+{
+	struct hist_var_data *var_data;
+	struct hist_field *var_field = NULL, *ref_field = NULL;
+	struct synthetic_event_field *se_field;
+	char *system, *event_name, *var_name;
+	bool pending = false;
+	unsigned int i;
+
+	for (i = 0; i < event->n_fields; i++) {
+		se_field = &event->fields[i];
+		ref_field = se_field->var_ref;
+		if (!ref_field->var_ref.pending)
+			continue;
+
+		pending = true;
+
+		system = ref_field->var_ref.pending_system;
+		event_name = ref_field->var_ref.pending_event_name;
+		var_name = ref_field->var_ref.pending_var_name;
+
+		list_for_each_entry(var_data, &hist_var_list, list) {
+			var_field = find_var_field(var_data->hist_data, system,
+						   event_name, var_name);
+			if (!var_field)
+				continue;
+
+			ref_field->var_ref.idx = var_field->var_ref.idx;
+			ref_field->var_ref.hist_data = var_data->hist_data;
+			if (!ref_field->name)
+				ref_field->name = kstrdup(var_field->var_name, GFP_KERNEL);
+			ref_field->var_ref.pending = false;
+			pending = false;
+		}
+	}
+
+	return !pending;
+}
+
+static struct hist_field *parse_var_ref(char *system, char *event_name,
+					char *var_name, bool defer)
 {
 	struct hist_var_data *var_data;
 	struct hist_field *var_field = NULL, *ref_field = NULL;
 
 	list_for_each_entry(var_data, &hist_var_list, list) {
-		var_field = find_var_field(var_data->hist_data, var_name);
+		var_field = find_var_field(var_data->hist_data, system,
+					   event_name, var_name);
 		if (var_field)
 			break;
 	}
@@ -975,6 +1143,25 @@ static struct hist_field *parse_var_ref(char *var_name)
 			ref_field->var_ref.hist_data = var_data->hist_data;
 			ref_field->name = kstrdup(var_field->var_name, GFP_KERNEL);
 		}
+	} else if (defer) {
+		unsigned long flags = HIST_FIELD_FL_VAR_REF;
+
+		ref_field = create_hist_field(NULL, flags, NULL);
+		if (ref_field) {
+			char *str;
+
+			ref_field->var_ref.pending = true;
+			if (system) {
+				str = kstrdup(system, GFP_KERNEL);
+				ref_field->var_ref.pending_system = str;
+			}
+			if (event_name) {
+				str = kstrdup(event_name, GFP_KERNEL);
+				ref_field->var_ref.pending_event_name = str;
+			}
+			str = kstrdup(var_name, GFP_KERNEL);
+			ref_field->var_ref.pending_var_name = str;
+		}
 	}
 
 	return ref_field;
@@ -1030,7 +1217,7 @@ struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
 	struct hist_field *hist_field = NULL;
 	int ret = 0;
 
-	hist_field = parse_var_ref(str);
+	hist_field = parse_var_ref(NULL, NULL, str, false);
 	if (hist_field) {
 		hist_data->var_refs[hist_data->n_var_refs] = hist_field;
 		hist_field->var_ref_idx = hist_data->n_var_refs++;
@@ -1622,6 +1809,20 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
 	return 0;
 }
 
+static int add_synthetic_var_refs(struct hist_trigger_data *hist_data,
+				  struct synthetic_event *event)
+{
+	unsigned int i, var_ref_idx = hist_data->n_var_refs;
+
+	for (i = 0; i < event->n_fields; i++) {
+		struct hist_field *var_ref = event->fields[i].var_ref;
+
+		hist_data->var_refs[hist_data->n_var_refs++] = var_ref;
+	}
+
+	return var_ref_idx;
+}
+
 static void destroy_actions(struct hist_trigger_data *hist_data)
 {
 	unsigned int i;
@@ -1716,10 +1917,6 @@ static void destroy_hist_data(struct hist_trigger_data *hist_data)
 	if (ret)
 		goto free;
 
-	ret = tracing_map_init(hist_data->map);
-	if (ret)
-		goto free;
-
 	hist_data->event_file = file;
  out:
 	return hist_data;
@@ -2226,6 +2423,8 @@ static void event_hist_trigger_free(struct event_trigger_ops *ops,
 
 		trigger_data_free(data);
 
+		unresolve_pending_var_refs(hist_data);
+
 		if (remove_hist_vars(hist_data))
 			return;
 
@@ -2795,3 +2994,610 @@ __init int register_trigger_hist_enable_disable_cmds(void)
 
 	return ret;
 }
+
+static void free_synthetic_event_field(struct synthetic_event_field *field)
+{
+	if (field->var_ref->var_ref.pending)
+		destroy_hist_field(field->var_ref);
+	kfree(field->name);
+}
+
+static void free_synthetic_event_print_fmt(struct trace_event_call *call)
+{
+	kfree(call->print_fmt);
+}
+
+static void free_synthetic_event(struct synthetic_event *event)
+{
+	unsigned int i;
+
+	if (!event)
+		return;
+
+	for (i = 0; i < event->n_fields; i++)
+		free_synthetic_event_field(&event->fields[i]);
+
+	kfree(event->fields);
+	kfree(event->name);
+
+	kfree(event->class.system);
+	free_synthetic_tracepoint(event->tp);
+	free_synthetic_event_print_fmt(&event->call);
+
+	kfree(event);
+}
+
+static struct synthetic_event *alloc_synthetic_event(char *event_name,
+						     int n_fields)
+{
+	struct synthetic_event *event;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event) {
+		event = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	event->name = kstrdup(event_name, GFP_KERNEL);
+	if (!event->name) {
+		kfree(event);
+		event = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	event->n_fields = n_fields;
+	event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL);
+	if (!event->fields)
+		goto err;
+ out:
+	return event;
+ err:
+	free_synthetic_event(event);
+	event = NULL;
+	goto out;
+}
+
+static struct synthetic_event *find_synthetic_event(char *name)
+{
+	struct synthetic_event *event, *found = NULL;
+
+	mutex_lock(&synthetic_event_mutex);
+
+	list_for_each_entry(event, &synthetic_events_list, list) {
+		if (strcmp(event->name, name) == 0) {
+			found = event;
+			goto out;
+		}
+	}
+ out:
+	mutex_unlock(&synthetic_event_mutex);
+
+	return found;
+}
+
+struct synthetic_trace_event {
+	struct trace_entry	ent;
+	int			n_fields;
+	u64			fields[];
+};
+
+static int synthetic_event_define_fields(struct trace_event_call *call)
+{
+	struct synthetic_event *event = call->data;
+	struct synthetic_trace_event trace;
+	unsigned int i;
+	int ret = 0;
+	int offset = offsetof(typeof(trace), fields);
+
+	for (i = 0; i < event->n_fields; i++) {
+		ret = trace_define_field(call, "u64", event->fields[i].name,
+					 offset, sizeof(u64), 0, FILTER_OTHER);
+		offset += sizeof(u64);
+	}
+
+	return ret;
+}
+
+static enum print_line_t
+print_synthetic_event(struct trace_iterator *iter, int flags,
+		      struct trace_event *event)
+{
+	struct trace_array *tr = iter->tr;
+	struct trace_seq *s = &iter->seq;
+	struct synthetic_trace_event *entry;
+	struct synthetic_event *se;
+	unsigned int i;
+
+	entry = (struct synthetic_trace_event *)iter->ent;
+	se = container_of(event, struct synthetic_event, call.event);
+
+	trace_seq_printf(s, "%s: ", se->name);
+
+	for (i = 0; i < entry->n_fields; i++) {
+		if (trace_seq_has_overflowed(s))
+			goto end;
+
+		/* parameter types */
+		if (tr->trace_flags & TRACE_ITER_VERBOSE)
+			trace_seq_printf(s, "%s ", "u64");
+
+		/* parameter values */
+		trace_seq_printf(s, "%s=%llu%s", se->fields[i].name,
+				 entry->fields[i],
+				 i == entry->n_fields - 1 ? "" : ", ");
+	}
+end:
+	trace_seq_putc(s, '\n');
+
+	return trace_handle_return(s);
+}
+
+static struct trace_event_functions synthetic_event_funcs = {
+	.trace		= print_synthetic_event
+};
+
+static notrace void
+trace_event_raw_event_synthetic(void *__data,
+				u64 *var_ref_vals,
+				unsigned int var_ref_idx)
+{
+	struct trace_event_file *trace_file = __data;
+	struct synthetic_trace_event *entry;
+	struct trace_event_buffer fbuffer;
+	int fields_size;
+	unsigned int i;
+
+	struct synthetic_event *event;
+
+	event = trace_file->event_call->data;
+
+	if (trace_trigger_soft_disabled(trace_file))
+		return;
+
+	fields_size = event->n_fields * sizeof(u64);
+
+	entry = trace_event_buffer_reserve(&fbuffer, trace_file,
+					   sizeof(*entry) + fields_size);
+	if (!entry)
+		return;
+
+	entry->n_fields = event->n_fields;
+
+	for (i = 0; i < event->n_fields; i++)
+		entry->fields[i] = var_ref_vals[var_ref_idx + i];
+
+	trace_event_buffer_commit(&fbuffer);
+}
+
+static int __set_synthetic_event_print_fmt(struct synthetic_event *event,
+					   char *buf, int len)
+{
+	int pos = 0;
+	int i;
+
+	/* When len=0, we just calculate the needed length */
+#define LEN_OR_ZERO (len ? len - pos : 0)
+
+	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
+	for (i = 0; i < event->n_fields; i++) {
+		pos += snprintf(buf + pos, LEN_OR_ZERO, "%s: 0x%%0%zulx%s",
+				event->fields[i].name, sizeof(u64),
+				i == event->n_fields - 1 ? "" : ", ");
+	}
+	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
+
+	for (i = 0; i < event->n_fields; i++) {
+		pos += snprintf(buf + pos, LEN_OR_ZERO,
+				", ((u64)(REC->%s))", event->fields[i].name);
+	}
+
+#undef LEN_OR_ZERO
+
+	/* return the length of print_fmt */
+	return pos;
+}
+
+static int set_synthetic_event_print_fmt(struct trace_event_call *call)
+{
+	struct synthetic_event *event = call->data;
+	char *print_fmt;
+	int len;
+
+	/* First: called with 0 length to calculate the needed length */
+	len = __set_synthetic_event_print_fmt(event, NULL, 0);
+
+	print_fmt = kmalloc(len + 1, GFP_KERNEL);
+	if (!print_fmt)
+		return -ENOMEM;
+
+	/* Second: actually write the @print_fmt */
+	__set_synthetic_event_print_fmt(event, print_fmt, len + 1);
+	call->print_fmt = print_fmt;
+
+	return 0;
+}
+
+int dynamic_trace_event_reg(struct trace_event_call *call,
+			    enum trace_reg type, void *data)
+{
+	struct trace_event_file *file = data;
+
+	WARN_ON(!(call->flags & TRACE_EVENT_FL_TRACEPOINT));
+	switch (type) {
+	case TRACE_REG_REGISTER:
+		return dynamic_tracepoint_probe_register(call->tp,
+							 call->class->probe,
+							 file);
+	case TRACE_REG_UNREGISTER:
+		tracepoint_probe_unregister(call->tp,
+					    call->class->probe,
+					    file, true);
+		return 0;
+
+#ifdef CONFIG_PERF_EVENTS
+	case TRACE_REG_PERF_REGISTER:
+		return dynamic_tracepoint_probe_register(call->tp,
+							 call->class->perf_probe,
+							 call);
+	case TRACE_REG_PERF_UNREGISTER:
+		tracepoint_probe_unregister(call->tp,
+					    call->class->perf_probe,
+					    call, true);
+		return 0;
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+		return 0;
+#endif
+	}
+	return 0;
+}
+
+static int register_synthetic_event(struct synthetic_event *event)
+{
+	struct trace_event_call *call = &event->call;
+	int ret = 0;
+
+	event->call.class = &event->class;
+	event->class.system = kstrdup(SYNTHETIC_EVENT_SYSTEM, GFP_KERNEL);
+	if (!event->class.system) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	event->tp = alloc_synthetic_tracepoint(event->name);
+	if (IS_ERR(event->tp)) {
+		ret = PTR_ERR(event->tp);
+		event->tp = NULL;
+		goto out;
+	}
+
+	INIT_LIST_HEAD(&call->class->fields);
+	call->event.funcs = &synthetic_event_funcs;
+	call->class->define_fields = synthetic_event_define_fields;
+
+	ret = register_trace_event(&call->event);
+	if (!ret) {
+		ret = -ENODEV;
+		goto out;
+	}
+	call->flags = TRACE_EVENT_FL_TRACEPOINT;
+	call->class->reg = dynamic_trace_event_reg;
+	call->class->probe = trace_event_raw_event_synthetic;
+	call->data = event;
+	call->tp = event->tp;
+	ret = trace_add_event_call(call);
+	if (ret) {
+		pr_warn("Failed to register synthetic event: %s\n",
+			trace_event_name(call));
+		goto err;
+	}
+
+	ret = set_synthetic_event_print_fmt(call);
+	if (ret < 0) {
+		trace_remove_event_call(call);
+		goto err;
+	}
+ out:
+	return ret;
+ err:
+	unregister_trace_event(&call->event);
+	goto out;
+}
+
+static int unregister_synthetic_event(struct synthetic_event *event)
+{
+	struct trace_event_call *call = &event->call;
+	int ret;
+
+	ret = trace_remove_event_call(call);
+	if (ret) {
+		pr_warn("Failed to remove synthetic event: %s\n",
+			trace_event_name(call));
+		free_synthetic_event_print_fmt(call);
+		unregister_trace_event(&call->event);
+	}
+
+	return ret;
+}
+
+static int add_synthetic_event(struct synthetic_event *event)
+{
+	int ret;
+
+	mutex_lock(&synthetic_event_mutex);
+
+	ret = register_synthetic_event(event);
+	if (ret)
+		goto out;
+	list_add(&event->list, &synthetic_events_list);
+out:
+	mutex_unlock(&synthetic_event_mutex);
+
+	return ret;
+}
+
+static void remove_synthetic_event(struct synthetic_event *event)
+{
+	mutex_lock(&synthetic_event_mutex);
+
+	unregister_synthetic_event(event);
+	list_del(&event->list);
+
+	mutex_unlock(&synthetic_event_mutex);
+}
+
+static int parse_synthetic_field(struct synthetic_event *event,
+				 char *str, int i)
+{
+	char *field_name, *system, *event_name, *var_name;
+	struct hist_field *var_ref;
+	int ret = 0;
+
+	field_name = strsep(&str, "=");
+	if (!str || !field_name) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	event->fields[i].name = kstrdup(field_name, GFP_KERNEL);
+	if (!event->fields[i].name)
+		ret = -ENOMEM;
+
+	system = strsep(&str, ":");
+	if (!system || !str) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	event_name = strsep(&str, ":");
+	if (!str) {
+		var_name = event_name;
+		event_name = system;
+		system = NULL;
+	} else
+		var_name = str;
+
+	var_ref = parse_var_ref(system, event_name, var_name, true);
+	if (!var_ref) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	event->fields[i].var_ref = var_ref;
+ out:
+	return ret;
+}
+
+static int create_synthetic_event(int argc, char **argv)
+{
+	struct synthetic_event *event = NULL;
+	bool delete_event = false;
+	int i, ret = 0;
+	char *token;
+
+	/*
+	 * Argument syntax:
+	 *  - Add synthetic event: hist:<event_name> [EVENT:]VAR ...
+	 *  - Remove synthetic event: !hist:<event_name> [EVENT:]VAR ...
+	 * EVENT can be sys:event_name or event_name or nothing if VAR unique
+	 */
+	if (argc < 1) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	token = argv[0];
+	if (token[0] == '!') {
+		delete_event = true;
+		token++;
+	}
+
+	event = find_synthetic_event(token);
+	if (event) {
+		if (delete_event) {
+			remove_synthetic_event(event);
+			goto err;
+		} else
+			ret = -EEXIST;
+		goto out;
+	} else if (delete_event) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (argc < 2) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	event = alloc_synthetic_event(token, argc - 1);
+	if (IS_ERR(event)) {
+		ret = PTR_ERR(event);
+		event = NULL;
+		goto err;
+	}
+
+	for (i = 1; i < argc; i++) {
+		ret = parse_synthetic_field(event, argv[i], i - 1);
+		if (ret)
+			goto err;
+	}
+
+	ret = add_synthetic_event(event);
+	if (ret)
+		goto err;
+ out:
+	return ret;
+ err:
+	free_synthetic_event(event);
+
+	goto out;
+}
+
+static int release_all_synthetic_events(void)
+{
+	struct synthetic_event *event, *e;
+
+	mutex_lock(&synthetic_event_mutex);
+
+	list_for_each_entry_safe(event, e, &synthetic_events_list, list) {
+		remove_synthetic_event(event);
+		free_synthetic_event(event);
+	}
+
+	mutex_unlock(&synthetic_event_mutex);
+
+	return 0;
+}
+
+
+static void *synthetic_events_seq_start(struct seq_file *m, loff_t *pos)
+{
+	mutex_lock(&synthetic_event_mutex);
+
+	return seq_list_start(&synthetic_events_list, *pos);
+}
+
+static void *synthetic_events_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	return seq_list_next(v, &synthetic_events_list, pos);
+}
+
+static void synthetic_events_seq_stop(struct seq_file *m, void *v)
+{
+	mutex_unlock(&synthetic_event_mutex);
+}
+
+static int synthetic_events_seq_show(struct seq_file *m, void *v)
+{
+	struct synthetic_event_field *se_field;
+	const char *var_name, *system, *event_name;
+	struct hist_trigger_data *hist_data;
+	struct synthetic_event *event = v;
+	struct trace_event_call *call;
+	struct hist_field *ref_field;
+	bool pending;
+	unsigned int i;
+
+	seq_printf(m, "%s ", event->name);
+
+	for (i = 0; i < event->n_fields; i++) {
+		se_field = &event->fields[i];
+		ref_field = se_field->var_ref;
+		pending = ref_field->var_ref.pending;
+		if (!pending) {
+			hist_data = ref_field->var_ref.hist_data;
+			call = hist_data->event_file->event_call;
+			system = call->class->system;
+			event_name = trace_event_name(call);
+		} else {
+			system = ref_field->var_ref.pending_system;
+			event_name = ref_field->var_ref.pending_event_name;
+		}
+
+		var_name = ref_field->var_ref.pending_var_name;
+
+		/* parameter values */
+		seq_printf(m, "%s=%s%s%s:%s%s%s", event->fields[i].name,
+			   system ? system : "", system ? ":" : "",
+			   event_name, var_name, pending ? "*" : "",
+			   i == event->n_fields - 1 ? "" : ", ");
+	}
+
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static const struct seq_operations synthetic_events_seq_op = {
+	.start  = synthetic_events_seq_start,
+	.next   = synthetic_events_seq_next,
+	.stop   = synthetic_events_seq_stop,
+	.show   = synthetic_events_seq_show
+};
+
+static int synthetic_events_open(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
+		ret = release_all_synthetic_events();
+		if (ret < 0)
+			return ret;
+	}
+
+	return seq_open(file, &synthetic_events_seq_op);
+}
+
+static ssize_t synthetic_events_write(struct file *file,
+				      const char __user *buffer,
+				      size_t count, loff_t *ppos)
+{
+	return trace_parse_run_command(file, buffer, count, ppos,
+				       create_synthetic_event);
+}
+
+static const struct file_operations synthetic_events_fops = {
+	.open           = synthetic_events_open,
+	.write		= synthetic_events_write,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = seq_release,
+};
+
+static __init int trace_events_hist_init(void)
+{
+	struct dentry *entry = NULL;
+	struct trace_array *tr;
+	struct dentry *d_tracer;
+	int err = 0;
+
+	tr = top_trace_array();
+	if (!tr) {
+		err = -ENODEV;
+		goto err;
+	}
+
+	d_tracer = tracing_init_dentry();
+	if (IS_ERR(d_tracer)) {
+		err = PTR_ERR(d_tracer);
+		goto err;
+	}
+
+	entry = tracefs_create_file("synthetic_events", 0644, d_tracer,
+				    tr, &synthetic_events_fops);
+	if (!entry) {
+		err = -ENODEV;
+		goto err;
+	}
+
+	return err;
+ err:
+	pr_warn("Could not create tracefs 'synthetic_events' entry\n");
+
+	return err;
+}
+
+fs_initcall(trace_events_hist_init);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 19/21] tracing: Add 'onmatch' hist trigger action support
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (17 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 18/21] tracing: Add support for 'synthetic' events Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 20/21] tracing: Add 'onmax' " Tom Zanussi
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add an 'onmatch().trace(event)' hist trigger action which is invoked
with the set of resolved variables named in the given synthetic event.
The result is the generation of a synthetic event that consists of the
values contained in those variables at the time the invoking event was
hit.

As an example, the following defines a simple synthetic event using a
variable defined on the sched_wakeup_new event, and shows the event
definition with an unresolved field, since the hist trigger on
sched_wakeup_new that defines the testpid variable hasn't been created
yet:

    # echo 'wakeup_new_test pid=sched_wakeup_new:testpid' >> \
      /sys/kernel/debug/tracing/synthetic_events

    # cat /sys/kernel/debug/tracing/synthetic_events
      wakeup_new_test pid=sched_wakeup_new:testpid*

The following hist trigger both defines the missing testpid variable
and specifies an onmatch().trace action that generates a
wakeup_new_test synthetic event whenever a sched_wakeup_new event
occurs; because of the 'if comm == "cyclictest"' filter, that only
happens when the executable is cyclictest:

    # echo 'hist:keys=testpid=pid:onmatch().trace(wakeup_new_test) \
      if comm=="cyclictest"' >> \
      /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger

Creating and displaying a histogram based on those events is now just
a matter of using the fields and new synthetic event in the
tracing/events/synthetic directory, as usual:

    # echo 'hist:keys=pid:sort=pid' >> \
      /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
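
Internally, when sched_wakeup_new fires with the trigger above, the
flow is roughly the following sketch; it reuses the function names
from this and the preceding patches, but the argument plumbing is
simplified and error handling is omitted:

    /* sketch only: mirrors the tail of event_hist_trigger() */
    static void run_onmatch(struct hist_trigger_data *hist_data,
                            struct tracing_map_elt *elt, void *rec,
                            struct ring_buffer_event *rbe, void *key,
                            u64 *var_ref_vals)
    {
            /* resolve the synthetic event's variable references using
             * this event's key (testpid in the example above) */
            if (!resolve_var_refs(hist_data, key, var_ref_vals, true))
                    return;

            /* action_trace() hands the resolved values to
             * trace_synthetic(), generating wakeup_new_test */
            hist_trigger_actions(hist_data, elt, rec, rbe, var_ref_vals);
    }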

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 167 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 46da09f..2f9efb8 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -279,6 +279,7 @@ static u64 hist_field_timestamp(struct hist_field *hist_field,
 }
 
 static LIST_HEAD(hist_var_list);
+static LIST_HEAD(hist_action_list);
 
 struct hist_var_data {
 	struct list_head list;
@@ -610,6 +611,16 @@ static int parse_action(char *str, struct hist_trigger_attrs *attrs)
 	if (attrs->n_actions == HIST_ACTIONS_MAX)
 		return -EINVAL;
 
+	if ((strncmp(str, "onmatch(", strlen("onmatch(")) == 0)) {
+		attrs->action_str[attrs->n_actions] = kstrdup(str, GFP_KERNEL);
+		if (!attrs->action_str[attrs->n_actions]) {
+			ret = -ENOMEM;
+			return ret;
+		}
+		attrs->n_actions++;
+		ret = 1;
+	}
+
 	return ret;
 }
 
@@ -1823,6 +1834,129 @@ static int add_synthetic_var_refs(struct hist_trigger_data *hist_data,
 	return var_ref_idx;
 }
 
+static void action_trace(struct hist_trigger_data *hist_data,
+			 struct tracing_map_elt *elt, void *rec,
+			 struct ring_buffer_event *rbe,
+			 struct action_data *data, u64 *var_ref_vals)
+{
+	struct synthetic_event *event = data->synthetic_event;
+
+	trace_synthetic(event, var_ref_vals, data->var_ref_idx);
+}
+
+static bool check_hist_action_refs(struct hist_trigger_data *hist_data,
+				   struct synthetic_event *event)
+{
+	unsigned int i;
+
+	for (i = 0; i < hist_data->n_actions; i++) {
+		struct action_data *data = hist_data->actions[i];
+
+		if (data->fn == action_trace && data->synthetic_event == event)
+			return true;
+	}
+
+	return false;
+}
+
+static bool check_synthetic_action_refs(struct synthetic_event *event)
+{
+	struct hist_var_data *var_data;
+
+	list_for_each_entry(var_data, &hist_action_list, list)
+		if (check_hist_action_refs(var_data->hist_data, event))
+			return true;
+
+	return false;
+}
+
+static struct hist_var_data *find_hist_actions(struct hist_trigger_data *hist_data)
+{
+	struct hist_var_data *var_data, *found = NULL;
+
+	list_for_each_entry(var_data, &hist_action_list, list) {
+		if (var_data->hist_data == hist_data) {
+			found = var_data;
+			break;
+		}
+	}
+
+	return found;
+}
+
+static int save_hist_actions(struct hist_trigger_data *hist_data)
+{
+	struct hist_var_data *var_data;
+
+	var_data = find_hist_actions(hist_data);
+	if (var_data)
+		return 0;
+
+	var_data = kzalloc(sizeof(*var_data), GFP_KERNEL);
+	if (!var_data)
+		return -ENOMEM;
+
+	var_data->hist_data = hist_data;
+	list_add(&var_data->list, &hist_action_list);
+
+	return 0;
+}
+
+static int remove_hist_actions(struct hist_trigger_data *hist_data)
+{
+	struct hist_var_data *var_data;
+
+	var_data = find_hist_actions(hist_data);
+	if (!var_data)
+		return -EINVAL;
+
+	list_del(&var_data->list);
+
+	return 0;
+}
+
+static int create_onmatch_data(char *str, struct hist_trigger_data *hist_data)
+{
+	char *fn_name, *param;
+	struct action_data *data;
+	int ret = 0;
+
+	strsep(&str, ".");
+	if (!str)
+		return -EINVAL;
+
+	fn_name = strsep(&str, "(");
+	if (!fn_name || !str)
+		return -EINVAL;
+
+	if (strncmp(fn_name, "trace", strlen("trace")) == 0) {
+		struct synthetic_event *event;
+
+		param = strsep(&str, ")");
+		if (!param)
+			return -EINVAL;
+
+		event = find_synthetic_event(param);
+		if (!event)
+			return -EINVAL;
+
+		if (!resolve_pending_var_refs(event))
+			return -EINVAL;
+
+		data = kzalloc(sizeof(*data), GFP_KERNEL);
+		if (!data)
+			return -ENOMEM;
+
+		data->fn = action_trace;
+		data->synthetic_event = event;
+		data->var_ref_idx = add_synthetic_var_refs(hist_data, event);
+		hist_data->actions[hist_data->n_actions++] = data;
+		save_hist_actions(hist_data);
+	}
+
+	return ret;
+}
+
 static void destroy_actions(struct hist_trigger_data *hist_data)
 {
 	unsigned int i;
@@ -1842,6 +1976,14 @@ static int create_actions(struct hist_trigger_data *hist_data)
 
 	for (i = 0; i < hist_data->attrs->n_actions; i++) {
 		str = hist_data->attrs->action_str[i];
+
+		if (strncmp(str, "onmatch(", strlen("onmatch(")) == 0) {
+			char *action_str = str + strlen("onmatch(");
+
+			ret = create_onmatch_data(action_str, hist_data);
+			if (ret)
+				return ret;
+		}
 	}
 
 	return ret;
@@ -1858,6 +2000,16 @@ static void print_actions(struct seq_file *m,
 	}
 }
 
+static void print_onmatch_spec(struct seq_file *m,
+			       struct hist_trigger_data *hist_data,
+			       struct action_data *data)
+{
+	seq_puts(m, ":onmatch().");
+
+	if (data->synthetic_event)
+		seq_printf(m, "trace(%s)", data->synthetic_event->name);
+}
+
 static void print_actions_spec(struct seq_file *m,
 			       struct hist_trigger_data *hist_data)
 {
@@ -1865,6 +2017,9 @@ static void print_actions_spec(struct seq_file *m,
 
 	for (i = 0; i < hist_data->n_actions; i++) {
 		struct action_data *data = hist_data->actions[i];
+
+		if (data->fn == action_trace)
+			print_onmatch_spec(m, hist_data, data);
 	}
 }
 
@@ -2428,6 +2583,9 @@ static void event_hist_trigger_free(struct event_trigger_ops *ops,
 		if (remove_hist_vars(hist_data))
 			return;
 
+		if (remove_hist_actions(hist_data))
+			return;
+
 		destroy_hist_data(hist_data);
 	}
 }
@@ -3417,6 +3575,10 @@ static int create_synthetic_event(int argc, char **argv)
 	event = find_synthetic_event(token);
 	if (event) {
 		if (delete_event) {
+			if (check_synthetic_action_refs(event)) {
+				ret = -EINVAL;
+				goto out;
+			}
 			remove_synthetic_event(event);
 			goto err;
 		} else
@@ -3462,6 +3624,11 @@ static int release_all_synthetic_events(void)
 
 	mutex_lock(&synthetic_event_mutex);
 
+	list_for_each_entry(event, &synthetic_events_list, list) {
+		if (check_synthetic_action_refs(event)) {
+			mutex_unlock(&synthetic_event_mutex);
+			return -EINVAL;
+		}
+	}
+
 	list_for_each_entry_safe(event, e, &synthetic_events_list, list) {
 		remove_synthetic_event(event);
 		free_synthetic_event(event);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 20/21] tracing: Add 'onmax' hist trigger action support
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (18 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 19/21] tracing: Add 'onmatch' hist trigger action support Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 17:25 ` [RFC][PATCH 21/21] tracing: Add inter-event hist trigger Documentation Tom Zanussi
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add an 'onmax(var).save(field,...)' hist trigger action which is
invoked whenever the value of 'var' associated with a histogram entry
exceeds the current maximum for that entry.

The end result is that the trace event fields specified as the
onmax.save() params will be saved if 'var' exceeds the current maximum
for that hist trigger entry.  This allows context from the event that
exhibited the new maximum to be saved for later reference.  When the
histogram is displayed, additional fields displaying the saved values
will be printed.

As an example the below defines a couple of hist triggers, one for
sched_wakeup and another for sched_switch, keyed on pid.  Whenever a
sched_wakeup occurs, the timestamp is saved in the entry corresponding
to the current pid, and when the scheduler switches back to that pid,
the timestamp difference is calculated.  If the resulting latency
exceeds the current maximum latency, the specified save() values are
saved:

    # echo 'hist:keys=pid:ts0=common_timestamp.usecs \
      if comm=="cyclictest"' >> \
      /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger

    # echo 'hist:keys=next_pid:\
      wakeup_lat=common_timestamp.usecs-ts0:\
      onmax(wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
      if next_comm=="cyclictest"' >> \
      /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

When the histogram is displayed, the max value and the saved values
corresponding to the max are displayed following the rest of the
fields:

    # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
      { next_pid:       2255 } hitcount:        239
        common_timestamp-ts0:          0
        max:         27  next_comm: cyclictest
        prev_pid:          0  prev_prio:        120  prev_comm: swapper/1

      { next_pid:       2256 } hitcount:       2355
        common_timestamp-ts0:          0
        max:         49  next_comm: cyclictest
        prev_pid:          0  prev_prio:        120  prev_comm: swapper/0

    Totals:
        Hits: 12970
        Entries: 2
        Dropped: 0

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 kernel/trace/trace_events_hist.c | 280 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 278 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 2f9efb8..14e05da 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -36,6 +36,7 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field,
 #define HIST_FIELD_OPERANDS_MAX	2
 #define HIST_ASSIGNMENT_MAX	4
 #define HIST_ACTIONS_MAX	8
+#define HIST_ON_MAX_SAVE_MAX	8
 
 enum field_op_id {
 	FIELD_OP_NONE,
@@ -260,6 +261,7 @@ struct hist_trigger_data {
 	bool				enable_timestamps;
 	struct action_data		*actions[HIST_ACTIONS_MAX];
 	unsigned int			n_actions;
+	unsigned int			n_onmax_str;
 };
 
 static u64 hist_field_timestamp(struct hist_field *hist_field,
@@ -450,6 +452,7 @@ static struct hist_field *find_var_field(struct hist_trigger_data *hist_data,
 struct hist_elt_data {
 	char *comm;
 	u64 *var_ref_vals;
+	char *onmax_str[HIST_ON_MAX_SAVE_MAX];
 };
 
 static u64 hist_field_var_ref(struct hist_field *hist_field,
@@ -611,7 +614,8 @@ static int parse_action(char *str, struct hist_trigger_attrs *attrs)
 	if (attrs->n_actions == HIST_ACTIONS_MAX)
 		return -EINVAL;
 
-	if ((strncmp(str, "onmatch(", strlen("onmatch(")) == 0)) {
+	if ((strncmp(str, "onmatch(", strlen("onmatch(")) == 0) ||
+	    (strncmp(str, "onmax(", strlen("onmax(")) == 0)) {
 		attrs->action_str[attrs->n_actions] = kstrdup(str, GFP_KERNEL);
 		if (!attrs->action_str[attrs->n_actions]) {
 			ret = -ENOMEM;
@@ -729,7 +733,12 @@ static inline void save_comm(char *comm, struct task_struct *task)
 
 static void hist_trigger_elt_data_free(struct tracing_map_elt *elt)
 {
+	struct hist_trigger_data *hist_data = elt->map->private_data;
 	struct hist_elt_data *private_data = elt->private_data;
+	unsigned int i;
+
+	for (i = 0; i < hist_data->n_onmax_str; i++)
+		kfree(private_data->onmax_str[i]);
 
 	kfree(private_data->comm);
 	kfree(private_data);
@@ -761,6 +770,14 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
 		}
 	}
 
+	for (i = 0; i < hist_data->n_onmax_str; i++) {
+		elt_data->onmax_str[i] = kzalloc(size, GFP_KERNEL);
+		if (!elt_data->onmax_str[i]) {
+			hist_trigger_elt_data_free(elt);
+			return -ENOMEM;
+		}
+	}
+
 	return 0;
 }
 
@@ -990,6 +1007,12 @@ struct action_data {
 	action_fn_t	fn;
 	unsigned int	var_ref_idx;
 	struct synthetic_event *synthetic_event;
+
+	unsigned int		max_var_ref_idx;
+	struct hist_field	*max_save_val[HIST_ON_MAX_SAVE_MAX];
+	struct hist_field	*max_save_var[HIST_ON_MAX_SAVE_MAX];
+	unsigned int		max_n_save;
+	struct hist_field	*max_var;
 };
 
 static LIST_HEAD(synthetic_events_list);
@@ -1957,6 +1980,225 @@ static int create_onmatch_data(char *str, struct hist_trigger_data *hist_data)
 	return ret;
 }
 
+static void onmax_print(struct seq_file *m,
+			struct hist_trigger_data *hist_data,
+			struct tracing_map_elt *elt,
+			struct action_data *data)
+{
+	unsigned int i, save_var_idx, max_idx = data->max_var->var_ref.idx;
+
+	seq_printf(m, "\n\tmax: %10llu", tracing_map_read_var(elt, max_idx));
+
+	for (i = 0; i < data->max_n_save; i++) {
+		struct hist_field *save_val = data->max_save_val[i];
+		struct hist_field *save_var = data->max_save_var[i];
+		u64 val;
+
+		save_var_idx = save_var->var_ref.idx;
+
+		val = tracing_map_read_var(elt, save_var_idx);
+
+		if (save_val->flags & HIST_FIELD_FL_STRING) {
+			seq_printf(m, "  %s: %-50s", save_var->var_name,
+				   (char *)(val));
+		} else
+			seq_printf(m, "  %s: %10llu", save_var->var_name, val);
+	}
+}
+
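+/*
+ * 'onmax(var).save(...)' handler: if the tracked variable's value for
+ * this event exceeds the entry's current max, record the new max and
+ * copy each save() field into the entry's saved variables (string
+ * fields are copied into the per-entry onmax_str buffers).
+ */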
+static void onmax_save(struct hist_trigger_data *hist_data,
+		       struct tracing_map_elt *elt, void *rec,
+		       struct ring_buffer_event *rbe,
+		       struct action_data *data, u64 *var_ref_vals)
+{
+	unsigned int i, j, save_idx, max_idx = data->max_var->var_ref.idx;
+	unsigned int max_var_ref_idx = data->max_var_ref_idx;
+	struct hist_elt_data *elt_data = elt->private_data;
+
+	u64 var_val, max_val;
+
+	var_val = var_ref_vals[max_var_ref_idx];
+	max_val = tracing_map_read_var(elt, max_idx);
+
+	if (var_val <= max_val)
+		return;
+
+	tracing_map_set_var(elt, max_idx, var_val);
+
+	for (i = 0, j = 0; i < data->max_n_save; i++) {
+		struct hist_field *save_val = data->max_save_val[i];
+
+		var_val = save_val->fn(save_val, elt, rbe, rec);
+		save_idx = data->max_save_var[i]->var_ref.idx;
+
+		if (save_val->flags & HIST_FIELD_FL_STRING) {
+			char *onmax_str = elt_data->onmax_str[j++];
+
+			memcpy(onmax_str, (char *)var_val, TASK_COMM_LEN + 1);
+			var_val = (u64)onmax_str;
+		}
+		tracing_map_set_var(elt, save_idx, var_val);
+	}
+}
+
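+/*
+ * Create a bare histogram variable not tied to an event field and
+ * reserve a tracing_map variable slot for it.
+ */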
+static struct hist_field *create_free_var(struct hist_trigger_data *hist_data,
+					  char *name)
+{
+	struct hist_field *var;
+	int idx;
+
+	var = kzalloc(sizeof(struct hist_field), GFP_KERNEL);
+	if (!var) {
+		var = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	idx = tracing_map_add_var(hist_data->map);
+	if (idx < 0) {
+		kfree(var);
+		var = ERR_PTR(-EINVAL);
+		goto out;
+	}
+
+	var->flags = HIST_FIELD_FL_VAR;
+	var->var_ref.idx = idx;
+	var->var_ref.hist_data = hist_data;
+	var->var_name = kstrdup(name, GFP_KERNEL);
+ out:
+	return var;
+}
+
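+/*
+ * Set up a single save() field: parse the field into a value
+ * hist_field and create a matching per-entry variable to hold the
+ * saved value.
+ */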
+static int create_save_var(struct hist_trigger_data *hist_data,
+			   struct trace_event_file *file,
+			   struct action_data *data,
+			   const char *event_name, char *field_str)
+{
+	struct hist_field *val_field, *save_var;
+	unsigned long flags = HIST_FIELD_FL_VAR;
+	int ret = 0;
+
+	/* check before writing past the end of the save arrays */
+	if (WARN_ON(data->max_n_save >= HIST_ON_MAX_SAVE_MAX))
+		return -EINVAL;
+
+	val_field = parse_atom(hist_data, file, field_str,
+			       &flags, NULL);
+	if (IS_ERR(val_field)) {
+		ret = PTR_ERR(val_field);
+		goto out;
+	}
+	data->max_save_val[data->max_n_save] = val_field;
+	if (val_field->flags & HIST_FIELD_FL_STRING)
+		hist_data->n_onmax_str++;
+
+	save_var = create_free_var(hist_data, field_str);
+	if (IS_ERR(save_var)) {
+		ret = PTR_ERR(save_var);
+		goto out;
+	}
+	data->max_save_var[data->max_n_save++] = save_var;
+ out:
+	return ret;
+}
+
+static void destroy_onmax_data(struct action_data *data)
+{
+	unsigned int i;
+
+	destroy_hist_field(data->max_var);
+
+	for (i = 0; i < data->max_n_save; i++) {
+		destroy_hist_field(data->max_save_val[i]);
+		destroy_hist_field(data->max_save_var[i]);
+	}
+
+	kfree(data);
+}
+
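+/*
+ * Parse an 'onmax(var).save(field,...)' action string (the 'onmax('
+ * prefix has already been stripped by the caller): create a reference
+ * to 'var', a per-entry 'max' variable to track the maximum, and a
+ * saved variable for each save() field, then attach the resulting
+ * action_data to the hist trigger.
+ */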
+static int create_onmax_data(char *str, struct hist_trigger_data *hist_data)
+{
+	struct trace_event_call *call = hist_data->event_file->event_call;
+	unsigned int n_save = 0, var_ref_idx = hist_data->n_var_refs;
+	struct hist_field *var_field, *ref_field, *max_var_field;
+	struct trace_event_file *file = hist_data->event_file;
+	char *fn_name, *onmax_var;
+	struct action_data *data;
+	const char *event_name;
+	unsigned long flags;
+	int ret = 0;
+
+	onmax_var = strsep(&str, ")");
+	if (!onmax_var || !str)
+		return -EINVAL;
+
+	event_name = trace_event_name(call);
+	var_field = find_var_field(hist_data, NULL, event_name, onmax_var);
+	if (!var_field)
+		return -EINVAL;
+
+	flags = HIST_FIELD_FL_VAR_REF;
+	ref_field = create_hist_field(NULL, flags, NULL);
+	if (!ref_field)
+		return -ENOMEM;
+
+	ref_field->var_ref.idx = var_field->var_ref.idx;
+	ref_field->var_ref.hist_data = hist_data;
+	ref_field->name = kstrdup(var_field->var_name, GFP_KERNEL);
+	hist_data->var_refs[hist_data->n_var_refs] = ref_field;
+	ref_field->var_ref_idx = hist_data->n_var_refs++;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+	data->fn = onmax_save;
+	data->max_var_ref_idx = var_ref_idx;
+	max_var_field = create_free_var(hist_data, "max");
+	if (IS_ERR(max_var_field)) {
+		ret = PTR_ERR(max_var_field);
+		goto free;
+	}
+	data->max_var = max_var_field;
+
+	strsep(&str, ".");
+	if (!str) {
+		ret = -EINVAL;
+		goto free;
+	}
+
+	fn_name = strsep(&str, "(");
+	if (!fn_name || !str) {
+		ret = -EINVAL;
+		goto free;
+	}
+
+	if (strncmp(fn_name, "save", strlen("save")) == 0) {
+		char *save_param = strsep(&str, ")");
+
+		if (!save_param) {
+			ret = -EINVAL;
+			goto free;
+		}
+
+		while (save_param) {
+			char *save_var = strsep(&save_param, ",");
+
+			ret = create_save_var(hist_data, file, data,
+					      event_name, save_var);
+			if (ret)
+				goto free;
+			n_save++;
+		}
+	}
+
+	data->max_n_save = n_save;
+
+	hist_data->actions[hist_data->n_actions++] = data;
+ out:
+	return ret;
+ free:
+	destroy_onmax_data(data);
+	goto out;
+}
+
 static void destroy_actions(struct hist_trigger_data *hist_data)
 {
 	unsigned int i;
@@ -1964,7 +2206,10 @@ static void destroy_actions(struct hist_trigger_data *hist_data)
 	for (i = 0; i < hist_data->n_actions; i++) {
 		struct action_data *data = hist_data->actions[i];
 
-		kfree(data);
+		if (data->fn == onmax_save)
+			destroy_onmax_data(data);
+		else
+			kfree(data);
 	}
 }
 
@@ -1983,6 +2228,12 @@ static int create_actions(struct hist_trigger_data *hist_data)
 			ret = create_onmatch_data(action_str, hist_data);
 			if (ret)
 				return ret;
+		} else if (strncmp(str, "onmax(", strlen("onmax(")) == 0) {
+			char *action_str = str + strlen("onmax(");
+
+			ret = create_onmax_data(action_str, hist_data);
+			if (ret)
+				return ret;
 		}
 	}
 
@@ -1997,7 +2248,28 @@ static void print_actions(struct seq_file *m,
 
 	for (i = 0; i < hist_data->n_actions; i++) {
 		struct action_data *data = hist_data->actions[i];
+
+		if (data->fn == onmax_save)
+			onmax_print(m, hist_data, elt, data);
+	}
+}
+
+static void print_onmax_spec(struct seq_file *m,
+			     struct hist_trigger_data *hist_data,
+			     struct action_data *data)
+{
+	unsigned int i;
+
+	seq_puts(m, ":onmax(");
+	seq_printf(m, "%s", hist_data->var_refs[data->max_var_ref_idx]->name);
+	seq_puts(m, ").save(");
+
+	for (i = 0; i < data->max_n_save; i++) {
+		seq_printf(m, "%s", data->max_save_var[i]->var_name);
+		if (i < data->max_n_save - 1)
+			seq_puts(m, ",");
 	}
+	seq_puts(m, ")");
 }
 
 static void print_onmatch_spec(struct seq_file *m,
@@ -2020,6 +2292,8 @@ static void print_actions_spec(struct seq_file *m,
 
 		if (data->fn == action_trace)
 			print_onmatch_spec(m, hist_data, data);
+		else if (data->fn == onmax_save)
+			print_onmax_spec(m, hist_data, data);
 	}
 }
 
@@ -2324,6 +2598,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
 		}
 	}
 
+	print_actions(m, hist_data, elt);
+
 	seq_puts(m, "\n");
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC][PATCH 21/21] tracing: Add inter-event hist trigger Documentation
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (19 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 20/21] tracing: Add 'onmax' " Tom Zanussi
@ 2017-02-08 17:25 ` Tom Zanussi
  2017-02-08 20:01 ` [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Steven Rostedt
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 17:25 UTC (permalink / raw)
  To: rostedt
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users, Tom Zanussi

Add background and details on inter-event hist triggers, including
hist variables, synthetic events, and actions.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
 Documentation/trace/events.txt | 330 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 330 insertions(+)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 2cc08d4..e3fb774 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -571,6 +571,7 @@ The following commands are supported:
 	.sym-offset display an address as a symbol and offset
 	.syscall    display a syscall id as a system call name
 	.execname   display a common_pid as a program name
+	.usecs      display a common_timestamp in microseconds
 
   Note that in general the semantics of a given field aren't
   interpreted when applying a modifier to it, but there are some
@@ -2064,3 +2065,332 @@ The following commands are supported:
         Hits: 489
         Entries: 7
         Dropped: 0
+
+6.3 Inter-event hist triggers
+-----------------------------
+
+Inter-event hist triggers are hist triggers that combine values from
+one or more other events and create a histogram using that data.  Data
+from an inter-event histogram can in turn become the source for
+further combined histograms, thus providing a chain of related
+histograms, which is important for some applications.
+
+The most important example of an inter-event quantity that can be used
+in this manner is latency, which is simply a difference in timestamps
+between two events (although trace events don't have an externally
+visible timestamp field, the inter-event hist trigger support adds a
+pseudo-field to all events named 'common_timestamp' which can be used
+as if it were an actual event field).  Although latency is the most
+important inter-event quantity, note that because the support is
+completely general across the trace event subsystem, any event field
+can be used in an inter-event quantity.
+
+An example of a histogram that combines data from other histograms
+into a useful chain would be a 'wakeupswitch latency' histogram that
+combines a 'wakeup latency' histogram and a 'switch latency'
+histogram.
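+
+As a rough sketch, using hypothetical names (the individual pieces of
+syntax used below are described in the following sections), such a
+chain could combine previously saved 'wakeup_lat' and 'switchtime_lat'
+variables into a new variable and synthetic event:
+
+  # echo 'wakeupswitch_latency lat=event3:wakeupswitch_lat' >> \
+          /sys/kernel/debug/tracing/synthetic_events
+
+  # echo 'hist:keys=pid:wakeupswitch_lat=wakeup_lat+switchtime_lat:\
+          onmatch().trace(wakeupswitch_latency)' >> event3/trigger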
+
+Normally, a hist trigger specification consists of a (possibly
+compound) key along with one or more numeric values, which are
+continually updated sums associated with that key.  A histogram
+specification in this case consists of individual key and value
+specifications that refer to trace event fields associated with a
+single event type.
+
+The inter-event hist trigger extension allows fields from multiple
+events to be referenced and combined into a multi-event histogram
+specification.  In support of this overall goal, a few enabling
+features have been added to the hist trigger support:
+
+  - In order to compute an inter-event quantity, a value from one
+    event needs to be saved and then referenced from another event.  This
+    requires the introduction of support for histogram 'variables'.
+
+  - The computation of inter-event quantities and their combination
+    require some minimal amount of support for applying simple
+    expressions to variables (+ and -).
+
+  - A histogram consisting of inter-event quantities isn't logically a
+    histogram on either event (so having the 'hist' file for either
+    event host the histogram output doesn't really make sense).  To
+    address the idea that the histogram is associated with a
+    combination of events, support is added allowing the creation of
+    'synthetic' events that are events derived from other events.
+    These synthetic events are full-fledged events just like any other
+    and can be used as such, as for instance to create the
+    'combination' histograms mentioned previously.
+
+  - A set of 'actions' can be associated with histogram entries -
+    these can be used to generate the previously mentioned synthetic
+    events, but can also be used for other purposes, such as saving
+    context when a 'max' latency has been hit.
+
+  - Trace events don't have a 'timestamp' associated with them, but
+    there is an implicit timestamp saved along with an event in the
+    underlying ftrace ring buffer.  This timestamp is now exposed as a
+    synthetic field named 'common_timestamp' which can be used in
+    histograms as if it were any other event field.  By default it is
+    in units of nanoseconds; appending '.usecs' to a common_timestamp
+    field changes the units to microseconds.
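+
+    For example, to save a microsecond-resolution timestamp into a
+    variable (a minimal illustration; 'event' stands for any event's
+    trigger file):
+
+      # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> event/trigger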
+
+These features are described in more detail in the following sections.
+
+6.3.1 Histogram Variables
+-------------------------
+
+Variables are simply named locations used for saving and retrieving
+values between matching events.  A 'matching' event is defined as an
+event that has a matching key - if a variable is saved for a histogram
+entry corresponding to that key, any subsequent event with a matching
+key can access that variable.
+
+A variable's value is normally available to any subsequent event until
+it is set to something else by a later event.  The one exception
+to that rule is that any variable used in an expression is essentially
+'read-once' - once it's used by an expression in a subsequent event,
+it's reset to its 'unset' state, which means it can't be used again
+unless it's set again.  This ensures not only that an event doesn't
+use an uninitialized variable in a calculation, but that that variable
+is used only once and not for any unrelated subsequent match.
+
+The basic syntax for saving a variable is to simply prefix any event
+field with a unique variable name (one not corresponding to any
+keyword) and an '=' sign.
+
+Either keys or values can be saved and retrieved in this way.  This
+creates a variable named 'ts0' for a histogram entry with the key
+'next_pid':
+
+  # echo 'hist:keys=next_pid:vals=ts0=common_timestamp ...' >> event/trigger
+
+The ts0 variable can be accessed by any subsequent event having the
+same pid as 'next_pid'.  Because 'vals=' is used, the common_timestamp
+variable value will also be summed as a normal histogram value would be
+(though for a timestamp that makes little sense).
+
+The below shows that a key value can also be saved in the same way:
+
+  # echo 'hist:key=timer_pid=common_pid ...' >> event/trigger
+
+If a variable isn't a key variable or prefixed with 'vals=', the
+associated event field will be saved in a variable but won't be summed
+as a value:
+
+  # echo 'hist:keys=next_pid:ts1=common_timestamp ...' >> event/trigger
+
+Multiple variables can be assigned at the same time.  The below would
+result in both ts0 and b being created as variables, with both
+common_timestamp and field1 additionally being summed as values:
+
+  # echo 'hist:keys=pid:vals=ts0=common_timestamp,b=field1 ...' >> event/trigger
+
+Any number of variables not bound to a 'vals=' prefix can also be
+assigned by simply separating them with colons.  Below is the same
+thing but without the values being summed in the histogram:
+
+  # echo 'hist:keys=pid:ts0=common_timestamp:b=field1 ...' >> event/trigger
+
+Variables set as above can be referenced and used in expressions on
+another event.
+
+For example, here's how a latency can be calculated:
+
+  # echo 'hist:keys=pid,prio:ts0=common_timestamp ...' >> event1/trigger
+  # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-ts0 ...' >> event2/trigger
+
+In the first line above, the event's timestamp is saved into the
+variable ts0.  In the next line, ts0 is subtracted from the second
+event's timestamp to produce the latency, which is then assigned into
+yet another variable, 'wakeup_lat'.  The hist trigger below in turn
+makes use of the wakeup_lat variable to compute a combined latency
+using the same key and variable from yet another event:
+
+  # echo 'hist:key=pid:wakeupswitch_lat=wakeup_lat+switchtime_lat ...' >> event3/trigger
+
+6.3.2 Synthetic Events
+----------------------
+
+Synthetic events are user-defined events generated from hist trigger
+variables associated with one or more other events.  Their purpose is
+to provide a mechanism for displaying data spanning multiple events
+consistent with the existing and already familiar usage for normal
+events.
+
+Because they are derived from one or more other hist triggers, using
+them requires that those other triggers be defined before the
+synthetic event can be fully resolved and generated.  A synthetic event
+can be defined before or after those events, but can't be used until
+all references to other events are resolved.
+
+To define a synthetic event, the user writes a simple specification
+consisting of the name of the new event along with one or more
+variables defined on other events, to the tracing/synthetic_events
+file.
+
+For instance, the following creates a new event named 'wakeup_latency'
+with 3 fields: lat, pid, and prio.  Each of those fields is simply a
+variable reference to a variable on another event:
+
+  # echo 'wakeup_latency \
+          lat=sched_switch:wakeup_lat \
+          pid=sched_switch:woken_pid \
+          prio=sched_switch:woken_prio' >> \
+          /sys/kernel/debug/tracing/synthetic_events
+
+Reading the tracing/synthetic_events file lists all the currently
+defined synthetic events, in this case the event defined above:
+
+  # cat /sys/kernel/debug/tracing/synthetic_events
+    wakeup_latency lat=sched_switch:wakeup_lat*, \
+                   pid=sched_switch:woken_pid*, \
+                   prio=sched_switch:woken_prio*
+
+Any event field that hasn't yet been 'resolved' is shown with an
+asterisk following it.  A field will be unresolved if another event
+defining the specified variable hasn't been defined yet.  Once the
+second event below is added, those variables are defined:
+
+  # echo 'hist:keys=pid,prio:ts0=common_timestamp.usecs' >> \
+          /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
+
+  # echo 'hist:keys=woken_pid=next_pid,woken_prio=next_prio:\
+          wakeup_lat=common_timestamp.usecs-ts0:' >> \
+          /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+At that point the variables are defined and displaying the
+synthetic_event no longer displays the asterisks:
+
+  # cat /sys/kernel/debug/tracing/synthetic_events
+        wakeup_latency \
+          lat=sched:sched_switch:wakeup_lat, \
+          pid=sched:sched_switch:woken_pid, \
+          prio=sched:sched_switch:woken_prio
+
+At this point, the synthetic event is ready to use, and a histogram
+can be defined using it:
+
+  # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
+        /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
+
+The new event is created under the tracing/events/synthetic/ directory
+and looks and behaves just like any other event:
+
+  # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
+        enable  filter  format  hist  id  trigger
+
+Like any other event, once a histogram is enabled for the event, the
+output can be displayed by reading the event's 'hist' file.
+
+Although a histogram can be defined for a synthetic event, it won't be
+populated until actions that actually trace that event occur.  To set
+that up, the user associates a 'trace' action naming the synthetic
+event with a triggering event.  This causes the synthetic event to be
+traced whenever a match occurs (see Section 6.3.3 below).
+
+6.3.3 Hist trigger 'actions'
+----------------------------
+
+A hist trigger 'action' is a function that's executed whenever a
+histogram entry is added or updated.
+
+The default 'action' if no special function is explicitly specified is
+as it always has been, to simply update the set of values associated
+with an entry.  Some applications, however, may want to perform
+additional actions at that point, such as generate another event, or
+compare and save a maximum.
+
+The following additional actions are available.  To specify an action
+for a given event, simply specify the action between colons in the
+hist trigger specification.
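+
+For example, an action simply appears as one more colon-separated
+element of the hist trigger (a schematic sketch using the placeholder
+names 'field1' and 'event'; the available actions are described
+below):
+
+  # echo 'hist:keys=pid:lat=common_timestamp.usecs-ts0:\
+          onmax(lat).save(field1)' >> event/trigger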
+
+  - onmatch().trace(synthetic_event_name)
+
+    The 'onmatch().trace(event)' hist trigger action is invoked
+    whenever an event matches and the histogram entry would be or is
+    added or updated.  It causes the named synthetic event to be
+    generated with the values of the set of resolved variables named
+    in the given synthetic event.  The result is the generation of a
+    synthetic event that consists of the values contained in those
+    variables at the time the invoking event was hit.
+
+    As an example the below defines a simple synthetic event using a
+    variable defined on the sched_wakeup_new event, and shows the
+    event definition with unresolved fields, since the
+    sched_wakeup_new event with the testpid variable hasn't been
+    defined yet:
+
+    # echo 'wakeup_new_test pid=sched_wakeup_new:testpid' >> \
+           /sys/kernel/debug/tracing/synthetic_events
+
+    # cat /sys/kernel/debug/tracing/synthetic_events
+          wakeup_new_test pid=sched_wakeup_new:testpid*
+
+    The following hist trigger both defines the missing testpid
+    variable and specifies an onmatch().trace action that generates a
+    wakeup_new_test synthetic event whenever a sched_wakeup_new event
+    occurs, which because of the 'if comm == "cyclictest"' filter only
+    happens when the executable is cyclictest:
+
+    # echo 'hist:keys=testpid=pid:onmatch().trace(wakeup_new_test) \
+            if comm=="cyclictest"' >> \
+            /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
+
+    Creating and displaying a histogram based on those events is now
+    just a matter of using the fields and new synthetic event in the
+    tracing/events/synthetic directory, as usual:
+
+    # echo 'hist:keys=pid:sort=pid' >> \
+           /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
+
+  - onmax(var).save(field,...)
+
+    The 'onmax(var).save(field,...)' hist trigger action is invoked
+    whenever the value of 'var' associated with a histogram entry
+    exceeds the current maximum contained in that variable.
+
+    The end result is that the trace event fields specified as the
+    onmax.save() params will be saved if 'var' exceeds the current
+    maximum for that hist trigger entry.  This allows context from the
+    event that exhibited the new maximum to be saved for later
+    reference.  When the histogram is displayed, additional fields
+    displaying the saved values will be printed.
+
+    As an example the below defines a couple of hist triggers, one for
+    sched_wakeup and another for sched_switch, keyed on pid.  Whenever
+    a sched_wakeup occurs, the timestamp is saved in the entry
+    corresponding to the current pid, and when the scheduler switches
+    back to that pid, the timestamp difference is calculated.  If the
+    resulting latency, stored in wakeup_lat, exceeds the current
+    maximum latency, the values specified in the save() fields are
+    recorded:
+
+    # echo 'hist:keys=pid:ts0=common_timestamp.usecs \
+            if comm=="cyclictest"' >> \
+            /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
+
+    # echo 'hist:keys=next_pid:\
+            wakeup_lat=common_timestamp.usecs-ts0:\
+            onmax(wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
+            if next_comm=="cyclictest"' >> \
+            /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+    When the histogram is displayed, the max value and the saved
+    values corresponding to the max are displayed following the rest
+    of the fields:
+
+    # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
+      { next_pid:       2255 } hitcount:        239
+        common_timestamp-ts0:          0
+        max:         27  next_comm: cyclictest
+        prev_pid:          0  prev_prio:        120  prev_comm: swapper/1
+
+      { next_pid:       2256 } hitcount:       2355
+        common_timestamp-ts0:          0
+        max:         49  next_comm: cyclictest
+        prev_pid:          0  prev_prio:        120  prev_comm: swapper/0
+
+      Totals:
+          Hits: 12970
+          Entries: 2
+          Dropped: 0
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (20 preceding siblings ...)
  2017-02-08 17:25 ` [RFC][PATCH 21/21] tracing: Add inter-event hist trigger Documentation Tom Zanussi
@ 2017-02-08 20:01 ` Steven Rostedt
  2017-02-08 20:19   ` Tom Zanussi
  2017-02-08 23:28   ` Tom Zanussi
  2017-02-08 23:13 ` Masami Hiramatsu
  2017-02-10  4:16 ` Namhyung Kim
  23 siblings, 2 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-08 20:01 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed,  8 Feb 2017 11:24:56 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

>     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
>                            pid=sched_switch:woken_pid \
>                            prio=sched_switch:woken_prio' >> \
>             /sys/kernel/debug/tracing/synthetic_events

I applied all your patches, did the above and then:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
 IP: free_synthetic_event+0x46/0xb0
 PGD 0 
 
 Oops: 0000 [#1] SMP
 Modules linked in: ip6table_filter ip6_tables x86_pkg_temp_thermal kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core irqbypass snd_seq snd_seq_device snd_pcm snd_timer snd i2c_i801 soundcore wmi i915 i2c_algo_bit drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm pps_core i2c_core video
 CPU: 0 PID: 1389 Comm: bash Not tainted 4.10.0-rc2-test+ #36
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff8800c8c48000 task.stack: ffffc90000a00000
 RIP: 0010:free_synthetic_event+0x46/0xb0
 RSP: 0018:ffffc90000a03cf8 EFLAGS: 00010282
 RAX: 0000000015675b00 RBX: ffff8800ccf4bcd0 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffc90000a03d18 R08: 0000000000000000 R09: ffff8800ccea7c6c
 R10: 000000000000003d R11: ffff8800c8c48000 R12: 0000000000000001
 R13: ffff8800c8ca6000 R14: ffff8800ccea7d88 R15: 0000000000000001
 FS:  00007fde7acdb700(0000) GS:ffff88011ea00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000034 CR3: 00000000ccc44000 CR4: 00000000001406f0
 Call Trace:
  create_synthetic_event+0xcb/0x580
  ? parse_var_ref+0x120/0x120
  trace_run_command+0x5f/0x70
  trace_parse_run_command+0x76/0x150
  ? parse_var_ref+0x120/0x120
  synthetic_events_write+0x10/0x20
  __vfs_write+0x28/0x140
  ? vmacache_find+0xcc/0x180
  ? rw_verify_area+0xd8/0x190
  ? _cond_resched+0x2e/0x50
  ? __sb_start_write+0x82/0xe0
  vfs_write+0x138/0x260
  SyS_write+0x4f/0xa0
  entry_SYSCALL_64_fastpath+0x13/0x94


Anyway, I kinda like the approach. I'll go patch by patch to see what
you did and maybe even take a few now, if they seem appropriate as
helpers.

-- Steve

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 01/21] tracing: Add hist_field_name() accessor
  2017-02-08 17:24 ` [RFC][PATCH 01/21] tracing: Add hist_field_name() accessor Tom Zanussi
@ 2017-02-08 20:09   ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-08 20:09 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed,  8 Feb 2017 11:24:57 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> In preparation for hist_fields that won't be strictly based on
> trace_event_fields, add a new hist_field_name() accessor to allow that
> flexibility and update associated users.
> 
> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> ---
>  kernel/trace/trace_events_hist.c | 59 +++++++++++++++++++++++++---------------
>  1 file changed, 37 insertions(+), 22 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index f3a960e..37347d7 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -145,6 +145,16 @@ struct hist_trigger_data {
>  	struct tracing_map		*map;
>  };
>  
> +static const char *hist_field_name(struct hist_field *field)
> +{
> +	const char *field_name = NULL;
> +
> +	if (field->field)
> +		field_name = field->field->name;
> +
> +	return field_name;
> +}

It would be much simpler to do:

{
	if (field->field)
		return field->field->name;

	return NULL;
}

or even:

	return field->field ? field->field->name : NULL;



> +
>  static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
>  {
>  	hist_field_fn_t fn = NULL;
> @@ -652,7 +662,6 @@ static int is_descending(const char *str)
>  static int create_sort_keys(struct hist_trigger_data *hist_data)
>  {
>  	char *fields_str = hist_data->attrs->sort_key_str;
> -	struct ftrace_event_field *field = NULL;
>  	struct tracing_map_sort_key *sort_key;
>  	int descending, ret = 0;
>  	unsigned int i, j;
> @@ -669,7 +678,9 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
>  	}
>  
>  	for (i = 0; i < TRACING_MAP_SORT_KEYS_MAX; i++) {
> +		struct hist_field *hist_field;
>  		char *field_str, *field_name;
> +		const char *test_name;
>  
>  		sort_key = &hist_data->sort_keys[i];
>  
> @@ -702,8 +713,9 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
>  		}
>  
>  		for (j = 1; j < hist_data->n_fields; j++) {
> -			field = hist_data->fields[j]->field;
> -			if (field && (strcmp(field_name, field->name) == 0)) {
> +			hist_field = hist_data->fields[j];
> +			test_name = hist_field_name(hist_field);
> +			if (strcmp(field_name, test_name) == 0) {

If hist_field_name() returns NULL, the strcmp() will crash.

>  				sort_key->field_idx = j;
>  				descending = is_descending(field_str);
>  				if (descending < 0) {
> @@ -951,6 +963,7 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
>  	struct hist_field *key_field;
>  	char str[KSYM_SYMBOL_LEN];
>  	bool multiline = false;
> +	const char *field_name;
>  	unsigned int i;
>  	u64 uval;
>  

-- Steve

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 02/21] tracing: Reimplement log2
  2017-02-08 17:24 ` [RFC][PATCH 02/21] tracing: Reimplement log2 Tom Zanussi
@ 2017-02-08 20:13   ` Steven Rostedt
  2017-02-08 20:25     ` Tom Zanussi
  0 siblings, 1 reply; 56+ messages in thread
From: Steven Rostedt @ 2017-02-08 20:13 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed,  8 Feb 2017 11:24:58 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

>  static void destroy_hist_field(struct hist_field *hist_field)
>  {
> +	unsigned int i;
> +
> +	if (!hist_field)
> +		return;
> +
> +	for (i = 0; i < HIST_FIELD_OPERANDS_MAX; i++)
> +		destroy_hist_field(hist_field->operands[i]);

Recursive functions get me really nervous. What limits it? Is this user
defined? Perhaps we need to find a better way to handle this that's not
recursive, or at least put in a hard limit of the amount it can recurse.

-- Steve

> +
>  	kfree(hist_field);
>  }
>  
> @@ -377,7 +393,10 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
>  	}
>  
>  	if (flags & HIST_FIELD_FL_LOG2) {
> +		unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
>  		hist_field->fn = hist_field_log2;
> +		hist_field->operands[0] = create_hist_field(field, fl);
> +		hist_field->size = hist_field->operands[0]->size;
>  		goto out;
>  	}
>  

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-08 20:01 ` [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Steven Rostedt
@ 2017-02-08 20:19   ` Tom Zanussi
  2017-02-08 23:28   ` Tom Zanussi
  1 sibling, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 20:19 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

Hi Steve,

On Wed, 2017-02-08 at 15:01 -0500, Steven Rostedt wrote:
> On Wed,  8 Feb 2017 11:24:56 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> >                            pid=sched_switch:woken_pid \
> >                            prio=sched_switch:woken_prio' >> \
> >             /sys/kernel/debug/tracing/synthetic_events
> 
> I applied all your patches, did the above and then:
> 
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
>  IP: free_synthetic_event+0x46/0xb0
>  PGD 0 
>  
>  Oops: 0000 [#1] SMP
>  Modules linked in: ip6table_filter ip6_tables x86_pkg_temp_thermal kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core irqbypass snd_seq snd_seq_device snd_pcm snd_timer snd i2c_i801 soundcore wmi i915 i2c_algo_bit drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm pps_core i2c_core video
>  CPU: 0 PID: 1389 Comm: bash Not tainted 4.10.0-rc2-test+ #36
>  Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
>  task: ffff8800c8c48000 task.stack: ffffc90000a00000
>  RIP: 0010:free_synthetic_event+0x46/0xb0
>  RSP: 0018:ffffc90000a03cf8 EFLAGS: 00010282
>  RAX: 0000000015675b00 RBX: ffff8800ccf4bcd0 RCX: 0000000000000000
>  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>  RBP: ffffc90000a03d18 R08: 0000000000000000 R09: ffff8800ccea7c6c
>  R10: 000000000000003d R11: ffff8800c8c48000 R12: 0000000000000001
>  R13: ffff8800c8ca6000 R14: ffff8800ccea7d88 R15: 0000000000000001
>  FS:  00007fde7acdb700(0000) GS:ffff88011ea00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000034 CR3: 00000000ccc44000 CR4: 00000000001406f0
>  Call Trace:
>   create_synthetic_event+0xcb/0x580
>   ? parse_var_ref+0x120/0x120
>   trace_run_command+0x5f/0x70
>   trace_parse_run_command+0x76/0x150
>   ? parse_var_ref+0x120/0x120
>   synthetic_events_write+0x10/0x20
>   __vfs_write+0x28/0x140
>   ? vmacache_find+0xcc/0x180
>   ? rw_verify_area+0xd8/0x190
>   ? _cond_resched+0x2e/0x50
>   ? __sb_start_write+0x82/0xe0
>   vfs_write+0x138/0x260
>   SyS_write+0x4f/0xa0
>   entry_SYSCALL_64_fastpath+0x13/0x94
> 

Hmm, ok, I'm not seeing it here, but I'll take a deeper look and see
what could be going on there.

> 
> Anyway, I kinda like the approach. I'll go patch by patch to see what
> you did and maybe even take a few now, if they seem appropriate as
> helpers.
> 

OK, great, thanks!

Tom

> -- Steve

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 02/21] tracing: Reimplement log2
  2017-02-08 20:13   ` Steven Rostedt
@ 2017-02-08 20:25     ` Tom Zanussi
  0 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 20:25 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed, 2017-02-08 at 15:13 -0500, Steven Rostedt wrote:
> On Wed,  8 Feb 2017 11:24:58 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> >  static void destroy_hist_field(struct hist_field *hist_field)
> >  {
> > +	unsigned int i;
> > +
> > +	if (!hist_field)
> > +		return;
> > +
> > +	for (i = 0; i < HIST_FIELD_OPERANDS_MAX; i++)
> > +		destroy_hist_field(hist_field->operands[i]);
> 
> Recursive functions get me really nervous. What limits it? Is this user
> defined? Perhaps we need to find a better way to handle this that's not
> recursive, or at least put in a hard limit of the amount it can recurse.
> 

It's limited by the expression depth, which shouldn't be more than 1
deep, but you're right, there should be an explicit limit check, just in
case - will add one.

Tom

> -- Steve
> 
> > +
> >  	kfree(hist_field);
> >  }
> >  
> > @@ -377,7 +393,10 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
> >  	}
> >  
> >  	if (flags & HIST_FIELD_FL_LOG2) {
> > +		unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
> >  		hist_field->fn = hist_field_log2;
> > +		hist_field->operands[0] = create_hist_field(field, fl);
> > +		hist_field->size = hist_field->operands[0]->size;
> >  		goto out;
> >  	}
> >  
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type
  2017-02-08 17:24 ` [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type Tom Zanussi
@ 2017-02-08 20:32   ` Steven Rostedt
  2017-02-08 20:55     ` Tom Zanussi
  2017-02-10  6:04     ` Namhyung Kim
  0 siblings, 2 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-08 20:32 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed,  8 Feb 2017 11:24:59 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> Replace the unused RINGBUF_TYPE_TIME_STAMP ring buffer type with
> RINGBUF_TYPE_TIME_EXTEND_ABS, which forces extended time_deltas for
> all events.

Hmm, I could probably have this be used for nested commits :-/

> 
> Having time_deltas that aren't dependent on previous events in the
> ring buffer makes it feasible to use the ring_buffer_event timetamps
> in a more random-access way, to be used for purposes other than serial
> event printing.
> 
> To set/reset this mode, use tracing_set_timestamp_abs().
> 
> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> ---
>  include/linux/ring_buffer.h |  12 ++++-
>  kernel/trace/ring_buffer.c  | 109 ++++++++++++++++++++++++++++++++------------
>  kernel/trace/trace.c        |  25 +++++++++-
>  kernel/trace/trace.h        |   2 +
>  4 files changed, 117 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index b6d4568..c3a1064 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -36,6 +36,12 @@ struct ring_buffer_event {
>   *				 array[0] = time delta (28 .. 59)
>   *				 size = 8 bytes
>   *
> + * @RINGBUF_TYPE_TIME_EXTEND_ABS:
> + *				 Extend the time delta, but interpret it as
> + *				 absolute, not relative
> + *				 array[0] = time delta (28 .. 59)
> + *				 size = 8 bytes
> + *
>   * @RINGBUF_TYPE_TIME_STAMP:	Sync time stamp with external clock

I guess you need to nuke this comment too.

>   *				 array[0]    = tv_nsec
>   *				 array[1..2] = tv_sec
> @@ -56,12 +62,12 @@ enum ring_buffer_type {
>  	RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
>  	RINGBUF_TYPE_PADDING,
>  	RINGBUF_TYPE_TIME_EXTEND,
> -	/* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
> -	RINGBUF_TYPE_TIME_STAMP,
> +	RINGBUF_TYPE_TIME_EXTEND_ABS,
>  };
>  
>  unsigned ring_buffer_event_length(struct ring_buffer_event *event);
>  void *ring_buffer_event_data(struct ring_buffer_event *event);
> +u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event);
>  
>  /*
>   * ring_buffer_discard_commit will remove an event that has not
> @@ -180,6 +186,8 @@ void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
>  				      int cpu, u64 *ts);
>  void ring_buffer_set_clock(struct ring_buffer *buffer,
>  			   u64 (*clock)(void));
> +void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs);
> +bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);
>  
>  size_t ring_buffer_page_len(void *page);
>  
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index a85739e..c9c9a83 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -41,6 +41,8 @@ int ring_buffer_print_entry_header(struct trace_seq *s)
>  			 RINGBUF_TYPE_PADDING);
>  	trace_seq_printf(s, "\ttime_extend : type == %d\n",
>  			 RINGBUF_TYPE_TIME_EXTEND);
> +	trace_seq_printf(s, "\ttime_extend_abs : type == %d\n",
> +			 RINGBUF_TYPE_TIME_EXTEND_ABS);
>  	trace_seq_printf(s, "\tdata max type_len  == %d\n",
>  			 RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
>  
> @@ -186,11 +188,9 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
>  		return  event->array[0] + RB_EVNT_HDR_SIZE;
>  
>  	case RINGBUF_TYPE_TIME_EXTEND:
> +	case RINGBUF_TYPE_TIME_EXTEND_ABS:
>  		return RB_LEN_TIME_EXTEND;
>  
> -	case RINGBUF_TYPE_TIME_STAMP:
> -		return RB_LEN_TIME_STAMP;
> -
>  	case RINGBUF_TYPE_DATA:
>  		return rb_event_data_length(event);
>  	default:
> @@ -209,7 +209,8 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
>  {
>  	unsigned len = 0;
>  
> -	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
> +	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS) {

Hmm, we could micro-optimize this with:

	event->type_len > RINGBUF_TYPE_PADDING

But it would require comments and/or a wrapper to define it so people
in the future know what it is doing.


>  		/* time extends include the data event after it */
>  		len = RB_LEN_TIME_EXTEND;
>  		event = skip_time_extend(event);
> @@ -231,7 +232,8 @@ unsigned ring_buffer_event_length(struct ring_buffer_event *event)
>  {
>  	unsigned length;
>  
> -	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
> +	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
>  		event = skip_time_extend(event);
>  
>  	length = rb_event_length(event);
> @@ -248,7 +250,8 @@ unsigned ring_buffer_event_length(struct ring_buffer_event *event)
>  static __always_inline void *
>  rb_event_data(struct ring_buffer_event *event)
>  {
> -	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
> +	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
>  		event = skip_time_extend(event);
>  	BUG_ON(event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
>  	/* If length is in len field, then array[0] has the data */
> @@ -483,6 +486,7 @@ struct ring_buffer {
>  	u64				(*clock)(void);
>  
>  	struct rb_irq_work		irq_work;
> +	bool				time_stamp_abs;
>  };
>  
>  struct ring_buffer_iter {
> @@ -1377,6 +1381,16 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
>  	buffer->clock = clock;
>  }
>  
> +void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs)
> +{
> +	buffer->time_stamp_abs = abs;
> +}
> +
> +bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer)
> +{
> +	return buffer->time_stamp_abs;
> +}
> +
>  static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);
>  
>  static inline unsigned long rb_page_entries(struct buffer_page *bpage)
> @@ -2207,13 +2221,16 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
>  }
>  
>  /* Slow path, do not inline */
> -static noinline struct ring_buffer_event *
> -rb_add_time_stamp(struct ring_buffer_event *event, u64 delta)
> +static struct noinline ring_buffer_event *
> +rb_add_time_stamp(struct ring_buffer_event *event, u64 delta, bool abs)
>  {
> -	event->type_len = RINGBUF_TYPE_TIME_EXTEND;
> +	if (abs)
> +		event->type_len = RINGBUF_TYPE_TIME_EXTEND_ABS;
> +	else
> +		event->type_len = RINGBUF_TYPE_TIME_EXTEND;
>  
> -	/* Not the first event on the page? */
> -	if (rb_event_index(event)) {
> +	/* Not the first event on the page, or not delta? */
> +	if (abs || rb_event_index(event)) {
>  		event->time_delta = delta & TS_MASK;
>  		event->array[0] = delta >> TS_SHIFT;
>  	} else {
> @@ -2256,7 +2273,9 @@ static inline bool rb_event_is_commit(struct ring_buffer_per_cpu *cpu_buffer,
>  	 * add it to the start of the resevered space.
>  	 */
>  	if (unlikely(info->add_timestamp)) {
> -		event = rb_add_time_stamp(event, delta);
> +		bool abs = ring_buffer_time_stamp_abs(cpu_buffer->buffer);
> +
> +		event = rb_add_time_stamp(event, info->delta, abs);
>  		length -= RB_LEN_TIME_EXTEND;
>  		delta = 0;
>  	}
> @@ -2444,7 +2463,8 @@ static __always_inline void rb_end_commit(struct ring_buffer_per_cpu *cpu_buffer
>  
>  static inline void rb_event_discard(struct ring_buffer_event *event)
>  {
> -	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
> +	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
>  		event = skip_time_extend(event);
>  
>  	/* array[0] holds the actual length for the discarded event */
> @@ -2475,6 +2495,10 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
>  {
>  	u64 delta;
>  
> +	/* Ignore write_stamp if TIME_EXTEND_ABS */
> +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS)
> +		return;
> +

Hmm, I don't trust this. This function does a bit of book keeping as
well.


>  	/*
>  	 * The event first in the commit queue updates the
>  	 * time stamp.
> @@ -2492,8 +2516,7 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
>  			delta <<= TS_SHIFT;
>  			delta += event->time_delta;
>  			cpu_buffer->write_stamp += delta;
> -		} else
> -			cpu_buffer->write_stamp += event->time_delta;
> +		}

And why is this removed?

>  	}
>  }
>  
> @@ -2674,7 +2697,7 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
>  	 * If this is the first commit on the page, then it has the same
>  	 * timestamp as the page itself.
>  	 */
> -	if (!tail)
> +	if (!tail && !ring_buffer_time_stamp_abs(cpu_buffer->buffer))
>  		info->delta = 0;
>  
>  	/* See if we shot pass the end of this buffer page */
> @@ -2752,8 +2775,11 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
>  	/* make sure this diff is calculated here */
>  	barrier();
>  
> -	/* Did the write stamp get updated already? */
> -	if (likely(info.ts >= cpu_buffer->write_stamp)) {
> +	if (ring_buffer_time_stamp_abs(buffer)) {
> +		info.delta = info.ts;
> +		rb_handle_timestamp(cpu_buffer, &info);
> +	} else /* Did the write stamp get updated already? */
> +		if (likely(info.ts >= cpu_buffer->write_stamp)) {

OK, please break this patch up into two. Although, I may take it and
start on it as well ;-)  One with the implementation of the EXTEND_ABS,
and the other with the setting of the flags.

If we are going to implement the time stamp ext, I want to see if I can
use it to fix other issues with the ring buffer. Actually, I'm thinking
that we could keep the TIME_STAMP name, and just implement it as a full
timestamp, not a delta. I believe that was what I wanted it for in the
first place.

-- Steve


>  		info.delta = diff;
>  		if (unlikely(test_time_stamp(info.delta)))
>  			rb_handle_timestamp(cpu_buffer, &info);
> @@ -3429,8 +3455,8 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
>  		cpu_buffer->read_stamp += delta;
>  		return;
>  
> -	case RINGBUF_TYPE_TIME_STAMP:
> -		/* FIXME: not implemented */
> +	case RINGBUF_TYPE_TIME_EXTEND_ABS:
> +		/* Ignore read_stamp if TIME_EXTEND_ABS */
>  		return;
>  
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type
  2017-02-08 20:32   ` Steven Rostedt
@ 2017-02-08 20:55     ` Tom Zanussi
  2017-02-09 14:54       ` Steven Rostedt
  2017-02-10  6:04     ` Namhyung Kim
  1 sibling, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 20:55 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed, 2017-02-08 at 15:32 -0500, Steven Rostedt wrote:
> On Wed,  8 Feb 2017 11:24:59 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
...
> >  	/*
> >  	 * The event first in the commit queue updates the
> >  	 * time stamp.
> > @@ -2492,8 +2516,7 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
> >  			delta <<= TS_SHIFT;
> >  			delta += event->time_delta;
> >  			cpu_buffer->write_stamp += delta;
> > -		} else
> > -			cpu_buffer->write_stamp += event->time_delta;
> > +		}
> 
> And why is this removed?
> 

Yeah, it doesn't make sense, given that we've returned already.  Looks
like it was just a lineo..

> >  	}
> >  }
> >  
> > @@ -2674,7 +2697,7 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
> >  	 * If this is the first commit on the page, then it has the same
> >  	 * timestamp as the page itself.
> >  	 */
> > -	if (!tail)
> > +	if (!tail && !ring_buffer_time_stamp_abs(cpu_buffer->buffer))
> >  		info->delta = 0;
> >  
> >  	/* See if we shot pass the end of this buffer page */
> > @@ -2752,8 +2775,11 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
> >  	/* make sure this diff is calculated here */
> >  	barrier();
> >  
> > -	/* Did the write stamp get updated already? */
> > -	if (likely(info.ts >= cpu_buffer->write_stamp)) {
> > +	if (ring_buffer_time_stamp_abs(buffer)) {
> > +		info.delta = info.ts;
> > +		rb_handle_timestamp(cpu_buffer, &info);
> > +	} else /* Did the write stamp get updated already? */
> > +		if (likely(info.ts >= cpu_buffer->write_stamp)) {
> 
> OK, please break this patch up into two. Although, I may take it and
> start on it as well ;-)  One with the implementation of the EXTEND_ABS,
> and the other with the setting of the flags.
> 

OK, I'll break it up if I don't see you do anything with it in the
meantime..

Tom

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (21 preceding siblings ...)
  2017-02-08 20:01 ` [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Steven Rostedt
@ 2017-02-08 23:13 ` Masami Hiramatsu
  2017-02-09  1:14   ` Tom Zanussi
  2017-02-10  4:16 ` Namhyung Kim
  23 siblings, 1 reply; 56+ messages in thread
From: Masami Hiramatsu @ 2017-02-08 23:13 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: rostedt, tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

Hi Tom,

On Wed,  8 Feb 2017 11:24:56 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> This patchset adds support for 'inter-event' quantities to the trace
> event subsystem.  The most important example of inter-event quantities
> are latencies, or the time differences between two events.

Great! This is what I've been dreaming of! :)
I'd like to use it.

>   - 'actions' generating synthetic events, among other things
> 
>     Variables and synthetic events provide the data and data structure
>     for new events, but something still needs to actually generate an
>     event using that data.  'Actions' are expanded to provide that
>     capability.  Though it hasn't been explicitly called as much
>     before, the default 'action' currently for a hist trigger is to
>     update the matching histogram entry's sum values.  This patchset
>     essentially expands that to provide a new 'onmatch.trace(event)'
>     action that can be used to have one event generate another.  The
>     mechanism is extensible to other actions, and in fact the patchset
>     also includes another, 'onmax(var).save(field,...)' that can be
>     used to save context whenever a value exceeds the previous maximum
>     (something also needed by latency_hist).

BTW, I would like to comment on this grammar.

> 
> I'm submitting the patchset (based on tracing/for-next) as an RFC not
> only to get comments, but because there are still some problems I
> haven't fixed yet...
> 
> Here are some examples that should make things less abstract.
> 
>   ====
>   Example - wakeup latency
>   ====
> 
>   This basically implements the -RT latency_hist 'wakeup_latency'
>   histogram using the synthetic events, variables, and actions
>   described.  The output below is from a run of cyclictest using the
>   following command:
> 
>     # rt-tests/cyclictest -p 80 -n -s -t 2
> 
>   What we're measuring the latency of is the time between when a
>   thread (of cyclictest) is awakened and when it's scheduled in.  To
>   do that we add triggers to sched_wakeup and sched_switch with the
>   appropriate variables, and on a matching sched_switch event,
>   generate a synthetic 'wakeup_latency' event.  Since it's just
>   another trace event like any other, we can also define a histogram
>   on that event, the output of which is what we see displayed when
>   reading the wakeup_latency 'hist' file.
> 
>   First, we create a synthetic event called wakeup_latency, that
>   references 3 variables from other events:
> 
>     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
>                            pid=sched_switch:woken_pid \
>                            prio=sched_switch:woken_prio' >> \
>             /sys/kernel/debug/tracing/synthetic_events
> 
>   Next we add a trigger to sched_wakeup, which saves the value of the
>   'common_timestamp' when that event is hit in a variable, ts0.  Note
>   that this happens only when 'comm==cyclictest'.
> 
>   Also, 'common_timestamp' is a new field defined on every event (if
>   needed - if there are no users of timestamps in a trace, timestamps
>   won't be saved and there's no additional overhead from that).
> 
>     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
>              comm=="cyclictest"' >> \
>              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> 
>   Next, we add a trigger to sched_switch.  When the pid being switched
>   to matches the pid woken up by a previous sched_wakeup event, this
>   event grabs the ts0 saved on that event, takes the difference
>   between it and the current sched_switch's common_timestamp, and
>   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
>   other variables and then invokes the onmatch().trace() action which
>   generates a new wakeup_latency event using those variables.
> 
>     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
>        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
>             if next_comm=="cyclictest"' >> \
>             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Hmm, this looks a bit hard to understand, I guess that onmatch() means
"if there is an event which has ts0 variable and the event's key matches
this key, take some action".
I think there are 2 unclear points:
- Where does the 'ts0' come from? Will the variable have 'global' scope?
- What matches what? onmatch() doesn't tell us.

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-08 20:01 ` [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Steven Rostedt
  2017-02-08 20:19   ` Tom Zanussi
@ 2017-02-08 23:28   ` Tom Zanussi
  2017-02-09  2:14     ` Steven Rostedt
  1 sibling, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-08 23:28 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed, 2017-02-08 at 15:01 -0500, Steven Rostedt wrote:
> On Wed,  8 Feb 2017 11:24:56 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> >                            pid=sched_switch:woken_pid \
> >                            prio=sched_switch:woken_prio' >> \
> >             /sys/kernel/debug/tracing/synthetic_events
> 
> I applied all your patches, did the above and then:
> 
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
>  IP: free_synthetic_event+0x46/0xb0
>  PGD 0 
>  

OK, well, the problem is actually the '\' 'continuation' characters in
the above string.  I'll fix that properly, but in the meantime, removing
those chars in the documentation strings will let you try it out i.e.

# echo 'wakeup_latency lat=sched_switch:wakeup_lat pid=sched_switch:woken_pid prio=sched_switch:woken_prio' >> /sys/kernel/debug/tracing/synthetic_events

Tom

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-08 23:13 ` Masami Hiramatsu
@ 2017-02-09  1:14   ` Tom Zanussi
  2017-02-09 14:18     ` Masami Hiramatsu
  2017-02-09 14:46     ` Frank Ch. Eigler
  0 siblings, 2 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-09  1:14 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: rostedt, tglx, namhyung, linux-kernel, linux-rt-users

Hi Masami,

On Thu, 2017-02-09 at 08:13 +0900, Masami Hiramatsu wrote:
> Hi Tom,
> 
> On Wed,  8 Feb 2017 11:24:56 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> > This patchset adds support for 'inter-event' quantities to the trace
> > event subsystem.  The most important example of inter-event quantities
> > are latencies, or the time differences between two events.
> 
> Great! This is what I've been dreaming of! :)
> I'd like to use it.
> 

Glad to hear it, thanks!

> >   - 'actions' generating synthetic events, among other things
> > 
> >     Variables and synthetic events provide the data and data structure
> >     for new events, but something still needs to actually generate an
> >     event using that data.  'Actions' are expanded to provide that
> >     capability.  Though it hasn't been explicitly called as much
> >     before, the default 'action' currently for a hist trigger is to
> >     update the matching histogram entry's sum values.  This patchset
> >     essentially expands that to provide a new 'onmatch.trace(event)'
> >     action that can be used to have one event generate another.  The
> >     mechanism is extensible to other actions, and in fact the patchset
> >     also includes another, 'onmax(var).save(field,...)' that can be
> >     used to save context whenever a value exceeds the previous maximum
> >     (something also needed by latency_hist).
> 
> BTW, I would like to comment on this grammar.
> 
> > 
> > I'm submitting the patchset (based on tracing/for-next) as an RFC not
> > only to get comments, but because there are still some problems I
> > haven't fixed yet...
> > 
> > Here are some examples that should make things less abstract.
> > 
> >   ====
> >   Example - wakeup latency
> >   ====
> > 
> >   This basically implements the -RT latency_hist 'wakeup_latency'
> >   histogram using the synthetic events, variables, and actions
> >   described.  The output below is from a run of cyclictest using the
> >   following command:
> > 
> >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > 
> >   What we're measuring the latency of is the time between when a
> >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> >   do that we add triggers to sched_wakeup and sched_switch with the
> >   appropriate variables, and on a matching sched_switch event,
> >   generate a synthetic 'wakeup_latency' event.  Since it's just
> >   another trace event like any other, we can also define a histogram
> >   on that event, the output of which is what we see displayed when
> >   reading the wakeup_latency 'hist' file.
> > 
> >   First, we create a synthetic event called wakeup_latency, that
> >   references 3 variables from other events:
> > 
> >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> >                            pid=sched_switch:woken_pid \
> >                            prio=sched_switch:woken_prio' >> \
> >             /sys/kernel/debug/tracing/synthetic_events
> > 
> >   Next we add a trigger to sched_wakeup, which saves the value of the
> >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> >   that this happens only when 'comm==cyclictest'.
> > 
> >   Also, 'common_timestamp' is a new field defined on every event (if
> >   needed - if there are no users of timestamps in a trace, timestamps
> >   won't be saved and there's no additional overhead from that).
> > 
> >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> >              comm=="cyclictest"' >> \
> >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > 
> >   Next, we add a trigger to sched_switch.  When the pid being switched
> >   to matches the pid woken up by a previous sched_wakeup event, this
> >   event grabs the ts0 saved on that event, takes the difference
> >   between it and the current sched_switch's common_timestamp, and
> >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> >   other variables and then invokes the onmatch().trace() action which
> >   generates a new wakeup_latency event using those variables.
> > 
> >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> >             if next_comm=="cyclictest"' >> \
> >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> 
> Hmm, this looks a bit hard to understand, I guess that onmatch() means
> "if there is an event which has ts0 variable and the event's key matches
> this key, take some action".

Yes, that's pretty much it. It's essentially shorthand for this kind of
common idiom, where timestamp[] is an associative array, which in our
case is the tracing_map of the histogram: 

event sched_wakeup()
{
	ts0[wakeup_pid] = now()
}

event sched_switch()
{
	if (ts0[next_pid])
		latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
}

Only if ts0 has already been set does the onmatch() get invoked - if ts0
hasn't been set, there's no match and the trace(wakeup_latency) doesn't
happen.

> I think there are 2 unclear points:
> - Where does the 'ts0' come from? Will the variable have 'global' scope?

ts0 is basically a per-table-entry variable - there's one for each entry
in the table, and it can only be accessed by events with matching keys.
The table owns the variable name, so you can't have two different tables
with the ts0 variable.

So if we create a histogram on event1 and associate a variable ts0 with
it, any event hit on that histogram assigns to the corresponding entry's
ts0 instance. 

If we create a histogram on event2 which references ts0, it knows that
ts0 belongs to event1's histogram, and when there's a hit on event2, the
same key is used to look up the entry corresponding to that key on
event1, and if there's a matching entry, it grabs the value of ts0 from
that and subtracts it from the current event's value to produce the
latency or whatever it is.

So, that's a long-winded way of saying that the name ts0 is global
across all tables (histograms) but an instance of ts0 is local to each
entry in the table that owns the name.
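
In rough pseudocode, extending the sketch above (lookup() and read_var()
here are just made-up names for illustration, not actual functions):

event2_hit(key)
{
	/* use event2's key to find the entry with the same key in
	 * event1's histogram, i.e. the table that owns ts0 */
	entry = lookup(event1_histogram, key)
	if (entry) {				/* 'matching' entry found */
		ts0 = read_var(entry, "ts0")	/* that entry's ts0 instance */
		latency = now() - ts0		/* e.g. wakeup_lat */
	}
}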

> - What matches what? onmatch() doesn't tell us.
> 

It's implied by the references to other events - in order for ts0 to be
resolved, it needs to find the match on event1.  Also, the synthetic
event has references to variables on other events - in order to generate
the synthetic event, those variables also need to be resolved to
matching events - note that variables can also come from the current
event as well.

Hope that clears things up a bit (although the details under the covers
might seem confusing).

Tom  


> Thank you,
> 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-08 23:28   ` Tom Zanussi
@ 2017-02-09  2:14     ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-09  2:14 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed, 08 Feb 2017 17:28:50 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> On Wed, 2017-02-08 at 15:01 -0500, Steven Rostedt wrote:
> > On Wed,  8 Feb 2017 11:24:56 -0600
> > Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> >   
> > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > >                            pid=sched_switch:woken_pid \
> > >                            prio=sched_switch:woken_prio' >> \
> > >             /sys/kernel/debug/tracing/synthetic_events  
> > 
> > I applied all your patches, did the above and then:
> > 
> >  BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
> >  IP: free_synthetic_event+0x46/0xb0
> >  PGD 0 
> >    
> 
> OK, well, the problem is actually the '\' 'continuation' characters in
> the above string.  I'll fix that properly, but in the meantime, removing
> those chars in the documentation strings will let you try it out i.e.
> 
> # echo 'wakeup_latency lat=sched_switch:wakeup_lat pid=sched_switch:woken_pid prio=sched_switch:woken_prio' >> /sys/kernel/debug/tracing/synthetic_events

Ah, OK. Yeah, I did a cut and paste when entering it.

-- Steve

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-09  1:14   ` Tom Zanussi
@ 2017-02-09 14:18     ` Masami Hiramatsu
  2017-02-09 17:18       ` Tom Zanussi
  2017-02-09 14:46     ` Frank Ch. Eigler
  1 sibling, 1 reply; 56+ messages in thread
From: Masami Hiramatsu @ 2017-02-09 14:18 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: rostedt, tglx, namhyung, linux-kernel, linux-rt-users

Hi Tom,

On Wed, 08 Feb 2017 19:14:22 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> > > I'm submitting the patchset (based on tracing/for-next) as an RFC not
> > > only to get comments, but because there are still some problems I
> > > haven't fixed yet...
> > > 
> > > Here are some examples that should make things less abstract.
> > > 
> > >   ====
> > >   Example - wakeup latency
> > >   ====
> > > 
> > >   This basically implements the -RT latency_hist 'wakeup_latency'
> > >   histogram using the synthetic events, variables, and actions
> > >   described.  The output below is from a run of cyclictest using the
> > >   following command:
> > > 
> > >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > > 
> > >   What we're measuring the latency of is the time between when a
> > >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> > >   do that we add triggers to sched_wakeup and sched_switch with the
> > >   appropriate variables, and on a matching sched_switch event,
> > >   generate a synthetic 'wakeup_latency' event.  Since it's just
> > >   another trace event like any other, we can also define a histogram
> > >   on that event, the output of which is what we see displayed when
> > >   reading the wakeup_latency 'hist' file.
> > > 
> > >   First, we create a synthetic event called wakeup_latency, that
> > >   references 3 variables from other events:
> > > 
> > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > >                            pid=sched_switch:woken_pid \
> > >                            prio=sched_switch:woken_prio' >> \
> > >             /sys/kernel/debug/tracing/synthetic_events
> > > 
> > >   Next we add a trigger to sched_wakeup, which saves the value of the
> > >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> > >   that this happens only when 'comm==cyclictest'.
> > > 
> > >   Also, 'common_timestamp' is a new field defined on every event (if
> > >   needed - if there are no users of timestamps in a trace, timestamps
> > >   won't be saved and there's no additional overhead from that).
> > > 
> > >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> > >              comm=="cyclictest"' >> \
> > >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > 
> > >   Next, we add a trigger to sched_switch.  When the pid being switched
> > >   to matches the pid woken up by a previous sched_wakeup event, this
> > >   event grabs the ts0 saved on that event, takes the difference
> > >   between it and the current sched_switch's common_timestamp, and
> > >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> > >   other variables and then invokes the onmatch().trace() action which
> > >   generates a new wakeup_latency event using those variables.
> > > 
> > >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> > >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> > >             if next_comm=="cyclictest"' >> \
> > >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > 
> > Hmm, this looks a bit hard to understand, I guess that onmatch() means
> > "if there is an event which has ts0 variable and the event's key matches
> > this key, take some action".
> 
> Yes, that's pretty much it. It's essentially shorthand for this kind of
> common idiom, where timestamp[] is an associative array, which in our
> case is the tracing_map of the histogram: 
> 
> event sched_wakeup()
> {
> 	ts0[wakeup_pid] = now()
> }
> 
> event sched_switch()
> {
> 	if (ts0[next_pid])
> 		latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
> }
> 
> Only if ts0 has already been set does the onmatch() get invoked - if ts0
> hasn't been set, there's no match and the trace(wakeup_latency) doesn't
> happen.

OK, that reminds me of some other questions.

- Even if there is no matching ts0, will sched_switch's hist still store
  woken_pid etc. in its histogram map?
- If there is a matching ts0 and the wakeup_latency event has been triggered,
  is the matched ts0 entry removed? And in that case, what happens to
  sched_switch's hist?

> > I think there are 2 unclear points:
> > - Where does the 'ts0' come from? Will the variable have 'global' scope?
> 
> ts0 is basically a per-table-entry variable - there's one for each entry
> in the table, and it can only be accessed by events with matching keys.
> The table owns the variable name, so you can't have two different tables
> with the ts0 variable.

Do you mean 'ts0' is a special name?

> So if we create a histogram on event1 and associate a variable ts0 with
> it, any event hit on that histogram assigns to the corresponding entry's
> ts0 instance. 
> 
> If we create a histogram on event2 which references ts0, it knows that
> ts0 belongs to event1's histogram, and when there's a hit on event2, the
> same key is used to look up the entry corresponding to that key on
> event1, and if there's a matching entry, it grabs the value of ts0 from
> that and subtracts it from the current event's value to produce the
> latency or whatever it is.
> 
> So, that's a long-winded way of saying that the name ts0 is global
> across all tables (histograms) but an instance of ts0 is local to each
> entry in the table that owns the name.

Ah, what I was concerned about was the scope of the name... not the instance.

Hmm, in that case, what about the other variables in sched_switch?
It seems to have woken_pid, woken_prio and wakeup_lat. Do those also
become global instances?

Since I saw the definition below, I expected those were not global.
> > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > >                            pid=sched_switch:woken_pid \
> > >                            prio=sched_switch:woken_prio' >> \
> > >             /sys/kernel/debug/tracing/synthetic_events

And if so, it is very hard for users to check what variables are
already defined. I think we'd better have a 'global' tag for ts0.

> > - What matches what? onmatch() doesn't tell us.
> > 
> 
> It's implied by the references to other events - in order for ts0 to be
> resolved, it needs to find the match on event1.  Also, the synthetic
> event has references to variables on other events - in order to generate
> the synthetic event, those variables also need to be resolved to
> matching events - note that variables can also come from the current
> event as well.

I don't like such implicit behavior, which can easily leave users lost among
events, especially for triggers, since we don't have a system-wide list of
triggers.  IMHO, since this interface is a kind of programming interface, it
should provide an abstract but consistent system model too.  Implicit
behavior will mislead users.

> Hope that clears things up a bit (although the details under the covers
> might seem confusing).

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-09  1:14   ` Tom Zanussi
  2017-02-09 14:18     ` Masami Hiramatsu
@ 2017-02-09 14:46     ` Frank Ch. Eigler
  2017-02-09 18:43       ` Tom Zanussi
  1 sibling, 1 reply; 56+ messages in thread
From: Frank Ch. Eigler @ 2017-02-09 14:46 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Masami Hiramatsu, rostedt, tglx, namhyung, linux-kernel, linux-rt-users


Hi, Tom -


tom.zanussi wrote:

> [...]
>> Hmm, this looks a bit hard to understand, I guess that onmatch() means
>> "if there is an event which has ts0 variable and the event's key matches
>> this key, take some action".
>
> Yes, that's pretty much it. It's essentially shorthand for this kind of
> common idiom, where timestamp[] is an associative array, which in our
> case is the tracing_map of the histogram: 
>
> event sched_wakeup()
> {
> 	ts0[wakeup_pid] = now()
> }
> event sched_switch()
> {
> 	if (ts0[next_pid])
> 		latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
> }

By the way, here is a working systemtap version of this demo:

# cat foo.stp
global ts0%, latency%
function now() { return gettimeofday_us() }

probe kernel.trace("sched_wakeup") { ts0[$p->pid] = now() }

probe kernel.trace("sched_switch") {
   if (ts0[$next->pid])
      latency[$next->pid,$next->prio] <<< now() - ts0[$next->pid];
}

probe timer.s(5) {
   foreach ([pid+,x] in latency) {
      println("pid:", pid, " prio:", x)
      print(@hist_log(latency[pid,x]))
   }
   delete latency
}


# stap foo.stp
[...]
pid:20183 prio:109
value |-------------------------------------------------- count
    2 |                                                   0
    4 |                                                   0
    8 |@                                                  1
   16 |                                                   0
   32 |                                                   0

pid:29095 prio:120
value |-------------------------------------------------- count
    0 |                                                    1
    1 |@@@@                                                8
    2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             76
    4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     60
    8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 68
   16 |@@@@@@@@                                           16
   32 |                                                    0
   64 |                                                    0
[...]




> ts0 is basically a per-table-entry variable - there's one for each
> entry in the table, and it can only be accessed by events with
> matching keys.  [...]  So, that's a long-winded way of saying that the
> name ts0 is global across all tables (histograms) but an instance of
> ts0 is local to each entry in the table that owns the name.

In systemtap, one of the things we take care of is automatic concurrency
control over such shared variables.  Even if many CPUs run these same
functions and try to access the same ts0/latency hash tables at the same
time, things will work correctly.  I'm curious how your code deals with
this.


- FChE

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type
  2017-02-08 20:55     ` Tom Zanussi
@ 2017-02-09 14:54       ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-09 14:54 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users

On Wed, 08 Feb 2017 14:55:48 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:


> OK, I'll break it up if I don't see you do anything with it in the
> meantime..

I think I'll let you break it up, and then work on the part that just
handles the implementation of the time stamp. I'll have to see how it
affects trace-cmd too.

-- Steve

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-09 14:18     ` Masami Hiramatsu
@ 2017-02-09 17:18       ` Tom Zanussi
  2017-02-09 19:57         ` Steven Rostedt
  0 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-09 17:18 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: rostedt, tglx, namhyung, linux-kernel, linux-rt-users

On Thu, 2017-02-09 at 23:18 +0900, Masami Hiramatsu wrote:
> Hi Tom,
> 
> On Wed, 08 Feb 2017 19:14:22 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> > > > I'm submitting the patchset (based on tracing/for-next) as an RFC not
> > > > only to get comments, but because there are still some problems I
> > > > haven't fixed yet...
> > > > 
> > > > Here are some examples that should make things less abstract.
> > > > 
> > > >   ====
> > > >   Example - wakeup latency
> > > >   ====
> > > > 
> > > >   This basically implements the -RT latency_hist 'wakeup_latency'
> > > >   histogram using the synthetic events, variables, and actions
> > > >   described.  The output below is from a run of cyclictest using the
> > > >   following command:
> > > > 
> > > >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > > > 
> > > >   What we're measuring the latency of is the time between when a
> > > >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> > > >   do that we add triggers to sched_wakeup and sched_switch with the
> > > >   appropriate variables, and on a matching sched_switch event,
> > > >   generate a synthetic 'wakeup_latency' event.  Since it's just
> > > >   another trace event like any other, we can also define a histogram
> > > >   on that event, the output of which is what we see displayed when
> > > >   reading the wakeup_latency 'hist' file.
> > > > 
> > > >   First, we create a synthetic event called wakeup_latency, that
> > > >   references 3 variables from other events:
> > > > 
> > > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > > >                            pid=sched_switch:woken_pid \
> > > >                            prio=sched_switch:woken_prio' >> \
> > > >             /sys/kernel/debug/tracing/synthetic_events
> > > > 
> > > >   Next we add a trigger to sched_wakeup, which saves the value of the
> > > >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> > > >   that this happens only when 'comm==cyclictest'.
> > > > 
> > > >   Also, 'common_timestamp' is a new field defined on every event (if
> > > >   needed - if there are no users of timestamps in a trace, timestamps
> > > >   won't be saved and there's no additional overhead from that).
> > > > 
> > > >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> > > >              comm=="cyclictest"' >> \
> > > >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > > 
> > > >   Next, we add a trigger to sched_switch.  When the pid being switched
> > > >   to matches the pid woken up by a previous sched_wakeup event, this
> > > >   event grabs the ts0 saved on that event, takes the difference
> > > >   between it and the current sched_switch's common_timestamp, and
> > > >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> > > >   other variables and then invokes the onmatch().trace() action which
> > > >   generates a new wakeup_latency event using those variables.
> > > > 
> > > >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> > > >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> > > >             if next_comm=="cyclictest"' >> \
> > > >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > > 
> > > Hmm, this looks a bit hard to understand, I guess that onmatch() means
> > > "if there is an event which has ts0 variable and the event's key matches
> > > this key, take some action".
> > 
> > Yes, that's pretty much it. It's essentially shorthand for this kind of
> > common idiom, where timestamp[] is an associative array, which in our
> > case is the tracing_map of the histogram: 
> > 
> > event sched_wakeup()
> > {
> > 	ts0[wakeup_pid] = now()
> > }
> > 
> > event sched_switch()
> > {
> > 	if (ts0[next_pid])
> > 		latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
> > }
> > 
> > Only if ts0 has already been set does the onmatch() get invoked - if ts0
> > hasn't been set, there's no match and the trace(wakeup_latency) doesn't
> > happen.
> 
> OK, that reminds me of some other questions.
> 
> - Even if there is no matching ts0, will sched_switch's hist still store
>   woken_pid etc. in its histogram map?

Yes, the match is just to invoke the onmatch() action, but the variables
are set regardless.

> - If there is a matching ts0 and the wakeup_latency event has been triggered,
>   is the matched ts0 entry removed? And in that case, what happens to
>   sched_switch's hist?
> 

The entry isn't actually removed, but as far as ts0 goes, the result is
the same as if it had been - ts0 is a read-once variable, so once it's
used by the latency calculation, it's reset to an 'unset' state after
reading.  This is essentially how the 'if ts0[next_pid]' gets implemented -
actually, I should have added the implied removal to the pseudocode
above:

	if (ts0[next_pid]) {
		latency = now() - ts0[next_pid]
		ts0[next_pid] = null
	}

The variables on sched_switch, since they aren't referenced by any
expression, aren't read-once, and just remain as they are.

I wanted to avoid making the user explicitly specify somehow whether a
variable was 'normal' or read-once, so I figured a simplifying
assumption could be that if the variable was referenced in an
expression, that means it should be read-once, while otherwise there's
no reason it should be read-once.
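
In terms of the pseudocode above, the distinction looks roughly like this
(using the variables from the sched_switch trigger in this example):

	/* ts0 is referenced in the expression
	 * 'wakeup_lat=common_timestamp.usecs-ts0', so it's read-once: */
	wakeup_lat = now() - ts0[next_pid]
	ts0[next_pid] = null

	/* woken_pid is only ever assigned ('woken_pid=next_pid'), never
	 * referenced, so it just stays set on the entry: */
	woken_pid = next_pid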

> > > I think there are 2 unclear points:
> > > - Where does the 'ts0' come from? Will the variable have 'global' scope?
> > 
> > ts0 is basically a per-table-entry variable - there's one for each entry
> > in the table, and it can only be accessed by events with matching keys.
> > The table owns the variable name, so you can't have two different tables
> > with the ts0 variable.
> 
> Do you mean 'ts0' is a special name?
> 

Not sure what you mean by a special name here...

> > So if we create a histogram on event1 and associate a variable ts0 with
> > it, any event hit on that histogram assigns to the corresponding entry's
> > ts0 instance. 
> > 
> > If we create a histogram on event2 which references ts0, it knows that
> > ts0 belongs to event1's histogram, and when there's a hit on event2, the
> > same key is used to look up the entry corresponding to that key on
> > event1, and if there's a matching entry, it grabs the value of ts0 from
> > that and subtracts it from the current event's value to produce the
> > latency or whatever it is.
> > 
> > So, that's a long-winded way of saying that the name ts0 is global
> > across all tables (histograms) but an instance of ts0 is local to each
> > entry in the table that owns the name.
> 
> Ah, what I was concerned about was the scope of the name... not the instance.
> 
> Hmm, in that case, what about the other variables in sched_switch?
> It seems to have woken_pid, woken_prio and wakeup_lat. Do those also
> become global instances?

> Since I saw the definition below, I expected those were not global.

Actually, internally, every variable is fully scoped as
system/event/var_name, and that's what those refer to.

To simplify things for the user, I didn't want to require the user to
have to fully qualify variable names in the triggers where they're most
commonly used, e.g. common_timestamp.usecs-sched_wakeup.ts0, so I just made
the simplification that a variable name should be globally unique, and
therefore didn't implement any scope parsing in the trigger, assuming
unique names.

But I agree, there is some inconsistency between that and the below -
variables probably shouldn't be global, and if there are two variables
with the same name we should allow the user to resolve the ambiguity by
explicitly specifying the full system/event/var_name, or event/var_name,
which usually suffices.
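
i.e. something like this (hypothetical syntax just to illustrate - the
scope parsing isn't implemented yet):

    # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-sched.sched_wakeup.ts0:...' >> \
        /sys/kernel/debug/tracing/events/sched/sched_switch/trigger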

> > > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > > >                            pid=sched_switch:woken_pid \
> > > >                            prio=sched_switch:woken_prio' >> \
> > > >             /sys/kernel/debug/tracing/synthetic_events
> 
> And if so, it is very hard for users to check what variables are
> already defined. I think we'd better have a 'global' tag for ts0.
> 
> > > - What matches what? onmatch() doesn't tell us.
> > > 
> > 
> > It's implied by the references to other events - in order for ts0 to be
> > resolved, it needs to find the match on event1.  Also, the synthetic
> > event has references to variables on other events - in order to generate
> > the synthetic event, those variables also need to be resolved to
> > matching events - note that variables can also come from the current
> > event as well.
> 
> I don't like such implicit behavior, which can easily leave users lost among
> events, especially for triggers, since we don't have a system-wide list of
> triggers.  IMHO, since this interface is a kind of programming interface, it
> should provide an abstract but consistent system model too.  Implicit
> behavior will mislead users.

I think it might make a lot of sense at this point to actually create a
system-wide list of active triggers e.g. tracing/events/triggers or
something like that.  It's something I've kind of wanted anyway, and
would be really useful if not indispensable for this.  Actually, I
thought it might even be nice to have some kind of mini-fs or something
making it easy to group sets of related triggers and enable and
disable/remove them as a group, but a simple list would suffice too...
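
e.g. something like this (purely hypothetical output, just to make the
idea concrete):

  # cat /sys/kernel/debug/tracing/events/triggers
  sched/sched_wakeup  hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"
  sched/sched_switch  hist:keys=woken_pid=next_pid:...:onmatch().trace(wakeup_latency) if next_comm=="cyclictest"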

Tom   

> 
> > Hope that clears things up a bit (although the details under the covers
> > might seem confusing).
> 
> Thank you,
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-09 14:46     ` Frank Ch. Eigler
@ 2017-02-09 18:43       ` Tom Zanussi
  0 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-09 18:43 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Masami Hiramatsu, rostedt, tglx, namhyung, linux-kernel, linux-rt-users

Hi Frank,

On Thu, 2017-02-09 at 09:46 -0500, Frank Ch. Eigler wrote:
> Hi, Tom -
> 
> 
> tom.zanussi wrote:
> 
> > [...]
> >> Hmm, this looks a bit hard to understand, I guess that onmatch() means
> >> "if there is an event which has ts0 variable and the event's key matches
> >> this key, take some action".
> >
> > Yes, that's pretty much it. It's essentially shorthand for this kind of
> > common idiom, where timestamp[] is an associative array, which in our
> > case is the tracing_map of the histogram: 
> >
> > event sched_wakeup()
> > {
> > 	ts0[wakeup_pid] = now()
> > }
> > event sched_switch()
> > {
> > 	if (ts0[next_pid])
> > 		latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
> > }
> 
> By the way, here is a working systemtap version of this demo:
> 
> # cat foo.stp
> global ts0%, latency%
> function now() { return gettimeofday_us() }
> 
> probe kernel.trace("sched_wakeup") { ts0[$p->pid] = now() }
> 
> probe kernel.trace("sched_switch") {
>    if (ts0[$next->pid])
>       latency[$next->pid,$next->prio] <<< now() - ts0[$next->pid];
> }
> 
> probe timer.s(5) {
>    foreach ([pid+,x] in latency) {
>       println("pid:", pid, " prio:", x)
>       print(@hist_log(latency[pid,x]))
>    }
>    delete latency
> }
> 
> 
> # stap foo.stp
> [...]
> pid:20183 prio:109
> value |-------------------------------------------------- count
>     2 |                                                   0
>     4 |                                                   0
>     8 |@                                                  1
>    16 |                                                   0
>    32 |                                                   0
> 
> pid:29095 prio:120
> value |-------------------------------------------------- count
>     0 |                                                    1
>     1 |@@@@                                                8
>     2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             76
>     4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     60
>     8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 68
>    16 |@@@@@@@@                                           16
>    32 |                                                    0
>    64 |                                                    0
> [...]
> 
> 

Nice!

> 
> 
> > ts0 is basically a per-table-entry variable - there's one for each
> > entry in the table, and it can only be accessed by events with
> > matching keys.  [...]  So, that's a long-winded way of saying that the
> > name ts0 is global across all tables (histograms) but an instance of
> > ts0 is local to each entry in the table that owns the name.
> 
> In systemtap, one of the things we take care of is automatic concurrency
> control over such shared variables.  Even if many CPUs run these same
> functions and try to access the same ts0/latency hash tables at the same
> time, things will work correctly.  I'm curious how your code deals with
> this.
> 

The hash tables used by the hist triggers this is built on are themselves
tracing_maps, which are lock-free hash tables, and the variables themselves
are atomic, so there shouldn't be any problems with concurrent access,
unless I'm missing something...
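
Conceptually, the read-once consumption of something like ts0 boils down
to a single atomic operation - this is just a sketch to illustrate the
idea, not the actual tracing_map code (elt_var and VAR_UNSET are made up
here):

	/* writer side (e.g. sched_wakeup hit): publish ts0 on the entry */
	atomic64_set(&elt_var->val, ts);

	/* reader side (e.g. sched_switch hit): consume it exactly once -
	 * the xchg returns the old value and resets the variable in one
	 * step, so two concurrent readers can't both see the same ts0 */
	old = atomic64_xchg(&elt_var->val, VAR_UNSET);
	if (old != VAR_UNSET)
		latency = now - old;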

Tom 

> 
> - FChE

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-09 17:18       ` Tom Zanussi
@ 2017-02-09 19:57         ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-09 19:57 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Masami Hiramatsu, tglx, namhyung, linux-kernel, linux-rt-users

On Thu, 09 Feb 2017 11:18:32 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> I think it might make a lot of sense at this point to actually create a
> system-wide list of active triggers e.g. tracing/events/triggers or
> something like that.  It's something I've kind of wanted anyway, and
> would be really useful if not indispensable for this.  Actually, I
> thought it might even be nice to have some kind of mini-fs or something
> making it easy to group sets of related triggers and enable and
> disable/remove them as a group, but a simple list would suffice too...

Note, (new) trace-cmd does this for you (by querying all trigger files).

# echo 'stacktrace if COMM ~ "system*"' > /debug/tracing/events/syscalls/sys_enter_open/trigger
# trace-cmd stat

Events:
  (none enabled)

Triggers:
  syscalls:sys_enter_open "stacktrace:unlimited if COMM ~ "system*""

Tracing is enabled



-- Steve

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 08/21] tracing: Make traceprobe parsing code reusable
  2017-02-08 17:25 ` [RFC][PATCH 08/21] tracing: Make traceprobe parsing code reusable Tom Zanussi
@ 2017-02-09 20:40   ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-09 20:40 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: tglx, mhiramat, namhyung, linux-kernel, linux-rt-users,
	Srikar Dronamraju

On Wed,  8 Feb 2017 11:25:04 -0600
Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

> traceprobe_probes_write() and traceprobe_command() actually contain
> nothing that ties them to kprobes - the code is generically useful for
> similar types of parsing elsewhere, so separate it out and move it to
> trace.c/trace.h.
> 
> Other than moving it, the only change is in naming:
> traceprobe_probes_write() becomes trace_parse_run_command() and
> traceprobe_command() becomes trace_run_command().
> 
> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> ---
>  kernel/trace/trace.c        | 75 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/trace/trace.h        |  7 +++++
>  kernel/trace/trace_kprobe.c | 18 +++++------
>  kernel/trace/trace_probe.c  | 75 ---------------------------------------------
>  kernel/trace/trace_probe.h  |  7 -----
>  kernel/trace/trace_uprobe.c |  2 +-
>  6 files changed, 92 insertions(+), 92 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5868656..78dff2f 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -7912,6 +7912,81 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
>  }
>  EXPORT_SYMBOL_GPL(ftrace_dump);
>  
> +int trace_run_command(const char *buf, int (*createfn)(int, char **))
> +{
> +	char **argv;
> +	int argc, ret;
> +
> +	argc = 0;
> +	ret = 0;
> +	argv = argv_split(GFP_KERNEL, buf, &argc);
> +	if (!argv)
> +		return -ENOMEM;
> +
> +	if (argc)
> +		ret = createfn(argc, argv);
> +
> +	argv_free(argv);
> +
> +	return ret;
> +}
> +
> +#define WRITE_BUFSIZE  4096
> +
> +ssize_t trace_parse_run_command(struct file *file, const char __user *buffer,
> +				size_t count, loff_t *ppos,
> +				int (*createfn)(int, char **))
> +{
> +	char *kbuf, *tmp;
> +	int ret = 0;
> +	size_t done = 0;
> +	size_t size;
> +
> +	kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	while (done < count) {
> +		size = count - done;
> +
> +		if (size >= WRITE_BUFSIZE)
> +			size = WRITE_BUFSIZE - 1;
> +
> +		if (copy_from_user(kbuf, buffer + done, size)) {

OK, I'm looking at this code now, and I really dislike how we do a
copy_from_user() at every iteration even when not necessary.

I'm going to fix this in the trace_probe.c file, so I'm giving you a
heads-up that my change will conflict with this patch.
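
Just to illustrate the concern, one possible shape for that - a rough,
untested sketch (ignoring the comment stripping and the overlong-line
check), not necessarily what the actual fix will end up looking like:

	/* copy each chunk from userspace once, then walk all the complete
	 * lines inside it, instead of redoing the copy_from_user() for
	 * every single line */
	size = min(count - done, (size_t)WRITE_BUFSIZE - 1);
	if (copy_from_user(kbuf, buffer + done, size)) {
		ret = -EFAULT;
		goto out;
	}
	kbuf[size] = '\0';

	for (line = kbuf; (tmp = strchr(line, '\n')); line = tmp + 1) {
		*tmp = '\0';
		ret = trace_run_command(line, createfn);
		if (ret)
			goto out;
	}
	done += line - kbuf;	/* a partial trailing line gets re-copied next pass */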

-- Steve

> +			ret = -EFAULT;
> +			goto out;
> +		}
> +		kbuf[size] = '\0';
> +		tmp = strchr(kbuf, '\n');
> +
> +		if (tmp) {
> +			*tmp = '\0';
> +			size = tmp - kbuf + 1;
> +		} else if (done + size < count) {
> +			pr_warn("Line length is too long: Should be less than %d\n",
> +				WRITE_BUFSIZE);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +		done += size;
> +		/* Remove comments */
> +		tmp = strchr(kbuf, '#');
> +
> +		if (tmp)
> +			*tmp = '\0';
> +
> +		ret = trace_run_command(kbuf, createfn);
> +		if (ret)
> +			goto out;
> +	}
> +	ret = done;
> +
> +out:
> +	kfree(kbuf);
> +
> +	return ret;
> +}
> +
>  __init static int tracer_alloc_buffers(void)
>  {
>  	int ring_buf_size;
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index ac55fa1..f2af21b 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -1647,6 +1647,13 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
>  int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
>  int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);
>  
> +#define MAX_EVENT_NAME_LEN	64
> +
> +extern int trace_run_command(const char *buf, int (*createfn)(int, char**));
> +extern ssize_t trace_parse_run_command(struct file *file,
> +		const char __user *buffer, size_t count, loff_t *ppos,
> +		int (*createfn)(int, char**));
> +
>  /*
>   * Normal trace_printk() and friends allocates special buffers
>   * to do the manipulation, as well as saves the print formats
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index a133ecd..8f3b4d9 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -876,8 +876,8 @@ static int probes_open(struct inode *inode, struct file *file)
>  static ssize_t probes_write(struct file *file, const char __user *buffer,
>  			    size_t count, loff_t *ppos)
>  {
> -	return traceprobe_probes_write(file, buffer, count, ppos,
> -			create_trace_kprobe);
> +	return trace_parse_run_command(file, buffer, count, ppos,
> +				       create_trace_kprobe);
>  }
>  
>  static const struct file_operations kprobe_events_ops = {
> @@ -1402,9 +1402,9 @@ static __init int kprobe_trace_self_tests_init(void)
>  
>  	pr_info("Testing kprobe tracing: ");
>  
> -	ret = traceprobe_command("p:testprobe kprobe_trace_selftest_target "
> -				  "$stack $stack0 +0($stack)",
> -				  create_trace_kprobe);
> +	ret = trace_run_command("p:testprobe kprobe_trace_selftest_target "
> +				"$stack $stack0 +0($stack)",
> +				create_trace_kprobe);
>  	if (WARN_ON_ONCE(ret)) {
>  		pr_warn("error on probing function entry.\n");
>  		warn++;
> @@ -1424,8 +1424,8 @@ static __init int kprobe_trace_self_tests_init(void)
>  		}
>  	}
>  
> -	ret = traceprobe_command("r:testprobe2 kprobe_trace_selftest_target "
> -				  "$retval", create_trace_kprobe);
> +	ret = trace_run_command("r:testprobe2 kprobe_trace_selftest_target "
> +				"$retval", create_trace_kprobe);
>  	if (WARN_ON_ONCE(ret)) {
>  		pr_warn("error on probing function return.\n");
>  		warn++;
> @@ -1495,13 +1495,13 @@ static __init int kprobe_trace_self_tests_init(void)
>  			disable_trace_kprobe(tk, file);
>  	}
>  
> -	ret = traceprobe_command("-:testprobe", create_trace_kprobe);
> +	ret = trace_run_command("-:testprobe", create_trace_kprobe);
>  	if (WARN_ON_ONCE(ret)) {
>  		pr_warn("error on deleting a probe.\n");
>  		warn++;
>  	}
>  
> -	ret = traceprobe_command("-:testprobe2", create_trace_kprobe);
> +	ret = trace_run_command("-:testprobe2", create_trace_kprobe);
>  	if (WARN_ON_ONCE(ret)) {
>  		pr_warn("error on deleting a probe.\n");
>  		warn++;
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 8c0553d..b7de026 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -622,81 +622,6 @@ void traceprobe_free_probe_arg(struct probe_arg *arg)
>  	kfree(arg->comm);
>  }
>  
> -int traceprobe_command(const char *buf, int (*createfn)(int, char **))
> -{
> -	char **argv;
> -	int argc, ret;
> -
> -	argc = 0;
> -	ret = 0;
> -	argv = argv_split(GFP_KERNEL, buf, &argc);
> -	if (!argv)
> -		return -ENOMEM;
> -
> -	if (argc)
> -		ret = createfn(argc, argv);
> -
> -	argv_free(argv);
> -
> -	return ret;
> -}
> -
> -#define WRITE_BUFSIZE  4096
> -
> -ssize_t traceprobe_probes_write(struct file *file, const char __user *buffer,
> -				size_t count, loff_t *ppos,
> -				int (*createfn)(int, char **))
> -{
> -	char *kbuf, *tmp;
> -	int ret = 0;
> -	size_t done = 0;
> -	size_t size;
> -
> -	kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
> -	if (!kbuf)
> -		return -ENOMEM;
> -
> -	while (done < count) {
> -		size = count - done;
> -
> -		if (size >= WRITE_BUFSIZE)
> -			size = WRITE_BUFSIZE - 1;
> -
> -		if (copy_from_user(kbuf, buffer + done, size)) {
> -			ret = -EFAULT;
> -			goto out;
> -		}
> -		kbuf[size] = '\0';
> -		tmp = strchr(kbuf, '\n');
> -
> -		if (tmp) {
> -			*tmp = '\0';
> -			size = tmp - kbuf + 1;
> -		} else if (done + size < count) {
> -			pr_warn("Line length is too long: Should be less than %d\n",
> -				WRITE_BUFSIZE);
> -			ret = -EINVAL;
> -			goto out;
> -		}
> -		done += size;
> -		/* Remove comments */
> -		tmp = strchr(kbuf, '#');
> -
> -		if (tmp)
> -			*tmp = '\0';
> -
> -		ret = traceprobe_command(kbuf, createfn);
> -		if (ret)
> -			goto out;
> -	}
> -	ret = done;
> -
> -out:
> -	kfree(kbuf);
> -
> -	return ret;
> -}
> -
>  static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
>  			   bool is_return)
>  {
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 0c0ae54..37ab38c 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -42,7 +42,6 @@
>  
>  #define MAX_TRACE_ARGS		128
>  #define MAX_ARGSTR_LEN		63
> -#define MAX_EVENT_NAME_LEN	64
>  #define MAX_STRING_SIZE		PATH_MAX
>  
>  /* Reserved field names */
> @@ -356,12 +355,6 @@ extern int traceprobe_conflict_field_name(const char *name,
>  
>  extern int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset);
>  
> -extern ssize_t traceprobe_probes_write(struct file *file,
> -		const char __user *buffer, size_t count, loff_t *ppos,
> -		int (*createfn)(int, char**));
> -
> -extern int traceprobe_command(const char *buf, int (*createfn)(int, char**));
> -
>  /* Sum up total data length for dynamic arraies (strings) */
>  static nokprobe_inline int
>  __get_data_size(struct trace_probe *tp, struct pt_regs *regs)
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 4f2ba2b..10e3ec8 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -649,7 +649,7 @@ static int probes_open(struct inode *inode, struct file *file)
>  static ssize_t probes_write(struct file *file, const char __user *buffer,
>  			    size_t count, loff_t *ppos)
>  {
> -	return traceprobe_probes_write(file, buffer, count, ppos, create_trace_uprobe);
> +	return trace_parse_run_command(file, buffer, count, ppos, create_trace_uprobe);
>  }
>  
>  static const struct file_operations uprobe_events_ops = {

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
                   ` (22 preceding siblings ...)
  2017-02-08 23:13 ` Masami Hiramatsu
@ 2017-02-10  4:16 ` Namhyung Kim
  2017-02-10  9:34   ` Masami Hiramatsu
  2017-02-10 18:43   ` Tom Zanussi
  23 siblings, 2 replies; 56+ messages in thread
From: Namhyung Kim @ 2017-02-10  4:16 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

Hi Tom,

On Wed, Feb 08, 2017 at 11:24:56AM -0600, Tom Zanussi wrote:
> This patchset adds support for 'inter-event' quantities to the trace
> event subsystem.  The most important example of inter-event quantities
> are latencies, or the time differences between two events.
> 
> One of the main motivations for adding this capability is to provide a
> general-purpose base that existing existing tools such as the -RT
> latency_hist patchset can be built upon, while at the same time
> providing a simple way for users to track latencies (or any
> inter-event quantity) generically between any two events.
> 
> Previous -RT latency_hist patchsets that take advantage of the trace
> event subsystem have been submitted, but they essentially hard-code
> special-case tracepoints and application logic in ways that can't be
> reused.  It seemed to me that rather than providing a one-off patchset
> devoted specifically to generating the specific histograms in the
> latency_hist patchset, it should be possible to build the same
> functionality on top of a generic layer allowing users to do similar
> things for other non-latency_hist applications.
> 
> In addition to preliminary patches that add some basic missing
> functionality such as a common ringbuffer-derived timestamp and
> dynamically-creatable tracepoints, the overall patchset is divided up
> into a few different areas that combine to produce the overall goal
> (The Documentation patch explains all the details):

Looks very nice!

> 
>   - variables and simple expressions required to calculate a latency
> 
>     In order to calculate a latency or any inter-event value,
>     something from one event needs to be saved and later retrieved,
>     and some operation such as subtraction or addition is performed on
>     it.  This means some minimal form of variables and expressions,
>     which the first set of patches implements.  Saving and retrieving
>     events to use in a latency calculation is normally done using a
>     hash table, and that's exactly what we have with trace event hist
>     triggers, so that's where variables are instantiated, set, and
>     retrieved.  Basically, variables are set on one entry and
>     retrieved and used by a 'matching' event.
> 
>   - 'synthetic' events, combining variables from other events
> 
>     The trace event interface is based on pseudo-files associated with
>     individual events, so it wouldn't really make sense to have
>     quantities derived from multiple events attached to any one of
>     those events.  For that reason, the patchset implements a means of
>     combining variables from other events into a separate 'synthetic'
>     event, which can be treated as if it were just like any other
>     trace event in the system.
> 
>   - 'actions' generating synthetic events, among other things
> 
>     Variables and synthetic events provide the data and data structure
>     for new events, but something still needs to actually generate an
>     event using that data.  'Actions' are expanded to provide that
>     capability.  Though it hasn't been explicitly called as much
>     before, the default 'action' currently for a hist trigger is to
>     update the matching histogram entry's sum values.  This patchset
>     essentially expands that to provide a new 'onmatch.trace(event)'
>     action that can be used to have one event generate another.  The
>     mechanism is extensible to other actions, and in fact the patchset
>     also includes another, 'onmax(var).save(field,...)' that can be
>     used to save context whenever a value exceeds the previous maximum
>     (something also needed by latency_hist).
> 
> I'm submitting the patchset (based on tracing/for-next) as an RFC not
> only to get comments, but because there are still some problems I
> haven't fixed yet...
> 
> Here are some examples that should make things less abstract.
> 
>   ====
>   Example - wakeup latency
>   ====
> 
>   This basically implements the -RT latency_hist 'wakeup_latency'
>   histogram using the synthetic events, variables, and actions
>   described.  The output below is from a run of cyclictest using the
>   following command:
> 
>     # rt-tests/cyclictest -p 80 -n -s -t 2
> 
>   What we're measuring the latency of is the time between when a
>   thread (of cyclictest) is awakened and when it's scheduled in.  To
>   do that we add triggers to sched_wakeup and sched_switch with the
>   appropriate variables, and on a matching sched_switch event,
>   generate a synthetic 'wakeup_latency' event.  Since it's just
>   another trace event like any other, we can also define a histogram
>   on that event, the output of which is what we see displayed when
>   reading the wakeup_latency 'hist' file.
> 
>   First, we create a synthetic event called wakeup_latency, that
>   references 3 variables from other events:
> 
>     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
>                            pid=sched_switch:woken_pid \
>                            prio=sched_switch:woken_prio' >> \
>             /sys/kernel/debug/tracing/synthetic_events
> 
>   Next we add a trigger to sched_wakeup, which saves the value of the
>   'common_timestamp' when that event is hit in a variable, ts0.  Note
>   that this happens only when 'comm==cyclictest'.
> 
>   Also, 'common_timestamp' is a new field defined on every event (if
>   needed - if there are no users of timestamps in a trace, timestamps
>   won't be saved and there's no additional overhead from that).
> 
>     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
>              comm=="cyclictest"' >> \
>              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> 
>   Next, we add a trigger to sched_switch.  When the pid being switched
>   to matches the pid woken up by a previous sched_wakeup event, this
>   event grabs the ts0 saved on that event, takes the difference
>   between it and the current sched_switch's common_timestamp, and
>   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
>   other variables and then invokes the onmatch().trace() action which
>   generates a new wakeup_latency event using those variables.
> 
>     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
>        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
>             if next_comm=="cyclictest"' >> \
>             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

As Masami said, I think the syntax is a bit hard to understand.  Also
it'd be nice to access an event field directly (i.e. not by adding a
field in a hist).  Maybe we can use a prefix like '$' to identify hist
fields..

How about below?

  # echo 'wakeup_latency \
  		lat=sched_switch.$wakeup_lat  \
		pid=sched_switch.next_pid     \
		prio=sched_switch.next_prio' >> \
	/sys/kernel/debug/tracing/synthetic_events

  # echo 'hist: \
  		keys=pid: \
		ts0=common_timestamp.usec \
		if comm=="cyclictest"' >> \
	/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger

  # echo 'hist: \
		keys=next_pid: \
		wakeup_lat=common_timestamp.usec-$ts0: \
		onmatch(sched_wakeup).trace(wakeup_latency) \
		if next_comm=="cyclictest"' >> \
	/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

By passing an event name to 'onmatch', we can know where to find $ts0
easily IMHO.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type
  2017-02-08 20:32   ` Steven Rostedt
  2017-02-08 20:55     ` Tom Zanussi
@ 2017-02-10  6:04     ` Namhyung Kim
  2017-02-10 14:28       ` Steven Rostedt
  1 sibling, 1 reply; 56+ messages in thread
From: Namhyung Kim @ 2017-02-10  6:04 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Tom Zanussi, tglx, mhiramat, linux-kernel, linux-rt-users

On Wed, Feb 08, 2017 at 03:32:00PM -0500, Steven Rostedt wrote:
> On Wed,  8 Feb 2017 11:24:59 -0600
> Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> > Replace the unused RINGBUF_TYPE_TIME_STAMP ring buffer type with
> > RINGBUF_TYPE_TIME_EXTEND_ABS, which forces extended time_deltas for
> > all events.
> 
> Hmm, I could probably have this be used for nested commits :-/
> 
> > 
> > Having time_deltas that aren't dependent on previous events in the
> > ring buffer makes it feasible to use the ring_buffer_event timestamps
> > in a more random-access way, to be used for purposes other than serial
> > event printing.
> > 
> > To set/reset this mode, use tracing_set_timestamp_abs().
> > 
> > Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> > ---
> >  include/linux/ring_buffer.h |  12 ++++-
> >  kernel/trace/ring_buffer.c  | 109 ++++++++++++++++++++++++++++++++------------
> >  kernel/trace/trace.c        |  25 +++++++++-
> >  kernel/trace/trace.h        |   2 +
> >  4 files changed, 117 insertions(+), 31 deletions(-)
> > 
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index b6d4568..c3a1064 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -36,6 +36,12 @@ struct ring_buffer_event {
> >   *				 array[0] = time delta (28 .. 59)
> >   *				 size = 8 bytes
> >   *
> > + * @RINGBUF_TYPE_TIME_EXTEND_ABS:
> > + *				 Extend the time delta, but interpret it as
> > + *				 absolute, not relative
> > + *				 array[0] = time delta (28 .. 59)

It's not a delta.

> > + *				 size = 8 bytes
> > + *
> >   * @RINGBUF_TYPE_TIME_STAMP:	Sync time stamp with external clock
> 
> I guess you need to nuke this comment too.
> 
> >   *				 array[0]    = tv_nsec
> >   *				 array[1..2] = tv_sec
> > @@ -56,12 +62,12 @@ enum ring_buffer_type {
> >  	RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
> >  	RINGBUF_TYPE_PADDING,
> >  	RINGBUF_TYPE_TIME_EXTEND,
> > -	/* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
> > -	RINGBUF_TYPE_TIME_STAMP,
> > +	RINGBUF_TYPE_TIME_EXTEND_ABS,
> >  };
> >  
> >  unsigned ring_buffer_event_length(struct ring_buffer_event *event);
> >  void *ring_buffer_event_data(struct ring_buffer_event *event);
> > +u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event);
> >  
> >  /*
> >   * ring_buffer_discard_commit will remove an event that has not
> > @@ -180,6 +186,8 @@ void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
> >  				      int cpu, u64 *ts);
> >  void ring_buffer_set_clock(struct ring_buffer *buffer,
> >  			   u64 (*clock)(void));
> > +void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs);
> > +bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);
> >  
> >  size_t ring_buffer_page_len(void *page);
> >  
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index a85739e..c9c9a83 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -41,6 +41,8 @@ int ring_buffer_print_entry_header(struct trace_seq *s)
> >  			 RINGBUF_TYPE_PADDING);
> >  	trace_seq_printf(s, "\ttime_extend : type == %d\n",
> >  			 RINGBUF_TYPE_TIME_EXTEND);
> > +	trace_seq_printf(s, "\ttime_extend_abs : type == %d\n",
> > +			 RINGBUF_TYPE_TIME_EXTEND_ABS);
> >  	trace_seq_printf(s, "\tdata max type_len  == %d\n",
> >  			 RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
> >  
> > @@ -186,11 +188,9 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
> >  		return  event->array[0] + RB_EVNT_HDR_SIZE;
> >  
> >  	case RINGBUF_TYPE_TIME_EXTEND:
> > +	case RINGBUF_TYPE_TIME_EXTEND_ABS:
> >  		return RB_LEN_TIME_EXTEND;
> >  
> > -	case RINGBUF_TYPE_TIME_STAMP:
> > -		return RB_LEN_TIME_STAMP;
> > -
> >  	case RINGBUF_TYPE_DATA:
> >  		return rb_event_data_length(event);
> >  	default:
> > @@ -209,7 +209,8 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
> >  {
> >  	unsigned len = 0;
> >  
> > -	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
> > +	if (event->type_len == RINGBUF_TYPE_TIME_EXTEND ||
> > +	    event->type_len == RINGBUF_TYPE_TIME_EXTEND_ABS) {
> 
> Hmm, we could micro-optimize this with:
> 
> 	event->type_len > RINGBUF_TYPE_PADDING
> 
> But it would require comments and/or a wrapper to define it so people
> in the future know what it is doing.

What about

	event->type_len >= RINGBUF_TYPE_TIME_EXTEND

?  I think it's easier to understand what it's doing.

Thanks,
Namhyung

> 
> 
> >  		/* time extends include the data event after it */
> >  		len = RB_LEN_TIME_EXTEND;
> >  		event = skip_time_extend(event);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 09/21] tracing: Add hist trigger timestamp support
  2017-02-08 17:25 ` [RFC][PATCH 09/21] tracing: Add hist trigger timestamp support Tom Zanussi
@ 2017-02-10  6:14   ` Namhyung Kim
  0 siblings, 0 replies; 56+ messages in thread
From: Namhyung Kim @ 2017-02-10  6:14 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

On Wed, Feb 08, 2017 at 11:25:05AM -0600, Tom Zanussi wrote:
> Add support for a timestamp event field.  This is actually a 'pseudo-'
> event field in that it behaves like it's part of the event record, but
> is really part of the corresponding ring buffer event.
> 
> To make use of the timestamp field, users can specify "common_timestamp"
> as a field name for any histogram.  Note that this doesn't make much
> sense on its own as either a key or value, but needs to be
> supported even so, since follow-on patches will add support for making
> use of this field in time deltas.
> 
> Note that the use of this field requires the ring buffer be put into
> TIME_EXTEND_ABS mode, which saves the complete timestamp for each
> event rather than an offset.  This mode will be enabled if and only if
> a histogram makes use of the "common_timestamp" field.
> 
> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> ---
>  kernel/trace/trace_events_hist.c | 85 +++++++++++++++++++++++++++++-----------
>  1 file changed, 62 insertions(+), 23 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 4e70872..8d7f7dd 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -88,6 +88,12 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event,
>  	return (u64) ilog2(roundup_pow_of_two(val));
>  }
>  
> +static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
> +				struct ring_buffer_event *rbe)
> +{
> +	return ring_buffer_event_time_stamp(rbe);
> +}
> +
>  #define DEFINE_HIST_FIELD_FN(type)					\
>  	static u64 hist_field_##type(struct hist_field *hist_field,	\
>  				     void *event,			\
> @@ -134,6 +140,7 @@ enum hist_field_flags {
>  	HIST_FIELD_FL_SYSCALL		= 128,
>  	HIST_FIELD_FL_STACKTRACE	= 256,
>  	HIST_FIELD_FL_LOG2		= 512,
> +	HIST_FIELD_FL_TIMESTAMP		= 1024,
>  };
>  
>  struct hist_trigger_attrs {
> @@ -158,6 +165,7 @@ struct hist_trigger_data {
>  	struct trace_event_file		*event_file;
>  	struct hist_trigger_attrs	*attrs;
>  	struct tracing_map		*map;
> +	bool				enable_timestamps;
>  };
>  
>  static const char *hist_field_name(struct hist_field *field)
> @@ -404,6 +412,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
>  	hist_field = kzalloc(sizeof(struct hist_field), GFP_KERNEL);
>  	if (!hist_field)
>  		return NULL;
> +	hist_field->is_signed = false;

I think it's not needed since hist_field is allocated by kzalloc().
Also I cannot find where ->is_signed is set.
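
(I'd have expected something like the below somewhere in
create_hist_field(), just as a sketch of what I mean:)

	if (field)
		hist_field->is_signed = field->is_signed;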

>  
>  	if (flags & HIST_FIELD_FL_HITCOUNT) {
>  		hist_field->fn = hist_field_counter;
> @@ -423,6 +432,12 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
>  		goto out;
>  	}
>  
> +	if (flags & HIST_FIELD_FL_TIMESTAMP) {
> +		hist_field->fn = hist_field_timestamp;
> +		hist_field->size = sizeof(u64);
> +		goto out;
> +	}
> +
>  	if (WARN_ON_ONCE(!field))
>  		goto out;
>

[SNIP]
> @@ -1520,6 +1557,8 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
>  
>  	update_cond_flag(file);
>  
> +	tracing_set_time_stamp_abs(file->tr, true);
> +

Hmm.. does it make all events use absolute timestamps?  Where is it
turned off?
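
I'd expect something like a refcount here, so that the last hist trigger
using it turns the mode back off from the unregister path - a rough
sketch with made-up names (and ignoring locking):

	static int time_stamp_abs_ref;	/* hypothetical */

	static void hist_set_time_stamp_abs(struct trace_array *tr, bool set)
	{
		if (set) {
			if (time_stamp_abs_ref++ == 0)
				tracing_set_time_stamp_abs(tr, true);
		} else if (--time_stamp_abs_ref == 0) {
			tracing_set_time_stamp_abs(tr, false);
		}
	}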

Thanks,
Namhyung


>  	if (trace_event_trigger_enable_disable(file, 1) < 0) {
>  		list_del_rcu(&data->list);
>  		update_cond_flag(file);
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-10  4:16 ` Namhyung Kim
@ 2017-02-10  9:34   ` Masami Hiramatsu
  2017-02-10 18:58     ` Tom Zanussi
  2017-02-10 18:43   ` Tom Zanussi
  1 sibling, 1 reply; 56+ messages in thread
From: Masami Hiramatsu @ 2017-02-10  9:34 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Tom Zanussi, rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

On Fri, 10 Feb 2017 13:16:17 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> >   ====
> >   Example - wakeup latency
> >   ====
> > 
> >   This basically implements the -RT latency_hist 'wakeup_latency'
> >   histogram using the synthetic events, variables, and actions
> >   described.  The output below is from a run of cyclictest using the
> >   following command:
> > 
> >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > 
> >   What we're measuring the latency of is the time between when a
> >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> >   do that we add triggers to sched_wakeup and sched_switch with the
> >   appropriate variables, and on a matching sched_switch event,
> >   generate a synthetic 'wakeup_latency' event.  Since it's just
> >   another trace event like any other, we can also define a histogram
> >   on that event, the output of which is what we see displayed when
> >   reading the wakeup_latency 'hist' file.
> > 
> >   First, we create a synthetic event called wakeup_latency, that
> >   references 3 variables from other events:
> > 
> >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> >                            pid=sched_switch:woken_pid \
> >                            prio=sched_switch:woken_prio' >> \
> >             /sys/kernel/debug/tracing/synthetic_events
> > 
> >   Next we add a trigger to sched_wakeup, which saves the value of the
> >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> >   that this happens only when 'comm==cyclictest'.
> > 
> >   Also, 'common_timestamp' is a new field defined on every event (if
> >   needed - if there are no users of timestamps in a trace, timestamps
> >   won't be saved and there's no additional overhead from that).
> > 
> >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> >              comm=="cyclictest"' >> \
> >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > 
> >   Next, we add a trigger to sched_switch.  When the pid being switched
> >   to matches the pid woken up by a previous sched_wakeup event, this
> >   event grabs the ts0 saved on that event, takes the difference
> >   between it and the current sched_switch's common_timestamp, and
> >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> >   other variables and then invokes the onmatch().trace() action which
> >   generates a new wakeup_latency event using those variables.
> > 
> >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> >             if next_comm=="cyclictest"' >> \
> >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> 
> As Masami said, I think the syntax is a bit hard to understand.  Also
> it'd be nice to access an event field directly (i.e. not by adding a
> field in a hist).  Maybe we can use a prefix like '$' to identify hist
> fields..

Ah that's a nice idea!

> 
> How about below?
> 
>   # echo 'wakeup_latency \
>   		lat=sched_switch.$wakeup_lat  \
> 		pid=sched_switch.next_pid     \
> 		prio=sched_switch.next_prio' >> \
> 	/sys/kernel/debug/tracing/synthetic_events

Should we define these parameter assignments at this point?

I think this syntax binds the wakeup_latency event to sched_switch too tightly.
I mean, if someone triggers this event from some other event, it may easily
lose values.
So, at this point, we should only define the event name and what parameters it
has, and defer binding this event until onmatch().

>   # echo 'hist: \
>   		keys=pid: \
> 		ts0=common_timestamp.usec \
> 		if comm=="cyclictest"' >> \
> 	/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> 
>   # echo 'hist: \
> 		keys=next_pid: \
> 		wakeup_lat=common_timestamp.usec-$ts0: \
> 		onmatch(sched_wakeup).trace(wakeup_latency) \

This one seems much better to me, but I would like to ask you to call the
event directly from onmatch, like:

 "onmatch(sched_wakeup).wakeup_latency(wakeup_lat,next_pid,next_prio)"

At this point, the kernel will finalize the wakeup_latency event with
wakeup_lat, next_pid and next_prio.

> 		if next_comm=="cyclictest"' >> \
> 	/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> 
> By passing an event name to 'onmatch', we can know where to find $ts0
> easily IMHO.

Agree. That's easier to understand :)

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type
  2017-02-10  6:04     ` Namhyung Kim
@ 2017-02-10 14:28       ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2017-02-10 14:28 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Tom Zanussi, tglx, mhiramat, linux-kernel, linux-rt-users

On Fri, 10 Feb 2017 15:04:51 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > 
> > But it would require comments and/or a wrapper to define it so people
> > in the future know what it is doing.  
> 
> What about
> 
> 	event->type_len >= RINGBUF_TYPE_TIME_EXTEND
> 
> ?  I think it's easier to understand what it's doing.
> 

Either way, I'd like to have it defined as a macro.
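
Something like the below, say (the name here is just a placeholder):

	/* true for both TIME_EXTEND and TIME_EXTEND_ABS events */
	#define rb_event_is_time_extend(event) \
		((event)->type_len >= RINGBUF_TYPE_TIME_EXTEND)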

-- Steve

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-10  4:16 ` Namhyung Kim
  2017-02-10  9:34   ` Masami Hiramatsu
@ 2017-02-10 18:43   ` Tom Zanussi
  1 sibling, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-10 18:43 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

Hi Namhyung,

On Fri, 2017-02-10 at 13:16 +0900, Namhyung Kim wrote:
> Hi Tom,
> 
> On Wed, Feb 08, 2017 at 11:24:56AM -0600, Tom Zanussi wrote:
> > This patchset adds support for 'inter-event' quantities to the trace
> > event subsystem.  The most important example of inter-event quantities
> > are latencies, or the time differences between two events.
> > 
> > One of the main motivations for adding this capability is to provide a
> > general-purpose base that existing existing tools such as the -RT
> > latency_hist patchset can be built upon, while at the same time
> > providing a simple way for users to track latencies (or any
> > inter-event quantity) generically between any two events.
> > 
> > Previous -RT latency_hist patchsets that take advantage of the trace
> > event subsystem have been submitted, but they essentially hard-code
> > special-case tracepoints and application logic in ways that can't be
> > reused.  It seemed to me that rather than providing a one-off patchset
> > devoted specifically to generating the specific histograms in the
> > latency_hist patchset, it should be possible to build the same
> > functionality on top of a generic layer allowing users to do similar
> > things for other non-latency_hist applications.
> > 
> > In addition to preliminary patches that add some basic missing
> > functionality such as a common ringbuffer-derived timestamp and
> > dynamically-creatable tracepoints, the overall patchset is divided up
> > into a few different areas that combine to produce the overall goal
> > (The Documentation patch explains all the details):
> 
> Looks very nice!
> 

Thanks!

...
 
> >   First, we create a synthetic event called wakeup_latency, that
> >   references 3 variables from other events:
> > 
> >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> >                            pid=sched_switch:woken_pid \
> >                            prio=sched_switch:woken_prio' >> \
> >             /sys/kernel/debug/tracing/synthetic_events
> > 
> >   Next we add a trigger to sched_wakeup, which saves the value of the
> >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> >   that this happens only when 'comm==cyclictest'.
> > 
> >   Also, 'common_timestamp' is a new field defined on every event (if
> >   needed - if there are no users of timestamps in a trace, timestamps
> >   won't be saved and there's no additional overhead from that).
> > 
> >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> >              comm=="cyclictest"' >> \
> >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > 
> >   Next, we add a trigger to sched_switch.  When the pid being switched
> >   to matches the pid woken up by a previous sched_wakeup event, this
> >   event grabs the ts0 saved on that event, takes the difference
> >   between it and the current sched_switch's common_timestamp, and
> >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> >   other variables and then invokes the onmatch().trace() action which
> >   generates a new wakeup_latency event using those variables.
> > 
> >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> >             if next_comm=="cyclictest"' >> \
> >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> 
> As Masami said, I think the syntax is a bit hard to understand.  Also
> it'd be nice to access an event field directly (i.e. not by adding a
> field in a hist).  Maybe we can use a prefix like '$' to identify hist
> fields..

Yes, that's a good point, and I like the $ syntax - it makes the
variables obvious to users.

> 
> How about below?
> 
>   # echo 'wakeup_latency \
>   		lat=sched_switch.$wakeup_lat  \
> 		pid=sched_switch.next_pid     \
> 		prio=sched_switch.next_prio' >> \
> 	/sys/kernel/debug/tracing/synthetic_events
> 
>   # echo 'hist: \
>   		keys=pid: \
> 		ts0=common_timestamp.usec \
> 		if comm=="cyclictest"' >> \
> 	/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> 
>   # echo 'hist: \
> 		keys=next_pid: \
> 		wakeup_lat=common_timestamp.usec-$ts0: \
> 		onmatch(sched_wakeup).trace(wakeup_latency) \
> 		if next_comm=="cyclictest"' >> \
> 	/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> 
> By passing an event name to 'onmatch', we can know where to find $ts0
> easily IMHO.
> 

I think that also makes a lot of sense - keeping things as explicit as
possible makes it clear what's going on.  I'll make these changes,
modulo the comments along similar lines from Masami.

Thanks for the excellent suggestions!

Tom

> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-10  9:34   ` Masami Hiramatsu
@ 2017-02-10 18:58     ` Tom Zanussi
  2017-02-13  1:04       ` Namhyung Kim
  0 siblings, 1 reply; 56+ messages in thread
From: Tom Zanussi @ 2017-02-10 18:58 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Namhyung Kim, rostedt, tglx, linux-kernel, linux-rt-users

Hi Masami,

On Fri, 2017-02-10 at 18:34 +0900, Masami Hiramatsu wrote:
> On Fri, 10 Feb 2017 13:16:17 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > >   ====
> > >   Example - wakeup latency
> > >   ====
> > > 
> > >   This basically implements the -RT latency_hist 'wakeup_latency'
> > >   histogram using the synthetic events, variables, and actions
> > >   described.  The output below is from a run of cyclictest using the
> > >   following command:
> > > 
> > >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > > 
> > >   What we're measuring the latency of is the time between when a
> > >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> > >   do that we add triggers to sched_wakeup and sched_switch with the
> > >   appropriate variables, and on a matching sched_switch event,
> > >   generate a synthetic 'wakeup_latency' event.  Since it's just
> > >   another trace event like any other, we can also define a histogram
> > >   on that event, the output of which is what we see displayed when
> > >   reading the wakeup_latency 'hist' file.
> > > 
> > >   First, we create a synthetic event called wakeup_latency, that
> > >   references 3 variables from other events:
> > > 
> > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > >                            pid=sched_switch:woken_pid \
> > >                            prio=sched_switch:woken_prio' >> \
> > >             /sys/kernel/debug/tracing/synthetic_events
> > > 
> > >   Next we add a trigger to sched_wakeup, which saves the value of the
> > >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> > >   that this happens only when 'comm==cyclictest'.
> > > 
> > >   Also, 'common_timestamp' is a new field defined on every event (if
> > >   needed - if there are no users of timestamps in a trace, timestamps
> > >   won't be saved and there's no additional overhead from that).
> > > 
> > >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> > >              comm=="cyclictest"' >> \
> > >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > 
> > >   Next, we add a trigger to sched_switch.  When the pid being switched
> > >   to matches the pid woken up by a previous sched_wakeup event, this
> > >   event grabs the ts0 saved on that event, takes the difference
> > >   between it and the current sched_switch's common_timestamp, and
> > >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> > >   other variables and then invokes the onmatch().trace() action which
> > >   generates a new wakeup_latency event using those variables.
> > > 
> > >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> > >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> > >             if next_comm=="cyclictest"' >> \
> > >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > 
> > As Masami said, I think the syntax is a bit hard to understand.  Also
> > it'd be nice to access an event field directly (i.e. not by adding a
> > field in a hist).  Maybe we can use a prefix like '$' to identify hist
> > fields..
> 
> Ah that's a nice idea!
> 
> > 
> > How about below?
> > 
> >   # echo 'wakeup_latency \
> >   		lat=sched_switch.$wakeup_lat  \
> > 		pid=sched_switch.next_pid     \
> > 		prio=sched_switch.next_prio' >> \
> > 	/sys/kernel/debug/tracing/synthetic_events
> 
> Should we define these parameter assignment at this.point?
> 
> I think this syntax binds wakeup_latency event to sched_switch too tight. I 
> mean, if someone kicks this event from some other event, it may easily lose 
> values.
> So, at this point, we will define event name and what parameters it has,
> until binding this event to onmatch().
> 

Right, I agree this binding doesn't need to be done here, good idea to
defer it as below...

> >   # echo 'hist: \
> >   		keys=pid: \
> > 		ts0=common_timestamp.usec \
> > 		if comm=="cyclictest"' >> \
> > 	/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > 
> >   # echo 'hist: \
> > 		keys=next_pid: \
> > 		wakeup_lat=common_timestamp.usec-$ts0: \
> > 		onmatch(sched_wakeup).trace(wakeup_latency) \
> 
> This one seems much better for me, but I would like to ask you call event 
> directly from onmatch, like as
> 
>  "onmatch(sched_wakeup).wakeup_latency(wakeup_lat,next_pid,next_prio)"
> 
> At this point, kernel will finalize the wakeup_latency event with wakeup_lat,
> next_pid and next_prio.
> 

Yes, I like this much better - things are no longer so implicit and
therefore subject to confusion, and the syntax itself makes more sense,
even if it is a bit more verbose on the trigger, which is fine.

Thanks for taking the time to think about this and for suggesting these
great ideas..

Tom

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-10 18:58     ` Tom Zanussi
@ 2017-02-13  1:04       ` Namhyung Kim
  2017-02-14  9:37         ` Masami Hiramatsu
  2017-02-14 15:27         ` Tom Zanussi
  0 siblings, 2 replies; 56+ messages in thread
From: Namhyung Kim @ 2017-02-13  1:04 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: Masami Hiramatsu, rostedt, tglx, linux-kernel, linux-rt-users

On Fri, Feb 10, 2017 at 12:58:30PM -0600, Tom Zanussi wrote:
> Hi Masami,
> 
> On Fri, 2017-02-10 at 18:34 +0900, Masami Hiramatsu wrote:
> > On Fri, 10 Feb 2017 13:16:17 +0900
> > Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> > > >   ====
> > > >   Example - wakeup latency
> > > >   ====
> > > > 
> > > >   This basically implements the -RT latency_hist 'wakeup_latency'
> > > >   histogram using the synthetic events, variables, and actions
> > > >   described.  The output below is from a run of cyclictest using the
> > > >   following command:
> > > > 
> > > >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > > > 
> > > >   What we're measuring the latency of is the time between when a
> > > >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> > > >   do that we add triggers to sched_wakeup and sched_switch with the
> > > >   appropriate variables, and on a matching sched_switch event,
> > > >   generate a synthetic 'wakeup_latency' event.  Since it's just
> > > >   another trace event like any other, we can also define a histogram
> > > >   on that event, the output of which is what we see displayed when
> > > >   reading the wakeup_latency 'hist' file.
> > > > 
> > > >   First, we create a synthetic event called wakeup_latency, that
> > > >   references 3 variables from other events:
> > > > 
> > > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > > >                            pid=sched_switch:woken_pid \
> > > >                            prio=sched_switch:woken_prio' >> \
> > > >             /sys/kernel/debug/tracing/synthetic_events
> > > > 
> > > >   Next we add a trigger to sched_wakeup, which saves the value of the
> > > >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> > > >   that this happens only when 'comm==cyclictest'.
> > > > 
> > > >   Also, 'common_timestamp' is a new field defined on every event (if
> > > >   needed - if there are no users of timestamps in a trace, timestamps
> > > >   won't be saved and there's no additional overhead from that).
> > > > 
> > > >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> > > >              comm=="cyclictest"' >> \
> > > >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > > 
> > > >   Next, we add a trigger to sched_switch.  When the pid being switched
> > > >   to matches the pid woken up by a previous sched_wakeup event, this
> > > >   event grabs the ts0 saved on that event, takes the difference
> > > >   between it and the current sched_switch's common_timestamp, and
> > > >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> > > >   other variables and then invokes the onmatch().trace() action which
> > > >   generates a new wakeup_latency event using those variables.
> > > > 
> > > >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> > > >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> > > >             if next_comm=="cyclictest"' >> \
> > > >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > > 
> > > As Masami said, I think the syntax is a bit hard to understand.  Also
> > > it'd be nice to access an event field directly (i.e. not by adding a
> > > field in a hist).  Maybe we can use a prefix like '$' to identify hist
> > > fields..
> > 
> > Ah that's a nice idea!
> > 
> > > 
> > > How about below?
> > > 
> > >   # echo 'wakeup_latency \
> > >   		lat=sched_switch.$wakeup_lat  \
> > > 		pid=sched_switch.next_pid     \
> > > 		prio=sched_switch.next_prio' >> \
> > > 	/sys/kernel/debug/tracing/synthetic_events
> > 
> > Should we define these parameter assignment at this.point?
> > 
> > I think this syntax binds wakeup_latency event to sched_switch too tight. I 
> > mean, if someone kicks this event from some other event, it may easily lose 
> > values.
> > So, at this point, we will define event name and what parameters it has,
> > until binding this event to onmatch().
> > 
> 
> Right, I agree this binding doesn't need to be done here, good idea to
> defer it as below...
> 
> > >   # echo 'hist: \
> > >   		keys=pid: \
> > > 		ts0=common_timestamp.usec \
> > > 		if comm=="cyclictest"' >> \
> > > 	/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > 
> > >   # echo 'hist: \
> > > 		keys=next_pid: \
> > > 		wakeup_lat=common_timestamp.usec-$ts0: \
> > > 		onmatch(sched_wakeup).trace(wakeup_latency) \
> > 
> > This one seems much better for me, but I would like to ask you call event 
> > directly from onmatch, like as
> > 
> >  "onmatch(sched_wakeup).wakeup_latency(wakeup_lat,next_pid,next_prio)"
> > 
> > At this point, kernel will finalize the wakeup_latency event with wakeup_lat,
> > next_pid and next_prio.
> > 
> 
> Yes, I like this much better - things are no longer so implicit and
> therefore subject to confusion, and the syntax itself makes more sense,
> even if it is a bit more verbose on the trigger, which is fine.

I thought about it too, but then it needs some kind of type checking.
What if another hist generates the event with totally different info?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 11/21] tracing: Add variable support to hist triggers
  2017-02-08 17:25 ` [RFC][PATCH 11/21] tracing: Add variable support to hist triggers Tom Zanussi
@ 2017-02-13  6:03   ` Namhyung Kim
  2017-02-14 15:25     ` Tom Zanussi
  0 siblings, 1 reply; 56+ messages in thread
From: Namhyung Kim @ 2017-02-13  6:03 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

On Wed, Feb 08, 2017 at 11:25:07AM -0600, Tom Zanussi wrote:
> Add support for saving the value of a current event's event field by
> assigning it to a variable that can be read by a subsequent event.
> 
> The basic syntax for saving a variable is to simply prefix a unique
> variable name not corresponding to any keyword along with an '=' sign
> to any event field.
> 
> Both keys and values can be saved and retrieved in this way:
> 
>     # echo 'hist:keys=next_pid:vals=ts0=common_timestamp ...
>     # echo 'hist:key=timer_pid=common_pid ...'
> 
> If a variable isn't a key variable or prefixed with 'vals=', the
> associated event field will be saved in a variable but won't be summed
> as a value:
> 
>     # echo 'hist:keys=next_pid:ts1=common_timestamp:...
> 
> Multiple variables can be assigned at the same time:
> 
>     # echo 'hist:keys=pid:vals=ts0=common_timestamp,b=field1,field2 ...
> 
> Variables set as above can be used by being referenced from another
> event, as described in a subsequent patch.
> 
> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> ---
>  kernel/trace/trace_events_hist.c | 160 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 131 insertions(+), 29 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 8d7f7dd..e707577 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -29,6 +29,7 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
>  				struct ring_buffer_event *rbe);
>  
>  #define HIST_FIELD_OPERANDS_MAX	2
> +#define HIST_ASSIGNMENT_MAX	4
>  
>  struct hist_field {
>  	struct ftrace_event_field	*field;
> @@ -36,8 +37,10 @@ struct hist_field {
>  	hist_field_fn_t			fn;
>  	unsigned int			size;
>  	unsigned int			offset;
> -	unsigned int                    is_signed;
> +	unsigned int			is_signed;

It seems like an unnecessary change.

>  	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
> +	u64				var_val;
> +	char				*var_name;
>  };
>  
>  static u64 hist_field_none(struct hist_field *field, void *event,
> @@ -140,12 +143,16 @@ enum hist_field_flags {
>  	HIST_FIELD_FL_SYSCALL		= 128,
>  	HIST_FIELD_FL_STACKTRACE	= 256,
>  	HIST_FIELD_FL_LOG2		= 512,
> -	HIST_FIELD_FL_TIMESTAMP		= 1024,
> +	HIST_FIELD_FL_VAR		= 1024,
> +	HIST_FIELD_FL_VAR_ONLY		= 2048,
> +	HIST_FIELD_FL_TIMESTAMP		= 4096,

Why did you move the timestamp?

>  };
>  
>  struct hist_trigger_attrs {
>  	char		*keys_str;
>  	char		*vals_str;
> +	char		*assignment_str[HIST_ASSIGNMENT_MAX];
> +	unsigned int	n_assignments;
>  	char		*sort_key_str;
>  	char		*name;
>  	bool		pause;
> @@ -241,9 +248,14 @@ static int parse_map_size(char *str)
>  
>  static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
>  {
> +	unsigned int i;
> +
>  	if (!attrs)
>  		return;
>  
> +	for (i = 0; i < attrs->n_assignments; i++)
> +		kfree(attrs->assignment_str[i]);
> +
>  	kfree(attrs->name);
>  	kfree(attrs->sort_key_str);
>  	kfree(attrs->keys_str);
> @@ -258,9 +270,9 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
>  	if ((strncmp(str, "key=", strlen("key=")) == 0) ||
>  	    (strncmp(str, "keys=", strlen("keys=")) == 0))
>  		attrs->keys_str = kstrdup(str, GFP_KERNEL);
> -	else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
> -		 (strncmp(str, "vals=", strlen("vals=")) == 0) ||
> -		 (strncmp(str, "values=", strlen("values=")) == 0))
> +	else if (((strncmp(str, "val=", strlen("val=")) == 0) ||
> +		  (strncmp(str, "vals=", strlen("vals=")) == 0) ||
> +		  (strncmp(str, "values=", strlen("values=")) == 0)))

Looks unnecessary too.

>  		attrs->vals_str = kstrdup(str, GFP_KERNEL);
>  	else if (strncmp(str, "sort=", strlen("sort=")) == 0)
>  		attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
> @@ -274,8 +286,22 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
>  			goto out;
>  		}
>  		attrs->map_bits = map_bits;
> -	} else
> -		ret = -EINVAL;
> +	} else {
> +		char *assignment;
> +
> +		if (attrs->n_assignments == HIST_ASSIGNMENT_MAX) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		assignment = kstrdup(str, GFP_KERNEL);
> +		if (!assignment) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		attrs->assignment_str[attrs->n_assignments++] = assignment;
> +	}
>   out:
>  	return ret;
>  }

[SNIP]
> @@ -839,8 +913,7 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
>  			idx = tracing_map_add_key_field(map,
>  							hist_field->offset,
>  							cmp_fn);
> -
> -		} else
> +		} else if (!(hist_field->flags & HIST_FIELD_FL_VAR))
>  			idx = tracing_map_add_sum_field(map);
>  
>  		if (idx < 0)
> @@ -931,6 +1004,11 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
>  	for_each_hist_val_field(i, hist_data) {
>  		hist_field = hist_data->fields[i];
>  		hist_val = hist_field->fn(hist_field, rec, rbe);
> +		if (hist_field->flags & HIST_FIELD_FL_VAR) {
> +			hist_field->var_val = hist_val;
> +			if (hist_field->flags & HIST_FIELD_FL_VAR_ONLY)
> +				continue;
> +		}
>  		tracing_map_update_sum(elt, i, hist_val);

Hmm.. you didn't add a tracing_map sum field for HIST_FIELD_FL_VAR, but
this still attempts to update it, no?


>  	}
>  }
> @@ -996,17 +1074,21 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
>  			} else
>  				key = (void *)&field_contents;
>  		}
> -
>  		if (use_compound_key)
>  			add_to_key(compound_key, key, key_field, rec);
> +
> +		if (key_field->flags & HIST_FIELD_FL_VAR)
> +			key_field->var_val = (u64)key;
>  	}
>  
>  	if (use_compound_key)
>  		key = compound_key;
>  
>  	elt = tracing_map_insert(hist_data->map, key);
> -	if (elt)
> -		hist_trigger_elt_update(hist_data, elt, rec, rbe);
> +	if (!elt)
> +		return;
> +
> +	hist_trigger_elt_update(hist_data, elt, rec, rbe);
>  }
>  
>  static void hist_trigger_stacktrace_print(struct seq_file *m,
> @@ -1228,7 +1310,12 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
>  {
>  	const char *field_name = hist_field_name(hist_field);
>  
> -	seq_printf(m, "%s", field_name);
> +	if (hist_field->var_name)
> +		seq_printf(m, "%s=", hist_field->var_name);
> +
> +	if (field_name)
> +		seq_printf(m, "%s", field_name);
> +
>  	if (hist_field->flags) {
>  		const char *flags_str = get_hist_field_flags(hist_field);
>  
> @@ -1237,6 +1324,16 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
>  	}
>  }
>  
> +static bool var_only(struct hist_trigger_data *hist_data)
> +{
> +	unsigned int i;
> +
> +	for_each_hist_val_field(i, hist_data)
> +		if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR_ONLY)
> +			return true;
> +	return false;
> +}

This means if there's a var-only field, the whole hist will be treated
as var-only, right?  What if it has normal fields and var-only fields
at the same time?  Didn't it have "hitcount"?  Please see below..

> +
>  static int event_hist_trigger_print(struct seq_file *m,
>  				    struct event_trigger_ops *ops,
>  				    struct event_trigger_data *data)
> @@ -1266,15 +1363,19 @@ static int event_hist_trigger_print(struct seq_file *m,
>  			hist_field_print(m, key_field);
>  	}
>  
> -	seq_puts(m, ":vals=");
> +	if (!var_only(hist_data))
> +		seq_puts(m, ":vals=");
> +	else
> +		seq_puts(m, ":");
>  
>  	for_each_hist_val_field(i, hist_data) {
> -		if (i == HITCOUNT_IDX)
> +		if (i == HITCOUNT_IDX && !var_only(hist_data))
>  			seq_puts(m, "hitcount");

Looks like a var-only hist cannot have a hitcount, right?

>  		else if (hist_data->fields[i]->flags & HIST_FIELD_FL_TIMESTAMP)
>  			seq_puts(m, "common_timestamp");
>  		else {
> -			seq_puts(m, ",");
> +			if (!var_only(hist_data))
> +				seq_puts(m, ",");

If a var-only hist can have multiple fields, it should print ","
as well IMHO.  It seems "common_timestamp" needs it too.

Thanks,
Namhyung

>  			hist_field_print(m, hist_data->fields[i]);
>  		}
>  	}
> @@ -1673,6 +1774,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
>  	}
>  
>  	ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> +
>  	/*
>  	 * The above returns on success the # of triggers registered,
>  	 * but if it didn't register any it returns zero.  Consider no
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 12/21] tracing: Account for variables in named trigger compatibility
  2017-02-08 17:25 ` [RFC][PATCH 12/21] tracing: Account for variables in named trigger compatibility Tom Zanussi
@ 2017-02-13  6:04   ` Namhyung Kim
  2017-02-14 15:26     ` Tom Zanussi
  0 siblings, 1 reply; 56+ messages in thread
From: Namhyung Kim @ 2017-02-13  6:04 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

On Wed, Feb 08, 2017 at 11:25:08AM -0600, Tom Zanussi wrote:
> Named triggers must also have the same set of variables in order to be
> considered compatible - update the trigger match test to account for
> that.
> 
> The reason for this requirement is that named triggers with variables
> are meant to allow one or more events to set the same variable.
> 
> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> ---
>  kernel/trace/trace_events_hist.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index e707577..889455e 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1576,6 +1576,10 @@ static bool hist_trigger_match(struct event_trigger_data *data,
>  			return false;
>  		if (key_field->is_signed != key_field_test->is_signed)
>  			return false;
> +		if ((key_field->var_name && !key_field_test->var_name) ||
> +		    (!key_field->var_name && key_field_test->var_name) ||
> +		    strcmp(key_field->var_name, key_field_test->var_name) != 0)
> +			return false;

What if key_field->var_name and key_field_test->var_name are both
NULL?
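
With both NULL, strcmp() would be called with NULL pointers.  Something
like the below would avoid that (untested sketch):

	if ((key_field->var_name == NULL) != (key_field_test->var_name == NULL))
		return false;
	if (key_field->var_name &&
	    strcmp(key_field->var_name, key_field_test->var_name) != 0)
		return false;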

Thanks,
Namhyung

>  	}
>  
>  	for (i = 0; i < hist_data->n_sort_keys; i++) {
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 13/21] tracing: Add simple expression support to hist triggers
  2017-02-08 17:25 ` [RFC][PATCH 13/21] tracing: Add simple expression support to hist triggers Tom Zanussi
@ 2017-02-14  2:37   ` Namhyung Kim
  2017-02-14 15:29     ` Tom Zanussi
  0 siblings, 1 reply; 56+ messages in thread
From: Namhyung Kim @ 2017-02-14  2:37 UTC (permalink / raw)
  To: Tom Zanussi; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

On Wed, Feb 08, 2017 at 11:25:09AM -0600, Tom Zanussi wrote:
> Add support for simple addition, subtraction, and unary expressions
> (-(expr) and expr, where expr = b-a, a+b, a+b+c) to hist triggers, in
> order to support a minimal set of useful inter-event calculations.
> 
> These operations are needed for calculating latencies between events
> (timestamp1-timestamp0) and for combined latencies (latencies over 3
> or more events).
> 
> In the process, factor out some common code from key and value
> parsing.
> 
> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> ---

[SNIP]
> +static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
> +				     struct trace_event_file *file,
> +				     char *str, unsigned long flags,
> +				     char *var_name);
> +
> +static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
> +				      struct trace_event_file *file,
> +				      char *str, unsigned long flags,
> +				      char *var_name)
> +{
> +	struct hist_field *operand1, *expr = NULL;
> +	struct ftrace_event_field *field = NULL;
> +	unsigned long operand_flags;
> +	char *operand1_str;
> +	int ret = 0;
> +	char *s;
> +
> +	// we support only -(xxx) i.e. explicit parens required
> +
> +	str++; // skip leading '-'
> +
> +	s = strchr(str, '(');
> +	if (s)
> +		str++;
> +	else {
> +		ret = -EINVAL;
> +		goto free;
> +	}
> +
> +	s = strchr(str, ')');
> +	if (s)
> +		*s = '\0';
> +	else {
> +		ret = -EINVAL; // no closing ')'
> +		goto free;
> +	}
> +
> +	operand1_str = strsep(&str, "(");
> +	if (!operand1_str)
> +		goto free;
> +
> +	flags |= HIST_FIELD_FL_EXPR;
> +	expr = create_hist_field(NULL, flags, var_name);
> +	if (!expr) {
> +		ret = -ENOMEM;
> +		goto free;
> +	}
> +
> +	operand_flags = 0;
> +	operand1 = parse_expr(hist_data, file, str, operand_flags, NULL);

Doesn't it create an unbounded recursion?

Thanks,
Namhyung


> +	if (IS_ERR(operand1)) {
> +		ret = PTR_ERR(operand1);
> +		goto free;
> +	}
> +
> +	if (operand1 == NULL) {
> +		operand_flags = 0;
> +		field = parse_field(hist_data, file, operand1_str,
> +				    &operand_flags);
> +		if (IS_ERR(field)) {
> +			ret = PTR_ERR(field);
> +			goto free;
> +		}
> +		operand1 = create_hist_field(field, operand_flags, NULL);
> +		if (!operand1) {
> +			ret = -ENOMEM;
> +			goto free;
> +		}
> +	}
> +
> +	expr->fn = hist_field_unary_minus;
> +	expr->operands[0] = operand1;
> +	expr->operator = FIELD_OP_UNARY_MINUS;
> +	expr->name = expr_str(expr);
> +
> +	return expr;
> + free:
> +	return ERR_PTR(ret);
> +}
> +
> +static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
> +				     struct trace_event_file *file,
> +				     char *str, unsigned long flags,
> +				     char *var_name)
> +{
> +	struct hist_field *operand1, *operand2, *expr = NULL;
> +	struct ftrace_event_field *field = NULL;
> +	unsigned long operand_flags;
> +	int field_op, ret = -EINVAL;
> +	char *sep, *operand1_str;
> +
> +	field_op = contains_operator(str);
> +	if (field_op == FIELD_OP_NONE)
> +		return NULL;
> +
> +	if (field_op == FIELD_OP_UNARY_MINUS)
> +		return parse_unary(hist_data, file, str, flags, var_name);
> +
> +	switch (field_op) {
> +	case FIELD_OP_MINUS:
> +		sep = "-";
> +		break;
> +	case FIELD_OP_PLUS:
> +		sep = "+";
> +		break;
> +	default:
> +		goto free;
> +	}
> +
> +	operand1_str = strsep(&str, sep);
> +	if (!operand1_str || !str)
> +		goto free;
> +
> +	operand_flags = 0;
> +	field = parse_field(hist_data, file, operand1_str, &operand_flags);
> +	if (IS_ERR(field)) {
> +		ret = PTR_ERR(field);
> +		goto free;
> +	}
> +	operand1 = create_hist_field(field, operand_flags, NULL);
> +	if (!operand1) {
> +		ret = -ENOMEM;
> +		operand1 = NULL;
> +		goto free;
> +	}
> +
> +	// rest of string could be another expression e.g. b+c in a+b+c
> +	operand_flags = 0;
> +	operand2 = parse_expr(hist_data, file, str, operand_flags, NULL);
> +	if (IS_ERR(operand2)) {
> +		ret = PTR_ERR(operand2);
> +		operand2 = NULL;
> +		goto free;
> +	}
> +	if (!operand2) {
> +		operand_flags = 0;
> +		field = parse_field(hist_data, file, str, &operand_flags);
> +		if (IS_ERR(field)) {
> +			ret = PTR_ERR(field);
> +			goto free;
> +		}
> +		operand2 = create_hist_field(field, operand_flags, NULL);
> +		if (!operand2) {
> +			ret = -ENOMEM;
> +			operand2 = NULL;
> +			goto free;
> +		}
> +	}
> +
> +	flags |= HIST_FIELD_FL_EXPR;
> +	expr = create_hist_field(NULL, flags, var_name);
> +	if (!expr) {
> +		ret = -ENOMEM;
> +		goto free;
> +	}
> +
> +	expr->operands[0] = operand1;
> +	expr->operands[1] = operand2;
> +	expr->operator = field_op;
> +	expr->name = expr_str(expr);
> +
> +	switch (field_op) {
> +	case FIELD_OP_MINUS:
> +		expr->fn = hist_field_minus;
> +		break;
> +	case FIELD_OP_PLUS:
> +		expr->fn = hist_field_plus;
> +		break;
> +	default:
> +		goto free;
> +	}
> +
> +	return expr;
> + free:
> +	destroy_hist_field(operand1);
> +	destroy_hist_field(operand2);
> +	destroy_hist_field(expr);
> +
> +	return ERR_PTR(ret);
> +}
> +
>  static int create_hitcount_val(struct hist_trigger_data *hist_data)
>  {
>  	hist_data->fields[HITCOUNT_IDX] =
> @@ -529,8 +874,9 @@ static int create_val_field(struct hist_trigger_data *hist_data,
>  			    char *field_str, char *var_name)
>  {
>  	struct ftrace_event_field *field = NULL;
> -	char *field_name, *token;
> +	struct hist_field *hist_field;
>  	unsigned long flags = 0;
> +	char *token;
>  	int ret = 0;
>  
>  	if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX))
> @@ -549,32 +895,27 @@ static int create_val_field(struct hist_trigger_data *hist_data,
>  		flags |= HIST_FIELD_FL_VAR;
>  	}
>  
> -	field_name = strsep(&field_str, ".");
> -	if (field_str) {
> -		if (strcmp(field_str, "hex") == 0)
> -			flags |= HIST_FIELD_FL_HEX;
> -		else {
> -			ret = -EINVAL;
> +	hist_field = parse_expr(hist_data, file, field_str, flags, var_name);
> +	if (IS_ERR(hist_field)) {
> +		ret = PTR_ERR(hist_field);
> +		goto out;
> +	}
> +
> +	if (!hist_field) {
> +		field = parse_field(hist_data, file, field_str, &flags);
> +		if (IS_ERR(field)) {
> +			ret = PTR_ERR(field);
>  			goto out;
>  		}
> -	}
>  
> -	if (strcmp(field_name, "common_timestamp") == 0) {
> -		flags |= HIST_FIELD_FL_TIMESTAMP;
> -		hist_data->enable_timestamps = true;
> -	} else {
> -		field = trace_find_event_field(file->event_call, field_name);
> -		if (!field) {
> -			ret = -EINVAL;
> +		hist_field = create_hist_field(field, flags, var_name);
> +		if (!hist_field) {
> +			ret = -ENOMEM;
>  			goto out;
>  		}
>  	}
>  
> -	hist_data->fields[val_idx] = create_hist_field(field, flags, var_name);
> -	if (!hist_data->fields[val_idx]) {
> -		ret = -ENOMEM;
> -		goto out;
> -	}
> +	hist_data->fields[val_idx] = hist_field;
>  
>  	++hist_data->n_vals;
>  
> @@ -623,6 +964,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
>  			    char *field_str)
>  {
>  	struct ftrace_event_field *field = NULL;
> +	struct hist_field *hist_field;
>  	unsigned long flags = 0;
>  	unsigned int key_size;
>  	char *var_name;
> @@ -640,53 +982,40 @@ static int create_key_field(struct hist_trigger_data *hist_data,
>  	if (strcmp(field_str, "stacktrace") == 0) {
>  		flags |= HIST_FIELD_FL_STACKTRACE;
>  		key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
> +		hist_field = create_hist_field(field, flags, var_name);
>  	} else {
> -		char *field_name = strsep(&field_str, ".");
> -
> -		if (field_str) {
> -			if (strcmp(field_str, "hex") == 0)
> -				flags |= HIST_FIELD_FL_HEX;
> -			else if (strcmp(field_str, "sym") == 0)
> -				flags |= HIST_FIELD_FL_SYM;
> -			else if (strcmp(field_str, "sym-offset") == 0)
> -				flags |= HIST_FIELD_FL_SYM_OFFSET;
> -			else if ((strcmp(field_str, "execname") == 0) &&
> -				 (strcmp(field_name, "common_pid") == 0))
> -				flags |= HIST_FIELD_FL_EXECNAME;
> -			else if (strcmp(field_str, "syscall") == 0)
> -				flags |= HIST_FIELD_FL_SYSCALL;
> -			else if (strcmp(field_str, "log2") == 0)
> -				flags |= HIST_FIELD_FL_LOG2;
> -			else {
> -				ret = -EINVAL;
> -				goto out;
> -			}
> +		hist_field = parse_expr(hist_data, file, field_str, flags,
> +					var_name);
> +		if (IS_ERR(hist_field)) {
> +			ret = PTR_ERR(hist_field);
> +			goto out;
>  		}
>  
> -		if (strcmp(field_name, "common_timestamp") == 0) {
> -			flags |= HIST_FIELD_FL_TIMESTAMP;
> -			hist_data->enable_timestamps = true;
> -			key_size = sizeof(u64);
> -		} else {
> -			field = trace_find_event_field(file->event_call, field_name);
> -			if (!field) {
> -				ret = -EINVAL;
> +		if (!hist_field) {
> +			field = parse_field(hist_data, file, field_str,
> +					    &flags);
> +			if (IS_ERR(field)) {
> +				ret = PTR_ERR(field);
>  				goto out;
>  			}
>  
> -			if (is_string_field(field))
> -				key_size = MAX_FILTER_STR_VAL;
> -			else
> -				key_size = field->size;
> +			hist_field = create_hist_field(field, flags, var_name);
> +			if (!hist_field) {
> +				ret = -ENOMEM;
> +				goto out;
> +			}
>  		}
> -	}
>  
> -	hist_data->fields[key_idx] = create_hist_field(field, flags, var_name);
> -	if (!hist_data->fields[key_idx]) {
> -		ret = -ENOMEM;
> -		goto out;
> +		if (flags & HIST_FIELD_FL_TIMESTAMP)
> +			key_size = sizeof(u64);
> +		else if (is_string_field(field))
> +			key_size = MAX_FILTER_STR_VAL;
> +		else
> +			key_size = field->size;
>  	}
>  
> +	hist_data->fields[key_idx] = hist_field;
> +
>  	key_size = ALIGN(key_size, sizeof(u64));
>  	hist_data->fields[key_idx]->size = key_size;
>  	hist_data->fields[key_idx]->offset = key_offset;
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-13  1:04       ` Namhyung Kim
@ 2017-02-14  9:37         ` Masami Hiramatsu
  2017-02-14 15:27         ` Tom Zanussi
  1 sibling, 0 replies; 56+ messages in thread
From: Masami Hiramatsu @ 2017-02-14  9:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Tom Zanussi, Masami Hiramatsu, rostedt, tglx, linux-kernel,
	linux-rt-users

On Mon, 13 Feb 2017 10:04:16 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> On Fri, Feb 10, 2017 at 12:58:30PM -0600, Tom Zanussi wrote:
> > Hi Masami,
> > 
> > On Fri, 2017-02-10 at 18:34 +0900, Masami Hiramatsu wrote:
> > > On Fri, 10 Feb 2017 13:16:17 +0900
> > > Namhyung Kim <namhyung@kernel.org> wrote:
> > > 
> > > > >   ====
> > > > >   Example - wakeup latency
> > > > >   ====
> > > > > 
> > > > >   This basically implements the -RT latency_hist 'wakeup_latency'
> > > > >   histogram using the synthetic events, variables, and actions
> > > > >   described.  The output below is from a run of cyclictest using the
> > > > >   following command:
> > > > > 
> > > > >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > > > > 
> > > > >   What we're measuring the latency of is the time between when a
> > > > >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> > > > >   do that we add triggers to sched_wakeup and sched_switch with the
> > > > >   appropriate variables, and on a matching sched_switch event,
> > > > >   generate a synthetic 'wakeup_latency' event.  Since it's just
> > > > >   another trace event like any other, we can also define a histogram
> > > > >   on that event, the output of which is what we see displayed when
> > > > >   reading the wakeup_latency 'hist' file.
> > > > > 
> > > > >   First, we create a synthetic event called wakeup_latency, that
> > > > >   references 3 variables from other events:
> > > > > 
> > > > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > > > >                            pid=sched_switch:woken_pid \
> > > > >                            prio=sched_switch:woken_prio' >> \
> > > > >             /sys/kernel/debug/tracing/synthetic_events
> > > > > 
> > > > >   Next we add a trigger to sched_wakeup, which saves the value of the
> > > > >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> > > > >   that this happens only when 'comm==cyclictest'.
> > > > > 
> > > > >   Also, 'common_timestamp' is a new field defined on every event (if
> > > > >   needed - if there are no users of timestamps in a trace, timestamps
> > > > >   won't be saved and there's no additional overhead from that).
> > > > > 
> > > > >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> > > > >              comm=="cyclictest"' >> \
> > > > >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > > > 
> > > > >   Next, we add a trigger to sched_switch.  When the pid being switched
> > > > >   to matches the pid woken up by a previous sched_wakeup event, this
> > > > >   event grabs the ts0 saved on that event, takes the difference
> > > > >   between it and the current sched_switch's common_timestamp, and
> > > > >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> > > > >   other variables and then invokes the onmatch().trace() action which
> > > > >   generates a new wakeup_latency event using those variables.
> > > > > 
> > > > >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> > > > >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> > > > >             if next_comm=="cyclictest"' >> \
> > > > >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > > > 
> > > > As Masami said, I think the syntax is a bit hard to understand.  Also
> > > > it'd be nice to access an event field directly (i.e. not by adding a
> > > > field in a hist).  Maybe we can use a prefix like '$' to identify hist
> > > > fields..
> > > 
> > > Ah that's a nice idea!
> > > 
> > > > 
> > > > How about below?
> > > > 
> > > >   # echo 'wakeup_latency \
> > > >   		lat=sched_switch.$wakeup_lat  \
> > > > 		pid=sched_switch.next_pid     \
> > > > 		prio=sched_switch.next_prio' >> \
> > > > 	/sys/kernel/debug/tracing/synthetic_events
> > > 
> > > Should we define these parameter assignments at this point?
> > > 
> > > I think this syntax binds the wakeup_latency event to sched_switch too
> > > tightly.  I mean, if someone triggers this event from some other event, it
> > > may easily lose values.
> > > So, at this point, we should only define the event name and what parameters
> > > it has, and defer binding this event until onmatch().
> > > 
> > 
> > Right, I agree this binding doesn't need to be done here, good idea to
> > defer it as below...
> > 
> > > >   # echo 'hist: \
> > > >   		keys=pid: \
> > > > 		ts0=common_timestamp.usec \
> > > > 		if comm=="cyclictest"' >> \
> > > > 	/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > > 
> > > >   # echo 'hist: \
> > > > 		keys=next_pid: \
> > > > 		wakeup_lat=common_timestamp.usec-$ts0: \
> > > > 		onmatch(sched_wakeup).trace(wakeup_latency) \
> > > 
> > > This one seems much better to me, but I would like to ask you to call the
> > > event directly from onmatch, like:
> > > 
> > >  "onmatch(sched_wakeup).wakeup_latency(wakeup_lat,next_pid,next_prio)"
> > > 
> > > At this point, the kernel will finalize the wakeup_latency event with
> > > wakeup_lat, next_pid and next_prio.
> > > 
> > 
> > Yes, I like this much better - things are no longer so implicit and
> > therefore subject to confusion, and the syntax itself makes more sense,
> > even if it is a bit more verbose on the trigger, which is fine.
> 
> I thought about it too, but then it needs some kind of type checking.
> What if another hist generates the event with totally different info?

In that case, we can just reject the onmatch command :)
Anyway, when we bind it to other events, the type should be checked.
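Something along these lines could work when onmatch() binds the hist
variables to the synthetic event (just a rough, self-contained sketch -
the struct and function names below are made up for illustration, not
from this patchset):

    #include <errno.h>

    /* Illustrative types only - not the actual patchset structures. */
    struct synth_field { unsigned int size; unsigned int is_signed; };
    struct bound_var   { unsigned int size; unsigned int is_signed; };

    /*
     * Reject an onmatch() binding whose variables don't line up with
     * the fields the synthetic event was declared with.
     */
    static int check_synth_binding(const struct synth_field *fields,
    				   unsigned int n_fields,
    				   const struct bound_var *vars,
    				   unsigned int n_vars)
    {
    	unsigned int i;

    	if (n_vars != n_fields)
    		return -EINVAL;		/* wrong number of parameters */

    	for (i = 0; i < n_vars; i++) {
    		if (vars[i].size != fields[i].size ||
    		    vars[i].is_signed != fields[i].is_signed)
    			return -EINVAL;	/* type mismatch - reject */
    	}

    	return 0;
    }

i.e. if the number, size, or signedness of the variables passed to the
synthetic event doesn't match its declaration, the trigger write would
simply fail with -EINVAL.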

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 11/21] tracing: Add variable support to hist triggers
  2017-02-13  6:03   ` Namhyung Kim
@ 2017-02-14 15:25     ` Tom Zanussi
  0 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-14 15:25 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

Hi Namhyung,

On Mon, 2017-02-13 at 15:03 +0900, Namhyung Kim wrote:
> On Wed, Feb 08, 2017 at 11:25:07AM -0600, Tom Zanussi wrote:
> > Add support for saving the value of a current event's event field by
> > assigning it to a variable that can be read by a subsequent event.
> > 
> > The basic syntax for saving a variable is to simply prefix a unique
> > variable name not corresponding to any keyword along with an '=' sign
> > to any event field.
> > 
> > Both keys and values can be saved and retrieved in this way:
> > 
> >     # echo 'hist:keys=next_pid:vals=ts0=common_timestamp ...
> >     # echo 'hist:key=timer_pid=common_pid ...'
> > 
> > If a variable isn't a key variable or prefixed with 'vals=', the
> > associated event field will be saved in a variable but won't be summed
> > as a value:
> > 
> >     # echo 'hist:keys=next_pid:ts1=common_timestamp:...
> > 
> > Multiple variables can be assigned at the same time:
> > 
> >     # echo 'hist:keys=pid:vals=ts0=common_timestamp,b=field1,field2 ...
> > 
> > Variables set as above can be used by being referenced from another
> > event, as described in a subsequent patch.
> > 
> > Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> > ---
> >  kernel/trace/trace_events_hist.c | 160 ++++++++++++++++++++++++++++++++-------
> >  1 file changed, 131 insertions(+), 29 deletions(-)
> > 
> > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > index 8d7f7dd..e707577 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -29,6 +29,7 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
> >  				struct ring_buffer_event *rbe);
> >  
> >  #define HIST_FIELD_OPERANDS_MAX	2
> > +#define HIST_ASSIGNMENT_MAX	4
> >  
> >  struct hist_field {
> >  	struct ftrace_event_field	*field;
> > @@ -36,8 +37,10 @@ struct hist_field {
> >  	hist_field_fn_t			fn;
> >  	unsigned int			size;
> >  	unsigned int			offset;
> > -	unsigned int                    is_signed;
> > +	unsigned int			is_signed;
> 
> It seems like an unnecessary change.
> 

Yes, it is.

> >  	struct hist_field		*operands[HIST_FIELD_OPERANDS_MAX];
> > +	u64				var_val;
> > +	char				*var_name;
> >  };
> >  
> >  static u64 hist_field_none(struct hist_field *field, void *event,
> > @@ -140,12 +143,16 @@ enum hist_field_flags {
> >  	HIST_FIELD_FL_SYSCALL		= 128,
> >  	HIST_FIELD_FL_STACKTRACE	= 256,
> >  	HIST_FIELD_FL_LOG2		= 512,
> > -	HIST_FIELD_FL_TIMESTAMP		= 1024,
> > +	HIST_FIELD_FL_VAR		= 1024,
> > +	HIST_FIELD_FL_VAR_ONLY		= 2048,
> > +	HIST_FIELD_FL_TIMESTAMP		= 4096,
> 
> Why did you move the timestamp?
> 

No good reason - it just happened to fall out of some refactoring.

> >  };
> >  
> >  struct hist_trigger_attrs {
> >  	char		*keys_str;
> >  	char		*vals_str;
> > +	char		*assignment_str[HIST_ASSIGNMENT_MAX];
> > +	unsigned int	n_assignments;
> >  	char		*sort_key_str;
> >  	char		*name;
> >  	bool		pause;
> > @@ -241,9 +248,14 @@ static int parse_map_size(char *str)
> >  
> >  static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
> >  {
> > +	unsigned int i;
> > +
> >  	if (!attrs)
> >  		return;
> >  
> > +	for (i = 0; i < attrs->n_assignments; i++)
> > +		kfree(attrs->assignment_str[i]);
> > +
> >  	kfree(attrs->name);
> >  	kfree(attrs->sort_key_str);
> >  	kfree(attrs->keys_str);
> > @@ -258,9 +270,9 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
> >  	if ((strncmp(str, "key=", strlen("key=")) == 0) ||
> >  	    (strncmp(str, "keys=", strlen("keys=")) == 0))
> >  		attrs->keys_str = kstrdup(str, GFP_KERNEL);
> > -	else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
> > -		 (strncmp(str, "vals=", strlen("vals=")) == 0) ||
> > -		 (strncmp(str, "values=", strlen("values=")) == 0))
> > +	else if (((strncmp(str, "val=", strlen("val=")) == 0) ||
> > +		  (strncmp(str, "vals=", strlen("vals=")) == 0) ||
> > +		  (strncmp(str, "values=", strlen("values=")) == 0)))
> 
> Looks unnecessary too.
> 

Yep.

> >  		attrs->vals_str = kstrdup(str, GFP_KERNEL);
> >  	else if (strncmp(str, "sort=", strlen("sort=")) == 0)
> >  		attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
> > @@ -274,8 +286,22 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
> >  			goto out;
> >  		}
> >  		attrs->map_bits = map_bits;
> > -	} else
> > -		ret = -EINVAL;
> > +	} else {
> > +		char *assignment;
> > +
> > +		if (attrs->n_assignments == HIST_ASSIGNMENT_MAX) {
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		assignment = kstrdup(str, GFP_KERNEL);
> > +		if (!assignment) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +
> > +		attrs->assignment_str[attrs->n_assignments++] = assignment;
> > +	}
> >   out:
> >  	return ret;
> >  }
> 
> [SNIP]
> > @@ -839,8 +913,7 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
> >  			idx = tracing_map_add_key_field(map,
> >  							hist_field->offset,
> >  							cmp_fn);
> > -
> > -		} else
> > +		} else if (!(hist_field->flags & HIST_FIELD_FL_VAR))
> >  			idx = tracing_map_add_sum_field(map);
> >  
> >  		if (idx < 0)
> > @@ -931,6 +1004,11 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
> >  	for_each_hist_val_field(i, hist_data) {
> >  		hist_field = hist_data->fields[i];
> >  		hist_val = hist_field->fn(hist_field, rec, rbe);
> > +		if (hist_field->flags & HIST_FIELD_FL_VAR) {
> > +			hist_field->var_val = hist_val;
> > +			if (hist_field->flags & HIST_FIELD_FL_VAR_ONLY)
> > +				continue;
> > +		}
> >  		tracing_map_update_sum(elt, i, hist_val);
> 
> Hmm.. you didn't add a field for HIST_FIELD_FL_VAR but it attempts to
> update it, no?
> 

There's a little confusion regarding VAR_ONLY, and the whole thing needs
to be cleaned up with respect to that.  Originally, for simplicity, I
implemented it such that you could have either variables or values, but
not both.  Later, when I added the assignment code, I changed that, and
some of the old related code remained.  Anyway, suffice it to say that
this whole area will be cleaned up...

> 
> >  	}
> >  }
> > @@ -996,17 +1074,21 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
> >  			} else
> >  				key = (void *)&field_contents;
> >  		}
> > -
> >  		if (use_compound_key)
> >  			add_to_key(compound_key, key, key_field, rec);
> > +
> > +		if (key_field->flags & HIST_FIELD_FL_VAR)
> > +			key_field->var_val = (u64)key;
> >  	}
> >  
> >  	if (use_compound_key)
> >  		key = compound_key;
> >  
> >  	elt = tracing_map_insert(hist_data->map, key);
> > -	if (elt)
> > -		hist_trigger_elt_update(hist_data, elt, rec, rbe);
> > +	if (!elt)
> > +		return;
> > +
> > +	hist_trigger_elt_update(hist_data, elt, rec, rbe);
> >  }
> >  
> >  static void hist_trigger_stacktrace_print(struct seq_file *m,
> > @@ -1228,7 +1310,12 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
> >  {
> >  	const char *field_name = hist_field_name(hist_field);
> >  
> > -	seq_printf(m, "%s", field_name);
> > +	if (hist_field->var_name)
> > +		seq_printf(m, "%s=", hist_field->var_name);
> > +
> > +	if (field_name)
> > +		seq_printf(m, "%s", field_name);
> > +
> >  	if (hist_field->flags) {
> >  		const char *flags_str = get_hist_field_flags(hist_field);
> >  
> > @@ -1237,6 +1324,16 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
> >  	}
> >  }
> >  
> > +static bool var_only(struct hist_trigger_data *hist_data)
> > +{
> > +	unsigned int i;
> > +
> > +	for_each_hist_val_field(i, hist_data)
> > +		if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR_ONLY)
> > +			return true;
> > +	return false;
> > +}
> 
> This means if there's a var-only field, the whole hist will be treated
> as var-only, right?  What if it has normal fields and var-only fields
> at the same time?  Didn't it have "hitcount"?  Please see below..
> 

As mentioned, this is a holdover that no longer makes sense; I'll clean
the whole thing up in the next version.

> > +
> >  static int event_hist_trigger_print(struct seq_file *m,
> >  				    struct event_trigger_ops *ops,
> >  				    struct event_trigger_data *data)
> > @@ -1266,15 +1363,19 @@ static int event_hist_trigger_print(struct seq_file *m,
> >  			hist_field_print(m, key_field);
> >  	}
> >  
> > -	seq_puts(m, ":vals=");
> > +	if (!var_only(hist_data))
> > +		seq_puts(m, ":vals=");
> > +	else
> > +		seq_puts(m, ":");
> >  
> >  	for_each_hist_val_field(i, hist_data) {
> > -		if (i == HITCOUNT_IDX)
> > +		if (i == HITCOUNT_IDX && !var_only(hist_data))
> >  			seq_puts(m, "hitcount");
> 
> Looks like var-only hist cannot have hitcount, right?
> 
> >  		else if (hist_data->fields[i]->flags & HIST_FIELD_FL_TIMESTAMP)
> >  			seq_puts(m, "common_timestamp");
> >  		else {
> > -			seq_puts(m, ",");
> > +			if (!var_only(hist_data))
> > +				seq_puts(m, ",");
> 
> If a var-only hist can have multiple fields, it should print "," as
> well IMHO.  It seems "common_timestamp" needs it too.
> 

Yep, some missing commas in the output, thanks for pointing them out.

Thanks,

Tom

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 12/21] tracing: Account for variables in named trigger compatibility
  2017-02-13  6:04   ` Namhyung Kim
@ 2017-02-14 15:26     ` Tom Zanussi
  0 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-14 15:26 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

On Mon, 2017-02-13 at 15:04 +0900, Namhyung Kim wrote:
> On Wed, Feb 08, 2017 at 11:25:08AM -0600, Tom Zanussi wrote:
> > Named triggers must also have the same set of variables in order to be
> > considered compatible - update the trigger match test to account for
> > that.
> > 
> > The reason for this requirement is that named triggers with variables
> > are meant to allow one or more events to set the same variable.
> > 
> > Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> > ---
> >  kernel/trace/trace_events_hist.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > index e707577..889455e 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -1576,6 +1576,10 @@ static bool hist_trigger_match(struct event_trigger_data *data,
> >  			return false;
> >  		if (key_field->is_signed != key_field_test->is_signed)
> >  			return false;
> > +		if ((key_field->var_name && !key_field_test->var_name) ||
> > +		    (!key_field->var_name && key_field_test->var_name) ||
> > +		    strcmp(key_field->var_name, key_field_test->var_name) != 0)
> > +			return false;
> 
> What if key_field->var_name and key_field_test->var_name are both
> NULL?
> 

Yep, it's a problem, thanks for pointing it out.
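Something NULL-safe is needed there, e.g. (untested sketch - the helper
name is made up):

    #include <string.h>

    /*
     * NULL-safe var_name comparison: two unnamed fields still match,
     * a named vs. unnamed field does not.  Returns 0 when compatible.
     */
    static int var_name_cmp(const char *a, const char *b)
    {
    	if (!a && !b)
    		return 0;
    	if (!a || !b)
    		return 1;
    	return strcmp(a, b) != 0;
    }

and then hist_trigger_match() would return false only when
var_name_cmp(key_field->var_name, key_field_test->var_name) is non-zero.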

Tom

> Thanks,
> Namhyung
> 
> >  	}
> >  
> >  	for (i = 0; i < hist_data->n_sort_keys; i++) {
> > -- 
> > 1.9.3
> > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
  2017-02-13  1:04       ` Namhyung Kim
  2017-02-14  9:37         ` Masami Hiramatsu
@ 2017-02-14 15:27         ` Tom Zanussi
  1 sibling, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-14 15:27 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Masami Hiramatsu, rostedt, tglx, linux-kernel, linux-rt-users

On Mon, 2017-02-13 at 10:04 +0900, Namhyung Kim wrote:
> On Fri, Feb 10, 2017 at 12:58:30PM -0600, Tom Zanussi wrote:
> > Hi Masami,
> > 
> > On Fri, 2017-02-10 at 18:34 +0900, Masami Hiramatsu wrote:
> > > On Fri, 10 Feb 2017 13:16:17 +0900
> > > Namhyung Kim <namhyung@kernel.org> wrote:
> > > 
> > > > >   ====
> > > > >   Example - wakeup latency
> > > > >   ====
> > > > > 
> > > > >   This basically implements the -RT latency_hist 'wakeup_latency'
> > > > >   histogram using the synthetic events, variables, and actions
> > > > >   described.  The output below is from a run of cyclictest using the
> > > > >   following command:
> > > > > 
> > > > >     # rt-tests/cyclictest -p 80 -n -s -t 2
> > > > > 
> > > > >   What we're measuring the latency of is the time between when a
> > > > >   thread (of cyclictest) is awakened and when it's scheduled in.  To
> > > > >   do that we add triggers to sched_wakeup and sched_switch with the
> > > > >   appropriate variables, and on a matching sched_switch event,
> > > > >   generate a synthetic 'wakeup_latency' event.  Since it's just
> > > > >   another trace event like any other, we can also define a histogram
> > > > >   on that event, the output of which is what we see displayed when
> > > > >   reading the wakeup_latency 'hist' file.
> > > > > 
> > > > >   First, we create a synthetic event called wakeup_latency, that
> > > > >   references 3 variables from other events:
> > > > > 
> > > > >     # echo 'wakeup_latency lat=sched_switch:wakeup_lat \
> > > > >                            pid=sched_switch:woken_pid \
> > > > >                            prio=sched_switch:woken_prio' >> \
> > > > >             /sys/kernel/debug/tracing/synthetic_events
> > > > > 
> > > > >   Next we add a trigger to sched_wakeup, which saves the value of the
> > > > >   'common_timestamp' when that event is hit in a variable, ts0.  Note
> > > > >   that this happens only when 'comm==cyclictest'.
> > > > > 
> > > > >   Also, 'common_timestamp' is a new field defined on every event (if
> > > > >   needed - if there are no users of timestamps in a trace, timestamps
> > > > >   won't be saved and there's no additional overhead from that).
> > > > > 
> > > > >     #  echo 'hist:keys=pid:ts0=common_timestamp.usecs if \
> > > > >              comm=="cyclictest"' >> \
> > > > >              /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > > > 
> > > > >   Next, we add a trigger to sched_switch.  When the pid being switched
> > > > >   to matches the pid woken up by a previous sched_wakeup event, this
> > > > >   event grabs the ts0 saved on that event, takes the difference
> > > > >   between it and the current sched_switch's common_timestamp, and
> > > > >   assigns it to a new 'wakeup_lat' variable.  It also saves a couple
> > > > >   other variables and then invokes the onmatch().trace() action which
> > > > >   generates a new wakeup_latency event using those variables.
> > > > > 
> > > > >     # echo 'hist:keys=woken_pid=next_pid:woken_prio=next_prio:\
> > > > >        wakeup_lat=common_timestamp.usecs-ts0:onmatch().trace(wakeup_latency) \
> > > > >             if next_comm=="cyclictest"' >> \
> > > > >             /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > > > 
> > > > As Masami said, I think the syntax is a bit hard to understand.  Also
> > > > it'd be nice to access an event field directly (i.e. not by adding a
> > > > field in a hist).  Maybe we can use a prefix like '$' to identify hist
> > > > fields..
> > > 
> > > Ah that's a nice idea!
> > > 
> > > > 
> > > > How about below?
> > > > 
> > > >   # echo 'wakeup_latency \
> > > >   		lat=sched_switch.$wakeup_lat  \
> > > > 		pid=sched_switch.next_pid     \
> > > > 		prio=sched_switch.next_prio' >> \
> > > > 	/sys/kernel/debug/tracing/synthetic_events
> > > 
> > > Should we define these parameter assignments at this point?
> > > 
> > > I think this syntax binds the wakeup_latency event to sched_switch too
> > > tightly.  I mean, if someone triggers this event from some other event, it
> > > may easily lose values.
> > > So, at this point, we should only define the event name and what parameters
> > > it has, and defer binding this event until onmatch().
> > > 
> > 
> > Right, I agree this binding doesn't need to be done here, good idea to
> > defer it as below...
> > 
> > > >   # echo 'hist: \
> > > >   		keys=pid: \
> > > > 		ts0=common_timestamp.usec \
> > > > 		if comm=="cyclictest"' >> \
> > > > 	/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > > > 
> > > >   # echo 'hist: \
> > > > 		keys=next_pid: \
> > > > 		wakeup_lat=common_timestamp.usec-$ts0: \
> > > > 		onmatch(sched_wakeup).trace(wakeup_latency) \
> > > 
> > > This one seems much better to me, but I would like to ask you to call the
> > > event directly from onmatch, like:
> > > 
> > >  "onmatch(sched_wakeup).wakeup_latency(wakeup_lat,next_pid,next_prio)"
> > > 
> > > At this point, the kernel will finalize the wakeup_latency event with
> > > wakeup_lat, next_pid and next_prio.
> > > 
> > 
> > Yes, I like this much better - things are no longer so implicit and
> > therefore subject to confusion, and the syntax itself makes more sense,
> > even if it is a bit more verbose on the trigger, which is fine.
> 
> I thought about it too, but then it needs some kind of type checking.
> What if another hist generates the event with totally different info?
> 

Yes, I'll add type info and checking for this.

Thanks,

Tom

> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH 13/21] tracing: Add simple expression support to hist triggers
  2017-02-14  2:37   ` Namhyung Kim
@ 2017-02-14 15:29     ` Tom Zanussi
  0 siblings, 0 replies; 56+ messages in thread
From: Tom Zanussi @ 2017-02-14 15:29 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: rostedt, tglx, mhiramat, linux-kernel, linux-rt-users

Hi Namhyung,

On Tue, 2017-02-14 at 11:37 +0900, Namhyung Kim wrote:
> On Wed, Feb 08, 2017 at 11:25:09AM -0600, Tom Zanussi wrote:
> > Add support for simple addition, subtraction, and unary expressions
> > (-(expr) and expr, where expr = b-a, a+b, a+b+c) to hist triggers, in
> > order to support a minimal set of useful inter-event calculations.
> > 
> > These operations are needed for calculating latencies between events
> > (timestamp1-timestamp0) and for combined latencies (latencies over 3
> > or more events).
> > 
> > In the process, factor out some common code from key and value
> > parsing.
> > 
> > Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
> > ---
> 
> [SNIP]
> > +static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
> > +				     struct trace_event_file *file,
> > +				     char *str, unsigned long flags,
> > +				     char *var_name);
> > +
> > +static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
> > +				      struct trace_event_file *file,
> > +				      char *str, unsigned long flags,
> > +				      char *var_name)
> > +{
> > +	struct hist_field *operand1, *expr = NULL;
> > +	struct ftrace_event_field *field = NULL;
> > +	unsigned long operand_flags;
> > +	char *operand1_str;
> > +	int ret = 0;
> > +	char *s;
> > +
> > +	// we support only -(xxx) i.e. explicit parens required
> > +
> > +	str++; // skip leading '-'
> > +
> > +	s = strchr(str, '(');
> > +	if (s)
> > +		str++;
> > +	else {
> > +		ret = -EINVAL;
> > +		goto free;
> > +	}
> > +
> > +	s = strchr(str, ')');
> > +	if (s)
> > +		*s = '\0';
> > +	else {
> > +		ret = -EINVAL; // no closing ')'
> > +		goto free;
> > +	}
> > +
> > +	operand1_str = strsep(&str, "(");
> > +	if (!operand1_str)
> > +		goto free;
> > +
> > +	flags |= HIST_FIELD_FL_EXPR;
> > +	expr = create_hist_field(NULL, flags, var_name);
> > +	if (!expr) {
> > +		ret = -ENOMEM;
> > +		goto free;
> > +	}
> > +
> > +	operand_flags = 0;
> > +	operand1 = parse_expr(hist_data, file, str, operand_flags, NULL);
> 
> Doesn't it create an unbounded recursion?
> 

Yeah, Steve asked the same thing about some similar code - I'll either
get rid of the recursion or make sure it's bounded.
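For the bounded case, the idea would be to thread an explicit depth
through the parse_expr()/parse_unary() calls and bail out past a small
limit.  A toy user-space illustration of that pattern (not the kernel
code - simplified grammar, made-up limit):

    #include <stdio.h>
    #include <string.h>

    #define EXPR_MAX_DEPTH	3

    /*
     * Returns 0 if 'str' parses as: a field name, a+b, a-b, or -(expr),
     * nested at most EXPR_MAX_DEPTH levels deep; -1 otherwise.
     */
    static int parse_expr(const char *str, int depth)
    {
    	const char *op;
    	char buf[64];
    	size_t len = strlen(str);

    	if (depth > EXPR_MAX_DEPTH)
    		return -1;			/* too deeply nested */

    	if (str[0] == '-' && str[1] == '(') {	/* unary: -(expr) */
    		if (len < 4 || str[len - 1] != ')')
    			return -1;
    		snprintf(buf, sizeof(buf), "%.*s", (int)(len - 3), str + 2);
    		return parse_expr(buf, depth + 1);
    	}

    	op = strpbrk(str, "+-");
    	if (op) {				/* binary: lhs op rhs */
    		snprintf(buf, sizeof(buf), "%.*s", (int)(op - str), str);
    		if (parse_expr(buf, depth + 1))
    			return -1;
    		return parse_expr(op + 1, depth + 1);
    	}

    	return len ? 0 : -1;			/* plain field name */
    }

    int main(void)
    {
    	/* prints 0 (accepted) then -1 (rejected: nested too deep) */
    	printf("%d\n", parse_expr("common_timestamp.usecs-ts0", 0));
    	printf("%d\n", parse_expr("-(-(-(-(-(a)))))", 0));
    	return 0;
    }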

Thanks,

Tom

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2017-02-14 15:29 UTC | newest]

Thread overview: 56+ messages
2017-02-08 17:24 [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Tom Zanussi
2017-02-08 17:24 ` [RFC][PATCH 01/21] tracing: Add hist_field_name() accessor Tom Zanussi
2017-02-08 20:09   ` Steven Rostedt
2017-02-08 17:24 ` [RFC][PATCH 02/21] tracing: Reimplement log2 Tom Zanussi
2017-02-08 20:13   ` Steven Rostedt
2017-02-08 20:25     ` Tom Zanussi
2017-02-08 17:24 ` [RFC][PATCH 03/21] ring-buffer: Add TIME_EXTEND_ABS ring buffer type Tom Zanussi
2017-02-08 20:32   ` Steven Rostedt
2017-02-08 20:55     ` Tom Zanussi
2017-02-09 14:54       ` Steven Rostedt
2017-02-10  6:04     ` Namhyung Kim
2017-02-10 14:28       ` Steven Rostedt
2017-02-08 17:25 ` [RFC][PATCH 04/21] tracing: Give event triggers access to ring_buffer_event Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 05/21] tracing: Add ring buffer event param to hist field functions Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 06/21] tracing: Increase tracing map KEYS_MAX size Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 07/21] tracing: Break out hist trigger assignment parsing Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 08/21] tracing: Make traceprobe parsing code reusable Tom Zanussi
2017-02-09 20:40   ` Steven Rostedt
2017-02-08 17:25 ` [RFC][PATCH 09/21] tracing: Add hist trigger timestamp support Tom Zanussi
2017-02-10  6:14   ` Namhyung Kim
2017-02-08 17:25 ` [RFC][PATCH 10/21] tracing: Add per-element variable support to tracing_map Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 11/21] tracing: Add variable support to hist triggers Tom Zanussi
2017-02-13  6:03   ` Namhyung Kim
2017-02-14 15:25     ` Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 12/21] tracing: Account for variables in named trigger compatibility Tom Zanussi
2017-02-13  6:04   ` Namhyung Kim
2017-02-14 15:26     ` Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 13/21] tracing: Add simple expression support to hist triggers Tom Zanussi
2017-02-14  2:37   ` Namhyung Kim
2017-02-14 15:29     ` Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 14/21] tracing: Add variable reference handling " Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 15/21] tracing: Add usecs modifier for hist trigger timestamps Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 16/21] tracing: Add support for dynamic tracepoints Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 17/21] tracing: Add hist trigger action hook Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 18/21] tracing: Add support for 'synthetic' events Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 19/21] tracing: Add 'onmatch' hist trigger action support Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 20/21] tracing: Add 'onmax' " Tom Zanussi
2017-02-08 17:25 ` [RFC][PATCH 21/21] tracing: Add inter-event hist trigger Documentation Tom Zanussi
2017-02-08 20:01 ` [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support Steven Rostedt
2017-02-08 20:19   ` Tom Zanussi
2017-02-08 23:28   ` Tom Zanussi
2017-02-09  2:14     ` Steven Rostedt
2017-02-08 23:13 ` Masami Hiramatsu
2017-02-09  1:14   ` Tom Zanussi
2017-02-09 14:18     ` Masami Hiramatsu
2017-02-09 17:18       ` Tom Zanussi
2017-02-09 19:57         ` Steven Rostedt
2017-02-09 14:46     ` Frank Ch. Eigler
2017-02-09 18:43       ` Tom Zanussi
2017-02-10  4:16 ` Namhyung Kim
2017-02-10  9:34   ` Masami Hiramatsu
2017-02-10 18:58     ` Tom Zanussi
2017-02-13  1:04       ` Namhyung Kim
2017-02-14  9:37         ` Masami Hiramatsu
2017-02-14 15:27         ` Tom Zanussi
2017-02-10 18:43   ` Tom Zanussi
