kernelnewbies.kernelnewbies.org archive mirror
From: Theodore Dubois <tbodt@google.com>
To: "Valdis Klētnieks" <valdis.kletnieks@vt.edu>
Cc: kernelnewbies@kernelnewbies.org, a.p.zijlstra@chello.nl,
	linux-kernel@vger.kernel.org
Subject: Re: perf_event wakeup_events = 0
Date: Sat, 7 Sep 2019 09:14:49 -0700
Message-ID: <123C743E-C322-45DB-8796-BF6B6EE9CA80@google.com>
In-Reply-To: <943813.1567863629@turing-police>

>> The man page for perf_event_open(2) says that recent kernels treat a 0
>> value for wakeup_events the same as 1, which I believe means it will
>> notify after a single sample. However, strace on perf(1) shows that it
>> uses wakeup_events=0, and it's definitely not waking up on every
>> sample (it seems to be waking up every few seconds.)
>> tools/perf/design.txt says "Normally a notification is generated for
>> every page filled". Is the documentation wrong, or am I
>> misunderstanding something?
> 
>       wakeup_events, wakeup_watermark
>              This union sets how many samples (wakeup_events) or bytes (wakeup_watermark) happen before an overflow
>              notification happens.  Which one is used is selected by the watermark bit flag.
> 
>              wakeup_events counts only PERF_RECORD_SAMPLE record types.  To receive overflow notification for all
>              PERF_RECORD types choose watermark and set wakeup_watermark to 1.
> 
>              Prior to Linux 3.0, setting wakeup_events to 0 resulted in no overflow notifications; more recent
>              kernels treat 0 the same as 1.
> 
> My reading of that is that in pre-3.0 kernels, you could choose to not get overflow
> notifications, and now you'll get them whether or not you wanted them.
> 
> Under "overflow handling", we see:
> 
>       Overflows are generated only by sampling events (sample_period must have a nonzero value).
> 
> So the reason strace says perf is only waking up every few seconds is probably
> because you either launched perf with options that only create trace events, or
> it takes several seconds for an overflow to happen on a sampling event. A lot
> of those fields are u64 counters, and won't overflow anytime soon.  Even the
> u32 counters can take a few seconds to overflow....

I launched perf record with the default options. Here’s one of the perf_event_open calls from strace:

08:57:37.083733 perf_event_open({type=PERF_TYPE_HARDWARE, size=PERF_ATTR_SIZE_VER5, config=PERF_COUNT_HW_CPU_CYCLES, sample_freq=4000, sample_type=PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_TIME|PERF_SAMPLE_PERIOD, read_format=PERF_FORMAT_ID, disabled=1, inherit=1, pinned=0, exclusive=0, exclusive_user=0, exclude_kernel=1, exclude_hv=0, exclude_idle=0, mmap=1, comm=1, freq=1, inherit_stat=0, enable_on_exec=1, task=1, watermark=0, precise_ip=3 /* must have 0 skid */, mmap_data=0, sample_id_all=1, exclude_host=0, exclude_guest=1, exclude_callchain_kernel=0, exclude_callchain_user=0, mmap2=1, comm_exec=1, use_clockid=0, context_switch=0, write_backward=0, namespaces=0, wakeup_events=0, config1=0, config2=0, sample_regs_user=0, sample_regs_intr=0, aux_watermark=0, sample_max_stack=0}, 134206, 18, -1, PERF_FLAG_FD_CLOEXEC) = 23 <0.000023>

sample_freq is 4000 (and freq is 1). Here’s the man page on this field:

       sample_period, sample_freq
              A "sampling" event is one that generates an overflow notification
              every N events, where N is given by sample_period.  A sampling
              event has sample_period > 0.  When an overflow occurs, requested
              data is recorded in the mmap buffer.  The sample_type field
              controls what data is recorded on each overflow.

              sample_freq can be used if you wish to use frequency rather than
              period.  In this case, you set the freq flag.  The kernel will
              adjust the sampling period to try and achieve the desired rate.
              The rate of adjustment is a timer tick.
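
To make the relationship between frequency and period concrete, here is a rough sketch of the steady-state period that would yield a requested sample_freq. It is not the kernel's adjustment code, only the target such an adjustment would converge on. The function name and the 3 GHz cycle rate are assumptions made up for illustration; only sample_freq=4000 comes from the strace output above.

```python
def period_for_frequency(events_per_second: float, sample_freq: int) -> int:
    """Return the sampling period (events per sample) that would produce
    roughly sample_freq samples per second at the given event rate.

    This is only the steady-state target, not the kernel's algorithm.
    """
    if sample_freq <= 0:
        raise ValueError("sample_freq must be positive")
    # A period below 1 event per sample is not meaningful, so clamp to 1.
    return max(1, round(events_per_second / sample_freq))


# Hypothetical event rate: a core counting 3e9 CPU cycles per second.
ASSUMED_CYCLES_PER_SECOND = 3_000_000_000
SAMPLE_FREQ = 4000  # from the strace output

period = period_for_frequency(ASSUMED_CYCLES_PER_SECOND, SAMPLE_FREQ)
print(period)  # 750000 cycles between samples under these assumptions
```

Under these assumed numbers, one sample would be taken about every 750000 cycles while the task is running.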

If I’m reading this right, this is a sampling event which overflows 4000 times a second. But perf then does a poll call which wakes up on this FD with POLLIN after 1.637 seconds, instead of 0.00025 seconds.
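
The arithmetic behind that comparison can be sketched as follows. The function name is hypothetical, and the sketch assumes the sampled task keeps the CPU busy for the whole interval so that the event really does sample at the requested rate; the 4000 Hz and 1.637 s figures are the ones from the strace output.

```python
SAMPLE_FREQ_HZ = 4000        # sample_freq from the strace output
OBSERVED_WAKEUP_S = 1.637    # time until poll returned POLLIN


def expected_wakeup_interval(sample_freq_hz: int, wakeup_events: int) -> float:
    """Seconds between wakeups if the man page is read literally:
    wakeup_events=0 is treated as 1, so every sample triggers a wakeup."""
    samples_per_wakeup = max(1, wakeup_events)
    return samples_per_wakeup / sample_freq_hz


expected = expected_wakeup_interval(SAMPLE_FREQ_HZ, 0)

# Number of samples that would accumulate before the observed wakeup,
# assuming the event sampled at the full requested rate throughout.
samples_per_observed_wakeup = OBSERVED_WAKEUP_S * SAMPLE_FREQ_HZ

print(f"expected interval: {expected:.5f} s")
print(f"samples per observed wakeup: {samples_per_observed_wakeup:.0f}")
```

If those assumptions hold, roughly 6548 samples would have been written between wakeups, which is far from the one sample per wakeup that the man page wording suggests.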

The man page snippet you pasted seems to use a different definition of an overflow event, but that doesn’t make sense either: the event generates a sample 4000 times a second, and a wakeup_events value of 0 (supposedly the same as 1) would mean waking up the FD after 1 sample, which would be after 0.00025 seconds.

~Theodore
