* [PATCHv4 00/19] perf tools: Factor ordered samples queue
@ 2014-07-25 14:55 Jiri Olsa
  2014-07-25 14:55 ` [PATCH 01/19] perf tools: Fix accounting of " Jiri Olsa
                   ` (18 more replies)
  0 siblings, 19 replies; 23+ messages in thread
From: Jiri Olsa @ 2014-07-25 14:55 UTC
  To: linux-kernel
  Cc: Arnaldo Carvalho de Melo, Corey Ashford, David Ahern,
	Frederic Weisbecker, Ingo Molnar, Jean Pihet, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Jiri Olsa

hi,
this patchset factors out the session's ordered samples queue
and allows limiting the size of this queue.

v4 changes:
  - split patch 17 into 2 patches (17/18 now) (Arnaldo)
  - omitted patch 18 which set default queue value to 100MB (Adrian)
  - factor patch 16 to display better debug messages (Adrian)

v3 changes:
  - rebased to latest tip/perf/core
  - add comment for WARN in patch 8 (David)
  - added ordered-events debug variable (David)
  - renamed ordered_events_(get|put) to ordered_events_(new|delete)
  - renamed struct ordered_events_queue to struct ordered_events

v2 changes:
  - several small changes for review comments (Namhyung)


The report command queues events until either of the following
conditions is reached:
  - PERF_RECORD_FINISHED_ROUND event is processed
  - end of the file is reached

Either of the above conditions forces the queue to flush some
events while keeping all allocated memory for the next events.
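
A minimal sketch of the queueing described above, as a toy Python model
rather than the perf code. The class and method names are hypothetical.
The rule that a round flushes only up to the newest timestamp seen by the
end of the previous round is my understanding of the scheme, so treat it
as an assumption. Reuse of allocated memory is not modelled.

```python
import heapq
from itertools import count


class OrderedQueue:
    """Toy model of a timestamp-ordered event queue (not the perf code)."""

    def __init__(self):
        self._heap = []          # (timestamp, seq, event) entries
        self._seq = count()      # tie-breaker keeps equal timestamps stable
        self.max_timestamp = 0   # newest timestamp queued so far
        self.next_flush = 0      # flush limit used by the next round
        self.delivered = []      # (timestamp, event) in delivery order

    def queue(self, timestamp, event):
        heapq.heappush(self._heap, (timestamp, next(self._seq), event))
        self.max_timestamp = max(self.max_timestamp, timestamp)

    def _flush_up_to(self, limit):
        # Deliver queued events in timestamp order, up to the limit.
        while self._heap and self._heap[0][0] <= limit:
            timestamp, _, event = heapq.heappop(self._heap)
            self.delivered.append((timestamp, event))

    def finished_round(self):
        # Assumed rule: deliver what the previous round declared safe,
        # then remember the newest timestamp for the following round.
        self._flush_up_to(self.next_flush)
        self.next_flush = self.max_timestamp

    def end_of_file(self):
        # Nothing more can arrive, so everything left is delivered.
        self._flush_up_to(float("inf"))
```

Without any finished_round() call, nothing is delivered before
end_of_file(), so every event stays queued until the end of the file.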

If PERF_RECORD_FINISHED_ROUND is missing, the queue will
allocate memory for every single event in the perf.data file.
This could lead to enormous memory consumption and speed
degradation of the report command for huge perf.data files.

With a queue allocation limit of 100 MB, I got around a
15% speedup when reporting a ~10GB perf.data file.
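
To illustrate what a queue allocation limit means, here is a toy Python
model, not the perf code. All names are hypothetical. The policy of
delivering the older half of the queue when the limit is hit is an
assumption picked for illustration; the cover letter does not spell out
the policy. The model also counts live entries, whereas the real queue
keeps its allocated memory for reuse.

```python
import heapq


class BoundedQueue:
    """Toy size-limited ordered queue; sizes stand in for bytes."""

    def __init__(self, max_alloc_size):
        self.max_alloc_size = max_alloc_size
        self.cur_alloc_size = 0
        self._heap = []          # (timestamp, seq, size) entries
        self._seq = 0            # tie-breaker for equal timestamps
        self.delivered = []      # timestamps in delivery order
        self.forced_flushes = 0  # how often the limit forced a flush

    def _deliver_oldest(self):
        timestamp, _, size = heapq.heappop(self._heap)
        self.cur_alloc_size -= size
        self.delivered.append(timestamp)

    def queue(self, timestamp, size):
        if self._heap and self.cur_alloc_size + size > self.max_alloc_size:
            # Forced flush: deliver the older half of what is queued
            # (assumed policy, chosen only for illustration).
            self.forced_flushes += 1
            for _ in range((len(self._heap) + 1) // 2):
                self._deliver_oldest()
        heapq.heappush(self._heap, (timestamp, self._seq, size))
        self._seq += 1
        self.cur_alloc_size += size

    def flush_all(self):
        # Final flush, e.g. at the end of the file.
        while self._heap:
            self._deliver_oldest()
```

With a limit of 4 and ten events of size 1, the queue never holds more
than 4 units, at the price of forced flushes before the end of the file.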

current code:
 Performance counter stats for './perf.old report --stdio -i perf-test.data' (3 runs):

   621,685,704,665      cycles                    ( +-  0.52% )
   873,397,467,969      instructions              ( +-  0.00% )

     286.133268732 seconds time elapsed           ( +-  1.13% )

with patches:
 Performance counter stats for './perf report --stdio -i perf-test.data' (3 runs):

   603,933,987,185      cycles                    ( +-  0.45% )
   869,139,445,070      instructions              ( +-  0.00% )

     245.337510637 seconds time elapsed           ( +-  0.49% )


The speedup seems to come mainly from fewer cycles spent servicing
page faults:

current code:
     4.44%     0.01%  perf.old  [kernel.kallsyms]   [k] page_fault                                   

with patches:
     1.45%     0.00%      perf  [kernel.kallsyms]   [k] page_fault                                   

current code (faults event):
         6,643,807      faults                    ( +-  0.36% )

with patches (faults event):
         2,214,756      faults                    ( +-  3.03% )
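
The relative changes can be checked from the numbers quoted above. This
is only arithmetic on the reported means; the helper name is made up.

```python
def relative_drop(before, after):
    """Return the reduction from before to after as a percentage."""
    return (before - after) / before * 100.0


# Mean values copied from the perf stat output quoted in this mail.
elapsed = relative_drop(286.133268732, 245.337510637)
cycles = relative_drop(621_685_704_665, 603_933_987_185)
faults = relative_drop(6_643_807, 2_214_756)

print(f"elapsed time: {elapsed:.1f}% lower")
print(f"cycles:       {cycles:.1f}% lower")
print(f"page faults:  {faults:.1f}% lower")
```

Elapsed time drops by a bit over 14%, in line with the "around 15%"
figure, while the number of page faults drops by about two thirds.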


Also, we now have one of our biggest memory consumers under control,
and the ordered events queue code is put in a separate object
with a clear interface, ready to be used by other commands
like script.

Also available here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/core_ordered_events

thanks,
jirka


Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
Jiri Olsa (19):
      perf tools: Fix accounting of ordered samples queue
      perf tools: Rename ordered_samples bool to ordered_events
      perf tools: Rename ordered_samples struct to ordered_events
      perf tools: Rename ordered_events members
      perf tools: Add ordered_events_(new|delete) interface
      perf tools: Factor ordered_events_flush to be more generic
      perf tools: Limit ordered events queue size
      perf tools: Flush ordered events in case of allocation failure
      perf tools: Make perf_session_deliver_event global
      perf tools: Create ordered-events object
      perf tools: Use list_move in ordered_events_delete function
      perf tools: Add ordered_events_init function
      perf tools: Add ordered_events_free function
      perf tools: Add perf_config_u64 function
      perf tools: Add report.queue-size config file option
      perf tools: Add debug prints for ordered events queue
      perf tools: Always force PERF_RECORD_FINISHED_ROUND event
      perf tools: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds
      perf tools: Allow out of order messages in forced flush

 tools/perf/Makefile.perf         |   2 +
 tools/perf/builtin-annotate.c    |   2 +-
 tools/perf/builtin-diff.c        |   2 +-
 tools/perf/builtin-inject.c      |   2 +-
 tools/perf/builtin-kmem.c        |   2 +-
 tools/perf/builtin-kvm.c         |   8 ++--
 tools/perf/builtin-lock.c        |   2 +-
 tools/perf/builtin-mem.c         |   2 +-
 tools/perf/builtin-record.c      |   7 ++-
 tools/perf/builtin-report.c      |  15 +++++-
 tools/perf/builtin-sched.c       |   2 +-
 tools/perf/builtin-script.c      |   2 +-
 tools/perf/builtin-timechart.c   |   2 +-
 tools/perf/builtin-trace.c       |   2 +-
 tools/perf/util/cache.h          |   1 +
 tools/perf/util/config.c         |  24 ++++++++++
 tools/perf/util/debug.c          |  36 ++++++++++++++-
 tools/perf/util/debug.h          |   8 ++++
 tools/perf/util/ordered-events.c | 245 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/ordered-events.h |  51 +++++++++++++++++++++
 tools/perf/util/session.c        | 217 ++++++++++++++++----------------------------------------------------------------------
 tools/perf/util/session.h        |  26 ++++-------
 tools/perf/util/tool.h           |   2 +-
 23 files changed, 448 insertions(+), 214 deletions(-)
 create mode 100644 tools/perf/util/ordered-events.c
 create mode 100644 tools/perf/util/ordered-events.h

* [PATCHv3 00/19] perf tools: Factor ordered samples queue
@ 2014-07-20 21:55 Jiri Olsa
  2014-07-20 21:55 ` [PATCH 15/19] perf tools: Add report.queue-size config file option Jiri Olsa
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2014-07-20 21:55 UTC
  To: linux-kernel
  Cc: Arnaldo Carvalho de Melo, Corey Ashford, David Ahern,
	Frederic Weisbecker, Ingo Molnar, Jean Pihet, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Jiri Olsa

hi,
this patchset factors out the session's ordered samples queue
and allows limiting the size of this queue.

v3 changes:
  - rebased to latest tip/perf/core
  - add comment for WARN in patch 8 (David)
  - added ordered-events debug variable (David)
  - renamed ordered_events_(get|put) to ordered_events_(new|delete)
  - renamed struct ordered_events_queue to struct ordered_events

v2 changes:
  - several small changes for review comments (Namhyung)



Thread overview: 23+ messages
2014-07-25 14:55 [PATCHv4 00/19] perf tools: Factor ordered samples queue Jiri Olsa
2014-07-25 14:55 ` [PATCH 01/19] perf tools: Fix accounting of " Jiri Olsa
2014-07-25 14:56 ` [PATCH 02/19] perf tools: Rename ordered_samples bool to ordered_events Jiri Olsa
2014-07-25 14:56 ` [PATCH 03/19] perf tools: Rename ordered_samples struct " Jiri Olsa
2014-07-25 14:56 ` [PATCH 04/19] perf tools: Rename ordered_events members Jiri Olsa
2014-07-25 14:56 ` [PATCH 05/19] perf tools: Add ordered_events_(new|delete) interface Jiri Olsa
2014-07-25 14:56 ` [PATCH 06/19] perf tools: Factor ordered_events_flush to be more generic Jiri Olsa
2014-07-25 14:56 ` [PATCH 07/19] perf tools: Limit ordered events queue size Jiri Olsa
2014-07-25 14:56 ` [PATCH 08/19] perf tools: Flush ordered events in case of allocation failure Jiri Olsa
2014-07-25 14:56 ` [PATCH 09/19] perf tools: Make perf_session_deliver_event global Jiri Olsa
2014-07-25 14:56 ` [PATCH 10/19] perf tools: Create ordered-events object Jiri Olsa
2014-07-25 14:56 ` [PATCH 11/19] perf tools: Use list_move in ordered_events_delete function Jiri Olsa
2014-07-25 14:56 ` [PATCH 12/19] perf tools: Add ordered_events_init function Jiri Olsa
2014-07-25 14:56 ` [PATCH 13/19] perf tools: Add ordered_events_free function Jiri Olsa
2014-07-25 14:56 ` [PATCH 14/19] perf tools: Add perf_config_u64 function Jiri Olsa
2014-07-25 14:56 ` [PATCH 15/19] perf tools: Add report.queue-size config file option Jiri Olsa
2014-07-25 14:56 ` [PATCH 16/19] perf tools: Add debug prints for ordered events queue Jiri Olsa
2014-07-25 14:56 ` [PATCH 17/19] perf tools: Always force PERF_RECORD_FINISHED_ROUND event Jiri Olsa
2014-07-28  8:27   ` [tip:perf/core] perf record: " tip-bot for Jiri Olsa
2014-07-25 14:56 ` [PATCH 18/19] perf tools: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds Jiri Olsa
2014-07-28  8:27   ` [tip:perf/core] perf record: " tip-bot for Jiri Olsa
2014-07-25 14:56 ` [PATCH 19/19] perf tools: Allow out of order messages in forced flush Jiri Olsa
  -- strict thread matches above, loose matches on Subject: below --
2014-07-20 21:55 [PATCHv3 00/19] perf tools: Factor ordered samples queue Jiri Olsa
2014-07-20 21:55 ` [PATCH 15/19] perf tools: Add report.queue-size config file option Jiri Olsa
