linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] perf kvm stat live: Copy events
@ 2014-09-18 17:07 Alexander Yarygin
  2014-09-18 17:07 ` [PATCH 1/2] perf session: Add option to copy events when queueing Alexander Yarygin
  2014-09-18 17:07 ` [PATCH 2/2] perf kvm stat live: Enable events copying Alexander Yarygin
  0 siblings, 2 replies; 12+ messages in thread
From: Alexander Yarygin @ 2014-09-18 17:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Yarygin, Arnaldo Carvalho de Melo,
	Christian Borntraeger, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Hello,

This is a second attempt to fix 'perf kvm stat live' crash when it tries
to parse events that have been already overwritten by the kernel.
Previous thread: https://lkml.org/lkml/2014/9/12/450

Patches
- 1/2 is patch by David Ahern (https://lkml.org/lkml/2013/9/6/388)
  rebased to the current upstream (hope I did it right). It adds an option
  to copy events when they are pushed to the samples queue.
- 2/2 enables the copying for perf kvm stat live.

Looks like we have to live with possibility to get thousands malloc()s/free()s
per second...

Alexander Yarygin (2):
  perf session: Add option to copy events when queueing
  perf kvm stat live: Enable events copying

 tools/perf/builtin-kvm.c         |  1 +
 tools/perf/util/ordered-events.c | 12 +++++++++---
 tools/perf/util/ordered-events.h |  2 +-
 tools/perf/util/session.c        | 12 +++++++++---
 tools/perf/util/session.h        |  1 +
 5 files changed, 21 insertions(+), 7 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-18 17:07 [PATCH 0/2] perf kvm stat live: Copy events Alexander Yarygin
@ 2014-09-18 17:07 ` Alexander Yarygin
  2014-09-18 18:53   ` Arnaldo Carvalho de Melo
  2014-09-22  7:04   ` Jiri Olsa
  2014-09-18 17:07 ` [PATCH 2/2] perf kvm stat live: Enable events copying Alexander Yarygin
  1 sibling, 2 replies; 12+ messages in thread
From: Alexander Yarygin @ 2014-09-18 17:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Yarygin, Arnaldo Carvalho de Melo,
	Christian Borntraeger, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

From: David Ahern <dsahern@gmail.com>

When processing events the session code has an ordered samples queue which is
used to time-sort events coming in across multiple mmaps. At a later point in
time samples on the queue are flushed up to some timestamp at which point the
event is actually processed.

When analyzing events live (ie., record/analysis path in the same command)
there is a race that leads to corrupted events and parse errors which cause
perf to terminate. The problem is that when the event is placed in the ordered
samples queue it is only a reference to the event which is really sitting in
the mmap buffer. Even though the event is queued for later processing the mmap
tail pointer is updated which indicates to the kernel that the event has been
processed. The race is flushing the event from the queue before it gets
overwritten by some other event. For commands trying to process events live
(versus just writing to a file) and processing a high rate of events this leads
to parse failures and perf terminates.

Examples hitting this problem are 'perf kvm stat live', especially with nested
VMs which generate 100,000+ traces per second, and a command processing
scheduling events with a high rate of context switching -- e.g., running
'perf bench sched pipe'.

This patch offers live commands an option to copy the event when it is placed in
the ordered samples queue.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
---
 tools/perf/util/ordered-events.c | 12 +++++++++---
 tools/perf/util/ordered-events.h |  2 +-
 tools/perf/util/session.c        | 12 +++++++++---
 tools/perf/util/session.h        |  1 +
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
index 706ce1a..5b51de5 100644
--- a/tools/perf/util/ordered-events.c
+++ b/tools/perf/util/ordered-events.c
@@ -140,11 +140,15 @@ static int __ordered_events__flush(struct perf_session *s,
 			break;
 
 		ret = perf_evlist__parse_sample(s->evlist, iter->event, &sample);
-		if (ret)
+		if (ret) {
 			pr_err("Can't parse sample, err = %d\n", ret);
-		else {
+			if (s->copy_on_queue)
+				free(iter->event);
+		} else {
 			ret = perf_session__deliver_event(s, iter->event, &sample, tool,
 							  iter->file_offset);
+			if (s->copy_on_queue)
+				free(iter->event);
 			if (ret)
 				return ret;
 		}
@@ -233,13 +237,15 @@ void ordered_events__init(struct ordered_events *oe)
 	oe->cur_alloc_size = 0;
 }
 
-void ordered_events__free(struct ordered_events *oe)
+void ordered_events__free(struct perf_session *s, struct ordered_events *oe)
 {
 	while (!list_empty(&oe->to_free)) {
 		struct ordered_event *event;
 
 		event = list_entry(oe->to_free.next, struct ordered_event, list);
 		list_del(&event->list);
+		if (s->copy_on_queue)
+			free(event->event);
 		free(event);
 	}
 }
diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
index 3b2f205..a156b0e 100644
--- a/tools/perf/util/ordered-events.h
+++ b/tools/perf/util/ordered-events.h
@@ -41,7 +41,7 @@ void ordered_events__delete(struct ordered_events *oe, struct ordered_event *eve
 int ordered_events__flush(struct perf_session *s, struct perf_tool *tool,
 			  enum oe_flush how);
 void ordered_events__init(struct ordered_events *oe);
-void ordered_events__free(struct ordered_events *oe);
+void ordered_events__free(struct perf_session *s, struct ordered_events *oe);
 
 static inline
 void ordered_events__set_alloc_size(struct ordered_events *oe, u64 size)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6d2d50d..40d3aec 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -542,7 +542,13 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
 		return -ENOMEM;
 
 	new->file_offset = file_offset;
-	new->event = event;
+
+	if (s->copy_on_queue) {
+		new->event = malloc(event->header.size);
+		memcpy(new->event, event, event->header.size);
+	} else
+		new->event = event;
+
 	return 0;
 }
 
@@ -1164,7 +1170,7 @@ done:
 out_err:
 	free(buf);
 	perf_session__warn_about_errors(session, tool);
-	ordered_events__free(&session->ordered_events);
+	ordered_events__free(session, &session->ordered_events);
 	return err;
 }
 
@@ -1309,7 +1315,7 @@ out:
 out_err:
 	ui_progress__finish();
 	perf_session__warn_about_errors(session, tool);
-	ordered_events__free(&session->ordered_events);
+	ordered_events__free(session, &session->ordered_events);
 	session->one_mmap = false;
 	return err;
 }
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 8dd41ca..137f77f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -27,6 +27,7 @@ struct perf_session {
 	void			*one_mmap_addr;
 	u64			one_mmap_offset;
 	struct ordered_events	ordered_events;
+	bool			copy_on_queue;
 	struct perf_data_file	*file;
 };
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] perf kvm stat live: Enable events copying
  2014-09-18 17:07 [PATCH 0/2] perf kvm stat live: Copy events Alexander Yarygin
  2014-09-18 17:07 ` [PATCH 1/2] perf session: Add option to copy events when queueing Alexander Yarygin
@ 2014-09-18 17:07 ` Alexander Yarygin
  1 sibling, 0 replies; 12+ messages in thread
From: Alexander Yarygin @ 2014-09-18 17:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Yarygin, Arnaldo Carvalho de Melo,
	Christian Borntraeger, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Process of analyzing events caused by 2 functions: mmap_read() and
finished_round().
During mmap_read(), perf receives events from shared memory, queues
their pointers for further processing in finished_round() and notifies
the kernel that the events have been processed.
By the time when finished_round() is invoked, queued events can be
overwritten by the kernel, so the finished_round() occurs on potentially
corrupted memory.

Let's copy events when queueing.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 tools/perf/builtin-kvm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index f5d3ae4..c5b6d1d 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1369,6 +1369,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 		goto out;
 	}
 	kvm->session->evlist = kvm->evlist;
+	kvm->session->copy_on_queue = true;
 	perf_session__set_id_hdr_size(kvm->session);
 	machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target,
 				    kvm->evlist->threads, false);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-18 17:07 ` [PATCH 1/2] perf session: Add option to copy events when queueing Alexander Yarygin
@ 2014-09-18 18:53   ` Arnaldo Carvalho de Melo
  2014-09-18 20:21     ` David Ahern
  2014-09-22  7:04   ` Jiri Olsa
  1 sibling, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-18 18:53 UTC (permalink / raw)
  To: Alexander Yarygin
  Cc: linux-kernel, Christian Borntraeger, David Ahern,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Thu, Sep 18, 2014 at 09:07:38PM +0400, Alexander Yarygin escreveu:
> From: David Ahern <dsahern@gmail.com>
> When processing events the session code has an ordered samples queue which is
> used to time-sort events coming in across multiple mmaps. At a later point in
> time samples on the queue are flushed up to some timestamp at which point the
> event is actually processed.
> 
> When analyzing events live (ie., record/analysis path in the same command)
> there is a race that leads to corrupted events and parse errors which cause
> perf to terminate. The problem is that when the event is placed in the ordered
> samples queue it is only a reference to the event which is really sitting in
> the mmap buffer. Even though the event is queued for later processing the mmap
> tail pointer is updated which indicates to the kernel that the event has been
> processed. The race is flushing the event from the queue before it gets
> overwritten by some other event. For commands trying to process events live
> (versus just writing to a file) and processing a high rate of events this leads
> to parse failures and perf terminates.
> 
> Examples hitting this problem are 'perf kvm stat live', especially with nested
> VMs which generate 100,000+ traces per second, and a command processing
> scheduling events with a high rate of context switching -- e.g., running
> 'perf bench sched pipe'.
> 
> This patch offers live commands an option to copy the event when it is placed in
> the ordered samples queue.

If nobody objects I'll merge this patch, as it fixes problems, but I
wonder if the best wouldn't be simply not calling
perf_evlist__mmap_consume() till the last event there is in fact
consumed... I.e. as we _really_ consume the events, we remove it from
there.

Instead of consuming the event at perf_tool->sample() time, we would
do it at perf_tool->finished_round(), would that be feasible? Has anyone
tried this?

David? Is this a st00pid idea?

- Arnaldo

> Signed-off-by: David Ahern <dsahern@gmail.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Namhyung Kim <namhyung.kim@lge.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
> ---
>  tools/perf/util/ordered-events.c | 12 +++++++++---
>  tools/perf/util/ordered-events.h |  2 +-
>  tools/perf/util/session.c        | 12 +++++++++---
>  tools/perf/util/session.h        |  1 +
>  4 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
> index 706ce1a..5b51de5 100644
> --- a/tools/perf/util/ordered-events.c
> +++ b/tools/perf/util/ordered-events.c
> @@ -140,11 +140,15 @@ static int __ordered_events__flush(struct perf_session *s,
>  			break;
>  
>  		ret = perf_evlist__parse_sample(s->evlist, iter->event, &sample);
> -		if (ret)
> +		if (ret) {
>  			pr_err("Can't parse sample, err = %d\n", ret);
> -		else {
> +			if (s->copy_on_queue)
> +				free(iter->event);
> +		} else {
>  			ret = perf_session__deliver_event(s, iter->event, &sample, tool,
>  							  iter->file_offset);
> +			if (s->copy_on_queue)
> +				free(iter->event);
>  			if (ret)
>  				return ret;
>  		}
> @@ -233,13 +237,15 @@ void ordered_events__init(struct ordered_events *oe)
>  	oe->cur_alloc_size = 0;
>  }
>  
> -void ordered_events__free(struct ordered_events *oe)
> +void ordered_events__free(struct perf_session *s, struct ordered_events *oe)
>  {
>  	while (!list_empty(&oe->to_free)) {
>  		struct ordered_event *event;
>  
>  		event = list_entry(oe->to_free.next, struct ordered_event, list);
>  		list_del(&event->list);
> +		if (s->copy_on_queue)
> +			free(event->event);
>  		free(event);
>  	}
>  }
> diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
> index 3b2f205..a156b0e 100644
> --- a/tools/perf/util/ordered-events.h
> +++ b/tools/perf/util/ordered-events.h
> @@ -41,7 +41,7 @@ void ordered_events__delete(struct ordered_events *oe, struct ordered_event *eve
>  int ordered_events__flush(struct perf_session *s, struct perf_tool *tool,
>  			  enum oe_flush how);
>  void ordered_events__init(struct ordered_events *oe);
> -void ordered_events__free(struct ordered_events *oe);
> +void ordered_events__free(struct perf_session *s, struct ordered_events *oe);
>  
>  static inline
>  void ordered_events__set_alloc_size(struct ordered_events *oe, u64 size)
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 6d2d50d..40d3aec 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -542,7 +542,13 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
>  		return -ENOMEM;
>  
>  	new->file_offset = file_offset;
> -	new->event = event;
> +
> +	if (s->copy_on_queue) {
> +		new->event = malloc(event->header.size);
> +		memcpy(new->event, event, event->header.size);
> +	} else
> +		new->event = event;
> +
>  	return 0;
>  }
>  
> @@ -1164,7 +1170,7 @@ done:
>  out_err:
>  	free(buf);
>  	perf_session__warn_about_errors(session, tool);
> -	ordered_events__free(&session->ordered_events);
> +	ordered_events__free(session, &session->ordered_events);
>  	return err;
>  }
>  
> @@ -1309,7 +1315,7 @@ out:
>  out_err:
>  	ui_progress__finish();
>  	perf_session__warn_about_errors(session, tool);
> -	ordered_events__free(&session->ordered_events);
> +	ordered_events__free(session, &session->ordered_events);
>  	session->one_mmap = false;
>  	return err;
>  }
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index 8dd41ca..137f77f 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -27,6 +27,7 @@ struct perf_session {
>  	void			*one_mmap_addr;
>  	u64			one_mmap_offset;
>  	struct ordered_events	ordered_events;
> +	bool			copy_on_queue;
>  	struct perf_data_file	*file;
>  };
>  
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-18 18:53   ` Arnaldo Carvalho de Melo
@ 2014-09-18 20:21     ` David Ahern
  2014-09-18 20:29       ` David Ahern
  0 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2014-09-18 20:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Alexander Yarygin
  Cc: linux-kernel, Christian Borntraeger, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

On 9/18/14, 12:53 PM, Arnaldo Carvalho de Melo wrote:
> If nobody objects I'll merge this patch, as it fixes problems, but I
> wonder if the best wouldn't be simply not calling
> perf_evlist__mmap_consume() till the last event there is in fact
> consumed... I.e. as we _really_ consume the events, we remove it from
> there.
>
> Instead of consuming the event at perf_tool->sample() time, we would
> do it at perf_tool->finished_round(), would that be feasible? Has anyone
> tried this?

Hmmm... haven't tried this.  Conceptually it should work - at least 
nothing comes to mind at the moment.


Patch looked ok to me: Reviewed-by: David Ahern <dsahern@gmail.com>

Jiri should review based on his recent changes for ordered events.

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-18 20:21     ` David Ahern
@ 2014-09-18 20:29       ` David Ahern
  2014-09-19  8:48         ` Alexander Yarygin
  0 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2014-09-18 20:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Alexander Yarygin
  Cc: linux-kernel, Christian Borntraeger, Frederic Weisbecker,
	Ingo Molnar, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

On 9/18/14, 2:21 PM, David Ahern wrote:
> On 9/18/14, 12:53 PM, Arnaldo Carvalho de Melo wrote:
>> If nobody objects I'll merge this patch, as it fixes problems, but I
>> wonder if the best wouldn't be simply not calling
>> perf_evlist__mmap_consume() till the last event there is in fact
>> consumed... I.e. as we _really_ consume the events, we remove it from
>> there.
>>
>> Instead of consuming the event at perf_tool->sample() time, we would
>> do it at perf_tool->finished_round(), would that be feasible? Has anyone
>> tried this?
>
> Hmmm... haven't tried this.  Conceptually it should work - at least
> nothing comes to mind at the moment.

Upon further review ...

Alex you might want to try this first. Malloc and copy of all events is 
going to bring some serious overhead. Can avoid that if consuming the 
event in finished_round works.

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-18 20:29       ` David Ahern
@ 2014-09-19  8:48         ` Alexander Yarygin
  2014-09-19 14:21           ` David Ahern
  2014-09-22  7:04           ` Jiri Olsa
  0 siblings, 2 replies; 12+ messages in thread
From: Alexander Yarygin @ 2014-09-19  8:48 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Christian Borntraeger,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Alexander Yarygin

David Ahern <dsahern@gmail.com> writes:

> On 9/18/14, 2:21 PM, David Ahern wrote:
>> On 9/18/14, 12:53 PM, Arnaldo Carvalho de Melo wrote:
>>> If nobody objects I'll merge this patch, as it fixes problems, but I
>>> wonder if the best wouldn't be simply not calling
>>> perf_evlist__mmap_consume() till the last event there is in fact
>>> consumed... I.e. as we _really_ consume the events, we remove it from
>>> there.
>>>
>>> Instead of consuming the event at perf_tool->sample() time, we would
>>> do it at perf_tool->finished_round(), would that be feasible? Has anyone
>>> tried this?
>>
>> Hmmm... haven't tried this.  Conceptually it should work - at least
>> nothing comes to mind at the moment.
>
> Upon further review ...
>
> Alex you might want to try this first. Malloc and copy of all events
> is going to bring some serious overhead. Can avoid that if consuming
> the event in finished_round works.
>
> David

I've tried that:

--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -737,7 +737,6 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx,
                 * FIXME: Here we can't consume the event, as perf_session_queue_event will
                 *        point to it, and it'll get possibly overwritten by the kernel.
                 */
-               perf_evlist__mmap_consume(kvm->evlist, idx);
 
                if (err) {
                        pr_err("Failed to enqueue sample: %d\n", err);
@@ -787,6 +786,10 @@ static int perf_kvm__mmap_read(struct perf_kvm_stat *kvm)
        if (ntotal) {
                kvm->session->ordered_samples.next_flush = flush_time;
                err = kvm->tool.finished_round(&kvm->tool, NULL, kvm->session);
+
+               for (i = 0; i < kvm->evlist->nr_mmaps; i++)
+                       perf_evlist__mmap_consume(kvm->evlist, i);
+
                if (err) {
                        if (kvm->lost_events)
                                pr_info("\nLost events: %" PRIu64
                        "\n\n",

It did't work. Turned out that there is at least one event alive after
finished_round(), usually I get more - ~20. Not sure why, maybe it's
another problem which should be solved at first?


Also, I tried to follow 'perf-top' way:

  while (perf_evlist__mmap_read() != NULL) {
    perf_evlist__parse_sample();
    perf_event__process_sample();
    perf_evlist__mmap_consume();
  }

I.e. without session_queue. In this case perf won't crash, but it will
process significantly less events.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-19  8:48         ` Alexander Yarygin
@ 2014-09-19 14:21           ` David Ahern
  2014-09-19 16:25             ` Alexander Yarygin
  2014-09-22  7:04           ` Jiri Olsa
  1 sibling, 1 reply; 12+ messages in thread
From: David Ahern @ 2014-09-19 14:21 UTC (permalink / raw)
  To: Alexander Yarygin
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Christian Borntraeger,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

On 9/19/14, 2:48 AM, Alexander Yarygin wrote:
> It did't work. Turned out that there is at least one event alive after
> finished_round(), usually I get more - ~20. Not sure why, maybe it's
> another problem which should be solved at first?

hmm.... perf_evlist__mmap_consume is not at the event level, but at the 
mmap level -- it assumes everything read has been processed which is not 
true for this case.

David


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-19 14:21           ` David Ahern
@ 2014-09-19 16:25             ` Alexander Yarygin
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Yarygin @ 2014-09-19 16:25 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Christian Borntraeger,
	Frederic Weisbecker, Ingo Molnar, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

David Ahern <dsahern@gmail.com> writes:

> On 9/19/14, 2:48 AM, Alexander Yarygin wrote:
>> It did't work. Turned out that there is at least one event alive after
>> finished_round(), usually I get more - ~20. Not sure why, maybe it's
>> another problem which should be solved at first?
>
> hmm.... perf_evlist__mmap_consume is not at the event level, but at
> the mmap level -- it assumes everything read has been processed which
> is not true for this case.
>
> David

So, in general, we know when each event has been processed, but at that
time there is no reference to appropriate mmap. To create that link we
still need to allocate something per event.


I'm confused by the applying perf_evlist__mmap_consume() for each
perf_evlist__mmap_read, shouldn't it be for each perf_kvm__mmap_read_idx()?

	for (i = 0; i < kvm->evlist->nr_mmaps; i++) {
		n = perf_kvm__mmap_read_idx(kvm, i, &mmap_time);
		perf_evlist__mmap_consume(kvm->evlist, idx); /* <-- here? */
		if (n < 0)
			return -1;


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-18 17:07 ` [PATCH 1/2] perf session: Add option to copy events when queueing Alexander Yarygin
  2014-09-18 18:53   ` Arnaldo Carvalho de Melo
@ 2014-09-22  7:04   ` Jiri Olsa
  2014-09-29 18:33     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2014-09-22  7:04 UTC (permalink / raw)
  To: Alexander Yarygin
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Christian Borntraeger,
	David Ahern, Frederic Weisbecker, Ingo Molnar, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

On Thu, Sep 18, 2014 at 09:07:38PM +0400, Alexander Yarygin wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> When processing events the session code has an ordered samples queue which is
> used to time-sort events coming in across multiple mmaps. At a later point in
> time samples on the queue are flushed up to some timestamp at which point the
> event is actually processed.
> 
> When analyzing events live (ie., record/analysis path in the same command)
> there is a race that leads to corrupted events and parse errors which cause
> perf to terminate. The problem is that when the event is placed in the ordered
> samples queue it is only a reference to the event which is really sitting in
> the mmap buffer. Even though the event is queued for later processing the mmap
> tail pointer is updated which indicates to the kernel that the event has been
> processed. The race is flushing the event from the queue before it gets
> overwritten by some other event. For commands trying to process events live
> (versus just writing to a file) and processing a high rate of events this leads
> to parse failures and perf terminates.
> 
> Examples hitting this problem are 'perf kvm stat live', especially with nested
> VMs which generate 100,000+ traces per second, and a command processing
> scheduling events with a high rate of context switching -- e.g., running
> 'perf bench sched pipe'.
> 
> This patch offers live commands an option to copy the event when it is placed in
> the ordered samples queue.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Namhyung Kim <namhyung.kim@lge.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
> ---
>  tools/perf/util/ordered-events.c | 12 +++++++++---
>  tools/perf/util/ordered-events.h |  2 +-
>  tools/perf/util/session.c        | 12 +++++++++---
>  tools/perf/util/session.h        |  1 +
>  4 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
> index 706ce1a..5b51de5 100644
> --- a/tools/perf/util/ordered-events.c
> +++ b/tools/perf/util/ordered-events.c
> @@ -140,11 +140,15 @@ static int __ordered_events__flush(struct perf_session *s,
>  			break;
>  
>  		ret = perf_evlist__parse_sample(s->evlist, iter->event, &sample);
> -		if (ret)
> +		if (ret) {
>  			pr_err("Can't parse sample, err = %d\n", ret);
> -		else {
> +			if (s->copy_on_queue)
> +				free(iter->event);
> +		} else {
>  			ret = perf_session__deliver_event(s, iter->event, &sample, tool,
>  							  iter->file_offset);
> +			if (s->copy_on_queue)
> +				free(iter->event);
>  			if (ret)
>  				return ret;
>  		}
> @@ -233,13 +237,15 @@ void ordered_events__init(struct ordered_events *oe)
>  	oe->cur_alloc_size = 0;
>  }
>  
> -void ordered_events__free(struct ordered_events *oe)
> +void ordered_events__free(struct perf_session *s, struct ordered_events *oe)
>  {
>  	while (!list_empty(&oe->to_free)) {
>  		struct ordered_event *event;
>  
>  		event = list_entry(oe->to_free.next, struct ordered_event, list);
>  		list_del(&event->list);
> +		if (s->copy_on_queue)
> +			free(event->event);
>  		free(event);
>  	}
>  }
> diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
> index 3b2f205..a156b0e 100644
> --- a/tools/perf/util/ordered-events.h
> +++ b/tools/perf/util/ordered-events.h
> @@ -41,7 +41,7 @@ void ordered_events__delete(struct ordered_events *oe, struct ordered_event *eve
>  int ordered_events__flush(struct perf_session *s, struct perf_tool *tool,
>  			  enum oe_flush how);
>  void ordered_events__init(struct ordered_events *oe);
> -void ordered_events__free(struct ordered_events *oe);
> +void ordered_events__free(struct perf_session *s, struct ordered_events *oe);
>  
>  static inline
>  void ordered_events__set_alloc_size(struct ordered_events *oe, u64 size)
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 6d2d50d..40d3aec 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -542,7 +542,13 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
>  		return -ENOMEM;
>  
>  	new->file_offset = file_offset;
> -	new->event = event;
> +
> +	if (s->copy_on_queue) {
> +		new->event = malloc(event->header.size);
> +		memcpy(new->event, event, event->header.size);

we have memdup, and you need to check for allocation failure

hum.. how about allocation limits?

Now have report.queue-size to keep track and limit the ordered_events
memory size. I think we should add this allocation size under this
limit as well.

> +	} else
> +		new->event = event;
> +

also the copy_on_queue flag and the logic above feels more like
it belongs to 'ordered_events' object to me

jirka

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-19  8:48         ` Alexander Yarygin
  2014-09-19 14:21           ` David Ahern
@ 2014-09-22  7:04           ` Jiri Olsa
  1 sibling, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2014-09-22  7:04 UTC (permalink / raw)
  To: Alexander Yarygin
  Cc: David Ahern, Arnaldo Carvalho de Melo, linux-kernel,
	Christian Borntraeger, Frederic Weisbecker, Ingo Molnar,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian

On Fri, Sep 19, 2014 at 12:48:21PM +0400, Alexander Yarygin wrote:
> David Ahern <dsahern@gmail.com> writes:
> 
> > On 9/18/14, 2:21 PM, David Ahern wrote:
> >> On 9/18/14, 12:53 PM, Arnaldo Carvalho de Melo wrote:
> >>> If nobody objects I'll merge this patch, as it fixes problems, but I
> >>> wonder if the best wouldn't be simply not calling
> >>> perf_evlist__mmap_consume() till the last event there is in fact
> >>> consumed... I.e. as we _really_ consume the events, we remove it from
> >>> there.
> >>>
> >>> Instead of consuming the event at perf_tool->sample() time, we would
> >>> do it at perf_tool->finished_round(), would that be feasible? Has anyone
> >>> tried this?
> >>
> >> Hmmm... haven't tried this.  Conceptually it should work - at least
> >> nothing comes to mind at the moment.
> >
> > Upon further review ...
> >
> > Alex you might want to try this first. Malloc and copy of all events
> > is going to bring some serious overhead. Can avoid that if consuming
> > the event in finished_round works.
> >
> > David
> 
> I've tried that:
> 
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -737,7 +737,6 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx,
>                  * FIXME: Here we can't consume the event, as perf_session_queue_event will
>                  *        point to it, and it'll get possibly overwritten by the kernel.
>                  */
> -               perf_evlist__mmap_consume(kvm->evlist, idx);
>  
>                 if (err) {
>                         pr_err("Failed to enqueue sample: %d\n", err);
> @@ -787,6 +786,10 @@ static int perf_kvm__mmap_read(struct perf_kvm_stat *kvm)
>         if (ntotal) {
>                 kvm->session->ordered_samples.next_flush = flush_time;
>                 err = kvm->tool.finished_round(&kvm->tool, NULL, kvm->session);
> +
> +               for (i = 0; i < kvm->evlist->nr_mmaps; i++)
> +                       perf_evlist__mmap_consume(kvm->evlist, i);
> +
>                 if (err) {
>                         if (kvm->lost_events)
>                                 pr_info("\nLost events: %" PRIu64
>                         "\n\n",
> 
> It did't work. Turned out that there is at least one event alive after
> finished_round(), usually I get more - ~20. Not sure why, maybe it's
> another problem which should be solved at first?

The flush timestamp at the moment of the ROUND event is not
the max timestamp of the queue. It's set as max queue timestamp
by the previous flush as explained in util/session.c
process_finished_round function comment

> 
> 
> Also, I tried to follow 'perf-top' way:
> 
>   while (perf_evlist__mmap_read() != NULL) {
>     perf_evlist__parse_sample();
>     perf_event__process_sample();
>     perf_evlist__mmap_consume();
>   }
> 
> I.e. without session_queue. In this case perf won't crash, but it will
> process significantly less events.

right, I guess they are dropped in kernel due to the
userspace processing being slow

I'm ok with this solution.. with some comments in my other email

jirka

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] perf session: Add option to copy events when queueing
  2014-09-22  7:04   ` Jiri Olsa
@ 2014-09-29 18:33     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-29 18:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexander Yarygin, linux-kernel, Christian Borntraeger,
	David Ahern, Frederic Weisbecker, Ingo Molnar, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Mon, Sep 22, 2014 at 09:04:10AM +0200, Jiri Olsa escreveu:
> On Thu, Sep 18, 2014 at 09:07:38PM +0400, Alexander Yarygin wrote:
> > +++ b/tools/perf/util/session.c
> > @@ -542,7 +542,13 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
> >  		return -ENOMEM;

> >  	new->file_offset = file_offset;
> > -	new->event = event;

> > +	if (s->copy_on_queue) {
> > +		new->event = malloc(event->header.size);
> > +		memcpy(new->event, event, event->header.size);

> we have memdup, and you need to check for allocation failure

Right.
 
> hum.. how about allocation limits?
 
> Now have report.queue-size to keep track and limit the ordered_events
> memory size. I think we should add this allocation size under this
> limit as well.

Yup.
 
> > +	} else
> > +		new->event = event;
> > +
> 
> also the copy_on_queue flag and the logic above feels more like
> it belongs to 'ordered_events' object to me

Agreed.

- Arnaldo

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-09-29 18:34 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-18 17:07 [PATCH 0/2] perf kvm stat live: Copy events Alexander Yarygin
2014-09-18 17:07 ` [PATCH 1/2] perf session: Add option to copy events when queueing Alexander Yarygin
2014-09-18 18:53   ` Arnaldo Carvalho de Melo
2014-09-18 20:21     ` David Ahern
2014-09-18 20:29       ` David Ahern
2014-09-19  8:48         ` Alexander Yarygin
2014-09-19 14:21           ` David Ahern
2014-09-19 16:25             ` Alexander Yarygin
2014-09-22  7:04           ` Jiri Olsa
2014-09-22  7:04   ` Jiri Olsa
2014-09-29 18:33     ` Arnaldo Carvalho de Melo
2014-09-18 17:07 ` [PATCH 2/2] perf kvm stat live: Enable events copying Alexander Yarygin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).