* [PATCH 00/10] new mmap_read interfaces for ring buffer
@ 2017-10-10 17:20 kan.liang
  2017-10-10 17:20 ` [PATCH 01/10] perf record: new interfaces to read ring buffer to file kan.liang
                   ` (9 more replies)
  0 siblings, 10 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

The mmap_read interfaces for the ring buffer have some issues:
 - Not well organized. For example, perf record keeps some of the code
   in builtin-record.c, while other perf tools use the perf_mmap__read*
   functions to access the ring buffer. That code should be shared.
 - Only forward mode is supported. Although perf_mmap__read_backward
   exists, nothing uses it and it is not well maintained. However, perf
   top needs to switch to overwrite backward mode for good performance.
 - A bug was found in overwrite backward mode.

The patch series moves all the mmap_read related code to evlist.c and
provides new standard interfaces to access the ring buffer.
The new interfaces include:
perf_evlist__mmap_read_init     // wrapper of perf_mmap__read_init
perf_mmap__read_init            // initialize and calculate the range to read
perf_mmap__read_to_file         // read the ring buffer and write it to a file
perf_mmap__read_event           // read the events in the specified range
perf_mmap__read_done            // update md->prev for later use

perf record needs to read all available data in the ring buffer and
write it to the perf.data file.
This can be done as below:
  perf_mmap__read_init
  perf_mmap__read_to_file
  perf_mmap__consume

The other perf tools need to read and process events one by one.
This can be done as below:
  perf_mmap__read_init
  while (event = perf_mmap__read_event) {
    //process the event
    perf_mmap__consume
  }
  perf_mmap__read_done
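
The two flows above can be modeled with a small self-contained sketch of
the masked ring-buffer arithmetic the new interfaces are built on. The
toy_* names below are invented for illustration and are not the real
perf types or API:

```c
#include <assert.h>
#include <string.h>

/* Toy model of the masked ring-buffer arithmetic behind the new
 * interfaces; simplified stand-ins, not the real perf structs. */
struct toy_mmap {
	unsigned char *base;	/* buffer of mask + 1 bytes (power of two) */
	unsigned int mask;	/* buffer size - 1 */
	unsigned long head;	/* write position, grows without bound */
	unsigned long prev;	/* position already processed */
};

struct toy_read {
	unsigned long start;
	unsigned long end;
};

/* Forward-mode equivalent of perf_mmap__read_init: the unread range is
 * simply [prev, head). */
static int toy_read_init(struct toy_mmap *m, struct toy_read *r)
{
	r->start = m->prev;
	r->end = m->head;
	if (r->end - r->start > (unsigned long)m->mask + 1)
		return -1;	/* the writer overran the reader */
	return 0;
}

/* Copy the range out linearly, splitting it in two when it wraps past
 * the end of the buffer, as perf_mmap__read_to_file does with writes. */
static unsigned long toy_read_to_buf(struct toy_mmap *m, struct toy_read *r,
				     unsigned char *out)
{
	unsigned long size = r->end - r->start;
	unsigned long first = size;

	if ((r->start & m->mask) + size > (unsigned long)m->mask + 1)
		first = (unsigned long)m->mask + 1 - (r->start & m->mask);

	memcpy(out, &m->base[r->start & m->mask], first);
	memcpy(out + first, m->base, size - first);
	m->prev = r->end;	/* what perf_mmap__read_done records */
	return size;
}
```

The forward range is simply [md->prev, head); the copy splits in two
exactly when the range crosses the end of the mapped buffer.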


All of the following stale interfaces will be removed:
perf_evlist__mmap_read
perf_evlist__mmap_read_forward
perf_evlist__mmap_read_backward
perf_evlist__mmap_read_catchup
perf_mmap__read_catchup
perf_mmap__read_forward
perf_mmap__read_backward

The patch series only changes the interfaces. It doesn't change the
core perf_mmap__read function. There is no functional change.

Kan Liang (10):
  perf record: new interfaces to read ring buffer to file
  perf tool: fix: Don't discard prev in backward mode
  perf tool: new functions to read event from ring buffer
  perf tool: perf_mmap__read_init wrapper for evlist
  perf top: apply new mmap_read interfaces
  perf trace: apply new mmap_read interfaces
  perf kvm: apply new mmap_read interfaces
  perf python: apply new mmap_read interfaces
  perf tests: apply new mmap_read interfaces
  perf tool: remove stale mmap_read interfaces

 tools/perf/arch/x86/tests/perf-time-to-tsc.c |   9 +-
 tools/perf/builtin-kvm.c                     |  10 +-
 tools/perf/builtin-record.c                  | 111 +++---------
 tools/perf/builtin-top.c                     |   9 +-
 tools/perf/builtin-trace.c                   |   9 +-
 tools/perf/tests/backward-ring-buffer.c      |  10 +-
 tools/perf/tests/bpf.c                       |  10 +-
 tools/perf/tests/code-reading.c              |   8 +-
 tools/perf/tests/keep-tracking.c             |   8 +-
 tools/perf/tests/mmap-basic.c                |   9 +-
 tools/perf/tests/openat-syscall-tp-fields.c  |   9 +-
 tools/perf/tests/perf-record.c               |   9 +-
 tools/perf/tests/sw-clock.c                  |   8 +-
 tools/perf/tests/switch-tracking.c           |   8 +-
 tools/perf/tests/task-exit.c                 |   8 +-
 tools/perf/util/evlist.c                     | 244 +++++++++++++++++----------
 tools/perf/util/evlist.h                     |  31 ++--
 tools/perf/util/python.c                     |   9 +-
 18 files changed, 314 insertions(+), 205 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 01/10] perf record: new interfaces to read ring buffer to file
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 18:24   ` Arnaldo Carvalho de Melo
  2017-10-10 18:30   ` Arnaldo Carvalho de Melo
  2017-10-10 17:20 ` [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode kan.liang
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Factor out the mmap_read related code into separate functions and move
them to evlist.c.
struct perf_mmap_read is introduced to store the information about the
specific range of the ring buffer to read.

The sequence to read a specific range of the ring buffer into a file is
as below:
  perf_mmap__read_init
  perf_mmap__read_to_file
  perf_mmap__consume

There is no functional change.
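
For illustration, here is a self-contained toy version of the backward
range scan that this patch moves into evlist.c. It reads the size
fields with memcpy instead of a header struct and uses invented names,
so it is a sketch of the algorithm rather than the real code:

```c
#include <assert.h>
#include <string.h>

/* Toy version of backward_rb_find_range: starting at head, walk
 * size-prefixed records until either a zero size field (unused space)
 * or one full lap around the buffer is reached; if the last step
 * overshot the lap, rewind it. */
static int toy_find_range(unsigned char *buf, unsigned int mask,
			  unsigned long head,
			  unsigned long *start, unsigned long *end)
{
	unsigned long evt_head = head;
	unsigned long size = (unsigned long)mask + 1;
	unsigned short rec_size, prev_size = 0;

	*start = head;
	for (;;) {
		if (evt_head - head >= size) {
			if (evt_head - head > size)
				evt_head -= prev_size;	/* overshot: drop last */
			*end = evt_head;
			return 0;
		}
		memcpy(&rec_size, &buf[evt_head & mask], sizeof(rec_size));
		if (rec_size == 0) {		/* hit unused space */
			*end = evt_head;
			return 0;
		}
		prev_size = rec_size;
		evt_head += rec_size;
	}
}
```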

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-record.c | 111 ++++++++------------------------------------
 tools/perf/util/evlist.c    | 111 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h    |  15 ++++++
 3 files changed, 146 insertions(+), 91 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 234fdf4..df555e9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -105,6 +105,17 @@ static bool switch_output_time(struct record *rec)
 	       trigger_is_ready(&switch_output_trigger);
 }
 
+static void update_output_size(struct record *rec, size_t size)
+{
+	if (size == 0)
+		return;
+
+	rec->bytes_written += size;
+
+	if (switch_output_size(rec))
+		trigger_hit(&switch_output_trigger);
+}
+
 static int record__write(struct record *rec, void *bf, size_t size)
 {
 	if (perf_data_file__write(rec->session->file, bf, size) < 0) {
@@ -112,10 +123,7 @@ static int record__write(struct record *rec, void *bf, size_t size)
 		return -1;
 	}
 
-	rec->bytes_written += size;
-
-	if (switch_output_size(rec))
-		trigger_hit(&switch_output_trigger);
+	update_output_size(rec, size);
 
 	return 0;
 }
@@ -130,106 +138,27 @@ static int process_synthesized_event(struct perf_tool *tool,
 }
 
 static int
-backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
-{
-	struct perf_event_header *pheader;
-	u64 evt_head = head;
-	int size = mask + 1;
-
-	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
-	pheader = (struct perf_event_header *)(buf + (head & mask));
-	*start = head;
-	while (true) {
-		if (evt_head - head >= (unsigned int)size) {
-			pr_debug("Finished reading backward ring buffer: rewind\n");
-			if (evt_head - head > (unsigned int)size)
-				evt_head -= pheader->size;
-			*end = evt_head;
-			return 0;
-		}
-
-		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
-
-		if (pheader->size == 0) {
-			pr_debug("Finished reading backward ring buffer: get start\n");
-			*end = evt_head;
-			return 0;
-		}
-
-		evt_head += pheader->size;
-		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
-	}
-	WARN_ONCE(1, "Shouldn't get here\n");
-	return -1;
-}
-
-static int
-rb_find_range(void *data, int mask, u64 head, u64 old,
-	      u64 *start, u64 *end, bool backward)
-{
-	if (!backward) {
-		*start = old;
-		*end = head;
-		return 0;
-	}
-
-	return backward_rb_find_range(data, mask, head, start, end);
-}
-
-static int
 record__mmap_read(struct record *rec, struct perf_mmap *md,
 		  bool overwrite, bool backward)
 {
-	u64 head = perf_mmap__read_head(md);
-	u64 old = md->prev;
-	u64 end = head, start = old;
-	unsigned char *data = md->base + page_size;
-	unsigned long size;
-	void *buf;
-	int rc = 0;
+	struct perf_mmap_read read;
 
-	if (rb_find_range(data, md->mask, head,
-			  old, &start, &end, backward))
+	if (perf_mmap__read_init(md, &read, overwrite, backward))
 		return -1;
 
-	if (start == end)
+	if (read.start == read.end)
 		return 0;
 
 	rec->samples++;
 
-	size = end - start;
-	if (size > (unsigned long)(md->mask) + 1) {
-		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
-
-		md->prev = head;
-		perf_mmap__consume(md, overwrite || backward);
-		return 0;
-	}
-
-	if ((start & md->mask) + size != (end & md->mask)) {
-		buf = &data[start & md->mask];
-		size = md->mask + 1 - (start & md->mask);
-		start += size;
-
-		if (record__write(rec, buf, size) < 0) {
-			rc = -1;
-			goto out;
-		}
-	}
+	if (perf_mmap__read_to_file(&read, rec->session->file) < 0)
+		return -1;
 
-	buf = &data[start & md->mask];
-	size = end - start;
-	start += size;
+	update_output_size(rec, read.size);
 
-	if (record__write(rec, buf, size) < 0) {
-		rc = -1;
-		goto out;
-	}
-
-	md->prev = head;
 	perf_mmap__consume(md, overwrite || backward);
-out:
-	return rc;
+
+	return 0;
 }
 
 static volatile int done;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 6a0d7ff..33b8837 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -704,6 +704,117 @@ static int perf_evlist__resume(struct perf_evlist *evlist)
 	return perf_evlist__set_paused(evlist, false);
 }
 
+static int
+backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
+{
+	struct perf_event_header *pheader;
+	u64 evt_head = head;
+	int size = mask + 1;
+
+	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
+	pheader = (struct perf_event_header *)(buf + (head & mask));
+	*start = head;
+	while (true) {
+		if (evt_head - head >= (unsigned int)size) {
+			pr_debug("Finished reading backward ring buffer: rewind\n");
+			if (evt_head - head > (unsigned int)size)
+				evt_head -= pheader->size;
+			*end = evt_head;
+			return 0;
+		}
+
+		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
+
+		if (pheader->size == 0) {
+			pr_debug("Finished reading backward ring buffer: get start\n");
+			*end = evt_head;
+			return 0;
+		}
+
+		evt_head += pheader->size;
+		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
+	}
+	WARN_ONCE(1, "Shouldn't get here\n");
+	return -1;
+}
+
+static int
+rb_find_range(void *data, int mask, u64 head, u64 old,
+	      u64 *start, u64 *end, bool backward)
+{
+	if (!backward) {
+		*start = old;
+		*end = head;
+		return 0;
+	}
+
+	return backward_rb_find_range(data, mask, head, start, end);
+}
+
+/*
+ * Initialize struct perf_mmap_read
+ * Calculate the range of ring buffer to read
+ */
+int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
+			 bool overwrite, bool backward)
+{
+	unsigned char *data = md->base + page_size;
+
+	read->md = md;
+	read->head = perf_mmap__read_head(md);
+	read->overwrite = overwrite;
+	read->backward = backward;
+	read->size = 0;
+	return rb_find_range(data, md->mask, read->head, md->prev,
+			     &read->start, &read->end, backward);
+}
+
+/*
+ * Read the ring buffer in the range which specified in struct perf_mmap_read,
+ * and write to file.
+ * Used by perf record.
+ */
+int perf_mmap__read_to_file(struct perf_mmap_read *read,
+			    struct perf_data_file *file)
+{
+	struct perf_mmap *md = read->md;
+	unsigned char *data = md->base + page_size;
+	unsigned long size;
+	void *buf;
+
+	size = read->end - read->start;
+	if (size > (unsigned long)(md->mask) + 1) {
+		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
+		read->size = 0;
+		goto out;
+	}
+
+	if ((read->start & md->mask) + size != (read->end & md->mask)) {
+		buf = &data[read->start & md->mask];
+		size = md->mask + 1 - (read->start & md->mask);
+		read->start += size;
+		read->size += size;
+
+		if (perf_data_file__write(file, buf, size) < 0) {
+			pr_err("failed to write %s, error: %m\n", file->path);
+			return -1;
+		}
+	}
+
+	buf = &data[read->start & md->mask];
+	size = read->end - read->start;
+	read->start += size;
+	read->size += size;
+
+	if (perf_data_file__write(file, buf, size) < 0) {
+		pr_err("failed to write %s, error: %m\n", file->path);
+		return -1;
+	}
+out:
+	md->prev = read->head;
+	return 0;
+}
+
 /* When check_messup is true, 'end' must points to a good entry */
 static union perf_event *
 perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b1c14f1..1ce4857 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -39,6 +39,16 @@ struct perf_mmap {
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
 };
 
+struct perf_mmap_read {
+	struct perf_mmap	*md;
+	u64			head;
+	u64			start;
+	u64			end;
+	bool			overwrite;
+	bool			backward;
+	unsigned long		size;
+};
+
 static inline size_t
 perf_mmap__mmap_len(struct perf_mmap *map)
 {
@@ -193,6 +203,11 @@ void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
 
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
 
+int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
+			 bool overwrite, bool backward);
+int perf_mmap__read_to_file(struct perf_mmap_read *read,
+			    struct perf_data_file *file);
+
 int perf_evlist__open(struct perf_evlist *evlist);
 void perf_evlist__close(struct perf_evlist *evlist);
 
-- 
2.5.5


* [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
  2017-10-10 17:20 ` [PATCH 01/10] perf record: new interfaces to read ring buffer to file kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 18:14   ` Arnaldo Carvalho de Melo
  2017-10-10 18:23   ` Wangnan (F)
  2017-10-10 17:20 ` [PATCH 03/10] perf tool: new interfaces to read event from ring buffer kan.liang
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

perf record can switch its output. The new output should only contain
the data collected after the switch. However, in overwrite backward
mode, the new output still contains data from the old output.

At the end of mmap_read, the position up to which the ring buffer has
been processed is saved in md->prev. The next mmap_read should end at
md->prev. However, md->prev is currently discarded, so the next
mmap_read has to process the whole valid ring buffer, which necessarily
includes the already processed data.

Set prev as the end of the range in backward mode.
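
The effect of the fix can be sketched with a toy helper (invented name,
not the real code): clamping the scanned end to md->prev keeps
successive backward reads from overlapping:

```c
#include <assert.h>

/* Toy model of the fix: the backward scan reports the whole valid
 * range [head, scan_end), but everything at or past prev (the end
 * position of the previous read) was already processed, so the end is
 * clamped to prev. */
static unsigned long clamp_backward_end(unsigned long scan_end,
					unsigned long prev)
{
	return prev < scan_end ? prev : scan_end;
}
```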

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/evlist.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 33b8837..7d23cf5 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -742,13 +742,25 @@ static int
 rb_find_range(void *data, int mask, u64 head, u64 old,
 	      u64 *start, u64 *end, bool backward)
 {
+	int ret;
+
 	if (!backward) {
 		*start = old;
 		*end = head;
 		return 0;
 	}
 
-	return backward_rb_find_range(data, mask, head, start, end);
+	ret = backward_rb_find_range(data, mask, head, start, end);
+
+	/*
+	 * The start and end from backward_rb_find_range is the range for all
+	 * valid data in ring buffer.
+	 * However, part of the data is processed previously.
+	 * Reset the end to drop the processed data
+	 */
+	*end = old;
+
+	return ret;
 }
 
 /*
-- 
2.5.5


* [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
  2017-10-10 17:20 ` [PATCH 01/10] perf record: new interfaces to read ring buffer to file kan.liang
  2017-10-10 17:20 ` [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 18:15   ` Arnaldo Carvalho de Melo
  2017-10-10 17:20 ` [PATCH 04/10] perf tool: perf_mmap__read_init wrapper for evlist kan.liang
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

perf_evlist__mmap_read only supports forward mode, and
perf_evlist__mmap_read_backward is buggy. A common function supporting
both forward and backward mode is needed.

Introduce a new mmap read interface that supports both forward and
backward mode. It reads events one by one from the specified range of
the ring buffer.

The standard sequence to read events with mmap__read would be as below:
 perf_mmap__read_init
 while (event = perf_mmap__read_event) {
   //process the event
   perf_mmap__consume
 }
 perf_mmap__read_done

The following patches will use it to replace the old interfaces.
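
As an illustration, the read-one-event step can be modeled with a
self-contained sketch over size-prefixed records; the toy_read_event
name and layout are simplified stand-ins, not the real perf code:

```c
#include <assert.h>
#include <string.h>

/* Toy model of reading events one by one from [start, end): each
 * record begins with a 16-bit size field (as perf_event_header does),
 * the reader returns one record at a time and advances *start, and
 * NULL means the range is exhausted. */
static unsigned char *toy_read_event(unsigned char *buf,
				     unsigned long *start,
				     unsigned long end)
{
	unsigned char *ev;
	unsigned short size;

	if (*start >= end)
		return NULL;
	ev = buf + *start;
	memcpy(&size, ev, sizeof(size));	/* size is the first field */
	if (size == 0)				/* corrupt or unused slot */
		return NULL;
	*start += size;
	return ev;
}
```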

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/evlist.c | 41 +++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7d23cf5..b36211e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -897,6 +897,47 @@ perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
 	return event;
 }
 
+/*
+ * Read the first event in the specified range of the ring buffer.
+ * Used by most of the perf tools and tests
+ */
+union perf_event *
+perf_mmap__read_event(struct perf_mmap_read *read)
+{
+	struct perf_mmap *md = read->md;
+	union perf_event *event;
+
+	/*
+	 * Check if event was unmapped due to a POLLHUP/POLLERR.
+	 */
+	if (!refcount_read(&md->refcnt))
+		return NULL;
+
+	/* In backward, the ringbuffer is already paused. */
+	if (!read->backward) {
+		read->end = perf_mmap__read_head(md);
+		if (!read->end)
+			return NULL;
+	}
+
+	event = perf_mmap__read(md, !read->backward && read->overwrite,
+				read->start, read->end, &md->prev);
+	read->start = md->prev;
+	return event;
+}
+
+/*
+ * Mandatory for backward
+ * The direction of backward mmap__read_event is from head to tail.
+ * The last mmap__read_event will set tail to md->prev.
+ * Need to correct the md->prev.
+ */
+void
+perf_mmap__read_done(struct perf_mmap_read *read)
+{
+	read->md->prev = read->head;
+}
+
 union perf_event *perf_mmap__read_forward(struct perf_mmap *md, bool check_messup)
 {
 	u64 head;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 1ce4857..53baf26 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -207,6 +207,8 @@ int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
 			 bool overwrite, bool backward);
 int perf_mmap__read_to_file(struct perf_mmap_read *read,
 			    struct perf_data_file *file);
+union perf_event *perf_mmap__read_event(struct perf_mmap_read *read);
+void perf_mmap__read_done(struct perf_mmap_read *read);
 
 int perf_evlist__open(struct perf_evlist *evlist);
 void perf_evlist__close(struct perf_evlist *evlist);
-- 
2.5.5


* [PATCH 04/10] perf tool: perf_mmap__read_init wrapper for evlist
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
                   ` (2 preceding siblings ...)
  2017-10-10 17:20 ` [PATCH 03/10] perf tool: new interfaces to read event from ring buffer kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 17:20 ` [PATCH 05/10] perf top: apply new mmap_read interfaces kan.liang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Add perf_evlist__mmap_read_init, an evlist-level wrapper for
perf_mmap__read_init.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/evlist.c | 9 +++++++++
 tools/perf/util/evlist.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index b36211e..f7fefdb 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -781,6 +781,15 @@ int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
 			     &read->start, &read->end, backward);
 }
 
+int perf_evlist__mmap_read_init(struct perf_evlist *evlist, int idx,
+				struct perf_mmap_read* read, bool backward)
+{
+	struct perf_mmap *md;
+
+	md = backward ? &evlist->backward_mmap[idx] : &evlist->mmap[idx];
+	return perf_mmap__read_init(md, read, evlist->overwrite, backward);
+}
+
 /*
  * Read the ring buffer in the range which specified in struct perf_mmap_read,
  * and write to file.
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 53baf26..f292936 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -209,6 +209,8 @@ int perf_mmap__read_to_file(struct perf_mmap_read *read,
 			    struct perf_data_file *file);
 union perf_event *perf_mmap__read_event(struct perf_mmap_read *read);
 void perf_mmap__read_done(struct perf_mmap_read *read);
+int perf_evlist__mmap_read_init(struct perf_evlist *evlist, int idx,
+				struct perf_mmap_read* read, bool backward);
 
 int perf_evlist__open(struct perf_evlist *evlist);
 void perf_evlist__close(struct perf_evlist *evlist);
-- 
2.5.5


* [PATCH 05/10] perf top: apply new mmap_read interfaces
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
                   ` (3 preceding siblings ...)
  2017-10-10 17:20 ` [PATCH 04/10] perf tool: perf_mmap__read_init wrapper for evlist kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 17:20 ` [PATCH 06/10] perf trace: " kan.liang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

No functional change.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-top.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 477a869..be9ffae 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -805,11 +805,17 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx)
 	struct perf_sample sample;
 	struct perf_evsel *evsel;
 	struct perf_session *session = top->session;
+	struct perf_mmap_read read;
 	union perf_event *event;
 	struct machine *machine;
 	int ret;
 
-	while ((event = perf_evlist__mmap_read(top->evlist, idx)) != NULL) {
+	if (perf_evlist__mmap_read_init(top->evlist, idx, &read, false)) {
+		pr_err("Can't get mmap information\n");
+		return;
+	}
+
+	while ((event = perf_mmap__read_event(&read)) != NULL) {
 		ret = perf_evlist__parse_sample(top->evlist, event, &sample);
 		if (ret) {
 			pr_err("Can't parse sample, err = %d\n", ret);
@@ -866,6 +872,7 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx)
 next_event:
 		perf_evlist__mmap_consume(top->evlist, idx);
 	}
+	perf_mmap__read_done(&read);
 }
 
 static void perf_top__mmap_read(struct perf_top *top)
-- 
2.5.5


* [PATCH 06/10] perf trace: apply new mmap_read interfaces
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
                   ` (4 preceding siblings ...)
  2017-10-10 17:20 ` [PATCH 05/10] perf top: apply new mmap_read interfaces kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 17:20 ` [PATCH 07/10] perf kvm: " kan.liang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

No functional change.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-trace.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index afef6fe..cd5aaf2 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2424,8 +2424,14 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
 		union perf_event *event;
+		struct perf_mmap_read read;
 
-		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+		if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+			fprintf(trace->output, "Can't get mmap information\n");
+			continue;
+		}
+
+		while ((event = perf_mmap__read_event(&read)) != NULL) {
 			struct perf_sample sample;
 
 			++trace->nr_events;
@@ -2448,6 +2454,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 				draining = true;
 			}
 		}
+		perf_mmap__read_done(&read);
 	}
 
 	if (trace->nr_events == before) {
-- 
2.5.5


* [PATCH 07/10] perf kvm: apply new mmap_read interfaces
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
                   ` (5 preceding siblings ...)
  2017-10-10 17:20 ` [PATCH 06/10] perf trace: " kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 17:20 ` [PATCH 08/10] perf python: " kan.liang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

No functional change.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/builtin-kvm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 721f4f9..397a6b6 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -741,11 +741,18 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx,
 {
 	union perf_event *event;
 	struct perf_sample sample;
+	struct perf_mmap_read read;
 	s64 n = 0;
 	int err;
 
 	*mmap_time = ULLONG_MAX;
-	while ((event = perf_evlist__mmap_read(kvm->evlist, idx)) != NULL) {
+
+	if (perf_evlist__mmap_read_init(kvm->evlist, idx, &read, false)) {
+		pr_err("Can't get mmap information\n");
+		return -1;
+	}
+
+	while ((event = perf_mmap__read_event(&read)) != NULL) {
 		err = perf_evlist__parse_sample(kvm->evlist, event, &sample);
 		if (err) {
 			perf_evlist__mmap_consume(kvm->evlist, idx);
@@ -774,6 +781,7 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx,
 		if (n == PERF_KVM__MAX_EVENTS_PER_MMAP)
 			break;
 	}
+	perf_mmap__read_done(&read);
 
 	return n;
 }
-- 
2.5.5


* [PATCH 08/10] perf python: apply new mmap_read interfaces
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
                   ` (6 preceding siblings ...)
  2017-10-10 17:20 ` [PATCH 07/10] perf kvm: " kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 17:20 ` [PATCH 09/10] perf tests: " kan.liang
  2017-10-10 17:20 ` [PATCH 10/10] perf tool: remove stale " kan.liang
  9 siblings, 0 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

No functional change.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/python.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index c129e99..cfbaa86 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -948,6 +948,7 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
 {
 	struct perf_evlist *evlist = &pevlist->evlist;
 	union perf_event *event;
+	struct perf_mmap_read read;
 	int sample_id_all = 1, cpu;
 	static char *kwlist[] = { "cpu", "sample_id_all", NULL };
 	int err;
@@ -956,7 +957,11 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
 					 &cpu, &sample_id_all))
 		return NULL;
 
-	event = perf_evlist__mmap_read(evlist, cpu);
+	if (perf_evlist__mmap_read_init(evlist, cpu, &read, false))
+		return PyErr_Format(PyExc_OSError,
+				    "Can't get mmap information\n");
+
+	event = perf_mmap__read_event(&read);
 	if (event != NULL) {
 		PyObject *pyevent = pyrf_event__new(event);
 		struct pyrf_event *pevent = (struct pyrf_event *)pyevent;
@@ -975,7 +980,7 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
 
 		/* Consume the even only after we parsed it out. */
 		perf_evlist__mmap_consume(evlist, cpu);
-
+		perf_mmap__read_done(&read);
 		if (err)
 			return PyErr_Format(PyExc_OSError,
 					    "perf: can't parse sample, err=%d", err);
-- 
2.5.5


* [PATCH 09/10] perf tests: apply new mmap_read interfaces
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
                   ` (7 preceding siblings ...)
  2017-10-10 17:20 ` [PATCH 08/10] perf python: " kan.liang
@ 2017-10-10 17:20 ` kan.liang
  2017-10-10 17:20 ` [PATCH 10/10] perf tool: remove stale " kan.liang
  9 siblings, 0 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

No functional change.

backward-ring-buffer is the test case for backward mode; the rest of
the test cases use forward mode.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/arch/x86/tests/perf-time-to-tsc.c |  9 ++++++++-
 tools/perf/tests/backward-ring-buffer.c      | 10 ++++++++--
 tools/perf/tests/bpf.c                       | 10 +++++++++-
 tools/perf/tests/code-reading.c              |  8 +++++++-
 tools/perf/tests/keep-tracking.c             |  8 +++++++-
 tools/perf/tests/mmap-basic.c                |  9 ++++++++-
 tools/perf/tests/openat-syscall-tp-fields.c  |  9 ++++++++-
 tools/perf/tests/perf-record.c               |  9 ++++++++-
 tools/perf/tests/sw-clock.c                  |  8 +++++++-
 tools/perf/tests/switch-tracking.c           |  8 +++++++-
 tools/perf/tests/task-exit.c                 |  8 +++++++-
 11 files changed, 84 insertions(+), 12 deletions(-)

diff --git a/tools/perf/arch/x86/tests/perf-time-to-tsc.c b/tools/perf/arch/x86/tests/perf-time-to-tsc.c
index 5dd7efb..e1cdd23 100644
--- a/tools/perf/arch/x86/tests/perf-time-to-tsc.c
+++ b/tools/perf/arch/x86/tests/perf-time-to-tsc.c
@@ -57,6 +57,7 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
 	struct perf_tsc_conversion tc;
 	struct perf_event_mmap_page *pc;
 	union perf_event *event;
+	struct perf_mmap_read read;
 	u64 test_tsc, comm1_tsc, comm2_tsc;
 	u64 test_time, comm1_time = 0, comm2_time = 0;
 
@@ -108,7 +109,12 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
 	perf_evlist__disable(evlist);
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
-		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+		if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+			pr_err("Can't get mmap information\n");
+			continue;
+		}
+
+		while ((event = perf_mmap__read_event(&read)) != NULL) {
 			struct perf_sample sample;
 
 			if (event->header.type != PERF_RECORD_COMM ||
@@ -129,6 +135,7 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe
 next_event:
 			perf_evlist__mmap_consume(evlist, i);
 		}
+		perf_mmap__read_done(&read);
 	}
 
 	if (!comm1_time || !comm2_time)
diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index d233ad3..4841b7e 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -31,9 +31,14 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
 		union perf_event *event;
+		struct perf_mmap_read read;
 
-		perf_mmap__read_catchup(&evlist->backward_mmap[i]);
-		while ((event = perf_mmap__read_backward(&evlist->backward_mmap[i])) != NULL) {
+		if (perf_evlist__mmap_read_init(evlist, i, &read, true)) {
+			pr_err("Can't get mmap information\n");
+			continue;
+		}
+
+		while ((event = perf_mmap__read_event(&read)) != NULL) {
 			const u32 type = event->header.type;
 
 			switch (type) {
@@ -48,6 +53,7 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
 				return TEST_FAIL;
 			}
 		}
+		perf_mmap__read_done(&read);
 	}
 	return TEST_OK;
 }
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
index 34c22cd..0ab7f8f 100644
--- a/tools/perf/tests/bpf.c
+++ b/tools/perf/tests/bpf.c
@@ -180,13 +180,21 @@ static int do_test(struct bpf_object *obj, int (*func)(void),
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
 		union perf_event *event;
+		struct perf_mmap_read read;
 
-		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+		if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+			pr_err("Can't get mmap information\n");
+			continue;
+		}
+
+		while ((event = perf_mmap__read_event(&read)) != NULL) {
 			const u32 type = event->header.type;
 
 			if (type == PERF_RECORD_SAMPLE)
 				count ++;
+			perf_evlist__mmap_consume(evlist, i);
 		}
+		perf_mmap__read_done(&read);
 	}
 
 	if (count != expect) {
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 466a462..69cf7fc 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -408,15 +408,21 @@ static int process_events(struct machine *machine, struct perf_evlist *evlist,
 			  struct state *state)
 {
 	union perf_event *event;
+	struct perf_mmap_read read;
 	int i, ret;
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
-		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+		if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+			pr_err("Can't get mmap information\n");
+			continue;
+		}
+		while ((event = perf_mmap__read_event(&read)) != NULL) {
 			ret = process_event(machine, evlist, event, state);
 			perf_evlist__mmap_consume(evlist, i);
 			if (ret < 0)
 				return ret;
 		}
+		perf_mmap__read_done(&read);
 	}
 	return 0;
 }
diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c
index 7394286..b2beab0 100644
--- a/tools/perf/tests/keep-tracking.c
+++ b/tools/perf/tests/keep-tracking.c
@@ -26,11 +26,16 @@
 static int find_comm(struct perf_evlist *evlist, const char *comm)
 {
 	union perf_event *event;
+	struct perf_mmap_read read;
 	int i, found;
 
 	found = 0;
 	for (i = 0; i < evlist->nr_mmaps; i++) {
-		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+		if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+			pr_err("Can't get mmap information\n");
+			continue;
+		}
+		while ((event = perf_mmap__read_event(&read)) != NULL) {
 			if (event->header.type == PERF_RECORD_COMM &&
 			    (pid_t)event->comm.pid == getpid() &&
 			    (pid_t)event->comm.tid == getpid() &&
@@ -38,6 +43,7 @@ static int find_comm(struct perf_evlist *evlist, const char *comm)
 				found += 1;
 			perf_evlist__mmap_consume(evlist, i);
 		}
+		perf_mmap__read_done(&read);
 	}
 	return found;
 }
diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
index bc8a70e..529c0c7 100644
--- a/tools/perf/tests/mmap-basic.c
+++ b/tools/perf/tests/mmap-basic.c
@@ -37,6 +37,7 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse
 		     expected_nr_events[nsyscalls], i, j;
 	struct perf_evsel *evsels[nsyscalls], *evsel;
 	char sbuf[STRERR_BUFSIZE];
+	struct perf_mmap_read read;
 
 	threads = thread_map__new(-1, getpid(), UINT_MAX);
 	if (threads == NULL) {
@@ -105,7 +106,12 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse
 			++foo;
 		}
 
-	while ((event = perf_evlist__mmap_read(evlist, 0)) != NULL) {
+	if (perf_evlist__mmap_read_init(evlist, 0, &read, false)) {
+		pr_err("Can't get mmap information\n");
+		goto out_delete_evlist;
+	}
+
+	while ((event = perf_mmap__read_event(&read)) != NULL) {
 		struct perf_sample sample;
 
 		if (event->header.type != PERF_RECORD_SAMPLE) {
@@ -130,6 +136,7 @@ int test__basic_mmap(struct test *test __maybe_unused, int subtest __maybe_unuse
 		nr_events[evsel->idx]++;
 		perf_evlist__mmap_consume(evlist, 0);
 	}
+	perf_mmap__read_done(&read);
 
 	err = 0;
 	evlist__for_each_entry(evlist, evsel) {
diff --git a/tools/perf/tests/openat-syscall-tp-fields.c b/tools/perf/tests/openat-syscall-tp-fields.c
index b6ee1c4..8431561 100644
--- a/tools/perf/tests/openat-syscall-tp-fields.c
+++ b/tools/perf/tests/openat-syscall-tp-fields.c
@@ -82,8 +82,14 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest
 
 		for (i = 0; i < evlist->nr_mmaps; i++) {
 			union perf_event *event;
+			struct perf_mmap_read read;
 
-			while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+			if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+				pr_err("Can't get mmap information\n");
+				continue;
+			}
+
+			while ((event = perf_mmap__read_event(&read)) != NULL) {
 				const u32 type = event->header.type;
 				int tp_flags;
 				struct perf_sample sample;
@@ -111,6 +117,7 @@ int test__syscall_openat_tp_fields(struct test *test __maybe_unused, int subtest
 
 				goto out_ok;
 			}
+			perf_mmap__read_done(&read);
 		}
 
 		if (nr_events == before)
diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c
index 19b6500..c5e1f08 100644
--- a/tools/perf/tests/perf-record.c
+++ b/tools/perf/tests/perf-record.c
@@ -163,8 +163,14 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
 
 		for (i = 0; i < evlist->nr_mmaps; i++) {
 			union perf_event *event;
+			struct perf_mmap_read read;
 
-			while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+			if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+				pr_err("Can't get mmap information\n");
+				continue;
+			}
+
+			while ((event = perf_mmap__read_event(&read)) != NULL) {
 				const u32 type = event->header.type;
 				const char *name = perf_event__name(type);
 
@@ -267,6 +273,7 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
 
 				perf_evlist__mmap_consume(evlist, i);
 			}
+			perf_mmap__read_done(&read);
 		}
 
 		/*
diff --git a/tools/perf/tests/sw-clock.c b/tools/perf/tests/sw-clock.c
index d88511f..a5dd3e8 100644
--- a/tools/perf/tests/sw-clock.c
+++ b/tools/perf/tests/sw-clock.c
@@ -38,6 +38,7 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 	};
 	struct cpu_map *cpus;
 	struct thread_map *threads;
+	struct perf_mmap_read read;
 
 	attr.sample_freq = 500;
 
@@ -92,7 +93,11 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 
 	perf_evlist__disable(evlist);
 
-	while ((event = perf_evlist__mmap_read(evlist, 0)) != NULL) {
+	if (perf_evlist__mmap_read_init(evlist, 0, &read, false)) {
+		pr_err("Can't get mmap information\n");
+		goto out_delete_evlist;
+	}
+	while ((event = perf_mmap__read_event(&read)) != NULL) {
 		struct perf_sample sample;
 
 		if (event->header.type != PERF_RECORD_SAMPLE)
@@ -109,6 +114,7 @@ static int __test__sw_clock_freq(enum perf_sw_ids clock_id)
 next_event:
 		perf_evlist__mmap_consume(evlist, 0);
 	}
+	perf_mmap__read_done(&read);
 
 	if ((u64) nr_samples == total_periods) {
 		pr_debug("All (%d) samples have period value of 1!\n",
diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c
index 2acd785..3b4f2f6 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -257,16 +257,22 @@ static int process_events(struct perf_evlist *evlist,
 	unsigned pos, cnt = 0;
 	LIST_HEAD(events);
 	struct event_node *events_array, *node;
+	struct perf_mmap_read read;
 	int i, ret;
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
-		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+		if (perf_evlist__mmap_read_init(evlist, i, &read, false)) {
+			pr_err("Can't get mmap information\n");
+			continue;
+		}
+		while ((event = perf_mmap__read_event(&read)) != NULL) {
 			cnt += 1;
 			ret = add_event(evlist, &events, event);
 			perf_evlist__mmap_consume(evlist, i);
 			if (ret < 0)
 				goto out_free_nodes;
 		}
+		perf_mmap__read_done(&read);
 	}
 
 	events_array = calloc(cnt, sizeof(struct event_node));
diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c
index f0881d0..2529fba 100644
--- a/tools/perf/tests/task-exit.c
+++ b/tools/perf/tests/task-exit.c
@@ -46,6 +46,7 @@ int test__task_exit(struct test *test __maybe_unused, int subtest __maybe_unused
 	char sbuf[STRERR_BUFSIZE];
 	struct cpu_map *cpus;
 	struct thread_map *threads;
+	struct perf_mmap_read read;
 
 	signal(SIGCHLD, sig_handler);
 
@@ -105,12 +106,17 @@ int test__task_exit(struct test *test __maybe_unused, int subtest __maybe_unused
 	perf_evlist__start_workload(evlist);
 
 retry:
-	while ((event = perf_evlist__mmap_read(evlist, 0)) != NULL) {
+	if (perf_evlist__mmap_read_init(evlist, 0, &read, false)) {
+		pr_err("Can't get mmap information\n");
+		goto out_delete_evlist;
+	}
+	while ((event = perf_mmap__read_event(&read)) != NULL) {
 		if (event->header.type == PERF_RECORD_EXIT)
 			nr_exit++;
 
 		perf_evlist__mmap_consume(evlist, 0);
 	}
+	perf_mmap__read_done(&read);
 
 	if (!exited || !nr_exit) {
 		perf_evlist__poll(evlist, -1);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 10/10] perf tool: remove stale mmap_read interfaces
  2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
                   ` (8 preceding siblings ...)
  2017-10-10 17:20 ` [PATCH 09/10] perf tests: " kan.liang
@ 2017-10-10 17:20 ` kan.liang
  9 siblings, 0 replies; 41+ messages in thread
From: kan.liang @ 2017-10-10 17:20 UTC (permalink / raw)
  To: acme, peterz, mingo, linux-kernel
  Cc: jolsa, wangnan0, hekuang, namhyung, alexander.shishkin,
	adrian.hunter, ak, Kan Liang

From: Kan Liang <kan.liang@intel.com>

No one uses these interfaces anymore.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 tools/perf/util/evlist.c | 101 +----------------------------------------------
 tools/perf/util/evlist.h |  12 ------
 2 files changed, 1 insertion(+), 112 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f7fefdb..d8cf70b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -947,105 +947,6 @@ perf_mmap__read_done(struct perf_mmap_read *read)
 	read->md->prev = read->head;
 }
 
-union perf_event *perf_mmap__read_forward(struct perf_mmap *md, bool check_messup)
-{
-	u64 head;
-	u64 old = md->prev;
-
-	/*
-	 * Check if event was unmapped due to a POLLHUP/POLLERR.
-	 */
-	if (!refcount_read(&md->refcnt))
-		return NULL;
-
-	head = perf_mmap__read_head(md);
-
-	return perf_mmap__read(md, check_messup, old, head, &md->prev);
-}
-
-union perf_event *
-perf_mmap__read_backward(struct perf_mmap *md)
-{
-	u64 head, end;
-	u64 start = md->prev;
-
-	/*
-	 * Check if event was unmapped due to a POLLHUP/POLLERR.
-	 */
-	if (!refcount_read(&md->refcnt))
-		return NULL;
-
-	head = perf_mmap__read_head(md);
-	if (!head)
-		return NULL;
-
-	/*
-	 * 'head' pointer starts from 0. Kernel minus sizeof(record) form
-	 * it each time when kernel writes to it, so in fact 'head' is
-	 * negative. 'end' pointer is made manually by adding the size of
-	 * the ring buffer to 'head' pointer, means the validate data can
-	 * read is the whole ring buffer. If 'end' is positive, the ring
-	 * buffer has not fully filled, so we must adjust 'end' to 0.
-	 *
-	 * However, since both 'head' and 'end' is unsigned, we can't
-	 * simply compare 'end' against 0. Here we compare '-head' and
-	 * the size of the ring buffer, where -head is the number of bytes
-	 * kernel write to the ring buffer.
-	 */
-	if (-head < (u64)(md->mask + 1))
-		end = 0;
-	else
-		end = head + md->mask + 1;
-
-	return perf_mmap__read(md, false, start, end, &md->prev);
-}
-
-union perf_event *perf_evlist__mmap_read_forward(struct perf_evlist *evlist, int idx)
-{
-	struct perf_mmap *md = &evlist->mmap[idx];
-
-	/*
-	 * Check messup is required for forward overwritable ring buffer:
-	 * memory pointed by md->prev can be overwritten in this case.
-	 * No need for read-write ring buffer: kernel stop outputting when
-	 * it hit md->prev (perf_mmap__consume()).
-	 */
-	return perf_mmap__read_forward(md, evlist->overwrite);
-}
-
-union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
-{
-	struct perf_mmap *md = &evlist->mmap[idx];
-
-	/*
-	 * No need to check messup for backward ring buffer:
-	 * We can always read arbitrary long data from a backward
-	 * ring buffer unless we forget to pause it before reading.
-	 */
-	return perf_mmap__read_backward(md);
-}
-
-union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
-{
-	return perf_evlist__mmap_read_forward(evlist, idx);
-}
-
-void perf_mmap__read_catchup(struct perf_mmap *md)
-{
-	u64 head;
-
-	if (!refcount_read(&md->refcnt))
-		return;
-
-	head = perf_mmap__read_head(md);
-	md->prev = head;
-}
-
-void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx)
-{
-	perf_mmap__read_catchup(&evlist->mmap[idx]);
-}
-
 static bool perf_mmap__empty(struct perf_mmap *md)
 {
 	return perf_mmap__read_head(md) == md->prev && !md->auxtrace_mmap.base;
@@ -1464,7 +1365,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  * @auxtrace_overwrite - overwrite older auxtrace data?
  *
  * If @overwrite is %false the user needs to signal event consumption using
- * perf_mmap__write_tail().  Using perf_evlist__mmap_read() does this
+ * perf_mmap__write_tail().  Using perf_mmap__read_event() does this
  * automatically.
  *
  * Similarly, if @auxtrace_overwrite is %false the user needs to signal data
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index f292936..8ae4496 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -187,20 +187,8 @@ struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
 
 void perf_evlist__toggle_bkw_mmap(struct perf_evlist *evlist, enum bkw_mmap_state state);
 
-union perf_event *perf_mmap__read_forward(struct perf_mmap *map, bool check_messup);
-union perf_event *perf_mmap__read_backward(struct perf_mmap *map);
-
-void perf_mmap__read_catchup(struct perf_mmap *md);
 void perf_mmap__consume(struct perf_mmap *md, bool overwrite);
 
-union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
-
-union perf_event *perf_evlist__mmap_read_forward(struct perf_evlist *evlist,
-						 int idx);
-union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist,
-						  int idx);
-void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
-
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
 
 int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 17:20 ` [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode kan.liang
@ 2017-10-10 18:14   ` Arnaldo Carvalho de Melo
  2017-10-10 18:18     ` Liang, Kan
  2017-10-13 12:55     ` Liang, Kan
  2017-10-10 18:23   ` Wangnan (F)
  1 sibling, 2 replies; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 18:14 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, adrian.hunter, ak

Em Tue, Oct 10, 2017 at 10:20:15AM -0700, kan.liang@intel.com escreveu:
> From: Kan Liang <kan.liang@intel.com>
> 
> Perf record can switch output. The new output should only store the data
> after switching. However, in overwrite backward mode, the new output
> still has the data from the old output.
> 
> At the end of mmap_read, the position of the processed ring buffer is saved
> in md->prev. The next mmap_read should end at md->prev.
> However, md->prev is discarded, so the next mmap_read has to process the
> whole valid ring buffer, which definitely includes the old processed
> data.
> 
> Set prev as the end of the range in backward mode.

Do you think this should go together with the rest of this patchkit?
Probably it should be processed ASAP, i.e. perf/urgent, no?

- Arnaldo
 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  tools/perf/util/evlist.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 33b8837..7d23cf5 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -742,13 +742,25 @@ static int
>  rb_find_range(void *data, int mask, u64 head, u64 old,
>  	      u64 *start, u64 *end, bool backward)
>  {
> +	int ret;
> +
>  	if (!backward) {
>  		*start = old;
>  		*end = head;
>  		return 0;
>  	}
>  
> -	return backward_rb_find_range(data, mask, head, start, end);
> +	ret = backward_rb_find_range(data, mask, head, start, end);
> +
> +	/*
> +	 * The start and end from backward_rb_find_range is the range for all
> +	 * valid data in ring buffer.
> +	 * However, part of the data is processed previously.
> +	 * Reset the end to drop the processed data
> +	 */
> +	*end = old;
> +
> +	return ret;
>  }
>  
>  /*
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 17:20 ` [PATCH 03/10] perf tool: new interfaces to read event from ring buffer kan.liang
@ 2017-10-10 18:15   ` Arnaldo Carvalho de Melo
  2017-10-10 18:28     ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 18:15 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, adrian.hunter, ak

Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
> From: Kan Liang <kan.liang@intel.com>
> 
> The perf_evlist__mmap_read only supports forward mode. A common
> function is needed to support both forward and backward mode.

> The perf_evlist__mmap_read_backward is buggy.

So, what is the bug? You state that it is buggy but don't spell out the
bug; please do so.

If it fixes an existing bug, then it should go separate from this
patchkit, right?

- Arnaldo
 
> Introduce a new mmap read interface to support both forward and backward
> mode. It can read events one by one from a specified range of the ring
> buffer.
> 
> The standard process to mmap__read events would be as below:
>  perf_mmap__read_init
>  while (event = perf_mmap__read_event) {
>    //process the event
>    perf_mmap__consume
>  }
>  perf_mmap__read_done
> 
> The following patches will use it to replace the old interfaces.
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  tools/perf/util/evlist.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/evlist.h |  2 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 7d23cf5..b36211e 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -897,6 +897,47 @@ perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
>  	return event;
>  }
>  
> +/*
> + * Read the first event in the specified range of the ring buffer.
> + * Used by most of the perf tools and tests
> + */
> +union perf_event *
> +perf_mmap__read_event(struct perf_mmap_read *read)
> +{
> +	struct perf_mmap *md = read->md;
> +	union perf_event *event;
> +
> +	/*
> +	 * Check if event was unmapped due to a POLLHUP/POLLERR.
> +	 */
> +	if (!refcount_read(&md->refcnt))
> +		return NULL;
> +
> +	/* In backward, the ringbuffer is already paused. */
> +	if (!read->backward) {
> +		read->end = perf_mmap__read_head(md);
> +		if (!read->end)
> +			return NULL;
> +	}
> +
> +	event = perf_mmap__read(md, !read->backward && read->overwrite,
> +				read->start, read->end, &md->prev);
> +	read->start = md->prev;
> +	return event;
> +}
> +
> +/*
> + * Mandatory for backward
> + * The direction of backward mmap__read_event is from head to tail.
> + * The last mmap__read_event will set tail to md->prev.
> + * Need to correct the md->prev.
> + */
> +void
> +perf_mmap__read_done(struct perf_mmap_read *read)
> +{
> +	read->md->prev = read->head;
> +}
> +
>  union perf_event *perf_mmap__read_forward(struct perf_mmap *md, bool check_messup)
>  {
>  	u64 head;
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 1ce4857..53baf26 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -207,6 +207,8 @@ int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
>  			 bool overwrite, bool backward);
>  int perf_mmap__read_to_file(struct perf_mmap_read *read,
>  			    struct perf_data_file *file);
> +union perf_event *perf_mmap__read_event(struct perf_mmap_read *read);
> +void perf_mmap__read_done(struct perf_mmap_read *read);
>  
>  int perf_evlist__open(struct perf_evlist *evlist);
>  void perf_evlist__close(struct perf_evlist *evlist);
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 18:14   ` Arnaldo Carvalho de Melo
@ 2017-10-10 18:18     ` Liang, Kan
  2017-10-13 12:55     ` Liang, Kan
  1 sibling, 0 replies; 41+ messages in thread
From: Liang, Kan @ 2017-10-10 18:18 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

> Em Tue, Oct 10, 2017 at 10:20:15AM -0700, kan.liang@intel.com escreveu:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > Perf record can switch output. The new output should only store the
> > data after switching. However, in overwrite backward mode, the new
> > output still has the data from the old output.
> >
> > At the end of mmap_read, the position of the processed ring buffer is
> > saved in md->prev. The next mmap_read should end at md->prev.
> > However, md->prev is discarded, so the next mmap_read has to process
> > the whole valid ring buffer, which definitely includes the old
> > processed data.
> >
> > Set prev as the end of the range in backward mode.
> 
> Do you think this should go together with the rest of this patchkit?
> Probably it should be processed ASAP, i.e. perf/urgent, no?

Sure, I think the fix can be merged separately. 

Thanks,
Kan

> 
> - Arnaldo
> 
> > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > ---
> >  tools/perf/util/evlist.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index
> > 33b8837..7d23cf5 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -742,13 +742,25 @@ static int
> >  rb_find_range(void *data, int mask, u64 head, u64 old,
> >  	      u64 *start, u64 *end, bool backward)  {
> > +	int ret;
> > +
> >  	if (!backward) {
> >  		*start = old;
> >  		*end = head;
> >  		return 0;
> >  	}
> >
> > -	return backward_rb_find_range(data, mask, head, start, end);
> > +	ret = backward_rb_find_range(data, mask, head, start, end);
> > +
> > +	/*
> > +	 * The start and end from backward_rb_find_range is the range for
> all
> > +	 * valid data in ring buffer.
> > +	 * However, part of the data is processed previously.
> > +	 * Reset the end to drop the processed data
> > +	 */
> > +	*end = old;
> > +
> > +	return ret;
> >  }
> >
> >  /*
> > --
> > 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 17:20 ` [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode kan.liang
  2017-10-10 18:14   ` Arnaldo Carvalho de Melo
@ 2017-10-10 18:23   ` Wangnan (F)
  2017-10-10 18:50     ` Wangnan (F)
  2017-10-10 19:36     ` Liang, Kan
  1 sibling, 2 replies; 41+ messages in thread
From: Wangnan (F) @ 2017-10-10 18:23 UTC (permalink / raw)
  To: kan.liang, acme, peterz, mingo, linux-kernel
  Cc: jolsa, hekuang, namhyung, alexander.shishkin, adrian.hunter, ak



On 2017/10/11 1:20, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> Perf record can switch output. The new output should only store the data
> after switching. However, in overwrite backward mode, the new output
> still has the data from the old output.
>
> At the end of mmap_read, the position of the processed ring buffer is saved
> in md->prev. The next mmap_read should end at md->prev.
> However, md->prev is discarded, so the next mmap_read has to process the
> whole valid ring buffer, which definitely includes the old processed
> data.
>
> Set prev as the end of the range in backward mode.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>   tools/perf/util/evlist.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 33b8837..7d23cf5 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -742,13 +742,25 @@ static int
>   rb_find_range(void *data, int mask, u64 head, u64 old,
>   	      u64 *start, u64 *end, bool backward)
>   {
> +	int ret;
> +
>   	if (!backward) {
>   		*start = old;
>   		*end = head;
>   		return 0;
>   	}
>   
> -	return backward_rb_find_range(data, mask, head, start, end);
> +	ret = backward_rb_find_range(data, mask, head, start, end);
> +
> +	/*
> +	 * The start and end from backward_rb_find_range is the range for all
> +	 * valid data in ring buffer.
> +	 * However, part of the data is processed previously.
> +	 * Reset the end to drop the processed data
> +	 */
> +	*end = old;
> +

I don't understand this patch. What rb_find_range() wants to do is find
the 'start' and 'end' pointers, so that record__mmap_read() knows where
to start reading and where to stop.

In backward mode the 'start' pointer is clearly 'head' (because 'head'
is the header of the last record the kernel wrote to the ring buffer),
but the 'end' pointer is not easy to find ('end' is the tail of the
last, i.e. the earliest, available record in the ring buffer). We have
to parse the whole ring buffer to find that record and set its tail as
'end'; this is what backward_rb_find_range() tries hard to do. However,
your patch unconditionally overwrites *end (which was just set by
backward_rb_find_range()), making backward_rb_find_range() meaningless.

In my mind, when you decide to use a backward ring buffer you must make
it overwrite, because you want the kernel to silently overwrite old
records without waking up perf, while still being able to parse the
ring buffer even if overwriting happens. The use case should be: perf
runs in the background, the kernel silently overwrites old records in
the ring buffer until something bad happens, then perf dumps the events
before the bad thing, similar to a flight recorder. In this use case
each dump is independent. Even if a record appeared in a previous dump,
it is harmless (and useful) for it to appear again in a following dump.

If you really want to avoid record duplication, you need to change
record__mmap_read()'s logic. It currently complains "failed to keep up
with mmap data" and avoids dumping data when the size of newly
generated data is larger than the size of the ring buffer. That is
reasonable for a forward ring buffer, because in that case you have
lost the head of the first record and the whole ring buffer is
unparseable. However, it is wrong in the backward case; what you should
do there is dump the whole ring buffer.

> +	return ret;
>   }
>   
>   /*

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/10] perf record: new interfaces to read ring buffer to file
  2017-10-10 17:20 ` [PATCH 01/10] perf record: new interfaces to read ring buffer to file kan.liang
@ 2017-10-10 18:24   ` Arnaldo Carvalho de Melo
  2017-10-10 18:30   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 18:24 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, adrian.hunter, ak

Em Tue, Oct 10, 2017 at 10:20:14AM -0700, kan.liang@intel.com escreveu:
> From: Kan Liang <kan.liang@intel.com>
> 
> Factor out the mmap_read-related code into separate functions and move
> them to evlist.c.
> struct perf_mmap_read is introduced to store information about a
> specific range of the ring buffer.
> 
> The way to read a specific range of the ring buffer into a file is as below:
>   perf_mmap__read_init
>   perf_mmap__read_to_file
>   perf_mmap__consume
> 
> There is no functional change.

So, I started doing some changes in this direction as well, thinking
that 'struct perf_mmap' is the right place to dump the ring buffer
into a file, at some point in a thread per 'struct perf_mmap'
instance (which for 'perf record -a' will be one per cpu).

But my intention was to do the very simple case first, i.e.:

	perf record -a

Without backward ring buffer, snapshotting, etc, that would come later,
step by step.

I.e. when some of these features are requested, then we use the current
code, generating a simple perf.data file, etc.

In the simple case, we then would use one thread per 'struct perf_mmap'
and dump to separate files.

When 'perf report' or other tools such as 'perf script' noticed that
new format, with a directory per session and one file per CPU, they
would consider those to be the 'struct perf_mmap' 'frozen state' and
would read from them just like they would from real mmapped memory,
again as some more 'struct perf_mmap' instances, etc.

Please take a look at my perf/core branch, it has just the beginning of
this, perhaps we can collaborate on getting what you did on top of it,
or help me realize why this is not the way to go :-)

But I think that before getting to the more complicated bits (backward,
snapshotting, etc), 'perf top' should use threads to consume the ring
buffer contents straight away instead of dumping them to files for
later processing, creating hist_entry instances in parallel and using
the refcounts and locks already in place for threads, maps and
hist_entry, and adding some more along the way to make it possible to
process samples in parallel and update the struct
machine/thread/map/symbol/dso instances.

Meanwhile I'll continue reading your patches, thanks for working on it!

- Arnaldo
 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  tools/perf/builtin-record.c | 111 ++++++++------------------------------------
>  tools/perf/util/evlist.c    | 111 ++++++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/evlist.h    |  15 ++++++
>  3 files changed, 146 insertions(+), 91 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 234fdf4..df555e9 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -105,6 +105,17 @@ static bool switch_output_time(struct record *rec)
>  	       trigger_is_ready(&switch_output_trigger);
>  }
>  
> +static void update_output_size(struct record *rec, size_t size)
> +{
> +	if (size == 0)
> +		return;
> +
> +	rec->bytes_written += size;
> +
> +	if (switch_output_size(rec))
> +		trigger_hit(&switch_output_trigger);
> +}
> +
>  static int record__write(struct record *rec, void *bf, size_t size)
>  {
>  	if (perf_data_file__write(rec->session->file, bf, size) < 0) {
> @@ -112,10 +123,7 @@ static int record__write(struct record *rec, void *bf, size_t size)
>  		return -1;
>  	}
>  
> -	rec->bytes_written += size;
> -
> -	if (switch_output_size(rec))
> -		trigger_hit(&switch_output_trigger);
> +	update_output_size(rec, size);
>  
>  	return 0;
>  }
> @@ -130,106 +138,27 @@ static int process_synthesized_event(struct perf_tool *tool,
>  }
>  
>  static int
> -backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
> -{
> -	struct perf_event_header *pheader;
> -	u64 evt_head = head;
> -	int size = mask + 1;
> -
> -	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
> -	pheader = (struct perf_event_header *)(buf + (head & mask));
> -	*start = head;
> -	while (true) {
> -		if (evt_head - head >= (unsigned int)size) {
> -			pr_debug("Finished reading backward ring buffer: rewind\n");
> -			if (evt_head - head > (unsigned int)size)
> -				evt_head -= pheader->size;
> -			*end = evt_head;
> -			return 0;
> -		}
> -
> -		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
> -
> -		if (pheader->size == 0) {
> -			pr_debug("Finished reading backward ring buffer: get start\n");
> -			*end = evt_head;
> -			return 0;
> -		}
> -
> -		evt_head += pheader->size;
> -		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
> -	}
> -	WARN_ONCE(1, "Shouldn't get here\n");
> -	return -1;
> -}
> -
> -static int
> -rb_find_range(void *data, int mask, u64 head, u64 old,
> -	      u64 *start, u64 *end, bool backward)
> -{
> -	if (!backward) {
> -		*start = old;
> -		*end = head;
> -		return 0;
> -	}
> -
> -	return backward_rb_find_range(data, mask, head, start, end);
> -}
> -
> -static int
>  record__mmap_read(struct record *rec, struct perf_mmap *md,
>  		  bool overwrite, bool backward)
>  {
> -	u64 head = perf_mmap__read_head(md);
> -	u64 old = md->prev;
> -	u64 end = head, start = old;
> -	unsigned char *data = md->base + page_size;
> -	unsigned long size;
> -	void *buf;
> -	int rc = 0;
> +	struct perf_mmap_read read;
>  
> -	if (rb_find_range(data, md->mask, head,
> -			  old, &start, &end, backward))
> +	if (perf_mmap__read_init(md, &read, overwrite, backward))
>  		return -1;
>  
> -	if (start == end)
> +	if (read.start == read.end)
>  		return 0;
>  
>  	rec->samples++;
>  
> -	size = end - start;
> -	if (size > (unsigned long)(md->mask) + 1) {
> -		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> -
> -		md->prev = head;
> -		perf_mmap__consume(md, overwrite || backward);
> -		return 0;
> -	}
> -
> -	if ((start & md->mask) + size != (end & md->mask)) {
> -		buf = &data[start & md->mask];
> -		size = md->mask + 1 - (start & md->mask);
> -		start += size;
> -
> -		if (record__write(rec, buf, size) < 0) {
> -			rc = -1;
> -			goto out;
> -		}
> -	}
> +	if (perf_mmap__read_to_file(&read, rec->session->file) < 0)
> +		return -1;
>  
> -	buf = &data[start & md->mask];
> -	size = end - start;
> -	start += size;
> +	update_output_size(rec, read.size);
>  
> -	if (record__write(rec, buf, size) < 0) {
> -		rc = -1;
> -		goto out;
> -	}
> -
> -	md->prev = head;
>  	perf_mmap__consume(md, overwrite || backward);
> -out:
> -	return rc;
> +
> +	return 0;
>  }
>  
>  static volatile int done;
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 6a0d7ff..33b8837 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -704,6 +704,117 @@ static int perf_evlist__resume(struct perf_evlist *evlist)
>  	return perf_evlist__set_paused(evlist, false);
>  }
>  
> +static int
> +backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
> +{
> +	struct perf_event_header *pheader;
> +	u64 evt_head = head;
> +	int size = mask + 1;
> +
> +	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
> +	pheader = (struct perf_event_header *)(buf + (head & mask));
> +	*start = head;
> +	while (true) {
> +		if (evt_head - head >= (unsigned int)size) {
> +			pr_debug("Finished reading backward ring buffer: rewind\n");
> +			if (evt_head - head > (unsigned int)size)
> +				evt_head -= pheader->size;
> +			*end = evt_head;
> +			return 0;
> +		}
> +
> +		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
> +
> +		if (pheader->size == 0) {
> +			pr_debug("Finished reading backward ring buffer: get start\n");
> +			*end = evt_head;
> +			return 0;
> +		}
> +
> +		evt_head += pheader->size;
> +		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
> +	}
> +	WARN_ONCE(1, "Shouldn't get here\n");
> +	return -1;
> +}
> +
> +static int
> +rb_find_range(void *data, int mask, u64 head, u64 old,
> +	      u64 *start, u64 *end, bool backward)
> +{
> +	if (!backward) {
> +		*start = old;
> +		*end = head;
> +		return 0;
> +	}
> +
> +	return backward_rb_find_range(data, mask, head, start, end);
> +}
> +
> +/*
> + * Initialize struct perf_mmap_read
> + * Calculate the range of ring buffer to read
> + */
> +int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
> +			 bool overwrite, bool backward)
> +{
> +	unsigned char *data = md->base + page_size;
> +
> +	read->md = md;
> +	read->head = perf_mmap__read_head(md);
> +	read->overwrite = overwrite;
> +	read->backward = backward;
> +	read->size = 0;
> +	return rb_find_range(data, md->mask, read->head, md->prev,
> +			     &read->start, &read->end, backward);
> +}
> +
> +/*
> + * Read the ring buffer in the range specified in struct perf_mmap_read,
> + * and write to file.
> + * Used by perf record.
> + */
> +int perf_mmap__read_to_file(struct perf_mmap_read *read,
> +			    struct perf_data_file *file)
> +{
> +	struct perf_mmap *md = read->md;
> +	unsigned char *data = md->base + page_size;
> +	unsigned long size;
> +	void *buf;
> +
> +	size = read->end - read->start;
> +	if (size > (unsigned long)(md->mask) + 1) {
> +		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> +		read->size = 0;
> +		goto out;
> +	}
> +
> +	if ((read->start & md->mask) + size != (read->end & md->mask)) {
> +		buf = &data[read->start & md->mask];
> +		size = md->mask + 1 - (read->start & md->mask);
> +		read->start += size;
> +		read->size += size;
> +
> +		if (perf_data_file__write(file, buf, size) < 0) {
> +			pr_err("failed to write %s, error: %m\n", file->path);
> +			return -1;
> +		}
> +	}
> +
> +	buf = &data[read->start & md->mask];
> +	size = read->end - read->start;
> +	read->start += size;
> +	read->size += size;
> +
> +	if (perf_data_file__write(file, buf, size) < 0) {
> +		pr_err("failed to write %s, error: %m\n", file->path);
> +		return -1;
> +	}
> +out:
> +	md->prev = read->head;
> +	return 0;
> +}
> +
>  /* When check_messup is true, 'end' must point to a good entry */
>  static union perf_event *
>  perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index b1c14f1..1ce4857 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -39,6 +39,16 @@ struct perf_mmap {
>  	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
>  };
>  
> +struct perf_mmap_read {
> +	struct perf_mmap	*md;
> +	u64			head;
> +	u64			start;
> +	u64			end;
> +	bool			overwrite;
> +	bool			backward;
> +	unsigned long		size;
> +};
> +
>  static inline size_t
>  perf_mmap__mmap_len(struct perf_mmap *map)
>  {
> @@ -193,6 +203,11 @@ void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
>  
>  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
>  
> +int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
> +			 bool overwrite, bool backward);
> +int perf_mmap__read_to_file(struct perf_mmap_read *read,
> +			    struct perf_data_file *file);
> +
>  int perf_evlist__open(struct perf_evlist *evlist);
>  void perf_evlist__close(struct perf_evlist *evlist);
>  
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 18:15   ` Arnaldo Carvalho de Melo
@ 2017-10-10 18:28     ` Liang, Kan
  2017-10-10 18:34       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-10 18:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak



> Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > The perf_evlist__mmap_read only support forward mode. It needs a
> > common function to support both forward and backward mode.
> 
> > The perf_evlist__mmap_read_backward is buggy.
> 
> So, what is the bug? You state that it is buggy, but don't spell out the bug,
> please do so.
> 

union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
{
	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should be backward_mmap

> If it fixes an existing bug, then it should go separate from this patchkit, right?

No one uses perf_evlist__mmap_read_backward, so it doesn't trigger any issue.
I just drop the code in patch 10.

Thanks,
Kan
> 
> - Arnaldo
> 
> > Introduce a new mmap read interface to support both forward and
> > backward mode. It can read event one by one from the specific range of
> > ring buffer.
> >
> > The standard process to mmap__read event would be as below.
> >  perf_mmap__read_init
> >  while (event = perf_mmap__read_event) {
> >    //process the event
> >    perf_mmap__consume
> >  }
> >  perf_mmap__read_done
> >
> > The following patches will use it to replace the old interfaces.
> >
> > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > ---
> >  tools/perf/util/evlist.c | 41
> > +++++++++++++++++++++++++++++++++++++++++
> >  tools/perf/util/evlist.h |  2 ++
> >  2 files changed, 43 insertions(+)
> >
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 7d23cf5..b36211e 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -897,6 +897,47 @@ perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
> >  	return event;
> >  }
> >
> > +/*
> > + * Read the first event in the specified range of the ring buffer.
> > + * Used by most of the perf tools and tests
> > + */
> > +union perf_event *
> > +perf_mmap__read_event(struct perf_mmap_read *read)
> > +{
> > +	struct perf_mmap *md = read->md;
> > +	union perf_event *event;
> > +
> > +	/*
> > +	 * Check if event was unmapped due to a POLLHUP/POLLERR.
> > +	 */
> > +	if (!refcount_read(&md->refcnt))
> > +		return NULL;
> > +
> > +	/* In backward, the ringbuffer is already paused. */
> > +	if (!read->backward) {
> > +		read->end = perf_mmap__read_head(md);
> > +		if (!read->end)
> > +			return NULL;
> > +	}
> > +
> > +	event = perf_mmap__read(md, !read->backward && read->overwrite,
> > +				read->start, read->end, &md->prev);
> > +	read->start = md->prev;
> > +	return event;
> > +}
> > +
> > +/*
> > + * Mandatory for backward
> > + * The direction of backward mmap__read_event is from head to tail.
> > + * The last mmap__read_event will set tail to md->prev.
> > + * Need to correct the md->prev.
> > + */
> > +void
> > +perf_mmap__read_done(struct perf_mmap_read *read)
> > +{
> > +	read->md->prev = read->head;
> > +}
> > +
> >  union perf_event *perf_mmap__read_forward(struct perf_mmap *md, bool check_messup)
> >  {
> >  	u64 head;
> > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > index 1ce4857..53baf26 100644
> > --- a/tools/perf/util/evlist.h
> > +++ b/tools/perf/util/evlist.h
> > @@ -207,6 +207,8 @@ int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
> >  			 bool overwrite, bool backward);
> >  int perf_mmap__read_to_file(struct perf_mmap_read *read,
> >  			    struct perf_data_file *file);
> > +union perf_event *perf_mmap__read_event(struct perf_mmap_read *read);
> > +void perf_mmap__read_done(struct perf_mmap_read *read);
> >
> >  int perf_evlist__open(struct perf_evlist *evlist);
> >  void perf_evlist__close(struct perf_evlist *evlist);
> > --
> > 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 01/10] perf record: new interfaces to read ring buffer to file
  2017-10-10 17:20 ` [PATCH 01/10] perf record: new interfaces to read ring buffer to file kan.liang
  2017-10-10 18:24   ` Arnaldo Carvalho de Melo
@ 2017-10-10 18:30   ` Arnaldo Carvalho de Melo
  2017-10-11  4:12     ` Liang, Kan
  1 sibling, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 18:30 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, adrian.hunter, ak

Em Tue, Oct 10, 2017 at 10:20:14AM -0700, kan.liang@intel.com escreveu:
> From: Kan Liang <kan.liang@intel.com>
> 
> Factor out the mmap_read related codes into separate functions and move
> them to evlist.c
> struct perf_mmap_read is introduced to store the specific range of ring
> buffer information.
> 
> The way to read specific range of ring buffer into file is as below,
>   perf_mmap__read_init
>   perf_mmap__read_to_file
>   perf_mmap__consume
> 
> There is no functional change.
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  tools/perf/builtin-record.c | 111 ++++++++------------------------------------
>  tools/perf/util/evlist.c    | 111 ++++++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/evlist.h    |  15 ++++++
>  3 files changed, 146 insertions(+), 91 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 234fdf4..df555e9 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -105,6 +105,17 @@ static bool switch_output_time(struct record *rec)
>  	       trigger_is_ready(&switch_output_trigger);
>  }
>  
> +static void update_output_size(struct record *rec, size_t size)
> +{
> +	if (size == 0)
> +		return;
> +
> +	rec->bytes_written += size;
> +
> +	if (switch_output_size(rec))
> +		trigger_hit(&switch_output_trigger);
> +}
> +
>  static int record__write(struct record *rec, void *bf, size_t size)
>  {
>  	if (perf_data_file__write(rec->session->file, bf, size) < 0) {
> @@ -112,10 +123,7 @@ static int record__write(struct record *rec, void *bf, size_t size)
>  		return -1;
>  	}
>  
> -	rec->bytes_written += size;
> -
> -	if (switch_output_size(rec))
> -		trigger_hit(&switch_output_trigger);
> +	update_output_size(rec, size);
>  
>  	return 0;
>  }
> @@ -130,106 +138,27 @@ static int process_synthesized_event(struct perf_tool *tool,
>  }
>  
>  static int
> -backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
> -{
> -	struct perf_event_header *pheader;
> -	u64 evt_head = head;
> -	int size = mask + 1;
> -
> -	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
> -	pheader = (struct perf_event_header *)(buf + (head & mask));
> -	*start = head;
> -	while (true) {
> -		if (evt_head - head >= (unsigned int)size) {
> -			pr_debug("Finished reading backward ring buffer: rewind\n");
> -			if (evt_head - head > (unsigned int)size)
> -				evt_head -= pheader->size;
> -			*end = evt_head;
> -			return 0;
> -		}
> -
> -		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
> -
> -		if (pheader->size == 0) {
> -			pr_debug("Finished reading backward ring buffer: get start\n");
> -			*end = evt_head;
> -			return 0;
> -		}
> -
> -		evt_head += pheader->size;
> -		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
> -	}
> -	WARN_ONCE(1, "Shouldn't get here\n");
> -	return -1;
> -}
> -
> -static int
> -rb_find_range(void *data, int mask, u64 head, u64 old,
> -	      u64 *start, u64 *end, bool backward)
> -{
> -	if (!backward) {
> -		*start = old;
> -		*end = head;
> -		return 0;
> -	}
> -
> -	return backward_rb_find_range(data, mask, head, start, end);
> -}
> -
> -static int
>  record__mmap_read(struct record *rec, struct perf_mmap *md,
>  		  bool overwrite, bool backward)
>  {
> -	u64 head = perf_mmap__read_head(md);
> -	u64 old = md->prev;
> -	u64 end = head, start = old;
> -	unsigned char *data = md->base + page_size;
> -	unsigned long size;
> -	void *buf;
> -	int rc = 0;
> +	struct perf_mmap_read read;
>  
> -	if (rb_find_range(data, md->mask, head,
> -			  old, &start, &end, backward))
> +	if (perf_mmap__read_init(md, &read, overwrite, backward))
>  		return -1;
>  
> -	if (start == end)
> +	if (read.start == read.end)
>  		return 0;
>  
>  	rec->samples++;
>  
> -	size = end - start;
> -	if (size > (unsigned long)(md->mask) + 1) {
> -		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> -
> -		md->prev = head;
> -		perf_mmap__consume(md, overwrite || backward);
> -		return 0;
> -	}
> -
> -	if ((start & md->mask) + size != (end & md->mask)) {
> -		buf = &data[start & md->mask];
> -		size = md->mask + 1 - (start & md->mask);
> -		start += size;
> -
> -		if (record__write(rec, buf, size) < 0) {
> -			rc = -1;
> -			goto out;
> -		}
> -	}
> +	if (perf_mmap__read_to_file(&read, rec->session->file) < 0)
> +		return -1;
>  
> -	buf = &data[start & md->mask];
> -	size = end - start;
> -	start += size;
> +	update_output_size(rec, read.size);
>  
> -	if (record__write(rec, buf, size) < 0) {
> -		rc = -1;
> -		goto out;
> -	}
> -
> -	md->prev = head;
>  	perf_mmap__consume(md, overwrite || backward);
> -out:
> -	return rc;
> +
> +	return 0;
>  }
>  
>  static volatile int done;
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 6a0d7ff..33b8837 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -704,6 +704,117 @@ static int perf_evlist__resume(struct perf_evlist *evlist)
>  	return perf_evlist__set_paused(evlist, false);
>  }
>  
> +static int
> +backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
> +{
> +	struct perf_event_header *pheader;
> +	u64 evt_head = head;
> +	int size = mask + 1;
> +
> +	pr_debug2("backward_rb_find_range: buf=%p, head=%"PRIx64"\n", buf, head);
> +	pheader = (struct perf_event_header *)(buf + (head & mask));
> +	*start = head;
> +	while (true) {
> +		if (evt_head - head >= (unsigned int)size) {
> +			pr_debug("Finished reading backward ring buffer: rewind\n");
> +			if (evt_head - head > (unsigned int)size)
> +				evt_head -= pheader->size;
> +			*end = evt_head;
> +			return 0;
> +		}
> +
> +		pheader = (struct perf_event_header *)(buf + (evt_head & mask));
> +
> +		if (pheader->size == 0) {
> +			pr_debug("Finished reading backward ring buffer: get start\n");
> +			*end = evt_head;
> +			return 0;
> +		}
> +
> +		evt_head += pheader->size;
> +		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
> +	}
> +	WARN_ONCE(1, "Shouldn't get here\n");
> +	return -1;
> +}
> +
> +static int
> +rb_find_range(void *data, int mask, u64 head, u64 old,
> +	      u64 *start, u64 *end, bool backward)
> +{
> +	if (!backward) {
> +		*start = old;
> +		*end = head;
> +		return 0;
> +	}
> +
> +	return backward_rb_find_range(data, mask, head, start, end);
> +}
> +
> +/*
> + * Initialize struct perf_mmap_read
> + * Calculate the range of ring buffer to read
> + */
> +int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
> +			 bool overwrite, bool backward)
> +{
> +	unsigned char *data = md->base + page_size;
> +
> +	read->md = md;
> +	read->head = perf_mmap__read_head(md);
> +	read->overwrite = overwrite;
> +	read->backward = backward;
> +	read->size = 0;
> +	return rb_find_range(data, md->mask, read->head, md->prev,
> +			     &read->start, &read->end, backward);
> +}
> +
> +/*
> + * Read the ring buffer in the range specified in struct perf_mmap_read,
> + * and write to file.
> + * Used by perf record.
> + */
> +int perf_mmap__read_to_file(struct perf_mmap_read *read,
> +			    struct perf_data_file *file)
> +{
> +	struct perf_mmap *md = read->md;
> +	unsigned char *data = md->base + page_size;
> +	unsigned long size;
> +	void *buf;
> +
> +	size = read->end - read->start;
> +	if (size > (unsigned long)(md->mask) + 1) {
> +		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> +		read->size = 0;
> +		goto out;
> +	}
> +
> +	if ((read->start & md->mask) + size != (read->end & md->mask)) {
> +		buf = &data[read->start & md->mask];
> +		size = md->mask + 1 - (read->start & md->mask);
> +		read->start += size;
> +		read->size += size;
> +
> +		if (perf_data_file__write(file, buf, size) < 0) {
> +			pr_err("failed to write %s, error: %m\n", file->path);
> +			return -1;
> +		}
> +	}
> +
> +	buf = &data[read->start & md->mask];
> +	size = read->end - read->start;
> +	read->start += size;
> +	read->size += size;
> +
> +	if (perf_data_file__write(file, buf, size) < 0) {
> +		pr_err("failed to write %s, error: %m\n", file->path);
> +		return -1;
> +	}
> +out:
> +	md->prev = read->head;
> +	return 0;
> +}
> +
>  /* When check_messup is true, 'end' must point to a good entry */
>  static union perf_event *
>  perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index b1c14f1..1ce4857 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -39,6 +39,16 @@ struct perf_mmap {
>  	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
>  };
>  
> +struct perf_mmap_read {
> +	struct perf_mmap	*md;
> +	u64			head;
> +	u64			start;
> +	u64			end;

So there will always be a one-to-one association between 'struct
perf_mmap_read' and 'struct perf_mmap'; why not go on adding more fields
to 'struct perf_mmap' as we need them, not doing it all at once
(backward, snapshotting, overwrite, etc) but starting with the simple
part: making the most basic modes:

	perf record -a

	perf top

work, multithreaded, leaving the other more complicated modes
falling back to the old format, then, when we have it solid, go on to
get the other features.

In the end, having the two formats supported will be needed anyway, and
we can as well ask for processing with both perf.data file formats to
compare results, while we straighten out the new code.

I just think we should do this in a more fine-grained way to avoid too
much code churn, as well as keep a fallback to the old code, which,
albeit non-scalable, is what we have been using and can help in
certifying that the new one works well, by comparing outputs.

- Arnaldo

> +	bool			overwrite;
> +	bool			backward;
> +	unsigned long		size;
> +};
> +
>  static inline size_t
>  perf_mmap__mmap_len(struct perf_mmap *map)
>  {
> @@ -193,6 +203,11 @@ void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
>  
>  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
>  
> +int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
> +			 bool overwrite, bool backward);
> +int perf_mmap__read_to_file(struct perf_mmap_read *read,
> +			    struct perf_data_file *file);
> +
>  int perf_evlist__open(struct perf_evlist *evlist);
>  void perf_evlist__close(struct perf_evlist *evlist);
>  
> -- 
> 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 18:28     ` Liang, Kan
@ 2017-10-10 18:34       ` Arnaldo Carvalho de Melo
  2017-10-10 18:36         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 18:34 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
> 
> 
> > Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
> > > From: Kan Liang <kan.liang@intel.com>
> > >
> > > The perf_evlist__mmap_read only support forward mode. It needs a
> > > common function to support both forward and backward mode.
> > 
> > > The perf_evlist__mmap_read_backward is buggy.
> > 
> > So, what is the bug? You state that it is buggy, but don't spell out the bug,
> > please do so.
> > 
> 
> union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
> {
> 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should be backward_mmap
> 
> > If it fixes an existing bug, then it should go separate from this patchkit, right?
> 
> There is no one use perf_evlist__mmap_read_backward. So it doesn't trigger any issue.

There is no one at the end of your patchkit? Or no user _right now_? If
there is a user now, lemme see... yeah, no user right now, so _that_ is
yet another bug, i.e. it should be used, no? If this is just a
leftover, then we should just throw it away now; it's a cleanup.

But if it should've been used right now, then we should fix the above
problem you noticed in one patch, then in another make it be used, this
way we get the current code base fixed.

The rest of the patchkit is something new, paving the way to
multithreading, and that we will be discussing separately from fixing
bugs found while working on this, agreed?

- Arnaldo

> I just simply drop the codes in patch 10.
> 
> Thanks,
> Kan
> > 
> > - Arnaldo
> > 
> > > Introduce a new mmap read interface to support both forward and
> > > backward mode. It can read event one by one from the specific range of
> > > ring buffer.
> > >
> > > The standard process to mmap__read event would be as below.
> > >  perf_mmap__read_init
> > >  while (event = perf_mmap__read_event) {
> > >    //process the event
> > >    perf_mmap__consume
> > >  }
> > >  perf_mmap__read_done
> > >
> > > The following patches will use it to replace the old interfaces.
> > >
> > > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > > ---
> > >  tools/perf/util/evlist.c | 41
> > > +++++++++++++++++++++++++++++++++++++++++
> > >  tools/perf/util/evlist.h |  2 ++
> > >  2 files changed, 43 insertions(+)
> > >
> > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > > index 7d23cf5..b36211e 100644
> > > --- a/tools/perf/util/evlist.c
> > > +++ b/tools/perf/util/evlist.c
> > > @@ -897,6 +897,47 @@ perf_mmap__read(struct perf_mmap *md, bool check_messup, u64 start,
> > >  	return event;
> > >  }
> > >
> > > +/*
> > > + * Read the first event in the specified range of the ring buffer.
> > > + * Used by most of the perf tools and tests
> > > + */
> > > +union perf_event *
> > > +perf_mmap__read_event(struct perf_mmap_read *read)
> > > +{
> > > +	struct perf_mmap *md = read->md;
> > > +	union perf_event *event;
> > > +
> > > +	/*
> > > +	 * Check if event was unmapped due to a POLLHUP/POLLERR.
> > > +	 */
> > > +	if (!refcount_read(&md->refcnt))
> > > +		return NULL;
> > > +
> > > +	/* In backward, the ringbuffer is already paused. */
> > > +	if (!read->backward) {
> > > +		read->end = perf_mmap__read_head(md);
> > > +		if (!read->end)
> > > +			return NULL;
> > > +	}
> > > +
> > > +	event = perf_mmap__read(md, !read->backward && read->overwrite,
> > > +				read->start, read->end, &md->prev);
> > > +	read->start = md->prev;
> > > +	return event;
> > > +}
> > > +
> > > +/*
> > > + * Mandatory for backward
> > > + * The direction of backward mmap__read_event is from head to tail.
> > > + * The last mmap__read_event will set tail to md->prev.
> > > + * Need to correct the md->prev.
> > > + */
> > > +void
> > > +perf_mmap__read_done(struct perf_mmap_read *read)
> > > +{
> > > +	read->md->prev = read->head;
> > > +}
> > > +
> > >  union perf_event *perf_mmap__read_forward(struct perf_mmap *md, bool check_messup)
> > >  {
> > >  	u64 head;
> > > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > > index 1ce4857..53baf26 100644
> > > --- a/tools/perf/util/evlist.h
> > > +++ b/tools/perf/util/evlist.h
> > > @@ -207,6 +207,8 @@ int perf_mmap__read_init(struct perf_mmap *md, struct perf_mmap_read *read,
> > >  			 bool overwrite, bool backward);
> > >  int perf_mmap__read_to_file(struct perf_mmap_read *read,
> > >  			    struct perf_data_file *file);
> > > +union perf_event *perf_mmap__read_event(struct perf_mmap_read *read);
> > > +void perf_mmap__read_done(struct perf_mmap_read *read);
> > >
> > >  int perf_evlist__open(struct perf_evlist *evlist);
> > >  void perf_evlist__close(struct perf_evlist *evlist);
> > > --
> > > 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 18:34       ` Arnaldo Carvalho de Melo
@ 2017-10-10 18:36         ` Arnaldo Carvalho de Melo
  2017-10-10 19:00           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 18:36 UTC (permalink / raw)
  To: Wang Nan
  Cc: Liang, Kan, peterz, mingo, linux-kernel, jolsa, wangnan0,
	hekuang, namhyung, alexander.shishkin, Hunter, Adrian, ak

Em Tue, Oct 10, 2017 at 03:34:55PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
> > > Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
> > > > From: Kan Liang <kan.liang@intel.com>
> > > >
> > > > The perf_evlist__mmap_read only support forward mode. It needs a
> > > > common function to support both forward and backward mode.
> > > 
> > > > The perf_evlist__mmap_read_backward is buggy.
> > > 
> > > So, what is the bug? You state that it is buggy, but don't spell out the bug,
> > > please do so.
> > > 
> > 
> > union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
> > {
> > 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should be backward_mmap
> > 
> > > If it fixes an existing bug, then it should go separate from this patchkit, right?
> > 
> > There is no one use perf_evlist__mmap_read_backward. So it doesn't trigger any issue.
> 
> There is no one at the end of your patchkit? Or no user _right now_? If
> there is a user now, lemme see... yeah, no user right now, so _that_ is
> yet another bug, i.e. it should be used, no? If this is just a
> leftover, then we should just throw it away now; it's a cleanup.

Wang, can you take a look at these two issues? 

- Arnaldo

> But if it should've been used right now, then we should fix the above
> problem you noticed in one patch, then in another make it be used, this
> way we get the current code base fixed.
> 
> The rest of the patchkit is something new, paving the way to
> multithreading, and that we will be discussing separately from fixing
> bugs found while working on this, agreed?
> 
> - Arnaldo
> 
> > I just simply drop the codes in patch 10.
> > 
> > Thanks,
> > Kan
> > > 
> > > - Arnaldo
> > > 
> > > > Introduce a new mmap read interface to support both forward and
> > > > backward mode. It can read event one by one from the specific range of
> > > > ring buffer.
> > > >
> > > > The standard process to mmap__read event would be as below.
> > > >  perf_mmap__read_init
> > > >  while (event = perf_mmap__read_event) {
> > > >    //process the event
> > > >    perf_mmap__consume
> > > >  }
> > > >  perf_mmap__read_done
> > > >
> > > > The following patches will use it to replace the old interfaces.
> > > >
> > > > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > > > ---
> > > >  tools/perf/util/evlist.c | 41
> > > > +++++++++++++++++++++++++++++++++++++++++
> > > >  tools/perf/util/evlist.h |  2 ++
> > > >  2 files changed, 43 insertions(+)
> > > >
> > > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index
> > > > 7d23cf5..b36211e 100644
> > > > --- a/tools/perf/util/evlist.c
> > > > +++ b/tools/perf/util/evlist.c
> > > > @@ -897,6 +897,47 @@ perf_mmap__read(struct perf_mmap *md, bool
> > > check_messup, u64 start,
> > > >  	return event;
> > > >  }
> > > >
> > > > +/*
> > > > + * Read the first event in the specified range of the ring buffer.
> > > > + * Used by most of the perf tools and tests  */ union perf_event *
> > > > +perf_mmap__read_event(struct perf_mmap_read *read) {
> > > > +	struct perf_mmap *md = read->md;
> > > > +	union perf_event *event;
> > > > +
> > > > +	/*
> > > > +	 * Check if event was unmapped due to a POLLHUP/POLLERR.
> > > > +	 */
> > > > +	if (!refcount_read(&md->refcnt))
> > > > +		return NULL;
> > > > +
> > > > +	/* In backward, the ringbuffer is already paused. */
> > > > +	if (!read->backward) {
> > > > +		read->end = perf_mmap__read_head(md);
> > > > +		if (!read->end)
> > > > +			return NULL;
> > > > +	}
> > > > +
> > > > +	event = perf_mmap__read(md, !read->backward && read-
> > > >overwrite,
> > > > +				read->start, read->end, &md->prev);
> > > > +	read->start = md->prev;
> > > > +	return event;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Mandatory for backward
> > > > + * The direction of backward mmap__read_event is from head to tail.
> > > > + * The last mmap__read_event will set tail to md->prev.
> > > > + * Need to correct the md->prev.
> > > > + */
> > > > +void
> > > > +perf_mmap__read_done(struct perf_mmap_read *read) {
> > > > +	read->md->prev = read->head;
> > > > +}
> > > > +
> > > >  union perf_event *perf_mmap__read_forward(struct perf_mmap *md,
> > > bool
> > > > check_messup)  {
> > > >  	u64 head;
> > > > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index
> > > > 1ce4857..53baf26 100644
> > > > --- a/tools/perf/util/evlist.h
> > > > +++ b/tools/perf/util/evlist.h
> > > > @@ -207,6 +207,8 @@ int perf_mmap__read_init(struct perf_mmap *md,
> > > struct perf_mmap_read *read,
> > > >  			 bool overwrite, bool backward);
> > > >  int perf_mmap__read_to_file(struct perf_mmap_read *read,
> > > >  			    struct perf_data_file *file);
> > > > +union perf_event *perf_mmap__read_event(struct perf_mmap_read
> > > *read);
> > > > +void perf_mmap__read_done(struct perf_mmap_read *read);
> > > >
> > > >  int perf_evlist__open(struct perf_evlist *evlist);  void
> > > > perf_evlist__close(struct perf_evlist *evlist);
> > > > --
> > > > 2.5.5

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 18:23   ` Wangnan (F)
@ 2017-10-10 18:50     ` Wangnan (F)
  2017-10-10 19:50       ` Liang, Kan
  2017-10-10 19:36     ` Liang, Kan
  1 sibling, 1 reply; 41+ messages in thread
From: Wangnan (F) @ 2017-10-10 18:50 UTC (permalink / raw)
  To: kan.liang, acme, peterz, mingo, linux-kernel
  Cc: jolsa, hekuang, namhyung, alexander.shishkin, adrian.hunter, ak



On 2017/10/11 2:23, Wangnan (F) wrote:
>
>
> On 2017/10/11 1:20, kan.liang@intel.com wrote:
>> From: Kan Liang <kan.liang@intel.com>
>>
>> Perf record can switch output. The new output should only store the data
>> after switching. However, in overwrite backward mode, the new output
>> still have the data from old output.
>>
>> At the end of mmap_read, the position of processed ring buffer is saved
>> in md->prev. Next mmap_read should be end in md->prev.
>> However, the md->prev is discarded. So next mmap_read has to process
>> whole valid ring buffer, which definitely include the old processed
>> data.
>>
>> Set the prev as the end of the range in backward mode.
>>
>> Signed-off-by: Kan Liang <kan.liang@intel.com>
>> ---
>>   tools/perf/util/evlist.c | 14 +++++++++++++-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 33b8837..7d23cf5 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -742,13 +742,25 @@ static int
>>   rb_find_range(void *data, int mask, u64 head, u64 old,
>>             u64 *start, u64 *end, bool backward)
>>   {
>> +    int ret;
>> +
>>       if (!backward) {
>>           *start = old;
>>           *end = head;
>>           return 0;
>>       }
>>   -    return backward_rb_find_range(data, mask, head, start, end);
>> +    ret = backward_rb_find_range(data, mask, head, start, end);
>> +
>> +    /*
>> +     * The start and end from backward_rb_find_range is the range 
>> for all
>> +     * valid data in ring buffer.
>> +     * However, part of the data is processed previously.
>> +     * Reset the end to drop the processed data
>> +     */
>> +    *end = old;
>> +

[SNIP]

>
> If you really want to avoid record duplication, you need to change
> record__mmap_read()'s logic. Now it complains "failed to keep up with
> mmap data" and avoids dumping data when the size of newly generated
> data is larger than the size of the ring buffer. That is reasonable
> for a forward ring buffer, because in that case you have lost the head
> of the first record and the whole ring buffer is unparseable. However,
> it is wrong in the backward case. What you should do there is dump the
> whole ring buffer.
>

I think what you want should be something like this: (not tested)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ee7d0a8..f621a8e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -173,7 +173,9 @@ rb_find_range(void *data, int mask, u64 head, u64 old,
          return 0;
      }

-    return backward_rb_find_range(data, mask, head, start, end);
+    *start = head;
+    *end = old;
+    return 0;
  }

  static int
@@ -199,10 +201,15 @@ record__mmap_read(struct record *rec, struct perf_mmap *md,

      size = end - start;
      if (size > (unsigned long)(md->mask) + 1) {
-        WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
+        if (!backward) {
+            WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");

-        md->prev = head;
-        perf_mmap__consume(md, overwrite || backward);
+            md->prev = head;
+            perf_mmap__consume(md, overwrite || backward);
+        } else {
+            backward_rb_find_range(data, md->mask, head, start, end);
+            /* FIXME: error processing */
+        }
          return 0;
      }


Use 'head' and 'old' to locate the data position in the ring buffer by
default. If an overwrite happens, use backward_rb_find_range() to fetch
the last available record and dump the whole ring buffer.

Thank you.


* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 18:36         ` Arnaldo Carvalho de Melo
@ 2017-10-10 19:00           ` Arnaldo Carvalho de Melo
  2017-10-10 19:10             ` Wangnan (F)
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 19:00 UTC (permalink / raw)
  To: Wang Nan
  Cc: Liang, Kan, peterz, mingo, linux-kernel, jolsa, hekuang,
	namhyung, alexander.shishkin, Hunter, Adrian, ak

Em Tue, Oct 10, 2017 at 03:36:28PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Oct 10, 2017 at 03:34:55PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
> > > > Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
> > > > > From: Kan Liang <kan.liang@intel.com>
> > > > >
> > > > > The perf_evlist__mmap_read only support forward mode. It needs a
> > > > > common function to support both forward and backward mode.
> > > > 
> > > > > The perf_evlist__mmap_read_backward is buggy.
> > > > 
> > > > So, what is the bug? You state that it is buggy, but don't spell out the bug,
> > > > please do so.
> > > > 
> > > 
> > > union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
> > > {
> > > 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should be backward_mmap
> > > 
> > > > If it fixes an existing bug, then it should go separate from this patchkit, right?
> > > 
> > > There is no one use perf_evlist__mmap_read_backward. So it doesn't trigger any issue.
> > 
> > There is no one at the end of your patchkit? Or no user _right now_? If
> > there is a user now, lemme see... yeah, no user right now, so _that_ is
> > yet another bug, i.e. it should be used, no? If this is just a left
> > over, then we should just throw it away, now, its a cleanup.
> 
> Wang, can you take a look at these two issues? 

So it looks leftover that should've been removed by the following cset, right Wang?

- Arnaldo

commit a0c6f451f90204847ce5f91c3268d83a76bde1b6
Author: Wang Nan <wangnan0@huawei.com>
Date:   Thu Jul 14 08:34:41 2016 +0000

    perf evlist: Drop evlist->backward
    
    Now there's no real user of evlist->backward. Drop it. We are going to
    use evlist->backward_mmap as a container for backward ring buffer.


* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 19:00           ` Arnaldo Carvalho de Melo
@ 2017-10-10 19:10             ` Wangnan (F)
  2017-10-10 19:17               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 41+ messages in thread
From: Wangnan (F) @ 2017-10-10 19:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Liang, Kan, peterz, mingo, linux-kernel, jolsa, hekuang,
	namhyung, alexander.shishkin, Hunter, Adrian, ak



On 2017/10/11 3:00, Arnaldo Carvalho de Melo wrote:
> Em Tue, Oct 10, 2017 at 03:36:28PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Tue, Oct 10, 2017 at 03:34:55PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
>>>>> Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
>>>>>> From: Kan Liang <kan.liang@intel.com>
>>>>>>
>>>>>> The perf_evlist__mmap_read only support forward mode. It needs a
>>>>>> common function to support both forward and backward mode.
>>>>>> The perf_evlist__mmap_read_backward is buggy.
>>>>> So, what is the bug? You state that it is buggy, but don't spell out the bug,
>>>>> please do so.
>>>>>
>>>> union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
>>>> {
>>>> 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should be backward_mmap
>>>>
>>>>> If it fixes an existing bug, then it should go separate from this patchkit, right?
>>>> There is no one use perf_evlist__mmap_read_backward. So it doesn't trigger any issue.
>>> There is no one at the end of your patchkit? Or no user _right now_? If
>>> there is a user now, lemme see... yeah, no user right now, so _that_ is
>>> yet another bug, i.e. it should be used, no? If this is just a left
>>> over, then we should just throw it away, now, its a cleanup.
>> Wang, can you take a look at these two issues?
> So it looks leftover that should've been removed by the following cset, right Wang?
>
> - Arnaldo
>
> commit a0c6f451f90204847ce5f91c3268d83a76bde1b6
> Author: Wang Nan <wangnan0@huawei.com>
> Date:   Thu Jul 14 08:34:41 2016 +0000
>
>      perf evlist: Drop evlist->backward
>      
>      Now there's no real user of evlist->backward. Drop it. We are going to
>      use evlist->backward_mmap as a container for backward ring buffer.
>

Yes, it should be removed, but then there will be no corresponding
function to perf_evlist__mmap_read(), which reads a record from the
forward ring buffer.

I think Kan wants to become the first user of this function, because
he is trying to make 'perf top' utilize the backward ring buffer. That
needs perf_evlist__mmap_read_backward(), and he triggered the bug using
his unpublished patch set.

I think we can remove it now, and let Kan fix it and add it back in his
'perf top' patch set.

Thank you.


* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 19:10             ` Wangnan (F)
@ 2017-10-10 19:17               ` Arnaldo Carvalho de Melo
  2017-10-10 19:22                 ` Wangnan (F)
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-10 19:17 UTC (permalink / raw)
  To: Wangnan (F)
  Cc: Liang, Kan, peterz, mingo, linux-kernel, jolsa, hekuang,
	namhyung, alexander.shishkin, Hunter, Adrian, ak

Em Wed, Oct 11, 2017 at 03:10:37AM +0800, Wangnan (F) escreveu:
> 
> 
> On 2017/10/11 3:00, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Oct 10, 2017 at 03:36:28PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Tue, Oct 10, 2017 at 03:34:55PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
> > > > > > Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
> > > > > > > From: Kan Liang <kan.liang@intel.com>
> > > > > > > 
> > > > > > > The perf_evlist__mmap_read only support forward mode. It needs a
> > > > > > > common function to support both forward and backward mode.
> > > > > > > The perf_evlist__mmap_read_backward is buggy.
> > > > > > So, what is the bug? You state that it is buggy, but don't spell out the bug,
> > > > > > please do so.
> > > > > > 
> > > > > union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
> > > > > {
> > > > > 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should be backward_mmap
> > > > > 
> > > > > > If it fixes an existing bug, then it should go separate from this patchkit, right?
> > > > > There is no one use perf_evlist__mmap_read_backward. So it doesn't trigger any issue.
> > > > There is no one at the end of your patchkit? Or no user _right now_? If
> > > > there is a user now, lemme see... yeah, no user right now, so _that_ is
> > > > yet another bug, i.e. it should be used, no? If this is just a left
> > > > over, then we should just throw it away, now, its a cleanup.
> > > Wang, can you take a look at these two issues?
> > So it looks leftover that should've been removed by the following cset, right Wang?

> > commit a0c6f451f90204847ce5f91c3268d83a76bde1b6
> > Author: Wang Nan <wangnan0@huawei.com>
> > Date:   Thu Jul 14 08:34:41 2016 +0000

> >      perf evlist: Drop evlist->backward
> >      Now there's no real user of evlist->backward. Drop it. We are going to
> >      use evlist->backward_mmap as a container for backward ring buffer.
 
> Yes, it should be removed, but then there will be no corresponding
> function to perf_evlist__mmap_read(), which read an record from forward
> ring buffer.
 
> I think Kan wants to become the first user of this function because
> he is trying to make 'perf top' utilizing backward ring buffer. It needs
> perf_evlist__mmap_read_backward(), and he triggers the bug use his
> unpublished patch set.
 
> I think we can remove it now, let Kan fix and add it back in his 'perf top'
> patch set.

Well, if there will be a user, perhaps we should fix it, as it seems
interesting to have now, as you said, as a counterpart for the forward
ring buffer, and one that we have plans to use soon, right?

It doesn't need to go via perf/urgent as there is no current user, but I
could just fix it so that we have more info about its history in the git
commit logs.

- Arnaldo


* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 19:17               ` Arnaldo Carvalho de Melo
@ 2017-10-10 19:22                 ` Wangnan (F)
  2017-10-10 19:55                   ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Wangnan (F) @ 2017-10-10 19:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Liang, Kan, peterz, mingo, linux-kernel, jolsa, hekuang,
	namhyung, alexander.shishkin, Hunter, Adrian, ak



On 2017/10/11 3:17, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 11, 2017 at 03:10:37AM +0800, Wangnan (F) escreveu:
>>
>> On 2017/10/11 3:00, Arnaldo Carvalho de Melo wrote:
>>> Em Tue, Oct 10, 2017 at 03:36:28PM -0300, Arnaldo Carvalho de Melo escreveu:
>>>> Em Tue, Oct 10, 2017 at 03:34:55PM -0300, Arnaldo Carvalho de Melo escreveu:
>>>>> Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
>>>>>>> Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com escreveu:
>>>>>>>> From: Kan Liang <kan.liang@intel.com>
>>>>>>>>
>>>>>>>> The perf_evlist__mmap_read only support forward mode. It needs a
>>>>>>>> common function to support both forward and backward mode.
>>>>>>>> The perf_evlist__mmap_read_backward is buggy.
>>>>>>> So, what is the bug? You state that it is buggy, but don't spell out the bug,
>>>>>>> please do so.
>>>>>>>
>>>>>> union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
>>>>>> {
>>>>>> 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should be backward_mmap
>>>>>>
>>>>>>> If it fixes an existing bug, then it should go separate from this patchkit, right?
>>>>>> There is no one use perf_evlist__mmap_read_backward. So it doesn't trigger any issue.
>>>>> There is no one at the end of your patchkit? Or no user _right now_? If
>>>>> there is a user now, lemme see... yeah, no user right now, so _that_ is
>>>>> yet another bug, i.e. it should be used, no? If this is just a left
>>>>> over, then we should just throw it away, now, its a cleanup.
>>>> Wang, can you take a look at these two issues?
>>> So it looks leftover that should've been removed by the following cset, right Wang?
>>> commit a0c6f451f90204847ce5f91c3268d83a76bde1b6
>>> Author: Wang Nan <wangnan0@huawei.com>
>>> Date:   Thu Jul 14 08:34:41 2016 +0000
>>>       perf evlist: Drop evlist->backward
>>>       Now there's no real user of evlist->backward. Drop it. We are going to
>>>       use evlist->backward_mmap as a container for backward ring buffer.
>   
>> Yes, it should be removed, but then there will be no corresponding
>> function to perf_evlist__mmap_read(), which read an record from forward
>> ring buffer.
>   
>> I think Kan wants to become the first user of this function because
>> he is trying to make 'perf top' utilizing backward ring buffer. It needs
>> perf_evlist__mmap_read_backward(), and he triggers the bug use his
>> unpublished patch set.
>   
>> I think we can remove it now, let Kan fix and add it back in his 'perf top'
>> patch set.
> Well, if there will be a user, perhaps we should fix it, as it seems
> interesting to have now for, as you said, a counterpart for the forward
> ring buffer, and one that we have plans for using soon, right?

Right, if I understand Kan's patch 00/10 correctly. He said:

...
    But perf top need to switch to overwrite backward mode for good
    performance.
...

Thank you.


* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 18:23   ` Wangnan (F)
  2017-10-10 18:50     ` Wangnan (F)
@ 2017-10-10 19:36     ` Liang, Kan
  1 sibling, 0 replies; 41+ messages in thread
From: Liang, Kan @ 2017-10-10 19:36 UTC (permalink / raw)
  To: Wangnan (F), acme, peterz, mingo, linux-kernel
  Cc: jolsa, hekuang, namhyung, alexander.shishkin, Hunter, Adrian, ak

> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 33b8837..7d23cf5 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -742,13 +742,25 @@ static int
> >   rb_find_range(void *data, int mask, u64 head, u64 old,
> >   	      u64 *start, u64 *end, bool backward)
> >   {
> > +	int ret;
> > +
> >   	if (!backward) {
> >   		*start = old;
> >   		*end = head;
> >   		return 0;
> >   	}
> >
> > -	return backward_rb_find_range(data, mask, head, start, end);
> > +	ret = backward_rb_find_range(data, mask, head, start, end);
> > +
> > +	/*
> > +	 * The start and end from backward_rb_find_range is the range for
> all
> > +	 * valid data in ring buffer.
> > +	 * However, part of the data is processed previously.
> > +	 * Reset the end to drop the processed data
> > +	 */
> > +	*end = old;
> > +
> 
> I don't understand this patch. What rb_find_range() wants to do is to
> find the start and end pointers, so record__mmap_read() knows where to
> start reading and where to stop.
> In backward mode, the 'start' pointer is clearly 'head' (because 'head'
> is the header of the last record the kernel writes to the ring buffer),
> but the 'end' pointer is not easy to find (the 'end' pointer is the
> last (i.e. the earliest) available record in the ring buffer).
> We have to parse the whole ring buffer to find the last available record
> and set its tail into 'end', which is what backward_rb_find_range()
> tries hard to do.

Yes.

> However, your patch unconditionally overwrites *end (which is set by
> backward_rb_find_range()), making backward_rb_find_range() meaningless.
>
>

Thinking about it a bit more:
Yes, unconditionally overwriting has a problem if the kernel has already
gone through the whole ring buffer and started to overwrite.
However, I think rb_find_range/backward_rb_find_range still need to be
refined. 'old' is a good indicator for locating the 'end' pointer, and
it could speed up backward_rb_find_range.

Also, do you think it's really necessary to read everything from the
ring buffer every time? I think what the tool wants is the increment;
it doesn't want to save and process duplicate data.

So I think it's better to modify backward_rb_find_range to return the
'start' and 'end' for the incremental data only.

> In my mind, when you decide to use a backward ring buffer you must make
> it overwrite, because you want the kernel to silently overwrite old
> records without waking up perf, while still being able to parse the
> ring buffer even if overwriting happens. The use case should be: perf
> runs in the background, the kernel silently discards old records in the
> ring buffer until something bad happens, then perf dumps the 'events
> before the bad thing'. Similar to a flight recorder. In this use case,
> each dump is independent. Even if one record appeared in a previous
> dump, it is harmless (and useful) to have it appear in a following
> dump again.

It's not harmless for --switch-output, especially if we limit the output size.

I'm still not sure why we need to save the same data in different dumps.
It's also harmful to performance.

> 
> If you really want to avoid record duplication, you need to changes
> record__mmap_read()'s
> logic. Now it complains "failed to keep up with mmap data" and avoid
> dumping data when
> size of newly generated data is larger than the size of the ring buffer.
> It is reasonable
> for forward ring buffer because in this case you lost the head of the
> first record, the
> whole ring buffer is unparseable. However, it is wrong in backward case.
> What you
> should do in this case is dumping the whole ring buffer.
> 
> > +	return ret;
> >   }
> >
> >   /*
> 


* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 18:50     ` Wangnan (F)
@ 2017-10-10 19:50       ` Liang, Kan
  2017-10-10 20:18         ` Wangnan (F)
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-10 19:50 UTC (permalink / raw)
  To: Wangnan (F), acme, peterz, mingo, linux-kernel
  Cc: jolsa, hekuang, namhyung, alexander.shishkin, Hunter, Adrian, ak

> On 2017/10/11 2:23, Wangnan (F) wrote:
> >
> >
> > On 2017/10/11 1:20, kan.liang@intel.com wrote:
> >> From: Kan Liang <kan.liang@intel.com>
> >>
> >> Perf record can switch output. The new output should only store the data
> >> after switching. However, in overwrite backward mode, the new output
> >> still have the data from old output.
> >>
> >> At the end of mmap_read, the position of processed ring buffer is saved
> >> in md->prev. Next mmap_read should be end in md->prev.
> >> However, the md->prev is discarded. So next mmap_read has to process
> >> whole valid ring buffer, which definitely include the old processed
> >> data.
> >>
> >> Set the prev as the end of the range in backward mode.
> >>
> >> Signed-off-by: Kan Liang <kan.liang@intel.com>
> >> ---
> >>   tools/perf/util/evlist.c | 14 +++++++++++++-
> >>   1 file changed, 13 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >> index 33b8837..7d23cf5 100644
> >> --- a/tools/perf/util/evlist.c
> >> +++ b/tools/perf/util/evlist.c
> >> @@ -742,13 +742,25 @@ static int
> >>   rb_find_range(void *data, int mask, u64 head, u64 old,
> >>             u64 *start, u64 *end, bool backward)
> >>   {
> >> +    int ret;
> >> +
> >>       if (!backward) {
> >>           *start = old;
> >>           *end = head;
> >>           return 0;
> >>       }
> >>   -    return backward_rb_find_range(data, mask, head, start, end);
> >> +    ret = backward_rb_find_range(data, mask, head, start, end);
> >> +
> >> +    /*
> >> +     * The start and end from backward_rb_find_range is the range
> >> for all
> >> +     * valid data in ring buffer.
> >> +     * However, part of the data is processed previously.
> >> +     * Reset the end to drop the processed data
> >> +     */
> >> +    *end = old;
> >> +
> 
> [SNIP]
> 
> >
> > If you really want to avoid record duplication, you need to changes
> > record__mmap_read()'s
> > logic. Now it complains "failed to keep up with mmap data" and avoid
> > dumping data when
> > size of newly generated data is larger than the size of the ring
> > buffer. It is reasonable
> > for forward ring buffer because in this case you lost the head of the
> > first record, the
> > whole ring buffer is unparseable. However, it is wrong in backward
> > case. What you
> > should do in this case is dumping the whole ring buffer.
> >
> 
> I think what you want should be something like this: (not tested)
>

No, that's not what I want.
My test code never triggers the WARN_ONCE.

I think you will see the problem if you simply run the command below:
sudo ./perf record -e cycles:P -C0 --overwrite --switch-output=1s

The output size keeps increasing, because each new output always includes the old outputs.
What I want is the 'start' and 'end' for the increase, not everything.

> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index ee7d0a8..f621a8e 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -173,7 +173,9 @@ rb_find_range(void *data, int mask, u64 head, u64 old,
>           return 0;
>       }
> 
> -    return backward_rb_find_range(data, mask, head, start, end);
> +    *start = head;
> +    *end = old;
> +    return 0;
>   }
> 
>   static int
> @@ -199,10 +201,15 @@ record__mmap_read(struct record *rec, struct perf_mmap *md,
> 
>       size = end - start;
>       if (size > (unsigned long)(md->mask) + 1) {
> -        WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> +        if (!backward) {
> +            WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> 
> -        md->prev = head;
> -        perf_mmap__consume(md, overwrite || backward);
> +            md->prev = head;
> +            perf_mmap__consume(md, overwrite || backward);
> +        } else {
> +            backward_rb_find_range(data, md->mask, head, start, end);
> +            /* FIXME: error processing */
> +        }
>           return 0;
>       }
> 
> 
> Use 'head' and 'old' to locate data position in ring buffer by default.
> If overwrite happen, use
> backward_rb_find_range() to fetch the last available record and dump the
> whole ring buffer.
> 
> Thank you.
> 


* RE: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 19:22                 ` Wangnan (F)
@ 2017-10-10 19:55                   ` Liang, Kan
  2017-10-10 19:59                     ` Wangnan (F)
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-10 19:55 UTC (permalink / raw)
  To: Wangnan (F), Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

> On 2017/10/11 3:17, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Oct 11, 2017 at 03:10:37AM +0800, Wangnan (F) escreveu:
> >>
> >> On 2017/10/11 3:00, Arnaldo Carvalho de Melo wrote:
> >>> Em Tue, Oct 10, 2017 at 03:36:28PM -0300, Arnaldo Carvalho de Melo
> escreveu:
> >>>> Em Tue, Oct 10, 2017 at 03:34:55PM -0300, Arnaldo Carvalho de Melo
> escreveu:
> >>>>> Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
> >>>>>>> Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com
> escreveu:
> >>>>>>>> From: Kan Liang <kan.liang@intel.com>
> >>>>>>>>
> >>>>>>>> The perf_evlist__mmap_read only support forward mode. It needs
> >>>>>>>> a common function to support both forward and backward mode.
> >>>>>>>> The perf_evlist__mmap_read_backward is buggy.
> >>>>>>> So, what is the bug? You state that it is buggy, but don't spell
> >>>>>>> out the bug, please do so.
> >>>>>>>
> >>>>>> union perf_event *perf_evlist__mmap_read_backward(struct
> >>>>>> perf_evlist *evlist, int idx) {
> >>>>>> 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should
> be
> >>>>>> backward_mmap
> >>>>>>
> >>>>>>> If it fixes an existing bug, then it should go separate from this
> patchkit, right?
> >>>>>> There is no one use perf_evlist__mmap_read_backward. So it
> doesn't trigger any issue.
> >>>>> There is no one at the end of your patchkit? Or no user _right
> >>>>> now_? If there is a user now, lemme see... yeah, no user right
> >>>>> now, so _that_ is yet another bug, i.e. it should be used, no? If
> >>>>> this is just a left over, then we should just throw it away, now, its a
> cleanup.
> >>>> Wang, can you take a look at these two issues?
> >>> So it looks leftover that should've been removed by the following cset,
> right Wang?
> >>> commit a0c6f451f90204847ce5f91c3268d83a76bde1b6
> >>> Author: Wang Nan <wangnan0@huawei.com>
> >>> Date:   Thu Jul 14 08:34:41 2016 +0000
> >>>       perf evlist: Drop evlist->backward
> >>>       Now there's no real user of evlist->backward. Drop it. We are going to
> >>>       use evlist->backward_mmap as a container for backward ring buffer.
> >
> >> Yes, it should be removed, but then there will be no corresponding
> >> function to perf_evlist__mmap_read(), which read an record from
> >> forward ring buffer.
> >
> >> I think Kan wants to become the first user of this function because
> >> he is trying to make 'perf top' utilizing backward ring buffer. It
> >> needs perf_evlist__mmap_read_backward(), and he triggers the bug use
> >> his unpublished patch set.
> >
> >> I think we can remove it now, let Kan fix and add it back in his 'perf top'
> >> patch set.
> > Well, if there will be a user, perhaps we should fix it, as it seems
> > interesting to have now for, as you said, a counterpart for the
> > forward ring buffer, and one that we have plans for using soon, right?
> 
> Right if I understand Kan's patch 00/10 correctly. He said:
> 
> ...
>     But perf top need to switch to overwrite backward mode for good
>     performance.
> ...
> 
Yes, it will be used for the perf top optimization.
It would be great if you could fix it.

I still think it's a good idea to have common interfaces for both perf record
and other tools like perf top.
rb_find_range/backward_rb_find_range could be shared, and the mmap read
code is similar.

Thanks,
Kan


* Re: [PATCH 03/10] perf tool: new interfaces to read event from ring buffer
  2017-10-10 19:55                   ` Liang, Kan
@ 2017-10-10 19:59                     ` Wangnan (F)
  0 siblings, 0 replies; 41+ messages in thread
From: Wangnan (F) @ 2017-10-10 19:59 UTC (permalink / raw)
  To: Liang, Kan, Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak



On 2017/10/11 3:55, Liang, Kan wrote:
>> On 2017/10/11 3:17, Arnaldo Carvalho de Melo wrote:
>>> Em Wed, Oct 11, 2017 at 03:10:37AM +0800, Wangnan (F) escreveu:
>>>> On 2017/10/11 3:00, Arnaldo Carvalho de Melo wrote:
>>>>> Em Tue, Oct 10, 2017 at 03:36:28PM -0300, Arnaldo Carvalho de Melo
>> escreveu:
>>>>>> Em Tue, Oct 10, 2017 at 03:34:55PM -0300, Arnaldo Carvalho de Melo
>> escreveu:
>>>>>>> Em Tue, Oct 10, 2017 at 06:28:18PM +0000, Liang, Kan escreveu:
>>>>>>>>> Em Tue, Oct 10, 2017 at 10:20:16AM -0700, kan.liang@intel.com
>> escreveu:
>>>>>>>>>> From: Kan Liang <kan.liang@intel.com>
>>>>>>>>>>
>>>>>>>>>> The perf_evlist__mmap_read only support forward mode. It needs
>>>>>>>>>> a common function to support both forward and backward mode.
>>>>>>>>>> The perf_evlist__mmap_read_backward is buggy.
>>>>>>>>> So, what is the bug? You state that it is buggy, but don't spell
>>>>>>>>> out the bug, please do so.
>>>>>>>>>
>>>>>>>> union perf_event *perf_evlist__mmap_read_backward(struct
>>>>>>>> perf_evlist *evlist, int idx) {
>>>>>>>> 	struct perf_mmap *md = &evlist->mmap[idx];  <--- it should
>> be
>>>>>>>> backward_mmap
>>>>>>>>
>>>>>>>>> If it fixes an existing bug, then it should go separate from this
>> patchkit, right?
>>>>>>>> There is no one use perf_evlist__mmap_read_backward. So it
>> doesn't trigger any issue.
>>>>>>> There is no one at the end of your patchkit? Or no user _right
>>>>>>> now_? If there is a user now, lemme see... yeah, no user right
>>>>>>> now, so _that_ is yet another bug, i.e. it should be used, no? If
>>>>>>> this is just a left over, then we should just throw it away, now, its a
>> cleanup.
>>>>>> Wang, can you take a look at these two issues?
>>>>> So it looks leftover that should've been removed by the following cset,
>> right Wang?
>>>>> commit a0c6f451f90204847ce5f91c3268d83a76bde1b6
>>>>> Author: Wang Nan <wangnan0@huawei.com>
>>>>> Date:   Thu Jul 14 08:34:41 2016 +0000
>>>>>        perf evlist: Drop evlist->backward
>>>>>        Now there's no real user of evlist->backward. Drop it. We are going to
>>>>>        use evlist->backward_mmap as a container for backward ring buffer.
>>>> Yes, it should be removed, but then there will be no corresponding
>>>> function to perf_evlist__mmap_read(), which reads a record from the
>>>> forward ring buffer.
>>>> I think Kan wants to become the first user of this function because
>>>> he is trying to make 'perf top' utilize the backward ring buffer. It
>>>> needs perf_evlist__mmap_read_backward(), and he triggers the bug using
>>>> his unpublished patch set.
>>>> I think we can remove it now, and let Kan fix it and add it back in his
>>>> 'perf top' patch set.
>>> Well, if there will be a user, perhaps we should fix it, as it seems
>>> interesting to have now for, as you said, a counterpart for the
>>> forward ring buffer, and one that we have plans for using soon, right?
>> Right if I understand Kan's patch 00/10 correctly. He said:
>>
>> ...
>>      But perf top need to switch to overwrite backward mode for good
>>      performance.
>> ...
>>
> Yes, it will be used for perf top optimization.
> That will be great if you can fix it.

You can fix it and post your fix together with your perf top
patch set. I can write the code but I can't test it because
there's no user now.

> I still think it's a good idea to have common interfaces for both perf record
> and other tools like perf top.
> rb_find_range/backward_rb_find_range could be shared.
> The mmap read codes are similar.

Agree. Please keep going.

> Thanks,
> Kan


* Re: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 19:50       ` Liang, Kan
@ 2017-10-10 20:18         ` Wangnan (F)
  2017-10-11  2:12           ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Wangnan (F) @ 2017-10-10 20:18 UTC (permalink / raw)
  To: Liang, Kan, acme, peterz, mingo, linux-kernel
  Cc: jolsa, hekuang, namhyung, alexander.shishkin, Hunter, Adrian, ak



On 2017/10/11 3:50, Liang, Kan wrote:
>> On 2017/10/11 2:23, Wangnan (F) wrote:
>>>
>>> On 2017/10/11 1:20, kan.liang@intel.com wrote:
>>>> From: Kan Liang <kan.liang@intel.com>
>>>>
>>>> Perf record can switch output. The new output should only store the data
>>>> after switching. However, in overwrite backward mode, the new output
>>>> still have the data from old output.
>>>>
>>>> At the end of mmap_read, the position of processed ring buffer is saved
>>>> in md->prev. Next mmap_read should be end in md->prev.
>>>> However, the md->prev is discarded. So next mmap_read has to process
>>>> whole valid ring buffer, which definitely include the old processed
>>>> data.
>>>>
>>>> Set the prev as the end of the range in backward mode.
>>>>
>>>> Signed-off-by: Kan Liang <kan.liang@intel.com>
>>>> ---
>>>>    tools/perf/util/evlist.c | 14 +++++++++++++-
>>>>    1 file changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>>>> index 33b8837..7d23cf5 100644
>>>> --- a/tools/perf/util/evlist.c
>>>> +++ b/tools/perf/util/evlist.c
>>>> @@ -742,13 +742,25 @@ static int
>>>>    rb_find_range(void *data, int mask, u64 head, u64 old,
>>>>              u64 *start, u64 *end, bool backward)
>>>>    {
>>>> +    int ret;
>>>> +
>>>>        if (!backward) {
>>>>            *start = old;
>>>>            *end = head;
>>>>            return 0;
>>>>        }
>>>>    -    return backward_rb_find_range(data, mask, head, start, end);
>>>> +    ret = backward_rb_find_range(data, mask, head, start, end);
>>>> +
>>>> +    /*
>>>> +     * The start and end from backward_rb_find_range is the range
>>>> for all
>>>> +     * valid data in ring buffer.
>>>> +     * However, part of the data is processed previously.
>>>> +     * Reset the end to drop the processed data
>>>> +     */
>>>> +    *end = old;
>>>> +
>> [SNIP]
>>
>>> If you really want to avoid record duplication, you need to changes
>>> record__mmap_read()'s
>>> logic. Now it complains "failed to keep up with mmap data" and avoid
>>> dumping data when
>>> size of newly generated data is larger than the size of the ring
>>> buffer. It is reasonable
>>> for forward ring buffer because in this case you lost the head of the
>>> first record, the
>>> whole ring buffer is unparseable. However, it is wrong in backward
>>> case. What you
>>> should do in this case is dumping the whole ring buffer.
>>>
>> I think what you want should be something like this: (not tested)
>>
> No. That's not what I want.
> My test code never triggers the WARN_ONCE.

The existing code never triggers that warning because the size computed
by rb_find_range is never larger than the size of the ring buffer. After
applying your patch, I believe it will trigger this WARN_ONCE and drop the
whole ring buffer. Please set a smaller ring buffer and try again.
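
For illustration, the check in question behaves roughly like this (a
sketch modeled on the "failed to keep up with mmap data" logic in
record__mmap_read as described in this thread; the function name is
made up):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Once the computed [start, end) range is larger than the ring buffer
 * itself, the data has been overwritten mid-read and the whole pass is
 * dropped.  'mask' is the ring buffer size minus one, as in tools/perf.
 */
static int push_range_sketch(uint64_t start, uint64_t end, uint64_t mask)
{
	uint64_t size = end - start;

	if (size > mask + 1)
		return -1;	/* "failed to keep up with mmap data" */

	return 0;		/* range fits: safe to copy it out */
}
```

With the proposed *end = old change, end - start can exceed mask + 1 on
a full buffer, which is what would trip this check.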

> I think you will see the problem if you simply run the command below:
> sudo ./perf record -e cycles:P -C0 --overwrite --switch-output=1s
>
> The output size keeps increasing, because the new output always includes the old outputs.
> What I want is the 'start' and 'end' for the increase, not everything.


This is my test result: add '-m 1' to 'perf record' to shrink the ring
buffer, and start a while loop on CPU 0 to increase the data rate.

It stops increasing after the ring buffer is full:

$:~/linux/tools/perf$ sudo ./perf record -m1 -e cycles:P -C0 --overwrite 
--switch-output=1s
   Warning: File /home/w00229757/.perfconfig not owned by current user 
or root, ignoring it.
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2017101212165072 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2017101212165175 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2017101212165278 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2017101212165381 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2017101212165484 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2017101212165586 ]
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Dump perf.data.2017101212165653 ]
[ perf record: Captured and wrote 1.013 MB perf.data.<timestamp> ]

$ ls -l ./perf.data*
-rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165072
-rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165175
-rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165278
-rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165381
-rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165484
-rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165586
-rw------- 1 root root 1067812 Oct 12 12:16 ./perf.data.2017101212165653

You see the result keeps getting larger because the ring buffer
is never full in your case.

Thank you.


* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 20:18         ` Wangnan (F)
@ 2017-10-11  2:12           ` Liang, Kan
  2017-10-11 14:57             ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-11  2:12 UTC (permalink / raw)
  To: Wangnan (F), acme, peterz, mingo, linux-kernel
  Cc: jolsa, hekuang, namhyung, alexander.shishkin, Hunter, Adrian, ak

> >>
> >>> If you really want to avoid record duplication, you need to changes
> >>> record__mmap_read()'s
> >>> logic. Now it complains "failed to keep up with mmap data" and avoid
> >>> dumping data when
> >>> size of newly generated data is larger than the size of the ring
> >>> buffer. It is reasonable
> >>> for forward ring buffer because in this case you lost the head of the
> >>> first record, the
> >>> whole ring buffer is unparseable. However, it is wrong in backward
> >>> case. What you
> >>> should do in this case is dumping the whole ring buffer.
> >>>
> >> I think what you want should be something like this: (not tested)
> >>
> > No. That's not what I want.
> > My test code never trigger the WARN_ONCE.
> 
> The existing code never triggers that warning because the size computed
> by rb_find_range is never larger than the size of the ring buffer. After
> applying your patch, I believe it will trigger this WARN_ONCE and drop
> the whole ring buffer. Please set a smaller ring buffer and try again.
> 
> > I think you will see the problem, if you simply run the command as below.
> > sudo ./perf record -e cycles:P -C0 --overwrite --switch-output=1s
> >
> > The output size keep increasing. Because the new output always include
> the old outputs.
> > What I want is the 'start' and 'end' for the increase, not everything.
> 
> 
> This is my test result: add a '-m 1' for 'perf record' for shrinking
> ring buffer,
> start a while loop on CPU 0 to increase data rate.
> 
> It stops increasing after the ring buffer is full:
> 
> $:~/linux/tools/perf$ sudo ./perf record -m1 -e cycles:P -C0 --overwrite
> --switch-output=1s
>    Warning: File /home/w00229757/.perfconfig not owned by current user
> or root, ignoring it.
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2017101212165072 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2017101212165175 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2017101212165278 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2017101212165381 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2017101212165484 ]
> [ perf record: dump data: Woken up 1 times ]
> [ perf record: Dump perf.data.2017101212165586 ]
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Dump perf.data.2017101212165653 ]
> [ perf record: Captured and wrote 1.013 MB perf.data.<timestamp> ]
> 
> $ ls -l ./perf.data*
> -rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165072
> -rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165175
> -rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165278
> -rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165381
> -rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165484
> -rw------- 1 root root  538988 Oct 12 12:16 ./perf.data.2017101212165586
> -rw------- 1 root root 1067812 Oct 12 12:16 ./perf.data.2017101212165653
> 
> You see the result keeps getting larger because the ring buffer
> is never full in your case.

The increasing file size in my case indicates that the old, already-processed
data is dumped into the new output.
I don't think that's right, because we should not process the same data
multiple times. That definitely increases the overhead of perf record.


Digging deeper, I even suspect that overwrite mode isn't really enabled
in perf record.
The overwrite argument is hard-coded to false:
	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
				 opts->auxtrace_mmap_pages,
				 opts->auxtrace_snapshot_mode) < 0) {
Did I miss something?


Thanks,
Kan


* RE: [PATCH 01/10] perf record: new interfaces to read ring buffer to file
  2017-10-10 18:30   ` Arnaldo Carvalho de Melo
@ 2017-10-11  4:12     ` Liang, Kan
  2017-10-11 14:45       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-11  4:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

> >  /* When check_messup is true, 'end' must points to a good entry */
> > static union perf_event *  perf_mmap__read(struct perf_mmap *md, bool
> > check_messup, u64 start, diff --git a/tools/perf/util/evlist.h
> > b/tools/perf/util/evlist.h index b1c14f1..1ce4857 100644
> > --- a/tools/perf/util/evlist.h
> > +++ b/tools/perf/util/evlist.h
> > @@ -39,6 +39,16 @@ struct perf_mmap {
> >  	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> >  };
> >
> > +struct perf_mmap_read {
> > +	struct perf_mmap	*md;
> > +	u64			head;
> > +	u64			start;
> > +	u64			end;
> 
> So there will be always a one-on-one association of 'struct perf_mmap_read'
> and 'struct perf_mmap', why not go on adding more fields to 'struct
> perf_mmap' as we need

These fields need to be recalculated before each read.
So I put them in a new struct.
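
For illustration, a minimal sketch of that per-read state (the field
names follow the patch hunk quoted above; the init logic is an
assumption based on this thread, not the final code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-read state as proposed in the patch: it has to be recomputed from
 * the ring buffer's current head before every read pass, which is why
 * it lives outside 'struct perf_mmap' in this version.
 */
struct perf_mmap_read_sketch {
	uint64_t head;		/* snapshot of the kernel write position */
	uint64_t start;		/* first byte of the range to read */
	uint64_t end;		/* one past the last byte of the range */
	int	 backward;
};

/*
 * Sketch of perf_mmap__read_init()-style setup: forward mode reads
 * [old, head); backward mode reads [head, old), i.e. only the data that
 * arrived since the last pass.
 */
static void read_init_sketch(struct perf_mmap_read_sketch *r,
			     uint64_t head, uint64_t old, int backward)
{
	r->head = head;
	r->backward = backward;
	if (!backward) {
		r->start = old;
		r->end = head;
	} else {
		r->start = head;
		r->end = old;
	}
}
```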

> but not doing it all at once (backward, snapshotting,
> overwrite, etc) but first the simple part, make the most basic mode:
> 
> 	perf record -a
> 
> 	perf top
> 
> work, multithreaded, leaving the other more complicated modes fallbacking
> to the old format, then when we have it solid, go on getting the other
> features.

Agreed.
When I did the perf top optimization, I also tried Namhyung's perf top
multi-thread patch:
https://lwn.net/Articles/667469/
I think it may be a good starting point.

I didn't continue with his patch, because the root cause of the bad perf top
performance is non-overwrite mode, which generates lots of samples in a short
time. That exceeds KNL's computational capability, and multi-threading doesn't
help much in this case. So I tried overwrite mode instead.

> 
> In the end, having the two formats supported will be needed anyway, and
> we can as well ask for processing with both perf.data file formats to compare
> results, while we strengthen the new code.
>
> I just think we should do this in a more fine grained way to avoid too much
> code churn as well as having a fallback to the old code, that albeit non
> scalable, is what we have been using and can help in certifying that the new
> one works well, by comparing its outputs.

I already extended the multithreading support for event synthesis in perf
record:
https://github.com/kliang2/perf.git perf_record_opt
I will send it out for review shortly after rebasing on the latest perf/core.

In the patch series, I allocate a buffer for each thread to temporarily keep
the processing results, and write them to perf.data at the end of event
synthesis. The amount of synthesized data is not big (hundreds of kilobytes),
so I think it should be OK to do that.

Thanks,
Kan
> 
> - Arnaldo
> 
> > +	bool			overwrite;
> > +	bool			backward;
> > +	unsigned long		size;
> > +};
> > +
> >  static inline size_t
> >  perf_mmap__mmap_len(struct perf_mmap *map)  { @@ -193,6 +203,11
> @@
> > void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int
> > idx);
> >
> >  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
> >
> > +int perf_mmap__read_init(struct perf_mmap *md, struct
> perf_mmap_read *read,
> > +			 bool overwrite, bool backward);
> > +int perf_mmap__read_to_file(struct perf_mmap_read *read,
> > +			    struct perf_data_file *file);
> > +
> >  int perf_evlist__open(struct perf_evlist *evlist);  void
> > perf_evlist__close(struct perf_evlist *evlist);
> >
> > --
> > 2.5.5


* Re: [PATCH 01/10] perf record: new interfaces to read ring buffer to file
  2017-10-11  4:12     ` Liang, Kan
@ 2017-10-11 14:45       ` Arnaldo Carvalho de Melo
  2017-10-11 15:16         ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-11 14:45 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

Em Wed, Oct 11, 2017 at 04:12:42AM +0000, Liang, Kan escreveu:
> > >  /* When check_messup is true, 'end' must points to a good entry */
> > > static union perf_event *  perf_mmap__read(struct perf_mmap *md, bool
> > > check_messup, u64 start, diff --git a/tools/perf/util/evlist.h
> > > b/tools/perf/util/evlist.h index b1c14f1..1ce4857 100644
> > > --- a/tools/perf/util/evlist.h
> > > +++ b/tools/perf/util/evlist.h
> > > @@ -39,6 +39,16 @@ struct perf_mmap {
> > >  	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > >  };
> > >
> > > +struct perf_mmap_read {
> > > +	struct perf_mmap	*md;
> > > +	u64			head;
> > > +	u64			start;
> > > +	u64			end;
> > 
> > So there will be always a one-on-one association of 'struct perf_mmap_read'
> > and 'struct perf_mmap', why not go on adding more fields to 'struct
> > perf_mmap' as we need
> 
> The fields in 'struct perf_mmap' needs to be recalculated before each reading.
> So I put them in a new struct.  

Ok, but I still think that if there is a one-on-one relationship of
perf_mmap_read with perf_mmap, then we should just extend the one we
already have for per-mmap operations, i.e. 'struct perf_mmap'. I'll try
to provide a patch on top of my perf/core branch to see how it looks.
 
> > but not doing it all at once (backward, snapshotting,
> > overwrite, etc) but first the simple part, make the most basic mode:
> > 
> > 	perf record -a
> > 
> > 	perf top
> > 
> > work, multithreaded, leaving the other more complicated modes fallbacking
> > to the old format, then when we have it solid, go on getting the other
> > features.
> 
> Agree. 
> When I did perf top optimization, I also tried Namhyung's perf top multi-thread patch.
> https://lwn.net/Articles/667469/
> I think it may be a good start point.

I have to read that to understand why we need those indexes :-\
 
> I didn't work on his patch. Because the root cause of bad perf top performance
> is non overwrite mode, which generate lots of samples shortly. It exceeds KNL's
> computational capability. Multi-threading doesn't help much on this case.
> So I tried to use overwrite mode then.

Right, work on the problem you have at hand, but all these efforts
should be considered to move forward.
 
> > In the end, having the two formats supported will be needed anyway, and
> > we can as well ask for processing with both perf.data file formats to compare
> > results, while we strenghten out the new code.
> >
> > I just think we should do this in a more fine grained way to avoid too much
> > code churn as well as having a fallback to the old code, that albeit non
> > scalable, is what we have been using and can help in certifying that the new
> > one works well, by comparing its outputs.
> 
> I already extended the multithreading support for event synthesization in perf
> record. 
> https://github.com/kliang2/perf.git perf_record_opt
> I will send it out for review shortly after rebasing on the latest perf/core.
> 
> In the patch series, I realloc buffer for each thread to temporarily keep the
> processing result, and write them to the perf.data at the end of event
> synthesization. The number of synthesized event is not big (hundreds of
> Kilobyte). So I think it should be OK to do that.

Ok, one thing I noticed was that with the snapshotting code we
synthesize events multiple times, once for each new perf.data file. I
haven't tested that with the multithreaded synthesizing code we recently
merged, have you?

- Arnaldo


* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-11  2:12           ` Liang, Kan
@ 2017-10-11 14:57             ` Liang, Kan
  2017-10-12  1:11               ` Wangnan (F)
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-11 14:57 UTC (permalink / raw)
  To: 'Wangnan (F)', 'acme@kernel.org',
	'peterz@infradead.org', 'mingo@redhat.com',
	'linux-kernel@vger.kernel.org'
  Cc: 'jolsa@kernel.org', 'hekuang@huawei.com',
	'namhyung@kernel.org',
	'alexander.shishkin@linux.intel.com',
	Hunter, Adrian, 'ak@linux.intel.com'


> > >>
> > >>> If you really want to avoid record duplication, you need to
> > >>> changes record__mmap_read()'s logic. Now it complains "failed to
> > >>> keep up with mmap data" and avoid dumping data when size of newly
> > >>> generated data is larger than the size of the ring buffer. It is
> > >>> reasonable for forward ring buffer because in this case you lost
> > >>> the head of the first record, the whole ring buffer is
> > >>> unparseable. However, it is wrong in backward case. What you
> > >>> should do in this case is dumping the whole ring buffer.
> > >>>
> > >> I think what you want should be something like this: (not tested)
> > >>
> > > No. That's not what I want.
> > > My test code never trigger the WARN_ONCE.
> >
> > The existing code never trigger that warning because the size computed
> > by rb_find_range is never larger than size of ring buffer. After
> > applying your patch, I believe it will trigger this WARN_ONCE and drop
> > the whole ring buffer. Please set a smaller ring buffer and try again.
> >
> > > I think you will see the problem, if you simply run the command as below.
> > > sudo ./perf record -e cycles:P -C0 --overwrite --switch-output=1s
> > >
> > > The output size keep increasing. Because the new output always
> > > include
> > the old outputs.
> > > What I want is the 'start' and 'end' for the increase, not everything.
> >
> >
> > This is my test result: add a '-m 1' for 'perf record' for shrinking
> > ring buffer, start a while loop on CPU 0 to increase data rate.
> >
> > It stops increasing after the ring buffer is full:
> >
> > $:~/linux/tools/perf$ sudo ./perf record -m1 -e cycles:P -C0
> > --overwrite --switch-output=1s
> >    Warning: File /home/w00229757/.perfconfig not owned by current user
> > or root, ignoring it.
> > [ perf record: dump data: Woken up 1 times ] [ perf record: Dump
> > perf.data.2017101212165072 ] [ perf record: dump data: Woken up 1
> > times ] [ perf record: Dump perf.data.2017101212165175 ] [ perf
> > record: dump data: Woken up 1 times ] [ perf record: Dump
> > perf.data.2017101212165278 ] [ perf record: dump data: Woken up 1
> > times ] [ perf record: Dump perf.data.2017101212165381 ] [ perf
> > record: dump data: Woken up 1 times ] [ perf record: Dump
> > perf.data.2017101212165484 ] [ perf record: dump data: Woken up 1
> > times ] [ perf record: Dump perf.data.2017101212165586 ] ^C[ perf
> > record: Woken up 1 times to write data ] [ perf record: Dump
> > perf.data.2017101212165653 ] [ perf record: Captured and wrote 1.013
> > MB perf.data.<timestamp> ]
> >
> > $ ls -l ./perf.data*
> > -rw------- 1 root root  538988 Oct 12 12:16
> > ./perf.data.2017101212165072
> > -rw------- 1 root root  538988 Oct 12 12:16
> > ./perf.data.2017101212165175
> > -rw------- 1 root root  538988 Oct 12 12:16
> > ./perf.data.2017101212165278
> > -rw------- 1 root root  538988 Oct 12 12:16
> > ./perf.data.2017101212165381
> > -rw------- 1 root root  538988 Oct 12 12:16
> > ./perf.data.2017101212165484
> > -rw------- 1 root root  538988 Oct 12 12:16
> > ./perf.data.2017101212165586
> > -rw------- 1 root root 1067812 Oct 12 12:16
> > ./perf.data.2017101212165653
> >
> > You see the result keep getting larger because the ring buffer is
> > never full in your case.
> 
> The increasing file size in my case indicates that the old, already-processed
> data is dumped into the new output.
> I don't think that's right, because we should not process the same data
> multiple times.
> That definitely increases the overhead of perf record.
> 

For the issue I mentioned above, what do you think about the patch below?
It tries to avoid processing duplicate data.

From 8b058ea6977a97e5705aa2f64bdd014fd76d1247 Mon Sep 17 00:00:00 2001
From: Kan Liang <Kan.liang@intel.com>
Date: Wed, 11 Oct 2017 07:39:34 -0700
Subject: [PATCH] perf tool: fix: Don't discard prev in backward mode

Perf record can switch output. The new output should only store the data
produced after switching. However, in overwrite backward mode, the new
output still contains the data from the old output, which also brings
extra overhead.

At the end of mmap_read, the position in the processed ring buffer is
saved in md->prev. The next mmap_read should end at md->prev if it has
not been overwritten, which avoids processing duplicate data. However,
md->prev is discarded, so the next mmap_read has to process the whole
valid ring buffer, which probably includes the old processed data.

Introduce fast path for backward_rb_find_range. Stop searching when
md->prev is detected.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
---
 tools/perf/util/mmap.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 9fe5f9c..36b459a 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -254,7 +254,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
 	return 0;
 }
 
-static int backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
+static int backward_rb_find_range(void *buf, int mask, u64 head,
+				  u64 old, u64 *start, u64 *end)
 {
 	struct perf_event_header *pheader;
 	u64 evt_head = head;
@@ -282,6 +283,12 @@ static int backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64
 
 		evt_head += pheader->size;
 		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
+
+		/* fast path: avoid processing duplicate data */
+		if (old == evt_head) {
+			*end = evt_head;
+			return 0;
+		}
 	}
 	WARN_ONCE(1, "Shouldn't get here\n");
 	return -1;
@@ -296,7 +303,7 @@ static int rb_find_range(void *data, int mask, u64 head, u64 old,
 		return 0;
 	}
 
-	return backward_rb_find_range(data, mask, head, start, end);
+	return backward_rb_find_range(data, mask, head, old, start, end);
 }
 
 int perf_mmap__push(struct perf_mmap *md, bool overwrite, bool backward,
-- 
2.7.4


* RE: [PATCH 01/10] perf record: new interfaces to read ring buffer to file
  2017-10-11 14:45       ` Arnaldo Carvalho de Melo
@ 2017-10-11 15:16         ` Liang, Kan
  0 siblings, 0 replies; 41+ messages in thread
From: Liang, Kan @ 2017-10-11 15:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

> Em Wed, Oct 11, 2017 at 04:12:42AM +0000, Liang, Kan escreveu:
> > > >  /* When check_messup is true, 'end' must points to a good entry */
> > > > static union perf_event *  perf_mmap__read(struct perf_mmap *md,
> bool
> > > > check_messup, u64 start, diff --git a/tools/perf/util/evlist.h
> > > > b/tools/perf/util/evlist.h index b1c14f1..1ce4857 100644
> > > > --- a/tools/perf/util/evlist.h
> > > > +++ b/tools/perf/util/evlist.h
> > > > @@ -39,6 +39,16 @@ struct perf_mmap {
> > > >  	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > > >  };
> > > >
> > > > +struct perf_mmap_read {
> > > > +	struct perf_mmap	*md;
> > > > +	u64			head;
> > > > +	u64			start;
> > > > +	u64			end;
> > >
> > > So there will be always a one-on-one association of 'struct
> perf_mmap_read'
> > > and 'struct perf_mmap', why not go on adding more fields to 'struct
> > > perf_mmap' as we need
> >
> > The fields in 'struct perf_mmap' needs to be recalculated before each
> reading.
> > So I put them in a new struct.
> 
> Ok, but I still think that if there is a one on one relatioship of
> perf_mmap_read with perf_mmap, then we should just extend the one we
> already have for per-mmap operations, i.e. 'struct perf_mmap', I'll try
> and provide a patch on top of my perf/core branch to see how it looks.
> 
> > > but not doing it all at once (backward, snapshotting,
> > > overwrite, etc) but first the simple part, make the most basic mode:
> > >
> > > 	perf record -a
> > >
> > > 	perf top
> > >
> > > work, multithreaded, leaving the other more complicated modes
> fallbacking
> > > to the old format, then when we have it solid, go on getting the other
> > > features.
> >
> > Agree.
> > When I did perf top optimization, I also tried Namhyung's perf top multi-
> thread patch.
> > https://lwn.net/Articles/667469/
> > I think it may be a good start point.
> 
> I have to read that to understand why we need those indexes :-\
> 
> > I didn't work on his patch. Because the root cause of bad perf top
> performance
> > is non overwrite mode, which generate lots of samples shortly. It exceeds
> KNL's
> > computational capability. Multi-threading doesn't help much on this case.
> > So I tried to use overwrite mode then.
> 
> Right, work on the problem you have at hand, but all these efforts
> should be considered to move forward.
> 
> > > In the end, having the two formats supported will be needed anyway,
> and
> > > we can as well ask for processing with both perf.data file formats to
> compare
> > > results, while we strenghten out the new code.
> > >
> > > I just think we should do this in a more fine grained way to avoid too
> much
> > > code churn as well as having a fallback to the old code, that albeit non
> > > scalable, is what we have been using and can help in certifying that the
> new
> > > one works well, by comparing its outputs.
> >
> > I already extended the multithreading support for event synthesization in
> perf
> > record.
> > https://github.com/kliang2/perf.git perf_record_opt
> > I will send it out for review shortly after rebasing on the latest perf/core.
> >
> > In the patch series, I realloc buffer for each thread to temporarily keep the
> > processing result, and write them to the perf.data at the end of event
> > synthesization. The number of synthesized event is not big (hundreds of
> > Kilobyte). So I think it should be OK to do that.
> 
> Ok, one thing I noticed was that with the snapshotting code we
> synthesize events multiple times, once per each new perf.data file, I
> haven't tested that with the multithreaded synthesizing code we recently
> merged, have you?

Not yet. I will do the test for perf record.

Thanks,
Kan


* Re: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-11 14:57             ` Liang, Kan
@ 2017-10-12  1:11               ` Wangnan (F)
  2017-10-12 12:49                 ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Wangnan (F) @ 2017-10-12  1:11 UTC (permalink / raw)
  To: Liang, Kan, 'acme@kernel.org',
	'peterz@infradead.org', 'mingo@redhat.com',
	'linux-kernel@vger.kernel.org'
  Cc: 'jolsa@kernel.org', 'hekuang@huawei.com',
	'namhyung@kernel.org',
	'alexander.shishkin@linux.intel.com',
	Hunter, Adrian, 'ak@linux.intel.com'



On 2017/10/11 22:57, Liang, Kan wrote:
>>>>>> If you really want to avoid record duplication, you need to
>>>>>> change record__mmap_read()'s logic. Now it complains "failed to
>>>>>> keep up with mmap data" and avoids dumping data when the size of
>>>>>> newly generated data is larger than the size of the ring buffer.
>>>>>> That is reasonable for a forward ring buffer because in this case
>>>>>> you have lost the head of the first record and the whole ring
>>>>>> buffer is unparseable. However, it is wrong in the backward case.
>>>>>> What you should do in this case is dump the whole ring buffer.
>>>>>>
>>>>> I think what you want should be something like this: (not tested)
>>>>>
>>>> No. That's not what I want.
>>>> My test code never triggers the WARN_ONCE.
>>> The existing code never triggers that warning because the size computed
>>> by rb_find_range is never larger than the size of the ring buffer. After
>>> applying your patch, I believe it will trigger this WARN_ONCE and drop
>>> the whole ring buffer. Please set a smaller ring buffer and try again.
>>>
>>>> I think you will see the problem if you simply run the command below.
>>>> sudo ./perf record -e cycles:P -C0 --overwrite --switch-output=1s
>>>>
>>>> The output size keeps increasing, because the new output always
>>>> includes the old outputs.
>>>> What I want is the 'start' and 'end' for the increase, not everything.
>>>
>>> This is my test result: add '-m 1' to 'perf record' to shrink the ring
>>> buffer, and start a while loop on CPU 0 to increase the data rate.
>>>
>>> The output size stops increasing after the ring buffer is full:
>>>
>>> $:~/linux/tools/perf$ sudo ./perf record -m1 -e cycles:P -C0
>>> --overwrite --switch-output=1s
>>>     Warning: File /home/w00229757/.perfconfig not owned by current user
>>> or root, ignoring it.
>>> [ perf record: dump data: Woken up 1 times ] [ perf record: Dump
>>> perf.data.2017101212165072 ] [ perf record: dump data: Woken up 1
>>> times ] [ perf record: Dump perf.data.2017101212165175 ] [ perf
>>> record: dump data: Woken up 1 times ] [ perf record: Dump
>>> perf.data.2017101212165278 ] [ perf record: dump data: Woken up 1
>>> times ] [ perf record: Dump perf.data.2017101212165381 ] [ perf
>>> record: dump data: Woken up 1 times ] [ perf record: Dump
>>> perf.data.2017101212165484 ] [ perf record: dump data: Woken up 1
>>> times ] [ perf record: Dump perf.data.2017101212165586 ] ^C[ perf
>>> record: Woken up 1 times to write data ] [ perf record: Dump
>>> perf.data.2017101212165653 ] [ perf record: Captured and wrote 1.013
>>> MB perf.data.<timestamp> ]
>>>
>>> $ ls -l ./perf.data*
>>> -rw------- 1 root root  538988 Oct 12 12:16
>>> ./perf.data.2017101212165072
>>> -rw------- 1 root root  538988 Oct 12 12:16
>>> ./perf.data.2017101212165175
>>> -rw------- 1 root root  538988 Oct 12 12:16
>>> ./perf.data.2017101212165278
>>> -rw------- 1 root root  538988 Oct 12 12:16
>>> ./perf.data.2017101212165381
>>> -rw------- 1 root root  538988 Oct 12 12:16
>>> ./perf.data.2017101212165484
>>> -rw------- 1 root root  538988 Oct 12 12:16
>>> ./perf.data.2017101212165586
>>> -rw------- 1 root root 1067812 Oct 12 12:16
>>> ./perf.data.2017101212165653
>>>
>>> You see the result keeps getting larger because the ring buffer is
>>> never full in your case.
>> The increasing file size in my case indicates that the old processed data is
>> dumped into the new output.
>> I don't think it's right, because we should not process the same data
>> multiple times.
>> That definitely increases the overhead of perf record.
>>
> For the issue I mentioned above, what do you think about the patch below?
> It tries to avoid duplicating data.
>
>  From 8b058ea6977a97e5705aa2f64bdd014fd76d1247 Mon Sep 17 00:00:00 2001
> From: Kan Liang <Kan.liang@intel.com>
> Date: Wed, 11 Oct 2017 07:39:34 -0700
> Subject: [PATCH] perf tool: fix: Don't discard prev in backward mode
>
> Perf record can switch output. The new output should only store the data
> after switching. However, in overwrite backward mode, the new output
> still has the data from the old output. That also brings extra overhead.
>
> At the end of mmap_read, the position of the processed ring buffer is
> saved in md->prev. The next mmap_read should end at md->prev if it has
> not been overwritten. That avoids processing duplicate data.
> However, md->prev is discarded, so the next mmap_read has to process the
> whole valid ring buffer, which probably includes the old processed data.
>
> Introduce fast path for backward_rb_find_range. Stop searching when
> md->prev is detected.
>
> Signed-off-by: Kan Liang <Kan.liang@intel.com>
> ---
>   tools/perf/util/mmap.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 9fe5f9c..36b459a 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -254,7 +254,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd)
>   	return 0;
>   }
>   
> -static int backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64 *end)
> +static int backward_rb_find_range(void *buf, int mask, u64 head,
> +				  u64 old, u64 *start, u64 *end)
>   {
>   	struct perf_event_header *pheader;
>   	u64 evt_head = head;
> @@ -282,6 +283,12 @@ static int backward_rb_find_range(void *buf, int mask, u64 head, u64 *start, u64
>   
>   		evt_head += pheader->size;
>   		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
> +
> +		/* fast path: avoid processing duplicate data */
> +		if (old == evt_head) {
> +			*end = evt_head;
> +			return 0;
> +		}

You still need to parse the whole ring buffer.

You can use 'old - head' to check how many bytes have been written into
the ring buffer since your last read. If that size is less than the
ring buffer size, no overwriting has happened. In that case a simple
copy is enough.

Thank you.


* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-12  1:11               ` Wangnan (F)
@ 2017-10-12 12:49                 ` Liang, Kan
  2017-10-12 14:43                   ` Wangnan (F)
  0 siblings, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-12 12:49 UTC (permalink / raw)
  To: Wangnan (F), 'acme@kernel.org',
	'peterz@infradead.org', 'mingo@redhat.com',
	'linux-kernel@vger.kernel.org'
  Cc: 'jolsa@kernel.org', 'hekuang@huawei.com',
	'namhyung@kernel.org',
	'alexander.shishkin@linux.intel.com',
	Hunter, Adrian, 'ak@linux.intel.com'

> >  From 8b058ea6977a97e5705aa2f64bdd014fd76d1247 Mon Sep 17 00:00:00 2001
> > From: Kan Liang <Kan.liang@intel.com>
> > Date: Wed, 11 Oct 2017 07:39:34 -0700
> > Subject: [PATCH] perf tool: fix: Don't discard prev in backward mode
> >
> > Perf record can switch output. The new output should only store the
> > data after switching. However, in overwrite backward mode, the new
> > output still has the data from the old output. That also brings extra overhead.
> >
> > At the end of mmap_read, the position of the processed ring buffer is
> > saved in md->prev. The next mmap_read should end at md->prev if it has
> > not been overwritten. That avoids processing duplicate data.
> > However, md->prev is discarded, so the next mmap_read has to process the
> > whole valid ring buffer, which probably includes the old processed data.
> >
> > Introduce fast path for backward_rb_find_range. Stop searching when
> > md->prev is detected.
> >
> > Signed-off-by: Kan Liang <Kan.liang@intel.com>
> > ---
> >   tools/perf/util/mmap.c | 11 +++++++++--
> >   1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index 9fe5f9c..36b459a 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -254,7 +254,8 @@ int perf_mmap__mmap(struct perf_mmap *map,
> struct mmap_params *mp, int fd)
> >   	return 0;
> >   }
> >
> > -static int backward_rb_find_range(void *buf, int mask, u64 head, u64
> > *start, u64 *end)
> > +static int backward_rb_find_range(void *buf, int mask, u64 head,
> > +				  u64 old, u64 *start, u64 *end)
> >   {
> >   	struct perf_event_header *pheader;
> >   	u64 evt_head = head;
> > @@ -282,6 +283,12 @@ static int backward_rb_find_range(void *buf, int
> > mask, u64 head, u64 *start, u64
> >
> >   		evt_head += pheader->size;
> >   		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
> > +
> > +		/* fast path: avoid processing duplicate data */
> > +		if (old == evt_head) {
> > +			*end = evt_head;
> > +			return 0;
> > +		}
> 
> You still need to parse the whole ring buffer.

I don't think so. I'm not using (old & mask).
The upper bits of old are enough to tell whether overwriting happened.

Here are some debugging logs.
The start and end are the result of backward_rb_find_range.
If there is overwriting, it finishes with 'rewind', not 'fast path'.

sudo ./perf record -m1 -e cycles:P -C0 --overwrite --switch-output=1s
backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffff550
Finished reading backward ring buffer: fast path
start   0xfffffffffffff550 end 0x0

backward_rb_find_range: buf=0x7f9ea7a93000, head=ffffffffffffab30
Finished reading backward ring buffer: rewind
start   0xffffffffffffab30 end 0xffffffffffffbb10

backward_rb_find_range: buf=0x7f9ea7a93000, head=ffffffffffff28b0
Finished reading backward ring buffer: rewind
start   0xffffffffffff28b0 end 0xffffffffffff3898

backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe3e40
Finished reading backward ring buffer: rewind
start   0xfffffffffffe3e40 end 0xfffffffffffe4e40

backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe3aa0
Finished reading backward ring buffer: fast path
start   0xfffffffffffe3aa0 end 0xfffffffffffe3e40

backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe3470
Finished reading backward ring buffer: fast path
start   0xfffffffffffe3470 end 0xfffffffffffe3aa0

backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe0610
Finished reading backward ring buffer: rewind
start   0xfffffffffffe0610 end 0xfffffffffffe1610

backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe00d0
Finished reading backward ring buffer: fast path
start   0xfffffffffffe00d0 end 0xfffffffffffe0610

backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffdfe90
Finished reading backward ring buffer: fast path
start   0xfffffffffffdfe90 end 0xfffffffffffe00d0

Thanks,
Kan

> You can use 'old - head' to check how many bytes have been written into the
> ring buffer since your last read. If that size is less than the ring buffer
> size, no overwriting has happened. In that case a simple copy is enough.
> 
> Thank you.


* Re: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-12 12:49                 ` Liang, Kan
@ 2017-10-12 14:43                   ` Wangnan (F)
  0 siblings, 0 replies; 41+ messages in thread
From: Wangnan (F) @ 2017-10-12 14:43 UTC (permalink / raw)
  To: Liang, Kan, 'acme@kernel.org',
	'peterz@infradead.org', 'mingo@redhat.com',
	'linux-kernel@vger.kernel.org'
  Cc: 'jolsa@kernel.org', 'hekuang@huawei.com',
	'namhyung@kernel.org',
	'alexander.shishkin@linux.intel.com',
	Hunter, Adrian, 'ak@linux.intel.com'



On 2017/10/12 20:49, Liang, Kan wrote:
>>>  From 8b058ea6977a97e5705aa2f64bdd014fd76d1247 Mon Sep 17 00:00:00 2001
>>> From: Kan Liang <Kan.liang@intel.com>
>>> Date: Wed, 11 Oct 2017 07:39:34 -0700
>>> Subject: [PATCH] perf tool: fix: Don't discard prev in backward mode
>>>
>>> Perf record can switch output. The new output should only store the
>>> data after switching. However, in overwrite backward mode, the new
>>> output still has the data from the old output. That also brings extra overhead.
>>>
>>> At the end of mmap_read, the position of the processed ring buffer is
>>> saved in md->prev. The next mmap_read should end at md->prev if it has
>>> not been overwritten. That avoids processing duplicate data.
>>> However, md->prev is discarded, so the next mmap_read has to process the
>>> whole valid ring buffer, which probably includes the old processed data.
>>>
>>> Introduce fast path for backward_rb_find_range. Stop searching when
>>> md->prev is detected.
>>>
>>> Signed-off-by: Kan Liang <Kan.liang@intel.com>
>>> ---
>>>    tools/perf/util/mmap.c | 11 +++++++++--
>>>    1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>>> index 9fe5f9c..36b459a 100644
>>> --- a/tools/perf/util/mmap.c
>>> +++ b/tools/perf/util/mmap.c
>>> @@ -254,7 +254,8 @@ int perf_mmap__mmap(struct perf_mmap *map,
>> struct mmap_params *mp, int fd)
>>>    	return 0;
>>>    }
>>>
>>> -static int backward_rb_find_range(void *buf, int mask, u64 head, u64
>>> *start, u64 *end)
>>> +static int backward_rb_find_range(void *buf, int mask, u64 head,
>>> +				  u64 old, u64 *start, u64 *end)
>>>    {
>>>    	struct perf_event_header *pheader;
>>>    	u64 evt_head = head;
>>> @@ -282,6 +283,12 @@ static int backward_rb_find_range(void *buf, int
>>> mask, u64 head, u64 *start, u64
>>>
>>>    		evt_head += pheader->size;
>>>    		pr_debug3("move evt_head: %"PRIx64"\n", evt_head);
>>> +
>>> +		/* fast path: avoid processing duplicate data */
>>> +		if (old == evt_head) {
>>> +			*end = evt_head;
>>> +			return 0;
>>> +		}
>> You still need to parse the whole ring buffer.
> I don't think so. I'm not using (old & mask).
> The upper bits of old are enough to tell whether overwriting happened.
>
> Here are some debugging logs.
> The start and end are the result of backward_rb_find_range.
> If there is overwriting, it finishes with 'rewind', not 'fast path'.
>
> sudo ./perf record -m1 -e cycles:P -C0 --overwrite --switch-output=1s
> backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffff550
> Finished reading backward ring buffer: fast path
> start   0xfffffffffffff550 end 0x0
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=ffffffffffffab30
> Finished reading backward ring buffer: rewind
> start   0xffffffffffffab30 end 0xffffffffffffbb10
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=ffffffffffff28b0
> Finished reading backward ring buffer: rewind
> start   0xffffffffffff28b0 end 0xffffffffffff3898
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe3e40
> Finished reading backward ring buffer: rewind
> start   0xfffffffffffe3e40 end 0xfffffffffffe4e40
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe3aa0
> Finished reading backward ring buffer: fast path
> start   0xfffffffffffe3aa0 end 0xfffffffffffe3e40
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe3470
> Finished reading backward ring buffer: fast path
> start   0xfffffffffffe3470 end 0xfffffffffffe3aa0
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe0610
> Finished reading backward ring buffer: rewind
> start   0xfffffffffffe0610 end 0xfffffffffffe1610
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffe00d0
> Finished reading backward ring buffer: fast path
> start   0xfffffffffffe00d0 end 0xfffffffffffe0610
>
> backward_rb_find_range: buf=0x7f9ea7a93000, head=fffffffffffdfe90
> Finished reading backward ring buffer: fast path
> start   0xfffffffffffdfe90 end 0xfffffffffffe00d0

I'm not saying your code is incorrect. What I mean is
that you can avoid calling backward_rb_find_range(), which
parses the ring buffer.

Please see rb_find_range. For a forward ring buffer it never
reads anything for positioning; it just sets the start and
end pointers. Both your implementation and mine require
reading the ring buffer in a while loop to find the last
record.

When 'old - head' is less than the size of the ring buffer,
even a backward ring buffer can avoid such reading.
backward_rb_find_range() should only be called when 'old - head'
is larger than the size of the ring buffer.

Thank you.


* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-10 18:14   ` Arnaldo Carvalho de Melo
  2017-10-10 18:18     ` Liang, Kan
@ 2017-10-13 12:55     ` Liang, Kan
  2017-10-13 13:13       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 41+ messages in thread
From: Liang, Kan @ 2017-10-13 12:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

> On Tue, Oct 10, 2017 at 10:20:15AM -0700, kan.liang@intel.com wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > Perf record can switch output. The new output should only store the
> > data after switching. However, in overwrite backward mode, the new
> > output still has the data from the old output.
> >
> > At the end of mmap_read, the position of the processed ring buffer is
> > saved in md->prev. The next mmap_read should end at md->prev.
> > However, md->prev is discarded, so the next mmap_read has to process the
> > whole valid ring buffer, which definitely includes the old processed
> > data.
> >
> > Set the prev as the end of the range in backward mode.
> 
> Do you think this should go together with the rest of this patchkit?
> Probably it should be processed ASAP, i.e. perf/urgent, no?
>

Hi Arnaldo,

After some discussion, Wang Nan proposed a new patch to fix this issue.
https://lkml.org/lkml/2017/10/12/441
I did some tests. It fixed the issue.

Could you please take a look?
If it's OK for you, I think it could be merged separately.

Thanks,
Kan
 
> - Arnaldo
> 
> > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > ---
> >  tools/perf/util/evlist.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 33b8837..7d23cf5 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -742,13 +742,25 @@ static int
> >  rb_find_range(void *data, int mask, u64 head, u64 old,
> >  	      u64 *start, u64 *end, bool backward)  {
> > +	int ret;
> > +
> >  	if (!backward) {
> >  		*start = old;
> >  		*end = head;
> >  		return 0;
> >  	}
> >
> > -	return backward_rb_find_range(data, mask, head, start, end);
> > +	ret = backward_rb_find_range(data, mask, head, start, end);
> > +
> > +	/*
> > +	 * The start and end from backward_rb_find_range cover all the
> > +	 * valid data in the ring buffer.
> > +	 * However, part of that data was processed previously.
> > +	 * Reset the end to drop the already-processed data.
> > +	 */
> > +	*end = old;
> > +
> > +	return ret;
> >  }
> >
> >  /*
> > --
> > 2.5.5


* Re: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-13 12:55     ` Liang, Kan
@ 2017-10-13 13:13       ` Arnaldo Carvalho de Melo
  2017-10-13 13:14         ` Liang, Kan
  0 siblings, 1 reply; 41+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-13 13:13 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

On Fri, Oct 13, 2017 at 12:55:34PM +0000, Liang, Kan wrote:
> > On Tue, Oct 10, 2017 at 10:20:15AM -0700, kan.liang@intel.com wrote:
> > > From: Kan Liang <kan.liang@intel.com>
> > >
> > > Perf record can switch output. The new output should only store the
> > > data after switching. However, in overwrite backward mode, the new
> > > output still has the data from the old output.
> > >
> > > At the end of mmap_read, the position of the processed ring buffer is
> > > saved in md->prev. The next mmap_read should end at md->prev.
> > > However, md->prev is discarded, so the next mmap_read has to process the
> > > whole valid ring buffer, which definitely includes the old processed
> > > data.
> > >
> > > Set the prev as the end of the range in backward mode.
> > 
> > Do you think this should go together with the rest of this patchkit?
> > Probably it should be processed ASAP, i.e. perf/urgent, no?
> >
> 
> Hi Arnaldo,
> 
> After some discussion, Wang Nan proposed a new patch to fix this issue.
> https://lkml.org/lkml/2017/10/12/441
> I did some tests. It fixed the issue.

Ok, I'll look at it and stick a "Tested-by: Kan", ok?

- Arnaldo
 
> Could you please take a look?
> If it's OK for you, I think it could be merged separately.
> 
> Thanks,
> Kan
>  
> > - Arnaldo
> > 
> > > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > > ---
> > >  tools/perf/util/evlist.c | 14 +++++++++++++-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > > index 33b8837..7d23cf5 100644
> > > --- a/tools/perf/util/evlist.c
> > > +++ b/tools/perf/util/evlist.c
> > > @@ -742,13 +742,25 @@ static int
> > >  rb_find_range(void *data, int mask, u64 head, u64 old,
> > >  	      u64 *start, u64 *end, bool backward)  {
> > > +	int ret;
> > > +
> > >  	if (!backward) {
> > >  		*start = old;
> > >  		*end = head;
> > >  		return 0;
> > >  	}
> > >
> > > -	return backward_rb_find_range(data, mask, head, start, end);
> > > +	ret = backward_rb_find_range(data, mask, head, start, end);
> > > +
> > > +	/*
> > > +	 * The start and end from backward_rb_find_range cover all the
> > > +	 * valid data in the ring buffer.
> > > +	 * However, part of that data was processed previously.
> > > +	 * Reset the end to drop the already-processed data.
> > > +	 */
> > > +	*end = old;
> > > +
> > > +	return ret;
> > >  }
> > >
> > >  /*
> > > --
> > > 2.5.5


* RE: [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode
  2017-10-13 13:13       ` Arnaldo Carvalho de Melo
@ 2017-10-13 13:14         ` Liang, Kan
  0 siblings, 0 replies; 41+ messages in thread
From: Liang, Kan @ 2017-10-13 13:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: peterz, mingo, linux-kernel, jolsa, wangnan0, hekuang, namhyung,
	alexander.shishkin, Hunter, Adrian, ak

> On Fri, Oct 13, 2017 at 12:55:34PM +0000, Liang, Kan wrote:
> > > On Tue, Oct 10, 2017 at 10:20:15AM -0700, kan.liang@intel.com wrote:
> > > > From: Kan Liang <kan.liang@intel.com>
> > > >
> > > > Perf record can switch output. The new output should only store
> > > > the data after switching. However, in overwrite backward mode, the
> > > > new output still has the data from the old output.
> > > >
> > > > At the end of mmap_read, the position of the processed ring buffer
> > > > is saved in md->prev. The next mmap_read should end at md->prev.
> > > > However, md->prev is discarded, so the next mmap_read has to
> > > > process the whole valid ring buffer, which definitely includes the
> > > > old processed data.
> > > >
> > > > Set the prev as the end of the range in backward mode.
> > >
> > > Do you think this should go together with the rest of this patchkit?
> > > Probably it should be processed ASAP, i.e. perf/urgent, no?
> > >
> >
> > Hi Arnaldo,
> >
> > After some discussion, Wang Nan proposed a new patch to fix this issue.
> > https://lkml.org/lkml/2017/10/12/441
> > I did some tests. It fixed the issue.
> 
> Ok, I'll look at it and stick a "Tested-by: Kan", ok?

Sure.

Thanks,
Kan

> 
> - Arnaldo
> 
> > Could you please take a look?
> > If it's OK for you, I think it could be merged separately.
> >
> > Thanks,
> > Kan
> >
> > > - Arnaldo
> > >
> > > > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > > > ---
> > > >  tools/perf/util/evlist.c | 14 +++++++++++++-
> > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > > > index 33b8837..7d23cf5 100644
> > > > --- a/tools/perf/util/evlist.c
> > > > +++ b/tools/perf/util/evlist.c
> > > > @@ -742,13 +742,25 @@ static int
> > > >  rb_find_range(void *data, int mask, u64 head, u64 old,
> > > >  	      u64 *start, u64 *end, bool backward)  {
> > > > +	int ret;
> > > > +
> > > >  	if (!backward) {
> > > >  		*start = old;
> > > >  		*end = head;
> > > >  		return 0;
> > > >  	}
> > > >
> > > > -	return backward_rb_find_range(data, mask, head, start, end);
> > > > +	ret = backward_rb_find_range(data, mask, head, start, end);
> > > > +
> > > > +	/*
> > > > +	 * The start and end from backward_rb_find_range cover all the
> > > > +	 * valid data in the ring buffer.
> > > > +	 * However, part of that data was processed previously.
> > > > +	 * Reset the end to drop the already-processed data.
> > > > +	 */
> > > > +	*end = old;
> > > > +
> > > > +	return ret;
> > > >  }
> > > >
> > > >  /*
> > > > --
> > > > 2.5.5


end of thread, other threads:[~2017-10-13 13:14 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10 17:20 [PATCH 00/10] new mmap_read interfaces for ring buffer kan.liang
2017-10-10 17:20 ` [PATCH 01/10] perf record: new interfaces to read ring buffer to file kan.liang
2017-10-10 18:24   ` Arnaldo Carvalho de Melo
2017-10-10 18:30   ` Arnaldo Carvalho de Melo
2017-10-11  4:12     ` Liang, Kan
2017-10-11 14:45       ` Arnaldo Carvalho de Melo
2017-10-11 15:16         ` Liang, Kan
2017-10-10 17:20 ` [PATCH 02/10] perf tool: fix: Don't discard prev in backward mode kan.liang
2017-10-10 18:14   ` Arnaldo Carvalho de Melo
2017-10-10 18:18     ` Liang, Kan
2017-10-13 12:55     ` Liang, Kan
2017-10-13 13:13       ` Arnaldo Carvalho de Melo
2017-10-13 13:14         ` Liang, Kan
2017-10-10 18:23   ` Wangnan (F)
2017-10-10 18:50     ` Wangnan (F)
2017-10-10 19:50       ` Liang, Kan
2017-10-10 20:18         ` Wangnan (F)
2017-10-11  2:12           ` Liang, Kan
2017-10-11 14:57             ` Liang, Kan
2017-10-12  1:11               ` Wangnan (F)
2017-10-12 12:49                 ` Liang, Kan
2017-10-12 14:43                   ` Wangnan (F)
2017-10-10 19:36     ` Liang, Kan
2017-10-10 17:20 ` [PATCH 03/10] perf tool: new iterfaces to read event from ring buffer kan.liang
2017-10-10 18:15   ` Arnaldo Carvalho de Melo
2017-10-10 18:28     ` Liang, Kan
2017-10-10 18:34       ` Arnaldo Carvalho de Melo
2017-10-10 18:36         ` Arnaldo Carvalho de Melo
2017-10-10 19:00           ` Arnaldo Carvalho de Melo
2017-10-10 19:10             ` Wangnan (F)
2017-10-10 19:17               ` Arnaldo Carvalho de Melo
2017-10-10 19:22                 ` Wangnan (F)
2017-10-10 19:55                   ` Liang, Kan
2017-10-10 19:59                     ` Wangnan (F)
2017-10-10 17:20 ` [PATCH 04/10] perf tool: perf_mmap__read_init wrapper for evlist kan.liang
2017-10-10 17:20 ` [PATCH 05/10] perf top: apply new mmap_read interfaces kan.liang
2017-10-10 17:20 ` [PATCH 06/10] perf trace: " kan.liang
2017-10-10 17:20 ` [PATCH 07/10] perf kvm: " kan.liang
2017-10-10 17:20 ` [PATCH 08/10] perf python: " kan.liang
2017-10-10 17:20 ` [PATCH 09/10] perf tests: " kan.liang
2017-10-10 17:20 ` [PATCH 10/10] perf tool: remove stale " kan.liang
