From: Leo Yan <leo.yan@linaro.org>
To: German Gomez <german.gomez@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Mike Leach <mike.leach@linaro.org>,
	linux-arm-kernel@lists.infradead.org, coresight@lists.linaro.org
Subject: Re: [PATCH 4/5] perf arm-spe: Implement find_snapshot callback
Date: Thu, 23 Sep 2021 21:50:16 +0800
Message-ID: <20210923135016.GG400258@leoy-ThinkPad-X240s>
In-Reply-To: <20210916154635.1525-4-german.gomez@arm.com>

Hi German,

On Thu, Sep 16, 2021 at 04:46:34PM +0100, German Gomez wrote:
> The head pointer of the AUX buffer managed by the arm_spe_pmu.c driver
> is not monotonically increasing, therefore the find_snapshot callback is
> needed in order to find the trace data within the AUX buffer and avoid
> wasting space in the perf.data file.
>
> The pointer is assumed to have wrapped if the buffer contains non-zero
> data at the end. If it has wrapped, the entire contents of the AUX
> buffer are stored in the perf.data file. Otherwise only the data up to
> the head pointer is stored.
>
> Reviewed-by: James Clark <james.clark@arm.com>
> Signed-off-by: German Gomez <german.gomez@arm.com>
> ---
>  tools/perf/arch/arm64/util/arm-spe.c | 145 +++++++++++++++++++++++++++
>  1 file changed, 145 insertions(+)
>
> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> index f8b03d164b42..56785034fc84 100644
> --- a/tools/perf/arch/arm64/util/arm-spe.c
> +++ b/tools/perf/arch/arm64/util/arm-spe.c
> @@ -23,6 +23,7 @@
>  #include "../../../util/auxtrace.h"
>  #include "../../../util/record.h"
>  #include "../../../util/arm-spe.h"
> +#include <tools/libc_compat.h> // reallocarray
>
>  #define KiB(x) ((x) * 1024)
>  #define MiB(x) ((x) * 1024 * 1024)
> @@ -31,6 +32,8 @@ struct arm_spe_recording {
>  	struct auxtrace_record itr;
>  	struct perf_pmu *arm_spe_pmu;
>  	struct evlist *evlist;
> +	int wrapped_cnt;
> +	bool *wrapped;
>  };
>
>  static void arm_spe_set_timestamp(struct auxtrace_record *itr,
> @@ -299,6 +302,146 @@ static int arm_spe_snapshot_finish(struct auxtrace_record *itr)
>  	return -EINVAL;
>  }
>
> +static int arm_spe_alloc_wrapped_array(struct arm_spe_recording *ptr, int idx)
> +{
> +	bool *wrapped;
> +	int cnt = ptr->wrapped_cnt, new_cnt, i;
> +
> +	/*
> +	 * No need to allocate, so return early.
> +	 */
> +	if (idx < cnt)
> +		return 0;
> +
> +	/*
> +	 * Make ptr->wrapped as big as idx.
> +	 */
> +	new_cnt = idx + 1;
> +
> +	/*
> +	 * Free'ed in arm_spe_recording_free().
> +	 */
> +	wrapped = reallocarray(ptr->wrapped, new_cnt, sizeof(bool));
> +	if (!wrapped)
> +		return -ENOMEM;
> +
> +	/*
> +	 * init new allocated values.
> +	 */
> +	for (i = cnt; i < new_cnt; i++)
> +		wrapped[i] = false;
> +
> +	ptr->wrapped_cnt = new_cnt;
> +	ptr->wrapped = wrapped;
> +
> +	return 0;
> +}
> +
> +static bool arm_spe_buffer_has_wrapped(unsigned char *buffer,
> +				      size_t buffer_size, u64 head)
> +{
> +	u64 i, watermark;
> +	u64 *buf = (u64 *)buffer;
> +	size_t buf_size = buffer_size;
> +
> +	/*
> +	 * Defensively handle the case where head might be continually increasing - if its value is
> +	 * equal or greater than the size of the ring buffer, then we can safely determine it has
> +	 * wrapped around. Otherwise, continue to detect if head might have wrapped.
> +	 */
> +	if (head >= buffer_size)
> +		return true;
> +
> +	/*
> +	 * We want to look the very last 512 byte (chosen arbitrarily) in the ring buffer.
> +	 */
> +	watermark = buf_size - 512;
> +
> +	/*
> +	 * The value of head is somewhere within the size of the ring buffer. This can be that there
> +	 * hasn't been enough data to fill the ring buffer yet or the trace time was so long that
> +	 * head has numerically wrapped around. To find we need to check if we have data at the
> +	 * very end of the ring buffer. We can reliably do this because mmap'ed pages are zeroed
> +	 * out and there is a fresh mapping with every new session.
> +	 */
> +
> +	/*
> +	 * head is less than 512 byte from the end of the ring buffer.
> +	 */
> +	if (head > watermark)
> +		watermark = head;
> +
> +	/*
> +	 * Speed things up by using 64 bit transactions (see "u64 *buf" above)
> +	 */
> +	watermark /= sizeof(u64);
> +	buf_size /= sizeof(u64);
> +
> +	/*
> +	 * If we find trace data at the end of the ring buffer, head has been there and has
> +	 * numerically wrapped around at least once.
> +	 */
> +	for (i = watermark; i < buf_size; i++)
> +		if (buf[i])
> +			return true;
> +
> +	return false;
> +}
> +
> +static int arm_spe_find_snapshot(struct auxtrace_record *itr, int idx,
> +				  struct auxtrace_mmap *mm, unsigned char *data,
> +				  u64 *head, u64 *old)
> +{
> +	int err;
> +	bool wrapped;
> +	struct arm_spe_recording *ptr =
> +			container_of(itr, struct arm_spe_recording, itr);
> +
> +	/*
> +	 * Allocate memory to keep track of wrapping if this is the first
> +	 * time we deal with this *mm.
> +	 */
> +	if (idx >= ptr->wrapped_cnt) {
> +		err = arm_spe_alloc_wrapped_array(ptr, idx);
> +		if (err)
> +			return err;
> +	}
> +
> +	/*
> +	 * Check to see if *head has wrapped around. If it hasn't only the
> +	 * amount of data between *head and *old is snapshot'ed to avoid
> +	 * bloating the perf.data file with zeros. But as soon as *head has
> +	 * wrapped around the entire size of the AUX ring buffer it taken.
> +	 */
> +	wrapped = ptr->wrapped[idx];
> +	if (!wrapped && arm_spe_buffer_has_wrapped(data, mm->len, *head)) {
> +		wrapped = true;
> +		ptr->wrapped[idx] = true;
> +	}
> +
> +	pr_debug3("%s: mmap index %d old head %zu new head %zu size %zu\n",
> +		  __func__, idx, (size_t)*old, (size_t)*head, mm->len);
> +
> +	/*
> +	 * No wrap has occurred, we can just use *head and *old.
> +	 */
> +	if (!wrapped)
> +		return 0;
> +
> +	/*
> +	 * *head has wrapped around - adjust *head and *old to pickup the
> +	 * entire content of the AUX buffer.
> +	 */
> +	if (*head >= mm->len) {
> +		*old = *head - mm->len;
> +	} else {
> +		*head += mm->len;
> +		*old = *head - mm->len;
> +	}
> +
> +	return 0;
> +}
> +
>  static u64 arm_spe_reference(struct auxtrace_record *itr __maybe_unused)
>  {
>  	struct timespec ts;
> @@ -313,6 +456,7 @@ static void arm_spe_recording_free(struct auxtrace_record *itr)
>  	struct arm_spe_recording *sper =
>  			container_of(itr, struct arm_spe_recording, itr);
>
> +	free(sper->wrapped);
>  	free(sper);
>  }
>
> @@ -336,6 +480,7 @@ struct auxtrace_record *arm_spe_recording_init(int *err,
>  	sper->itr.pmu = arm_spe_pmu;
>  	sper->itr.snapshot_start = arm_spe_snapshot_start;
>  	sper->itr.snapshot_finish = arm_spe_snapshot_finish;
> +	sper->itr.find_snapshot = arm_spe_find_snapshot;

If I understand correctly, this patch copies the snapshot handling code
from cs-etm. About two months ago, we removed the cs-etm specific
snapshot callback and now rely directly on perf's function
__auxtrace_mmap__read() to handle the 'head' and 'tail' pointers.
Please see the commit for details:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f01c200d4405c4562e45e8bb4de44a5ce37b217

Before I review the snapshot enabling in patches 03 and 04 in more
detail, could you confirm whether Arm SPE can handle snapshots the same
way as cs-etm? From my understanding, this is a better way to handle
the AUX buffer's 'head' and 'tail'.

Thanks,
Leo

>  	sper->itr.parse_snapshot_options = arm_spe_parse_snapshot_options;
>  	sper->itr.recording_options = arm_spe_recording_options;
>  	sper->itr.info_priv_size = arm_spe_info_priv_size;
> --
> 2.17.1
>
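
For readers following the thread, the wrap heuristic in the quoted patch can be sketched as standalone C. This is only an illustration under stated assumptions, not the perf implementation: the names TAIL_WINDOW, tail_has_data() and select_whole_buffer() are hypothetical, and the scan is done byte by byte rather than with the u64 accesses the patch uses. It assumes, as the patch comment does, that a freshly mapped buffer is zero-filled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant: how many trailing bytes to inspect (512 in the patch). */
#define TAIL_WINDOW 512

/*
 * Hypothetical helper, not a perf function. Returns true if head is already
 * past the buffer size, or if any byte between max(head, len - TAIL_WINDOW)
 * and the end of the buffer is non-zero. Relies on the buffer starting out
 * zero-filled.
 */
static bool tail_has_data(const unsigned char *buf, size_t len, uint64_t head)
{
	size_t start, i;

	assert(buf != NULL && len > 0);

	/* A head at or beyond the buffer size can only mean it has wrapped. */
	if (head >= len)
		return true;

	start = len > TAIL_WINDOW ? len - TAIL_WINDOW : 0;
	if (head > start)
		start = (size_t)head;

	for (i = start; i < len; i++)
		if (buf[i])
			return true;

	return false;
}

/*
 * Hypothetical helper mirroring the final branch of the quoted
 * arm_spe_find_snapshot(): once a wrap is known, move head and old so that
 * head - old equals the full buffer length.
 */
static void select_whole_buffer(size_t len, uint64_t *head, uint64_t *old)
{
	if (*head < len)
		*head += len;
	*old = *head - len;
}
```

With a 1024-byte zeroed buffer and head at 100, tail_has_data() reports no wrap until something non-zero shows up in the last 512 bytes; after that, select_whole_buffer() turns head = 100 into head = 1124 and old = 100, so the whole buffer is selected. The question raised in the reply is whether this per-driver logic is needed at all if the generic code can work from head and old directly.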