* [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
@ 2017-11-15 12:13 Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Sagar Arun Kamble
` (7 more replies)
0 siblings, 8 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 12:13 UTC (permalink / raw)
To: intel-gfx
We can compute the system time corresponding to a GPU timestamp by taking
a reference point (CPU monotonic time, GPU timestamp) and then adding the
delta time computed using the timecounter/cyclecounter support in the
kernel. We have to configure the cyclecounter with the GPU timestamp
frequency.
The earlier approach, based on cross-timestamps, is not needed. It was
being used to approximate the frequency based on invalid assumptions
(the drift seen in the time was possibly due to a precision issue).
The time derived from GPU clocks already has ns precision, and the
timecounter takes care of it, as verified over variable durations.
This series adds base timecounter/cyclecounter changes and changes to
get GPU and CPU timestamps in OA samples.
Sagar Arun Kamble (1):
drm/i915/perf: Add support to correlate GPU timestamp with system time
Sourab Gupta (3):
drm/i915/perf: Add support for collecting 64 bit timestamps with OA
reports
drm/i915/perf: Extract raw GPU timestamps from OA reports
drm/i915/perf: Send system clock monotonic time in perf samples
drivers/gpu/drm/i915/i915_drv.h | 11 ++++
drivers/gpu/drm/i915/i915_perf.c | 124 ++++++++++++++++++++++++++++++++++++++-
drivers/gpu/drm/i915/i915_reg.h | 6 ++
include/uapi/drm/i915_drm.h | 14 +++++
4 files changed, 154 insertions(+), 1 deletion(-)
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 45+ messages in thread
* [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
@ 2017-11-15 12:13 ` Sagar Arun Kamble
2017-11-15 12:25 ` Chris Wilson
2017-11-15 12:13 ` [RFC 2/4] drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports Sagar Arun Kamble
` (6 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 12:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Sourab Gupta, Matthew Auld
We can compute the system time corresponding to a GPU timestamp by taking
a reference point and then adding the delta time computed using the
timecounter and cyclecounter support in the kernel. We have to configure
the cyclecounter with the GPU timestamp frequency.
In further patches we leverage the timecounter_cyc2time() function to
compute the system time corresponding to GPU timestamp cycles derived
from the OA report. An important thing to note is that
timecounter_cyc2time() considers time to have moved backwards if the
delta timestamp is more than half the maximum ns time covered by the
counter. (This is ~35min for a 36-bit counter. If that much sampling
duration is needed, we will have to update tc->nsec by explicitly
reading the timecounter at intervals shorter than 35min during sampling.)
On enabling the perf stream we start the timecounter/cyclecounter, and
while collecting OA samples we translate the GPU timestamp to a system
timestamp.
The earlier approach, based on cross-timestamps, is not needed. It was
being used to approximate the frequency based on invalid assumptions
(the drift seen in the time was possibly due to a precision issue).
The time derived from GPU clocks already has ns precision, and the
timecounter takes care of it, as verified over variable durations.
A cross-timestamp might still be valuable to get a precise reference
point w.r.t. device and system time, but it requires support for reading
the ART counter, which is not available in i915. Hence, we compensate
with an offset when setting up the reference point ourselves.
With this approach the device and system times match very precisely,
differing by only 5-10us.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sourab Gupta <sourab.gupta@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 9 +++++++
drivers/gpu/drm/i915/i915_perf.c | 53 ++++++++++++++++++++++++++++++++++++++++
drivers/gpu/drm/i915/i915_reg.h | 6 +++++
3 files changed, 68 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2158a75..e08bc85 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -43,6 +43,7 @@
#include <linux/pm_qos.h>
#include <linux/reservation.h>
#include <linux/shmem_fs.h>
+#include <linux/timecounter.h>
#include <drm/drmP.h>
#include <drm/intel-gtt.h>
@@ -2149,6 +2150,14 @@ struct i915_perf_stream {
* @oa_config: The OA configuration used by the stream.
*/
struct i915_oa_config *oa_config;
+
+ /**
+ * System time correlation variables.
+ */
+ struct cyclecounter cc;
+ spinlock_t systime_lock;
+ struct timespec64 start_systime;
+ struct timecounter tc;
};
/**
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 00be015..72ddc34 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -192,6 +192,7 @@
*/
#include <linux/anon_inodes.h>
+#include <linux/clocksource.h>
#include <linux/sizes.h>
#include <linux/uuid.h>
@@ -2391,6 +2392,56 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
}
/**
+ * i915_cyclecounter_read - read raw cycle/timestamp counter
+ * @cc: cyclecounter structure
+ */
+static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
+{
+ struct i915_perf_stream *stream = container_of(cc, typeof(*stream), cc);
+ struct drm_i915_private *dev_priv = stream->dev_priv;
+ u64 ts_count;
+
+ intel_runtime_pm_get(dev_priv);
+ ts_count = I915_READ64_2x32(GEN4_TIMESTAMP,
+ GEN7_TIMESTAMP_UDW);
+ intel_runtime_pm_put(dev_priv);
+
+ return ts_count;
+}
+
+static void i915_perf_init_cyclecounter(struct i915_perf_stream *stream)
+{
+ struct drm_i915_private *dev_priv = stream->dev_priv;
+ int cs_ts_freq = dev_priv->perf.oa.timestamp_frequency;
+ struct cyclecounter *cc = &stream->cc;
+ u32 maxsec;
+
+ cc->read = i915_cyclecounter_read;
+ cc->mask = CYCLECOUNTER_MASK(CS_TIMESTAMP_WIDTH(dev_priv));
+ maxsec = cc->mask / cs_ts_freq;
+
+ clocks_calc_mult_shift(&cc->mult, &cc->shift, cs_ts_freq,
+ NSEC_PER_SEC, maxsec);
+}
+
+static void i915_perf_init_timecounter(struct i915_perf_stream *stream)
+{
+#define SYSTIME_START_OFFSET 350000 /* Counter read takes about 350us */
+ unsigned long flags;
+ u64 ns;
+
+ i915_perf_init_cyclecounter(stream);
+ spin_lock_init(&stream->systime_lock);
+
+ getnstimeofday64(&stream->start_systime);
+ ns = timespec64_to_ns(&stream->start_systime) + SYSTIME_START_OFFSET;
+
+ spin_lock_irqsave(&stream->systime_lock, flags);
+ timecounter_init(&stream->tc, &stream->cc, ns);
+ spin_unlock_irqrestore(&stream->systime_lock, flags);
+}
+
+/**
* i915_perf_enable_locked - handle `I915_PERF_IOCTL_ENABLE` ioctl
* @stream: A disabled i915 perf stream
*
@@ -2408,6 +2459,8 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
/* Allow stream->ops->enable() to refer to this */
stream->enabled = true;
+ i915_perf_init_timecounter(stream);
+
if (stream->ops->enable)
stream->ops->enable(stream);
}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index cfdf4f8..e7e6966 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -8882,6 +8882,12 @@ enum skl_power_gate {
/* Gen4+ Timestamp and Pipe Frame time stamp registers */
#define GEN4_TIMESTAMP _MMIO(0x2358)
+#define GEN7_TIMESTAMP_UDW _MMIO(0x235C)
+#define PRE_GEN7_TIMESTAMP_WIDTH 32
+#define GEN7_TIMESTAMP_WIDTH 36
+#define CS_TIMESTAMP_WIDTH(dev_priv) \
+ (INTEL_GEN(dev_priv) < 7 ? PRE_GEN7_TIMESTAMP_WIDTH : \
+ GEN7_TIMESTAMP_WIDTH)
#define ILK_TIMESTAMP_HI _MMIO(0x70070)
#define IVB_TIMESTAMP_CTR _MMIO(0x44070)
--
1.9.1
* [RFC 2/4] drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Sagar Arun Kamble
@ 2017-11-15 12:13 ` Sagar Arun Kamble
2017-12-06 16:01 ` Lionel Landwerlin
2017-11-15 12:13 ` [RFC 3/4] drm/i915/perf: Extract raw GPU timestamps from " Sagar Arun Kamble
` (5 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 12:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Sourab Gupta, Matthew Auld
From: Sourab Gupta <sourab.gupta@intel.com>
With this patch, for RCS, timestamps and OA reports can be collected
together and provided to userspace in separate sample fields.
The next patch adds changes to derive the timestamp from the OA report.
v2: Rebase. Limiting the changes to only OA sample read. Updated sample
flag name.
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sourab Gupta <sourab.gupta@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 20 +++++++++++++++++++-
include/uapi/drm/i915_drm.h | 7 +++++++
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 72ddc34..f7e748c 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -293,6 +293,7 @@
#define OAREPORT_REASON_CTX_SWITCH (1<<3)
#define OAREPORT_REASON_CLK_RATIO (1<<5)
+#define I915_PERF_TS_SAMPLE_SIZE 8
/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
*
@@ -333,7 +334,8 @@
[I915_OA_FORMAT_C4_B8] = { 7, 64 },
};
-#define SAMPLE_OA_REPORT (1<<0)
+#define SAMPLE_OA_REPORT BIT(0)
+#define SAMPLE_GPU_TS BIT(1)
/**
* struct perf_open_properties - for validated properties given to open a stream
@@ -599,6 +601,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
int report_size = dev_priv->perf.oa.oa_buffer.format_size;
struct drm_i915_perf_record_header header;
u32 sample_flags = stream->sample_flags;
+ u64 gpu_ts = 0;
header.type = DRM_I915_PERF_RECORD_SAMPLE;
header.pad = 0;
@@ -615,6 +618,13 @@ static int append_oa_sample(struct i915_perf_stream *stream,
if (sample_flags & SAMPLE_OA_REPORT) {
if (copy_to_user(buf, report, report_size))
return -EFAULT;
+ buf += report_size;
+ }
+
+ if (sample_flags & SAMPLE_GPU_TS) {
+ /* Timestamp to be populated from OA report */
+ if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
+ return -EFAULT;
}
(*offset) += header.size;
@@ -2100,6 +2110,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
stream->sample_flags |= SAMPLE_OA_REPORT;
stream->sample_size += format_size;
+ if (props->sample_flags & SAMPLE_GPU_TS) {
+ stream->sample_flags |= SAMPLE_GPU_TS;
+ stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
+ }
+
dev_priv->perf.oa.oa_buffer.format_size = format_size;
if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
return -EINVAL;
@@ -2815,6 +2830,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
case DRM_I915_PERF_PROP_SAMPLE_OA:
props->sample_flags |= SAMPLE_OA_REPORT;
break;
+ case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
+ props->sample_flags |= SAMPLE_GPU_TS;
+ break;
case DRM_I915_PERF_PROP_OA_METRICS_SET:
if (value == 0) {
DRM_DEBUG("Unknown OA metric set ID\n");
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b579859..0b9249e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1447,6 +1447,12 @@ enum drm_i915_perf_property_id {
DRM_I915_PERF_PROP_SAMPLE_OA,
/**
+ * The value of this property set to 1 requests inclusion of GPU
+ * timestamp in the perf sample data.
+ */
+ DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
+
+ /**
* The value specifies which set of OA unit metrics should be
* be configured, defining the contents of any OA unit reports.
*/
@@ -1532,6 +1538,7 @@ enum drm_i915_perf_record_type {
* struct drm_i915_perf_record_header header;
*
* { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
+ * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
* };
*/
DRM_I915_PERF_RECORD_SAMPLE = 1,
--
1.9.1
* [RFC 3/4] drm/i915/perf: Extract raw GPU timestamps from OA reports
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 2/4] drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports Sagar Arun Kamble
@ 2017-11-15 12:13 ` Sagar Arun Kamble
2017-12-06 19:55 ` Lionel Landwerlin
2017-11-15 12:13 ` [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples Sagar Arun Kamble
` (4 subsequent siblings)
7 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 12:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Sourab Gupta, Matthew Auld
From: Sourab Gupta <sourab.gupta@intel.com>
The OA reports contain the least significant 32 bits of the GPU timestamp.
This patch enables retrieval of the timestamp field from OA reports, to
forward it as a 64-bit raw GPU timestamp in the perf samples.
v2: Rebase w.r.t new timecounter support.
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sourab Gupta <sourab.gupta@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 2 ++
drivers/gpu/drm/i915/i915_perf.c | 26 +++++++++++++++++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e08bc85..5534cd2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2151,6 +2151,8 @@ struct i915_perf_stream {
*/
struct i915_oa_config *oa_config;
+ u64 last_gpu_ts;
+
/**
* System time correlation variables.
*/
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f7e748c..3b721d7 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -575,6 +575,26 @@ static int append_oa_status(struct i915_perf_stream *stream,
}
/**
+ * get_gpu_ts_from_oa_report - Retrieve absolute gpu timestamp from OA report
+ *
+ * Note: We assume last_gpu_ts is updated frequently enough that it is
+ * never possible to see multiple overflows before we compare sample_ts
+ * to last_gpu_ts. Since the wrap period is a significantly long duration
+ * (~6min for an 80ns ts base), we can safely assume so.
+ */
+static u64 get_gpu_ts_from_oa_report(struct i915_perf_stream *stream,
+ const u8 *report)
+{
+ u32 sample_ts = *(u32 *)(report + 4);
+ u32 delta;
+
+ delta = sample_ts - (u32)stream->last_gpu_ts;
+ stream->last_gpu_ts += delta;
+
+ return stream->last_gpu_ts;
+}
+
+/**
* append_oa_sample - Copies single OA report into userspace read() buffer.
* @stream: An i915-perf stream opened for OA metrics
* @buf: destination buffer given by userspace
@@ -622,7 +642,9 @@ static int append_oa_sample(struct i915_perf_stream *stream,
}
if (sample_flags & SAMPLE_GPU_TS) {
- /* Timestamp to be populated from OA report */
+ /* Timestamp populated from OA report */
+ gpu_ts = get_gpu_ts_from_oa_report(stream, report);
+
if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
return -EFAULT;
}
@@ -2421,6 +2443,8 @@ static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
GEN7_TIMESTAMP_UDW);
intel_runtime_pm_put(dev_priv);
+ stream->last_gpu_ts = ts_count;
+
return ts_count;
}
--
1.9.1
* [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
` (2 preceding siblings ...)
2017-11-15 12:13 ` [RFC 3/4] drm/i915/perf: Extract raw GPU timestamps from " Sagar Arun Kamble
@ 2017-11-15 12:13 ` Sagar Arun Kamble
2017-11-15 12:31 ` Chris Wilson
` (2 more replies)
2017-11-15 12:30 ` ✗ Fi.CI.BAT: warning for GPU/CPU timestamps correlation for relating OA samples with system events Patchwork
` (3 subsequent siblings)
7 siblings, 3 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 12:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Sourab Gupta, Matthew Auld
From: Sourab Gupta <sourab.gupta@intel.com>
Currently, we only have the ability to forward the GPU timestamps in the
samples (which are generated via OA reports). This limits the ability to
correlate these samples with system events.
The ability to report timestamps in different clock domains, such as
CLOCK_MONOTONIC, in the perf samples is therefore needed to be of more
practical use to userspace. This becomes important when we want to
correlate/plot GPU events/samples with other system events on the same
timeline (e.g. vblank events, or timestamps when work was submitted to
the kernel, etc.)
The patch here proposes a mechanism to achieve this. The correlation
between GPU time and system time is established using the timestamp clock
associated with the command stream, abstracted as a
timecounter/cyclecounter to retrieve correlated GPU/system time values.
v2: Added i915_driver_init_late() function to capture the new late init
phase for perf (Chris)
v3: Removed cross-timestamp changes.
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sourab Gupta <sourab.gupta@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
include/uapi/drm/i915_drm.h | 7 +++++++
2 files changed, 34 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3b721d7..94ee924 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -336,6 +336,7 @@
#define SAMPLE_OA_REPORT BIT(0)
#define SAMPLE_GPU_TS BIT(1)
+#define SAMPLE_SYSTEM_TS BIT(2)
/**
* struct perf_open_properties - for validated properties given to open a stream
@@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
struct drm_i915_perf_record_header header;
u32 sample_flags = stream->sample_flags;
u64 gpu_ts = 0;
+ u64 system_ts = 0;
header.type = DRM_I915_PERF_RECORD_SAMPLE;
header.pad = 0;
@@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
return -EFAULT;
+ buf += I915_PERF_TS_SAMPLE_SIZE;
+ }
+
+ if (sample_flags & SAMPLE_SYSTEM_TS) {
+ gpu_ts = get_gpu_ts_from_oa_report(stream, report);
+ /*
+ * XXX: timecounter_cyc2time considers time backwards if delta
+ * timestamp is more than half the max ns time covered by
+ * counter. It will be ~35min for 36 bit counter. If this much
+ * sampling duration is needed we will have to update tc->nsec
+ * by explicitly reading the timecounter (timecounter_read)
+ * before this duration.
+ */
+ system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
+
+ if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
+ return -EFAULT;
}
(*offset) += header.size;
@@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
}
+ if (props->sample_flags & SAMPLE_SYSTEM_TS) {
+ stream->sample_flags |= SAMPLE_SYSTEM_TS;
+ stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
+ }
+
dev_priv->perf.oa.oa_buffer.format_size = format_size;
if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
return -EINVAL;
@@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
props->sample_flags |= SAMPLE_GPU_TS;
break;
+ case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
+ props->sample_flags |= SAMPLE_SYSTEM_TS;
+ break;
case DRM_I915_PERF_PROP_OA_METRICS_SET:
if (value == 0) {
DRM_DEBUG("Unknown OA metric set ID\n");
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0b9249e..283859c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
/**
+ * This property requests inclusion of CLOCK_MONOTONIC system time in
+ * the perf sample data.
+ */
+ DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
+
+ /**
* The value specifies which set of OA unit metrics should be
* be configured, defining the contents of any OA unit reports.
*/
@@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
*
* { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
* { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
+ * { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
* };
*/
DRM_I915_PERF_RECORD_SAMPLE = 1,
--
1.9.1
* Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time
2017-11-15 12:13 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Sagar Arun Kamble
@ 2017-11-15 12:25 ` Chris Wilson
2017-11-15 16:41 ` Sagar Arun Kamble
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: Chris Wilson @ 2017-11-15 12:25 UTC (permalink / raw)
To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
Quoting Sagar Arun Kamble (2017-11-15 12:13:51)
> #include <drm/drmP.h>
> #include <drm/intel-gtt.h>
> @@ -2149,6 +2150,14 @@ struct i915_perf_stream {
> * @oa_config: The OA configuration used by the stream.
> */
> struct i915_oa_config *oa_config;
> +
> + /**
> + * System time correlation variables.
> + */
> + struct cyclecounter cc;
> + spinlock_t systime_lock;
> + struct timespec64 start_systime;
> + struct timecounter tc;
This pattern is repeated a lot by struct timecounter users. (I'm still
trying to understand why the common case is not catered for by a
convenience timecounter api.)
> };
>
> /**
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 00be015..72ddc34 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -192,6 +192,7 @@
> */
>
> #include <linux/anon_inodes.h>
> +#include <linux/clocksource.h>
> #include <linux/sizes.h>
> #include <linux/uuid.h>
>
> @@ -2391,6 +2392,56 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
> }
>
> /**
> + * i915_cyclecounter_read - read raw cycle/timestamp counter
> + * @cc: cyclecounter structure
> + */
> +static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
> +{
> + struct i915_perf_stream *stream = container_of(cc, typeof(*stream), cc);
> + struct drm_i915_private *dev_priv = stream->dev_priv;
> + u64 ts_count;
> +
> + intel_runtime_pm_get(dev_priv);
> + ts_count = I915_READ64_2x32(GEN4_TIMESTAMP,
> + GEN7_TIMESTAMP_UDW);
> + intel_runtime_pm_put(dev_priv);
> +
> + return ts_count;
> +}
> +
> +static void i915_perf_init_cyclecounter(struct i915_perf_stream *stream)
> +{
> + struct drm_i915_private *dev_priv = stream->dev_priv;
> + int cs_ts_freq = dev_priv->perf.oa.timestamp_frequency;
> + struct cyclecounter *cc = &stream->cc;
> + u32 maxsec;
> +
> + cc->read = i915_cyclecounter_read;
> + cc->mask = CYCLECOUNTER_MASK(CS_TIMESTAMP_WIDTH(dev_priv));
> + maxsec = cc->mask / cs_ts_freq;
> +
> + clocks_calc_mult_shift(&cc->mult, &cc->shift, cs_ts_freq,
> + NSEC_PER_SEC, maxsec);
> +}
> +
> +static void i915_perf_init_timecounter(struct i915_perf_stream *stream)
> +{
> +#define SYSTIME_START_OFFSET 350000 /* Counter read takes about 350us */
> + unsigned long flags;
> + u64 ns;
> +
> + i915_perf_init_cyclecounter(stream);
> + spin_lock_init(&stream->systime_lock);
> +
> + getnstimeofday64(&stream->start_systime);
> + ns = timespec64_to_ns(&stream->start_systime) + SYSTIME_START_OFFSET;
Use ktime directly. Or else Arnd will be back with a patch to fix it.
(All non-ktime interfaces are effectively deprecated; obsolete for
drivers.)
> + spin_lock_irqsave(&stream->systime_lock, flags);
> + timecounter_init(&stream->tc, &stream->cc, ns);
> + spin_unlock_irqrestore(&stream->systime_lock, flags);
> +}
> +
> +/**
> * i915_perf_enable_locked - handle `I915_PERF_IOCTL_ENABLE` ioctl
> * @stream: A disabled i915 perf stream
> *
> @@ -2408,6 +2459,8 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
> /* Allow stream->ops->enable() to refer to this */
> stream->enabled = true;
>
> + i915_perf_init_timecounter(stream);
> +
> if (stream->ops->enable)
> stream->ops->enable(stream);
> }
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index cfdf4f8..e7e6966 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -8882,6 +8882,12 @@ enum skl_power_gate {
>
> /* Gen4+ Timestamp and Pipe Frame time stamp registers */
> #define GEN4_TIMESTAMP _MMIO(0x2358)
> +#define GEN7_TIMESTAMP_UDW _MMIO(0x235C)
> +#define PRE_GEN7_TIMESTAMP_WIDTH 32
> +#define GEN7_TIMESTAMP_WIDTH 36
> +#define CS_TIMESTAMP_WIDTH(dev_priv) \
> + (INTEL_GEN(dev_priv) < 7 ? PRE_GEN7_TIMESTAMP_WIDTH : \
> + GEN7_TIMESTAMP_WIDTH)
s/PRE_GEN7/GEN4/ would be consistent.
If you really want to add support for earlier, I9XX_.
Ok. I can accept the justification, and we are not the only ones who do
the cyclecounter -> timecounter correction like this.
-Chris
* ✗ Fi.CI.BAT: warning for GPU/CPU timestamps correlation for relating OA samples with system events
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
` (3 preceding siblings ...)
2017-11-15 12:13 ` [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples Sagar Arun Kamble
@ 2017-11-15 12:30 ` Patchwork
2017-12-05 14:16 ` [RFC 0/4] " Lionel Landwerlin
` (2 subsequent siblings)
7 siblings, 0 replies; 45+ messages in thread
From: Patchwork @ 2017-11-15 12:30 UTC (permalink / raw)
To: Sagar Arun Kamble; +Cc: intel-gfx
== Series Details ==
Series: GPU/CPU timestamps correlation for relating OA samples with system events
URL : https://patchwork.freedesktop.org/series/33869/
State : warning
== Summary ==
Series 33869v1 GPU/CPU timestamps correlation for relating OA samples with system events
https://patchwork.freedesktop.org/api/1.0/series/33869/revisions/1/mbox/
Warning: Kernel 32bit buildtest failed
Test chamelium:
Subgroup dp-crc-fast:
pass -> FAIL (fi-kbl-7500u) fdo#102514
Test kms_pipe_crc_basic:
Subgroup read-crc-pipe-c:
skip -> PASS (fi-hsw-4770r)
Subgroup read-crc-pipe-c-frame-sequence:
pass -> SKIP (fi-hsw-4770r)
fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514
fi-bdw-5557u total:289 pass:268 dwarn:0 dfail:0 fail:0 skip:21 time:446s
fi-bdw-gvtdvm total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:453s
fi-blb-e6850 total:289 pass:223 dwarn:1 dfail:0 fail:0 skip:65 time:385s
fi-bsw-n3050 total:289 pass:243 dwarn:0 dfail:0 fail:0 skip:46 time:536s
fi-bwr-2160 total:289 pass:183 dwarn:0 dfail:0 fail:0 skip:106 time:277s
fi-bxt-dsi total:289 pass:259 dwarn:0 dfail:0 fail:0 skip:30 time:508s
fi-bxt-j4205 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:503s
fi-byt-j1900 total:289 pass:254 dwarn:0 dfail:0 fail:0 skip:35 time:503s
fi-byt-n2820 total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:489s
fi-elk-e7500 total:289 pass:229 dwarn:0 dfail:0 fail:0 skip:60 time:430s
fi-glk-1 total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:540s
fi-hsw-4770 total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:431s
fi-hsw-4770r total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:439s
fi-ilk-650 total:289 pass:228 dwarn:0 dfail:0 fail:0 skip:61 time:429s
fi-ivb-3520m total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:483s
fi-ivb-3770 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:463s
fi-kbl-7500u total:289 pass:263 dwarn:1 dfail:0 fail:1 skip:24 time:482s
fi-kbl-7560u total:289 pass:270 dwarn:0 dfail:0 fail:0 skip:19 time:534s
fi-kbl-7567u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:471s
fi-kbl-r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:531s
fi-pnv-d510 total:289 pass:222 dwarn:1 dfail:0 fail:0 skip:66 time:567s
fi-skl-6260u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:456s
fi-skl-6600u total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:548s
fi-skl-6700hq total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:563s
fi-skl-6700k total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:518s
fi-skl-6770hq total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:500s
fi-skl-gvtdvm total:289 pass:266 dwarn:0 dfail:0 fail:0 skip:23 time:460s
fi-snb-2520m total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:560s
fi-snb-2600 total:289 pass:249 dwarn:0 dfail:0 fail:0 skip:40 time:423s
Blacklisted hosts:
fi-cfl-s total:289 pass:254 dwarn:3 dfail:0 fail:0 skip:32 time:540s
fi-cnl-y total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:549s
2cac5c93bc9427509e5164289e810bdcabfc0351 drm-tip: 2017y-11m-15d-09h-10m-41s UTC integration manifest
c61b6ec45639 drm/i915/perf: Send system clock monotonic time in perf samples
46882968ea89 drm/i915/perf: Extract raw GPU timestamps from OA reports
065be89cf048 drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports
878fc922eb52 drm/i915/perf: Add support to correlate GPU timestamp with system time
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7140/
* Re: [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples
2017-11-15 12:13 ` [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples Sagar Arun Kamble
@ 2017-11-15 12:31 ` Chris Wilson
2017-11-15 16:51 ` Sagar Arun Kamble
2017-11-15 17:54 ` Sagar Arun Kamble
2017-12-05 14:22 ` Lionel Landwerlin
2 siblings, 1 reply; 45+ messages in thread
From: Chris Wilson @ 2017-11-15 12:31 UTC (permalink / raw)
To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
Quoting Sagar Arun Kamble (2017-11-15 12:13:54)
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> Currently, we can only forward the GPU timestamps in the samples (which
> are generated via OA reports). This limits the ability to correlate
> these samples with system events.
>
> We therefore need the ability to report timestamps in different clock
> domains, such as CLOCK_MONOTONIC, in the perf samples for them to be of
> more practical use to userspace. This ability becomes important when we
> want to correlate/plot GPU events/samples with other system events on
> the same timeline (e.g. vblank events, or timestamps when work was
> submitted to the kernel, etc.)
>
> The patch here proposes a mechanism to achieve this. The correlation
> between GPU time and system time is established using the timestamp
> clock associated with the command stream, abstracted as a
> timecounter/cyclecounter pair to retrieve correlated GPU/system time
> values.
>
> v2: Added i915_driver_init_late() function to capture the new late init
> phase for perf (Chris)
>
> v3: Removed cross-timestamp changes.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
> include/uapi/drm/i915_drm.h | 7 +++++++
> 2 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3b721d7..94ee924 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -336,6 +336,7 @@
>
> #define SAMPLE_OA_REPORT BIT(0)
> #define SAMPLE_GPU_TS BIT(1)
> +#define SAMPLE_SYSTEM_TS BIT(2)
>
> /**
> * struct perf_open_properties - for validated properties given to open a stream
> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
> struct drm_i915_perf_record_header header;
> u32 sample_flags = stream->sample_flags;
> u64 gpu_ts = 0;
> + u64 system_ts = 0;
>
> header.type = DRM_I915_PERF_RECORD_SAMPLE;
> header.pad = 0;
> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>
> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
> return -EFAULT;
> + buf += I915_PERF_TS_SAMPLE_SIZE;
> + }
> +
> + if (sample_flags & SAMPLE_SYSTEM_TS) {
> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
Scope your variables. That stops us from being confused as to where else
gpu_ts or sys_ts may be reused. For instance, I first thought you were
using SAMPLE_GPU_TS to initialise gpu_ts.
> + /*
> + * XXX: timecounter_cyc2time considers time backwards if delta
> + * timestamp is more than half the max ns time covered by
> + * counter. It will be ~35min for 36 bit counter. If this much
> + * sampling duration is needed we will have to update tc->nsec
> + * by explicitly reading the timecounter (timecounter_read)
> + * before this duration.
> + */
> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
> +
> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
> + return -EFAULT;
Advance buf.
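The missing advance can be illustrated with a small userspace model of the sample packing (names are illustrative, and memcpy stands in for copy_to_user):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TS_SAMPLE_SIZE sizeof(uint64_t)

/* Model of append_oa_sample(): after each optional field is copied out,
 * the destination pointer must advance, otherwise the next field
 * overwrites the previous one. */
static size_t pack_sample(uint8_t *buf, uint64_t gpu_ts, uint64_t system_ts)
{
	uint8_t *p = buf;

	memcpy(p, &gpu_ts, TS_SAMPLE_SIZE);
	p += TS_SAMPLE_SIZE;		/* the advance the review asks for */

	memcpy(p, &system_ts, TS_SAMPLE_SIZE);
	p += TS_SAMPLE_SIZE;

	return p - buf;			/* total bytes used by the sample */
}
```

Without the second advance, any field appended after system_ts would land on top of it, which is exactly the class of bug being flagged here.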
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time
2017-11-15 12:25 ` Chris Wilson
@ 2017-11-15 16:41 ` Sagar Arun Kamble
2017-11-23 7:34 ` Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time] Sagar Arun Kamble
2017-12-05 13:58 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Lionel Landwerlin
2 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 16:41 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 11/15/2017 5:55 PM, Chris Wilson wrote:
> Quoting Sagar Arun Kamble (2017-11-15 12:13:51)
>> #include <drm/drmP.h>
>> #include <drm/intel-gtt.h>
>> @@ -2149,6 +2150,14 @@ struct i915_perf_stream {
>> * @oa_config: The OA configuration used by the stream.
>> */
>> struct i915_oa_config *oa_config;
>> +
>> + /**
>> + * System time correlation variables.
>> + */
>> + struct cyclecounter cc;
>> + spinlock_t systime_lock;
>> + struct timespec64 start_systime;
>> + struct timecounter tc;
> This pattern is repeated a lot by struct timecounter users. (I'm still
> trying to understand why the common case is not catered for by a
> convenience timecounter api.)
Yes. This looks like a good case for changing the cyclecounter* member in
timecounter to an embedded cyclecounter.
Then we can avoid the separate cyclecounter.
>
>> };
>>
>> /**
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 00be015..72ddc34 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -192,6 +192,7 @@
>> */
>>
>> #include <linux/anon_inodes.h>
>> +#include <linux/clocksource.h>
>> #include <linux/sizes.h>
>> #include <linux/uuid.h>
>>
>> @@ -2391,6 +2392,56 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
>> }
>>
>> /**
>> + * i915_cyclecounter_read - read raw cycle/timestamp counter
>> + * @cc: cyclecounter structure
>> + */
>> +static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
>> +{
>> + struct i915_perf_stream *stream = container_of(cc, typeof(*stream), cc);
>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>> + u64 ts_count;
>> +
>> + intel_runtime_pm_get(dev_priv);
>> + ts_count = I915_READ64_2x32(GEN4_TIMESTAMP,
>> + GEN7_TIMESTAMP_UDW);
>> + intel_runtime_pm_put(dev_priv);
>> +
>> + return ts_count;
>> +}
>> +
>> +static void i915_perf_init_cyclecounter(struct i915_perf_stream *stream)
>> +{
>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>> + int cs_ts_freq = dev_priv->perf.oa.timestamp_frequency;
>> + struct cyclecounter *cc = &stream->cc;
>> + u32 maxsec;
>> +
>> + cc->read = i915_cyclecounter_read;
>> + cc->mask = CYCLECOUNTER_MASK(CS_TIMESTAMP_WIDTH(dev_priv));
>> + maxsec = cc->mask / cs_ts_freq;
>> +
>> + clocks_calc_mult_shift(&cc->mult, &cc->shift, cs_ts_freq,
>> + NSEC_PER_SEC, maxsec);
>> +}
>> +
>> +static void i915_perf_init_timecounter(struct i915_perf_stream *stream)
>> +{
>> +#define SYSTIME_START_OFFSET 350000 /* Counter read takes about 350us */
>> + unsigned long flags;
>> + u64 ns;
>> +
>> + i915_perf_init_cyclecounter(stream);
>> + spin_lock_init(&stream->systime_lock);
>> +
>> + getnstimeofday64(&stream->start_systime);
>> + ns = timespec64_to_ns(&stream->start_systime) + SYSTIME_START_OFFSET;
> Use ktime directly. Or else Arnd will be back with a patch to fix it.
> (All non-ktime interfaces are effectively deprecated; obsolete for
> drivers.)
Ok. Will update.
>
>> + spin_lock_irqsave(&stream->systime_lock, flags);
>> + timecounter_init(&stream->tc, &stream->cc, ns);
>> + spin_unlock_irqrestore(&stream->systime_lock, flags);
>> +}
>> +
>> +/**
>> * i915_perf_enable_locked - handle `I915_PERF_IOCTL_ENABLE` ioctl
>> * @stream: A disabled i915 perf stream
>> *
>> @@ -2408,6 +2459,8 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>> /* Allow stream->ops->enable() to refer to this */
>> stream->enabled = true;
>>
>> + i915_perf_init_timecounter(stream);
>> +
>> if (stream->ops->enable)
>> stream->ops->enable(stream);
>> }
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index cfdf4f8..e7e6966 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -8882,6 +8882,12 @@ enum skl_power_gate {
>>
>> /* Gen4+ Timestamp and Pipe Frame time stamp registers */
>> #define GEN4_TIMESTAMP _MMIO(0x2358)
>> +#define GEN7_TIMESTAMP_UDW _MMIO(0x235C)
>> +#define PRE_GEN7_TIMESTAMP_WIDTH 32
>> +#define GEN7_TIMESTAMP_WIDTH 36
>> +#define CS_TIMESTAMP_WIDTH(dev_priv) \
>> + (INTEL_GEN(dev_priv) < 7 ? PRE_GEN7_TIMESTAMP_WIDTH : \
>> + GEN7_TIMESTAMP_WIDTH)
> s/PRE_GEN7/GEN4/ would be consistent.
> If you really want to add support for earlier, I9XX_.
Yes.
>
> Ok. I can accept the justification, and we are not the only ones who do
> the cyclecounter -> timecounter correction like this.
> -Chris
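The mult/shift setup in the quoted i915_perf_init_cyclecounter() can be modelled in userspace. The sketch below is a simplified stand-in for the kernel's clocks_calc_mult_shift(), assuming a 12 MHz command-stream timestamp clock and a 36-bit counter (illustrative values, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Pick the largest shift for which mult fits in 32 bits AND
 * (mask * mult) cannot overflow 64 bits, then convert with
 * ns = (cycles * mult) >> shift. Simplified version of the kernel's
 * clocks_calc_mult_shift(). */
static void calc_mult_shift(uint32_t *mult, uint32_t *shift,
			    uint64_t freq_hz, uint64_t mask)
{
	uint32_t sft;
	uint64_t m = 0;

	for (sft = 32; sft > 0; sft--) {
		m = (NSEC_PER_SEC << sft) / freq_hz;
		if (m <= UINT32_MAX && m <= UINT64_MAX / mask)
			break;
	}
	*mult = (uint32_t)m;
	*shift = sft;
}

static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

For a 36-bit counter at 12 MHz this lands on roughly mult = 174762666, shift = 21 (about 83.33 ns per cycle), and the full counter range converts without 64-bit overflow.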
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples
2017-11-15 12:31 ` Chris Wilson
@ 2017-11-15 16:51 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 16:51 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 11/15/2017 6:01 PM, Chris Wilson wrote:
> Quoting Sagar Arun Kamble (2017-11-15 12:13:54)
>> From: Sourab Gupta <sourab.gupta@intel.com>
>>
>> Currently, we can only forward the GPU timestamps in the samples (which
>> are generated via OA reports). This limits the ability to correlate
>> these samples with system events.
>>
>> We therefore need the ability to report timestamps in different clock
>> domains, such as CLOCK_MONOTONIC, in the perf samples for them to be of
>> more practical use to userspace. This ability becomes important when we
>> want to correlate/plot GPU events/samples with other system events on
>> the same timeline (e.g. vblank events, or timestamps when work was
>> submitted to the kernel, etc.)
>>
>> The patch here proposes a mechanism to achieve this. The correlation
>> between GPU time and system time is established using the timestamp
>> clock associated with the command stream, abstracted as a
>> timecounter/cyclecounter pair to retrieve correlated GPU/system time
>> values.
>>
>> v2: Added i915_driver_init_late() function to capture the new late init
>> phase for perf (Chris)
>>
>> v3: Removed cross-timestamp changes.
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sourab Gupta <sourab.gupta@intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>> include/uapi/drm/i915_drm.h | 7 +++++++
>> 2 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 3b721d7..94ee924 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -336,6 +336,7 @@
>>
>> #define SAMPLE_OA_REPORT BIT(0)
>> #define SAMPLE_GPU_TS BIT(1)
>> +#define SAMPLE_SYSTEM_TS BIT(2)
>>
>> /**
>> * struct perf_open_properties - for validated properties given to open a stream
>> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>> struct drm_i915_perf_record_header header;
>> u32 sample_flags = stream->sample_flags;
>> u64 gpu_ts = 0;
>> + u64 system_ts = 0;
>>
>> header.type = DRM_I915_PERF_RECORD_SAMPLE;
>> header.pad = 0;
>> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>>
>> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>> return -EFAULT;
>> + buf += I915_PERF_TS_SAMPLE_SIZE;
>> + }
>> +
>> + if (sample_flags & SAMPLE_SYSTEM_TS) {
>> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> Scope your variables. Stops us from being confused as to where else
> gpu_ts or sys_ts may be reused. For instance I first thought you were
> using SAMPLE_GPU_TS to initialise gpu_ts
Yes.
>> + /*
>> + * XXX: timecounter_cyc2time considers time backwards if delta
>> + * timestamp is more than half the max ns time covered by
>> + * counter. It will be ~35min for 36 bit counter. If this much
>> + * sampling duration is needed we will have to update tc->nsec
>> + * by explicitly reading the timecounter (timecounter_read)
>> + * before this duration.
>> + */
>> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
>> +
>> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
>> + return -EFAULT;
> Advance buf.
I had kept the advance logic only where a new field follows, since this
advance is also missing for SAMPLE_OA_REPORT currently in drm-tip. Will fix.
Thanks for the review :)
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples
2017-11-15 12:13 ` [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples Sagar Arun Kamble
2017-11-15 12:31 ` Chris Wilson
@ 2017-11-15 17:54 ` Sagar Arun Kamble
2017-12-05 14:22 ` Lionel Landwerlin
2 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-15 17:54 UTC (permalink / raw)
To: intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 11/15/2017 5:43 PM, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> Currently, we can only forward the GPU timestamps in the samples (which
> are generated via OA reports). This limits the ability to correlate
> these samples with system events.
>
> We therefore need the ability to report timestamps in different clock
> domains, such as CLOCK_MONOTONIC, in the perf samples for them to be of
> more practical use to userspace. This ability becomes important when we
> want to correlate/plot GPU events/samples with other system events on
> the same timeline (e.g. vblank events, or timestamps when work was
> submitted to the kernel, etc.)
>
> The patch here proposes a mechanism to achieve this. The correlation
> between GPU time and system time is established using the timestamp
> clock associated with the command stream, abstracted as a
> timecounter/cyclecounter pair to retrieve correlated GPU/system time
> values.
>
> v2: Added i915_driver_init_late() function to capture the new late init
> phase for perf (Chris)
>
> v3: Removed cross-timestamp changes.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
> include/uapi/drm/i915_drm.h | 7 +++++++
> 2 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3b721d7..94ee924 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -336,6 +336,7 @@
>
> #define SAMPLE_OA_REPORT BIT(0)
> #define SAMPLE_GPU_TS BIT(1)
> +#define SAMPLE_SYSTEM_TS BIT(2)
>
> /**
> * struct perf_open_properties - for validated properties given to open a stream
> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
> struct drm_i915_perf_record_header header;
> u32 sample_flags = stream->sample_flags;
> u64 gpu_ts = 0;
> + u64 system_ts = 0;
>
> header.type = DRM_I915_PERF_RECORD_SAMPLE;
> header.pad = 0;
> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>
> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
> return -EFAULT;
> + buf += I915_PERF_TS_SAMPLE_SIZE;
> + }
> +
> + if (sample_flags & SAMPLE_SYSTEM_TS) {
> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> + /*
> + * XXX: timecounter_cyc2time considers time backwards if delta
> + * timestamp is more than half the max ns time covered by
> + * counter. It will be ~35min for 36 bit counter. If this much
> + * sampling duration is needed we will have to update tc->nsec
> + * by explicitly reading the timecounter (timecounter_read)
> + * before this duration.
This is implemented as an overflow watchdog in mlx5e_timestamp_init
(drivers/net/ethernet/mellanox).
We will need a similar mechanism here.
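The limitation in the quoted XXX comment can be reproduced with a small userspace model of timecounter_cyc2time()'s delta handling (the mask width matches the patch; everything else is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Model of how timecounter_cyc2time() interprets a masked counter
 * delta: anything above half the counter range is taken as a timestamp
 * *behind* the last sync point. A genuinely forward delta larger than
 * half the range is therefore misread as backwards, which is why the
 * timecounter must be refreshed (timecounter_read) at least once per
 * half period - the overflow-watchdog approach mentioned above. */
static int64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	uint64_t d = (now - last) & mask;

	if (d > mask / 2)			/* past the halfway point */
		return (int64_t)d - (int64_t)mask - 1;	/* negative */
	return (int64_t)d;
}
```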
> + */
> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
> +
> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
> + return -EFAULT;
> }
>
> (*offset) += header.size;
> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
> stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
> }
>
> + if (props->sample_flags & SAMPLE_SYSTEM_TS) {
> + stream->sample_flags |= SAMPLE_SYSTEM_TS;
> + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
> + }
> +
> dev_priv->perf.oa.oa_buffer.format_size = format_size;
> if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> return -EINVAL;
> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
> case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
> props->sample_flags |= SAMPLE_GPU_TS;
> break;
> + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
> + props->sample_flags |= SAMPLE_SYSTEM_TS;
> + break;
> case DRM_I915_PERF_PROP_OA_METRICS_SET:
> if (value == 0) {
> DRM_DEBUG("Unknown OA metric set ID\n");
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0b9249e..283859c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
> DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>
> /**
> + * This property requests inclusion of CLOCK_MONOTONIC system time in
> + * the perf sample data.
> + */
> + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
> +
> + /**
> * The value specifies which set of OA unit metrics should be
> * be configured, defining the contents of any OA unit reports.
> */
> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
> *
> * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
> * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
> + * { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
> * };
> */
> DRM_I915_PERF_RECORD_SAMPLE = 1,
^ permalink raw reply [flat|nested] 45+ messages in thread
* Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time]
2017-11-15 12:25 ` Chris Wilson
2017-11-15 16:41 ` Sagar Arun Kamble
@ 2017-11-23 7:34 ` Sagar Arun Kamble
2017-11-23 18:59 ` Thomas Gleixner
2017-12-05 13:58 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Lionel Landwerlin
2 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-23 7:34 UTC (permalink / raw)
To: John Stultz, Thomas Gleixner, Stephen Boyd
Cc: Chris Wilson, linux-kernel, Sagar A Kamble
Hi,
We need inputs on a possible optimization of the timecounter/cyclecounter structures and their usage.
This mail is in response to the review of patch https://patchwork.freedesktop.org/patch/188448/.
As per Chris's observation below, about a dozen timecounter users in the kernel define the structures
below individually:
spinlock_t lock;
struct cyclecounter cc;
struct timecounter tc;
Can we move lock and cc into tc? That would be more convenient.
It would also allow unifying the locking/overflow watchdog handling across all the drivers.
Please suggest.
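A rough userspace sketch of what the proposed consolidation could look like, with the cyclecounter embedded in the timecounter rather than referenced through a pointer (field and function names are illustrative, not the existing kernel API; the lock is left out since it is the open question):

```c
#include <assert.h>
#include <stdint.h>

struct cyclecounter {
	uint64_t (*read)(const struct cyclecounter *cc);
	uint64_t mask;
	uint32_t mult;
	uint32_t shift;
};

struct timecounter {
	struct cyclecounter cc;		/* embedded, no separate object */
	uint64_t cycle_last;
	uint64_t nsec;
};

static uint64_t fake_counter;		/* stands in for an MMIO read */

static uint64_t fake_read(const struct cyclecounter *cc)
{
	(void)cc;
	return fake_counter;
}

static void tc_init(struct timecounter *tc, uint64_t start_ns)
{
	tc->cycle_last = tc->cc.read(&tc->cc);
	tc->nsec = start_ns;
}

static uint64_t tc_read(struct timecounter *tc)
{
	uint64_t now = tc->cc.read(&tc->cc);
	uint64_t delta = (now - tc->cycle_last) & tc->cc.mask;

	tc->cycle_last = now;
	tc->nsec += (delta * tc->cc.mult) >> tc->cc.shift;
	return tc->nsec;
}
```

Callers would then allocate and initialise a single object instead of wiring a cyclecounter pointer into the timecounter separately.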
Thanks
Sagar
On 11/15/2017 5:55 PM, Chris Wilson wrote:
> Quoting Sagar Arun Kamble (2017-11-15 12:13:51)
>> #include <drm/drmP.h>
>> #include <drm/intel-gtt.h>
>> @@ -2149,6 +2150,14 @@ struct i915_perf_stream {
>> * @oa_config: The OA configuration used by the stream.
>> */
>> struct i915_oa_config *oa_config;
>> +
>> + /**
>> + * System time correlation variables.
>> + */
>> + struct cyclecounter cc;
>> + spinlock_t systime_lock;
>> + struct timespec64 start_systime;
>> + struct timecounter tc;
> This pattern is repeated a lot by struct timecounter users. (I'm still
> trying to understand why the common case is not catered for by a
> convenience timecounter api.)
>
>> };
>>
>> /**
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 00be015..72ddc34 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -192,6 +192,7 @@
>> */
>>
>> #include <linux/anon_inodes.h>
>> +#include <linux/clocksource.h>
>> #include <linux/sizes.h>
>> #include <linux/uuid.h>
>>
>> @@ -2391,6 +2392,56 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
>> }
>>
>> /**
>> + * i915_cyclecounter_read - read raw cycle/timestamp counter
>> + * @cc: cyclecounter structure
>> + */
>> +static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
>> +{
>> + struct i915_perf_stream *stream = container_of(cc, typeof(*stream), cc);
>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>> + u64 ts_count;
>> +
>> + intel_runtime_pm_get(dev_priv);
>> + ts_count = I915_READ64_2x32(GEN4_TIMESTAMP,
>> + GEN7_TIMESTAMP_UDW);
>> + intel_runtime_pm_put(dev_priv);
>> +
>> + return ts_count;
>> +}
>> +
>> +static void i915_perf_init_cyclecounter(struct i915_perf_stream *stream)
>> +{
>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>> + int cs_ts_freq = dev_priv->perf.oa.timestamp_frequency;
>> + struct cyclecounter *cc = &stream->cc;
>> + u32 maxsec;
>> +
>> + cc->read = i915_cyclecounter_read;
>> + cc->mask = CYCLECOUNTER_MASK(CS_TIMESTAMP_WIDTH(dev_priv));
>> + maxsec = cc->mask / cs_ts_freq;
>> +
>> + clocks_calc_mult_shift(&cc->mult, &cc->shift, cs_ts_freq,
>> + NSEC_PER_SEC, maxsec);
>> +}
>> +
>> +static void i915_perf_init_timecounter(struct i915_perf_stream *stream)
>> +{
>> +#define SYSTIME_START_OFFSET 350000 /* Counter read takes about 350us */
>> + unsigned long flags;
>> + u64 ns;
>> +
>> + i915_perf_init_cyclecounter(stream);
>> + spin_lock_init(&stream->systime_lock);
>> +
>> + getnstimeofday64(&stream->start_systime);
>> + ns = timespec64_to_ns(&stream->start_systime) + SYSTIME_START_OFFSET;
> Use ktime directly. Or else Arnd will be back with a patch to fix it.
> (All non-ktime interfaces are effectively deprecated; obsolete for
> drivers.)
>
>> + spin_lock_irqsave(&stream->systime_lock, flags);
>> + timecounter_init(&stream->tc, &stream->cc, ns);
>> + spin_unlock_irqrestore(&stream->systime_lock, flags);
>> +}
>> +
>> +/**
>> * i915_perf_enable_locked - handle `I915_PERF_IOCTL_ENABLE` ioctl
>> * @stream: A disabled i915 perf stream
>> *
>> @@ -2408,6 +2459,8 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>> /* Allow stream->ops->enable() to refer to this */
>> stream->enabled = true;
>>
>> + i915_perf_init_timecounter(stream);
>> +
>> if (stream->ops->enable)
>> stream->ops->enable(stream);
>> }
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index cfdf4f8..e7e6966 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -8882,6 +8882,12 @@ enum skl_power_gate {
>>
>> /* Gen4+ Timestamp and Pipe Frame time stamp registers */
>> #define GEN4_TIMESTAMP _MMIO(0x2358)
>> +#define GEN7_TIMESTAMP_UDW _MMIO(0x235C)
>> +#define PRE_GEN7_TIMESTAMP_WIDTH 32
>> +#define GEN7_TIMESTAMP_WIDTH 36
>> +#define CS_TIMESTAMP_WIDTH(dev_priv) \
>> + (INTEL_GEN(dev_priv) < 7 ? PRE_GEN7_TIMESTAMP_WIDTH : \
>> + GEN7_TIMESTAMP_WIDTH)
> s/PRE_GEN7/GEN4/ would be consistent.
> If you really want to add support for earlier, I9XX_.
>
> Ok. I can accept the justification, and we are not the only ones who do
> the cyclecounter -> timecounter correction like this.
> -Chris
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time]
2017-11-23 7:34 ` Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time] Sagar Arun Kamble
@ 2017-11-23 18:59 ` Thomas Gleixner
2017-11-24 9:06 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2017-11-23 18:59 UTC (permalink / raw)
To: Sagar Arun Kamble; +Cc: John Stultz, Stephen Boyd, Chris Wilson, linux-kernel
On Thu, 23 Nov 2017, Sagar Arun Kamble wrote:
> We needed inputs on possible optimization that can be done to
> timecounter/cyclecounter structures/usage.
> This mail is in response to review of patch
> https://patchwork.freedesktop.org/patch/188448/.
>
> As Chris's observation below, about dozen of timecounter users in the kernel
> have below structures
> defined individually:
>
> spinlock_t lock;
> struct cyclecounter cc;
> struct timecounter tc;
>
> Can we move lock and cc to tc? That way it will be convenient.
> Also it will allow unifying the locking/overflow watchdog handling across all
> drivers.
Looks like none of the timecounter usage sites has a real need to separate
timecounter and cyclecounter.
The lock is a different question. The locking of the various drivers
differs and I have no idea how you want to handle that. Just sticking the
lock into the data structure and then not making use of it in the
timecounter code, leaving it to the call sites, does not make sense.
Thanks,
tglx
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time]
2017-11-23 18:59 ` Thomas Gleixner
@ 2017-11-24 9:06 ` Sagar Arun Kamble
2017-11-24 13:31 ` Thomas Gleixner
0 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-24 9:06 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: John Stultz, Stephen Boyd, Chris Wilson, linux-kernel
On 11/24/2017 12:29 AM, Thomas Gleixner wrote:
> On Thu, 23 Nov 2017, Sagar Arun Kamble wrote:
>> We needed inputs on possible optimization that can be done to
>> timecounter/cyclecounter structures/usage.
>> This mail is in response to review of patch
>> https://patchwork.freedesktop.org/patch/188448/.
>>
>> As Chris's observation below, about dozen of timecounter users in the kernel
>> have below structures
>> defined individually:
>>
>> spinlock_t lock;
>> struct cyclecounter cc;
>> struct timecounter tc;
>>
>> Can we move lock and cc to tc? That way it will be convenient.
>> Also it will allow unifying the locking/overflow watchdog handling across all
>> drivers.
> Looks like none of the timecounter usage sites has a real need to separate
> timecounter and cyclecounter.
Yes. Will share patch for this change.
> The lock is a different question. The locking of the various drivers
> differs and I have no idea how you want to handle that. Just sticking the
> lock into the datastructure and then not making use of it in the
> timercounter code and leave it to the callsites does not make sense.
Most of the locks are held around timecounter_read. In some instances the lock is held when the
cyclecounter is updated standalone, or when it is updated along with the timecounter calls.
I was thinking that if we move the lock into the timecounter functions, drivers would only have to do
locking around their own operations on the cyclecounter. But another problem I see is that there are
variations of the locking calls, like lock_irqsave, lock_bh, and write_lock_irqsave (some using
rwlock_t). Should all this locking then be left to the drivers?
> Thanks,
>
> tglx
Thanks
Sagar
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time]
2017-11-24 9:06 ` Sagar Arun Kamble
@ 2017-11-24 13:31 ` Thomas Gleixner
2017-11-27 10:05 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2017-11-24 13:31 UTC (permalink / raw)
To: Sagar Arun Kamble; +Cc: John Stultz, Stephen Boyd, Chris Wilson, linux-kernel
On Fri, 24 Nov 2017, Sagar Arun Kamble wrote:
> On 11/24/2017 12:29 AM, Thomas Gleixner wrote:
> > On Thu, 23 Nov 2017, Sagar Arun Kamble wrote:
> > > We needed inputs on possible optimization that can be done to
> > > timecounter/cyclecounter structures/usage.
> > > This mail is in response to review of patch
> > > https://patchwork.freedesktop.org/patch/188448/.
> > >
> > > As Chris's observation below, about dozen of timecounter users in the
> > > kernel
> > > have below structures
> > > defined individually:
> > >
> > > spinlock_t lock;
> > > struct cyclecounter cc;
> > > struct timecounter tc;
> > >
> > > Can we move lock and cc to tc? That way it will be convenient.
> > > Also it will allow unifying the locking/overflow watchdog handling across
> > > all
> > > drivers.
> > Looks like none of the timecounter usage sites has a real need to separate
> > timecounter and cyclecounter.
>
> Yes. Will share patch for this change.
>
> > The lock is a different question. The locking of the various drivers
> > differs and I have no idea how you want to handle that. Just sticking the
> > lock into the datastructure and then not making use of it in the
> > timercounter code and leave it to the callsites does not make sense.
>
> Most of the locks are held around timecounter_read. In some instances it
> is held when cyclecounter is updated standalone or is updated along with
> timecounter calls. Was thinking if we move the lock in timecounter
> functions, drivers just have to do locking around its operations on
> cyclecounter. But then another problem I see is there are variation of
> locking calls like lock_irqsave, lock_bh, write_lock_irqsave (some using
> rwlock_t). Should this all locking be left to driver only then?
You could have the lock in the struct and protect the inner workings in the
related core functions.
That might remove locking requirements from some of the callers and the
others still have their own thing around it.
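A minimal userspace sketch of that suggestion, with the lock owned by the structure and taken inside the core accessor (names are illustrative; a pthread mutex stands in for the kernel spinlock, and the cycle-to-ns conversion is collapsed to 1 ns per cycle for brevity):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct locked_timecounter {
	pthread_mutex_t lock;		/* lives in the struct ... */
	uint64_t cycle_last;
	uint64_t nsec;
};

static uint64_t ltc_read(struct locked_timecounter *tc, uint64_t now_cycles)
{
	uint64_t ns;

	pthread_mutex_lock(&tc->lock);	/* ... and is taken by the core */
	tc->nsec += now_cycles - tc->cycle_last;
	tc->cycle_last = now_cycles;
	ns = tc->nsec;
	pthread_mutex_unlock(&tc->lock);
	return ns;
}
```

Simple callers then need no lock of their own; drivers that also mutate their cyclecounter at runtime would still need their own protection around those updates.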
Thanks,
tglx
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time]
2017-11-24 13:31 ` Thomas Gleixner
@ 2017-11-27 10:05 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-11-27 10:05 UTC (permalink / raw)
To: Thomas Gleixner
Cc: John Stultz, Stephen Boyd, Chris Wilson,
linux-kernel, netdev, linux-rdma
On 11/24/2017 7:01 PM, Thomas Gleixner wrote:
> On Fri, 24 Nov 2017, Sagar Arun Kamble wrote:
>> On 11/24/2017 12:29 AM, Thomas Gleixner wrote:
>>> On Thu, 23 Nov 2017, Sagar Arun Kamble wrote:
>>>> We needed inputs on possible optimization that can be done to
>>>> timecounter/cyclecounter structures/usage.
>>>> This mail is in response to review of patch
>>>> https://patchwork.freedesktop.org/patch/188448/.
>>>>
>>>> As Chris's observation below, about dozen of timecounter users in the
>>>> kernel
>>>> have below structures
>>>> defined individually:
>>>>
>>>> spinlock_t lock;
>>>> struct cyclecounter cc;
>>>> struct timecounter tc;
>>>>
>>>> Can we move lock and cc to tc? That way it will be convenient.
>>>> Also it will allow unifying the locking/overflow watchdog handling across
>>>> all
>>>> drivers.
>>> Looks like none of the timecounter usage sites has a real need to separate
>>> timecounter and cyclecounter.
>> Yes. Will share patch for this change.
>>
>>> The lock is a different question. The locking of the various drivers
>>> differs and I have no idea how you want to handle that. Just sticking the
>>> lock into the datastructure and then not making use of it in the
>>> timercounter code and leave it to the callsites does not make sense.
>> Most of the locks are held around timecounter_read. In some instances it
>> is held when cyclecounter is updated standalone or is updated along with
>> timecounter calls. Was thinking if we move the lock in timecounter
>> functions, drivers just have to do locking around its operations on
>> cyclecounter. But then another problem I see is there are variation of
>> locking calls like lock_irqsave, lock_bh, write_lock_irqsave (some using
>> rwlock_t). Should this all locking be left to driver only then?
> You could have the lock in the struct and protect the inner workings in the
> related core functions.
>
> That might remove locking requirements from some of the callers and the
> others still have their own thing around it.
For drivers having a static/fixed cyclecounter, we can rely on just the lock inside the timecounter.
But most of the network drivers update the cyclecounter at runtime, so they would have to rely on two
locks if we add one to the timecounter. This may not be efficient for them. Also, the lock in the
timecounter would have to be less restrictive (maybe a seqlock), I guess.
Cc'd the Mellanox list for inputs on this.
I have started feeling that the current approach, with drivers managing the locks, is the right one,
so better to leave the lock out of the timecounter.
> Thanks,
>
> tglx
* Re: Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time]
2017-11-27 10:05 ` Sagar Arun Kamble
@ 2017-11-30 21:03 ` Saeed Mahameed
-1 siblings, 0 replies; 45+ messages in thread
From: Saeed Mahameed @ 2017-11-30 21:03 UTC (permalink / raw)
To: Sagar Arun Kamble
Cc: Thomas Gleixner, John Stultz, Stephen Boyd, Chris Wilson,
linux-kernel, Linux Netdev List,
linux-rdma
On Mon, Nov 27, 2017 at 2:05 AM, Sagar Arun Kamble
<sagar.a.kamble@intel.com> wrote:
>
>
> On 11/24/2017 7:01 PM, Thomas Gleixner wrote:
>>
>> On Fri, 24 Nov 2017, Sagar Arun Kamble wrote:
>>>
>>> On 11/24/2017 12:29 AM, Thomas Gleixner wrote:
>>>>
>>>> On Thu, 23 Nov 2017, Sagar Arun Kamble wrote:
>>>>>
>>>>> We needed inputs on possible optimization that can be done to
>>>>> timecounter/cyclecounter structures/usage.
>>>>> This mail is in response to review of patch
>>>>> https://patchwork.freedesktop.org/patch/188448/.
>>>>>
>>>>> As Chris's observation below, about dozen of timecounter users in the
>>>>> kernel
>>>>> have below structures
>>>>> defined individually:
>>>>>
>>>>> spinlock_t lock;
>>>>> struct cyclecounter cc;
>>>>> struct timecounter tc;
>>>>>
>>>>> Can we move lock and cc to tc? That way it will be convenient.
>>>>> Also it will allow unifying the locking/overflow watchdog handling
>>>>> across
>>>>> all
>>>>> drivers.
>>>>
>>>> Looks like none of the timecounter usage sites has a real need to
>>>> separate
>>>> timecounter and cyclecounter.
>>>
>>> Yes. Will share patch for this change.
>>>
>>>> The lock is a different question. The locking of the various drivers
>>>> differs and I have no idea how you want to handle that. Just sticking
>>>> the
>>>> lock into the datastructure and then not making use of it in the
>>>> timecounter code and leave it to the callsites does not make sense.
>>>
>>> Most of the locks are held around timecounter_read. In some instances it
>>> is held when cyclecounter is updated standalone or is updated along with
>>> timecounter calls. Was thinking if we move the lock in timecounter
>>> functions, drivers just have to do locking around its operations on
>>> cyclecounter. But then another problem I see is there are variation of
>>> locking calls like lock_irqsave, lock_bh, write_lock_irqsave (some using
>>> rwlock_t). Should this all locking be left to driver only then?
>>
>> You could have the lock in the struct and protect the inner workings in
>> the
>> related core functions.
>>
>> That might remove locking requirements from some of the callers and the
>> others still have their own thing around it.
>
>
> For drivers having static/fixed cyclecounter, we can rely only on lock
> inside timecounter.
> Most of the network drivers update cyclecounter at runtime and they will
> have to rely on two locks if
> we add one to timecounter. This may not be efficient for them. Also the lock
> in timecounter has to be less restrictive (may be seqlock) I guess.
>
> Cc'd Mellanox list for inputs on this.
>
> I have started feeling that the current approach of drivers managing the
> locks is the right one so better leave the
> lock out of timecounter.
>
I agree here,
In mlx5 we rely on our own read/write lock to serialize access to
mlx5_clock struct (mlx5 timecounter and cyclecounter).
The access is not as simple as:
lock()
call timecounter API
unlock()
Sometimes we also explicitly update/adjust the cycle counters with
mlx5-specific calculations after we read the timecounter, all inside
our lock.
e.g.
@mlx5_ptp_adjfreq()
write_lock_irqsave(&clock->lock, flags);
timecounter_read(&clock->tc);
clock->cycles.mult = neg_adj ? clock->nominal_c_mult - diff :
clock->nominal_c_mult + diff;
write_unlock_irqrestore(&clock->lock, flags);
So I don't think it will be a simple task to have a generic thread-safe
timecounter API without the need to specifically adjust it for all
driver use cases.
Also, as said above, at runtime it is not obvious in which context the
timecounter will be accessed: irq/softirq/user.
Let's keep it as is, and let the driver decide which locking scheme is
most suitable for it.
Thanks,
Saeed.
>> Thanks,
>>
>> tglx
>
>
* Re: Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time]
2017-11-30 21:03 ` Saeed Mahameed
@ 2017-12-01 7:42 ` Sagar Arun Kamble
-1 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-01 7:42 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Thomas Gleixner, John Stultz, Stephen Boyd, Chris Wilson,
linux-kernel, Linux Netdev List,
linux-rdma
On 12/1/2017 2:33 AM, Saeed Mahameed wrote:
> On Mon, Nov 27, 2017 at 2:05 AM, Sagar Arun Kamble
> <sagar.a.kamble@intel.com> wrote:
>>
>> On 11/24/2017 7:01 PM, Thomas Gleixner wrote:
>>> On Fri, 24 Nov 2017, Sagar Arun Kamble wrote:
>>>> On 11/24/2017 12:29 AM, Thomas Gleixner wrote:
>>>>> On Thu, 23 Nov 2017, Sagar Arun Kamble wrote:
>>>>>> We needed inputs on possible optimization that can be done to
>>>>>> timecounter/cyclecounter structures/usage.
>>>>>> This mail is in response to review of patch
>>>>>> https://patchwork.freedesktop.org/patch/188448/.
>>>>>>
>>>>>> As Chris's observation below, about dozen of timecounter users in the
>>>>>> kernel
>>>>>> have below structures
>>>>>> defined individually:
>>>>>>
>>>>>> spinlock_t lock;
>>>>>> struct cyclecounter cc;
>>>>>> struct timecounter tc;
>>>>>>
>>>>>> Can we move lock and cc to tc? That way it will be convenient.
>>>>>> Also it will allow unifying the locking/overflow watchdog handling
>>>>>> across
>>>>>> all
>>>>>> drivers.
>>>>> Looks like none of the timecounter usage sites has a real need to
>>>>> separate
>>>>> timecounter and cyclecounter.
>>>> Yes. Will share patch for this change.
>>>>
>>>>> The lock is a different question. The locking of the various drivers
>>>>> differs and I have no idea how you want to handle that. Just sticking
>>>>> the
>>>>> lock into the datastructure and then not making use of it in the
>>>>> timecounter code and leave it to the callsites does not make sense.
>>>> Most of the locks are held around timecounter_read. In some instances it
>>>> is held when cyclecounter is updated standalone or is updated along with
>>>> timecounter calls. Was thinking if we move the lock in timecounter
>>>> functions, drivers just have to do locking around its operations on
>>>> cyclecounter. But then another problem I see is there are variation of
>>>> locking calls like lock_irqsave, lock_bh, write_lock_irqsave (some using
>>>> rwlock_t). Should this all locking be left to driver only then?
>>> You could have the lock in the struct and protect the inner workings in
>>> the
>>> related core functions.
>>>
>>> That might remove locking requirements from some of the callers and the
>>> others still have their own thing around it.
>>
>> For drivers having static/fixed cyclecounter, we can rely only on lock
>> inside timecounter.
>> Most of the network drivers update cyclecounter at runtime and they will
>> have to rely on two locks if
>> we add one to timecounter. This may not be efficient for them. Also the lock
>> in timecounter has to be less restrictive (may be seqlock) I guess.
>>
>> Cc'd Mellanox list for inputs on this.
>>
>> I have started feeling that the current approach of drivers managing the
>> locks is the right one so better leave the
>> lock out of timecounter.
>>
> I agree here,
>
> In mlx5 we rely on our own read/write lock to serialize access to
> mlx5_clock struct (mlx5 timecounter and cyclecounter).
> the access is not as simple as
> lock()
> call time_counter_API
> unlock()
>
> Sometimes we also explicitly update/adjust timecycles counters with
> mlx5 specific calculations after we read the timecounter all inside
> our lock.
> e.g.
> @mlx5_ptp_adjfreq()
>
> write_lock_irqsave(&clock->lock, flags);
> timecounter_read(&clock->tc);
> clock->cycles.mult = neg_adj ? clock->nominal_c_mult - diff :
> clock->nominal_c_mult + diff;
> write_unlock_irqrestore(&clock->lock, flags);
>
> So i don't think it will be a simple task to have a generic thread
> safe timecounter API, without the need to specifically adjust it for
> all driver use-cases.
> Also as said above, in runtime it is not obvious in which context the
> timecounter will be accessed irq/soft irq/user.
>
> let's keep it as is, and let the driver decide which locking scheme is
> most suitable for it.
Yes. Thanks for your inputs Saeed.
Regards
Sagar
>
> Thanks,
> Saeed.
>
>>> Thanks,
>>>
>>> tglx
>>
* Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time
2017-11-15 12:25 ` Chris Wilson
2017-11-15 16:41 ` Sagar Arun Kamble
2017-11-23 7:34 ` Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time] Sagar Arun Kamble
@ 2017-12-05 13:58 ` Lionel Landwerlin
2017-12-06 8:17 ` Sagar Arun Kamble
2 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-05 13:58 UTC (permalink / raw)
To: Chris Wilson, Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 15/11/17 12:25, Chris Wilson wrote:
> Quoting Sagar Arun Kamble (2017-11-15 12:13:51)
>> #include <drm/drmP.h>
>> #include <drm/intel-gtt.h>
>> @@ -2149,6 +2150,14 @@ struct i915_perf_stream {
>> * @oa_config: The OA configuration used by the stream.
>> */
>> struct i915_oa_config *oa_config;
>> +
>> + /**
>> + * System time correlation variables.
>> + */
>> + struct cyclecounter cc;
>> + spinlock_t systime_lock;
>> + struct timespec64 start_systime;
>> + struct timecounter tc;
> This pattern is repeated a lot by struct timecounter users. (I'm still
> trying to understand why the common case is not catered for by a
> convenience timecounter api.)
>
>> };
>>
>> /**
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 00be015..72ddc34 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -192,6 +192,7 @@
>> */
>>
>> #include <linux/anon_inodes.h>
>> +#include <linux/clocksource.h>
>> #include <linux/sizes.h>
>> #include <linux/uuid.h>
>>
>> @@ -2391,6 +2392,56 @@ static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
>> }
>>
>> /**
>> + * i915_cyclecounter_read - read raw cycle/timestamp counter
>> + * @cc: cyclecounter structure
>> + */
>> +static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
>> +{
>> + struct i915_perf_stream *stream = container_of(cc, typeof(*stream), cc);
>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>> + u64 ts_count;
>> +
>> + intel_runtime_pm_get(dev_priv);
>> + ts_count = I915_READ64_2x32(GEN4_TIMESTAMP,
>> + GEN7_TIMESTAMP_UDW);
>> + intel_runtime_pm_put(dev_priv);
>> +
>> + return ts_count;
>> +}
>> +
>> +static void i915_perf_init_cyclecounter(struct i915_perf_stream *stream)
>> +{
>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>> + int cs_ts_freq = dev_priv->perf.oa.timestamp_frequency;
>> + struct cyclecounter *cc = &stream->cc;
>> + u32 maxsec;
>> +
>> + cc->read = i915_cyclecounter_read;
>> + cc->mask = CYCLECOUNTER_MASK(CS_TIMESTAMP_WIDTH(dev_priv));
>> + maxsec = cc->mask / cs_ts_freq;
>> +
>> + clocks_calc_mult_shift(&cc->mult, &cc->shift, cs_ts_freq,
>> + NSEC_PER_SEC, maxsec);
>> +}
>> +
>> +static void i915_perf_init_timecounter(struct i915_perf_stream *stream)
>> +{
>> +#define SYSTIME_START_OFFSET 350000 /* Counter read takes about 350us */
>> + unsigned long flags;
>> + u64 ns;
>> +
>> + i915_perf_init_cyclecounter(stream);
>> + spin_lock_init(&stream->systime_lock);
>> +
>> + getnstimeofday64(&stream->start_systime);
>> + ns = timespec64_to_ns(&stream->start_systime) + SYSTIME_START_OFFSET;
> Use ktime directly. Or else Arnd will be back with a patch to fix it.
> (All non-ktime interfaces are effectively deprecated; obsolete for
> drivers.)
>
>> + spin_lock_irqsave(&stream->systime_lock, flags);
>> + timecounter_init(&stream->tc, &stream->cc, ns);
>> + spin_unlock_irqrestore(&stream->systime_lock, flags);
>> +}
>> +
>> +/**
>> * i915_perf_enable_locked - handle `I915_PERF_IOCTL_ENABLE` ioctl
>> * @stream: A disabled i915 perf stream
>> *
>> @@ -2408,6 +2459,8 @@ static void i915_perf_enable_locked(struct i915_perf_stream *stream)
>> /* Allow stream->ops->enable() to refer to this */
>> stream->enabled = true;
>>
>> + i915_perf_init_timecounter(stream);
>> +
>> if (stream->ops->enable)
>> stream->ops->enable(stream);
>> }
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index cfdf4f8..e7e6966 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -8882,6 +8882,12 @@ enum skl_power_gate {
>>
>> /* Gen4+ Timestamp and Pipe Frame time stamp registers */
>> #define GEN4_TIMESTAMP _MMIO(0x2358)
>> +#define GEN7_TIMESTAMP_UDW _MMIO(0x235C)
>> +#define PRE_GEN7_TIMESTAMP_WIDTH 32
>> +#define GEN7_TIMESTAMP_WIDTH 36
>> +#define CS_TIMESTAMP_WIDTH(dev_priv) \
>> + (INTEL_GEN(dev_priv) < 7 ? PRE_GEN7_TIMESTAMP_WIDTH : \
>> + GEN7_TIMESTAMP_WIDTH)
> s/PRE_GEN7/GEN4/ would be consistent.
> If you really want to add support for earlier, I9XX_.
Why not use RING_TIMESTAMP() / RING_TIMESTAMP_UDW()?
They're already used for the reg_read ioctl.
>
> Ok. I can accept the justification, and we are not the only ones who do
> the cyclecounter -> timecounter correction like this.
> -Chris
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
` (4 preceding siblings ...)
2017-11-15 12:30 ` ✗ Fi.CI.BAT: warning for GPU/CPU timestamps correlation for relating OA samples with system events Patchwork
@ 2017-12-05 14:16 ` Lionel Landwerlin
2017-12-05 14:28 ` Robert Bragg
2017-12-06 20:02 ` Lionel Landwerlin
2017-12-07 0:48 ` Robert Bragg
7 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-05 14:16 UTC (permalink / raw)
To: Sagar Arun Kamble, intel-gfx
Hey Sagar,
Sorry for the delay looking into this series.
I've done some userspace/UI work in GPUTop to try to correlate perf
samples/tracepoints with i915 perf reports.
I wanted to avoid adding too much logic into the kernel and tried to
sample both CPU clocks & GPU timestamps from userspace.
So far that hasn't worked. People more knowledgeable than me would have
realized that the kernel can sneak work into syscalls.
The result is that two back-to-back syscalls from the same thread (one to
get the CPU clock, one for the GPU timestamp) lead to time differences of
anywhere from a few microseconds to, in some cases, close to
1 millisecond. So it's basically unworkable.
Anyway the UI work won't go to waste :)
I'm thinking to go with your approach.
From my experiments with GPUTop, it seems we might want to use a
different CPU clock source, though, or make it configurable.
The perf infrastructure allows you to choose which clock you want to use.
Since we want to avoid time adjustments on that clock (because we're
adding deltas), CLOCK_MONOTONIC_RAW would make the most sense.
I'll look at adding some tests for this too.
Thanks,
-
Lionel
On 15/11/17 12:13, Sagar Arun Kamble wrote:
> We can compute system time corresponding to GPU timestamp by taking a
> reference point (CPU monotonic time, GPU timestamp) and then adding
> delta time computed using timecounter/cyclecounter support in kernel.
> We have to configure cyclecounter with the GPU timestamp frequency.
> Earlier approach that was based on cross-timestamp is not needed. It
> was being used to approximate the frequency based on invalid assumptions
> (possibly drift was being seen in the time due to precision issue).
> The precision of time from GPU clocks is already in ns and timecounter
> takes care of it as verified over variable durations.
>
> This series adds base timecounter/cyclecounter changes and changes to
> get GPU and CPU timestamps in OA samples.
>
> Sagar Arun Kamble (1):
> drm/i915/perf: Add support to correlate GPU timestamp with system time
>
> Sourab Gupta (3):
> drm/i915/perf: Add support for collecting 64 bit timestamps with OA
> reports
> drm/i915/perf: Extract raw GPU timestamps from OA reports
> drm/i915/perf: Send system clock monotonic time in perf samples
>
> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
> drivers/gpu/drm/i915/i915_perf.c | 124 ++++++++++++++++++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_reg.h | 6 ++
> include/uapi/drm/i915_drm.h | 14 +++++
> 4 files changed, 154 insertions(+), 1 deletion(-)
>
* Re: [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples
2017-11-15 12:13 ` [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples Sagar Arun Kamble
2017-11-15 12:31 ` Chris Wilson
2017-11-15 17:54 ` Sagar Arun Kamble
@ 2017-12-05 14:22 ` Lionel Landwerlin
2017-12-06 8:31 ` Sagar Arun Kamble
2 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-05 14:22 UTC (permalink / raw)
To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 15/11/17 12:13, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> Currently, we have the ability to only forward the GPU timestamps in the
> samples (which are generated via OA reports). This limits the ability to
> correlate these samples with the system events.
>
> An ability is therefore needed to report timestamps in different clock
> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
> practical use to the userspace. This ability becomes important
> when we want to correlate/plot GPU events/samples with other system events
> on the same timeline (e.g. vblank events, or timestamps when work was
> submitted to kernel, etc.)
>
> The patch here proposes a mechanism to achieve this. The correlation
> between gpu time and system time is established using the timestamp clock
> associated with the command stream, abstracted as timecounter/cyclecounter
> to retrieve gpu/system time correlated values.
>
> v2: Added i915_driver_init_late() function to capture the new late init
> phase for perf (Chris)
>
> v3: Removed cross-timestamp changes.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
> include/uapi/drm/i915_drm.h | 7 +++++++
> 2 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3b721d7..94ee924 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -336,6 +336,7 @@
>
> #define SAMPLE_OA_REPORT BIT(0)
> #define SAMPLE_GPU_TS BIT(1)
> +#define SAMPLE_SYSTEM_TS BIT(2)
>
> /**
> * struct perf_open_properties - for validated properties given to open a stream
> @@ -622,6 +623,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
> struct drm_i915_perf_record_header header;
> u32 sample_flags = stream->sample_flags;
> u64 gpu_ts = 0;
> + u64 system_ts = 0;
>
> header.type = DRM_I915_PERF_RECORD_SAMPLE;
> header.pad = 0;
> @@ -647,6 +649,23 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>
> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
> return -EFAULT;
> + buf += I915_PERF_TS_SAMPLE_SIZE;
This is a ridiculous nit, but I think using sizeof(u64) would be more
readable than this I915_PERF_TS_SAMPLE_SIZE define.
> + }
> +
> + if (sample_flags & SAMPLE_SYSTEM_TS) {
> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> + /*
> + * XXX: timecounter_cyc2time considers time backwards if delta
> + * timestamp is more than half the max ns time covered by
> + * counter. It will be ~35min for 36 bit counter. If this much
> + * sampling duration is needed we will have to update tc->nsec
> + * by explicitly reading the timecounter (timecounter_read)
> + * before this duration.
> + */
> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
> +
> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
> + return -EFAULT;
> }
>
> (*offset) += header.size;
> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
> stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
> }
>
> + if (props->sample_flags & SAMPLE_SYSTEM_TS) {
> + stream->sample_flags |= SAMPLE_SYSTEM_TS;
> + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
> + }
> +
> dev_priv->perf.oa.oa_buffer.format_size = format_size;
> if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
> return -EINVAL;
> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
> case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
> props->sample_flags |= SAMPLE_GPU_TS;
> break;
> + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
> + props->sample_flags |= SAMPLE_SYSTEM_TS;
> + break;
> case DRM_I915_PERF_PROP_OA_METRICS_SET:
> if (value == 0) {
> DRM_DEBUG("Unknown OA metric set ID\n");
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0b9249e..283859c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
> DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>
> /**
> + * This property requests inclusion of CLOCK_MONOTONIC system time in
> + * the perf sample data.
> + */
> + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
> +
> + /**
> * The value specifies which set of OA unit metrics should be
> * be configured, defining the contents of any OA unit reports.
> */
> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
> *
> * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
> * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
> + * { u64 system_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
I would just add those u64 fields before oa_report.
Since the report sizes are dictated by the hardware, I'm afraid that one
day someone might come up with a non 64bit aligned format (however
unlikely).
And since the new properties mean you need to be aware of the potential
new offsets, it's not breaking existing userspace.
> * };
> */
> DRM_I915_PERF_RECORD_SAMPLE = 1,
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-05 14:16 ` [RFC 0/4] " Lionel Landwerlin
@ 2017-12-05 14:28 ` Robert Bragg
2017-12-05 14:37 ` Lionel Landwerlin
0 siblings, 1 reply; 45+ messages in thread
From: Robert Bragg @ 2017-12-05 14:28 UTC (permalink / raw)
To: Lionel Landwerlin; +Cc: Intel Graphics Development
On Tue, Dec 5, 2017 at 2:16 PM, Lionel Landwerlin <
lionel.g.landwerlin@intel.com> wrote:
> Hey Sagar,
>
> Sorry for the delay looking into this series.
> I've done some userspace/UI work in GPUTop to try to correlate perf
> samples/tracepoints with i915 perf reports.
>
> I wanted to avoid having to add too much logic into the kernel and tried
> to sample both cpu clocks & gpu timestamps from userspace.
> So far that's not working. People more knowledgeable than I would have
> realized that the kernel can sneak work into syscalls.
> So the result is that 2 syscalls (one to get the cpu clock, one for the gpu
> timestamp) back to back from the same thread lead to time differences of
> anywhere from a few microseconds to, in some cases, close to 1 millisecond.
> So it's basically unworkable.
> Anyway the UI work won't go to waste :)
>
> I'm thinking to go with your approach.
> From my experiment with gputop, it seems we might want to use a different
> cpu clock source though or make it configurable.
> The perf infrastructure allows you to choose what clock you want to use.
> Since we want to avoid time adjustments on that clock (because we're adding
> deltas), a clock monotonic raw would make most sense.
>
I would guess the most generally useful clock domain to correlate with the
largest number of interesting events would surely be CLOCK_MONOTONIC, not
_MONOTONIC_RAW.
E.g. here's some discussion around why vblank events use CLOCK_MONOTONIC:
https://lists.freedesktop.org/archives/dri-devel/2012-October/028878.html
Br,
- Robert
> I'll look at adding some tests for this too.
>
> Thanks,
>
> -
> Lionel
>
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>
>> We can compute system time corresponding to GPU timestamp by taking a
>> reference point (CPU monotonic time, GPU timestamp) and then adding
>> delta time computed using timecounter/cyclecounter support in kernel.
>> We have to configure cyclecounter with the GPU timestamp frequency.
>> Earlier approach that was based on cross-timestamp is not needed. It
>> was being used to approximate the frequency based on invalid assumptions
>> (possibly drift was being seen in the time due to precision issue).
>> The precision of time from GPU clocks is already in ns and timecounter
>> takes care of it as verified over variable durations.
>>
>> This series adds base timecounter/cyclecounter changes and changes to
>> get GPU and CPU timestamps in OA samples.
>>
>> Sagar Arun Kamble (1):
>> drm/i915/perf: Add support to correlate GPU timestamp with system time
>>
>> Sourab Gupta (3):
>> drm/i915/perf: Add support for collecting 64 bit timestamps with OA
>> reports
>> drm/i915/perf: Extract raw GPU timestamps from OA reports
>> drm/i915/perf: Send system clock monotonic time in perf samples
>>
>> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
>> drivers/gpu/drm/i915/i915_perf.c | 124 ++++++++++++++++++++++++++++++
>> ++++++++-
>> drivers/gpu/drm/i915/i915_reg.h | 6 ++
>> include/uapi/drm/i915_drm.h | 14 +++++
>> 4 files changed, 154 insertions(+), 1 deletion(-)
>>
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-05 14:28 ` Robert Bragg
@ 2017-12-05 14:37 ` Lionel Landwerlin
2017-12-06 9:01 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-05 14:37 UTC (permalink / raw)
To: Robert Bragg; +Cc: Intel Graphics Development
[-- Attachment #1.1: Type: text/plain, Size: 4004 bytes --]
On 05/12/17 14:28, Robert Bragg wrote:
>
>
> On Tue, Dec 5, 2017 at 2:16 PM, Lionel Landwerlin
> <lionel.g.landwerlin@intel.com <mailto:lionel.g.landwerlin@intel.com>>
> wrote:
>
> Hey Sagar,
>
> Sorry for the delay looking into this series.
> I've done some userspace/UI work in GPUTop to try to correlate
> perf samples/tracepoints with i915 perf reports.
>
> I wanted to avoid having to add too much logic into the kernel and
> tried to sample both cpu clocks & gpu timestamps from userspace.
> So far that's not working. People more knowledgable than I would
> have realized that the kernel can sneak in work into syscalls.
> So result is that 2 syscalls (one to get the cpu clock, one for
> the gpu timestamp) back to back from the same thread leads to time
> differences of anywhere from a few microseconds to in some cases
> close to 1millisecond. So it's basically unworkable.
> Anyway the UI work won't go to waste :)
>
> I'm thinking to go with your approach.
> From my experiment with gputop, it seems we might want to use a
> different cpu clock source though or make it configurable.
> The perf infrastructure allows you to choose what clock you want
> to use. Since we want to avoid time adjustments on that clock
> (because we're adding deltas), a clock monotonic raw would make
> most sense.
>
>
> I would guess the most generally useful clock domain to correlate with
> the largest number of interesting events would surely be
> CLOCK_MONOTONIC, not _MONOTONIC_RAW.
>
> E.g. here's some discussion around why vblank events use
> CLOCK_MONOTONIC:
> https://lists.freedesktop.org/archives/dri-devel/2012-October/028878.html
>
> Br,
> - Robert
Thanks Rob, then I guess making it configurable when opening the stream
would be the safest option.
>
>
> I'll look at adding some tests for this too.
>
> Thanks,
>
> -
> Lionel
>
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>
> We can compute system time corresponding to GPU timestamp by
> taking a
> reference point (CPU monotonic time, GPU timestamp) and then
> adding
> delta time computed using timecounter/cyclecounter support in
> kernel.
> We have to configure cyclecounter with the GPU timestamp
> frequency.
> Earlier approach that was based on cross-timestamp is not
> needed. It
> was being used to approximate the frequency based on invalid
> assumptions
> (possibly drift was being seen in the time due to precision
> issue).
> The precision of time from GPU clocks is already in ns and
> timecounter
> takes care of it as verified over variable durations.
>
> This series adds base timecounter/cyclecounter changes and
> changes to
> get GPU and CPU timestamps in OA samples.
>
> Sagar Arun Kamble (1):
> drm/i915/perf: Add support to correlate GPU timestamp with
> system time
>
> Sourab Gupta (3):
> drm/i915/perf: Add support for collecting 64 bit timestamps
> with OA
> reports
> drm/i915/perf: Extract raw GPU timestamps from OA reports
> drm/i915/perf: Send system clock monotonic time in perf samples
>
> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
> drivers/gpu/drm/i915/i915_perf.c | 124
> ++++++++++++++++++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_reg.h | 6 ++
> include/uapi/drm/i915_drm.h | 14 +++++
> 4 files changed, 154 insertions(+), 1 deletion(-)
>
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> <mailto:Intel-gfx@lists.freedesktop.org>
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>
>
>
* Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time
2017-12-05 13:58 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Lionel Landwerlin
@ 2017-12-06 8:17 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-06 8:17 UTC (permalink / raw)
To: Lionel Landwerlin, Chris Wilson, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 12/5/2017 7:28 PM, Lionel Landwerlin wrote:
> On 15/11/17 12:25, Chris Wilson wrote:
>> Quoting Sagar Arun Kamble (2017-11-15 12:13:51)
>>> #include <drm/drmP.h>
>>> #include <drm/intel-gtt.h>
>>> @@ -2149,6 +2150,14 @@ struct i915_perf_stream {
>>> * @oa_config: The OA configuration used by the stream.
>>> */
>>> struct i915_oa_config *oa_config;
>>> +
>>> + /**
>>> + * System time correlation variables.
>>> + */
>>> + struct cyclecounter cc;
>>> + spinlock_t systime_lock;
>>> + struct timespec64 start_systime;
>>> + struct timecounter tc;
>> This pattern is repeated a lot by struct timecounter users. (I'm still
>> trying to understand why the common case is not catered for by a
>> convenience timecounter api.)
>>
>>> };
>>> /**
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 00be015..72ddc34 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -192,6 +192,7 @@
>>> */
>>> #include <linux/anon_inodes.h>
>>> +#include <linux/clocksource.h>
>>> #include <linux/sizes.h>
>>> #include <linux/uuid.h>
>>> @@ -2391,6 +2392,56 @@ static unsigned int i915_perf_poll(struct
>>> file *file, poll_table *wait)
>>> }
>>> /**
>>> + * i915_cyclecounter_read - read raw cycle/timestamp counter
>>> + * @cc: cyclecounter structure
>>> + */
>>> +static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
>>> +{
>>> + struct i915_perf_stream *stream = container_of(cc,
>>> typeof(*stream), cc);
>>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>>> + u64 ts_count;
>>> +
>>> + intel_runtime_pm_get(dev_priv);
>>> + ts_count = I915_READ64_2x32(GEN4_TIMESTAMP,
>>> + GEN7_TIMESTAMP_UDW);
>>> + intel_runtime_pm_put(dev_priv);
>>> +
>>> + return ts_count;
>>> +}
>>> +
>>> +static void i915_perf_init_cyclecounter(struct i915_perf_stream
>>> *stream)
>>> +{
>>> + struct drm_i915_private *dev_priv = stream->dev_priv;
>>> + int cs_ts_freq = dev_priv->perf.oa.timestamp_frequency;
>>> + struct cyclecounter *cc = &stream->cc;
>>> + u32 maxsec;
>>> +
>>> + cc->read = i915_cyclecounter_read;
>>> + cc->mask = CYCLECOUNTER_MASK(CS_TIMESTAMP_WIDTH(dev_priv));
>>> + maxsec = cc->mask / cs_ts_freq;
>>> +
>>> + clocks_calc_mult_shift(&cc->mult, &cc->shift, cs_ts_freq,
>>> + NSEC_PER_SEC, maxsec);
>>> +}
>>> +
>>> +static void i915_perf_init_timecounter(struct i915_perf_stream
>>> *stream)
>>> +{
>>> +#define SYSTIME_START_OFFSET 350000 /* Counter read takes about
>>> 350us */
>>> + unsigned long flags;
>>> + u64 ns;
>>> +
>>> + i915_perf_init_cyclecounter(stream);
>>> + spin_lock_init(&stream->systime_lock);
>>> +
>>> + getnstimeofday64(&stream->start_systime);
>>> + ns = timespec64_to_ns(&stream->start_systime) +
>>> SYSTIME_START_OFFSET;
>> Use ktime directly. Or else Arnd will be back with a patch to fix it.
>> (All non-ktime interfaces are effectively deprecated; obsolete for
>> drivers.)
>>
>>> + spin_lock_irqsave(&stream->systime_lock, flags);
>>> + timecounter_init(&stream->tc, &stream->cc, ns);
>>> + spin_unlock_irqrestore(&stream->systime_lock, flags);
>>> +}
>>> +
>>> +/**
>>> * i915_perf_enable_locked - handle `I915_PERF_IOCTL_ENABLE` ioctl
>>> * @stream: A disabled i915 perf stream
>>> *
>>> @@ -2408,6 +2459,8 @@ static void i915_perf_enable_locked(struct
>>> i915_perf_stream *stream)
>>> /* Allow stream->ops->enable() to refer to this */
>>> stream->enabled = true;
>>> + i915_perf_init_timecounter(stream);
>>> +
>>> if (stream->ops->enable)
>>> stream->ops->enable(stream);
>>> }
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
>>> index cfdf4f8..e7e6966 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> @@ -8882,6 +8882,12 @@ enum skl_power_gate {
>>> /* Gen4+ Timestamp and Pipe Frame time stamp registers */
>>> #define GEN4_TIMESTAMP _MMIO(0x2358)
>>> +#define GEN7_TIMESTAMP_UDW _MMIO(0x235C)
>>> +#define PRE_GEN7_TIMESTAMP_WIDTH 32
>>> +#define GEN7_TIMESTAMP_WIDTH 36
>>> +#define CS_TIMESTAMP_WIDTH(dev_priv) \
>>> + (INTEL_GEN(dev_priv) < 7 ? PRE_GEN7_TIMESTAMP_WIDTH : \
>>> + GEN7_TIMESTAMP_WIDTH)
>> s/PRE_GEN7/GEN4/ would be consistent.
>> If you really want to add support for earlier, I9XX_.
>
> Why not use the RING_TIMESTAMP() RING_TIMESTAMP_UDW() ?
> There's already used for the reg_read ioctl.
Yes. We can use these.
>
>>
>> Ok. I can accept the justification, and we are not the only ones who do
>> the cyclecounter -> timecounter correction like this.
>> -Chris
>>
>
* Re: [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples
2017-12-05 14:22 ` Lionel Landwerlin
@ 2017-12-06 8:31 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-06 8:31 UTC (permalink / raw)
To: Lionel Landwerlin, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 12/5/2017 7:52 PM, Lionel Landwerlin wrote:
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>> From: Sourab Gupta <sourab.gupta@intel.com>
>>
>> Currently, we have the ability to only forward the GPU timestamps in the
>> samples (which are generated via OA reports). This limits the ability to
>> correlate these samples with the system events.
>>
>> An ability is therefore needed to report timestamps in different clock
>> domains, such as CLOCK_MONOTONIC, in the perf samples to be of more
>> practical use to the userspace. This ability becomes important
>> when we want to correlate/plot GPU events/samples with other system
>> events
>> on the same timeline (e.g. vblank events, or timestamps when work was
>> submitted to kernel, etc.)
>>
>> The patch here proposes a mechanism to achieve this. The correlation
>> between gpu time and system time is established using the timestamp
>> clock
>> associated with the command stream, abstracted as
>> timecounter/cyclecounter
>> to retrieve gpu/system time correlated values.
>>
>> v2: Added i915_driver_init_late() function to capture the new late init
>> phase for perf (Chris)
>>
>> v3: Removed cross-timestamp changes.
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sourab Gupta <sourab.gupta@intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++++++++
>> include/uapi/drm/i915_drm.h | 7 +++++++
>> 2 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index 3b721d7..94ee924 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -336,6 +336,7 @@
>> #define SAMPLE_OA_REPORT BIT(0)
>> #define SAMPLE_GPU_TS BIT(1)
>> +#define SAMPLE_SYSTEM_TS BIT(2)
>> /**
>> * struct perf_open_properties - for validated properties given to
>> open a stream
>> @@ -622,6 +623,7 @@ static int append_oa_sample(struct
>> i915_perf_stream *stream,
>> struct drm_i915_perf_record_header header;
>> u32 sample_flags = stream->sample_flags;
>> u64 gpu_ts = 0;
>> + u64 system_ts = 0;
>> header.type = DRM_I915_PERF_RECORD_SAMPLE;
>> header.pad = 0;
>> @@ -647,6 +649,23 @@ static int append_oa_sample(struct
>> i915_perf_stream *stream,
>> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>> return -EFAULT;
>> + buf += I915_PERF_TS_SAMPLE_SIZE;
>
> This is a ridiculous nit, but I think using sizeof(u64) would be more
> readable than this I915_PERF_TS_SAMPLE_SIZE define.
Will update. Thanks.
>
>> + }
>> +
>> + if (sample_flags & SAMPLE_SYSTEM_TS) {
>> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
>> + /*
>> + * XXX: timecounter_cyc2time considers time backwards if delta
>> + * timestamp is more than half the max ns time covered by
>> + * counter. It will be ~35min for 36 bit counter. If this much
>> + * sampling duration is needed we will have to update tc->nsec
>> + * by explicitly reading the timecounter (timecounter_read)
>> + * before this duration.
>> + */
>> + system_ts = timecounter_cyc2time(&stream->tc, gpu_ts);
>> +
>> + if (copy_to_user(buf, &system_ts, I915_PERF_TS_SAMPLE_SIZE))
>> + return -EFAULT;
>> }
>> (*offset) += header.size;
>> @@ -2137,6 +2156,11 @@ static int i915_oa_stream_init(struct
>> i915_perf_stream *stream,
>> stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>> }
>> + if (props->sample_flags & SAMPLE_SYSTEM_TS) {
>> + stream->sample_flags |= SAMPLE_SYSTEM_TS;
>> + stream->sample_size += I915_PERF_TS_SAMPLE_SIZE;
>> + }
>> +
>> dev_priv->perf.oa.oa_buffer.format_size = format_size;
>> if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
>> return -EINVAL;
>> @@ -2857,6 +2881,9 @@ static int read_properties_unlocked(struct
>> drm_i915_private *dev_priv,
>> case DRM_I915_PERF_PROP_SAMPLE_GPU_TS:
>> props->sample_flags |= SAMPLE_GPU_TS;
>> break;
>> + case DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS:
>> + props->sample_flags |= SAMPLE_SYSTEM_TS;
>> + break;
>> case DRM_I915_PERF_PROP_OA_METRICS_SET:
>> if (value == 0) {
>> DRM_DEBUG("Unknown OA metric set ID\n");
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 0b9249e..283859c 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1453,6 +1453,12 @@ enum drm_i915_perf_property_id {
>> DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>> /**
>> + * This property requests inclusion of CLOCK_MONOTONIC system
>> time in
>> + * the perf sample data.
>> + */
>> + DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS,
>> +
>> + /**
>> * The value specifies which set of OA unit metrics should be
>> * be configured, defining the contents of any OA unit reports.
>> */
>> @@ -1539,6 +1545,7 @@ enum drm_i915_perf_record_type {
>> *
>> * { u32 oa_report[]; } && DRM_I915_PERF_PROP_SAMPLE_OA
>> * { u64 gpu_timestamp; } && DRM_I915_PERF_PROP_SAMPLE_GPU_TS
>> + * { u64 system_timestamp; } &&
>> DRM_I915_PERF_PROP_SAMPLE_SYSTEM_TS
>
> I would just add those u64 fields before oa_report.
> Since the report sizes are dictated by the hardware, I'm afraid that
> one day someone might come up with a non 64bit aligned format (however
> unlikely).
> And since the new properties mean you need to be aware of the
> potential new offsets, it's not breaking existing userspace.
Sure. Will update.
>
>> * };
>> */
>> DRM_I915_PERF_RECORD_SAMPLE = 1,
>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-05 14:37 ` Lionel Landwerlin
@ 2017-12-06 9:01 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-06 9:01 UTC (permalink / raw)
To: Lionel Landwerlin, Robert Bragg; +Cc: Intel Graphics Development
On 12/5/2017 8:07 PM, Lionel Landwerlin wrote:
> On 05/12/17 14:28, Robert Bragg wrote:
>>
>>
>> On Tue, Dec 5, 2017 at 2:16 PM, Lionel Landwerlin
>> <lionel.g.landwerlin@intel.com
>> <mailto:lionel.g.landwerlin@intel.com>> wrote:
>>
>> Hey Sagar,
>>
>> Sorry for the delay looking into this series.
>> I've done some userspace/UI work in GPUTop to try to correlate
>> perf samples/tracepoints with i915 perf reports.
>>
>> I wanted to avoid having to add too much logic into the kernel
>> and tried to sample both cpu clocks & gpu timestamps from userspace.
>> So far that's not working. People more knowledgable than I would
>> have realized that the kernel can sneak in work into syscalls.
>> So result is that 2 syscalls (one to get the cpu clock, one for
>> the gpu timestamp) back to back from the same thread leads to
>> time differences of anywhere from a few microseconds to in some
>> cases close to 1millisecond. So it's basically unworkable.
>> Anyway the UI work won't go to waste :)
>>
>> I'm thinking to go with your approach.
>> From my experiment with gputop, it seems we might want to use a
>> different cpu clock source though or make it configurable.
>> The perf infrastructure allows you to choose what clock you want
>> to use. Since we want to avoid time adjustments on that clock
>> (because we're adding deltas), a clock monotonic raw would make
>> most sense.
>>
>>
>> I would guess the most generally useful clock domain to correlate
>> with the largest number of interesting events would surely be
>> CLOCK_MONOTONIC, not _MONOTONIC_RAW.
>>
>> E.g. here's some discussion around why vblank events use
>> CLOCK_MONOTONIC:
>> https://lists.freedesktop.org/archives/dri-devel/2012-October/028878.html
>>
>> Br,
>> - Robert
>
> Thanks Rob, then I guess making it configurable when opening the
> stream would be the safest option.
>
All timecounter users today rely on the monotonic clock (one can request the
corresponding real/wall time, boot time, or TAI time from it).
Agreed that we will need a configuration option for the clock to be used, like
trace_clock in ftrace.
>>
>>
>> I'll look at adding some tests for this too.
>>
>> Thanks,
>>
>> -
>> Lionel
>>
>> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>>
>> We can compute system time corresponding to GPU timestamp by
>> taking a
>> reference point (CPU monotonic time, GPU timestamp) and then
>> adding
>> delta time computed using timecounter/cyclecounter support in
>> kernel.
>> We have to configure cyclecounter with the GPU timestamp
>> frequency.
>> Earlier approach that was based on cross-timestamp is not
>> needed. It
>> was being used to approximate the frequency based on invalid
>> assumptions
>> (possibly drift was being seen in the time due to precision
>> issue).
>> The precision of time from GPU clocks is already in ns and
>> timecounter
>> takes care of it as verified over variable durations.
>>
>> This series adds base timecounter/cyclecounter changes and
>> changes to
>> get GPU and CPU timestamps in OA samples.
>>
>> Sagar Arun Kamble (1):
>> drm/i915/perf: Add support to correlate GPU timestamp with
>> system time
>>
>> Sourab Gupta (3):
>> drm/i915/perf: Add support for collecting 64 bit
>> timestamps with OA
>> reports
>> drm/i915/perf: Extract raw GPU timestamps from OA reports
>> drm/i915/perf: Send system clock monotonic time in perf
>> samples
>>
>> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
>> drivers/gpu/drm/i915/i915_perf.c | 124
>> ++++++++++++++++++++++++++++++++++++++-
>> drivers/gpu/drm/i915/i915_reg.h | 6 ++
>> include/uapi/drm/i915_drm.h | 14 +++++
>> 4 files changed, 154 insertions(+), 1 deletion(-)
>>
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> <mailto:Intel-gfx@lists.freedesktop.org>
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>> <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>
>>
>>
>
* Re: [RFC 2/4] drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports
2017-11-15 12:13 ` [RFC 2/4] drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports Sagar Arun Kamble
@ 2017-12-06 16:01 ` Lionel Landwerlin
2017-12-21 8:38 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-06 16:01 UTC (permalink / raw)
To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 15/11/17 12:13, Sagar Arun Kamble wrote:
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1447,6 +1447,12 @@ enum drm_i915_perf_property_id {
> DRM_I915_PERF_PROP_SAMPLE_OA,
>
> /**
> + * The value of this property set to 1 requests inclusion of GPU
> + * timestamp in the perf sample data.
> + */
> + DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
> +
> + /**
> * The value specifies which set of OA unit metrics should be
> * be configured, defining the contents of any OA unit reports.
> */
Inserting this anywhere other than at the end of this enum breaks the API/ABI.
This applies to other patches too.
* Re: [RFC 3/4] drm/i915/perf: Extract raw GPU timestamps from OA reports
2017-11-15 12:13 ` [RFC 3/4] drm/i915/perf: Extract raw GPU timestamps from " Sagar Arun Kamble
@ 2017-12-06 19:55 ` Lionel Landwerlin
2017-12-21 8:50 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-06 19:55 UTC (permalink / raw)
To: Sagar Arun Kamble, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 15/11/17 12:13, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
>
> The OA reports contain the least significant 32 bits of the gpu timestamp.
> This patch enables retrieval of the timestamp field from OA reports, to
> forward as 64 bit raw gpu timestamps in the perf samples.
>
> v2: Rebase w.r.t new timecounter support.
>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sourab Gupta <sourab.gupta@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 ++
> drivers/gpu/drm/i915/i915_perf.c | 26 +++++++++++++++++++++++++-
> 2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e08bc85..5534cd2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2151,6 +2151,8 @@ struct i915_perf_stream {
> */
> struct i915_oa_config *oa_config;
>
> + u64 last_gpu_ts;
> +
> /**
> * System time correlation variables.
> */
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index f7e748c..3b721d7 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -575,6 +575,26 @@ static int append_oa_status(struct i915_perf_stream *stream,
> }
>
> /**
> + * get_gpu_ts_from_oa_report - Retrieve absolute gpu timestamp from OA report
> + *
> + * Note: We are assuming that we're updating last_gpu_ts frequently enough so
> + * that it's never possible to see multiple overflows before we compare
> + * sample_ts to last_gpu_ts. Since this is significantly large duration
> + * (~6min for 80ns ts base), we can safely assume so.
> + */
> +static u64 get_gpu_ts_from_oa_report(struct i915_perf_stream *stream,
> + const u8 *report)
> +{
> + u32 sample_ts = *(u32 *)(report + 4);
> + u32 delta;
> +
> + delta = sample_ts - (u32)stream->last_gpu_ts;
> + stream->last_gpu_ts += delta;
> +
> + return stream->last_gpu_ts;
> +}
> +
> +/**
> * append_oa_sample - Copies single OA report into userspace read() buffer.
> * @stream: An i915-perf stream opened for OA metrics
> * @buf: destination buffer given by userspace
> @@ -622,7 +642,9 @@ static int append_oa_sample(struct i915_perf_stream *stream,
> }
>
> if (sample_flags & SAMPLE_GPU_TS) {
> - /* Timestamp to be populated from OA report */
> + /* Timestamp populated from OA report */
> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
> +
> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
> return -EFAULT;
> }
I think everything above this line should be merged into patch 2.
It's better to have a single functional patch.
> @@ -2421,6 +2443,8 @@ static u64 i915_cyclecounter_read(const struct cyclecounter *cc)
> GEN7_TIMESTAMP_UDW);
> intel_runtime_pm_put(dev_priv);
>
> + stream->last_gpu_ts = ts_count;
This doesn't look right. You're already adding a delta in
get_gpu_ts_from_oa_report().
This will produce incorrect timestamps. Since at the moment we won't
allow opening without PROP_SAMPLE_OA, I would just drop this line.
> +
> return ts_count;
> }
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
` (5 preceding siblings ...)
2017-12-05 14:16 ` [RFC 0/4] " Lionel Landwerlin
@ 2017-12-06 20:02 ` Lionel Landwerlin
2017-12-22 5:15 ` Sagar Arun Kamble
2017-12-07 0:48 ` Robert Bragg
7 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-06 20:02 UTC (permalink / raw)
To: Sagar Arun Kamble, intel-gfx
I've put together some trivial IGT tests:
https://github.com/djdeath/intel-gpu-tools/commits/wip/djdeath/cpu-timestamps
With a few changes I pointed out in the review:
https://github.com/djdeath/linux/commit/d0e4cf4d3f464491b4ffe97d112284d1ce73656d
Put together it seems to work relatively well.
There is still a small drift happening between the two timestamps. I've
noticed that over 160ms of OA reports, there is an accumulated difference of
~35us between the GPU and CPU timestamps.
I may be doing something wrong with the scaling in the tests, or maybe
there is an issue in the kernel, or both.
I'll build the GPUTop parts and see if the results make sense.
Thanks!,
-
Lionel
On 15/11/17 12:13, Sagar Arun Kamble wrote:
> We can compute system time corresponding to GPU timestamp by taking a
> reference point (CPU monotonic time, GPU timestamp) and then adding
> delta time computed using timecounter/cyclecounter support in kernel.
> We have to configure cyclecounter with the GPU timestamp frequency.
> Earlier approach that was based on cross-timestamp is not needed. It
> was being used to approximate the frequency based on invalid assumptions
> (possibly drift was being seen in the time due to precision issue).
> The precision of time from GPU clocks is already in ns and timecounter
> takes care of it as verified over variable durations.
>
> This series adds base timecounter/cyclecounter changes and changes to
> get GPU and CPU timestamps in OA samples.
>
> Sagar Arun Kamble (1):
> drm/i915/perf: Add support to correlate GPU timestamp with system time
>
> Sourab Gupta (3):
> drm/i915/perf: Add support for collecting 64 bit timestamps with OA
> reports
> drm/i915/perf: Extract raw GPU timestamps from OA reports
> drm/i915/perf: Send system clock monotonic time in perf samples
>
> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
> drivers/gpu/drm/i915/i915_perf.c | 124 ++++++++++++++++++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_reg.h | 6 ++
> include/uapi/drm/i915_drm.h | 14 +++++
> 4 files changed, 154 insertions(+), 1 deletion(-)
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
` (6 preceding siblings ...)
2017-12-06 20:02 ` Lionel Landwerlin
@ 2017-12-07 0:48 ` Robert Bragg
2017-12-07 0:57 ` Robert Bragg
2017-12-22 6:06 ` Sagar Arun Kamble
7 siblings, 2 replies; 45+ messages in thread
From: Robert Bragg @ 2017-12-07 0:48 UTC (permalink / raw)
To: Sagar Arun Kamble; +Cc: Intel Graphics Development
On Wed, Nov 15, 2017 at 12:13 PM, Sagar Arun Kamble <
sagar.a.kamble@intel.com> wrote:
> We can compute system time corresponding to GPU timestamp by taking a
> reference point (CPU monotonic time, GPU timestamp) and then adding
> delta time computed using timecounter/cyclecounter support in kernel.
> We have to configure cyclecounter with the GPU timestamp frequency.
> Earlier approach that was based on cross-timestamp is not needed. It
> was being used to approximate the frequency based on invalid assumptions
> (possibly drift was being seen in the time due to precision issue).
> The precision of time from GPU clocks is already in ns and timecounter
> takes care of it as verified over variable durations.
>
Hi Sagar,
I have some doubts about this analysis...
The intent behind Sourab's original approach was to be able to determine
the frequency at runtime empirically because the constants we have aren't
particularly accurate. Without a perfectly stable frequency that's known
very precisely then an interpolated correlation will inevitably drift. I
think the nature of HW implies we can't expect to have either of those.
Then the general idea had been to try and use existing kernel
infrastructure for a problem which isn't unique to GPU clocks.
That's not to say that a more limited, simpler solution based on frequent
re-correlation wouldn't be more than welcome if tracking an accurate
frequency is too awkward for now, but I think some things need to be
considered in that case:
- It would be good to quantify the kind of drift seen in practice to know
how frequently it's necessary to re-synchronize. It sounds like you've done
this ("as verified over variable durations") so I'm curious what kind of
drift you saw. I'd imagine you would see a significant drift over, say, one
second and it might not take much longer for the drift to even become
clearly visible to the user when plotted in a UI. For reference I once
updated the arb_timer_query test in piglit to give some insight into this
drift (
https://lists.freedesktop.org/archives/piglit/2016-September/020673.html)
and at least from what I wrote back then it looks like I was seeing a drift
of a few milliseconds per second on SKL. I vaguely recall it being much
worse given the frequency constants we had for Haswell.
- What guarantees will be promised about monotonicity of correlated system
timestamps? Will it be guaranteed that sequential reports must have
monotonically increasing timestamps? That might be fiddly if the gpu +
system clock are periodically re-correlated, so it might be good to be
clear in documentation that the correlation is best-effort only for the
sake of implementation simplicity. That would still be good for a lot of
UIs I think and there's freedom for the driver to start simple and
potentially improve later by measuring the gpu clock frequency empirically.
Currently only one correlated pair of timestamps is read when enabling the
stream and so a relatively long time is likely to pass before the stream is
disabled (seconds or minutes while a user is running a system profiler). It
seems very likely to me that these clocks are going to drift significantly
without introducing some form of periodic re-synchronization based on some
understanding of the drift that's seen.
Br,
- Robert
> This series adds base timecounter/cyclecounter changes and changes to
> get GPU and CPU timestamps in OA samples.
>
> Sagar Arun Kamble (1):
> drm/i915/perf: Add support to correlate GPU timestamp with system time
>
> Sourab Gupta (3):
> drm/i915/perf: Add support for collecting 64 bit timestamps with OA
> reports
> drm/i915/perf: Extract raw GPU timestamps from OA reports
> drm/i915/perf: Send system clock monotonic time in perf samples
>
> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
> drivers/gpu/drm/i915/i915_perf.c | 124 ++++++++++++++++++++++++++++++
> ++++++++-
> drivers/gpu/drm/i915/i915_reg.h | 6 ++
> include/uapi/drm/i915_drm.h | 14 +++++
> 4 files changed, 154 insertions(+), 1 deletion(-)
>
> --
> 1.9.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-07 0:48 ` Robert Bragg
@ 2017-12-07 0:57 ` Robert Bragg
2017-12-21 12:59 ` Lionel Landwerlin
2017-12-22 6:06 ` Sagar Arun Kamble
1 sibling, 1 reply; 45+ messages in thread
From: Robert Bragg @ 2017-12-07 0:57 UTC (permalink / raw)
To: Sagar Arun Kamble; +Cc: Intel Graphics Development
On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <robert@sixbynine.org> wrote:
>
> at least from what I wrote back then it looks like I was seeing a drift of
> a few milliseconds per second on SKL. I vaguely recall it being much worse
> given the frequency constants we had for Haswell.
>
Sorry I didn't actually re-read my own message properly before referencing
it :) Apparently the 2ms per second drift was for Haswell, so presumably
not quite so bad for SKL.
- Robert
* Re: [RFC 2/4] drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports
2017-12-06 16:01 ` Lionel Landwerlin
@ 2017-12-21 8:38 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-21 8:38 UTC (permalink / raw)
To: Lionel Landwerlin, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 12/6/2017 9:31 PM, Lionel Landwerlin wrote:
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -1447,6 +1447,12 @@ enum drm_i915_perf_property_id {
>> DRM_I915_PERF_PROP_SAMPLE_OA,
>> /**
>> + * The value of this property set to 1 requests inclusion of GPU
>> + * timestamp in the perf sample data.
>> + */
>> + DRM_I915_PERF_PROP_SAMPLE_GPU_TS,
>> +
>> + /**
>> * The value specifies which set of OA unit metrics should be
>> * be configured, defining the contents of any OA unit reports.
>> */
>
>
> Inserting this, not at the end of this enum break API/ABI.
> This applies to other patches too.
Thanks. Will update.
* Re: [RFC 3/4] drm/i915/perf: Extract raw GPU timestamps from OA reports
2017-12-06 19:55 ` Lionel Landwerlin
@ 2017-12-21 8:50 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-21 8:50 UTC (permalink / raw)
To: Lionel Landwerlin, intel-gfx; +Cc: Sourab Gupta, Matthew Auld
On 12/7/2017 1:25 AM, Lionel Landwerlin wrote:
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>> From: Sourab Gupta <sourab.gupta@intel.com>
>>
>> The OA reports contain the least significant 32 bits of the gpu
>> timestamp.
>> This patch enables retrieval of the timestamp field from OA reports, to
>> forward as 64 bit raw gpu timestamps in the perf samples.
>>
>> v2: Rebase w.r.t new timecounter support.
>>
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Sourab Gupta <sourab.gupta@intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_drv.h | 2 ++
>> drivers/gpu/drm/i915/i915_perf.c | 26 +++++++++++++++++++++++++-
>> 2 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index e08bc85..5534cd2 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2151,6 +2151,8 @@ struct i915_perf_stream {
>> */
>> struct i915_oa_config *oa_config;
>> + u64 last_gpu_ts;
>> +
>> /**
>> * System time correlation variables.
>> */
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index f7e748c..3b721d7 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -575,6 +575,26 @@ static int append_oa_status(struct
>> i915_perf_stream *stream,
>> }
>> /**
>> + * get_gpu_ts_from_oa_report - Retrieve absolute gpu timestamp from
>> OA report
>> + *
>> + * Note: We are assuming that we're updating last_gpu_ts frequently
>> enough so
>> + * that it's never possible to see multiple overflows before we compare
>> + * sample_ts to last_gpu_ts. Since this is significantly large duration
>> + * (~6min for 80ns ts base), we can safely assume so.
>> + */
>> +static u64 get_gpu_ts_from_oa_report(struct i915_perf_stream *stream,
>> + const u8 *report)
>> +{
>> + u32 sample_ts = *(u32 *)(report + 4);
>> + u32 delta;
>> +
>> + delta = sample_ts - (u32)stream->last_gpu_ts;
>> + stream->last_gpu_ts += delta;
>> +
>> + return stream->last_gpu_ts;
>> +}
>> +
>> +/**
>> * append_oa_sample - Copies single OA report into userspace read()
>> buffer.
>> * @stream: An i915-perf stream opened for OA metrics
>> * @buf: destination buffer given by userspace
>> @@ -622,7 +642,9 @@ static int append_oa_sample(struct
>> i915_perf_stream *stream,
>> }
>> if (sample_flags & SAMPLE_GPU_TS) {
>> - /* Timestamp to be populated from OA report */
>> + /* Timestamp populated from OA report */
>> + gpu_ts = get_gpu_ts_from_oa_report(stream, report);
>> +
>> if (copy_to_user(buf, &gpu_ts, I915_PERF_TS_SAMPLE_SIZE))
>> return -EFAULT;
>> }
>
> I think everything above this line should be merged int patch 2.
> It's better to have a single functional patch.
Yes. Will merge.
>
>> @@ -2421,6 +2443,8 @@ static u64 i915_cyclecounter_read(const struct
>> cyclecounter *cc)
>> GEN7_TIMESTAMP_UDW);
>> intel_runtime_pm_put(dev_priv);
>> + stream->last_gpu_ts = ts_count;
>
> This doesn't look right. You're already adding a delta in
> get_gpu_ts_from_oa_report().
> This will produce incorrect timestamps. Since at the moment we won't
> allow opening without PROP_SAMPLE_OA, I would just drop this line.
>
Yes, makes sense. Will remove.
>> +
>> return ts_count;
>> }
>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-07 0:57 ` Robert Bragg
@ 2017-12-21 12:59 ` Lionel Landwerlin
2017-12-22 9:30 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-21 12:59 UTC (permalink / raw)
To: Robert Bragg, Sagar Arun Kamble; +Cc: Intel Graphics Development
Some more findings I made while playing with this series & GPUTop.
Turns out the 2ms-per-second drift is due to the timecounter. Adding the
delta this way:
https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607
Eliminates the drift. Timelines of perf i915 tracepoints & OA reports
now make a lot more sense.
There is still the issue that reading the CPU clock & the RCS timestamp
is inherently not atomic. So there is a delta there.
I think we should add a new i915 perf record type to express the delta
that we measure this way:
https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475
So that userspace knows there might be a global offset between the 2
times and is able to present it.
Measurements on my KBL system were on the order of a few tens of
microseconds (~30us).
I guess we might be able to set up the correlation point better (masking
interrupts?) to reduce the delta.
Thanks,
-
Lionel
On 07/12/17 00:57, Robert Bragg wrote:
>
>
> On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <robert@sixbynine.org
> <mailto:robert@sixbynine.org>> wrote:
>
>
> at least from what I wrote back then it looks like I was seeing a
> drift of a few milliseconds per second on SKL. I vaguely recall it
> being much worse given the frequency constants we had for Haswell.
>
>
> Sorry I didn't actually re-read my own message properly before
> referencing it :) Apparently the 2ms per second drift was for Haswell,
> so presumably not quite so bad for SKL.
>
> - Robert
>
>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-06 20:02 ` Lionel Landwerlin
@ 2017-12-22 5:15 ` Sagar Arun Kamble
2017-12-22 5:26 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-22 5:15 UTC (permalink / raw)
To: Lionel Landwerlin, intel-gfx
On 12/7/2017 1:32 AM, Lionel Landwerlin wrote:
> I've put together some trival IGT tests :
> https://github.com/djdeath/intel-gpu-tools/commits/wip/djdeath/cpu-timestamps
> With a few changes which I pointed in the review :
> https://github.com/djdeath/linux/commit/d0e4cf4d3f464491b4ffe97d112284d1ce73656d
>
> Put together it seems to work relatively well.
> There is still a small drift happening between the 2 timestamps. I've
> noticed over a 160ms of OA reports, there is a accumulated difference
> of ~35us between the GPU timestamp and cpu timestamps.
> I may be doing something wrong with the scaling in the tests, or maybe
> there is an issue in the kernel, or both.
I went through the testcase. The scaled_gpu_delta calculation is the same
as what the timecounter does in the kernel to derive the system time
corresponding to a GPU timestamp, hence we don't see much delta between
these two times/time deltas.
Ideally we should test the system time delta by sampling the system time
and GPU timestamp atomically (which isn't feasible unless we make some
precise adjustments).
I have attempted to check these two times/deltas by sampling them in
debugfs, reading over variable periods manually, and checking the delta.
https://github.com/sakamble/i915-timestamp-support/commit/03be3056752d7b05a02cd01f5c20b3fcfcf18395
It shows that the delta is less than 10us in most cases over 30min, and
occasionally around 20-30us. Again, this delta might also be due to the
initial time setting.
The timecounter's initial (system time, GPU clock) pair needs to be highly
accurate; I have currently applied a 350us start offset for it.
I think the GPU timestamp clock is highly stable, as seen in my testing on
SKL. I think this clock is also used for calculations in the GuC, so it
would be good to get confirmation about the stability of this clock from
them and the HW team.
systemd-udevd-169 [001] ....   3.035812: i915_driver_load: sys start time: 1512308011156099790
systemd-udevd-169 [001] d...   3.036012: i915_cyclecounter_read: 52025098974
cat-1654          [001] ....  52.407957: i915_cyclecounter_read: 52617562292
cat-1654          [001] ....  52.407958: i915_timestamp_info: sys time: 1512308060527894638
cat-1654          [001] ....  52.407958: i915_timestamp_info: ts: 52617562292 device time: 1512308060528043050
cat-1684          [001] .... 177.239733: i915_cyclecounter_read: 54115543581
cat-1684          [001] .... 177.239736: i915_timestamp_info: sys time: 1512308185359666602
cat-1684          [001] .... 177.239737: i915_timestamp_info: ts: 54115543581 device time: 1512308185359817372
cat-1693          [001] .... 329.820374: i915_cyclecounter_read: 55946511277
cat-1693          [001] .... 329.820377: i915_timestamp_info: sys time: 1512308337940301732
cat-1693          [001] .... 329.820378: i915_timestamp_info: ts: 55946511277 device time: 1512308337940458996

<delta between system time delta and gpu time delta for above two samples (177, 329) = 6494ns>

cat-1702          [001] .... 506.980313: i915_cyclecounter_read: 58072430542
cat-1702          [001] .... 506.980315: i915_timestamp_info: sys time: 1512308515100233102
cat-1702          [001] .... 506.980317: i915_timestamp_info: ts: 58072430542 device time: 1512308515100398084

<delta between system time delta and gpu time delta for above two samples (329, 506) = 6494ns>
>
> I'll build the GPUTop parts and see if the results make sense.
>
> Thanks!,
>
> -
> Lionel
>
> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>> We can compute system time corresponding to GPU timestamp by taking a
>> reference point (CPU monotonic time, GPU timestamp) and then adding
>> delta time computed using timecounter/cyclecounter support in kernel.
>> We have to configure cyclecounter with the GPU timestamp frequency.
>> Earlier approach that was based on cross-timestamp is not needed. It
>> was being used to approximate the frequency based on invalid assumptions
>> (possibly drift was being seen in the time due to precision issue).
>> The precision of time from GPU clocks is already in ns and timecounter
>> takes care of it as verified over variable durations.
>>
>> This series adds base timecounter/cyclecounter changes and changes to
>> get GPU and CPU timestamps in OA samples.
>>
>> Sagar Arun Kamble (1):
>> drm/i915/perf: Add support to correlate GPU timestamp with system
>> time
>>
>> Sourab Gupta (3):
>> drm/i915/perf: Add support for collecting 64 bit timestamps with OA
>> reports
>> drm/i915/perf: Extract raw GPU timestamps from OA reports
>> drm/i915/perf: Send system clock monotonic time in perf samples
>>
>> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
>> drivers/gpu/drm/i915/i915_perf.c | 124
>> ++++++++++++++++++++++++++++++++++++++-
>> drivers/gpu/drm/i915/i915_reg.h | 6 ++
>> include/uapi/drm/i915_drm.h | 14 +++++
>> 4 files changed, 154 insertions(+), 1 deletion(-)
>>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-22 5:15 ` Sagar Arun Kamble
@ 2017-12-22 5:26 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-22 5:26 UTC (permalink / raw)
To: Lionel Landwerlin, intel-gfx
On 12/22/2017 10:45 AM, Sagar Arun Kamble wrote:
>
>
>
> On 12/7/2017 1:32 AM, Lionel Landwerlin wrote:
>> I've put together some trival IGT tests :
>> https://github.com/djdeath/intel-gpu-tools/commits/wip/djdeath/cpu-timestamps
>> With a few changes which I pointed in the review :
>> https://github.com/djdeath/linux/commit/d0e4cf4d3f464491b4ffe97d112284d1ce73656d
>>
>> Put together it seems to work relatively well.
>> There is still a small drift happening between the 2 timestamps. I've
>> noticed over a 160ms of OA reports, there is a accumulated difference
>> of ~35us between the GPU timestamp and cpu timestamps.
>> I may be doing something wrong with the scaling in the tests, or
>> maybe there is an issue in the kernel, or both.
> Went through the testcase. scaled_gpu_delta calculation is same as
> what timecounter does in kernel for calculating system_time corresponding
> to gpu timestamp hence we don't see much of delta between these two
> times/time deltas.
> Ideally we should be testing the system time delta by sampling system
> time and gpu timestamp atomically (which isn't feasible unless we do
> some precise adjustments)
> I have attempted to check these two times/delta by sampling them in
> debugfs and reading over variable periods manually and checking the delta.
> https://github.com/sakamble/i915-timestamp-support/commit/03be3056752d7b05a02cd01f5c20b3fcfcf18395
>
> It is showing that delta is less than 10us in most cases for 30min and
> occasionally around 20-30us. Again this delta might be because of
> initial time setting as well.
> timecounter initializing system time and gpu clocks as pair should be
> highly accurate for which I have currently taken 350us start offset.
>
> I think gpu timestamp clock is highly stable as seen in my testing on
> SKL. I think this clock is used for calculations in GuC too so will
> be good to get confirmation about the stability of this clock from
> them and HW team.
>
> systemd-udevd-169 [001] .... 3.035812: i915_driver_load: sys
> start time: 1512308011156099790
>
> systemd-udevd-169 [001] d... 3.036012:
> i915_cyclecounter_read: 52025098974
>
> cat-1654 [001] .... 52.407957:
> i915_cyclecounter_read: 52617562292
>
> cat-1654 [001] .... 52.407958: i915_timestamp_info:
> sys time: 1512308060527894638
>
> cat-1654 [001] .... 52.407958: i915_timestamp_info:
> ts: 52617562292 device time: 1512308060528043050
>
> cat-1684 [001] .... 177.239733:
> i915_cyclecounter_read: 54115543581
>
> cat-1684 [001] .... 177.239736: i915_timestamp_info:
> sys time: 1512308185359666602
>
> cat-1684 [001] .... 177.239737: i915_timestamp_info:
> ts: 54115543581 device time: 1512308185359817372
>
> cat-1693 [001] .... 329.820374:
> i915_cyclecounter_read: 55946511277
>
> cat-1693 [001] .... 329.820377: i915_timestamp_info:
> sys time: 1512308337940301732
>
> cat-1693 [001] .... 329.820378: i915_timestamp_info:
> ts: 55946511277 device time: 1512308337940458996
>
>
> <delta between system time delta and gpu time delta for above two
> samples (177, 329) = 6494ns>
>
>
> cat-1702 [001] .... 506.980313:
> i915_cyclecounter_read: 58072430542
>
> cat-1702 [001] .... 506.980315: i915_timestamp_info:
> sys time: 1512308515100233102
>
> cat-1702 [001] .... 506.980317: i915_timestamp_info:
> ts: 58072430542 device time: 1512308515100398084
>
> <delta between system time delta and gpu time delta for above two
> samples (329, 506) = 6494ns>
>
Fixing typo here:
<delta between system time delta and gpu time delta for above two
samples (329, 506) = 7718ns>
>>
>> I'll build the GPUTop parts and see if the results make sense.
>>
>> Thanks!,
>>
>> -
>> Lionel
>>
>> On 15/11/17 12:13, Sagar Arun Kamble wrote:
>>> We can compute system time corresponding to GPU timestamp by taking a
>>> reference point (CPU monotonic time, GPU timestamp) and then adding
>>> delta time computed using timecounter/cyclecounter support in kernel.
>>> We have to configure cyclecounter with the GPU timestamp frequency.
>>> Earlier approach that was based on cross-timestamp is not needed. It
>>> was being used to approximate the frequency based on invalid
>>> assumptions
>>> (possibly drift was being seen in the time due to precision issue).
>>> The precision of time from GPU clocks is already in ns and timecounter
>>> takes care of it as verified over variable durations.
>>>
>>> This series adds base timecounter/cyclecounter changes and changes to
>>> get GPU and CPU timestamps in OA samples.
>>>
>>> Sagar Arun Kamble (1):
>>> drm/i915/perf: Add support to correlate GPU timestamp with system
>>> time
>>>
>>> Sourab Gupta (3):
>>> drm/i915/perf: Add support for collecting 64 bit timestamps with OA
>>> reports
>>> drm/i915/perf: Extract raw GPU timestamps from OA reports
>>> drm/i915/perf: Send system clock monotonic time in perf samples
>>>
>>> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
>>> drivers/gpu/drm/i915/i915_perf.c | 124
>>> ++++++++++++++++++++++++++++++++++++++-
>>> drivers/gpu/drm/i915/i915_reg.h | 6 ++
>>> include/uapi/drm/i915_drm.h | 14 +++++
>>> 4 files changed, 154 insertions(+), 1 deletion(-)
>>>
>>
>
>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-07 0:48 ` Robert Bragg
2017-12-07 0:57 ` Robert Bragg
@ 2017-12-22 6:06 ` Sagar Arun Kamble
1 sibling, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-22 6:06 UTC (permalink / raw)
To: Robert Bragg; +Cc: Intel Graphics Development
On 12/7/2017 6:18 AM, Robert Bragg wrote:
>
>
> On Wed, Nov 15, 2017 at 12:13 PM, Sagar Arun Kamble
> <sagar.a.kamble@intel.com <mailto:sagar.a.kamble@intel.com>> wrote:
>
> We can compute system time corresponding to GPU timestamp by taking a
> reference point (CPU monotonic time, GPU timestamp) and then adding
> delta time computed using timecounter/cyclecounter support in kernel.
> We have to configure cyclecounter with the GPU timestamp frequency.
> Earlier approach that was based on cross-timestamp is not needed. It
> was being used to approximate the frequency based on invalid
> assumptions
> (possibly drift was being seen in the time due to precision issue).
> The precision of time from GPU clocks is already in ns and timecounter
> takes care of it as verified over variable durations.
>
>
> Hi Sagar,
>
> I have some doubts about this analysis...
>
> The intent behind Sourab's original approach was to be able to
> determine the frequency at runtime empirically because the constants
> we have aren't particularly accurate. Without a perfectly stable
> frequency that's known very precisely then an interpolated correlation
> will inevitably drift. I think the nature of HW implies we can't
> expect to have either of those. Then the general idea had been to try
> and use existing kernel infrastructure for a problem which isn't
> unique to GPU clocks.
Hi Robert,
Testing on SKL shows the timestamps drift by only about 10us for
sampling done in the kernel over about 30 minutes.
Verified with the changes from
https://github.com/sakamble/i915-timestamp-support/commits/drm-tip
Note that since we are sampling the counter through debugfs, the read
overhead is likely adding to the drift, so some adjustment might be
needed. But with OA reports we just have to worry about the initial
timecounter setup, where we need an accurate pair of system time and
GPU timestamp clock counts.
I think the timestamp clock is highly stable and we don't need logic to
determine the frequency at runtime. Will try to get confirmation from
the HW team as well.
If we need to determine the frequency, Sourab's approach needs to be
refined:
1. It can be implemented entirely in i915, because what we need is a
pair of system time and GPU clock readings over different durations.
2. The crosstimestamp framework usage in that approach is incorrect:
ideally we should be passing the ART counter and the GPU counter, but
instead we were hacking it to pass the TSC clock.
Quoting Thomas from https://patchwork.freedesktop.org/patch/144298/
get_device_system_crosststamp() is for timestamps taken via a clock
which is directly correlated with the timekeeper clocksource.
ART and TSC are correlated via: TSC = (ART * scale) + offset
get_device_system_crosststamp() invokes the device function which reads
ART, which is converted to CLOCK_MONOTONIC_RAW by the conversion above,
and then uses interpolation to map the CLOCK_MONOTONIC_RAW value to
CLOCK_MONOTONIC.
The device function does not know anything about TSC. All it knows about
is ART.
I am not aware whether the GPU timestamp clock is correlated with the
TSC the way ART is for Ethernet drivers, nor whether i915 can read ART
the way the Ethernet drivers do.
3. I have seen precision issues in the calculations in
i915_perf_clock_sync_work, and in the usage of MONO_RAW, which might
jump time.
>
> That's not to say that a more limited, simpler solution based on
> frequent re-correlation wouldn't be more than welcome if tracking an
> accurate frequency is too awkward for now
Adjusting timecounter time can be another option if we confirm that GPU
timestamp frequency is stable.
> , but I think some things need to be considered in that case:
>
> - It would be good to quantify the kind of drift seen in practice to
> know how frequently it's necessary to re-synchronize. It sounds like
> you've done this ("as verified over variable durations") so I'm
> curious what kind of drift you saw. I'd imagine you would see a
> significant drift over, say, one second and it might not take much
> longer for the drift to even become clearly visible to the user when
> plotted in a UI. For reference I once updated the arb_timer_query test
> in piglit to give some insight into this drift
> (https://lists.freedesktop.org/archives/piglit/2016-September/020673.html)
> and at least from what I wrote back then it looks like I was seeing a
> drift of a few milliseconds per second on SKL. I vaguely recall it
> being much worse given the frequency constants we had for Haswell.
>
On SKL I have seen very small drift of less than 10us over a period of
30 minutes.
Verified with changes from
https://github.com/sakamble/i915-timestamp-support/commits/drm-tip
The 36-bit counter will overflow in about 95 minutes at 12 MHz, and the
timecounter framework considers a counter value whose delta from the
timecounter init exceeds half the total time covered by the counter to
be time in the past, so the current approach works for less than ~45
minutes.
We will need to add overflow watchdog support, like other drivers do,
which simply reinitializes the timecounter before the 45-minute mark.
> - What guarantees will be promised about monotonicity of correlated
> system timestamps? Will it be guaranteed that sequential reports must
> have monotonically increasing timestamps? That might be fiddly if the
> gpu + system clock are periodically re-correlated, so it might be good
> to be clear in documentation that the correlation is best-effort only
> for the sake of implementation simplicity. That would still be good
> for a lot of UIs I think and there's freedom for the driver to start
> simple and potentially improve later by measuring the gpu clock
> frequency empirically.
>
If we rely on the timecounter alone, without re-correlation to
determine the frequency, setting the init time from the MONOTONIC
system clock should take care of the monotonicity of the correlated
times.
Regards,
Sagar
> Currently only one correlated pair of timestamps is read when enabling
> the stream and so a relatively long time is likely to pass before the
> stream is disabled (seconds, minutes while a user is running a system
> profiler) . It seems very likely to me that these clocks are going to
> drift significantly without introducing some form of periodic
> re-synchronization based on some understanding of the drift that's seen.
>
> Br,
> - Robert
>
>
>
> This series adds base timecounter/cyclecounter changes and changes to
> get GPU and CPU timestamps in OA samples.
>
> Sagar Arun Kamble (1):
> drm/i915/perf: Add support to correlate GPU timestamp with
> system time
>
> Sourab Gupta (3):
> drm/i915/perf: Add support for collecting 64 bit timestamps with OA
> reports
> drm/i915/perf: Extract raw GPU timestamps from OA reports
> drm/i915/perf: Send system clock monotonic time in perf samples
>
> drivers/gpu/drm/i915/i915_drv.h | 11 ++++
> drivers/gpu/drm/i915/i915_perf.c | 124
> ++++++++++++++++++++++++++++++++++++++-
> drivers/gpu/drm/i915/i915_reg.h | 6 ++
> include/uapi/drm/i915_drm.h | 14 +++++
> 4 files changed, 154 insertions(+), 1 deletion(-)
>
> --
> 1.9.1
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-21 12:59 ` Lionel Landwerlin
@ 2017-12-22 9:30 ` Sagar Arun Kamble
2017-12-22 10:16 ` Lionel Landwerlin
0 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-22 9:30 UTC (permalink / raw)
To: Lionel Landwerlin, Robert Bragg; +Cc: Intel Graphics Development
On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:
> Some more findings I made while playing with this series & GPUTop.
> Turns out the 2ms drift per second is due to timecounter. Adding the
> delta this way :
>
> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607
>
> Eliminates the drift.
I see two imp. changes 1. approximation of start time during
init_timecounter 2. overflow handling in delta accumulation.
With these incorporated, I guess timecounter should also work in same
fashion.
> Timelines of perf i915 tracepoints & OA reports now make a lot more sense.
>
> There is still the issue that reading the CPU clock & the RCS
> timestamp is inherently not atomic. So there is a delta there.
> I think we should add a new i915 perf record type to express the delta
> that we measure this way :
>
> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475
>
> So that userspace knows there might be a global offset between the 2
> times and is able to present it.
agree on this. Delta ns1-ns0 can be interpreted as max drift.
> Measurement on my KBL system were in the order of a few microseconds
> (~30us).
> I guess we might be able to setup the correlation point better
> (masking interruption?) to reduce the delta.
already using spin_lock. Do you mean NMI?
>
> Thanks,
>
> -
> Lionel
>
>
> On 07/12/17 00:57, Robert Bragg wrote:
>>
>>
>> On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <robert@sixbynine.org
>> <mailto:robert@sixbynine.org>> wrote:
>>
>>
>> at least from what I wrote back then it looks like I was seeing a
>> drift of a few milliseconds per second on SKL. I vaguely recall
>> it being much worse given the frequency constants we had for Haswell.
>>
>>
>> Sorry I didn't actually re-read my own message properly before
>> referencing it :) Apparently the 2ms per second drift was for
>> Haswell, so presumably not quite so bad for SKL.
>>
>> - Robert
>>
>>
>>
>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-22 9:30 ` Sagar Arun Kamble
@ 2017-12-22 10:16 ` Lionel Landwerlin
2017-12-26 5:32 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-22 10:16 UTC (permalink / raw)
To: Sagar Arun Kamble, Robert Bragg; +Cc: Intel Graphics Development
On 22/12/17 09:30, Sagar Arun Kamble wrote:
>
>
>
> On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:
>> Some more findings I made while playing with this series & GPUTop.
>> Turns out the 2ms drift per second is due to timecounter. Adding the
>> delta this way :
>>
>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607
>>
>> Eliminates the drift.
> I see two imp. changes 1. approximation of start time during
> init_timecounter 2. overflow handling in delta accumulation.
> With these incorporated, I guess timecounter should also work in same
> fashion.
I think the arithmetic in timecounter is inherently lossy and that's why
we're seeing a drift. Could we be using it wrong?
In the patch above, I think there is still a drift because of the
potential fractional part loss at every delta we add.
But it should only be a fraction of a nanosecond multiplied by the
number of reports over a period of time.
With a report every 1us, that should still be much less than 1ms of
drift over 1s.
We can probably do better by always computing the clock using the entire
delta rather than the accumulated delta.
>> Timelines of perf i915 tracepoints & OA reports now make a lot more
>> sense.
>>
>> There is still the issue that reading the CPU clock & the RCS
>> timestamp is inherently not atomic. So there is a delta there.
>> I think we should add a new i915 perf record type to express the
>> delta that we measure this way :
>>
>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475
>>
>> So that userspace knows there might be a global offset between the 2
>> times and is able to present it.
> agree on this. Delta ns1-ns0 can be interpreted as max drift.
>> Measurement on my KBL system were in the order of a few microseconds
>> (~30us).
>> I guess we might be able to setup the correlation point better
>> (masking interruption?) to reduce the delta.
> already using spin_lock. Do you mean NMI?
I don't actually know much on this point.
if spin_lock is the best we can do, then that's it :)
>>
>> Thanks,
>>
>> -
>> Lionel
>>
>>
>> On 07/12/17 00:57, Robert Bragg wrote:
>>>
>>>
>>> On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <robert@sixbynine.org
>>> <mailto:robert@sixbynine.org>> wrote:
>>>
>>>
>>> at least from what I wrote back then it looks like I was seeing
>>> a drift of a few milliseconds per second on SKL. I vaguely
>>> recall it being much worse given the frequency constants we had
>>> for Haswell.
>>>
>>>
>>> Sorry I didn't actually re-read my own message properly before
>>> referencing it :) Apparently the 2ms per second drift was for
>>> Haswell, so presumably not quite so bad for SKL.
>>>
>>> - Robert
>>>
>>>
>>>
>>
>>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-22 10:16 ` Lionel Landwerlin
@ 2017-12-26 5:32 ` Sagar Arun Kamble
2017-12-28 17:13 ` Lionel Landwerlin
0 siblings, 1 reply; 45+ messages in thread
From: Sagar Arun Kamble @ 2017-12-26 5:32 UTC (permalink / raw)
To: Lionel Landwerlin, Robert Bragg; +Cc: Intel Graphics Development
On 12/22/2017 3:46 PM, Lionel Landwerlin wrote:
> On 22/12/17 09:30, Sagar Arun Kamble wrote:
>>
>>
>>
>> On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:
>>> Some more findings I made while playing with this series & GPUTop.
>>> Turns out the 2ms drift per second is due to timecounter. Adding the
>>> delta this way :
>>>
>>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607
>>>
>>> Eliminates the drift.
>> I see two imp. changes 1. approximation of start time during
>> init_timecounter 2. overflow handling in delta accumulation.
>> With these incorporated, I guess timecounter should also work in same
>> fashion.
>
> I think the arithmetic in timecounter is inherently lossy and that's
> why we're seeing a drift.
Could you share details about the platform and scenario in which the
2ms drift per second is being seen with timecounter?
I did not observe this on SKL.
> Could we be using it wrong?
>
If we use the two changes highlighted above with timecounter, maybe we
will get the same results as your current implementation.
> In the patch above, I think there is still a drift because of the
> potential fractional part loss at every delta we add.
> But it should only be a fraction of a nanosecond multiplied by the
> number of reports over a period of time.
> With a report every 1us, that should still be much less than a 1ms of
> drift over 1s.
>
The timecounter interface takes care of the fractional parts, so that
should help us.
We can either go with timecounter or our own implementation, provided
the conversions are precise.
> We can probably do better by always computing the clock using the
> entire delta rather than the accumulated delta.
>
The issue is that the clock cycles reported in the OA report are the 32
LSBs of the GPU timestamp, whereas the counter is 36 bits. Hence we
will need to accumulate the delta. Of course, there is an assumption
that two reports can't be spaced more than a count value of 0xffffffff
apart.
>>> Timelines of perf i915 tracepoints & OA reports now make a lot more
>>> sense.
>>>
>>> There is still the issue that reading the CPU clock & the RCS
>>> timestamp is inherently not atomic. So there is a delta there.
>>> I think we should add a new i915 perf record type to express the
>>> delta that we measure this way :
>>>
>>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475
>>>
>>> So that userspace knows there might be a global offset between the 2
>>> times and is able to present it.
>> agree on this. Delta ns1-ns0 can be interpreted as max drift.
>>> Measurement on my KBL system were in the order of a few microseconds
>>> (~30us).
>>> I guess we might be able to setup the correlation point better
>>> (masking interruption?) to reduce the delta.
>> already using spin_lock. Do you mean NMI?
>
> I don't actually know much on this point.
> if spin_lock is the best we can do, then that's it :)
>
>>>
>>> Thanks,
>>>
>>> -
>>> Lionel
>>>
>>>
>>> On 07/12/17 00:57, Robert Bragg wrote:
>>>>
>>>>
>>>> On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg <robert@sixbynine.org
>>>> <mailto:robert@sixbynine.org>> wrote:
>>>>
>>>>
>>>> at least from what I wrote back then it looks like I was seeing
>>>> a drift of a few milliseconds per second on SKL. I vaguely
>>>> recall it being much worse given the frequency constants we had
>>>> for Haswell.
>>>>
>>>>
>>>> Sorry I didn't actually re-read my own message properly before
>>>> referencing it :) Apparently the 2ms per second drift was for
>>>> Haswell, so presumably not quite so bad for SKL.
>>>>
>>>> - Robert
>>>>
>>>>
>>>>
>>>
>>>
>>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-26 5:32 ` Sagar Arun Kamble
@ 2017-12-28 17:13 ` Lionel Landwerlin
2018-01-03 5:38 ` Sagar Arun Kamble
0 siblings, 1 reply; 45+ messages in thread
From: Lionel Landwerlin @ 2017-12-28 17:13 UTC (permalink / raw)
To: Sagar Arun Kamble, Robert Bragg; +Cc: Intel Graphics Development
On 26/12/17 05:32, Sagar Arun Kamble wrote:
>
>
>
> On 12/22/2017 3:46 PM, Lionel Landwerlin wrote:
>> On 22/12/17 09:30, Sagar Arun Kamble wrote:
>>>
>>>
>>>
>>> On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:
>>>> Some more findings I made while playing with this series & GPUTop.
>>>> Turns out the 2ms drift per second is due to timecounter. Adding
>>>> the delta this way :
>>>>
>>>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607
>>>>
>>>> Eliminates the drift.
>>> I see two imp. changes 1. approximation of start time during
>>> init_timecounter 2. overflow handling in delta accumulation.
>>> With these incorporated, I guess timecounter should also work in
>>> same fashion.
>>
>> I think the arithmetic in timecounter is inherently lossy and that's
>> why we're seeing a drift.
> Could you share details about platform, scenario in which 2ms drift
> per second is being seen with timecounter.
> I did not observe this on SKL.
The 2ms drift was on SKL GT4.
With the patch above, I'm seeing only a ~40us drift over ~7 seconds of
recording both perf tracepoints & i915 perf reports.
I'm tracking the kernel tracepoints adding gem requests and the i915
perf reports.
Here a screenshot at the beginning of the 7s recording :
https://i.imgur.com/hnexgjQ.png (you can see the gem request add before
the work starts in the i915 perf reports).
At the end of the recording, the gem requests appear later than the work
in the i915 perf report : https://i.imgur.com/oCd0C9T.png
I'll try to prepare some IGT tests that show the drift using perf & i915
perf, so we can run those on different platforms.
I tend to mostly test on a SKL GT4 & KBL GT2, but BXT definitely needs
more attention...
>> Could we be using it wrong?
>>
> if we use two changes highlighted above with timecounter maybe we will
> get same results as your current implementation.
>> In the patch above, I think there is still a drift because of the
>> potential fractional part loss at every delta we add.
>> But it should only be a fraction of a nanosecond multiplied by the
>> number of reports over a period of time.
>> With a report every 1us, that should still be much less than a 1ms of
>> drift over 1s.
>>
> timecounter interface takes care of fractional parts so that should
> help us.
> we can either go with timecounter or our own implementation provided
> conversions are precise.
Looking at clocks_calc_mult_shift(), it seems clear to me that there is
less precision when using timecounter :
/*
* Find the conversion shift/mult pair which has the best
* accuracy and fits the maxsec conversion range:
*/
On the other hand, there is a performance penalty for doing a div64 for
every report.
>> We can probably do better by always computing the clock using the
>> entire delta rather than the accumulated delta.
>>
> issue is that the reported clock cycles in the OA report is 32bits LSB
> of GPU TS whereas counter is 36bits. Hence we will need to
> accumulate the delta. ofc there is assumption that two reports can't
> be spaced with count value of 0xffffffff apart.
You're right :)
I thought maybe we could do this :
Look at the opening period parameter; if it's longer than the period of
the timestamp wrapping, make sure we schedule some work on the kernel
context to generate a context switch report (like at least once every
6 minutes on gen9).
>>>> Timelines of perf i915 tracepoints & OA reports now make a lot more
>>>> sense.
>>>>
>>>> There is still the issue that reading the CPU clock & the RCS
>>>> timestamp is inherently not atomic. So there is a delta there.
>>>> I think we should add a new i915 perf record type to express the
>>>> delta that we measure this way :
>>>>
>>>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475
>>>>
>>>> So that userspace knows there might be a global offset between the
>>>> 2 times and is able to present it.
>>> agree on this. Delta ns1-ns0 can be interpreted as max drift.
>>>> Measurement on my KBL system were in the order of a few
>>>> microseconds (~30us).
>>>> I guess we might be able to setup the correlation point better
>>>> (masking interruption?) to reduce the delta.
>>> already using spin_lock. Do you mean NMI?
>>
>> I don't actually know much on this point.
>> if spin_lock is the best we can do, then that's it :)
>>
>>>>
>>>> Thanks,
>>>>
>>>> -
>>>> Lionel
>>>>
>>>>
>>>> On 07/12/17 00:57, Robert Bragg wrote:
>>>>>
>>>>>
>>>>> On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg
>>>>> <robert@sixbynine.org <mailto:robert@sixbynine.org>> wrote:
>>>>>
>>>>>
>>>>> at least from what I wrote back then it looks like I was
>>>>> seeing a drift of a few milliseconds per second on SKL. I
>>>>> vaguely recall it being much worse given the frequency
>>>>> constants we had for Haswell.
>>>>>
>>>>>
>>>>> Sorry I didn't actually re-read my own message properly before
>>>>> referencing it :) Apparently the 2ms per second drift was for
>>>>> Haswell, so presumably not quite so bad for SKL.
>>>>>
>>>>> - Robert
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
* Re: [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events
2017-12-28 17:13 ` Lionel Landwerlin
@ 2018-01-03 5:38 ` Sagar Arun Kamble
0 siblings, 0 replies; 45+ messages in thread
From: Sagar Arun Kamble @ 2018-01-03 5:38 UTC (permalink / raw)
To: Lionel Landwerlin, Robert Bragg; +Cc: Intel Graphics Development
On 12/28/2017 10:43 PM, Lionel Landwerlin wrote:
> On 26/12/17 05:32, Sagar Arun Kamble wrote:
>>
>>
>>
>> On 12/22/2017 3:46 PM, Lionel Landwerlin wrote:
>>> On 22/12/17 09:30, Sagar Arun Kamble wrote:
>>>>
>>>>
>>>>
>>>> On 12/21/2017 6:29 PM, Lionel Landwerlin wrote:
>>>>> Some more findings I made while playing with this series & GPUTop.
>>>>> Turns out the 2ms drift per second is due to timecounter. Adding
>>>>> the delta this way :
>>>>>
>>>>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R607
>>>>>
>>>>> Eliminates the drift.
>>>> I see two imp. changes 1. approximation of start time during
>>>> init_timecounter 2. overflow handling in delta accumulation.
>>>> With these incorporated, I guess timecounter should also work in
>>>> same fashion.
>>>
>>> I think the arithmetic in timecounter is inherently lossy and that's
>>> why we're seeing a drift.
>> Could you share details about platform, scenario in which 2ms drift
>> per second is being seen with timecounter.
>> I did not observe this on SKL.
>
> The 2ms drift was on SKL GT4.
>
I have checked the timecounter arithmetic. The accuracy is very high
(on the order of micro-nanoseconds per tick).
I had interpreted the maxsec parameter of clocks_calc_mult_shift() as
the total time covered by the counter, but it actually controls the
conversion accuracy. Since we want the best possible accuracy, passing
zero should be preferred there.
For instance, below are the mult/shift values and the time reported for
10 minutes with those values for SKL GT2 at 12 MHz.
As you can see, the drift due to the calculation is only about 2us. We
should check on SKL GT4 by passing zero to clocks_calc_mult_shift and
with the newly added delta handling in timecounter. 2ms is a huge drift
and it is very unlikely to be related to these calculations.
maxsec | mult       | shift | tick time (mult/2^shift) | total time (10*60*12000000 * tick time) | drift due to calculation
     0 | 2796202667 |    25 | 83.33333334326           | 600,000,000,071.525                     | 71ns
  3000 |  174762667 |    21 | 83.33333349227           | 600,000,001,144.409                     | 1144ns
  6000 |   87381333 |    20 | 83.33333301544           | 599,999,997,711.181                     | 2289ns
> With the patch above, I'm seeing only a ~40us drift over ~7seconds of
> recording both perf tracepoints & i915 perf reports.
> I'm tracking the kernel tracepoints adding gem requests and the i915
> perf reports.
> Here a screenshot at the beginning of the 7s recording :
> https://i.imgur.com/hnexgjQ.png (you can see the gem request add
> before the work starts in the i915 perf reports).
> At the end of the recording, the gem requests appear later than the
> work in the i915 perf report : https://i.imgur.com/oCd0C9T.png
>
Looks like we need an error margin of only a few microseconds :)
> I'll try to prepare some IGT tests that show the drift using perf &
> i915 perf, so we can run those on different platforms.
> I tend to mostly test on a SKL GT4 & KBL GT2, but BXT definitely needs
> more attention...
>
>>> Could we be using it wrong?
>>>
>> if we use two changes highlighted above with timecounter maybe we
>> will get same results as your current implementation.
>>> In the patch above, I think there is still a drift because of the
>>> potential fractional part loss at every delta we add.
>>> But it should only be a fraction of a nanosecond multiplied by the
>>> number of reports over a period of time.
>>> With a report every 1us, that should still be much less than a 1ms
>>> of drift over 1s.
>>>
>> timecounter interface takes care of fractional parts so that should
>> help us.
>> we can either go with timecounter or our own implementation provided
>> conversions are precise.
>
> Looking at clocks_calc_mult_shift(), it seems clear to me that there
> is less precision when using timecounter :
>
> /*
> * Find the conversion shift/mult pair which has the best
> * accuracy and fits the maxsec conversion range:
> */
>
We can improve upon this by passing zero as maxsec to
clocks_calc_mult_shift.
> On the other hand, there is a performance penalty for doing a div64
> for every report.
>
>>> We can probably do better by always computing the clock using the
>>> entire delta rather than the accumulated delta.
>>>
>> issue is that the reported clock cycles in the OA report is 32bits
>> LSB of GPU TS whereas counter is 36bits. Hence we will need to
>> accumulate the delta. ofc there is assumption that two reports can't
>> be spaced with count value of 0xffffffff apart.
>
> You're right :)
> I thought maybe we could do this :
>
> Look at the opening period parameter; if it's longer than the period
> of the timestamp wrapping, make sure we schedule some work on the
> kernel context to generate a context switch report (like at least
> once every 6 minutes on gen9).
>
Looks fine to me.
>>>>> Timelines of perf i915 tracepoints & OA reports now make a lot
>>>>> more sense.
>>>>>
>>>>> There is still the issue that reading the CPU clock & the RCS
>>>>> timestamp is inherently not atomic. So there is a delta there.
>>>>> I think we should add a new i915 perf record type to express the
>>>>> delta that we measure this way :
>>>>>
>>>>> https://github.com/djdeath/linux/commit/7b002cb360483e331053aec0f98433a5bd5c5c3f#diff-9b74bd0cfaa90b601d80713c7bd56be4R2475
>>>>>
>>>>> So that userspace knows there might be a global offset between the
>>>>> 2 times and is able to present it.
>>>> agree on this. Delta ns1-ns0 can be interpreted as max drift.
>>>>> Measurement on my KBL system were in the order of a few
>>>>> microseconds (~30us).
>>>>> I guess we might be able to setup the correlation point better
>>>>> (masking interruption?) to reduce the delta.
>>>> already using spin_lock. Do you mean NMI?
>>>
>>> I don't actually know much on this point.
>>> if spin_lock is the best we can do, then that's it :)
>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -
>>>>> Lionel
>>>>>
>>>>>
>>>>> On 07/12/17 00:57, Robert Bragg wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 7, 2017 at 12:48 AM, Robert Bragg
>>>>>> <robert@sixbynine.org <mailto:robert@sixbynine.org>> wrote:
>>>>>>
>>>>>>
>>>>>> at least from what I wrote back then it looks like I was
>>>>>> seeing a drift of a few milliseconds per second on SKL. I
>>>>>> vaguely recall it being much worse given the frequency
>>>>>> constants we had for Haswell.
>>>>>>
>>>>>>
>>>>>> Sorry I didn't actually re-read my own message properly before
>>>>>> referencing it :) Apparently the 2ms per second drift was for
>>>>>> Haswell, so presumably not quite so bad for SKL.
>>>>>>
>>>>>> - Robert
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
end of thread, other threads:[~2018-01-03 5:38 UTC | newest]
Thread overview: 45+ messages
-- links below jump to the message on this page --
2017-11-15 12:13 [RFC 0/4] GPU/CPU timestamps correlation for relating OA samples with system events Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Sagar Arun Kamble
2017-11-15 12:25 ` Chris Wilson
2017-11-15 16:41 ` Sagar Arun Kamble
2017-11-23 7:34 ` Creating cyclecounter and lock member in timecounter structure [ Was Re: [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time] Sagar Arun Kamble
2017-11-23 18:59 ` Thomas Gleixner
2017-11-24 9:06 ` Sagar Arun Kamble
2017-11-24 13:31 ` Thomas Gleixner
2017-11-27 10:05 ` Sagar Arun Kamble
2017-11-27 10:05 ` Sagar Arun Kamble
[not found] ` <63a2a495-1bdb-5d47-1202-9b538e9601d8-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-11-30 21:03 ` Saeed Mahameed
2017-11-30 21:03 ` Saeed Mahameed
[not found] ` <CALzJLG9JXOnr3EQ2zLcmwKx8S9-CGONRRBSAd9XwHdemEgOn2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-01 7:42 ` Sagar Arun Kamble
2017-12-01 7:42 ` Sagar Arun Kamble
2017-12-05 13:58 ` [RFC 1/4] drm/i915/perf: Add support to correlate GPU timestamp with system time Lionel Landwerlin
2017-12-06 8:17 ` Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 2/4] drm/i915/perf: Add support for collecting 64 bit timestamps with OA reports Sagar Arun Kamble
2017-12-06 16:01 ` Lionel Landwerlin
2017-12-21 8:38 ` Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 3/4] drm/i915/perf: Extract raw GPU timestamps from " Sagar Arun Kamble
2017-12-06 19:55 ` Lionel Landwerlin
2017-12-21 8:50 ` Sagar Arun Kamble
2017-11-15 12:13 ` [RFC 4/4] drm/i915/perf: Send system clock monotonic time in perf samples Sagar Arun Kamble
2017-11-15 12:31 ` Chris Wilson
2017-11-15 16:51 ` Sagar Arun Kamble
2017-11-15 17:54 ` Sagar Arun Kamble
2017-12-05 14:22 ` Lionel Landwerlin
2017-12-06 8:31 ` Sagar Arun Kamble
2017-11-15 12:30 ` ✗ Fi.CI.BAT: warning for GPU/CPU timestamps correlation for relating OA samples with system events Patchwork
2017-12-05 14:16 ` [RFC 0/4] " Lionel Landwerlin
2017-12-05 14:28 ` Robert Bragg
2017-12-05 14:37 ` Lionel Landwerlin
2017-12-06 9:01 ` Sagar Arun Kamble
2017-12-06 20:02 ` Lionel Landwerlin
2017-12-22 5:15 ` Sagar Arun Kamble
2017-12-22 5:26 ` Sagar Arun Kamble
2017-12-07 0:48 ` Robert Bragg
2017-12-07 0:57 ` Robert Bragg
2017-12-21 12:59 ` Lionel Landwerlin
2017-12-22 9:30 ` Sagar Arun Kamble
2017-12-22 10:16 ` Lionel Landwerlin
2017-12-26 5:32 ` Sagar Arun Kamble
2017-12-28 17:13 ` Lionel Landwerlin
2018-01-03 5:38 ` Sagar Arun Kamble
2017-12-22 6:06 ` Sagar Arun Kamble