From mboxrd@z Thu Jan 1 00:00:00 1970 From: Robert Bragg Subject: Re: [PATCH v3 6/7] drm/i915/perf: per-gen timebase for checking sample freq Date: Wed, 5 Apr 2017 18:59:52 +0100 Message-ID: References: <20170405162320.30094-1-robert@sixbynine.org> <20170405162320.30094-7-robert@sixbynine.org> <20170405170617.GS30290@intel.com> <493dc7dc-c7b4-6cf4-780c-7fe7364f1b1d@linux.intel.com> <20170405172615.GU30290@intel.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1060128264==" Return-path: Received: from mail-qk0-x242.google.com (mail-qk0-x242.google.com [IPv6:2607:f8b0:400d:c09::242]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4CBA66E86C for ; Wed, 5 Apr 2017 18:00:14 +0000 (UTC) Received: by mail-qk0-x242.google.com with SMTP id v75so2715249qkb.3 for ; Wed, 05 Apr 2017 11:00:14 -0700 (PDT) In-Reply-To: <20170405172615.GU30290@intel.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: =?UTF-8?B?VmlsbGUgU3lyasOkbMOk?= Cc: Lionel Landwerlin , Intel Graphics Development List-Id: intel-gfx@lists.freedesktop.org --===============1060128264== Content-Type: multipart/alternative; boundary=94eb2c08b83800148c054c6f2aae --94eb2c08b83800148c054c6f2aae Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Wed, Apr 5, 2017 at 6:26 PM, Ville Syrj=C3=A4l=C3=A4 wrote: > On Wed, Apr 05, 2017 at 06:17:36PM +0100, Lionel Landwerlin wrote: > > On 05/04/17 18:06, Ville Syrj=C3=A4l=C3=A4 wrote: > > > On Wed, Apr 05, 2017 at 05:23:19PM +0100, Robert Bragg wrote: > > >> An oa_exponent_to_ns() utility and per-gen timebase constants where > > >> recently removed when updating the tail pointer race condition WA, a= nd > > >> this restores those so we can update the _PROP_OA_EXPONENT validatio= n > > >> done in read_properties_unlocked() to not assume we have a 12.5KHz > > >> timebase as we did for Haswell. > > >> > > >> Signed-off-by: Robert Bragg > > >> Cc: Lionel Landwerlin > > >> --- > > >> drivers/gpu/drm/i915/i915_drv.h | 1 + > > >> drivers/gpu/drm/i915/i915_perf.c | 21 +++++++++++++++------ > > >> 2 files changed, 16 insertions(+), 6 deletions(-) > > >> > > >> diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > > >> index 3a22b6fd0ee6..48b07d706f06 100644 > > >> --- a/drivers/gpu/drm/i915/i915_drv.h > > >> +++ b/drivers/gpu/drm/i915/i915_drv.h > > >> @@ -2463,6 +2463,7 @@ struct drm_i915_private { > > >> > > >> bool periodic; > > >> int period_exponent; > > >> + int timestamp_frequency; > > >> > > >> int metrics_set; > > >> > > >> diff --git a/drivers/gpu/drm/i915/i915_perf.c > b/drivers/gpu/drm/i915/i915_perf.c > > >> index 98eb6415b63a..87c0d1ce1b9f 100644 > > >> --- a/drivers/gpu/drm/i915/i915_perf.c > > >> +++ b/drivers/gpu/drm/i915/i915_perf.c > > >> @@ -2549,6 +2549,12 @@ i915_perf_open_ioctl_locked(struct > drm_i915_private *dev_priv, > > >> return ret; > > >> } > > >> > > >> +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int > exponent) > > >> +{ > > >> + return div_u64(1000000000ULL * (2ULL << exponent), > > >> + dev_priv->perf.oa.timestamp_frequency); > > >> +} > > >> + > > >> /** > > >> * read_properties_unlocked - validate + copy userspace stream ope= n > properties > > >> * @dev_priv: i915 device instance > > >> @@ -2647,14 +2653,9 @@ static int read_properties_unlocked(struct > drm_i915_private *dev_priv, > > >> /* Theoretically we can program the OA unit to > sample > > >> * every 160ns but don't allow that by default > unless > > >> * root. > > >> - * > > >> - * On Haswell the period is derived from the > exponent > > >> - * as: > > >> - * > > >> - * period =3D 80ns * 2^(exponent + 1) > > >> */ > > >> BUILD_BUG_ON(sizeof(oa_period) !=3D 8); > > >> - oa_period =3D 80ull * (2ull << value); > > >> + oa_period =3D oa_exponent_to_ns(dev_priv, value); > > >> > > >> /* This check is primarily to ensure that > oa_period <=3D > > >> * UINT32_MAX (before passing to do_div which onl= y > > >> @@ -2910,6 +2911,8 @@ void i915_perf_init(struct drm_i915_private > *dev_priv) > > >> dev_priv->perf.oa.ops.oa_hw_tail_read =3D > > >> gen7_oa_hw_tail_read; > > >> > > >> + dev_priv->perf.oa.timestamp_frequency =3D 12500000; > > >> + > > >> dev_priv->perf.oa.oa_formats =3D hsw_oa_formats; > > >> > > >> dev_priv->perf.oa.n_builtin_sets =3D > > >> @@ -2923,6 +2926,8 @@ void i915_perf_init(struct drm_i915_private > *dev_priv) > > >> */ > > >> > > >> if (IS_GEN8(dev_priv)) { > > >> + dev_priv->perf.oa.timestamp_frequency =3D 1250000= 0; > > >> + > > >> dev_priv->perf.oa.ctx_oactxctrl_offset =3D 0x120; > > >> dev_priv->perf.oa.ctx_flexeu0_offset =3D 0x2ce; > > >> dev_priv->perf.oa.gen8_valid_ctx_bit =3D (1<<25); > > >> @@ -2939,6 +2944,8 @@ void i915_perf_init(struct drm_i915_private > *dev_priv) > > >> i915_oa_select_metric_set_chv; > > >> } > > >> } else if (IS_GEN9(dev_priv)) { > > >> + dev_priv->perf.oa.timestamp_frequency =3D 1200000= 0; > > >> + > > >> dev_priv->perf.oa.ctx_oactxctrl_offset =3D 0x128; > > >> dev_priv->perf.oa.ctx_flexeu0_offset =3D 0x3de; > > >> dev_priv->perf.oa.gen8_valid_ctx_bit =3D (1<<16); > > >> @@ -2959,6 +2966,8 @@ void i915_perf_init(struct drm_i915_private > *dev_priv) > > >> dev_priv->perf.oa.ops.select_metric_set = =3D > > >> i915_oa_select_metric_set_sklgt4; > > >> } else if (IS_BROXTON(dev_priv)) { > > >> + dev_priv->perf.oa.timestamp_frequency =3D > 19200123; > > >> + > > > Why isn't this exactly 19.2MHz? > > > > It's based off the timestamp base (from documentation) : > > > > BDW: 80ns > > SKL: 83.333ns > > BXT: 52.083ns > > > > 1000000000 / 19200123 is the closest we can get. > > I'm pretty sure the clock is actually 19.2 and the 52.083 is just a > truncated value. Same for the 83.333 where the code does specify > 120MHz exactly. > Ah, right, that sounds most likely. I'll update. Thanks, - Robert > > > > > > > > >> dev_priv->perf.oa.n_builtin_sets =3D > > >> i915_oa_n_builtin_metric_sets_bxt= ; > > >> dev_priv->perf.oa.ops.select_metric_set = =3D > > >> -- > > >> 2.12.0 > > >> > > >> _______________________________________________ > > >> Intel-gfx mailing list > > >> Intel-gfx@lists.freedesktop.org > > >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > -- > Ville Syrj=C3=A4l=C3=A4 > Intel OTC > --94eb2c08b83800148c054c6f2aae Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


On Wed, Apr 5, 2017 at 6:26 PM, Ville Syrj=C3=A4l=C3=A4 <v= ille.syrjala@linux.intel.com> wrote:
On Wed, Apr 05, 2017 at 0= 6:17:36PM +0100, Lionel Landwerlin wrote:
> On 05/04/17 18:06, Ville Syrj=C3=A4l=C3=A4 wrote:
> > On Wed, Apr 05, 2017 at 05:23:19PM +0100, Robert Bragg wrote:
> >> An oa_exponent_to_ns() utility and per-gen timebase constants= where
> >> recently removed when updating the tail pointer race conditio= n WA, and
> >> this restores those so we can update the _PROP_OA_EXPONENT va= lidation
> >> done in read_properties_unlocked() to not assume we have a 12= .5KHz
> >> timebase as we did for Haswell.
> >>
> >> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
> >> ---
> >>=C2=A0 =C2=A0drivers/gpu/drm/i915/i915_drv.h=C2=A0 |=C2= =A0 1 +
> >>=C2=A0 =C2=A0drivers/gpu/drm/i915/i915_perf.c | 21 ++++++= +++++++++------
> >>=C2=A0 =C2=A02 files changed, 16 insertions(+), 6 deletions(-)=
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/g= pu/drm/i915/i915_drv.h
> >> index 3a22b6fd0ee6..48b07d706f06 100644
> >> --- a/drivers/gpu/drm/i915/i915_drv.h
> >> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> @@ -2463,6 +2463,7 @@ struct drm_i915_private {
> >>
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 bool periodic;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 int period_exponent;
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 int timestamp_frequency;
> >>
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 int metrics_set;
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/= gpu/drm/i915/i915_perf.c
> >> index 98eb6415b63a..87c0d1ce1b9f 100644
> >> --- a/drivers/gpu/drm/i915/i915_perf.c
> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> @@ -2549,6 +2549,12 @@ i915_perf_open_ioctl_locked(struc= t drm_i915_private *dev_priv,
> >>=C2=A0 =C2=A0 return ret;
> >>=C2=A0 =C2=A0}
> >>
> >> +static u64 oa_exponent_to_ns(struct drm_i915_private *dev_pr= iv, int exponent)
> >> +{
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0return div_u64(1000000000ULL * (2= ULL << exponent),
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 dev_priv->perf.oa.timestamp_frequency);
> >> +}
> >> +
> >>=C2=A0 =C2=A0/**
> >>=C2=A0 =C2=A0 * read_properties_unlocked - validate + copy use= rspace stream open properties
> >>=C2=A0 =C2=A0 * @dev_priv: i915 device instance
> >> @@ -2647,14 +2653,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 /* Theoretically we can program the OA unit to sample
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0* every 160ns but don't allow that by default unless
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0* root.
> >> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0*
> >> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0* On Haswell the period is derived from the exponent
> >> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0* as:
> >> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0*
> >> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0*=C2=A0 =C2=A0period =3D 80ns * 2^(exponent + 1)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0*/
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 BUILD_BUG_ON(sizeof(oa_period) !=3D 8);
> >> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 oa_period =3D 80ull * (2ull << value);
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 oa_period =3D oa_exponent_to_ns(dev_priv, value);
> >>
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 /* This check is primarily to ensure that oa_period <=3D
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0* UINT32_MAX (before passing to do_div which only
> >> @@ -2910,6 +2911,8 @@ void i915_perf_init(struct drm_i915_pri= vate *dev_priv)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa= .ops.oa_hw_tail_read =3D
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 gen7_oa_hw_tail_read;
> >>
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa.time= stamp_frequency =3D 12500000;
> >> +
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa= .oa_formats =3D hsw_oa_formats;
> >>
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa= .n_builtin_sets =3D
> >> @@ -2923,6 +2926,8 @@ void i915_perf_init(struct drm_i915_pri= vate *dev_priv)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*/
> >>
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (IS_GEN8(dev_priv= )) {
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 dev_priv->perf.oa.timestamp_frequency =3D 12500000;
> >> +
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 dev_priv->perf.oa.ctx_oactxctrl_offset =3D 0x120;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 dev_priv->perf.oa.ctx_flexeu0_offset =3D 0x2ce;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 dev_priv->perf.oa.gen8_valid_ctx_bit =3D (1<<25);
> >> @@ -2939,6 +2944,8 @@ void i915_perf_init(struct drm_i915_pri= vate *dev_priv)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 i915_oa_sel= ect_metric_set_chv;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 }
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 } else if (IS_GEN9(d= ev_priv)) {
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 dev_priv->perf.oa.timestamp_frequency =3D 12000000;
> >> +
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 dev_priv->perf.oa.ctx_oactxctrl_offset =3D 0x128;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 dev_priv->perf.oa.ctx_flexeu0_offset =3D 0x3de;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 dev_priv->perf.oa.gen8_valid_ctx_bit =3D (1<<16);
> >> @@ -2959,6 +2966,8 @@ void i915_perf_init(struct drm_i915_pri= vate *dev_priv)
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa.ops.select_me= tric_set =3D
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 i915_oa_sel= ect_metric_set_sklgt4;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 } else if (IS_BROXTON(dev_priv)) {
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa.timestamp_frequen= cy =3D 19200123;
> >> +
> > Why isn't this exactly 19.2MHz?
>
> It's based off the timestamp base (from documentation) :
>
> BDW: 80ns
> SKL: 83.333ns
> BXT: 52.083ns
>
> 1000000000 / 19200123 is the closest we can get.

I'm pretty sure the clock is actually 19.2 and the 52.083 i= s just a
truncated value. Same for the 83.333 where the code does specify
120MHz exactly.

Ah, right, that sounds = most likely. I'll update.

Thanks,
- Rob= ert
=C2=A0

>
> >
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa.n_builtin_set= s =3D
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 i915_oa_n_b= uiltin_metric_sets_bxt;
> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 dev_priv->perf.oa.ops.select_me= tric_set =3D
> >> --
> >> 2.12.0
> >>
> >> _______________________________________________
> >> Intel-gfx mailing list
> >> Intel-gfx@= lists.freedesktop.org
> >> https://lists.freedesktop.org/= mailman/listinfo/intel-gfx
>

--
Ville Syrj=C3=A4l=C3=A4
Intel OTC

--94eb2c08b83800148c054c6f2aae-- --===============1060128264== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSW50ZWwtZ2Z4 IG1haWxpbmcgbGlzdApJbnRlbC1nZnhAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vaW50ZWwtZ2Z4Cg== --===============1060128264==--