All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Steven Price <steven.price@arm.com>
Cc: Rob Herring <robh+dt@kernel.org>,
	kernel@collabora.com, Alyssa Rosenzweig <alyssa@rosenzweig.io>,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [RESEND PATCH v2 2/2] drm/panfrost: Expose perf counters through debugfs
Date: Wed, 29 May 2019 08:55:18 +0200	[thread overview]
Message-ID: <20190529085518.65c7550b@collabora.com> (raw)
In-Reply-To: <20190528160115.5ef0a9b0@collabora.com>

On Tue, 28 May 2019 16:01:15 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Tue, 28 May 2019 15:53:48 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
> 
> > Hi Steve,
> > 
> > On Thu, 16 May 2019 17:21:39 +0100
> > Steven Price <steven.price@arm.com> wrote:
> > 
> >   
> > > >> +static void panfrost_perfcnt_setup(struct panfrost_device *pfdev)
> > > >> +{
> > > >> +       u32 cfg;
> > > >> +
> > > >> +       /*
> > > >> +        * Always use address space 0 for now.
> > > >> +        * FIXME: this needs to be updated when we start using different
> > > >> +        * address space.
> > > >> +        */
> > > >> +       cfg = GPU_PERFCNT_CFG_AS(0);
> > > >> +       if (panfrost_model_cmp(pfdev, 0x1000) >= 0)      
> > > > 
> > > > You've got a couple of these. Perhaps we should add either a
> > > > model_is_bifrost() helper or an is_bifrost variable to use.
> > > >       
> > > >> +               cfg |= GPU_PERFCNT_CFG_SETSEL(1);      
> > > 
> > > mali_kbase only sets this if CONFIG_MALI_PRFCNT_SET_SECONDARY is defined
> > > - i.e. this selects a different selection of counters. At at the very
> > > least this should be an optional flag that can be set, but I suspect the
> > > primary set are the ones you are interested in.    
> > 
> > I'll add a param to pass the set of counters to monitor.
> >   
> > >     
> > > >> +
> > > >> +       gpu_write(pfdev, GPU_PERFCNT_CFG,
> > > >> +                 cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_OFF));
> > > >> +
> > > >> +       if (!pfdev->perfcnt->enabled)
> > > >> +               return;
> > > >> +
> > > >> +       gpu_write(pfdev, GPU_PRFCNT_JM_EN, 0xffffffff);
> > > >> +       gpu_write(pfdev, GPU_PRFCNT_SHADER_EN, 0xffffffff);
> > > >> +       gpu_write(pfdev, GPU_PRFCNT_MMU_L2_EN, 0xffffffff);
> > > >> +
> > > >> +       /*
> > > >> +        * Due to PRLAM-8186 we need to disable the Tiler before we enable HW
> > > >> +        * counters.      
> > > > 
> > > > Seems like you could just always apply the work-around? It doesn't
> > > > look like it would be much different.      
> > > 
> > > Technically the workaround causes the driver to perform the operation in
> > > the wrong order below (write TILER_EN when the mode is MANUAL) - this is
> > > a workaround for the hardware issue (and therefore works on that
> > > hardware). But I wouldn't like to say it works on other hardware which
> > > doesn't have the issue.    
> > 
> > Okay, I'll keep the workaround only for HW that are impacted by the bug.
> >   
> > >     
> > > >> +        */
> > > >> +       if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
> > > >> +               gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0);
> > > >> +       else
> > > >> +               gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
> > > >> +
> > > >> +       gpu_write(pfdev, GPU_PERFCNT_CFG,
> > > >> +                 cfg | GPU_PERFCNT_CFG_MODE(GPU_PERFCNT_CFG_MODE_MANUAL));
> > > >> +
> > > >> +       if (panfrost_has_hw_issue(pfdev, HW_ISSUE_8186))
> > > >> +               gpu_write(pfdev, GPU_PRFCNT_TILER_EN, 0xffffffff);
> > > >> +}    
> > 
> > [...]
> >   
> > > >> +int panfrost_perfcnt_init(struct panfrost_device *pfdev)
> > > >> +{
> > > >> +       struct panfrost_perfcnt *perfcnt;
> > > >> +       struct drm_gem_shmem_object *bo;
> > > >> +       size_t size;
> > > >> +       u32 status;
> > > >> +       int ret;
> > > >> +
> > > >> +       if (panfrost_has_hw_feature(pfdev, HW_FEATURE_V4)) {
> > > >> +               unsigned int ncoregroups;
> > > >> +
> > > >> +               ncoregroups = hweight64(pfdev->features.l2_present);
> > > >> +               size = ncoregroups * BLOCKS_PER_COREGROUP *
> > > >> +                      COUNTERS_PER_BLOCK * BYTES_PER_COUNTER;
> > > >> +       } else {
> > > >> +               unsigned int nl2c, ncores;
> > > >> +
> > > >> +               /*
> > > >> +                * TODO: define a macro to extract the number of l2 caches from
> > > >> +                * mem_features.
> > > >> +                */
> > > >> +               nl2c = ((pfdev->features.mem_features >> 8) & GENMASK(3, 0)) + 1;
> > > >> +
> > > >> +               /*
> > > >> +                * The ARM driver is grouping cores per core group and then
> > > >> +                * only using the number of cores in group 0 to calculate the
> > > >> +                * size. Not sure why this is done like that, but I guess
> > > >> +                * shader_present will only show cores in the first group
> > > >> +                * anyway.
> > > >> +                */      
> > > 
> > > I'm not sure I understand this comment. I'm not sure which version of
> > > the driver you are looking at, but shader_present should contain all the
> > > cores (in every core group).
> > > 
> > > The kbase driver (in panfrost's git tree[1]), contains:
> > > 
> > > 	base_gpu_props *props = &kbdev->gpu_props.props;
> > > 	u32 nr_l2 = props->l2_props.num_l2_slices;
> > > 	u64 core_mask = props->coherency_info.group[0].core_mask;
> > > 
> > > 	u32 nr_blocks = fls64(core_mask);
> > > 
> > > 	/* JM and tiler counter blocks are always present */
> > > 	dump_size = (2 + nr_l2 + nr_blocks) *
> > > 			NR_CNT_PER_BLOCK *
> > > 			NR_BYTES_PER_CNT;
> > > 
> > > [1]
> > > https://gitlab.freedesktop.org/panfrost/mali_kbase/blob/master/driver/product/kernel/drivers/gpu/arm/midgard/mali_kbase_vinstr.c
> > > 
> > > i.e. the number of cores + number of L2 caches (plus 2 for JM/tiler)
> > > multiplied by the block size.    
> > 
> > Actually, I was referring to what's done in
> > kbase_gpuprops_construct_coherent_groups() [2].
> >   
> > > 
> > > Also another difference with the below is that fls64 and hweight64 don't
> > > return the same numbers. If the core mask is non-contiguous this will
> > > return different values.    
> > 
> > Right, I should use fls64() not hweight64().  
> 
> Oops, forgot to ask the extra question I had. Looks like when the GPU
> is busy and I enable/dump/disable perfcnt too quickly, I sometime gets
> {JM,SHADER,MMU_L2,TILER}_COUNTER[2] = 0 while I'd expect it to be set
> to the {JM,SHADER,MMU_L2,TILER}_EN value (0xff or 0xffff depending on
> the block). I made sure I was receiving the SAMPLE_DONE+CLEAN_DONE
> events. Any idea what could cause this?
> 
> Note that when the GPU is idle, {JM,SHADER,MMU_L2,TILER}_COUNTER[2] are
> set to the value I expect (0xff or 0xffff depending on the block).

I think I found the issue. Looks like drm_gem_shmem_vmap() is mapping
the BO in cached mode and there's no helper to do cache maintenance
operation on this BO (dma_map/unmap_sg()). Mapping it writecombine
seems to do the trick (at least I no longer have those inconsistencies
on {JM,SHADER,MMU_L2,TILER}_COUNTER[2]).

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

      reply	other threads:[~2019-05-29  6:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-14 10:47 [RESEND PATCH v2 0/2] drm/panfrost: Expose perf counters to userspace Boris Brezillon
2019-05-14 10:48 ` [RESEND PATCH v2 1/2] drm/panfrost: Move gpu_{write, read}() macros to panfrost_regs.h Boris Brezillon
2019-05-14 10:48 ` [RESEND PATCH v2 2/2] drm/panfrost: Expose perf counters through debugfs Boris Brezillon
2019-05-14 10:58   ` Emil Velikov
2019-05-14 13:31   ` Rob Herring
2019-05-14 13:55     ` Boris Brezillon
2019-05-16 16:21     ` Steven Price
2019-05-28 13:53       ` Boris Brezillon
2019-05-28 14:01         ` Boris Brezillon
2019-05-29  6:55           ` Boris Brezillon [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190529085518.65c7550b@collabora.com \
    --to=boris.brezillon@collabora.com \
    --cc=alyssa@rosenzweig.io \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel@collabora.com \
    --cc=robh+dt@kernel.org \
    --cc=steven.price@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.