All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Redfearn <matt.redfearn@mips.com>
To: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>,
	Florian Fainelli <f.fainelli@gmail.com>,
	<linux-mips@linux-mips.org>, Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	<linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Jiri Olsa <jolsa@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: [PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand
Date: Thu, 17 May 2018 11:40:45 +0100	[thread overview]
Message-ID: <ea713dfc-a4ef-a2ea-de9f-bd40ef28b128@mips.com> (raw)
In-Reply-To: <20180516180518.GB12837@jamesdev>

Hi James,

On 16/05/18 19:05, James Hogan wrote:
> On Fri, Apr 20, 2018 at 11:23:07AM +0100, Matt Redfearn wrote:
>> Previously when performance counters are per-core, rather than
>> per-thread, the number available were divided by 2 on detection, and the
>> counters used by each thread in a core were "swizzled" to ensure
>> separation. However, this solution is suboptimal since it relies on a
>> couple of assumptions:
>> a) Always having 2 VPEs / core (number of counters was divided by 2)
>> b) Always having a number of counters implemented in the core that is
>>     divisible by 2. For instance if an SoC implementation had a single
>>     counter and 2 VPEs per core, then this logic would fail and no
>>     performance counters would be available.
>> The mechanism also does not allow for one VPE in a core using more than
>> it's allocation of the per-core counters to count multiple events even
>> though other VPEs may not be using them.
>>
>> Fix this situation by instead allocating (and releasing) per-core
>> performance counters when they are requested. This approach removes the
>> above assumptions and fixes the shortcomings.
>>
>> In order to do this:
>> Add additional logic to mipsxx_pmu_alloc_counter() to detect if a
>> sibling is using a per-core counter, and to allocate a per-core counter
>> in all sibling CPUs.
>> Similarly, add a mipsxx_pmu_free_counter() function to release a
>> per-core counter in all sibling CPUs when it is finished with.
>> A new spinlock, core_counters_lock, is introduced to ensure exclusivity
>> when allocating and releasing per-core counters.
>> Since counters are now allocated per-core on demand, rather than being
>> reserved per-thread at boot, all of the "swizzling" of counters is
>> removed.
>>
>> The upshot is that in an SoC with 2 counters / thread, counters are
>> reported as:
>> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
>> available to each CPU, irq 18
>>
>> Running an instance of a test program on each of 2 threads in a
>> core, both threads can use their 2 counters to count 2 events:
>>
>> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
>> perf stat -e instructions:u,branches:u ./test_prog
>>
>>   Performance counter stats for './test_prog':
>>
>>               30002      instructions:u
>>               10000      branches:u
>>
>>         0.005164264 seconds time elapsed
>>   Performance counter stats for './test_prog':
>>
>>               30002      instructions:u
>>               10000      branches:u
>>
>>         0.006139975 seconds time elapsed
>>
>> In an SoC with 2 counters / core (which can be forced by setting
>> cpu_has_mipsmt_pertccounters = 0), counters are reported as:
>> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
>> available to each core, irq 18
>>
>> Running an instance of a test program on each of 2 threads in a/soak/bin/bashsoak -E cpuhotplugHi
>> core, now only one thread manages to secure the performance counters to
>> count 2 events. The other thread does not get any counters.
>>
>> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
>> perf stat -e instructions:u,branches:u ./test_prog
>>
>>   Performance counter stats for './test_prog':
>>
>>       <not counted>       instructions:u
>>       <not counted>       branches:u
>>
>>         0.005179533 seconds time elapsed
>>
>>   Performance counter stats for './test_prog':
>>
>>               30002      instructions:u
>>               10000      branches:u
>>
>>         0.005179467 seconds time elapsed
>>
>> Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
> 
> While this sounds like an improvement in practice, being able to use
> more counters on single threaded stuff than otherwise, I'm a little
> concerned what would happen if a task was migrated to a different CPU
> and the perf counters couldn't be obtained on the new CPU due to
> counters already being in use. Would the values be incorrectly small?

This change was really forced by the new I7200 development. Current 
configurations have 2 counters per core, but each core has 3 VPEs - 
which means the current logic cannot correctly assign counters. IoW the 
2 assumptions stated in the commit log are no longer true.

Though you are right that if a task migrated to a core on which another 
VPE is already using the counters, this change would mean counters 
cannot be assigned. In that case we return EAGAIN. I'm not sure if that 
error would be handled gracefully by the scheduler and the task 
scheduled away again... The code events logic that backs this is tricky 
to follow.

Thanks,
Matt


> 
> Cheers
> James
> 

WARNING: multiple messages have this Message-ID (diff)
From: Matt Redfearn <matt.redfearn@mips.com>
To: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>,
	Florian Fainelli <f.fainelli@gmail.com>,
	linux-mips@linux-mips.org, Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Jiri Olsa <jolsa@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: [PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand
Date: Thu, 17 May 2018 11:40:45 +0100	[thread overview]
Message-ID: <ea713dfc-a4ef-a2ea-de9f-bd40ef28b128@mips.com> (raw)
Message-ID: <20180517104045.Cwxy7Qd50bkjRiCSt62byR5N25zkAJAdiCRTayLJxU8@z> (raw)
In-Reply-To: <20180516180518.GB12837@jamesdev>

Hi James,

On 16/05/18 19:05, James Hogan wrote:
> On Fri, Apr 20, 2018 at 11:23:07AM +0100, Matt Redfearn wrote:
>> Previously when performance counters are per-core, rather than
>> per-thread, the number available were divided by 2 on detection, and the
>> counters used by each thread in a core were "swizzled" to ensure
>> separation. However, this solution is suboptimal since it relies on a
>> couple of assumptions:
>> a) Always having 2 VPEs / core (number of counters was divided by 2)
>> b) Always having a number of counters implemented in the core that is
>>     divisible by 2. For instance if an SoC implementation had a single
>>     counter and 2 VPEs per core, then this logic would fail and no
>>     performance counters would be available.
>> The mechanism also does not allow for one VPE in a core using more than
>> it's allocation of the per-core counters to count multiple events even
>> though other VPEs may not be using them.
>>
>> Fix this situation by instead allocating (and releasing) per-core
>> performance counters when they are requested. This approach removes the
>> above assumptions and fixes the shortcomings.
>>
>> In order to do this:
>> Add additional logic to mipsxx_pmu_alloc_counter() to detect if a
>> sibling is using a per-core counter, and to allocate a per-core counter
>> in all sibling CPUs.
>> Similarly, add a mipsxx_pmu_free_counter() function to release a
>> per-core counter in all sibling CPUs when it is finished with.
>> A new spinlock, core_counters_lock, is introduced to ensure exclusivity
>> when allocating and releasing per-core counters.
>> Since counters are now allocated per-core on demand, rather than being
>> reserved per-thread at boot, all of the "swizzling" of counters is
>> removed.
>>
>> The upshot is that in an SoC with 2 counters / thread, counters are
>> reported as:
>> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
>> available to each CPU, irq 18
>>
>> Running an instance of a test program on each of 2 threads in a
>> core, both threads can use their 2 counters to count 2 events:
>>
>> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
>> perf stat -e instructions:u,branches:u ./test_prog
>>
>>   Performance counter stats for './test_prog':
>>
>>               30002      instructions:u
>>               10000      branches:u
>>
>>         0.005164264 seconds time elapsed
>>   Performance counter stats for './test_prog':
>>
>>               30002      instructions:u
>>               10000      branches:u
>>
>>         0.006139975 seconds time elapsed
>>
>> In an SoC with 2 counters / core (which can be forced by setting
>> cpu_has_mipsmt_pertccounters = 0), counters are reported as:
>> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
>> available to each core, irq 18
>>
>> Running an instance of a test program on each of 2 threads in a/soak/bin/bashsoak -E cpuhotplugHi
>> core, now only one thread manages to secure the performance counters to
>> count 2 events. The other thread does not get any counters.
>>
>> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
>> perf stat -e instructions:u,branches:u ./test_prog
>>
>>   Performance counter stats for './test_prog':
>>
>>       <not counted>       instructions:u
>>       <not counted>       branches:u
>>
>>         0.005179533 seconds time elapsed
>>
>>   Performance counter stats for './test_prog':
>>
>>               30002      instructions:u
>>               10000      branches:u
>>
>>         0.005179467 seconds time elapsed
>>
>> Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
> 
> While this sounds like an improvement in practice, being able to use
> more counters on single threaded stuff than otherwise, I'm a little
> concerned what would happen if a task was migrated to a different CPU
> and the perf counters couldn't be obtained on the new CPU due to
> counters already being in use. Would the values be incorrectly small?

This change was really forced by the new I7200 development. Current 
configurations have 2 counters per core, but each core has 3 VPEs - 
which means the current logic cannot correctly assign counters. IoW the 
2 assumptions stated in the commit log are no longer true.

Though you are right that if a task migrated to a core on which another 
VPE is already using the counters, this change would mean counters 
cannot be assigned. In that case we return EAGAIN. I'm not sure if that 
error would be handled gracefully by the scheduler and the task 
scheduled away again... The code events logic that backs this is tricky 
to follow.

Thanks,
Matt


> 
> Cheers
> James
> 

  reply	other threads:[~2018-05-17 10:40 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-20 10:23 [PATCH v3 0/7] MIPS: perf: MT fixes and improvements Matt Redfearn
2018-04-20 10:23 ` Matt Redfearn
2018-04-20 10:23 ` [PATCH v3 1/7] MIPS: Probe for MIPS MT perf counters per TC Matt Redfearn
2018-04-20 10:23   ` Matt Redfearn
2018-04-20 10:23 ` [PATCH v3 2/7] MIPS: perf: More robustly probe for the presence of per-tc counters Matt Redfearn
2018-04-20 10:23   ` Matt Redfearn
2018-04-20 10:23 ` [PATCH v3 3/7] MIPS: perf: Use correct VPE ID when setting up VPE tracing Matt Redfearn
2018-04-20 10:23   ` Matt Redfearn
2018-04-20 10:23 ` [PATCH v3 4/7] MIPS: perf: Fix perf with MT counting other threads Matt Redfearn
2018-04-20 10:23   ` Matt Redfearn
2018-05-16 17:59   ` James Hogan
2018-05-17 10:35     ` Matt Redfearn
2018-05-17 10:35       ` Matt Redfearn
2018-04-20 10:23 ` [PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand Matt Redfearn
2018-04-20 10:23   ` Matt Redfearn
2018-05-16 18:05   ` James Hogan
2018-05-17 10:40     ` Matt Redfearn [this message]
2018-05-17 10:40       ` Matt Redfearn
2018-04-20 10:23 ` [PATCH v3 6/7] MIPS: perf: Fold vpe_id() macro into it's one last usage Matt Redfearn
2018-04-20 10:23   ` Matt Redfearn
2018-04-20 10:23 ` [PATCH v3 7/7] MIPS: perf: Fix BMIPS5000 system mode counting Matt Redfearn
2018-04-20 10:23   ` Matt Redfearn
2018-05-15 14:44   ` [PATCH v4] " Matt Redfearn
2018-05-15 14:44     ` Matt Redfearn
2018-04-20 22:51 ` [PATCH v3 0/7] MIPS: perf: MT fixes and improvements Florian Fainelli
2018-04-23 13:40   ` Matt Redfearn
2018-04-23 13:40     ` Matt Redfearn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea713dfc-a4ef-a2ea-de9f-bd40ef28b128@mips.com \
    --to=matt.redfearn@mips.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=f.fainelli@gmail.com \
    --cc=jhogan@kernel.org \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ralf@linux-mips.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.