From: Reinette Chatre <reinette.chatre@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
tglx@linutronix.de, mingo@redhat.com, fenghua.yu@intel.com,
tony.luck@intel.com, vikas.shivappa@linux.intel.com,
gavin.hindman@intel.com, jithu.joseph@intel.com, hpa@zytor.com,
x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] x86/intel_rdt and perf/x86: Fix lack of coordination with perf
Date: Fri, 10 Aug 2018 09:25:02 -0700
Message-ID: <b689b8d4-5fe1-6fae-c873-6e62a273cd07@intel.com>
In-Reply-To: <5960739f-ee27-b182-3804-7e5d9356457f@intel.com>
Hi Peter,
On 8/8/2018 10:33 AM, Reinette Chatre wrote:
> On 8/8/2018 12:51 AM, Peter Zijlstra wrote:
>> On Tue, Aug 07, 2018 at 03:47:15PM -0700, Reinette Chatre wrote:
>>>> - I don't much fancy people accessing the guts of events like that;
>>>> would not an inline function like:
>>>>
>>>> static inline u64 x86_perf_rdpmc(struct perf_event *event)
>>>> {
>>>>         u64 val;
>>>>
>>>>         lockdep_assert_irqs_disabled();
>>>>
>>>>         rdpmcl(event->hw.event_base_rdpmc, val);
>>>>         return val;
>>>> }
>>>>
>>>> Work for you?
>>>
>>> No. This does not provide accurate results. Implementing the above produces:
>>> pseudo_lock_mea-366 [002] .... 34.950740: pseudo_lock_l2: hits=4096 miss=4
>>
>> But it being an inline function should allow the compiler to optimize
>> and lift the event->hw.event_base_rdpmc load like you now do manually.
>> Also, like Tony already suggested, you can prime that load just fine by
>> doing an extra invocation.
>>
>> (and note that the above function is _much_ simpler than
>> perf_event_read_local())
>
> Unfortunately I do not find this to be the case. When I implement
> x86_perf_rdpmc() _exactly_ as you suggest above and do the measurement like:
>
> l2_hits_before = x86_perf_rdpmc(l2_hit_event);
> l2_miss_before = x86_perf_rdpmc(l2_miss_event);
> l2_hits_before = x86_perf_rdpmc(l2_hit_event);
> l2_miss_before = x86_perf_rdpmc(l2_miss_event);
> /* read memory */
> l2_hits_after = x86_perf_rdpmc(l2_hit_event);
> l2_miss_after = x86_perf_rdpmc(l2_miss_event);
>
>
> Then the results are not accurate, nor are they consistently
> inaccurate enough to be treated as a constant adjustment:
>
> pseudo_lock_mea-409 [002] .... 194.322611: pseudo_lock_l2: hits=4100 miss=0
> pseudo_lock_mea-412 [002] .... 195.520203: pseudo_lock_l2: hits=4096 miss=3
> pseudo_lock_mea-415 [002] .... 196.571114: pseudo_lock_l2: hits=4097 miss=3
> pseudo_lock_mea-422 [002] .... 197.629118: pseudo_lock_l2: hits=4097 miss=3
> pseudo_lock_mea-425 [002] .... 198.687160: pseudo_lock_l2: hits=4096 miss=3
> pseudo_lock_mea-428 [002] .... 199.744156: pseudo_lock_l2: hits=4096 miss=2
> pseudo_lock_mea-431 [002] .... 200.801131: pseudo_lock_l2: hits=4097 miss=2
> pseudo_lock_mea-434 [002] .... 201.858141: pseudo_lock_l2: hits=4097 miss=2
> pseudo_lock_mea-437 [002] .... 202.917168: pseudo_lock_l2: hits=4096 miss=2
>
> I was able to test Tony's theory: replacing the reading of the
> "after" counts with direct rdpmcl() calls improves the results. What
> I mean is this:
>
> l2_hit_pmcnum = x86_perf_rdpmc_ctr_get(l2_hit_event);
> l2_miss_pmcnum = x86_perf_rdpmc_ctr_get(l2_miss_event);
> l2_hits_before = x86_perf_rdpmc(l2_hit_event);
> l2_miss_before = x86_perf_rdpmc(l2_miss_event);
> l2_hits_before = x86_perf_rdpmc(l2_hit_event);
> l2_miss_before = x86_perf_rdpmc(l2_miss_event);
> /* read memory */
> rdpmcl(l2_hit_pmcnum, l2_hits_after);
> rdpmcl(l2_miss_pmcnum, l2_miss_after);
>
> I did not run my full tests with the above but a simple read of 256KB
> pseudo-locked memory gives:
> pseudo_lock_mea-492 [002] .... 372.001385: pseudo_lock_l2: hits=4096 miss=0
> pseudo_lock_mea-495 [002] .... 373.059748: pseudo_lock_l2: hits=4096 miss=0
> pseudo_lock_mea-498 [002] .... 374.117027: pseudo_lock_l2: hits=4096 miss=0
> pseudo_lock_mea-501 [002] .... 375.182864: pseudo_lock_l2: hits=4096 miss=0
> pseudo_lock_mea-504 [002] .... 376.243958: pseudo_lock_l2: hits=4096 miss=0
>
> We thus seem to be encountering the issue Tony predicted where the
> memory being tested is evicting the earlier measurement code and data.
I thoroughly reviewed this email thread to ensure that all of your
feedback is addressed. I believe the current solution does so, since it
meets all of the requirements I was able to capture:
- Use in-kernel interface to perf.
- Do not write directly to PMU registers.
- Do not introduce another PMU owner; perf retains its role of
  arbitrating access to the PMU.
- User space is able to use perf and resctrl at the same time.
- event_base_rdpmc is accessed and used only within an interrupts
disabled section.
- Internals of events are never accessed directly, inline function used.
- Because the events are "pinned", scheduling of an event may have
  failed. The error state is checked in the recommended way, with
  credible error handling.
- Use X86_CONFIG().
The pseudocode of the current solution is presented below. With this
solution I am able to meet our customer's requirement to measure a
pseudo-locked region accurately, while also addressing your
requirements to use perf correctly.
Is this solution acceptable to you?
#include "../../events/perf_event.h" /* For X86_CONFIG() */

/*
 * The X86_CONFIG() macro cannot be used in a designated
 * initializer as below - initialization of the .config
 * attribute is thus deferred so that X86_CONFIG() can be used.
 */
static struct perf_event_attr l2_miss_attr = {
        .type           = PERF_TYPE_RAW,
        .size           = sizeof(struct perf_event_attr),
        .pinned         = 1,
        .disabled       = 0,
        .exclude_user   = 1
};

static struct perf_event_attr l2_hit_attr = {
        .type           = PERF_TYPE_RAW,
        .size           = sizeof(struct perf_event_attr),
        .pinned         = 1,
        .disabled       = 0,
        .exclude_user   = 1
};

static inline int x86_perf_rdpmc_ctr_get(struct perf_event *event)
{
        lockdep_assert_irqs_disabled();

        return IS_ERR(event) ? 0 : event->hw.event_base_rdpmc;
}

static inline int x86_perf_event_error_state(struct perf_event *event)
{
        int ret = 0;
        u64 tmp;

        ret = perf_event_read_local(event, &tmp, NULL, NULL);
        if (ret < 0)
                return ret;

        if (event->attr.pinned && event->oncpu != smp_processor_id())
                return -EBUSY;

        return ret;
}

/*
 * Below is run by kernel thread on correct CPU as triggered
 * by user via debugfs.
 */
static int measure_cycles_perf_fn(...)
{
        u64 l2_hits_before, l2_hits_after, l2_miss_before, l2_miss_after;
        struct perf_event *l2_miss_event, *l2_hit_event;
        int l2_hit_pmcnum, l2_miss_pmcnum;
        /* Other vars */

        l2_miss_attr.config = X86_CONFIG(.event = 0xd1, .umask = 0x10);
        l2_hit_attr.config = X86_CONFIG(.event = 0xd1, .umask = 0x2);

        l2_miss_event = perf_event_create_kernel_counter(&l2_miss_attr,
                                                         cpu,
                                                         NULL, NULL, NULL);
        if (IS_ERR(l2_miss_event))
                goto out;

        l2_hit_event = perf_event_create_kernel_counter(&l2_hit_attr,
                                                        cpu,
                                                        NULL, NULL, NULL);
        if (IS_ERR(l2_hit_event))
                goto out_l2_miss;

        local_irq_disable();
        if (x86_perf_event_error_state(l2_miss_event)) {
                local_irq_enable();
                goto out_l2_hit;
        }
        if (x86_perf_event_error_state(l2_hit_event)) {
                local_irq_enable();
                goto out_l2_hit;
        }
        /* Disable hardware prefetchers */
        /* Initialize local variables */
        l2_hit_pmcnum = x86_perf_rdpmc_ctr_get(l2_hit_event);
        l2_miss_pmcnum = x86_perf_rdpmc_ctr_get(l2_miss_event);
        rdpmcl(l2_hit_pmcnum, l2_hits_before);
        rdpmcl(l2_miss_pmcnum, l2_miss_before);
        /*
         * From SDM: performing back-to-back fast reads is not
         * guaranteed to be monotonic. To guarantee monotonicity on
         * back-to-back reads, a serializing instruction must be
         * placed between the two RDPMC instructions.
         */
        rmb();
        rdpmcl(l2_hit_pmcnum, l2_hits_before);
        rdpmcl(l2_miss_pmcnum, l2_miss_before);
        rmb();
        /* Loop through pseudo-locked memory */
        rdpmcl(l2_hit_pmcnum, l2_hits_after);
        rdpmcl(l2_miss_pmcnum, l2_miss_after);
        rmb();
        /* Re-enable hardware prefetchers */
        local_irq_enable();
        /* Write results to kernel tracepoints */

out_l2_hit:
        perf_event_release_kernel(l2_hit_event);
out_l2_miss:
        perf_event_release_kernel(l2_miss_event);
out:
        /* Cleanup */
}
Your feedback has been valuable and greatly appreciated.
Reinette