linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Newman <peternewman@google.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Babu Moger <babu.moger@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Stephane Eranian <eranian@google.com>,
	James Morse <james.morse@arm.com>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v1 3/9] x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events
Date: Fri, 1 Dec 2023 12:56:06 -0800	[thread overview]
Message-ID: <CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com> (raw)
In-Reply-To: <04c9eb5e-3395-05e6-f0cc-bc8f054a6031@intel.com>

Hi Reinette,

On Tue, May 16, 2023 at 5:06 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
> On 5/15/2023 7:42 AM, Peter Newman wrote:
> >
> > I used a simple parent-child pipe loop benchmark with the parent in
> > one monitoring group and the child in another to trigger 2M
> > context-switches on the same CPU and compared the sample-based
> > profiles on an AMD and Intel implementation. I used perf diff to
> > compare the samples between hard and soft RMID switches.
> >
> > Intel(R) Xeon(R) Platinum 8173M CPU @ 2.00GHz:
> >
> >               +44.80%  [kernel.kallsyms]  [k] __rmid_read
> >     10.43%     -9.52%  [kernel.kallsyms]  [k] __switch_to
> >
> > AMD EPYC 7B12 64-Core Processor:
> >
> >               +28.27%  [kernel.kallsyms]  [k] __rmid_read
> >     13.45%    -13.44%  [kernel.kallsyms]  [k] __switch_to
> >
> > Note that a soft RMID switch that doesn't change CLOSID skips the
> > PQR_ASSOC write completely, so from this data I can roughly say that
> > __rmid_read() is a little over 2x the length of a PQR_ASSOC write that
> > changes the current RMID on the AMD implementation and about 4.5x
> > longer on Intel.
> >
> > Let me know if this clarifies the cost enough or if you'd like to also
> > see instrumented measurements on the individual WRMSR/RDMSR
> > instructions.
>
> I can see from the data the portion of total time spent in __rmid_read().
> It is not clear to me what the impact on a context switch is. Is it
> possible to say with this data that: this solution makes a context switch
> x% slower?
>
> I think it may be optimistic to view this as a replacement of a PQR write.
> As you point out, that requires that a CPU switches between tasks with the
> same CLOSID. You demonstrate that resctrl already contributes a significant
> delay to __switch_to - this work will increase that much more, it has to
> be clear about this impact and motivate that it is acceptable.

We were operating under the assumption that if the overhead wasn't
acceptable, we would have heard complaints about it by now, but we
ultimately learned that this feature wasn't deployed as much as we had
originally thought on AMD hardware and that the overhead does need to
be addressed.

I am interested in your opinion on two options I'm exploring to
mitigate the overhead, both of which depend on an API like the one
Babu recently proposed for the AMD ABMC feature [1], where a new file
interface will allow the user to indicate which mon_groups are
actively being measured. I will refer to this as "assigned" for now,
as that's the current proposal.

The first is likely the simpler approach: only read MBM event counters
which have been marked as "assigned" in the filesystem to avoid paying
the context switch cost on tasks in groups which are not actively
being measured. In our use case, we calculate memory bandwidth on
every group every few minutes by reading the counters twice, 5 seconds
apart. We would just need counters read during this 5-second window.

The second involves avoiding the situation where a hardware counter
could be deallocated: Determine the number of simultaneous RMIDs
supported, reduce the effective number of RMIDs available to that
number. Use the default RMID (0) for all "unassigned" monitoring
groups and report "Unavailable" on all counter reads (and address the
default monitoring group's counts being unreliable). When assigned,
attempt to allocate one of the remaining, usable RMIDs to that group.
It would only be possible to assign all event counters (local, total,
occupancy) at the same time. Using this approach, we would no longer
be able to measure all groups at the same time, but this is something
we would already be accepting when using the AMD ABMC feature.

While the second feature is a lot more disruptive at the filesystem
layer, it does eliminate the added context switch overhead. Also, it
may be helpful in the long run for the filesystem code to start taking
a more abstract view of hardware monitoring resources, given that few
implementations can afford to assign hardware to all monitoring IDs
all the time. In both cases, the meaning of "assigned" could vary
greatly, even among AMD implementations.

Thanks!
-Peter

[1] https://lore.kernel.org/lkml/20231201005720.235639-1-babu.moger@amd.com/

  reply	other threads:[~2023-12-01 20:56 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-21 14:17 [PATCH v1 0/9] x86/resctrl: Use soft RMIDs for reliable MBM on AMD Peter Newman
2023-04-21 14:17 ` [PATCH v1 1/9] selftests/resctrl: Verify all RMIDs count together Peter Newman
2023-04-21 14:17 ` [PATCH v1 2/9] x86/resctrl: Hold a spinlock in __rmid_read() on AMD Peter Newman
2023-05-11 21:35   ` Reinette Chatre
2023-05-12 13:23     ` Peter Newman
2023-05-12 15:23       ` Reinette Chatre
2023-04-21 14:17 ` [PATCH v1 3/9] x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events Peter Newman
2023-05-11 21:37   ` Reinette Chatre
2023-05-12 13:25     ` Peter Newman
2023-05-12 15:26       ` Reinette Chatre
2023-05-15 14:42         ` Peter Newman
2023-05-17  0:05           ` Reinette Chatre
2023-12-01 20:56             ` Peter Newman [this message]
2023-12-05 21:57               ` Reinette Chatre
2023-12-06  0:33                 ` Peter Newman
2023-12-06  1:46                   ` Reinette Chatre
2023-12-06 18:38                     ` Peter Newman
2023-12-06 20:02                       ` Reinette Chatre
2023-05-16 14:18       ` Peter Newman
2023-05-16 14:27         ` Peter Newman
2023-06-01 14:45     ` Peter Newman
2023-06-01 17:14       ` Reinette Chatre
2023-04-21 14:17 ` [PATCH v1 4/9] x86/resctrl: Flush MBM event counts on soft RMID change Peter Newman
2023-05-11 21:37   ` Reinette Chatre
2023-04-21 14:17 ` [PATCH v1 5/9] x86/resctrl: Call mon_event_count() directly for soft RMIDs Peter Newman
2023-05-11 21:38   ` Reinette Chatre
2023-04-21 14:17 ` [PATCH v1 6/9] x86/resctrl: Create soft RMID version of __mon_event_count() Peter Newman
2023-05-11 21:38   ` Reinette Chatre
2023-04-21 14:17 ` [PATCH v1 7/9] x86/resctrl: Assign HW RMIDs to CPUs for soft RMID Peter Newman
2023-05-11 21:39   ` Reinette Chatre
2023-05-16 14:49     ` Peter Newman
2023-05-17  0:06       ` Reinette Chatre
2023-06-06 13:31         ` Peter Newman
2023-06-06 13:36   ` Peter Newman
2023-04-21 14:17 ` [PATCH v1 8/9] x86/resctrl: Use mbm_update() to push soft RMID counts Peter Newman
2023-05-11 21:40   ` Reinette Chatre
2023-06-02 12:42     ` Peter Newman
2023-06-06 13:48   ` Peter Newman
2023-04-21 14:17 ` [PATCH v1 9/9] x86/resctrl: Add mount option to enable soft RMID Peter Newman
2023-05-11 21:41   ` Reinette Chatre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com' \
    --to=peternewman@google.com \
    --cc=babu.moger@amd.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=eranian@google.com \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).