All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Peter Newman <peternewman@google.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	"Yu, Fenghua" <fenghua.yu@intel.com>,
	"Eranian, Stephane" <eranian@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Babu Moger <Babu.Moger@amd.com>,
	Gaurang Upasani <gupasani@google.com>
Subject: Re: [RFD] resctrl: reassigning a running container's CTRL_MON group
Date: Wed, 19 Oct 2022 14:57:58 +0100	[thread overview]
Message-ID: <835d769b-3662-7be5-dcdd-804cb1f3999a@arm.com> (raw)
In-Reply-To: <CALPaoCiKUQC+LxDwKQ0gE5AQniJi_nbzrXi_HA9ZBRtiXdw_dg@mail.gmail.com>

Hi Peter,

On 17/10/2022 11:15, Peter Newman wrote:
> On Wed, Oct 12, 2022 at 6:55 PM James Morse <james.morse@arm.com> wrote:
>> You originally asked:
>> | Any concerns about the CLOSID-reusing behavior?
>>
>> I don't think this will work well with MPAM ... I expect it will mess up the bandwidth
>> counters.
>>
>> MPAM's equivalent to RMID is PMG. While on x86 CLOSID and RMID are independent numbers,
>> this isn't true for PARTID (MPAM's version of CLOSID) and PMG. The PMG bits effectively
>> extended the PARTID with bits that aren't used to look up the configuration.
>>
>> x86's monitors match only on RMID, and there are 'enough' RMID... MPAMs monitors are more
>> complicated. I've seen details of a system that only has 1 bit of PMG space.
>>
>> While MPAM's bandwidth monitors can match just the PMG, there aren't expected to be enough
>> unique PMG for every control/monitor group to have a unique value. Instead, MPAM's
>> monitors are expected to be used with both the PARTID and PMG.
>>
>> ('bandwidth monitors' is relevant here, MPAM's 'cache storage utilisation' monitors can't
>> match on just PMG at all - they have to be told the PARTID too)
>>
>>
>> If you're re-using CLOSID like this, I think you'll end up with noisy measurements on MPAM
>> systems as the caches hold PARTID/PMG values from before the re-use pattern changed, and
>> the monitors have to match on both.

> Yes, that sounds like it would be an issue.
> 
> Following your refactoring changes, hopefully the MPAM driver could
> offer alternative methods for managing PARTIDs and PMGs depending on the
> available hardware resources.

Mmmm, I don't think anything other than one-partid per control group and one-pmg per
monitor group makes much sense.


> If there are a lot more PARTIDs than PMGs, then it would fit well with a
> user who never creates child MON groups. In case the number of MON
> groups gets ahead of the number of CTRL_MON groups and you've run out of
> PMGs, perhaps you would just try to allocate another PARTID and program
> the same partitioning configuration before giving up.

User-space can choose to do this.
If the kernel tries to be clever and do this behind user-space's back, it needs to
allocate two monitors for this secretly-two-control-groups, and always sum the counters
before reporting them to user-space.
If monitors are a contended resource, then you may be unable to monitor the
secretly-two-control-groups group once the kernel has done this.

I don't think the kernel should try to be too clever here.

> Of course, there
> wouldn't be much point in reusing PARTIDs in such a configuration
> either.

> If we used the child MON groups as the primary vehicle for moving a
> container's tasks between a small number of CTRL_MON groups like in
> Reinette's proposal, then it seems like it would be a better use of
> hardware to have many PMGs and few PARTIDs.

> In that case, the monitors would only match on PMGs.

This isn't how MPAM is designed to be used. You'll hit nasty corners.
The big one is the Cache Storage Utilisation counters.

See 11.5.2 of the MPAM spec, "MSMON_CFG_CSU_CTL, MPAM Memory System Monitor Configure
Cache Storage Usage Monitor Control Register". Not setting the MATCH_PARTID bit has this
warning:
| If MATCH_PMG is 1 and MATCH_PARTID is 0, it is CONSTRAINED UNPREDICTABLE whether the
| monitor instance:
| • Measures the storage used with matching PMG and with any PARTID.
| • Measures no storage usage, that is, MSMON_CSU.VALUE is zero.
| • Measures the storage used with matching PMG and PARTID, that is, treats
| MATCH_PARTID as = 1

'constrained unpredictable' is arm's term for "portable software can't rely on this".
The folk that designed MPAM don't believe "monitors would only match on PMGs" makes any
sense. A PMG is not an RMID. A case in point is the system with only 1 PMG bit.

I'm afraid this approach would preclude support for the llc_occupancy counter, and would
artificially reduce the number of control groups that can be created as each control group
needs an 'RMID'. On the machine with 1 PMG bit - you get 2 control groups, even though it
has many more PARTID.


> Provided that there are sufficient monitor
> instances, there would never be any need to reprogram a monitor's
> PMG.

It sounds like this moves the problem to "make everything a monitor group because only
monitor groups can be batch moved".

If the tasks file could be moved between control and monitor groups, causing resctrl to
relabel the tasks - would that solve more of the problem? (it eliminates the need to make
everything a monitor group)

The devil is in the detail, I'm not sure how it serialises with a fork()ing process, I'd
hope to do better than relying on the kernel walking the list of processes a lot quicker
than user-space can.


>> I have half-finished patches that add a 'resctrl' cgroup controller that can be used to
>> group tasks and assign them to control or monitor groups. (the creation and configuration
>> of control and monitor groups stays in resctrl - it effectively makes the tasks file
>> read-only). I think this might help, as a group of processes can be moved between two
>> control/monitor groups with one syscall. New processes that are created inherit from the
>> cgroup setting instead of their parent task.
>>
>> If want to take a look, its here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot/v6.0&id=4e5987d8ecbc8647dee0aebfb73c3890843ef5dd
> 
>> I've not worked the cgroup thread stuff out yet ... it doesn't appear to hook thread
>> creation, only fork().

> This looks very promising for our use case, as it would be very easy to
> use for a container manager. I'm glad you're looking into this.

Let me know if it solves this problem - I assume the resctrl topology is a subset of the
cgroup topology.

(apparently android needed cgroup support, but now its more complicated)


Thanks,

James

  reply	other threads:[~2022-10-19 14:17 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-07 10:39 [RFD] resctrl: reassigning a running container's CTRL_MON group Peter Newman
2022-10-07 15:36 ` Reinette Chatre
2022-10-07 15:44   ` Yu, Fenghua
2022-10-07 17:28     ` Tony Luck
2022-10-10 23:35       ` Reinette Chatre
2022-10-12 11:21         ` Peter Newman
2022-10-12 16:55           ` James Morse
2022-10-17 10:15             ` Peter Newman
2022-10-19 13:57               ` James Morse [this message]
2022-10-20 10:39                 ` Peter Newman
2022-10-21 12:42                   ` Peter Newman
2022-10-25 15:55                     ` James Morse
2022-10-26  8:52                       ` Peter Newman
2022-10-26 21:12                         ` Reinette Chatre
2022-10-27  7:56                           ` Peter Newman
2022-10-27 17:35                             ` Reinette Chatre
2022-11-01 15:23                               ` Peter Newman
2022-11-01 15:53                                 ` Peter Newman
2022-11-01 16:48                                   ` Reinette Chatre
2022-10-25 15:56                   ` James Morse
2022-10-21 20:09                 ` Reinette Chatre
2022-10-21 20:22                   ` Luck, Tony
2022-10-21 21:34                     ` Reinette Chatre
2022-11-03 17:06                   ` James Morse
2022-11-08 21:28                     ` Reinette Chatre
2022-11-08 21:56                       ` Luck, Tony
2022-11-08 23:18                         ` Reinette Chatre
2022-11-09 17:58                           ` James Morse
2022-11-09  9:50                       ` Peter Newman
2022-11-09 19:11                         ` Reinette Chatre
2022-11-11 18:38                           ` James Morse
2022-11-14 18:02                             ` Reinette Chatre
2022-11-16 13:20                             ` Peter Newman
2022-11-09 17:59                       ` James Morse
2022-11-09 19:12                         ` Reinette Chatre
2022-11-11 18:36                           ` James Morse
2022-10-12 16:57           ` Yu, Fenghua
2022-10-12 17:23           ` Reinette Chatre
2022-10-14 12:56             ` James Morse
2022-10-19  9:08             ` Peter Newman
2022-10-19 13:20               ` James Morse
2022-10-19 23:54               ` Reinette Chatre
2022-10-20  8:48                 ` Peter Newman
2022-10-20 19:08                   ` Reinette Chatre
2022-10-21 10:09                     ` Peter Newman
2022-10-25 15:56                       ` James Morse
2022-10-25 15:55                     ` James Morse
2022-10-26  9:36                       ` Peter Newman
2022-11-03 17:06                         ` James Morse
2022-11-08 21:25                           ` Reinette Chatre
2022-10-07 17:57 ` Moger, Babu
2022-10-11 15:00   ` Stephane Eranian
2022-10-11 14:59 ` Stephane Eranian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=835d769b-3662-7be5-dcdd-804cb1f3999a@arm.com \
    --to=james.morse@arm.com \
    --cc=Babu.Moger@amd.com \
    --cc=eranian@google.com \
    --cc=fenghua.yu@intel.com \
    --cc=gupasani@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peternewman@google.com \
    --cc=reinette.chatre@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.