From: James Morse <james.morse@arm.com>
To: Reinette Chatre <reinette.chatre@intel.com>,
    x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>,
    Thomas Gleixner <tglx@linutronix.de>,
    Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
    H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
    shameerali.kolothum.thodi@huawei.com,
    D Scott Phillips OS <scott@os.amperecomputing.com>,
    carl@os.amperecomputing.com, lcherian@marvell.com,
    bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com,
    xingxin.hx@openanolis.org, baolin.wang@linux.alibaba.com,
    Jamie Iles <quic_jiles@quicinc.com>,
    Xin Hao <xhao@linux.alibaba.com>,
    peternewman@google.com, dfustini@baylibre.com
Subject: Re: [PATCH v5 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read()
Date: Fri, 15 Sep 2023 18:37:33 +0100
Message-ID: <acd712b8-748b-e9f5-0bf3-9cfadca34c95@arm.com>
In-Reply-To: <4f7facea-ffc4-63c3-b960-fa92eb03b04c@intel.com>

Hi Reinette,

On 25/08/2023 00:04, Reinette Chatre wrote:
> On 8/24/2023 9:56 AM, James Morse wrote:
>> On 09/08/2023 23:37, Reinette Chatre wrote:
>>> On 7/28/2023 9:42 AM, James Morse wrote:
>>>> Depending on the number of monitors available, Arm's MPAM may need to
>>>> allocate a monitor prior to reading the counter value. Allocating a
>>>> contended resource may involve sleeping.
>>>>
>>>> add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
>>>> the allocation should be valid for all domains.
>>>>
>>>> __check_limbo() and mon_event_count() each make multiple calls to
>>>> resctrl_arch_rmid_read(), to avoid extra work on contended systems,
>>>> the allocation should be valid for multiple invocations of
>>>> resctrl_arch_rmid_read().
>>>>
>>>> Add arch hooks for this allocation, which need calling before
>>>> resctrl_arch_rmid_read(). The allocated monitor is passed to
>>>> resctrl_arch_rmid_read(), then freed again afterwards. The helper
>>>> can be called on any CPU, and can sleep.
>>
>>> Looking at the error paths all the errors are silent failures.
>>
>> Yeah, I don't really expect this to ever fail. The memory arm64 needs to allocate is
>> smaller than a pointer - if that fails, I think there are bigger problems. The hardware
>> resource is something the call will wait for.
>>
>> As you note, it's hard to propagate an unlikely error back from here.
>>
>>
>>> On the
>>> failure in mon_event_read() this could potentially be handled by setting
>>> the "err" field in struct rmid_read ... at least then the caller can print
>>> an error instead of displaying a zero count to the user.
>>
>> Sure, that covers the one a human being might see.
>
> Right.
>
>>> The other failures are harder to handle though.
>>
>> I don't think the silent failure is such a bad thing. For the limbo handler, no RMID moves
>> between the lists until the handler is able to make progress.
>
> ok, so it needs to ensure that the handler is still rescheduled
> when such a failure is encountered.

Yup, the silent error occurs in __check_limbo(), and cqm_handle_limbo() will still
reschedule the worker. Similarly, for mbm_update(), mbm_handle_overflow() will still
reschedule the work.
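
As a rough illustration of why the silent failure is harmless for the limbo
handler, here is a minimal user-space sketch, not the kernel code. Every name
in it (rmid_busy, alloc_should_fail, mon_ctx_alloc(), mon_ctx_free(),
rmid_read(), check_limbo(), handle_limbo()) is a hypothetical stand-in. It
assumes only what is described above: the context is allocated before the
reads, freed afterwards, and a failed allocation leaves every RMID where it
was so the handler asks to be rescheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_RMID 4

/* Hypothetical stand-ins for the limbo list and the arch hooks. */
bool rmid_busy[NUM_RMID];	/* true while an RMID sits in limbo */
bool alloc_should_fail;		/* knob to force the unlikely failure */

/* Models the arch hook that allocates a monitor context. */
void *mon_ctx_alloc(void)
{
	return alloc_should_fail ? NULL : malloc(sizeof(int));
}

void mon_ctx_free(void *ctx)
{
	free(ctx);
}

/* Models the counter read; in this sketch every RMID has already drained. */
int rmid_read(int rmid, void *ctx, unsigned long *val)
{
	assert(ctx != NULL);
	(void)rmid;
	*val = 0;
	return 0;
}

/* One pass over the limbo list. Returns how many RMIDs are still busy. */
int check_limbo(void)
{
	void *ctx = mon_ctx_alloc();
	int busy = 0;

	for (int i = 0; i < NUM_RMID; i++) {
		unsigned long val;

		if (!rmid_busy[i])
			continue;
		/* Without a context nothing is read, so nothing moves. */
		if (ctx && rmid_read(i, ctx, &val) == 0 && val == 0)
			rmid_busy[i] = false;
		else
			busy++;
	}

	mon_ctx_free(ctx);	/* free(NULL) is a no-op */
	return busy;
}

/* Returns true when the worker has to be rescheduled. */
bool handle_limbo(void)
{
	return check_limbo() > 0;
}
```

With alloc_should_fail set, handle_limbo() returns true and rmid_busy[] is
unchanged; once the allocation succeeds on a later pass, the busy RMIDs are
released and no further rescheduling is requested.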

>> For the overflow handler, it's possible an overflow will get missed (I still have an
>> overflow interrupt I can use here). But I don't think this will be the biggest problem on
>> a machine that is struggling to allocate 4 bytes.
>
> As I now (I think) better understand for MPAM it is 4 bytes of memory as well as
> reservation of a hardware resource. Could something go wrong attempting to find an
> available hardware resource that as you state later is indeed scarce? I wonder if
> it would not be helpful to at least have resctrl log an error from the
> places where it is not possible to propagate the error.

If it can't allocate a monitor, it should block until one becomes available. Errors should
never occur during normal use.

I'll add pr_warn_ratelimited() for errors returned on this path.
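
To make the shape of that concrete, a small user-space sketch follows. It is
an assumption-laden model, not the planned patch: the pool size of 10 is the
illustrative figure from this thread, the blocking wait is not modelled (a
failed mon_try_alloc() stands in for the error case), and struct ratelimit
with ratelimit_ok() is a hypothetical stand-in for the effect of
pr_warn_ratelimited(), not its implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_MONITORS 10	/* illustrative pool size, not a hardware fact */

bool mon_in_use[NUM_MONITORS];

/* Returns the index of a free monitor, or -1 when all are in use. */
int mon_try_alloc(void)
{
	for (int i = 0; i < NUM_MONITORS; i++) {
		if (!mon_in_use[i]) {
			mon_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Hands a monitor back to the pool. */
void mon_free(int idx)
{
	assert(idx >= 0 && idx < NUM_MONITORS);
	mon_in_use[idx] = false;
}

/* Allows at most 'burst' messages per 'interval' ticks. */
struct ratelimit {
	long interval;
	int burst;
	long window_start;
	int printed;
};

/* Returns true if a warning may be emitted at time 'now'. */
bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		/* A new window begins: forget what was printed before. */
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}
```

A caller that gets -1 from mon_try_alloc() would consult ratelimit_ok()
before printing, so a persistently failing path produces a bounded number of
warnings per interval rather than one per read.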

>>> Considering that these contexts are allocated and
>>> freed so often, why not allocate them once (perhaps in struct rdt_hw_domain?)
>>> on driver load with clear error handling?
>>
>> Because the resource they represent is scarce. You may have 100 control or monitor groups,
>> but only 10 hardware monitors. The hardware monitor has to be allocated and programmed
>> before it can be read.
>
> I think I misunderstood what "context" is when I wrote the above. I
> was thinking about memory allocation that can be done early and
> neglected to connect the "context" to be an actual hardware resource.

Let me know if there is a better name. Obviously I had to avoid 'resource'!

Thanks,

James