linux-kernel.vger.kernel.org archive mirror
From: Reinette Chatre <reinette.chatre@intel.com>
To: James Morse <james.morse@arm.com>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	<shameerali.kolothum.thodi@huawei.com>,
	Jamie Iles <jamie@nuviainc.com>,
	"D Scott Phillips OS" <scott@os.amperecomputing.com>,
	<lcherian@marvell.com>, <bobo.shaobowang@huawei.com>,
	<tan.shaopeng@fujitsu.com>
Subject: Re: [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
Date: Fri, 29 Oct 2021 15:22:01 -0700
Message-ID: <bcdf2daf-ae8a-9047-e171-3dd36a98a66a@intel.com>
In-Reply-To: <76b70d56-b4b2-6fec-693a-a2105f446ec6@arm.com>

Hi James,

On 10/29/2021 8:50 AM, James Morse wrote:
> On 27/10/2021 21:41, Reinette Chatre wrote:
>> On 10/27/2021 9:50 AM, James Morse wrote:
>>> On 15/10/2021 23:28, Reinette Chatre wrote:
>>>> On 10/1/2021 9:02 AM, James Morse wrote:
>>>>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>>>>> second. It reads the hardware register, calculates the bandwidth and
>>>>> updates m->prev_bw_msr which is used to hold the previous hardware register
>>>>> value.
>>>>>
>>>>> Operating directly on hardware register values makes it difficult to make
>>>>> this code architecture independent, so that it can be moved to /fs/,
>>>>> making the mba_sc feature something resctrl supports with no additional
>>>>> support from the architecture.
>>>>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>>>>> register using __mon_event_count().
>>>>
>>>> Looking back I think 06c5fe9b12dd ("x86/resctrl: Fix incorrect local bandwidth when mba_sc
>>>> is enabled") may explain how the code ended up the way it is.
>>>>
>>>>> Change mbm_bw_count() to use the current chunks value from
>>>>> __mon_event_count() to calculate bandwidth. This means it no longer
>>>>> operates on hardware register values.
>>>>
>>>> ok ... so could the patch just do this as it is stated here? The way it is implemented is
>>>> very complicated and hard (for me) to verify the correctness (more below).
>>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> index 6c8226987dd6..a1232462db14 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>
>>>>>     static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>>>>>     {
>>>>>         struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>>>>>         struct mbm_state *m = &rr->d->mbm_local[rmid];
>>>>> -    u64 tval, cur_bw, chunks;
>>>>> +    u64 cur_bw, chunks, cur_chunks;
>>>>>
>>>>> -    tval = __rmid_read(rmid, rr->evtid);
>>>>> -    if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>>>>> -        return;
>>>>> +    cur_chunks = rr->val;
>>>>> +    chunks = cur_chunks - m->prev_bw_chunks;
>>>>> +    m->prev_bw_chunks = cur_chunks;
>>>>>
>>>>> -    chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
>>>>> -    cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
>>>>> +    cur_bw = (chunks * hw_res->mon_scale) >> 20;
>>>
>>>> I find this quite confusing. What if a new m->prev_chunks is introduced instead and
>>>> initialized in __mon_event_count() to the value of chunks, and then here in mbm_bw_count
>>>> it could just retrieve it (chunks = m->prev_chunks).
>>>
>>> (I agree the diff is noisy, it may be easier as a new function as this is a replacement
>>> not a transform of the existing function)
>>>
>>> __mon_event_count() can also be triggered by user-space reading the file, so any of its
>>> 'prev' values should be ignored, as they aren't updated on the 1-second timer needed to
>>> get this in MB/s.
> 
>> The resource group's mutex is taken before __mon_event_count() is called via user-space or
>> via the overflow handler so I think that mbm_bw_count() can assume that the prev values
>> are from the __mon_event_count() called just before it.
> 
> That is true. But changing this to work with the overflow+corrected value directly means
> it doesn't need changing again as each of those steps is moved into the architecture
> specific function. Changing this would make the later patches noisier, and we would have
> the same discussion on a later patch.

ok

> 
> 
>>> __mon_event_count()'s chunks value hasn't been through get_corrected_mbm_count(), so it
>>> would need to be rr->val, which is what this code starts with for the "number of chunks
>>> ever read by this counter".
> 
>> The value could be corrected in mbm_bw_count(), no?
> 
> It could, but the aim of the series is to move all the architecture specific behaviour
> behind an arch helper.

ok - I am still working on understanding how these helpers are organized

> 
> MPAM's counters read in bytes, and when they don't, it's up to the MPAM architecture
> specific code to fix the hardware values before resctrl gets them.
> 
> There is no reason for the mba_sc code to be architecture specific, it operates on the
> counters and controls.
> 
> 
>>> The variable 'chunks' has been used too much here, so it's lost its meaning. How about:
>>> |    current_chunk_count = rr->val;
>>> |    delta_counter = current_chunk_count - m->prev_chunk_count;
>>> |    cur_bw = (delta_counter * hw_res->mon_scale) >> 20;
>>> |
>>> |    m->prev_chunk_count = current_chunk_count;
>>>
>>>
>>> The 'delta_counter' step was previously hidden in mbm_overflow_count(), which also had to
>>> deal with overflow of the hardware counter. Because a previously sanitised value is being
>>> used, the hardware counter's resolution doesn't need to be considered.
>>> (which helps make mba_sc architecture independent)
> 
>> This is the part that is not obvious to me: is the difference between the two individually
>> sanitized measurements the same as sanitizing the difference between the two measurements?
> 
> I agree get_corrected_mbm_count()'s rmid check and shift hide what it is doing, but it
> boils down to a multiply. The existing code is (a - b)*cf, which is the same as this a*cf
> - b*cf.
> 
> I'm not worried about this going wrong after 18-and-a-bit Exabytes of data is transferred,
> at current memory speeds that would take decades. But: none of the 'cf' values are greater
> than two, and the hardware register has two bits taken for error codes, so there is no a
> or b that hardware can represent, with a cf less than two, that overflows a 64bit unsigned
> long.

Thank you for answering it in this way. This seems fair. Could the 
commit message please elaborate more on the changes involved? The 
current summary of "Change mbm_bw_count() to use the current chunks 
value from __mon_event_count() to calculate bandwidth." is too cryptic 
(for me).

Reinette
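
For reference, the calculation discussed above can be written out as a
small stand-alone C sketch. It is only an illustration under stated
assumptions, not the code from the patch: struct mbm_state_sketch,
bw_from_counts(), correction_commutes() and the selftest are hypothetical
names, the input is assumed to be an already sanitised count (the role
rr->val plays in the thread), and the correction is modelled as a plain
multiply, as James describes it, leaving out the rmid check and shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-RMID state; only the field discussed above. */
struct mbm_state_sketch {
	uint64_t prev_chunk_count;	/* sanitised count seen at the previous tick */
};

/*
 * Bandwidth in MB/s from two sanitised readings taken one second apart.
 * mon_scale is assumed to be bytes per chunk; >> 20 converts bytes to MB.
 */
static uint64_t bw_from_counts(struct mbm_state_sketch *m,
			       uint64_t current_chunk_count,
			       uint64_t mon_scale)
{
	uint64_t delta_counter = current_chunk_count - m->prev_chunk_count;

	m->prev_chunk_count = current_chunk_count;
	return (delta_counter * mon_scale) >> 20;
}

/*
 * (a - b) * cf versus a * cf - b * cf. With uint64_t the arithmetic is
 * modulo 2^64, so the two sides agree for any inputs.
 */
static bool correction_commutes(uint64_t a, uint64_t b, uint64_t cf)
{
	return (a - b) * cf == a * cf - b * cf;
}

/* Small self-check; call it from main() in a test program. */
static void mba_sc_sketch_selftest(void)
{
	struct mbm_state_sketch m = { .prev_chunk_count = 0 };

	/* 16384 chunks * 64 bytes = 1 MB in the first second */
	assert(bw_from_counts(&m, 16384, 64) == 1);
	/* 32768 further chunks * 64 bytes = 2 MB in the next second */
	assert(bw_from_counts(&m, 49152, 64) == 2);
	assert(correction_commutes(3000000, 1000000, 2));
}
```

Because only the difference of two sanitised counts is used, the sketch
never needs to know the width of the hardware counter.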
