From: Reinette Chatre <reinette.chatre@intel.com>
To: James Morse <james.morse@arm.com>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	<shameerali.kolothum.thodi@huawei.com>,
	Jamie Iles <jamie@nuviainc.com>,
	"D Scott Phillips OS" <scott@os.amperecomputing.com>,
	<lcherian@marvell.com>, <bobo.shaobowang@huawei.com>,
	<tan.shaopeng@fujitsu.com>
Subject: Re: [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
Date: Fri, 29 Oct 2021 15:22:01 -0700
Message-ID: <bcdf2daf-ae8a-9047-e171-3dd36a98a66a@intel.com>
In-Reply-To: <76b70d56-b4b2-6fec-693a-a2105f446ec6@arm.com>

Hi James,

On 10/29/2021 8:50 AM, James Morse wrote:
> On 27/10/2021 21:41, Reinette Chatre wrote:
>> On 10/27/2021 9:50 AM, James Morse wrote:
>>> On 15/10/2021 23:28, Reinette Chatre wrote:
>>>> On 10/1/2021 9:02 AM, James Morse wrote:
>>>>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>>>>> second. It reads the hardware register, calculates the bandwidth and
>>>>> updates m->prev_bw_msr which is used to hold the previous hardware register
>>>>> value.
>>>>>
>>>>> Operating directly on hardware register values makes it difficult to make
>>>>> this code architecture independent, so that it can be moved to /fs/,
>>>>> making the mba_sc feature something resctrl supports with no additional
>>>>> support from the architecture.
>>>>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>>>>> register using __mon_event_count().
>>>>
>>>> Looking back I think 06c5fe9b12dd ("x86/resctrl: Fix incorrect local bandwidth when mba_sc
>>>> is enabled") may explain how the code ended up the way it is.
>>>>
>>>>> Change mbm_bw_count() to use the current chunks value from
>>>>> __mon_event_count() to calculate bandwidth. This means it no longer
>>>>> operates on hardware register values.
>>>>
>>>> ok ... so could the patch just do this as it is stated here? The way it is implemented is
>>>> very complicated and hard (for me) to verify the correctness (more below).
>>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> index 6c8226987dd6..a1232462db14 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>
>>>>>     static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>>>>>     {
>>>>>         struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>>>>>         struct mbm_state *m = &rr->d->mbm_local[rmid];
>>>>> -    u64 tval, cur_bw, chunks;
>>>>> +    u64 cur_bw, chunks, cur_chunks;
>>>>>
>>>>> -    tval = __rmid_read(rmid, rr->evtid);
>>>>> -    if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>>>>> -        return;
>>>>> +    cur_chunks = rr->val;
>>>>> +    chunks = cur_chunks - m->prev_bw_chunks;
>>>>> +    m->prev_bw_chunks = cur_chunks;
>>>>>
>>>>> -    chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
>>>>> -    cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
>>>>> +    cur_bw = (chunks * hw_res->mon_scale) >> 20;
>>>
>>>> I find this quite confusing. What if a new m->prev_chunks is introduced instead and
>>>> initialized in __mon_event_count() to the value of chunks, and then here in mbm_bw_count
>>>> it could just retrieve it (chunks = m->prev_chunks).
>>>
>>> (I agree the diff is noisy, it may be easier as a new function as this is a replacement
>>> not a transform of the existing function)
>>>
>>> __mon_event_count() can also be triggered by user-space reading the file, so any of its
>>> 'prev' values should be ignored, as they aren't updated on the 1-second timer needed to
>>> get this in MB/s.
> 
>> The resource group's mutex is taken before __mon_event_count() is called via user-space or
>> via the overflow handler so I think that mbm_bw_count() can assume that the prev values
>> are from the __mon_event_count() called just before it.
> 
> That is true. But changing this to work with the overflow+corrected value directly means
> it doesn't need changing again as each of those steps is moved into the architecture
> specific function. Changing this would make the later patches noisier, and we would have
> the same discussion on a later patch.

ok

> 
> 
>>> __mon_event_count()'s chunks value hasn't been through get_corrected_mbm_count(), so it
>>> would need to be rr->val, which is what this code starts with for the "number of chunks
>>> ever read by this counter".
> 
>> The value could be corrected in mbm_bw_count(), no?
> 
> It could, but the aim of the series is to move all the architecture specific behaviour
> behind an arch helper.

ok - I am still working on understanding how these helpers are organized

> 
> MPAM's counters read in bytes, and when they don't, it's up to the MPAM architecture
> specific code to fix the hardware values before resctrl gets them.
> 
> There is no reason for the mba_sc code to be architecture specific, it operates on the
> counters and controls.
> 
> 
>>> The variable 'chunks' has been used too much here, so it's lost its meaning. How about:
>>> |    current_chunk_count = rr->val;
>>> |    delta_counter = current_chunk_count - m->prev_chunk_count;
>>> |    cur_bw = (delta_counter * hw_res->mon_scale) >> 20;
>>> |
>>> |    m->prev_chunk_count = current_chunk_count;
>>>
>>>
>>> The 'delta_counter' step was previously hidden in mbm_overflow_count(), which also had to
>>> do with overflow of the hardware counter. Because a previously sanitised value is being
>>> used, the hardware counter's resolution doesn't need to be considered.
>>> (which helps make mba_sc architecture independent)
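
For reference, the calculation proposed in the quoted snippet above can be
sketched as standalone C. This is only an illustration, not the kernel code:
struct bw_state and bw_from_chunks() are hypothetical names, and the sketch
assumes the function is called once per second, that the count passed in has
already been sanitised (overflow handled and corrected), and that mon_scale
is the number of bytes per chunk.

```c
#include <stdint.h>

/* Hypothetical per-RMID state; field names follow the quoted suggestion. */
struct bw_state {
	uint64_t prev_chunk_count;	/* sanitised count seen at the last tick */
	uint64_t prev_bw;		/* bandwidth computed at the last tick, MB/s */
};

/*
 * Compute bandwidth in MB/s from a sanitised, monotonically increasing
 * chunk count. Assumes one call per second and that mon_scale is the
 * number of bytes per chunk (both are assumptions of this sketch).
 */
static uint64_t bw_from_chunks(struct bw_state *m, uint64_t current_chunk_count,
			       uint64_t mon_scale)
{
	uint64_t delta_counter = current_chunk_count - m->prev_chunk_count;
	uint64_t cur_bw = (delta_counter * mon_scale) >> 20;	/* bytes -> MB */

	/* Remember this reading so the next tick can take the delta. */
	m->prev_chunk_count = current_chunk_count;
	m->prev_bw = cur_bw;
	return cur_bw;
}
```

Nothing in this function refers to the width of a hardware register, which
is the property the quoted text is after.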
> 
>> This is the part that is not obvious to me: is the difference between the two individually
>> sanitized measurements the same as sanitizing the difference between the two measurements?
> 
> I agree get_corrected_mbm_count()'s rmid check and shift hide what it is doing, but it
> boils down to a multiply. The existing code is (a - b)*cf, which is the same as this a*cf
> - b*cf.
> 
> I'm not worried about this going wrong after 18-and-a-bit Exabytes of data is transferred,
> at current memory speeds that would take decades. But: none of the 'cf' values are greater
> than two, and the hardware register has two bits taken for error codes, so there is no a
> or b that hardware can represent, with a cf less than two, that overflows a 64bit unsigned
> long.

Thank you for answering it in this way. This seems fair. Could the 
commit message please elaborate more on the changes involved? The 
current summary of "Change mbm_bw_count() to use the current chunks 
value from __mon_event_count() to calculate bandwidth." is too cryptic 
(for me).
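
As a possible aid for that commit message, here is a small standalone sketch
of the (a - b)*cf versus a*cf - b*cf comparison. It is not the kernel code:
apply_cf() is a hypothetical stand-in for the correction step, and the
fixed-point form (a multiply followed by a 20-bit right shift) is an
assumption made for the sketch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the correction step: a fixed-point multiply
 * by cf_fixed, where cf_fixed is the correction factor scaled by 2^20.
 * The 20-bit scaling is an assumption made for this sketch.
 */
static uint64_t apply_cf(uint64_t chunks, uint64_t cf_fixed)
{
	return (chunks * cf_fixed) >> 20;
}

/* Existing order: take the delta first, then correct it: (a - b) * cf. */
static uint64_t correct_the_delta(uint64_t a, uint64_t b, uint64_t cf_fixed)
{
	return apply_cf(a - b, cf_fixed);
}

/* Proposed order: correct each reading, then take the delta: a*cf - b*cf. */
static uint64_t delta_of_corrected(uint64_t a, uint64_t b, uint64_t cf_fixed)
{
	return apply_cf(a, cf_fixed) - apply_cf(b, cf_fixed);
}
```

With exact arithmetic the two orders are identical. With the truncating
shift assumed here they can differ by at most one chunk per interval: for a
factor of 1.5 (cf_fixed = 3 << 19), readings a = 2 and b = 1 give 1 in the
first order and 2 in the second. The difference does not accumulate in the
second order, because the per-interval deltas sum to the corrected total.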

Reinette


