linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/resctrl: Fix default monitoring groups reporting
@ 2021-06-29 18:07 Babu Moger
  2021-07-19 20:43 ` Reinette Chatre
  0 siblings, 1 reply; 3+ messages in thread
From: Babu Moger @ 2021-06-29 18:07 UTC (permalink / raw)
  To: tglx, mingo, bp
  Cc: fenghua.yu, x86, linux-kernel, babu.moger, hpa, reinette.chatre,
	pawel.szulik

From: Babu Moger <Babu.Moger@amd.com>

Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on
the entire filesystem.

Steps to reproduce.
1. #mount -t resctrl resctrl /sys/fs/resctrl/

2. #cd /sys/fs/resctrl/

3. #cat mon_data/mon_L3_00/mbm_total_bytes
   23189832

4. #mkdir mon_groups/test1 (create sub monitor group)

5. #cat mon_data/mon_L3_00/mbm_total_bytes
   Unavailable

When a new monitoring group is created, a new RMID is assigned to the new
group. But the RMID is not active yet. When the events are read on the new
RMID, it is expected to report the status as "Unavailable".

When the user reads the events on the default monitoring group with
multiple subgroups, the events on all sub groups are consolidated together.
Currently, if any of the RMID reads report as "Unavailable", then
everything will be reported as "Unavailable".

Fix the issue by discarding the "Unavailable" reads and reporting all the
successful RMID reads. This is not a problem on Intel systesm as Intel
reports 0 on Inactive RMIDs.

Reported-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Babu Moger <Babu.Moger@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
---
 arch/x86/kernel/cpu/resctrl/monitor.c |   27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index dbeaa8409313..9573a30c0587 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -285,15 +285,14 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 	return chunks >>= shift;
 }
 
-static int __mon_event_count(u32 rmid, struct rmid_read *rr)
+static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 {
 	struct mbm_state *m;
 	u64 chunks, tval;
 
 	tval = __rmid_read(rmid, rr->evtid);
 	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
-		rr->val = tval;
-		return -EINVAL;
+		return tval;
 	}
 	switch (rr->evtid) {
 	case QOS_L3_OCCUP_EVENT_ID:
@@ -305,12 +304,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 	case QOS_L3_MBM_LOCAL_EVENT_ID:
 		m = &rr->d->mbm_local[rmid];
 		break;
-	default:
-		/*
-		 * Code would never reach here because
-		 * an invalid event id would fail the __rmid_read.
-		 */
-		return -EINVAL;
 	}
 
 	if (rr->first) {
@@ -361,23 +354,29 @@ void mon_event_count(void *info)
 	struct rdtgroup *rdtgrp, *entry;
 	struct rmid_read *rr = info;
 	struct list_head *head;
+	u64 ret_val;
 
 	rdtgrp = rr->rgrp;
 
-	if (__mon_event_count(rdtgrp->mon.rmid, rr))
-		return;
+	ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
 
 	/*
-	 * For Ctrl groups read data from child monitor groups.
+	 * For Ctrl groups read data from child monitor groups and
+	 * add them together. Count events which are read successfully.
+	 * Discard the rmid_read's reporting errors.
 	 */
 	head = &rdtgrp->mon.crdtgrp_list;
 
 	if (rdtgrp->type == RDTCTRL_GROUP) {
 		list_for_each_entry(entry, head, mon.crdtgrp_list) {
-			if (__mon_event_count(entry->mon.rmid, rr))
-				return;
+			if (__mon_event_count(entry->mon.rmid, rr) == 0)
+				ret_val = 0;
 		}
 	}
+
+	/* Report error if none of rmid_reads are successful */
+	if (ret_val)
+		rr->val = ret_val;
 }
 
 /*



* Re: [PATCH] x86/resctrl: Fix default monitoring groups reporting
  2021-06-29 18:07 [PATCH] x86/resctrl: Fix default monitoring groups reporting Babu Moger
@ 2021-07-19 20:43 ` Reinette Chatre
  2021-07-20 19:15   ` Babu Moger
  0 siblings, 1 reply; 3+ messages in thread
From: Reinette Chatre @ 2021-07-19 20:43 UTC (permalink / raw)
  To: Babu Moger, tglx, mingo, bp
  Cc: fenghua.yu, x86, linux-kernel, hpa, pawel.szulik, Luck, Tony

Hi Babu,

On 6/29/2021 11:07 AM, Babu Moger wrote:
> From: Babu Moger <Babu.Moger@amd.com>
> 
> Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
> getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on
> the entire filesystem.
> 
> Steps to reproduce.
> 1. #mount -t resctrl resctrl /sys/fs/resctrl/
> 
> 2. #cd /sys/fs/resctrl/
> 
> 3. #cat mon_data/mon_L3_00/mbm_total_bytes
>    23189832
> 
> 4. #mkdir mon_groups/test1 (create sub monitor group)
> 
> 5. #cat mon_data/mon_L3_00/mbm_total_bytes
>    Unavailable
> 
> When a new monitoring group is created, a new RMID is assigned to the new
> group. But the RMID is not active yet. When the events are read on the new
> RMID, it is expected to report the status as "Unavailable".
> 
> When the user reads the events on the default monitoring group with
> multiple subgroups, the events on all sub groups are consolidated together.
> Currently, if any of the RMID reads report as "Unavailable", then
> everything will be reported as "Unavailable".
> 
> Fix the issue by discarding the "Unavailable" reads and reporting all the
> successful RMID reads. This is not a problem on Intel systesm as Intel

systesm -> systems

> reports 0 on Inactive RMIDs.
> 
> Reported-by: Paweł Szulik <pawel.szulik@intel.com>
> Signed-off-by: Babu Moger <Babu.Moger@amd.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311

Is a "Fixes" available? If this is specific to AMD then could this be 
the change that enabled AMD systems?

> ---
>   arch/x86/kernel/cpu/resctrl/monitor.c |   27 +++++++++++++--------------
>   1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index dbeaa8409313..9573a30c0587 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -285,15 +285,14 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>   	return chunks >>= shift;
>   }
>   
> -static int __mon_event_count(u32 rmid, struct rmid_read *rr)
> +static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>   {
>   	struct mbm_state *m;
>   	u64 chunks, tval;
>   
>   	tval = __rmid_read(rmid, rr->evtid);
>   	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
> -		rr->val = tval;
> -		return -EINVAL;
> +		return tval;
>   	}
>   	switch (rr->evtid) {
>   	case QOS_L3_OCCUP_EVENT_ID:
> @@ -305,12 +304,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>   	case QOS_L3_MBM_LOCAL_EVENT_ID:
>   		m = &rr->d->mbm_local[rmid];
>   		break;
> -	default:
> -		/*
> -		 * Code would never reach here because
> -		 * an invalid event id would fail the __rmid_read.
> -		 */
> -		return -EINVAL;
>   	}
>   
>   	if (rr->first) {
> @@ -361,23 +354,29 @@ void mon_event_count(void *info)
>   	struct rdtgroup *rdtgrp, *entry;
>   	struct rmid_read *rr = info;
>   	struct list_head *head;
> +	u64 ret_val;
>   
>   	rdtgrp = rr->rgrp;
>   
> -	if (__mon_event_count(rdtgrp->mon.rmid, rr))
> -		return;
> +	ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
>   
>   	/*
> -	 * For Ctrl groups read data from child monitor groups.
> +	 * For Ctrl groups read data from child monitor groups and
> +	 * add them together. Count events which are read successfully.
> +	 * Discard the rmid_read's reporting errors.
>   	 */
>   	head = &rdtgrp->mon.crdtgrp_list;
>   
>   	if (rdtgrp->type == RDTCTRL_GROUP) {
>   		list_for_each_entry(entry, head, mon.crdtgrp_list) {
> -			if (__mon_event_count(entry->mon.rmid, rr))
> -				return;
> +			if (__mon_event_count(entry->mon.rmid, rr) == 0)
> +				ret_val = 0;
>   		}
>   	}
> +
> +	/* Report error if none of rmid_reads are successful */
> +	if (ret_val)
> +		rr->val = ret_val;
>   }
>   
>   /*
> 

With the commit message comments addressed:
Acked-by: Reinette Chatre <reinette.chatre@intel.com>

Thank you very much

Reinette


* Re: [PATCH] x86/resctrl: Fix default monitoring groups reporting
  2021-07-19 20:43 ` Reinette Chatre
@ 2021-07-20 19:15   ` Babu Moger
  0 siblings, 0 replies; 3+ messages in thread
From: Babu Moger @ 2021-07-20 19:15 UTC (permalink / raw)
  To: Reinette Chatre, tglx, mingo, bp
  Cc: fenghua.yu, x86, linux-kernel, hpa, pawel.szulik, Luck, Tony



On 7/19/21 3:43 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/29/2021 11:07 AM, Babu Moger wrote:
>> From: Babu Moger <Babu.Moger@amd.com>
>>
>> Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
>> getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on
>> the entire filesystem.
>>
>> Steps to reproduce.
>> 1. #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> 2. #cd /sys/fs/resctrl/
>>
>> 3. #cat mon_data/mon_L3_00/mbm_total_bytes
>>    23189832
>>
>> 4. #mkdir mon_groups/test1 (create sub monitor group)
>>
>> 5. #cat mon_data/mon_L3_00/mbm_total_bytes
>>    Unavailable
>>
>> When a new monitoring group is created, a new RMID is assigned to the new
>> group. But the RMID is not active yet. When the events are read on the new
>> RMID, it is expected to report the status as "Unavailable".
>>
>> When the user reads the events on the default monitoring group with
>> multiple subgroups, the events on all sub groups are consolidated together.
>> Currently, if any of the RMID reads report as "Unavailable", then
>> everything will be reported as "Unavailable".
>>
>> Fix the issue by discarding the "Unavailable" reads and reporting all the
>> successful RMID reads. This is not a problem on Intel systesm as Intel
> 
> systesm -> systems

Sure.
> 
>> reports 0 on Inactive RMIDs.
>>
>> Reported-by: Paweł Szulik <pawel.szulik@intel.com>
>> Signed-off-by: Babu Moger <Babu.Moger@amd.com>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
>>
> 
> Is a "Fixes" available? If this is specific to AMD then could this be the
> change that enabled AMD systems?

Yes, I will add a "Fixes" tag in my next revision; I hope I can identify
the proper commit. I would consider this a generic fix.


> 
>> ---
>>   arch/x86/kernel/cpu/resctrl/monitor.c |   27 +++++++++++++--------------
>>   1 file changed, 13 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index dbeaa8409313..9573a30c0587 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -285,15 +285,14 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>>  	return chunks >>= shift;
>>  }
>>  
>> -static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>> +static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>>  {
>>  	struct mbm_state *m;
>>  	u64 chunks, tval;
>>  
>>  	tval = __rmid_read(rmid, rr->evtid);
>>  	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
>> -		rr->val = tval;
>> -		return -EINVAL;
>> +		return tval;
>>  	}
>>  	switch (rr->evtid) {
>>  	case QOS_L3_OCCUP_EVENT_ID:
>> @@ -305,12 +304,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>>  	case QOS_L3_MBM_LOCAL_EVENT_ID:
>>  		m = &rr->d->mbm_local[rmid];
>>  		break;
>> -	default:
>> -		/*
>> -		 * Code would never reach here because
>> -		 * an invalid event id would fail the __rmid_read.
>> -		 */
>> -		return -EINVAL;
>>  	}
>>  
>>  	if (rr->first) {
>> @@ -361,23 +354,29 @@ void mon_event_count(void *info)
>>  	struct rdtgroup *rdtgrp, *entry;
>>  	struct rmid_read *rr = info;
>>  	struct list_head *head;
>> +	u64 ret_val;
>>  
>>  	rdtgrp = rr->rgrp;
>>  
>> -	if (__mon_event_count(rdtgrp->mon.rmid, rr))
>> -		return;
>> +	ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
>>  
>>  	/*
>> -	 * For Ctrl groups read data from child monitor groups.
>> +	 * For Ctrl groups read data from child monitor groups and
>> +	 * add them together. Count events which are read successfully.
>> +	 * Discard the rmid_read's reporting errors.
>>  	 */
>>  	head = &rdtgrp->mon.crdtgrp_list;
>>  
>>  	if (rdtgrp->type == RDTCTRL_GROUP) {
>>  		list_for_each_entry(entry, head, mon.crdtgrp_list) {
>> -			if (__mon_event_count(entry->mon.rmid, rr))
>> -				return;
>> +			if (__mon_event_count(entry->mon.rmid, rr) == 0)
>> +				ret_val = 0;
>>  		}
>>  	}
>> +
>> +	/* Report error if none of rmid_reads are successful */
>> +	if (ret_val)
>> +		rr->val = ret_val;
>>  }
>>  
>>  /*
>> 
> 
> With the commit message comments addressed:
> Acked-by: Reinette Chatre <reinette.chatre@intel.com>

Thank You.

> 
> Thank you very much
> 
> Reinette

