All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
@ 2020-12-04  6:27 Xiaochen Shen
  2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
  2020-12-10 16:57 ` [tip: x86/urgent] " tip-bot2 for Xiaochen Shen
  0 siblings, 2 replies; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-04  6:27 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, tony.luck, fenghua.yu, reinette.chatre
  Cc: x86, linux-kernel, pei.p.jia, xiaochen.shen

MBA software controller (mba_sc) is a feedback loop which periodically
reads MBM counters and tries to restrict the bandwidth below a user
specified bandwidth. It tags along MBM counter overflow handler to do
the updates with 1s interval in mbm_update() and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is read out
to the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 54dffe574e67..a98519a3a2e6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -279,7 +279,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (chunks * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -450,15 +449,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-04  6:27 [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled Xiaochen Shen
@ 2020-12-09 19:06 ` tip-bot2 for Xiaochen Shen
  2020-12-09 22:23   ` Borislav Petkov
  2020-12-10 16:57 ` [tip: x86/urgent] " tip-bot2 for Xiaochen Shen
  1 sibling, 1 reply; 10+ messages in thread
From: tip-bot2 for Xiaochen Shen @ 2020-12-09 19:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Xiaochen Shen, Borislav Petkov, Tony Luck, x86, linux-kernel

The following commit has been merged into the x86/cache branch of tip:

Commit-ID:     2ba836dbe2467d31fffb439258c2f454c6f1a317
Gitweb:        https://git.kernel.org/tip/2ba836dbe2467d31fffb439258c2f454c6f1a317
Author:        Xiaochen Shen <xiaochen.shen@intel.com>
AuthorDate:    Fri, 04 Dec 2020 14:27:59 +08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Wed, 09 Dec 2020 20:00:46 +01:00

x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled

The MBA software controller (mba_sc) is a feedback loop which periodically
reads MBM counters and tries to restrict the bandwidth below a user
specified bandwidth. It tags along the MBM counter overflow handler to do
the updates with 1s interval in mbm_update() and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated using incorrect m->chunks is shown to
the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 622073f..93a33b7 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -320,7 +320,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 	}
 
 	chunks = mbm_overflow_count(m->prev_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	m->prev_msr = tval;
 
 	rr->val += get_corrected_mbm_count(rmid, m->chunks);
@@ -514,15 +513,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
@ 2020-12-09 22:23   ` Borislav Petkov
  2020-12-09 22:27     ` Borislav Petkov
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Borislav Petkov @ 2020-12-09 22:23 UTC (permalink / raw)
  To: Xiaochen Shen
  Cc: linux-tip-commits, Xiaochen Shen, Borislav Petkov, Tony Luck,
	x86, linux-kernel

On Wed, Dec 09, 2020 at 07:06:58PM -0000, tip-bot2 for Xiaochen Shen wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 622073f..93a33b7 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -320,7 +320,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>  	}
>  
>  	chunks = mbm_overflow_count(m->prev_msr, tval, rr->r->mbm_width);
> -	m->chunks += chunks;
>  	m->prev_msr = tval;
>  
>  	rr->val += get_corrected_mbm_count(rmid, m->chunks);


Hmm, zapping this one. First, there's an unused variable warning:

https://lkml.kernel.org/r/202012100516.H7sTNehL-lkp@intel.com

and you should remove the chunks assignment too.

And then it didn't apply cleanly:

$ test-apply.sh /tmp/xiaochen.01 
checking file arch/x86/kernel/cpu/resctrl/monitor.c
Hunk #1 FAILED at 279.
Hunk #2 succeeded at 514 (offset 64 lines).
1 out of 2 hunks FAILED

I wiggled it in but it ended up removing the wrong chunks inc line -
not the one in mbm_bw_count() but in __mon_event_count() - which I just
realized.

So please redo this patch against:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/cache

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 22:23   ` Borislav Petkov
@ 2020-12-09 22:27     ` Borislav Petkov
  2020-12-10  4:45     ` Xiaochen Shen
  2020-12-10  5:49     ` [tip: x86/cache v2] " Xiaochen Shen
  2 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2020-12-09 22:27 UTC (permalink / raw)
  To: Xiaochen Shen; +Cc: linux-tip-commits, Tony Luck, x86, linux-kernel

On Wed, Dec 09, 2020 at 11:23:28PM +0100, Borislav Petkov wrote:
> and you should remove the chunks assignment too.

Yah, ignore that part - mbm_bw_count() does need chunks for cur_bw. Oh
well.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 22:23   ` Borislav Petkov
  2020-12-09 22:27     ` Borislav Petkov
@ 2020-12-10  4:45     ` Xiaochen Shen
  2020-12-10 10:26       ` Borislav Petkov
  2020-12-10  5:49     ` [tip: x86/cache v2] " Xiaochen Shen
  2 siblings, 1 reply; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-10  4:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-tip-commits, Borislav Petkov, Tony Luck, x86, linux-kernel,
	Xiaochen Shen

Hi Boris,

On 12/10/2020 6:23, Borislav Petkov wrote:
> And then it didn't apply cleanly:
>
> $ test-apply.sh /tmp/xiaochen.01
> checking file arch/x86/kernel/cpu/resctrl/monitor.c
> Hunk #1 FAILED at 279.
> Hunk #2 succeeded at 514 (offset 64 lines).
> 1 out of 2 hunks FAILED
>
> I wiggled it in but it ended up removing the wrong chunks inc line -
> not the one in mbm_bw_count() but in __mon_event_count() - which I just
> realized.

Thank you for clarifying this issue. It is not a 0-DAY CI issue.

>
> So please redo this patch against:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/cache
>
> Thx.

I will send a patch against branch tip: x86/cache soon.
Thank you.

-- 
Best regards,
Xiaochen


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip: x86/cache v2] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 22:23   ` Borislav Petkov
  2020-12-09 22:27     ` Borislav Petkov
  2020-12-10  4:45     ` Xiaochen Shen
@ 2020-12-10  5:49     ` Xiaochen Shen
  2 siblings, 0 replies; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-10  5:49 UTC (permalink / raw)
  To: bp; +Cc: linux-tip-commits, bp, tony.luck, x86, linux-kernel, xiaochen.shen

MBA software controller (mba_sc) is a feedback loop which periodically
reads MBM counters and tries to restrict the bandwidth below a user
specified bandwidth. It tags along MBM counter overflow handler to do
the updates with 1s interval in mbm_update() and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is read out
to the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 622073f..7ac3121 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -343,7 +343,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (get_corrected_mbm_count(rmid, chunks) * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -514,15 +513,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-10  4:45     ` Xiaochen Shen
@ 2020-12-10 10:26       ` Borislav Petkov
  2020-12-10 16:40         ` Xiaochen Shen
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2020-12-10 10:26 UTC (permalink / raw)
  To: Xiaochen Shen; +Cc: linux-tip-commits, Tony Luck, x86, linux-kernel

On Thu, Dec 10, 2020 at 12:45:11PM +0800, Xiaochen Shen wrote:
> Thank you for clarifying this issue. It is not a 0-DAY CI issue.

Which begs the question: this patch should be Cc: stable and should go
in now, shouldn't it?

Because then the first submission applies cleanly ontop of
tip:x86/urgent.

I mean, it is fixing only reporting but that reporting is kinda waaay
off.

Hmmm?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-10 10:26       ` Borislav Petkov
@ 2020-12-10 16:40         ` Xiaochen Shen
  2020-12-10 16:50           ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-10 16:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-tip-commits, Tony Luck, x86, linux-kernel, Xiaochen Shen

Hi Boris,

On 12/10/2020 18:26, Borislav Petkov wrote:
> On Thu, Dec 10, 2020 at 12:45:11PM +0800, Xiaochen Shen wrote:
>> Thank you for clarifying this issue. It is not a 0-DAY CI issue.
> Which begs the question: this patch should be Cc: stable and should go
> in now, shouldn't it?
>
> Because then the first submission applies cleanly ontop of
> tip:x86/urgent.
>
> I mean, it is fixing only reporting but that reporting is kinda waaay
> off.
>
> Hmmm?
>

Yes. Thank you for pointing out this, we'd better add Cc: stable tag.
But we have to backport the patch for -stable trees because the code base
is changed by following upstream commits:

(1) For all stable trees: in commit abe8f12b4425 ("x86/resctrl: Remove
unused struct mbm_state::chunks_bw"), removing unused struct
mbm_state::chunks_bw changes the code base for this patch.

(2) For 4.14 and 4.19 stable trees: in commit fa7d949337cc ("x86/resctrl:
Rename and move rdt files to a separate directory"), the file
arch/x86/kernel/cpu/intel_rdt_monitor.c has been renamed and moved to
arch/x86/kernel/cpu/resctrl/monitor.c. This patch needs to be rebased.

I plan to do backporting for all -stable trees after this patch is merged
into upstream.

Thank you.

-- 
Best regards,
Xiaochen


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-10 16:40         ` Xiaochen Shen
@ 2020-12-10 16:50           ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2020-12-10 16:50 UTC (permalink / raw)
  To: Xiaochen Shen; +Cc: linux-tip-commits, Tony Luck, x86, linux-kernel

On Fri, Dec 11, 2020 at 12:40:46AM +0800, Xiaochen Shen wrote:
> I plan to do backporting for all -stable trees after this patch is merged
> into upstream.

Ok, you can do that when Greg sends the mails that it cannot apply to
the respective trees. Lemme queue this one to urgent now.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip: x86/urgent] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-04  6:27 [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled Xiaochen Shen
  2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
@ 2020-12-10 16:57 ` tip-bot2 for Xiaochen Shen
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot2 for Xiaochen Shen @ 2020-12-10 16:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Xiaochen Shen, Borislav Petkov, Tony Luck, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     06c5fe9b12dde1b62821f302f177c972bb1c81f9
Gitweb:        https://git.kernel.org/tip/06c5fe9b12dde1b62821f302f177c972bb1c81f9
Author:        Xiaochen Shen <xiaochen.shen@intel.com>
AuthorDate:    Fri, 04 Dec 2020 14:27:59 +08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 10 Dec 2020 17:52:37 +01:00

x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled

The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It tags along the MBM counter overflow
handler to do the updates with 1s interval in mbm_update() and
update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is shown to
the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 54dffe5..a98519a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -279,7 +279,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (chunks * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -450,15 +449,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-12-10 16:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-04  6:27 [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled Xiaochen Shen
2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
2020-12-09 22:23   ` Borislav Petkov
2020-12-09 22:27     ` Borislav Petkov
2020-12-10  4:45     ` Xiaochen Shen
2020-12-10 10:26       ` Borislav Petkov
2020-12-10 16:40         ` Xiaochen Shen
2020-12-10 16:50           ` Borislav Petkov
2020-12-10  5:49     ` [tip: x86/cache v2] " Xiaochen Shen
2020-12-10 16:57 ` [tip: x86/urgent] " tip-bot2 for Xiaochen Shen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.