linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
@ 2020-12-04  6:27 Xiaochen Shen
  2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
  2020-12-10 16:57 ` [tip: x86/urgent] " tip-bot2 for Xiaochen Shen
  0 siblings, 2 replies; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-04  6:27 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, tony.luck, fenghua.yu, reinette.chatre
  Cc: x86, linux-kernel, pei.p.jia, xiaochen.shen

MBA software controller (mba_sc) is a feedback loop which periodically
reads MBM counters and tries to restrict the bandwidth below a user
specified bandwidth. It tags along MBM counter overflow handler to do
the updates with 1s interval in mbm_update() and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is read out
to the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 54dffe574e67..a98519a3a2e6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -279,7 +279,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (chunks * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -450,15 +449,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-04  6:27 [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled Xiaochen Shen
@ 2020-12-09 19:06 ` tip-bot2 for Xiaochen Shen
  2020-12-09 22:23   ` Borislav Petkov
  2020-12-10 16:57 ` [tip: x86/urgent] " tip-bot2 for Xiaochen Shen
  1 sibling, 1 reply; 10+ messages in thread
From: tip-bot2 for Xiaochen Shen @ 2020-12-09 19:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Xiaochen Shen, Borislav Petkov, Tony Luck, x86, linux-kernel

The following commit has been merged into the x86/cache branch of tip:

Commit-ID:     2ba836dbe2467d31fffb439258c2f454c6f1a317
Gitweb:        https://git.kernel.org/tip/2ba836dbe2467d31fffb439258c2f454c6f1a317
Author:        Xiaochen Shen <xiaochen.shen@intel.com>
AuthorDate:    Fri, 04 Dec 2020 14:27:59 +08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Wed, 09 Dec 2020 20:00:46 +01:00

x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled

The MBA software controller (mba_sc) is a feedback loop which periodically
reads MBM counters and tries to restrict the bandwidth below a user
specified bandwidth. It tags along the MBM counter overflow handler to do
the updates with 1s interval in mbm_update() and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated using incorrect m->chunks is shown to
the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 622073f..93a33b7 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -320,7 +320,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 	}
 
 	chunks = mbm_overflow_count(m->prev_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	m->prev_msr = tval;
 
 	rr->val += get_corrected_mbm_count(rmid, m->chunks);
@@ -514,15 +513,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
@ 2020-12-09 22:23   ` Borislav Petkov
  2020-12-09 22:27     ` Borislav Petkov
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Borislav Petkov @ 2020-12-09 22:23 UTC (permalink / raw)
  To: Xiaochen Shen
  Cc: linux-tip-commits, Xiaochen Shen, Borislav Petkov, Tony Luck,
	x86, linux-kernel

On Wed, Dec 09, 2020 at 07:06:58PM -0000, tip-bot2 for Xiaochen Shen wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 622073f..93a33b7 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -320,7 +320,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>  	}
>  
>  	chunks = mbm_overflow_count(m->prev_msr, tval, rr->r->mbm_width);
> -	m->chunks += chunks;
>  	m->prev_msr = tval;
>  
>  	rr->val += get_corrected_mbm_count(rmid, m->chunks);


Hmm, zapping this one. First, there's an unused variable warning:

https://lkml.kernel.org/r/202012100516.H7sTNehL-lkp@intel.com

and you should remove the chunks assignment too.

And then it didn't apply cleanly:

$ test-apply.sh /tmp/xiaochen.01 
checking file arch/x86/kernel/cpu/resctrl/monitor.c
Hunk #1 FAILED at 279.
Hunk #2 succeeded at 514 (offset 64 lines).
1 out of 2 hunks FAILED

I wiggled it in but it ended up removing the wrong chunks inc line -
not the one in mbm_bw_count() but in __mon_event_count() - which I just
realized.

So please redo this patch against:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/cache

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 22:23   ` Borislav Petkov
@ 2020-12-09 22:27     ` Borislav Petkov
  2020-12-10  4:45     ` Xiaochen Shen
  2020-12-10  5:49     ` [tip: x86/cache v2] " Xiaochen Shen
  2 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2020-12-09 22:27 UTC (permalink / raw)
  To: Xiaochen Shen; +Cc: linux-tip-commits, Tony Luck, x86, linux-kernel

On Wed, Dec 09, 2020 at 11:23:28PM +0100, Borislav Petkov wrote:
> and you should remove the chunks assignment too.

Yah, ignore that part - mbm_bw_count() does need chunks for cur_bw. Oh
well.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 22:23   ` Borislav Petkov
  2020-12-09 22:27     ` Borislav Petkov
@ 2020-12-10  4:45     ` Xiaochen Shen
  2020-12-10 10:26       ` Borislav Petkov
  2020-12-10  5:49     ` [tip: x86/cache v2] " Xiaochen Shen
  2 siblings, 1 reply; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-10  4:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-tip-commits, Borislav Petkov, Tony Luck, x86, linux-kernel,
	Xiaochen Shen

Hi Boris,

On 12/10/2020 6:23, Borislav Petkov wrote:
> And then it didn't apply cleanly:
>
> $ test-apply.sh /tmp/xiaochen.01
> checking file arch/x86/kernel/cpu/resctrl/monitor.c
> Hunk #1 FAILED at 279.
> Hunk #2 succeeded at 514 (offset 64 lines).
> 1 out of 2 hunks FAILED
>
> I wiggled it in but it ended up removing the wrong chunks inc line -
> not the one in mbm_bw_count() but in __mon_event_count() - which I just
> realized.

Thank you for clarifying this issue. It is not a 0-DAY CI issue.

>
> So please redo this patch against:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/cache
>
> Thx.

I will send a patch against branch tip: x86/cache soon.
Thank you.

-- 
Best regards,
Xiaochen


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip: x86/cache v2] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-09 22:23   ` Borislav Petkov
  2020-12-09 22:27     ` Borislav Petkov
  2020-12-10  4:45     ` Xiaochen Shen
@ 2020-12-10  5:49     ` Xiaochen Shen
  2 siblings, 0 replies; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-10  5:49 UTC (permalink / raw)
  To: bp; +Cc: linux-tip-commits, bp, tony.luck, x86, linux-kernel, xiaochen.shen

MBA software controller (mba_sc) is a feedback loop which periodically
reads MBM counters and tries to restrict the bandwidth below a user
specified bandwidth. It tags along MBM counter overflow handler to do
the updates with 1s interval in mbm_update() and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is read out
to the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 622073f..7ac3121 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -343,7 +343,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (get_corrected_mbm_count(rmid, chunks) * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -514,15 +513,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-10  4:45     ` Xiaochen Shen
@ 2020-12-10 10:26       ` Borislav Petkov
  2020-12-10 16:40         ` Xiaochen Shen
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2020-12-10 10:26 UTC (permalink / raw)
  To: Xiaochen Shen; +Cc: linux-tip-commits, Tony Luck, x86, linux-kernel

On Thu, Dec 10, 2020 at 12:45:11PM +0800, Xiaochen Shen wrote:
> Thank you for clarifying this issue. It is not a 0-DAY CI issue.

Which begs the question: this patch should be Cc: stable and should go
in now, shouldn't it?

Because then the first submission applies cleanly ontop of
tip:x86/urgent.

I mean, it is fixing only reporting but that reporting is kinda waaay
off.

Hmmm?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-10 10:26       ` Borislav Petkov
@ 2020-12-10 16:40         ` Xiaochen Shen
  2020-12-10 16:50           ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Xiaochen Shen @ 2020-12-10 16:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-tip-commits, Tony Luck, x86, linux-kernel, Xiaochen Shen

Hi Boris,

On 12/10/2020 18:26, Borislav Petkov wrote:
> On Thu, Dec 10, 2020 at 12:45:11PM +0800, Xiaochen Shen wrote:
>> Thank you for clarifying this issue. It is not a 0-DAY CI issue.
> Which begs the question: this patch should be Cc: stable and should go
> in now, shouldn't it?
>
> Because then the first submission applies cleanly ontop of
> tip:x86/urgent.
>
> I mean, it is fixing only reporting but that reporting is kinda waaay
> off.
>
> Hmmm?
>

Yes. Thank you for pointing out this, we'd better add Cc: stable tag.
But we have to backport the patch for -stable trees because the code base
is changed by following upstream commits:

(1) For all stable trees: in commit abe8f12b4425 ("x86/resctrl: Remove
unused struct mbm_state::chunks_bw"), removing unused struct
mbm_state::chunks_bw changes the code base for this patch.

(2) For 4.14 and 4.19 stable trees: in commit fa7d949337cc ("x86/resctrl:
Rename and move rdt files to a separate directory"), the file
arch/x86/kernel/cpu/intel_rdt_monitor.c has been renamed and moved to
arch/x86/kernel/cpu/resctrl/monitor.c. This patch needs to be rebased.

I plan to do backporting for all -stable trees after this patch is merged
into upstream.

Thank you.

-- 
Best regards,
Xiaochen


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-10 16:40         ` Xiaochen Shen
@ 2020-12-10 16:50           ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2020-12-10 16:50 UTC (permalink / raw)
  To: Xiaochen Shen; +Cc: linux-tip-commits, Tony Luck, x86, linux-kernel

On Fri, Dec 11, 2020 at 12:40:46AM +0800, Xiaochen Shen wrote:
> I plan to do backporting for all -stable trees after this patch is merged
> into upstream.

Ok, you can do that when Greg sends the mails that it cannot apply to
the respective trees. Lemme queue this one to urgent now.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip: x86/urgent] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  2020-12-04  6:27 [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled Xiaochen Shen
  2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
@ 2020-12-10 16:57 ` tip-bot2 for Xiaochen Shen
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot2 for Xiaochen Shen @ 2020-12-10 16:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Xiaochen Shen, Borislav Petkov, Tony Luck, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     06c5fe9b12dde1b62821f302f177c972bb1c81f9
Gitweb:        https://git.kernel.org/tip/06c5fe9b12dde1b62821f302f177c972bb1c81f9
Author:        Xiaochen Shen <xiaochen.shen@intel.com>
AuthorDate:    Fri, 04 Dec 2020 14:27:59 +08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 10 Dec 2020 17:52:37 +01:00

x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled

The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It tags along the MBM counter overflow
handler to do the updates with 1s interval in mbm_update() and
update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is shown to
the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 54dffe5..a98519a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -279,7 +279,6 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 		return;
 
 	chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-	m->chunks += chunks;
 	cur_bw = (chunks * r->mon_scale) >> 20;
 
 	if (m->delta_comp)
@@ -450,15 +449,14 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+		__mon_event_count(rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
 		 * control groups and when user has enabled
 		 * the software controller explicitly.
 		 */
-		if (!is_mba_sc(NULL))
-			__mon_event_count(rmid, &rr);
-		else
+		if (is_mba_sc(NULL))
 			mbm_bw_count(rmid, &rr);
 	}
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-12-10 16:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-04  6:27 [PATCH] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled Xiaochen Shen
2020-12-09 19:06 ` [tip: x86/cache] " tip-bot2 for Xiaochen Shen
2020-12-09 22:23   ` Borislav Petkov
2020-12-09 22:27     ` Borislav Petkov
2020-12-10  4:45     ` Xiaochen Shen
2020-12-10 10:26       ` Borislav Petkov
2020-12-10 16:40         ` Xiaochen Shen
2020-12-10 16:50           ` Borislav Petkov
2020-12-10  5:49     ` [tip: x86/cache v2] " Xiaochen Shen
2020-12-10 16:57 ` [tip: x86/urgent] " tip-bot2 for Xiaochen Shen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).