* [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list
@ 2022-09-27  2:54 Shawn Wang
  2022-09-27 13:06 ` James Morse
  0 siblings, 1 reply; 6+ messages in thread
From: Shawn Wang @ 2022-09-27  2:54 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre
  Cc: james.morse, tglx, mingo, bp, dave.hansen, x86, hpa, linux-kernel

The staged_config array in struct rdt_domain still holds its previous values
after resctrl is unmounted. If resctrl is mounted with the cdp option and then
remounted without it, the have_new_ctrl field in staged_config[CDP_CODE] and
staged_config[CDP_DATA] will still be true. Since resctrl_arch_update_domains()
traverses all resctrl_conf_type values, it will go on to update the CDP_CODE and
CDP_DATA configurations, which can cause an overflow problem.

The problem can be reproduced by the following commands:
    # A system with 16 usable closids and mba disabled
    mount -t resctrl resctrl -o cdp /sys/fs/resctrl
    mkdir /sys/fs/resctrl/p{1..7}
    umount /sys/fs/resctrl/
    mount -t resctrl resctrl /sys/fs/resctrl
    mkdir /sys/fs/resctrl/p{1..8}

dmesg will generate the following error:
    [ 6180.939345] unchecked MSR access error: WRMSR to 0xca0 (tried to write
    0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
    [ 6180.951983] Call Trace:
    [ 6180.954516]  <IRQ>
    [ 6180.956619]  __flush_smp_call_function_queue+0x11d/0x170
    [ 6180.962028]  __sysvec_call_function+0x24/0xd0
    [ 6180.966485]  sysvec_call_function+0x89/0xc0
    [ 6180.970760]  </IRQ>
    [ 6180.972947]  <TASK>
    [ 6180.975131]  asm_sysvec_call_function+0x16/0x20
    [ 6180.979757] RIP: 0010:cpuidle_enter_state+0xcd/0x400
    [ 6180.984821] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 1e e5 77 ff 45 84
    ff 74 12 9c 58 f6 c4 02 0f 85 13 03 00 00 31 ff e8 67 70 7d ff fb 45 85
    f6 <0f> 88 75 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
    [ 6181.003710] RSP: 0018:ffffffff83a03e48 EFLAGS: 00000202
    [ 6181.009028] RAX: ffff943400800000 RBX: 0000000000000001 RCX: 000000000000001f
    [ 6181.016261] RDX: 0000000000000000 RSI: ffffffff83795059 RDI: ffffffff837c101e
    [ 6181.023490] RBP: ffff9434c9352000 R08: 0000059f1cb1a05e R09: 0000000000000008
    [ 6181.030717] R10: 0000000000000001 R11: 0000000000005c66 R12: ffffffff83bbf3a0
    [ 6181.037944] R13: 0000059f1cb1a05e R14: 0000000000000001 R15: 0000000000000000
    [ 6181.045202]  ? cpuidle_enter_state+0xb2/0x400
    [ 6181.049678]  cpuidle_enter+0x24/0x40
    [ 6181.053370]  do_idle+0x1dd/0x260
    [ 6181.056713]  cpu_startup_entry+0x14/0x20
    [ 6181.060753]  rest_init+0xbb/0xc0
    [ 6181.064097]  arch_call_rest_init+0x5/0xa
    [ 6181.068137]  start_kernel+0x668/0x691
    [ 6181.071914]  secondary_startup_64_no_verify+0xe0/0xeb
    [ 6181.077086]  </TASK>

We fix this issue by clearing the staged configs when destroying the schemata list.

Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Suggested-by: Xin Hao <xhao@linux.alibaba.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f276aff521e8..b4a817ae83ab 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2127,8 +2127,15 @@ static int schemata_list_create(void)
 static void schemata_list_destroy(void)
 {
 	struct resctrl_schema *s, *tmp;
+	struct rdt_domain *dom;
 
 	list_for_each_entry_safe(s, tmp, &resctrl_schema_all, list) {
+		/*
+		 * Clear staged_config on each domain before schemata list is
+		 * destroyed.
+		 */
+		list_for_each_entry(dom, &s->res->domains, list)
+			memset(dom->staged_config, 0, sizeof(dom->staged_config));
 		list_del(&s->list);
 		kfree(s);
 	}
-- 
2.27.0



* Re: [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list
  2022-09-27  2:54 [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list Shawn Wang
@ 2022-09-27 13:06 ` James Morse
  2022-09-27 21:21   ` Reinette Chatre
  0 siblings, 1 reply; 6+ messages in thread
From: James Morse @ 2022-09-27 13:06 UTC (permalink / raw)
  To: Shawn Wang, fenghua.yu, reinette.chatre
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, linux-kernel

Hi Shawn,

On 27/09/2022 03:54, Shawn Wang wrote:
> Array staged_config in struct rdt_domain still maintains the original value when
> resctrl is unmounted. If resctrl is mounted with cdp option and then remounted
> without cdp option, field have_new_ctrl in staged_config[CDP_CODE] and
> staged_config[CDP_DATA] will still be true.

staged_config[CDP_DATA] is an array - it's always 'true'. I think you mean
staged_config[CDP_DATA].have_new_ctrl, which will still be true because it is only
memset() when the schemata file is written to.


> Since resctrl_arch_update_domains()
> traverses all resctrl_conf_type, it will continue to update CDP_CODE and
> CDP_DATA configurations, which can cause overflow problem.

Only if it's called with a stale staged config, and it should only be called when the
schemata file is written to, which would memset() the staged config first.


> The problem can be reproduced by the following commands:
>     # A system with 16 usable closids and mba disabled
>     mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>     mkdir /sys/fs/resctrl/p{1..7}
>     umount /sys/fs/resctrl/
>     mount -t resctrl resctrl /sys/fs/resctrl
>     mkdir /sys/fs/resctrl/p{1..8}

Thanks for the reproducer - but I don't see what could set have_new_ctrl in this sequence.
You can't call apply_config() to set CPUs in the mask without that being set.

Creating a new control group (your mkdir step) shouldn't touch the hardware at all, as it
should be left in its reset state from the last umount(), or setup.

I can't reproduce this on v6.0-rc7.
Even if I dirty the configuration by writing to the schemata file, I can't reproduce this.

(I have mba enabled, but all this should affect is the number of closid available)


> dmesg will generate the following error:

Which kernel version is this?

>     [ 6180.939345] unchecked MSR access error: WRMSR to 0xca0 (tried to write
>     0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)

Is 0x7ff the default CBM bitmap for this CPU? Or was it written in a step missing from the
reproducer above?


The rest of this splat isn't helpful as it's the result of an IPI...

>     [ 6180.951983] Call Trace:
>     [ 6180.954516]  <IRQ>
>     [ 6180.956619]  __flush_smp_call_function_queue+0x11d/0x170
>     [ 6180.962028]  __sysvec_call_function+0x24/0xd0
>     [ 6180.966485]  sysvec_call_function+0x89/0xc0
>     [ 6180.970760]  </IRQ>
>     [ 6180.972947]  <TASK>
>     [ 6180.975131]  asm_sysvec_call_function+0x16/0x20
>     [ 6180.979757] RIP: 0010:cpuidle_enter_state+0xcd/0x400
>     [ 6180.984821] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 1e e5 77 ff 45 84
>     ff 74 12 9c 58 f6 c4 02 0f 85 13 03 00 00 31 ff e8 67 70 7d ff fb 45 85
>     f6 <0f> 88 75 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
>     [ 6181.003710] RSP: 0018:ffffffff83a03e48 EFLAGS: 00000202
>     [ 6181.009028] RAX: ffff943400800000 RBX: 0000000000000001 RCX: 000000000000001f
>     [ 6181.016261] RDX: 0000000000000000 RSI: ffffffff83795059 RDI: ffffffff837c101e
>     [ 6181.023490] RBP: ffff9434c9352000 R08: 0000059f1cb1a05e R09: 0000000000000008
>     [ 6181.030717] R10: 0000000000000001 R11: 0000000000005c66 R12: ffffffff83bbf3a0
>     [ 6181.037944] R13: 0000059f1cb1a05e R14: 0000000000000001 R15: 0000000000000000
>     [ 6181.045202]  ? cpuidle_enter_state+0xb2/0x400
>     [ 6181.049678]  cpuidle_enter+0x24/0x40
>     [ 6181.053370]  do_idle+0x1dd/0x260
>     [ 6181.056713]  cpu_startup_entry+0x14/0x20
>     [ 6181.060753]  rest_init+0xbb/0xc0
>     [ 6181.064097]  arch_call_rest_init+0x5/0xa
>     [ 6181.068137]  start_kernel+0x668/0x691
>     [ 6181.071914]  secondary_startup_64_no_verify+0xe0/0xeb
>     [ 6181.077086]  </TASK>

It would be good to know what triggered this IPI. It may not have been
resctrl_arch_update_domains(). This pattern also happens from reset_all_ctrls() which
happens during umount(). (and that would write the default CBM bitmap)

If you can reproduce this easily, could you add dump_stack() to update_config() to see if
any path is setting have_new_ctrl? You aren't writing to the schemata file in your reproducer.


> We fix this issue by clearing the staged configs when destroying schemata list.


> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
> Suggested-by: Xin Hao <xhao@linux.alibaba.com>

If we can work out why you are seeing this, it would need a Fixes tag.

Otherwise I agree it makes sense to make this more robust, but it would need a different
commit message.


Thanks,

James



> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index f276aff521e8..b4a817ae83ab 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2127,8 +2127,15 @@ static int schemata_list_create(void)
>  static void schemata_list_destroy(void)
>  {
>  	struct resctrl_schema *s, *tmp;
> +	struct rdt_domain *dom;
>  
>  	list_for_each_entry_safe(s, tmp, &resctrl_schema_all, list) {
> +		/*
> +		 * Clear staged_config on each domain before schemata list is
> +		 * destroyed.
> +		 */
> +		list_for_each_entry(dom, &s->res->domains, list)
> +			memset(dom->staged_config, 0, sizeof(dom->staged_config));
>  		list_del(&s->list);
>  		kfree(s);
>  	}



* Re: [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list
  2022-09-27 13:06 ` James Morse
@ 2022-09-27 21:21   ` Reinette Chatre
  2022-09-28 13:37     ` Shawn Wang
  2022-09-28 14:08     ` James Morse
  0 siblings, 2 replies; 6+ messages in thread
From: Reinette Chatre @ 2022-09-27 21:21 UTC (permalink / raw)
  To: James Morse, Shawn Wang, fenghua.yu
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, linux-kernel

Hi James and Shawn,

On 9/27/2022 6:06 AM, James Morse wrote:
> On 27/09/2022 03:54, Shawn Wang wrote:
>> Array staged_config in struct rdt_domain still maintains the original value when
>> resctrl is unmounted. If resctrl is mounted with cdp option and then remounted
>> without cdp option, field have_new_ctrl in staged_config[CDP_CODE] and
>> staged_config[CDP_DATA] will still be true.
> 
> staged_config[CDP_DATA] is an array - its always 'true'. I think you mean
> staged_config[CDP_DATA].have_new_ctrl, which will still be true because it is only
> memset() when the schemata file is written to.
> 
> 
>> Since resctrl_arch_update_domains()
>> traverses all resctrl_conf_type, it will continue to update CDP_CODE and
>> CDP_DATA configurations, which can cause overflow problem.
> 
> Only if its called with a stale staged config, and it should only be called when the
> schemata file is written to, which would memset() the staged config first.
> 
> 
>> The problem can be reproduced by the following commands:
>>     # A system with 16 usable closids and mba disabled
>>     mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>>     mkdir /sys/fs/resctrl/p{1..7}
>>     umount /sys/fs/resctrl/
>>     mount -t resctrl resctrl /sys/fs/resctrl
>>     mkdir /sys/fs/resctrl/p{1..8}
> 
> Thanks for the reproducer - but I don't see what could set have_new_ctrl in this sequence.
> You can't call apply_config() to set CPUs in the mask without that being set.
> 
> Creating a new control group, (your mkdir step) shouldn't touch the hardware at all, as it
> should be left in its reset state from the last umount(), or setup.

There is an attempt to configure the hardware in the mkdir path:
rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()

This is required in support of the different resource group modes. When a new
resource group is created via mkdir the configuration should respect any
exclusive resource groups that exist at that point.

> 
> I can't reproduce this on v6.0-rc7.
> Even if I dirty the configuration by writing to the schemata file, I can't reproduce this.
> 

From what I can tell the reproducer is dependent on (a) whether hardware
supports CDP, and (b) the number of CLOSIDs supported by the system. The reproducer
works for hardware that has 16 CLOSIDs (see later).

> (I have mba enabled, but all this should affect is the number of closid available)
> 
> 
>> dmesg will generate the following error:
> 
> Which kernel version is this?
> 
>>     [ 6180.939345] unchecked MSR access error: WRMSR to 0xca0 (tried to write
>>     0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
> 
> Is 0x7ff the default CBM bitmap for this CPU? Or was it written in a step missing from the
> reproducer above?

The value of interest here is the register it tries to write to ... 0xca0.
On a system with 16 CLOSIDs the range of registers available to set the CBM
would be 0xc90 to 0xc9f that corresponds to CLOSID 0 to CLOSID 15. The error is
an attempt to write to an unsupported register - there appears to have been an
attempt to configure non-existent CLOSID 16.

As Shawn already root-caused, this is because the staged_config contains data from
the previous run when CDP was enabled and it is never cleared before the resource group
creation flow (triggered by mkdir).


CDP enabled:
mkdir <resource group>
	domainX.staged_config[CDP_NONE].have_new_ctrl = false
	domainX.staged_config[CDP_NONE].new_ctrl = 0
	domainX.staged_config[CDP_CODE].have_new_ctrl = true
	domainX.staged_config[CDP_CODE].new_ctrl = 0x7ff
	domainX.staged_config[CDP_DATA].have_new_ctrl = true
	domainX.staged_config[CDP_DATA].new_ctrl = 0x7ff

unmount/remount resctrl (CDP disabled):
mkdir <resource group>
	domainX.staged_config[CDP_NONE].have_new_ctrl = true
	domainX.staged_config[CDP_NONE].new_ctrl = 0x7ff
	domainX.staged_config[CDP_CODE].have_new_ctrl = true /* stale */
	domainX.staged_config[CDP_CODE].new_ctrl = 0x7ff     /* stale */
	domainX.staged_config[CDP_DATA].have_new_ctrl = true /* stale */
	domainX.staged_config[CDP_DATA].new_ctrl = 0x7ff     /* stale */

Above becomes an issue when the resource group being created is
for a CLOSID # that is more than half of the CLOSIDs supported.
In the reproducer the issue was encountered when creating resource
group for CLOSID 8 on a system that supports 16 CLOSIDs.

In this case get_config_index(), called from
resctrl_arch_update_domains(), will return 16 and 17 when processing
this resource group, and that translates to an invalid register - 0xca0 in this
scenario.


> The rest of this splat isn't helpful as its the result of an IPI...
> 
>>     [ 6180.951983] Call Trace:
>>     [ 6180.954516]  <IRQ>
>>     [ 6180.956619]  __flush_smp_call_function_queue+0x11d/0x170
>>     [ 6180.962028]  __sysvec_call_function+0x24/0xd0
>>     [ 6180.966485]  sysvec_call_function+0x89/0xc0
>>     [ 6180.970760]  </IRQ>
>>     [ 6180.972947]  <TASK>
>>     [ 6180.975131]  asm_sysvec_call_function+0x16/0x20
>>     [ 6180.979757] RIP: 0010:cpuidle_enter_state+0xcd/0x400
>>     [ 6180.984821] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 1e e5 77 ff 45 84
>>     ff 74 12 9c 58 f6 c4 02 0f 85 13 03 00 00 31 ff e8 67 70 7d ff fb 45 85
>>     f6 <0f> 88 75 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
>>     [ 6181.003710] RSP: 0018:ffffffff83a03e48 EFLAGS: 00000202
>>     [ 6181.009028] RAX: ffff943400800000 RBX: 0000000000000001 RCX: 000000000000001f
>>     [ 6181.016261] RDX: 0000000000000000 RSI: ffffffff83795059 RDI: ffffffff837c101e
>>     [ 6181.023490] RBP: ffff9434c9352000 R08: 0000059f1cb1a05e R09: 0000000000000008
>>     [ 6181.030717] R10: 0000000000000001 R11: 0000000000005c66 R12: ffffffff83bbf3a0
>>     [ 6181.037944] R13: 0000059f1cb1a05e R14: 0000000000000001 R15: 0000000000000000
>>     [ 6181.045202]  ? cpuidle_enter_state+0xb2/0x400
>>     [ 6181.049678]  cpuidle_enter+0x24/0x40
>>     [ 6181.053370]  do_idle+0x1dd/0x260
>>     [ 6181.056713]  cpu_startup_entry+0x14/0x20
>>     [ 6181.060753]  rest_init+0xbb/0xc0
>>     [ 6181.064097]  arch_call_rest_init+0x5/0xa
>>     [ 6181.068137]  start_kernel+0x668/0x691
>>     [ 6181.071914]  secondary_startup_64_no_verify+0xe0/0xeb
>>     [ 6181.077086]  </TASK>
> 
> It would be good to know what triggered this IPI. It may not have been
> resctrl_arch_update_domains(). This pattern also happens from reset_all_ctrls() which

I believe this is indeed from resctrl_arch_update_domains() since it calls
smp_call_function_many() to configure all the domains.

> happens during umount(). (and that would write the default CBM bitmap)
> 
> If you can reproduce this easily, could you add dump_stack() to update_config() to see if
> any path is setting have_new_ctrl. You aren't writing to the schemata file in your reproducer.
> 
> 
>> We fix this issue by clearing the staged configs when destroying schemata list.
> 
> 
>> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
>> Suggested-by: Xin Hao <xhao@linux.alibaba.com>
> 
> If we can work out why you are seeing this, it would need a Fixes tag.
> 
> Otherwise I agree it makes sense to make this more robust, but it would need a different
> commit message.

What do you think about clearing the staged config within resctrl_arch_update_domains()
after the configuration is complete and there is no more need for it? That may reduce
complexity since each caller would no longer need to remember to do so.
I see "staged_config" as temporary storage and it may help to understand the code better
if it is treated as such.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index f276aff521e8..b4a817ae83ab 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2127,8 +2127,15 @@ static int schemata_list_create(void)
>>  static void schemata_list_destroy(void)
>>  {
>>  	struct resctrl_schema *s, *tmp;
>> +	struct rdt_domain *dom;
>>  
>>  	list_for_each_entry_safe(s, tmp, &resctrl_schema_all, list) {
>> +		/*
>> +		 * Clear staged_config on each domain before schemata list is
>> +		 * destroyed.
>> +		 */
>> +		list_for_each_entry(dom, &s->res->domains, list)
>> +			memset(dom->staged_config, 0, sizeof(dom->staged_config));
>>  		list_del(&s->list);
>>  		kfree(s);
>>  	}
> 

Reinette


* Re: [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list
  2022-09-27 21:21   ` Reinette Chatre
@ 2022-09-28 13:37     ` Shawn Wang
  2022-09-28 14:09       ` James Morse
  2022-09-28 14:08     ` James Morse
  1 sibling, 1 reply; 6+ messages in thread
From: Shawn Wang @ 2022-09-28 13:37 UTC (permalink / raw)
  To: Reinette Chatre, James Morse, fenghua.yu
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, linux-kernel

Hi Reinette and James,

Sorry, there were some mistakes in my last reply, so I am sending it again.

On 9/28/22 5:21 AM, Reinette Chatre wrote:
> Hi James and Shawn,
> 
> On 9/27/2022 6:06 AM, James Morse wrote:
>> On 27/09/2022 03:54, Shawn Wang wrote:
>>> Array staged_config in struct rdt_domain still maintains the original value when
>>> resctrl is unmounted. If resctrl is mounted with cdp option and then remounted
>>> without cdp option, field have_new_ctrl in staged_config[CDP_CODE] and
>>> staged_config[CDP_DATA] will still be true.
>>
>> staged_config[CDP_DATA] is an array - its always 'true'. I think you mean
>> staged_config[CDP_DATA].have_new_ctrl, which will still be true because it is only
>> memset() when the schemata file is written to.
>>
>>
>>> Since resctrl_arch_update_domains()
>>> traverses all resctrl_conf_type, it will continue to update CDP_CODE and
>>> CDP_DATA configurations, which can cause overflow problem.
>>
>> Only if its called with a stale staged config, and it should only be called when the
>> schemata file is written to, which would memset() the staged config first.
>>
>>
>>> The problem can be reproduced by the following commands:
>>>      # A system with 16 usable closids and mba disabled
>>>      mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>>>      mkdir /sys/fs/resctrl/p{1..7}
>>>      umount /sys/fs/resctrl/
>>>      mount -t resctrl resctrl /sys/fs/resctrl
>>>      mkdir /sys/fs/resctrl/p{1..8}
>>
>> Thanks for the reproducer - but I don't see what could set have_new_ctrl in this sequence.
>> You can't call apply_config() to set CPUs in the mask without that being set.
>>
>> Creating a new control group, (your mkdir step) shouldn't touch the hardware at all, as it
>> should be left in its reset state from the last umount(), or setup.
> 
> There is an attempt to configure the hardware in the mkdir path:
> rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()
> 
> This is required in support of the different resource group modes. When a new
> resource group is created via mkdir the configuration should respect any
> exclusive resource groups that exist at that point.
> 
>>
>> I can't reproduce this on v6.0-rc7.
>> Even if I dirty the configuration by writing to the schemata file, I can't reproduce this.
>>
> 
>  From what I can tell the reproducer is dependent on (a) whether hardware
> supports CDP, and (b) the number of CLOSIDs supported by the system. The reproducer
> works for hardware that has 16 CLOSIDs (see later).
> 
>> (I have mba enabled, but all this should affect is the number of closid available)
>>

I reproduced this on v6.0-rc6. The key to reproducing it is to ensure
that the number of usable groups differs between CDP enabled and CDP
disabled.

The system I use has 16 CLOSIDs for L3 CAT and 8 CLOSIDs for MBA. MBA 
limits the max number of groups to 8, even if CDP is disabled. This is 
the reason why I disable MBA.

>>
>>> dmesg will generate the following error:
>>
>> Which kernel version is this?
>>
>>>      [ 6180.939345] unchecked MSR access error: WRMSR to 0xca0 (tried to write
>>>      0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
>>
>> Is 0x7ff the default CBM bitmap for this CPU? Or was it written in a step missing from the
>> reproducer above?
> 
> The value of interest here is the register it tries to write to ... 0xca0.
> On a system with 16 CLOSIDs the range of registers available to set the CBM
> would be 0xc90 to 0xc9f that corresponds to CLOSID 0 to CLOSID 15. The error is
> an attempt to write to an unsupported register - there appears to have been an
> attempt to configure non-existent CLOSID 16.
> 
> As Shawn already root-caused, this is because the staged_config contains data from
> the previous run when CDP was enabled and it is never cleared before the resource group
> creation flow (triggered by mkdir).
> 
> 
> CDP enabled:
> mkdir <resource group>
> 	domainX.staged_config[CDP_NONE].have_new_ctrl = false
> 	domainX.staged_config[CDP_NONE].new_ctrl = 0
> 	domainX.staged_config[CDP_CODE].have_new_ctrl = true
> 	domainX.staged_config[CDP_CODE].new_ctrl = 0x7ff
> 	domainX.staged_config[CDP_DATA].have_new_ctrl = true
> 	domainX.staged_config[CDP_CODE].new_ctrl = 0x7ff
> 
> unmount/remount resctrl (CDP disabled):
> mkdir <resource group>
> 	domainX.staged_config[CDP_NONE].have_new_ctrl = true
> 	domainX.staged_config[CDP_NONE].new_ctrl = 0x7ff
> 	domainX.staged_config[CDP_CODE].have_new_ctrl = true /* stale */
> 	domainX.staged_config[CDP_CODE].new_ctrl = 0x7ff     /* stale */
> 	domainX.staged_config[CDP_DATA].have_new_ctrl = true /* stale */
> 	domainX.staged_config[CDP_CODE].new_ctrl = 0x7ff     /* stale */
> 
> Above becomes an issue when the resource group being created is
> for a CLOSID # that is more than half of the CLOSIDs supported.
> In the reproducer the issue was encountered when creating resource
> group for CLOSID 8 on a system that supports 16 CLOSIDs.
> 
> In this case get_config_index() called from
> resctrl_arch_update_domains() will return 16 and 17 when processing
> this resource group and that translated to an invalid register - 0xca0 in this
> scenario.
> 

Thanks for the detailed explanation. That's exactly what I mean.

> 
>> The rest of this splat isn't helpful as its the result of an IPI...
>>
>>>      [ 6180.951983] Call Trace:
>>>      [ 6180.954516]  <IRQ>
>>>      [ 6180.956619]  __flush_smp_call_function_queue+0x11d/0x170
>>>      [ 6180.962028]  __sysvec_call_function+0x24/0xd0
>>>      [ 6180.966485]  sysvec_call_function+0x89/0xc0
>>>      [ 6180.970760]  </IRQ>
>>>      [ 6180.972947]  <TASK>
>>>      [ 6180.975131]  asm_sysvec_call_function+0x16/0x20
>>>      [ 6180.979757] RIP: 0010:cpuidle_enter_state+0xcd/0x400
>>>      [ 6180.984821] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 1e e5 77 ff 45 84
>>>      ff 74 12 9c 58 f6 c4 02 0f 85 13 03 00 00 31 ff e8 67 70 7d ff fb 45 85
>>>      f6 <0f> 88 75 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
>>>      [ 6181.003710] RSP: 0018:ffffffff83a03e48 EFLAGS: 00000202
>>>      [ 6181.009028] RAX: ffff943400800000 RBX: 0000000000000001 RCX: 000000000000001f
>>>      [ 6181.016261] RDX: 0000000000000000 RSI: ffffffff83795059 RDI: ffffffff837c101e
>>>      [ 6181.023490] RBP: ffff9434c9352000 R08: 0000059f1cb1a05e R09: 0000000000000008
>>>      [ 6181.030717] R10: 0000000000000001 R11: 0000000000005c66 R12: ffffffff83bbf3a0
>>>      [ 6181.037944] R13: 0000059f1cb1a05e R14: 0000000000000001 R15: 0000000000000000
>>>      [ 6181.045202]  ? cpuidle_enter_state+0xb2/0x400
>>>      [ 6181.049678]  cpuidle_enter+0x24/0x40
>>>      [ 6181.053370]  do_idle+0x1dd/0x260
>>>      [ 6181.056713]  cpu_startup_entry+0x14/0x20
>>>      [ 6181.060753]  rest_init+0xbb/0xc0
>>>      [ 6181.064097]  arch_call_rest_init+0x5/0xa
>>>      [ 6181.068137]  start_kernel+0x668/0x691
>>>      [ 6181.071914]  secondary_startup_64_no_verify+0xe0/0xeb
>>>      [ 6181.077086]  </TASK>
>>
>> It would be good to know what triggered this IPI. It may not have been
>> resctrl_arch_update_domains(). This pattern also happens from reset_all_ctrls() which
> 
> I believe this is indeed from resctrl_arch_update_domains() since it calls
> smp_call_function_many() to configure all the domains.
> 
>> happens during umount(). (and that would write the default CBM bitmap)
>>
>> If you can reproduce this easily, could you add dump_stack() to update_config() to see if
>> any path is setting have_new_ctrl. You aren't writing to the schemata file in your reproducer.
>>
>>
>>> We fix this issue by clearing the staged configs when destroying schemata list.
>>
>>
>>> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
>>> Suggested-by: Xin Hao <xhao@linux.alibaba.com>
>>
>> If we can work out why you are seeing this, it would need a Fixes tag.
>>
>> Otherwise I agree it makes sense to make this more robust, but it would need a different
>> commit message.
> 
> What do you think about clearing the staged config within resctrl_arch_update_domains()
> after the configuration is complete and there is no more need for it? That may reduce
> complexity where each caller no longer need to remember to do so.
> I see "staged_config" as a temporary storage and it my help to understand the code better
> if it is treated as such.
> 

I think this is better. I have tested it and will give a new version.

>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index f276aff521e8..b4a817ae83ab 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -2127,8 +2127,15 @@ static int schemata_list_create(void)
>>>   static void schemata_list_destroy(void)
>>>   {
>>>   	struct resctrl_schema *s, *tmp;
>>> +	struct rdt_domain *dom;
>>>   
>>>   	list_for_each_entry_safe(s, tmp, &resctrl_schema_all, list) {
>>> +		/*
>>> +		 * Clear staged_config on each domain before schemata list is
>>> +		 * destroyed.
>>> +		 */
>>> +		list_for_each_entry(dom, &s->res->domains, list)
>>> +			memset(dom->staged_config, 0, sizeof(dom->staged_config));
>>>   		list_del(&s->list);
>>>   		kfree(s);
>>>   	}
>>
> 
> Reinette

Thanks,

Shawn


* Re: [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list
  2022-09-27 21:21   ` Reinette Chatre
  2022-09-28 13:37     ` Shawn Wang
@ 2022-09-28 14:08     ` James Morse
  1 sibling, 0 replies; 6+ messages in thread
From: James Morse @ 2022-09-28 14:08 UTC (permalink / raw)
  To: Reinette Chatre, Shawn Wang, fenghua.yu
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, linux-kernel

Hi Reinette,

On 27/09/2022 22:21, Reinette Chatre wrote:
> On 9/27/2022 6:06 AM, James Morse wrote:
>> On 27/09/2022 03:54, Shawn Wang wrote:

>>> The problem can be reproduced by the following commands:
>>>     # A system with 16 usable closids and mba disabled
>>>     mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>>>     mkdir /sys/fs/resctrl/p{1..7}
>>>     umount /sys/fs/resctrl/
>>>     mount -t resctrl resctrl /sys/fs/resctrl
>>>     mkdir /sys/fs/resctrl/p{1..8}
>>
>> Thanks for the reproducer - but I don't see what could set have_new_ctrl in this sequence.
>> You can't call apply_config() to set CPUs in the mask without that being set.
>>
>> Creating a new control group, (your mkdir step) shouldn't touch the hardware at all, as it
>> should be left in its reset state from the last umount(), or setup.
> 
> There is an attempt to configure the hardware in the mkdir path:
> rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()

Aha! I'm not sure why my grepping around didn't find this.

This is a path that doesn't memset() the staged config first, so that explains it.

[..]

> What do you think about clearing the staged config within resctrl_arch_update_domains()
> after the configuration is complete and there is no more need for it? That may reduce
> complexity where each caller no longer need to remember to do so.
> I see "staged_config" as a temporary storage and it my help to understand the code better
> if it is treated as such. 

Yup, that would fit with the idea of the value being consumed by
resctrl_arch_update_domains(), which is how I think of it.


Thanks,

James


* Re: [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list
  2022-09-28 13:37     ` Shawn Wang
@ 2022-09-28 14:09       ` James Morse
  0 siblings, 0 replies; 6+ messages in thread
From: James Morse @ 2022-09-28 14:09 UTC (permalink / raw)
  To: Shawn Wang, Reinette Chatre, fenghua.yu
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, linux-kernel

Hi Shawn,

On 28/09/2022 14:37, Shawn Wang wrote:
> On 9/28/22 5:21 AM, Reinette Chatre wrote:
>> On 9/27/2022 6:06 AM, James Morse wrote:
>>> On 27/09/2022 03:54, Shawn Wang wrote:
>>>> Array staged_config in struct rdt_domain still maintains the original value when
>>>> resctrl is unmounted. If resctrl is mounted with cdp option and then remounted
>>>> without cdp option, field have_new_ctrl in staged_config[CDP_CODE] and
>>>> staged_config[CDP_DATA] will still be true.
>>>
>>> staged_config[CDP_DATA] is an array - its always 'true'. I think you mean
>>> staged_config[CDP_DATA].have_new_ctrl, which will still be true because it is only
>>> memset() when the schemata file is written to.
>>>
>>>
>>>> Since resctrl_arch_update_domains()
>>>> traverses all resctrl_conf_type, it will continue to update CDP_CODE and
>>>> CDP_DATA configurations, which can cause overflow problem.
>>>
>>> Only if its called with a stale staged config, and it should only be called when the
>>> schemata file is written to, which would memset() the staged config first.
>>>
>>>
>>>> The problem can be reproduced by the following commands:
>>>>      # A system with 16 usable closids and mba disabled
>>>>      mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>>>>      mkdir /sys/fs/resctrl/p{1..7}
>>>>      umount /sys/fs/resctrl/
>>>>      mount -t resctrl resctrl /sys/fs/resctrl
>>>>      mkdir /sys/fs/resctrl/p{1..8}
>>>
>>> Thanks for the reproducer - but I don't see what could set have_new_ctrl in this sequence.
>>> You can't call apply_config() to set CPUs in the mask without that being set.
>>>
>>> Creating a new control group, (your mkdir step) shouldn't touch the hardware at all, as it
>>> should be left in its reset state from the last umount(), or setup.
>>
>> There is an attempt to configure the hardware in the mkdir path:
>> rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()
>>
>>
>> This is required in support of the different resource group modes. When a new
>> resource group is created via mkdir the configuration should respect any
>> exclusive resource groups that exist at that point.
>>
>>>
>>> I can't reproduce this on v6.0-rc7.
>>> Even if I dirty the configuration by writing to the schemata file, I can't reproduce this.
>>>
>>
>>  From what I can tell the reproducer is dependent on (a) whether hardware
>> supports CDP, and (b) the number of CLOSIDs supported by the system. The reproducer
>> works for hardware that has 16 CLOSIDs (see later).
>>
>>> (I have mba enabled, but all this should affect is the number of closid available)

> I reproduce this on v6.0-rc6. The key to reproduction is to ensure that the number of
> usable groups is different between CDP enabled and CDP disabled.
> 
> The system I use has 16 CLOSIDs for L3 CAT and 8 CLOSIDs for MBA. MBA limits the max
> number of groups to 8, even if CDP is disabled. This is the reason why I disable MBA.

bingo - could that detail be included in the commit message?

[..]

>> What do you think about clearing the staged config within resctrl_arch_update_domains()
>> after the configuration is complete and there is no more need for it? That may reduce
>> complexity where each caller no longer need to remember to do so.
>> I see "staged_config" as a temporary storage and it my help to understand the code better
>> if it is treated as such.
>>
> 
> I think this is better. I have tested it and will give a new version.

Great, thanks!

Could you mention have_new_ctrl in the commit message, and this path via
rdtgroup_init_alloc().

I think you also need:
Fixes: 75408e43509ed ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Cc: <stable@vger.kernel.org>


Thanks,

James

