* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
@ 2017-04-04 19:02 Tracy Smith
0 siblings, 0 replies; 9+ messages in thread
From: Tracy Smith @ 2017-04-04 19:02 UTC (permalink / raw)
To: Shivappa Vikas
Cc: x86, lkml, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin,
Vikas Shivappa
Apologies, unrelated to MBA. Resent later with a changed subject line.
On Tue, Apr 4, 2017 at 1:50 PM, Shivappa Vikas <vikas.shivappa@intel.com> wrote:
>
>
>
> On Mon, 3 Apr 2017, Tracy Smith wrote:
>
>> Hi All,
>>
>> No JTAG available and need to understand why Linux 4.8.3 doesn't boot on a
>> x86_64 corei7-64. Hangs at the typical "Starting kernel" location after
>> the last message of the U-boot. The bootcmd is given below.
>
>
> Do you see the issue when you apply MBA patches . It seems like you use 4.8
> kernel but the mba patches are dependent on :
> https://marc.info/?l=linux-kernel&m=149125583824700
>
> which is applied on 4.11-rc4.
>
>
>>
>> See a fault FFS and a message indicating the image didn't load after
>> failover.
>>
>> 1) How can I verify if the kernel image was loaded from uboot
>> 2) What is this fault?
>> 3) Has the bootargs or bootcmd changed between 4.1 and 4.8.3?
>> 4) If the boot cmd/arg has changed, what should the boot cmd/arg be for
>> 4.8.3 to boot on x86_64 corei7-64?
>>
>> Initial RAM disk at linear address 0x20000000, size 11638378 bytes
>> Kernel command line: "BOOT_IMAGE=/imgx/bzImage LABEL=BOOT root=/dev/ram0
>> imgmnt=/media/sda2 imgdir=imgx img=image.rootfs rootdelay=2 slub_debug=F
>> console=ttyS1,115200 bootcount=1 bootcount_addr=0xa4000
>> acpi_enforce_resources=lax pram_size=0x800000 pram_addr=10000000
>> pram_loc=ddr crashkernel=128M memmap=0x800000$0x10000000 "
>> EFI table at 683392c0, mmap 68339300, mmap size 4c0, version 1, descr.
>> size
>> 0x30
>>
>> Starting kernel ...
>>
>> Timer summary in microseconds:
>> Mark Elapsed Stage
>> 0 0 reset
>> 1 1 board_init_r
>> 105 104 board_init_f
>> 10,180,048 10,179,943 id=64
>> 10,221,985 41,937 id=65
>> 10,356,645 134,660 main_loop
>> 12,366,521 2,009,876 usb_start
>> 18,747,284 6,380,763 start_kernel
>> Accumulated time:
>> 10,162,689 ahci
>>
>> On a 4.1.26-yocto-standard #1 SMP it boots with no issues. Basically same
>> .config used in both cases except for anything deprecated between 4.1 and
>> 4.8.3.
>>
>> root@:~# cat /proc/cmdline
>> BOOT_IMAGE=/imgy/bzImage LABEL=BOOT root=/dev/ram0 imgmnt=/media/sda2
>> imgdir=imgy img=image.rootfs rootdelay=2 slub_debug=F console=ttyS1,115200
>> fault=FFS bootcount=3 bootcount_addr=0xa4000 acpi_enforce_resources=lax
>> pram_size=0x800000 pram_addr=10000000 pram_loc=ddr crashkernel=128M
>> memmap=0x800000$0x10000000
>> root@CLX3001:~# cat /proc/consoles
>> ttyS1 -W- (EC p a) 4:65
>> netcon0 -W- (E )
>>
>> thx,
>> Tray
>>
>
--
Confidentiality notice: This e-mail message, including any
attachments, may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), please
immediately notify the sender and delete this e-mail message.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/8 V4] x86/intel_rdt: Intel Memory bandwidth allocation @ 2017-04-08 0:33 Vikas Shivappa 2017-04-08 0:33 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa 0 siblings, 1 reply; 9+ messages in thread From: Vikas Shivappa @ 2017-04-08 0:33 UTC (permalink / raw) To: vikas.shivappa, x86, linux-kernel Cc: hpa, tglx, mingo, ravi.v.shankar, tony.luck, fenghua.yu, vikas.shivappa Sending another version of MBA patch series with changes to V3 version as per Thomas feedback here: https://marc.info/?l=linux-kernel&m=149125664024881 Changes: - Fixed the wrong names in struct document for rdt_domain - Changed the mba and cache ctrl values to have seperate structs and put them in a union in rdt_resource structure. patch applies on tip x86/cpus [PATCH 1/8] Documentation, x86: Intel Memory bandwidth allocation [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for [PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA [PATCH 5/8] x86/intel_rdt: Prep to add info files for MBA [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA [PATCH 7/8] x86/intel_rdt: Prep to add schemata file for MBA [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA 2017-04-08 0:33 [PATCH 0/8 V4] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa @ 2017-04-08 0:33 ` Vikas Shivappa 0 siblings, 0 replies; 9+ messages in thread From: Vikas Shivappa @ 2017-04-08 0:33 UTC (permalink / raw) To: vikas.shivappa, x86, linux-kernel Cc: hpa, tglx, mingo, ravi.v.shankar, tony.luck, fenghua.yu, vikas.shivappa The MBA feature details like minimum bandwidth supported, b/w granularity etc are obtained via executing CPUID with EAX=10H ,ECX=3. Setup and initialize the MBA specific extensions to data structures like global list of RDT resources, RDT resource structure and RDT domain structure. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> --- arch/x86/include/asm/intel_rdt.h | 92 +++++++++++++++--------- arch/x86/kernel/cpu/intel_rdt.c | 151 ++++++++++++++++++++++++++++++++++++--- 2 files changed, 199 insertions(+), 44 deletions(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index 4c94f18..097134b 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -12,6 +12,7 @@ #define IA32_L3_QOS_CFG 0xc81 #define IA32_L3_CBM_BASE 0xc90 #define IA32_L2_CBM_BASE 0xd10 +#define IA32_MBA_THRTL_BASE 0xd50 #define L3_QOS_CDP_ENABLE 0x01ULL @@ -69,39 +70,6 @@ struct rftype { }; /** - * struct rdt_resource - attributes of an RDT resource - * @enabled: Is this feature enabled on this machine - * @capable: Is this feature available on this machine - * @name: Name to use in "schemata" file - * @num_closid: Number of CLOSIDs available - * @default_ctrl: Specifies default cache cbm or mem b/w percent. - * @data_width: Character width of data when displaying - * @min_cbm_bits: Minimum number of consecutive bits to be set - * in a cache bit mask - * @domains: All domains for this resource - * @msr_base: Base MSR address for CBMs - * @cache_level: Which cache level defines scope of this domain - * @cbm_idx_multi: Multiplier of CBM index - * @cbm_idx_offset: Offset of CBM index. CBM index is computed by: - * closid * cbm_idx_multi + cbm_idx_offset - */ -struct rdt_resource { - bool enabled; - bool capable; - char *name; - int num_closid; - int cbm_len; - int min_cbm_bits; - u32 default_ctrl; - int data_width; - struct list_head domains; - int msr_base; - int cache_level; - int cbm_idx_multi; - int cbm_idx_offset; -}; - -/** * struct rdt_domain - group of cpus sharing an RDT resource * @list: all instances of this resource * @id: unique id for this instance @@ -131,6 +99,63 @@ struct msr_param { int high; }; +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @capable: Is this feature available on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @default_ctrl: Specifies default cache cbm or mem b/w percent. + * @data_width: Character width of data when displaying + * @msr_update: Function pointer to update QOS MSRs + * @domains: All domains for this resource + * @msr_base: Base MSR address for CBMs + * @cache_level: Which cache level defines scope of this domain + * @cbm_idx_multi: Multiplier of CBM index + * @cbm_idx_offset: Offset of CBM index. CBM index is computed by: + * closid * cbm_idx_multi + cbm_idx_offset + * @cbm_len Number of cbm bits + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @max_delay: Max throttle delay. Delay is the hardware + * understandable value for memory bandwidth. + * @min_bw: Minimum memory bandwidth percentage user + * can request + * @bw_gran: Granularity at which the memory bandwidth + * is allocated + * @delay_linear: True if memory b/w delay is in linear scale + * @mb_map: Mapping of memory b/w percentage to + * memory b/w delay values + */ +struct rdt_resource { + bool enabled; + bool capable; + char *name; + int num_closid; + u32 default_ctrl; + int data_width; + void (*msr_update) (struct rdt_domain *d, struct msr_param *m, + struct rdt_resource *r); + struct list_head domains; + int msr_base; + int cache_level; + int cbm_idx_multi; + int cbm_idx_offset; + union { + struct { /*cache ctrls*/ + int cbm_len; + int min_cbm_bits; + }; + struct { /*mem ctrls*/ + u32 max_delay; + u32 min_bw; + u32 bw_gran; + u32 delay_linear; + u32 *mb_map; + }; + }; +}; + extern struct mutex rdtgroup_mutex; extern struct rdt_resource rdt_resources_all[]; @@ -144,6 +169,7 @@ enum { RDT_RESOURCE_L3DATA, RDT_RESOURCE_L3CODE, RDT_RESOURCE_L2, + RDT_RESOURCE_MBA, /* Must be the last */ RDT_NUM_RESOURCES, diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index c4cf2e8..b9f1cfc 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -32,6 +32,9 @@ #include <asm/intel-family.h> #include <asm/intel_rdt.h> +#define MAX_MBA_BW 100u +#define MBA_IS_LINEAR 0x4 + /* Mutex to protect rdtgroup access. */ DEFINE_MUTEX(rdtgroup_mutex); @@ -45,11 +48,17 @@ */ int max_name_width, max_data_width; +static void +mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r); +static void +cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r); + struct rdt_resource rdt_resources_all[] = { { .name = "L3", .domains = domain_init(RDT_RESOURCE_L3), .msr_base = IA32_L3_CBM_BASE, + .msr_update = cat_wrmsr, .min_cbm_bits = 1, .cache_level = 3, .cbm_idx_multi = 1, @@ -59,6 +68,7 @@ struct rdt_resource rdt_resources_all[] = { .name = "L3DATA", .domains = domain_init(RDT_RESOURCE_L3DATA), .msr_base = IA32_L3_CBM_BASE, + .msr_update = cat_wrmsr, .min_cbm_bits = 1, .cache_level = 3, .cbm_idx_multi = 2, @@ -68,6 +78,7 @@ struct rdt_resource rdt_resources_all[] = { .name = "L3CODE", .domains = domain_init(RDT_RESOURCE_L3CODE), .msr_base = IA32_L3_CBM_BASE, + .msr_update = cat_wrmsr, .min_cbm_bits = 1, .cache_level = 3, .cbm_idx_multi = 2, @@ -77,11 +88,21 @@ struct rdt_resource rdt_resources_all[] = { .name = "L2", .domains = domain_init(RDT_RESOURCE_L2), .msr_base = IA32_L2_CBM_BASE, + .msr_update = cat_wrmsr, .min_cbm_bits = 1, .cache_level = 2, .cbm_idx_multi = 1, .cbm_idx_offset = 0 }, + { + .name = "MB", + .domains = domain_init(RDT_RESOURCE_MBA), + .msr_base = IA32_MBA_THRTL_BASE, + .msr_update = mba_wrmsr, + .cache_level = 3, + .cbm_idx_multi = 1, + .cbm_idx_offset = 0 + }, }; static int cbm_idx(struct rdt_resource *r, int closid) @@ -136,6 +157,53 @@ static inline bool cache_alloc_hsw_probe(void) return false; } +/* + * rdt_get_mb_table() - get a mapping of bandwidth(b/w) percentage values + * exposed to user interface and the h/w understandable delay values. + * + * The non-linear delay values have the granularity of power of two + * and also the h/w does not guarantee a curve for configured delay + * values vs. actual b/w enforced. + * Hence we need a mapping that is pre calibrated so the user can + * express the memory b/w as a percentage value. + */ +static inline bool rdt_get_mb_table(struct rdt_resource *r) +{ + /* + * There are no Intel SKUs as of now to support non-linear delay. + */ + pr_info("MBA b/w map not implemented for cpu:%d, model:%d", + boot_cpu_data.x86, boot_cpu_data.x86_model); + + return false; +} + +static bool rdt_get_mem_config(struct rdt_resource *r) +{ + union cpuid_0x10_3_eax eax; + union cpuid_0x10_x_edx edx; + u32 ebx, ecx; + + cpuid_count(0x00000010, 3, &eax.full, &ebx, &ecx, &edx.full); + r->num_closid = edx.split.cos_max + 1; + r->max_delay = eax.split.max_delay + 1; + r->default_ctrl = MAX_MBA_BW; + if (ecx & MBA_IS_LINEAR) { + r->delay_linear = true; + r->min_bw = MAX_MBA_BW - r->max_delay; + r->bw_gran = MAX_MBA_BW - r->max_delay; + } else { + if (!rdt_get_mb_table(r)) + return false; + } + r->data_width = 3; + + r->capable = true; + r->enabled = true; + + return true; +} + static void rdt_get_cache_config(int idx, struct rdt_resource *r) { union cpuid_0x10_1_eax eax; @@ -212,7 +280,8 @@ static inline bool get_rdt_resources(void) ret = true; } - if (boot_cpu_has(X86_FEATURE_MBA)) + if (boot_cpu_has(X86_FEATURE_MBA) && + rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA])) ret = true; rdt_init_padding(); @@ -233,6 +302,47 @@ static int get_cache_id(int cpu, int level) return -1; } +/* + * Map the memory b/w percentage value to delay values + * that can be written to QOS_MSRs. + * There are currently no SKUs which support non linear delay values. + */ +static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r) +{ + if (r->delay_linear) + return MAX_MBA_BW - bw; + + WARN_ONCE(1, "Non Linear delay-bw map not supported but queried\n"); + return r->default_ctrl; +} + +static void +mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r) +{ + int i; + + for (i = m->low; i < m->high; i++) { + int idx = cbm_idx(r, i); + + /* + * Write the delay value for mba. + */ + wrmsrl(r->msr_base + idx, delay_bw_map(d->ctrl_val[i], r)); + } +} + +static void +cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r) +{ + int i; + + for (i = m->low; i < m->high; i++) { + int idx = cbm_idx(r, i); + + wrmsrl(r->msr_base + idx, d->ctrl_val[i]); + } +} + void rdt_ctrl_update(void *arg) { struct msr_param *m = (struct msr_param *)arg; @@ -291,6 +401,33 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, return NULL; } +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d) +{ + struct msr_param m; + u32 *dc; + int i; + + dc = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); + if (!dc) + return -ENOMEM; + + d->ctrl_val = dc; + + /* + * Initialize the Control MSRs to having no control. + * For Cache Allocation: Set all bits in cbm + * For Memory Allocation: Set b/w requested to 100 + */ + for (i = 0; i < r->num_closid; i++, dc++) + *dc = r->default_ctrl; + + m.low = 0; + m.high = r->num_closid; + r->msr_update(d, &m, r); + + return 0; +} + /* * domain_add_cpu - Add a cpu to a resource's domain list. * @@ -306,7 +443,7 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, */ static void domain_add_cpu(int cpu, struct rdt_resource *r) { - int i, id = get_cache_id(cpu, r->cache_level); + int id = get_cache_id(cpu, r->cache_level); struct list_head *add_pos = NULL; struct rdt_domain *d; @@ -327,19 +464,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r) d->id = id; - d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); - if (!d->ctrl_val) { + if (domain_setup_ctrlval(r, d)) { kfree(d); return; } - for (i = 0; i < r->num_closid; i++) { - int idx = cbm_idx(r, i); - - d->ctrl_val[i] = r->default_ctrl; - wrmsrl(r->msr_base + idx, d->ctrl_val[i]); - } - cpumask_set_cpu(cpu, &d->cpu_mask); list_add_tail(&d->list, add_pos); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 0/8 V3] x86/intel_rdt: Intel Memory bandwidth allocation @ 2017-04-03 21:57 Vikas Shivappa 2017-04-03 21:57 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa 0 siblings, 1 reply; 9+ messages in thread From: Vikas Shivappa @ 2017-04-03 21:57 UTC (permalink / raw) To: vikas.shivappa, x86, linux-kernel Cc: hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin Memory b/w allocation(MBA) is part of the Intel Resource Director Technology (RDT). RDT helps monitor and share processor shared resources. MBA helps enforce a limit on the memory bandwidth(b/w) threads can use when they are scheduled. OS does the enforcement using MSR(model specific register) interface and mapping the threads to architecture specific CLOSids(class of service IDs). The user interface is via the common resctrl fs for RDT features. This can be used along with MBM (memory b/w monitoring) and cache allocation to control and monitor the applications cache and memory resources as per the QoS or other performance requirements. Use cases could be large serverclusters, VM, cloud and container based services where the admin or orchestration tools can use this framework to manage/allocate these processor shared resources like other memory/cpu resources to provide QoS guarantees. Detailed use case examples are documented in the 01/08 Documentation patch. Patches are dependent on : https://marc.info/?l=linux-kernel&m=149125583824700 V3 changes ---------- Thanks to Thomas for all the feed back and below are the changes as per the feedback: - Did a lot of changes to make code more readable and follow convention like moving the defines/variables, renaming, removing unnecessary return value changes, unnecessary globals. - Changed all change logs to follow context, problem, solution format. - Fixed the issue where rdt init may return success even if all features fail to init. - Added warn_on to code that should be reached in case when system does not support non_linear delay. - The schemata interface is changed to allow user enter a value which does not follow the granularity requirements and intermediate values are just rounded off to next available value. - Changed documentation to reflect these details and remove all architectural details. V2 changes ---------- Thanks to Thomas for all the feed back and below are the changes as per the feedback: - Sent the RDT generic improvements patch series seperately - Split the info file/validate apis as prep patches to MBA changes - Fixed a lot of coding convention issues, variable initializations, and removing unnecessary allocations etc. - Changed the hotcpu handling for CLOS/RMID to zero the CLOSid/RMID when cpu is offlined. - Changed some naming conventions. - Changed the interface to show the bandwidth percentage that can be requested rather than showing h/w specific delay values which do not map to a particular b/w percentage in non-linear scale [PATCH 1/8] Documentation, x86: Intel Memory bandwidth allocation [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for [PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA [PATCH 5/8] x86/intel_rdt: Prep to add info files for MBA [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA [PATCH 7/8] x86/intel_rdt: Prep to add schemata file for MBA [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA 2017-04-03 21:57 [PATCH 0/8 V3] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa @ 2017-04-03 21:57 ` Vikas Shivappa [not found] ` <CAChUvXM8gWAz6-AJ6jkyKjf5Yz0ze-2XAtvdZvze3Go44TPD8A@mail.gmail.com> 2017-04-05 15:40 ` Thomas Gleixner 0 siblings, 2 replies; 9+ messages in thread From: Vikas Shivappa @ 2017-04-03 21:57 UTC (permalink / raw) To: vikas.shivappa, x86, linux-kernel Cc: hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin The MBA feature details like minimum bandwidth supported, b/w granularity etc are obtained via executing CPUID with EAX=10H ,ECX=3. Setup and initialize the MBA specific extensions to data structures like global list of RDT resources, RDT resource structure and RDT domain structure. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> --- arch/x86/include/asm/intel_rdt.h | 80 +++++++++++++-------- arch/x86/kernel/cpu/intel_rdt.c | 151 ++++++++++++++++++++++++++++++++++++--- 2 files changed, 190 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index 4c94f18..285cdeb 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -12,6 +12,7 @@ #define IA32_L3_QOS_CFG 0xc81 #define IA32_L3_CBM_BASE 0xc90 #define IA32_L2_CBM_BASE 0xd10 +#define IA32_MBA_THRTL_BASE 0xd50 #define L3_QOS_CDP_ENABLE 0x01ULL @@ -69,6 +70,36 @@ struct rftype { }; /** + * struct rdt_domain - group of cpus sharing an RDT resource + * @list: all instances of this resource + * @id: unique id for this instance + * @cpu_mask: which cpus share this resource + * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID) + * @new_cbm: new cbm value to be loaded + * @have_new_cbm: did user provide new_cbm for this domain + */ +struct rdt_domain { + struct list_head list; + int id; + struct cpumask cpu_mask; + u32 *ctrl_val; + u32 new_ctrl; + bool have_new_ctrl; +}; + +/** + * struct msr_param - set a range of MSRs from a domain + * @res: The resource to use + * @low: Beginning index from base MSR + * @high: End index + */ +struct msr_param { + struct rdt_resource *res; + int low; + int high; +}; + +/** * struct rdt_resource - attributes of an RDT resource * @enabled: Is this feature enabled on this machine * @capable: Is this feature available on this machine @@ -78,6 +109,16 @@ struct rftype { * @data_width: Character width of data when displaying * @min_cbm_bits: Minimum number of consecutive bits to be set * in a cache bit mask + * @msr_update: Function pointer to update QOS MSRs + * @max_delay: Max throttle delay. Delay is the hardware + * understandable value for memory bandwidth. + * @min_bw: Minimum memory bandwidth percentage user + * can request + * @bw_gran: Granularity at which the memory bandwidth + * is allocated + * @delay_linear: True if memory b/w delay is in linear scale + * @mb_map: Mapping of memory b/w percentage to + * memory b/w delay values * @domains: All domains for this resource * @msr_base: Base MSR address for CBMs * @cache_level: Which cache level defines scope of this domain @@ -94,6 +135,14 @@ struct rdt_resource { int min_cbm_bits; u32 default_ctrl; int data_width; + void (*msr_update) (struct rdt_domain *d, struct msr_param *m, + struct rdt_resource *r); + u32 max_delay; + u32 min_bw; + u32 bw_gran; + u32 delay_linear; + u32 *mb_map; + struct list_head domains; int msr_base; int cache_level; @@ -101,36 +150,6 @@ struct rdt_resource { int cbm_idx_offset; }; -/** - * struct rdt_domain - group of cpus sharing an RDT resource - * @list: all instances of this resource - * @id: unique id for this instance - * @cpu_mask: which cpus share this resource - * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID) - * @new_ctrl: new ctrl value to be loaded - * @have_new_ctrl: did user provide new_ctrl for this domain - */ -struct rdt_domain { - struct list_head list; - int id; - struct cpumask cpu_mask; - u32 *ctrl_val; - u32 new_ctrl; - bool have_new_ctrl; -}; - -/** - * struct msr_param - set a range of MSRs from a domain - * @res: The resource to use - * @low: Beginning index from base MSR - * @high: End index - */ -struct msr_param { - struct rdt_resource *res; - int low; - int high; -}; - extern struct mutex rdtgroup_mutex; extern struct rdt_resource rdt_resources_all[]; @@ -144,6 +163,7 @@ enum { RDT_RESOURCE_L3DATA, RDT_RESOURCE_L3CODE, RDT_RESOURCE_L2, + RDT_RESOURCE_MBA, /* Must be the last */ RDT_NUM_RESOURCES, diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index c4cf2e8..be272b9 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -32,6 +32,9 @@ #include <asm/intel-family.h> #include <asm/intel_rdt.h> +#define MAX_MBA_BW 100u +#define MBA_IS_LINEAR 0x4 + /* Mutex to protect rdtgroup access. */ DEFINE_MUTEX(rdtgroup_mutex); @@ -45,11 +48,17 @@ */ int max_name_width, max_data_width; +static void +mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r); +static void +cqm_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r); + struct rdt_resource rdt_resources_all[] = { { .name = "L3", .domains = domain_init(RDT_RESOURCE_L3), .msr_base = IA32_L3_CBM_BASE, + .msr_update = cqm_wrmsr, .min_cbm_bits = 1, .cache_level = 3, .cbm_idx_multi = 1, @@ -59,6 +68,7 @@ struct rdt_resource rdt_resources_all[] = { .name = "L3DATA", .domains = domain_init(RDT_RESOURCE_L3DATA), .msr_base = IA32_L3_CBM_BASE, + .msr_update = cqm_wrmsr, .min_cbm_bits = 1, .cache_level = 3, .cbm_idx_multi = 2, @@ -68,6 +78,7 @@ struct rdt_resource rdt_resources_all[] = { .name = "L3CODE", .domains = domain_init(RDT_RESOURCE_L3CODE), .msr_base = IA32_L3_CBM_BASE, + .msr_update = cqm_wrmsr, .min_cbm_bits = 1, .cache_level = 3, .cbm_idx_multi = 2, @@ -77,11 +88,21 @@ struct rdt_resource rdt_resources_all[] = { .name = "L2", .domains = domain_init(RDT_RESOURCE_L2), .msr_base = IA32_L2_CBM_BASE, + .msr_update = cqm_wrmsr, .min_cbm_bits = 1, .cache_level = 2, .cbm_idx_multi = 1, .cbm_idx_offset = 0 }, + { + .name = "MB", + .domains = domain_init(RDT_RESOURCE_MBA), + .msr_base = IA32_MBA_THRTL_BASE, + .msr_update = mba_wrmsr, + .cache_level = 3, + .cbm_idx_multi = 1, + .cbm_idx_offset = 0 + }, }; static int cbm_idx(struct rdt_resource *r, int closid) @@ -136,6 +157,53 @@ static inline bool cache_alloc_hsw_probe(void) return false; } +/* + * rdt_get_mb_table() - get a mapping of bandwidth(b/w) percentage values + * exposed to user interface and the h/w understandable delay values. + * + * The non-linear delay values have the granularity of power of two + * and also the h/w does not guarantee a curve for configured delay + * values vs. actual b/w enforced. + * Hence we need a mapping that is pre calibrated so the user can + * express the memory b/w as a percentage value. + */ +static inline bool rdt_get_mb_table(struct rdt_resource *r) +{ + /* + * There are no Intel SKUs as of now to support non-linear delay. + */ + pr_info("MBA b/w map not implemented for cpu:%d, model:%d", + boot_cpu_data.x86, boot_cpu_data.x86_model); + + return false; +} + +static bool rdt_get_mem_config(struct rdt_resource *r) +{ + union cpuid_0x10_3_eax eax; + union cpuid_0x10_x_edx edx; + u32 ebx, ecx; + + cpuid_count(0x00000010, 3, &eax.full, &ebx, &ecx, &edx.full); + r->num_closid = edx.split.cos_max + 1; + r->max_delay = eax.split.max_delay + 1; + r->default_ctrl = MAX_MBA_BW; + if (ecx & MBA_IS_LINEAR) { + r->delay_linear = true; + r->min_bw = MAX_MBA_BW - r->max_delay; + r->bw_gran = MAX_MBA_BW - r->max_delay; + } else { + if (!rdt_get_mb_table(r)) + return false; + } + r->data_width = 3; + + r->capable = true; + r->enabled = true; + + return true; +} + static void rdt_get_cache_config(int idx, struct rdt_resource *r) { union cpuid_0x10_1_eax eax; @@ -212,7 +280,8 @@ static inline bool get_rdt_resources(void) ret = true; } - if (boot_cpu_has(X86_FEATURE_MBA)) + if (boot_cpu_has(X86_FEATURE_MBA) && + rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA])) ret = true; rdt_init_padding(); @@ -233,6 +302,47 @@ static int get_cache_id(int cpu, int level) return -1; } +/* + * Map the memory b/w percentage value to delay values + * that can be written to QOS_MSRs. + * There are currently no SKUs which support non linear delay values. + */ +static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r) +{ + if (r->delay_linear) + return MAX_MBA_BW - bw; + + WARN_ONCE(1, "Non Linear delay-bw map not supported but queried\n"); + return r->default_ctrl; +} + +static void +mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r) +{ + int i; + + for (i = m->low; i < m->high; i++) { + int idx = cbm_idx(r, i); + + /* + * Write the delay value for mba. + */ + wrmsrl(r->msr_base + idx, delay_bw_map(d->ctrl_val[i], r)); + } +} + +static void +cqm_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r) +{ + int i; + + for (i = m->low; i < m->high; i++) { + int idx = cbm_idx(r, i); + + wrmsrl(r->msr_base + idx, d->ctrl_val[i]); + } +} + void rdt_ctrl_update(void *arg) { struct msr_param *m = (struct msr_param *)arg; @@ -291,6 +401,33 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, return NULL; } +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d) +{ + struct msr_param m; + u32 *dc; + int i; + + dc = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); + if (!dc) + return -ENOMEM; + + d->ctrl_val = dc; + + /* + * Initialize the Control MSRs to having no control. + * For Cache Allocation: Set all bits in cbm + * For Memory Allocation: Set b/w requested to 100 + */ + for (i = 0; i < r->num_closid; i++, dc++) + *dc = r->default_ctrl; + + m.low = 0; + m.high = r->num_closid; + r->msr_update(d, &m, r); + + return 0; +} + /* * domain_add_cpu - Add a cpu to a resource's domain list. * @@ -306,7 +443,7 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, */ static void domain_add_cpu(int cpu, struct rdt_resource *r) { - int i, id = get_cache_id(cpu, r->cache_level); + int id = get_cache_id(cpu, r->cache_level); struct list_head *add_pos = NULL; struct rdt_domain *d; @@ -327,19 +464,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r) d->id = id; - d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); - if (!d->ctrl_val) { + if (domain_setup_ctrlval(r, d)) { kfree(d); return; } - for (i = 0; i < r->num_closid; i++) { - int idx = cbm_idx(r, i); - - d->ctrl_val[i] = r->default_ctrl; - wrmsrl(r->msr_base + idx, d->ctrl_val[i]); - } - cpumask_set_cpu(cpu, &d->cpu_mask); list_add_tail(&d->list, add_pos); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <CAChUvXM8gWAz6-AJ6jkyKjf5Yz0ze-2XAtvdZvze3Go44TPD8A@mail.gmail.com>]
* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA [not found] ` <CAChUvXM8gWAz6-AJ6jkyKjf5Yz0ze-2XAtvdZvze3Go44TPD8A@mail.gmail.com> @ 2017-04-04 18:50 ` Shivappa Vikas 0 siblings, 0 replies; 9+ messages in thread From: Shivappa Vikas @ 2017-04-04 18:50 UTC (permalink / raw) To: Tracy Smith Cc: x86, vikas.shivappa, lkml, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin, Vikas Shivappa On Mon, 3 Apr 2017, Tracy Smith wrote: > Hi All, > > No JTAG available and need to understand why Linux 4.8.3 doesn't boot on a > x86_64 corei7-64. Hangs at the typical "Starting kernel" location after > the last message of the U-boot. The bootcmd is given below. Do you see the issue when you apply MBA patches . It seems like you use 4.8 kernel but the mba patches are dependent on : https://marc.info/?l=linux-kernel&m=149125583824700 which is applied on 4.11-rc4. > > See a fault FFS and a message indicating the image didn't load after > failover. > > 1) How can I verify if the kernel image was loaded from uboot > 2) What is this fault? > 3) Has the bootargs or bootcmd changed between 4.1 and 4.8.3? > 4) If the boot cmd/arg has changed, what should the boot cmd/arg be for > 4.8.3 to boot on x86_64 corei7-64? > > Initial RAM disk at linear address 0x20000000, size 11638378 bytes > Kernel command line: "BOOT_IMAGE=/imgx/bzImage LABEL=BOOT root=/dev/ram0 > imgmnt=/media/sda2 imgdir=imgx img=image.rootfs rootdelay=2 slub_debug=F > console=ttyS1,115200 bootcount=1 bootcount_addr=0xa4000 > acpi_enforce_resources=lax pram_size=0x800000 pram_addr=10000000 > pram_loc=ddr crashkernel=128M memmap=0x800000$0x10000000 " > EFI table at 683392c0, mmap 68339300, mmap size 4c0, version 1, descr. size > 0x30 > > Starting kernel ... > > Timer summary in microseconds: > Mark Elapsed Stage > 0 0 reset > 1 1 board_init_r > 105 104 board_init_f > 10,180,048 10,179,943 id=64 > 10,221,985 41,937 id=65 > 10,356,645 134,660 main_loop > 12,366,521 2,009,876 usb_start > 18,747,284 6,380,763 start_kernel > Accumulated time: > 10,162,689 ahci > > On a 4.1.26-yocto-standard #1 SMP it boots with no issues. Basically same > .config used in both cases except for anything deprecated between 4.1 and > 4.8.3. > > root@:~# cat /proc/cmdline > BOOT_IMAGE=/imgy/bzImage LABEL=BOOT root=/dev/ram0 imgmnt=/media/sda2 > imgdir=imgy img=image.rootfs rootdelay=2 slub_debug=F console=ttyS1,115200 > fault=FFS bootcount=3 bootcount_addr=0xa4000 acpi_enforce_resources=lax > pram_size=0x800000 pram_addr=10000000 pram_loc=ddr crashkernel=128M > memmap=0x800000$0x10000000 > root@CLX3001:~# cat /proc/consoles > ttyS1 -W- (EC p a) 4:65 > netcon0 -W- (E ) > > thx, > Tray > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA 2017-04-03 21:57 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa [not found] ` <CAChUvXM8gWAz6-AJ6jkyKjf5Yz0ze-2XAtvdZvze3Go44TPD8A@mail.gmail.com> @ 2017-04-05 15:40 ` Thomas Gleixner 2017-04-05 18:09 ` Shivappa Vikas 1 sibling, 1 reply; 9+ messages in thread From: Thomas Gleixner @ 2017-04-05 15:40 UTC (permalink / raw) To: Vikas Shivappa Cc: vikas.shivappa, x86, linux-kernel, hpa, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin On Mon, 3 Apr 2017, Vikas Shivappa wrote: > > /** > + * struct rdt_domain - group of cpus sharing an RDT resource > + * @list: all instances of this resource > + * @id: unique id for this instance > + * @cpu_mask: which cpus share this resource > + * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID) > + * @new_cbm: new cbm value to be loaded > + * @have_new_cbm: did user provide new_cbm for this domain The version which you removed below has the kernel-doc comments correct .... > +/** > * struct rdt_resource - attributes of an RDT resource > * @enabled: Is this feature enabled on this machine > * @capable: Is this feature available on this machine > @@ -78,6 +109,16 @@ struct rftype { > * @data_width: Character width of data when displaying > * @min_cbm_bits: Minimum number of consecutive bits to be set > * in a cache bit mask > + * @msr_update: Function pointer to update QOS MSRs > + * @max_delay: Max throttle delay. Delay is the hardware > + * understandable value for memory bandwidth. > + * @min_bw: Minimum memory bandwidth percentage user > + * can request > + * @bw_gran: Granularity at which the memory bandwidth > + * is allocated > + * @delay_linear: True if memory b/w delay is in linear scale > + * @mb_map: Mapping of memory b/w percentage to > + * memory b/w delay values > * @domains: All domains for this resource > * @msr_base: Base MSR address for CBMs > * @cache_level: Which cache level defines scope of this domain > @@ -94,6 +135,14 @@ struct rdt_resource { > int min_cbm_bits; > u32 default_ctrl; > int data_width; > + void (*msr_update) (struct rdt_domain *d, struct msr_param *m, > + struct rdt_resource *r); > + u32 max_delay; > + u32 min_bw; > + u32 bw_gran; > + u32 delay_linear; > + u32 *mb_map; I don't know what other weird controls will be added over time, but we are probably better off to have struct cache_ctrl { int cbm_len; int min_cbm_bits; }; struct mba_ctrl { u32 max_delay; u32 min_bw; u32 bw_gran; u32 delay_linear; u32 *mb_map; }; and in then in struct rdt_resource: <common fields> union { struct cache_ctrl foo; struct mba_ctrl bla; } ctrl; That avoids that rdt_resource becomes a hodgepodge of unrelated or even contradicting fields. Hmm? Thanks, tglx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA 2017-04-05 15:40 ` Thomas Gleixner @ 2017-04-05 18:09 ` Shivappa Vikas 0 siblings, 0 replies; 9+ messages in thread From: Shivappa Vikas @ 2017-04-05 18:09 UTC (permalink / raw) To: Thomas Gleixner Cc: Vikas Shivappa, vikas.shivappa, x86, linux-kernel, hpa, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin On Wed, 5 Apr 2017, Thomas Gleixner wrote: > On Mon, 3 Apr 2017, Vikas Shivappa wrote: >> >> /** >> + * struct rdt_domain - group of cpus sharing an RDT resource >> + * @list: all instances of this resource >> + * @id: unique id for this instance >> + * @cpu_mask: which cpus share this resource >> + * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID) >> + * @new_cbm: new cbm value to be loaded >> + * @have_new_cbm: did user provide new_cbm for this domain > > The version which you removed below has the kernel-doc comments correct .... Will fix > >> +/** >> * struct rdt_resource - attributes of an RDT resource >> * @enabled: Is this feature enabled on this machine >> * @capable: Is this feature available on this machine >> @@ -78,6 +109,16 @@ struct rftype { >> * @data_width: Character width of data when displaying >> * @min_cbm_bits: Minimum number of consecutive bits to be set >> * in a cache bit mask >> + * @msr_update: Function pointer to update QOS MSRs >> + * @max_delay: Max throttle delay. Delay is the hardware >> + * understandable value for memory bandwidth. >> + * @min_bw: Minimum memory bandwidth percentage user >> + * can request >> + * @bw_gran: Granularity at which the memory bandwidth >> + * is allocated >> + * @delay_linear: True if memory b/w delay is in linear scale >> + * @mb_map: Mapping of memory b/w percentage to >> + * memory b/w delay values >> * @domains: All domains for this resource >> * @msr_base: Base MSR address for CBMs >> * @cache_level: Which cache level defines scope of this domain >> @@ -94,6 +135,14 @@ struct rdt_resource { >> int min_cbm_bits; >> u32 default_ctrl; >> int data_width; >> + void (*msr_update) (struct rdt_domain *d, struct msr_param *m, >> + struct rdt_resource *r); >> + u32 max_delay; >> + u32 min_bw; >> + u32 bw_gran; >> + u32 delay_linear; >> + u32 *mb_map; > > I don't know what other weird controls will be added over time, but we are > probably better off to have > > struct cache_ctrl { > int cbm_len; > int min_cbm_bits; > }; > > struct mba_ctrl { > u32 max_delay; > u32 min_bw; > u32 bw_gran; > u32 delay_linear; > u32 *mb_map; > }; > > and in then in struct rdt_resource: > > <common fields> > union { > struct cache_ctrl foo; > struct mba_ctrl bla; > } ctrl; > > > That avoids that rdt_resource becomes a hodgepodge of unrelated or even > contradicting fields. > > Hmm? Ok, makes sense. Will fix. Thought of a union when i had added a couple fields and given up but its grown a lot now. Thanks, Vikas > > Thanks, > > tglx > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation @ 2017-02-17 19:58 Vikas Shivappa 2017-02-17 19:58 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa 0 siblings, 1 reply; 9+ messages in thread From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw) To: vikas.shivappa Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen, vikas.shivappa Memory b/w allocation(MBA) is part of the Intel Resource Director Technology (RDT). RDT helps monitor and share processor shared resources. MBA helps enforce a limit on the memory bandwidth(b/w) threads can use when they are scheduled. OS does the enforcement using MSR(model specific register) interface and mapping the threads to architecture specific CLOSids(class of service IDs). The user interface is via the common resctrl fs for RDT features. This can be used along with MBM (memory b/w monitoring) and cache allocation to control and monitor the applications cache and memory resources as per the QoS or other performance requirements. Use cases could be large serverclusters, VM, clould and container based services where the admin or orchestration tools can use this framework to manage/allocate these processor shared resources like other memory/cpu resources to provide QoS guarentees. Detailed use case examples are documented in the 01/08 Documentation patch. Patches are dependent on : https://lkml.org/lkml/2017/2/17/595 V2 changes ---------- Thanks to Thomas for all the feed back and below are the changes as per the feedback: - Sent the RDT generic improvements patch series seperately - Split the info file/validate apis as prep patches to MBA changes - Fixed a lot of coding convention issues, variable initializations, and removing unnecessary allocations etc. - Changed the hotcpu handling for CLOS/RMID to zero the CLOSid/RMID when cpu is offlined. - Changed some naming conventions. - Changed the interface to show the bandwidth percentage that can be requested rather than showing h/w specific delay values which do not map to a particular b/w percentage in non-linear scale [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for [PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA 2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa @ 2017-02-17 19:58 ` Vikas Shivappa 2017-03-01 15:24 ` Thomas Gleixner 0 siblings, 1 reply; 9+ messages in thread From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw) To: vikas.shivappa Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen, vikas.shivappa The MBA feature details like minimum bandwidth supported, b/w granularity etc are obtained via executing CPUID with EAX=10H ,ECX=3. Setup and initialize the MBA specific extensions to data structures like global list of RDT resources, RDT resource structure and RDT domain structure. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> --- arch/x86/include/asm/intel_rdt.h | 17 +++++++ arch/x86/kernel/cpu/intel_rdt.c | 95 ++++++++++++++++++++++++++++++++++------ 2 files changed, 99 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h index d2eee45..af65b2a 100644 --- a/arch/x86/include/asm/intel_rdt.h +++ b/arch/x86/include/asm/intel_rdt.h @@ -11,6 +11,9 @@ #define IA32_L3_QOS_CFG 0xc81 #define IA32_L3_CBM_BASE 0xc90 #define IA32_L2_CBM_BASE 0xd10 +#define IA32_MBA_THRTL_BASE 0xd50 +#define MAX_MBA_THRTL 100u +#define MBA_IS_LINEAR 0x4 #define L3_QOS_CDP_ENABLE 0x01ULL @@ -74,6 +77,14 @@ struct rftype { * @default_ctrl: Specifies default cache cbm or mem b/w percent. * @min_cbm_bits: Minimum number of consecutive bits to be set * in a cache bit mask + * @max_delay: Max throttle delay. Delay is the hardware + * understandable value for memory b/w. + * @min_bw: Minimum memory bandwidth in percentage + * user can request + * @bw_gran: Bandwidth granularity + * @delay_linear: True if Mem b/w delay is in linear scale + * @mb_map: Mapping of mem b/w delay to + * b/w throttle percentage * @domains: All domains for this resource * @num_domains: Number of domains active * @msr_base: Base MSR address for CBMs @@ -92,6 +103,11 @@ struct rdt_resource { int cbm_len; int min_cbm_bits; u32 default_ctrl; + u32 max_delay; + u32 min_bw; + u32 bw_gran; + u32 delay_linear; + u32 *mb_map; struct list_head domains; int num_domains; int msr_base; @@ -141,6 +157,7 @@ enum { RDT_RESOURCE_L3DATA, RDT_RESOURCE_L3CODE, RDT_RESOURCE_L2, + RDT_RESOURCE_MBA, /* Must be the last */ RDT_NUM_RESOURCES, diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index b76a518..130ce98 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -76,6 +76,14 @@ struct rdt_resource rdt_resources_all[] = { .cbm_idx_multi = 1, .cbm_idx_offset = 0 }, + { + .name = "MB", + .domains = domain_init(RDT_RESOURCE_MBA), + .msr_base = IA32_MBA_THRTL_BASE, + .cache_level = 3, + .cbm_idx_multi = 1, + .cbm_idx_offset = 0 + }, }; static int cbm_idx(struct rdt_resource *r, int closid) @@ -130,6 +138,51 @@ static inline bool cache_alloc_hsw_probe(void) return false; } +/* + * rdt_get_mb_table() - get a mapping of b/w percentage values + * exposed to user interface and the h/w understandable delay values. + * + * The non-linear delay values have the granularity of power of two + * and also the h/w does not guarantee a curve for configured delay + * values vs. actual b/w throttled. + * Hence we need a mapping that is pre caliberated for user to express + * the b/w in terms of any sensible number. + */ +static inline int rdt_get_mb_table(struct rdt_resource *r) +{ + /* + * There are no Intel SKUs as of now to support non-linear delay. + */ + r->mb_map = NULL; + + return -ENODEV; +} + +static bool rdt_get_mem_config(struct rdt_resource *r) +{ + union cpuid_0x10_3_eax eax; + union cpuid_0x10_x_edx edx; + u32 ebx, ecx; + + cpuid_count(0x00000010, 3, &eax.full, &ebx, &ecx, &edx.full); + r->num_closid = edx.split.cos_max + 1; + r->max_delay = eax.split.max_delay + 1; + r->default_ctrl = MAX_MBA_THRTL; + if (ecx & MBA_IS_LINEAR) { + r->delay_linear = true; + r->min_bw = MAX_MBA_THRTL - r->max_delay; + r->bw_gran = MAX_MBA_THRTL - r->max_delay; + } else { + if (rdt_get_mb_table(r)) + return false; + } + + r->capable = true; + r->enabled = true; + + return true; +} + static void rdt_get_cache_config(int idx, struct rdt_resource *r) { union cpuid_0x10_1_eax eax; @@ -184,9 +237,8 @@ static inline bool get_rdt_resources(void) ret = true; } - if (boot_cpu_has(X86_FEATURE_MBA)) { - ret = true; - } + if (boot_cpu_has(X86_FEATURE_MBA)) + ret = rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]); return ret; } @@ -262,6 +314,30 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, return NULL; } +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d) +{ + int i; + + d->ctrl_val = kmalloc_array(r->num_closid, + sizeof(*d->ctrl_val), GFP_KERNEL); + if (!d->ctrl_val) + return -ENOMEM; + + /* + * Initialize the Control MSRs to having no control. + * For Cache Allocation: Set all bits in cbm + * For Memory Allocation: Set b/w requested to 100 + */ + for (i = 0; i < r->num_closid; i++) { + int idx = cbm_idx(r, i); + + d->ctrl_val[i] = r->default_ctrl; + wrmsrl(r->msr_base + idx, d->ctrl_val[i]); + } + + return 0; +} + /* * domain_add_cpu - Add a cpu to a resource's domain list. * @@ -277,7 +353,7 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id, */ static void domain_add_cpu(int cpu, struct rdt_resource *r) { - int i, id = get_cache_id(cpu, r->cache_level); + int id = get_cache_id(cpu, r->cache_level), ret; struct list_head *add_pos = NULL; struct rdt_domain *d; @@ -298,19 +374,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r) d->id = id; - d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); - if (!d->ctrl_val) { + ret = domain_setup_ctrlval(r, d); + if (ret) { kfree(d); return; } - for (i = 0; i < r->num_closid; i++) { - int idx = cbm_idx(r, i); - - d->ctrl_val[i] = r->default_ctrl; - wrmsrl(r->msr_base + idx, d->ctrl_val[i]); - } - cpumask_set_cpu(cpu, &d->cpu_mask); list_add_tail(&d->list, add_pos); r->num_domains++; -- 1.9.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA 2017-02-17 19:58 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa @ 2017-03-01 15:24 ` Thomas Gleixner 2017-03-10 21:51 ` Shivappa Vikas 0 siblings, 1 reply; 9+ messages in thread From: Thomas Gleixner @ 2017-03-01 15:24 UTC (permalink / raw) To: Vikas Shivappa Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen On Fri, 17 Feb 2017, Vikas Shivappa wrote: > --- a/arch/x86/include/asm/intel_rdt.h > +++ b/arch/x86/include/asm/intel_rdt.h > @@ -11,6 +11,9 @@ > #define IA32_L3_QOS_CFG 0xc81 > #define IA32_L3_CBM_BASE 0xc90 > #define IA32_L2_CBM_BASE 0xd10 > +#define IA32_MBA_THRTL_BASE 0xd50 > +#define MAX_MBA_THRTL 100u > +#define MBA_IS_LINEAR 0x4 I have a hard time to figure out how the latter two constants are related to this list of registers. MBA_IS_LINEAR is used to check the CPUID bit and MAX_MBA_THRTL is obviously a pure software constant because with a non-linear scale the maximum value is not 100. Just slapping defines to random places is equally bad as using hard coded constants. > +/* > + * rdt_get_mb_table() - get a mapping of b/w percentage values > + * exposed to user interface and the h/w understandable delay values. > + * > + * The non-linear delay values have the granularity of power of two > + * and also the h/w does not guarantee a curve for configured delay > + * values vs. actual b/w throttled. > + * Hence we need a mapping that is pre caliberated for user to express > + * the b/w in terms of any sensible number. ... calibrated so the user can express the bandwidth as a percentage value. > +static inline int rdt_get_mb_table(struct rdt_resource *r) > +{ > + /* > + * There are no Intel SKUs as of now to support non-linear delay. > + */ > + r->mb_map = NULL; What's the point of setting this to NULL? Also it would be helpful to emit log info here so people don't have to start digging around. pr_info("Bandwidth map not implemented for ....", ... model); > + > + return -ENODEV; Returning -ENODEV to a function which just returns a boolean value is pointless. > static void rdt_get_cache_config(int idx, struct rdt_resource *r) > { > union cpuid_0x10_1_eax eax; > @@ -184,9 +237,8 @@ static inline bool get_rdt_resources(void) > ret = true; > } > > - if (boot_cpu_has(X86_FEATURE_MBA)) { > - ret = true; > - } > + if (boot_cpu_has(X86_FEATURE_MBA)) > + ret = rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]); Groan. When rdt_get_mem_config() returns false (because the map is not implemented), then the whole function returns false and CAT is disabled. > +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d) > +{ > + int i; > + > + d->ctrl_val = kmalloc_array(r->num_closid, > + sizeof(*d->ctrl_val), GFP_KERNEL); > + if (!d->ctrl_val) > + return -ENOMEM; > + > + /* > + * Initialize the Control MSRs to having no control. > + * For Cache Allocation: Set all bits in cbm > + * For Memory Allocation: Set b/w requested to 100 > + */ > + for (i = 0; i < r->num_closid; i++) { > + int idx = cbm_idx(r, i); > + > + d->ctrl_val[i] = r->default_ctrl; > + wrmsrl(r->msr_base + idx, d->ctrl_val[i]); > + } So if you use a local pointer for that, this whole mess becomes readable. static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d) { u32 *p; int i; p = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); if (!p) return -ENOMEM; d->ctrl_val = p; /* Initialize the Control MSRs to the default value */ for (i = 0; i < r->num_closid; i++, p++) { int idx = cbm_idx(r, i); *p = r->default_ctrl; wrmsrl(r->msr_base + idx, *p); } > + > + return 0; > +} > static void domain_add_cpu(int cpu, struct rdt_resource *r) > { > - int i, id = get_cache_id(cpu, r->cache_level); > + int id = get_cache_id(cpu, r->cache_level), ret; Bah. If you have the same type in one line, then please move the uninitialized variables to the front. int ret, id = get_cache_id(cpu, r->cache_level); But a s/i/ret/ would have been to simple and kept the code readable. > @@ -298,19 +374,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r) > > d->id = id; > > - d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); > - if (!d->ctrl_val) { > + ret = domain_setup_ctrlval(r, d); > + if (ret) { > kfree(d); > return; > } What's the point of this 'ret' variable if the function is void? if (domain_setup_ctrlval(r, d)) { kfree(d); return; } would have been to easy to read, right? Thanks, tglx ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA 2017-03-01 15:24 ` Thomas Gleixner @ 2017-03-10 21:51 ` Shivappa Vikas 0 siblings, 0 replies; 9+ messages in thread From: Shivappa Vikas @ 2017-03-10 21:51 UTC (permalink / raw) To: Thomas Gleixner Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen On Wed, 1 Mar 2017, Thomas Gleixner wrote: > On Fri, 17 Feb 2017, Vikas Shivappa wrote: >> --- a/arch/x86/include/asm/intel_rdt.h >> +++ b/arch/x86/include/asm/intel_rdt.h >> @@ -11,6 +11,9 @@ >> #define IA32_L3_QOS_CFG 0xc81 >> #define IA32_L3_CBM_BASE 0xc90 >> #define IA32_L2_CBM_BASE 0xd10 >> +#define IA32_MBA_THRTL_BASE 0xd50 >> +#define MAX_MBA_THRTL 100u >> +#define MBA_IS_LINEAR 0x4 > > I have a hard time to figure out how the latter two constants are related > to this list of registers. MBA_IS_LINEAR is used to check the CPUID bit and > MAX_MBA_THRTL is obviously a pure software constant because with a > non-linear scale the maximum value is not 100. > > Just slapping defines to random places is equally bad as using hard coded > constants. > >> +/* >> + * rdt_get_mb_table() - get a mapping of b/w percentage values >> + * exposed to user interface and the h/w understandable delay values. >> + * >> + * The non-linear delay values have the granularity of power of two >> + * and also the h/w does not guarantee a curve for configured delay >> + * values vs. actual b/w throttled. >> + * Hence we need a mapping that is pre caliberated for user to express >> + * the b/w in terms of any sensible number. > > ... calibrated so the user can express the bandwidth as a percentage value. > >> +static inline int rdt_get_mb_table(struct rdt_resource *r) >> +{ >> + /* >> + * There are no Intel SKUs as of now to support non-linear delay. >> + */ >> + r->mb_map = NULL; > > What's the point of setting this to NULL? > > Also it would be helpful to emit log info here so people don't have to > start digging around. > > pr_info("Bandwidth map not implemented for ....", ... model); > >> + >> + return -ENODEV; > > Returning -ENODEV to a function which just returns a boolean value is > pointless. > >> static void rdt_get_cache_config(int idx, struct rdt_resource *r) >> { >> union cpuid_0x10_1_eax eax; >> @@ -184,9 +237,8 @@ static inline bool get_rdt_resources(void) >> ret = true; >> } >> >> - if (boot_cpu_has(X86_FEATURE_MBA)) { >> - ret = true; >> - } >> + if (boot_cpu_has(X86_FEATURE_MBA)) >> + ret = rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]); > > Groan. When rdt_get_mem_config() returns false (because the map is not > implemented), then the whole function returns false and CAT is disabled. > >> +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d) >> +{ >> + int i; >> + >> + d->ctrl_val = kmalloc_array(r->num_closid, >> + sizeof(*d->ctrl_val), GFP_KERNEL); >> + if (!d->ctrl_val) >> + return -ENOMEM; >> + >> + /* >> + * Initialize the Control MSRs to having no control. >> + * For Cache Allocation: Set all bits in cbm >> + * For Memory Allocation: Set b/w requested to 100 >> + */ >> + for (i = 0; i < r->num_closid; i++) { >> + int idx = cbm_idx(r, i); >> + >> + d->ctrl_val[i] = r->default_ctrl; >> + wrmsrl(r->msr_base + idx, d->ctrl_val[i]); >> + } > > So if you use a local pointer for that, this whole mess becomes readable. > > static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d) > { > u32 *p; > int i; > > p = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); > if (!p) > return -ENOMEM; > > d->ctrl_val = p; > > /* Initialize the Control MSRs to the default value */ > for (i = 0; i < r->num_closid; i++, p++) { > int idx = cbm_idx(r, i); > > *p = r->default_ctrl; > wrmsrl(r->msr_base + idx, *p); > } >> + >> + return 0; >> +} > >> static void domain_add_cpu(int cpu, struct rdt_resource *r) >> { >> - int i, id = get_cache_id(cpu, r->cache_level); >> + int id = get_cache_id(cpu, r->cache_level), ret; > > Bah. If you have the same type in one line, then please move the > uninitialized variables to the front. > > int ret, id = get_cache_id(cpu, r->cache_level); > > But a s/i/ret/ would have been to simple and kept the code readable. > >> @@ -298,19 +374,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r) >> >> d->id = id; >> >> - d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL); >> - if (!d->ctrl_val) { >> + ret = domain_setup_ctrlval(r, d); >> + if (ret) { >> kfree(d); >> return; >> } > > What's the point of this 'ret' variable if the function is void? > > if (domain_setup_ctrlval(r, d)) { > kfree(d); > return; > } > > would have been to easy to read, right? Will fix all the issues pointed. Thanks for pointing out. > > Thanks, > > tglx > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-04-08 0:34 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-04 19:02 [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Tracy Smith -- strict thread matches above, loose matches on Subject: below -- 2017-04-08 0:33 [PATCH 0/8 V4] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa 2017-04-08 0:33 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa 2017-04-03 21:57 [PATCH 0/8 V3] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa 2017-04-03 21:57 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa [not found] ` <CAChUvXM8gWAz6-AJ6jkyKjf5Yz0ze-2XAtvdZvze3Go44TPD8A@mail.gmail.com> 2017-04-04 18:50 ` Shivappa Vikas 2017-04-05 15:40 ` Thomas Gleixner 2017-04-05 18:09 ` Shivappa Vikas 2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa 2017-02-17 19:58 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa 2017-03-01 15:24 ` Thomas Gleixner 2017-03-10 21:51 ` Shivappa Vikas
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.