Subject: RE: [PATCH V4 1/4] x86/resctrl: Enable user to view and select thread throttling mode
From: Babu Moger
To: Reinette Chatre, tglx@linutronix.de, fenghua.yu@intel.com, bp@alien8.de, tony.luck@intel.com
Cc: kuo-lang.tseng@intel.com, ravi.v.shankar@intel.com, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org
Date: Mon, 18 May 2020 12:28:02 -0500
In-Reply-To: <667e48a36924c1710f6f2dd0116d388170721528.1589652468.git.reinette.chatre@intel.com>

Hi Reinette,

Looks good for the most part. A few minor comments.

> -----Original Message-----
> From: Reinette Chatre
> Sent: Saturday, May 16, 2020 1:29 PM
> To: tglx@linutronix.de; fenghua.yu@intel.com; bp@alien8.de;
> tony.luck@intel.com
> Cc: kuo-lang.tseng@intel.com; ravi.v.shankar@intel.com; mingo@redhat.com;
> Moger, Babu; hpa@zytor.com; x86@kernel.org;
> linux-kernel@vger.kernel.org; Reinette Chatre
> Subject: [PATCH V4 1/4] x86/resctrl: Enable user to view and select thread
> throttling mode
>
> Intel Memory Bandwidth Allocation (MBA) control is provided per
> processor core. At the same time different CLOS, configured with
> different bandwidth percentages, can be assigned to the hardware
> threads sharing a core. In the original implementation of MBA the
> maximum throttling of the per-thread CLOS is allocated to the core.
> Specifically, the lower bandwidth percentage is allocated to the core.
>
> Newer systems can be configured to allocate either maximum or
> minimum throttling of the per-thread CLOS values to the core.
>
> Introduce a new resctrl file, "thread_throttle_mode", on Intel systems
> that exposes to the user how per-thread values are allocated to
> a core. On systems that support the original MBA implementation the
> file will always display "max". On systems that can be configured
> the possible values are "min" or "max" that the user can modify by
> writing these same words to the file.
>
> AMD confirmed in
> https://lore.kernel.org/lkml/18d277fd-6523-319c-d560-66b63ff606b8@amd.com
> that AMD bandwidth allocation is already at thread level. But AMD does not
> use memory delay throttle model to control the allocation like Intel does.
> So, to avoid any confusion the thread throttling mode would be UNDEFINED
> on AMD systems and the "thread_throttle_mode" file will not be visible.
>
> Cc: Babu Moger
> Signed-off-by: Reinette Chatre
> ---
> Changes since V3:
> - Maintain the thread throttling mode as a property ("arch_throttle_mode")
>   of the memory bandwidth allocation resource
>   instead of calling a function that queries the system's model each time
>   this information is needed. Use this new property generically throughout
>   with the goal of being independent of architecture. (Babu)
> - Remove "intel" from thread_throttle_mode_init_intel_rw() and
>   thread_throttle_mode_init_intel_ro() in anticipation of usage by other
>   architectures.
> - Use function parameter to update_mba_cfg() directly instead of having
>   local variable point to it. (Fenghua)
> - Remove unnecessary additional check whether platform supports feature
>   from the "thread_throttle_mode" callback. The file will only be
>   accessible on platforms that support the feature.
> - Rework commit message to be more complete regarding
>   support of all architectures after incorporating AMD feedback.
> - View the thread throttle mode with rdtgroup mutex held since it is using
>   cached information that may be changed concurrently.
>
>  Documentation/x86/resctrl_ui.rst       |  19 ++-
>  arch/x86/kernel/cpu/resctrl/core.c     |  54 +++++++
>  arch/x86/kernel/cpu/resctrl/internal.h |  47 +++++-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 196 ++++++++++++++++++++++++-
>  4 files changed, 307 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
> index 5368cedfb530..861ee2816470 100644
> --- a/Documentation/x86/resctrl_ui.rst
> +++ b/Documentation/x86/resctrl_ui.rst
> @@ -138,6 +138,19 @@ with respect to allocation:
>  			non-linear. This field is purely informational
>  			only.
>
> +"thread_throttle_mode":
> +		Indicator (on some CPU models control) on Intel systems
> +		of how tasks running on threads of a physical core are
> +		throttled in cases where they request different memory
> +		bandwidth percentages:
> +
> +		"min":
> +			the largest percentage is applied
> +			to all threads
> +		"max":
> +			the smallest percentage is applied
> +			to all threads
> +
>  If RDT monitoring is available there will be an "L3_MON" directory
>  with the following files:
>
> @@ -364,8 +377,10 @@ to the next control step available on the hardware.
>
>  The bandwidth throttling is a core specific mechanism on some of Intel
>  SKUs. Using a high bandwidth and a low bandwidth setting on two threads
> -sharing a core will result in both threads being throttled to use the
> -low bandwidth. The fact that Memory bandwidth allocation(MBA) is a core
> +sharing a core may result in both threads being throttled to use the
> +low bandwidth (see "thread_throttle_mode").
> +
> +The fact that Memory bandwidth allocation(MBA) may be a core
>  specific mechanism where as memory bandwidth monitoring(MBM) is done at
>  the package level may lead to confusion when users try to apply control
>  via the MBA and then monitor the bandwidth to see if the controls are
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 12f967c6b603..129ff0cec7a7 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -250,6 +250,45 @@ static inline bool rdt_get_mb_table(struct rdt_resource *r)
>  	return false;
>  }
>
> +/*
> + * Restore the MBA configuration from the cached configuration. Used for
> + * the case when an entire package was offline at the time the user made
> + * the configuration change.
> + */
> +static void mba_cfg_reconfigure_throttle_mode(struct rdt_resource *r)
> +{
> +	if (!r->alloc_capable)
> +		return;
> +
> +	if (r == &rdt_resources_all[RDT_RESOURCE_MBA] &&
> +	    r->membw.arch_throttle_mode == THREAD_THROTTLE_MIN_MAX)
> +		wrmsrl(MSR_MBA_CFG, mba_cfg_msr);
> +}

How about this? It is more consistent with the other checks that are done.

	if (r->alloc_capable && (r == &rdt_resources_all[RDT_RESOURCE_MBA]) &&
	    (r->membw.arch_throttle_mode == THREAD_THROTTLE_MIN_MAX))
		wrmsrl(MSR_MBA_CFG, mba_cfg_msr);

> +
> +/*
> + * Model-specific test to determine if platform where memory bandwidth
> + * control is applied to a core can be configured to apply either the
> + * maximum or minimum of the per-thread delay values.
> + * By default, platforms where memory bandwidth control is applied to a
> + * core will select the maximum delay value of the per-thread CLOS.
> + *
> + * NOTE: delay value programmed to hardware is inverse of bandwidth
> + * percentage configured via user interface.
> + */
> +static bool mba_cfg_supports_min_max_intel(void)
> +{
> +	switch (boot_cpu_data.x86_model) {
> +	case INTEL_FAM6_ATOM_TREMONT_D:
> +	case INTEL_FAM6_ICELAKE_X:
> +	case INTEL_FAM6_ICELAKE_D:
> +		return true;
> +	default:
> +		return false;
> +	}
> +
> +	return false;

Is this last return required? I don't think so. We will never get here.
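
In case it helps, here is a rough user-space sketch of both suggestions above. Everything in it is a stand-in I made up so that it compiles outside the kernel: struct resource, fake_wrmsrl(), the counters and the numeric model values are assumptions, not the resctrl definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for enum membw_throttle_mode from the patch. */
enum throttle_mode { THROTTLE_UNDEFINED, THROTTLE_MIN_MAX, THROTTLE_MAX_ONLY };

/* Hypothetical model of the few struct rdt_resource fields that matter. */
struct resource {
	bool alloc_capable;
	bool is_mba;			/* models r == &rdt_resources_all[RDT_RESOURCE_MBA] */
	enum throttle_mode mode;	/* models r->membw.arch_throttle_mode */
};

static uint64_t cached_cfg;		/* models mba_cfg_msr */
static uint64_t last_written;		/* last value passed to fake_wrmsrl() */
static unsigned int msr_writes;		/* number of fake MSR writes so far */

/* Fake MSR write: records the value instead of touching hardware. */
static void fake_wrmsrl(uint64_t val)
{
	last_written = val;
	msr_writes++;
}

/* Suggestion 1: a single combined condition instead of an early return. */
static void reconfigure_throttle_mode(const struct resource *r)
{
	assert(r != NULL);

	if (r->alloc_capable && r->is_mba && r->mode == THROTTLE_MIN_MAX)
		fake_wrmsrl(cached_cfg);
}

/* Assumed model numbers, only for illustration. */
enum {
	MODEL_ATOM_TREMONT_D = 0x86,
	MODEL_ICELAKE_X      = 0x6A,
	MODEL_ICELAKE_D      = 0x6C,
};

/* Suggestion 2: every path through the switch returns, so no trailing return. */
static bool supports_min_max(unsigned int model)
{
	switch (model) {
	case MODEL_ATOM_TREMONT_D:
	case MODEL_ICELAKE_X:
	case MODEL_ICELAKE_D:
		return true;
	default:
		return false;
	}
}
```

With this shape the MSR write only happens when all three conditions hold, and the compiler has no unreachable statement after the switch.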
> +}
> +
>  static bool __get_mem_config_intel(struct rdt_resource *r)
>  {
>  	union cpuid_0x10_3_eax eax;
> @@ -270,6 +309,14 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
>  	}
>  	r->data_width = 3;
>
> +	if (mba_cfg_supports_min_max_intel()) {
> +		r->membw.arch_throttle_mode = THREAD_THROTTLE_MIN_MAX;
> +		thread_throttle_mode_init_rw();
> +	} else {
> +		r->membw.arch_throttle_mode = THREAD_THROTTLE_MAX_ONLY;
> +		thread_throttle_mode_init_ro();
> +	}
> +
>  	r->alloc_capable = true;
>  	r->alloc_enabled = true;
>
> @@ -289,6 +336,11 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>  	/* AMD does not use delay */
>  	r->membw.delay_linear = false;
>
> +	/*
> +	 * AMD does not use memory delay throttle model to control
> +	 * the allocation like Intel does.
> +	 */
> +	r->membw.arch_throttle_mode = THREAD_THROTTLE_UNDEFINED;
>  	r->membw.min_bw = 0;
>  	r->membw.bw_gran = 1;
>  	/* Max value is 2048, Data width should be 4 in decimal */
> @@ -580,6 +632,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>
>  	rdt_domain_reconfigure_cdp(r);
>
> +	mba_cfg_reconfigure_throttle_mode(r);
> +
>  	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
>  		kfree(d);
>  		return;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index f20a47d120b1..6b9b21d67c9b 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -9,6 +9,7 @@
>
>  #define MSR_IA32_L3_QOS_CFG		0xc81
>  #define MSR_IA32_L2_QOS_CFG		0xc82
> +#define MSR_MBA_CFG			0xc84
>  #define MSR_IA32_L3_CBM_BASE		0xc90
>  #define MSR_IA32_L2_CBM_BASE		0xd10
>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
> @@ -21,6 +22,9 @@
>
>  #define L2_QOS_CDP_ENABLE		0x01ULL
>
> +#define MBA_THROTTLE_MODE_MIN		0x01ULL
> +#define MBA_THROTTLE_MODE_MAX		0x00ULL
> +
>  /*
>   * Event IDs are used to program IA32_QM_EVTSEL before reading event
>   * counter from IA32_QM_CTR
> @@ -38,6 +42,8 @@
>  #define MBA_MAX_MBPS			U32_MAX
>  #define MAX_MBA_BW_AMD			0x800
>
> +#define MBA_THROTTLE_MODE_MASK		BIT_ULL(0)
> +
>  #define RMID_VAL_ERROR			BIT_ULL(63)
>  #define RMID_VAL_UNAVAIL		BIT_ULL(62)
>  /*
> @@ -47,6 +53,10 @@
>   */
>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>
> +/*
> + * MSR_MBA_CFG cache
> + */
> +extern u64 mba_cfg_msr;
>
>  struct rdt_fs_context {
>  	struct kernfs_fs_context	kfc;
> @@ -368,6 +378,26 @@ struct rdt_cache {
>  	unsigned int	shareable_bits;
>  };
>
> +/**
> + * enum membw_throttle_mode - System's memory bandwidth throttling mode
> + * @THREAD_THROTTLE_UNDEFINED:	Not relevant to the system
> + * @THREAD_THROTTLE_MIN_MAX:	Memory bandwidth is throttled at the core and
> + *				can be configured to use smallest or largest
> + *				bandwidth percentage assigned to threads.
> + *				Systems that support this mode will support
> + *				MSR_MBA_CFG and the configuration of
> + *				thread throttling mode via resctrl
> + *				file "thread_throttle_mode".
> + * @THREAD_THROTTLE_MAX_ONLY:	Memory bandwidth is throttled at the core
> + *				always using smallest bandwidth percentage
> + *				assigned to threads, aka "max throttling"
> + */
> +enum membw_throttle_mode {
> +	THREAD_THROTTLE_UNDEFINED = 0,
> +	THREAD_THROTTLE_MIN_MAX,
> +	THREAD_THROTTLE_MAX_ONLY,
> +};
> +
>  /**
>   * struct rdt_membw - Memory bandwidth allocation related data
>   * @max_delay:		Max throttle delay. Delay is the hardware
> @@ -375,16 +405,19 @@ struct rdt_cache {
>   * @min_bw:		Minimum memory bandwidth percentage user can request
>   * @bw_gran:		Granularity at which the memory bandwidth is allocated
>   * @delay_linear:	True if memory B/W delay is in linear scale
> + * @arch_throttle_mode:	Bandwidth throttling mode when threads request
> + *			different memory bandwidths
>   * @mba_sc:		True if MBA software controller(mba_sc) is enabled
>   * @mb_map:		Mapping of memory B/W percentage to memory B/W delay
>   */
>  struct rdt_membw {
> -	u32		max_delay;
> -	u32		min_bw;
> -	u32		bw_gran;
> -	u32		delay_linear;
> -	bool		mba_sc;
> -	u32		*mb_map;
> +	u32				max_delay;
> +	u32				min_bw;
> +	u32				bw_gran;
> +	u32				delay_linear;
> +	enum membw_throttle_mode	arch_throttle_mode;
> +	bool				mba_sc;
> +	u32				*mb_map;
>  };
>
>  static inline bool is_llc_occupancy_enabled(void)
> @@ -611,5 +644,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free);
>  bool cbm_validate_intel(char *buf, u32 *data, struct rdt_resource *r);
>  bool cbm_validate_amd(char *buf, u32 *data, struct rdt_resource *r);
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> +void thread_throttle_mode_init_rw(void);
> +void thread_throttle_mode_init_ro(void);
>
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d7cb5ab0d1f0..3ce6319b7226 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -29,6 +29,7 @@
>
>  #include
>
> +#include
>  #include
>  #include "internal.h"
>
> @@ -38,6 +39,7 @@ DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
>  static struct kernfs_root *rdt_root;
>  struct rdtgroup rdtgroup_default;
>  LIST_HEAD(rdt_all_groups);
> +u64 mba_cfg_msr;
>
>  /* Kernel fs node for "info" directory under root */
>  static struct kernfs_node *kn_info;
> @@ -1017,6 +1019,141 @@ static int max_threshold_occ_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>
> +/*
> + * As documented in the Intel SDM, on systems supporting the original MBA
> + * implementation the delay value allocated to a core is always the maximum
> + * of the delay values assigned to the hardware threads sharing the core.
> + *
> + * Some systems support a model-specific MSR with which this default
> + * behavior can be changed. On these systems the core can be allocated
> + * with either the minimum or maximum delay value assigned to its hardware
> + * threads.
> + *
> + * NOTE: The hardware deals with memory delay values that may be programmed
> + * from zero (implying zero delay, and full bandwidth available) to the
> + * maximum specified in CPUID. The software interface deals with memory
> + * bandwidth percentages that are the inverse of the delay values (100%
> + * memory bandwidth from user perspective is zero MBA delay from hardware
> + * perspective). When maximum throttling is active the core is allocated
> + * with the maximum delay value that from the software interface will be
> + * the minimum of the bandwidth percentages assigned to the hardware threads
> + * sharing the core.
> + */
> +static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
> +					 struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	unsigned int throttle_mode = 0;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	if (r->membw.arch_throttle_mode == THREAD_THROTTLE_MIN_MAX)
> +		throttle_mode = mba_cfg_msr & MBA_THROTTLE_MODE_MASK;
> +
> +	seq_puts(seq,
> +		 throttle_mode == MBA_THROTTLE_MODE_MIN ? "min\n" : "max\n");
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +	return 0;
> +}
> +
> +static void update_mba_cfg(void *data)
> +{
> +	wrmsrl(MSR_MBA_CFG, *(u64 *)data);
> +}
> +
> +/*
> + * The model-specific MBA configuration MSR has package scope. Making a
> + * system-wide MBA configuration change thus needs to modify the MSR on one
> + * CPU from each package.
> + */
> +static int rdt_system_mba_cfg_set(u64 mba_cfg)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA];
> +	cpumask_var_t cpu_mask;
> +	struct rdt_domain *d;
> +
> +	if (list_is_singular(&r->domains)) {
> +		wrmsrl(MSR_MBA_CFG, mba_cfg);
> +		goto out;
> +	}
> +
> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
> +		rdt_last_cmd_puts("Memory allocation error\n");
> +		return -ENOMEM;
> +	}
> +
> +	list_for_each_entry(d, &r->domains, list)
> +		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
> +
> +	on_each_cpu_mask(cpu_mask, update_mba_cfg, &mba_cfg, 1);
> +
> +	free_cpumask_var(cpu_mask);
> +out:
> +	mba_cfg_msr = mba_cfg;
> +	return 0;
> +}
> +
> +static void mba_cfg_reset_all(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA];
> +
> +	if (!r->alloc_capable)
> +		return;
> +
> +	if (r->membw.arch_throttle_mode == THREAD_THROTTLE_MIN_MAX)
> +		rdt_system_mba_cfg_set(0);
> +}
> +
> +/*
> + * Callback will only be associated with the "thread_throttle_mode" file on
> + * systems that are capable of memory bandwidth allocation
> + * (RDT_RESOURCE_MBA is alloc_capable) AND the thread throttle mode is
> + * THREAD_THROTTLE_MIN_MAX.
> + *
> + * See NOTE associated with rdt_thread_throttle_mode_show() for
> + * details of the min/max interpretation.
> + */
> +static ssize_t rdt_thread_throttle_mode_write(struct kernfs_open_file *of,
> +					      char *buf, size_t nbytes,
> +					      loff_t off)
> +{
> +	u64 mba_cfg;
> +	int ret = 0;
> +
> +	if (nbytes == 0)
> +		return -EINVAL;
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	mba_cfg = mba_cfg_msr & MBA_THROTTLE_MODE_MASK;
> +
> +	if ((sysfs_streq(buf, "min") && mba_cfg == MBA_THROTTLE_MODE_MIN) ||
> +	    (sysfs_streq(buf, "max") && mba_cfg == MBA_THROTTLE_MODE_MAX))
> +		goto out;
> +
> +	if (sysfs_streq(buf, "min")) {
> +		mba_cfg = MBA_THROTTLE_MODE_MIN;
> +	} else if (sysfs_streq(buf, "max")) {
> +		mba_cfg = MBA_THROTTLE_MODE_MAX;
> +	} else {
> +		rdt_last_cmd_puts("Unknown or unsupported mode\n");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	mba_cfg = (mba_cfg_msr & ~MBA_THROTTLE_MODE_MASK) | mba_cfg;
> +	ret = rdt_system_mba_cfg_set(mba_cfg);
> +
> +out:
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +	return ret ?: nbytes;
> +}
> +
>  static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
>  				       char *buf, size_t nbytes, loff_t off)
>  {
> @@ -1512,6 +1649,16 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= rdt_delay_linear_show,
>  		.fflags		= RF_CTRL_INFO | RFTYPE_RES_MB,
>  	},
> +	/*
> +	 * Platform specific which (if any) capabilities are provided by
> +	 * thread_throttle_mode. Defer some initialization to platform
> +	 * discovery.
> +	 */
> +	{
> +		.name		= "thread_throttle_mode",
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= rdt_thread_throttle_mode_show,
> +	},
>  	{
>  		.name		= "max_threshold_occupancy",
>  		.mode		= 0644,
> @@ -1571,6 +1718,52 @@ static struct rftype res_common_files[] = {
>
>  };
>
> +static struct rftype *rdtgroup_rftype_by_name(const char *name)
> +{
> +	struct rftype *rfts, *rft;
> +	int len;
> +
> +	rfts = res_common_files;
> +	len = ARRAY_SIZE(res_common_files);
> +
> +	for (rft = rfts; rft < rfts + len; rft++) {
> +		if (!strcmp(rft->name, name))
> +			return rft;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Only systems that support memory bandwidth allocation and have
> + * THREAD_THROTTLE_MIN_MAX set will support a writable thread_throttle_mode
> + * file with which the user can modify the thread throttling mode.
> + */
> +void __init thread_throttle_mode_init_rw(void)
> +{
> +	struct rftype *rft;
> +
> +	rft = rdtgroup_rftype_by_name("thread_throttle_mode");
> +	if (!rft)
> +		return;
> +
> +	rft->mode = 0644;
> +	rft->write = rdt_thread_throttle_mode_write;
> +	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
> +}
> +
> +void __init thread_throttle_mode_init_ro(void)
> +{
> +	struct rftype *rft;
> +
> +	rft = rdtgroup_rftype_by_name("thread_throttle_mode");
> +	if (!rft)
> +		return;
> +
> +	rft->mode = 0444;
> +	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
> +}
> +
>  static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
>  {
>  	struct rftype *rfts, *rft;
> @@ -1582,7 +1775,7 @@ static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
>  	lockdep_assert_held(&rdtgroup_mutex);
>
>  	for (rft = rfts; rft < rfts + len; rft++) {
> -		if ((fflags & rft->fflags) == rft->fflags) {
> +		if (rft->fflags && ((fflags & rft->fflags) == rft->fflags)) {
>  			ret = rdtgroup_add_file(kn, rft);
>  			if (ret)
>  				goto error;
> @@ -2371,6 +2564,7 @@ static void rdt_kill_sb(struct super_block *sb)
>  	/*Put everything back to default values. */
>  	for_each_alloc_enabled_rdt_resource(r)
>  		reset_all_ctrls(r);
> +	mba_cfg_reset_all();
>  	cdp_disable_all();
>  	rmdir_all_sub();
>  	rdt_pseudo_lock_release();
> --
> 2.21.0
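
One more note on the "min"/"max" naming, since the inversion between delay and bandwidth percentage is easy to trip over. The sketch below is only my reading of the documentation hunk and of the write path, not kernel code: core_bw_percent() and set_throttle_mode() are hypothetical helpers, and the THROTTLE_MODE_* macros just mirror the values of the MBA_THROTTLE_MODE_* defines in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror MBA_THROTTLE_MODE_MASK/MIN/MAX from the patch (bit 0 of the MSR). */
#define THROTTLE_MODE_MASK	0x1ULL
#define THROTTLE_MODE_MIN	0x1ULL
#define THROTTLE_MODE_MAX	0x0ULL

/*
 * Hypothetical helper: bandwidth percentage that ends up applied to a core
 * whose two hardware threads request pct_a and pct_b.
 *
 * max_throttling == true  models "max": largest delay, smallest percentage.
 * max_throttling == false models "min": smallest delay, largest percentage.
 */
static unsigned int core_bw_percent(unsigned int pct_a, unsigned int pct_b,
				    bool max_throttling)
{
	assert(pct_a <= 100 && pct_b <= 100);

	if (max_throttling)
		return pct_a < pct_b ? pct_a : pct_b;
	return pct_a > pct_b ? pct_a : pct_b;
}

/*
 * Hypothetical helper: replace only the mode bit of a cached configuration
 * value and leave every other bit untouched, as the write callback does
 * with mba_cfg_msr.
 */
static uint64_t set_throttle_mode(uint64_t cfg, uint64_t mode)
{
	return (cfg & ~THROTTLE_MODE_MASK) | (mode & THROTTLE_MODE_MASK);
}
```

So with threads asking for 10% and 50%, "max" leaves the core at 10% and "min" leaves it at 50%, which matches the wording in resctrl_ui.rst.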