From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756600AbbEUNcw (ORCPT ); Thu, 21 May 2015 09:32:52 -0400 Received: from mail-bn1bon0113.outbound.protection.outlook.com ([157.56.111.113]:13262 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1755170AbbEUNcf (ORCPT ); Thu, 21 May 2015 09:32:35 -0400 Authentication-Results: spf=none (sender IP is 165.204.84.222) smtp.mailfrom=amd.com; intel.com; dkim=none (message not signed) header.d=none; X-WSS-ID: 0NOPC9Z-08-QDT-02 X-M-MSG: Date: Thu, 21 May 2015 21:26:50 +0800 From: Huang Rui To: Borislav Petkov CC: Len Brown , "Rafael J. Wysocki" , Thomas Gleixner , , , Fengguang Wu , "Aaron Lu" , Tony Li Subject: Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer Message-ID: <20150521132650.GC20838@hr-slim.amd.com> References: <1432022472-2224-1-git-send-email-ray.huang@amd.com> <1432022472-2224-3-git-send-email-ray.huang@amd.com> <20150519113121.GD4819@pd.tnic> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20150519113121.GD4819@pd.tnic> User-Agent: Mutt/1.5.21 (2010-09-15) X-EOPAttributedMessage: 0 X-Microsoft-Exchange-Diagnostics: 1;BN1AFFO11FD052;1:WZcEKbYLrtuSrtfL7AXJHhDMsa637OTgb2WK/A9GHruivLi/o1YcPOzgs45sV7lG0lG/78W9hAdtX/B7XAb3pB04EJ6SodoLvP8h40UpiaXiosDgZ4+xunqm/KCnF/ON42dKv8vsyJ1eDYgOC97rGck/74pX8Bpum5+NuuKwIyWEgFL70MJvPy1QIvzorAc/4Ge14TXKo3+TK2pLfbqdTn3Tstht50OO7u9+R/MGCx/aQDAhz12hr7w37dg1mY1xncpMkbvI9QoYnQcMIusgMsTXbBs76J9LwF3WQ6GUc46rezOq3hqJXF+Cx17wWWfPkvm0ywr5g8jEZ3FNoTZJyg== X-Forefront-Antispam-Report: CIP:165.204.84.222;CTRY:US;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(10019020)(6009001)(428002)(189002)(199003)(51704005)(24454002)(164054003)(5001830100001)(19580395003)(76176999)(62966003)(33656002)(189998001)(5001860100001)(110136002)(23726002)(83506001)(19580405001)(101416001)(50466002)(47776003)(86362001)(97756001)(87936001)(4001540100001)(77156002)(54356999)(50986999)(2950100001)(68736005)(105586002)(4001350100001)(64706001)(46102003)(46406003)(106466001)(53416004)(97736004)(77096005)(92566002)(217873001);DIR:OUT;SFP:1102;SCL:1;SRVR:BLUPR02MB065;H:atltwp02.amd.com;FPR:;SPF:None;PTR:InfoDomainNonexistent;MX:1;A:1;LANG:en; X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:BLUPR02MB065; X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(5005006)(3002001);SRVR:BLUPR02MB065;BCL:0;PCL:0;RULEID:;SRVR:BLUPR02MB065; X-Forefront-PRVS: 0583A86C08 X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 21 May 2015 13:32:28.2009 (UTC) X-MS-Exchange-CrossTenant-Id: fde4dada-be84-483f-92cc-e026cbee8e96 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=fde4dada-be84-483f-92cc-e026cbee8e96;Ip=[165.204.84.222];Helo=[atltwp02.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BLUPR02MB065 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 19, 2015 at 01:31:21PM +0200, Borislav Petkov wrote: > On Tue, May 19, 2015 at 04:01:10PM +0800, Huang Rui wrote: > > MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors. > > The cpu core still consumes less power while waiting, and has faster exit > > from waiting than "Halt". This patch implements an interface using the > > kernel parameter "idle=" to configure mwaitx type and timer value. > > > > If "idle=mwaitx", the timeout will be set as the maximum value > > ((2^64 - 1) * TSC cycle). > > If "idle=mwaitx,100", the timeout will be set as 100ns. > > If the processor doesn't support MWAITX, then halt is used. > > Ok, I see what you're trying here and I think this is not the optimal > approach. > > So let me explain how I see it, you correct me if I'm wrong: > > So we want to do MWAITX so that we can save us idle entry/exit overhead > with HLT. Because MWAITX is faster, reportedly. > > Now, if we want to do that, we want to do it dynamically and adjust the > MWAITX sleep interval depending on the system, usage pattern, system > load and so on. Yes, that's right. Maybe, even we can also find any other use case on kernel other components. > > And for that we would need an adaptive scheme which approximates each > idle interval. Simply taking TSC before we enter idle and after we come > out would give us each idle residency duration and we can do some simple > math to approximate it. > > Now, what would that bring us: faster wakeup times. > > And here comes the 10^6 $ question: why are we doing all the fun? > Do you mention the below codes: /* * TSC loops (EBX input) = Timer(nsec) * * TSC freq(khz) / 1000000 */ timeout = timeout * tsc_freq; do_div(timeout, 1000000); That's because the unit of tsc_freq is KHz and the unit of timeout is nanosecond. Then I do div 10^6 to figure out the corresponding loops. > I'm thinking we want to find a cutoff duration where for smaller > durations it is worth to do MWAITX and have faster entry/exit times and > for bigger durations we want to do HLT because it'll get into C1E and > give us higher power savings. > > We don't want to do MWAITX too long because that'll burn more power > relatively to HLT but we don't want to do HLT for shorter periods > because then entry/exit costs. > > Am I on the right track at least? Yes, that's totally right. > > > Signed-off-by: Huang Rui > > --- > > arch/x86/include/asm/mwait.h | 2 + > > arch/x86/include/asm/processor.h | 2 +- > > arch/x86/kernel/process.c | 79 ++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 82 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h > > index b91136f..c4e51e7 100644 > > --- a/arch/x86/include/asm/mwait.h > > +++ b/arch/x86/include/asm/mwait.h > > @@ -14,6 +14,8 @@ > > #define CPUID5_ECX_INTERRUPT_BREAK 0x2 > > > > #define MWAIT_ECX_INTERRUPT_BREAK 0x1 > > +#define MWAITX_ECX_TIMER_ENABLE 0x2 > > Use BIT(1) here. OK. > > > +#define MWAITX_EBX_WAIT_TIMEOUT 0xffffffff > > > > static inline void __monitor(const void *eax, unsigned long ecx, > > unsigned long edx) > > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h > > index 23ba676..0f60e94 100644 > > --- a/arch/x86/include/asm/processor.h > > +++ b/arch/x86/include/asm/processor.h > > @@ -733,7 +733,7 @@ extern unsigned long boot_option_idle_override; > > extern bool amd_e400_c1e_detected; > > > > enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT, > > - IDLE_POLL}; > > + IDLE_POLL, IDLE_MWAITX}; > > > > extern void enable_sep_cpu(void); > > extern int sysenter_setup(void); > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > > index 6e338e3..9d68193 100644 > > --- a/arch/x86/kernel/process.c > > +++ b/arch/x86/kernel/process.c > > @@ -30,6 +30,7 @@ > > #include > > #include > > #include > > +#include > > > > /* > > * per-CPU TSS segments. Thre ads are completely 'soft' on Linux, > > @@ -276,6 +277,7 @@ unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE; > > EXPORT_SYMBOL(boot_option_idle_override); > > > > static void (*x86_idle)(void); > > +static unsigned long idle_param; > > > > #ifndef CONFIG_SMP > > static inline void play_dead(void) > > @@ -444,6 +446,17 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c) > > return 1; > > } > > > > +static int not_support_mwaitx(const struct cpuinfo_x86 *c) > > +{ > > + if (c->x86_vendor != X86_VENDOR_AMD) > > + return 1; > > + > > + if (!cpu_has(c, X86_FEATURE_MWAITT)) > > + return 1; > > + > > + return 0; > > +} > > + > > /* > > * MONITOR/MWAIT with no hints, used for default default C1 state. > > * This invokes MWAIT with interrutps enabled and no flags, > > @@ -470,12 +483,45 @@ static void mwait_idle(void) > > __current_clr_polling(); > > } > > > > +/* > > + * AMD Excavator processors support the new MONITORX/MWAITX instructions. > > No need for that especially when newer than XV processors start > supporting those too. > OK, got it. Thanks, Rui