From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753630AbbJ0DlG (ORCPT ); Mon, 26 Oct 2015 23:41:06 -0400 Received: from mail-bn1bon0086.outbound.protection.outlook.com ([157.56.111.86]:6512 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751874AbbJ0DlD convert rfc822-to-8bit (ORCPT ); Mon, 26 Oct 2015 23:41:03 -0400 Authentication-Results: spf=none (sender IP is 165.204.84.221) smtp.mailfrom=amd.com; alien8.de; dkim=none (message not signed) header.d=none;alien8.de; dmarc=permerror action=none header.from=amd.com; X-WSS-ID: 0NWV0W8-07-5Z8-02 X-M-MSG: Date: Tue, 27 Oct 2015 11:36:36 +0800 From: Huang Rui To: Guenter Roeck CC: Borislav Petkov , Peter Zijlstra , "Jean Delvare" , Andy Lutomirski , "Andreas Herrmann" , Thomas Gleixner , Ingo Molnar , "Rafael J. Wysocki" , "Len Brown" , John Stultz , =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker , "lm-sensors@lm-sensors.org" , "linux-kernel@vger.kernel.org" , "x86@kernel.org" , Andreas Herrmann , "Gopalakrishnan, Aravind" , Borislav Petkov , "Fengguang Wu" , Aaron Lu , "Li, Tony" Subject: Re: [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Message-ID: <20151027033636.GG8036@hr-amur2> References: <1445308109-17970-1-git-send-email-ray.huang@amd.com> <1445308109-17970-8-git-send-email-ray.huang@amd.com> <562A424B.2070108@roeck-us.net> MIME-Version: 1.0 Content-Type: text/plain; charset="gb2312" Content-Disposition: inline In-Reply-To: <562A424B.2070108@roeck-us.net> User-Agent: Mutt/1.5.21 (2010-09-15) Content-Transfer-Encoding: 8BIT X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:165.204.84.221;CTRY:US;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(10009020)(6009001)(2980300002)(428002)(479174004)(24454002)(199003)(164054003)(189002)(377454003)(105586002)(92566002)(4001350100001)(19580405001)(5007970100001)(23696002)(50466002)(11100500001)(5008740100001)(77096005)(2950100001)(19580395003)(97736004)(86362001)(47776003)(33716001)(50986999)(101416001)(110136002)(47136003)(87936001)(54356999)(189998001)(106466001)(48336001)(83506001)(76176999)(33656002)(107986001);DIR:OUT;SFP:1101;SCL:1;SRVR:CY1PR12MB0853;H:atltwp01.amd.com;FPR:;SPF:None;PTR:InfoDomainNonexistent;A:1;MX:1;LANG:en; X-Microsoft-Exchange-Diagnostics: 1;CY1PR12MB0853;2:kNrq2OmWLB6wFSnIFQsyeWgHdX1vgOZw7KY7muQ87mFaInfviuDhUbY7QYcUQdIl8gi+0W08xmhDAWYLuOSB8LblbfzYtx1dBU2chBEYq6UETayy5AAz9qx+JaQViFkeU+pSWztYianIKRgr8OW+sMPAuA/8BKqobqmVklFb5J4=;3:CRKW56A5PE7vmiWQPoya/bJedSPldvmRiNSPSd3B0YwwbSiit4Stj6ct+WGo6q6oIeiLM/k7z80Zk/9sqqJ0U2mhUc/uhT5srq1JHo0fwpeZmtLJvfxr38jwOy18YOP3tZPiQfdJrruLxNve2xaObHlRQOWiC/1uNdGatTJvl54cvNhICg7nrHKQKsvUjgv+ggTUHsX4n0wp4Wi70Gyz2mfwTjyEXmuhhFlOZxaCX1iVtwdw++60aeucCB3CBSCJ;25:lPGt5/wKTzADImMouNHYTESDcXmlKXKahKjmxJpLZg44+Gh0AwRXNroGMjNyJ8BQWmJgDgbukpb1SP0gD326gyugm3jrf6bz84kE64/wVXgo4/hQsN1fvqIDsGcxxpGiaGyzH/ojX0t8fjHEK5u/j8Hl+t+dTtmDYDyw5V53r39vxhsvFKEagELE8r/8zVwTkfBdY5V/RtjpMd79UbviJ8IAOey/CBvLFDrgJMQyT0SXaiFBn3RQkkgwQccOrQofa+ngvlUSjGd6VpjHVKE5Lg== X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:CY1PR12MB0853; X-Microsoft-Exchange-Diagnostics: 1;CY1PR12MB0853;20:BWSbq8OFeWXly5ePDEsiL1A1TCzLFQGPe0RWXTocDJuPGtSORu4XlnMwrl9NkSfhzdeAy6XrHaB+H5WTbjRSTidMgM0ng7/WfkZQofxb6H+M2C9CK08f3sYK7rkLT/uNy31pl1cZA/B9W/NYZU1g1qnKW/igT+dursXWM4Y59aD0ZFbtRJ52HAja4OtQ3fDPHMXXmd+6LWxAc0co4p772R8WGoMMwsio+XyXXp7KI8FW0F8GgP0SBErj71yC9H6WrRzZZhW/J8JpPvO366EM/9Zn80VBAF3c5erU3/xYUES0VspWQGPh+X9iZIoOo2lIMoSVSh1SRTr3sFJDnkh1PKcN2XYvDlsw3DFVmgAFd9cioA0vzTYomeixcWwlNzEOgQKm9qz8rEg+zmkRXLhZSqRMY1VE/yOtK6C8r1te3fU0JN0bp8rgmCy4sr7X7aT3pXn7BtuDnQ9KDxqF6lN1/u442k7DBdWQT0vBtW5HiCNeMB5nPY8uS4f919X2KEUY;4:82V67K/JJHJAWKvjVZPIlTwkI/NT+XAD0uKJJti1biHAhHcw3ZzeJ/AF3MKoGtB13TrUJce3JMOuZunAl+mAXcJftrUgrPjhb7WMAqJr91alL3sDUL3GWV97sMHzQXHzlKTY7DXC4GXbgF5dAmtAM07tMqZmqdq8evGJCkGvs8wMgZQRkJJ5Otw3ZQGTBwJxe6/qTWIeyiqS5bEJE78KeKDrxEnmeCWRhgxjnY5XRMQnHyhmdsqz9qdjRRJqSZuuSLOxt9OgiTOLirGIBttjSHkDC0NZ2dUQv2lpI2j8DIQkJ8x3hA6wh1kEsE09fqRk9QERgpQkm0QG6sykdg5aV5xMMWp6YQuiwMdg+CsMxch/TdKDE5kkFRlaHWqeQ72G X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(767451399110); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(5005006)(520078)(8121501046)(3002001)(102215026);SRVR:CY1PR12MB0853;BCL:0;PCL:0;RULEID:;SRVR:CY1PR12MB0853; X-Forefront-PRVS: 0742443479 X-Microsoft-Exchange-Diagnostics: =?gb2312?B?MTtDWTFQUjEyTUIwODUzOzIzOjgzempvM1BNWTFWa3dZMy9mODQ1UnZWWDJH?= =?gb2312?B?Z2ZORVpKVFE3Mm02d3NaRlBPTXo2aHRMMXUvVStmWWV0Nzh3clJtRHZpSnUx?= =?gb2312?B?ZHF0U2U0a29mbmJjM3JSdnlKMDlZZFcxZUpNWU1BQzhlS3pOVHFUSlVwNTRv?= =?gb2312?B?OElEUmNXZWQ4WlF2VEQ3VWI2MFJjMVc3aVRUQTU5SDR4ZFlXKy8yK1hyOThI?= =?gb2312?B?a2wrYTJTSVR2YUF1NTBRTXhKbVpTY0FhYkJmR1BEYUNMU3NTMWIzakVtWXJL?= =?gb2312?B?b0pyUmtKU2pIUTlHVVEwRldrWGJJR2RtUThPc0oyT0ZHQ3RZQjQ3V2xQTjhY?= =?gb2312?B?VXl0L0pxMUM5WVRuZ2IxWVVpWWRCUFlodEY1U29GdDdhcmhEc0c5bXhSZlN6?= =?gb2312?B?cFJtTDc3Vk0xNkRQYmxQVlo5cWRVc3k1bDNKNTZVMzhwNUR3ZVppbTZ1d3M5?= =?gb2312?B?RzNyWjhyaHFBRWFLOTBmTEc5K3hTbFY1QjZScmVwTExyWFJOZnlNd2VYalBi?= =?gb2312?B?elBOcGtwRjF5bHU4R0l0WnNFVS9FSnBXSStzNDBoY3lIcSt1UzgyQ0YyUnZH?= =?gb2312?B?a01uRkR4ejM3Ti9Xci8zMFpNTlE1U0RES3M3YllUYTNuaktPZ1pNd3JVS01D?= =?gb2312?B?c3BqQW9PYy9wYmJCTmovOFVFYUErb0JYMjVpeGFMdlU2TkFnVmlma1F0VTFJ?= =?gb2312?B?elBwbmhiVzRVNEMyVVczTHVURVRZWWFiL1ErZk5jZ1JsRy91Nk9QcVk2Z2x4?= =?gb2312?B?M1EwV1VVOVN5NW9GNGtVSlJNK0N0Rm9KVGN4YU04VXVhdkg1ZVVKNDczeWVY?= =?gb2312?B?OVliVUNibUhHQ0I2VFYxbWFnTUJwek5hRzZBVGZCZlE0K3hKS3lKb0Z1RU9I?= =?gb2312?B?L0RsTElzZC85bVlwamFlQUlmVHBGbS9UWHlTcUpQdjIvck9Qa3ZBYmIrV0d4?= =?gb2312?B?SGxtN1poL0lzT1E3R0JxSE82K0cwbEFMQUtoR3EzZkJkTVJ1WDJ4Z3RwUFFx?= =?gb2312?B?S0l2cWhMYmJzUG5ocEtqV0UzZ3ROTWMrdk5zQWlZeGJwR0lrUHFFVU1aQzhD?= =?gb2312?B?TU95cEliY3NxcmhqT21HZlNsOTE2WXR5TUk0RkROTGZmNE5zZ2M0VnlvUWxY?= =?gb2312?B?ZEhRVDIxbVBWU2hBR1N3b05HUGh0NE1tczU3MTRPY2F6Um5ydVJ4TlIyM0xq?= =?gb2312?B?TEg2YlBRdUh5cURaR3F5bTlldU5lajh3bzd5c0NnbjVtdGZhRFU4UkRaYytX?= =?gb2312?B?cUpiY2VjMXVFKytEbUpML1E0L3Flbkd3NW9iNG9LZTVWVjZtZWtrNEN4VVVH?= =?gb2312?B?T1hpWHRkTGhDYjByb29CME1BTk1wRVg2b29lbnVNTHdzbTVFVFBoTzQwNldS?= =?gb2312?B?WkI0aWszVUs3WUNyQjUwUElmeFJac2hLa3Q5TWNnPT0=?= X-Microsoft-Exchange-Diagnostics: 1;CY1PR12MB0853;5:E0eizW38YlWRDY9C/Ak7AURmsrEqSe6k2hCgLKrK/U8w3WwDuqdi+8AaC6OvkYq3CKObrRNoap17dHLJLgEEee4r2/elhG+gBwZxMA6Wfj2eeRt3aRs9wWvD6TM7MpiN8KxLdpok8NqhvspJH6MWHg==;24:giBsxqTpFZarx7NgmmNQVuOhjN3Maz+NxR5KdZH1LINQEyabmanU/1O+KDvntQnjqfNUpK0176f9RzwtJysFyWuADrx6LHmSoFfPdJBQv3Y=;20:c85kU/fRzeNk8tEMUJCQwLGu/CigCaQ0rgEGhb5ykGtuhwIHnvlm9JNxDBv55eQE67Kn1BVeunwETMDbVD4pV7LdApqprpDAVY+Gmio1vufl4knYZwGmeTWjcYCC3oP7B7r8OcGS/++m+58N/jhiqASt+iuUO+KrtBdrHP+VliQmmgZwT7YNBBlRREVekyUIp8qJqulFueI+sHSSUHOLa6VRKO+S/E6SgxzlQt1G692o1JU9p1SLjMDADsAR6vvU SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 27 Oct 2015 03:40:57.5044 (UTC) X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.221];Helo=[atltwp01.amd.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY1PR12MB0853 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Oct 23, 2015 at 10:20:59PM +0800, Guenter Roeck wrote: > On 10/19/2015 07:28 PM, Huang Rui wrote: > > This patch introduces an algorithm that computes the average power by > > reading a delta value of ¡°core power accumulator¡± register during > > measurement interval, and then dividing delta value by the length of > > the time interval. > > > > User is able to use power1_average entry to measure the processor power > > consumption and power1_average_interval entry to set the interval. > > > > A simple example: > > > > ray@hr-ub:~/tip$ sensors > > fam15h_power-pci-00c4 > > Adapter: PCI adapter > > power1: 23.73 mW (avg = 634.63 mW, interval = 0.01 s) > > (crit = 15.00 W) > > > > ... > > > > The result is current average processor power consumption in 10 > > millisecond. The unit of the result is uWatt. > > > > Suggested-by: Guenter Roeck > > Signed-off-by: Huang Rui > > Cc: Borislav Petkov > > Cc: Peter Zijlstra > > Cc: Ingo Molnar > > --- > > drivers/hwmon/fam15h_power.c | 120 +++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 120 insertions(+) > > > > diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c > > index 6321f73..a5a539e 100644 > > --- a/drivers/hwmon/fam15h_power.c > > +++ b/drivers/hwmon/fam15h_power.c > > @@ -26,6 +26,9 @@ > > #include > > #include > > #include > > +#include > > +#include > > +#include > > #include > > #include > > > > @@ -64,6 +67,10 @@ struct fam15h_power_data { > > u64 cu_acc_power[MAX_CUS]; > > /* performance timestamp counter */ > > u64 cpu_sw_pwr_ptsc[MAX_CUS]; > > + /* online/offline status of current compute unit */ > > + int cu_on[MAX_CUS]; > > + unsigned long power_period; > > + struct mutex acc_pwr_mutex; > > Can you elaborate a bit about what this mutex is supposed to protect ? > To me it seems that it doesn't really protect anything. > My orignal intention is to avoid race condition from user space. Actually you know, show_power_acc, acc_set_power_period will expose to application layer. > > }; > > > > static ssize_t show_power(struct device *dev, > > @@ -132,11 +139,15 @@ static void do_read_registers_on_cu(void *_data) > > cores_per_cu = amd_get_cores_per_cu(); > > cu = cpu / cores_per_cu; > > > > + mutex_lock(&data->acc_pwr_mutex); > > WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR, > > &data->cu_acc_power[cu])); > > > > WARN_ON(rdmsrl_safe(MSR_F15H_PTSC, > > &data->cpu_sw_pwr_ptsc[cu])); > > + > > + data->cu_on[cu] = 1; > > + mutex_unlock(&data->acc_pwr_mutex); > > ... for example, while this protects cu_on[cu], > > > } > > > > static int read_registers(struct fam15h_power_data *data) > > @@ -148,6 +159,10 @@ static int read_registers(struct fam15h_power_data *data) > > cores_per_cu = amd_get_cores_per_cu(); > > cu_num = boot_cpu_data.x86_max_cores / cores_per_cu; > > > > + mutex_lock(&data->acc_pwr_mutex); > > + memset(data->cu_on, 0, sizeof(int) * MAX_CUS); > > + mutex_unlock(&data->acc_pwr_mutex); > > ... this code may well overwrite that same value. > Yes, but I only need the compute unit status of the "sencond" time of read_registers. Then ignore the offline compute unit to avoid hotplug issue. > > + > > WARN_ON_ONCE(cu_num > MAX_CUS); > > > > ret = zalloc_cpumask_var(&mask, GFP_KERNEL); > > @@ -184,18 +199,113 @@ static int read_registers(struct fam15h_power_data *data) > > return 0; > > } > > > > +static ssize_t acc_show_power(struct device *dev, > > + struct device_attribute *attr, > > + char *buf) > > +{ > > + struct fam15h_power_data *data = dev_get_drvdata(dev); > > + u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS], > > + jdelta[MAX_CUS]; > > + u64 tdelta, avg_acc; > > + int cu, cu_num, cores_per_cu, ret; > > + signed long leftover; > > + > > + cores_per_cu = amd_get_cores_per_cu(); > > + cu_num = boot_cpu_data.x86_max_cores / cores_per_cu; > > + > > + ret = read_registers(data); > > + if (ret) > > + return 0; The first time of read_registers > > + > > + cu = 0; > > + while(cu++ < cu_num) { > > + prev_cu_acc_power[cu] = data->cu_acc_power[cu]; > > + prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu]; > > + } > > ... and multiple parallel reads on the power attribute must produce > pretty random values, unless I am really missing something. > > > + > > + leftover = schedule_timeout_interruptible( > > + msecs_to_jiffies(data->power_period) > > + ); > > + if (leftover) > > + return 0; > > + > > + ret = read_registers(data); The second time of read_registers > > + if (ret) > > + return 0; > > + > With a 10ms period, I wonder how accurate this really is. > According to the HW description, the measurement interval could be on the order of several milliseconds. That should get a stable average power value. > > + for (cu = 0, avg_acc = 0; cu < cu_num; cu++) { > > + /* check if current compute unit is online */ > > + if (data->cu_on[cu] == 0) > > + continue; > > + > > + if (data->cu_acc_power[cu] < prev_cu_acc_power[cu]) { > > + jdelta[cu] = data->max_cu_acc_power + data->cu_acc_power[cu]; > > + jdelta[cu] -= prev_cu_acc_power[cu]; > > + } else { > > + jdelta[cu] = data->cu_acc_power[cu] - prev_cu_acc_power[cu]; > > + } > > + tdelta = data->cpu_sw_pwr_ptsc[cu] - prev_ptsc[cu]; > > + jdelta[cu] *= data->cpu_pwr_sample_ratio * 1000; > > + do_div(jdelta[cu], tdelta); > > + > > + /* the unit is microWatt */ > > + avg_acc += jdelta[cu]; > > + } > > + > > + return sprintf(buf, "%llu\n", (unsigned long long)avg_acc); > > +} > > +static DEVICE_ATTR(power1_average, S_IRUGO, acc_show_power, NULL); > > + > > + > > +static ssize_t acc_show_power_period(struct device *dev, > > + struct device_attribute *attr, > > + char *buf) > > +{ > > + struct fam15h_power_data *data = dev_get_drvdata(dev); > > + > > + return sprintf(buf, "%lu\n", data->power_period); > > +} > > + > > +static ssize_t acc_set_power_period(struct device *dev, > > + struct device_attribute *attr, > > + const char *buf, size_t count) > > +{ > > + struct fam15h_power_data *data = dev_get_drvdata(dev); > > + unsigned long temp; > > + int ret; > > + > > + ret = kstrtoul(buf, 10, &temp); > > + if (ret) > > + return ret; > > + > > + mutex_lock(&data->acc_pwr_mutex); > > + data->power_period = temp; > > + mutex_unlock(&data->acc_pwr_mutex); > > This doesn't really protect anything either except that power_period > can not be updated while the lock is active. But the code using > power_period does not run under mutex protection, so that seems pretty > pointless. > I thought about carefully, the mutex here didn't protect the variable. Actually, my variables such as data->cu_acc_power[cu], data->cu_on[cu] are independent for different cpu cores. They cannot be updated on different threads at the same time. Different threads just only updates values on different cu(index). > Also, this accepts an unlimited timeout. If I understand correctly, > setting the timeout to, say, 10 seconds will cause the read function > to hang for that period of time. Setting it to 1 hour will cause the read > function to hang for 1 hour. > > Does this really make sense ? > You're right. If set a long interval, the mutex will be hold a long time, that causes cpu looks stuck... So should I set a "power1_average_max"? Thanks, Rui