Date: Thu, 28 Jan 2016 16:28:48 +0100
From: Peter Zijlstra
To: Borislav Petkov
Cc: Huang Rui, Borislav Petkov, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, Robert Richter, Jacob Shin, John Stultz,
	Frédéric Weisbecker, linux-kernel@vger.kernel.org,
	spg_linux_kernel@amd.com, x86@kernel.org, Guenter Roeck,
	Andreas Herrmann, Suravee Suthikulpanit, Aravind Gopalakrishnan,
	Fengguang Wu, Aaron Lu
Subject: Re: [PATCH v4] perf/x86/amd/power: Add AMD accumulated power reporting mechanism
Message-ID: <20160128152848.GT6356@twins.programming.kicks-ass.net>
References: <1453963131-2013-1-git-send-email-ray.huang@amd.com>
 <20160128090314.GB14274@pd.tnic>
In-Reply-To: <20160128090314.GB14274@pd.tnic>
User-Agent: Mutt/1.5.21 (2012-12-30)

On Thu, Jan 28, 2016 at 10:03:15AM +0100, Borislav Petkov wrote:
> +
> +struct power_pmu {
> +	raw_spinlock_t lock;

Now that the list is gone, what does this thing protect?

> +	struct pmu *pmu;

This member seems superfluous, there's only the one possible value.

> +	local64_t cpu_sw_pwr_ptsc;
> +
> +	/*
> +	 * These two cpumasks are used for avoiding the allocations on the
> +	 * CPU_STARTING phase because power_cpu_prepare() will be called with
> +	 * IRQs disabled.
> +	 */
> +	cpumask_var_t mask;
> +	cpumask_var_t tmp_mask;
> +};
> +
> +static struct pmu pmu_class;
> +
> +/*
> + * Accumulated power represents the sum of each compute unit's (CU) power
> + * consumption. On any core of each CU we read the total accumulated power from
> + * MSR_F15H_CU_PWR_ACCUMULATOR. cpu_mask represents CPU bit map of all cores
> + * which are picked to measure the power for the CUs they belong to.
> + */
> +static cpumask_t cpu_mask;
> +
> +static DEFINE_PER_CPU(struct power_pmu *, amd_power_pmu);
> +
> +static u64 event_update(struct perf_event *event, struct power_pmu *pmu)
> +{

Is there ever a case where @pmu != __this_cpu_read(power_pmu) ?

> +	struct hw_perf_event *hwc = &event->hw;
> +	u64 prev_raw_count, new_raw_count, prev_ptsc, new_ptsc;
> +	u64 delta, tdelta;
> +
> +again:
> +	prev_raw_count = local64_read(&hwc->prev_count);
> +	prev_ptsc = local64_read(&pmu->cpu_sw_pwr_ptsc);
> +	rdmsrl(event->hw.event_base, new_raw_count);

Is hw.event_base != MSR_F15H_CU_PWR_ACCUMULATOR possible?

> +	rdmsrl(MSR_F15H_PTSC, new_ptsc);

Also, I suspect this doesn't do what you expect it to do. We measure
per-event PWR_ACC deltas, but per CPU PTSC values. These do not match
when there's more than 1 event on the CPU.

I would suggest adding a new struct to the hw_perf_event union with the
two u64 deltas like:

	struct { /* amd_power */
		u64	pwr_acc;
		u64	ptsc;
	};

And track these values per-event.

> +
> +	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> +			    new_raw_count) != prev_raw_count) {
> +		cpu_relax();
> +		goto again;
> +	}
> +
> +	/*
> +	 * Calculate the CU power consumption over a time period, the unit of
> +	 * final value (delta) is micro-Watts. Then add it to the event count.
> +	 */
> +	if (new_raw_count < prev_raw_count) {
> +		delta = max_cu_acc_power + new_raw_count;
> +		delta -= prev_raw_count;
> +	} else
> +		delta = new_raw_count - prev_raw_count;
> +
> +	delta *= cpu_pwr_sample_ratio * 1000;
> +	tdelta = new_ptsc - prev_ptsc;
> +
> +	do_div(delta, tdelta);
> +	local64_add(delta, &event->count);

Then this division can be redone on the total values, which loses less
precision overall.

> +
> +	return new_raw_count;
> +}