From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756285AbbBEBOL (ORCPT); Wed, 4 Feb 2015 20:14:11 -0500
Received: from smtp.codeaurora.org ([198.145.11.231]:35369 "EHLO
	smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751089AbbBEBOJ (ORCPT);
	Wed, 4 Feb 2015 20:14:09 -0500
Date: Wed, 4 Feb 2015 17:14:07 -0800
From: Stephen Boyd
To: Daniel Thompson
Cc: Thomas Gleixner, John Stultz, linux-kernel@vger.kernel.org,
	patches@linaro.org, linaro-kernel@lists.linaro.org, Sumit Semwal,
	Steven Rostedt, Russell King, Will Deacon, Catalin Marinas
Subject: Re: [PATCH v3 2/4] sched_clock: Optimize cache line usage
Message-ID: <20150205011407.GB30372@codeaurora.org>
References: <1421859236-19782-1-git-send-email-daniel.thompson@linaro.org>
	<1422644602-11953-1-git-send-email-daniel.thompson@linaro.org>
	<1422644602-11953-3-git-send-email-daniel.thompson@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1422644602-11953-3-git-send-email-daniel.thompson@linaro.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/30, Daniel Thompson wrote:
> diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
> index 3d21a8719444..cb69a47dfee4 100644
> --- a/kernel/time/sched_clock.c
> +++ b/kernel/time/sched_clock.c
> @@ -18,28 +18,44 @@
>  #include
>  #include
>  
> -struct clock_data {
> -	ktime_t wrap_kt;
> +/**
> + * struct clock_read_data - data required to read from sched_clock
> + *

Nitpick: Won't kernel-doc complain that members aren't documented?

> + * Care must be taken when updating this structure; it is read by
> + * some very hot code paths. It occupies <=48 bytes and, when combined
> + * with the seqcount used to synchronize access, comfortably fits into
> + * a 64 byte cache line.
> + */
> +struct clock_read_data {
>  	u64 epoch_ns;
>  	u64 epoch_cyc;
> -	seqcount_t seq;
> -	unsigned long rate;
> +	u64 sched_clock_mask;
> +	u64 (*read_sched_clock)(void);
>  	u32 mult;
>  	u32 shift;
>  	bool suspended;
>  };
>  
> +/**
> + * struct clock_data - all data needed for sched_clock (including
> + *                     registration of a new clock source)
> + *

Same comment.

> + * The ordering of this structure has been chosen to optimize cache
> + * performance. In particular seq and read_data (combined) should fit
> + * into a single 64 byte cache line.
> + */
> +struct clock_data {
> +	seqcount_t seq;
> +	struct clock_read_data read_data;
> +	ktime_t wrap_kt;
> +	unsigned long rate;
> +};
> @@ -60,15 +79,16 @@ unsigned long long notrace sched_clock(void)
>  {
>  	u64 cyc, res;
>  	unsigned long seq;
> +	struct clock_read_data *rd = &cd.read_data;
>  
>  	do {
>  		seq = raw_read_seqcount_begin(&cd.seq);
>  
> -		res = cd.epoch_ns;
> -		if (!cd.suspended) {
> -			cyc = read_sched_clock();
> -			cyc = (cyc - cd.epoch_cyc) & sched_clock_mask;
> -			res += cyc_to_ns(cyc, cd.mult, cd.shift);
> +		res = rd->epoch_ns;
> +		if (!rd->suspended) {

Should this have likely() treatment? It would be really nice if we
could use static branches here to avoid any branch penalty at all. I
guess that would need some sort of special cased stop_machine() though.
Or I wonder if we could replace rd->read_sched_clock() with a dumb
function that returns cd.epoch_cyc so that the math turns out to be 0?
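Something along these lines, maybe (completely untested sketch of that
idea against this patch, just to illustrate; suspended_sched_clock_read()
is a made-up name, and whether the suspend path can swap the pointer
without the write side of the seqcount would need checking):

static u64 notrace suspended_sched_clock_read(void)
{
	/*
	 * Returning the frozen epoch_cyc makes the
	 * (cyc - rd->epoch_cyc) & rd->sched_clock_mask term come out
	 * to 0, so sched_clock() returns rd->epoch_ns and the
	 * rd->suspended flag (and its branch) can go away entirely.
	 */
	return cd.read_data.epoch_cyc;
}

static int sched_clock_suspend(void)
{
	struct clock_read_data *rd = &cd.read_data;

	update_sched_clock();
	hrtimer_cancel(&sched_clock_timer);
	/* Swap in the dumb read function instead of setting rd->suspended */
	rd->read_sched_clock = suspended_sched_clock_read;

	return 0;
}

Then sched_clock_resume() would just restore the real read function.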
> +			cyc = rd->read_sched_clock();
> +			cyc = (cyc - rd->epoch_cyc) & rd->sched_clock_mask;
> +			res += cyc_to_ns(cyc, rd->mult, rd->shift);
>  		}
>  	} while (read_seqcount_retry(&cd.seq, seq));

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project