From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Thompson <daniel.thompson@linaro.org>
To: Thomas Gleixner, John Stultz
Cc: Daniel Thompson, linux-kernel@vger.kernel.org, patches@linaro.org,
	linaro-kernel@lists.linaro.org, Sumit Semwal, Stephen Boyd,
	Steven Rostedt
Subject: [PATCH v5 0/5] sched_clock: Optimize and avoid deadlock during read from NMI
Date: Mon, 2 Mar 2015 15:56:39 +0000
Message-Id: <1425311804-3392-1-git-send-email-daniel.thompson@linaro.org>
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1421859236-19782-1-git-send-email-daniel.thompson@linaro.org>
References: <1421859236-19782-1-git-send-email-daniel.thompson@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

This patchset optimizes the generic sched_clock implementation by
removing branches and significantly reducing the data cache profile. It
also makes it safe to call sched_clock() from NMI (or FIQ on ARM).

The data cache profile of sched_clock() in the original code is
somewhere between 2 and 3 (64-byte) cache lines, depending on the
alignment of struct clock_data. After patching, the cache profile for
the normal case should be a single cache line.

NMI safety was tested on i.MX6 with perf drowning the system in FIQs
and using the perf handler to check that sched_clock() returned
monotonic values. At the same time I forcefully reduced kt_wrap so
that update_sched_clock() was being called at >1000Hz.

Without the patches the above system is grossly unstable, surviving
only 9K, 115K and 25K perf event cycles during three separate runs.
With the patches applied I ran for over 9M perf event cycles before
getting bored.

Performance testing has primarily been performed using a simple tight
loop test (i.e. one that is unlikely to benefit from the cache profile
improvements). Summary results show a benefit on all CPUs, although the
magnitude varies significantly:

  Cortex A9 @ 792MHz     4.1% speedup
  Cortex A9 @ 1GHz       0.4% speedup (different SoC to above)
  Scorpion              13.6% speedup
  Krait                 35.1% speedup
  Cortex A53 @ 1GHz      1.6% speedup
  Cortex A57 @ 1GHz      5.0% speedup

Benchmarking was done by Stephen Boyd and myself; full data for the
above summaries can be found here:
https://docs.google.com/spreadsheets/d/1Zd2xN42U4oAVZcArqAYdAWgFI5oDFRysURCSYNmBpZA/edit?usp=sharing
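For reviewers who want the shape of the end result up front: the NMI
safety in patch 5 comes from keeping two banks of the read-only
clock_read_data and letting the low bit of the sequence counter steer
readers to whichever bank is not being updated. The read side then
becomes a lock-free retry loop, roughly like the sketch below (a
simplified outline rather than the exact code; names such as cd,
read_data, epoch_cyc and cyc_to_ns only approximate those used in the
patches):

	u64 cyc, res;
	unsigned long seq;
	struct clock_read_data *rd;

	do {
		/* the low bit of seq selects whichever bank is not
		 * currently being written by the updater */
		seq = raw_read_seqcount(&cd.seq);
		rd = cd.read_data + (seq & 1);

		/* compute ns from the selected, consistent snapshot */
		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
		      rd->sched_clock_mask;
		res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
	} while (read_seqcount_retry(&cd.seq, seq));

	return res;

The updater writes the spare bank first, bumps the sequence count to
make it live, then brings the other bank up to date, so an NMI or FIQ
that interrupts an update mid-way always finds one consistent bank to
read.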
v5:
* Summarized benchmark results in the patchset cover letter and added
  some Reviewed-by:s.
* Rebased on 4.0-rc1.

v4:
* Optimized sched_clock() to be branchless by introducing a dummy
  function to provide clock values while the clock is suspended
  (Stephen Boyd).
* Improved commenting, including the kerneldoc comments (Stephen Boyd).
* Removed a redundant notrace from the update logic (Steven Rostedt).

v3:
* Optimized to minimise cache profile, including elimination of the
  suspended flag (Thomas Gleixner).
* Replaced the update_bank_begin/end with a single update function
  (Thomas Gleixner).
* Split into multiple patches to aid review.

v2:
* Extended the scope of the read lock in sched_clock() so we can bank
  all data consumed there (John Stultz)

Daniel Thompson (5):
  sched_clock: Match scope of read and write seqcounts
  sched_clock: Optimize cache line usage
  sched_clock: Remove suspend from clock_read_data
  sched_clock: Remove redundant notrace from update function
  sched_clock: Avoid deadlock during read from NMI

 kernel/time/sched_clock.c | 195 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 138 insertions(+), 57 deletions(-)

--
2.1.0