From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751946AbdBCSJq (ORCPT ); Fri, 3 Feb 2017 13:09:46 -0500 Received: from mx1.redhat.com ([209.132.183.28]:56130 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750863AbdBCSEI (ORCPT ); Fri, 3 Feb 2017 13:04:08 -0500 From: Waiman Long To: Thomas Gleixner , Ingo Molnar , Peter Zijlstra , Jonathan Corbet Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Arnaldo Carvalho de Melo , Davidlohr Bueso , Mike Galbraith , Scott J Norton , Waiman Long Subject: [PATCH-tip v5 00/21] futex: Introducing throughput-optimized (TP) futexes Date: Fri, 3 Feb 2017 13:03:33 -0500 Message-Id: <1486145034-22210-1-git-send-email-longman@redhat.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Fri, 03 Feb 2017 18:04:08 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org v4->v5: - Fix 0-day kernel build test compilation warnings. - Extract out non-TP futexes futex-mutex and futex-rwlock microbenchmarks as separate patches, as suggested by Arnaldo. - Rebased to the latest tip tree & use running_clock() instead of sched_clock(). v3->v4: - Properly handle EFAULT error due to page fault. - Extend the use cases of TP futexes to userspace rwlock. - Change the lock handoff trigger from number of spinnings to elapsed time (5ms). - Various updates to the "perf bench futex mutex" microbenchmark. - Add a new "perf bench futex rwlock" microbenchmark for measuring rwlock performance. - Add mutex_owner to futex_state object to hold the serialization mutex owner. - Streamline a number of helper functions and other miscellenous coding improvements. - Rebase the patchset to the 4.10 kernel. v2->v3: - Use the abbreviation TP for the new futexes instead of TO. - Make a number of changes accordingly to review comments from ThomasG, PeterZ and MikeG. - Breaks the main futex patch into smaller pieces to make them easier to review. v1->v2: - Adds an explicit lock hand-off mechanism. - Adds timeout support. - Simplifies the required userspace code. - Fixes a number of problems in the v1 code. This patchset introduces a new futex implementation called throughput-optimized (TP) futexes. It is similar to PI futexes in its calling convention, but provides better throughput than the wait-wake (WW) futexes by encouraging lock stealing and optimistic spinning. The new TP futexes can be used in implementing both userspace mutexes and rwlocks. They provides better performance while simplifying the userspace locking implementation at the same time. The WW futexes are still needed to implement other synchronization primitives like conditional variables and semaphores that cannot be handled by the TP futexes. Another advantage of TP futexes is that it has a built-in lock handoff mechanism to prevent lock starvation from happenning as long as the underlying kernel mutex doesn't have lock starvation problem. Patches 1-2 implements userspace exclusive lock and read/write lock benchmark in perf bench using futexes. Patches 3-8 are preparatory patches that pave the way to implement the TP futexes. Patch 9 implements the basic TP futex that can support userspace mutexes. Patch 10 adds robust handling to TP futex to handle the death of TP futex exclusive lock owners. Patch 11 adds a lock hand-off mechanism that can prevent lock starvation to happen while introducing minimal runtime overhead. Patch 12 enables the FUTEX_LOCK futex(2) syscall to return status information, such as how the lock is acquired and how many time the task needs to sleep in the spinning loop, that can be used by userspace utilities to monitor how the TP futexes are performing. Patch 13 enables userspace applications to supply a timeout value to abort the lock acquisition attempt after the specified time period has passed. Patch 14 adds a new document tp-futex.txt in the Documentation directory to describe the new TP futexes. Patch 15 extends the TP futexes to support userspace rwlocks. Patch 16 enables more reader lock stealing of TP futexes in the kernel. Patch 17 groups readers together as a spin group to enhance reader throughput. Patch 18 updates the tp-futex.txt file to add information about rwlock support. Patch 19 extends the perf bench futex locking benchmark to include variants using the new TP futexes. Patch 20 updates the wake_up_q() function to returns the number woken tasks. Patch 21 enables the dumping of internal futex state information via debugfs. All the benchmark results shown in the change logs were produced by the microbenchmarks included in this patchset. So everyone can run the microbenchmark to see how the TP futexes perform and behave in their own test systems. Once this patchset is finalized, an updated manpage patch to document the new TP futexes will be sent out. The next step will then be to make Glibc NTPL use the new TP futexes. Performance highlight in term of average locking rates (ops/sec) on a 2-socket system are as follows: WW futex TP futex Glibc -------- -------- ----- mutex 121,129 152,242 - rwlock 99,149 162,887 30,362 Patches 9 and 17 contain more detailed information about the performance characteristics of the TP futexes when implementing userspace mutex and rwlock respectively when compared with other possible way of doing so via the wait-wake futexes. Waiman Long (21): perf bench: New microbenchmark for userspace mutex performance perf bench: New microbenchmark for userspace rwlock performance futex: Consolidate duplicated timer setup code futex: Rename futex_pi_state to futex_state futex: Add helpers to get & cmpxchg futex value without lock futex: Consolidate pure pi_state_list add & delete codes to helpers futex: Add a new futex type field into futex_state futex: Allow direct attachment of futex_state objects to hash bucket futex: Introduce throughput-optimized (TP) futexes TP-futex: Enable robust handling TP-futex: Implement lock handoff to prevent lock starvation TP-futex: Return status code on FUTEX_LOCK calls TP-futex: Add timeout support TP-futex, doc: Add TP futexes documentation TP-futex: Support userspace reader/writer locks TP-futex: Enable kernel reader lock stealing TP-futex: Group readers together in wait queue TP-futex, doc: Update TP futexes document on shared locking perf bench: Extend mutex/rwlock futex suite to test TP futexes sched, TP-futex: Make wake_up_q() return wakeup count futex: Dump internal futex state via debugfs Documentation/00-INDEX | 2 + Documentation/tp-futex.txt | 284 ++++++ include/linux/sched.h | 4 +- include/linux/sched/wake_q.h | 2 +- include/uapi/linux/futex.h | 32 +- kernel/futex.c | 1412 +++++++++++++++++++++++--- kernel/sched/core.c | 6 +- tools/perf/Documentation/perf-bench.txt | 5 + tools/perf/bench/Build | 1 + tools/perf/bench/bench.h | 2 + tools/perf/bench/futex-locks.c | 1654 +++++++++++++++++++++++++++++++ tools/perf/bench/futex.h | 43 + tools/perf/builtin-bench.c | 11 + tools/perf/check-headers.sh | 4 + 14 files changed, 3341 insertions(+), 121 deletions(-) create mode 100644 Documentation/tp-futex.txt create mode 100644 tools/perf/bench/futex-locks.c -- 1.8.3.1