* [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
@ 2016-08-23 12:46 Peter Zijlstra
From: Peter Zijlstra @ 2016-08-23 12:46 UTC (permalink / raw)
  To: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low, Peter Zijlstra

... might eat your pets and set your house on fire ...

But they seem to boot and build kernels on my x86_64.

---
 arch/alpha/include/asm/mutex.h      |   9 --
 arch/arc/include/asm/mutex.h        |  18 ---
 arch/arm/include/asm/mutex.h        |  21 ---
 arch/arm64/include/asm/Kbuild       |   1 -
 arch/avr32/include/asm/mutex.h      |   9 --
 arch/blackfin/include/asm/Kbuild    |   1 -
 arch/c6x/include/asm/mutex.h        |   6 -
 arch/cris/include/asm/mutex.h       |   9 --
 arch/frv/include/asm/mutex.h        |   9 --
 arch/h8300/include/asm/mutex.h      |   9 --
 arch/hexagon/include/asm/mutex.h    |   8 -
 arch/ia64/include/asm/mutex.h       |  90 -----------
 arch/m32r/include/asm/mutex.h       |   9 --
 arch/m68k/include/asm/Kbuild        |   1 -
 arch/metag/include/asm/Kbuild       |   1 -
 arch/microblaze/include/asm/mutex.h |   1 -
 arch/mips/include/asm/Kbuild        |   1 -
 arch/mn10300/include/asm/mutex.h    |  16 --
 arch/nios2/include/asm/mutex.h      |   1 -
 arch/openrisc/include/asm/mutex.h   |  27 ----
 arch/parisc/include/asm/Kbuild      |   1 -
 arch/powerpc/include/asm/mutex.h    | 132 ---------------
 arch/s390/include/asm/mutex.h       |   9 --
 arch/score/include/asm/mutex.h      |   6 -
 arch/sh/include/asm/mutex-llsc.h    | 109 -------------
 arch/sh/include/asm/mutex.h         |  12 --
 arch/sparc/include/asm/Kbuild       |   1 -
 arch/tile/include/asm/Kbuild        |   1 -
 arch/um/include/asm/Kbuild          |   1 -
 arch/unicore32/include/asm/mutex.h  |  20 ---
 arch/x86/include/asm/mutex.h        |   5 -
 arch/x86/include/asm/mutex_32.h     | 110 -------------
 arch/x86/include/asm/mutex_64.h     | 127 ---------------
 arch/xtensa/include/asm/mutex.h     |   9 --
 include/asm-generic/mutex-dec.h     |  88 ----------
 include/asm-generic/mutex-null.h    |  19 ---
 include/asm-generic/mutex-xchg.h    | 120 --------------
 include/asm-generic/mutex.h         |   9 --
 include/linux/mutex-debug.h         |  24 ---
 include/linux/mutex.h               |  44 +++--
 kernel/Kconfig.locks                |   2 +-
 kernel/locking/mutex-debug.c        |  13 --
 kernel/locking/mutex-debug.h        |  10 --
 kernel/locking/mutex.c              | 311 +++++++++++++++---------------------
 kernel/locking/mutex.h              |  26 ---
 kernel/sched/core.c                 |   2 +-
 46 files changed, 160 insertions(+), 1298 deletions(-)


* [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 12:46 [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Peter Zijlstra
@ 2016-08-23 12:46 ` Peter Zijlstra
From: Peter Zijlstra @ 2016-08-23 12:46 UTC (permalink / raw)
  To: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low, Peter Zijlstra

[-- Attachment #1: peterz-locking-rewrite-mutex-owner.patch --]
[-- Type: text/plain, Size: 57534 bytes --]

There are a number of iffy bits in the mutex code because mutex::count and
mutex::owner are two different fields; this is also the reason
MUTEX_SPIN_ON_OWNER and DEBUG_MUTEXES are mutually exclusive.

Cure this by folding them into a single atomic_long_t field.

This necessarily kills all the architecture-specific mutex code.
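
For illustration, the trick this relies on is that task_struct pointers are
at least word aligned, so the low two bits of the owner word are free to
carry state such as MUTEX_FLAG_WAITERS, while the rest of the word is either
NULL (unlocked) or the owning task. Below is a minimal userspace sketch of
that encoding using C11 atomics and made-up names (struct task, try_lock,
owner_task); it is not the kernel code itself, which does the same thing
with atomic_long_cmpxchg_acquire() on mutex::owner in __mutex_trylock():

  #include <stdatomic.h>
  #include <stdio.h>

  #define FLAG_WAITERS    0x01UL
  #define FLAG_MASK       0x03UL

  /* stand-in for task_struct; at least 4-byte aligned, so bits 0-1 are free */
  struct task { _Alignas(4) int id; };

  /* owner word: task pointer in the high bits, flag bits in bits 0-1 */
  static _Atomic unsigned long owner;

  static struct task *owner_task(unsigned long word)
  {
          return (struct task *)(word & ~FLAG_MASK);
  }

  static int try_lock(struct task *me)
  {
          unsigned long old = atomic_load(&owner);

          /* any non-flag bit set means the lock already has an owner */
          if (old & ~FLAG_MASK)
                  return 0;

          /* install ourselves as owner, preserving whatever flags are set */
          return atomic_compare_exchange_strong(&owner, &old,
                                                old | (unsigned long)me);
  }

  int main(void)
  {
          static struct task me = { 1 };

          if (try_lock(&me))
                  printf("owner=%p waiters=%lu\n",
                         (void *)owner_task(atomic_load(&owner)),
                         atomic_load(&owner) & FLAG_WAITERS);
          return 0;
  }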

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/include/asm/mutex.h      |    9 -
 arch/arc/include/asm/mutex.h        |   18 --
 arch/arm/include/asm/mutex.h        |   21 --
 arch/arm64/include/asm/Kbuild       |    1 
 arch/avr32/include/asm/mutex.h      |    9 -
 arch/blackfin/include/asm/Kbuild    |    1 
 arch/c6x/include/asm/mutex.h        |    6 
 arch/cris/include/asm/mutex.h       |    9 -
 arch/frv/include/asm/mutex.h        |    9 -
 arch/h8300/include/asm/mutex.h      |    9 -
 arch/hexagon/include/asm/mutex.h    |    8 -
 arch/ia64/include/asm/mutex.h       |   90 -----------
 arch/m32r/include/asm/mutex.h       |    9 -
 arch/m68k/include/asm/Kbuild        |    1 
 arch/metag/include/asm/Kbuild       |    1 
 arch/microblaze/include/asm/mutex.h |    1 
 arch/mips/include/asm/Kbuild        |    1 
 arch/mn10300/include/asm/mutex.h    |   16 --
 arch/nios2/include/asm/mutex.h      |    1 
 arch/openrisc/include/asm/mutex.h   |   27 ---
 arch/parisc/include/asm/Kbuild      |    1 
 arch/powerpc/include/asm/mutex.h    |  132 -----------------
 arch/s390/include/asm/mutex.h       |    9 -
 arch/score/include/asm/mutex.h      |    6 
 arch/sh/include/asm/mutex-llsc.h    |  109 --------------
 arch/sh/include/asm/mutex.h         |   12 -
 arch/sparc/include/asm/Kbuild       |    1 
 arch/tile/include/asm/Kbuild        |    1 
 arch/um/include/asm/Kbuild          |    1 
 arch/unicore32/include/asm/mutex.h  |   20 --
 arch/x86/include/asm/mutex.h        |    5 
 arch/x86/include/asm/mutex_32.h     |  110 --------------
 arch/x86/include/asm/mutex_64.h     |  127 ----------------
 arch/xtensa/include/asm/mutex.h     |    9 -
 include/asm-generic/mutex-dec.h     |   88 -----------
 include/asm-generic/mutex-null.h    |   19 --
 include/asm-generic/mutex-xchg.h    |  120 ---------------
 include/asm-generic/mutex.h         |    9 -
 include/linux/mutex-debug.h         |   24 ---
 include/linux/mutex.h               |   44 +++--
 kernel/locking/mutex-debug.c        |   13 -
 kernel/locking/mutex-debug.h        |   10 -
 kernel/locking/mutex.c              |  275 ++++++++++++------------------------
 kernel/locking/mutex.h              |   26 ---
 kernel/sched/core.c                 |    2 
 45 files changed, 123 insertions(+), 1297 deletions(-)

--- a/arch/alpha/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/arc/include/asm/mutex.h
+++ /dev/null
@@ -1,18 +0,0 @@
-/*
- * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-/*
- * xchg() based mutex fast path maintains a state of 0 or 1, as opposed to
- * atomic dec based which can "count" any number of lock contenders.
- * This ideally needs to be fixed in core, but for now switching to dec ver.
- */
-#if defined(CONFIG_SMP) && (CONFIG_NR_CPUS > 2)
-#include <asm-generic/mutex-dec.h>
-#else
-#include <asm-generic/mutex-xchg.h>
-#endif
--- a/arch/arm/include/asm/mutex.h
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * arch/arm/include/asm/mutex.h
- *
- * ARM optimized mutex locking primitives
- *
- * Please look into asm-generic/mutex-xchg.h for a formal definition.
- */
-#ifndef _ASM_MUTEX_H
-#define _ASM_MUTEX_H
-/*
- * On pre-ARMv6 hardware this results in a swp-based implementation,
- * which is the most efficient. For ARMv6+, we have exclusive memory
- * accessors and use atomic_dec to avoid the extra xchg operations
- * on the locking slowpaths.
- */
-#if __LINUX_ARM_ARCH__ < 6
-#include <asm-generic/mutex-xchg.h>
-#else
-#include <asm-generic/mutex-dec.h>
-#endif
-#endif	/* _ASM_MUTEX_H */
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -26,7 +26,6 @@ generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
-generic-y += mutex.h
 generic-y += pci.h
 generic-y += poll.h
 generic-y += preempt.h
--- a/arch/avr32/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild
@@ -24,7 +24,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += pgalloc.h
--- a/arch/c6x/include/asm/mutex.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef _ASM_C6X_MUTEX_H
-#define _ASM_C6X_MUTEX_H
-
-#include <asm-generic/mutex-null.h>
-
-#endif /* _ASM_C6X_MUTEX_H */
--- a/arch/cris/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/frv/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/h8300/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/hexagon/include/asm/mutex.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-#include <asm-generic/mutex-xchg.h>
--- a/arch/ia64/include/asm/mutex.h
+++ /dev/null
@@ -1,90 +0,0 @@
-/*
- * ia64 implementation of the mutex fastpath.
- *
- * Copyright (C) 2006 Ken Chen <kenneth.w.chen@intel.com>
- *
- */
-
-#ifndef _ASM_MUTEX_H
-#define _ASM_MUTEX_H
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function MUST leave the value lower than
- * 1 even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(ia64_fetchadd4_acq(count, -1) != 1))
-		fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(ia64_fetchadd4_acq(count, -1) != 1))
-		return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the count from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than 1.
- *
- * If the implementation sets it to a value of lower than 1, then the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int ret = ia64_fetchadd4_rel(count, 1);
-	if (unlikely(ret < 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- *
- * If the architecture has no effective trylock variant, it should call the
- * <fail_fn> spinlock-based trylock variant unconditionally.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	if (atomic_read(count) == 1 && cmpxchg_acq(count, 1, 0) == 1)
-		return 1;
-	return 0;
-}
-
-#endif
--- a/arch/m32r/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -20,7 +20,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += mman.h
-generic-y += mutex.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += resource.h
--- a/arch/metag/include/asm/Kbuild
+++ b/arch/metag/include/asm/Kbuild
@@ -27,7 +27,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
--- a/arch/microblaze/include/asm/mutex.h
+++ /dev/null
@@ -1 +0,0 @@
-#include <asm-generic/mutex-dec.h>
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -9,7 +9,6 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mutex.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
--- a/arch/mn10300/include/asm/mutex.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* MN10300 Mutex fastpath
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- *
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-#include <asm-generic/mutex-null.h>
--- a/arch/nios2/include/asm/mutex.h
+++ /dev/null
@@ -1 +0,0 @@
-#include <asm-generic/mutex-dec.h>
--- a/arch/openrisc/include/asm/mutex.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * OpenRISC Linux
- *
- * Linux architectural port borrowing liberally from similar works of
- * others.  All original copyrights apply as per the original source
- * declaration.
- *
- * OpenRISC implementation:
- * Copyright (C) 2003 Matjaz Breskvar <phoenix@bsemi.com>
- * Copyright (C) 2010-2011 Jonas Bonn <jonas@southpole.se>
- * et al.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += poll.h
--- a/arch/powerpc/include/asm/mutex.h
+++ /dev/null
@@ -1,132 +0,0 @@
-/*
- * Optimised mutex implementation of include/asm-generic/mutex-dec.h algorithm
- */
-#ifndef _ASM_POWERPC_MUTEX_H
-#define _ASM_POWERPC_MUTEX_H
-
-static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
-{
-	int t;
-
-	__asm__ __volatile__ (
-"1:	lwarx	%0,0,%1		# mutex trylock\n\
-	cmpw	0,%0,%2\n\
-	bne-	2f\n"
-	PPC405_ERR77(0,%1)
-"	stwcx.	%3,0,%1\n\
-	bne-	1b"
-	PPC_ACQUIRE_BARRIER
-	"\n\
-2:"
-	: "=&r" (t)
-	: "r" (&v->counter), "r" (old), "r" (new)
-	: "cc", "memory");
-
-	return t;
-}
-
-static inline int __mutex_dec_return_lock(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-"1:	lwarx	%0,0,%1		# mutex lock\n\
-	addic	%0,%0,-1\n"
-	PPC405_ERR77(0,%1)
-"	stwcx.	%0,0,%1\n\
-	bne-	1b"
-	PPC_ACQUIRE_BARRIER
-	: "=&r" (t)
-	: "r" (&v->counter)
-	: "cc", "memory");
-
-	return t;
-}
-
-static inline int __mutex_inc_return_unlock(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-	PPC_RELEASE_BARRIER
-"1:	lwarx	%0,0,%1		# mutex unlock\n\
-	addic	%0,%0,1\n"
-	PPC405_ERR77(0,%1)
-"	stwcx.	%0,0,%1 \n\
-	bne-	1b"
-	: "=&r" (t)
-	: "r" (&v->counter)
-	: "cc", "memory");
-
-	return t;
-}
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function MUST leave the value lower than
- * 1 even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(__mutex_dec_return_lock(count) < 0))
-		fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(__mutex_dec_return_lock(count) < 0))
-		return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the count from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than 1.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(__mutex_inc_return_unlock(count) <= 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to 0, and return 1 (success), or if the count
- * was not 1, then return 0 (failure).
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	if (likely(atomic_read(count) == 1 && __mutex_cmpxchg_lock(count, 1, 0) == 1))
-		return 1;
-	return 0;
-}
-
-#endif
--- a/arch/s390/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/arch/score/include/asm/mutex.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef _ASM_SCORE_MUTEX_H
-#define _ASM_SCORE_MUTEX_H
-
-#include <asm-generic/mutex-dec.h>
-
-#endif /* _ASM_SCORE_MUTEX_H */
--- a/arch/sh/include/asm/mutex-llsc.h
+++ /dev/null
@@ -1,109 +0,0 @@
-/*
- * arch/sh/include/asm/mutex-llsc.h
- *
- * SH-4A optimized mutex locking primitives
- *
- * Please look into asm-generic/mutex-xchg.h for a formal definition.
- */
-#ifndef __ASM_SH_MUTEX_LLSC_H
-#define __ASM_SH_MUTEX_LLSC_H
-
-/*
- * Attempting to lock a mutex on SH4A is done like in ARMv6+ architecure.
- * with a bastardized atomic decrement (it is not a reliable atomic decrement
- * but it satisfies the defined semantics for our purpose, while being
- * smaller and faster than a real atomic decrement or atomic swap.
- * The idea is to attempt  decrementing the lock value only once. If once
- * decremented it isn't zero, or if its store-back fails due to a dispute
- * on the exclusive store, we simply bail out immediately through the slow
- * path where the lock will be reattempted until it succeeds.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int __done, __res;
-
-	__asm__ __volatile__ (
-		"movli.l	@%2, %0	\n"
-		"add		#-1, %0	\n"
-		"movco.l	%0, @%2	\n"
-		"movt		%1	\n"
-		: "=&z" (__res), "=&r" (__done)
-		: "r" (&(count)->counter)
-		: "t");
-
-	if (unlikely(!__done || __res != 0))
-		fail_fn(count);
-}
-
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	int __done, __res;
-
-	__asm__ __volatile__ (
-		"movli.l	@%2, %0	\n"
-		"add		#-1, %0	\n"
-		"movco.l	%0, @%2	\n"
-		"movt		%1	\n"
-		: "=&z" (__res), "=&r" (__done)
-		: "r" (&(count)->counter)
-		: "t");
-
-	if (unlikely(!__done || __res != 0))
-		__res = -1;
-
-	return __res;
-}
-
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int __done, __res;
-
-	__asm__ __volatile__ (
-		"movli.l	@%2, %0	\n\t"
-		"add		#1, %0	\n\t"
-		"movco.l	%0, @%2 \n\t"
-		"movt		%1	\n\t"
-		: "=&z" (__res), "=&r" (__done)
-		: "r" (&(count)->counter)
-		: "t");
-
-	if (unlikely(!__done || __res <= 0))
-		fail_fn(count);
-}
-
-/*
- * If the unlock was done on a contended lock, or if the unlock simply fails
- * then the mutex remains locked.
- */
-#define __mutex_slowpath_needs_to_unlock()	1
-
-/*
- * For __mutex_fastpath_trylock we do an atomic decrement and check the
- * result and put it in the __res variable.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	int __res, __orig;
-
-	__asm__ __volatile__ (
-		"1: movli.l	@%2, %0		\n\t"
-		"dt		%0		\n\t"
-		"movco.l	%0,@%2		\n\t"
-		"bf		1b		\n\t"
-		"cmp/eq		#0,%0		\n\t"
-		"bt		2f		\n\t"
-		"mov		#0, %1		\n\t"
-		"bf		3f		\n\t"
-		"2: mov		#1, %1		\n\t"
-		"3:				"
-		: "=&z" (__orig), "=&r" (__res)
-		: "r" (&count->counter)
-		: "t");
-
-	return __res;
-}
-#endif /* __ASM_SH_MUTEX_LLSC_H */
--- a/arch/sh/include/asm/mutex.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-#if defined(CONFIG_CPU_SH4A)
-#include <asm/mutex-llsc.h>
-#else
-#include <asm-generic/mutex-dec.h>
-#endif
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -14,7 +14,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += module.h
-generic-y += mutex.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += serial.h
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += parport.h
 generic-y += poll.h
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,7 +17,6 @@ generic-y += irq_work.h
 generic-y += kdebug.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
--- a/arch/unicore32/include/asm/mutex.h
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- * linux/arch/unicore32/include/asm/mutex.h
- *
- * Code specific to PKUnity SoC and UniCore ISA
- *
- * Copyright (C) 2001-2010 GUAN Xue-tao
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * UniCore optimized mutex locking primitives
- *
- * Please look into asm-generic/mutex-xchg.h for a formal definition.
- */
-#ifndef __UNICORE_MUTEX_H__
-#define __UNICORE_MUTEX_H__
-
-# include <asm-generic/mutex-xchg.h>
-#endif
--- a/arch/x86/include/asm/mutex.h
+++ /dev/null
@@ -1,5 +0,0 @@
-#ifdef CONFIG_X86_32
-# include <asm/mutex_32.h>
-#else
-# include <asm/mutex_64.h>
-#endif
--- a/arch/x86/include/asm/mutex_32.h
+++ /dev/null
@@ -1,110 +0,0 @@
-/*
- * Assembly implementation of the mutex fastpath, based on atomic
- * decrement/increment.
- *
- * started by Ingo Molnar:
- *
- *  Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
- */
-#ifndef _ASM_X86_MUTEX_32_H
-#define _ASM_X86_MUTEX_32_H
-
-#include <asm/alternative.h>
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fn> if it
- * wasn't 1 originally. This function MUST leave the value lower than 1
- * even when the "1" assertion wasn't true.
- */
-#define __mutex_fastpath_lock(count, fail_fn)			\
-do {								\
-	unsigned int dummy;					\
-								\
-	typecheck(atomic_t *, count);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   decl (%%eax)\n"		\
-		     "   jns 1f	\n"				\
-		     "   call " #fail_fn "\n"			\
-		     "1:\n"					\
-		     : "=a" (dummy)				\
-		     : "a" (count)				\
-		     : "memory", "ecx", "edx");			\
-} while (0)
-
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int __mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_dec_return(count) < 0))
-		return -1;
-	else
-		return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the mutex from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * try to promote the mutex from 0 to 1. if it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value
- * to 1, or to set it to a value lower than 1.
- *
- * If the implementation sets it to a value of lower than 1, the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-#define __mutex_fastpath_unlock(count, fail_fn)			\
-do {								\
-	unsigned int dummy;					\
-								\
-	typecheck(atomic_t *, count);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   incl (%%eax)\n"		\
-		     "   jg	1f\n"				\
-		     "   call " #fail_fn "\n"			\
-		     "1:\n"					\
-		     : "=a" (dummy)				\
-		     : "a" (count)				\
-		     : "memory", "ecx", "edx");			\
-} while (0)
-
-#define __mutex_slowpath_needs_to_unlock()	1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- */
-static inline int __mutex_fastpath_trylock(atomic_t *count,
-					   int (*fail_fn)(atomic_t *))
-{
-	/* cmpxchg because it never induces a false contention state. */
-	if (likely(atomic_read(count) == 1 && atomic_cmpxchg(count, 1, 0) == 1))
-		return 1;
-
-	return 0;
-}
-
-#endif /* _ASM_X86_MUTEX_32_H */
--- a/arch/x86/include/asm/mutex_64.h
+++ /dev/null
@@ -1,127 +0,0 @@
-/*
- * Assembly implementation of the mutex fastpath, based on atomic
- * decrement/increment.
- *
- * started by Ingo Molnar:
- *
- *  Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
- */
-#ifndef _ASM_X86_MUTEX_64_H
-#define _ASM_X86_MUTEX_64_H
-
-/**
- * __mutex_fastpath_lock - decrement and call function if negative
- * @v: pointer of type atomic_t
- * @fail_fn: function to call if the result is negative
- *
- * Atomically decrements @v and calls <fail_fn> if the result is negative.
- */
-#ifdef CC_HAVE_ASM_GOTO
-static inline void __mutex_fastpath_lock(atomic_t *v,
-					 void (*fail_fn)(atomic_t *))
-{
-	asm_volatile_goto(LOCK_PREFIX "   decl %0\n"
-			  "   jns %l[exit]\n"
-			  : : "m" (v->counter)
-			  : "memory", "cc"
-			  : exit);
-	fail_fn(v);
-exit:
-	return;
-}
-#else
-#define __mutex_fastpath_lock(v, fail_fn)			\
-do {								\
-	unsigned long dummy;					\
-								\
-	typecheck(atomic_t *, v);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   decl (%%rdi)\n"		\
-		     "   jns 1f		\n"			\
-		     "   call " #fail_fn "\n"			\
-		     "1:"					\
-		     : "=D" (dummy)				\
-		     : "D" (v)					\
-		     : "rax", "rsi", "rdx", "rcx",		\
-		       "r8", "r9", "r10", "r11", "memory");	\
-} while (0)
-#endif
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int __mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_dec_return(count) < 0))
-		return -1;
-	else
-		return 0;
-}
-
-/**
- * __mutex_fastpath_unlock - increment and call function if nonpositive
- * @v: pointer of type atomic_t
- * @fail_fn: function to call if the result is nonpositive
- *
- * Atomically increments @v and calls <fail_fn> if the result is nonpositive.
- */
-#ifdef CC_HAVE_ASM_GOTO
-static inline void __mutex_fastpath_unlock(atomic_t *v,
-					   void (*fail_fn)(atomic_t *))
-{
-	asm_volatile_goto(LOCK_PREFIX "   incl %0\n"
-			  "   jg %l[exit]\n"
-			  : : "m" (v->counter)
-			  : "memory", "cc"
-			  : exit);
-	fail_fn(v);
-exit:
-	return;
-}
-#else
-#define __mutex_fastpath_unlock(v, fail_fn)			\
-do {								\
-	unsigned long dummy;					\
-								\
-	typecheck(atomic_t *, v);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   incl (%%rdi)\n"		\
-		     "   jg 1f\n"				\
-		     "   call " #fail_fn "\n"			\
-		     "1:"					\
-		     : "=D" (dummy)				\
-		     : "D" (v)					\
-		     : "rax", "rsi", "rdx", "rcx",		\
-		       "r8", "r9", "r10", "r11", "memory");	\
-} while (0)
-#endif
-
-#define __mutex_slowpath_needs_to_unlock()	1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to 0 and return 1 (success), or return 0 (failure)
- * if it wasn't 1 originally. [the fallback function is never used on
- * x86_64, because all x86_64 CPUs have a CMPXCHG instruction.]
- */
-static inline int __mutex_fastpath_trylock(atomic_t *count,
-					   int (*fail_fn)(atomic_t *))
-{
-	if (likely(atomic_read(count) == 1 && atomic_cmpxchg(count, 1, 0) == 1))
-		return 1;
-
-	return 0;
-}
-
-#endif /* _ASM_X86_MUTEX_64_H */
--- a/arch/xtensa/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
--- a/include/asm-generic/mutex-dec.h
+++ /dev/null
@@ -1,88 +0,0 @@
-/*
- * include/asm-generic/mutex-dec.h
- *
- * Generic implementation of the mutex fastpath, based on atomic
- * decrement/increment.
- */
-#ifndef _ASM_GENERIC_MUTEX_DEC_H
-#define _ASM_GENERIC_MUTEX_DEC_H
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function MUST leave the value lower than
- * 1 even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_dec_return_acquire(count) < 0))
-		fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_dec_return_acquire(count) < 0))
-		return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the count from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than 1.
- *
- * If the implementation sets it to a value of lower than 1, then the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_inc_return_release(count) <= 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- *
- * If the architecture has no effective trylock variant, it should call the
- * <fail_fn> spinlock-based trylock variant unconditionally.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	if (likely(atomic_read(count) == 1 && atomic_cmpxchg_acquire(count, 1, 0) == 1))
-		return 1;
-	return 0;
-}
-
-#endif
--- a/include/asm-generic/mutex-null.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/*
- * include/asm-generic/mutex-null.h
- *
- * Generic implementation of the mutex fastpath, based on NOP :-)
- *
- * This is used by the mutex-debugging infrastructure, but it can also
- * be used by architectures that (for whatever reason) want to use the
- * spinlock based slowpath.
- */
-#ifndef _ASM_GENERIC_MUTEX_NULL_H
-#define _ASM_GENERIC_MUTEX_NULL_H
-
-#define __mutex_fastpath_lock(count, fail_fn)		fail_fn(count)
-#define __mutex_fastpath_lock_retval(count)		(-1)
-#define __mutex_fastpath_unlock(count, fail_fn)		fail_fn(count)
-#define __mutex_fastpath_trylock(count, fail_fn)	fail_fn(count)
-#define __mutex_slowpath_needs_to_unlock()		1
-
-#endif
--- a/include/asm-generic/mutex-xchg.h
+++ /dev/null
@@ -1,120 +0,0 @@
-/*
- * include/asm-generic/mutex-xchg.h
- *
- * Generic implementation of the mutex fastpath, based on xchg().
- *
- * NOTE: An xchg based implementation might be less optimal than an atomic
- *       decrement/increment based implementation. If your architecture
- *       has a reasonable atomic dec/inc then you should probably use
- *	 asm-generic/mutex-dec.h instead, or you could open-code an
- *	 optimized version in asm/mutex.h.
- */
-#ifndef _ASM_GENERIC_MUTEX_XCHG_H
-#define _ASM_GENERIC_MUTEX_XCHG_H
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if it
- * wasn't 1 originally. This function MUST leave the value lower than 1
- * even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_xchg(count, 0) != 1))
-		/*
-		 * We failed to acquire the lock, so mark it contended
-		 * to ensure that any waiting tasks are woken up by the
-		 * unlock slow path.
-		 */
-		if (likely(atomic_xchg_acquire(count, -1) != 1))
-			fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_xchg_acquire(count, 0) != 1))
-		if (likely(atomic_xchg(count, -1) != 1))
-			return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the mutex from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * try to promote the mutex from 0 to 1. if it wasn't 0, call <function>
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than one.
- * If the implementation sets it to a value of lower than one, the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_xchg_release(count, 1) != 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		0
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: spinlock based trylock implementation
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- *
- * If the architecture has no effective trylock variant, it should call the
- * <fail_fn> spinlock-based trylock variant unconditionally.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	int prev;
-
-	if (atomic_read(count) != 1)
-		return 0;
-
-	prev = atomic_xchg_acquire(count, 0);
-	if (unlikely(prev < 0)) {
-		/*
-		 * The lock was marked contended so we must restore that
-		 * state. If while doing so we get back a prev value of 1
-		 * then we just own it.
-		 *
-		 * [ In the rare case of the mutex going to 1, to 0, to -1
-		 *   and then back to 0 in this few-instructions window,
-		 *   this has the potential to trigger the slowpath for the
-		 *   owner's unlock path needlessly, but that's not a problem
-		 *   in practice. ]
-		 */
-		prev = atomic_xchg_acquire(count, prev);
-		if (prev < 0)
-			prev = 0;
-	}
-
-	return prev;
-}
-
-#endif
--- a/include/asm-generic/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-#ifndef __ASM_GENERIC_MUTEX_H
-#define __ASM_GENERIC_MUTEX_H
-/*
- * Pull in the generic implementation for the mutex fastpath,
- * which is a reasonable default on many architectures.
- */
-
-#include <asm-generic/mutex-dec.h>
-#endif /* __ASM_GENERIC_MUTEX_H */
--- a/include/linux/mutex-debug.h
+++ /dev/null
@@ -1,24 +0,0 @@
-#ifndef __LINUX_MUTEX_DEBUG_H
-#define __LINUX_MUTEX_DEBUG_H
-
-#include <linux/linkage.h>
-#include <linux/lockdep.h>
-#include <linux/debug_locks.h>
-
-/*
- * Mutexes - debugging helpers:
- */
-
-#define __DEBUG_MUTEX_INITIALIZER(lockname)				\
-	, .magic = &lockname
-
-#define mutex_init(mutex)						\
-do {									\
-	static struct lock_class_key __key;				\
-									\
-	__mutex_init((mutex), #mutex, &__key);				\
-} while (0)
-
-extern void mutex_destroy(struct mutex *lock);
-
-#endif
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -18,6 +18,7 @@
 #include <linux/atomic.h>
 #include <asm/processor.h>
 #include <linux/osq_lock.h>
+#include <linux/debug_locks.h>
 
 /*
  * Simple, straightforward mutexes with strict semantics:
@@ -48,13 +49,9 @@
  *   locks and tasks (and only those tasks)
  */
 struct mutex {
-	/* 1: unlocked, 0: locked, negative: locked, possible waiters */
-	atomic_t		count;
+	atomic_long_t		owner;
 	spinlock_t		wait_lock;
 	struct list_head	wait_list;
-#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_MUTEX_SPIN_ON_OWNER)
-	struct task_struct	*owner;
-#endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* Spinner MCS lock */
 #endif
@@ -66,6 +63,11 @@ struct mutex {
 #endif
 };
 
+static inline struct task_struct *__mutex_owner(struct mutex *lock)
+{
+	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~0x03);
+}
+
 /*
  * This is the control structure for tasks blocked on mutex,
  * which resides on the blocked task's kernel stack:
@@ -79,9 +81,20 @@ struct mutex_waiter {
 };
 
 #ifdef CONFIG_DEBUG_MUTEXES
-# include <linux/mutex-debug.h>
+
+#define __DEBUG_MUTEX_INITIALIZER(lockname)				\
+	, .magic = &lockname
+
+extern void mutex_destroy(struct mutex *lock);
+
 #else
+
 # define __DEBUG_MUTEX_INITIALIZER(lockname)
+
+static inline void mutex_destroy(struct mutex *lock) {}
+
+#endif
+
 /**
  * mutex_init - initialize the mutex
  * @mutex: the mutex to be initialized
@@ -90,14 +103,12 @@ struct mutex_waiter {
  *
  * It is not allowed to initialize an already locked mutex.
  */
-# define mutex_init(mutex) \
-do {							\
-	static struct lock_class_key __key;		\
-							\
-	__mutex_init((mutex), #mutex, &__key);		\
+#define mutex_init(mutex)						\
+do {									\
+	static struct lock_class_key __key;				\
+									\
+	__mutex_init((mutex), #mutex, &__key);				\
 } while (0)
-static inline void mutex_destroy(struct mutex *lock) {}
-#endif
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
@@ -107,7 +118,7 @@ static inline void mutex_destroy(struct
 #endif
 
 #define __MUTEX_INITIALIZER(lockname) \
-		{ .count = ATOMIC_INIT(1) \
+		{ .owner = ATOMIC_LONG_INIT(0) \
 		, .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
 		, .wait_list = LIST_HEAD_INIT(lockname.wait_list) \
 		__DEBUG_MUTEX_INITIALIZER(lockname) \
@@ -127,7 +138,10 @@ extern void __mutex_init(struct mutex *l
  */
 static inline int mutex_is_locked(struct mutex *lock)
 {
-	return atomic_read(&lock->count) != 1;
+	/*
+	 * XXX think about spin_is_locked
+	 */
+	return __mutex_owner(lock) != NULL;
 }
 
 /*
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -73,21 +73,8 @@ void debug_mutex_unlock(struct mutex *lo
 {
 	if (likely(debug_locks)) {
 		DEBUG_LOCKS_WARN_ON(lock->magic != lock);
-
-		if (!lock->owner)
-			DEBUG_LOCKS_WARN_ON(!lock->owner);
-		else
-			DEBUG_LOCKS_WARN_ON(lock->owner != current);
-
 		DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
 	}
-
-	/*
-	 * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
-	 * mutexes so that we can do it here after we've verified state.
-	 */
-	mutex_clear_owner(lock);
-	atomic_set(&lock->count, 1);
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name,
--- a/kernel/locking/mutex-debug.h
+++ b/kernel/locking/mutex-debug.h
@@ -27,16 +27,6 @@ extern void debug_mutex_unlock(struct mu
 extern void debug_mutex_init(struct mutex *lock, const char *name,
 			     struct lock_class_key *key);
 
-static inline void mutex_set_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, current);
-}
-
-static inline void mutex_clear_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, NULL);
-}
-
 #define spin_lock_mutex(lock, flags)			\
 	do {						\
 		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -33,26 +33,16 @@
  */
 #ifdef CONFIG_DEBUG_MUTEXES
 # include "mutex-debug.h"
-# include <asm-generic/mutex-null.h>
-/*
- * Must be 0 for the debug case so we do not do the unlock outside of the
- * wait_lock region. debug_mutex_unlock() will do the actual unlock in this
- * case.
- */
-# undef __mutex_slowpath_needs_to_unlock
-# define  __mutex_slowpath_needs_to_unlock()	0
 #else
 # include "mutex.h"
-# include <asm/mutex.h>
 #endif
 
 void
 __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 {
-	atomic_set(&lock->count, 1);
+	atomic_long_set(&lock->owner, 0);
 	spin_lock_init(&lock->wait_lock);
 	INIT_LIST_HEAD(&lock->wait_list);
-	mutex_clear_owner(lock);
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	osq_lock_init(&lock->osq);
 #endif
@@ -62,6 +52,38 @@ __mutex_init(struct mutex *lock, const c
 
 EXPORT_SYMBOL(__mutex_init);
 
+#define MUTEX_FLAG_WAITERS	0x01
+
+#define MUTEX_FLAG_ALL		0x03
+
+/*
+ * Atomically try to take the lock when it is available
+ */
+static inline bool __mutex_trylock(struct mutex *lock)
+{
+	unsigned long owner, new_owner;
+
+	owner = atomic_long_read(&lock->owner);
+	if (owner & ~0x03)
+		return false;
+
+	new_owner = owner | (unsigned long)current;
+	if (atomic_long_cmpxchg_acquire(&lock->owner, owner, new_owner) == owner)
+		return true;
+
+	return false;
+}
+
+static inline void __mutex_set_flag(struct mutex *lock, unsigned long flag)
+{
+	atomic_long_or(flag, &lock->owner);
+}
+
+static inline void __mutex_clear_flag(struct mutex *lock, unsigned long flag)
+{
+	atomic_long_andnot(flag, &lock->owner);
+}
+
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * We split the mutex lock/unlock logic into separate fastpath and
@@ -69,7 +91,7 @@ EXPORT_SYMBOL(__mutex_init);
  * We also put the fastpath first in the kernel image, to make sure the
  * branch is predicted by the CPU as default-untaken.
  */
-__visible void __sched __mutex_lock_slowpath(atomic_t *lock_count);
+static void __sched __mutex_lock_slowpath(struct mutex *lock);
 
 /**
  * mutex_lock - acquire the mutex
@@ -95,14 +117,10 @@ __visible void __sched __mutex_lock_slow
 void __sched mutex_lock(struct mutex *lock)
 {
 	might_sleep();
-	/*
-	 * The locking fastpath is the 1->0 transition from
-	 * 'unlocked' into 'locked' state.
-	 */
-	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
-	mutex_set_owner(lock);
-}
 
+	if (!__mutex_trylock(lock))
+		__mutex_lock_slowpath(lock);
+}
 EXPORT_SYMBOL(mutex_lock);
 #endif
 
@@ -176,7 +194,7 @@ ww_mutex_set_context_fastpath(struct ww_
 	/*
 	 * Check if lock is contended, if not there is nobody to wake up
 	 */
-	if (likely(atomic_read(&lock->base.count) == 0))
+	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
 		return;
 
 	/*
@@ -227,7 +245,7 @@ bool mutex_spin_on_owner(struct mutex *l
 	bool ret = true;
 
 	rcu_read_lock();
-	while (lock->owner == owner) {
+	while (__mutex_owner(lock) == owner) {
 		/*
 		 * Ensure we emit the owner->on_cpu, dereference _after_
 		 * checking lock->owner still matches owner. If that fails,
@@ -260,7 +278,7 @@ static inline int mutex_can_spin_on_owne
 		return 0;
 
 	rcu_read_lock();
-	owner = READ_ONCE(lock->owner);
+	owner = __mutex_owner(lock);
 	if (owner)
 		retval = owner->on_cpu;
 	rcu_read_unlock();
@@ -272,15 +290,6 @@ static inline int mutex_can_spin_on_owne
 }
 
 /*
- * Atomically try to take the lock when it is available
- */
-static inline bool mutex_try_to_acquire(struct mutex *lock)
-{
-	return !mutex_is_locked(lock) &&
-		(atomic_cmpxchg_acquire(&lock->count, 1, 0) == 1);
-}
-
-/*
  * Optimistic spinning.
  *
  * We try to spin for acquisition when we find that the lock owner
@@ -342,12 +351,12 @@ static bool mutex_optimistic_spin(struct
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		owner = READ_ONCE(lock->owner);
+		owner = __mutex_owner(lock);
 		if (owner && !mutex_spin_on_owner(lock, owner))
 			break;
 
 		/* Try to acquire the mutex if it is unlocked. */
-		if (mutex_try_to_acquire(lock)) {
+		if (__mutex_trylock(lock)) {
 			lock_acquired(&lock->dep_map, ip);
 
 			if (use_ww_ctx) {
@@ -357,7 +366,6 @@ static bool mutex_optimistic_spin(struct
 				ww_mutex_set_context_fastpath(ww, ww_ctx);
 			}
 
-			mutex_set_owner(lock);
 			osq_unlock(&lock->osq);
 			return true;
 		}
@@ -406,8 +414,7 @@ static bool mutex_optimistic_spin(struct
 }
 #endif
 
-__visible __used noinline
-void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
+static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock);
 
 /**
  * mutex_unlock - release the mutex
@@ -422,21 +429,26 @@ void __sched __mutex_unlock_slowpath(ato
  */
 void __sched mutex_unlock(struct mutex *lock)
 {
-	/*
-	 * The unlocking fastpath is the 0->1 transition from 'locked'
-	 * into 'unlocked' state:
-	 */
-#ifndef CONFIG_DEBUG_MUTEXES
-	/*
-	 * When debugging is enabled we must not clear the owner before time,
-	 * the slow path will always be taken, and that clears the owner field
-	 * after verifying that it was indeed current.
-	 */
-	mutex_clear_owner(lock);
+	unsigned long owner;
+
+#ifdef CONFIG_DEBUG_MUTEXES
+	DEBUG_LOCKS_WARN_ON(__mutex_owner(lock) != current);
 #endif
-	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
-}
 
+	owner = atomic_long_read(&lock->owner);
+	for (;;) {
+		unsigned long old;
+
+		old = atomic_long_cmpxchg_release(&lock->owner, owner, owner & 0x03);
+		if (old == owner)
+			break;
+
+		owner = old;
+	}
+
+	if (owner & 0x03)
+		__mutex_unlock_slowpath(lock);
+}
 EXPORT_SYMBOL(mutex_unlock);
 
 /**
@@ -465,15 +477,7 @@ void __sched ww_mutex_unlock(struct ww_m
 		lock->ctx = NULL;
 	}
 
-#ifndef CONFIG_DEBUG_MUTEXES
-	/*
-	 * When debugging is enabled we must not clear the owner before time,
-	 * the slow path will always be taken, and that clears the owner field
-	 * after verifying that it was indeed current.
-	 */
-	mutex_clear_owner(&lock->base);
-#endif
-	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+	mutex_unlock(&lock->base);
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
@@ -529,11 +533,9 @@ __mutex_lock_common(struct mutex *lock,
 	spin_lock_mutex(&lock->wait_lock, flags);
 
 	/*
-	 * Once more, try to acquire the lock. Only try-lock the mutex if
-	 * it is unlocked to reduce unnecessary xchg() operations.
+	 * Once more, try to acquire the lock.
 	 */
-	if (!mutex_is_locked(lock) &&
-	    (atomic_xchg_acquire(&lock->count, 0) == 1))
+	if (__mutex_trylock(lock))
 		goto skip_wait;
 
 	debug_mutex_lock_common(lock, &waiter);
@@ -543,24 +545,13 @@ __mutex_lock_common(struct mutex *lock,
 	list_add_tail(&waiter.list, &lock->wait_list);
 	waiter.task = task;
 
+	if (list_first_entry(&lock->wait_list, struct mutex_waiter, list) == &waiter)
+		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
+
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
 		/*
-		 * Lets try to take the lock again - this is needed even if
-		 * we get here for the first time (shortly after failing to
-		 * acquire the lock), to make sure that we get a wakeup once
-		 * it's unlocked. Later on, if we sleep, this is the
-		 * operation that gives us the lock. We xchg it to -1, so
-		 * that when we release the lock, we properly wake up the
-		 * other waiters. We only attempt the xchg if the count is
-		 * non-negative in order to avoid unnecessary xchg operations:
-		 */
-		if (atomic_read(&lock->count) >= 0 &&
-		    (atomic_xchg_acquire(&lock->count, -1) == 1))
-			break;
-
-		/*
 		 * got a signal? (This code gets eliminated in the
 		 * TASK_UNINTERRUPTIBLE case.)
 		 */
@@ -581,19 +572,22 @@ __mutex_lock_common(struct mutex *lock,
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
 		spin_lock_mutex(&lock->wait_lock, flags);
+
+		if (__mutex_trylock(lock))
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);
 
 	mutex_remove_waiter(lock, &waiter, task);
 	/* set it to 0 if there are no waiters left: */
 	if (likely(list_empty(&lock->wait_list)))
-		atomic_set(&lock->count, 0);
+		__mutex_clear_flag(lock, MUTEX_FLAG_WAITERS);
+
 	debug_mutex_free_waiter(&waiter);
 
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
-	mutex_set_owner(lock);
 
 	if (use_ww_ctx) {
 		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
@@ -631,7 +625,6 @@ _mutex_lock_nest_lock(struct mutex *lock
 	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
 			    0, nest, _RET_IP_, NULL, 0);
 }
-
 EXPORT_SYMBOL_GPL(_mutex_lock_nest_lock);
 
 int __sched
@@ -650,7 +643,6 @@ mutex_lock_interruptible_nested(struct m
 	return __mutex_lock_common(lock, TASK_INTERRUPTIBLE,
 				   subclass, NULL, _RET_IP_, NULL, 0);
 }
-
 EXPORT_SYMBOL_GPL(mutex_lock_interruptible_nested);
 
 static inline int
@@ -715,27 +707,11 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interr
 /*
  * Release the lock, slowpath:
  */
-static inline void
-__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
+static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock)
 {
 	unsigned long flags;
 	WAKE_Q(wake_q);
 
-	/*
-	 * As a performance measurement, release the lock before doing other
-	 * wakeup related duties to follow. This allows other tasks to acquire
-	 * the lock sooner, while still handling cleanups in past unlock calls.
-	 * This can be done as we do not enforce strict equivalence between the
-	 * mutex counter and wait_list.
-	 *
-	 *
-	 * Some architectures leave the lock unlocked in the fastpath failure
-	 * case, others need to leave it locked. In the later case we have to
-	 * unlock it here - as the lock counter is currently 0 or negative.
-	 */
-	if (__mutex_slowpath_needs_to_unlock())
-		atomic_set(&lock->count, 1);
-
 	spin_lock_mutex(&lock->wait_lock, flags);
 	mutex_release(&lock->dep_map, nested, _RET_IP_);
 	debug_mutex_unlock(lock);
@@ -754,17 +730,6 @@ __mutex_unlock_common_slowpath(struct mu
 	wake_up_q(&wake_q);
 }
 
-/*
- * Release the lock, slowpath:
- */
-__visible void
-__mutex_unlock_slowpath(atomic_t *lock_count)
-{
-	struct mutex *lock = container_of(lock_count, struct mutex, count);
-
-	__mutex_unlock_common_slowpath(lock, 1);
-}
-
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * Here come the less common (and hence less performance-critical) APIs:
@@ -789,38 +754,29 @@ __mutex_lock_interruptible_slowpath(stru
  */
 int __sched mutex_lock_interruptible(struct mutex *lock)
 {
-	int ret;
-
 	might_sleep();
-	ret =  __mutex_fastpath_lock_retval(&lock->count);
-	if (likely(!ret)) {
-		mutex_set_owner(lock);
+
+	if (__mutex_trylock(lock))
 		return 0;
-	} else
-		return __mutex_lock_interruptible_slowpath(lock);
+
+	return __mutex_lock_interruptible_slowpath(lock);
 }
 
 EXPORT_SYMBOL(mutex_lock_interruptible);
 
 int __sched mutex_lock_killable(struct mutex *lock)
 {
-	int ret;
-
 	might_sleep();
-	ret = __mutex_fastpath_lock_retval(&lock->count);
-	if (likely(!ret)) {
-		mutex_set_owner(lock);
+
+	if (__mutex_trylock(lock))
 		return 0;
-	} else
-		return __mutex_lock_killable_slowpath(lock);
+
+	return __mutex_lock_killable_slowpath(lock);
 }
 EXPORT_SYMBOL(mutex_lock_killable);
 
-__visible void __sched
-__mutex_lock_slowpath(atomic_t *lock_count)
+static void __sched __mutex_lock_slowpath(struct mutex *lock)
 {
-	struct mutex *lock = container_of(lock_count, struct mutex, count);
-
 	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0,
 			    NULL, _RET_IP_, NULL, 0);
 }
@@ -856,37 +812,6 @@ __ww_mutex_lock_interruptible_slowpath(s
 
 #endif
 
-/*
- * Spinlock based trylock, we take the spinlock and check whether we
- * can get the lock:
- */
-static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
-{
-	struct mutex *lock = container_of(lock_count, struct mutex, count);
-	unsigned long flags;
-	int prev;
-
-	/* No need to trylock if the mutex is locked. */
-	if (mutex_is_locked(lock))
-		return 0;
-
-	spin_lock_mutex(&lock->wait_lock, flags);
-
-	prev = atomic_xchg_acquire(&lock->count, -1);
-	if (likely(prev == 1)) {
-		mutex_set_owner(lock);
-		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-	}
-
-	/* Set it back to 0 if there are no waiters: */
-	if (likely(list_empty(&lock->wait_list)))
-		atomic_set(&lock->count, 0);
-
-	spin_unlock_mutex(&lock->wait_lock, flags);
-
-	return prev == 1;
-}
-
 /**
  * mutex_trylock - try to acquire the mutex, without waiting
  * @lock: the mutex to be acquired
@@ -903,13 +828,7 @@ static inline int __mutex_trylock_slowpa
  */
 int __sched mutex_trylock(struct mutex *lock)
 {
-	int ret;
-
-	ret = __mutex_fastpath_trylock(&lock->count, __mutex_trylock_slowpath);
-	if (ret)
-		mutex_set_owner(lock);
-
-	return ret;
+	return __mutex_trylock(lock);
 }
 EXPORT_SYMBOL(mutex_trylock);
 
@@ -917,36 +836,28 @@ EXPORT_SYMBOL(mutex_trylock);
 int __sched
 __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-	int ret;
-
 	might_sleep();
 
-	ret = __mutex_fastpath_lock_retval(&lock->base.count);
-
-	if (likely(!ret)) {
+	if (__mutex_trylock(&lock->base)) {
 		ww_mutex_set_context_fastpath(lock, ctx);
-		mutex_set_owner(&lock->base);
-	} else
-		ret = __ww_mutex_lock_slowpath(lock, ctx);
-	return ret;
+		return 0;
+	}
+
+	return __ww_mutex_lock_slowpath(lock, ctx);
 }
 EXPORT_SYMBOL(__ww_mutex_lock);
 
 int __sched
 __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-	int ret;
-
 	might_sleep();
 
-	ret = __mutex_fastpath_lock_retval(&lock->base.count);
-
-	if (likely(!ret)) {
+	if (__mutex_trylock(&lock->base)) {
 		ww_mutex_set_context_fastpath(lock, ctx);
-		mutex_set_owner(&lock->base);
-	} else
-		ret = __ww_mutex_lock_interruptible_slowpath(lock, ctx);
-	return ret;
+		return 0;
+	}
+
+	return __ww_mutex_lock_interruptible_slowpath(lock, ctx);
 }
 EXPORT_SYMBOL(__ww_mutex_lock_interruptible);
 
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -16,32 +16,6 @@
 #define mutex_remove_waiter(lock, waiter, task) \
 		__list_del((waiter)->list.prev, (waiter)->list.next)
 
-#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-/*
- * The mutex owner can get read and written to locklessly.
- * We should use WRITE_ONCE when writing the owner value to
- * avoid store tearing, otherwise, a thread could potentially
- * read a partially written and incomplete owner value.
- */
-static inline void mutex_set_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, current);
-}
-
-static inline void mutex_clear_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, NULL);
-}
-#else
-static inline void mutex_set_owner(struct mutex *lock)
-{
-}
-
-static inline void mutex_clear_owner(struct mutex *lock)
-{
-}
-#endif
-
 #define debug_mutex_wake_waiter(lock, waiter)		do { } while (0)
 #define debug_mutex_free_waiter(waiter)			do { } while (0)
 #define debug_mutex_add_waiter(lock, waiter, ti)	do { } while (0)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -75,11 +75,11 @@
 #include <linux/compiler.h>
 #include <linux/frame.h>
 #include <linux/prefetch.h>
+#include <linux/mutex.h>
 
 #include <asm/switch_to.h>
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
-#include <asm/mutex.h>
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #endif

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC][PATCH 2/3] locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES
  2016-08-23 12:46 [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Peter Zijlstra
  2016-08-23 12:46 ` [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner Peter Zijlstra
@ 2016-08-23 12:46 ` Peter Zijlstra
  2016-08-23 12:46 ` [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation Peter Zijlstra
  2016-08-23 16:17 ` [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Davidlohr Bueso
  3 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 12:46 UTC (permalink / raw)
  To: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low, Peter Zijlstra

[-- Attachment #1: peterz-locking-mutex-enable-spin_on_owner-debug.patch --]
[-- Type: text/plain, Size: 557 bytes --]

Now that mutex::count and mutex::owner are the same field, we can
allow MUTEX_SPIN_ON_OWNER together with DEBUG_MUTEXES.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/Kconfig.locks |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -225,7 +225,7 @@ config ARCH_SUPPORTS_ATOMIC_RMW
 
 config MUTEX_SPIN_ON_OWNER
 	def_bool y
-	depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
+	depends on SMP && ARCH_SUPPORTS_ATOMIC_RMW
 
 config RWSEM_SPIN_ON_OWNER
        def_bool y

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation
  2016-08-23 12:46 [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Peter Zijlstra
  2016-08-23 12:46 ` [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner Peter Zijlstra
  2016-08-23 12:46 ` [RFC][PATCH 2/3] locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES Peter Zijlstra
@ 2016-08-23 12:46 ` Peter Zijlstra
  2016-08-23 12:56   ` Peter Zijlstra
       [not found]   ` <57BCA869.1050501@hpe.com>
  2016-08-23 16:17 ` [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Davidlohr Bueso
  3 siblings, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 12:46 UTC (permalink / raw)
  To: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low, Peter Zijlstra

[-- Attachment #1: peterz-locking-mutex-steal.patch --]
[-- Type: text/plain, Size: 3380 bytes --]

Now that we have an atomic owner field, we can do explicit lock
handoff. Use this to avoid starvation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/mutex.c |   44 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -53,6 +53,7 @@ __mutex_init(struct mutex *lock, const c
 EXPORT_SYMBOL(__mutex_init);
 
 #define MUTEX_FLAG_WAITERS	0x01
+#define MUTEX_FLAG_HANDOFF	0x02
 
 #define MUTEX_FLAG_ALL		0x03
 
@@ -84,6 +85,29 @@ static inline void __mutex_clear_flag(st
 	atomic_long_andnot(flag, &lock->owner);
 }
 
+static inline bool __mutex_waiter_is_first(struct mutex *lock, struct mutex_waiter *waiter)
+{
+	return list_first_entry(&lock->wait_list, struct mutex_waiter, list) == waiter;
+}
+
+static void __mutex_handoff(struct mutex *lock, struct task_struct *task)
+{
+	unsigned long owner = atomic_long_read(&lock->owner);
+
+	for (;;) {
+		unsigned long old, new;
+
+		new = (owner & MUTEX_FLAG_WAITERS);
+		new |= (unsigned long)task;
+
+		old = atomic_long_cmpxchg(&lock->owner, owner, new);
+		if (old == owner)
+			break;
+
+		owner = old;
+	}
+}
+
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * We split the mutex lock/unlock logic into separate fastpath and
@@ -414,7 +438,7 @@ static bool mutex_optimistic_spin(struct
 }
 #endif
 
-static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock);
+static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long owner);
 
 /**
  * mutex_unlock - release the mutex
@@ -439,6 +463,9 @@ void __sched mutex_unlock(struct mutex *
 	for (;;) {
 		unsigned long old;
 
+		if (owner & MUTEX_FLAG_HANDOFF)
+			break;
+
 		old = atomic_long_cmpxchg_release(&lock->owner, owner, owner & 0x03);
 		if (old == owner)
 			break;
@@ -447,7 +474,7 @@ void __sched mutex_unlock(struct mutex *
 	}
 
 	if (owner & 0x03);
-		__mutex_unlock_slowpath(lock);
+		__mutex_unlock_slowpath(lock, owner);
 }
 EXPORT_SYMBOL(mutex_unlock);
 
@@ -545,7 +572,7 @@ __mutex_lock_common(struct mutex *lock,
 	list_add_tail(&waiter.list, &lock->wait_list);
 	waiter.task = task;
 
-	if (list_first_entry(&lock->wait_list, struct mutex_waiter, list) == &waiter)
+	if (__mutex_waiter_is_first(lock, &waiter))
 		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
 
 	lock_contended(&lock->dep_map, ip);
@@ -573,8 +600,14 @@ __mutex_lock_common(struct mutex *lock,
 		schedule_preempt_disabled();
 		spin_lock_mutex(&lock->wait_lock, flags);
 
+		if (__mutex_owner(lock) == current)
+			break;
+
 		if (__mutex_trylock(lock))
 			break;
+
+		if (__mutex_waiter_is_first(lock, &waiter))
+			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
 	}
 	__set_task_state(task, TASK_RUNNING);
 
@@ -707,7 +740,7 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interr
 /*
  * Release the lock, slowpath:
  */
-static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock)
+static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long owner)
 {
 	unsigned long flags;
 	WAKE_Q(wake_q);
@@ -722,6 +755,9 @@ static noinline void __sched __mutex_unl
 				list_entry(lock->wait_list.next,
 					   struct mutex_waiter, list);
 
+		if (owner & MUTEX_FLAG_HANDOFF)
+			__mutex_handoff(lock, waiter->task);
+
 		debug_mutex_wake_waiter(lock, waiter);
 		wake_q_add(&wake_q, waiter->task);
 	}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation
  2016-08-23 12:46 ` [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation Peter Zijlstra
@ 2016-08-23 12:56   ` Peter Zijlstra
       [not found]   ` <57BCA869.1050501@hpe.com>
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 12:56 UTC (permalink / raw)
  To: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

On Tue, Aug 23, 2016 at 02:46:20PM +0200, Peter Zijlstra wrote:
> @@ -573,8 +600,14 @@ __mutex_lock_common(struct mutex *lock,
>  		schedule_preempt_disabled();
>  		spin_lock_mutex(&lock->wait_lock, flags);
>  
> +		if (__mutex_owner(lock) == current)
> +			break;
> +
>  		if (__mutex_trylock(lock))
>  			break;
> +
> +		if (__mutex_waiter_is_first(lock, &waiter))
> +			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>  	}
>  	__set_task_state(task, TASK_RUNNING);
>  

And 'obviously' we can add a spin-on-owner loop in there as well, as
Waiman's patches did, but I didn't bother pulling that in for now.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 12:46 [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Peter Zijlstra
                   ` (2 preceding siblings ...)
  2016-08-23 12:46 ` [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation Peter Zijlstra
@ 2016-08-23 16:17 ` Davidlohr Bueso
  2016-08-23 16:35   ` Jason Low
  2016-08-23 18:53   ` Linus Torvalds
  3 siblings, 2 replies; 34+ messages in thread
From: Davidlohr Bueso @ 2016-08-23 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

What's the motivation here? Is it just to unify counter and owner for
the starvation issue? If so, is this really the path we wanna take for
a small debug corner case?

I have not looked at the patches yet, but are there any performance minutia
to be aware of?

> 46 files changed, 160 insertions(+), 1298 deletions(-)

Oh my.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 16:17 ` [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Davidlohr Bueso
@ 2016-08-23 16:35   ` Jason Low
  2016-08-23 16:57     ` Peter Zijlstra
  2016-08-24  1:13     ` Jason Low
  2016-08-23 18:53   ` Linus Torvalds
  1 sibling, 2 replies; 34+ messages in thread
From: Jason Low @ 2016-08-23 16:35 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: jason.low2, Peter Zijlstra, Linus Torvalds, Waiman Long,
	Ding Tianhong, Thomas Gleixner, Will Deacon, Ingo Molnar,
	Imre Deak, Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote:
> What's the motivation here? Is it just to unify counter and owner for
> the starvation issue? If so, is this really the path we wanna take for
> a small debug corner case?

And we thought our other patch was a bit invasive  :-)

> I have not looked at the patches yet, but are there any performance minutia
> to be aware of?

This would remove all of the mutex architecture specific optimizations
in the (common) fastpath, so that is one thing that could reduce
performance. I'll run some benchmarks to see what some of the
performance impacts of these patches would be.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 16:35   ` Jason Low
@ 2016-08-23 16:57     ` Peter Zijlstra
  2016-08-23 19:36       ` Waiman Long
  2016-08-24  1:13     ` Jason Low
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 16:57 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Linus Torvalds, Waiman Long, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On Tue, Aug 23, 2016 at 09:35:03AM -0700, Jason Low wrote:
> On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote:
> > What's the motivation here? Is it just to unify counter and owner for
> > the starvation issue? If so, is this really the path we wanna take for
> > a small debug corner case?
> 
> And we thought our other patch was a bit invasive  :-)

So I've wanted to do something like this for a while now, and Linus
saying he wanted to always enable the spinning and basically reduce
special cases made me bite the bullet and just do it to see what it
would look like.

So it not only unifies counter and owner for the starvation case, it
does so to allow spinning and debug as well as lock handoff.
It collapses the whole count+owner+yield_to_owner into a single
variable.

It obviously is a tad invasive, but it does make things more similar to
rt-mutex and pi futex, both of which track the owner and pending in the
primary 'word'.
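
For illustration, here is a minimal user-space model of that single-word
encoding. The flag values are the ones these patches use; the struct, the
helper names and the scenario in main() are stand-ins for this sketch, not
the kernel code.

#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define MUTEX_FLAG_WAITERS	0x01UL
#define MUTEX_FLAG_HANDOFF	0x02UL
#define MUTEX_FLAGS		0x03UL

/* stand-in for task_struct; aligned so the low two address bits are free */
struct task { long pad; char name[16]; };

static struct task *owner_task(unsigned long owner)
{
	return (struct task *)(owner & ~MUTEX_FLAGS);
}

static unsigned long owner_flags(unsigned long owner)
{
	return owner & MUTEX_FLAGS;
}

int main(void)
{
	static struct task me = { .name = "task-A" };
	atomic_ulong owner = 0;			/* 0 == unlocked, no flags */

	/* acquire: install the task pointer (the word was 0, so OR is enough) */
	atomic_fetch_or(&owner, (unsigned long)&me);
	/* a contender shows up: set WAITERS without disturbing the pointer */
	atomic_fetch_or(&owner, MUTEX_FLAG_WAITERS);

	unsigned long o = atomic_load(&owner);
	printf("owner=%s flags=%#lx\n", owner_task(o)->name, owner_flags(o));
	assert(owner_task(o) == &me && owner_flags(o) == MUTEX_FLAG_WAITERS);

	/* release: drop the task pointer but keep the flags for the slowpath */
	atomic_fetch_and(&owner, MUTEX_FLAGS);
	return 0;
}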

That said, I don't particularly like the new mutex_unlock() code, it's
rather more heavy than I would like, although typically the word is
uncontended at unlock and we'd only need a single go at the
cmpxchg-loop.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 16:17 ` [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Davidlohr Bueso
  2016-08-23 16:35   ` Jason Low
@ 2016-08-23 18:53   ` Linus Torvalds
  2016-08-23 20:34     ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2016-08-23 18:53 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

On Tue, Aug 23, 2016 at 12:17 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
>
>> 46 files changed, 160 insertions(+), 1298 deletions(-)
>
> Oh my.

Yeah, that looks like a pretty compelling argument right there, if
there isn't any other really major downside to this...

Peter, is there some downside that isn't obvious? Like "Well, this
does regress performance because it now always does X"?

             Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 16:57     ` Peter Zijlstra
@ 2016-08-23 19:36       ` Waiman Long
  2016-08-23 20:41         ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Waiman Long @ 2016-08-23 19:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Davidlohr Bueso, Linus Torvalds, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On 08/23/2016 12:57 PM, Peter Zijlstra wrote:
> On Tue, Aug 23, 2016 at 09:35:03AM -0700, Jason Low wrote:
>> On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote:
>>> What's the motivation here? Is it just to unify counter and owner for
>>> the starvation issue? If so, is this really the path we wanna take for
>>> a small debug corner case?
>> And we thought our other patch was a bit invasive  :-)
> So I've wanted to do something like this for a while now, and Linus
> saying he wanted to always enable the spinning and basically reduce
> special cases made me bite the bullet and just do it to see what it
> would look like.
>
> So it not only unifies counter and owner for the starvation case, it
> does so to allow spinning and debug as well as lock handoff.
> It collapses the whole count+owner+yield_to_owner into a single
> variable.
>
> It obviously is a tad invasive, but it does make things more similar to
> rt-mutex and pi futex, both of which track the owner and pending in the
> primary 'word'.
>
> That said, I don't particularly like the new mutex_unlock() code, it's
> rather more heavy than I would like, although typically the word is
> uncontended at unlock and we'd only need a single go at the
> cmpxchg-loop.
>
>

I think this is the right way to go. There isn't any big change in the 
slowpath, so the contended performance should be the same. The fastpath, 
however, will get a bit slower as a single atomic op plus a jump 
instruction (a single cacheline load) is replaced by a read-and-test and 
compxchg (potentially 2 cacheline loads) which will be somewhat slower 
than the optimized assembly code. Alternatively, you can replace the 
__mutex_trylock() in mutex_lock() by just a blind cmpxchg to optimize 
the fastpath further. A cmpxchg will still be a tiny bit slower than
other atomic ops, but it will be more acceptable, I think.
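
To make the trade-off concrete, a user-space sketch of the two fastpath
shapes (illustration only, not the kernel code; a zero owner word means
unlocked, and the numeric "task pointers" are made up):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define MUTEX_FLAGS	0x03UL			/* low bits: WAITERS/HANDOFF */

static atomic_ulong owner;			/* 0 == unlocked, no flags */

/* blind cmpxchg: one atomic op, but it only succeeds on a completely idle lock */
static bool trylock_blind(unsigned long me)
{
	unsigned long expected = 0;

	return atomic_compare_exchange_strong(&owner, &expected, me);
}

/* read-and-test, then cmpxchg: an extra load, but it preserves any flag bits */
static bool trylock_read_then_cmpxchg(unsigned long me)
{
	unsigned long cur = atomic_load(&owner);

	if (cur & ~MUTEX_FLAGS)			/* already owned */
		return false;

	return atomic_compare_exchange_strong(&owner, &cur, cur | me);
}

int main(void)
{
	unsigned long a = 0x1000, b = 0x2000;	/* fake "task pointers" */

	printf("blind, free lock : %d\n", trylock_blind(a));			/* 1 */
	printf("blind, owned lock: %d\n", trylock_blind(b));			/* 0 */
	printf("rtc,   owned lock: %d\n", trylock_read_then_cmpxchg(b));	/* 0 */
	return 0;
}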


BTW, I got the following compilation warning when I tried your patch:

drivers/gpu/drm/i915/i915_gem_shrinker.c: In function ‘mutex_is_locked_by’:
drivers/gpu/drm/i915/i915_gem_shrinker.c:44:22: error: invalid operands 
to binary == (have ‘atomic_long_t’ and ‘struct task_struct *’)
return mutex->owner == task;
^
CC [M] drivers/gpu/drm/i915/intel_psr.o
drivers/gpu/drm/i915/i915_gem_shrinker.c:49:1: warning: control reaches 
end of non-void function [-Wreturn-type]
}
^
make[4]: *** [drivers/gpu/drm/i915/i915_gem_shrinker.o] Error 1

Apparently, you may need to look to see if there are other direct accesses
of the owner field elsewhere in the code.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 12:46 ` [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner Peter Zijlstra
@ 2016-08-23 19:55   ` Waiman Long
  2016-08-23 20:52     ` Tim Chen
  2016-08-23 21:09     ` Peter Zijlstra
  2016-08-23 20:17   ` Waiman Long
  2016-08-24  9:56   ` Will Deacon
  2 siblings, 2 replies; 34+ messages in thread
From: Waiman Long @ 2016-08-23 19:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Tim Chen, Terry Rudd, Paul E. McKenney,
	Jason Low

On 08/23/2016 08:46 AM, Peter Zijlstra wrote:

I have 2 more comments about the code.
1) There are a couple of places where you only use 0x3 in mutex.c. They 
should be replaced by the symbolic name instead.
2) We should make __mutex_lock_slowpath() a noinline function just like 
__mutex_lock_killable_slowpath() or __mutex_lock_interruptible_slowpath().

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 12:46 ` [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner Peter Zijlstra
  2016-08-23 19:55   ` Waiman Long
@ 2016-08-23 20:17   ` Waiman Long
  2016-08-23 20:31     ` Peter Zijlstra
  2016-08-24  9:56   ` Will Deacon
  2 siblings, 1 reply; 34+ messages in thread
From: Waiman Long @ 2016-08-23 20:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Tim Chen, Terry Rudd, Paul E. McKenney,
	Jason Low

On 08/23/2016 08:46 AM, Peter Zijlstra wrote:
>   /*
>    * Simple, straightforward mutexes with strict semantics:
> @@ -48,13 +49,9 @@
>    *   locks and tasks (and only those tasks)
>    */
>   struct mutex {
> -	/* 1: unlocked, 0: locked, negative: locked, possible waiters */
> -	atomic_t		count;
> +	atomic_long_t		owner;
>   	spinlock_t		wait_lock;
>   	struct list_head	wait_list;
> -#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_MUTEX_SPIN_ON_OWNER)
> -	struct task_struct	*owner;
> -#endif
>   #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
>   	struct optimistic_spin_queue osq; /* Spinner MCS lock */
>   #endif

I think you should put the wait_lock and osq next to each other to save 
8 bytes in space on 64-bit machines.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 20:17   ` Waiman Long
@ 2016-08-23 20:31     ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 20:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Tim Chen, Terry Rudd, Paul E. McKenney,
	Jason Low

On Tue, Aug 23, 2016 at 04:17:54PM -0400, Waiman Long wrote:
> On 08/23/2016 08:46 AM, Peter Zijlstra wrote:
> >  /*
> >   * Simple, straightforward mutexes with strict semantics:
> >@@ -48,13 +49,9 @@
> >   *   locks and tasks (and only those tasks)
> >   */
> >  struct mutex {
> >-	/* 1: unlocked, 0: locked, negative: locked, possible waiters */
> >-	atomic_t		count;
> >+	atomic_long_t		owner;
> >  	spinlock_t		wait_lock;
> >  	struct list_head	wait_list;
> >-#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_MUTEX_SPIN_ON_OWNER)
> >-	struct task_struct	*owner;
> >-#endif
> >  #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
> >  	struct optimistic_spin_queue osq; /* Spinner MCS lock */
> >  #endif
> 
> I think you should put the wait_lock and osq next to each other to save 8
> bytes in space on 64-bit machines.

Right you are.. didn't get around to looking at layout yet. Just barely
got it to compile and boot :-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation
       [not found]   ` <57BCA869.1050501@hpe.com>
@ 2016-08-23 20:32     ` Peter Zijlstra
  2016-08-24 19:50       ` Waiman Long
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 20:32 UTC (permalink / raw)
  To: Waiman Long
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Tim Chen, Terry Rudd, Paul E. McKenney,
	Jason Low

On Tue, Aug 23, 2016 at 03:47:53PM -0400, Waiman Long wrote:
> On 08/23/2016 08:46 AM, Peter Zijlstra wrote:
> >N
> >@@ -573,8 +600,14 @@ __mutex_lock_common(struct mutex *lock,
> >  		schedule_preempt_disabled();
> >  		spin_lock_mutex(&lock->wait_lock, flags);
> >
> >+		if (__mutex_owner(lock) == current)
> >+			break;
> >+
> >  		if (__mutex_trylock(lock))
> >  			break;
> >+
> >+		if (__mutex_waiter_is_first(lock,&waiter))
> >+			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
> >  	}
> >  	__set_task_state(task, TASK_RUNNING);
> >
> >
> 
> You may want to think about doing some spinning while the owner is active
> instead of going back to sleep again here.

For sure; I just didn't bother pulling in your patches. I didn't want to
sink in more time in case people really hated on 1/3 ;-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 18:53   ` Linus Torvalds
@ 2016-08-23 20:34     ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 20:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

On Tue, Aug 23, 2016 at 02:53:07PM -0400, Linus Torvalds wrote:
> On Tue, Aug 23, 2016 at 12:17 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> >
> >> 46 files changed, 160 insertions(+), 1298 deletions(-)
> >
> > Oh my.
> 
> Yeah, that looks like a pretty compelling argument right there, if
> there isn't any other really major downside to this...
> 
> Peter, is there some downside that isn't obvious? Like "Well, this
> does regress performance because it now always does X"?

The biggest difference is the mutex fast paths: where they were a single
atomic op and a branch, they're now a bit bigger. How much that matters in
practice is something that we'll have to benchmark a bit.

Esp. the mutex_lock() fast-path now also needs to load current, which
at least should be fairly hot but can still be a number of dependent
loads on some archs.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 19:36       ` Waiman Long
@ 2016-08-23 20:41         ` Peter Zijlstra
  2016-08-23 22:34           ` Waiman Long
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 20:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jason Low, Davidlohr Bueso, Linus Torvalds, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2, chris

On Tue, Aug 23, 2016 at 03:36:17PM -0400, Waiman Long wrote:
> I think this is the right way to go. There isn't any big change in the
> slowpath, so the contended performance should be the same. The fastpath,
> however, will get a bit slower as a single atomic op plus a jump instruction
> (a single cacheline load) is replaced by a read-and-test and cmpxchg
> (potentially 2 cacheline loads) which will be somewhat slower than the
> optimized assembly code.

Yeah, I'll try and run some workloads tomorrow if you and Jason don't
beat me to it ;-)

> Alternatively, you can replace the
> __mutex_trylock() in mutex_lock() by just a blind cmpxchg to optimize the
> fastpath further. 

Problem with that is that we need to preserve the flag bits, so we need
the initial load.

Or were you thinking of: cmpxchg(&lock->owner, 0UL, (unsigned
long)current), which only works on uncontended locks?

> A cmpxchg will still be a tiny bit slower than other
> atomic ops, but it will be more acceptable, I think.

I don't think cmpxchg is much slower than say xadd or xchg, the typical
problem with cmpxchg is the looping part, but single instruction costs
should be similar.
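
A user-space illustration of that looping part (a sketch, not the kernel
code; the 0x03 mask mirrors the low-bit flags in these patches and the
numeric values are made up): clearing the owner while keeping the flag
bits takes either a cmpxchg retry loop or a single value-returning AND.

#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define MUTEX_FLAGS	0x03UL

/* cmpxchg loop: may have to retry if the word changes under us */
static unsigned long release_cmpxchg(atomic_ulong *owner)
{
	unsigned long cur = atomic_load(owner);

	while (!atomic_compare_exchange_weak_explicit(owner, &cur,
						      cur & MUTEX_FLAGS,
						      memory_order_release,
						      memory_order_relaxed))
		;	/* a failed CAS reloaded cur; just go around again */

	return cur;	/* old value, so the caller can look at the flags */
}

/* single RMW: no loop, still hands back the old value */
static unsigned long release_fetch_and(atomic_ulong *owner)
{
	return atomic_fetch_and_explicit(owner, MUTEX_FLAGS,
					 memory_order_release);
}

int main(void)
{
	atomic_ulong owner = 0x1000UL | 0x01UL;	/* fake task pointer + WAITERS */

	assert(release_cmpxchg(&owner) == (0x1000UL | 0x01UL));
	assert(atomic_load(&owner) == 0x01UL);

	atomic_store(&owner, 0x2000UL | 0x01UL);
	assert(release_fetch_and(&owner) == (0x2000UL | 0x01UL));
	assert(atomic_load(&owner) == 0x01UL);

	printf("ok\n");
	return 0;
}

In kernel terms the second form is what the atomic_long_fetch_and_release()
suggestion later in this thread corresponds to.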

> BTW, I got the following compilation warning when I tried your patch:
> 
> drivers/gpu/drm/i915/i915_gem_shrinker.c: In function ‘mutex_is_locked_by’:
> drivers/gpu/drm/i915/i915_gem_shrinker.c:44:22: error: invalid operands to
> binary == (have ‘atomic_long_t’ and ‘struct task_struct *’)
> return mutex->owner == task;
> ^
> CC [M] drivers/gpu/drm/i915/intel_psr.o
> drivers/gpu/drm/i915/i915_gem_shrinker.c:49:1: warning: control reaches end
> of non-void function [-Wreturn-type]
> }
> ^
> make[4]: *** [drivers/gpu/drm/i915/i915_gem_shrinker.o] Error 1
> 
> Apparently, you may need to look to see if there are other direct accesses of
> the owner field elsewhere in the code.

AArggghh.. that is horrible horrible code.

It tries to do a recursive mutex and pokes at the innards of the mutex.
That so deserves to break.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 19:55   ` Waiman Long
@ 2016-08-23 20:52     ` Tim Chen
  2016-08-23 21:03       ` Peter Zijlstra
  2016-08-23 21:09     ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Tim Chen @ 2016-08-23 20:52 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Terry Rudd, Paul E. McKenney, Jason Low

On Tue, 2016-08-23 at 15:55 -0400, Waiman Long wrote:
> On 08/23/2016 08:46 AM, Peter Zijlstra wrote:
> 
> I have 2 more comments about the code.
> 1) There are a couple of places where you only use 0x3 in mutex.c. They 
> should be replaced by the symbolic name instead.

May be easier to read if (owner & 0x3) and
(owner & ~0x3) are changed to something like 
_owner_flag(owner) and _owner_task(owner).

Tim

> 2) We should make __mutex_lock_slowpath() a noinline function just like 
> __mutex_lock_killable_slowpath() or __mutex_lock_interruptible_slowpath().
> 
> Cheers,
> Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 20:52     ` Tim Chen
@ 2016-08-23 21:03       ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 21:03 UTC (permalink / raw)
  To: Tim Chen
  Cc: Waiman Long, Linus Torvalds, Jason Low, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Terry Rudd,
	Paul E. McKenney, Jason Low

On Tue, Aug 23, 2016 at 01:52:46PM -0700, Tim Chen wrote:
> On Tue, 2016-08-23 at 15:55 -0400, Waiman Long wrote:
> > On 08/23/2016 08:46 AM, Peter Zijlstra wrote:
> > 
> > I have 2 more comments about the code.
> > 1) There are a couple of places where you only use 0x3 in mutex.c. They 
> > should be replaced by the symbolic name instead.
> 
> May be easier to read if (owner & 0x3) and
> (owner & ~0x3) are changed to something like 
> _owner_flag(owner) and _owner_task(owner).

Yes that would work. Something like the below..

Note the ';' in the last "-" line that currently ensures we always take
the slow path.

----

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -55,7 +55,17 @@ EXPORT_SYMBOL(__mutex_init);
 #define MUTEX_FLAG_WAITERS	0x01
 #define MUTEX_FLAG_HANDOFF	0x02
 
-#define MUTEX_FLAG_ALL		0x03
+#define MUTEX_FLAGS		0x03
+
+static inline struct task_struct *__owner_task(unsigned long owner)
+{
+	return (struct task_struct *)(owner & ~MUTEX_FLAGS);
+}
+
+static inline unsigned long __owner_flags(unsigned long owner)
+{
+	return owner & MUTEX_FLAGS;
+}
 
 /*
  * Atomically try to take the lock when it is available
@@ -65,7 +75,7 @@ static inline bool __mutex_trylock(struc
 	unsigned long owner, new_owner;
 
 	owner = atomic_long_read(&lock->owner);
-	if (owner & ~0x03)
+	if (__owner_task(owner))
 		return false;
 
 	new_owner = owner | (unsigned long)current;
@@ -466,14 +476,14 @@ void __sched mutex_unlock(struct mutex *
 		if (owner & MUTEX_FLAG_HANDOFF)
 			break;
 
-		old = atomic_long_cmpxchg_release(&lock->owner, owner, owner & 0x03);
+		old = atomic_long_cmpxchg_release(&lock->owner, owner, __owner_flags(owner));
 		if (old == owner)
 			break;
 
 		owner = old;
 	}
 
-	if (owner & 0x03);
+	if (__owner_flags(owner))
 		__mutex_unlock_slowpath(lock, owner);
 }
 EXPORT_SYMBOL(mutex_unlock);

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 19:55   ` Waiman Long
  2016-08-23 20:52     ` Tim Chen
@ 2016-08-23 21:09     ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-23 21:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Tim Chen, Terry Rudd, Paul E. McKenney,
	Jason Low

On Tue, Aug 23, 2016 at 03:55:52PM -0400, Waiman Long wrote:
> On 08/23/2016 08:46 AM, Peter Zijlstra wrote:
> 
> I have 2 more comments about the code.
> 1) There are a couple of places where you only use 0x3 in mutex.c. They
> should be replaced by the symbolic name instead.
> 2) We should make __mutex_lock_slowpath() a noinline function just like
> __mutex_lock_killable_slowpath() or __mutex_lock_interruptible_slowpath().

3) I broke lockdep with the fastpath changes.. we used to only take the
slowpath with debugging, so only the slow paths are annotated. Now we
unconditionally use the fast paths.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 20:41         ` Peter Zijlstra
@ 2016-08-23 22:34           ` Waiman Long
  0 siblings, 0 replies; 34+ messages in thread
From: Waiman Long @ 2016-08-23 22:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Davidlohr Bueso, Linus Torvalds, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2, chris

On 08/23/2016 04:41 PM, Peter Zijlstra wrote:
> On Tue, Aug 23, 2016 at 03:36:17PM -0400, Waiman Long wrote:
>> I think this is the right way to go. There isn't any big change in the
>> slowpath, so the contended performance should be the same. The fastpath,
>> however, will get a bit slower as a single atomic op plus a jump instruction
>> (a single cacheline load) is replaced by a read-and-test and compxchg
>> (potentially 2 cacheline loads) which will be somewhat slower than the
>> optimized assembly code.
> Yeah, I'll try and run some workloads tomorrow if you and Jason don't
> beat me to it ;-)
>
>> Alternatively, you can replace the
>> __mutex_trylock() in mutex_lock() by just a blind cmpxchg to optimize the
>> fastpath further.
> Problem with that is that we need to preserve the flag bits, so we need
> the initial load.
>
> Or were you thinking of: cmpxchg(&lock->owner, 0UL, (unsigned
> long)current), which only works on uncontended locks?

Yes, that is what I was thinking about. It was a lesson learned in my 
qspinlock patch. I used to do a TATAS in the locking fastpath. Then I 
was told that we should optimize for the uncontended case. So I
changed the fastpath to just TAS. I am not sure if the same rule should
apply for mutexes or not.
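
For reference, the TAS versus TATAS shapes in a user-space spinlock sketch
(illustration only, not the qspinlock code): TAS goes straight for the
atomic, which is the cheapest thing on an uncontended lock, while TATAS
spins on a plain load and only attempts the atomic once the lock looks free.

#include <stdatomic.h>
#include <stdio.h>

static atomic_int lk;		/* 0 == unlocked */

/* TAS: optimised for the uncontended case, a single atomic op when free */
static void lock_tas(void)
{
	while (atomic_exchange_explicit(&lk, 1, memory_order_acquire))
		;
}

/* TATAS: read first, so contended waiters don't keep bouncing the line */
static void lock_tatas(void)
{
	for (;;) {
		while (atomic_load_explicit(&lk, memory_order_relaxed))
			;	/* looks locked: spin on a plain read */
		if (!atomic_exchange_explicit(&lk, 1, memory_order_acquire))
			return;
	}
}

static void unlock(void)
{
	atomic_store_explicit(&lk, 0, memory_order_release);
}

int main(void)
{
	lock_tas();
	unlock();
	lock_tatas();
	unlock();
	printf("ok\n");
	return 0;
}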

>> A cmpxchg will still be a tiny bit slower than other
>> atomic ops, but it will be more acceptable, I think.
> I don't think cmpxchg is much slower than say xadd or xchg, the typical
> problem with cmpxchg is the looping part, but single instruction costs
> should be similar.

My testing in the past showed that cmpxchg was a tiny bit slower than xchg
or atomic_inc, for example. In this context, the performance difference, 
if any, should not be noticeable.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-23 16:35   ` Jason Low
  2016-08-23 16:57     ` Peter Zijlstra
@ 2016-08-24  1:13     ` Jason Low
  2016-08-25 12:32       ` Peter Zijlstra
  2016-08-25 15:43       ` Peter Zijlstra
  1 sibling, 2 replies; 34+ messages in thread
From: Jason Low @ 2016-08-24  1:13 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: jason.low2, Peter Zijlstra, Linus Torvalds, Waiman Long,
	Ding Tianhong, Thomas Gleixner, Will Deacon, Ingo Molnar,
	Imre Deak, Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On Tue, 2016-08-23 at 09:35 -0700, Jason Low wrote:
> On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote:
> > I have not looked at the patches yet, but are there any performance minutia
> > to be aware of?
> 
> This would remove all of the mutex architecture specific optimizations
> in the (common) fastpath, so that is one thing that could reduce
> performance. I'll run some benchmarks to see what some of the
> performance impacts of these patches would be.

I tested this patch on an 8 socket system with the high_systime AIM7
workload with diskfs. The patch provided big performance improvements in
terms of throughput in the highly contended cases.

-------------------------------------------------
|  users      | avg throughput | avg throughput |
              | without patch  | with patch     |
-------------------------------------------------
| 10 - 90     |   13,943 JPM   |   14,432 JPM   |
-------------------------------------------------
| 100 - 900   |   75,475 JPM   |  102,922 JPM   |
-------------------------------------------------
| 1000 - 1900 |   77,299 JPM   |  115,271 JPM   |
-------------------------------------------------

Unfortunately, at 2000 users, the modified kernel locked up.

# INFO: task reaim:<#> blocked for more than 120 seconds.

So something appears to be buggy.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-23 12:46 ` [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner Peter Zijlstra
  2016-08-23 19:55   ` Waiman Long
  2016-08-23 20:17   ` Waiman Long
@ 2016-08-24  9:56   ` Will Deacon
  2016-08-24 15:34     ` Peter Zijlstra
  2 siblings, 1 reply; 34+ messages in thread
From: Will Deacon @ 2016-08-24  9:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

On Tue, Aug 23, 2016 at 02:46:18PM +0200, Peter Zijlstra wrote:
> There's a number of iffy bits in the mutex code because mutex::count and
> mutex::owner are two different fields; this too is the reason
> MUTEX_SPIN_ON_OWNER and DEBUG_MUTEXES are mutually exclusive.
> 
> Cure this by folding them into a single atomic_long_t field.
> 
> This necessarily kills all the architecture-specific mutex code.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

[...]

>  void __sched mutex_unlock(struct mutex *lock)
>  {
> -	/*
> -	 * The unlocking fastpath is the 0->1 transition from 'locked'
> -	 * into 'unlocked' state:
> -	 */
> -#ifndef CONFIG_DEBUG_MUTEXES
> -	/*
> -	 * When debugging is enabled we must not clear the owner before time,
> -	 * the slow path will always be taken, and that clears the owner field
> -	 * after verifying that it was indeed current.
> -	 */
> -	mutex_clear_owner(lock);
> +	unsigned long owner;
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> +	DEBUG_LOCKS_WARN_ON(__mutex_owner(lock) != current);
>  #endif
> -	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
> -}
>  
> +	owner = atomic_long_read(&lock->owner);
> +	for (;;) {
> +		unsigned long old;
> +
> +		old = atomic_long_cmpxchg_release(&lock->owner, owner, owner & 0x03);
> +		if (old == owner)
> +			break;
> +
> +		owner = old;
> +	}

Can you rewrite this using atomic_long_fetch_and_release?

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-24  9:56   ` Will Deacon
@ 2016-08-24 15:34     ` Peter Zijlstra
  2016-08-24 16:52       ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-24 15:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

On Wed, Aug 24, 2016 at 10:56:59AM +0100, Will Deacon wrote:
> > +	owner = atomic_long_read(&lock->owner);
> > +	for (;;) {
> > +		unsigned long old;
> > +
> > +		old = atomic_long_cmpxchg_release(&lock->owner, owner, owner & 0x03);
> > +		if (old == owner)
> > +			break;
> > +
> > +		owner = old;
> > +	}
> 
> Can you rewrite this using atomic_long_fetch_and_release?

Yes, until patch 3/3.. but now that you mention it I think we can do:

	owner = atomic_long_read(&lock->owner);
	if (!(owner & MUTEX_FLAG_HANDOFF))
		(void)atomic_long_fetch_and_release(MUTEX_FLAGS, &lock->owner);

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-24 15:34     ` Peter Zijlstra
@ 2016-08-24 16:52       ` Peter Zijlstra
  2016-08-24 16:54         ` Will Deacon
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-24 16:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

On Wed, Aug 24, 2016 at 05:34:12PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 24, 2016 at 10:56:59AM +0100, Will Deacon wrote:
> > > +	owner = atomic_long_read(&lock->owner);
> > > +	for (;;) {
> > > +		unsigned long old;
> > > +
> > > +		old = atomic_long_cmpxchg_release(&lock->owner, owner, owner & 0x03);
> > > +		if (old == owner)
> > > +			break;
> > > +
> > > +		owner = old;
> > > +	}
> > 
> > Can you rewrite this using atomic_long_fetch_and_release?
> 
> Yes, until patch 3/3.. but now that you mention it I think we can do:
> 
> 	owner = atomic_long_read(&lock->owner);
> 	if (!(owner & MUTEX_FLAG_HANDOFF))
> 		(void)atomic_long_fetch_and_release(MUTEX_FLAGS, &lock->owner);
> 

And of course, x86 would very much like atomic_long_and_release() here,
such that it can do LOCK AND instead of a LOCK CMPXCHG loop. But of
course, we don't have that ...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner
  2016-08-24 16:52       ` Peter Zijlstra
@ 2016-08-24 16:54         ` Will Deacon
  0 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2016-08-24 16:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Waiman Long, Jason Low, Ding Tianhong,
	Thomas Gleixner, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Davidlohr Bueso, Tim Chen, Terry Rudd,
	Paul E. McKenney, Jason Low

On Wed, Aug 24, 2016 at 06:52:44PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 24, 2016 at 05:34:12PM +0200, Peter Zijlstra wrote:
> > On Wed, Aug 24, 2016 at 10:56:59AM +0100, Will Deacon wrote:
> > > > +	owner = atomic_long_read(&lock->owner);
> > > > +	for (;;) {
> > > > +		unsigned long old;
> > > > +
> > > > +		old = atomic_long_cmpxchg_release(&lock->owner, owner, owner & 0x03);
> > > > +		if (old == owner)
> > > > +			break;
> > > > +
> > > > +		owner = old;
> > > > +	}
> > > 
> > > Can you rewrite this using atomic_long_fetch_and_release?
> > 
> > Yes, until patch 3/3.. but now that you mention it I think we can do:
> > 
> > 	owner = atomic_long_read(&lock->owner);
> > 	if (!(owner & MUTEX_FLAG_HANDOFF))
> > 		(void)atomic_long_fetch_and_release(MUTEX_FLAGS, &lock->owner);
> > 
> 
> And of course, x86 would very much like atomic_long_and_release() here,
> such that it can do LOCK AND instead of a LOCK CMPXCHG loop. But of
> course, we don't have that ...

... yeah, I noticed that. There is a curious use of atomic_and in
linux/atomic.h, but it's packed full of false promises.

Will

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation
  2016-08-23 20:32     ` Peter Zijlstra
@ 2016-08-24 19:50       ` Waiman Long
  2016-08-25  8:11         ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Waiman Long @ 2016-08-24 19:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Tim Chen, Terry Rudd, Paul E. McKenney,
	Jason Low

On 08/23/2016 04:32 PM, Peter Zijlstra wrote:
> On Tue, Aug 23, 2016 at 03:47:53PM -0400, Waiman Long wrote:
>> On 08/23/2016 08:46 AM, Peter Zijlstra wrote:
>>> N
>>> @@ -573,8 +600,14 @@ __mutex_lock_common(struct mutex *lock,
>>>   		schedule_preempt_disabled();
>>>   		spin_lock_mutex(&lock->wait_lock, flags);
>>>
>>> +		if (__mutex_owner(lock) == current)
>>> +			break;
>>> +
>>>   		if (__mutex_trylock(lock))
>>>   			break;
>>> +
>>> +		if (__mutex_waiter_is_first(lock,&waiter))
>>> +			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
>>>   	}
>>>   	__set_task_state(task, TASK_RUNNING);
>>>
>>>
>> You may want to think about doing some spinning while the owner is active
>> instead of going back to sleep again here.
> For sure; I just didn't bother pulling in your patches. I didn't want to
> sink in more time in case people really hated on 1/3 ;-)

I think there is a race in how the handoff is being done.

CPU 0                   CPU 1                   CPU 2
-----                   -----                   -----
__mutex_lock_common:                            mutex_optimistic_spin:
  __mutex_trylock()
                        mutex_unlock:
                          if (owner & MUTEX_FLAG_HANDOFF)
                          owner &= 0x3;
                                                __mutex_trylock();
                                                  owner = CPU2;
  __mutex_set_flag(lock,
    MUTEX_FLAG_HANDOFF)
                        __mutex_unlock_slowpath:
                        __mutex_handoff:
                          owner = CPU0;


Now both CPUs 1 and 2 think they have the lock. One way to fix that is
to check if the owner is still the original lock holder (CPU 0) before
doing the handoff, like:

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -97,6 +97,8 @@ static void __mutex_handoff(struct mutex *lock, struct task_st
         for (;;) {
                 unsigned long old, new;

+               if ((owner & ~MUTEX_FLAG_ALL) != current)
+                       break;
                 new = (owner & MUTEX_FLAG_WAITERS);
                 new |= (unsigned long)task;

I also think that the MUTEX_FLAG_HANDOFF bit needs to be cleared if the list
is empty.

@@ -614,7 +633,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned
         mutex_remove_waiter(lock, &waiter, task);
         /* set it to 0 if there are no waiters left: */
         if (likely(list_empty(&lock->wait_list)))
-               __mutex_clear_flag(lock, MUTEX_FLAG_WAITERS);
+               __mutex_clear_flag(lock, 
MUTEX_FLAG_WAITERS|MUTEX_FLAG_HANDOFF);

Or we should try to reset the handoff bit after the while loop exit if 
the bit is still set.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation
  2016-08-24 19:50       ` Waiman Long
@ 2016-08-25  8:11         ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-25  8:11 UTC (permalink / raw)
  To: Waiman Long
  Cc: Linus Torvalds, Jason Low, Ding Tianhong, Thomas Gleixner,
	Will Deacon, Ingo Molnar, Imre Deak, Linux Kernel Mailing List,
	Davidlohr Bueso, Tim Chen, Terry Rudd, Paul E. McKenney,
	Jason Low

On Wed, Aug 24, 2016 at 03:50:10PM -0400, Waiman Long wrote:

> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -97,6 +97,8 @@ static void __mutex_handoff(struct mutex *lock, struct task_st
>         for (;;) {
>                 unsigned long old, new;
> 
> +               if ((owner & ~MUTEX_FLAG_ALL) != current)
> +                       break;
>                 new = (owner & MUTEX_FLAG_WAITERS);
>                 new |= (unsigned long)task;
> 
> I also think that the MUTEX_FLAG_HANDOFF bit needs to be cleared if the list
> is empty.
> 
> @@ -614,7 +633,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned
>         mutex_remove_waiter(lock, &waiter, task);
>         /* set it to 0 if there are no waiters left: */
>         if (likely(list_empty(&lock->wait_list)))
> -               __mutex_clear_flag(lock, MUTEX_FLAG_WAITERS);
> +               __mutex_clear_flag(lock, MUTEX_FLAG_WAITERS|MUTEX_FLAG_HANDOFF);
> 
> Or we should try to reset the handoff bit after the while loop exit if the
> bit is still set.

Yes, I think you're right. I've also found another issue wrt WAITERS in
patch 1.

I'm now trying to get aim7 running to see if I can reproduce Jason's
results and verify things.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-24  1:13     ` Jason Low
@ 2016-08-25 12:32       ` Peter Zijlstra
  2016-08-25 15:43       ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-25 12:32 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Linus Torvalds, Waiman Long, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On Tue, Aug 23, 2016 at 06:13:43PM -0700, Jason Low wrote:

> I tested this patch on an 8 socket system with the high_systime AIM7
> workload with diskfs. The patch provided big performance improvements in
> terms of throughput in the highly contended cases.
> 
> -------------------------------------------------
> |  users      | avg throughput | avg throughput |
>               | without patch  | with patch     |
> -------------------------------------------------
> | 10 - 90     |   13,943 JPM   |   14,432 JPM   |
> -------------------------------------------------
> | 100 - 900   |   75,475 JPM   |  102,922 JPM   |
> -------------------------------------------------
> | 1000 - 1900 |   77,299 JPM   |  115,271 JPM   |
> -------------------------------------------------
> 
> Unfortunately, at 2000 users, the modified kernel locked up.
> 
> # INFO: task reaim:<#> blocked for more than 120 seconds.
> 
> So something appears to be buggy.

Right, so like I said, I think I found the reason for the lockup and Waiman
appears to have found the reason for your insane performance increase.

Running AIM7 takes ludicrous amounts of time though, so I hacked it up
like below.

That changes two things: it uses log10(rl->runnum) as the scale factor and
allows overriding chld_alrm. I run it with -O60, which gets semi-decent
runtimes.



---
diff --git a/osdl-aim-7/src/driver.c b/osdl-aim-7/src/driver.c
index 306e23b..03be655 100644
--- a/osdl-aim-7/src/driver.c
+++ b/osdl-aim-7/src/driver.c
@@ -98,6 +98,8 @@ struct runloop_input *rl_vars;
 struct disk_data *my_disk;
 struct _aimList *global_list;
 
+int alarm_timeout = 0;
+
 int flag = 0;
 /* for getopt */
 int opt_num = 0;
@@ -222,13 +224,14 @@ int main(int argc, char **argv)
 			{"config", 1, NULL, 'c'},
 			{"nosync", 0, NULL, 'y'},  /* Remove the sync'y behavior */
 			{"guesspeak", 0, NULL, 'g'}, /* terrible, but we've exhausted the alphabet */
+			{"timeout", 1, NULL, 'O'},
  			{0, 0, 0, 0}
 		};
 
-		c = getopt_long(argc, argv, "bvs:e:i:j:d::f:l:p:r:c:Z:z:mqothxyg",
+		c = getopt_long(argc, argv, "bvs:e:i:j:d::f:l:p:r:c:Z:z:O:mqothxyg",
 				long_options, &option_index);
 #elif hpux
-		c = getopt(argc, argv, "bvs:e:i:j:d::f:l:p:r:c:Z:z:mqothxyg");
+		c = getopt(argc, argv, "bvs:e:i:j:d::f:l:p:r:c:Z:z:O:mqothxyg");
 #endif
 
 		if (c == -1)
@@ -325,6 +328,9 @@ int main(int argc, char **argv)
 			print_usage();
 			exit(1);
 			break;
+		case 'O':
+			alarm_timeout = atoi(optarg);
+			break;
 /* MARCIA - DAN z: pass config file, Z: pass tool/script name (default perf_tools.sh) */
 		case 'Z':
 			tool_name = optarg;
@@ -909,7 +915,7 @@ int runloop(struct _aimList *tlist, struct runloop_input *rl)
 		long start_tick;
 		long delta = 0;
 		int chld_alrm = 0;
-
+		int timo;
 
 		close(umbilical[0]);
 		/* Step 1: seed random number generators
@@ -945,7 +951,15 @@ int runloop(struct _aimList *tlist, struct runloop_input *rl)
 			chld_alrm = 10;
 		}
 		/* now we set a timeout alarm */
-		alarm(rl->runnum * chld_alrm);
+
+		if (alarm_timeout > 0)
+			chld_alrm = alarm_timeout;
+
+		timo = (unsigned int)(log10((double)rl->runnum) * chld_alrm);
+
+		fprintf(stderr, "alarm: %d = log10(%d) * %d\n", timo, rl->runnum, chld_alrm);
+
+		alarm(timo);
 		/*
 		 * Step 4: Set up mechanism for random 
 		 * selection of directory for writes during tests

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-24  1:13     ` Jason Low
  2016-08-25 12:32       ` Peter Zijlstra
@ 2016-08-25 15:43       ` Peter Zijlstra
  2016-08-25 16:33         ` Waiman Long
  2016-08-25 19:11         ` huang ying
  1 sibling, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-25 15:43 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Linus Torvalds, Waiman Long, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On Tue, Aug 23, 2016 at 06:13:43PM -0700, Jason Low wrote:
> I tested this patch on an 8 socket system with the high_systime AIM7
> workload with diskfs. The patch provided big performance improvements in
> terms of throughput in the highly contended cases.
> 
> -------------------------------------------------
> |  users      | avg throughput | avg throughput |
>               | without patch  | with patch     |
> -------------------------------------------------
> | 10 - 90     |   13,943 JPM   |   14,432 JPM   |
> -------------------------------------------------
> | 100 - 900   |   75,475 JPM   |  102,922 JPM   |
> -------------------------------------------------
> | 1000 - 1900 |   77,299 JPM   |  115,271 JPM   |
> -------------------------------------------------
> 
> Unfortunately, at 2000 users, the modified kernel locked up.
> 
> # INFO: task reaim:<#> blocked for more than 120 seconds.
> 
> So something appears to be buggy.

So with the previously given changes to reaim, I get the below results
on my 4 socket Haswell with the new version of 1/3 (also below).

I still need to update 3/3..

Note that I think my reaim change wrecked the jobs/min calculation
somehow, as it keeps increasing. I do think however that the numbers are
comparable between runs, since they're wrecked the same way.


*********************  Data Separator  *****************************
Title:    hsw-ex
Kernel:   4.8.0-rc3-00185-g9f55477
Date:     Thu Aug 25 15:29:51 CEST 2016
Command:  /usr/local/share/reaim/reaim -O60 -s10 -e2000 -t -j100 -i[10|100] -y -f /usr/local/share/reaim/workfile.high_systime
Workload: high_systime
Time:     13.44 mins == 73 mins, 26 secs
                 Jobs/min/ Jobs/sec/ Time:   Time:  Time:   Time:           Running child time
Forks  Jobs/min   child     child parent  childU childS  std_dev   JTI   :max  :min
10     1040.00    104.00      1.73    60.00   51.87 507.73    0.00  100.00 60.00 60.00
20     1600.00     80.00      1.33    78.00   88.23 1345.41    0.00  100.00 78.00 78.00
30     2127.27     70.91      1.18    88.00  133.77 2358.77    0.00  100.00 88.00 88.00
40     2599.73     64.99      1.08    96.01  179.21 3482.58    0.00   99.00 96.01 96.00
50     3088.50     61.77      1.03   101.02  203.39 4627.08    0.01   99.00 101.02 101.00
60     3531.08     58.85      0.98   106.03  241.29 5823.03    0.01   99.00 106.03 106.00
70     3969.83     56.71      0.95   110.03  282.49 7146.39    0.01   99.00 110.03 110.00
80     4377.80     54.72      0.91   114.03  298.70 8467.74    0.01   99.00 114.03 114.00
90     4797.95     53.31      0.89   117.05  304.00 9909.76    0.01   99.00 117.05 117.00
100    5198.27     51.98      0.87   120.04  303.13 11172.37    0.01   99.00 120.04 120.00
200    9038.89     45.19      0.75   138.07  354.27 19239.86    0.01   99.00 138.07 138.00
300   12642.67     42.14      0.70   148.07  380.76 20866.01    0.01   99.00 148.07 148.00
400   15970.31     39.93      0.67   156.29  406.51 21991.01    0.08   99.00 156.29 156.00
500   19365.65     38.73      0.65   161.11  417.03 22665.03    0.01   99.00 161.11 161.00
600   22537.92     37.56      0.63   166.12  431.56 23355.16    0.01   99.00 166.12 166.00
700   25536.39     36.48      0.61   171.05  444.61 24013.11    0.33   99.00 171.05 170.00
800   28673.18     35.84      0.60   174.10  454.69 24450.15    0.01   99.00 174.10 174.01
900   31596.71     35.11      0.59   177.74  467.94 24918.53    0.19   99.00 177.74 177.00
1000  34651.27     34.65      0.58   180.08  475.11 25255.57    0.02   99.00 180.08 180.00
1100  37679.09     34.25      0.57   182.17  477.46 25515.20    0.02   99.00 182.17 182.02
1200  40675.76     33.90      0.56   184.09  484.20 25780.88    0.02   99.00 184.09 184.00
1300  43573.08     33.52      0.56   186.17  491.92 26039.82    0.02   99.00 186.17 186.00
1400  46398.98     33.14      0.55   188.28  497.82 26317.58    0.04   99.00 188.28 188.01
1500  48977.03     32.65      0.54   191.11  503.36 26676.47    0.27   99.00 191.11 190.00
1600  51940.48     32.46      0.54   192.22  509.20 26824.58    0.02   99.00 192.22 192.00
1700  54929.58     32.31      0.54   193.12  513.25 26939.84    0.02   99.00 193.12 193.01
1800  57432.12     31.91      0.53   195.57  519.41 27249.93    0.11   99.00 195.57 195.00
1900  60437.38     31.81      0.53   196.17  523.50 27331.56    0.03   99.00 196.17 196.01
2000  62950.82     31.48      0.52   198.25  530.96 27587.86    0.02   99.00 198.25 198.00



*********************  Data Separator  *****************************
Title:    hsw-ex
Kernel:   4.8.0-rc3-00185-g9f55477-dirty
Date:     Thu Aug 25 17:20:37 CEST 2016
Command:  /usr/local/share/reaim/reaim -O60 -s10 -e2000 -t -j100 -i[10|100] -y -f /usr/local/share/reaim/workfile.high_systime
Workload: high_systime
Time:     13.40 mins == 73 mins, 24 secs
                 Jobs/min/ Jobs/sec/ Time:   Time:  Time:   Time:           Running child time
Forks  Jobs/min   child     child parent  childU childS  std_dev   JTI   :max  :min
10     1039.83    103.98      1.73    60.01   42.92 491.23    -nan  -2147483648.00 60.01 60.01
20     1599.79     79.99      1.33    78.01   92.46 1406.04    0.00   99.00 78.01 78.00
30     2127.27     70.91      1.18    88.00  146.93 2330.10    0.00  100.00 88.00 88.00
40     2599.46     64.99      1.08    96.02  187.90 3483.81    0.01   99.00 96.02 96.00
50     3088.50     61.77      1.03   101.02  207.02 4659.42    0.01   99.00 101.02 101.00
60     3531.08     58.85      0.98   106.03  249.31 5855.74    0.01   99.00 106.03 106.00
70     3969.47     56.71      0.95   110.04  287.04 7075.68    0.01   99.00 110.04 110.01
80     4377.80     54.72      0.91   114.03  300.31 8420.47    0.01   99.00 114.03 114.00
90     4798.77     53.32      0.89   117.03  308.47 9801.80    0.01   99.00 117.03 117.00
100    5198.27     51.98      0.87   120.04  314.72 11220.78    0.01   99.00 120.04 120.00
200    9038.89     45.19      0.75   138.07  370.78 19229.77    0.01   99.00 138.07 138.00
300   12643.52     42.15      0.70   148.06  403.30 20834.52    0.01   99.00 148.06 148.00
400   15993.85     39.98      0.67   156.06  429.17 21952.93    0.01   99.00 156.06 156.00
500   19339.24     38.68      0.64   161.33  442.51 22654.86    0.08   99.00 161.33 161.00
600   22542.00     37.57      0.63   166.09  453.04 23334.89    0.01   99.00 166.09 166.00
700   25653.37     36.65      0.61   170.27  467.00 23890.76    0.06   99.00 170.27 170.00
800   28664.94     35.83      0.60   174.15  477.22 24423.89    0.01   99.00 174.15 174.00
900   31685.85     35.21      0.59   177.24  489.02 24826.23    0.03   99.00 177.24 177.01
1000  34643.57     34.64      0.58   180.12  506.51 25232.11    0.02   99.00 180.12 180.00
1100  37693.57     34.27      0.57   182.10  542.31 25458.35    0.02   99.00 182.10 182.00
1200  40675.76     33.90      0.56   184.09  546.54 25718.16    0.02   99.00 184.09 184.00
1300  43575.42     33.52      0.56   186.16  551.71 25990.72    0.02   99.00 186.16 186.00
1400  46406.37     33.15      0.55   188.25  562.14 26247.12    0.02   99.00 188.25 188.01
1500  49203.60     32.80      0.55   190.23  574.29 26506.14    0.03   99.00 190.23 190.00
1600  51970.23     32.48      0.54   192.11  574.22 26771.98    0.02   99.00 192.11 192.00
1700  54646.61     32.15      0.54   194.12  581.82 27023.52    0.23   99.00 194.12 193.00
1800  57538.04     31.97      0.53   195.21  585.85 27137.02    0.02   99.00 195.21 195.00
1900  60434.30     31.81      0.53   196.18  596.44 27277.20    0.02   99.00 196.18 196.00
2000  62973.05     31.49      0.52   198.18  595.86 27527.66    0.03   99.00 198.17 198.00
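
The diff below replaces mutex::count with an atomic_long_t owner word that
packs the owning task_struct pointer together with low-order flag bits:
MUTEX_FLAG_WAITERS sits in bit 0 and __mutex_owner() masks the low two bits
off again. As a standalone sketch of that encoding (illustration only, not
part of the patch; fake_task stands in for task_struct):

#include <stdio.h>
#include <stdint.h>

#define FLAG_WAITERS	0x01UL		/* mirrors MUTEX_FLAG_WAITERS */
#define FLAG_MASK	0x03UL		/* mirrors MUTEX_FLAGS */

struct fake_task { long pad; };		/* stand-in for task_struct */

int main(void)
{
	static struct fake_task t;
	uintptr_t owner = (uintptr_t)&t;	/* aligned, low two bits clear */

	owner |= FLAG_WAITERS;			/* note "waiters present" */

	printf("task  = %p\n", (void *)(owner & ~FLAG_MASK));
	printf("flags = 0x%lx\n", (unsigned long)(owner & FLAG_MASK));
	return 0;
}

This only works because the pointer's low bits are known to be zero, which is
why the real code masks with ~0x03 before treating the value as a task_struct
pointer.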


---
 arch/alpha/include/asm/mutex.h      |   9 --
 arch/arc/include/asm/mutex.h        |  18 ---
 arch/arm/include/asm/mutex.h        |  21 ---
 arch/arm64/include/asm/Kbuild       |   1 -
 arch/avr32/include/asm/mutex.h      |   9 --
 arch/blackfin/include/asm/Kbuild    |   1 -
 arch/c6x/include/asm/mutex.h        |   6 -
 arch/cris/include/asm/mutex.h       |   9 --
 arch/frv/include/asm/mutex.h        |   9 --
 arch/h8300/include/asm/mutex.h      |   9 --
 arch/hexagon/include/asm/mutex.h    |   8 -
 arch/ia64/include/asm/mutex.h       |  90 -----------
 arch/m32r/include/asm/mutex.h       |   9 --
 arch/m68k/include/asm/Kbuild        |   1 -
 arch/metag/include/asm/Kbuild       |   1 -
 arch/microblaze/include/asm/mutex.h |   1 -
 arch/mips/include/asm/Kbuild        |   1 -
 arch/mn10300/include/asm/mutex.h    |  16 --
 arch/nios2/include/asm/mutex.h      |   1 -
 arch/openrisc/include/asm/mutex.h   |  27 ----
 arch/parisc/include/asm/Kbuild      |   1 -
 arch/powerpc/include/asm/mutex.h    | 132 ----------------
 arch/s390/include/asm/mutex.h       |   9 --
 arch/score/include/asm/mutex.h      |   6 -
 arch/sh/include/asm/mutex-llsc.h    | 109 -------------
 arch/sh/include/asm/mutex.h         |  12 --
 arch/sparc/include/asm/Kbuild       |   1 -
 arch/tile/include/asm/Kbuild        |   1 -
 arch/um/include/asm/Kbuild          |   1 -
 arch/unicore32/include/asm/mutex.h  |  20 ---
 arch/x86/include/asm/mutex.h        |   5 -
 arch/x86/include/asm/mutex_32.h     | 110 -------------
 arch/x86/include/asm/mutex_64.h     | 127 ---------------
 arch/xtensa/include/asm/mutex.h     |   9 --
 include/asm-generic/mutex-dec.h     |  88 -----------
 include/asm-generic/mutex-null.h    |  19 ---
 include/asm-generic/mutex-xchg.h    | 120 --------------
 include/asm-generic/mutex.h         |   9 --
 include/linux/mutex-debug.h         |  24 ---
 include/linux/mutex.h               |  46 ++++--
 kernel/locking/mutex-debug.c        |  13 --
 kernel/locking/mutex-debug.h        |  10 --
 kernel/locking/mutex.c              | 307 +++++++++++++++---------------------
 kernel/locking/mutex.h              |  26 ---
 kernel/sched/core.c                 |   2 +-
 45 files changed, 155 insertions(+), 1299 deletions(-)

diff --git a/arch/alpha/include/asm/mutex.h b/arch/alpha/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/alpha/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/arc/include/asm/mutex.h b/arch/arc/include/asm/mutex.h
deleted file mode 100644
index a2f88ff9f506..000000000000
--- a/arch/arc/include/asm/mutex.h
+++ /dev/null
@@ -1,18 +0,0 @@
-/*
- * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-/*
- * xchg() based mutex fast path maintains a state of 0 or 1, as opposed to
- * atomic dec based which can "count" any number of lock contenders.
- * This ideally needs to be fixed in core, but for now switching to dec ver.
- */
-#if defined(CONFIG_SMP) && (CONFIG_NR_CPUS > 2)
-#include <asm-generic/mutex-dec.h>
-#else
-#include <asm-generic/mutex-xchg.h>
-#endif
diff --git a/arch/arm/include/asm/mutex.h b/arch/arm/include/asm/mutex.h
deleted file mode 100644
index 87c044910fe0..000000000000
--- a/arch/arm/include/asm/mutex.h
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * arch/arm/include/asm/mutex.h
- *
- * ARM optimized mutex locking primitives
- *
- * Please look into asm-generic/mutex-xchg.h for a formal definition.
- */
-#ifndef _ASM_MUTEX_H
-#define _ASM_MUTEX_H
-/*
- * On pre-ARMv6 hardware this results in a swp-based implementation,
- * which is the most efficient. For ARMv6+, we have exclusive memory
- * accessors and use atomic_dec to avoid the extra xchg operations
- * on the locking slowpaths.
- */
-#if __LINUX_ARM_ARCH__ < 6
-#include <asm-generic/mutex-xchg.h>
-#else
-#include <asm-generic/mutex-dec.h>
-#endif
-#endif	/* _ASM_MUTEX_H */
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index f43d2c44c765..0b95418fd5ca 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -26,7 +26,6 @@ generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
-generic-y += mutex.h
 generic-y += pci.h
 generic-y += poll.h
 generic-y += preempt.h
diff --git a/arch/avr32/include/asm/mutex.h b/arch/avr32/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/avr32/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
index 91d49c0a3118..2fb67b59d188 100644
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild
@@ -24,7 +24,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += pgalloc.h
diff --git a/arch/c6x/include/asm/mutex.h b/arch/c6x/include/asm/mutex.h
deleted file mode 100644
index 7a7248e0462d..000000000000
--- a/arch/c6x/include/asm/mutex.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef _ASM_C6X_MUTEX_H
-#define _ASM_C6X_MUTEX_H
-
-#include <asm-generic/mutex-null.h>
-
-#endif /* _ASM_C6X_MUTEX_H */
diff --git a/arch/cris/include/asm/mutex.h b/arch/cris/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/cris/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/frv/include/asm/mutex.h b/arch/frv/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/frv/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/h8300/include/asm/mutex.h b/arch/h8300/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/h8300/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/hexagon/include/asm/mutex.h b/arch/hexagon/include/asm/mutex.h
deleted file mode 100644
index 58b52de1bc22..000000000000
--- a/arch/hexagon/include/asm/mutex.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-#include <asm-generic/mutex-xchg.h>
diff --git a/arch/ia64/include/asm/mutex.h b/arch/ia64/include/asm/mutex.h
deleted file mode 100644
index 28cb819e0ff9..000000000000
--- a/arch/ia64/include/asm/mutex.h
+++ /dev/null
@@ -1,90 +0,0 @@
-/*
- * ia64 implementation of the mutex fastpath.
- *
- * Copyright (C) 2006 Ken Chen <kenneth.w.chen@intel.com>
- *
- */
-
-#ifndef _ASM_MUTEX_H
-#define _ASM_MUTEX_H
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function MUST leave the value lower than
- * 1 even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(ia64_fetchadd4_acq(count, -1) != 1))
-		fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(ia64_fetchadd4_acq(count, -1) != 1))
-		return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the count from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than 1.
- *
- * If the implementation sets it to a value of lower than 1, then the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int ret = ia64_fetchadd4_rel(count, 1);
-	if (unlikely(ret < 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- *
- * If the architecture has no effective trylock variant, it should call the
- * <fail_fn> spinlock-based trylock variant unconditionally.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	if (atomic_read(count) == 1 && cmpxchg_acq(count, 1, 0) == 1)
-		return 1;
-	return 0;
-}
-
-#endif
diff --git a/arch/m32r/include/asm/mutex.h b/arch/m32r/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/m32r/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index eb85bd9c6180..1f2e5d31cb24 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -20,7 +20,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += mman.h
-generic-y += mutex.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += resource.h
diff --git a/arch/metag/include/asm/Kbuild b/arch/metag/include/asm/Kbuild
index 29acb89daaaa..167150c701d1 100644
--- a/arch/metag/include/asm/Kbuild
+++ b/arch/metag/include/asm/Kbuild
@@ -27,7 +27,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/microblaze/include/asm/mutex.h b/arch/microblaze/include/asm/mutex.h
deleted file mode 100644
index ff6101aa2c71..000000000000
--- a/arch/microblaze/include/asm/mutex.h
+++ /dev/null
@@ -1 +0,0 @@
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 9740066cc631..3269b742a75e 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -9,7 +9,6 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mutex.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/mn10300/include/asm/mutex.h b/arch/mn10300/include/asm/mutex.h
deleted file mode 100644
index 84f5490c6fb4..000000000000
--- a/arch/mn10300/include/asm/mutex.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* MN10300 Mutex fastpath
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- *
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-#include <asm-generic/mutex-null.h>
diff --git a/arch/nios2/include/asm/mutex.h b/arch/nios2/include/asm/mutex.h
deleted file mode 100644
index ff6101aa2c71..000000000000
--- a/arch/nios2/include/asm/mutex.h
+++ /dev/null
@@ -1 +0,0 @@
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/openrisc/include/asm/mutex.h b/arch/openrisc/include/asm/mutex.h
deleted file mode 100644
index b85a0cfa9fc9..000000000000
--- a/arch/openrisc/include/asm/mutex.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * OpenRISC Linux
- *
- * Linux architectural port borrowing liberally from similar works of
- * others.  All original copyrights apply as per the original source
- * declaration.
- *
- * OpenRISC implementation:
- * Copyright (C) 2003 Matjaz Breskvar <phoenix@bsemi.com>
- * Copyright (C) 2010-2011 Jonas Bonn <jonas@southpole.se>
- * et al.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index f9b3a81aefcd..91f53c07f410 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += poll.h
diff --git a/arch/powerpc/include/asm/mutex.h b/arch/powerpc/include/asm/mutex.h
deleted file mode 100644
index 078155fa1189..000000000000
--- a/arch/powerpc/include/asm/mutex.h
+++ /dev/null
@@ -1,132 +0,0 @@
-/*
- * Optimised mutex implementation of include/asm-generic/mutex-dec.h algorithm
- */
-#ifndef _ASM_POWERPC_MUTEX_H
-#define _ASM_POWERPC_MUTEX_H
-
-static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
-{
-	int t;
-
-	__asm__ __volatile__ (
-"1:	lwarx	%0,0,%1		# mutex trylock\n\
-	cmpw	0,%0,%2\n\
-	bne-	2f\n"
-	PPC405_ERR77(0,%1)
-"	stwcx.	%3,0,%1\n\
-	bne-	1b"
-	PPC_ACQUIRE_BARRIER
-	"\n\
-2:"
-	: "=&r" (t)
-	: "r" (&v->counter), "r" (old), "r" (new)
-	: "cc", "memory");
-
-	return t;
-}
-
-static inline int __mutex_dec_return_lock(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-"1:	lwarx	%0,0,%1		# mutex lock\n\
-	addic	%0,%0,-1\n"
-	PPC405_ERR77(0,%1)
-"	stwcx.	%0,0,%1\n\
-	bne-	1b"
-	PPC_ACQUIRE_BARRIER
-	: "=&r" (t)
-	: "r" (&v->counter)
-	: "cc", "memory");
-
-	return t;
-}
-
-static inline int __mutex_inc_return_unlock(atomic_t *v)
-{
-	int t;
-
-	__asm__ __volatile__(
-	PPC_RELEASE_BARRIER
-"1:	lwarx	%0,0,%1		# mutex unlock\n\
-	addic	%0,%0,1\n"
-	PPC405_ERR77(0,%1)
-"	stwcx.	%0,0,%1 \n\
-	bne-	1b"
-	: "=&r" (t)
-	: "r" (&v->counter)
-	: "cc", "memory");
-
-	return t;
-}
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function MUST leave the value lower than
- * 1 even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(__mutex_dec_return_lock(count) < 0))
-		fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(__mutex_dec_return_lock(count) < 0))
-		return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the count from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than 1.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(__mutex_inc_return_unlock(count) <= 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to 0, and return 1 (success), or if the count
- * was not 1, then return 0 (failure).
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	if (likely(atomic_read(count) == 1 && __mutex_cmpxchg_lock(count, 1, 0) == 1))
-		return 1;
-	return 0;
-}
-
-#endif
diff --git a/arch/s390/include/asm/mutex.h b/arch/s390/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/s390/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/arch/score/include/asm/mutex.h b/arch/score/include/asm/mutex.h
deleted file mode 100644
index 10d48fe4db97..000000000000
--- a/arch/score/include/asm/mutex.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef _ASM_SCORE_MUTEX_H
-#define _ASM_SCORE_MUTEX_H
-
-#include <asm-generic/mutex-dec.h>
-
-#endif /* _ASM_SCORE_MUTEX_H */
diff --git a/arch/sh/include/asm/mutex-llsc.h b/arch/sh/include/asm/mutex-llsc.h
deleted file mode 100644
index dad29b687bd3..000000000000
--- a/arch/sh/include/asm/mutex-llsc.h
+++ /dev/null
@@ -1,109 +0,0 @@
-/*
- * arch/sh/include/asm/mutex-llsc.h
- *
- * SH-4A optimized mutex locking primitives
- *
- * Please look into asm-generic/mutex-xchg.h for a formal definition.
- */
-#ifndef __ASM_SH_MUTEX_LLSC_H
-#define __ASM_SH_MUTEX_LLSC_H
-
-/*
- * Attempting to lock a mutex on SH4A is done like in ARMv6+ architecure.
- * with a bastardized atomic decrement (it is not a reliable atomic decrement
- * but it satisfies the defined semantics for our purpose, while being
- * smaller and faster than a real atomic decrement or atomic swap.
- * The idea is to attempt  decrementing the lock value only once. If once
- * decremented it isn't zero, or if its store-back fails due to a dispute
- * on the exclusive store, we simply bail out immediately through the slow
- * path where the lock will be reattempted until it succeeds.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int __done, __res;
-
-	__asm__ __volatile__ (
-		"movli.l	@%2, %0	\n"
-		"add		#-1, %0	\n"
-		"movco.l	%0, @%2	\n"
-		"movt		%1	\n"
-		: "=&z" (__res), "=&r" (__done)
-		: "r" (&(count)->counter)
-		: "t");
-
-	if (unlikely(!__done || __res != 0))
-		fail_fn(count);
-}
-
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	int __done, __res;
-
-	__asm__ __volatile__ (
-		"movli.l	@%2, %0	\n"
-		"add		#-1, %0	\n"
-		"movco.l	%0, @%2	\n"
-		"movt		%1	\n"
-		: "=&z" (__res), "=&r" (__done)
-		: "r" (&(count)->counter)
-		: "t");
-
-	if (unlikely(!__done || __res != 0))
-		__res = -1;
-
-	return __res;
-}
-
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int __done, __res;
-
-	__asm__ __volatile__ (
-		"movli.l	@%2, %0	\n\t"
-		"add		#1, %0	\n\t"
-		"movco.l	%0, @%2 \n\t"
-		"movt		%1	\n\t"
-		: "=&z" (__res), "=&r" (__done)
-		: "r" (&(count)->counter)
-		: "t");
-
-	if (unlikely(!__done || __res <= 0))
-		fail_fn(count);
-}
-
-/*
- * If the unlock was done on a contended lock, or if the unlock simply fails
- * then the mutex remains locked.
- */
-#define __mutex_slowpath_needs_to_unlock()	1
-
-/*
- * For __mutex_fastpath_trylock we do an atomic decrement and check the
- * result and put it in the __res variable.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	int __res, __orig;
-
-	__asm__ __volatile__ (
-		"1: movli.l	@%2, %0		\n\t"
-		"dt		%0		\n\t"
-		"movco.l	%0,@%2		\n\t"
-		"bf		1b		\n\t"
-		"cmp/eq		#0,%0		\n\t"
-		"bt		2f		\n\t"
-		"mov		#0, %1		\n\t"
-		"bf		3f		\n\t"
-		"2: mov		#1, %1		\n\t"
-		"3:				"
-		: "=&z" (__orig), "=&r" (__res)
-		: "r" (&count->counter)
-		: "t");
-
-	return __res;
-}
-#endif /* __ASM_SH_MUTEX_LLSC_H */
diff --git a/arch/sh/include/asm/mutex.h b/arch/sh/include/asm/mutex.h
deleted file mode 100644
index d8e37716a4a0..000000000000
--- a/arch/sh/include/asm/mutex.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-#if defined(CONFIG_CPU_SH4A)
-#include <asm/mutex-llsc.h>
-#else
-#include <asm-generic/mutex-dec.h>
-#endif
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 6024c26c0585..de90e6a10b2b 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -14,7 +14,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += module.h
-generic-y += mutex.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += serial.h
diff --git a/arch/tile/include/asm/Kbuild b/arch/tile/include/asm/Kbuild
index ba35c41c71ff..2d1f5638974c 100644
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += parport.h
 generic-y += poll.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 904f3ebf4220..052f7f6d0551 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,7 +17,6 @@ generic-y += irq_work.h
 generic-y += kdebug.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mutex.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/arch/unicore32/include/asm/mutex.h b/arch/unicore32/include/asm/mutex.h
deleted file mode 100644
index fab7d0e8adf6..000000000000
--- a/arch/unicore32/include/asm/mutex.h
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- * linux/arch/unicore32/include/asm/mutex.h
- *
- * Code specific to PKUnity SoC and UniCore ISA
- *
- * Copyright (C) 2001-2010 GUAN Xue-tao
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * UniCore optimized mutex locking primitives
- *
- * Please look into asm-generic/mutex-xchg.h for a formal definition.
- */
-#ifndef __UNICORE_MUTEX_H__
-#define __UNICORE_MUTEX_H__
-
-# include <asm-generic/mutex-xchg.h>
-#endif
diff --git a/arch/x86/include/asm/mutex.h b/arch/x86/include/asm/mutex.h
deleted file mode 100644
index 7d3a48275394..000000000000
--- a/arch/x86/include/asm/mutex.h
+++ /dev/null
@@ -1,5 +0,0 @@
-#ifdef CONFIG_X86_32
-# include <asm/mutex_32.h>
-#else
-# include <asm/mutex_64.h>
-#endif
diff --git a/arch/x86/include/asm/mutex_32.h b/arch/x86/include/asm/mutex_32.h
deleted file mode 100644
index e9355a84fc67..000000000000
--- a/arch/x86/include/asm/mutex_32.h
+++ /dev/null
@@ -1,110 +0,0 @@
-/*
- * Assembly implementation of the mutex fastpath, based on atomic
- * decrement/increment.
- *
- * started by Ingo Molnar:
- *
- *  Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
- */
-#ifndef _ASM_X86_MUTEX_32_H
-#define _ASM_X86_MUTEX_32_H
-
-#include <asm/alternative.h>
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fn> if it
- * wasn't 1 originally. This function MUST leave the value lower than 1
- * even when the "1" assertion wasn't true.
- */
-#define __mutex_fastpath_lock(count, fail_fn)			\
-do {								\
-	unsigned int dummy;					\
-								\
-	typecheck(atomic_t *, count);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   decl (%%eax)\n"		\
-		     "   jns 1f	\n"				\
-		     "   call " #fail_fn "\n"			\
-		     "1:\n"					\
-		     : "=a" (dummy)				\
-		     : "a" (count)				\
-		     : "memory", "ecx", "edx");			\
-} while (0)
-
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int __mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_dec_return(count) < 0))
-		return -1;
-	else
-		return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the mutex from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * try to promote the mutex from 0 to 1. if it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value
- * to 1, or to set it to a value lower than 1.
- *
- * If the implementation sets it to a value of lower than 1, the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-#define __mutex_fastpath_unlock(count, fail_fn)			\
-do {								\
-	unsigned int dummy;					\
-								\
-	typecheck(atomic_t *, count);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   incl (%%eax)\n"		\
-		     "   jg	1f\n"				\
-		     "   call " #fail_fn "\n"			\
-		     "1:\n"					\
-		     : "=a" (dummy)				\
-		     : "a" (count)				\
-		     : "memory", "ecx", "edx");			\
-} while (0)
-
-#define __mutex_slowpath_needs_to_unlock()	1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- */
-static inline int __mutex_fastpath_trylock(atomic_t *count,
-					   int (*fail_fn)(atomic_t *))
-{
-	/* cmpxchg because it never induces a false contention state. */
-	if (likely(atomic_read(count) == 1 && atomic_cmpxchg(count, 1, 0) == 1))
-		return 1;
-
-	return 0;
-}
-
-#endif /* _ASM_X86_MUTEX_32_H */
diff --git a/arch/x86/include/asm/mutex_64.h b/arch/x86/include/asm/mutex_64.h
deleted file mode 100644
index d9850758464e..000000000000
--- a/arch/x86/include/asm/mutex_64.h
+++ /dev/null
@@ -1,127 +0,0 @@
-/*
- * Assembly implementation of the mutex fastpath, based on atomic
- * decrement/increment.
- *
- * started by Ingo Molnar:
- *
- *  Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
- */
-#ifndef _ASM_X86_MUTEX_64_H
-#define _ASM_X86_MUTEX_64_H
-
-/**
- * __mutex_fastpath_lock - decrement and call function if negative
- * @v: pointer of type atomic_t
- * @fail_fn: function to call if the result is negative
- *
- * Atomically decrements @v and calls <fail_fn> if the result is negative.
- */
-#ifdef CC_HAVE_ASM_GOTO
-static inline void __mutex_fastpath_lock(atomic_t *v,
-					 void (*fail_fn)(atomic_t *))
-{
-	asm_volatile_goto(LOCK_PREFIX "   decl %0\n"
-			  "   jns %l[exit]\n"
-			  : : "m" (v->counter)
-			  : "memory", "cc"
-			  : exit);
-	fail_fn(v);
-exit:
-	return;
-}
-#else
-#define __mutex_fastpath_lock(v, fail_fn)			\
-do {								\
-	unsigned long dummy;					\
-								\
-	typecheck(atomic_t *, v);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   decl (%%rdi)\n"		\
-		     "   jns 1f		\n"			\
-		     "   call " #fail_fn "\n"			\
-		     "1:"					\
-		     : "=D" (dummy)				\
-		     : "D" (v)					\
-		     : "rax", "rsi", "rdx", "rcx",		\
-		       "r8", "r9", "r10", "r11", "memory");	\
-} while (0)
-#endif
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int __mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_dec_return(count) < 0))
-		return -1;
-	else
-		return 0;
-}
-
-/**
- * __mutex_fastpath_unlock - increment and call function if nonpositive
- * @v: pointer of type atomic_t
- * @fail_fn: function to call if the result is nonpositive
- *
- * Atomically increments @v and calls <fail_fn> if the result is nonpositive.
- */
-#ifdef CC_HAVE_ASM_GOTO
-static inline void __mutex_fastpath_unlock(atomic_t *v,
-					   void (*fail_fn)(atomic_t *))
-{
-	asm_volatile_goto(LOCK_PREFIX "   incl %0\n"
-			  "   jg %l[exit]\n"
-			  : : "m" (v->counter)
-			  : "memory", "cc"
-			  : exit);
-	fail_fn(v);
-exit:
-	return;
-}
-#else
-#define __mutex_fastpath_unlock(v, fail_fn)			\
-do {								\
-	unsigned long dummy;					\
-								\
-	typecheck(atomic_t *, v);				\
-	typecheck_fn(void (*)(atomic_t *), fail_fn);		\
-								\
-	asm volatile(LOCK_PREFIX "   incl (%%rdi)\n"		\
-		     "   jg 1f\n"				\
-		     "   call " #fail_fn "\n"			\
-		     "1:"					\
-		     : "=D" (dummy)				\
-		     : "D" (v)					\
-		     : "rax", "rsi", "rdx", "rcx",		\
-		       "r8", "r9", "r10", "r11", "memory");	\
-} while (0)
-#endif
-
-#define __mutex_slowpath_needs_to_unlock()	1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to 0 and return 1 (success), or return 0 (failure)
- * if it wasn't 1 originally. [the fallback function is never used on
- * x86_64, because all x86_64 CPUs have a CMPXCHG instruction.]
- */
-static inline int __mutex_fastpath_trylock(atomic_t *count,
-					   int (*fail_fn)(atomic_t *))
-{
-	if (likely(atomic_read(count) == 1 && atomic_cmpxchg(count, 1, 0) == 1))
-		return 1;
-
-	return 0;
-}
-
-#endif /* _ASM_X86_MUTEX_64_H */
diff --git a/arch/xtensa/include/asm/mutex.h b/arch/xtensa/include/asm/mutex.h
deleted file mode 100644
index 458c1f7fbc18..000000000000
--- a/arch/xtensa/include/asm/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-/*
- * Pull in the generic implementation for the mutex fastpath.
- *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
- */
-
-#include <asm-generic/mutex-dec.h>
diff --git a/include/asm-generic/mutex-dec.h b/include/asm-generic/mutex-dec.h
deleted file mode 100644
index c54829d3de37..000000000000
--- a/include/asm-generic/mutex-dec.h
+++ /dev/null
@@ -1,88 +0,0 @@
-/*
- * include/asm-generic/mutex-dec.h
- *
- * Generic implementation of the mutex fastpath, based on atomic
- * decrement/increment.
- */
-#ifndef _ASM_GENERIC_MUTEX_DEC_H
-#define _ASM_GENERIC_MUTEX_DEC_H
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if
- * it wasn't 1 originally. This function MUST leave the value lower than
- * 1 even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_dec_return_acquire(count) < 0))
-		fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_dec_return_acquire(count) < 0))
-		return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the count from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * Try to promote the count from 0 to 1. If it wasn't 0, call <fail_fn>.
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than 1.
- *
- * If the implementation sets it to a value of lower than 1, then the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_inc_return_release(count) <= 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		1
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: fallback function
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- *
- * If the architecture has no effective trylock variant, it should call the
- * <fail_fn> spinlock-based trylock variant unconditionally.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	if (likely(atomic_read(count) == 1 && atomic_cmpxchg_acquire(count, 1, 0) == 1))
-		return 1;
-	return 0;
-}
-
-#endif
diff --git a/include/asm-generic/mutex-null.h b/include/asm-generic/mutex-null.h
deleted file mode 100644
index 61069ed334e2..000000000000
--- a/include/asm-generic/mutex-null.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/*
- * include/asm-generic/mutex-null.h
- *
- * Generic implementation of the mutex fastpath, based on NOP :-)
- *
- * This is used by the mutex-debugging infrastructure, but it can also
- * be used by architectures that (for whatever reason) want to use the
- * spinlock based slowpath.
- */
-#ifndef _ASM_GENERIC_MUTEX_NULL_H
-#define _ASM_GENERIC_MUTEX_NULL_H
-
-#define __mutex_fastpath_lock(count, fail_fn)		fail_fn(count)
-#define __mutex_fastpath_lock_retval(count)		(-1)
-#define __mutex_fastpath_unlock(count, fail_fn)		fail_fn(count)
-#define __mutex_fastpath_trylock(count, fail_fn)	fail_fn(count)
-#define __mutex_slowpath_needs_to_unlock()		1
-
-#endif
diff --git a/include/asm-generic/mutex-xchg.h b/include/asm-generic/mutex-xchg.h
deleted file mode 100644
index 3269ec4e195f..000000000000
--- a/include/asm-generic/mutex-xchg.h
+++ /dev/null
@@ -1,120 +0,0 @@
-/*
- * include/asm-generic/mutex-xchg.h
- *
- * Generic implementation of the mutex fastpath, based on xchg().
- *
- * NOTE: An xchg based implementation might be less optimal than an atomic
- *       decrement/increment based implementation. If your architecture
- *       has a reasonable atomic dec/inc then you should probably use
- *	 asm-generic/mutex-dec.h instead, or you could open-code an
- *	 optimized version in asm/mutex.h.
- */
-#ifndef _ASM_GENERIC_MUTEX_XCHG_H
-#define _ASM_GENERIC_MUTEX_XCHG_H
-
-/**
- *  __mutex_fastpath_lock - try to take the lock by moving the count
- *                          from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 1
- *
- * Change the count from 1 to a value lower than 1, and call <fail_fn> if it
- * wasn't 1 originally. This function MUST leave the value lower than 1
- * even when the "1" assertion wasn't true.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_xchg(count, 0) != 1))
-		/*
-		 * We failed to acquire the lock, so mark it contended
-		 * to ensure that any waiting tasks are woken up by the
-		 * unlock slow path.
-		 */
-		if (likely(atomic_xchg_acquire(count, -1) != 1))
-			fail_fn(count);
-}
-
-/**
- *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
- *                                 from 1 to a 0 value
- *  @count: pointer of type atomic_t
- *
- * Change the count from 1 to a value lower than 1. This function returns 0
- * if the fastpath succeeds, or -1 otherwise.
- */
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count)
-{
-	if (unlikely(atomic_xchg_acquire(count, 0) != 1))
-		if (likely(atomic_xchg(count, -1) != 1))
-			return -1;
-	return 0;
-}
-
-/**
- *  __mutex_fastpath_unlock - try to promote the mutex from 0 to 1
- *  @count: pointer of type atomic_t
- *  @fail_fn: function to call if the original value was not 0
- *
- * try to promote the mutex from 0 to 1. if it wasn't 0, call <function>
- * In the failure case, this function is allowed to either set the value to
- * 1, or to set it to a value lower than one.
- * If the implementation sets it to a value of lower than one, the
- * __mutex_slowpath_needs_to_unlock() macro needs to return 1, it needs
- * to return 0 otherwise.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	if (unlikely(atomic_xchg_release(count, 1) != 0))
-		fail_fn(count);
-}
-
-#define __mutex_slowpath_needs_to_unlock()		0
-
-/**
- * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
- *
- *  @count: pointer of type atomic_t
- *  @fail_fn: spinlock based trylock implementation
- *
- * Change the count from 1 to a value lower than 1, and return 0 (failure)
- * if it wasn't 1 originally, or return 1 (success) otherwise. This function
- * MUST leave the value lower than 1 even when the "1" assertion wasn't true.
- * Additionally, if the value was < 0 originally, this function must not leave
- * it to 0 on failure.
- *
- * If the architecture has no effective trylock variant, it should call the
- * <fail_fn> spinlock-based trylock variant unconditionally.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	int prev;
-
-	if (atomic_read(count) != 1)
-		return 0;
-
-	prev = atomic_xchg_acquire(count, 0);
-	if (unlikely(prev < 0)) {
-		/*
-		 * The lock was marked contended so we must restore that
-		 * state. If while doing so we get back a prev value of 1
-		 * then we just own it.
-		 *
-		 * [ In the rare case of the mutex going to 1, to 0, to -1
-		 *   and then back to 0 in this few-instructions window,
-		 *   this has the potential to trigger the slowpath for the
-		 *   owner's unlock path needlessly, but that's not a problem
-		 *   in practice. ]
-		 */
-		prev = atomic_xchg_acquire(count, prev);
-		if (prev < 0)
-			prev = 0;
-	}
-
-	return prev;
-}
-
-#endif
diff --git a/include/asm-generic/mutex.h b/include/asm-generic/mutex.h
deleted file mode 100644
index fe91ab502793..000000000000
--- a/include/asm-generic/mutex.h
+++ /dev/null
@@ -1,9 +0,0 @@
-#ifndef __ASM_GENERIC_MUTEX_H
-#define __ASM_GENERIC_MUTEX_H
-/*
- * Pull in the generic implementation for the mutex fastpath,
- * which is a reasonable default on many architectures.
- */
-
-#include <asm-generic/mutex-dec.h>
-#endif /* __ASM_GENERIC_MUTEX_H */
diff --git a/include/linux/mutex-debug.h b/include/linux/mutex-debug.h
deleted file mode 100644
index 4ac8b1977b73..000000000000
--- a/include/linux/mutex-debug.h
+++ /dev/null
@@ -1,24 +0,0 @@
-#ifndef __LINUX_MUTEX_DEBUG_H
-#define __LINUX_MUTEX_DEBUG_H
-
-#include <linux/linkage.h>
-#include <linux/lockdep.h>
-#include <linux/debug_locks.h>
-
-/*
- * Mutexes - debugging helpers:
- */
-
-#define __DEBUG_MUTEX_INITIALIZER(lockname)				\
-	, .magic = &lockname
-
-#define mutex_init(mutex)						\
-do {									\
-	static struct lock_class_key __key;				\
-									\
-	__mutex_init((mutex), #mutex, &__key);				\
-} while (0)
-
-extern void mutex_destroy(struct mutex *lock);
-
-#endif
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531e7d7a..4d3bccabbea5 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -18,6 +18,7 @@
 #include <linux/atomic.h>
 #include <asm/processor.h>
 #include <linux/osq_lock.h>
+#include <linux/debug_locks.h>
 
 /*
  * Simple, straightforward mutexes with strict semantics:
@@ -48,16 +49,12 @@
  *   locks and tasks (and only those tasks)
  */
 struct mutex {
-	/* 1: unlocked, 0: locked, negative: locked, possible waiters */
-	atomic_t		count;
+	atomic_long_t		owner;
 	spinlock_t		wait_lock;
-	struct list_head	wait_list;
-#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_MUTEX_SPIN_ON_OWNER)
-	struct task_struct	*owner;
-#endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* Spinner MCS lock */
 #endif
+	struct list_head	wait_list;
 #ifdef CONFIG_DEBUG_MUTEXES
 	void			*magic;
 #endif
@@ -66,6 +63,11 @@ struct mutex {
 #endif
 };
 
+static inline struct task_struct *__mutex_owner(struct mutex *lock)
+{
+	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~0x03);
+}
+
 /*
  * This is the control structure for tasks blocked on mutex,
  * which resides on the blocked task's kernel stack:
@@ -79,9 +81,20 @@ struct mutex_waiter {
 };
 
 #ifdef CONFIG_DEBUG_MUTEXES
-# include <linux/mutex-debug.h>
+
+#define __DEBUG_MUTEX_INITIALIZER(lockname)				\
+	, .magic = &lockname
+
+extern void mutex_destroy(struct mutex *lock);
+
 #else
+
 # define __DEBUG_MUTEX_INITIALIZER(lockname)
+
+static inline void mutex_destroy(struct mutex *lock) {}
+
+#endif
+
 /**
  * mutex_init - initialize the mutex
  * @mutex: the mutex to be initialized
@@ -90,14 +103,12 @@ struct mutex_waiter {
  *
  * It is not allowed to initialize an already locked mutex.
  */
-# define mutex_init(mutex) \
-do {							\
-	static struct lock_class_key __key;		\
-							\
-	__mutex_init((mutex), #mutex, &__key);		\
+#define mutex_init(mutex)						\
+do {									\
+	static struct lock_class_key __key;				\
+									\
+	__mutex_init((mutex), #mutex, &__key);				\
 } while (0)
-static inline void mutex_destroy(struct mutex *lock) {}
-#endif
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
@@ -107,7 +118,7 @@ static inline void mutex_destroy(struct mutex *lock) {}
 #endif
 
 #define __MUTEX_INITIALIZER(lockname) \
-		{ .count = ATOMIC_INIT(1) \
+		{ .owner = ATOMIC_LONG_INIT(0) \
 		, .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
 		, .wait_list = LIST_HEAD_INIT(lockname.wait_list) \
 		__DEBUG_MUTEX_INITIALIZER(lockname) \
@@ -127,7 +138,10 @@ extern void __mutex_init(struct mutex *lock, const char *name,
  */
 static inline int mutex_is_locked(struct mutex *lock)
 {
-	return atomic_read(&lock->count) != 1;
+	/*
+	 * XXX think about spin_is_locked
+	 */
+	return __mutex_owner(lock) != NULL;
 }
 
 /*
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 9c951fade415..9aa713629387 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -73,21 +73,8 @@ void debug_mutex_unlock(struct mutex *lock)
 {
 	if (likely(debug_locks)) {
 		DEBUG_LOCKS_WARN_ON(lock->magic != lock);
-
-		if (!lock->owner)
-			DEBUG_LOCKS_WARN_ON(!lock->owner);
-		else
-			DEBUG_LOCKS_WARN_ON(lock->owner != current);
-
 		DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
 	}
-
-	/*
-	 * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
-	 * mutexes so that we can do it here after we've verified state.
-	 */
-	mutex_clear_owner(lock);
-	atomic_set(&lock->count, 1);
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name,
diff --git a/kernel/locking/mutex-debug.h b/kernel/locking/mutex-debug.h
index 57a871ae3c81..a459faa48987 100644
--- a/kernel/locking/mutex-debug.h
+++ b/kernel/locking/mutex-debug.h
@@ -27,16 +27,6 @@ extern void debug_mutex_unlock(struct mutex *lock);
 extern void debug_mutex_init(struct mutex *lock, const char *name,
 			     struct lock_class_key *key);
 
-static inline void mutex_set_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, current);
-}
-
-static inline void mutex_clear_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, NULL);
-}
-
 #define spin_lock_mutex(lock, flags)			\
 	do {						\
 		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90db3909..d4bff1d7f92a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -33,35 +33,90 @@
  */
 #ifdef CONFIG_DEBUG_MUTEXES
 # include "mutex-debug.h"
-# include <asm-generic/mutex-null.h>
-/*
- * Must be 0 for the debug case so we do not do the unlock outside of the
- * wait_lock region. debug_mutex_unlock() will do the actual unlock in this
- * case.
- */
-# undef __mutex_slowpath_needs_to_unlock
-# define  __mutex_slowpath_needs_to_unlock()	0
 #else
 # include "mutex.h"
-# include <asm/mutex.h>
 #endif
 
 void
 __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 {
-	atomic_set(&lock->count, 1);
+	atomic_long_set(&lock->owner, 0);
 	spin_lock_init(&lock->wait_lock);
 	INIT_LIST_HEAD(&lock->wait_list);
-	mutex_clear_owner(lock);
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	osq_lock_init(&lock->osq);
 #endif
 
 	debug_mutex_init(lock, name, key);
 }
-
 EXPORT_SYMBOL(__mutex_init);
 
+#define MUTEX_FLAG_WAITERS	0x01
+
+#define MUTEX_FLAGS		0x03
+
+static inline struct task_struct *__owner_task(unsigned long owner)
+{
+	return (struct task_struct *)(owner & ~MUTEX_FLAGS);
+}
+
+static inline unsigned long __owner_flags(unsigned long owner)
+{
+	return owner & MUTEX_FLAGS;
+}
+
+/*
+ * Actual trylock that will work on any unlocked state.
+ */
+static inline bool __mutex_trylock(struct mutex *lock)
+{
+	unsigned long owner, curr = (unsigned long)current;
+
+	owner = atomic_long_read(&lock->owner);
+	for (;;) { /* must loop, can race against a flag */
+		unsigned long old;
+
+		if (__owner_task(owner)) {
+			if ((unsigned long)__owner_task(owner) == curr)
+				return true;
+
+			return false;
+		}
+
+		curr |= __owner_flags(owner);
+		old = atomic_long_cmpxchg_acquire(&lock->owner, owner, curr);
+		if (old == owner)
+			return true;
+
+		owner = old;
+	}
+}
+
+/*
+ * Optimistic trylock that only works in the uncontended case. Make sure to
+ * follow with a __mutex_trylock() before failing.
+ */
+static __always_inline bool __mutex_trylock_fast(struct mutex *lock)
+{
+	unsigned long owner, curr = (unsigned long)current;
+
+	owner = atomic_long_cmpxchg_acquire(&lock->owner, 0UL, curr);
+	if (!owner)
+		return true;
+
+	return false;
+}
+
+static inline void __mutex_set_flag(struct mutex *lock, unsigned long flag)
+{
+	atomic_long_or(flag, &lock->owner);
+}
+
+static inline void __mutex_clear_flag(struct mutex *lock, unsigned long flag)
+{
+	atomic_long_andnot(flag, &lock->owner);
+}
+
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * We split the mutex lock/unlock logic into separate fastpath and
@@ -69,7 +124,7 @@ EXPORT_SYMBOL(__mutex_init);
  * We also put the fastpath first in the kernel image, to make sure the
  * branch is predicted by the CPU as default-untaken.
  */
-__visible void __sched __mutex_lock_slowpath(atomic_t *lock_count);
+static void __sched __mutex_lock_slowpath(struct mutex *lock);
 
 /**
  * mutex_lock - acquire the mutex
@@ -95,14 +150,10 @@ __visible void __sched __mutex_lock_slowpath(atomic_t *lock_count);
 void __sched mutex_lock(struct mutex *lock)
 {
 	might_sleep();
-	/*
-	 * The locking fastpath is the 1->0 transition from
-	 * 'unlocked' into 'locked' state.
-	 */
-	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
-	mutex_set_owner(lock);
-}
 
+	if (!__mutex_trylock_fast(lock))
+		__mutex_lock_slowpath(lock);
+}
 EXPORT_SYMBOL(mutex_lock);
 #endif
 
@@ -176,7 +227,7 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
 	/*
 	 * Check if lock is contended, if not there is nobody to wake up
 	 */
-	if (likely(atomic_read(&lock->base.count) == 0))
+	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
 		return;
 
 	/*
@@ -227,7 +278,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 	bool ret = true;
 
 	rcu_read_lock();
-	while (lock->owner == owner) {
+	while (__mutex_owner(lock) == owner) {
 		/*
 		 * Ensure we emit the owner->on_cpu, dereference _after_
 		 * checking lock->owner still matches owner. If that fails,
@@ -260,7 +311,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 		return 0;
 
 	rcu_read_lock();
-	owner = READ_ONCE(lock->owner);
+	owner = __mutex_owner(lock);
 	if (owner)
 		retval = owner->on_cpu;
 	rcu_read_unlock();
@@ -272,15 +323,6 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 }
 
 /*
- * Atomically try to take the lock when it is available
- */
-static inline bool mutex_try_to_acquire(struct mutex *lock)
-{
-	return !mutex_is_locked(lock) &&
-		(atomic_cmpxchg_acquire(&lock->count, 1, 0) == 1);
-}
-
-/*
  * Optimistic spinning.
  *
  * We try to spin for acquisition when we find that the lock owner
@@ -342,12 +384,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
 		 */
-		owner = READ_ONCE(lock->owner);
+		owner = __mutex_owner(lock);
 		if (owner && !mutex_spin_on_owner(lock, owner))
 			break;
 
 		/* Try to acquire the mutex if it is unlocked. */
-		if (mutex_try_to_acquire(lock)) {
+		if (__mutex_trylock(lock)) {
 			lock_acquired(&lock->dep_map, ip);
 
 			if (use_ww_ctx) {
@@ -357,7 +399,6 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 				ww_mutex_set_context_fastpath(ww, ww_ctx);
 			}
 
-			mutex_set_owner(lock);
 			osq_unlock(&lock->osq);
 			return true;
 		}
@@ -406,8 +447,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 }
 #endif
 
-__visible __used noinline
-void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
+static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock);
 
 /**
  * mutex_unlock - release the mutex
@@ -422,21 +462,16 @@ void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
  */
 void __sched mutex_unlock(struct mutex *lock)
 {
-	/*
-	 * The unlocking fastpath is the 0->1 transition from 'locked'
-	 * into 'unlocked' state:
-	 */
-#ifndef CONFIG_DEBUG_MUTEXES
-	/*
-	 * When debugging is enabled we must not clear the owner before time,
-	 * the slow path will always be taken, and that clears the owner field
-	 * after verifying that it was indeed current.
-	 */
-	mutex_clear_owner(lock);
+	unsigned long owner;
+
+#ifdef CONFIG_DEBUG_MUTEXES
+	DEBUG_LOCKS_WARN_ON(__mutex_owner(lock) != current);
 #endif
-	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
-}
 
+	owner = atomic_long_fetch_and(MUTEX_FLAGS, &lock->owner);
+	if (__owner_flags(owner))
+		__mutex_unlock_slowpath(lock);
+}
 EXPORT_SYMBOL(mutex_unlock);
 
 /**
@@ -465,15 +500,7 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 		lock->ctx = NULL;
 	}
 
-#ifndef CONFIG_DEBUG_MUTEXES
-	/*
-	 * When debugging is enabled we must not clear the owner before time,
-	 * the slow path will always be taken, and that clears the owner field
-	 * after verifying that it was indeed current.
-	 */
-	mutex_clear_owner(&lock->base);
-#endif
-	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+	mutex_unlock(&lock->base);
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
@@ -520,7 +547,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	preempt_disable();
 	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
 
-	if (mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx)) {
+	if (__mutex_trylock(lock) || mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx)) {
 		/* got the lock, yay! */
 		preempt_enable();
 		return 0;
@@ -529,11 +556,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	spin_lock_mutex(&lock->wait_lock, flags);
 
 	/*
-	 * Once more, try to acquire the lock. Only try-lock the mutex if
-	 * it is unlocked to reduce unnecessary xchg() operations.
+	 * Once more, try to acquire the lock.
 	 */
-	if (!mutex_is_locked(lock) &&
-	    (atomic_xchg_acquire(&lock->count, 0) == 1))
+	if (__mutex_trylock(lock))
 		goto skip_wait;
 
 	debug_mutex_lock_common(lock, &waiter);
@@ -543,21 +568,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	list_add_tail(&waiter.list, &lock->wait_list);
 	waiter.task = task;
 
+	if (list_first_entry(&lock->wait_list, struct mutex_waiter, list) == &waiter) {
+		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
+		/*
+		 * We must be sure to set WAITERS before attempting the trylock
+		 * below, such that mutex_unlock() must either see our WAITERS
+		 * or we see its unlock.
+		 */
+		smp_mb__after_atomic();
+	}
+
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
-		/*
-		 * Lets try to take the lock again - this is needed even if
-		 * we get here for the first time (shortly after failing to
-		 * acquire the lock), to make sure that we get a wakeup once
-		 * it's unlocked. Later on, if we sleep, this is the
-		 * operation that gives us the lock. We xchg it to -1, so
-		 * that when we release the lock, we properly wake up the
-		 * other waiters. We only attempt the xchg if the count is
-		 * non-negative in order to avoid unnecessary xchg operations:
-		 */
-		if (atomic_read(&lock->count) >= 0 &&
-		    (atomic_xchg_acquire(&lock->count, -1) == 1))
+		if (__mutex_trylock(lock))
 			break;
 
 		/*
@@ -587,13 +611,13 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	mutex_remove_waiter(lock, &waiter, task);
 	/* set it to 0 if there are no waiters left: */
 	if (likely(list_empty(&lock->wait_list)))
-		atomic_set(&lock->count, 0);
+		__mutex_clear_flag(lock, MUTEX_FLAG_WAITERS);
+
 	debug_mutex_free_waiter(&waiter);
 
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
-	mutex_set_owner(lock);
 
 	if (use_ww_ctx) {
 		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
@@ -631,7 +655,6 @@ _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest)
 	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
 			    0, nest, _RET_IP_, NULL, 0);
 }
-
 EXPORT_SYMBOL_GPL(_mutex_lock_nest_lock);
 
 int __sched
@@ -650,7 +673,6 @@ mutex_lock_interruptible_nested(struct mutex *lock, unsigned int subclass)
 	return __mutex_lock_common(lock, TASK_INTERRUPTIBLE,
 				   subclass, NULL, _RET_IP_, NULL, 0);
 }
-
 EXPORT_SYMBOL_GPL(mutex_lock_interruptible_nested);
 
 static inline int
@@ -715,29 +737,13 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
 /*
  * Release the lock, slowpath:
  */
-static inline void
-__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
+static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock)
 {
 	unsigned long flags;
 	WAKE_Q(wake_q);
 
-	/*
-	 * As a performance measurement, release the lock before doing other
-	 * wakeup related duties to follow. This allows other tasks to acquire
-	 * the lock sooner, while still handling cleanups in past unlock calls.
-	 * This can be done as we do not enforce strict equivalence between the
-	 * mutex counter and wait_list.
-	 *
-	 *
-	 * Some architectures leave the lock unlocked in the fastpath failure
-	 * case, others need to leave it locked. In the later case we have to
-	 * unlock it here - as the lock counter is currently 0 or negative.
-	 */
-	if (__mutex_slowpath_needs_to_unlock())
-		atomic_set(&lock->count, 1);
-
 	spin_lock_mutex(&lock->wait_lock, flags);
-	mutex_release(&lock->dep_map, nested, _RET_IP_);
+	mutex_release(&lock->dep_map, 0, _RET_IP_);
 	debug_mutex_unlock(lock);
 
 	if (!list_empty(&lock->wait_list)) {
@@ -754,17 +760,6 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 	wake_up_q(&wake_q);
 }
 
-/*
- * Release the lock, slowpath:
- */
-__visible void
-__mutex_unlock_slowpath(atomic_t *lock_count)
-{
-	struct mutex *lock = container_of(lock_count, struct mutex, count);
-
-	__mutex_unlock_common_slowpath(lock, 1);
-}
-
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * Here come the less common (and hence less performance-critical) APIs:
@@ -789,38 +784,29 @@ __mutex_lock_interruptible_slowpath(struct mutex *lock);
  */
 int __sched mutex_lock_interruptible(struct mutex *lock)
 {
-	int ret;
-
 	might_sleep();
-	ret =  __mutex_fastpath_lock_retval(&lock->count);
-	if (likely(!ret)) {
-		mutex_set_owner(lock);
+
+	if (__mutex_trylock_fast(lock))
 		return 0;
-	} else
-		return __mutex_lock_interruptible_slowpath(lock);
+
+	return __mutex_lock_interruptible_slowpath(lock);
 }
 
 EXPORT_SYMBOL(mutex_lock_interruptible);
 
 int __sched mutex_lock_killable(struct mutex *lock)
 {
-	int ret;
-
 	might_sleep();
-	ret = __mutex_fastpath_lock_retval(&lock->count);
-	if (likely(!ret)) {
-		mutex_set_owner(lock);
+
+	if (__mutex_trylock_fast(lock))
 		return 0;
-	} else
-		return __mutex_lock_killable_slowpath(lock);
+
+	return __mutex_lock_killable_slowpath(lock);
 }
 EXPORT_SYMBOL(mutex_lock_killable);
 
-__visible void __sched
-__mutex_lock_slowpath(atomic_t *lock_count)
+static void __sched __mutex_lock_slowpath(struct mutex *lock)
 {
-	struct mutex *lock = container_of(lock_count, struct mutex, count);
-
 	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0,
 			    NULL, _RET_IP_, NULL, 0);
 }
@@ -856,37 +842,6 @@ __ww_mutex_lock_interruptible_slowpath(struct ww_mutex *lock,
 
 #endif
 
-/*
- * Spinlock based trylock, we take the spinlock and check whether we
- * can get the lock:
- */
-static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
-{
-	struct mutex *lock = container_of(lock_count, struct mutex, count);
-	unsigned long flags;
-	int prev;
-
-	/* No need to trylock if the mutex is locked. */
-	if (mutex_is_locked(lock))
-		return 0;
-
-	spin_lock_mutex(&lock->wait_lock, flags);
-
-	prev = atomic_xchg_acquire(&lock->count, -1);
-	if (likely(prev == 1)) {
-		mutex_set_owner(lock);
-		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-	}
-
-	/* Set it back to 0 if there are no waiters: */
-	if (likely(list_empty(&lock->wait_list)))
-		atomic_set(&lock->count, 0);
-
-	spin_unlock_mutex(&lock->wait_lock, flags);
-
-	return prev == 1;
-}
-
 /**
  * mutex_trylock - try to acquire the mutex, without waiting
  * @lock: the mutex to be acquired
@@ -903,13 +858,7 @@ static inline int __mutex_trylock_slowpath(atomic_t *lock_count)
  */
 int __sched mutex_trylock(struct mutex *lock)
 {
-	int ret;
-
-	ret = __mutex_fastpath_trylock(&lock->count, __mutex_trylock_slowpath);
-	if (ret)
-		mutex_set_owner(lock);
-
-	return ret;
+	return __mutex_trylock(lock);
 }
 EXPORT_SYMBOL(mutex_trylock);
 
@@ -917,36 +866,28 @@ EXPORT_SYMBOL(mutex_trylock);
 int __sched
 __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-	int ret;
-
 	might_sleep();
 
-	ret = __mutex_fastpath_lock_retval(&lock->base.count);
-
-	if (likely(!ret)) {
+	if (__mutex_trylock_fast(&lock->base)) {
 		ww_mutex_set_context_fastpath(lock, ctx);
-		mutex_set_owner(&lock->base);
-	} else
-		ret = __ww_mutex_lock_slowpath(lock, ctx);
-	return ret;
+		return 0;
+	}
+
+	return __ww_mutex_lock_slowpath(lock, ctx);
 }
 EXPORT_SYMBOL(__ww_mutex_lock);
 
 int __sched
 __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-	int ret;
-
 	might_sleep();
 
-	ret = __mutex_fastpath_lock_retval(&lock->base.count);
-
-	if (likely(!ret)) {
+	if (__mutex_trylock_fast(&lock->base)) {
 		ww_mutex_set_context_fastpath(lock, ctx);
-		mutex_set_owner(&lock->base);
-	} else
-		ret = __ww_mutex_lock_interruptible_slowpath(lock, ctx);
-	return ret;
+		return 0;
+	}
+
+	return __ww_mutex_lock_interruptible_slowpath(lock, ctx);
 }
 EXPORT_SYMBOL(__ww_mutex_lock_interruptible);
 
diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 6cd6b8e9efd7..4410a4af42a3 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -16,32 +16,6 @@
 #define mutex_remove_waiter(lock, waiter, task) \
 		__list_del((waiter)->list.prev, (waiter)->list.next)
 
-#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-/*
- * The mutex owner can get read and written to locklessly.
- * We should use WRITE_ONCE when writing the owner value to
- * avoid store tearing, otherwise, a thread could potentially
- * read a partially written and incomplete owner value.
- */
-static inline void mutex_set_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, current);
-}
-
-static inline void mutex_clear_owner(struct mutex *lock)
-{
-	WRITE_ONCE(lock->owner, NULL);
-}
-#else
-static inline void mutex_set_owner(struct mutex *lock)
-{
-}
-
-static inline void mutex_clear_owner(struct mutex *lock)
-{
-}
-#endif
-
 #define debug_mutex_wake_waiter(lock, waiter)		do { } while (0)
 #define debug_mutex_free_waiter(waiter)			do { } while (0)
 #define debug_mutex_add_waiter(lock, waiter, ti)	do { } while (0)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 69243142cad1..0154b10ea614 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -75,11 +75,11 @@
 #include <linux/compiler.h>
 #include <linux/frame.h>
 #include <linux/prefetch.h>
+#include <linux/mutex.h>
 
 #include <asm/switch_to.h>
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
-#include <asm/mutex.h>
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #endif
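
For readers following the diff above, here is a minimal user-space sketch of the
owner-word scheme it introduces: the low two bits of mutex::owner carry flags
(only MUTEX_FLAG_WAITERS is used by this patch; the second bit is reserved) and
the remaining bits are the owning task pointer. The sketch uses C11 atomics and
a fake task address, so everything named sketch_* or fake_* is illustrative
naming rather than kernel API, and the wakeup slowpath is reduced to a
placeholder.

/*
 * Minimal sketch of the owner-word encoding, assuming C11 atomics.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define MUTEX_FLAG_WAITERS	0x01UL
#define MUTEX_FLAGS		0x03UL

struct sketch_mutex {
	atomic_ulong owner;	/* task pointer | flag bits, 0 when unlocked */
};

static unsigned long owner_task(unsigned long owner)
{
	return owner & ~MUTEX_FLAGS;
}

static unsigned long owner_flags(unsigned long owner)
{
	return owner & MUTEX_FLAGS;
}

/* Trylock that works from any unlocked state, even with flag bits set. */
static bool sketch_trylock(struct sketch_mutex *lock, unsigned long curr)
{
	unsigned long owner = atomic_load(&lock->owner);

	for (;;) {	/* must loop: a flag update can race with us */
		unsigned long new = curr | owner_flags(owner);

		if (owner_task(owner))
			return owner_task(owner) == curr;

		/* on failure, 'owner' is refreshed with the current value */
		if (atomic_compare_exchange_weak(&lock->owner, &owner, new))
			return true;
	}
}

/*
 * Unlock fastpath: clear only the task bits, keep the flags.  If a flag
 * was set (a waiter queued itself and set WAITERS), fall into a slowpath
 * that would take the wait lock and wake the first waiter.
 */
static void sketch_unlock(struct sketch_mutex *lock)
{
	unsigned long owner = atomic_fetch_and(&lock->owner, MUTEX_FLAGS);

	if (owner_flags(owner))
		printf("unlock slowpath: wake a waiter\n");	/* placeholder */
}

int main(void)
{
	static long fake_task;	/* stands in for current; suitably aligned */
	unsigned long me = (unsigned long)&fake_task;
	struct sketch_mutex m = { .owner = 0 };

	printf("first trylock:  %d\n", sketch_trylock(&m, me));	/* 1 */
	printf("second trylock: %d\n", sketch_trylock(&m, me));	/* 1, same owner */
	atomic_fetch_or(&m.owner, MUTEX_FLAG_WAITERS);	/* a waiter queued up */
	sketch_unlock(&m);	/* sees WAITERS, takes the "slowpath" */
	printf("owner word now: %#lx\n", atomic_load(&m.owner));	/* 0x1 */
	return 0;
}

The property carried over from the patch is that unlock clears only the task
bits with an atomic fetch_and, so a WAITERS bit set by a queued waiter cannot
be lost: either the unlocker sees the flag and takes the slowpath, or the
waiter's subsequent trylock sees the lock already released -- the
smp_mb__after_atomic() added in __mutex_lock_common() enforces exactly that
ordering.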

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-25 15:43       ` Peter Zijlstra
@ 2016-08-25 16:33         ` Waiman Long
  2016-08-25 16:35           ` Peter Zijlstra
  2016-08-25 19:11         ` huang ying
  1 sibling, 1 reply; 34+ messages in thread
From: Waiman Long @ 2016-08-25 16:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Davidlohr Bueso, Linus Torvalds, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On 08/25/2016 11:43 AM, Peter Zijlstra wrote:
> On Tue, Aug 23, 2016 at 06:13:43PM -0700, Jason Low wrote:
>> I tested this patch on an 8 socket system with the high_systime AIM7
>> workload with diskfs. The patch provided big performance improvements in
>> terms of throughput in the highly contended cases.
>>
>> -------------------------------------------------
>> |  users      | avg throughput | avg throughput |
>>                | without patch  | with patch     |
>> -------------------------------------------------
>> | 10 - 90     |   13,943 JPM   |   14,432 JPM   |
>> -------------------------------------------------
>> | 100 - 900   |   75,475 JPM   |  102,922 JPM   |
>> -------------------------------------------------
>> | 1000 - 1900 |   77,299 JPM   |  115,271 JPM   |
>> -------------------------------------------------
>>
>> Unfortunately, at 2000 users, the modified kernel locked up.
>>
>> # INFO: task reaim:<#>  blocked for more than 120 seconds.
>>
>> So something appears to be buggy.
> So with the previously given changes to reaim, I get the below results
> on my 4 socket Haswell with the new version of 1/3 (also below).
>
> I still need to update 3/3..
>
> Note that I think my reaim change wrecked the jobs/min calculation
> somehow, as it keeps increasing. I do think however that the numbers are
> comparable between runs, since they're wrecked the same way.

The performance data for the two kernels were roughly the same. This was
what I had been expecting, as there was no change in the algorithm for how
the slowpath was handled. So I was surprised by Jason's result yesterday
showing such a big difference.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-25 16:33         ` Waiman Long
@ 2016-08-25 16:35           ` Peter Zijlstra
  2016-08-27 18:27             ` Ingo Molnar
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-25 16:35 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jason Low, Davidlohr Bueso, Linus Torvalds, Ding Tianhong,
	Thomas Gleixner, Will Deacon, Ingo Molnar, Imre Deak,
	Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2

On Thu, Aug 25, 2016 at 12:33:04PM -0400, Waiman Long wrote:
> On 08/25/2016 11:43 AM, Peter Zijlstra wrote:
> >On Tue, Aug 23, 2016 at 06:13:43PM -0700, Jason Low wrote:
> >>I tested this patch on an 8 socket system with the high_systime AIM7
> >>workload with diskfs. The patch provided big performance improvements in
> >>terms of throughput in the highly contended cases.
> >>
> >>-------------------------------------------------
> >>|  users      | avg throughput | avg throughput |
> >>               | without patch  | with patch     |
> >>-------------------------------------------------
> >>| 10 - 90     |   13,943 JPM   |   14,432 JPM   |
> >>-------------------------------------------------
> >>| 100 - 900   |   75,475 JPM   |  102,922 JPM   |
> >>-------------------------------------------------
> >>| 1000 - 1900 |   77,299 JPM   |  115,271 JPM   |
> >>-------------------------------------------------
> >>
> >>Unfortunately, at 2000 users, the modified kernel locked up.
> >>
> >># INFO: task reaim:<#>  blocked for more than 120 seconds.
> >>
> >>So something appears to be buggy.
> >So with the previously given changes to reaim, I get the below results
> >on my 4 socket Haswell with the new version of 1/3 (also below).
> >
> >I still need to update 3/3..
> >
> >Note that I think my reaim change wrecked the jobs/min calculation
> >somehow, as it keeps increasing. I do think however that the numbers are
> >comparable between runs, since they're wrecked the same way.
> 
> The performance data for the 2 kernels were roughly the same. This was what
> I had been expecting as there was no change in algorithm in how the slowpath
> was being handled. So I was surprised by Jason's result yesterday showing
> such a big difference.

It's because the mutex wasn't quite exclusive enough :-) If you let in
multiple owners, like with that race you found, you get big gains in
throughput ...
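
To make the "not quite exclusive enough" point concrete, here is a hypothetical
sketch (C11 atomics, illustrative names; this is not the specific race found in
the RFC) contrasting a check-then-store acquire, which can let two contenders
in, with a compare-exchange acquire like __mutex_trylock() in the patch, which
cannot. Admitting extra owners removes blocking, which is exactly why broken
exclusivity shows up as a throughput "win".

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_ulong owner;

/* Broken: two callers can both observe 0 and then both store their own id. */
static bool racy_trylock(unsigned long curr)
{
	if (atomic_load(&owner) != 0)
		return false;
	atomic_store(&owner, curr);	/* the later store silently wins */
	return true;
}

/* Exclusive: only the caller whose compare-exchange succeeds gets the lock. */
static bool exclusive_trylock(unsigned long curr)
{
	unsigned long expected = 0;

	return atomic_compare_exchange_strong(&owner, &expected, curr);
}

int main(void)
{
	printf("exclusive, first caller:  %d\n", exclusive_trylock(0x100));	/* 1 */
	printf("exclusive, second caller: %d\n", exclusive_trylock(0x200));	/* 0 */
	printf("racy path: %d\n", racy_trylock(0x300));	/* 0 here, but two
							   threads can both get 1 */
	return 0;
}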

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-25 15:43       ` Peter Zijlstra
  2016-08-25 16:33         ` Waiman Long
@ 2016-08-25 19:11         ` huang ying
  2016-08-25 19:26           ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: huang ying @ 2016-08-25 19:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Davidlohr Bueso, Linus Torvalds, Waiman Long,
	Ding Tianhong, Thomas Gleixner, Will Deacon, Ingo Molnar,
	Imre Deak, Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	Jason Low

Hi, Peter,

Do you have a git tree branch for this patchset? We want to test it in
the 0day performance test. That will make it a little easier.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-25 19:11         ` huang ying
@ 2016-08-25 19:26           ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-08-25 19:26 UTC (permalink / raw)
  To: huang ying
  Cc: Jason Low, Davidlohr Bueso, Linus Torvalds, Waiman Long,
	Ding Tianhong, Thomas Gleixner, Will Deacon, Ingo Molnar,
	Imre Deak, Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	Jason Low

On Thu, Aug 25, 2016 at 12:11:25PM -0700, huang ying wrote:
> Hi, Peter,
> 
> Do you have a git tree branch for this patchset? We want to test it in
> the 0day performance test. That will make it a little easier.

I just pushed it out to:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/rfc

But as stated, there are still a number of known issues; for example,
lockdep builds will not work.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
  2016-08-25 16:35           ` Peter Zijlstra
@ 2016-08-27 18:27             ` Ingo Molnar
  0 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2016-08-27 18:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Jason Low, Davidlohr Bueso, Linus Torvalds,
	Ding Tianhong, Thomas Gleixner, Will Deacon, Ingo Molnar,
	Imre Deak, Linux Kernel Mailing List, Tim Chen, Paul E. McKenney,
	jason.low2


* Peter Zijlstra <peterz@infradead.org> wrote:

> It's because the mutex wasn't quite exclusive enough :-) If you let in multiple
> owners, like with that race you found, you get big gains in throughput ...

Btw., do we know which mutex that was?

That it didn't crash with a full AIM run suggests that whatever it is protecting
could probably be parallelized some more while still having a mostly working
kernel! ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread


Thread overview: 34+ messages
2016-08-23 12:46 [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Peter Zijlstra
2016-08-23 12:46 ` [RFC][PATCH 1/3] locking/mutex: Rework mutex::owner Peter Zijlstra
2016-08-23 19:55   ` Waiman Long
2016-08-23 20:52     ` Tim Chen
2016-08-23 21:03       ` Peter Zijlstra
2016-08-23 21:09     ` Peter Zijlstra
2016-08-23 20:17   ` Waiman Long
2016-08-23 20:31     ` Peter Zijlstra
2016-08-24  9:56   ` Will Deacon
2016-08-24 15:34     ` Peter Zijlstra
2016-08-24 16:52       ` Peter Zijlstra
2016-08-24 16:54         ` Will Deacon
2016-08-23 12:46 ` [RFC][PATCH 2/3] locking/mutex: Allow MUTEX_SPIN_ON_OWNER when DEBUG_MUTEXES Peter Zijlstra
2016-08-23 12:46 ` [RFC][PATCH 3/3] locking/mutex: Add lock handoff to avoid starvation Peter Zijlstra
2016-08-23 12:56   ` Peter Zijlstra
     [not found]   ` <57BCA869.1050501@hpe.com>
2016-08-23 20:32     ` Peter Zijlstra
2016-08-24 19:50       ` Waiman Long
2016-08-25  8:11         ` Peter Zijlstra
2016-08-23 16:17 ` [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex Davidlohr Bueso
2016-08-23 16:35   ` Jason Low
2016-08-23 16:57     ` Peter Zijlstra
2016-08-23 19:36       ` Waiman Long
2016-08-23 20:41         ` Peter Zijlstra
2016-08-23 22:34           ` Waiman Long
2016-08-24  1:13     ` Jason Low
2016-08-25 12:32       ` Peter Zijlstra
2016-08-25 15:43       ` Peter Zijlstra
2016-08-25 16:33         ` Waiman Long
2016-08-25 16:35           ` Peter Zijlstra
2016-08-27 18:27             ` Ingo Molnar
2016-08-25 19:11         ` huang ying
2016-08-25 19:26           ` Peter Zijlstra
2016-08-23 18:53   ` Linus Torvalds
2016-08-23 20:34     ` Peter Zijlstra
