Subject: Re: [PATCH 1/5] locking: Introduce range reader/writer lock
From: Laurent Dufour
To: Davidlohr Bueso, mingo@kernel.org, peterz@infradead.org, akpm@linux-foundation.org
Cc: jack@suse.cz, kirill.shutemov@linux.intel.com, mhocko@suse.com, mgorman@techsingularity.net, linux-kernel@vger.kernel.org, Davidlohr Bueso
Date: Tue, 28 Mar 2017 12:00:20 +0200
Message-Id: <2f7628f4-58e1-22c4-ccbe-3106c15cb405@linux.vnet.ibm.com>
In-Reply-To: <1488863010-13028-2-git-send-email-dave@stgolabs.net>
References: <1488863010-13028-1-git-send-email-dave@stgolabs.net> <1488863010-13028-2-git-send-email-dave@stgolabs.net>

On 07/03/2017 06:03, Davidlohr Bueso wrote:
> This implements a sleepable range rwlock, based on an interval tree, serializing
> conflicting/intersecting/overlapping ranges within the tree. The largest range
> is given by [0, ~0 - 1] (inclusive). Unlike traditional locks, range locking
> involves dealing with the tree itself and the range to be locked, normally
> stack allocated and always explicitly prepared/initialized by the user in a
> sorted [a0, a1] manner, a0 <= a1, before actually taking the lock.
>
> We allow exclusive locking of arbitrary ranges. We guarantee that each
> range is locked only after all conflicting range locks requested previously
> have been unlocked. Thus we achieve fairness and avoid livelocks.
>
> When a new range lock is requested, we add its interval to the tree and store
> the number of intervals intersecting it in 'blocking_ranges'. When a range is
> unlocked, we again walk the intervals that intersect with the unlocked one and
> decrement their 'blocking_ranges'. We wake up the owner of any range lock whose
> 'blocking_ranges' drops to 0.
>
> For the shared case, 'blocking_ranges' is only incremented if the
> intersecting range is not marked as a reader. In order to mitigate some of
> the tree walk overhead for non-intersecting ranges, the tree's min/max values
> are maintained and consulted in O(1) in the fastpath.
>
> How much does it cost:
> ----------------------
>
> The cost of lock and unlock of a range is O(log(R_all) + R_int), where R_all is
> the total number of ranges and R_int is the number of ranges intersecting the
> new range to be added.
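(Not part of the patch, but to make the calling convention described above concrete, here is a minimal usage sketch based only on the API this patch declares below: DEFINE_RANGE_RWLOCK_TREE(), range_rwlock_init() and the range_read/write_lock()/unlock() calls. The tree name, the foo_* functions and the range bounds are invented for illustration.)

#include <linux/range_rwlock.h>

/* One tree per object whose ranges we want to serialize. */
static DEFINE_RANGE_RWLOCK_TREE(foo_range_tree);

static void foo_write_region(unsigned long first, unsigned long last)
{
        struct range_rwlock range;              /* normally stack allocated */

        /* Caller prepares a sorted range, first <= last. */
        range_rwlock_init(&range, first, last);

        /* Sleeps until no previously requested conflicting range remains. */
        range_write_lock(&foo_range_tree, &range);

        /* ... exclusive access to [first, last] ... */

        range_write_unlock(&foo_range_tree, &range);
}

static void foo_read_region(unsigned long first, unsigned long last)
{
        struct range_rwlock range;

        range_rwlock_init(&range, first, last);
        range_read_lock(&foo_range_tree, &range);       /* shared with other readers */

        /* ... read access to [first, last] ... */

        range_read_unlock(&foo_range_tree, &range);
}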
>
> Due to its shareable nature, full range locks can be compared with rw-semaphores,
> which also serve as a comparison from a mutex standpoint, as writer-only
> situations are pretty similar nowadays.
>
> The first difference is the memory footprint: rwsems are larger than tree locks, 40 vs
> 24 bytes, but the latter requires an additional 64 bytes of stack for the range
> structure.
>
> Secondly, because every range call is serialized by the tree->lock, any lock()
> fastpath will at least have an interval_tree_insert() and spinlock lock+unlock
> overhead compared to a single atomic insn in the case of rwsems. The same obviously
> applies to the unlock() case.
>
> The torture module was used to measure 1-1 differences in lock acquisition with
> increasing core counts over a period of 10 minutes. Readers and writers are
> interleaved, with a slight advantage to writers as it's the first kthread that is
> created. The following shows the avg ops/minute with various thread setups on
> boxes with small and large core counts.
>
> ** 4-core AMD Opteron **
> (write-only)
> rwsem-2thr: 4198.5, stddev: 7.77
> range-2thr: 4199.1, stddev: 0.73
>
> rwsem-4thr: 6036.8, stddev: 50.91
> range-4thr: 6004.9, stddev: 126.57
>
> rwsem-8thr: 6245.6, stddev: 59.39
> range-8thr: 6229.3, stddev: 10.60
>
> (read-only)
> rwsem-2thr: 5930.7, stddev: 21.92
> range-2thr: 5917.3, stddev: 25.45
>
> rwsem-4thr: 9881.6, stddev: 0.70
> range-4thr: 9540.2, stddev: 98.28
>
> rwsem-8thr: 11633.2, stddev: 7.72
> range-8thr: 11314.7, stddev: 62.22
>
> For the read/write-only cases, there is very little difference between the range lock
> and rwsems, with up to a 3% hit, which could very well be considered in the noise range.
>
> (read-write)
> rwsem-write-1thr: 1744.8, stddev: 11.59
> rwsem-read-1thr: 1043.1, stddev: 3.97
> range-write-1thr: 1740.2, stddev: 5.99
> range-read-1thr: 1022.5, stddev: 6.41
>
> rwsem-write-2thr: 1662.5, stddev: 0.70
> rwsem-read-2thr: 1278.0, stddev: 25.45
> range-write-2thr: 1321.5, stddev: 51.61
> range-read-2thr: 1243.5, stddev: 30.40
>
> rwsem-write-4thr: 1761.0, stddev: 11.31
> rwsem-read-4thr: 1426.0, stddev: 7.07
> range-write-4thr: 1417.0, stddev: 29.69
> range-read-4thr: 1398.0, stddev: 56.56
>
> While a single reader and a single writer thread do not show much difference, increasing
> core counts shows that in reader/writer workloads, writer threads can take a hit in
> raw performance of up to ~20%, while reader throughput is quite similar
> between both locks.
>
> ** 240-core (ht) IvyBridge **
> (write-only)
> rwsem-120thr: 6844.5, stddev: 82.73
> range-120thr: 6070.5, stddev: 85.55
>
> rwsem-240thr: 6292.5, stddev: 146.3
> range-240thr: 6099.0, stddev: 15.55
>
> rwsem-480thr: 6164.8, stddev: 33.94
> range-480thr: 6062.3, stddev: 19.79
>
> (read-only)
> rwsem-120thr: 136860.4, stddev: 2539.92
> range-120thr: 138052.2, stddev: 327.39
>
> rwsem-240thr: 235297.5, stddev: 2220.50
> range-240thr: 232099.1, stddev: 3614.72
>
> rwsem-480thr: 272683.0, stddev: 3924.32
> range-480thr: 256539.2, stddev: 9541.69
>
> Similar to the small box, larger machines show that range locks take only a minor
> (up to ~6% for 480 threads) hit even in completely exclusive or shared scenarios.
>
> (read-write)
> rwsem-write-60thr: 4658.1, stddev: 1303.19
> rwsem-read-60thr: 1108.7, stddev: 718.42
> range-write-60thr: 3203.6, stddev: 139.30
> range-read-60thr: 1852.8, stddev: 147.5
>
> rwsem-write-120thr: 3971.3, stddev: 1413.0
> rwsem-read-120thr: 1038.8, stddev: 353.51
> range-write-120thr: 2282.1, stddev: 207.18
> range-read-120thr: 1856.5, stddev: 198.69
>
> rwsem-write-240thr: 4112.7, stddev: 2448.1
> rwsem-read-240thr: 1277.4, stddev: 430.30
> range-write-240thr: 2353.1, stddev: 502.04
> range-read-240thr: 1551.5, stddev: 361.33
>
> When mixing readers and writers, writer throughput can take a hit of up to ~40%,
> similar to the 4-core machine; however, reader threads can increase the number of
> acquisitions by up to ~80%. In any case, the overall writer+reader throughput will
> always be higher for rwsems. A huge factor in this behavior is that range locks
> do not have the writer spin-on-owner feature.
>
> On both machines, when actually testing threads acquiring different ranges, the
> range lock will always outperform the rwsem in throughput, due to the increased
> parallelism; which is no surprise either. As such, microbenchmarks that merely
> pound on a lock will pretty much always suffer upon direct lock conversions,
> but not enough to matter in the overall picture.
>
> Signed-off-by: Davidlohr Bueso
> ---
> drivers/gpu/drm/Kconfig | 2 -
> drivers/gpu/drm/i915/Kconfig | 1 -
> include/linux/range_rwlock.h | 96 +++++++++
> kernel/locking/Makefile | 2 +-
> kernel/locking/range_rwlock.c | 462 ++++++++++++++++++++++++++++++++++++++++++
> lib/Kconfig | 14 --
> lib/Kconfig.debug | 1 -
> lib/Makefile | 2 +-
> 8 files changed, 560 insertions(+), 20 deletions(-)
> create mode 100644 include/linux/range_rwlock.h
> create mode 100644 kernel/locking/range_rwlock.c
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 88e01e08e279..e4d9eadd2c47 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -154,7 +154,6 @@ config DRM_RADEON
> select HWMON
> select BACKLIGHT_CLASS_DEVICE
> select BACKLIGHT_LCD_SUPPORT
> - select INTERVAL_TREE
> help
> Choose this option if you have an ATI Radeon graphics card. There
> are both PCI and AGP versions. You don't need to choose this to
> @@ -174,7 +173,6 @@ config DRM_AMDGPU
> select HWMON
> select BACKLIGHT_CLASS_DEVICE
> select BACKLIGHT_LCD_SUPPORT
> - select INTERVAL_TREE
> help
> Choose this option if you have a recent AMD Radeon graphics card.
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 183f5dc1c3f2..8a9154550f46 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -3,7 +3,6 @@ config DRM_I915
> depends on DRM
> depends on X86 && PCI
> select INTEL_GTT
> - select INTERVAL_TREE
> # we need shmfs for the swappable backing store, and in particular
> # the shmem_readpage() which depends upon tmpfs
> select SHMEM
> diff --git a/include/linux/range_rwlock.h b/include/linux/range_rwlock.h
> new file mode 100644
> index 000000000000..174e13668f67
> --- /dev/null
> +++ b/include/linux/range_rwlock.h
> @@ -0,0 +1,96 @@
> +/*
> + * Range locking
> + *
> + * We allow exclusive locking of arbitrary ranges. We guarantee that each
> + * range is locked only after all conflicting range locks requested previously
> + * have been unlocked. Thus we achieve fairness and avoid livelocks.
> + * > + * The cost of lock and unlock of a range is O(log(R_all)+R_int) where R_all is > + * total number of ranges and R_int is the number of ranges intersecting the > + * operated range. > + */ > +#ifndef _LINUX_RANGE_RWLOCK_H > +#define _LINUX_RANGE_RWLOCK_H > + > +#include > +#include > +#include > +#include > + > +/* > + * The largest range will span [0,RANGE_RWLOCK_INFINITY]. > + */ > +#define RANGE_RWLOCK_INFINITY (~0UL - 1) > + > +struct range_rwlock { > + struct interval_tree_node node; > + struct task_struct *task; > + /* Number of ranges which are blocking acquisition of the lock */ > + unsigned int blocking_ranges; > + bool reader; > +}; > + > +struct range_rwlock_tree { > + struct rb_root root; > + spinlock_t lock; > + struct interval_tree_node *leftmost; /* compute smallest 'start' */ > +}; > + > +#define __RANGE_RWLOCK_TREE_INITIALIZER(name) \ > + { .leftmost = NULL \ > + , .root = RB_ROOT \ > + , .lock = __SPIN_LOCK_UNLOCKED(name.lock) } > + > +#define DEFINE_RANGE_RWLOCK_TREE(name) \ > + struct range_rwlock_tree name = __RANGE_RWLOCK_TREE_INITIALIZER(name) > + > +#define __RANGE_RWLOCK_INITIALIZER(start, last) { \ > + .node = { \ > + .start = (start) \ > + ,.last = (last) \ > + } \ > + } Hi Davidlohr, This macro doesn't expand correctly because the field name ".start" is replaced by the start parameter. Should rather be : #define __RANGE_RWLOCK_INITIALIZER(__start, __last) { \ .node = { \ .start = (__start) \ ,.last = (__last) \ } \ } By the way, should the other fields set as in __range_rwlock_init() ? > + > +#define DEFINE_RANGE_RWLOCK(name, start, last) \ > + struct range_rwlock name = __RANGE_RWLOCK_INITIALIZER((start), (last)) > + > +#define DEFINE_RANGE_RWLOCK_INF(name) \ > + struct range_rwlock name = __RANGE_RWLOCK_INITIALIZER(0, RANGE_RWLOCK_INFINITY) > + > +static inline void range_rwlock_tree_init(struct range_rwlock_tree *tree) > +{ > + tree->root = RB_ROOT; > + spin_lock_init(&tree->lock); > + tree->leftmost = NULL; > +} > + > +void range_rwlock_init(struct range_rwlock *lock, > + unsigned long start, unsigned long last); > +void range_rwlock_init_inf(struct range_rwlock *lock); > + > +/* > + * lock for reading > + */ > +void range_read_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock); > +int range_read_lock_interruptible(struct range_rwlock_tree *tree, > + struct range_rwlock *lock); > +int range_read_lock_killable(struct range_rwlock_tree *tree, > + struct range_rwlock *lock); > +int range_read_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock); > +void range_read_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock); > + > +/* > + * lock for writing > + */ > +void range_write_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock); > +int range_write_lock_interruptible(struct range_rwlock_tree *tree, > + struct range_rwlock *lock); > +int range_write_lock_killable(struct range_rwlock_tree *tree, > + struct range_rwlock *lock); > +int range_read_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock); ^^^^ range_write_trylock(...) isn't it ? Cheers, Laurent. 
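Coming back to the __RANGE_RWLOCK_INITIALIZER() remark above, here is a reduced, self-contained illustration of the designator clash (the struct and macro names below are made up; only the .start/.last designators mirror the patch):

struct point { int start, last; };

/* Broken: the parameter named 'start' is also substituted into the '.start'
 * designator, so POINT_INIT_BROKEN(1, 2) expands to { .1 = (1), .2 = (2) },
 * which does not compile. The same happens with __RANGE_RWLOCK_INITIALIZER(). */
#define POINT_INIT_BROKEN(start, last)  { .start = (start), .last = (last) }

/* Fixed, mirroring the __start/__last rename suggested above. */
#define POINT_INIT(__start, __last)     { .start = (__start), .last = (__last) }

static struct point p = POINT_INIT(1, 2);       /* expands to { .start = (1), .last = (2) } */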
> +void range_write_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock); > + > +void range_downgrade_write(struct range_rwlock_tree *tree, > + struct range_rwlock *lock); > + > +#endif > diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile > index 760158d9d98d..bf054d0af82b 100644 > --- a/kernel/locking/Makefile > +++ b/kernel/locking/Makefile > @@ -2,7 +2,7 @@ > # and is generally not a function of system call inputs. > KCOV_INSTRUMENT := n > > -obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o > +obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o range_rwlock.o > > ifdef CONFIG_FUNCTION_TRACER > CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE) > diff --git a/kernel/locking/range_rwlock.c b/kernel/locking/range_rwlock.c > new file mode 100644 > index 000000000000..4d92baa9bd80 > --- /dev/null > +++ b/kernel/locking/range_rwlock.c > @@ -0,0 +1,462 @@ > +/* > + * Implementation of read/write range locks. > + * > + * We keep interval tree of locked and to-be-locked ranges. When new range lock > + * is requested, we add its interval to the tree and store number of intervals > + * intersecting it to 'blocking_ranges'. > + * > + * When a range is unlocked, we again walk intervals that intersect with the > + * unlocked one and decrement their 'blocking_ranges'. We wake up owner of any > + * range lock whose 'blocking_ranges' drops to 0. For the shared case, the > + * 'blocking_ranges' is only incremented if the intersecting range is not marked > + * as a reader. In order to mitigate some of the tree walk overhead for > + * non-intersecting ranges, the tree's min/max values are maintained and consulted > + * in O(1) in the fastpath. > + */ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > + > +#define range_entry(ptr, type, member) container_of(ptr, type, member) > + > +/* > + * Range intersection/overlap between A: [a0, a1] and B: [b0, b1] > + * is given by: > + * > + * a0 <= b1 && b0 <= a1 > + * > + * ... where A holds the lock range and B holds the smallest 'start' and > + * largest 'last' in the tree. For the later, we rely on the root node, > + * which by augmented interval tree property, holds the largest value in > + * its last-in-subtree. 
> + */ > +static inline bool > +__range_intersects_intree(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + struct interval_tree_node *root; > + > + if (unlikely(RB_EMPTY_ROOT(&tree->root))) > + return false; > + > + root = range_entry(tree->root.rb_node, struct interval_tree_node, rb); > + > + return lock->node.start <= root->__subtree_last && > + tree->leftmost->start <= lock->node.last; > +} > + > +static inline void > +__range_tree_insert(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + if (unlikely(RB_EMPTY_ROOT(&tree->root))) > + tree->leftmost = &lock->node; > + else if (lock->node.start < tree->leftmost->start) > + tree->leftmost = &lock->node; > + > + interval_tree_insert(&lock->node, &tree->root); > +} > + > +static inline void > +__range_tree_remove(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + if (tree->leftmost == &lock->node) { > + struct rb_node *next = rb_next(&tree->leftmost->rb); > + tree->leftmost = range_entry(next, struct interval_tree_node, rb); > + } > + > + interval_tree_remove(&lock->node, &tree->root); > +} > + > +static inline void > +__range_rwlock_init(struct range_rwlock *lock, unsigned long start, > + unsigned long last) > +{ > + WARN_ON(start > last); > + > + lock->node.start = start; > + lock->node.last = last; > + RB_CLEAR_NODE(&lock->node.rb); > + lock->blocking_ranges = 0; > + lock->reader = false; > +} > + > +/** > + * range_rwlock_init - initialize the range lock > + * @lock: the range lock to be initialized > + * @start: start of the interval (inclusive) > + * @last: last location in the interval (inclusive) > + * > + * Initialize the range's [start, last] such that it can > + * later be locked. User is expected to enter a sorted > + * range, such that @start <= @last. > + * > + * It is not allowed to initialize an already locked range. > + */ > +void range_rwlock_init(struct range_rwlock *lock, unsigned long start, > + unsigned long last) > +{ > + __range_rwlock_init(lock, start, last); > +} > +EXPORT_SYMBOL(range_rwlock_init); > + > +/** > + * range_rwlock_init_inf - initialize the range lock to infinity > + * @lock: the range lock to be initialized > + * > + * Initialize the full range. > + * > + * It is not allowed to initialize an already locked range. > + */ > +void range_rwlock_init_inf(struct range_rwlock *lock) > +{ > + __range_rwlock_init(lock, 0, RANGE_RWLOCK_INFINITY); > +} > +EXPORT_SYMBOL(range_rwlock_init_inf); > + > +static inline void > +range_rwlock_unblock(struct range_rwlock *lock, struct wake_q_head *wake_q) > +{ > + if (!--lock->blocking_ranges) > + wake_q_add(wake_q, lock->task); > +} > + > +static inline int wait_for_ranges(struct range_rwlock_tree *tree, > + struct range_rwlock *lock, long state) > +{ > + int ret = 0; > + > + while (true) { > + set_current_state(state); > + > + if (unlikely(signal_pending_state(state, current))) { > + unsigned long flags; > + > + ret = -EINTR; > + /* > + * We're not taking the lock after all, cleanup > + * after ourselves. > + */ > + spin_lock_irqsave(&tree->lock, flags); > + lock->reader = false; > + __range_tree_remove(tree, lock); > + spin_unlock_irqrestore(&tree->lock, flags); > + break; > + } > + > + /* do we need to go to sleep? 
*/ > + if (!lock->blocking_ranges) > + break; > + > + schedule(); > + } > + > + __set_current_state(TASK_RUNNING); > + return ret; > +} > + > +static __always_inline int > +__range_read_lock_common(struct range_rwlock_tree *tree, > + struct range_rwlock *lock, long state) > +{ > + struct interval_tree_node *node; > + unsigned long flags; > + > + spin_lock_irqsave(&tree->lock, flags); > + lock->reader = true; > + > + if (!__range_intersects_intree(tree, lock)) > + goto insert; > + > + node = interval_tree_iter_first(&tree->root, lock->node.start, > + lock->node.last); > + while (node) { > + struct range_rwlock *blocked_lock; > + blocked_lock = range_entry(node, struct range_rwlock, node); > + > + if (!blocked_lock->reader) > + lock->blocking_ranges++; > + node = interval_tree_iter_next(node, lock->node.start, > + lock->node.last); > + } > +insert: > + __range_tree_insert(tree, lock); > + > + lock->task = current; > + spin_unlock_irqrestore(&tree->lock, flags); > + > + return wait_for_ranges(tree, lock, state); > +} > + > +void range_read_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + might_sleep(); > + __range_read_lock_common(tree, lock, TASK_UNINTERRUPTIBLE); > + > +} > +EXPORT_SYMBOL(range_read_lock); > + > +int range_read_lock_interruptible(struct range_rwlock_tree *tree, > + struct range_rwlock *lock) > +{ > + might_sleep(); > + return __range_read_lock_common(tree, lock, TASK_INTERRUPTIBLE); > + > +} > +EXPORT_SYMBOL(range_read_lock_interruptible); > + > +int range_read_lock_killable(struct range_rwlock_tree *tree, > + struct range_rwlock *lock) > +{ > + might_sleep(); > + return __range_read_lock_common(tree, lock, TASK_KILLABLE); > + > +} > +EXPORT_SYMBOL(range_read_lock_killable); > + > +/** > + * range_read_trylock - Trylock for reading > + * @tree: interval tree > + * @lock: the range lock to be trylocked > + * > + * The trylock is against the range itself, not the @tree->lock. > + * > + * Returns 1 if successful, 0 if contention (must block to acquire). > + */ > +int range_read_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + int ret = true; > + unsigned long flags; > + struct interval_tree_node *node; > + > + spin_lock_irqsave(&tree->lock, flags); > + > + if (!__range_intersects_intree(tree, lock)) > + goto insert; > + > + /* > + * We have overlapping ranges in the tree, ensure that we can > + * in fact share the lock. 
> + */ > + node = interval_tree_iter_first(&tree->root, lock->node.start, > + lock->node.last); > + while (node) { > + struct range_rwlock *blocked_lock; > + blocked_lock = range_entry(node, struct range_rwlock, node); > + > + if (!blocked_lock->reader) { > + ret = false; > + goto unlock; > + } > + > + node = interval_tree_iter_next(node, lock->node.start, > + lock->node.last); > + } > +insert: > + lock->reader = true; > + __range_tree_insert(tree, lock); > +unlock: > + spin_unlock_irqrestore(&tree->lock, flags); > + > + return ret; > +} > +EXPORT_SYMBOL(range_read_trylock); > + > +void range_read_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + struct interval_tree_node *node; > + unsigned long flags; > + DEFINE_WAKE_Q(wake_q); > + > + spin_lock_irqsave(&tree->lock, flags); > + > + lock->reader = false; > + __range_tree_remove(tree, lock); > + > + if (!__range_intersects_intree(tree, lock)) { > + /* nobody to wakeup, we're done */ > + spin_unlock_irqrestore(&tree->lock, flags); > + return; > + } > + > + node = interval_tree_iter_first(&tree->root, lock->node.start, > + lock->node.last); > + while (node) { > + struct range_rwlock *blocked_lock; > + blocked_lock = range_entry(node, struct range_rwlock, node); > + > + if (!blocked_lock->reader) > + range_rwlock_unblock(blocked_lock, &wake_q); > + > + node = interval_tree_iter_next(node, lock->node.start, > + lock->node.last); > + } > + > + spin_unlock_irqrestore(&tree->lock, flags); > + wake_up_q(&wake_q); > +} > +EXPORT_SYMBOL(range_read_unlock); > + > +static inline int __sched > +__range_write_lock_common(struct range_rwlock_tree *tree, > + struct range_rwlock *lock, long state) > +{ > + struct interval_tree_node *node; > + unsigned long flags; > + > + spin_lock_irqsave(&tree->lock, flags); > + lock->reader = false; > + > + if (!__range_intersects_intree(tree, lock)) > + goto insert; > + > + node = interval_tree_iter_first(&tree->root, lock->node.start, > + lock->node.last); > + while (node) { > + /* > + * As a writer, we always consider an existing node. We > + * need to block; either the intersecting node is another > + * writer or we have a reader that needs to finish. > + */ > + lock->blocking_ranges++; > + node = interval_tree_iter_next(node, lock->node.start, > + lock->node.last); > + } > +insert: > + __range_tree_insert(tree, lock); > + > + lock->task = current; > + spin_unlock_irqrestore(&tree->lock, flags); > + > + return wait_for_ranges(tree, lock, state); > +} > + > +void range_write_lock(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + might_sleep(); > + __range_write_lock_common(tree, lock, TASK_UNINTERRUPTIBLE); > +} > +EXPORT_SYMBOL(range_write_lock); > + > +int range_write_lock_interruptible(struct range_rwlock_tree *tree, > + struct range_rwlock *lock) > +{ > + might_sleep(); > + return __range_write_lock_common(tree, lock, TASK_INTERRUPTIBLE); > +} > +EXPORT_SYMBOL(range_write_lock_interruptible); > + > +int range_write_lock_killable(struct range_rwlock_tree *tree, > + struct range_rwlock *lock) > +{ > + might_sleep(); > + return __range_write_lock_common(tree, lock, TASK_KILLABLE); > +} > +EXPORT_SYMBOL(range_write_lock_killable); > + > +/** > + * range_write_trylock - Trylock for writing > + * @tree: interval tree > + * @lock: the range lock to be trylocked > + * > + * The trylock is against the range itself, not the @tree->lock. > + * > + * Returns 1 if successful, 0 if contention (must block to acquire). 
> + */ > +int range_write_trylock(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + int intersects; > + unsigned long flags; > + > + spin_lock_irqsave(&tree->lock, flags); > + intersects = __range_intersects_intree(tree, lock); > + > + if (!intersects) { > + lock->reader = false; > + __range_tree_insert(tree, lock); > + } > + > + spin_unlock_irqrestore(&tree->lock, flags); > + > + return !intersects; > +} > +EXPORT_SYMBOL(range_write_trylock); > + > +void range_write_unlock(struct range_rwlock_tree *tree, struct range_rwlock *lock) > +{ > + struct interval_tree_node *node; > + unsigned long flags; > + DEFINE_WAKE_Q(wake_q); > + > + spin_lock_irqsave(&tree->lock, flags); > + > + lock->reader = false; > + __range_tree_remove(tree, lock); > + > + if (!__range_intersects_intree(tree, lock)) { > + /* nobody to wakeup, we're done */ > + spin_unlock_irqrestore(&tree->lock, flags); > + return; > + } > + > + node = interval_tree_iter_first(&tree->root, lock->node.start, > + lock->node.last); > + while (node) { > + struct range_rwlock *blocked_lock; > + blocked_lock = range_entry(node, struct range_rwlock, node); > + > + range_rwlock_unblock(blocked_lock, &wake_q); > + node = interval_tree_iter_next(node, lock->node.start, > + lock->node.last); > + } > + > + spin_unlock_irqrestore(&tree->lock, flags); > + wake_up_q(&wake_q); > +} > +EXPORT_SYMBOL(range_write_unlock); > + > +/** > + * range_downgrade_write - Downgrade write range lock to read lock > + * @tree: interval tree > + * @lock: the range lock to be downgraded > + * > + * Wakes any blocked readers, when @lock is the only conflicting range. > + * > + * It is not allowed to downgrade an unacquired write lock. > + */ > +void range_downgrade_write(struct range_rwlock_tree *tree, > + struct range_rwlock *lock) > +{ > + unsigned long flags; > + struct interval_tree_node *node; > + DEFINE_WAKE_Q(wake_q); > + > + spin_lock_irqsave(&tree->lock, flags); > + > + WARN_ON(lock->reader); > + > + /* > + * Unaccount for any blocked reader lock. Wakeup if possible. > + */ > + node = interval_tree_iter_first(&tree->root, lock->node.start, > + lock->node.last); > + while (node) { > + struct range_rwlock *blocked_lock; > + blocked_lock = range_entry(node, struct range_rwlock, node); > + > + if (blocked_lock->reader) > + range_rwlock_unblock(blocked_lock, &wake_q); > + > + node = interval_tree_iter_next(node, lock->node.start, > + lock->node.last); > + } > + > + lock->reader = true; > + spin_unlock_irqrestore(&tree->lock, flags); > + wake_up_q(&wake_q); > +} > +EXPORT_SYMBOL(range_downgrade_write); > diff --git a/lib/Kconfig b/lib/Kconfig > index 0c8b78a9ae2e..52ca6668a16d 100644 > --- a/lib/Kconfig > +++ b/lib/Kconfig > @@ -347,20 +347,6 @@ config TEXTSEARCH_FSM > config BTREE > bool > > -config INTERVAL_TREE > - bool > - help > - Simple, embeddable, interval-tree. Can find the start of an > - overlapping range in log(n) time and then iterate over all > - overlapping nodes. The algorithm is implemented as an > - augmented rbtree. > - > - See: > - > - Documentation/rbtree.txt > - > - for more information. 
> - > config RADIX_TREE_MULTIORDER > bool > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 97d62c2da6c2..a8a64eb54d3a 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1771,7 +1771,6 @@ config RBTREE_TEST > config INTERVAL_TREE_TEST > tristate "Interval tree test" > depends on m && DEBUG_KERNEL > - select INTERVAL_TREE > help > A benchmark measuring the performance of the interval tree library > > diff --git a/lib/Makefile b/lib/Makefile > index 320ac46a8725..d4ff3057450a 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -80,7 +80,7 @@ obj-$(CONFIG_DEBUG_LOCKING_API_SELFTESTS) += locking-selftest.o > obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o > > obj-$(CONFIG_BTREE) += btree.o > -obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o > +obj-y += interval_tree.o > obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o > obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o > obj-$(CONFIG_DEBUG_LIST) += list_debug.o >
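One last illustrative note, on the __range_intersects_intree() fastpath: it treats the whole tree as a single bounding interval B = [tree->leftmost->start, root->__subtree_last] and applies the closed-interval overlap test from the comment in the patch (a0 <= b1 && b0 <= a1), so a miss there proves that no range in the tree can intersect the new one, while a hit still requires the interval tree walk. A tiny standalone sketch of that predicate (userspace, purely illustrative, not part of the patch):

#include <assert.h>
#include <stdbool.h>

/* Closed intervals [a0, a1] and [b0, b1] intersect iff a0 <= b1 && b0 <= a1. */
static bool ranges_intersect(unsigned long a0, unsigned long a1,
                             unsigned long b0, unsigned long b1)
{
        return a0 <= b1 && b0 <= a1;
}

int main(void)
{
        assert(ranges_intersect(0, 10, 10, 20));        /* touching endpoints overlap */
        assert(!ranges_intersect(0, 9, 10, 20));        /* disjoint */
        assert(ranges_intersect(5, 15, 0, 100));        /* containment */
        return 0;
}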