From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758491Ab3BZNAA (ORCPT); Tue, 26 Feb 2013 08:00:00 -0500
Received: from mail-ia0-f173.google.com ([209.85.210.173]:50038 "EHLO mail-ia0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750926Ab3BZM74 (ORCPT); Tue, 26 Feb 2013 07:59:56 -0500
MIME-Version: 1.0
In-Reply-To: <512C7A38.8060906@linux.vnet.ibm.com>
References: <20130218123714.26245.61816.stgit@srivatsabhat.in.ibm.com> <20130218123856.26245.46705.stgit@srivatsabhat.in.ibm.com> <5122551E.1080703@linux.vnet.ibm.com> <51226B46.9080707@linux.vnet.ibm.com> <51226F91.7000108@linux.vnet.ibm.com> <512BBAD8.8010006@linux.vnet.ibm.com> <512C7A38.8060906@linux.vnet.ibm.com>
Date: Tue, 26 Feb 2013 20:59:54 +0800
Message-ID:
Subject: Re: [PATCH v6 04/46] percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
From: Lai Jiangshan
To: "Srivatsa S. Bhat"
Cc: Michel Lespinasse, linux-doc@vger.kernel.org, peterz@infradead.org, fweisbec@gmail.com, linux-kernel@vger.kernel.org, namhyung@kernel.org, mingo@kernel.org, linux-arch@vger.kernel.org, linux@arm.linux.org.uk, xiaoguangrong@linux.vnet.ibm.com, wangyun@linux.vnet.ibm.com, paulmck@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com, linux-pm@vger.kernel.org, rusty@rustcorp.com.au, rostedt@goodmis.org, rjw@sisk.pl, vincent.guittot@linaro.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org, oleg@redhat.com, sbw@mit.edu, tj@kernel.org, akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 26, 2013 at 5:02 PM, Srivatsa S. Bhat wrote:
> On 02/26/2013 05:47 AM, Lai Jiangshan wrote:
>> On Tue, Feb 26, 2013 at 3:26 AM, Srivatsa S. Bhat wrote:
>>> Hi Lai,
>>>
>>> On 02/25/2013 09:23 PM, Lai Jiangshan wrote:
>>>> Hi, Srivatsa,
>>>>
>>>> The target of the whole patchset is nice for me.
>>>
>>> Cool! Thanks :-)
>>>
> [...]
>>>> I wrote an untested draft here.
>>>>
>>>> Thanks,
>>>> Lai
>>>>
>>>> PS: Some HA tools(I'm writing one) which takes checkpoints of
>>>> virtual-machines frequently, I guess this patchset can speedup the
>>>> tools.
>>>>
>>>> From 01db542693a1b7fc6f9ece45d57cb529d9be5b66 Mon Sep 17 00:00:00 2001
>>>> From: Lai Jiangshan
>>>> Date: Mon, 25 Feb 2013 23:14:27 +0800
>>>> Subject: [PATCH] lglock: add read-preference local-global rwlock
>>>>
>>>> locality via lglock(trylock)
>>>> read-preference read-write-lock via fallback rwlock_t
>>>>
>>>> Signed-off-by: Lai Jiangshan
>>>> ---
>>>>  include/linux/lglock.h |   31 +++++++++++++++++++++++++++++++
>>>>  kernel/lglock.c        |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 76 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/include/linux/lglock.h b/include/linux/lglock.h
>>>> index 0d24e93..30fe887 100644
>>>> --- a/include/linux/lglock.h
>>>> +++ b/include/linux/lglock.h
>>>> @@ -67,4 +67,35 @@ void lg_local_unlock_cpu(struct lglock *lg, int cpu);
>>>>  void lg_global_lock(struct lglock *lg);
>>>>  void lg_global_unlock(struct lglock *lg);
>>>>
>>>> +struct lgrwlock {
>>>> +	unsigned long __percpu *fallback_reader_refcnt;
>>>> +	struct lglock lglock;
>>>> +	rwlock_t fallback_rwlock;
>>>> +};
>>>> +
>>>> +#define DEFINE_LGRWLOCK(name)						\
>>>> +	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
>>>> +	= __ARCH_SPIN_LOCK_UNLOCKED;					\
>>>> +	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
>>>> +	struct lgrwlock name = {					\
>>>> +		.fallback_reader_refcnt = &name ## _refcnt,		\
>>>> +		.lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +#define DEFINE_STATIC_LGRWLOCK(name)					\
>>>> +	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
>>>> +	= __ARCH_SPIN_LOCK_UNLOCKED;					\
>>>> +	static DEFINE_PER_CPU(unsigned long, name ## _refcnt);		\
>>>> +	static struct lgrwlock name = {					\
>>>> +		.fallback_reader_refcnt = &name ## _refcnt,		\
>>>> +		.lglock = { .lock = &name ## _lock } }
>>>> +
>>>> +static inline void lg_rwlock_init(struct lgrwlock *lgrw, char *name)
>>>> +{
>>>> +	lg_lock_init(&lgrw->lglock, name);
>>>> +}
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw);
>>>> +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw);
>>>>  #endif
>>>> diff --git a/kernel/lglock.c b/kernel/lglock.c
>>>> index 6535a66..463543a 100644
>>>> --- a/kernel/lglock.c
>>>> +++ b/kernel/lglock.c
>>>> @@ -87,3 +87,48 @@ void lg_global_unlock(struct lglock *lg)
>>>>  	preempt_enable();
>>>>  }
>>>>  EXPORT_SYMBOL(lg_global_unlock);
>>>> +
>>>> +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw)
>>>> +{
>>>> +	struct lglock *lg = &lgrw->lglock;
>>>> +
>>>> +	preempt_disable();
>>>> +	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +		if (likely(arch_spin_trylock(this_cpu_ptr(lg->lock)))) {
>>>> +			rwlock_acquire_read(&lg->lock_dep_map, 0, 0, _RET_IP_);
>>>> +			return;
>>>> +		}
>>>> +		read_lock(&lgrw->fallback_rwlock);
>>>> +	}
>>>> +
>>>> +	__this_cpu_inc(*lgrw->fallback_reader_refcnt);
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_lock);
>>>> +
>>>> +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw)
>>>> +{
>>>> +	if (likely(!__this_cpu_read(*lgrw->fallback_reader_refcnt))) {
>>>> +		lg_local_unlock(&lgrw->lglock);
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	if (!__this_cpu_dec_return(*lgrw->fallback_reader_refcnt))
>>>> +		read_unlock(&lgrw->fallback_rwlock);
>>>> +
>>>> +	preempt_enable();
>>>> +}
>>>> +EXPORT_SYMBOL(lg_rwlock_local_read_unlock);
>>>> +
>>>
>>> If I read the code above correctly, all you are doing is implementing a
>>> recursive reader-side primitive (ie., allowing the reader to call these
>>> functions recursively, without resulting in a self-deadlock).
>>>
>>> But the thing is, making the reader-side recursive is the least of our
>>> problems! Our main challenge is to make the locking extremely flexible
>>> and also safe-guard it against circular-locking-dependencies and deadlocks.
>>> Please take a look at the changelog of patch 1 - it explains the situation
>>> with an example.
>>
>>
>> My lock fixes your requirements(I read patch 1-6 before I sent). In
>> readsite, lglock 's lock is token via trylock, the lglock doesn't
>> contribute to deadlocks, we can consider it doesn't exist when we find
>> deadlock from it. And global fallback rwlock doesn't result to
>> deadlocks because it is read-preference(you need to inc the
>> fallback_reader_refcnt inside the cpu-hotplug write-side, I don't do
>> it in generic lgrwlock)
>>
>
> Ah, since you hadn't mentioned the increment at the writer-side in your
> previous email, I had missed the bigger picture of what you were trying
> to achieve.
>
>>
>> If lg_rwlock_local_read_lock() spins, which means
>> lg_rwlock_local_read_lock() spins on fallback_rwlock, and which means
>> lg_rwlock_global_write_lock() took the lgrwlock successfully and
>> return, and which means lg_rwlock_local_read_lock() will stop spinning
>> when the write side finished.
>>
>
> Unfortunately, I see quite a few issues with the code above. IIUC, the
> writer and the reader both increment the same counters. So how will the
> unlock() code in the reader path know when to unlock which of the locks?

The same as in your code: the reader (when nested inside the write-side
critical section) just decrements the counter.

> (The counter-dropping-to-zero logic is not safe, since it can be updated
> due to different reasons). And now that I look at it again, in the absence
> of the writer, the reader is allowed to be recursive at the heavy cost of
> taking the global rwlock for read, every 2nd time you nest (because the
> spinlock is non-recursive).
(I did not understand your comments on this part.) Nested readers are
expected to be rare. But if N (>= 2) nested readers do occur, the overhead
is: 1 spin_trylock() + 1 read_lock() + (N-1) __this_cpu_inc().

> Also, this lg_rwlock implementation uses 3
> different data-structures - a per-cpu spinlock, a global rwlock and
> a per-cpu refcnt, and its not immediately apparent why you need those many
> or even those many varieties.

The data structures are the same as yours:
	fallback_reader_refcnt <--> reader_refcnt
	per-cpu spinlock       <--> write_signal
	fallback_rwlock        <--> global_rwlock

> Also I see that this doesn't handle the
> case of interrupt-handlers also being readers.

Handled: a nested reader (including one running in an interrupt handler)
will either see the refcount already set or take the fallback_rwlock.

>
> IMHO, the per-cpu rwlock scheme that I have implemented in this patchset
> has a clean, understandable design and just enough data-structures/locks
> to achieve its goal and has several optimizations (like reducing the
> interrupts-disabled time etc) included - all in a very straight-forward
> manner. Since this is non-trivial, IMHO, starting from a clean slate is
> actually better than trying to retrofit the logic into some locking scheme
> which we actively want to avoid (and hence effectively we aren't even
> borrowing anything from!).
>
> To summarize, if you are just pointing out that we can implement the same
> logic by altering lglocks, then sure, I acknowledge the possibility.
> However, I don't think doing that actually makes it better; it either
> convolutes the logic unnecessarily, or ends up looking _very_ similar to
> the implementation in this patchset, from what I can see.
>
> Regards,
> Srivatsa S.
Bhat