* What if a TLB flush needed to sleep?
@ 2008-03-25 20:49 Luck, Tony
  2008-03-25 21:47 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Luck, Tony @ 2008-03-25 20:49 UTC (permalink / raw)
  To: linux-arch; +Cc: linux-mm, linux-kernel

ia64 processors have a "ptc.g" instruction that will purge
a TLB entry across all processors in a system.  On current
cpus there is a limitation that only one ptc.g instruction may
be in flight at a time, so we serialize execution with code
like this:

	spin_lock(&ptcg_lock);
	... execute ptc.g
	spin_unlock(&ptcg_lock);

The architecture allows for more than one purge at a time.
So (without making any declarations about features of
unreleased processors) it seemed like time to update the
code to grab the maximum count from PAL, use that to
initialize a semaphore, and change the code to:

	down(&ptcg_sem);
	... execute ptc.g
	up(&ptcg_sem);

This code lasted about a week before someone ran hackbench
with parameters chosen to cause some swap activity (memory
footprint ~8.5GB on an 8GB system).  The machine promptly
deadlocked: VM code called the tlbflush code while holding an
anon_vma lock, the semaphore happened to sleep because some
other processor was also trying to do a purge (the test was on
a system where the limit was still just one ptc.g at a time),
and the process got swapped out.
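
For reference, a minimal sketch of the change described above (the PAL
query below is a placeholder for however the limit is actually obtained,
and the purge itself is elided):

	/* placeholder: maximum number of in-flight ptc.g purges, from PAL */
	extern int pal_max_concurrent_purges(void);

	static struct semaphore ptcg_sem;

	void __init setup_ptcg_sem(void)
	{
		sema_init(&ptcg_sem, pal_max_concurrent_purges());
	}

	void global_tlb_purge_one(void)
	{
		down(&ptcg_sem);
		/* ... execute ptc.g ... */
		up(&ptcg_sem);
	}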

Now for the questions:

1) Is holding a spin lock a problem for any other arch when
doing a TLB flush (I'm particularly thinking of those that
need to use IPI shootdown for the purge)?

2) Is it feasible to rearrange the MM code so that we don't
hold any locks while doing a TLB flush?  Or should I implement
some sort of spin_only_semaphore?

-Tony


* Re: What if a TLB flush needed to sleep?
  2008-03-25 20:49 What if a TLB flush needed to sleep? Luck, Tony
@ 2008-03-25 21:47 ` Alan Cox
  2008-03-26 12:32 ` Matthew Wilcox
  2008-03-26 19:25 ` What if a TLB flush needed to sleep? Christoph Lameter
  2 siblings, 0 replies; 23+ messages in thread
From: Alan Cox @ 2008-03-25 21:47 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-arch, linux-mm, linux-kernel

> 	down(&ptcg_sem);
> 	... execute ptc.g
> 	up(&ptcg_sem);

That will dig you a nice large hole for real time to fall into. If you
want to do rt nicely you want to avoid semaphores and the corresponding
lack of ability to fix priority inversions.

> 2) Is it feasible to rearrange the MM code so that we don't
> hold any locks while doing a TLB flush?  Or should I implement
> some sort of spin_only_semaphore?

Better to keep ia64 perversions in the IA64 code whenever possible and
lower risk for everyone else.

Alan



* Re: What if a TLB flush needed to sleep?
  2008-03-25 20:49 What if a TLB flush needed to sleep? Luck, Tony
  2008-03-25 21:47 ` Alan Cox
@ 2008-03-26 12:32 ` Matthew Wilcox
  2008-03-26 20:29   ` Luck, Tony
  2008-03-26 19:25 ` What if a TLB flush needed to sleep? Christoph Lameter
  2 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2008-03-26 12:32 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-arch, linux-mm, linux-kernel

On Tue, Mar 25, 2008 at 01:49:54PM -0700, Luck, Tony wrote:
> 1) Is holding a spin lock a problem for any other arch when
> doing a TLB flush (I'm particularly thinking of those that
> need to use IPI shootdown for the purge)?

parisc certainly has that problem too.  It won't happen very often, but
it will do an on_each_cpu() and wait for the outcome once in a while.
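
Roughly, the shape of such a flush (a sketch only: local_flush_tlb_all()
stands in for whatever the arch-local flush is called, and the exact
on_each_cpu() signature varies by tree):

	static void ipi_flush_tlb(void *unused)
	{
		local_flush_tlb_all();	/* arch-local flush, name assumed */
	}

	void flush_tlb_all(void)
	{
		/* run the handler on every cpu and wait for all of them */
		on_each_cpu(ipi_flush_tlb, NULL, 1);
	}

That wait is the part that can interact badly with locks held by the
caller.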

> 2) Is it feasible to rearrange the MM code so that we don't
> hold any locks while doing a TLB flush?  Or should I implement
> some sort of spin_only_semaphore?

down_spin() is trivial to implement without knowing the details of the
semaphore code:

void down_spin(struct semaphore *sem)
{
	while (down_trylock(sem))
		cpu_relax();
}

Of course, someone who wrote it could do better ;-)

void down_spin(struct semaphore *sem)
{
	unsigned long flags;
	int count;

	spin_lock_irqsave(&sem->lock, flags);
	count = sem->count - 1;
	if (likely(count >= 0))
		sem->count = count;
	else
		__down_spin(sem);
	spin_unlock_irqrestore(&sem->lock, flags);
}

void __down_spin(struct semaphore *sem)
{
	struct semaphore_waiter waiter;

	list_add_tail(&waiter.list, &sem->wait_list);
	waiter.task = current;
	waiter.up = 0;

	spin_unlock_irq(&sem->lock);
	while (!waiter.up)
		cpu_relax();
	spin_lock_irq(&sem->lock);
}

This more complex implementation is better because:
 - It queues properly (see also down_timeout)
 - It spins on a stack-local variable, not on a global structure
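
For reference, the release side that the spinning waiter relies on is the
existing up() path, roughly as in the semaphore rewrite (the extra
wake_up_process() on a task that is busy-spinning rather than sleeping is
harmless):

	void up(struct semaphore *sem)
	{
		unsigned long flags;

		spin_lock_irqsave(&sem->lock, flags);
		if (likely(list_empty(&sem->wait_list)))
			sem->count++;
		else
			__up(sem);
		spin_unlock_irqrestore(&sem->lock, flags);
	}

	static noinline void __sched __up(struct semaphore *sem)
	{
		struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
						struct semaphore_waiter, list);

		list_del(&waiter->list);
		waiter->up = 1;
		wake_up_process(waiter->task);
	}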

Having done all that ... I bet parisc and ia64 aren't the only two
architectures which can sleep in their tlb flush handlers.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: What if a TLB flush needed to sleep?
  2008-03-25 20:49 What if a TLB flush needed to sleep? Luck, Tony
  2008-03-25 21:47 ` Alan Cox
  2008-03-26 12:32 ` Matthew Wilcox
@ 2008-03-26 19:25 ` Christoph Lameter
  2008-03-26 20:29   ` Thomas Gleixner
  2 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2008-03-26 19:25 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-arch, linux-mm, linux-kernel

On Tue, 25 Mar 2008, Luck, Tony wrote:

> 2) Is it feasible to rearrange the MM code so that we don't
> hold any locks while doing a TLB flush?  Or should I implement
> some sort of spin_only_semaphore?

The EMM notifier V2 patchset contains two patches that
convert the i_mmap_lock and the anon_vma lock to semaphores.  After that,
much of the TLB flushing (tlb_finish_mmu, tlb_gather_mmu, etc.) runs
without holding any spinlocks.  Additional measures would be needed for
the flushing inherent in macros (like ptep_clear_flush); currently the
pte functions are called under the pte lock.




* Re: What if a TLB flush needed to sleep?
  2008-03-26 19:25 ` What if a TLB flush needed to sleep? Christoph Lameter
@ 2008-03-26 20:29   ` Thomas Gleixner
  2008-03-27  1:19     ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2008-03-26 20:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Luck, Tony, linux-arch, linux-mm, linux-kernel

On Wed, 26 Mar 2008, Christoph Lameter wrote:

> On Tue, 25 Mar 2008, Luck, Tony wrote:
> 
> > 2) Is it feasible to rearrange the MM code so that we don't
> > hold any locks while doing a TLB flush?  Or should I implement
> > some sort of spin_only_semaphore?
> 
> The EMM notifier V2 patchset contains two patches that 
> convert the i_mmap_lock and the anon_vma lock to semaphores.

Please use a mutex, not a semaphore.  Semaphores should only be used
when you need a counting semaphore.

Thanks,

	tglx


* RE: What if a TLB flush needed to sleep?
  2008-03-26 12:32 ` Matthew Wilcox
@ 2008-03-26 20:29   ` Luck, Tony
  2008-03-27  8:09     ` Jens Axboe
  2008-03-27 14:15     ` down_spin() implementation Matthew Wilcox
  0 siblings, 2 replies; 23+ messages in thread
From: Luck, Tony @ 2008-03-26 20:29 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-arch, linux-mm, linux-kernel

> Of course, someone who wrote it could do better ;-)

Here is Willy's code in patch format (against linux-next
tree tag next-20080326 which includes his re-write of the
semaphore code).

This looks a lot cleaner than my ia64 specific code that
used cmpxchg() for the down() operation and fetchadd for
the up() ... using a brand new semaphore_spin data type.

It appears to work ... I tried to do some timing comparisons
of this generic version against my arch specific one, but the
hackbench test case has a run to run variation of a factor of
three (from 1min9sec to 3min44sec) so it is hopeless to try
and see some small percentage difference.

commit 0359fbb64297d44328f26ec5fda3a3c26f1c5ba7
Author: Matthew Wilcox <matthew@wil.cx>
Date:   Wed Mar 26 11:08:18 2008 -0700

    Add "down_spin()" API for semaphores
    
    For those places that need semaphore semantics but cannot sleep
    
    Signed-off-by: Tony Luck <tony.luck@intel.com>

diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
index a7125da..3404ce5 100644
--- a/include/linux/semaphore.h
+++ b/include/linux/semaphore.h
@@ -53,6 +53,12 @@ static inline void sema_init(struct semaphore *sem, int val)
 extern void down(struct semaphore *sem);
 
 /*
+ * Attempt to acquire the semaphore.  If another task is already holding the
+ * semaphore, spin until the semaphore is released.
+ */
+extern void down_spin(struct semaphore *sem);
+
+/*
  * As down(), except the sleep may be interrupted by a signal.  If it is,
  * this function will return -EINTR.
  */
diff --git a/kernel/semaphore.c b/kernel/semaphore.c
index bef977b..d3eb559 100644
--- a/kernel/semaphore.c
+++ b/kernel/semaphore.c
@@ -23,6 +23,7 @@
  */
 
 static noinline void __down(struct semaphore *sem);
+static noinline void __down_spin(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
 static noinline int __down_killable(struct semaphore *sem);
 static noinline int __down_timeout(struct semaphore *sem, long jiffies);
@@ -41,6 +42,21 @@ void down(struct semaphore *sem)
 }
 EXPORT_SYMBOL(down);
 
+void down_spin(struct semaphore *sem)
+{
+	unsigned long flags;
+	int count;
+
+	spin_lock_irqsave(&sem->lock, flags);
+	count = sem->count - 1;
+	if (likely(count >= 0))
+		sem->count = count;
+	else
+		__down_spin(sem);
+	spin_unlock_irqrestore(&sem->lock, flags);
+}
+EXPORT_SYMBOL(down_spin);
+
 int down_interruptible(struct semaphore *sem)
 {
 	unsigned long flags;
@@ -197,6 +213,20 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long jiffies)
 	return __down_common(sem, TASK_UNINTERRUPTIBLE, jiffies);
 }
 
+static noinline void __sched __down_spin(struct semaphore *sem)
+{
+	struct semaphore_waiter waiter;
+
+	list_add_tail(&waiter.list, &sem->wait_list);
+	waiter.task = current;
+	waiter.up = 0;
+
+	spin_unlock_irq(&sem->lock);
+	while (!waiter.up)
+		cpu_relax();
+	spin_lock_irq(&sem->lock);
+}
+
 static noinline void __sched __up(struct semaphore *sem)
 {
 	struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,


* Re: What if a TLB flush needed to sleep?
  2008-03-26 20:29   ` Thomas Gleixner
@ 2008-03-27  1:19     ` Christoph Lameter
  2008-03-27 13:20       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2008-03-27  1:19 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Luck, Tony, linux-arch, linux-mm, linux-kernel

On Wed, 26 Mar 2008, Thomas Gleixner wrote:

> Please use a mutex, not a semaphore.  Semaphores should only be used
> when you need a counting semaphore.

Seems that mutexes are mainly useful for 2 processor systems since they 
do not allow concurrent read sections. We want multiple processors able 
to reclaim pages within the same vma or file concurrently. This means 
processors need to be able to concurrently walk potentially long lists of 
vmas.


* Re: What if a TLB flush needed to sleep?
  2008-03-26 20:29   ` Luck, Tony
@ 2008-03-27  8:09     ` Jens Axboe
  2008-03-27 14:15     ` down_spin() implementation Matthew Wilcox
  1 sibling, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2008-03-27  8:09 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Matthew Wilcox, linux-arch, linux-mm, linux-kernel

On Wed, Mar 26 2008, Luck, Tony wrote:
> > Of course, someone who wrote it could do better ;-)
> 
> Here is Willy's code in patch format (against linux-next
> tree tag next-20080326 which includes his re-write of the
> semaphore code).
> 
> This looks a lot cleaner than my ia64 specific code that
> used cmpxchg() for the down() operation and fetchadd for
> the up() ... using a brand new semaphore_spin data type.
> 
> It appears to work ... I tried to do some timing comparisons
> of this generic version against my arch specific one, but the
> hackbench test case has a run to run variation of a factor of
> three (from 1min9sec to 3min44sec) so it is hopeless to try
> and see some small percentage difference.
> 
> commit 0359fbb64297d44328f26ec5fda3a3c26f1c5ba7
> Author: Matthew Wilcox <matthew@wil.cx>
> Date:   Wed Mar 26 11:08:18 2008 -0700
> 
>     Add "down_spin()" API for semaphores
>     
>     For those places that need semaphore semantics but cannot sleep
>     
>     Signed-off-by: Tony Luck <tony.luck@intel.com>
> 
> diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
> index a7125da..3404ce5 100644
> --- a/include/linux/semaphore.h
> +++ b/include/linux/semaphore.h
> @@ -53,6 +53,12 @@ static inline void sema_init(struct semaphore *sem, int val)
>  extern void down(struct semaphore *sem);
>  
>  /*
> + * Attempt to acquire the semaphore.  If another task is already holding the
> + * semaphore, spin until the semaphore is released.
> + */
> +extern void down_spin(struct semaphore *sem);
> +
> +/*
>   * As down(), except the sleep may be interrupted by a signal.  If it is,
>   * this function will return -EINTR.
>   */
> diff --git a/kernel/semaphore.c b/kernel/semaphore.c
> index bef977b..d3eb559 100644
> --- a/kernel/semaphore.c
> +++ b/kernel/semaphore.c
> @@ -23,6 +23,7 @@
>   */
>  
>  static noinline void __down(struct semaphore *sem);
> +static noinline void __down_spin(struct semaphore *sem);
>  static noinline int __down_interruptible(struct semaphore *sem);
>  static noinline int __down_killable(struct semaphore *sem);
>  static noinline int __down_timeout(struct semaphore *sem, long jiffies);
> @@ -41,6 +42,21 @@ void down(struct semaphore *sem)
>  }
>  EXPORT_SYMBOL(down);
>  
> +void down_spin(struct semaphore *sem)
> +{
> +	unsigned long flags;
> +	int count;
> +
> +	spin_lock_irqsave(&sem->lock, flags);
> +	count = sem->count - 1;
> +	if (likely(count >= 0))
> +		sem->count = count;
> +	else
> +		__down_spin(sem);
> +	spin_unlock_irqrestore(&sem->lock, flags);
> +}
> +EXPORT_SYMBOL(down_spin);
> +
>  int down_interruptible(struct semaphore *sem)
>  {
>  	unsigned long flags;
> @@ -197,6 +213,20 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long jiffies)
>  	return __down_common(sem, TASK_UNINTERRUPTIBLE, jiffies);
>  }
>  
> +static noinline void __sched __down_spin(struct semaphore *sem)
> +{
> +	struct semaphore_waiter waiter;
> +
> +	list_add_tail(&waiter.list, &sem->wait_list);
> +	waiter.task = current;
> +	waiter.up = 0;
> +
> +	spin_unlock_irq(&sem->lock);
> +	while (!waiter.up)
> +		cpu_relax();
> +	spin_lock_irq(&sem->lock);
> +}

This doesn't look very nice - if down_spin() is called with interrupts
disabled, __down_spin() enables them.

-- 
Jens Axboe



* Re: What if a TLB flush needed to sleep?
  2008-03-27  1:19     ` Christoph Lameter
@ 2008-03-27 13:20       ` Peter Zijlstra
  2008-03-27 18:44         ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2008-03-27 13:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, Luck, Tony, linux-arch, linux-mm, linux-kernel

On Wed, 2008-03-26 at 18:19 -0700, Christoph Lameter wrote:
> On Wed, 26 Mar 2008, Thomas Gleixner wrote:
> 
> > Please use a mutex, not a semaphore.  Semaphores should only be used
> > when you need a counting semaphore.
> 
> Seems that mutexes are mainly useful for 2 processor systems since they 
> do not allow concurrent read sections. We want multiple processors able 
> to reclaim pages within the same vma or file concurrently. This means 
> processors need to be able to concurrently walk potentially long lists of 
> vmas.

confusion between semaphores and rwsems





* down_spin() implementation
  2008-03-26 20:29   ` Luck, Tony
  2008-03-27  8:09     ` Jens Axboe
@ 2008-03-27 14:15     ` Matthew Wilcox
  2008-03-28  0:01       ` Nick Piggin
                         ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Matthew Wilcox @ 2008-03-27 14:15 UTC (permalink / raw)
  To: Luck, Tony, Stephen Rothwell; +Cc: linux-arch, linux-mm, linux-kernel

On Wed, Mar 26, 2008 at 01:29:58PM -0700, Luck, Tony wrote:
> This looks a lot cleaner than my ia64 specific code that
> used cmpxchg() for the down() operation and fetchadd for
> the up() ... using a brand new semaphore_spin data type.

I did briefly consider creating a spinaphore data type, but it's
significantly less code to create down_spin().

> It appears to work ... I tried to do some timing comparisons
> of this generic version against my arch specific one, but the
> hackbench test case has a run to run variation of a factor of
> three (from 1min9sec to 3min44sec) so it is hopeless to try
> and see some small percentage difference.

Thanks for testing and putting this together in patch form.  I've fixed it
up to address Jens' astute comment and added it to my semaphore patchset.

http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=semaphore-20080327

Stephen, I've updated the 'semaphore' tag to point to the same place as
semaphore-20080327, so please change your linux-next tree from pulling
semaphore-20080314 to just pulling plain 'semaphore'.  I'll use this
method of tagging from now on.

Here's the edited patch.

commit 517df6fedc88af3f871cf827a62ef1a1a2073645
Author: Matthew Wilcox <matthew@wil.cx>
Date:   Thu Mar 27 09:49:26 2008 -0400

    Add down_spin()
    
    ia64 would like to use a semaphore in flush_tlb_all() as it can have
    multiple tokens.  Unfortunately, it's currently nested inside a spinlock,
    so they can't sleep.  down_spin() is the cheapest solution to implement.
    
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
index a7125da..13b5f32 100644
--- a/include/linux/semaphore.h
+++ b/include/linux/semaphore.h
@@ -78,6 +78,14 @@ extern int __must_check down_trylock(struct semaphore *sem);
 extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
 
 /*
+ * As down(), except this function will spin waiting for the semaphore
+ * instead of sleeping.  It is safe to use while holding a spinlock or
+ * with interrupts disabled.  It should not be called from interrupt
+ * context as this may lead to deadlocks.
+ */
+extern void down_spin(struct semaphore *sem);
+
+/*
  * Release the semaphore.  Unlike mutexes, up() may be called from any
  * context and even by tasks which have never called down().
  */
diff --git a/kernel/semaphore.c b/kernel/semaphore.c
index bef977b..a242d87 100644
--- a/kernel/semaphore.c
+++ b/kernel/semaphore.c
@@ -26,6 +26,7 @@ static noinline void __down(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
 static noinline int __down_killable(struct semaphore *sem);
 static noinline int __down_timeout(struct semaphore *sem, long jiffies);
+static noinline void __down_spin(struct semaphore *sem, unsigned long flags);
 static noinline void __up(struct semaphore *sem);
 
 void down(struct semaphore *sem)
@@ -117,6 +118,20 @@ int down_timeout(struct semaphore *sem, long jiffies)
 }
 EXPORT_SYMBOL(down_timeout);
 
+void down_spin(struct semaphore *sem)
+{
+       unsigned long flags;
+
+       spin_lock_irqsave(&sem->lock, flags);
+       if (likely(sem->count > 0)) {
+               sem->count--;
+               spin_unlock_irqrestore(&sem->lock, flags);
+       } else {
+               __down_spin(sem, flags);
+       }
+}
+EXPORT_SYMBOL(down_spin);
+
 void up(struct semaphore *sem)
 {
        unsigned long flags;
@@ -197,6 +212,20 @@ static noinline int __sched __down_timeout(struct semaphore
        return __down_common(sem, TASK_UNINTERRUPTIBLE, jiffies);
 }
 
+static noinline void __sched __down_spin(struct semaphore *sem,
+                                                       unsigned long flags)
+{
+       struct semaphore_waiter waiter;
+
+       list_add_tail(&waiter.list, &sem->wait_list);
+       waiter.task = current;
+       waiter.up = 0;
+
+       spin_unlock_irqrestore(&sem->lock, flags);
+       while (!waiter.up)
+               cpu_relax();
+}
+
 static noinline void __sched __up(struct semaphore *sem)
 {
        struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,


-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: What if a TLB flush needed to sleep?
  2008-03-27 13:20       ` Peter Zijlstra
@ 2008-03-27 18:44         ` Christoph Lameter
  2008-03-28  9:59           ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2008-03-27 18:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Luck, Tony, linux-arch, linux-mm, linux-kernel

On Thu, 27 Mar 2008, Peter Zijlstra wrote:

> confusion between semaphores and rwsems

rwsem is not a semaphore despite its name? What do you want to call it 
then?



* Re: down_spin() implementation
  2008-03-27 14:15     ` down_spin() implementation Matthew Wilcox
@ 2008-03-28  0:01       ` Nick Piggin
  2008-03-28 12:45         ` Matthew Wilcox
  2008-03-28  4:51       ` Stephen Rothwell
  2008-03-28 12:51       ` Jens Axboe
  2 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2008-03-28  0:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luck, Tony, Stephen Rothwell, linux-arch, linux-mm, linux-kernel

On Friday 28 March 2008 01:15, Matthew Wilcox wrote:
> On Wed, Mar 26, 2008 at 01:29:58PM -0700, Luck, Tony wrote:
> > This looks a lot cleaner than my ia64 specific code that
> > used cmpxchg() for the down() operation and fetchadd for
> > the up() ... using a brand new semaphore_spin data type.
>
> I did briefly consider creating a spinaphore data type, but it's
> significantly less code to create down_spin().
>
> > It appears to work ... I tried to do some timing comparisons
> > of this generic version against my arch specific one, but the
> > hackbench test case has a run to run variation of a factor of
> > three (from 1min9sec to 3min44sec) so it is hopeless to try
> > and see some small percentage difference.
>
> Thanks for testing and putting this together in patch form.  I've fixed it
> up to address Jens' astute comment and added it to my semaphore patchset.
>
> http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=semaphore-20080327
>
> Stephen, I've updated the 'semaphore' tag to point to the same place as
> semaphore-20080327, so please change your linux-next tree from pulling
> semaphore-20080314 to just pulling plain 'semaphore'.  I'll use this
> method of tagging from now on.
>
> Here's the edited patch.
>
> commit 517df6fedc88af3f871cf827a62ef1a1a2073645
> Author: Matthew Wilcox <matthew@wil.cx>
> Date:   Thu Mar 27 09:49:26 2008 -0400
>
>     Add down_spin()
>
>     ia64 would like to use a semaphore in flush_tlb_all() as it can have
>     multiple tokens.  Unfortunately, it's currently nested inside a spinlock,
>     so they can't sleep.  down_spin() is the cheapest solution to implement.

Uhm, how do you use this exactly? All other holders of this
semaphore must have preempt disabled and not sleep, right? (and
so you need a new down() that disables preempt too)

So the only difference between this and a spinlock I guess is
that waiters can sleep rather than spin on contention (except
this down_spin guy, which spins).

Oh, I see from the context of Tony's message... so this can *only*
be used when preempt is off, and *only* against other down_spin
lockers.

Bad idea to hack this into the semaphore code, IMO. It would
take how many lines to implement it properly?


typedef struct {
	atomic_t cur;
	int max;
} ss_t;

void spin_init(ss_t *ss, int max)
{
	atomic_set(&ss->cur, 0);
	ss->max = max;
}

void spin_take(ss_t *ss)
{
	preempt_disable();
	/* become one of the at most max holders, otherwise wait */
	while (unlikely(!atomic_add_unless(&ss->cur, 1, ss->max))) {
		while (atomic_read(&ss->cur) == ss->max)
			cpu_relax();
	}
}

void spin_put(ss_t *ss)
{
	smp_mb();
	atomic_dec(&ss->cur);
	preempt_enable();
}

About the same number as down_spin(). And it is much harder to
misuse. So LOC isn't such a great argument for this kind of thing.
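
For illustration, a hypothetical caller in the ia64 purge path (names are
placeholders, mirroring Tony's first mail):

	static ss_t ptcg_ss;

	void ptcg_init(int max_purges)	/* max_purges as read from PAL */
	{
		spin_init(&ptcg_ss, max_purges);
	}

	void ptcg_purge_one(void)
	{
		spin_take(&ptcg_ss);
		/* ... execute ptc.g ... */
		spin_put(&ptcg_ss);
	}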

My 2c



* Re: down_spin() implementation
  2008-03-27 14:15     ` down_spin() implementation Matthew Wilcox
  2008-03-28  0:01       ` Nick Piggin
@ 2008-03-28  4:51       ` Stephen Rothwell
  2008-03-28  5:03         ` Nick Piggin
  2008-03-28 12:51       ` Jens Axboe
  2 siblings, 1 reply; 23+ messages in thread
From: Stephen Rothwell @ 2008-03-28  4:51 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Luck, Tony, linux-arch, linux-mm, linux-kernel


Hi Willy,

On Thu, 27 Mar 2008 08:15:08 -0600 Matthew Wilcox <matthew@wil.cx> wrote:
>
> Stephen, I've updated the 'semaphore' tag to point to the same place as
> semaphore-20080327, so please change your linux-next tree from pulling
> semaphore-20080314 to just pulling plain 'semaphore'.  I'll use this
> method of tagging from now on.

Thanks. I read this too late for today's tree, but I will fix it up for
the next one.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



* Re: down_spin() implementation
  2008-03-28  4:51       ` Stephen Rothwell
@ 2008-03-28  5:03         ` Nick Piggin
  2008-03-28 12:46           ` Matthew Wilcox
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2008-03-28  5:03 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Matthew Wilcox, Luck, Tony, linux-arch, linux-mm, linux-kernel

On Friday 28 March 2008 15:51, Stephen Rothwell wrote:
> Hi Willy,
>
> On Thu, 27 Mar 2008 08:15:08 -0600 Matthew Wilcox <matthew@wil.cx> wrote:
> > Stephen, I've updated the 'semaphore' tag to point to the same place as
> > semaphore-20080327, so please change your linux-next tree from pulling
> > semaphore-20080314 to just pulling plain 'semaphore'.  I'll use this
> > method of tagging from now on.
>
> Thanks. I read this too late for today's tree, but I will fix it up for
> the next one.

Please don't add this nasty code to semaphore.

Did my previous message to the thread get eaten by spam filters?



* Re: What if a TLB flush needed to sleep?
  2008-03-27 18:44         ` Christoph Lameter
@ 2008-03-28  9:59           ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2008-03-28  9:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, Luck, Tony, linux-arch, linux-mm, linux-kernel

On Thu, 2008-03-27 at 11:44 -0700, Christoph Lameter wrote:
> On Thu, 27 Mar 2008, Peter Zijlstra wrote:
> 
> > confusion between semaphores and rwsems
> 
> rwsem is not a semaphore despite its name? What do you want to call it 
> then?

It's not a real counting semaphore; a sleeping rw lock might be a better
name, as opposed to the contradiction "rw-mutex" :-)

But let's just call it an rwsem; we all know what that is.
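
For completeness, the concurrent read sections Christoph is after are what
an rwsem already provides; a minimal illustration with the generic API, not
a proposal for any particular lock:

	static DECLARE_RWSEM(example_rwsem);

	void walk_for_reclaim(void)	/* many CPUs may do this at once */
	{
		down_read(&example_rwsem);
		/* ... walk the vma lists ... */
		up_read(&example_rwsem);
	}

	void modify_mapping(void)	/* exclusive */
	{
		down_write(&example_rwsem);
		/* ... add or remove vmas ... */
		up_write(&example_rwsem);
	}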




* Re: down_spin() implementation
  2008-03-28  0:01       ` Nick Piggin
@ 2008-03-28 12:45         ` Matthew Wilcox
  2008-03-28 21:16           ` Luck, Tony
  2008-03-29  1:04           ` Nick Piggin
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2008-03-28 12:45 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Luck, Tony, Stephen Rothwell, linux-arch, linux-mm, linux-kernel

On Fri, Mar 28, 2008 at 11:01:24AM +1100, Nick Piggin wrote:
> Uhm, how do you use this exactly? All other holders of this
> semaphore must have preempt disabled and not sleep, right? (and
> so you need a new down() that disables preempt too)

Ah, I see what you're saying.  The deadlock would be (on a single-CPU
machine): task A holds the semaphore and is preempted, task B takes
a spinlock (thus becoming non-preemptible), then calls down_spin(),
which will never succeed.

That hadn't occurred to me -- I'm not used to thinking about preemption.
I considered interrupt context and saw how that would deadlock, so just
put a note in the documentation that it wasn't usable from interrupts.

So it makes little sense to add this to semaphores.  Better to introduce
a spinaphore, as you say.

> typedef struct {
> 	atomic_t cur;
> 	int max;
> } ss_t;
> 
> void spin_init(ss_t *ss, int max)
> {
> 	atomic_set(&ss->cur, 0);
> 	ss->max = max;
> }
> 
> void spin_take(ss_t *ss)
> {
> 	preempt_disable();
> 	/* become one of the at most max holders, otherwise wait */
> 	while (unlikely(!atomic_add_unless(&ss->cur, 1, ss->max))) {
> 		while (atomic_read(&ss->cur) == ss->max)
> 			cpu_relax();
> 	}
> }

I think we can do better here with:

	atomic_set(&ss->cur, max);

and

	while (unlikely(!atomic_add_unless(&ss->cur, -1, 0)))
		while (atomic_read(&ss->cur) == 0)
			cpu_relax();

It still spins on the spinaphore itself rather than on a local cacheline,
so there's room for improvement.  But it's not clear whether it'd be
worth it.
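
Folded together, the counting-down variant looks roughly like this (still
a sketch; cur now counts the remaining tokens, so the release side
increments):

	typedef struct {
		atomic_t cur;		/* tokens still available */
	} ss_t;

	void spin_init(ss_t *ss, int max)
	{
		atomic_set(&ss->cur, max);
	}

	void spin_take(ss_t *ss)
	{
		preempt_disable();
		/* take a token unless none are left, otherwise wait for one */
		while (unlikely(!atomic_add_unless(&ss->cur, -1, 0))) {
			while (atomic_read(&ss->cur) == 0)
				cpu_relax();
		}
	}

	void spin_put(ss_t *ss)
	{
		smp_mb();		/* order the critical section before release */
		atomic_inc(&ss->cur);
		preempt_enable();
	}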

> About the same number as down_spin(). And it is much harder to
> misuse. So LOC isn't such a great argument for this kind of thing.

LOC wasn't really my argument -- I didn't want to introduce a new data
structure unnecessarily.  But the pitfalls (that I hadn't seen) of
mixing down_spin() into semaphores are just too awful.

I'll pop this patch off the stack of semaphore patches.  Thanks.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: down_spin() implementation
  2008-03-28  5:03         ` Nick Piggin
@ 2008-03-28 12:46           ` Matthew Wilcox
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2008-03-28 12:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Stephen Rothwell, Luck, Tony, linux-arch, linux-mm, linux-kernel

On Fri, Mar 28, 2008 at 04:03:33PM +1100, Nick Piggin wrote:
> On Friday 28 March 2008 15:51, Stephen Rothwell wrote:
> > Hi Willy,
> >
> > On Thu, 27 Mar 2008 08:15:08 -0600 Matthew Wilcox <matthew@wil.cx> wrote:
> > > Stephen, I've updated the 'semaphore' tag to point to the same place as
> > > semaphore-20080327, so please change your linux-next tree from pulling
> > > semaphore-20080314 to just pulling plain 'semaphore'.  I'll use this
> > > method of tagging from now on.
> >
> > Thanks. I read this too late for today's tree, but I will fix it up for
> > the next one.
> 
> Please don't add this nasty code to semaphore.
> 
> Did my previous message to the thread get eaten by spam filters?

It didn't arrive until after Stephen's message (but it did arrive before
this one).

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: down_spin() implementation
  2008-03-27 14:15     ` down_spin() implementation Matthew Wilcox
  2008-03-28  0:01       ` Nick Piggin
  2008-03-28  4:51       ` Stephen Rothwell
@ 2008-03-28 12:51       ` Jens Axboe
  2008-03-28 13:17         ` Matthew Wilcox
  2 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2008-03-28 12:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luck, Tony, Stephen Rothwell, linux-arch, linux-mm, linux-kernel

On Thu, Mar 27 2008, Matthew Wilcox wrote:
> On Wed, Mar 26, 2008 at 01:29:58PM -0700, Luck, Tony wrote:
> > This looks a lot cleaner than my ia64 specific code that
> > used cmpxchg() for the down() operation and fetchadd for
> > the up() ... using a brand new semaphore_spin data type.
> 
> I did briefly consider creating a spinaphore data type, but it's
> significantly less code to create down_spin().
> 
> > It appears to work ... I tried to do some timing comparisons
> > of this generic version against my arch specific one, but the
> > hackbench test case has a run to run variation of a factor of
> > three (from 1min9sec to 3min44sec) so it is hopeless to try
> > and see some small percentage difference.
> 
> Thanks for testing and putting this together in patch form.  I've fixed it
> up to address Jens' astute comment and added it to my semaphore patchset.
> 
> http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=semaphore-20080327
> 
> Stephen, I've updated the 'semaphore' tag to point to the same place as
> semaphore-20080327, so please change your linux-next tree from pulling
> semaphore-20080314 to just pulling plain 'semaphore'.  I'll use this
> method of tagging from now on.
> 
> Here's the edited patch.
> 
> commit 517df6fedc88af3f871cf827a62ef1a1a2073645
> Author: Matthew Wilcox <matthew@wil.cx>
> Date:   Thu Mar 27 09:49:26 2008 -0400
> 
>     Add down_spin()
>     
>     ia64 would like to use a semaphore in flush_tlb_all() as it can have
>     multiple tokens.  Unfortunately, it's currently nested inside a spinlock,
>     so they can't sleep.  down_spin() is the cheapest solution to implement.
>     
>     Signed-off-by: Tony Luck <tony.luck@intel.com>
>     Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> 
> diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
> index a7125da..13b5f32 100644
> --- a/include/linux/semaphore.h
> +++ b/include/linux/semaphore.h
> @@ -78,6 +78,14 @@ extern int __must_check down_trylock(struct semaphore *sem);
>  extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
>  
>  /*
> + * As down(), except this function will spin waiting for the semaphore
> + * instead of sleeping.  It is safe to use while holding a spinlock or
> + * with interrupts disabled.  It should not be called from interrupt
> + * context as this may lead to deadlocks.
> + */
> +extern void down_spin(struct semaphore *sem);
> +
> +/*
>   * Release the semaphore.  Unlike mutexes, up() may be called from any
>   * context and even by tasks which have never called down().
>   */
> diff --git a/kernel/semaphore.c b/kernel/semaphore.c
> index bef977b..a242d87 100644
> --- a/kernel/semaphore.c
> +++ b/kernel/semaphore.c
> @@ -26,6 +26,7 @@ static noinline void __down(struct semaphore *sem);
>  static noinline int __down_interruptible(struct semaphore *sem);
>  static noinline int __down_killable(struct semaphore *sem);
>  static noinline int __down_timeout(struct semaphore *sem, long jiffies);
> +static noinline void __down_spin(struct semaphore *sem, unsigned long flags);
>  static noinline void __up(struct semaphore *sem);
>  
>  void down(struct semaphore *sem)
> @@ -117,6 +118,20 @@ int down_timeout(struct semaphore *sem, long jiffies)
>  }
>  EXPORT_SYMBOL(down_timeout);
>  
> +void down_spin(struct semaphore *sem)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&sem->lock, flags);
> +       if (likely(sem->count > 0)) {
> +               sem->count--;
> +               spin_unlock_irqrestore(&sem->lock, flags);
> +       } else {
> +               __down_spin(sem, flags);
> +       }
> +}
> +EXPORT_SYMBOL(down_spin);
> +
>  void up(struct semaphore *sem)
>  {
>         unsigned long flags;
> @@ -197,6 +212,20 @@ static noinline int __sched __down_timeout(struct semaphore
>         return __down_common(sem, TASK_UNINTERRUPTIBLE, jiffies);
>  }
>  
> +static noinline void __sched __down_spin(struct semaphore *sem,
> +                                                       unsigned long flags)
> +{
> +       struct semaphore_waiter waiter;
> +
> +       list_add_tail(&waiter.list, &sem->wait_list);
> +       waiter.task = current;
> +       waiter.up = 0;
> +
> +       spin_unlock_irqrestore(&sem->lock, flags);
> +       while (!waiter.up)
> +               cpu_relax();
> +}
> +
>  static noinline void __sched __up(struct semaphore *sem)
>  {
>         struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,

It used to be illegal to pass flags as parameters. IIRC, sparc did some
trickery with it. That may still be the case, I haven't checked in a
long time.

Why not just fold __down_spin() into down_spin() and get rid of that
nasty anyway?

-- 
Jens Axboe



* Re: down_spin() implementation
  2008-03-28 12:51       ` Jens Axboe
@ 2008-03-28 13:17         ` Matthew Wilcox
  2008-03-28 13:24           ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2008-03-28 13:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Luck, Tony, Stephen Rothwell, linux-arch, linux-mm, linux-kernel

On Fri, Mar 28, 2008 at 01:51:04PM +0100, Jens Axboe wrote:
> It used to be illegal to pass flags as parameters. IIRC, sparc did some
> trickery with it. That may still be the case, I haven't checked in a
> long time.

That problem was removed before 2.6 started, iirc.  At least the chapter
on 'The Fucked Up Sparc' [1] was removed before 2.6.12-rc2 (the
beginning of git history and I can't be bothered to pinpoint it more
precisely).

> Why not just fold __down_spin() into down_spin() and get rid of that
> nasty anyway?

Could have done.  It's moot now that Nick's pointed out how unsafe it
is to mix down_spin() with plain down().

[1] http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/x467.html

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: down_spin() implementation
  2008-03-28 13:17         ` Matthew Wilcox
@ 2008-03-28 13:24           ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2008-03-28 13:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luck, Tony, Stephen Rothwell, linux-arch, linux-mm, linux-kernel

On Fri, Mar 28 2008, Matthew Wilcox wrote:
> On Fri, Mar 28, 2008 at 01:51:04PM +0100, Jens Axboe wrote:
> > It used to be illegal to pass flags as parameters. IIRC, sparc did some
> > trickery with it. That may still be the case, I haven't checked in a
> > long time.
> 
> That problem was removed before 2.6 started, iirc.  At least the chapter
> on 'The Fucked Up Sparc' [1] was removed before 2.6.12-rc2 (the
> beginning of git history and I can't be bothered to pinpoint it more
> precisely).

OK

> > Why not just fold __down_spin() into down_spin() and get rid of that
> > nasty anyway?
> 
> Could have done.  It's moot now that Nick's pointed out how unsafe it
> is to mix down_spin() with plain down().
> 
> [1] http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/x467.html

Yeah saw that after replying, so no problem then :)

-- 
Jens Axboe



* RE: down_spin() implementation
  2008-03-28 12:45         ` Matthew Wilcox
@ 2008-03-28 21:16           ` Luck, Tony
  2008-03-28 23:48             ` Arnd Bergmann
  2008-03-29  1:04           ` Nick Piggin
  1 sibling, 1 reply; 23+ messages in thread
From: Luck, Tony @ 2008-03-28 21:16 UTC (permalink / raw)
  To: Matthew Wilcox, Nick Piggin
  Cc: Stephen Rothwell, linux-arch, linux-mm, linux-kernel

> So it makes little sense to add this to semaphores.  Better to introduce
> a spinaphore, as you say.

> typedef struct {
> 	atomic_t cur;
> 	int max;
> } ss_t;

Could this API sneak into the bottom of one or the other of
include/linux/{spinlock,semaphore}.h ... or should it get its own
spinaphore.h file?

Or should I follow Alan's earlier advice and keep this as an ia64
only thing (since I'll be the only user).

-Tony


* Re: down_spin() implementation
  2008-03-28 21:16           ` Luck, Tony
@ 2008-03-28 23:48             ` Arnd Bergmann
  0 siblings, 0 replies; 23+ messages in thread
From: Arnd Bergmann @ 2008-03-28 23:48 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Matthew Wilcox, Nick Piggin, Stephen Rothwell, linux-arch,
	linux-mm, linux-kernel

On Friday 28 March 2008, Luck, Tony wrote:
> > So it makes little sense to add this to semaphores.  Better to introduce
> > a spinaphore, as you say.
> 
> > typedef struct {
> > 	atomic_t cur;
> > 	int max;
> > } ss_t;
> 
> Could this API sneak into the bottom of one or the other of
> include/linux/{spinlock,semaphore}.h ... or should it get its own
> spinaphore.h file?
>
> Or should I follow Alan's earlier advice and keep this as an ia64
> only thing (since I'll be the only user).

If you use the simple version suggested last by Willy, I think it
could even be open-coded in your TLB management code.

Should we decide to make it an official interface, I'd suggest
putting it into atomic.h, because it operates on a plain atomic_t.

	Arnd <><


* Re: down_spin() implementation
  2008-03-28 12:45         ` Matthew Wilcox
  2008-03-28 21:16           ` Luck, Tony
@ 2008-03-29  1:04           ` Nick Piggin
  1 sibling, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2008-03-29  1:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luck, Tony, Stephen Rothwell, linux-arch, linux-mm, linux-kernel

On Friday 28 March 2008 23:45, Matthew Wilcox wrote:

> I think we can do better here with:
>
> 	atomic_set(&ss->cur, max);
>
> and
>
> 	while (unlikely(!atomic_add_unless(&ss->cur, -1, 0)))
> 		while (atomic_read(&ss->cur) == 0)
> 			cpu_relax();

Yeah of course! That's much better ;)

I'd say Tony could just open code it for now, which would get him
up and running quickly... although if anybody gets keen to add it
as a generic API then I wouldn't object.


