From: Chris Mason <chris.mason@oracle.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: ghaskins@novell.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] get rid of cpupri lock
Date: Mon, 13 Sep 2010 17:17:59 -0400
Message-ID: <20100913211759.GB3207@think>
In-Reply-To: <1284405110.17152.86.camel@gandalf.stny.rr.com>

On Mon, Sep 13, 2010 at 03:11:49PM -0400, Steven Rostedt wrote:
> On Mon, 2010-09-13 at 14:04 -0400, Chris Mason wrote:
> > I recently had the chance to try to tune 2.6.32 kernels running Oracle
> > on OLTP workloads.  One of the things Oracle loves to do for tuning
> > these benchmarks is make all the database tasks involved SCHED_FIFO.
> > This is because they have their own userland locks and if they get
> > scheduled out, lock contention goes up.
> 
> And each thread is bound to its own CPU, right?

Each to a socket, so they can move between 16 different cpus.

> 
> > 
> > <please insert flames about userland locking here>
> 
> Userland locking sucks sucks sucks!!!

That's pretty weak, you didn't use all the placeholders!

> 
> > 
> > <and here>
> > 
> > <and here>
> > 
> > The box I was tuning had 8 sockets and the first thing that jumped out
> > at me during the run was that we were spending all our system time
> > inside cpupri_set.  Since the rq lock must be held in cpupri_set, I
> > don't think we need the cpupri lock at all.
> > 
> > The patch below is entirely untested, mostly because I'm hoping for
> > hints on good ways to test it.  Clearly Oracle RT isn't the workload we
> > really want to tune for, but I think this change is generally useful if
> > we can do it safely.
> > 
> > cpusets could also be used to mitigate this problem, but if we can just
> > avoid the lock it would be nice.
> > 
> > diff --git a/kernel/sched_cpupri.c b/kernel/sched_cpupri.c
> > index 2722dc1..dd51302 100644
> > --- a/kernel/sched_cpupri.c
> > +++ b/kernel/sched_cpupri.c
> > @@ -115,7 +115,6 @@ void cpupri_set(struct cpupri *cp, int cpu, int newpri)
> >  {
> >  	int                 *currpri = &cp->cpu_to_pri[cpu];
> >  	int                  oldpri  = *currpri;
> > -	unsigned long        flags;
> >  
> >  	newpri = convert_prio(newpri);
> >  
> > @@ -134,26 +133,15 @@ void cpupri_set(struct cpupri *cp, int cpu, int newpri)
> >  	if (likely(newpri != CPUPRI_INVALID)) {
> >  		struct cpupri_vec *vec = &cp->pri_to_cpu[newpri];
> >  
> > -		raw_spin_lock_irqsave(&vec->lock, flags);
> > -
> >  		cpumask_set_cpu(cpu, vec->mask);
> > -		vec->count++;
> > -		if (vec->count == 1)
> > +		if (atomic_inc_return(&vec->count) == 1)
> >  			set_bit(newpri, cp->pri_active);
> > -
> > -		raw_spin_unlock_irqrestore(&vec->lock, flags);
> 
> IIRC we tried this at first (Gregory?). The problem is that you just
> moved the setting of the vec->mask outside of the updating of the vec
> count. I don't think rq lock helps here at all.
> 

Hmm, who are we racing with?  There isn't another CPU updating masks for
this CPU (we have the rq lock).  There could be someone reading the
various masks, but those readers are already racy because they run
lock-free.

> I'll look into this too.

Great, thanks.  You'll notice I don't have any numbers about having
fixed this...once this lock is gone we slam into the rq locks while
dealing with the queue of RT overloaded CPUs.

That one is next ;)

-chris

> 
> -- Steve
> 
> >  	}
> >  	if (likely(oldpri != CPUPRI_INVALID)) {
> >  		struct cpupri_vec *vec  = &cp->pri_to_cpu[oldpri];
> > -
> > -		raw_spin_lock_irqsave(&vec->lock, flags);
> > -
> > -		vec->count--;
> > -		if (!vec->count)
> > +		if (atomic_dec_and_test(&vec->count))
> >  			clear_bit(oldpri, cp->pri_active);
> >  		cpumask_clear_cpu(cpu, vec->mask);
> > -
> > -		raw_spin_unlock_irqrestore(&vec->lock, flags);
> >  	}
> >  
> >  	*currpri = newpri;
> > @@ -174,9 +162,8 @@ int cpupri_init(struct cpupri *cp)
> >  
> >  	for (i = 0; i < CPUPRI_NR_PRIORITIES; i++) {
> >  		struct cpupri_vec *vec = &cp->pri_to_cpu[i];
> > +		atomic_set(&vec->count, 0);
> >  
> > -		raw_spin_lock_init(&vec->lock);
> > -		vec->count = 0;
> >  		if (!zalloc_cpumask_var(&vec->mask, GFP_KERNEL))
> >  			goto cleanup;
> >  	}
> > diff --git a/kernel/sched_cpupri.h b/kernel/sched_cpupri.h
> > index 9fc7d38..fe07002 100644
> > --- a/kernel/sched_cpupri.h
> > +++ b/kernel/sched_cpupri.h
> > @@ -12,8 +12,7 @@
> >  /* values 2-101 are RT priorities 0-99 */
> >  
> >  struct cpupri_vec {
> > -	raw_spinlock_t lock;
> > -	int        count;
> > +	atomic_t   count;
> >  	cpumask_var_t mask;
> >  };
> >  
> 
> 

Thread overview: 5+ messages
2010-09-13 18:04 [PATCH RFC] get rid of cpupri lock Chris Mason
2010-09-13 19:02 ` Gregory Haskins
2010-09-13 19:11 ` Steven Rostedt
2010-09-13 21:17   ` Chris Mason [this message]
2010-09-14  1:21     ` Steven Rostedt
