* [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
@ 2013-03-21 22:55 Steven Rostedt
  2013-03-22 15:41 ` Christoph Lameter
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-03-21 22:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

Christoph,

I've been trying to hunt down a latency on a 40 core box where, funnily
enough, the more tracing we enabled the less likely it was to hit. It would
take up to 12 hours to hit the latency, and if we had too much tracing on,
it would never hit :-p

Thus I would add tracing little by little to find where things were going
bad. I found the timer interrupt was triggering late. I then traced with:

trace-cmd start -p function --func-stack -e irq -e raw_syscalls -e 'sched_wakeup*' \
  -e sched_switch -e hrtimer_start -e hrtimer_expire_entry

I kicked off the test again; when it detected a latency it would stop
tracing, and I would extract the trace data from the buffers, send it to my
workstation and analyze it.

I found this:


hackbenc-61738  39d...0 238202.164474: function:             smp_apic_timer_interrupt
hackbenc-61738  39d...0 238202.164484: kernel_stack:         <stack trace>
=> apic_timer_interrupt (ffffffff81528077)
=> retint_kernel (ffffffff81520967)
=> __slab_free (ffffffff8115517a)
=> kfree (ffffffff811554e1)
=> skb_free_head (ffffffff8143b9ee)
=> skb_release_data (ffffffff8143c830)
=> __kfree_skb (ffffffff8143c8be)
=> consume_skb (ffffffff8143c9c1)
=> unix_stream_recvmsg (ffffffff814ecd88)
=> sock_aio_read (ffffffff81431f6e)
=> do_sync_read (ffffffff8115fdea)
=> vfs_read (ffffffff811605c0)
=> sys_read (ffffffff811606e1)
=> tracesys (ffffffff8152778d)

That __slab_free() happened here:

   0xffffffff81155160 <+544>:	cmpb   $0x0,-0x55(%rbp)
   0xffffffff81155164 <+548>:	jns    0xffffffff8115509e <__slab_free+350>
   0xffffffff8115516a <+554>:	mov    $0x1,%edx
   0xffffffff8115516f <+559>:	mov    %r13,%rsi
   0xffffffff81155172 <+562>:	mov    %r15,%rdi
   0xffffffff81155175 <+565>:	callq  0xffffffff81154d90 <put_cpu_partial>
   0xffffffff8115517a <+570>:	jmpq   0xffffffff8115509e <__slab_free+350>
   0xffffffff8115517f <+575>:	mov    %r8,-0xf8(%rbp)


The ffffffff8115517a is the return address of the call to put_cpu_partial(),
which calls unfreeze_partials() with irqs disabled. So I started tracing
again, this time with:

trace-cmd start -p function_graph -l smp_apic_timer_interrupt -l unfreeze_partials \
  -e irq -e raw_syscalls -e 'sched_wakeup*' -e sched_switch -e hrtimer_start \
  -e hrtimer_expire_entry

The difference is that instead of tracing the function
smp_apic_timer_interrupt and its stack trace, I decided to trace the
function graph of unfreeze_partials; by selecting only that
function, it gives me a good idea of how long that function runs. Here's
the result:

hackbenc-80563  14...10 262219.108980: sys_exit:             NR 0 = 100
hackbenc-80563  14...10 262219.108981: sys_enter:            NR 0 (a, 7fff1b700d50, 64, 8, 7fff1b700c80, 138ef)
hackbenc-80563  14d...0 262219.108982: funcgraph_entry:      ! 249.510 us |  unfreeze_partials();
hackbenc-80563  14d...0 262219.109233: funcgraph_entry:                   |  smp_apic_timer_interrupt() {
hackbenc-80563  14d.h10 262219.109234: hrtimer_expire_entry: hrtimer=0xffff881fcf731e88 now=262298497076329 function=hrtimer_wakeup/0x0
hackbenc-80563  14d.h30 262219.109235: sched_wakeup:         cyclictest:33010 [4] success=1 CPU:014


It ran for 249.51 microseconds!!! With interrupts disabled! This was
what caused the interrupt to go off late. I have no idea why adding
tracing makes this latency go away. Perhaps it changes the timings just
enough to not let things line up?

I did a report with '-R' to show the raw events, which includes the
exit part of the function graph:

   <...>-80563  14d...0 262219.108982: funcgraph_entry:       func=0xffffffff81154954 depth=0
   <...>-80563  14d...0 262219.109233: funcgraph_exit:        func=0xffffffff81154954 calltime=0xee7ca4d8200f rettime=0xee7ca4dbeeb5 overrun=0x0 depth=0
   <...>-80563  14d...0 262219.109233: funcgraph_entry:       func=0xffffffff81528e47 depth=0

The funcgraph_exit is within the same microsecond the
smp_apic_timer_interrupt() went off, so yes this is what delayed it.

Anyway, this was run on 3.6.11-rt30, but looking at the current code, it
doesn't look like it has changed in any meaningful way. The while ((page =
c->partial)) loop makes me nervous. How big can this list be? Is there a way
to limit the amount of work this can do?
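
For reference, this is roughly the shape of that loop (paraphrased from
the 3.6-era slub code, not an exact quote; c is the per cpu slab
structure, s the kmem_cache):

	struct kmem_cache_node *n = NULL, *n2 = NULL;
	struct page *page;

	while ((page = c->partial)) {
		c->partial = page->next;

		n2 = get_node(s, page_to_nid(page));
		if (n != n2) {
			if (n)
				spin_unlock(&n->list_lock);
			n = n2;
			spin_lock(&n->list_lock);
		}
		/* ... cmpxchg the page state and put it back on n->partial ... */
	}
	if (n)
		spin_unlock(&n->list_lock);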

Thanks!

-- Steve




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-21 22:55 [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code Steven Rostedt
@ 2013-03-22 15:41 ` Christoph Lameter
  2013-03-23  3:51   ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-03-22 15:41 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Thu, 21 Mar 2013, Steven Rostedt wrote:

> The ffffffff8115517a is the return address of the call to put_cpu_partial(),
> which calls unfreeze_partials() with irqs disabled. So I started tracing
> again, this time with:

Hmmm... That is strange. unfreeze_partials should be rather fast.

> It ran for 249.51 microseconds!!! With interrupts disabled! This was
> what caused the interrupt to go off late. I have no idea why adding
> tracing makes this latency go away. Perhaps it changes the timings just
> enough to not let things line up?
>
> I did a report with '-R' to show the raw events, which includes the
> exit part of the function graph:
>
>    <...>-80563  14d...0 262219.108982: funcgraph_entry:       func=0xffffffff81154954 depth=0
>    <...>-80563  14d...0 262219.109233: funcgraph_exit:        func=0xffffffff81154954 calltime=0xee7ca4d8200f rettime=0xee7ca4dbeeb5 overrun=0x0 depth=0
>    <...>-80563  14d...0 262219.109233: funcgraph_entry:       func=0xffffffff81528e47 depth=0
>
> The funcgraph_exit is within the same microsecond the
> smp_apic_timer_interrupt() went off, so yes this is what delayed it.
>
> Anyway, this was run on 3.6.11-rt30, but looking at the current code, it
> doesn't look like it has changed in any meaningful way. The while ((page =
> c->partial)) loop makes me nervous. How big can this list be? Is there a way
> to limit the amount of work this can do?

The control is via the cpu_partial field in /sys/kernel/slab/<cache>/.

There is also slabs_cpu_partial, which gives a view of how many objects
are cached in each per cpu structure. Do a

  cat /sys/kernel/slab/*/slabs_cpu_partial

to get a view of what the situation is. Any abnormally high numbers?

The default for the number of per cpu partial objects should be 30 or so.
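
Roughly, that limit works like this (a simplified sketch of the check in
put_cpu_partial(), paraphrased rather than quoted from the source;
pobjects is the running count of cached objects):

	pobjects += page->objects - page->inuse;
	if (drain && pobjects > s->cpu_partial) {
		unsigned long flags;

		/* Too many cached objects: move the whole per cpu
		 * partial list back to the node lists. This runs
		 * with irqs disabled. */
		local_irq_save(flags);
		unfreeze_partials(s);
		local_irq_restore(flags);
		pobjects = 0;
	}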





* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-22 15:41 ` Christoph Lameter
@ 2013-03-23  3:51   ` Steven Rostedt
  2013-03-25 14:34     ` Christoph Lameter
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-03-23  3:51 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Fri, 2013-03-22 at 15:41 +0000, Christoph Lameter wrote:

> The control is via the cpu_partial field in /sys/kernel/slab/<cache>/.
> 
> There is also slabs_cpu_partial, which gives a view of how many objects
> are cached in each per cpu structure. Do a
> 
>   cat /sys/kernel/slab/*/slabs_cpu_partial
> 
> to get a view of what the situation is. Any abnormally high numbers?
> 
> The default for the number of per cpu partial objects should be 30 or so.

I just triggered another latency:

hackbenc-31634   5d..31 103261.991668: sched_switch:         hackbench:31634 [120] D ==> hackbench:36093 [120]
hackbenc-36093   5d...0 103261.991670: funcgraph_entry:      ! 225.665 us |  unfreeze_partials();
hackbenc-36093   5d...0 103261.991897: funcgraph_entry:                   |  smp_apic_timer_interrupt() {
hackbenc-36093   5d.h10 103261.991897: hrtimer_expire_entry: hrtimer=0xffff881f5ca7fe88 now=103293011955940 function=hrtimer_wakeup/0x0
hackbenc-36093   5d.h30 103261.991898: sched_wakeup:         cyclictest:8946 [4] success=1 CPU:005
hackbenc-36093   5dN..0 103261.991901: funcgraph_exit:         3.589 us   |  }
hackbenc-36093   5d..30 103261.991902: sched_switch:         hackbench:36093 [120] R ==> cyclictest:8946 [4]

I did a: cat /sys/kernel/slab/*/slabs_cpu_partial > slab_partials

I uploaded it here:

http://rostedt.homelinux.com/private/slab_partials

See anything I should be worried about?

Thanks,

-- Steve





* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-23  3:51   ` Steven Rostedt
@ 2013-03-25 14:34     ` Christoph Lameter
  2013-03-25 15:57       ` Steven Rostedt
  2013-03-26 16:52       ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-03-25 14:34 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Fri, 22 Mar 2013, Steven Rostedt wrote:

> I uploaded it here:
>
> http://rostedt.homelinux.com/private/slab_partials
>
> See anything I should be worried about?

Looks fine. Maybe it's the spinlocks being contended that cause the
long holdoff? How does RT deal with the spinlocks? I don't know too much
about it.

You can eliminate the whole thing by setting

/sys/kernel/slab/<whatever>/cpu_partial

to zero.
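
For example, something like this (an untested userspace sketch; a shell
loop over the same files works just as well):

/* zero cpu_partial for every slab cache */
#include <glob.h>
#include <stdio.h>

int main(void)
{
	glob_t g;
	size_t i;

	if (glob("/sys/kernel/slab/*/cpu_partial", 0, NULL, &g))
		return 1;
	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");

		if (!f)
			continue;	/* skip anything we cannot write */
		fprintf(f, "0\n");
		fclose(f);
	}
	globfree(&g);
	return 0;
}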



* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 14:34     ` Christoph Lameter
@ 2013-03-25 15:57       ` Steven Rostedt
  2013-03-25 16:13         ` Steven Rostedt
  2013-03-26 16:52       ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-03-25 15:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Mon, 2013-03-25 at 14:34 +0000, Christoph Lameter wrote:
> On Fri, 22 Mar 2013, Steven Rostedt wrote:
> 
> > I uploaded it here:
> >
> > http://rostedt.homelinux.com/private/slab_partials
> >
> > See anything I should be worried about?
> 
> Looks fine. Maybe it's the spinlocks being contended that cause the
> long holdoff?


Funny, I was in the process of telling you this :-)

I added trace_printk()s around the lock: one where it tries to take the
lock and one right after it actually has it.
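
Roughly, the instrumentation looked like this (illustrative only; not
the exact debugging diff), around each node list_lock acquisition:

	trace_printk("take lock %p\n", &n->list_lock);
	raw_spin_lock(&n->list_lock);
	trace_printk("have lock %p\n", &n->list_lock);

Here's what I got: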

hackbenc-56308  18d...0 164395.064073: funcgraph_entry:                   |  unfreeze_partials() {
hackbenc-56308  18d...0 164395.064073: bprint:               unfreeze_partials : take lock 0xffff881fff4000a0
hackbenc-56308  18d..10 164395.064093: bprint:               unfreeze_partials : have lock 0xffff881fff4000a0
hackbenc-56308  18d...0 164395.064094: bprint:               unfreeze_partials : take lock 0xffff880fff4010a0
hackbenc-56308  18d..10 164395.064094: bprint:               unfreeze_partials : have lock 0xffff880fff4010a0
hackbenc-56308  18d...0 164395.064094: bprint:               unfreeze_partials : take lock 0xffff881fff4000a0
hackbenc-56308  18d..10 164395.064159: bprint:               unfreeze_partials : have lock 0xffff881fff4000a0
hackbenc-56308  18d...0 164395.064162: bprint:               unfreeze_partials : take lock 0xffff880fff4010a0
hackbenc-56308  18d..10 164395.064162: bprint:               unfreeze_partials : have lock 0xffff880fff4010a0
hackbenc-56308  18d...0 164395.064162: bprint:               unfreeze_partials : take lock 0xffff881fff4000a0
hackbenc-56308  18d..10 164395.064224: bprint:               unfreeze_partials : have lock 0xffff881fff4000a0
hackbenc-56308  18d...0 164395.064226: bprint:               unfreeze_partials : take lock 0xffff880fff4010a0
hackbenc-56308  18d..10 164395.064226: bprint:               unfreeze_partials : have lock 0xffff880fff4010a0
hackbenc-56308  18d...0 164395.064226: bprint:               unfreeze_partials : take lock 0xffff881fff4000a0
hackbenc-56308  18d..10 164395.064274: bprint:               unfreeze_partials : have lock 0xffff881fff4000a0
hackbenc-56308  18d...0 164395.064277: bprint:               unfreeze_partials : take lock 0xffff882fff4000a0
hackbenc-56308  18d..10 164395.064277: bprint:               unfreeze_partials : have lock 0xffff882fff4000a0
hackbenc-56308  18d...0 164395.064277: bprint:               unfreeze_partials : take lock 0xffff881fff4000a0
hackbenc-56308  18d..10 164395.064334: bprint:               unfreeze_partials : have lock 0xffff881fff4000a0
hackbenc-56308  18d...0 164395.064336: bprint:               unfreeze_partials : out=14 in=14 dis=0
hackbenc-56308  18d...0 164395.064336: funcgraph_exit:       ! 262.907 us |  }

There were several places where it took around 60us to grab the lock.
This happened enough times to add up to a large latency. This shows that
lock contention is the issue here.



>  How does RT deal with the spinlocks? I don't know too much
> about it.

Normally, spin_locks are converted to mutexes. But for the slub case,
the locks in question have been made into raw_spinlock_t. These locks
act the same as they do in mainline (ticket spin locks). As it looked
like the locks were held for such short intervals, and it would be quite
difficult to convert them into sleeping locks, we kept them as real spin
locks.
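
For illustration (my shorthand, not a quote from the -rt patch):

	static DEFINE_SPINLOCK(some_lock);	/* on -rt this becomes a sleeping rtmutex */
	static DEFINE_RAW_SPINLOCK(list_lock);	/* stays a real spinning lock, as in mainline */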

> 
> You can eliminate the whole thing by setting
> 
> > /sys/kernel/slab/<whatever>/cpu_partial
> 
> to zero.

Thanks! I'll give this a try.

-- Steve




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 15:57       ` Steven Rostedt
@ 2013-03-25 16:13         ` Steven Rostedt
  2013-03-25 17:51           ` Christoph Lameter
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-03-25 16:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Mon, 2013-03-25 at 11:57 -0400, Steven Rostedt wrote:

> > 
> > You can eliminate the whole thing by setting
> > 
> > > /sys/kernel/slab/<whatever>/cpu_partial
> > 
> > to zero.
> 
> Thanks! I'll give this a try.
> 

BTW, what impact does setting all cpu_partial files to zero have on the
system?

-- Steve




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 16:13         ` Steven Rostedt
@ 2013-03-25 17:51           ` Christoph Lameter
  2013-03-25 18:03             ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-03-25 17:51 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Mon, 25 Mar 2013, Steven Rostedt wrote:

> BTW, what impact does setting all cpu_partial files to zero have on the
> system?

Slight slowdown in kfree since the frees are less likely to be batched.




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 17:51           ` Christoph Lameter
@ 2013-03-25 18:03             ` Steven Rostedt
  2013-03-25 18:27               ` Christoph Lameter
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-03-25 18:03 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Mon, 2013-03-25 at 17:51 +0000, Christoph Lameter wrote:
> On Mon, 25 Mar 2013, Steven Rostedt wrote:
> 
> > BTW, what impact does setting all cpu_partial files to zero have on the
> > system?
> 
> Slight slowdown in kfree since the frees are less likely to be batched.

If this makes it more deterministic, and lowers worst-case latencies,
then it's definitely worth the price.

-- Steve




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 18:03             ` Steven Rostedt
@ 2013-03-25 18:27               ` Christoph Lameter
  2013-03-25 18:32                 ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-03-25 18:27 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Mon, 25 Mar 2013, Steven Rostedt wrote:

> If this makes it more deterministic, and lowers worst-case latencies,
> then it's definitely worth the price.

Yes that would make it more deterministic. Maybe I should add an option
to be able to compile the allocator without cpu partial page support?



* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 18:27               ` Christoph Lameter
@ 2013-03-25 18:32                 ` Steven Rostedt
  2013-03-27  2:59                   ` Joonsoo Kim
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-03-25 18:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Mon, 2013-03-25 at 18:27 +0000, Christoph Lameter wrote:
> On Mon, 25 Mar 2013, Steven Rostedt wrote:
> 
> > If this makes it more deterministic, and lowers worst-case latencies,
> > then it's definitely worth the price.
> 
> Yes that would make it more deterministic. Maybe I should add an option
> to be able to compile the allocator without cpu partial page support?

I agree that would be useful.

-- Steve




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 14:34     ` Christoph Lameter
  2013-03-25 15:57       ` Steven Rostedt
@ 2013-03-26 16:52       ` Steven Rostedt
  1 sibling, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2013-03-26 16:52 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML, RT, Thomas Gleixner, Clark Williams

On Mon, 2013-03-25 at 14:34 +0000, Christoph Lameter wrote:

> You can eliminate the whole thing by setting
> 
> /sys/kernel/slab/<whatever>/cpu_partial
> 
> to zero.

Giving you an update.

I did the following:

# ls /sys/kernel/slab/*/cpu_partial | while read f; do
	echo 0 > $f
done


Then I ran our test suite for 24 hours, and had a max latency of 135
microseconds. Well within our range for such a machine. Seems this is
definitely the cause of our latencies, as before we would always
trigger an unacceptable latency within 12 hours.

Thanks for your help!

-- Steve




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-25 18:32                 ` Steven Rostedt
@ 2013-03-27  2:59                   ` Joonsoo Kim
  2013-03-27  3:30                     ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Joonsoo Kim @ 2013-03-27  2:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Lameter, LKML, RT, Thomas Gleixner, Clark Williams

On Mon, Mar 25, 2013 at 02:32:35PM -0400, Steven Rostedt wrote:
> On Mon, 2013-03-25 at 18:27 +0000, Christoph Lameter wrote:
> > On Mon, 25 Mar 2013, Steven Rostedt wrote:
> > 
> > > If this makes it more deterministic, and lowers worst-case latencies,
> > > then it's definitely worth the price.
> > 
> > Yes that would make it more deterministic. Maybe I should add an option
> > to be able to compile the allocator without cpu partial page support?
> 
> I agree that would be useful.

Hello, Steven and Christoph.

How about using spin_try_lock() in unfreeze_partials() and
using spin_lock_contented() in get_partial_node() to reduce latency?
IMHO, this doesn't make the code more deterministic, but it can keep
the benefit of cpu partial pages with tolerable latency.

Thanks.

> 
> -- Steve
> 
> 


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-27  2:59                   ` Joonsoo Kim
@ 2013-03-27  3:30                     ` Steven Rostedt
  2013-03-27  6:13                       ` Joonsoo Kim
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-03-27  3:30 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Christoph Lameter, LKML, RT, Thomas Gleixner, Clark Williams

On Wed, 2013-03-27 at 11:59 +0900, Joonsoo Kim wrote:

> How about using spin_try_lock() in unfreeze_partials() and
> using spin_lock_contented() in get_partial_node() to reduce latency?
> IMHO, this doesn't make the code more deterministic, but it can keep
> the benefit of cpu partial pages with tolerable latency.

And what do you do when you fail the try lock? Try again, or just break
out?

We can run benchmarks, but I don't like playing games in -rt. It either
is deterministic, or it isn't.

-- Steve




* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-27  3:30                     ` Steven Rostedt
@ 2013-03-27  6:13                       ` Joonsoo Kim
  2013-03-28 17:29                         ` Christoph Lameter
       [not found]                         ` <alpine.DEB.2.02.1303281227520.16200@gentwo.org>
  0 siblings, 2 replies; 51+ messages in thread
From: Joonsoo Kim @ 2013-03-27  6:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christoph Lameter, LKML, RT, Thomas Gleixner, Clark Williams

On Tue, Mar 26, 2013 at 11:30:32PM -0400, Steven Rostedt wrote:
> On Wed, 2013-03-27 at 11:59 +0900, Joonsoo Kim wrote:
> 
> > How about using spin_try_lock() in unfreeze_partials() and
> > using spin_lock_contented() in get_partial_node() to reduce latency?
> > IMHO, this doesn't make the code more deterministic, but it can keep
> > the benefit of cpu partial pages with tolerable latency.
> 
> And what do you do when you fail the try lock? Try again, or just break
> out?

Just break out.
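
Something like the following is what I had in mind (illustration only,
not a tested patch):

	while ((page = c->partial)) {
		n = get_node(s, page_to_nid(page));

		/* If the node lock is contended, just break out and
		 * leave the rest of the list for a later drain. */
		if (!spin_trylock(&n->list_lock))
			break;

		c->partial = page->next;
		/* ... unfreeze this page back onto n->partial ... */
		spin_unlock(&n->list_lock);
	}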

> 
> We can run benchmarks, but I don't like playing games in -rt. It either
> is deterministic, or it isn't.

Okay.

> 
> -- Steve
> 
> 


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-27  6:13                       ` Joonsoo Kim
@ 2013-03-28 17:29                         ` Christoph Lameter
  2013-04-08 12:25                           ` Steven Rostedt
       [not found]                         ` <alpine.DEB.2.02.1303281227520.16200@gentwo.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-03-28 17:29 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Steven Rostedt, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

Two patches against slub to enable deconfiguring cpu_partial support.

First one is a bug fix (Pekka please pick up this one or use Joonsoo's
earlier one).


Subject: slub: Fix object counts in acquire_slab

It seems that we were overallocating objects from the slab queues
since get_partial_node() assumed that page->inuse was undisturbed by
acquire_slab(). Save the # of objects in page->lru.next in acquire_slab()
and pass it to get_partial_node() that way.

I have a vague memory that Joonsoo also ran into this issue awhile back.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-03-28 12:14:26.958358688 -0500
+++ linux/mm/slub.c	2013-03-28 12:16:57.240785613 -0500
@@ -1498,6 +1498,7 @@ static inline void *acquire_slab(struct
 	void *freelist;
 	unsigned long counters;
 	struct page new;
+	unsigned long objects;

 	/*
 	 * Zap the freelist and set the frozen bit.
@@ -1507,6 +1508,7 @@ static inline void *acquire_slab(struct
 	freelist = page->freelist;
 	counters = page->counters;
 	new.counters = counters;
+	objects = page->objects;
 	if (mode) {
 		new.inuse = page->objects;
 		new.freelist = NULL;
@@ -1524,6 +1526,7 @@ static inline void *acquire_slab(struct
 		return NULL;

 	remove_partial(n, page);
+	page->lru.next = (void *)objects;
 	WARN_ON(!freelist);
 	return freelist;
 }
@@ -1565,7 +1568,7 @@ static void *get_partial_node(struct kme
 			c->page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
 			object = t;
-			available =  page->objects - page->inuse;
+			available =  page->objects - (unsigned long)page->lru.next;
 		} else {
 			available = put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
       [not found]                         ` <alpine.DEB.2.02.1303281227520.16200@gentwo.org>
@ 2013-03-28 17:30                           ` Christoph Lameter
  2013-03-29  2:43                             ` Paul Gortmaker
                                               ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-03-28 17:30 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Steven Rostedt, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

This patch requires the earlier bug fix.


Subject: slub: Make cpu partial slab support configurable

cpu partial support can introduce a level of indeterminism that is not wanted
in certain contexts (like a realtime kernel). Make it configurable.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/include/linux/slub_def.h
===================================================================
--- linux.orig/include/linux/slub_def.h	2013-03-28 12:14:26.958358688 -0500
+++ linux/include/linux/slub_def.h	2013-03-28 12:19:46.275866639 -0500
@@ -47,7 +47,9 @@ struct kmem_cache_cpu {
 	void **freelist;	/* Pointer to next available object */
 	unsigned long tid;	/* Globally unique transaction id */
 	struct page *page;	/* The slab from which we are allocating */
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *partial;	/* Partially allocated frozen slabs */
+#endif
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
@@ -84,7 +86,9 @@ struct kmem_cache {
 	int size;		/* The size of an object including meta data */
 	int object_size;	/* The size of an object without meta data */
 	int offset;		/* Free pointer offset. */
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	int cpu_partial;	/* Number of per cpu partial objects to keep around */
+#endif
 	struct kmem_cache_order_objects oo;

 	/* Allocation and freeing of slabs */
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-03-28 12:18:48.446813991 -0500
+++ linux/mm/slub.c	2013-03-28 12:19:46.275866639 -0500
@@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
 	return freelist;
 }

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);

 /*
@@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
 			object = t;
 			available =  page->objects - (unsigned long)page->lru.next;
 		} else {
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			available = put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
+#else
+			BUG();
+#endif
 		}
-		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
+		if (kmem_cache_debug(s) ||
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+			available > s->cpu_partial / 2
+#else
+			available > 0
+#endif
+			)
 			break;

 	}
@@ -1874,6 +1886,7 @@ redo:
 	}
 }

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 /*
  * Unfreeze all the cpu partial slabs.
  *
@@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
 	return pobjects;
 }
+#endif

 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
@@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
 		if (c->page)
 			flush_slab(s, c);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 		unfreeze_partials(s, c);
+#endif
 	}
 }

@@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
 	struct kmem_cache *s = info;
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	return c->page || c->partial;
+#else
+	return c->page;
+#endif
 }

 static void flush_all(struct kmem_cache *s)
@@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
 	page = c->page;
 	if (!page)
 		goto new_slab;
+
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 redo:
+#endif

 	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
@@ -2278,6 +2301,7 @@ load_freelist:

 new_slab:

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	if (c->partial) {
 		page = c->page = c->partial;
 		c->partial = page->next;
@@ -2285,6 +2309,7 @@ new_slab:
 		c->freelist = NULL;
 		goto redo;
 	}
+#endif

 	freelist = new_slab_objects(s, gfpflags, node, &c);

@@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
 		new.inuse--;
 		if ((!new.inuse || !prior) && !was_frozen) {

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			if (!kmem_cache_debug(s) && !prior)

 				/*
@@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
 				 */
 				new.frozen = 1;

-			else { /* Needs to be taken off a list */
+			else
+#endif
+		       		{ /* Needs to be taken off a list */

 	                        n = get_node(s, page_to_nid(page));
 				/*
@@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
 		"__slab_free"));

 	if (likely(!n)) {
+#ifdef CONFIG_SLUB_CPU_PARTIAL

 		/*
 		 * If we just froze the page then put it onto the
@@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
 			put_cpu_partial(s, page, 1);
 			stat(s, CPU_PARTIAL_FREE);
 		}
+#endif
 		/*
 		 * The list lock was not taken therefore no list
 		 * activity can be necessary.
@@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
 	 * list to avoid pounding the page allocator excessively.
 	 */
 	set_min_partial(s, ilog2(s->size) / 2);
-
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	/*
 	 * cpu_partial determined the maximum number of objects kept in the
 	 * per cpu partial lists of a processor.
@@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
 		s->cpu_partial = 13;
 	else
 		s->cpu_partial = 30;
+#endif

 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 1000;
@@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
 			total += x;
 			nodes[node] += x;

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			page = ACCESS_ONCE(c->partial);
 			if (page) {
 				x = page->pobjects;
 				total += x;
 				nodes[node] += x;
 			}
-
+#endif
 			per_cpu[node]++;
 		}
 	}
@@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
 }
 SLAB_ATTR(min_partial);

+#ifdef CONFIG_CPU_PARTIAL
 static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%u\n", s->cpu_partial);
@@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
 	return length;
 }
 SLAB_ATTR(cpu_partial);
+#endif

 static ssize_t ctor_show(struct kmem_cache *s, char *buf)
 {
@@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
 }
 SLAB_ATTR_RO(objects_partial);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 {
 	int objects = 0;
@@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
 	return len + sprintf(buf + len, "\n");
 }
 SLAB_ATTR_RO(slabs_cpu_partial);
+#endif

 static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
 {
@@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
 STAT_ATTR(ORDER_FALLBACK, order_fallback);
 STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
 STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
+#ifdef CONFIG_CPU_PARTIAL
 STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
 STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
 STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
 STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
 #endif
+#endif

 static struct attribute *slab_attrs[] = {
 	&slab_size_attr.attr,
@@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
 	&objs_per_slab_attr.attr,
 	&order_attr.attr,
 	&min_partial_attr.attr,
+#ifdef CONFIG_CPU_PARTIAL
 	&cpu_partial_attr.attr,
+#endif
 	&objects_attr.attr,
 	&objects_partial_attr.attr,
 	&partial_attr.attr,
@@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
 	&destroy_by_rcu_attr.attr,
 	&shrink_attr.attr,
 	&reserved_attr.attr,
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	&slabs_cpu_partial_attr.attr,
+#endif
 #ifdef CONFIG_SLUB_DEBUG
 	&total_objects_attr.attr,
 	&slabs_attr.attr,
@@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
 	&order_fallback_attr.attr,
 	&cmpxchg_double_fail_attr.attr,
 	&cmpxchg_double_cpu_fail_attr.attr,
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	&cpu_partial_alloc_attr.attr,
 	&cpu_partial_free_attr.attr,
 	&cpu_partial_node_attr.attr,
 	&cpu_partial_drain_attr.attr,
 #endif
+#endif
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
Index: linux/init/Kconfig
===================================================================
--- linux.orig/init/Kconfig	2013-03-28 12:14:26.958358688 -0500
+++ linux/init/Kconfig	2013-03-28 12:19:46.275866639 -0500
@@ -1514,6 +1514,14 @@ config SLOB

 endchoice

+config SLUB_CPU_PARTIAL
+	depends on SLUB
+	bool "SLUB per cpu partial cache"
+	help
+	  Per cpu partial caches accelerate freeing of objects at the
+	  price of more indeterminism in the latency of the free.
+	  Typically one would choose no for a realtime system.
+
 config MMAP_ALLOW_UNINITIALIZED
 	bool "Allow mmapped anonymous memory to be uninitialized"
 	depends on EXPERT && !MMU


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-28 17:30                           ` Christoph Lameter
@ 2013-03-29  2:43                             ` Paul Gortmaker
  2013-04-01 15:32                               ` Christoph Lameter
       [not found]                               ` <alpine.DEB.2.02.1304011025550.12690@gentwo.org>
  2013-04-11 16:05                             ` Steven Rostedt
  2013-05-28 14:39                             ` Steven Rostedt
  2 siblings, 2 replies; 51+ messages in thread
From: Paul Gortmaker @ 2013-03-29  2:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Thu, Mar 28, 2013 at 1:30 PM, Christoph Lameter <cl@linux.com> wrote:
> This patch requires the earlier bug fix.
>
>
> Subject: slub: Make cpu partial slab support configurable

Hi Christoph,

Minor nit:

   Applying: slub: Make cpu partial slab support configurable
   /home/paul/git/linux-head/.git/rebase-apply/patch:141: space before
tab in indent.
		       		{ /* Needs to be taken off a list */
   warning: 1 line adds whitespace errors.

I've tagged the block below where this happens.

Also I've got a question/comment about the Kconfig addition below...

>
> cpu partial support can introduce a level of indeterminism that is not wanted
> in certain contexts (like a realtime kernel). Make it configurable.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h 2013-03-28 12:14:26.958358688 -0500
> +++ linux/include/linux/slub_def.h      2013-03-28 12:19:46.275866639 -0500
> @@ -47,7 +47,9 @@ struct kmem_cache_cpu {
>         void **freelist;        /* Pointer to next available object */
>         unsigned long tid;      /* Globally unique transaction id */
>         struct page *page;      /* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         struct page *partial;   /* Partially allocated frozen slabs */
> +#endif
>  #ifdef CONFIG_SLUB_STATS
>         unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> @@ -84,7 +86,9 @@ struct kmem_cache {
>         int size;               /* The size of an object including meta data */
>         int object_size;        /* The size of an object without meta data */
>         int offset;             /* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         int cpu_partial;        /* Number of per cpu partial objects to keep around */
> +#endif
>         struct kmem_cache_order_objects oo;
>
>         /* Allocation and freeing of slabs */
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c        2013-03-28 12:18:48.446813991 -0500
> +++ linux/mm/slub.c     2013-03-28 12:19:46.275866639 -0500
> @@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
>         return freelist;
>  }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
>
>  /*
> @@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
>                         object = t;
>                         available =  page->objects - (unsigned long)page->lru.next;
>                 } else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                         available = put_cpu_partial(s, page, 0);
>                         stat(s, CPU_PARTIAL_NODE);
> +#else
> +                       BUG();
> +#endif
>                 }
> -               if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> +               if (kmem_cache_debug(s) ||
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +                       available > s->cpu_partial / 2
> +#else
> +                       available > 0
> +#endif
> +                       )
>                         break;
>
>         }
> @@ -1874,6 +1886,7 @@ redo:
>         }
>  }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  /*
>   * Unfreeze all the cpu partial slabs.
>   *
> @@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
>         } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>         return pobjects;
>  }
> +#endif
>
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
> @@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
>                 if (c->page)
>                         flush_slab(s, c);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                 unfreeze_partials(s, c);
> +#endif
>         }
>  }
>
> @@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
>         struct kmem_cache *s = info;
>         struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         return c->page || c->partial;
> +#else
> +       return c->page;
> +#endif
>  }
>
>  static void flush_all(struct kmem_cache *s)
> @@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
>         page = c->page;
>         if (!page)
>                 goto new_slab;
> +
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  redo:
> +#endif
>
>         if (unlikely(!node_match(page, node))) {
>                 stat(s, ALLOC_NODE_MISMATCH);
> @@ -2278,6 +2301,7 @@ load_freelist:
>
>  new_slab:
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         if (c->partial) {
>                 page = c->page = c->partial;
>                 c->partial = page->next;
> @@ -2285,6 +2309,7 @@ new_slab:
>                 c->freelist = NULL;
>                 goto redo;
>         }
> +#endif
>
>         freelist = new_slab_objects(s, gfpflags, node, &c);
>
> @@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
>                 new.inuse--;
>                 if ((!new.inuse || !prior) && !was_frozen) {
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                         if (!kmem_cache_debug(s) && !prior)
>
>                                 /*
> @@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
>                                  */
>                                 new.frozen = 1;
>
> -                       else { /* Needs to be taken off a list */
> +                       else
> +#endif
> +                               { /* Needs to be taken off a list */

This is the origin of the trivial whitespace nag.
:se list in vi shows:

-^I^I^Ielse { /* Needs to be taken off a list */$
+^I^I^Ielse$
+#endif$
+^I^I       ^I^I{ /* Needs to be taken off a list */$

>
>                                 n = get_node(s, page_to_nid(page));
>                                 /*
> @@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
>                 "__slab_free"));
>
>         if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>
>                 /*
>                  * If we just froze the page then put it onto the
> @@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
>                         put_cpu_partial(s, page, 1);
>                         stat(s, CPU_PARTIAL_FREE);
>                 }
> +#endif
>                 /*
>                  * The list lock was not taken therefore no list
>                  * activity can be necessary.
> @@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
>          * list to avoid pounding the page allocator excessively.
>          */
>         set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         /*
>          * cpu_partial determined the maximum number of objects kept in the
>          * per cpu partial lists of a processor.
> @@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
>                 s->cpu_partial = 13;
>         else
>                 s->cpu_partial = 30;
> +#endif
>
>  #ifdef CONFIG_NUMA
>         s->remote_node_defrag_ratio = 1000;
> @@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
>                         total += x;
>                         nodes[node] += x;
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                         page = ACCESS_ONCE(c->partial);
>                         if (page) {
>                                 x = page->pobjects;
>                                 total += x;
>                                 nodes[node] += x;
>                         }
> -
> +#endif
>                         per_cpu[node]++;
>                 }
>         }
> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
>
> +#ifdef CONFIG_CPU_PARTIAL
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>         return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
>         return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
>
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>         int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
>         return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
>
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_CPU_PARTIAL
>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
>
>  static struct attribute *slab_attrs[] = {
>         &slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
>         &objs_per_slab_attr.attr,
>         &order_attr.attr,
>         &min_partial_attr.attr,
> +#ifdef CONFIG_CPU_PARTIAL
>         &cpu_partial_attr.attr,
> +#endif
>         &objects_attr.attr,
>         &objects_partial_attr.attr,
>         &partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
>         &destroy_by_rcu_attr.attr,
>         &shrink_attr.attr,
>         &reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         &slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>         &total_objects_attr.attr,
>         &slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
>         &order_fallback_attr.attr,
>         &cmpxchg_double_fail_attr.attr,
>         &cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         &cpu_partial_alloc_attr.attr,
>         &cpu_partial_free_attr.attr,
>         &cpu_partial_node_attr.attr,
>         &cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>         &failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig     2013-03-28 12:14:26.958358688 -0500
> +++ linux/init/Kconfig  2013-03-28 12:19:46.275866639 -0500
> @@ -1514,6 +1514,14 @@ config SLOB
>
>  endchoice
>
> +config SLUB_CPU_PARTIAL
> +       depends on SLUB
> +       bool "SLUB per cpu partial cache"
> +       help
> +         Per cpu partial caches accelerate freeing of objects at the
> +         price of more indeterminism in the latency of the free.
> +         Typically one would choose no for a realtime system.

Is "batch" a better description than "accelerate" ?  Something like

     Per cpu partial caches allows batch freeing of objects to maximize
     throughput.  However, this can increase the length of time spent
     holding key locks, which can increase latency spikes with respect
     to responsiveness.  Select yes unless you are tuning for a realtime
     oriented system.

Also, I believe this will cause a behaviour change for people who
just run "make oldconfig" -- since there is no default line.  Meaning
that it used to be unconditionally on, but now I think it will be off
by default, if people just mindlessly hold down Enter key.

For RT, we'll want default N if RT_FULL (RT_BASE?) but for mainline,
I expect you'll want default Y in order to be consistent with previous
behaviour?

I've not built/booted yet, but I'll follow up if I see anything else in doing
that.

Thanks for the patches, as I'd said to Steve -- I was totally unaware that
this tuning knob even existed prior to this discussion.

Paul.
--

> +
>  config MMAP_ALLOW_UNINITIALIZED
>         bool "Allow mmapped anonymous memory to be uninitialized"
>         depends on EXPERT && !MMU


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-29  2:43                             ` Paul Gortmaker
@ 2013-04-01 15:32                               ` Christoph Lameter
  2013-04-01 16:06                                   ` Paul Gortmaker
                                                   ` (2 more replies)
       [not found]                               ` <alpine.DEB.2.02.1304011025550.12690@gentwo.org>
  1 sibling, 3 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-04-01 15:32 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: Joonsoo Kim, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Thu, 28 Mar 2013, Paul Gortmaker wrote:

> > Index: linux/init/Kconfig
> > ===================================================================
> > --- linux.orig/init/Kconfig     2013-03-28 12:14:26.958358688 -0500
> > +++ linux/init/Kconfig  2013-03-28 12:19:46.275866639 -0500
> > @@ -1514,6 +1514,14 @@ config SLOB
> >
> >  endchoice
> >
> > +config SLUB_CPU_PARTIAL
> > +       depends on SLUB
> > +       bool "SLUB per cpu partial cache"
> > +       help
> > +         Per cpu partial caches accelerate freeing of objects at the
> > +         price of more indeterminism in the latency of the free.
> > +         Typically one would choose no for a realtime system.
>
> Is "batch" a better description than "accelerate" ?  Something like

It's not batching but a cache that is going to be used mainly for new
allocations on the same processor.

>      Per cpu partial caches allows batch freeing of objects to maximize
>      throughput.  However, this can increase the length of time spent
>      holding key locks, which can increase latency spikes with respect
>      to responsiveness.  Select yes unless you are tuning for a realtime
>      oriented system.
>
> Also, I believe this will cause a behaviour change for people who
> just run "make oldconfig" -- since there is no default line.  Meaning
> that it used to be unconditionally on, but now I think it will be off
> by default, if people just mindlessly hold down Enter key.

Ok.
>
> For RT, we'll want default N if RT_FULL (RT_BASE?) but for mainline,
> I expect you'll want default Y in order to be consistent with previous
> behaviour?

I was not sure exactly how to handle that one yet for realtime. So I need
two different patches?

> I've not built/booted yet, but I'll follow up if I see anything else in doing
> that.

Here is an updated patch. I will also send an updated fixup patch.


Subject: slub: Make cpu partial slab support configurable V2

cpu partial support can introduce a level of indeterminism that is not wanted
in certain contexts (like a realtime kernel). Make it configurable.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/include/linux/slub_def.h
===================================================================
--- linux.orig/include/linux/slub_def.h	2013-04-01 10:27:05.908964674 -0500
+++ linux/include/linux/slub_def.h	2013-04-01 10:27:19.905178531 -0500
@@ -47,7 +47,9 @@ struct kmem_cache_cpu {
 	void **freelist;	/* Pointer to next available object */
 	unsigned long tid;	/* Globally unique transaction id */
 	struct page *page;	/* The slab from which we are allocating */
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *partial;	/* Partially allocated frozen slabs */
+#endif
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
@@ -84,7 +86,9 @@ struct kmem_cache {
 	int size;		/* The size of an object including meta data */
 	int object_size;	/* The size of an object without meta data */
 	int offset;		/* Free pointer offset. */
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	int cpu_partial;	/* Number of per cpu partial objects to keep around */
+#endif
 	struct kmem_cache_order_objects oo;

 	/* Allocation and freeing of slabs */
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-04-01 10:27:05.908964674 -0500
+++ linux/mm/slub.c	2013-04-01 10:27:19.905178531 -0500
@@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
 	return freelist;
 }

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);

 /*
@@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
 			object = t;
 			available =  page->objects - (unsigned long)page->lru.next;
 		} else {
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			available = put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
+#else
+			BUG();
+#endif
 		}
-		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
+		if (kmem_cache_debug(s) ||
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+			available > s->cpu_partial / 2
+#else
+			available > 0
+#endif
+			)
 			break;

 	}
@@ -1874,6 +1886,7 @@ redo:
 	}
 }

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 /*
  * Unfreeze all the cpu partial slabs.
  *
@@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
 	return pobjects;
 }
+#endif

 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
@@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
 		if (c->page)
 			flush_slab(s, c);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 		unfreeze_partials(s, c);
+#endif
 	}
 }

@@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
 	struct kmem_cache *s = info;
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	return c->page || c->partial;
+#else
+	return c->page;
+#endif
 }

 static void flush_all(struct kmem_cache *s)
@@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
 	page = c->page;
 	if (!page)
 		goto new_slab;
+
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 redo:
+#endif

 	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
@@ -2278,6 +2301,7 @@ load_freelist:

 new_slab:

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	if (c->partial) {
 		page = c->page = c->partial;
 		c->partial = page->next;
@@ -2285,6 +2309,7 @@ new_slab:
 		c->freelist = NULL;
 		goto redo;
 	}
+#endif

 	freelist = new_slab_objects(s, gfpflags, node, &c);

@@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
 		new.inuse--;
 		if ((!new.inuse || !prior) && !was_frozen) {

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			if (!kmem_cache_debug(s) && !prior)

 				/*
@@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
 				 */
 				new.frozen = 1;

-			else { /* Needs to be taken off a list */
+			else
+#endif
+		       		{ /* Needs to be taken off a list */

 	                        n = get_node(s, page_to_nid(page));
 				/*
@@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
 		"__slab_free"));

 	if (likely(!n)) {
+#ifdef CONFIG_SLUB_CPU_PARTIAL

 		/*
 		 * If we just froze the page then put it onto the
@@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
 			put_cpu_partial(s, page, 1);
 			stat(s, CPU_PARTIAL_FREE);
 		}
+#endif
 		/*
 		 * The list lock was not taken therefore no list
 		 * activity can be necessary.
@@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
 	 * list to avoid pounding the page allocator excessively.
 	 */
 	set_min_partial(s, ilog2(s->size) / 2);
-
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	/*
 	 * cpu_partial determined the maximum number of objects kept in the
 	 * per cpu partial lists of a processor.
@@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
 		s->cpu_partial = 13;
 	else
 		s->cpu_partial = 30;
+#endif

 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 1000;
@@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
 			total += x;
 			nodes[node] += x;

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			page = ACCESS_ONCE(c->partial);
 			if (page) {
 				x = page->pobjects;
 				total += x;
 				nodes[node] += x;
 			}
-
+#endif
 			per_cpu[node]++;
 		}
 	}
@@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
 }
 SLAB_ATTR(min_partial);

+#ifdef CONFIG_CPU_PARTIAL
 static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%u\n", s->cpu_partial);
@@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
 	return length;
 }
 SLAB_ATTR(cpu_partial);
+#endif

 static ssize_t ctor_show(struct kmem_cache *s, char *buf)
 {
@@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
 }
 SLAB_ATTR_RO(objects_partial);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 {
 	int objects = 0;
@@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
 	return len + sprintf(buf + len, "\n");
 }
 SLAB_ATTR_RO(slabs_cpu_partial);
+#endif

 static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
 {
@@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
 STAT_ATTR(ORDER_FALLBACK, order_fallback);
 STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
 STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
+#ifdef CONFIG_CPU_PARTIAL
 STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
 STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
 STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
 STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
 #endif
+#endif

 static struct attribute *slab_attrs[] = {
 	&slab_size_attr.attr,
@@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
 	&objs_per_slab_attr.attr,
 	&order_attr.attr,
 	&min_partial_attr.attr,
+#ifdef CONFIG_CPU_PARTIAL
 	&cpu_partial_attr.attr,
+#endif
 	&objects_attr.attr,
 	&objects_partial_attr.attr,
 	&partial_attr.attr,
@@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
 	&destroy_by_rcu_attr.attr,
 	&shrink_attr.attr,
 	&reserved_attr.attr,
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	&slabs_cpu_partial_attr.attr,
+#endif
 #ifdef CONFIG_SLUB_DEBUG
 	&total_objects_attr.attr,
 	&slabs_attr.attr,
@@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
 	&order_fallback_attr.attr,
 	&cmpxchg_double_fail_attr.attr,
 	&cmpxchg_double_cpu_fail_attr.attr,
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	&cpu_partial_alloc_attr.attr,
 	&cpu_partial_free_attr.attr,
 	&cpu_partial_node_attr.attr,
 	&cpu_partial_drain_attr.attr,
 #endif
+#endif
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
Index: linux/init/Kconfig
===================================================================
--- linux.orig/init/Kconfig	2013-04-01 10:27:05.908964674 -0500
+++ linux/init/Kconfig	2013-04-01 10:31:46.497863625 -0500
@@ -1514,6 +1514,17 @@ config SLOB

 endchoice

+config SLUB_CPU_PARTIAL
+	default y
+	depends on SLUB
+	bool "SLUB per cpu partial cache"
+	help
+	  Per cpu partial caches accellerate objects allocation and freeing
+	  that is local to a processor at the price of more indeterminism
+	  in the latency of the free. On overflow these caches will be cleared
+	  which requires the taking of locks that may cause latency spikes.
+	  Typically one would choose no for a realtime system.
+
 config MMAP_ALLOW_UNINITIALIZED
 	bool "Allow mmapped anonymous memory to be uninitialized"
 	depends on EXPERT && !MMU
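
As a quick illustration (not part of the patch) of how the new option is
typically set in .config: a throughput-oriented build keeps the cache, a
latency-sensitive build turns it off explicitly.

    # illustrative .config fragment, not from the patch
    CONFIG_SLUB=y
    CONFIG_SLUB_CPU_PARTIAL=y

    # and for an RT-oriented build:
    # CONFIG_SLUB_CPU_PARTIAL is not set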


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
       [not found]                               ` <alpine.DEB.2.02.1304011025550.12690@gentwo.org>
@ 2013-04-01 15:33                                 ` Christoph Lameter
  2013-04-02  0:42                                   ` Joonsoo Kim
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-04-01 15:33 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: Joonsoo Kim, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

Subject: slub: Fix object counts in acquire_slab V2

It seems that we were overallocating objects from the slab queues
since get_partial_node() assumed that page->inuse was undisturbed by
acquire_slab(). Save the object count (page->inuse) in page->lru.next in
acquire_slab() and pass it to get_partial_node() that way.

I have a vague memory that Joonsoo also ran into this issue a while back.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-03-28 12:14:26.958358688 -0500
+++ linux/mm/slub.c	2013-04-01 10:23:24.677584499 -0500
@@ -1498,6 +1498,7 @@ static inline void *acquire_slab(struct
 	void *freelist;
 	unsigned long counters;
 	struct page new;
+	unsigned long objects;

 	/*
 	 * Zap the freelist and set the frozen bit.
@@ -1507,6 +1508,7 @@ static inline void *acquire_slab(struct
 	freelist = page->freelist;
 	counters = page->counters;
 	new.counters = counters;
+	objects = page->inuse;
 	if (mode) {
 		new.inuse = page->objects;
 		new.freelist = NULL;
@@ -1524,6 +1526,7 @@ static inline void *acquire_slab(struct
 		return NULL;

 	remove_partial(n, page);
+	page->lru.next = (void *)objects;
 	WARN_ON(!freelist);
 	return freelist;
 }
@@ -1565,7 +1568,7 @@ static void *get_partial_node(struct kme
 			c->page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
 			object = t;
-			available =  page->objects - page->inuse;
+			available =  page->objects - (unsigned long)page->lru.next;
 		} else {
 			available = put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
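
To spell out the effect of the fix (an illustration with made-up numbers,
not part of the patch):

/*
 * Suppose the partial slab found by get_partial_node() has
 * page->objects == 32 and page->inuse == 30, i.e. two free objects.
 * acquire_slab() with mode == true sets new.inuse = page->objects
 * while freezing the slab, so once the cmpxchg succeeds the old
 * computation
 *
 *	available = page->objects - page->inuse;	(32 - 32 == 0)
 *
 * always saw zero free objects and the loop kept draining further
 * partial slabs.  Stashing the pre-acquire count first,
 *
 *	objects = page->inuse;				(30)
 *	page->lru.next = (void *)objects;
 *
 * lets get_partial_node() compute
 *
 *	available = page->objects - (unsigned long)page->lru.next;	(2)
 *
 * Reusing page->lru.next as scratch space works because
 * remove_partial() has just taken the page off the node's partial
 * list, so the lru linkage is free at that point.
 */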


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-01 15:32                               ` Christoph Lameter
@ 2013-04-01 16:06                                   ` Paul Gortmaker
  2013-04-01 21:46                                   ` Paul Gortmaker
  2013-04-02  1:37                                 ` Joonsoo Kim
  2 siblings, 0 replies; 51+ messages in thread
From: Paul Gortmaker @ 2013-04-01 16:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On 13-04-01 11:32 AM, Christoph Lameter wrote:
> On Thu, 28 Mar 2013, Paul Gortmaker wrote:
> 
>>> Index: linux/init/Kconfig
>>> ===================================================================
>>> --- linux.orig/init/Kconfig     2013-03-28 12:14:26.958358688 -0500
>>> +++ linux/init/Kconfig  2013-03-28 12:19:46.275866639 -0500
>>> @@ -1514,6 +1514,14 @@ config SLOB
>>>
>>>  endchoice
>>>
>>> +config SLUB_CPU_PARTIAL
>>> +       depends on SLUB
>>> +       bool "SLUB per cpu partial cache"
>>> +       help
>>> +         Per cpu partial caches accellerate freeing of objects at the
>>> +         price of more indeterminism in the latency of the free.
>>> +         Typically one would choose no for a realtime system.
>>
>> Is "batch" a better description than "accelerate" ?  Something like
> 
> Its not a batching but a cache that is going to be mainly used for new
> allocations on the same processor.

OK.  In that case, a minor nit: since it is user-facing text, we should
probably drop one of the "l"s so it reads "accelerate".

> 
>>      Per cpu partial caches allows batch freeing of objects to maximize
>>      throughput.  However, this can increase the length of time spent
>>      holding key locks, which can increase latency spikes with respect
>>      to responsiveness.  Select yes unless you are tuning for a realtime
>>      oriented system.
>>
>> Also, I believe this will cause a behaviour change for people who
>> just run "make oldconfig" -- since there is no default line.  Meaning
>> that it used to be unconditionally on, but now I think it will be off
>> by default, if people just mindlessly hold down Enter key.
> 
> Ok.
>>
>> For RT, we'll want default N if RT_FULL (RT_BASE?) but for mainline,
>> I expect you'll want default Y in order to be consistent with previous
>> behaviour?
> 
> I was not sure exactly how to handle that one yet for realtime. So I need
> two different patches?

I don't think you need to worry about realtime -- meaning that I would guess
once the patch exists in mainline, Steve will probably cherry-pick it onto
3.6.11.x-stable, and then he'd likely add a one-line follow-on RT-specific
patch to set it to off/disabled for RT.  Similar for 3.4.x etc.

> 
>> I've not built/booted yet, but I'll follow up if I see anything else in doing
>> that.
> 
> Here is an updated patch. I will also send an updated fixup patch.

I'll give these some local testing today.

Thanks,
Paul.
--

> 
> 
> Subject: slub: Make cpu partial slab support configurable V2
> 
> cpu partial support can introduce level of indeterminism that is not wanted
> in certain context (like a realtime kernel). Make it configurable.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h	2013-04-01 10:27:05.908964674 -0500
> +++ linux/include/linux/slub_def.h	2013-04-01 10:27:19.905178531 -0500
> @@ -47,7 +47,9 @@ struct kmem_cache_cpu {
>  	void **freelist;	/* Pointer to next available object */
>  	unsigned long tid;	/* Globally unique transaction id */
>  	struct page *page;	/* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	struct page *partial;	/* Partially allocated frozen slabs */
> +#endif
>  #ifdef CONFIG_SLUB_STATS
>  	unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> @@ -84,7 +86,9 @@ struct kmem_cache {
>  	int size;		/* The size of an object including meta data */
>  	int object_size;	/* The size of an object without meta data */
>  	int offset;		/* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	int cpu_partial;	/* Number of per cpu partial objects to keep around */
> +#endif
>  	struct kmem_cache_order_objects oo;
> 
>  	/* Allocation and freeing of slabs */
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-04-01 10:27:05.908964674 -0500
> +++ linux/mm/slub.c	2013-04-01 10:27:19.905178531 -0500
> @@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
>  	return freelist;
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
> 
>  /*
> @@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
>  			object = t;
>  			available =  page->objects - (unsigned long)page->lru.next;
>  		} else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			available = put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> +#else
> +			BUG();
> +#endif
>  		}
> -		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> +		if (kmem_cache_debug(s) ||
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +			available > s->cpu_partial / 2
> +#else
> +			available > 0
> +#endif
> +			)
>  			break;
> 
>  	}
> @@ -1874,6 +1886,7 @@ redo:
>  	}
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  /*
>   * Unfreeze all the cpu partial slabs.
>   *
> @@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>  	return pobjects;
>  }
> +#endif
> 
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
> @@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
>  		if (c->page)
>  			flush_slab(s, c);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  		unfreeze_partials(s, c);
> +#endif
>  	}
>  }
> 
> @@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
>  	struct kmem_cache *s = info;
>  	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	return c->page || c->partial;
> +#else
> +	return c->page;
> +#endif
>  }
> 
>  static void flush_all(struct kmem_cache *s)
> @@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
>  	page = c->page;
>  	if (!page)
>  		goto new_slab;
> +
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  redo:
> +#endif
> 
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> @@ -2278,6 +2301,7 @@ load_freelist:
> 
>  new_slab:
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	if (c->partial) {
>  		page = c->page = c->partial;
>  		c->partial = page->next;
> @@ -2285,6 +2309,7 @@ new_slab:
>  		c->freelist = NULL;
>  		goto redo;
>  	}
> +#endif
> 
>  	freelist = new_slab_objects(s, gfpflags, node, &c);
> 
> @@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
>  		new.inuse--;
>  		if ((!new.inuse || !prior) && !was_frozen) {
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			if (!kmem_cache_debug(s) && !prior)
> 
>  				/*
> @@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
>  				 */
>  				new.frozen = 1;
> 
> -			else { /* Needs to be taken off a list */
> +			else
> +#endif
> +		       		{ /* Needs to be taken off a list */
> 
>  	                        n = get_node(s, page_to_nid(page));
>  				/*
> @@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
>  		"__slab_free"));
> 
>  	if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> 
>  		/*
>  		 * If we just froze the page then put it onto the
> @@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
>  			put_cpu_partial(s, page, 1);
>  			stat(s, CPU_PARTIAL_FREE);
>  		}
> +#endif
>  		/*
>  		 * The list lock was not taken therefore no list
>  		 * activity can be necessary.
> @@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
>  	 * list to avoid pounding the page allocator excessively.
>  	 */
>  	set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/*
>  	 * cpu_partial determined the maximum number of objects kept in the
>  	 * per cpu partial lists of a processor.
> @@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
>  		s->cpu_partial = 13;
>  	else
>  		s->cpu_partial = 30;
> +#endif
> 
>  #ifdef CONFIG_NUMA
>  	s->remote_node_defrag_ratio = 1000;
> @@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
>  			total += x;
>  			nodes[node] += x;
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			page = ACCESS_ONCE(c->partial);
>  			if (page) {
>  				x = page->pobjects;
>  				total += x;
>  				nodes[node] += x;
>  			}
> -
> +#endif
>  			per_cpu[node]++;
>  		}
>  	}
> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
> 
> +#ifdef CONFIG_CPU_PARTIAL
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
>  	return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
> 
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
>  	return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
> 
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_CPU_PARTIAL
>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
> 
>  static struct attribute *slab_attrs[] = {
>  	&slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
>  	&objs_per_slab_attr.attr,
>  	&order_attr.attr,
>  	&min_partial_attr.attr,
> +#ifdef CONFIG_CPU_PARTIAL
>  	&cpu_partial_attr.attr,
> +#endif
>  	&objects_attr.attr,
>  	&objects_partial_attr.attr,
>  	&partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
>  	&destroy_by_rcu_attr.attr,
>  	&shrink_attr.attr,
>  	&reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>  	&total_objects_attr.attr,
>  	&slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
>  	&order_fallback_attr.attr,
>  	&cmpxchg_double_fail_attr.attr,
>  	&cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&cpu_partial_alloc_attr.attr,
>  	&cpu_partial_free_attr.attr,
>  	&cpu_partial_node_attr.attr,
>  	&cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>  	&failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig	2013-04-01 10:27:05.908964674 -0500
> +++ linux/init/Kconfig	2013-04-01 10:31:46.497863625 -0500
> @@ -1514,6 +1514,17 @@ config SLOB
> 
>  endchoice
> 
> +config SLUB_CPU_PARTIAL
> +	default y
> +	depends on SLUB
> +	bool "SLUB per cpu partial cache"
> +	help
> +	  Per cpu partial caches accellerate objects allocation and freeing
> +	  that is local to a processor at the price of more indeterminism
> +	  in the latency of the free. On overflow these caches will be cleared
> +	  which requires the taking of locks that may cause latency spikes.
> +	  Typically one would choose no for a realtime system.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU
> 

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-01 15:32                               ` Christoph Lameter
@ 2013-04-01 21:46                                   ` Paul Gortmaker
  2013-04-01 21:46                                   ` Paul Gortmaker
  2013-04-02  1:37                                 ` Joonsoo Kim
  2 siblings, 0 replies; 51+ messages in thread
From: Paul Gortmaker @ 2013-04-01 21:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On 13-04-01 11:32 AM, Christoph Lameter wrote:

[...]

> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
> 
> +#ifdef CONFIG_CPU_PARTIAL

Above causes build failures when stats are on, because the
name is wrong and hence is never defined, and ....

>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
>  	return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
> 
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
>  	return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
> 
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_CPU_PARTIAL

...the same here, and ...

>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
> 
>  static struct attribute *slab_attrs[] = {
>  	&slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
>  	&objs_per_slab_attr.attr,
>  	&order_attr.attr,
>  	&min_partial_attr.attr,
> +#ifdef CONFIG_CPU_PARTIAL

...here too.  All should be CONFIG_SLUB_CPU_PARTIAL

Thanks,
Paul.
--

>  	&cpu_partial_attr.attr,
> +#endif
>  	&objects_attr.attr,
>  	&objects_partial_attr.attr,
>  	&partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
>  	&destroy_by_rcu_attr.attr,
>  	&shrink_attr.attr,
>  	&reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>  	&total_objects_attr.attr,
>  	&slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
>  	&order_fallback_attr.attr,
>  	&cmpxchg_double_fail_attr.attr,
>  	&cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&cpu_partial_alloc_attr.attr,
>  	&cpu_partial_free_attr.attr,
>  	&cpu_partial_node_attr.attr,
>  	&cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>  	&failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig	2013-04-01 10:27:05.908964674 -0500
> +++ linux/init/Kconfig	2013-04-01 10:31:46.497863625 -0500
> @@ -1514,6 +1514,17 @@ config SLOB
> 
>  endchoice
> 
> +config SLUB_CPU_PARTIAL
> +	default y
> +	depends on SLUB
> +	bool "SLUB per cpu partial cache"
> +	help
> +	  Per cpu partial caches accellerate objects allocation and freeing
> +	  that is local to a processor at the price of more indeterminism
> +	  in the latency of the free. On overflow these caches will be cleared
> +	  which requires the taking of locks that may cause latency spikes.
> +	  Typically one would choose no for a realtime system.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU
> 

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-01 16:06                                   ` Paul Gortmaker
  (?)
@ 2013-04-02  0:07                                   ` Paul Gortmaker
  -1 siblings, 0 replies; 51+ messages in thread
From: Paul Gortmaker @ 2013-04-02  0:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

[Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.] On 01/04/2013 (Mon 12:06) Paul Gortmaker wrote:

> On 13-04-01 11:32 AM, Christoph Lameter wrote:

[...]

> > Here is an updated patch. I will also send an updated fixup patch.
> 
> I'll give these some local testing today.

So, ignoring for the moment the mismatched #ifdef names I'd reported a
couple of hours ago (they do not matter for the case where
CONFIG_SLUB_CPU_PARTIAL is off/disabled), I'm seeing this when applying
these two commits to today's linux-next (since I'm assuming this is 3.10
material).

I can't say for sure it is these patches, and it hasn't happened every
reboot -- but it has happened twice in approximately four reboots.  The
first time I missed capturing the log, but this time I've captured it.

The finger pointing at __slab_free is the only reason I have to assume
it may be related to these two changes.  Both times, it got angry only when
I tried to reboot...

Paul.

--------------------------------------

root@canoehead:~# reboot

Broadcast message from root@canoehead
        (/dev/ttyS0) at 19:42 ...

Therpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
[ 7935.268075] NOHZ: local_softirq_pending 80
[ 7936.356160] general protection fault: 0000 [#1] SMP
[ 7936.361747] Modules linked in:
[ 7936.365184] CPU 15
[ 7936.367354] Pid: 5824, comm: reboot Not tainted 3.9.0-rc4-next-20130328+ #3 Intel Corporation S2600CP/S2600CP
[ 7936.378628] RIP: 0010:[<ffffffff817e30c6>]  [<ffffffff817e30c6>] __slab_free+0x242/0x2ba

[ 7936.387692] RSP: 0018:ffff88042bb69a58  EFLAGS: 00010082
[ 7936.393623] RAX: dead000000200200 RBX: ffff88082cf7ecb0 RCX: 0000000100240000
[ 7936.401582] RDX: dead000000100100 RSI: 0000000000240000 RDI: ffff88082f000580
[ 7936.409555] RBP: ffff88042bb69b08 R08: ffff88082f000580 R09: dead000000100100
[ 7936.417520] R10: dead000000200200 R11: ffffea0020ae7740 R12: ffffea0020b3df80
[ 7936.425492] R13: ffff88082cf7ed20 R14: 0000000100240000 R15: ffff88042f44cb00
[ 7936.433451] FS:  00007f6687342700(0000) GS:ffff88083f6e0000(0000) knlGS:0000000000000000
[ 7936.442476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7936.448890] CR2: 00007f2f603a06a8 CR3: 000000042a0a6000 CR4: 00000000000407e0
[ 7936.456849] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7936.464813] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7936.472786] Process reboot (pid: 5824, threadinfo ffff88042bb68000, task ffff88042c208f20)
[ 7936.482003] Stack:
[ 7936.484265]  ffff88042f403a00 0000000000000001 ffff880400240000 0000000000000000
[ 7936.492561]  0000000100240001 ffff88082f000580 ffff88083f6f2a30 0000000000000203
[ 7936.500864]  ffff880400240023 ffff880800000000 0000000100240024 0000000100240000
[ 7936.509160] Call Trace:
[ 7936.511907]  [<ffffffff810fa434>] ? __mod_zone_page_state+0x44/0x50
[ 7936.518924]  [<ffffffff8119ab43>] ? release_sysfs_dirent+0x73/0xf0
[ 7936.525844]  [<ffffffff811245c3>] kmem_cache_free+0x193/0x1b0
[ 7936.532253]  [<ffffffff8119ab43>] release_sysfs_dirent+0x73/0xf0
[ 7936.538961]  [<ffffffff8119b38a>] sysfs_addrm_finish+0x9a/0xc0
[ 7936.545468]  [<ffffffff8119b3de>] remove_dir+0x2e/0x40
[ 7936.551200]  [<ffffffff8119b88f>] sysfs_remove_dir+0x8f/0xa0
[ 7936.557537]  [<ffffffff812a48a6>] kobject_del+0x16/0x40
[ 7936.563358]  [<ffffffff812a4938>] kobject_cleanup+0x68/0x80
[ 7936.569574]  [<ffffffff812a47db>] kobject_put+0x2b/0x60
[ 7936.575426]  [<ffffffff817d7040>] cacheinfo_cpu_callback+0xa7/0xe2
[ 7936.582346]  [<ffffffff817f04dd>] notifier_call_chain+0x4d/0x70
[ 7936.588974]  [<ffffffff810619a9>] __raw_notifier_call_chain+0x9/0x10
[ 7936.596074]  [<ffffffff8103d5cb>] __cpu_notify+0x1b/0x40
[ 7936.601999]  [<ffffffff8103d600>] cpu_notify+0x10/0x20
[ 7936.607738]  [<ffffffff8103d739>] cpu_notify_nofail+0x9/0x20
[ 7936.614075]  [<ffffffff817d3c2b>] _cpu_down+0xfb/0x280
[ 7936.619808]  [<ffffffff8103d89f>] disable_nonboot_cpus+0x7f/0x110
[ 7936.626615]  [<ffffffff81050b16>] kernel_restart+0x16/0x60
[ 7936.632733]  [<ffffffff81050d1f>] SYSC_reboot+0x1af/0x260
[ 7936.638779]  [<ffffffff811580a0>] ? do_sync_work+0x90/0x90
[ 7936.644921]  [<ffffffff810eb609>] ? do_writepages+0x19/0x40
[ 7936.651161]  [<ffffffff810e1799>] ? __filemap_fdatawrite_range+0x49/0x50
[ 7936.658660]  [<ffffffff81144b63>] ? iput+0x43/0x190
[ 7936.664101]  [<ffffffff811580e0>] ? sync_inodes_one_sb+0x20/0x20
[ 7936.670824]  [<ffffffff811628b7>] ? iterate_bdevs+0xe7/0x100
[ 7936.677138]  [<ffffffff81050dd9>] SyS_reboot+0x9/0x10
[ 7936.682794]  [<ffffffff817f4552>] system_call_fastpath+0x16/0x1b
[ 7936.689493] Code: 99 00 00 e9 89 00 00 00 4d 85 ed 74 3c 49 8b 44 24 28 49 8b 54 24 20 49 b9 00 01 10 00 00 00 ad de 49 ba 00 02 20 00 00 00 ad de <48> 89 42 08 48 89 10 4d 89 4c 24 20 4d 89 54 24 28 49
[ 7936.711304] RIP  [<ffffffff817e30c6>] __slab_free+0x242/0x2ba
[ 7936.717724]  RSP <ffff88042bb69a58>
[ 7936.722196] ---[ end trace 8dd02e4a39a05014 ]---
[ 7936.728514] general protection fault: 0000 [#2] SMP
[ 7936.734075] Modules linked in:
[ 7936.737489] CPU 0
[ 7936.739547] Pid: 5682, comm: rc Tainted: G      D      3.9.0-rc4-next-20130328+ #3 Intel Corporation S2600CP/S2600CP
[ 7936.751504] RIP: 0010:[<ffffffff817e30c6>]  [<ffffffff817e30c6>] __slab_free+0x242/0x2ba
[ 7936.760562] RSP: 0018:ffff8808287e1cb8  EFLAGS: 00010086
[ 7936.766486] RAX: dead000000200200 RBX: ffff88082874fa98 RCX: 00000001002c0000
[ 7936.774445] RDX: dead000000100100 RSI: 00000000002c0000 RDI: ffff88082f000b40
[ 7936.782417] RBP: ffff8808287e1d68 R08: ffff88082f000b40 R09: dead000000100100
[ 7936.790376] R10: dead000000200200 R11: 0000000000000000 R12: ffffea0020a1d380
[ 7936.798342] R13: ffff88082874f648 R14: 00000001002c0000 R15: ffff88042f44c600
[ 7936.806302] FS:  0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
[ 7936.815334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7936.821743] CR2: 00007fcc7ca776a8 CR3: 0000000001c0b000 CR4: 00000000000407f0
[ 7936.829702] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7936.837661] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7936.845622] Process rc (pid: 5682, threadinfo ffff8808287e0000, task ffff88082b58f170)
[ 7936.854452] Stack:
[ 7936.856693]  ffff88042a3f9ee0 ffff88042a3f9ee0 ffff8808002c0000 0000000000000000
[ 7936.864986]  00000001002c0001 ffff88082f000b40 0000000000000246 0000000000000203
[ 7936.873280]  ffffea0020a1c320 ffffea00209f1ea0 ffff88082870cff8 00000001002c0000
[ 7936.881573] Call Trace:
[ 7936.884306]  [<ffffffff811096ae>] ? remove_vma+0x5e/0x70
[ 7936.890239]  [<ffffffff811245c3>] kmem_cache_free+0x193/0x1b0
[ 7936.896641]  [<ffffffff811096ae>] remove_vma+0x5e/0x70
[ 7936.902385]  [<ffffffff8110c68c>] exit_mmap+0xfc/0x140
[ 7936.908139]  [<ffffffff810375f8>] mmput+0x38/0xc0
[ 7936.913393]  [<ffffffff8103f2a7>] do_exit+0x267/0xad0
[ 7936.919043]  [<ffffffff8104b2aa>] ? recalc_sigpending+0x1a/0x60
[ 7936.925654]  [<ffffffff8104bbd2>] ? __set_task_blocked+0x32/0x80
[ 7936.932365]  [<ffffffff8104e2fa>] ? __set_current_blocked+0x3a/0x60
[ 7936.939358]  [<ffffffff8103fc8a>] do_group_exit+0x3a/0xa0
[ 7936.945388]  [<ffffffff8103fd02>] SyS_exit_group+0x12/0x20
[ 7936.951510]  [<ffffffff817f4552>] system_call_fastpath+0x16/0x1b
[ 7936.958198] Code: 99 00 00 e9 89 00 00 00 4d 85 ed 74 3c 49 8b 44 24 28 49 8b 54 24 20 49 b9 00 01 10 00 00 00 ad de 49 ba 00 02 20 00 00 00 ad de <48> 89 42 08 48 89 10 4d 89 4c 24 20 4d 89 54 24 28 49
[ 7936.979907] RIP  [<ffffffff817e30c6>] __slab_free+0x242/0x2ba
[ 7936.986326]  RSP <ffff8808287e1cb8>
[ 7936.990216] ---[ end trace 8dd02e4a39a05015 ]---
[ 7936.995372] Fixing recursive fault but reboot is needed!
[ 7967.655335] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 7967.663220] sr 5:0:0:0: CDB:
[ 7967.666538] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
[ 7967.674178] ata6.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
[ 7967.674178]          res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[ 7967.690579] ata6.00: status: { DRDY }
[ 7967.694674] ata6: hard resetting link
[ 7973.049829] ata6: link is slow to respond, please be patient (ready=0)
[ 7977.695714] ata6: COMRESET failed (errno=-16)
[ 7977.700575] ata6: hard resetting link
[ 7983.055260] ata6: link is slow to respond, please be patient (ready=0)
[ 7987.701140] ata6: COMRESET failed (errno=-16)
[ 7987.706003] ata6: hard resetting link
[ 7993.060694] ata6: link is slow to respond, please be patient (ready=0)
[ 8022.736150] ata6: COMRESET failed (errno=-16)
[ 8022.741030] ata6: limiting SATA link speed to 1.5 Gbps
[ 8022.746767] ata6: hard resetting link
[ 8027.744864] ata6: COMRESET failed (errno=-16)
[ 8027.749725] ata6: reset failed, giving up
[ 8027.754210] ata6.00: disabled
[ 8027.757545] ata6: EH complete


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-01 15:33                                 ` Christoph Lameter
@ 2013-04-02  0:42                                   ` Joonsoo Kim
  2013-04-02  6:48                                     ` Pekka Enberg
                                                       ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Joonsoo Kim @ 2013-04-02  0:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

Hello, Christoph.

On Mon, Apr 01, 2013 at 03:33:23PM +0000, Christoph Lameter wrote:
> Subject: slub: Fix object counts in acquire_slab V2
> 
> It seems that we were overallocating objects from the slab queues
> since get_partial_node() assumed that page->inuse was undisturbed by
> acquire_slab(). Save the # of objects in page->lru.next in acquire_slab()
> and pass it to get_partial_node() that way.
> 
> I have a vague memory that Joonsoo also ran into this issue awhile back.

Yes. I sent a patch for this two months ago. :)

> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-03-28 12:14:26.958358688 -0500
> +++ linux/mm/slub.c	2013-04-01 10:23:24.677584499 -0500
> @@ -1498,6 +1498,7 @@ static inline void *acquire_slab(struct
>  	void *freelist;
>  	unsigned long counters;
>  	struct page new;
> +	unsigned long objects;
> 
>  	/*
>  	 * Zap the freelist and set the frozen bit.
> @@ -1507,6 +1508,7 @@ static inline void *acquire_slab(struct
>  	freelist = page->freelist;
>  	counters = page->counters;
>  	new.counters = counters;
> +	objects = page->inuse;
>  	if (mode) {
>  		new.inuse = page->objects;
>  		new.freelist = NULL;
> @@ -1524,6 +1526,7 @@ static inline void *acquire_slab(struct
>  		return NULL;
> 
>  	remove_partial(n, page);
> +	page->lru.next = (void *)objects;
>  	WARN_ON(!freelist);
>  	return freelist;
>  }

Good. I like your method, which uses lru.next to hand over the
number of objects.

> @@ -1565,7 +1568,7 @@ static void *get_partial_node(struct kme
>  			c->page = page;
>  			stat(s, ALLOC_FROM_PARTIAL);
>  			object = t;
> -			available =  page->objects - page->inuse;
> +			available =  page->objects - (unsigned long)page->lru.next;
>  		} else {
>  			available = put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);

We need one more fix for correctness.
When available is assigned by put_cpu_partial(), it doesn't count the cpu slab's objects.
Please refer to my old patch.

https://lkml.org/lkml/2013/1/21/64
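
A rough illustration of that gap, with made-up numbers (not from either
patch): say s->cpu_partial is 6, so get_partial_node() stops once
available > 3.

/*
 * Pass 1 acquires the cpu slab with 2 free objects: available == 2,
 * keep looping.  Pass 2 parks another slab with 2 free objects via
 * put_cpu_partial(), whose return value only reflects the per cpu
 * partial pool: available == 2 again, keep looping, even though
 * 2 + 2 == 4 free objects have already been taken off the node's
 * partial list.  Only on pass 3 does the pool alone cross the
 * threshold.  Counting the cpu slab's objects as well would have
 * stopped the loop one slab earlier.
 */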

Thanks.


* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-01 15:32                               ` Christoph Lameter
  2013-04-01 16:06                                   ` Paul Gortmaker
  2013-04-01 21:46                                   ` Paul Gortmaker
@ 2013-04-02  1:37                                 ` Joonsoo Kim
  2 siblings, 0 replies; 51+ messages in thread
From: Joonsoo Kim @ 2013-04-02  1:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

Hello, Christoph.

On Mon, Apr 01, 2013 at 03:32:43PM +0000, Christoph Lameter wrote:
> On Thu, 28 Mar 2013, Paul Gortmaker wrote:
> 
> > > Index: linux/init/Kconfig
> > > ===================================================================
> > > --- linux.orig/init/Kconfig     2013-03-28 12:14:26.958358688 -0500
> > > +++ linux/init/Kconfig  2013-03-28 12:19:46.275866639 -0500
> > > @@ -1514,6 +1514,14 @@ config SLOB
> > >
> > >  endchoice
> > >
> > > +config SLUB_CPU_PARTIAL
> > > +       depends on SLUB
> > > +       bool "SLUB per cpu partial cache"
> > > +       help
> > > +         Per cpu partial caches accellerate freeing of objects at the
> > > +         price of more indeterminism in the latency of the free.
> > > +         Typically one would choose no for a realtime system.
> >
> > Is "batch" a better description than "accelerate" ?  Something like
> 
> Its not a batching but a cache that is going to be mainly used for new
> allocations on the same processor.
> 
> >      Per cpu partial caches allows batch freeing of objects to maximize
> >      throughput.  However, this can increase the length of time spent
> >      holding key locks, which can increase latency spikes with respect
> >      to responsiveness.  Select yes unless you are tuning for a realtime
> >      oriented system.
> >
> > Also, I believe this will cause a behaviour change for people who
> > just run "make oldconfig" -- since there is no default line.  Meaning
> > that it used to be unconditionally on, but now I think it will be off
> > by default, if people just mindlessly hold down Enter key.
> 
> Ok.
> >
> > For RT, we'll want default N if RT_FULL (RT_BASE?) but for mainline,
> > I expect you'll want default Y in order to be consistent with previous
> > behaviour?
> 
> I was not sure exactly how to handle that one yet for realtime. So I need
> two different patches?
> 
> > I've not built/booted yet, but I'll follow up if I see anything else in doing
> > that.
> 
> Here is an updated patch. I will also send an updated fixup patch.
> 
> 
> Subject: slub: Make cpu partial slab support configurable V2
> 
> cpu partial support can introduce level of indeterminism that is not wanted
> in certain context (like a realtime kernel). Make it configurable.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h	2013-04-01 10:27:05.908964674 -0500
> +++ linux/include/linux/slub_def.h	2013-04-01 10:27:19.905178531 -0500
> @@ -47,7 +47,9 @@ struct kmem_cache_cpu {
>  	void **freelist;	/* Pointer to next available object */
>  	unsigned long tid;	/* Globally unique transaction id */
>  	struct page *page;	/* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	struct page *partial;	/* Partially allocated frozen slabs */
> +#endif
>  #ifdef CONFIG_SLUB_STATS
>  	unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> @@ -84,7 +86,9 @@ struct kmem_cache {
>  	int size;		/* The size of an object including meta data */
>  	int object_size;	/* The size of an object without meta data */
>  	int offset;		/* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	int cpu_partial;	/* Number of per cpu partial objects to keep around */
> +#endif
>  	struct kmem_cache_order_objects oo;
> 
>  	/* Allocation and freeing of slabs */

When !CONFIG_SLUB_CPU_PARTIAL, should we really remove these variables?
Without removing them, we could make the code simpler and more maintainable.

> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-04-01 10:27:05.908964674 -0500
> +++ linux/mm/slub.c	2013-04-01 10:27:19.905178531 -0500
> @@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
>  	return freelist;
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
> 
>  /*
> @@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
>  			object = t;
>  			available =  page->objects - (unsigned long)page->lru.next;
>  		} else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			available = put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> +#else
> +			BUG();
> +#endif
>  		}
> -		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> +		if (kmem_cache_debug(s) ||
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +			available > s->cpu_partial / 2
> +#else
> +			available > 0
> +#endif
> +			)
>  			break;
> 
>  	}

How about introducing a wrapper function, cpu_partial_enabled(),
like kmem_cache_debug()?

int cpu_partial_enabled(s)
{
	return kmem_cache_debug(s) || blablabla
}

As you already know, when kmem_cache_debug() is enabled,
cpu_partial is kept at zero. How about reusing this property
to implement the !CONFIG_SLUB_CPU_PARTIAL case?
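
For what it's worth, one concrete shape that suggestion could take (a sketch
only; the names are guesses, not from any posted patch, and it assumes the
cpu_partial field stays in struct kmem_cache in both configurations):

static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
{
	return IS_ENABLED(CONFIG_SLUB_CPU_PARTIAL) && !kmem_cache_debug(s);
}

/*
 * Reusing the "debug caches keep cpu_partial at zero" property,
 * kmem_cache_open() could then simply do
 *
 *	if (!kmem_cache_has_cpu_partial(s))
 *		s->cpu_partial = 0;
 *
 * and the existing "available > s->cpu_partial / 2" test in
 * get_partial_node() degenerates to "available > 0" with no #ifdef
 * needed at the call site.
 */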

Thanks.

> @@ -1874,6 +1886,7 @@ redo:
>  	}
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  /*
>   * Unfreeze all the cpu partial slabs.
>   *
> @@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>  	return pobjects;
>  }
> +#endif
> 
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
> @@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
>  		if (c->page)
>  			flush_slab(s, c);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  		unfreeze_partials(s, c);
> +#endif
>  	}
>  }
> 
> @@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
>  	struct kmem_cache *s = info;
>  	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	return c->page || c->partial;
> +#else
> +	return c->page;
> +#endif
>  }
> 
>  static void flush_all(struct kmem_cache *s)
> @@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
>  	page = c->page;
>  	if (!page)
>  		goto new_slab;
> +
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  redo:
> +#endif
> 
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> @@ -2278,6 +2301,7 @@ load_freelist:
> 
>  new_slab:
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	if (c->partial) {
>  		page = c->page = c->partial;
>  		c->partial = page->next;
> @@ -2285,6 +2309,7 @@ new_slab:
>  		c->freelist = NULL;
>  		goto redo;
>  	}
> +#endif
> 
>  	freelist = new_slab_objects(s, gfpflags, node, &c);
> 
> @@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
>  		new.inuse--;
>  		if ((!new.inuse || !prior) && !was_frozen) {
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			if (!kmem_cache_debug(s) && !prior)
> 
>  				/*
> @@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
>  				 */
>  				new.frozen = 1;
> 
> -			else { /* Needs to be taken off a list */
> +			else
> +#endif
> +		       		{ /* Needs to be taken off a list */
> 
>  	                        n = get_node(s, page_to_nid(page));
>  				/*
> @@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
>  		"__slab_free"));
> 
>  	if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> 
>  		/*
>  		 * If we just froze the page then put it onto the
> @@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
>  			put_cpu_partial(s, page, 1);
>  			stat(s, CPU_PARTIAL_FREE);
>  		}
> +#endif
>  		/*
>  		 * The list lock was not taken therefore no list
>  		 * activity can be necessary.
> @@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
>  	 * list to avoid pounding the page allocator excessively.
>  	 */
>  	set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/*
>  	 * cpu_partial determined the maximum number of objects kept in the
>  	 * per cpu partial lists of a processor.
> @@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
>  		s->cpu_partial = 13;
>  	else
>  		s->cpu_partial = 30;
> +#endif
> 
>  #ifdef CONFIG_NUMA
>  	s->remote_node_defrag_ratio = 1000;
> @@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
>  			total += x;
>  			nodes[node] += x;
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			page = ACCESS_ONCE(c->partial);
>  			if (page) {
>  				x = page->pobjects;
>  				total += x;
>  				nodes[node] += x;
>  			}
> -
> +#endif
>  			per_cpu[node]++;
>  		}
>  	}
> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
> 
> +#ifdef CONFIG_CPU_PARTIAL
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
>  	return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
> 
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
>  	return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
> 
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_CPU_PARTIAL
>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
> 
>  static struct attribute *slab_attrs[] = {
>  	&slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
>  	&objs_per_slab_attr.attr,
>  	&order_attr.attr,
>  	&min_partial_attr.attr,
> +#ifdef CONFIG_CPU_PARTIAL
>  	&cpu_partial_attr.attr,
> +#endif
>  	&objects_attr.attr,
>  	&objects_partial_attr.attr,
>  	&partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
>  	&destroy_by_rcu_attr.attr,
>  	&shrink_attr.attr,
>  	&reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>  	&total_objects_attr.attr,
>  	&slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
>  	&order_fallback_attr.attr,
>  	&cmpxchg_double_fail_attr.attr,
>  	&cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&cpu_partial_alloc_attr.attr,
>  	&cpu_partial_free_attr.attr,
>  	&cpu_partial_node_attr.attr,
>  	&cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>  	&failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig	2013-04-01 10:27:05.908964674 -0500
> +++ linux/init/Kconfig	2013-04-01 10:31:46.497863625 -0500
> @@ -1514,6 +1514,17 @@ config SLOB
> 
>  endchoice
> 
> +config SLUB_CPU_PARTIAL
> +	default y
> +	depends on SLUB
> +	bool "SLUB per cpu partial cache"
> +	help
> +	  Per cpu partial caches accellerate objects allocation and freeing
> +	  that is local to a processor at the price of more indeterminism
> +	  in the latency of the free. On overflow these caches will be cleared
> +	  which requires the taking of locks that may cause latency spikes.
> +	  Typically one would choose no for a realtime system.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-02  0:42                                   ` Joonsoo Kim
@ 2013-04-02  6:48                                     ` Pekka Enberg
  2013-04-02 19:25                                     ` Christoph Lameter
  2013-04-08 12:32                                     ` Steven Rostedt
  2 siblings, 0 replies; 51+ messages in thread
From: Pekka Enberg @ 2013-04-02  6:48 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Paul Gortmaker, Steven Rostedt, LKML, RT,
	Thomas Gleixner, Clark Williams

On Tue, Apr 2, 2013 at 3:42 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> I have a vague memory that Joonsoo also ran into this issue awhile back.
>
> Yes. I sent a patch for this two month ago. :)

It's applied now.

                       Pekka

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-02  0:42                                   ` Joonsoo Kim
  2013-04-02  6:48                                     ` Pekka Enberg
@ 2013-04-02 19:25                                     ` Christoph Lameter
  2013-04-04  0:58                                       ` Joonsoo Kim
  2013-04-08 12:32                                     ` Steven Rostedt
  2 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-04-02 19:25 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Tue, 2 Apr 2013, Joonsoo Kim wrote:

> We need one more fix for correctness.
> When available is assigned by put_cpu_partial, it doesn't count cpu slab's objects.
> Please reference my old patch.
>
> https://lkml.org/lkml/2013/1/21/64

Could you update your patch and submit it again?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-02 19:25                                     ` Christoph Lameter
@ 2013-04-04  0:58                                       ` Joonsoo Kim
  2013-04-04 13:53                                         ` Christoph Lameter
  0 siblings, 1 reply; 51+ messages in thread
From: Joonsoo Kim @ 2013-04-04  0:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

Hello, Christoph.

On Tue, Apr 02, 2013 at 07:25:20PM +0000, Christoph Lameter wrote:
> On Tue, 2 Apr 2013, Joonsoo Kim wrote:
> 
> > We need one more fix for correctness.
> > When available is assigned by put_cpu_partial, it doesn't count cpu slab's objects.
> > Please reference my old patch.
> >
> > https://lkml.org/lkml/2013/1/21/64
> 
> Could you update your patch and submit it again?

Pekka already applied it.
Do we need an update?

Thanks.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-04  0:58                                       ` Joonsoo Kim
@ 2013-04-04 13:53                                         ` Christoph Lameter
  2013-04-05  2:05                                           ` Joonsoo Kim
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-04-04 13:53 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Thu, 4 Apr 2013, Joonsoo Kim wrote:

> Pekka already applied it.
> Do we need an update?

Well I thought the passing of the count via lru.next would be something
worthwhile to pick up.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-04 13:53                                         ` Christoph Lameter
@ 2013-04-05  2:05                                           ` Joonsoo Kim
  2013-04-05 14:35                                             ` Christoph Lameter
  0 siblings, 1 reply; 51+ messages in thread
From: Joonsoo Kim @ 2013-04-05  2:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Thu, Apr 04, 2013 at 01:53:25PM +0000, Christoph Lameter wrote:
> On Thu, 4 Apr 2013, Joonsoo Kim wrote:
> 
> > Pekka already applied it.
> > Do we need an update?
> 
> Well I thought the passing of the count via lru.next would be something
> worthwhile to pick up.

Hello, Pekka.

Here goes a patch implementing Christoph's idea.
Instead of updating my previous patch, I re-wrote this patch on top of
your slab/next tree.

Thanks.

------------------------8<-------------------------------
>From e1c18793dd2a9d9cef87b07faf975364b71276d7 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Fri, 5 Apr 2013 10:49:36 +0900
Subject: [PATCH] slub: use page->lru.next to calculate nr of acquired object

We can pass the inuse count via page->lru.next in order to calculate the
number of acquired objects, which is a cleaner way. This removes one
function argument and makes the code cleaner.

Cc: Christoph Lameter <cl@linux.com>
Suggested-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/slub.c b/mm/slub.c
index 21b3f00..8a35464 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1493,11 +1493,12 @@ static inline void remove_partial(struct kmem_cache_node *n,
  */
 static inline void *acquire_slab(struct kmem_cache *s,
 		struct kmem_cache_node *n, struct page *page,
-		int mode, int *objects)
+		int mode)
 {
 	void *freelist;
 	unsigned long counters;
 	struct page new;
+	unsigned long inuse;
 
 	/*
 	 * Zap the freelist and set the frozen bit.
@@ -1507,7 +1508,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
 	freelist = page->freelist;
 	counters = page->counters;
 	new.counters = counters;
-	*objects = new.objects - new.inuse;
+	inuse = page->inuse;
 	if (mode) {
 		new.inuse = page->objects;
 		new.freelist = NULL;
@@ -1525,6 +1526,7 @@ static inline void *acquire_slab(struct kmem_cache *s,
 		return NULL;
 
 	remove_partial(n, page);
+	page->lru.next = (void *)inuse;
 	WARN_ON(!freelist);
 	return freelist;
 }
@@ -1541,7 +1543,6 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 	struct page *page, *page2;
 	void *object = NULL;
 	int available = 0;
-	int objects;
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -1559,11 +1560,11 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 		if (!pfmemalloc_match(page, flags))
 			continue;
 
-		t = acquire_slab(s, n, page, object == NULL, &objects);
+		t = acquire_slab(s, n, page, object == NULL);
 		if (!t)
 			break;
 
-		available += objects;
+		available += (page->objects - (unsigned long)page->lru.next);
 		if (!object) {
 			c->page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-05  2:05                                           ` Joonsoo Kim
@ 2013-04-05 14:35                                             ` Christoph Lameter
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-04-05 14:35 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Fri, 5 Apr 2013, Joonsoo Kim wrote:

> Here goes a patch implementing Christoph's idea.
> Instead of updating my previous patch, I re-wrote this patch on top of
> your slab/next tree.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-28 17:29                         ` Christoph Lameter
@ 2013-04-08 12:25                           ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2013-04-08 12:25 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

Sorry for the late reply, but I just got back from vacation.

On Thu, 2013-03-28 at 17:29 +0000, Christoph Lameter wrote:
> Two patches against slub to enable deconfiguring cpu_partial support.
> 
> First one is a bug fix (Pekka please pick up this one or use Joonsoo's
> earlier one).
> 
> 
> Subject: slub: Fix object counts in acquire_slab
> 
> It seems that we were overallocating objects from the slab queues
> since get_partial_node() assumed that page->inuse was undisturbed by
> acquire_slab(). Save the # of objects in page->lru.next in acquire_slab()
> and pass it to get_partial_node() that way.
> 
> I have a vague memory that Joonsoo also ran into this issue awhile back.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-03-28 12:14:26.958358688 -0500
> +++ linux/mm/slub.c	2013-03-28 12:16:57.240785613 -0500
> @@ -1498,6 +1498,7 @@ static inline void *acquire_slab(struct
>  	void *freelist;
>  	unsigned long counters;
>  	struct page new;
> +	unsigned long objects;
> 
>  	/*
>  	 * Zap the freelist and set the frozen bit.
> @@ -1507,6 +1508,7 @@ static inline void *acquire_slab(struct
>  	freelist = page->freelist;
>  	counters = page->counters;
>  	new.counters = counters;
> +	objects = page->objects;
>  	if (mode) {
>  		new.inuse = page->objects;
>  		new.freelist = NULL;
> @@ -1524,6 +1526,7 @@ static inline void *acquire_slab(struct
>  		return NULL;
> 
>  	remove_partial(n, page);
> +	page->lru.next = (void *)objects;
>  	WARN_ON(!freelist);
>  	return freelist;
>  }
> @@ -1565,7 +1568,7 @@ static void *get_partial_node(struct kme
>  			c->page = page;
>  			stat(s, ALLOC_FROM_PARTIAL);
>  			object = t;
> -			available =  page->objects - page->inuse;
> +			available =  page->objects - (unsigned long)page->lru.next;

Ug, this is a really nasty hack. To avoid the curses you will get from
whoever takes over this code when you retire, I suggest just having
"objects" passed back as a reference parameter. I only see
acquire_slab() used here. Why not just pass in a pointer and have it
assigned the object count instead of this error-prone use of lru.next?
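
Concretely, I mean keeping the shape the code had before this change
(rough sketch from memory, not a patch): acquire_slab() grows an
"int *objects" out-parameter and fills it with page->objects - page->inuse
before it freezes the slab, so the caller in get_partial_node() becomes:

		int objects;

		t = acquire_slab(s, n, page, object == NULL, &objects);
		if (!t)
			break;

		available += objects;	/* no cast, no lru.next games */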

-- Steve

>  		} else {
>  			available = put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-02  0:42                                   ` Joonsoo Kim
  2013-04-02  6:48                                     ` Pekka Enberg
  2013-04-02 19:25                                     ` Christoph Lameter
@ 2013-04-08 12:32                                     ` Steven Rostedt
  2013-04-10  6:31                                       ` Pekka Enberg
  2 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-04-08 12:32 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Paul Gortmaker, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Tue, 2013-04-02 at 09:42 +0900, Joonsoo Kim wrote:

> > 
> > Signed-off-by: Christoph Lameter <cl@linux.com>
> > 
> > Index: linux/mm/slub.c
> > ===================================================================
> > --- linux.orig/mm/slub.c	2013-03-28 12:14:26.958358688 -0500
> > +++ linux/mm/slub.c	2013-04-01 10:23:24.677584499 -0500
> > @@ -1498,6 +1498,7 @@ static inline void *acquire_slab(struct
> >  	void *freelist;
> >  	unsigned long counters;
> >  	struct page new;
> > +	unsigned long objects;
> > 
> >  	/*
> >  	 * Zap the freelist and set the frozen bit.
> > @@ -1507,6 +1508,7 @@ static inline void *acquire_slab(struct
> >  	freelist = page->freelist;
> >  	counters = page->counters;
> >  	new.counters = counters;
> > +	objects = page->inuse;
> >  	if (mode) {
> >  		new.inuse = page->objects;
> >  		new.freelist = NULL;
> > @@ -1524,6 +1526,7 @@ static inline void *acquire_slab(struct
> >  		return NULL;
> > 
> >  	remove_partial(n, page);
> > +	page->lru.next = (void *)objects;
> >  	WARN_ON(!freelist);
> >  	return freelist;
> >  }
> 
> Good. I like your method which use lru.next in order to hand over
> number of objects.

I hate it ;-)

It just seems to be something that's not very robust and can cause hours
of debugging in the future. I mean, there's not even a comment
explaining what is happening. The lru field is in a union with the other
slub partial-list fields, which is not very obvious. If something is out of
order, it can easily break, and there's nothing here that points to why.

Just pass the damn objects pointer by reference and use that. It's easy
to understand and read, and it's robust.

-- Steve

> 
> > @@ -1565,7 +1568,7 @@ static void *get_partial_node(struct kme
> >  			c->page = page;
> >  			stat(s, ALLOC_FROM_PARTIAL);
> >  			object = t;
> > -			available =  page->objects - page->inuse;
> > +			available =  page->objects - (unsigned long)page->lru.next;
> >  		} else {
> >  			available = put_cpu_partial(s, page, 0);
> >  			stat(s, CPU_PARTIAL_NODE);
> 



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-08 12:32                                     ` Steven Rostedt
@ 2013-04-10  6:31                                       ` Pekka Enberg
  2013-04-10  7:31                                         ` Joonsoo Kim
  0 siblings, 1 reply; 51+ messages in thread
From: Pekka Enberg @ 2013-04-10  6:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joonsoo Kim, Christoph Lameter, Paul Gortmaker, LKML, RT,
	Thomas Gleixner, Clark Williams

On Mon, Apr 8, 2013 at 3:32 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > Index: linux/mm/slub.c
>> > ===================================================================
>> > --- linux.orig/mm/slub.c    2013-03-28 12:14:26.958358688 -0500
>> > +++ linux/mm/slub.c 2013-04-01 10:23:24.677584499 -0500
>> > @@ -1498,6 +1498,7 @@ static inline void *acquire_slab(struct
>> >     void *freelist;
>> >     unsigned long counters;
>> >     struct page new;
>> > +   unsigned long objects;
>> >
>> >     /*
>> >      * Zap the freelist and set the frozen bit.
>> > @@ -1507,6 +1508,7 @@ static inline void *acquire_slab(struct
>> >     freelist = page->freelist;
>> >     counters = page->counters;
>> >     new.counters = counters;
>> > +   objects = page->inuse;
>> >     if (mode) {
>> >             new.inuse = page->objects;
>> >             new.freelist = NULL;
>> > @@ -1524,6 +1526,7 @@ static inline void *acquire_slab(struct
>> >             return NULL;
>> >
>> >     remove_partial(n, page);
>> > +   page->lru.next = (void *)objects;
>> >     WARN_ON(!freelist);
>> >     return freelist;
>> >  }
>>
>> Good. I like your method which use lru.next in order to hand over
>> number of objects.
>
> I hate it ;-)
>
> It just seems to be something that's not very robust and can cause hours
> of debugging in the future. I mean, there's not even a comment
> explaining what is happening. The lru field is in a union with the other
> slub partial-list fields, which is not very obvious. If something is out of
> order, it can easily break, and there's nothing here that points to why.
>
> Just pass the damn objects pointer by reference and use that. It's easy
> to understand and read, and it's robust.

Christoph, Joonsoo, comments?

                                Pekka

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-10  6:31                                       ` Pekka Enberg
@ 2013-04-10  7:31                                         ` Joonsoo Kim
  2013-04-10 14:00                                           ` Christoph Lameter
  0 siblings, 1 reply; 51+ messages in thread
From: Joonsoo Kim @ 2013-04-10  7:31 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Steven Rostedt, Christoph Lameter, Paul Gortmaker, LKML, RT,
	Thomas Gleixner, Clark Williams

On Wed, Apr 10, 2013 at 09:31:10AM +0300, Pekka Enberg wrote:
> On Mon, Apr 8, 2013 at 3:32 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >> > Index: linux/mm/slub.c
> >> > ===================================================================
> >> > --- linux.orig/mm/slub.c    2013-03-28 12:14:26.958358688 -0500
> >> > +++ linux/mm/slub.c 2013-04-01 10:23:24.677584499 -0500
> >> > @@ -1498,6 +1498,7 @@ static inline void *acquire_slab(struct
> >> >     void *freelist;
> >> >     unsigned long counters;
> >> >     struct page new;
> >> > +   unsigned long objects;
> >> >
> >> >     /*
> >> >      * Zap the freelist and set the frozen bit.
> >> > @@ -1507,6 +1508,7 @@ static inline void *acquire_slab(struct
> >> >     freelist = page->freelist;
> >> >     counters = page->counters;
> >> >     new.counters = counters;
> >> > +   objects = page->inuse;
> >> >     if (mode) {
> >> >             new.inuse = page->objects;
> >> >             new.freelist = NULL;
> >> > @@ -1524,6 +1526,7 @@ static inline void *acquire_slab(struct
> >> >             return NULL;
> >> >
> >> >     remove_partial(n, page);
> >> > +   page->lru.next = (void *)objects;
> >> >     WARN_ON(!freelist);
> >> >     return freelist;
> >> >  }
> >>
> >> Good. I like your method which use lru.next in order to hand over
> >> number of objects.
> >
> > I hate it ;-)
> >
> > It just seems to be something that's not very robust and can cause hours
> > of debugging in the future. I mean, there's not even a comment
> > explaining what is happening. The lru field is in a union with the other
> > slub partial-list fields, which is not very obvious. If something is out of
> > order, it can easily break, and there's nothing here that points to why.
> >
> > Just pass the damn objects pointer by reference and use that. It's easy
> > to understand and read, and it's robust.
> 
> Christoph, Joonsoo, comments?

Steven's comment is reasonable to me.

If there is no objection from Christoph,
let's drop the patch in which I implemented Christoph's idea.

Thanks.

> 
>                                 Pekka

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-10  7:31                                         ` Joonsoo Kim
@ 2013-04-10 14:00                                           ` Christoph Lameter
  2013-04-10 14:09                                             ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-04-10 14:00 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Pekka Enberg, Steven Rostedt, Paul Gortmaker, LKML, RT,
	Thomas Gleixner, Clark Williams

On Wed, 10 Apr 2013, Joonsoo Kim wrote:

> > Christoph, Joonsoo, comments?
>
> Steven's comment is reasonable to me.
>
> If there is no objection from Christoph,
> let's drop the patch in which I implemented Christoph's idea.

Fine with me.

I do not like passing a reference just because we have to
return an additional counter. Adds a lot of overhead.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-10 14:00                                           ` Christoph Lameter
@ 2013-04-10 14:09                                             ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2013-04-10 14:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Pekka Enberg, Paul Gortmaker, LKML, RT,
	Thomas Gleixner, Clark Williams

On Wed, 2013-04-10 at 14:00 +0000, Christoph Lameter wrote:
> On Wed, 10 Apr 2013, Joonsoo Kim wrote:
> 
> > > Christoph, Joonsoo, comments?
> >
> > Steven's comment is reasonable to me.
> >
> > If there is no objection from Christoph,
> > let's drop the patch in which I implemented Christoph's idea.
> 
> Fine with me.
> 
> I do not like passing a reference just because we have to
> return an additional counter. Adds a lot of overhead.
> 

The reference is only passed to acquire_slab(), which is a static inline
function that's only called from this one place. You would think that gcc
can optimize that and remove the "lot of overhead".
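
As a toy illustration of the pattern (not kernel code, and the codegen
claim is the usual expectation rather than something I measured): with
the callee static inline in the same file, gcc at -O2 will normally
inline it and promote the address-taken local to a register, so the
out-parameter never touches memory:

static inline void *grab(struct page *page, int *objects)
{
	*objects = page->objects - page->inuse;
	return page->freelist;
}

void *one_caller(struct page *page, unsigned long *available)
{
	int objects;
	void *t = grab(page, &objects);	/* inlined; &objects goes away */

	*available += objects;		/* objects ends up in a register */
	return t;
}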

-- Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-28 17:30                           ` Christoph Lameter
  2013-03-29  2:43                             ` Paul Gortmaker
@ 2013-04-11 16:05                             ` Steven Rostedt
  2013-04-11 16:42                               ` Christoph Lameter
  2013-05-28 14:39                             ` Steven Rostedt
  2 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2013-04-11 16:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

On Thu, 2013-03-28 at 17:30 +0000, Christoph Lameter wrote:
> This patch requires the earlier bug fix.
> 
> 
> Subject: slub: Make cpu partial slab support configurable
> 
> cpu partial support can introduce level of indeterminism that is not wanted
> in certain context (like a realtime kernel). Make it configurable.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 

Hi Christoph,

I was wondering if you have made any more forward progress with this patch
yet. When it goes into mainline, I'd like to backport it to the -rt
stable trees, and will probably make it enabled by default when
PREEMPT_RT is enabled.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-11 16:05                             ` Steven Rostedt
@ 2013-04-11 16:42                               ` Christoph Lameter
  2013-04-12  6:48                                   ` Pekka Enberg
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-04-11 16:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joonsoo Kim, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

On Thu, 11 Apr 2013, Steven Rostedt wrote:

> I was wondering if you have made any more forward progress with this patch
> yet. When it goes into mainline, I'd like to backport it to the -rt
> stable trees, and will probably make it enabled by default when
> PREEMPT_RT is enabled.

Sorry I did not get around to it yet. Also waiting for Joonsoo's patch to
be merged first. Which tree may I use Pekka?




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-04-11 16:42                               ` Christoph Lameter
@ 2013-04-12  6:48                                   ` Pekka Enberg
  0 siblings, 0 replies; 51+ messages in thread
From: Pekka Enberg @ 2013-04-12  6:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, Joonsoo Kim, LKML, RT, Thomas Gleixner, Clark Williams

Hello,

On 04/11/2013 07:42 PM, Christoph Lameter wrote:
> On Thu, 11 Apr 2013, Steven Rostedt wrote:
>
>> I was wondering if you have made any more forward progress with this patch
>> yet. When it goes into mainline, I'd like to backport it to the -rt
>> stable trees, and will probably make it enabled by default when
>> PREEMPT_RT is enabled.
>
> Sorry I did not get around to it yet. Also waiting for Joonsoo's patch to
> be merged first. Which tree may I use Pekka?

https://github.com/penberg/linux/commits/slab/next

and

https://git.kernel.org/cgit/linux/kernel/git/penberg/linux.git/log/?h=slab/next

are the same and include what I've queued for 3.10.

I'm waiting for a new version of Joonsoo's patch to appear.

			Pekka

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-03-28 17:30                           ` Christoph Lameter
  2013-03-29  2:43                             ` Paul Gortmaker
  2013-04-11 16:05                             ` Steven Rostedt
@ 2013-05-28 14:39                             ` Steven Rostedt
  2013-05-28 16:22                               ` Christoph Lameter
       [not found]                               ` <alpine.DEB.2.02.1305281121420.1627@gentwo.org>
  2 siblings, 2 replies; 51+ messages in thread
From: Steven Rostedt @ 2013-05-28 14:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

On Thu, 2013-03-28 at 17:30 +0000, Christoph Lameter wrote:
> This patch requires the earlier bug fix.
> 
> 
> Subject: slub: Make cpu partial slab support configurable
> 
> cpu partial support can introduce level of indeterminism that is not wanted
> in certain context (like a realtime kernel). Make it configurable.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Hi Christoph,

Any progress on this patch?

Thanks!

-- Steve

> 
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h	2013-03-28 12:14:26.958358688 -0500
> +++ linux/include/linux/slub_def.h	2013-03-28 12:19:46.275866639 -0500
> @@ -47,7 +47,9 @@ struct kmem_cache_cpu {
>  	void **freelist;	/* Pointer to next available object */
>  	unsigned long tid;	/* Globally unique transaction id */
>  	struct page *page;	/* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	struct page *partial;	/* Partially allocated frozen slabs */
> +#endif
>  #ifdef CONFIG_SLUB_STATS
>  	unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> @@ -84,7 +86,9 @@ struct kmem_cache {
>  	int size;		/* The size of an object including meta data */
>  	int object_size;	/* The size of an object without meta data */
>  	int offset;		/* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	int cpu_partial;	/* Number of per cpu partial objects to keep around */
> +#endif
>  	struct kmem_cache_order_objects oo;
> 
>  	/* Allocation and freeing of slabs */
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-03-28 12:18:48.446813991 -0500
> +++ linux/mm/slub.c	2013-03-28 12:19:46.275866639 -0500
> @@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
>  	return freelist;
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
> 
>  /*
> @@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
>  			object = t;
>  			available =  page->objects - (unsigned long)page->lru.next;
>  		} else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			available = put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> +#else
> +			BUG();
> +#endif
>  		}
> -		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> +		if (kmem_cache_debug(s) ||
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +			available > s->cpu_partial / 2
> +#else
> +			available > 0
> +#endif
> +			)
>  			break;
> 
>  	}
> @@ -1874,6 +1886,7 @@ redo:
>  	}
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  /*
>   * Unfreeze all the cpu partial slabs.
>   *
> @@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>  	return pobjects;
>  }
> +#endif
> 
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
> @@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
>  		if (c->page)
>  			flush_slab(s, c);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  		unfreeze_partials(s, c);
> +#endif
>  	}
>  }
> 
> @@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
>  	struct kmem_cache *s = info;
>  	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	return c->page || c->partial;
> +#else
> +	return c->page;
> +#endif
>  }
> 
>  static void flush_all(struct kmem_cache *s)
> @@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
>  	page = c->page;
>  	if (!page)
>  		goto new_slab;
> +
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  redo:
> +#endif
> 
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> @@ -2278,6 +2301,7 @@ load_freelist:
> 
>  new_slab:
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	if (c->partial) {
>  		page = c->page = c->partial;
>  		c->partial = page->next;
> @@ -2285,6 +2309,7 @@ new_slab:
>  		c->freelist = NULL;
>  		goto redo;
>  	}
> +#endif
> 
>  	freelist = new_slab_objects(s, gfpflags, node, &c);
> 
> @@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
>  		new.inuse--;
>  		if ((!new.inuse || !prior) && !was_frozen) {
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			if (!kmem_cache_debug(s) && !prior)
> 
>  				/*
> @@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
>  				 */
>  				new.frozen = 1;
> 
> -			else { /* Needs to be taken off a list */
> +			else
> +#endif
> +		       		{ /* Needs to be taken off a list */
> 
>  	                        n = get_node(s, page_to_nid(page));
>  				/*
> @@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
>  		"__slab_free"));
> 
>  	if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> 
>  		/*
>  		 * If we just froze the page then put it onto the
> @@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
>  			put_cpu_partial(s, page, 1);
>  			stat(s, CPU_PARTIAL_FREE);
>  		}
> +#endif
>  		/*
>  		 * The list lock was not taken therefore no list
>  		 * activity can be necessary.
> @@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
>  	 * list to avoid pounding the page allocator excessively.
>  	 */
>  	set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/*
>  	 * cpu_partial determined the maximum number of objects kept in the
>  	 * per cpu partial lists of a processor.
> @@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
>  		s->cpu_partial = 13;
>  	else
>  		s->cpu_partial = 30;
> +#endif
> 
>  #ifdef CONFIG_NUMA
>  	s->remote_node_defrag_ratio = 1000;
> @@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
>  			total += x;
>  			nodes[node] += x;
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			page = ACCESS_ONCE(c->partial);
>  			if (page) {
>  				x = page->pobjects;
>  				total += x;
>  				nodes[node] += x;
>  			}
> -
> +#endif
>  			per_cpu[node]++;
>  		}
>  	}
> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
> 
> +#ifdef CONFIG_CPU_PARTIAL
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
>  	return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
> 
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
>  	return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
> 
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_CPU_PARTIAL
>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
> 
>  static struct attribute *slab_attrs[] = {
>  	&slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
>  	&objs_per_slab_attr.attr,
>  	&order_attr.attr,
>  	&min_partial_attr.attr,
> +#ifdef CONFIG_CPU_PARTIAL
>  	&cpu_partial_attr.attr,
> +#endif
>  	&objects_attr.attr,
>  	&objects_partial_attr.attr,
>  	&partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
>  	&destroy_by_rcu_attr.attr,
>  	&shrink_attr.attr,
>  	&reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>  	&total_objects_attr.attr,
>  	&slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
>  	&order_fallback_attr.attr,
>  	&cmpxchg_double_fail_attr.attr,
>  	&cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&cpu_partial_alloc_attr.attr,
>  	&cpu_partial_free_attr.attr,
>  	&cpu_partial_node_attr.attr,
>  	&cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>  	&failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig	2013-03-28 12:14:26.958358688 -0500
> +++ linux/init/Kconfig	2013-03-28 12:19:46.275866639 -0500
> @@ -1514,6 +1514,14 @@ config SLOB
> 
>  endchoice
> 
> +config SLUB_CPU_PARTIAL
> +	depends on SLUB
> +	bool "SLUB per cpu partial cache"
> +	help
> +	  Per cpu partial caches accellerate freeing of objects at the
> +	  price of more indeterminism in the latency of the free.
> +	  Typically one would choose no for a realtime system.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-05-28 14:39                             ` Steven Rostedt
@ 2013-05-28 16:22                               ` Christoph Lameter
       [not found]                               ` <alpine.DEB.2.02.1305281121420.1627@gentwo.org>
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-05-28 16:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joonsoo Kim, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

On Tue, 28 May 2013, Steven Rostedt wrote:

> Any progress on this patch?

Got a new version here but have not gotten much testing done yet.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
       [not found]                               ` <alpine.DEB.2.02.1305281121420.1627@gentwo.org>
@ 2013-05-28 18:24                                 ` Christoph Lameter
  2013-06-03 15:28                                   ` JoonSoo Kim
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-05-28 18:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Joonsoo Kim, LKML, RT, Thomas Gleixner, Clark Williams, Pekka Enberg

I just ran some quick tests and the following seems to work.

Critical portions that need additional review (Joonsoo?) are the
modifications to get_partial_node() and __slab_free().


Subject: slub: Make cpu partial slab support configurable V2

cpu partial support can introduce a level of indeterminism that is not wanted
in certain contexts (like a realtime kernel). Make it configurable.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/include/linux/slub_def.h
===================================================================
--- linux.orig/include/linux/slub_def.h	2013-05-20 15:21:26.925704505 -0500
+++ linux/include/linux/slub_def.h	2013-05-20 15:27:40.827613445 -0500
@@ -47,12 +47,24 @@ struct kmem_cache_cpu {
 	void **freelist;	/* Pointer to next available object */
 	unsigned long tid;	/* Globally unique transaction id */
 	struct page *page;	/* The slab from which we are allocating */
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *partial;	/* Partially allocated frozen slabs */
+#endif
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
 };

+static inline struct page *kmem_cache_cpu_partial_pages
+			(struct kmem_cache_cpu *c)
+{
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	return c->partial;
+#else
+	return NULL;
+#endif
+}
+
 /*
  * Word size structure that can be atomically updated or read and that
  * contains both the order and the number of objects that a slab of the
@@ -73,7 +85,9 @@ struct kmem_cache {
 	int size;		/* The size of an object including meta data */
 	int object_size;	/* The size of an object without meta data */
 	int offset;		/* Free pointer offset. */
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	int cpu_partial;	/* Number of per cpu partial objects to keep around */
+#endif
 	struct kmem_cache_order_objects oo;

 	/* Allocation and freeing of slabs */
@@ -104,6 +118,15 @@ struct kmem_cache {
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };

+static inline int kmem_cache_cpu_partial(struct kmem_cache *s)
+{
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	return s->cpu_partial;
+#else
+	return 0;
+#endif
+}
+
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
 void *__kmalloc(size_t size, gfp_t flags);

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-05-20 15:21:26.925704505 -0500
+++ linux/mm/slub.c	2013-05-20 15:38:00.681437630 -0500
@@ -1530,7 +1530,9 @@ static inline void *acquire_slab(struct
 	return freelist;
 }

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+#endif
 static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);

 /*
@@ -1570,10 +1572,15 @@ static void *get_partial_node(struct kme
 			stat(s, ALLOC_FROM_PARTIAL);
 			object = t;
 		} else {
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
+#else
+			BUG();
+#endif
 		}
-		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
+		if (kmem_cache_debug(s) ||
+			       available > kmem_cache_cpu_partial(s) / 2)
 			break;

 	}
@@ -1874,6 +1881,7 @@ redo:
 	}
 }

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 /*
  * Unfreeze all the cpu partial slabs.
  *
@@ -1988,6 +1996,7 @@ static void put_cpu_partial(struct kmem_

 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
 }
+#endif

 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
@@ -2012,7 +2021,9 @@ static inline void __flush_cpu_slab(stru
 		if (c->page)
 			flush_slab(s, c);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 		unfreeze_partials(s, c);
+#endif
 	}
 }

@@ -2028,7 +2039,7 @@ static bool has_cpu_slab(int cpu, void *
 	struct kmem_cache *s = info;
 	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

-	return c->page || c->partial;
+	return c->page || kmem_cache_cpu_partial_pages(c);
 }

 static void flush_all(struct kmem_cache *s)
@@ -2224,6 +2235,7 @@ static void *__slab_alloc(struct kmem_ca
 	page = c->page;
 	if (!page)
 		goto new_slab;
+
 redo:

 	if (unlikely(!node_match(page, node))) {
@@ -2277,10 +2289,12 @@ load_freelist:

 new_slab:

-	if (c->partial) {
+	if (kmem_cache_cpu_partial_pages(c)) {
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 		page = c->page = c->partial;
 		c->partial = page->next;
 		stat(s, CPU_PARTIAL_ALLOC);
+#endif
 		c->freelist = NULL;
 		goto redo;
 	}
@@ -2495,6 +2509,7 @@ static void __slab_free(struct kmem_cach
 		new.inuse--;
 		if ((!new.inuse || !prior) && !was_frozen) {

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			if (!kmem_cache_debug(s) && !prior)

 				/*
@@ -2503,7 +2518,9 @@ static void __slab_free(struct kmem_cach
 				 */
 				new.frozen = 1;

-			else { /* Needs to be taken off a list */
+			else
+#endif
+		       		{ /* Needs to be taken off a list */

 	                        n = get_node(s, page_to_nid(page));
 				/*
@@ -2525,6 +2542,7 @@ static void __slab_free(struct kmem_cach
 		"__slab_free"));

 	if (likely(!n)) {
+#ifdef CONFIG_SLUB_CPU_PARTIAL

 		/*
 		 * If we just froze the page then put it onto the
@@ -2534,6 +2552,7 @@ static void __slab_free(struct kmem_cach
 			put_cpu_partial(s, page, 1);
 			stat(s, CPU_PARTIAL_FREE);
 		}
+#endif
 		/*
 		 * The list lock was not taken therefore no list
 		 * activity can be necessary.
@@ -3041,7 +3060,7 @@ static int kmem_cache_open(struct kmem_c
 	 * list to avoid pounding the page allocator excessively.
 	 */
 	set_min_partial(s, ilog2(s->size) / 2);
-
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	/*
 	 * cpu_partial determined the maximum number of objects kept in the
 	 * per cpu partial lists of a processor.
@@ -3069,6 +3088,7 @@ static int kmem_cache_open(struct kmem_c
 		s->cpu_partial = 13;
 	else
 		s->cpu_partial = 30;
+#endif

 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 1000;
@@ -4283,14 +4303,15 @@ static ssize_t show_slab_objects(struct
 			total += x;
 			nodes[node] += x;

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			page = ACCESS_ONCE(c->partial);
 			if (page) {
 				x = page->pobjects;
 				total += x;
 				nodes[node] += x;
 			}
-
 			per_cpu[node]++;
+#endif
 		}
 	}

@@ -4442,6 +4463,7 @@ static ssize_t min_partial_store(struct
 }
 SLAB_ATTR(min_partial);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%u\n", s->cpu_partial);
@@ -4464,6 +4486,7 @@ static ssize_t cpu_partial_store(struct
 	return length;
 }
 SLAB_ATTR(cpu_partial);
+#endif

 static ssize_t ctor_show(struct kmem_cache *s, char *buf)
 {
@@ -4503,6 +4526,7 @@ static ssize_t objects_partial_show(stru
 }
 SLAB_ATTR_RO(objects_partial);

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
 {
 	int objects = 0;
@@ -4533,6 +4557,7 @@ static ssize_t slabs_cpu_partial_show(st
 	return len + sprintf(buf + len, "\n");
 }
 SLAB_ATTR_RO(slabs_cpu_partial);
+#endif

 static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
 {
@@ -4856,11 +4881,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
 STAT_ATTR(ORDER_FALLBACK, order_fallback);
 STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
 STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
 STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
 STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
 STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
 #endif
+#endif

 static struct attribute *slab_attrs[] = {
 	&slab_size_attr.attr,
@@ -4868,7 +4895,9 @@ static struct attribute *slab_attrs[] =
 	&objs_per_slab_attr.attr,
 	&order_attr.attr,
 	&min_partial_attr.attr,
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	&cpu_partial_attr.attr,
+#endif
 	&objects_attr.attr,
 	&objects_partial_attr.attr,
 	&partial_attr.attr,
@@ -4881,7 +4910,9 @@ static struct attribute *slab_attrs[] =
 	&destroy_by_rcu_attr.attr,
 	&shrink_attr.attr,
 	&reserved_attr.attr,
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	&slabs_cpu_partial_attr.attr,
+#endif
 #ifdef CONFIG_SLUB_DEBUG
 	&total_objects_attr.attr,
 	&slabs_attr.attr,
@@ -4923,11 +4954,13 @@ static struct attribute *slab_attrs[] =
 	&order_fallback_attr.attr,
 	&cmpxchg_double_fail_attr.attr,
 	&cmpxchg_double_cpu_fail_attr.attr,
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	&cpu_partial_alloc_attr.attr,
 	&cpu_partial_free_attr.attr,
 	&cpu_partial_node_attr.attr,
 	&cpu_partial_drain_attr.attr,
 #endif
+#endif
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
Index: linux/init/Kconfig
===================================================================
--- linux.orig/init/Kconfig	2013-05-20 15:21:26.925704505 -0500
+++ linux/init/Kconfig	2013-05-20 15:21:26.921704441 -0500
@@ -1558,6 +1558,17 @@ config SLOB

 endchoice

+config SLUB_CPU_PARTIAL
+	default y
+	depends on SLUB
+	bool "SLUB per cpu partial cache"
+	help
+	  Per cpu partial caches accellerate objects allocation and freeing
+	  that is local to a processor at the price of more indeterminism
+	  in the latency of the free. On overflow these caches will be cleared
+	  which requires the taking of locks that may cause latency spikes.
+	  Typically one would choose no for a realtime system.
+
 config MMAP_ALLOW_UNINITIALIZED
 	bool "Allow mmapped anonymous memory to be uninitialized"
 	depends on EXPERT && !MMU

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-05-28 18:24                                 ` Christoph Lameter
@ 2013-06-03 15:28                                   ` JoonSoo Kim
  2013-06-03 19:20                                     ` Christoph Lameter
  2013-06-03 20:41                                     ` Christoph Lameter
  0 siblings, 2 replies; 51+ messages in thread
From: JoonSoo Kim @ 2013-06-03 15:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, Joonsoo Kim, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

Hello, Christoph.

2013/5/29 Christoph Lameter <cl@linux.com>:
> I just ran some quick tests and the following seems to work.
>
> Critical portions that need additional review (Joonsoo?) are the
> modifications to get_partial_node() and __slab_free().

IMO, your code looks fine and should work.
But this modification adds lots of "#ifdef"s and makes the code less clean.
How about *not* removing struct page *partial from struct kmem_cache_cpu?
That keeps some memory overhead and slightly bigger code for
!CONFIG_SLUB_CPU_PARTIAL, but it will help keep the code clean and easy
to maintain.

And I re-read Steven's initial problem report on the RT kernel and found
that unfreeze_partials() takes and drops the list_lock several times. This
means that the pages in the cpu partial list don't all come from the same
node. Why do we add another node's slab to this cpu partial list? That is
not good for the general case either. How about considering node affinity
in __slab_free()? IMHO, if we implement this, Steven's problem can be solved.
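
Roughly, I am thinking of something like this in __slab_free() (untested
sketch only, just to show the idea):

			/*
			 * Only freeze the slab for the per-cpu partial
			 * list if it belongs to the local node; a remote
			 * slab goes back onto its own node's partial
			 * list, so unfreeze_partials() does not have to
			 * hop between different nodes' list_locks.
			 */
			if (!kmem_cache_debug(s) && !prior &&
					page_to_nid(page) == numa_mem_id())
				new.frozen = 1;
			else {
				/* Needs to be taken off a list */
				n = get_node(s, page_to_nid(page));
				spin_lock_irqsave(&n->list_lock, flags);
			}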

Thanks.

> Subject: slub: Make cpu partial slab support configurable V2
>
> cpu partial support can introduce a level of indeterminism that is not wanted
> in certain contexts (like a realtime kernel). Make it configurable.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h 2013-05-20 15:21:26.925704505 -0500
> +++ linux/include/linux/slub_def.h      2013-05-20 15:27:40.827613445 -0500
> @@ -47,12 +47,24 @@ struct kmem_cache_cpu {
>         void **freelist;        /* Pointer to next available object */
>         unsigned long tid;      /* Globally unique transaction id */
>         struct page *page;      /* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         struct page *partial;   /* Partially allocated frozen slabs */
> +#endif
>  #ifdef CONFIG_SLUB_STATS
>         unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
>  };
>
> +static inline struct page *kmem_cache_cpu_partial_pages
> +                       (struct kmem_cache_cpu *c)
> +{
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +       return c->partial;
> +#else
> +       return NULL;
> +#endif
> +}
> +
>  /*
>   * Word size structure that can be atomically updated or read and that
>   * contains both the order and the number of objects that a slab of the
> @@ -73,7 +85,9 @@ struct kmem_cache {
>         int size;               /* The size of an object including meta data */
>         int object_size;        /* The size of an object without meta data */
>         int offset;             /* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         int cpu_partial;        /* Number of per cpu partial objects to keep around */
> +#endif
>         struct kmem_cache_order_objects oo;
>
>         /* Allocation and freeing of slabs */
> @@ -104,6 +118,15 @@ struct kmem_cache {
>         struct kmem_cache_node *node[MAX_NUMNODES];
>  };
>
> +static inline int kmem_cache_cpu_partial(struct kmem_cache *s)
> +{
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +       return s->cpu_partial;
> +#else
> +       return 0;
> +#endif
> +}
> +
>  void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
>  void *__kmalloc(size_t size, gfp_t flags);
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c        2013-05-20 15:21:26.925704505 -0500
> +++ linux/mm/slub.c     2013-05-20 15:38:00.681437630 -0500
> @@ -1530,7 +1530,9 @@ static inline void *acquire_slab(struct
>         return freelist;
>  }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
>
>  /*
> @@ -1570,10 +1572,15 @@ static void *get_partial_node(struct kme
>                         stat(s, ALLOC_FROM_PARTIAL);
>                         object = t;
>                 } else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                         put_cpu_partial(s, page, 0);
>                         stat(s, CPU_PARTIAL_NODE);
> +#else
> +                       BUG();
> +#endif
>                 }
> -               if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> +               if (kmem_cache_debug(s) ||
> +                              available > kmem_cache_cpu_partial(s) / 2)
>                         break;
>
>         }
> @@ -1874,6 +1881,7 @@ redo:
>         }
>  }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  /*
>   * Unfreeze all the cpu partial slabs.
>   *
> @@ -1988,6 +1996,7 @@ static void put_cpu_partial(struct kmem_
>
>         } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>  }
> +#endif
>
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
> @@ -2012,7 +2021,9 @@ static inline void __flush_cpu_slab(stru
>                 if (c->page)
>                         flush_slab(s, c);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                 unfreeze_partials(s, c);
> +#endif
>         }
>  }
>
> @@ -2028,7 +2039,7 @@ static bool has_cpu_slab(int cpu, void *
>         struct kmem_cache *s = info;
>         struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
>
> -       return c->page || c->partial;
> +       return c->page || kmem_cache_cpu_partial_pages(c);
>  }
>
>  static void flush_all(struct kmem_cache *s)
> @@ -2224,6 +2235,7 @@ static void *__slab_alloc(struct kmem_ca
>         page = c->page;
>         if (!page)
>                 goto new_slab;
> +
>  redo:
>
>         if (unlikely(!node_match(page, node))) {
> @@ -2277,10 +2289,12 @@ load_freelist:
>
>  new_slab:
>
> -       if (c->partial) {
> +       if (kmem_cache_cpu_partial_pages(c)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                 page = c->page = c->partial;
>                 c->partial = page->next;
>                 stat(s, CPU_PARTIAL_ALLOC);
> +#endif
>                 c->freelist = NULL;
>                 goto redo;
>         }
> @@ -2495,6 +2509,7 @@ static void __slab_free(struct kmem_cach
>                 new.inuse--;
>                 if ((!new.inuse || !prior) && !was_frozen) {
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                         if (!kmem_cache_debug(s) && !prior)
>
>                                 /*
> @@ -2503,7 +2518,9 @@ static void __slab_free(struct kmem_cach
>                                  */
>                                 new.frozen = 1;
>
> -                       else { /* Needs to be taken off a list */
> +                       else
> +#endif
> +                               { /* Needs to be taken off a list */
>
>                                 n = get_node(s, page_to_nid(page));
>                                 /*
> @@ -2525,6 +2542,7 @@ static void __slab_free(struct kmem_cach
>                 "__slab_free"));
>
>         if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>
>                 /*
>                  * If we just froze the page then put it onto the
> @@ -2534,6 +2552,7 @@ static void __slab_free(struct kmem_cach
>                         put_cpu_partial(s, page, 1);
>                         stat(s, CPU_PARTIAL_FREE);
>                 }
> +#endif
>                 /*
>                  * The list lock was not taken therefore no list
>                  * activity can be necessary.
> @@ -3041,7 +3060,7 @@ static int kmem_cache_open(struct kmem_c
>          * list to avoid pounding the page allocator excessively.
>          */
>         set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         /*
>          * cpu_partial determined the maximum number of objects kept in the
>          * per cpu partial lists of a processor.
> @@ -3069,6 +3088,7 @@ static int kmem_cache_open(struct kmem_c
>                 s->cpu_partial = 13;
>         else
>                 s->cpu_partial = 30;
> +#endif
>
>  #ifdef CONFIG_NUMA
>         s->remote_node_defrag_ratio = 1000;
> @@ -4283,14 +4303,15 @@ static ssize_t show_slab_objects(struct
>                         total += x;
>                         nodes[node] += x;
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>                         page = ACCESS_ONCE(c->partial);
>                         if (page) {
>                                 x = page->pobjects;
>                                 total += x;
>                                 nodes[node] += x;
>                         }
> -
>                         per_cpu[node]++;
> +#endif
>                 }
>         }
>
> @@ -4442,6 +4463,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>         return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4464,6 +4486,7 @@ static ssize_t cpu_partial_store(struct
>         return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
>
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4503,6 +4526,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>         int objects = 0;
> @@ -4533,6 +4557,7 @@ static ssize_t slabs_cpu_partial_show(st
>         return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
>
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4856,11 +4881,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
>
>  static struct attribute *slab_attrs[] = {
>         &slab_size_attr.attr,
> @@ -4868,7 +4895,9 @@ static struct attribute *slab_attrs[] =
>         &objs_per_slab_attr.attr,
>         &order_attr.attr,
>         &min_partial_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         &cpu_partial_attr.attr,
> +#endif
>         &objects_attr.attr,
>         &objects_partial_attr.attr,
>         &partial_attr.attr,
> @@ -4881,7 +4910,9 @@ static struct attribute *slab_attrs[] =
>         &destroy_by_rcu_attr.attr,
>         &shrink_attr.attr,
>         &reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         &slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>         &total_objects_attr.attr,
>         &slabs_attr.attr,
> @@ -4923,11 +4954,13 @@ static struct attribute *slab_attrs[] =
>         &order_fallback_attr.attr,
>         &cmpxchg_double_fail_attr.attr,
>         &cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>         &cpu_partial_alloc_attr.attr,
>         &cpu_partial_free_attr.attr,
>         &cpu_partial_node_attr.attr,
>         &cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>         &failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig     2013-05-20 15:21:26.925704505 -0500
> +++ linux/init/Kconfig  2013-05-20 15:21:26.921704441 -0500
> @@ -1558,6 +1558,17 @@ config SLOB
>
>  endchoice
>
> +config SLUB_CPU_PARTIAL
> +       default y
> +       depends on SLUB
> +       bool "SLUB per cpu partial cache"
> +       help
> +         Per cpu partial caches accelerate object allocation and freeing
> +         that is local to a processor at the price of more indeterminism
> +         in the latency of the free. On overflow these caches will be cleared
> +         which requires the taking of locks that may cause latency spikes.
> +         Typically one would choose no for a realtime system.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>         bool "Allow mmapped anonymous memory to be uninitialized"
>         depends on EXPERT && !MMU
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-06-03 15:28                                   ` JoonSoo Kim
@ 2013-06-03 19:20                                     ` Christoph Lameter
  2013-06-04 22:21                                       ` JoonSoo Kim
  2013-06-03 20:41                                     ` Christoph Lameter
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Lameter @ 2013-06-03 19:20 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Steven Rostedt, Joonsoo Kim, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Tue, 4 Jun 2013, JoonSoo Kim wrote:

> And I re-read Steven's initial problem report on the RT kernel and found
> that unfreeze_partials() locks and unlocks several times. This means that
> the pages on a cpu partial list don't all come from the same node. Why do
> we add another node's slab to this cpu partial list? This is also not good
> for the general case. How about considering node affinity in __slab_free()?
> IMHO, if we implement this, Steven's problem can be solved.

We may need the other nodes' pages if we consistently allocate from there.
__slab_alloc() ensures that only pages from the correct node are used; it
will drop pages that do not come from the proper node.

Filtering in __slab_free() would mean that we cannot improve performance on
remote frees.
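
The check in question is roughly this (a condensed sketch of the
__slab_alloc() slow path in this tree, not a verbatim copy):

	if (unlikely(!node_match(page, node))) {
		stat(s, ALLOC_NODE_MISMATCH);
		/* Wrong node: give the cpu slab back and get another one */
		deactivate_slab(s, page, c->freelist);
		c->page = NULL;
		c->freelist = NULL;
		goto new_slab;
	}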


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-06-03 15:28                                   ` JoonSoo Kim
  2013-06-03 19:20                                     ` Christoph Lameter
@ 2013-06-03 20:41                                     ` Christoph Lameter
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-06-03 20:41 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Steven Rostedt, Joonsoo Kim, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Tue, 4 Jun 2013, JoonSoo Kim wrote:

> But this modification adds lots of "#ifdef"s and makes the code less clean.
> How about *not* removing struct page *partial from struct kmem_cache_cpu?
> This introduces memory overhead and makes the code bigger for
> !CONFIG_SLUB_CPU_PARTIAL, but it will help keep the code clean and easier
> to maintain.

ok.

Subject: slub: Make cpu partial slab support configurable V2

cpu partial support can introduce a level of indeterminism that is not wanted
in certain contexts (like a realtime kernel). Make it configurable.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/include/linux/slub_def.h
===================================================================
--- linux.orig/include/linux/slub_def.h	2013-06-03 14:28:57.954239479 -0500
+++ linux/include/linux/slub_def.h	2013-06-03 14:28:57.950239416 -0500
@@ -73,7 +73,9 @@ struct kmem_cache {
 	int size;		/* The size of an object including meta data */
 	int object_size;	/* The size of an object without meta data */
 	int offset;		/* Free pointer offset. */
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	int cpu_partial;	/* Number of per cpu partial objects to keep around */
+#endif
 	struct kmem_cache_order_objects oo;

 	/* Allocation and freeing of slabs */
@@ -104,6 +106,15 @@ struct kmem_cache {
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };

+static inline int kmem_cache_cpu_partial(struct kmem_cache *s)
+{
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	return s->cpu_partial;
+#else
+	return 0;
+#endif
+}
+
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
 void *__kmalloc(size_t size, gfp_t flags);

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-06-03 14:28:57.954239479 -0500
+++ linux/mm/slub.c	2013-06-03 14:28:57.950239416 -0500
@@ -1573,7 +1573,8 @@ static void *get_partial_node(struct kme
 			put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
 		}
-		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
+		if (kmem_cache_debug(s) ||
+			       available > kmem_cache_cpu_partial(s) / 2)
 			break;

 	}
@@ -1884,6 +1885,7 @@ redo:
 static void unfreeze_partials(struct kmem_cache *s,
 		struct kmem_cache_cpu *c)
 {
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct kmem_cache_node *n = NULL, *n2 = NULL;
 	struct page *page, *discard_page = NULL;

@@ -1938,6 +1940,7 @@ static void unfreeze_partials(struct kme
 		discard_slab(s, page);
 		stat(s, FREE_SLAB);
 	}
+#endif
 }

 /*
@@ -1951,6 +1954,7 @@ static void unfreeze_partials(struct kme
  */
 static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 {
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	struct page *oldpage;
 	int pages;
 	int pobjects;
@@ -1987,6 +1991,7 @@ static void put_cpu_partial(struct kmem_
 		page->next = oldpage;

 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
+#endif
 }

 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
@@ -2495,6 +2500,7 @@ static void __slab_free(struct kmem_cach
 		new.inuse--;
 		if ((!new.inuse || !prior) && !was_frozen) {

+#ifdef CONFIG_SLUB_CPU_PARTIAL
 			if (!kmem_cache_debug(s) && !prior)

 				/*
@@ -2503,7 +2509,9 @@ static void __slab_free(struct kmem_cach
 				 */
 				new.frozen = 1;

-			else { /* Needs to be taken off a list */
+			else
+#endif
+		       		{ /* Needs to be taken off a list */

 	                        n = get_node(s, page_to_nid(page));
 				/*
@@ -2525,6 +2533,7 @@ static void __slab_free(struct kmem_cach
 		"__slab_free"));

 	if (likely(!n)) {
+#ifdef CONFIG_SLUB_CPU_PARTIAL

 		/*
 		 * If we just froze the page then put it onto the
@@ -2534,6 +2543,7 @@ static void __slab_free(struct kmem_cach
 			put_cpu_partial(s, page, 1);
 			stat(s, CPU_PARTIAL_FREE);
 		}
+#endif
 		/*
 		 * The list lock was not taken therefore no list
 		 * activity can be necessary.
@@ -3041,7 +3051,7 @@ static int kmem_cache_open(struct kmem_c
 	 * list to avoid pounding the page allocator excessively.
 	 */
 	set_min_partial(s, ilog2(s->size) / 2);
-
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	/*
 	 * cpu_partial determined the maximum number of objects kept in the
 	 * per cpu partial lists of a processor.
@@ -3069,6 +3079,7 @@ static int kmem_cache_open(struct kmem_c
 		s->cpu_partial = 13;
 	else
 		s->cpu_partial = 30;
+#endif

 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 1000;
@@ -4444,7 +4455,7 @@ SLAB_ATTR(min_partial);

 static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
 {
-	return sprintf(buf, "%u\n", s->cpu_partial);
+	return sprintf(buf, "%u\n", kmem_cache_cpu_partial(s));
 }

 static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
@@ -4458,10 +4469,13 @@ static ssize_t cpu_partial_store(struct
 		return err;
 	if (objects && kmem_cache_debug(s))
 		return -EINVAL;
-
+#ifdef CONFIG_SLUB_CPU_PARTIAL
 	s->cpu_partial = objects;
 	flush_all(s);
 	return length;
+#else
+	return -ENOSYS;
+#endif
 }
 SLAB_ATTR(cpu_partial);

Index: linux/init/Kconfig
===================================================================
--- linux.orig/init/Kconfig	2013-06-03 14:28:57.954239479 -0500
+++ linux/init/Kconfig	2013-06-03 14:28:57.950239416 -0500
@@ -1558,6 +1558,17 @@ config SLOB

 endchoice

+config SLUB_CPU_PARTIAL
+	default y
+	depends on SLUB
+	bool "SLUB per cpu partial cache"
+	help
+	  Per cpu partial caches accelerate object allocation and freeing
+	  that is local to a processor at the price of more indeterminism
+	  in the latency of the free. On overflow these caches will be cleared
+	  which requires the taking of locks that may cause latency spikes.
+	  Typically one would choose no for a realtime system.
+
 config MMAP_ALLOW_UNINITIALIZED
 	bool "Allow mmapped anonymous memory to be uninitialized"
 	depends on EXPERT && !MMU
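
With this in place a latency-sensitive configuration simply answers no,
e.g. (illustrative .config fragment):

CONFIG_SLUB=y
# CONFIG_SLUB_CPU_PARTIAL is not set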

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-06-03 19:20                                     ` Christoph Lameter
@ 2013-06-04 22:21                                       ` JoonSoo Kim
  2013-06-05 14:06                                         ` Christoph Lameter
  2013-06-05 14:09                                         ` Christoph Lameter
  0 siblings, 2 replies; 51+ messages in thread
From: JoonSoo Kim @ 2013-06-04 22:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, Joonsoo Kim, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

2013/6/4 Christoph Lameter <cl@linux.com>:
> On Tue, 4 Jun 2013, JoonSoo Kim wrote:
>
>> And I re-read Steven's initial problem report on the RT kernel and found
>> that unfreeze_partials() locks and unlocks several times. This means that
>> the pages on a cpu partial list don't all come from the same node. Why do
>> we add another node's slab to this cpu partial list? This is also not good
>> for the general case. How about considering node affinity in __slab_free()?
>> IMHO, if we implement this, Steven's problem can be solved.
>
> We may need the other nodes' pages if we consistently allocate from there.
> __slab_alloc() ensures that only pages from the correct node are used; it
> will drop pages that do not come from the proper node.
>
> Filtering in __slab_free() would mean that we cannot improve performance on
> remote frees.
>

Sorry for the poor explanation.
I know that we need the other nodes' pages if we consistently allocate
from there; in the allocation path that is reasonable. But in the free
path it is not reasonable to put another node's page onto the cpu
partial list. In that case we don't need the other node's pages, because
there is no allocation request for that node. Putting another node's page
onto the cpu partial list and then using it for a normal allocation
request (one that doesn't specify node affinity) degrades system
performance. Below is a quick implementation of this idea.

---------------------->8-------------------
diff --git a/mm/slub.c b/mm/slub.c
index 57707f0..7f614f4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2495,7 +2495,8 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
                new.inuse--;
                if ((!new.inuse || !prior) && !was_frozen) {

-                       if (!kmem_cache_debug(s) && !prior)
+                       if (!kmem_cache_debug(s) && !prior &&
+                               node_match(page, cpu_to_node(smp_processor_id())))

                                /*
                                 * Slab was on no list before and will be partially empty
@@ -2550,8 +2551,9 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
         * Objects left in the slab. If it was not on the partial list before
         * then add it.
         */
-       if (kmem_cache_debug(s) && unlikely(!prior)) {
-               remove_full(s, page);
+       if (!prior) {
+               if (kmem_cache_debug(s))
+                       remove_full(s, page);
                add_partial(n, page, DEACTIVATE_TO_TAIL);
                stat(s, FREE_ADD_PARTIAL);
        }

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-06-04 22:21                                       ` JoonSoo Kim
@ 2013-06-05 14:06                                         ` Christoph Lameter
  2013-06-05 14:09                                         ` Christoph Lameter
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-06-05 14:06 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Steven Rostedt, Joonsoo Kim, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Wed, 5 Jun 2013, JoonSoo Kim wrote:

> I know that we need the other nodes' pages if we consistently allocate
> from there; in the allocation path that is reasonable. But in the free
> path it is not reasonable to put another node's page onto the cpu
> partial list.

It is reasonable. Another node's page on the cpu partial list will
decrease the synchronization necessary to free further objects to that
page. And multiple objects from a remote node are typically freed in one go.
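
To be concrete: once the page sits frozen on a cpu partial list, further
frees to it only go through the lockless cmpxchg in __slab_free() and never
take n->list_lock. A condensed sketch (not verbatim):

	struct kmem_cache_node *n = NULL;

	do {
		prior = page->freelist;
		counters = page->counters;
		set_freepointer(s, object, prior);
		new.counters = counters;
		was_frozen = new.frozen;	/* page is on a cpu partial list */
		new.inuse--;
		/* was_frozen => n stays NULL, no node list handling needed */
	} while (!cmpxchg_double_slab(s, page,
			prior, counters,
			object, new.counters,
			"__slab_free"));

	if (likely(!n)) {
		if (was_frozen)
			stat(s, FREE_FROZEN);	/* freed without taking list_lock */
		return;
	}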


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
  2013-06-04 22:21                                       ` JoonSoo Kim
  2013-06-05 14:06                                         ` Christoph Lameter
@ 2013-06-05 14:09                                         ` Christoph Lameter
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2013-06-05 14:09 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Steven Rostedt, Joonsoo Kim, LKML, RT, Thomas Gleixner,
	Clark Williams, Pekka Enberg

On Wed, 5 Jun 2013, JoonSoo Kim wrote:

> @@ -2495,7 +2495,8 @@ static void __slab_free(struct kmem_cache *s, struct page *page,
>                 new.inuse--;
>                 if ((!new.inuse || !prior) && !was_frozen) {
>
> -                       if (!kmem_cache_debug(s) && !prior)
> +                       if (!kmem_cache_debug(s) && !prior &&
> +                               node_match(page, cpu_to_node(smp_processor_id())))
>

Would increase the overhead of the free path. cpu_to_node? Use
numa_node_id().
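
I.e. something like this (untested sketch):

	if (!kmem_cache_debug(s) && !prior &&
			node_match(page, numa_node_id()))
		new.frozen = 1;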

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2013-06-05 14:09 UTC | newest]

Thread overview: 51+ messages
2013-03-21 22:55 [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code Steven Rostedt
2013-03-22 15:41 ` Christoph Lameter
2013-03-23  3:51   ` Steven Rostedt
2013-03-25 14:34     ` Christoph Lameter
2013-03-25 15:57       ` Steven Rostedt
2013-03-25 16:13         ` Steven Rostedt
2013-03-25 17:51           ` Christoph Lameter
2013-03-25 18:03             ` Steven Rostedt
2013-03-25 18:27               ` Christoph Lameter
2013-03-25 18:32                 ` Steven Rostedt
2013-03-27  2:59                   ` Joonsoo Kim
2013-03-27  3:30                     ` Steven Rostedt
2013-03-27  6:13                       ` Joonsoo Kim
2013-03-28 17:29                         ` Christoph Lameter
2013-04-08 12:25                           ` Steven Rostedt
     [not found]                         ` <alpine.DEB.2.02.1303281227520.16200@gentwo.org>
2013-03-28 17:30                           ` Christoph Lameter
2013-03-29  2:43                             ` Paul Gortmaker
2013-04-01 15:32                               ` Christoph Lameter
2013-04-01 16:06                                 ` Paul Gortmaker
2013-04-01 16:06                                   ` Paul Gortmaker
2013-04-02  0:07                                   ` Paul Gortmaker
2013-04-01 21:46                                 ` Paul Gortmaker
2013-04-01 21:46                                   ` Paul Gortmaker
2013-04-02  1:37                                 ` Joonsoo Kim
     [not found]                               ` <alpine.DEB.2.02.1304011025550.12690@gentwo.org>
2013-04-01 15:33                                 ` Christoph Lameter
2013-04-02  0:42                                   ` Joonsoo Kim
2013-04-02  6:48                                     ` Pekka Enberg
2013-04-02 19:25                                     ` Christoph Lameter
2013-04-04  0:58                                       ` Joonsoo Kim
2013-04-04 13:53                                         ` Christoph Lameter
2013-04-05  2:05                                           ` Joonsoo Kim
2013-04-05 14:35                                             ` Christoph Lameter
2013-04-08 12:32                                     ` Steven Rostedt
2013-04-10  6:31                                       ` Pekka Enberg
2013-04-10  7:31                                         ` Joonsoo Kim
2013-04-10 14:00                                           ` Christoph Lameter
2013-04-10 14:09                                             ` Steven Rostedt
2013-04-11 16:05                             ` Steven Rostedt
2013-04-11 16:42                               ` Christoph Lameter
2013-04-12  6:48                                 ` Pekka Enberg
2013-04-12  6:48                                   ` Pekka Enberg
2013-05-28 14:39                             ` Steven Rostedt
2013-05-28 16:22                               ` Christoph Lameter
     [not found]                               ` <alpine.DEB.2.02.1305281121420.1627@gentwo.org>
2013-05-28 18:24                                 ` Christoph Lameter
2013-06-03 15:28                                   ` JoonSoo Kim
2013-06-03 19:20                                     ` Christoph Lameter
2013-06-04 22:21                                       ` JoonSoo Kim
2013-06-05 14:06                                         ` Christoph Lameter
2013-06-05 14:09                                         ` Christoph Lameter
2013-06-03 20:41                                     ` Christoph Lameter
2013-03-26 16:52       ` Steven Rostedt
