* [patch v2 0/5] percpu_counter: bug fix and enhancement
@ 2011-05-11  8:10 Shaohua Li
  2011-05-11  8:10 ` [patch v2 1/5] percpu_counter: fix code for 32bit systems for UP Shaohua Li
                   ` (5 more replies)
  0 siblings, 6 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-11  8:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, tj, eric.dumazet, cl, npiggin

This patch set does two things.
1. Fix a bug on 32-bit systems. percpu_counter uses an s64 counter. Without
any locking, reading an s64 on a 32-bit system isn't safe and can return a
badly wrong value.
2. Improve the scalability of __percpu_counter_add. In some cases _add can
cause heavy lock contention (see patch 4 for detailed information and data).
The patches remove the contention and speed it up a bit. The last post
(http://marc.info/?l=linux-kernel&m=130259547913607&w=2) simply used
atomic64 for percpu_counter, but Tejun pointed out this could cause
deviation in __percpu_counter_sum.
The new implementation uses lglock to protect the percpu data. Each cpu has
its own lock, which no other cpu takes. This way _add no longer needs to take
the global lock, and the deviation is gone. This still gives me about a
5x ~ 6x speedup (not as fast as the original 7x, but still good) with the
workload mentioned in patch 4.
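Roughly, _add and _sum end up looking like the sketch below once patches 2-4
are applied (patch 5 then makes the _add fast path preemptless). This is a
simplified sketch only: lockdep annotations, error handling and the
CPU-hotplug list member are omitted.

struct percpu_counter {
	atomic64_t count;
	s32 __percpu *counters;
	struct lglock lglock;	/* one arch_spinlock_t per cpu */
};

void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count;

	preempt_disable();
	count = __this_cpu_read(*fbc->counters) + amount;
	if (count >= batch || count <= -batch) {
		lg_local_lock(fbc->lglock);	/* this cpu's lock only */
		atomic64_add(count, &fbc->count);
		__this_cpu_write(*fbc->counters, 0);
		lg_local_unlock(fbc->lglock);
	} else {
		__this_cpu_write(*fbc->counters, count);
	}
	preempt_enable();
}

s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
	s64 ret;
	int cpu;

	lg_global_lock_online(fbc->lglock);	/* every online cpu's lock */
	ret = atomic64_read(&fbc->count);
	for_each_online_cpu(cpu)
		ret += *per_cpu_ptr(fbc->counters, cpu);
	lg_global_unlock_online(fbc->lglock);
	return ret;
}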

patch 1 fixes the s64 read bug on 32-bit systems for UP.
patch 2 converts lglock to work with dynamically allocated structures. A
later patch uses lglock for percpu_counter.
patches 3,4 fix the s64 read bug on 32-bit systems for SMP and also improve
the scalability of __percpu_counter_add.
patch 5 is from Christoph Lameter and makes the __percpu_counter_add fast
path preemptless. I added it here because I converted percpu_counter to use
lglock. Any bugs there are mine.

Comments and suggestions are welcome!

Thanks,
Shaohua


* [patch v2 1/5] percpu_counter: fix code for 32bit systems for UP
  2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
@ 2011-05-11  8:10 ` Shaohua Li
  2011-05-11  8:10 ` [patch v2 2/5] lglock: convert it to work with dynamically allocated structure Shaohua Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-11  8:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, tj, eric.dumazet, cl, npiggin, Shaohua Li

[-- Attachment #1: percpu-counter-32bits.patch --]
[-- Type: text/plain, Size: 1915 bytes --]

percpu_counter.count is an 's64'. Accessing it on a 32-bit system is racy;
we need some locking to protect it, otherwise a very wrong (torn) value can
be read.
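For illustration only (userspace, not kernel code): the same kind of torn
access can be reproduced with a plain 64-bit variable on a 32-bit build
(e.g. gcc -m32 -O2 -pthread), because each 64-bit access is typically two
32-bit loads/stores there. In the UP kernel the interleaving comes from
preemption rather than from a second CPU, which is why preempt_disable()
around the access is enough.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static volatile int64_t counter = 0x0ffffffffLL;

static void *writer(void *arg)
{
	for (;;) {
		counter = 0x100000000LL;	/* hi = 1, lo = 0 */
		counter = 0x0ffffffffLL;	/* hi = 0, lo = 0xffffffff */
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, writer, NULL);
	for (;;) {
		int64_t v = counter;
		/* a torn read mixes the halves of the two values above */
		if (v != 0x100000000LL && v != 0x0ffffffffLL)
			printf("torn read: %#llx\n", (unsigned long long)v);
	}
	return 0;
}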

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 include/linux/percpu_counter.h |   31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

Index: linux/include/linux/percpu_counter.h
===================================================================
--- linux.orig/include/linux/percpu_counter.h	2011-05-05 10:33:12.000000000 +0800
+++ linux/include/linux/percpu_counter.h	2011-05-06 11:19:26.000000000 +0800
@@ -101,14 +101,34 @@ static inline void percpu_counter_destro
 
 static inline void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
+#if BITS_PER_LONG == 32
+	preempt_disable();
+	fbc->count = amount;
+	preempt_enable();
+#else
 	fbc->count = amount;
+#endif
+}
+
+static inline s64 percpu_counter_read(struct percpu_counter *fbc)
+{
+#if BITS_PER_LONG == 32
+	s64 count;
+	preempt_disable();
+	count = fbc->count;
+	preempt_enable();
+	return count;
+#else
+	return fbc->count;
+#endif
 }
 
 static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 {
-	if (fbc->count > rhs)
+	s64 count = percpu_counter_read(fbc);
+	if (count > rhs)
 		return 1;
-	else if (fbc->count < rhs)
+	else if (count < rhs)
 		return -1;
 	else
 		return 0;
@@ -128,18 +148,13 @@ __percpu_counter_add(struct percpu_count
 	percpu_counter_add(fbc, amount);
 }
 
-static inline s64 percpu_counter_read(struct percpu_counter *fbc)
-{
-	return fbc->count;
-}
-
 /*
  * percpu_counter is intended to track positive number. In UP case, the number
  * should never be negative.
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	return percpu_counter_read(fbc);
 }
 
 static inline s64 percpu_counter_sum_positive(struct percpu_counter *fbc)



* [patch v2 2/5] lglock: convert it to work with dynamically allocated structure
  2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
  2011-05-11  8:10 ` [patch v2 1/5] percpu_counter: fix code for 32bit systems for UP Shaohua Li
@ 2011-05-11  8:10 ` Shaohua Li
  2011-05-11  8:10 ` [patch v2 3/5] percpu_counter: use lglock to protect percpu data Shaohua Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-11  8:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, tj, eric.dumazet, cl, npiggin, Shaohua Li

[-- Attachment #1: lglock-work-struct.patch --]
[-- Type: text/plain, Size: 12242 bytes --]

Convert lglock to work with dynamically allocated structures.
I see no fundamental reason lglock must be static, and the conversion
actually reduces code size. The next patch uses lglock inside a dynamically
allocated structure.
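For reference, this is roughly how the converted API is meant to be used from
a dynamically allocated object (a usage sketch only; "struct foo" and the
functions around it are made up for illustration):

#include <linux/lglock.h>
#include <linux/slab.h>

struct foo {
	struct lglock lg;
	/* ... per-cpu data protected by lg ... */
};

static struct foo *foo_create(void)
{
	struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

	if (!f)
		return NULL;
	if (lglock_alloc(&f->lg)) {	/* allocate the percpu locks */
		kfree(f);
		return NULL;
	}
	lglock_init(&f->lg, "foo_lg");	/* init locks + lockdep class */
	return f;
}

static void foo_destroy(struct foo *f)
{
	lglock_free(&f->lg);
	kfree(f);
}

static void foo_fast_path(struct foo *f)
{
	lglock_local_lock(&f->lg);	/* only this cpu's lock */
	/* touch this cpu's part of f */
	lglock_local_unlock(&f->lg);
}

static void foo_slow_path(struct foo *f)
{
	lglock_global_lock_online(&f->lg);	/* all online cpus' locks */
	/* walk every cpu's part of f */
	lglock_global_unlock_online(&f->lg);
}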

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 include/linux/lglock.h |  192 ++++++++++++-------------------------------------
 kernel/Makefile        |    2 
 kernel/lglock.c        |  124 +++++++++++++++++++++++++++++++
 3 files changed, 175 insertions(+), 143 deletions(-)

Index: linux/include/linux/lglock.h
===================================================================
--- linux.orig/include/linux/lglock.h	2011-05-10 16:10:55.000000000 +0800
+++ linux/include/linux/lglock.h	2011-05-11 09:42:07.000000000 +0800
@@ -1,7 +1,5 @@
 /*
- * Specialised local-global spinlock. Can only be declared as global variables
- * to avoid overhead and keep things simple (and we don't want to start using
- * these inside dynamically allocated structures).
+ * Specialised local-global spinlock.
  *
  * "local/global locks" (lglocks) can be used to:
  *
@@ -23,150 +21,60 @@
 #include <linux/lockdep.h>
 #include <linux/percpu.h>
 
-/* can make br locks by using local lock for read side, global lock for write */
-#define br_lock_init(name)	name##_lock_init()
-#define br_read_lock(name)	name##_local_lock()
-#define br_read_unlock(name)	name##_local_unlock()
-#define br_write_lock(name)	name##_global_lock_online()
-#define br_write_unlock(name)	name##_global_unlock_online()
-
-#define DECLARE_BRLOCK(name)	DECLARE_LGLOCK(name)
-#define DEFINE_BRLOCK(name)	DEFINE_LGLOCK(name)
-
-
-#define lg_lock_init(name)	name##_lock_init()
-#define lg_local_lock(name)	name##_local_lock()
-#define lg_local_unlock(name)	name##_local_unlock()
-#define lg_local_lock_cpu(name, cpu)	name##_local_lock_cpu(cpu)
-#define lg_local_unlock_cpu(name, cpu)	name##_local_unlock_cpu(cpu)
-#define lg_global_lock(name)	name##_global_lock()
-#define lg_global_unlock(name)	name##_global_unlock()
-#define lg_global_lock_online(name) name##_global_lock_online()
-#define lg_global_unlock_online(name) name##_global_unlock_online()
-
+struct lglock {
+	arch_spinlock_t __percpu *locks;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define LOCKDEP_INIT_MAP lockdep_init_map
-
-#define DEFINE_LGLOCK_LOCKDEP(name)					\
- struct lock_class_key name##_lock_key;					\
- struct lockdep_map name##_lock_dep_map;				\
- EXPORT_SYMBOL(name##_lock_dep_map)
-
-#else
-#define LOCKDEP_INIT_MAP(a, b, c, d)
-
-#define DEFINE_LGLOCK_LOCKDEP(name)
+	struct lockdep_map lock_dep_map;
 #endif
-
+};
 
 #define DECLARE_LGLOCK(name)						\
- extern void name##_lock_init(void);					\
- extern void name##_local_lock(void);					\
- extern void name##_local_unlock(void);					\
- extern void name##_local_lock_cpu(int cpu);				\
- extern void name##_local_unlock_cpu(int cpu);				\
- extern void name##_global_lock(void);					\
- extern void name##_global_unlock(void);				\
- extern void name##_global_lock_online(void);				\
- extern void name##_global_unlock_online(void);				\
+	extern struct lglock name;
 
 #define DEFINE_LGLOCK(name)						\
 									\
- DEFINE_PER_CPU(arch_spinlock_t, name##_lock);				\
- DEFINE_LGLOCK_LOCKDEP(name);						\
-									\
- void name##_lock_init(void) {						\
-	int i;								\
-	LOCKDEP_INIT_MAP(&name##_lock_dep_map, #name, &name##_lock_key, 0); \
-	for_each_possible_cpu(i) {					\
-		arch_spinlock_t *lock;					\
-		lock = &per_cpu(name##_lock, i);			\
-		*lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;	\
-	}								\
- }									\
- EXPORT_SYMBOL(name##_lock_init);					\
-									\
- void name##_local_lock(void) {						\
-	arch_spinlock_t *lock;						\
-	preempt_disable();						\
-	rwlock_acquire_read(&name##_lock_dep_map, 0, 0, _THIS_IP_);	\
-	lock = &__get_cpu_var(name##_lock);				\
-	arch_spin_lock(lock);						\
- }									\
- EXPORT_SYMBOL(name##_local_lock);					\
-									\
- void name##_local_unlock(void) {					\
-	arch_spinlock_t *lock;						\
-	rwlock_release(&name##_lock_dep_map, 1, _THIS_IP_);		\
-	lock = &__get_cpu_var(name##_lock);				\
-	arch_spin_unlock(lock);						\
-	preempt_enable();						\
- }									\
- EXPORT_SYMBOL(name##_local_unlock);					\
-									\
- void name##_local_lock_cpu(int cpu) {					\
-	arch_spinlock_t *lock;						\
-	preempt_disable();						\
-	rwlock_acquire_read(&name##_lock_dep_map, 0, 0, _THIS_IP_);	\
-	lock = &per_cpu(name##_lock, cpu);				\
-	arch_spin_lock(lock);						\
- }									\
- EXPORT_SYMBOL(name##_local_lock_cpu);					\
-									\
- void name##_local_unlock_cpu(int cpu) {				\
-	arch_spinlock_t *lock;						\
-	rwlock_release(&name##_lock_dep_map, 1, _THIS_IP_);		\
-	lock = &per_cpu(name##_lock, cpu);				\
-	arch_spin_unlock(lock);						\
-	preempt_enable();						\
- }									\
- EXPORT_SYMBOL(name##_local_unlock_cpu);				\
-									\
- void name##_global_lock_online(void) {					\
-	int i;								\
-	preempt_disable();						\
-	rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_);		\
-	for_each_online_cpu(i) {					\
-		arch_spinlock_t *lock;					\
-		lock = &per_cpu(name##_lock, i);			\
-		arch_spin_lock(lock);					\
-	}								\
- }									\
- EXPORT_SYMBOL(name##_global_lock_online);				\
-									\
- void name##_global_unlock_online(void) {				\
-	int i;								\
-	rwlock_release(&name##_lock_dep_map, 1, _RET_IP_);		\
-	for_each_online_cpu(i) {					\
-		arch_spinlock_t *lock;					\
-		lock = &per_cpu(name##_lock, i);			\
-		arch_spin_unlock(lock);					\
-	}								\
-	preempt_enable();						\
- }									\
- EXPORT_SYMBOL(name##_global_unlock_online);				\
-									\
- void name##_global_lock(void) {					\
-	int i;								\
-	preempt_disable();						\
-	rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_);		\
-	for_each_possible_cpu(i) {					\
-		arch_spinlock_t *lock;					\
-		lock = &per_cpu(name##_lock, i);			\
-		arch_spin_lock(lock);					\
-	}								\
- }									\
- EXPORT_SYMBOL(name##_global_lock);					\
-									\
- void name##_global_unlock(void) {					\
-	int i;								\
-	rwlock_release(&name##_lock_dep_map, 1, _RET_IP_);		\
-	for_each_possible_cpu(i) {					\
-		arch_spinlock_t *lock;					\
-		lock = &per_cpu(name##_lock, i);			\
-		arch_spin_unlock(lock);					\
-	}								\
-	preempt_enable();						\
- }									\
- EXPORT_SYMBOL(name##_global_unlock);
+DEFINE_PER_CPU(arch_spinlock_t, name##_percpulock);			\
+struct lglock name = {							\
+	.locks = &name##_percpulock,					\
+};
+
+extern int lglock_alloc(struct lglock *lglock);
+extern void lglock_free(struct lglock *lglock);
+extern void __lglock_init(struct lglock *lglock, const char *name,
+	struct lock_class_key *key);
+#define lglock_init(lock, name)						\
+({									\
+	static struct lock_class_key key;				\
+	__lglock_init(lock, name, &key);				\
+})
+extern void lglock_local_lock(struct lglock *lglock);
+extern void lglock_local_unlock(struct lglock *lglock);
+extern void lglock_local_lock_cpu(struct lglock *lglock, int cpu);
+extern void lglock_local_unlock_cpu(struct lglock *lglock, int cpu);
+extern void lglock_global_lock_online(struct lglock *lglock);
+extern void lglock_global_unlock_online(struct lglock *lglock);
+extern void lglock_global_lock(struct lglock *lglock);
+extern void lglock_global_unlock(struct lglock *lglock);
+
+/* can make br locks by using local lock for read side, global lock for write */
+#define br_lock_init(name)	lglock_init(&name, #name)
+#define br_read_lock(name)	lglock_local_lock(&name)
+#define br_read_unlock(name)	lglock_local_unlock(&name)
+#define br_write_lock(name)	lglock_global_lock_online(&name)
+#define br_write_unlock(name)	lglock_global_unlock_online(&name)
+
+#define DECLARE_BRLOCK(name)	DECLARE_LGLOCK(name)
+#define DEFINE_BRLOCK(name)	DEFINE_LGLOCK(name)
+
+#define lg_lock_init(name)	lglock_init(&name, #name)
+#define lg_local_lock(name)	lglock_local_lock(&name)
+#define lg_local_unlock(name)	lglock_local_unlock(&name)
+#define lg_local_lock_cpu(name, cpu)	lglock_local_lock_cpu(&name, cpu)
+#define lg_local_unlock_cpu(name, cpu) \
+	lglock_local_unlock_cpu(&name, cpu)
+#define lg_global_lock(name)	lglock_global_lock(&name)
+#define lg_global_unlock(name)	lglock_global_unlock(&name)
+#define lg_global_lock_online(name) lglock_global_lock_online(&name)
+#define lg_global_unlock_online(name) lglock_global_unlock_online(&name)
+
 #endif
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile	2011-05-10 16:10:55.000000000 +0800
+++ linux/kernel/Makefile	2011-05-10 16:23:01.000000000 +0800
@@ -10,7 +10,7 @@ obj-y     = sched.o fork.o exec_domain.o
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
 	    notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
-	    async.o range.o jump_label.o
+	    async.o range.o jump_label.o lglock.o
 obj-y += groups.o
 
 ifdef CONFIG_FUNCTION_TRACER
Index: linux/kernel/lglock.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/kernel/lglock.c	2011-05-11 09:39:46.000000000 +0800
@@ -0,0 +1,124 @@
+#include <linux/lglock.h>
+#include <linux/module.h>
+
+int lglock_alloc(struct lglock *lglock)
+{
+	lglock->locks = alloc_percpu(arch_spinlock_t);
+	if (!lglock->locks)
+		return -ENOMEM;
+	return 0;
+}
+EXPORT_SYMBOL(lglock_alloc);
+
+void lglock_free(struct lglock *lglock)
+{
+	free_percpu(lglock->locks);
+}
+EXPORT_SYMBOL(lglock_free);
+
+void __lglock_init(struct lglock *lglock, const char *name,
+	struct lock_class_key *key)
+{
+	int i;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_init_map(&lglock->lock_dep_map, name, key, 0);
+#endif
+	for_each_possible_cpu(i) {
+		arch_spinlock_t *lock;
+		lock = per_cpu_ptr(lglock->locks, i);
+		*lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+	}
+}
+EXPORT_SYMBOL(__lglock_init);
+
+void lglock_local_lock(struct lglock *lglock)
+{
+	arch_spinlock_t *lock;
+	preempt_disable();
+	rwlock_acquire_read(&lglock->lock_dep_map, 0, 0, _THIS_IP_);
+	lock = __this_cpu_ptr(lglock->locks);
+	arch_spin_lock(lock);
+}
+EXPORT_SYMBOL(lglock_local_lock);
+
+void lglock_local_unlock(struct lglock *lglock)
+{
+	arch_spinlock_t *lock;
+	rwlock_release(&lglock->lock_dep_map, 1, _THIS_IP_);
+	lock = __this_cpu_ptr(lglock->locks);
+	arch_spin_unlock(lock);
+	preempt_enable();
+}
+EXPORT_SYMBOL(lglock_local_unlock);
+
+void lglock_local_lock_cpu(struct lglock *lglock, int cpu)
+{
+	arch_spinlock_t *lock;
+	preempt_disable();
+	rwlock_acquire_read(&lglock->lock_dep_map, 0, 0, _THIS_IP_);
+	lock = per_cpu_ptr(lglock->locks, cpu);
+	arch_spin_lock(lock);
+}
+EXPORT_SYMBOL(lglock_local_lock_cpu);
+
+void lglock_local_unlock_cpu(struct lglock *lglock, int cpu)
+{
+	arch_spinlock_t *lock;
+	rwlock_release(&lglock->lock_dep_map, 1, _THIS_IP_);
+	lock = per_cpu_ptr(lglock->locks, cpu);
+	arch_spin_unlock(lock);
+	preempt_enable();
+}
+EXPORT_SYMBOL(lglock_local_unlock_cpu);
+
+void lglock_global_lock_online(struct lglock *lglock)
+{
+	int i;
+	preempt_disable();
+	rwlock_acquire(&lglock->lock_dep_map, 0, 0, _RET_IP_);
+	for_each_online_cpu(i) {
+		arch_spinlock_t *lock;
+		lock = per_cpu_ptr(lglock->locks, i);
+		arch_spin_lock(lock);
+	}
+}
+EXPORT_SYMBOL(lglock_global_lock_online);
+
+void lglock_global_unlock_online(struct lglock *lglock)
+{
+	int i;
+	rwlock_release(&lglock->lock_dep_map, 1, _RET_IP_);
+	for_each_online_cpu(i) {
+		arch_spinlock_t *lock;
+		lock = per_cpu_ptr(lglock->locks, i);
+		arch_spin_unlock(lock);
+	}
+	preempt_enable();
+}
+EXPORT_SYMBOL(lglock_global_unlock_online);
+
+void lglock_global_lock(struct lglock *lglock)
+{
+	int i;
+	preempt_disable();
+	rwlock_acquire(&lglock->lock_dep_map, 0, 0, _RET_IP_);
+	for_each_possible_cpu(i) {
+		arch_spinlock_t *lock;
+		lock = per_cpu_ptr(lglock->locks, i);
+		arch_spin_lock(lock);
+	}
+}
+EXPORT_SYMBOL(lglock_global_lock);
+
+void lglock_global_unlock(struct lglock *lglock)
+{
+	int i;
+	rwlock_release(&lglock->lock_dep_map, 1, _RET_IP_);
+	for_each_possible_cpu(i) {
+		arch_spinlock_t *lock;
+		lock = per_cpu_ptr(lglock->locks, i);
+		arch_spin_unlock(lock);
+	}
+	preempt_enable();
+}
+EXPORT_SYMBOL(lglock_global_unlock);



* [patch v2 3/5] percpu_counter: use lglock to protect percpu data
  2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
  2011-05-11  8:10 ` [patch v2 1/5] percpu_counter: fix code for 32bit systems for UP Shaohua Li
  2011-05-11  8:10 ` [patch v2 2/5] lglock: convert it to work with dynamically allocated structure Shaohua Li
@ 2011-05-11  8:10 ` Shaohua Li
  2011-05-11  8:10 ` [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP Shaohua Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-11  8:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, tj, eric.dumazet, cl, npiggin, Shaohua Li

[-- Attachment #1: percpu-counter-percpulock.patch --]
[-- Type: text/plain, Size: 3744 bytes --]

Use lglock to protect the percpu data. This is preparation for removing the
percpu_counter global lock.
This slows __percpu_counter_sum, but that function isn't supposed to be
called frequently, so it doesn't matter.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 include/linux/percpu_counter.h |    9 ++++++---
 lib/percpu_counter.c           |   15 ++++++++++++++-
 2 files changed, 20 insertions(+), 4 deletions(-)

Index: linux/include/linux/percpu_counter.h
===================================================================
--- linux.orig/include/linux/percpu_counter.h	2011-05-10 16:23:01.000000000 +0800
+++ linux/include/linux/percpu_counter.h	2011-05-11 09:28:55.000000000 +0800
@@ -12,6 +12,7 @@
 #include <linux/threads.h>
 #include <linux/percpu.h>
 #include <linux/types.h>
+#include <linux/lglock.h>
 
 #ifdef CONFIG_SMP
 
@@ -22,18 +23,20 @@ struct percpu_counter {
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
 	s32 __percpu *counters;
+	struct lglock lglock;
 };
 
 extern int percpu_counter_batch;
 
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
-			  struct lock_class_key *key);
+			struct lock_class_key *key, const char *name,
+			struct lock_class_key *key2);
 
 #define percpu_counter_init(fbc, value)					\
 	({								\
-		static struct lock_class_key __key;			\
+		static struct lock_class_key __key, __key2;		\
 									\
-		__percpu_counter_init(fbc, value, &__key);		\
+		__percpu_counter_init(fbc, value, &__key, #fbc, &__key2);\
 	})
 
 void percpu_counter_destroy(struct percpu_counter *fbc);
Index: linux/lib/percpu_counter.c
===================================================================
--- linux.orig/lib/percpu_counter.c	2011-05-10 16:10:54.000000000 +0800
+++ linux/lib/percpu_counter.c	2011-05-11 09:28:55.000000000 +0800
@@ -77,8 +77,10 @@ void __percpu_counter_add(struct percpu_
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
 		spin_lock(&fbc->lock);
+		lg_local_lock(fbc->lglock);
 		fbc->count += count;
 		__this_cpu_write(*fbc->counters, 0);
+		lg_local_unlock(fbc->lglock);
 		spin_unlock(&fbc->lock);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
@@ -97,18 +99,21 @@ s64 __percpu_counter_sum(struct percpu_c
 	int cpu;
 
 	spin_lock(&fbc->lock);
+	lg_global_lock_online(fbc->lglock);
 	ret = fbc->count;
 	for_each_online_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		ret += *pcount;
 	}
+	lg_global_unlock_online(fbc->lglock);
 	spin_unlock(&fbc->lock);
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
-			  struct lock_class_key *key)
+		struct lock_class_key *key, const char *name,
+		struct lock_class_key *key2)
 {
 	spin_lock_init(&fbc->lock);
 	lockdep_set_class(&fbc->lock, key);
@@ -116,6 +121,11 @@ int __percpu_counter_init(struct percpu_
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
+	if (lglock_alloc(&fbc->lglock)) {
+		free_percpu(fbc->counters);
+		return -ENOMEM;
+	}
+	__lglock_init(&fbc->lglock, name, key2);
 
 	debug_percpu_counter_activate(fbc);
 
@@ -143,6 +153,7 @@ void percpu_counter_destroy(struct percp
 #endif
 	free_percpu(fbc->counters);
 	fbc->counters = NULL;
+	lglock_free(&fbc->lglock);
 }
 EXPORT_SYMBOL(percpu_counter_destroy);
 
@@ -174,9 +185,11 @@ static int __cpuinit percpu_counter_hotc
 		unsigned long flags;
 
 		spin_lock_irqsave(&fbc->lock, flags);
+		lg_local_lock_cpu(fbc->lglock, cpu);
 		pcount = per_cpu_ptr(fbc->counters, cpu);
 		fbc->count += *pcount;
 		*pcount = 0;
+		lg_local_unlock_cpu(fbc->lglock, cpu);
 		spin_unlock_irqrestore(&fbc->lock, flags);
 	}
 	mutex_unlock(&percpu_counters_lock);



* [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP
  2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
                   ` (2 preceding siblings ...)
  2011-05-11  8:10 ` [patch v2 3/5] percpu_counter: use lglock to protect percpu data Shaohua Li
@ 2011-05-11  8:10 ` Shaohua Li
  2011-05-11  9:34   ` Andrew Morton
  2011-05-11  8:10 ` [patch v2 5/5] percpu_counter: preemptless __per_cpu_counter_add Shaohua Li
  2011-05-11  9:28 ` [patch v2 0/5] percpu_counter: bug fix and enhancement Tejun Heo
  5 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-11  8:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, tj, eric.dumazet, cl, npiggin, Shaohua Li

[-- Attachment #1: percpu-counter-atomic.patch --]
[-- Type: text/plain, Size: 5403 bytes --]

The percpu_counter global lock now only protects updates of fbc->count,
since lglock protects the percpu data. Switch fbc->count to atomic64, which
is cheaper than a spinlock. This doesn't slow the fast path
(percpu_counter_read): atomic64_read is a plain read of fbc->count on 64-bit
systems, and spin_lock-read-spin_unlock on 32-bit systems.

Note that percpu_counter_read on 32-bit systems originally didn't take the
spinlock, but that is buggy and can return a very wrong value. This patch
fixes the issue.

This can also improve workloads where percpu_counter->lock is heavily
contended. For example, vm_committed_as sometimes causes such contention.
We could tune the batch count instead, but if we can make percpu_counter
better, why not? On a 24-CPU system with 24 processes, each running:
while (1) {
	mmap(128M);
	munmap(128M);
}
we measure how many loops each process completes:
orig: 1226976
patched: 6727264
The atomic method is 5x~6x faster.
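For reference, the per-process loop above is essentially the following
sketch (assumptions not stated above: anonymous private mappings and a fixed
measurement window; the harness that forks the 24 processes and collects the
counts is omitted):

#include <sys/mman.h>
#include <stdio.h>
#include <time.h>

#define MAP_SIZE	(128UL << 20)	/* the 128M above */
#define DURATION	60		/* seconds, an arbitrary window */

int main(void)
{
	unsigned long loops = 0;
	time_t end = time(NULL) + DURATION;

	while (time(NULL) < end) {
		void *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;
		munmap(p, MAP_SIZE);
		loops++;	/* this count is what orig/patched compare */
	}
	printf("%lu loops\n", loops);
	return 0;
}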

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 include/linux/percpu_counter.h |   14 ++++++--------
 lib/percpu_counter.c           |   27 +++++++++------------------
 2 files changed, 15 insertions(+), 26 deletions(-)

Index: linux/include/linux/percpu_counter.h
===================================================================
--- linux.orig/include/linux/percpu_counter.h	2011-05-10 16:23:01.000000000 +0800
+++ linux/include/linux/percpu_counter.h	2011-05-10 16:23:01.000000000 +0800
@@ -17,8 +17,7 @@
 #ifdef CONFIG_SMP
 
 struct percpu_counter {
-	spinlock_t lock;
-	s64 count;
+	atomic64_t count;
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
@@ -29,14 +28,13 @@ struct percpu_counter {
 extern int percpu_counter_batch;
 
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
-			struct lock_class_key *key, const char *name,
-			struct lock_class_key *key2);
+			struct lock_class_key *key, const char *name);
 
 #define percpu_counter_init(fbc, value)					\
 	({								\
-		static struct lock_class_key __key, __key2;		\
+		static struct lock_class_key __key;			\
 									\
-		__percpu_counter_init(fbc, value, &__key, #fbc, &__key2);\
+		__percpu_counter_init(fbc, value, &__key, #fbc);	\
 	})
 
 void percpu_counter_destroy(struct percpu_counter *fbc);
@@ -63,7 +61,7 @@ static inline s64 percpu_counter_sum(str
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	return atomic64_read(&fbc->count);
 }
 
 /*
@@ -73,7 +71,7 @@ static inline s64 percpu_counter_read(st
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
 	barrier();		/* Prevent reloads of fbc->count */
 	if (ret >= 0)
Index: linux/lib/percpu_counter.c
===================================================================
--- linux.orig/lib/percpu_counter.c	2011-05-10 16:23:01.000000000 +0800
+++ linux/lib/percpu_counter.c	2011-05-11 09:24:24.000000000 +0800
@@ -59,13 +59,11 @@ void percpu_counter_set(struct percpu_co
 {
 	int cpu;
 
-	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
-	spin_unlock(&fbc->lock);
+	atomic64_set(&fbc->count, amount);
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
@@ -76,12 +74,10 @@ void __percpu_counter_add(struct percpu_
 	preempt_disable();
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
 		lg_local_lock(fbc->lglock);
-		fbc->count += count;
+		atomic64_add(count, &fbc->count);
 		__this_cpu_write(*fbc->counters, 0);
 		lg_local_unlock(fbc->lglock);
-		spin_unlock(&fbc->lock);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
@@ -98,26 +94,21 @@ s64 __percpu_counter_sum(struct percpu_c
 	s64 ret;
 	int cpu;
 
-	spin_lock(&fbc->lock);
 	lg_global_lock_online(fbc->lglock);
-	ret = fbc->count;
+	ret = atomic64_read(&fbc->count);
 	for_each_online_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		ret += *pcount;
 	}
 	lg_global_unlock_online(fbc->lglock);
-	spin_unlock(&fbc->lock);
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
-		struct lock_class_key *key, const char *name,
-		struct lock_class_key *key2)
+		struct lock_class_key *key, const char *name)
 {
-	spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
+	atomic64_set(&fbc->count, amount);
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
@@ -125,7 +116,7 @@ int __percpu_counter_init(struct percpu_
 		free_percpu(fbc->counters);
 		return -ENOMEM;
 	}
-	__lglock_init(&fbc->lglock, name, key2);
+	__lglock_init(&fbc->lglock, name, key);
 
 	debug_percpu_counter_activate(fbc);
 
@@ -184,13 +175,13 @@ static int __cpuinit percpu_counter_hotc
 		s32 *pcount;
 		unsigned long flags;
 
-		spin_lock_irqsave(&fbc->lock, flags);
+		local_irq_save(flags);
 		lg_local_lock_cpu(fbc->lglock, cpu);
 		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		atomic64_add(*pcount, &fbc->count);
 		*pcount = 0;
 		lg_local_unlock_cpu(fbc->lglock, cpu);
-		spin_unlock_irqrestore(&fbc->lock, flags);
+		local_irq_restore(flags);
 	}
 	mutex_unlock(&percpu_counters_lock);
 #endif



* [patch v2 5/5] percpu_counter: preemptless __per_cpu_counter_add
  2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
                   ` (3 preceding siblings ...)
  2011-05-11  8:10 ` [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP Shaohua Li
@ 2011-05-11  8:10 ` Shaohua Li
  2011-05-11  9:28 ` [patch v2 0/5] percpu_counter: bug fix and enhancement Tejun Heo
  5 siblings, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-11  8:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, tj, eric.dumazet, cl, npiggin, Shaohua Li

[-- Attachment #1: percpu-counter-add-preemptless.patch --]
[-- Type: text/plain, Size: 1792 bytes --]

From: Christoph Lameter <cl@linux.com>

Make percpu_counter_add hot path preemptless.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 lib/percpu_counter.c |   37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

Index: linux/lib/percpu_counter.c
===================================================================
--- linux.orig/lib/percpu_counter.c	2011-05-10 16:23:01.000000000 +0800
+++ linux/lib/percpu_counter.c	2011-05-10 16:23:01.000000000 +0800
@@ -67,21 +67,34 @@ void percpu_counter_set(struct percpu_co
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
+static bool __percpu_counter_add_cmpxchg(struct percpu_counter *fbc,
+	s64 count, s64 new)
+{
+#ifdef CONFIG_PREEMPT
+	return this_cpu_cmpxchg(*fbc->counters, count, new) == count;
+#else
+	this_cpu_write(*fbc->counters, new);
+	return true;
+#endif
+}
+
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
-	s64 count;
+	s64 count, new;
+
+	do {
+		count = this_cpu_read(*fbc->counters);
+		new = count + amount;
 
-	preempt_disable();
-	count = __this_cpu_read(*fbc->counters) + amount;
-	if (count >= batch || count <= -batch) {
-		lg_local_lock(fbc->lglock);
-		atomic64_add(count, &fbc->count);
-		__this_cpu_write(*fbc->counters, 0);
-		lg_local_unlock(fbc->lglock);
-	} else {
-		__this_cpu_write(*fbc->counters, count);
-	}
-	preempt_enable();
+		if (new >= batch || new <= -batch) {
+			lg_local_lock(fbc->lglock);
+			count = __this_cpu_read(*fbc->counters) + amount;
+			atomic64_add(count, &fbc->count);
+			__this_cpu_write(*fbc->counters, 0);
+			lg_local_unlock(fbc->lglock);
+			return;
+		}
+	} while (!__percpu_counter_add_cmpxchg(fbc, count, new));
 }
 EXPORT_SYMBOL(__percpu_counter_add);
 



* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
                   ` (4 preceding siblings ...)
  2011-05-11  8:10 ` [patch v2 5/5] percpu_counter: preemptless __per_cpu_counter_add Shaohua Li
@ 2011-05-11  9:28 ` Tejun Heo
  2011-05-12  2:48   ` Shaohua Li
  2011-05-12 14:38   ` [patch v2 0/5] percpu_counter: bug fix and enhancement Christoph Lameter
  5 siblings, 2 replies; 52+ messages in thread
From: Tejun Heo @ 2011-05-11  9:28 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, akpm, eric.dumazet, cl, npiggin

Hey, Shaohua.

On Wed, May 11, 2011 at 04:10:12PM +0800, Shaohua Li wrote:
> The new implementation uses lglock to protect percpu data. Each cpu has its
> private lock while other cpu doesn't take. In this way _add doesn't need take
> global lock anymore and remove the deviation. This still gives me about
> about 5x ~ 6x faster (not that faster than the original 7x faster, but still
> good) with the workload mentioned in patch 4.

I'm afraid I'm not too thrilled about lglock + atomic64 usage.  It is
a very patchy approach which addresses a very specific use case which
might just need a higher @batch.  I just can't see enough benefits to
justify the overhead and complexity.  :-(

Thanks.

-- 
tejun


* Re: [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP
  2011-05-11  8:10 ` [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP Shaohua Li
@ 2011-05-11  9:34   ` Andrew Morton
  2011-05-12  2:40     ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Morton @ 2011-05-11  9:34 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, tj, eric.dumazet, cl, npiggin

On Wed, 11 May 2011 16:10:16 +0800 Shaohua Li <shaohua.li@intel.com> wrote:

> The percpu_counter global lock is only used to protect updating fbc->count after
> we use lglock to protect percpu data. Uses atomic64 for percpu_counter, because
> it is cheaper than spinlock. This doesn't slow fast path (percpu_counter_read).
> atomic64_read equals to read fbc->count for 64-bit system, or equals to
> spin_lock-read-spin_unlock for 32-bit system.
> 
> Note, originally the percpu_counter_read for 32-bit system doesn't hold
> spin_lock, but that is buggy and might cause very wrong value accessed.
> This patch fixes the issue.
> 
> This can also improve some workloads with percpu_counter->lock heavily
> contented. For example, vm_committed_as sometimes causes the contention.
> We should tune the batch count, but if we can make percpu_counter better,
> why not? In a 24 CPUs system and 24 processes, each runs:
> while (1) {
> 	mmap(128M);
> 	munmap(128M);
> }
> we then measure how many loops each process can take:
> orig: 1226976
> patched: 6727264
> The atomic method gives 5x~6x faster.

How much slower did percpu_counter_sum() become?


* Re: [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP
  2011-05-11  9:34   ` Andrew Morton
@ 2011-05-12  2:40     ` Shaohua Li
  0 siblings, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-12  2:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, tj, eric.dumazet, cl, npiggin

On Wed, 2011-05-11 at 17:34 +0800, Andrew Morton wrote:
> On Wed, 11 May 2011 16:10:16 +0800 Shaohua Li <shaohua.li@intel.com> wrote:
> 
> > The percpu_counter global lock is only used to protect updating fbc->count after
> > we use lglock to protect percpu data. Uses atomic64 for percpu_counter, because
> > it is cheaper than spinlock. This doesn't slow fast path (percpu_counter_read).
> > atomic64_read equals to read fbc->count for 64-bit system, or equals to
> > spin_lock-read-spin_unlock for 32-bit system.
> > 
> > Note, originally the percpu_counter_read for 32-bit system doesn't hold
> > spin_lock, but that is buggy and might cause very wrong value accessed.
> > This patch fixes the issue.
> > 
> > This can also improve some workloads with percpu_counter->lock heavily
> > contented. For example, vm_committed_as sometimes causes the contention.
> > We should tune the batch count, but if we can make percpu_counter better,
> > why not? In a 24 CPUs system and 24 processes, each runs:
> > while (1) {
> > 	mmap(128M);
> > 	munmap(128M);
> > }
> > we then measure how many loops each process can take:
> > orig: 1226976
> > patched: 6727264
> > The atomic method gives 5x~6x faster.
> 
> How much slower did percpu_counter_sum() become?
I did a stress test: 23 CPUs run _add, one CPU runs _sum.
In both cases (_add fast path, which doesn't take the lock, and _add slow
path, which does), _sum becomes about 2.4x slower. Not too much slower, and
anyway _sum isn't frequently used.

Thanks,
Shaohua



* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-11  9:28 ` [patch v2 0/5] percpu_counter: bug fix and enhancement Tejun Heo
@ 2011-05-12  2:48   ` Shaohua Li
  2011-05-12  8:21     ` Tejun Heo
  2011-05-12 14:38   ` [patch v2 0/5] percpu_counter: bug fix and enhancement Christoph Lameter
  1 sibling, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-12  2:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, akpm, eric.dumazet, cl, npiggin

On Wed, 2011-05-11 at 17:28 +0800, Tejun Heo wrote:
> Hey, Shaohua.
> 
> On Wed, May 11, 2011 at 04:10:12PM +0800, Shaohua Li wrote:
> > The new implementation uses lglock to protect percpu data. Each cpu has its
> > private lock while other cpu doesn't take. In this way _add doesn't need take
> > global lock anymore and remove the deviation. This still gives me about
> > about 5x ~ 6x faster (not that faster than the original 7x faster, but still
> > good) with the workload mentioned in patch 4.
> 
> I'm afraid I'm not too thrilled about lglock + atomic64 usage.  It is
> a very patchy approach which addresses a very specific use case which
> might just need a higher @batch. 
It's quite hard to pick a higher @batch. Please see my comments in
http://marc.info/?l=linux-kernel&m=130153302319613&w=2

And the atomic64 approach doesn't just improve performance (which is
always welcome), it also fixes a bug for 32-bit systems. The usage
of lglock here is actually quite straightforward and is the standard usage
of lglock (the comments in lglock.h describe exactly this usage); it's just
that lglock currently doesn't work for dynamically allocated structures,
which is what the conversion addresses.

Thanks,
Shaohua



* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  2:48   ` Shaohua Li
@ 2011-05-12  8:21     ` Tejun Heo
  2011-05-12  8:55       ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2011-05-12  8:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, akpm, eric.dumazet, cl, npiggin

Hello,

On Thu, May 12, 2011 at 10:48:13AM +0800, Shaohua Li wrote:
> And the atomic64 approach not just improved the performance (which is
> always welcomed), but it also fixes a bug for 32-bit system. The usage
> of lglock is actually quite straightforward and is standard usage of
> lglock (the comments of lglock.h declare such usage), just lglock
> doesn't work for dynamatically allocated structure currently, which
> needs a convert.

lglock doesn't seem like the right solution for the problem at hand.
It's way too heavy-handed for closing a very small race window.
It doesn't seem right.  Eric's idea seemed much better to me and I
don't see why that can't be improved and used instead.  Would you be
interested in looking in that direction?

Thanks.

-- 
tejun


* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  8:21     ` Tejun Heo
@ 2011-05-12  8:55       ` Shaohua Li
  2011-05-12  8:59         ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-12  8:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, akpm, eric.dumazet, cl, npiggin

Hi,
On Thu, 2011-05-12 at 16:21 +0800, Tejun Heo wrote:
> On Thu, May 12, 2011 at 10:48:13AM +0800, Shaohua Li wrote:
> > And the atomic64 approach not just improved the performance (which is
> > always welcomed), but it also fixes a bug for 32-bit system. The usage
> > of lglock is actually quite straightforward and is standard usage of
> > lglock (the comments of lglock.h declare such usage), just lglock
> > doesn't work for dynamatically allocated structure currently, which
> > needs a convert.
> 
> lglock doesn't seem like the right solution for the problem at hand.
> It's way too heavy handed and used to close a very small race window.
> It doesn't seem right.  Eric's idea seemed much better to me and I
> don't see why that can't be improved and used instead.  Would you be
> interested in looking that direction?
Sure, but I thought it's quite difficult to determine @maxfuzzy in his
proposal (and it could confuse users). Did I miss anything?

Thanks,
Shaohua




* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  8:55       ` Shaohua Li
@ 2011-05-12  8:59         ` Tejun Heo
  2011-05-12  9:02           ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2011-05-12  8:59 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, akpm, eric.dumazet, cl, npiggin

Hello,

On Thu, May 12, 2011 at 04:55:20PM +0800, Shaohua Li wrote:
> sure, but it's quite difficult to determine a @maxfuzzy in his proposal
> I thought (and could confuse user), did I miss anything?

I don't think @maxfuzzy is necessary there.  I wrote this before but
why can't we track the actual deviation instead of the number of
deviation events?

Thanks.

-- 
tejun


* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  8:59         ` Tejun Heo
@ 2011-05-12  9:02           ` Eric Dumazet
  2011-05-12  9:03             ` Eric Dumazet
  2011-05-12  9:05             ` Tejun Heo
  0 siblings, 2 replies; 52+ messages in thread
From: Eric Dumazet @ 2011-05-12  9:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

On Thu, 12 May 2011 at 10:59 +0200, Tejun Heo wrote:
> Hello,
> 
> On Thu, May 12, 2011 at 04:55:20PM +0800, Shaohua Li wrote:
> > sure, but it's quite difficult to determine a @maxfuzzy in his proposal
> > I thought (and could confuse user), did I miss anything?
> 
> I don't think @maxfuzzy is necessary there.  I wrote this before but
> why can't we track the actual deviation instead of the number of
> deviation events?
> 

That's roughly the same thing (apart from a BATCH multiplier factor).

Most percpu_counter users for a given percpu_counter object use a given
BATCH, don't they?





* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  9:02           ` Eric Dumazet
@ 2011-05-12  9:03             ` Eric Dumazet
  2011-05-12  9:05             ` Tejun Heo
  1 sibling, 0 replies; 52+ messages in thread
From: Eric Dumazet @ 2011-05-12  9:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

On Thu, 12 May 2011 at 11:02 +0200, Eric Dumazet wrote:
> On Thu, 12 May 2011 at 10:59 +0200, Tejun Heo wrote:
> > Hello,
> > 
> > On Thu, May 12, 2011 at 04:55:20PM +0800, Shaohua Li wrote:
> > > sure, but it's quite difficult to determine a @maxfuzzy in his proposal
> > > I thought (and could confuse user), did I miss anything?
> > 
> > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > why can't we track the actual deviation instead of the number of
> > deviation events?
> > 
> 
> Thats roughly same thing (BATCH multiplicator factor apart)
> 
> Most percpu_counter users for a given percpu_counter object use a given
> BATCH, dont they ?
> 
> 


I guess nr_cpu_ids would be a nice @maxfuzzy default value...





* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  9:02           ` Eric Dumazet
  2011-05-12  9:03             ` Eric Dumazet
@ 2011-05-12  9:05             ` Tejun Heo
  2011-05-13  3:09               ` Shaohua Li
  2011-05-13  4:37               ` Shaohua Li
  1 sibling, 2 replies; 52+ messages in thread
From: Tejun Heo @ 2011-05-12  9:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Hello,

On Thu, May 12, 2011 at 11:02:15AM +0200, Eric Dumazet wrote:
> > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > why can't we track the actual deviation instead of the number of
> > deviation events?
> 
> Thats roughly same thing (BATCH multiplicator factor apart)
> 
> Most percpu_counter users for a given percpu_counter object use a given
> BATCH, dont they ?

Well, @maxfuzzy is much harder than @batch.  It's way less intuitive.
Although I haven't really thought about it that much, I think it might
be possible to eliminate it.  Maybe I'm confused.  I'll take another
look later but if someone can think of something, please jump right
in.

Thanks.

-- 
tejun


* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-11  9:28 ` [patch v2 0/5] percpu_counter: bug fix and enhancement Tejun Heo
  2011-05-12  2:48   ` Shaohua Li
@ 2011-05-12 14:38   ` Christoph Lameter
  1 sibling, 0 replies; 52+ messages in thread
From: Christoph Lameter @ 2011-05-12 14:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, linux-kernel, akpm, eric.dumazet, npiggin

On Wed, 11 May 2011, Tejun Heo wrote:

> Hey, Shaohua.
>
> On Wed, May 11, 2011 at 04:10:12PM +0800, Shaohua Li wrote:
> > The new implementation uses lglock to protect percpu data. Each cpu has its
> > private lock while other cpu doesn't take. In this way _add doesn't need take
> > global lock anymore and remove the deviation. This still gives me about
> > about 5x ~ 6x faster (not that faster than the original 7x faster, but still
> > good) with the workload mentioned in patch 4.
>
> I'm afraid I'm not too thrilled about lglock + atomic64 usage.  It is
> a very patchy approach which addresses a very specific use case which
> might just need a higher @batch.  I just can't see enough benefits to
> justify the overhead and complexity.  :-(

Same here.



* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  9:05             ` Tejun Heo
@ 2011-05-13  3:09               ` Shaohua Li
  2011-05-13  4:37               ` Shaohua Li
  1 sibling, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-13  3:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, linux-kernel, akpm, cl, npiggin

Hi,
On Thu, May 12, 2011 at 05:05:34PM +0800, Tejun Heo wrote:
> Hello,
> 
> On Thu, May 12, 2011 at 11:02:15AM +0200, Eric Dumazet wrote:
> > > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > > why can't we track the actual deviation instead of the number of
> > > deviation events?
> > 
> > Thats roughly same thing (BATCH multiplicator factor apart)
> > 
> > Most percpu_counter users for a given percpu_counter object use a given
> > BATCH, dont they ?
> 
> Well, @maxfuzzy is much harder than @batch.  It's way less intuitive.
> Although I haven't really thought about it that much, I think it might
> be possible to eliminate it.  Maybe I'm confused.  I'll take another
> look later but if someone can think of something, please jump right
> in.
There is another problem here: _sum could keep spinning if a concurrent
updater keeps running.

We could slightly change Eric's idea. How about something like this:

s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
	int retry_times = 0;
	unsigned int old_seq;
	s64 sum;

retry:
	old_seq = fbc->seq;
	sum = do_sum();
	if (old_seq != fbc->seq && retry_times++ < MAX_RETRY)
		goto retry;
	return sum;
}
MAX_RETRY could be nr_cpu_ids. The rationale is that if a concurrent updater
keeps running, we can't get an accurate sum anyway, so just don't try too hard.

Thanks,
Shaohua


* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-12  9:05             ` Tejun Heo
  2011-05-13  3:09               ` Shaohua Li
@ 2011-05-13  4:37               ` Shaohua Li
  2011-05-13  5:20                 ` Eric Dumazet
  1 sibling, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-13  4:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, linux-kernel, akpm, cl, npiggin

On Thu, 2011-05-12 at 17:05 +0800, Tejun Heo wrote:
> Hello,
> 
> On Thu, May 12, 2011 at 11:02:15AM +0200, Eric Dumazet wrote:
> > > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > > why can't we track the actual deviation instead of the number of
> > > deviation events?
> > 
> > Thats roughly same thing (BATCH multiplicator factor apart)
> > 
> > Most percpu_counter users for a given percpu_counter object use a given
> > BATCH, dont they ?
> 
> Well, @maxfuzzy is much harder than @batch.  It's way less intuitive.
> Although I haven't really thought about it that much, I think it might
> be possible to eliminate it.  Maybe I'm confused.  I'll take another
> look later but if someone can think of something, please jump right
> in.
Hmm, it looks like Eric's approach doesn't work: because we want to remove
the lock in _add, checking seq in _sum still races with _add.

Can we do something like this:
void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
{
        s64 count;

        preempt_disable();
        count = __this_cpu_read(*fbc->counters) + amount;
        if (count >= batch || count <= -batch) {
                while (1) {
                        atomic_inc(&fbc->add_start);
                        if (atomic_read(&fbc->sum_start) != 0)
                                atomic_dec(&fbc->add_start);
                        else
                                break;
                        while (atomic_read(&fbc->sum_start) != 0)
                                cpu_relax();
                }

                atomic64_add(count, &fbc->count);
                __this_cpu_write(*fbc->counters, 0);
                atomic_dec(&fbc->add_start);
        } else {
                __this_cpu_write(*fbc->counters, count);
        }
        preempt_enable();
}

s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
        s64 ret = 0;
        int cpu;
        int old_seq;
        s64 old_count;

        atomic_inc(&fbc->sum_start);
        while (atomic_read(&fbc->add_start) != 0)
                cpu_relax();

        old_count = atomic64_read(&fbc->count);

        for_each_online_cpu(cpu) {
                s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
                ret += *pcount;
        }
        ret += atomic64_read(&fbc->count);
        atomic_dec(&fbc->sum_start);
        return ret;
}
If _add finds _sum is in progress, it gives up and waits for _sum. If
_sum finds _add is in progress, it waits for _add to give up or finish. We
let _add wait for _sum here because _sum is seldom called; if _sum waited
for _add, _sum might spin forever. Maybe we need a spinlock to protect
concurrent _sum too. Anything wrong here?



* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-13  4:37               ` Shaohua Li
@ 2011-05-13  5:20                 ` Eric Dumazet
  2011-05-13  5:28                   ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-13  5:20 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Fri, 13 May 2011 at 12:37 +0800, Shaohua Li wrote:
> On Thu, 2011-05-12 at 17:05 +0800, Tejun Heo wrote:
> > Hello,
> > 
> > On Thu, May 12, 2011 at 11:02:15AM +0200, Eric Dumazet wrote:
> > > > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > > > why can't we track the actual deviation instead of the number of
> > > > deviation events?
> > > 
> > > Thats roughly same thing (BATCH multiplicator factor apart)
> > > 
> > > Most percpu_counter users for a given percpu_counter object use a given
> > > BATCH, dont they ?
> > 
> > Well, @maxfuzzy is much harder than @batch.  It's way less intuitive.
> > Although I haven't really thought about it that much, I think it might
> > be possible to eliminate it.  Maybe I'm confused.  I'll take another
> > look later but if someone can think of something, please jump right
> > in.
> Hmm, looks Eric's approach doesn't work. because we want to remove lock
> in _add, checking seq in _sum still races with _add.
> 

Why ?

I'll code a patch, I believe it should work.

A seqcount is not a 'lock'.

The thing is we want _add to be real fast, so it must not hit a lock set
in _sum()

[Think about a machine with 4096 cpus]





* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-13  5:20                 ` Eric Dumazet
@ 2011-05-13  5:28                   ` Shaohua Li
  2011-05-13  6:34                     ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-13  5:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Hi,
On Fri, May 13, 2011 at 01:20:06PM +0800, Eric Dumazet wrote:
> On Fri, 13 May 2011 at 12:37 +0800, Shaohua Li wrote:
> > On Thu, 2011-05-12 at 17:05 +0800, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Thu, May 12, 2011 at 11:02:15AM +0200, Eric Dumazet wrote:
> > > > > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > > > > why can't we track the actual deviation instead of the number of
> > > > > deviation events?
> > > > 
> > > > Thats roughly same thing (BATCH multiplicator factor apart)
> > > > 
> > > > Most percpu_counter users for a given percpu_counter object use a given
> > > > BATCH, dont they ?
> > > 
> > > Well, @maxfuzzy is much harder than @batch.  It's way less intuitive.
> > > Although I haven't really thought about it that much, I think it might
> > > be possible to eliminate it.  Maybe I'm confused.  I'll take another
> > > look later but if someone can think of something, please jump right
> > > in.
> > Hmm, looks Eric's approach doesn't work. because we want to remove lock
> > in _add, checking seq in _sum still races with _add.
> > 
> 
> Why ?
> 
> I'll code a patch, I believe it should work.
I thought your proposal is:
in _add
{
	if (count >= batch || count <= -batch) {
		fbc->seq_count++;
               atomic64_add(count, &fbc->count);
-------->
                __this_cpu_write(*fbc->counters, 0);
	}
}

in _sum
{
restart:
	oldseq = fbc->seqcount;
	smp_rmb();
	do_sum();
	smp_rmb()
	newseq = fbc->seqcount;
	if (newseq - oldseq >= maxfuzzy)
		goto restart;
	return ret;
} 
If _sum runs at the point marked above in _add, then the seqcount check
doesn't work and we still have the deviation Tejun pointed out.

Thanks,
Shaohua


* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-13  5:28                   ` Shaohua Li
@ 2011-05-13  6:34                     ` Eric Dumazet
  2011-05-13  7:33                       ` Shaohua Li
  2011-05-13 14:51                       ` [patch] percpu_counter: scalability works Eric Dumazet
  0 siblings, 2 replies; 52+ messages in thread
From: Eric Dumazet @ 2011-05-13  6:34 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Fri, 13 May 2011 at 13:28 +0800, Shaohua Li wrote:
> Hi,
> On Fri, May 13, 2011 at 01:20:06PM +0800, Eric Dumazet wrote:
> > On Fri, 13 May 2011 at 12:37 +0800, Shaohua Li wrote:
> > > On Thu, 2011-05-12 at 17:05 +0800, Tejun Heo wrote:
> > > > Hello,
> > > > 
> > > > On Thu, May 12, 2011 at 11:02:15AM +0200, Eric Dumazet wrote:
> > > > > > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > > > > > why can't we track the actual deviation instead of the number of
> > > > > > deviation events?
> > > > > 
> > > > > Thats roughly same thing (BATCH multiplicator factor apart)
> > > > > 
> > > > > Most percpu_counter users for a given percpu_counter object use a given
> > > > > BATCH, dont they ?
> > > > 
> > > > Well, @maxfuzzy is much harder than @batch.  It's way less intuitive.
> > > > Although I haven't really thought about it that much, I think it might
> > > > be possible to eliminate it.  Maybe I'm confused.  I'll take another
> > > > look later but if someone can think of something, please jump right
> > > > in.
> > > Hmm, looks Eric's approach doesn't work. because we want to remove lock
> > > in _add, checking seq in _sum still races with _add.
> > > 
> > 
> > Why ?
> > 
> > I'll code a patch, I believe it should work.
> I thought your proposal is:
> in _add
> {
> 	if (count >= batch || count <= -batch) {
> 		fbc->seq_count++;
>                atomic64_add(count, &fbc->count);
> -------->
>                 __this_cpu_write(*fbc->counters, 0);
> 	}
> }
> 
> in _sum
> {
> restart:
> 	oldseq = fbc->seqcount;
> 	smp_rmb();
> 	do_sum();
> 	smp_rmb()
> 	newseq = fbc->seqcount;
> 	if (newseq - oldseq >= maxfuzzy)
> 		goto restart;
> 	return ret;
> } 
> if _sum run between above line marked in _add, then the seqcount check
> doesn't work, we still have deviation Tejun pointed out.
> 

I see the point, thanks. I'll think a bit more about it.

We currently serialize both _sum() and _add() with a spinlock.

My idea was OK if we still kept the spinlock in _add(), but that is
obviously not what we want here.

Your goal is letting _add() run without a spinlock, but can we agree that
_sum() can keep running under a spinlock like today [no more than one
instance of _sum() running per percpu_counter]?




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch v2 0/5] percpu_counter: bug fix and enhancement
  2011-05-13  6:34                     ` Eric Dumazet
@ 2011-05-13  7:33                       ` Shaohua Li
  2011-05-13 14:51                       ` [patch] percpu_counter: scalability works Eric Dumazet
  1 sibling, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-13  7:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Fri, 2011-05-13 at 14:34 +0800, Eric Dumazet wrote:
> Le vendredi 13 mai 2011 à 13:28 +0800, Shaohua Li a écrit :
> > Hi,
> > On Fri, May 13, 2011 at 01:20:06PM +0800, Eric Dumazet wrote:
> > > Le vendredi 13 mai 2011 à 12:37 +0800, Shaohua Li a écrit :
> > > > On Thu, 2011-05-12 at 17:05 +0800, Tejun Heo wrote:
> > > > > Hello,
> > > > > 
> > > > > On Thu, May 12, 2011 at 11:02:15AM +0200, Eric Dumazet wrote:
> > > > > > > I don't think @maxfuzzy is necessary there.  I wrote this before but
> > > > > > > why can't we track the actual deviation instead of the number of
> > > > > > > deviation events?
> > > > > > 
> > > > > > Thats roughly same thing (BATCH multiplicator factor apart)
> > > > > > 
> > > > > > Most percpu_counter users for a given percpu_counter object use a given
> > > > > > BATCH, dont they ?
> > > > > 
> > > > > Well, @maxfuzzy is much harder than @batch.  It's way less intuitive.
> > > > > Although I haven't really thought about it that much, I think it might
> > > > > be possible to eliminate it.  Maybe I'm confused.  I'll take another
> > > > > look later but if someone can think of something, please jump right
> > > > > in.
> > > > Hmm, looks Eric's approach doesn't work. because we want to remove lock
> > > > in _add, checking seq in _sum still races with _add.
> > > > 
> > > 
> > > Why ?
> > > 
> > > I'll code a patch, I believe it should work.
> > I thought your proposal is:
> > in _add
> > {
> > 	if (count >= batch || count <= -batch) {
> > 		fbc->seq_count++;
> >                atomic64_add(count, &fbc->count);
> > -------->
> >                 __this_cpu_write(*fbc->counters, 0);
> > 	}
> > }
> > 
> > in _sum
> > {
> > restart:
> > 	oldseq = fbc->seqcount;
> > 	smp_rmb();
> > 	do_sum();
> > 	smp_rmb()
> > 	newseq = fbc->seqcount;
> > 	if (newseq - oldseq >= maxfuzzy)
> > 		goto restart;
> > 	return ret;
> > } 
> > if _sum run between above line marked in _add, then the seqcount check
> > doesn't work, we still have deviation Tejun pointed out.
> > 
> 
> I see the point thanks, I'll think a bit more about it.
> 
> We currently serializes both _sum() and _add() with a spinlock.
> 
> My idea was OK if we still kept spinlock in _add(), but this obviously
> is not the need.
> 
> Your goal is letting _add() run without spinlock, but can we agree
> _sum() can run with a spinlock() like today [no more than one instance
> of _sum() running per percpu_counter] ?
Locking _sum() should be fine.



^ permalink raw reply	[flat|nested] 52+ messages in thread

* [patch] percpu_counter: scalability works
  2011-05-13  6:34                     ` Eric Dumazet
  2011-05-13  7:33                       ` Shaohua Li
@ 2011-05-13 14:51                       ` Eric Dumazet
  2011-05-13 15:39                         ` Eric Dumazet
  1 sibling, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-13 14:51 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Friday, 13 May 2011 at 08:34 +0200, Eric Dumazet wrote:

> I see the point thanks, I'll think a bit more about it.
> 
> We currently serializes both _sum() and _add() with a spinlock.
> 
> My idea was OK if we still kept spinlock in _add(), but this obviously
> is not the need.
> 
> Your goal is letting _add() run without spinlock, but can we agree
> _sum() can run with a spinlock() like today [no more than one instance
> of _sum() running per percpu_counter] ?
> 
> 

Here is the patch I cooked (on top of linux-2.6).

This solves the problem quite well for me.

The idea is:

Consider _sum() the slow path. It is still serialized by a spinlock.

Add a fbc->sequence, so that _add() can detect that a _sum() is in
flight and then add directly to a new atomic64_t field I named
"fbc->slowcount" (without touching its percpu s32 variable, so that
_sum() can get an accurate percpu_counter value).

The low order bit of 'sequence' is used to signal that a _sum() is in
flight, while _add() threads that overflow their percpu s32 variable do
sequence += 2, so that _sum() can detect that at least one cpu touched
fbc->count and reset its s32 variable. _sum() can then restart its loop,
but since the low order bit of sequence is still set, we have a
guarantee that the _sum() loop won't be restarted ad infinitum.

Note: I disabled IRQs in _add() to reduce the window, making _add() as
fast as possible and avoiding extra _sum() loops. This is not strictly
necessary and we can discuss this point, since _sum() is the slow
path :)

_sum() is accurate and no longer blocks _add(). It does slow _add() down
a bit, of course, since every _add() will touch fbc->slowcount.

_sum() is about the same speed as before in my tests.

On my 8 cpu (Intel(R) Xeon(R) CPU E5450 @ 3.00GHz) machine and 32bit
kernel, the following bench:
	loop (10000000 times) {
		p = mmap(128M, ANONYMOUS);
		munmap(p, 128M);
	}
run on all 8 cpus gives:

Before patch :
real	3m22.759s
user	0m6.353s
sys	26m28.919s

After patch :
real	0m23.420s
user	0m6.332s
sys	2m44.561s

Quite good results considering atomic64_add() uses two "lock cmpxchg8b"
on x86_32 :

    33.03%        mmap_test  [kernel.kallsyms]       [k] unmap_vmas
    12.99%        mmap_test  [kernel.kallsyms]       [k] atomic64_add_return_cx8
     5.62%        mmap_test  [kernel.kallsyms]       [k] free_pgd_range
     3.07%        mmap_test  [kernel.kallsyms]       [k] sysenter_past_esp
     2.48%        mmap_test  [kernel.kallsyms]       [k] memcpy
     2.24%        mmap_test  [kernel.kallsyms]       [k] perf_event_mmap
     2.21%        mmap_test  [kernel.kallsyms]       [k] _raw_spin_lock
     2.02%        mmap_test  [vdso]                  [.] 0xffffe424
     2.01%        mmap_test  [kernel.kallsyms]       [k] perf_event_mmap_output
     1.38%        mmap_test  [kernel.kallsyms]       [k] vma_adjust
     1.24%        mmap_test  [kernel.kallsyms]       [k] sched_clock_local
     1.23%             perf  [kernel.kallsyms]       [k] __copy_from_user_ll_nozero
     1.07%        mmap_test  [kernel.kallsyms]       [k] down_write


If only one cpu runs the program :

real	0m16.685s
user	0m0.771s
sys	0m15.815s
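
In case someone wants to reproduce the numbers, the bench is essentially
the following (a minimal sketch; the exact mmap flags and error handling
are my reconstruction, not the original source):

	/* mmap_test.c - rough reconstruction of the bench loop above */
	#include <sys/mman.h>
	#include <stdlib.h>

	#define SZ	(128UL << 20)	/* 128M */

	int main(void)
	{
		long i;

		for (i = 0; i < 10000000; i++) {
			void *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

			if (p == MAP_FAILED)
				exit(1);
			munmap(p, SZ);
		}
		return 0;
	}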

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/percpu_counter.h |   15 +++++++--
 lib/percpu_counter.c           |   47 +++++++++++++++++++------------
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 46f6ba5..288acf4 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -16,14 +16,21 @@
 #ifdef CONFIG_SMP
 
 struct percpu_counter {
-	spinlock_t lock;
-	s64 count;
+	spinlock_t	lock;
+	atomic_t	sequence; /* low order bit set if one sum() is in flight */
+	atomic64_t	count;
+	atomic64_t	slowcount;
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
 	s32 __percpu *counters;
 };
 
+static inline bool percpu_counter_active_sum(const struct percpu_counter *fbc)
+{
+	return (atomic_read(&fbc->sequence) & 1) ? true : false;
+}
+
 extern int percpu_counter_batch;
 
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
@@ -60,7 +67,7 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	return atomic64_read(&fbc->count) + atomic64_read(&fbc->slowcount);
 }
 
 /*
@@ -70,7 +77,7 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
 	barrier();		/* Prevent reloads of fbc->count */
 	if (ret >= 0)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 28f2c33..aef4bd5 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -59,31 +59,35 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
 	int cpu;
 
-	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
-	spin_unlock(&fbc->lock);
+	atomic64_set(&fbc->count, amount);
+	atomic64_set(&fbc->slowcount, 0);
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
 	s64 count;
+	unsigned long flags;
 
-	preempt_disable();
+	if (percpu_counter_active_sum(fbc)) {
+		atomic64_add(amount, &fbc->slowcount);
+		return;
+	}
+
+	local_irq_save(flags);
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
+		atomic_add(2, &fbc->sequence);
+		atomic64_add(count, &fbc->count);
 		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
-	preempt_enable();
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(__percpu_counter_add);
 
@@ -95,13 +99,22 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
 	s64 ret;
 	int cpu;
+	unsigned int seq;
 
 	spin_lock(&fbc->lock);
-	ret = fbc->count;
-	for_each_online_cpu(cpu) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
+	atomic_inc(&fbc->sequence);
+	do {
+		seq = atomic_read(&fbc->sequence);
+	
+		ret = atomic64_read(&fbc->count);
+		for_each_online_cpu(cpu) {
+			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+			ret += *pcount;
+		}
+	} while (atomic_read(&fbc->sequence) != seq);
+
+	atomic_inc(&fbc->sequence);
+	ret += atomic64_read(&fbc->slowcount);
 	spin_unlock(&fbc->lock);
 	return ret;
 }
@@ -112,7 +125,8 @@ int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
 {
 	spin_lock_init(&fbc->lock);
 	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
+	atomic64_set(&fbc->count, amount);
+	atomic64_set(&fbc->slowcount, 0);
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
@@ -171,13 +185,10 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 	mutex_lock(&percpu_counters_lock);
 	list_for_each_entry(fbc, &percpu_counters, list) {
 		s32 *pcount;
-		unsigned long flags;
 
-		spin_lock_irqsave(&fbc->lock, flags);
 		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		atomic64_add(*pcount, &fbc->count);
 		*pcount = 0;
-		spin_unlock_irqrestore(&fbc->lock, flags);
 	}
 	mutex_unlock(&percpu_counters_lock);
 #endif



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [patch] percpu_counter: scalability works
  2011-05-13 14:51                       ` [patch] percpu_counter: scalability works Eric Dumazet
@ 2011-05-13 15:39                         ` Eric Dumazet
  2011-05-13 16:35                           ` [patch V2] " Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-13 15:39 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Friday, 13 May 2011 at 16:51 +0200, Eric Dumazet wrote:

> Here the patch I cooked (on top of linux-2.6)
> 
> This solves the problem quite well for me.
> 
> Idea is :
> 
> Consider _sum() being slow path. It is still serialized by a spinlock().
> 
> Add a fbc->sequence, so that _add() can detect a sum() is in flight, and
> directly add to a new atomic64_t field I named "fbc->slowcount" (and not
> touch its percpu s32 variable so that _sum() can get accurate
> percpu_counter 'Value')
> 
> Low order bit of the 'sequence' is used to signal _sum() is in flight,
> while _add() threads that overflow their percpu s32 variable do a
> sequence += 2, so that _sum() can detect at least one cpu messed the
> fbc->count and reset its s32 variable. _sum() can restart its loop, but
> since sequence has still low order bit set, we have guarantee that the
> _sum() loop wont be restarted ad infinitum.
> 
> Notes : I disabled IRQ in _add() to reduce window, making _add() as fast
> as possible to avoid _sum() extra loops, but its not strictly necessary,
> we can discuss this point, since _sum() is slow path :)
> 
> _sum() is accurate and not blocking anymore _add(). It's slowing it a
> bit of course since all _add() will touch fbc->slowcount.
> 
> _sum() is about same speed than before in my tests.
> 
> On my 8 cpu (Intel(R) Xeon(R) CPU E5450 @ 3.00GHz) machine, and 32bit
> kernel, the : 
> 	loop (10000000 times) {
> 		p = mmap(128M, ANONYMOUS);
> 		munmap(p, 128M);
> 	} 
> done on 8 cpus bench :
> 
> Before patch :
> real	3m22.759s
> user	0m6.353s
> sys	26m28.919s
> 
> After patch :
> real	0m23.420s
> user	0m6.332s
> sys	2m44.561s
> 
> Quite good results considering atomic64_add() uses two "lock cmpxchg8b"
> on x86_32 :
> 
>     33.03%        mmap_test  [kernel.kallsyms]       [k] unmap_vmas
>     12.99%        mmap_test  [kernel.kallsyms]       [k] atomic64_add_return_cx8
>      5.62%        mmap_test  [kernel.kallsyms]       [k] free_pgd_range
>      3.07%        mmap_test  [kernel.kallsyms]       [k] sysenter_past_esp
>      2.48%        mmap_test  [kernel.kallsyms]       [k] memcpy
>      2.24%        mmap_test  [kernel.kallsyms]       [k] perf_event_mmap
>      2.21%        mmap_test  [kernel.kallsyms]       [k] _raw_spin_lock
>      2.02%        mmap_test  [vdso]                  [.] 0xffffe424
>      2.01%        mmap_test  [kernel.kallsyms]       [k] perf_event_mmap_output
>      1.38%        mmap_test  [kernel.kallsyms]       [k] vma_adjust
>      1.24%        mmap_test  [kernel.kallsyms]       [k] sched_clock_local
>      1.23%             perf  [kernel.kallsyms]       [k] __copy_from_user_ll_nozero
>      1.07%        mmap_test  [kernel.kallsyms]       [k] down_write
> 
> 
> If only one cpu runs the program :
> 
> real	0m16.685s
> user	0m0.771s
> sys	0m15.815s

Thinking a bit more, we could allow several _sum() calls in flight (we
would need an atomic_t count of in-flight _sum() calls, not a single
bit) and remove the spinlock.

This would allow using a separate integer for the add_did_change_fbc_count
signal, and would remove one atomic operation in _add() { the
atomic_add(2, &fbc->sequence); of my previous patch }.


Another idea would be to also put fbc->count / fbc->slowcount out of
line, to keep "struct percpu_counter" read mostly.

I'll send a V2 with this updated scheme.
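
Roughly, the layout I have in mind is (just a sketch, the actual V2
patch may differ in details):

	/* written often: gets its own cache line */
	struct percpu_counter_rw {
		atomic64_t	count;
		unsigned int	sequence;
		atomic64_t	slowcount;
	} ____cacheline_aligned_in_smp;

	/* read mostly */
	struct percpu_counter {
		atomic_t		  sum_cnt; /* count of in flight sum() */
		struct percpu_counter_rw *pcrw;
		s32 __percpu		  *counters;
	};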


By the way, I ran the bench on a more recent 2x4x2 machine and 64bit
kernel (HP G6 : Intel(R) Xeon(R) CPU E5540  @ 2.53GHz)

1) One process started (no contention) :

Before :
real	0m21.372s
user	0m0.680s
sys	0m20.670s

After V1 patch :
real	0m19.941s
user	0m0.750s
sys	0m19.170s


2) 16 processes started

Before patch:
real	2m14.509s
user	0m13.780s
sys	35m24.170s

After V1 patch :
real	0m48.617s
user	0m16.980s
sys	12m9.400s




^ permalink raw reply	[flat|nested] 52+ messages in thread

* [patch V2] percpu_counter: scalability works
  2011-05-13 15:39                         ` Eric Dumazet
@ 2011-05-13 16:35                           ` Eric Dumazet
  2011-05-13 16:46                             ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-13 16:35 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Friday, 13 May 2011 at 17:39 +0200, Eric Dumazet wrote:

> Thinking a bit more, we could allow several _sum() in flight (we would
> need an atomic_t counter for counter of _sum(), not a single bit, and
> remove the spinlock.
> 
> This would allow using a separate integer for the
> add_did_change_fbc_count and remove one atomic operation in _add() { the
> atomic_add(2, &fbc->sequence); of my previous patch }
> 
> 
> Another idea would also put fbc->count / fbc->slowcount out of line,
> to keep "struct percpu_counter" read mostly.
> 
> I'll send a V2 with this updated schem.
> 

Here is V2 of the patch:

The idea is:

We consider _sum() the slow path. We don't try to make it fast [ but
this implementation should be better, since I remove the spinlock that
used to serialize _sum() / _add() invocations ].

Add a fbc->sum_cnt, so that _add() can detect that a _sum() is in flight
and then add directly to a new atomic64_t field I named "fbc->slowcount"
(without touching its percpu s32 variable, so that _sum() can get an
accurate percpu_counter value).

Use an out-of-line structure to make "struct percpu_counter" read
mostly. This structure uses its own cache line to reduce false sharing.

Each time an _add() thread overflows its percpu s32 variable, it
increments a sequence, so that _sum() can detect that at least one cpu
touched fbc->count and reset its s32 variable.
_sum() can restart its loop, but since sum_cnt is non null, we have a
guarantee that the _sum() loop won't be restarted ad infinitum.

In practice, it should be restarted at most once.

I disabled IRQs in _add() to reduce the window, making _add() as fast as
possible and avoiding extra _sum() loops. This is not strictly necessary
and we can discuss this point, since _sum() is the slow path :)

_sum() is accurate and no longer blocks _add(). It does slow _add() down
a bit, of course, since every _add() will touch fbc->slowcount.

On my 2x4x2 cpu (Intel(R) Xeon(R) CPU E5540 @ 2.53GHz) machine and 64bit
kernel, the following bench:
        loop (10000000 times) {
                p = mmap(128M, ANONYMOUS);
                munmap(p, 128M);
        } 

1) One process started (no contention) :

Before :
real    0m21.372s
user    0m0.680s
sys     0m20.670s

After V2 patch :
real	0m19.654s
user	0m0.730s
sys	0m18.890s


2) 16 processes started

Before patch:
real    2m14.509s
user    0m13.780s
sys     35m24.170s

After V2 patch:

real	0m35.635s
user	0m17.670s
sys	8m11.020s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/percpu_counter.h |   26 +++++++--
 lib/percpu_counter.c           |   83 ++++++++++++++++++++-----------
 2 files changed, 74 insertions(+), 35 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 46f6ba5..4aac7f5 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -15,13 +15,25 @@
 
 #ifdef CONFIG_SMP
 
-struct percpu_counter {
-	spinlock_t lock;
-	s64 count;
+/*
+ * For performance reasons, we keep this part in a separate cache line
+ */
+struct percpu_counter_rw {
+	atomic64_t	count;
+	unsigned int	sequence;
+	atomic64_t	slowcount;
+
+	/* since we have plenty room, store list here, even if never used */
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
+	struct percpu_counter *fbc;
 #endif
-	s32 __percpu *counters;
+} ____cacheline_aligned_in_smp;
+
+struct percpu_counter {
+	atomic_t		 sum_cnt; /* count of in flight sum() */
+	struct percpu_counter_rw *pcrw;
+	s32 __percpu		 *counters;
 };
 
 extern int percpu_counter_batch;
@@ -60,7 +72,9 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
+
+	return atomic64_read(&pcrw->count) + atomic64_read(&pcrw->slowcount);
 }
 
 /*
@@ -70,7 +84,7 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
 	barrier();		/* Prevent reloads of fbc->count */
 	if (ret >= 0)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 28f2c33..c9c33c1 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -9,6 +9,7 @@
 #include <linux/cpu.h>
 #include <linux/module.h>
 #include <linux/debugobjects.h>
+#include <linux/slab.h>
 
 static LIST_HEAD(percpu_counters);
 static DEFINE_MUTEX(percpu_counters_lock);
@@ -58,32 +59,38 @@ static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc)
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
 	int cpu;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
 
-	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
-	spin_unlock(&fbc->lock);
+	atomic64_set(&pcrw->count, amount);
+	atomic64_set(&pcrw->slowcount, 0);
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
 	s64 count;
+	unsigned long flags;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
 
-	preempt_disable();
+	if (atomic_read(&fbc->sum_cnt)) {
+		atomic64_add(amount, &pcrw->slowcount);
+		return;
+	}
+
+	local_irq_save(flags);
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
+		pcrw->sequence++; /* lazy increment (not atomic) */
+		atomic64_add(count, &pcrw->count);
 		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
-	preempt_enable();
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(__percpu_counter_add);
 
@@ -95,14 +102,25 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
 	s64 ret;
 	int cpu;
+	unsigned int seq;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
 
-	spin_lock(&fbc->lock);
-	ret = fbc->count;
-	for_each_online_cpu(cpu) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
-	spin_unlock(&fbc->lock);
+	atomic_inc(&fbc->sum_cnt);
+	do {
+		seq = pcrw->sequence;
+		smp_rmb();
+
+		ret = atomic64_read(&pcrw->count);
+		for_each_online_cpu(cpu) {
+			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+			ret += *pcount;
+		}
+
+		smp_rmb();
+	} while (pcrw->sequence != seq);
+
+	atomic_dec(&fbc->sum_cnt);
+	ret += atomic64_read(&pcrw->slowcount);
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
@@ -110,19 +128,27 @@ EXPORT_SYMBOL(__percpu_counter_sum);
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
 			  struct lock_class_key *key)
 {
-	spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
+	struct percpu_counter_rw *pcrw; 
+
+	pcrw = kzalloc(sizeof(*pcrw), GFP_KERNEL);
+	if (!pcrw)
+		return -ENOMEM;
+	atomic64_set(&pcrw->count, amount);
+
 	fbc->counters = alloc_percpu(s32);
-	if (!fbc->counters)
+	if (!fbc->counters) {
+		kfree(pcrw);
 		return -ENOMEM;
+	}
+	fbc->pcrw = pcrw;
 
 	debug_percpu_counter_activate(fbc);
 
 #ifdef CONFIG_HOTPLUG_CPU
-	INIT_LIST_HEAD(&fbc->list);
+	INIT_LIST_HEAD(&pcrw->list);
+	pcrw->fbc = fbc;
 	mutex_lock(&percpu_counters_lock);
-	list_add(&fbc->list, &percpu_counters);
+	list_add(&pcrw->list, &percpu_counters);
 	mutex_unlock(&percpu_counters_lock);
 #endif
 	return 0;
@@ -138,11 +164,13 @@ void percpu_counter_destroy(struct percpu_counter *fbc)
 
 #ifdef CONFIG_HOTPLUG_CPU
 	mutex_lock(&percpu_counters_lock);
-	list_del(&fbc->list);
+	list_del(&fbc->pcrw->list);
 	mutex_unlock(&percpu_counters_lock);
 #endif
 	free_percpu(fbc->counters);
 	fbc->counters = NULL;
+	kfree(fbc->pcrw);
+	fbc->pcrw = NULL;
 }
 EXPORT_SYMBOL(percpu_counter_destroy);
 
@@ -161,7 +189,7 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 {
 #ifdef CONFIG_HOTPLUG_CPU
 	unsigned int cpu;
-	struct percpu_counter *fbc;
+	struct percpu_counter_rw *pcrw;
 
 	compute_batch_value();
 	if (action != CPU_DEAD)
@@ -169,15 +197,12 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 
 	cpu = (unsigned long)hcpu;
 	mutex_lock(&percpu_counters_lock);
-	list_for_each_entry(fbc, &percpu_counters, list) {
+	list_for_each_entry(pcrw, &percpu_counters, list) {
 		s32 *pcount;
-		unsigned long flags;
 
-		spin_lock_irqsave(&fbc->lock, flags);
-		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		pcount = per_cpu_ptr(pcrw->fbc->counters, cpu);
+		atomic64_add(*pcount, &pcrw->count);
 		*pcount = 0;
-		spin_unlock_irqrestore(&fbc->lock, flags);
 	}
 	mutex_unlock(&percpu_counters_lock);
 #endif



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [patch V2] percpu_counter: scalability works
  2011-05-13 16:35                           ` [patch V2] " Eric Dumazet
@ 2011-05-13 16:46                             ` Eric Dumazet
  2011-05-13 22:03                               ` [patch V3] " Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-13 16:46 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Friday, 13 May 2011 at 18:35 +0200, Eric Dumazet wrote:

If you want to try this patch, please note that I missed the following
in __percpu_counter_init():

fbc->sum_cnt = 0; 


>  int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
>  			  struct lock_class_key *key)
>  {
> -	spin_lock_init(&fbc->lock);
> -	lockdep_set_class(&fbc->lock, key);
> -	fbc->count = amount;
> +	struct percpu_counter_rw *pcrw; 
> +
> +	pcrw = kzalloc(sizeof(*pcrw), GFP_KERNEL);
> +	if (!pcrw)
> +		return -ENOMEM;
> +	atomic64_set(&pcrw->count, amount);
> +
>  	fbc->counters = alloc_percpu(s32);
> -	if (!fbc->counters)
> +	if (!fbc->counters) {
> +		kfree(pcrw);
>  		return -ENOMEM;
> +	}
> +	fbc->pcrw = pcrw;

	fbc->sum_cnt = 0;

>  
>  	debug_percpu_counter_activate(fbc);
>  





^ permalink raw reply	[flat|nested] 52+ messages in thread

* [patch V3] percpu_counter: scalability works
  2011-05-13 16:46                             ` Eric Dumazet
@ 2011-05-13 22:03                               ` Eric Dumazet
  2011-05-16  0:58                                 ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-13 22:03 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Shaohua Li reported a scalability problem with many threads doing
mmap()/munmap() calls: the vm_committed_as percpu_counter hits its
spinlock very hard if the size of the mmaped zones is bigger than the
percpu_counter batch. We could tune this batch value, but it is better
to have a more scalable percpu_counter infrastructure.

Shaohua provided some patches to speed up __percpu_counter_add() by
removing the need for a spinlock, using an atomic64_t fbc->count
instead.

The problem with these patches was a possibly big deviation seen by
__percpu_counter_sum().

The idea of this patch is to extend Shaohua's idea:

We consider _sum() the slow path. We don't try to make it fast [ but
this implementation should be better, since we remove the spinlock that
used to serialize _sum() / _add() invocations ].

Add a fbc->sum_cnt, so that _add() can detect that a _sum() is in flight
and then add directly to a new atomic64_t field named "fbc->slowcount"
(without touching its percpu s32 variable, so that _sum() can get a more
accurate percpu_counter value).

Use an out-of-line structure to make "struct percpu_counter" read
mostly. This structure uses its own cache line to reduce false sharing.

Each time an _add() thread overflows its percpu s32 variable, it
increments a sequence, so that _sum() can detect that at least one cpu
touched fbc->count and reset its s32 variable.
_sum() can restart its loop, but since sum_cnt is non null, we have a
guarantee that the _sum() loop won't be restarted ad infinitum.

_sum() is accurate and no longer blocks _add() [ it does slow _add()
down a bit, of course, since every _add() will touch the shared
fbc->slowcount ].

On my 2x4x2 cpu (Intel(R) Xeon(R) CPU E5540 @ 2.53GHz) machine and 64bit
kernel, the following bench:

    loop (10000000 times) {
            p = mmap(128M, ANONYMOUS);
            munmap(p, 128M);
    } 

16 processes started

Before patch:
real    2m14.509s
user    0m13.780s
sys     35m24.170s

After patch:
real	0m34.055s
user	0m17.910s
sys	8m1.680s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Shaohua Li <shaohua.li@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Christoph Lameter <cl@linux.com>
CC: Tejun Heo <tj@kernel.org>
CC: Nick Piggin <npiggin@kernel.dk>
---
V3: remove irq masking in __percpu_counter_add()
    initialize fbc->sum_cnt in __percpu_counter_init

 include/linux/percpu_counter.h |   26 +++++++---
 lib/percpu_counter.c           |   79 ++++++++++++++++++++-----------
 2 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 46f6ba5..4aac7f5 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -15,13 +15,25 @@
 
 #ifdef CONFIG_SMP
 
-struct percpu_counter {
-	spinlock_t lock;
-	s64 count;
+/*
+ * For performance reasons, we keep this part in a separate cache line
+ */
+struct percpu_counter_rw {
+	atomic64_t	count;
+	unsigned int	sequence;
+	atomic64_t	slowcount;
+
+	/* since we have plenty room, store list here, even if never used */
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
+	struct percpu_counter *fbc;
 #endif
-	s32 __percpu *counters;
+} ____cacheline_aligned_in_smp;
+
+struct percpu_counter {
+	atomic_t		 sum_cnt; /* count of in flight sum() */
+	struct percpu_counter_rw *pcrw;
+	s32 __percpu		 *counters;
 };
 
 extern int percpu_counter_batch;
@@ -60,7 +72,9 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
+
+	return atomic64_read(&pcrw->count) + atomic64_read(&pcrw->slowcount);
 }
 
 /*
@@ -70,7 +84,7 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
 	barrier();		/* Prevent reloads of fbc->count */
 	if (ret >= 0)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 28f2c33..ff486b2 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -9,6 +9,7 @@
 #include <linux/cpu.h>
 #include <linux/module.h>
 #include <linux/debugobjects.h>
+#include <linux/slab.h>
 
 static LIST_HEAD(percpu_counters);
 static DEFINE_MUTEX(percpu_counters_lock);
@@ -58,28 +59,33 @@ static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc)
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
 	int cpu;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
 
-	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
-	spin_unlock(&fbc->lock);
+	atomic64_set(&pcrw->count, amount);
+	atomic64_set(&pcrw->slowcount, 0);
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
 	s64 count;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
+
+	if (atomic_read(&fbc->sum_cnt)) {
+		atomic64_add(amount, &pcrw->slowcount);
+		return;
+	}
 
 	preempt_disable();
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
+		atomic64_add(count, &pcrw->count);
+		pcrw->sequence++;
 		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
@@ -95,14 +101,25 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
 	s64 ret;
 	int cpu;
+	unsigned int seq;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
 
-	spin_lock(&fbc->lock);
-	ret = fbc->count;
-	for_each_online_cpu(cpu) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
-	spin_unlock(&fbc->lock);
+	atomic_inc(&fbc->sum_cnt);
+	do {
+		seq = pcrw->sequence;
+		smp_rmb();
+
+		ret = atomic64_read(&pcrw->count);
+		for_each_online_cpu(cpu) {
+			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+			ret += *pcount;
+		}
+
+		smp_rmb();
+	} while (pcrw->sequence != seq);
+
+	atomic_dec(&fbc->sum_cnt);
+	ret += atomic64_read(&pcrw->slowcount);
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
@@ -110,19 +127,28 @@ EXPORT_SYMBOL(__percpu_counter_sum);
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
 			  struct lock_class_key *key)
 {
-	spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
+	struct percpu_counter_rw *pcrw; 
+
+	pcrw = kzalloc(sizeof(*pcrw), GFP_KERNEL);
+	if (!pcrw)
+		return -ENOMEM;
+	atomic64_set(&pcrw->count, amount);
+
 	fbc->counters = alloc_percpu(s32);
-	if (!fbc->counters)
+	if (!fbc->counters) {
+		kfree(pcrw);
 		return -ENOMEM;
+	}
+	fbc->pcrw = pcrw;
+	atomic_set(&fbc->sum_cnt, 0);
 
 	debug_percpu_counter_activate(fbc);
 
 #ifdef CONFIG_HOTPLUG_CPU
-	INIT_LIST_HEAD(&fbc->list);
+	INIT_LIST_HEAD(&pcrw->list);
+	pcrw->fbc = fbc;
 	mutex_lock(&percpu_counters_lock);
-	list_add(&fbc->list, &percpu_counters);
+	list_add(&pcrw->list, &percpu_counters);
 	mutex_unlock(&percpu_counters_lock);
 #endif
 	return 0;
@@ -138,11 +164,13 @@ void percpu_counter_destroy(struct percpu_counter *fbc)
 
 #ifdef CONFIG_HOTPLUG_CPU
 	mutex_lock(&percpu_counters_lock);
-	list_del(&fbc->list);
+	list_del(&fbc->pcrw->list);
 	mutex_unlock(&percpu_counters_lock);
 #endif
 	free_percpu(fbc->counters);
 	fbc->counters = NULL;
+	kfree(fbc->pcrw);
+	fbc->pcrw = NULL;
 }
 EXPORT_SYMBOL(percpu_counter_destroy);
 
@@ -161,7 +189,7 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 {
 #ifdef CONFIG_HOTPLUG_CPU
 	unsigned int cpu;
-	struct percpu_counter *fbc;
+	struct percpu_counter_rw *pcrw;
 
 	compute_batch_value();
 	if (action != CPU_DEAD)
@@ -169,15 +197,12 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 
 	cpu = (unsigned long)hcpu;
 	mutex_lock(&percpu_counters_lock);
-	list_for_each_entry(fbc, &percpu_counters, list) {
+	list_for_each_entry(pcrw, &percpu_counters, list) {
 		s32 *pcount;
-		unsigned long flags;
 
-		spin_lock_irqsave(&fbc->lock, flags);
-		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		pcount = per_cpu_ptr(pcrw->fbc->counters, cpu);
+		atomic64_add(*pcount, &pcrw->count);
 		*pcount = 0;
-		spin_unlock_irqrestore(&fbc->lock, flags);
 	}
 	mutex_unlock(&percpu_counters_lock);
 #endif



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-13 22:03                               ` [patch V3] " Eric Dumazet
@ 2011-05-16  0:58                                 ` Shaohua Li
  2011-05-16  6:11                                   ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-16  0:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Sat, 2011-05-14 at 06:03 +0800, Eric Dumazet wrote:
> Shaohua Li reported a scalability problem with many threads doing
> mmap()/munmap() calls. vm_committed_as percpu_counter is hitting its
> spinlock very hard, if size of mmaped zones are bigger than
> percpu_counter batch. We could tune this batch value but better have a
> more scalable percpu_counter infrastructure.
> 
> Shaohua provided some patches to speedup __percpu_counter_add(), by
> removing the need to use a spinlock and use an atomic64_t fbc->count
> instead.
> 
> Problem of these patches were a possible big deviation seen by
> __percpu_counter_sum()
> 
> Idea of this patch is to extend Shaohua idea :
> 
> We consider _sum() being slow path. We dont try to make it fast [ but
> this implementation should be better since we remove the spinlock that
> used to serialize _sum() / _add() invocations ]
> 
> Add a fbc->sum_cnt, so that _add() can detect a _sum() is in flight, and
> directly add to a new atomic64_t field named "fbc->slowcount" (and not
> touch its percpu s32 variable so that _sum() can get more accurate
> percpu_counter 'Value')
> 
> Use an out of line structure to make "struct percpu_count" mostly read
> This structure uses its own cache line to reduce false sharing.
> 
> Each time one _add() thread overflows its percpu s32 variable, do an
> increment of a sequence, so that _sum() can detect at least one cpu
> messed the fbc->count and reset its s32 variable.
> _sum() can restart its loop, but since sum_cnt is non null, we have
> guarantee that the _sum() loop wont be restarted ad infinitum.
> 
> _sum() is accurate and not blocking anymore _add() [ It's slowing it a
> bit of course since all _add() will touch shared fbc->slowcount ]
> 
> On my 2x4x2 cpu (Intel(R) Xeon(R) CPU E5540  @ 2.53GHz) machine, and
> 64bit kernel, the following bench : 
> 
>     loop (10000000 times) {
>             p = mmap(128M, ANONYMOUS);
>             munmap(p, 128M);
>     } 
> 
> 16 processes started
> 
> Before patch:
> real    2m14.509s
> user    0m13.780s
> sys     35m24.170s
> 
> After patch:
> real	0m34.055s
> user	0m17.910s
> sys	8m1.680s
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Shaohua Li <shaohua.li@intel.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Christoph Lameter <cl@linux.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Nick Piggin <npiggin@kernel.dk>
> ---
> V3: remove irq masking in __percpu_counter_add()
>     initialize fbc->sum_cnt in __percpu_counter_init
> 
>  include/linux/percpu_counter.h |   26 +++++++---
>  lib/percpu_counter.c           |   79 ++++++++++++++++++++-----------
>  2 files changed, 72 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index 46f6ba5..4aac7f5 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -15,13 +15,25 @@
>  
>  #ifdef CONFIG_SMP
>  
> -struct percpu_counter {
> -	spinlock_t lock;
> -	s64 count;
> +/*
> + * For performance reasons, we keep this part in a separate cache line
> + */
> +struct percpu_counter_rw {
> +	atomic64_t	count;
> +	unsigned int	sequence;
> +	atomic64_t	slowcount;
> +
> +	/* since we have plenty room, store list here, even if never used */
>  #ifdef CONFIG_HOTPLUG_CPU
>  	struct list_head list;	/* All percpu_counters are on a list */
> +	struct percpu_counter *fbc;
>  #endif
> -	s32 __percpu *counters;
> +} ____cacheline_aligned_in_smp;
> +
> +struct percpu_counter {
> +	atomic_t		 sum_cnt; /* count of in flight sum() */
> +	struct percpu_counter_rw *pcrw;
> +	s32 __percpu		 *counters;
>  };
>  
>  extern int percpu_counter_batch;
> @@ -60,7 +72,9 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
>  
>  static inline s64 percpu_counter_read(struct percpu_counter *fbc)
>  {
> -	return fbc->count;
> +	struct percpu_counter_rw *pcrw = fbc->pcrw;
> +
> +	return atomic64_read(&pcrw->count) + atomic64_read(&pcrw->slowcount);
>  }
>  
>  /*
> @@ -70,7 +84,7 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
>   */
>  static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
>  {
> -	s64 ret = fbc->count;
> +	s64 ret = percpu_counter_read(fbc);
>  
>  	barrier();		/* Prevent reloads of fbc->count */
>  	if (ret >= 0)
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index 28f2c33..ff486b2 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -9,6 +9,7 @@
>  #include <linux/cpu.h>
>  #include <linux/module.h>
>  #include <linux/debugobjects.h>
> +#include <linux/slab.h>
>  
>  static LIST_HEAD(percpu_counters);
>  static DEFINE_MUTEX(percpu_counters_lock);
> @@ -58,28 +59,33 @@ static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc)
>  void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
>  {
>  	int cpu;
> +	struct percpu_counter_rw *pcrw = fbc->pcrw;
>  
> -	spin_lock(&fbc->lock);
>  	for_each_possible_cpu(cpu) {
>  		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
>  		*pcount = 0;
>  	}
> -	fbc->count = amount;
> -	spin_unlock(&fbc->lock);
> +	atomic64_set(&pcrw->count, amount);
> +	atomic64_set(&pcrw->slowcount, 0);
>  }
>  EXPORT_SYMBOL(percpu_counter_set);
>  
>  void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
>  {
>  	s64 count;
> +	struct percpu_counter_rw *pcrw = fbc->pcrw;
> +
> +	if (atomic_read(&fbc->sum_cnt)) {
> +		atomic64_add(amount, &pcrw->slowcount);
> +		return;
> +	}
>  
>  	preempt_disable();
>  	count = __this_cpu_read(*fbc->counters) + amount;
>  	if (count >= batch || count <= -batch) {
> -		spin_lock(&fbc->lock);
> -		fbc->count += count;
> +		atomic64_add(count, &pcrw->count);
So if _sum() starts and ends here, _sum() can still see a deviation.

I had a patch which uses the idea I described in the last email and
should remove the deviation, see below. It delays _add() if _sum() is
running, which sounds scary, but since _sum() is called quite rarely
this isn't a big problem. We can further convert add_start below to a
percpu counter to reduce cache line bouncing.

---
 include/linux/percpu_counter.h |   18 ++++-------------
 lib/percpu_counter.c           |   43 +++++++++++++++++++++++++----------------
 2 files changed, 32 insertions(+), 29 deletions(-)

Index: linux/include/linux/percpu_counter.h
===================================================================
--- linux.orig/include/linux/percpu_counter.h	2011-05-13 11:13:25.000000000 +0800
+++ linux/include/linux/percpu_counter.h	2011-05-13 16:22:41.000000000 +0800
@@ -16,8 +16,9 @@
 #ifdef CONFIG_SMP
 
 struct percpu_counter {
+	atomic_t sum_start, add_start;
+	atomic64_t count;
 	spinlock_t lock;
-	s64 count;
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
@@ -26,16 +27,7 @@ struct percpu_counter {
 
 extern int percpu_counter_batch;
 
-int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
-			  struct lock_class_key *key);
-
-#define percpu_counter_init(fbc, value)					\
-	({								\
-		static struct lock_class_key __key;			\
-									\
-		__percpu_counter_init(fbc, value, &__key);		\
-	})
-
+int percpu_counter_init(struct percpu_counter *fbc, s64 amount);
 void percpu_counter_destroy(struct percpu_counter *fbc);
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
@@ -60,7 +52,7 @@ static inline s64 percpu_counter_sum(str
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	return atomic64_read(&fbc->count);
 }
 
 /*
@@ -70,7 +62,7 @@ static inline s64 percpu_counter_read(st
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
 	barrier();		/* Prevent reloads of fbc->count */
 	if (ret >= 0)
Index: linux/lib/percpu_counter.c
===================================================================
--- linux.orig/lib/percpu_counter.c	2011-05-13 10:29:04.000000000 +0800
+++ linux/lib/percpu_counter.c	2011-05-13 16:22:03.000000000 +0800
@@ -59,13 +59,11 @@ void percpu_counter_set(struct percpu_co
 {
 	int cpu;
 
-	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
-	spin_unlock(&fbc->lock);
+	atomic64_set(&fbc->count, amount);
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
@@ -76,10 +74,20 @@ void __percpu_counter_add(struct percpu_
 	preempt_disable();
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
+		while (1) {
+			atomic_inc_return(&fbc->add_start);
+			if (atomic_read(&fbc->sum_start) != 0)
+				atomic_dec(&fbc->add_start);
+			else
+				break;
+			while (atomic_read(&fbc->sum_start) != 0)
+				cpu_relax();
+		}
+
+		atomic64_add(count, &fbc->count);
 		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
+
+		atomic_dec(&fbc->add_start);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
@@ -97,22 +105,28 @@ s64 __percpu_counter_sum(struct percpu_c
 	int cpu;
 
 	spin_lock(&fbc->lock);
-	ret = fbc->count;
+	atomic_inc_return(&fbc->sum_start);
+	while (atomic_read(&fbc->add_start) != 0)
+		cpu_relax();
+
+	ret = atomic64_read(&fbc->count);
 	for_each_online_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		ret += *pcount;
 	}
+
+	atomic_dec(&fbc->sum_start);
 	spin_unlock(&fbc->lock);
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
-int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
-			  struct lock_class_key *key)
+int percpu_counter_init(struct percpu_counter *fbc, s64 amount)
 {
 	spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
+	atomic64_set(&fbc->count, amount);
+	atomic_set(&fbc->sum_start, 0);
+	atomic_set(&fbc->add_start, 0);
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
@@ -127,7 +141,7 @@ int __percpu_counter_init(struct percpu_
 #endif
 	return 0;
 }
-EXPORT_SYMBOL(__percpu_counter_init);
+EXPORT_SYMBOL(percpu_counter_init);
 
 void percpu_counter_destroy(struct percpu_counter *fbc)
 {
@@ -171,13 +185,10 @@ static int __cpuinit percpu_counter_hotc
 	mutex_lock(&percpu_counters_lock);
 	list_for_each_entry(fbc, &percpu_counters, list) {
 		s32 *pcount;
-		unsigned long flags;
 
-		spin_lock_irqsave(&fbc->lock, flags);
 		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		atomic64_add(*pcount, &fbc->count);
 		*pcount = 0;
-		spin_unlock_irqrestore(&fbc->lock, flags);
 	}
 	mutex_unlock(&percpu_counters_lock);
 #endif



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  0:58                                 ` Shaohua Li
@ 2011-05-16  6:11                                   ` Eric Dumazet
  2011-05-16  6:37                                     ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-16  6:11 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Monday, 16 May 2011 at 08:58 +0800, Shaohua Li wrote:

> so if _sum starts and ends here, _sum can still get deviation.

This makes no sense at all. If you have that many cpus 'here' right
before you increment fbc->sum_cnt, then no matter how precise and super
cautious you are in your _sum() implementation, as soon as you exit from
_sum(), other cpus have already changed the percpu counter's global
value.


 
> @@ -76,10 +74,20 @@ void __percpu_counter_add(struct percpu_
>  	preempt_disable();
>  	count = __this_cpu_read(*fbc->counters) + amount;
>  	if (count >= batch || count <= -batch) {
> -		spin_lock(&fbc->lock);
> -		fbc->count += count;
> +		while (1) {
> +			atomic_inc_return(&fbc->add_start);
> +			if (atomic_read(&fbc->sum_start) != 0)
> +				atomic_dec(&fbc->add_start);
> +			else
> +				break;
> +			while (atomic_read(&fbc->sum_start) != 0)
> +				cpu_relax();
> +		}
> +
> +		atomic64_add(count, &fbc->count);
>  		__this_cpu_write(*fbc->counters, 0);
> -		spin_unlock(&fbc->lock);
> +
> +		atomic_dec(&fbc->add_start);
>  	} else {
>  		__this_cpu_write(*fbc->counters, count);
>  	}
> 

This is way too heavy. You have 3 atomic ops here and a very slow
atomic_inc_return() in the fast path [ not all machines are x86 ].

Not all percpu_counters are used in such a degenerate way. Most of them
hit the global count only rarely.

Your version slows down a very common case (a single cpu calling _add()
several times, for example the network stack in the input path).

With fbc->counters in the same cache line as fbc->add_start/sum_start
and the rest, I bet everything will be very slow during a _sum() on a
4096 cpu machine, especially if this _sum() is interrupted by some long
lasting interrupt.

I believe the 'deviation' risk is almost nil with my patch.
Remember, percpu_counter is not an exact counter but a very lazy one.
(The only requirement is to not have drift.)

The risk is small, especially if we move the:
__this_cpu_write(*fbc->counters, 0);
before the:
atomic64_add(count, &fbc->count);

and then do the sequence increment _after_ this.



Here is my V4: we don't need the second fbc->slowcount, given that
_sum() reads fbc->count after the folding, not before: if some cpus
enter _add() while _sum() is running, they'll see the sum_cnt signal and
change fbc->count immediately.

I also use the following sequence in _add():

__this_cpu_write(*fbc->counters, 0);
atomic64_add(count, &pcrw->count);
pcrw->sequence++;


 include/linux/percpu_counter.h |   25 +++++++--
 lib/percpu_counter.c           |   78 ++++++++++++++++++++-----------
 2 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 46f6ba5..e3e62b1 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -15,13 +15,24 @@
 
 #ifdef CONFIG_SMP
 
-struct percpu_counter {
-	spinlock_t lock;
-	s64 count;
+/*
+ * For performance reasons, we keep this part in a separate cache line
+ */
+struct percpu_counter_rw {
+	atomic64_t	count;
+	unsigned int	sequence;
+
+	/* since we have plenty room, store list here, even if never used */
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
+	struct percpu_counter *fbc;
 #endif
-	s32 __percpu *counters;
+} ____cacheline_aligned_in_smp;
+
+struct percpu_counter {
+	atomic_t		 sum_cnt; /* count of in flight sum() */
+	struct percpu_counter_rw *pcrw;
+	s32 __percpu		 *counters;
 };
 
 extern int percpu_counter_batch;
@@ -60,7 +71,9 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
+
+	return atomic64_read(&pcrw->count);
 }
 
 /*
@@ -70,7 +83,7 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
 	barrier();		/* Prevent reloads of fbc->count */
 	if (ret >= 0)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 28f2c33..27292ba 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -9,6 +9,7 @@
 #include <linux/cpu.h>
 #include <linux/module.h>
 #include <linux/debugobjects.h>
+#include <linux/slab.h>
 
 static LIST_HEAD(percpu_counters);
 static DEFINE_MUTEX(percpu_counters_lock);
@@ -58,28 +59,32 @@ static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc)
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
 	int cpu;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
 
-	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
-	spin_unlock(&fbc->lock);
+	atomic64_set(&pcrw->count, amount);
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
 	s64 count;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
+
+	if (atomic_read(&fbc->sum_cnt)) {
+		atomic64_add(amount, &pcrw->count);
+		return;
+	}
 
 	preempt_disable();
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
 		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
+		atomic64_add(count, &pcrw->count);
+		pcrw->sequence++;
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
@@ -95,14 +100,25 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
 	s64 ret;
 	int cpu;
+	unsigned int seq;
+	struct percpu_counter_rw *pcrw = fbc->pcrw;
 
-	spin_lock(&fbc->lock);
-	ret = fbc->count;
-	for_each_online_cpu(cpu) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
-	spin_unlock(&fbc->lock);
+	atomic_inc(&fbc->sum_cnt);
+	do {
+		seq = pcrw->sequence;
+		smp_rmb();
+
+		ret = 0;
+		for_each_online_cpu(cpu) {
+			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+			ret += *pcount;
+		}
+		ret += atomic64_read(&pcrw->count);
+
+		smp_rmb();
+	} while (pcrw->sequence != seq);
+
+	atomic_dec(&fbc->sum_cnt);
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
@@ -110,19 +126,28 @@ EXPORT_SYMBOL(__percpu_counter_sum);
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
 			  struct lock_class_key *key)
 {
-	spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
+	struct percpu_counter_rw *pcrw; 
+
+	pcrw = kzalloc(sizeof(*pcrw), GFP_KERNEL);
+	if (!pcrw)
+		return -ENOMEM;
+	atomic64_set(&pcrw->count, amount);
+
 	fbc->counters = alloc_percpu(s32);
-	if (!fbc->counters)
+	if (!fbc->counters) {
+		kfree(pcrw);
 		return -ENOMEM;
+	}
+	fbc->pcrw = pcrw;
+	atomic_set(&fbc->sum_cnt, 0);
 
 	debug_percpu_counter_activate(fbc);
 
 #ifdef CONFIG_HOTPLUG_CPU
-	INIT_LIST_HEAD(&fbc->list);
+	INIT_LIST_HEAD(&pcrw->list);
+	pcrw->fbc = fbc;
 	mutex_lock(&percpu_counters_lock);
-	list_add(&fbc->list, &percpu_counters);
+	list_add(&pcrw->list, &percpu_counters);
 	mutex_unlock(&percpu_counters_lock);
 #endif
 	return 0;
@@ -138,11 +163,13 @@ void percpu_counter_destroy(struct percpu_counter *fbc)
 
 #ifdef CONFIG_HOTPLUG_CPU
 	mutex_lock(&percpu_counters_lock);
-	list_del(&fbc->list);
+	list_del(&fbc->pcrw->list);
 	mutex_unlock(&percpu_counters_lock);
 #endif
 	free_percpu(fbc->counters);
 	fbc->counters = NULL;
+	kfree(fbc->pcrw);
+	fbc->pcrw = NULL;
 }
 EXPORT_SYMBOL(percpu_counter_destroy);
 
@@ -161,7 +188,7 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 {
 #ifdef CONFIG_HOTPLUG_CPU
 	unsigned int cpu;
-	struct percpu_counter *fbc;
+	struct percpu_counter_rw *pcrw;
 
 	compute_batch_value();
 	if (action != CPU_DEAD)
@@ -169,15 +196,12 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 
 	cpu = (unsigned long)hcpu;
 	mutex_lock(&percpu_counters_lock);
-	list_for_each_entry(fbc, &percpu_counters, list) {
+	list_for_each_entry(pcrw, &percpu_counters, list) {
 		s32 *pcount;
-		unsigned long flags;
 
-		spin_lock_irqsave(&fbc->lock, flags);
-		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		pcount = per_cpu_ptr(pcrw->fbc->counters, cpu);
+		atomic64_add(*pcount, &pcrw->count);
 		*pcount = 0;
-		spin_unlock_irqrestore(&fbc->lock, flags);
 	}
 	mutex_unlock(&percpu_counters_lock);
 #endif



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  6:11                                   ` Eric Dumazet
@ 2011-05-16  6:37                                     ` Shaohua Li
  2011-05-16  6:55                                       ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-16  6:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Mon, 2011-05-16 at 14:11 +0800, Eric Dumazet wrote:
> Le lundi 16 mai 2011 à 08:58 +0800, Shaohua Li a écrit :
> 
> > so if _sum starts and ends here, _sum can still get deviation.
> 
> This makes no sense at all. If you have so many cpus 'here' right before
> you increment fbc->sum_cnt, then no matter how precise and super
> cautious you are in your _sum() implementation, as soon as you exit from
> sum(), other cpus already changed the percpu counter global value.
I don't agree here. The original implementation also has just a quite
small window where we get a deviation; the window only exists between
the two lines:
		atomic64_add(count, &fbc->count);
		__this_cpu_write(*fbc->counters, 0);
If you think we should ignore it, we'd better not use any protection
here.

> > @@ -76,10 +74,20 @@ void __percpu_counter_add(struct percpu_
> >  	preempt_disable();
> >  	count = __this_cpu_read(*fbc->counters) + amount;
> >  	if (count >= batch || count <= -batch) {
> > -		spin_lock(&fbc->lock);
> > -		fbc->count += count;
> > +		while (1) {
> > +			atomic_inc_return(&fbc->add_start);
> > +			if (atomic_read(&fbc->sum_start) != 0)
> > +				atomic_dec(&fbc->add_start);
> > +			else
> > +				break;
> > +			while (atomic_read(&fbc->sum_start) != 0)
> > +				cpu_relax();
> > +		}
> > +
> > +		atomic64_add(count, &fbc->count);
> >  		__this_cpu_write(*fbc->counters, 0);
> > -		spin_unlock(&fbc->lock);
> > +
> > +		atomic_dec(&fbc->add_start);
> >  	} else {
> >  		__this_cpu_write(*fbc->counters, count);
> >  	}
> > 
> 
> This is way too heavy. You have 3 atomic ops here and a very slow
> atomic_inc_return() in fast path [ not all machines are x86].
> 
> Not all percpu_counters are used in degenerated way. Most of them hit
> the global count not very often.
> 
> Your version slows down a very common case (one cpu only calling _add()
> several times, for example network stack in input path)
> 
> fbc->counters being in same cache line than fbc->add_start/sum_start and
> all, I bet everything will be very slow during a _sum() on a 4096 cpu
> machine, especially if this _sum() is interrupted by some long lasting
> interrupt.
As I wrote in the email, the atomic and cacheline issues can be resolved
with per-cpu data; I just didn't post the patch. I'm posting it this time,
please see below. There is no cache line bouncing anymore.

> I believe the 'deviation' risk is almost null with my patch.
> Remember percpu_counter is not an exact counter but a very lazy one.
> (Only requirement is to not have drift)
> 
> The risk is small especially if we move the :
> __this_cpu_write(*fbc->counters, 0);
> before the :
> atomic64_add(count, &fbc->count);
> 
> and then do the sequence increment _after_ this.
> 
> 
> 
> Here is my V4 : We dont need the second fbc->slowcount, given sum() get
> fbc->count after the folding, not before : If some cpus enter _add()
> while _sum() is running they'll see the sum_cnt signal and change
> fbc->count immediately.
> 
> I also make following sequence in _add() :
> 
> __this_cpu_write(*fbc->counters, 0);
We still have the deviation issue if _sum starts and ends here; this
doesn't change anything.

> atomic64_add(count, &pcrw->count);
> pcrw->sequence++;
> 
> 
>  include/linux/percpu_counter.h |   25 +++++++--
>  lib/percpu_counter.c           |   78 ++++++++++++++++++++-----------
>  2 files changed, 70 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index 46f6ba5..e3e62b1 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -15,13 +15,24 @@
>  
>  #ifdef CONFIG_SMP
>  
> -struct percpu_counter {
> -	spinlock_t lock;
> -	s64 count;
> +/*
> + * For performance reasons, we keep this part in a separate cache line
> + */
> +struct percpu_counter_rw {
> +	atomic64_t	count;
> +	unsigned int	sequence;
> +
> +	/* since we have plenty room, store list here, even if never used */
>  #ifdef CONFIG_HOTPLUG_CPU
>  	struct list_head list;	/* All percpu_counters are on a list */
> +	struct percpu_counter *fbc;
>  #endif
> -	s32 __percpu *counters;
> +} ____cacheline_aligned_in_smp;
> +
> +struct percpu_counter {
> +	atomic_t		 sum_cnt; /* count of in flight sum() */
> +	struct percpu_counter_rw *pcrw;
> +	s32 __percpu		 *counters;
>  };
>  
>  extern int percpu_counter_batch;
> @@ -60,7 +71,9 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
>  
>  static inline s64 percpu_counter_read(struct percpu_counter *fbc)
>  {
> -	return fbc->count;
> +	struct percpu_counter_rw *pcrw = fbc->pcrw;
> +
> +	return atomic64_read(&pcrw->count);
>  }
>  
>  /*
> @@ -70,7 +83,7 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
>   */
>  static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
>  {
> -	s64 ret = fbc->count;
> +	s64 ret = percpu_counter_read(fbc);
>  
>  	barrier();		/* Prevent reloads of fbc->count */
>  	if (ret >= 0)
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index 28f2c33..27292ba 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -9,6 +9,7 @@
>  #include <linux/cpu.h>
>  #include <linux/module.h>
>  #include <linux/debugobjects.h>
> +#include <linux/slab.h>
>  
>  static LIST_HEAD(percpu_counters);
>  static DEFINE_MUTEX(percpu_counters_lock);
> @@ -58,28 +59,32 @@ static inline void debug_percpu_counter_deactivate(struct percpu_counter *fbc)
>  void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
>  {
>  	int cpu;
> +	struct percpu_counter_rw *pcrw = fbc->pcrw;
>  
> -	spin_lock(&fbc->lock);
>  	for_each_possible_cpu(cpu) {
>  		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
>  		*pcount = 0;
>  	}
> -	fbc->count = amount;
> -	spin_unlock(&fbc->lock);
> +	atomic64_set(&pcrw->count, amount);
>  }
>  EXPORT_SYMBOL(percpu_counter_set);
>  
>  void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
>  {
>  	s64 count;
> +	struct percpu_counter_rw *pcrw = fbc->pcrw;
> +
> +	if (atomic_read(&fbc->sum_cnt)) {
> +		atomic64_add(amount, &pcrw->count);
> +		return;
> +	}
>  
>  	preempt_disable();
>  	count = __this_cpu_read(*fbc->counters) + amount;
>  	if (count >= batch || count <= -batch) {
> -		spin_lock(&fbc->lock);
> -		fbc->count += count;
>  		__this_cpu_write(*fbc->counters, 0);
> -		spin_unlock(&fbc->lock);
> +		atomic64_add(count, &pcrw->count);
smp_wmb() or atomic64_add_return() here to guarantee the changes are
seen before sequence++;

> +		pcrw->sequence++;
sequence++ can introduce cache line bouncing.
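
To make the requested ordering concrete, a minimal sketch using the pcrw
names from the V4 patch above (either the explicit barrier below or the
full barrier implied by atomic64_add_return() is enough):

		__this_cpu_write(*fbc->counters, 0);
		atomic64_add(count, &pcrw->count);
		smp_wmb();	/* publish the new count before the sequence bump */
		pcrw->sequence++;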

add_start causes a lot of cache bouncing because it's updated by all
cpus. We can actually make it a percpu variable; this completely
removes the cache bouncing.
With this patch plus the last patch, I get about 7x faster running the
workload the last patch described. With the last patch alone, the
workload is only about 4x faster.
This doesn't slow down _sum because we removed the lock from _sum. I did
a stress test: 23 CPUs run _add, one cpu runs _sum. In the _add fast path
(lock not held), _sum runs a little slower (about 20% slower). In the
_add slow path (lock held), _sum runs much faster (about 9x faster).

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
---
 include/linux/percpu_counter.h |    3 ++-
 lib/percpu_counter.c           |   22 ++++++++++++++++------
 2 files changed, 18 insertions(+), 7 deletions(-)

Index: linux/include/linux/percpu_counter.h
===================================================================
--- linux.orig/include/linux/percpu_counter.h	2011-05-16 10:26:05.000000000 +0800
+++ linux/include/linux/percpu_counter.h	2011-05-16 10:27:48.000000000 +0800
@@ -16,12 +16,13 @@
 #ifdef CONFIG_SMP
 
 struct percpu_counter {
-	atomic_t sum_start, add_start;
+	atomic_t sum_start;
 	atomic64_t count;
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
 	s32 __percpu *counters;
+	char __percpu *add_starts;
 };
 
 extern int percpu_counter_batch;
Index: linux/lib/percpu_counter.c
===================================================================
--- linux.orig/lib/percpu_counter.c	2011-05-16 10:26:58.000000000 +0800
+++ linux/lib/percpu_counter.c	2011-05-16 10:46:12.000000000 +0800
@@ -75,10 +75,12 @@ void __percpu_counter_add(struct percpu_
 	count = __this_cpu_read(*fbc->counters) + amount;
 	if (count >= batch || count <= -batch) {
 		while (1) {
-			atomic_inc_return(&fbc->add_start);
+			__this_cpu_write(*fbc->add_starts, 1);
+			/* Guarantee add_starts is seen by _sum */
+			smp_wmb();
 			if (atomic_read(&fbc->sum_start) == 0)
 				break;
-			atomic_dec(&fbc->add_start);
+			__this_cpu_write(*fbc->add_starts, 0);
 			while (atomic_read(&fbc->sum_start) != 0)
 				cpu_relax();
 		}
@@ -86,7 +88,7 @@ void __percpu_counter_add(struct percpu_
 		atomic64_add(count, &fbc->count);
 		__this_cpu_write(*fbc->counters, 0);
 
-		atomic_dec(&fbc->add_start);
+		__this_cpu_write(*fbc->add_starts, 0);
 	} else {
 		__this_cpu_write(*fbc->counters, count);
 	}
@@ -104,8 +106,10 @@ s64 __percpu_counter_sum(struct percpu_c
 	int cpu;
 
 	atomic_inc_return(&fbc->sum_start);
-	while (atomic_read(&fbc->add_start) != 0)
-		cpu_relax();
+	for_each_online_cpu(cpu) {
+		while (*per_cpu_ptr(fbc->add_starts, cpu) != 0)
+			cpu_relax();
+	}
 
 	ret = atomic64_read(&fbc->count);
 	for_each_online_cpu(cpu) {
@@ -122,10 +126,15 @@ int percpu_counter_init(struct percpu_co
 {
 	atomic64_set(&fbc->count, amount);
 	atomic_set(&fbc->sum_start, 0);
-	atomic_set(&fbc->add_start, 0);
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
+	fbc->add_starts = alloc_percpu(char);
+	if (!fbc->add_starts) {
+		free_percpu(fbc->counters);
+		return -ENOMEM;
+	}
+
 
 	debug_percpu_counter_activate(fbc);
 
@@ -152,6 +161,7 @@ void percpu_counter_destroy(struct percp
 	mutex_unlock(&percpu_counters_lock);
 #endif
 	free_percpu(fbc->counters);
+	free_percpu(fbc->add_starts);
 	fbc->counters = NULL;
 }
 EXPORT_SYMBOL(percpu_counter_destroy);



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  6:37                                     ` Shaohua Li
@ 2011-05-16  6:55                                       ` Eric Dumazet
  2011-05-16  7:15                                         ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-16  6:55 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Le lundi 16 mai 2011 à 14:37 +0800, Shaohua Li a écrit :
> On Mon, 2011-05-16 at 14:11 +0800, Eric Dumazet wrote:
> > Le lundi 16 mai 2011 à 08:58 +0800, Shaohua Li a écrit :
> > 
> > > so if _sum starts and ends here, _sum can still get deviation.
> > 
> > This makes no sense at all. If you have so many cpus 'here' right before
> > you increment fbc->sum_cnt, then no matter how precise and super
> > cautious you are in your _sum() implementation, as soon as you exit from
> > sum(), other cpus already changed the percpu counter global value.
> I don't agree here. The original implementation also just has quite
> small window we have deviation, the window only exists between the two
> lines:
> 		atomic64_add(count, &fbc->count);
> 	        __this_cpu_write(*fbc->counters, 0);
> if you think we should ignore it, we'd better not use any protection
> here.
> 

Not at all. Your version didn't forbid a new cpu from coming into _add()
and hitting the deviation problem.

There is a small difference, or else I wouldn't have bothered.


> as I wrote in the email, the atomic and cacheline issue can be resolved
> with a per_cpu data, I just didn't post the patch. I post it this time,
> please see below. There is no cache line bounce anymore.
> 

I am afraid we are making no progress at all here if you just try to push
your patch and ignore my comments.

percpu_counter is a compromise; don't make it too slow for normal
operations. It works well if most _add() operations only go through
percpu data.

Please just move vm_committed_as to a plain atomic_t; this will solve
your problem.

Thanks



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  6:55                                       ` Eric Dumazet
@ 2011-05-16  7:15                                         ` Shaohua Li
  2011-05-16  7:44                                           ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-16  7:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Mon, 2011-05-16 at 14:55 +0800, Eric Dumazet wrote:
> Le lundi 16 mai 2011 à 14:37 +0800, Shaohua Li a écrit :
> > On Mon, 2011-05-16 at 14:11 +0800, Eric Dumazet wrote:
> > > Le lundi 16 mai 2011 à 08:58 +0800, Shaohua Li a écrit :
> > > 
> > > > so if _sum starts and ends here, _sum can still get deviation.
> > > 
> > > This makes no sense at all. If you have so many cpus 'here' right before
> > > you increment fbc->sum_cnt, then no matter how precise and super
> > > cautious you are in your _sum() implementation, as soon as you exit from
> > > sum(), other cpus already changed the percpu counter global value.
> > I don't agree here. The original implementation also just has quite
> > small window we have deviation, the window only exists between the two
> > lines:
> > 		atomic64_add(count, &fbc->count);
> > 	        __this_cpu_write(*fbc->counters, 0);
> > if you think we should ignore it, we'd better not use any protection
> > here.
> > 
> 
> Not at all. Your version didnt forbid new cpu to come in _add() and
> hitting the deviation problem.
If everybody agrees the deviation isn't a problem, I will not bother to
argue here.
But your patch does have the deviation issue which Tejun dislikes.


> There is a small difference, or else I wouldnt had bother.
In _sum, set a bit; in _add, wait till the bit is cleared. This can
solve the issue too, and much more simply.

> > as I wrote in the email, the atomic and cacheline issue can be resolved
> > with a per_cpu data, I just didn't post the patch. I post it this time,
> > please see below. There is no cache line bounce anymore.
> > 
> 
> I am afraid we make no progress at all here, if you just try to push
> your patch and ignore my comments.
I did try to push my patch, but I didn't ignore your comments. I pointed
out that your patch still has the deviation issue and you didn't think it's
an issue, so actually you are ignoring my comments. On the other hand, I
pushed my patch because I thought mine doesn't have the deviation.

> percpu_counter is a compromise, dont make it too slow for normal
> operations. It works well if most _add() operations only go through
> percpu data.
> 
> Please just move vm_committed_as to a plain atomic_t, this will solve
> your problem.
I can, but you can't prevent me from optimizing percpu_counter.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  7:15                                         ` Shaohua Li
@ 2011-05-16  7:44                                           ` Eric Dumazet
  2011-05-16  8:34                                             ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-16  7:44 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Le lundi 16 mai 2011 à 15:15 +0800, Shaohua Li a écrit :

> I can, but you can't prevent me to optimize percpu_counter.
> 

Well, I have the right to say you're wrong.

Your last patch is not good, sorry. Please take the time to read it
again and fix the obvious problems. And also give us numbers for one process
doing the mmap()/munmap() loop, before and after your patch.

A percpu_counter is already a beast as is; you're suggesting doubling
its size for a pathological case.

It's absolutely not clear to me why vm_committed_as is using the default
percpu_counter_batch.

By the way could you make sure percpu_counter_batch has the right value
on your 24 cpus machine ?

Your 128Mbyte mmap threshold sounds like percpu_counter_batch=32 instead
of 48




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  7:44                                           ` Eric Dumazet
@ 2011-05-16  8:34                                             ` Shaohua Li
  2011-05-16  9:35                                               ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-16  8:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Mon, 2011-05-16 at 15:44 +0800, Eric Dumazet wrote:
> Le lundi 16 mai 2011 à 15:15 +0800, Shaohua Li a écrit :
> 
> > I can, but you can't prevent me to optimize percpu_counter.
> > 
> 
> Well, I have the right to say you're wrong.
sure, but please give a reason.

> Your last patch is not good, sorry. 
> Please take the time to read it
> again and fix obvious problems.
what kind of obvious problems?

> And also give us numbers if one process
> does the mmap()/munmap() loop, before and after your patch.
I did a stress test with one thread:

while {
	__percpu_counter_add(+count)
	__percpu_counter_add(-count)
}
The loop runs 10000000 times.
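
Roughly, as a self-contained sketch (a hypothetical throwaway module, not
the actual harness that produced the numbers below):

	#include <linux/module.h>
	#include <linux/percpu_counter.h>

	static struct percpu_counter fbc;

	static int __init pcc_stress_init(void)
	{
		/* 1 stays in the fast path; a value >= percpu_counter_batch
		 * would force the fold into the global count on every call */
		s64 count = 1;
		int i;

		if (percpu_counter_init(&fbc, 0))
			return -ENOMEM;
		for (i = 0; i < 10000000; i++) {
			__percpu_counter_add(&fbc, +count, percpu_counter_batch);
			__percpu_counter_add(&fbc, -count, percpu_counter_batch);
		}
		percpu_counter_destroy(&fbc);
		return 0;
	}
	module_init(pcc_stress_init);
	MODULE_LICENSE("GPL");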
In the _add fast path (no lock held):
before my patch:
real    0m0.133s
user    0m0.000s
sys     0m0.124s
after:
real    0m0.129s
user    0m0.000s
sys     0m0.120s
the difference is variation.

In the _add slow path (lock held):
before my patch:
real    0m0.374s
user    0m0.000s
sys     0m0.372s
after:
real    0m0.245s
user    0m0.000s
sys     0m0.020s

My patch actually makes _add faster, because it removes the spin_lock.

> A percpu_counter is already a beast as is, you're suggesting to double
> its size, for a pathological case.
> 
> Its absolutely not clear to me why vm_committed_as is using the default
> percpu_counter_batch. 
> 
> By the way could you make sure percpu_counter_batch has the right value
> on your 24 cpus machine ?
> 
> Your 128Mbyte mmap threshold sounds like percpu_counter_batch=32 instead
> of 48
Let's not argue about the batch size anymore. If we can make percpu_counter
faster, why don't we (even your patch mentions this)?

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  8:34                                             ` Shaohua Li
@ 2011-05-16  9:35                                               ` Eric Dumazet
  2011-05-16 14:22                                                 ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-16  9:35 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Le lundi 16 mai 2011 à 16:34 +0800, Shaohua Li a écrit :

> let's not argue the batch size anymore. If we can make percpu_counter
> faster, why we don't (even your patch mentioned this).
> 

I actually changed my mind after trying to solve the problem and spending
a few hours on it. This is not worth it and is counterproductive.

The whole point of percpu_counter is being able to avoid the false sharing
in _most_ cases. I would even make the false sharing case more
expensive just to pinpoint a bad user, thanks to profiling.

Trying to make it fast in the pathological case is just throwing a brown
paper bag over the real problem.

An interesting move would be to make percpu_counter hierarchical,
because we might need it for 4096 cpus machines.

Given that vm_committed_as only needs one-percent resolution
(sysctl_overcommit_ratio is expressed with percent resolution), it
should be used with an appropriate batch value, something like:

vm_committed_as_batch = max(percpu_counter_batch,
			    total_ram_pages/(num_possible_cpus()*100));

Instead of the default percpu_counter_batch, which is aimed more at
_add(1)/_add(-1) uses.

Note : This won't solve your mmap(128M)/munmap() problem unless your
machine has a _lot_ of memory. Still, this will be a win on many real
workloads.
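
As a sketch of how that could be wired up (hypothetical init code; it
assumes the in-tree totalram_pages counter for 'total_ram_pages' above,
and shows mm's existing vm_acct_memory() helper switched over to the
explicit-batch variant):

	int vm_committed_as_batch;

	static int __init committed_as_batch_init(void)
	{
		vm_committed_as_batch = max_t(int, percpu_counter_batch,
				totalram_pages / (num_possible_cpus() * 100));
		return 0;
	}
	core_initcall(committed_as_batch_init);

	/* callers then pass the bigger batch explicitly: */
	static inline void vm_acct_memory(long pages)
	{
		__percpu_counter_add(&vm_committed_as, pages, vm_committed_as_batch);
	}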




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16  9:35                                               ` Eric Dumazet
@ 2011-05-16 14:22                                                 ` Eric Dumazet
  2011-05-17  0:55                                                   ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-16 14:22 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Le lundi 16 mai 2011 à 11:35 +0200, Eric Dumazet a écrit :

> Given that vm_committed has one percent resolution need
> (sysctl_overcommit_ratio is expressed with percent resolution), it
> should be used with an appropriate batch value, something like :
> 
> vm_committed_as_batch = max(percpu_counter_batch,
> 			    total_ram_pages/(num_possible_cpus()*100));
> 


The funny thing with vm_committed_as is that we don't even read its value
with the default vm configuration

(sysctl_overcommit_memory == OVERCOMMIT_ALWAYS or OVERCOMMIT_GUESS)

[ In this case, we read it only for /proc/meminfo output ]

Ideally, we could dynamically change vm_committed_as_batch when
sysctl_overcommit_memory or another param is changed. This would need a
mechanism to ask all cpus to transfer/clear their local s32 into the global
fbc->count [when lowering vm_committed_as_batch]

Another idea would be to use an atomic operation when manipulating the
percpu s32, so that __percpu_counter_sum() is able to do the folding itself:
at the end of __percpu_counter_sum(), fbc->count would be the final
result, and all the s32s would be zero, unless some cpus called _add()
meanwhile.
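
(A full patch along these lines follows later in the thread; the core of
the idea, assuming fbc->count has been converted to an atomic64_t, is
just:)

	s64 __percpu_counter_sum(struct percpu_counter *fbc)
	{
		int cpu;

		for_each_online_cpu(cpu) {
			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
			s32 count = xchg(pcount, 0);	/* steal the local delta */

			if (count)
				atomic64_add(count, &fbc->count);
		}
		return atomic64_read(&fbc->count);
	}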




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-16 14:22                                                 ` Eric Dumazet
@ 2011-05-17  0:55                                                   ` Shaohua Li
  2011-05-17  4:56                                                     ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-17  0:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Mon, 2011-05-16 at 22:22 +0800, Eric Dumazet wrote:
> Le lundi 16 mai 2011 à 11:35 +0200, Eric Dumazet a écrit :
> 
> > Given that vm_committed has one percent resolution need
> > (sysctl_overcommit_ratio is expressed with percent resolution), it
> > should be used with an appropriate batch value, something like :
> > 
> > vm_committed_as_batch = max(percpu_counter_batch,
> > 			    total_ram_pages/(num_possible_cpus()*100));
> > 
> 
> 
> Funny thing with vm_committed_as is we dont even read its value with
> default vm configuration 
> 
> (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS or OVERCOMMIT_GUESS)
> 
> [ In this case, we read it only for /proc/meminfo output ]
> 
> Ideally, we could dynamically change vm_committed_as_batch when
> sysctl_overcommit_memory or other param is changed. This would need a
> mechanism to ask all cpus to transfer/clear their local s32 into global
> fbc->count [when lowering vm_committed_as_batch]
I actually posted something like this before:
http://marc.info/?l=linux-mm&m=130144785326028&w=2
but this could affect /proc/meminfo reads.

> Another idea would be to use an atomic when manipulating the percpu s32,
> so that __percpu_counter_sum() is able to make this operation itself :
> At the end of __percpu_counter_sum(), fbc->count would be the final
> result, and all s32 would be zero, unless some cpus called _add()
> meanwhile.
I don't understand it fully. But if only concurrent _add can introduce
deviation, this is good.

I'm still interested in improving percpu_counter itself. If we can
improve it, why don't we? My patch doesn't slow down anything in any of
the tests. Why didn't you ever look at it?

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  0:55                                                   ` Shaohua Li
@ 2011-05-17  4:56                                                     ` Eric Dumazet
  2011-05-17  5:22                                                       ` Shaohua Li
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-17  4:56 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Le mardi 17 mai 2011 à 08:55 +0800, Shaohua Li a écrit :

> I'm still interesting in improving percpu_counter itself. If we can
> improve it, why we don't? My patch doesn't slow down anything for all
> tests. Why didn't you ever look at it?
> 

I did and said there were obvious problems in it.

1) 4096 cpus : size of one percpu_counter is 16Kbytes.

After your last patch -> 20 kbytes for no good reason.

2) Two separate alloc_percpu() -> two separate cache lines instead of
one.

But then, if one alloc_percpu() -> 32 kbytes per object.

3) Focus on percpu_counter() implementation instead of making an
analysis of callers.

I did a lot of rwlock removal in the network stack because they are not the
right synchronization primitive in many cases. I did not optimize
rwlocks. If rwlocks were even slower, I suspect other people would have
helped me to convert things faster.

4) There is a possible way to solve your deviation case : add at the
beginning of _add() a short cut for crazy 'amount' values. It's a bit
expensive on 32bit arches, so it might be added in a new helper to let
_add() be fast for normal and gentle users.

if (unlikely(amount >= batch || amount <= -batch)) {
	atomic64_add(amount, &fbc->count);
	return;
}

I.e. don't bother reading the s32 value and setting it to 0, just leave it
alone.

Thanks

------------------------------------------------------------------

About making percpu s32 'atomic', here is the patch to show the idea:

Each time we call __percpu_counter_sum(), the fbc->counters[] values are
zeroed and fbc->count is the "precise" value of the counter (the deviation
comes close to 0).

So if we need to dynamically change one percpu counter's batch value, we
can call __percpu_counter_sum() to bring the 'deviation' close to 0;
there is no need to send an IPI to all cpus so that they perform the
transfer themselves.

1) vm_committed_as could use a big vm_committed_as_batch when
(sysctl_overcommit_memory == OVERCOMMIT_ALWAYS or OVERCOMMIT_GUESS)

2) We could switch vm_committed_as_batch to an adequate value if/when
sysctl_overcommit_memory is changed to OVERCOMMIT_NEVER (and redo the
batch computation if swap space changes as well, or hugepage
reservations, or the number of online cpus, or ...)
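
A hedged sketch of what 2) might look like (a hypothetical sysctl handler;
it reuses the vm_committed_as_batch idea above and assumes a _sum() that
folds the percpu deltas, so the tighter batch takes effect right away):

	int overcommit_policy_handler(struct ctl_table *table, int write,
				      void __user *buffer, size_t *lenp, loff_t *ppos)
	{
		int ret = proc_dointvec(table, write, buffer, lenp, ppos);

		if (!ret && write) {
			if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
				vm_committed_as_batch = percpu_counter_batch;
			else
				vm_committed_as_batch = max_t(int, percpu_counter_batch,
						totalram_pages / (num_possible_cpus() * 100));
			percpu_counter_sum(&vm_committed_as);	/* fold pending deltas */
		}
		return ret;
	}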

Note: this has a cost, because the percpu_counter fast path would have one
atomic operation instead of none as in the current implementation (a
cmpxchg() on the percpu s32). I'm not sure current cpus would care for
percpu data (no contention).

 include/linux/percpu_counter.h |    7 +---
 lib/percpu_counter.c           |   51 +++++++++++++++----------------
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 46f6ba5..d6b7831 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -16,8 +16,7 @@
 #ifdef CONFIG_SMP
 
 struct percpu_counter {
-	spinlock_t lock;
-	s64 count;
+	atomic64_t count;
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
@@ -60,7 +59,7 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	return atomic64_read(&fbc->count);
 }
 
 /*
@@ -70,7 +69,7 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
 	barrier();		/* Prevent reloads of fbc->count */
 	if (ret >= 0)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 28f2c33..745787e 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -59,29 +59,36 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
 	int cpu;
 
-	spin_lock(&fbc->lock);
 	for_each_possible_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
-	spin_unlock(&fbc->lock);
+	atomic64_set(&fbc->count, amount);
 }
 EXPORT_SYMBOL(percpu_counter_set);
 
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
 	s64 count;
+	s32 *ptr, old;
+
+	if (unlikely(amount >= batch || amount <= -batch)) {
+		atomic64_add(amount, &fbc->count);
+		return;
+	}
 
 	preempt_disable();
-	count = __this_cpu_read(*fbc->counters) + amount;
+	ptr = this_cpu_ptr(fbc->counters);
+retry:
+	old = *ptr;
+	count = old + amount;
 	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
-		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
+		if (unlikely(cmpxchg(ptr, old, 0) != old))
+			goto retry;
+		atomic64_add(count, &fbc->count);
 	} else {
-		__this_cpu_write(*fbc->counters, count);
+		if (unlikely(cmpxchg(ptr, old, count) != old))
+			goto retry;
 	}
 	preempt_enable();
 }
@@ -93,26 +100,23 @@ EXPORT_SYMBOL(__percpu_counter_add);
  */
 s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
-	s64 ret;
 	int cpu;
 
-	spin_lock(&fbc->lock);
-	ret = fbc->count;
 	for_each_online_cpu(cpu) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
+		s32 count, *pcount = per_cpu_ptr(fbc->counters, cpu);
+
+		count = xchg(pcount, 0);
+		if (count)
+			atomic64_add(count, &fbc->count);
 	}
-	spin_unlock(&fbc->lock);
-	return ret;
+	return atomic64_read(&fbc->count);
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
 			  struct lock_class_key *key)
 {
-	spin_lock_init(&fbc->lock);
-	lockdep_set_class(&fbc->lock, key);
-	fbc->count = amount;
+	atomic64_set(&fbc->count, amount);
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
@@ -170,14 +174,11 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 	cpu = (unsigned long)hcpu;
 	mutex_lock(&percpu_counters_lock);
 	list_for_each_entry(fbc, &percpu_counters, list) {
-		s32 *pcount;
-		unsigned long flags;
+		s32 count, *pcount;
 
-		spin_lock_irqsave(&fbc->lock, flags);
 		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
-		*pcount = 0;
-		spin_unlock_irqrestore(&fbc->lock, flags);
+		count = xchg(pcount, 0);
+		atomic64_add(count, &fbc->count);
 	}
 	mutex_unlock(&percpu_counters_lock);
 #endif



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  4:56                                                     ` Eric Dumazet
@ 2011-05-17  5:22                                                       ` Shaohua Li
  2011-05-17  9:01                                                         ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Shaohua Li @ 2011-05-17  5:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

On Tue, 2011-05-17 at 12:56 +0800, Eric Dumazet wrote:
> Le mardi 17 mai 2011 à 08:55 +0800, Shaohua Li a écrit :
> 
> > I'm still interesting in improving percpu_counter itself. If we can
> > improve it, why we don't? My patch doesn't slow down anything for all
> > tests. Why didn't you ever look at it?
> > 
> 
> I did and said there were obvious problems in it.
> 
> 1) 4096 cpus : size of one percpu_counter is 16Kbytes.
> 
> After your last patch -> 20 kbytes for no good reason.
I don't know why you said there is no good reason. I posted a lot of
data which shows the improvement, while you just ignore it.

The size issue is completely pointless. If you have 4096 CPUs, how could
you worry about 16k bytes of memory? Especially since the extra memory
makes the API much faster.

> 2) Two separate alloc_percpu() -> two separate cache lines instead of
> one.
They might be in one cache line actually, and it can easily be fixed if not
anyway. On the other hand, even touching two cache lines, it's still faster
than the original spinlock implementation, for which I already posted data.

> But then, if one alloc_percpu() -> 32 kbytes per object.
The size issue is completely pointless.

> 3) Focus on percpu_counter() implementation instead of making an
> analysis of callers.
> 
> I did a lot of rwlocks removal in network stack because they are not the
> right synchronization primitive in many cases. I did not optimize
> rwlocks. If rwlocks were even slower, I suspect other people would have
> help me to convert things faster.
My original issue is mmap, but I already said several times that we can
make percpu_counter better, so why wouldn't we?

> 4) There is a possible way to solve your deviation case : add at _add()
> beginning a short cut for crazy 'amount' values. Its a bit expensive on
> 32bit arches, so might be added in a new helper to let _add() be fast
> for normal and gentle users.

+		if (unlikely(cmpxchg(ptr, old, 0) != old))
> +			goto retry;
This doesn't change anything; you still have the deviation issue here.

> +		atomic64_add(count, &fbc->count);

> if (unlikely(amount >= batch || amount <= -batch)) {
> 	atomic64_add(amount, &fbc->count);
> 	return;
> }
Why handle just this special case? My patch can make the whole path
faster without deviation.

So you didn't actually point out any obvious problem with my patch. This
is good.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  5:22                                                       ` Shaohua Li
@ 2011-05-17  9:01                                                         ` Eric Dumazet
  2011-05-17  9:11                                                           ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-17  9:01 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, linux-kernel, akpm, cl, npiggin

Le mardi 17 mai 2011 à 13:22 +0800, Shaohua Li a écrit :

> I don't know why you said there is no good reason. I posted a lot of
> data which shows improvement, while you just ignore.
> 

Dear Shaohua, ignoring you would mean I would not even answer and would let
other people do so, when they have time (maybe in 2 or 3 months, maybe
never; just take a look at my previous attempts, two years ago, when
atomic64_t didn't even exist, obviously).

I hope you can see the value I add to your concerns, keeping this subject
alive and even writing code. We all share ideas; we are not fighting.



> The size issue is completely pointless. If you have 4096 CPUs, how could
> you worry about 16k bytes memory. Especially the extra memory makes the
> API much faster.
> 

It is not pointless at all, maybe for Intel guys it is.

I just NACK this idea

> > 2) Two separate alloc_percpu() -> two separate cache lines instead of
> > one.
> Might be in one cache line actually, but can be easily fixed if not
> anyway. On the other hand, even touch two cache lines, it's still faster
> than the original spinlock implementation, which I already posted data.
> 
> > But then, if one alloc_percpu() -> 32 kbytes per object.
> the size issue is completely pointless
> 

That's your opinion.

> > 3) Focus on percpu_counter() implementation instead of making an
> > analysis of callers.
> > 
> > I did a lot of rwlocks removal in network stack because they are not the
> > right synchronization primitive in many cases. I did not optimize
> > rwlocks. If rwlocks were even slower, I suspect other people would have
> > help me to convert things faster.
> My original issue is mmap, but I already declaimed several times we can
> make percpu_counter better, why won't we?
> 

Only if it's a good compromise. Your last patches are not yet good
candidates I'm afraid.

> > 4) There is a possible way to solve your deviation case : add at _add()
> > beginning a short cut for crazy 'amount' values. Its a bit expensive on
> > 32bit arches, so might be added in a new helper to let _add() be fast
> > for normal and gentle users.
> 
> +		if (unlikely(cmpxchg(ptr, old, 0) != old))
> > +			goto retry;
> this doesn't change anything, you still have the deviation issue here
> 

You do understand that 'my last patch' doesn't address the deviation
problem anymore? It's a completely different matter, addressing the
vm_committed_as problem (and maybe other percpu_counters).

That's the thing you prefer not to touch so that your 'results' sound better...

If your percpu_counter is hit so hard that you have many cpus
competing in atomic64_add(count, &fbc->count), the _sum() result is wrong
right after it returns. So _sum() _can_ deviate even if it claims to be
more precise.



> > +		atomic64_add(count, &fbc->count);
> 
> > if (unlikely(amount >= batch || amount <= -batch)) {
> > 	atomic64(amount, &fbc->count);
> > 	return;
> > }
> why we just handle this special case, my patch can make the whole part
> faster without deviation
> 

This 'special case' is the whole problem others pointed out, and it
makes the worst-case deviation the same as before your initial patch.

> so you didn't point out any obvious problem with my patch actually. This
> is good.
> 

This brings nothing. Just say NO to people saying it's needed.

Just because Tejun says there is a deviation "problem" doesn't mean you
need to change lglock and bring lglock into percpu_counter, or double the
percpu_counter size, or whatever other crazy idea.

Just convince him that percpu_counter by itself cannot give a max
deviation guarantee. No percpu_counter user cares at all. If they do,
then the choice of percpu_counter for their implementation is probably wrong.

[ We don't yet provide a percpu_counter_add_return() function ]





^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  9:01                                                         ` Eric Dumazet
@ 2011-05-17  9:11                                                           ` Tejun Heo
  2011-05-17  9:45                                                             ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2011-05-17  9:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Hello, Eric, Shaohua.

On Tue, May 17, 2011 at 11:01:01AM +0200, Eric Dumazet wrote:
> Just convince him that percpu_counter by itself cannot bring a max
> deviation guarantee. No percpu_counter user cares at all. If they do,
> then percpu_counter choice for their implementation is probably wrong.
> 
> [ We dont provide yet a percpu_counter_add_return() function ]

I haven't gone through this thread yet (I will do so later today), but
let me clarify the whole deviation thing.

1. I don't care about a reasonable (can't think of a better word at the
   moment) level of deviation.  Under a high level of concurrency, the
   exact value isn't even well defined - nobody can tell in what order
   operations happened anyway.

2. But I _do_ object to _sum() having the possibility of deviating by
   multiples of @batch even with a very low level of activity.

I'm completely fine with #1.  I'm not that crazy, but I don't really
want to take #2 - that makes the whole _sum() interface almost
pointless.  Also, I don't want to add a big honking lglock just to avoid
#2 unless it can be shown that the same effect can't be achieved in a
saner manner, and I'm highly skeptical that would happen.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  9:11                                                           ` Tejun Heo
@ 2011-05-17  9:45                                                             ` Eric Dumazet
  2011-05-17  9:50                                                               ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-17  9:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Le mardi 17 mai 2011 à 11:11 +0200, Tejun Heo a écrit :

> I'm completely fine with #1.  I'm not that crazy but I don't really
> want to take #2 - that makes the whole _sum() interface almost
> pointless.  

Hi Tejun

_sum() is a bit more precise than percpu_counter_read(), but making it
really precise means we have to stop concurrent activity, and we
never did that in the previous/current implementation.

We could add this (as Shaohua and I tried in various patches)
later, if needed, but nowhere in the kernel do we currently need that.

Even /proc/meminfo doesn't call _sum(&vm_committed_as) but the lazy
percpu_counter_read_positive() function...

Really, _sum() gives a good approximation of the counter, more precise
because of the percpu s32 folding, but no guarantee on deviation.




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  9:45                                                             ` Eric Dumazet
@ 2011-05-17  9:50                                                               ` Tejun Heo
  2011-05-17 12:20                                                                 ` Eric Dumazet
  2011-05-18  1:00                                                                 ` Shaohua Li
  0 siblings, 2 replies; 52+ messages in thread
From: Tejun Heo @ 2011-05-17  9:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Hello, Eric.

On Tue, May 17, 2011 at 11:45:41AM +0200, Eric Dumazet wrote:
> _sum() is a bit more precise than percpu_counter_read(), but making it
> really precise means we have to stop concurrent activity, and we
> never did that in the previous/current implementation.
> 
> We could add this (as Shaohua and I tried in various patches)
> later, if needed, but nowhere in the kernel do we currently need that.
> 
> Even /proc/meminfo doesn't call _sum(&vm_committed_as) but the lazy
> percpu_counter_read_positive() function...
> 
> Really, _sum() gives a good approximation of the counter, more precise
> because of the percpu s32 folding, but no guarantee on deviation.

I'm not asking to make it more accurate, but the initial patches from
Shaohua made the _sum() result deviate by @batch even when only one
thread is doing _inc(), due to the race window between adding to the
main counter and resetting the local one.  All I'm asking is that we close
that hole, and I'll be completely happy with it.  The lglock does that,
but it's ummm.... not a very nice way to do it.
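
For reference, the window in question is between these two lines of the
atomic64-based _add() earlier in the thread:

	atomic64_add(count, &fbc->count);
		/* a _sum() running in this window sees the updated global
		 * count and the not-yet-cleared percpu s32, i.e. it can
		 * report up to a batch's worth too much for one updater */
	__this_cpu_write(*fbc->counters, 0);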

Please forget about deviations from concurrent activity.  I don't
care and nobody should.  All I'm asking for is removing the possibility of
that unnecessary spike from any single update, and I don't think
that would be too hard.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  9:50                                                               ` Tejun Heo
@ 2011-05-17 12:20                                                                 ` Eric Dumazet
  2011-05-17 12:45                                                                   ` Tejun Heo
  2011-05-18  1:00                                                                 ` Shaohua Li
  1 sibling, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-17 12:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Le mardi 17 mai 2011 à 11:50 +0200, Tejun Heo a écrit :

> I'm not asking to make it more accurate but the initial patches from
> Shaohua made the _sum() result to deviate by @batch even when only one
> thread is doing _inc() due to the race window between adding to the
> main counter and resetting the local one.  All I'm asking is closing
> that hole and I'll be completely happy with it.  The lglock does that
> but it's ummm.... not a very nice way to do it.
> 
> Please forget about deviations from concurrent activities.  I don't
> care and nobody should.  All I'm asking is removing that any update
> having the possibility of that unnecessary spike and I don't think
> that would be too hard.
> 

Spikes are expected and have no effect by design.

The batch value is chosen so that the granularity of the percpu_counter
(batch*num_online_cpus()) is the spike factor, and that's pretty
difficult when the number of cpus is high.

In Shaohua's workload, 'amount' for a 128Mbyte mapping is 32768 pages, while
the batch value is 48. 48*24 = 1152.
So the percpu s32 being in the [-47 .. 47] range would not change the
accuracy of the _sum() function [ if it was eventually called, but it's
not ]
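
Spelling the numbers out (assuming 4KB pages):

	one 128MB mmap()     -> 128 * 1024 / 4  = 32768 pages per _add()
	counter granularity  -> batch * ncpus   = 48 * 24 = 1152 pages (~4.5MB)

so a single such _add() blows straight past the batch and always hits the
global count.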

No drift in the counter is the only thing we care about - and _read() not
being too far away from the _sum() value, in particular if the
percpu_counter is used to check a limit that happens to be low (relative to
the granularity of the percpu_counter : batch*num_online_cpus()).

I claim the extra care is not needed. It might give the reader/user the
false impression that a percpu_counter object can replace a plain
atomic64_t.

For example, I feel vm_committed_as could be a plain atomic_long_t




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17 12:20                                                                 ` Eric Dumazet
@ 2011-05-17 12:45                                                                   ` Tejun Heo
  2011-05-17 13:00                                                                     ` Eric Dumazet
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2011-05-17 12:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Hello, Eric.

On Tue, May 17, 2011 at 02:20:07PM +0200, Eric Dumazet wrote:
> Spikes are expected and have no effect by design.
> 
> batch value is chosen so that granularity of the percpu_counter
> (batch*num_online_cpus()) is the spike factor, and thats pretty
> difficult when number of cpus is high.
> 
> In Shaohua workload, 'amount' for a 128Mbyte mapping is 32768, while the
> batch value is 48. 48*24 = 1152.
> So the percpu s32 being in [-47 .. 47] range would not change the
> accuracy of the _sum() function [ if it was eventually called, but its
> not ]
> 
> No drift in the counter is the only thing we care - and _read() being
> not too far away from the _sum() value, in particular if the
> percpu_counter is used to check a limit that happens to be low (against
> granularity of the percpu_counter : batch*num_online_cpus()).
> 
> I claim extra care is not needed. This might give the false impression
> to reader/user that percpu_counter object can replace a plain
> atomic64_t.

We already had this discussion.  Sure, we can argue about it again all
day, but I just don't think it's a necessary compromise, and it really
makes _sum() quite dubious.  It's not about strict correctness, it
can't be, but if I spend the overhead to walk all the different percpu
counters, I'd like to get a rather exact number if there's nothing
much going on (free block count, for example).  Also, I want to be able
to use a large @batch if the situation allows for it without worrying
about _sum() accuracy.

Given that _sum() is a super-slow path and we have a lot of latitude
there, this should be possible without resorting to a heavy-handed
approach like lglock.  I was hoping that someone would come up with a
better solution, which doesn't seem to have happened.  Maybe I was
wrong, I don't know.  I'll give it a shot.

But, anyways, here's my position regarding the issue.

* If we're gonna just fix up the slow path, I don't want to make
  _sum() less useful by making its accuracy dependent upon @batch.

* If somebody is interested, it would be worthwhile to see whether we
  can integrate vmstat and percpu counters so that its deviation is
  automatically regulated and we don't have to think about all this
  anymore.

I'll see if I can come up with something.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17 12:45                                                                   ` Tejun Heo
@ 2011-05-17 13:00                                                                     ` Eric Dumazet
  2011-05-17 13:04                                                                       ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Eric Dumazet @ 2011-05-17 13:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Le mardi 17 mai 2011 à 14:45 +0200, Tejun Heo a écrit :

> Given that _sum() is super-slow path and we have a lot of latitude
> there, 

Absolutely not. It's a slow path, yes, but not super-slow.

Don't change the rules now ;)

Take a look at include/net/tcp.h and commit ad1af0fedba14f82b
(tcp: Combat per-cpu skew in orphan tests.)

If you guys make percpu_counter much slower, we'll just remove its use
from the network stack, and there will not be a lot of users left.
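
For reference, the pattern that commit uses is roughly the following
(paraphrased from memory, not a verbatim copy): a cheap
percpu_counter_read_positive() check first, falling back to the expensive
_sum() only when the lazy value says we are near the limit:

	static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
	{
		struct percpu_counter *ocp = sk->sk_prot->orphan_count;
		int orphans = percpu_counter_read_positive(ocp);

		if (orphans << shift > sysctl_tcp_max_orphans) {
			orphans = percpu_counter_sum_positive(ocp);
			if (orphans << shift > sysctl_tcp_max_orphans)
				return true;
		}
		return false;
	}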




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17 13:00                                                                     ` Eric Dumazet
@ 2011-05-17 13:04                                                                       ` Tejun Heo
  2011-05-17 13:55                                                                         ` Christoph Lameter
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2011-05-17 13:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shaohua Li, linux-kernel, akpm, cl, npiggin

Hello,

On Tue, May 17, 2011 at 03:00:23PM +0200, Eric Dumazet wrote:
> Absolutely not. Its slow path yes, but not super-slow.

I was speaking in relative terms.  We're talking about a local-only fast
path vs. something which hits every percpu counter, likely causing
cache misses in many of them.  It's bound to be multiple orders of
magnitude heavier than the fast path.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17 13:04                                                                       ` Tejun Heo
@ 2011-05-17 13:55                                                                         ` Christoph Lameter
  2011-05-17 14:02                                                                           ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Lameter @ 2011-05-17 13:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, Shaohua Li, linux-kernel, akpm, npiggin

On Tue, 17 May 2011, Tejun Heo wrote:

> Hello,
>
> On Tue, May 17, 2011 at 03:00:23PM +0200, Eric Dumazet wrote:
> > Absolutely not. Its slow path yes, but not super-slow.
>
> I was speaking in relative terms.  We're talking about local only fast
> path vs. something which hits every percpu counter likely causing
> cache misses in many of them.  It's bound to be multiple orders of
> magnitude heavier than fast path.

Well, let's just adopt the system that the vm statistics use: bound the
error by time and batch, and allow the user to change the batch if more
accuracy is desired.

The _sum function is optional, and it should be explained that the
result *could* be better, but don't count on it.
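
A minimal sketch of the time-bounded part of that idea (a hypothetical
fold_work delayed work inside struct percpu_counter, armed at init time,
and a _sum() that folds the percpu deltas like the xchg-based version
earlier in the thread):

	static void percpu_counter_fold_fn(struct work_struct *work)
	{
		struct percpu_counter *fbc = container_of(to_delayed_work(work),
						struct percpu_counter, fold_work);

		__percpu_counter_sum(fbc);	/* error now bounded by time and batch */
		schedule_delayed_work(&fbc->fold_work, round_jiffies_relative(HZ));
	}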


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17 13:55                                                                         ` Christoph Lameter
@ 2011-05-17 14:02                                                                           ` Tejun Heo
  2011-05-17 14:38                                                                             ` Christoph Lameter
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2011-05-17 14:02 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, Shaohua Li, linux-kernel, akpm, npiggin

Hey,

On Tue, May 17, 2011 at 08:55:33AM -0500, Christoph Lameter wrote:
> Well lets just adopt the system that vm statistics use. Bound the error by
> time and batch and allow the user to change the batch if more accuracy is
> desired.

Yeah, that would be the better long term solution.

> The _sum function is optional and should it should be explained that the
> result *could* be better but dont count on it.

We can implement a force-flush for dire situations, similar to the
expedited RCU flush thing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17 14:02                                                                           ` Tejun Heo
@ 2011-05-17 14:38                                                                             ` Christoph Lameter
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Lameter @ 2011-05-17 14:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, Shaohua Li, linux-kernel, akpm, npiggin

On Tue, 17 May 2011, Tejun Heo wrote:

> We can implement force-flush for dire situations, similar to expedited
> RCU flush thing.

I would suggest that if the user wants accurate results then another
external form of synchronization needs to be provided. Maybe the counter
is always incremented when another lock has been taken.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch V3] percpu_counter: scalability works
  2011-05-17  9:50                                                               ` Tejun Heo
  2011-05-17 12:20                                                                 ` Eric Dumazet
@ 2011-05-18  1:00                                                                 ` Shaohua Li
  1 sibling, 0 replies; 52+ messages in thread
From: Shaohua Li @ 2011-05-18  1:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, linux-kernel, akpm, cl, npiggin

On Tue, 2011-05-17 at 17:50 +0800, Tejun Heo wrote:
> Hello, Eric.
> 
> On Tue, May 17, 2011 at 11:45:41AM +0200, Eric Dumazet wrote:
> > _sum() is a bit more precise than percpu_counter_read(), but making it
> > really precise means we have to stop concurrent activity, and we
> > never did that in the previous/current implementation.
> > 
> > We could add this (as Shaohua and I tried in various patches)
> > later, if needed, but nowhere in the kernel do we currently need that.
> > 
> > Even /proc/meminfo doesn't call _sum(&vm_committed_as) but the lazy
> > percpu_counter_read_positive() function...
> > 
> > Really, _sum() gives a good approximation of the counter, more precise
> > because of the percpu s32 folding, but no guarantee on deviation.
> 
> I'm not asking to make it more accurate but the initial patches from
> Shaohua made the _sum() result to deviate by @batch even when only one
> thread is doing _inc() due to the race window between adding to the
> main counter and resetting the local one.  All I'm asking is closing
> that hole and I'll be completely happy with it.  The lglock does that
> but it's ummm.... not a very nice way to do it.
> 
> Please forget about deviations from concurrent activities.  I don't
> care and nobody should.  All I'm asking is removing that any update
> having the possibility of that unnecessary spike and I don't think
> that would be too hard.
Hmm, we are once again talking about the deviation issue. I thought we
agreed the deviation issue should be resolved in the last discussion, but
it seems not...

I would suggest you guys seriously look at my v3 patches, which don't
use lglock but can solve the deviation issue and have no significant
overhead.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2011-05-18  1:00 UTC | newest]

Thread overview: 52+ messages
2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
2011-05-11  8:10 ` [patch v2 1/5] percpu_counter: fix code for 32bit systems for UP Shaohua Li
2011-05-11  8:10 ` [patch v2 2/5] lglock: convert it to work with dynamically allocated structure Shaohua Li
2011-05-11  8:10 ` [patch v2 3/5] percpu_counter: use lglock to protect percpu data Shaohua Li
2011-05-11  8:10 ` [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP Shaohua Li
2011-05-11  9:34   ` Andrew Morton
2011-05-12  2:40     ` Shaohua Li
2011-05-11  8:10 ` [patch v2 5/5] percpu_counter: preemptless __per_cpu_counter_add Shaohua Li
2011-05-11  9:28 ` [patch v2 0/5] percpu_counter: bug fix and enhancement Tejun Heo
2011-05-12  2:48   ` Shaohua Li
2011-05-12  8:21     ` Tejun Heo
2011-05-12  8:55       ` Shaohua Li
2011-05-12  8:59         ` Tejun Heo
2011-05-12  9:02           ` Eric Dumazet
2011-05-12  9:03             ` Eric Dumazet
2011-05-12  9:05             ` Tejun Heo
2011-05-13  3:09               ` Shaohua Li
2011-05-13  4:37               ` Shaohua Li
2011-05-13  5:20                 ` Eric Dumazet
2011-05-13  5:28                   ` Shaohua Li
2011-05-13  6:34                     ` Eric Dumazet
2011-05-13  7:33                       ` Shaohua Li
2011-05-13 14:51                       ` [patch] percpu_counter: scalability works Eric Dumazet
2011-05-13 15:39                         ` Eric Dumazet
2011-05-13 16:35                           ` [patch V2] " Eric Dumazet
2011-05-13 16:46                             ` Eric Dumazet
2011-05-13 22:03                               ` [patch V3] " Eric Dumazet
2011-05-16  0:58                                 ` Shaohua Li
2011-05-16  6:11                                   ` Eric Dumazet
2011-05-16  6:37                                     ` Shaohua Li
2011-05-16  6:55                                       ` Eric Dumazet
2011-05-16  7:15                                         ` Shaohua Li
2011-05-16  7:44                                           ` Eric Dumazet
2011-05-16  8:34                                             ` Shaohua Li
2011-05-16  9:35                                               ` Eric Dumazet
2011-05-16 14:22                                                 ` Eric Dumazet
2011-05-17  0:55                                                   ` Shaohua Li
2011-05-17  4:56                                                     ` Eric Dumazet
2011-05-17  5:22                                                       ` Shaohua Li
2011-05-17  9:01                                                         ` Eric Dumazet
2011-05-17  9:11                                                           ` Tejun Heo
2011-05-17  9:45                                                             ` Eric Dumazet
2011-05-17  9:50                                                               ` Tejun Heo
2011-05-17 12:20                                                                 ` Eric Dumazet
2011-05-17 12:45                                                                   ` Tejun Heo
2011-05-17 13:00                                                                     ` Eric Dumazet
2011-05-17 13:04                                                                       ` Tejun Heo
2011-05-17 13:55                                                                         ` Christoph Lameter
2011-05-17 14:02                                                                           ` Tejun Heo
2011-05-17 14:38                                                                             ` Christoph Lameter
2011-05-18  1:00                                                                 ` Shaohua Li
2011-05-12 14:38   ` [patch v2 0/5] percpu_counter: bug fix and enhancement Christoph Lameter
