On 09/05/2013 09:31 AM, Ingo Molnar wrote:
> * Waiman Long wrote:
>
>
>> The latest tty patches did work. The tty related spinlock contention
>> is now completely gone. The short workload can now reach over 8M JPM
>> which is the highest I have ever seen.
>>
>> The perf profile was:
>>
>>    5.85%  reaim  reaim              [.] mul_short
>>    4.87%  reaim  [kernel.kallsyms]  [k] ebitmap_get_bit
>>    4.72%  reaim  reaim              [.] mul_int
>>    4.71%  reaim  reaim              [.] mul_long
>>    2.67%  reaim  libc-2.12.so       [.] __random_r
>>    2.64%  reaim  [kernel.kallsyms]  [k] lockref_get_not_zero
>>    1.58%  reaim  [kernel.kallsyms]  [k] copy_user_generic_string
>>    1.48%  reaim  [kernel.kallsyms]  [k] mls_level_isvalid
>>    1.35%  reaim  [kernel.kallsyms]  [k] find_next_bit
> 6%+ spent in ebitmap_get_bit() and mls_level_isvalid() looks like
> something worth optimizing.
>
> Is that called very often, or is it perhaps cache-bouncing for some
> reason?

The high cycle count is due more to an inefficient algorithm in the
mls_level_isvalid() function than to cacheline contention in the code.
The attached patch should address this problem. It is in linux-next and
hopefully will be merged in 3.12.

-Longman
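
The attached patch is not reproduced here. As a purely illustrative sketch
of how an algorithmic choice (rather than cacheline contention) can produce
a profile like the one above, assume the validity check boils down to a
bitmap subset test. That is an assumption, not something the mail states.
The flat `unsigned long` bitmaps and the names `bitmap_test`,
`subset_bitwise` and `subset_wordwise` are hypothetical and are not the
kernel's ebitmap API. The first version does one lookup per bit; the second
compares a whole word at a time.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: test a single bit of a flat bitmap. */
static bool bitmap_test(const unsigned long *map, size_t bit)
{
    return (map[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1UL;
}

/* Per-bit variant: two lookups for every bit position in the map. */
static bool subset_bitwise(const unsigned long *sub,
                           const unsigned long *super, size_t nbits)
{
    for (size_t bit = 0; bit < nbits; bit++)
        if (bitmap_test(sub, bit) && !bitmap_test(super, bit))
            return false;
    return true;
}

/*
 * Word-at-a-time variant: any bit set in sub but clear in super shows up
 * in (sub & ~super), so one AND-NOT covers WORD_BITS positions at once.
 * For simplicity this sketch requires nbits to be a multiple of WORD_BITS.
 */
static bool subset_wordwise(const unsigned long *sub,
                            const unsigned long *super, size_t nbits)
{
    assert(nbits % WORD_BITS == 0);
    for (size_t i = 0; i < nbits / WORD_BITS; i++)
        if (sub[i] & ~super[i])
            return false;
    return true;
}
```

Both functions return the same answer for the same input; the difference
is only in how many operations are needed to reach it, which is the kind
of cost that shows up as cycles in a profile without any lock or cacheline
being contended.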