On Aug 14, 2016 10:00 PM, "Dave Chinner" <david@fromorbit.com> wrote:
>
> > What does it say if you annotate that _raw_spin_unlock_irqrestore() function?
> ....
>        │
>        │    Disassembly of section load0:
>        │
>        │    ffffffff81e628b0 <load0>:
>        │      nop
>        │      push   %rbp
>        │      mov    %rsp,%rbp
>        │      movb   $0x0,(%rdi)
>        │      nop
>        │      mov    %rsi,%rdi
>        │      push   %rdi
>        │      popfq
>  99.35 │      nop

Yeah, that's a good disassembly of a non-debug spin unlock, and the symbols are fine, but the profile is not valid. That's an interrupt point, right after the popf that enables interrupts again.

I don't know why 'perf' isn't working on your machine, but it clearly isn't.

Has it ever worked on that machine? What CPU is it? Are you running in some virtualized environment without performance counters, perhaps?

It's not actually the unlock that is expensive, and there is no contention on the lock (if there had been, the numbers would have been entirely different for the debug case, which makes locking an order of magnitude more expensive). The cost of everything that happened while interrupts were disabled is simply charged to the instruction right after they were enabled again.

             Linus