* [PATCH] locking/local_lock: Make the empty local_lock_*() function a macro.
@ 2022-01-05 20:26 Sebastian Andrzej Siewior
  2022-01-06  3:34 ` Waiman Long
  0 siblings, 1 reply; 3+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-01-05 20:26 UTC
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Thomas Gleixner, Sebastian Andrzej Siewior

It has been said that local_lock() does not add any overhead compared to
preempt_disable() in a !LOCKDEP configuration. A microbenchmark showed
an unexpected result, which can be traced back to the fact that
local_lock() was not entirely optimized away.

In the !LOCKDEP configuration local_lock_acquire() is an empty static
inline function. On x86 the this_cpu_ptr() argument of that function is
fully evaluated, leading to additional mov+add instructions which are
neither needed nor used.

Replace the empty static inline functions with macros. The typecheck()
macro ensures that the argument is of the proper type, while the
resulting disassembly shows no traces of this_cpu_ptr().
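To make the difference concrete, here is a small userspace sketch (not kernel code). It assumes a GNU C compiler (GCC or Clang) for statement expressions and __typeof__. typecheck() follows the kernel's include/linux/typecheck.h; local_lock_t, fake_this_cpu_ptr(), the _fn/_macro variants and the counter are hypothetical stand-ins. The counter is a side effect that only makes evaluation observable. It models the leftover this_cpu_ptr() computation loosely, since in the kernel the mov+add has no visible effect and the compiler simply did not discard it.

```c
#include <assert.h> /* assert(), for callers checking the results */

/*
 * Userspace copy of the kernel's typecheck() (include/linux/typecheck.h).
 * Comparing pointers to two locals of the expected and the actual type
 * makes the compiler complain on a mismatch; __typeof__() does not
 * evaluate x.
 */
#define typecheck(type, x)                     \
	({                                     \
		type __dummy;                  \
		__typeof__(x) __dummy2;        \
		(void)(&__dummy == &__dummy2); \
		1;                             \
	})

/* Hypothetical stand-in for the kernel's local_lock_t. */
typedef struct {
	int unused;
} local_lock_t;

static local_lock_t demo_lock;
static int ptr_evals; /* how often the "this_cpu_ptr()" argument ran */

/*
 * Hypothetical stand-in for this_cpu_ptr(). The counter is a side
 * effect, so the compiler must keep the call wherever it is evaluated.
 */
static local_lock_t *fake_this_cpu_ptr(local_lock_t *l)
{
	ptr_evals++;
	return l;
}

/* Before the patch: an empty static inline function. */
static inline void local_lock_acquire_fn(local_lock_t *l)
{
	(void)l;
}

/* After the patch: a macro that only type-checks its argument. */
#define local_lock_acquire_macro(__ll) \
	do { typecheck(local_lock_t *, __ll); } while (0)

/* Returns 1: a function argument is evaluated before the call. */
int evals_with_inline(void)
{
	ptr_evals = 0;
	local_lock_acquire_fn(fake_this_cpu_ptr(&demo_lock));
	return ptr_evals;
}

/* Returns 0: the argument only appears inside __typeof__(). */
int evals_with_macro(void)
{
	ptr_evals = 0;
	local_lock_acquire_macro(fake_this_cpu_ptr(&demo_lock));
	return ptr_evals;
}
```

Here evals_with_inline() returns 1 and evals_with_macro() returns 0. Passing, say, an int * to local_lock_acquire_macro() produces a compiler diagnostic about comparing distinct pointer types, which is the type safety the macro keeps from the old function prototype.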

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
On -rc8, size says:
|     text    data     bss      dec     filename
| 19656718 8681015 3764440 32102173 vmlinux.old
| 19656218 8681015 3764440 32101673 vmlinux.new

That is 500 bytes less text; not much, but still.

 include/linux/local_lock_internal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 975e33b793a77..6d635e8306d64 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -44,9 +44,9 @@ static inline void local_lock_debug_init(local_lock_t *l)
 }
 #else /* CONFIG_DEBUG_LOCK_ALLOC */
 # define LOCAL_LOCK_DEBUG_INIT(lockname)
-static inline void local_lock_acquire(local_lock_t *l) { }
-static inline void local_lock_release(local_lock_t *l) { }
-static inline void local_lock_debug_init(local_lock_t *l) { }
+# define local_lock_acquire(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
+# define local_lock_release(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
+# define local_lock_debug_init(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
 #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
 
 #define INIT_LOCAL_LOCK(lockname)	{ LOCAL_LOCK_DEBUG_INIT(lockname) }
-- 
2.34.1



* Re: [PATCH] locking/local_lock: Make the empty local_lock_*() function a macro.
  2022-01-05 20:26 [PATCH] locking/local_lock: Make the empty local_lock_*() function a macro Sebastian Andrzej Siewior
@ 2022-01-06  3:34 ` Waiman Long
  2022-01-10  7:43   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 3+ messages in thread
From: Waiman Long @ 2022-01-06  3:34 UTC
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner

On 1/5/22 15:26, Sebastian Andrzej Siewior wrote:
> It has been said that local_lock() does not add any overhead compared to
> preempt_disable() in a !LOCKDEP configuration. A microbenchmark showed
> an unexpected result which can be reduced to the fact that local_lock()
> was not entirely optimized away.
> In the !LOCKDEP configuration local_lock_acquire() is an empty static
> inline function. On x86 the this_cpu_ptr() argument of that function is
> fully evaluated leading to an additional mov+add instructions which are
> not needed and not used.
>
> Replace the static inline function with a macro. The typecheck() macro
> ensures that the argument is of proper type while the resulting
> disassembly shows no traces of this_cpu_ptr().
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> [...]

I tried out this patch and it does indeed reduce the object size of
functions that use local_lock(). However, the extra code isn't an
additional mov+add.

Using folio_add_lru() as an example,

Without the patch:

466        local_lock(&lru_pvecs.lock);
    0x00000000000032ee <+14>:    mov    $0x1,%edi
    0x00000000000032f3 <+19>:    callq  0x32f8 <folio_add_lru+24>
    0x00000000000032f8 <+24>:    callq  0x32fd <folio_add_lru+29>

With the patch:

466             local_lock(&lru_pvecs.lock);
    0x00000000000032ae <+14>:    mov    $0x1,%edi
    0x00000000000032b3 <+19>:    callq  0x32b8 <folio_add_lru+24>

There is one fewer placeholder call for tracing. Maybe it depends on the
compiler and the exact config options.

Anyway,

Reviewed-by: Waiman Long <longman@redhat.com>



* Re: [PATCH] locking/local_lock: Make the empty local_lock_*() function a macro.
  2022-01-06  3:34 ` Waiman Long
@ 2022-01-10  7:43   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 3+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-01-10  7:43 UTC
  To: Waiman Long
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner

On 2022-01-05 22:34:31 [-0500], Waiman Long wrote:
> 
> I try out this patch and it indeed helps to reduce the object size of
> functions that use local_lock(). However, the extra code isn't an additional
> mov+add.
> 
> Using folio_add_lru() as an example,
> 
> Without the patch:
> 
> 466        local_lock(&lru_pvecs.lock);
>    0x00000000000032ee <+14>:    mov    $0x1,%edi
>    0x00000000000032f3 <+19>:    callq  0x32f8 <folio_add_lru+24>
>    0x00000000000032f8 <+24>:    callq  0x32fd <folio_add_lru+29>

The call here might be due to some debugging switches or compiler
optimisations. With no debug options and gcc-11 I get:
| # mm/swap.c:466:     local_lock(&lru_pvecs.lock);
|         movq    $lru_pvecs, %rbx        #, tmp135
|         movq    %rbx, %rax      # tmp135, tcp_ptr__
| #APP
| # 466 "mm/swap.c" 1
|         add %gs:this_cpu_off(%rip), %rax        # this_cpu_off, tcp_ptr__

so it is a mov of the per-CPU variable's address followed by an add of
the per-CPU offset.
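The mov+add pair is the address computation behind this_cpu_ptr() on x86: take the address of the per-CPU variable and add the running CPU's offset (read via %gs:this_cpu_off). A minimal userspace model of just that arithmetic, assuming a hypothetical layout where each CPU's copy sits in one array; struct demo_pvec, demo_percpu_area[], demo_cpu_offset[] and demo_this_cpu_ptr() are made up for illustration, and the CPU number is passed explicitly instead of being read through %gs.

```c
#include <assert.h> /* assert(), for callers checking the results */
#include <stddef.h> /* ptrdiff_t */

#define DEMO_NR_CPUS 4

/* Hypothetical per-CPU variable: one copy per CPU, laid out back to back. */
struct demo_pvec {
	int nr;
};

static struct demo_pvec demo_percpu_area[DEMO_NR_CPUS];

/*
 * Hypothetical per-CPU offset table, loosely modelled on the kernel's
 * __per_cpu_offset[]; on x86 the running CPU's entry is what
 * %gs:this_cpu_off yields.
 */
static const ptrdiff_t demo_cpu_offset[DEMO_NR_CPUS] = {
	(ptrdiff_t)(0 * sizeof(struct demo_pvec)),
	(ptrdiff_t)(1 * sizeof(struct demo_pvec)),
	(ptrdiff_t)(2 * sizeof(struct demo_pvec)),
	(ptrdiff_t)(3 * sizeof(struct demo_pvec)),
};

/* this_cpu_ptr()-style lookup of @var's copy for @cpu. */
struct demo_pvec *demo_this_cpu_ptr(struct demo_pvec *var, int cpu)
{
	char *p = (char *)var;     /* "mov $var, %reg": the variable's address */
	p += demo_cpu_offset[cpu]; /* "add this_cpu_off, %reg": the CPU's offset */
	return (struct demo_pvec *)p;
}
```

For example, demo_this_cpu_ptr(&demo_percpu_area[0], 2) yields &demo_percpu_area[2], i.e. CPU 2's copy. With the patch applied the result of this computation is no longer needed in !CONFIG_DEBUG_LOCK_ALLOC builds, so the compiler can drop both instructions.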

Sebastian

