linux-kernel.vger.kernel.org archive mirror
* [PATCH] [v2] smp: fix smp_call_function_single_async prototype
@ 2021-05-05 21:12 Arnd Bergmann
  2021-05-06  1:19 ` Huang, Ying
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Arnd Bergmann @ 2021-05-05 21:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Arnd Bergmann, Jens Axboe, Jian Cai, Guenter Roeck,
	Peter Zijlstra, Huang, Ying, Borislav Petkov, Eric Dumazet,
	Juergen Gross, Michael Ellerman, Thomas Gleixner,
	Nathan Chancellor, Nick Desaulniers, Ingo Molnar,
	Frederic Weisbecker, He Ying, Andrew Morton, Paul E. McKenney,
	clang-built-linux

From: Arnd Bergmann <arnd@arndb.de>

As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
call_single_data"), the smp code prefers 32-byte aligned call_single_data
objects for performance reasons, but the block layer includes an instance
of this structure in the main 'struct request' that is more sensitive
to size than to performance here, see 4ccafe032005 ("block: unalign
call_single_data in struct request").
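
After that change, the relevant member of 'struct request' looks roughly
like this (simplified, surrounding fields omitted; the exact layout varies
by kernel version):

        union {
                struct __call_single_data csd; /* natural (8-byte) alignment only */
                u64 fifo_time;
        };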

The result is a violation of the calling conventions that clang correctly
points out:

block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
                smp_call_function_single_async(cpu, &rq->csd);

It does seem that the usage of the call_single_data without cache line
alignment should still be allowed by the smp code, so just change the
function prototype so it accepts both, but leave the default alignment
unchanged for the other users. This seems better to me than adding
a local hack to shut up an otherwise correct warning in the caller.
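
For reference, the two names refer to the same structure; only the typedef
carries the extra alignment (simplified from include/linux/smp.h and
include/linux/smp_types.h of this era, details differ by version and config):

struct __call_single_data {
        struct __call_single_node node;
        smp_call_func_t func;
        void *info;
};

/* the typedef adds the 32-byte alignment used by most callers */
typedef struct __call_single_data call_single_data_t
        __aligned(sizeof(struct __call_single_data));

A parameter of type 'struct __call_single_data *' therefore accepts
pointers to both variants without a cast.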

Link: https://lore.kernel.org/linux-block/20210330230249.709221-1-jiancai@google.com/
Link: https://github.com/ClangBuiltLinux/linux/issues/1328
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Jian Cai <jiancai@google.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
v2: avoid adding other warnings by making the change more thorough
---
 include/linux/smp.h |  2 +-
 kernel/smp.c        | 26 +++++++++++++-------------
 kernel/up.c         |  2 +-
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 669e35c03be2..510519e8a1eb 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -53,7 +53,7 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask);
 
-int smp_call_function_single_async(int cpu, call_single_data_t *csd);
+int smp_call_function_single_async(int cpu, struct __call_single_data *csd);
 
 /*
  * Cpus stopping functions in panic. All have default weak definitions.
diff --git a/kernel/smp.c b/kernel/smp.c
index e21074900006..52bf159ec400 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -211,7 +211,7 @@ static u64 cfd_seq_inc(unsigned int src, unsigned int dst, unsigned int type)
 	} while (0)
 
 /* Record current CSD work for current CPU, NULL to erase. */
-static void __csd_lock_record(call_single_data_t *csd)
+static void __csd_lock_record(struct __call_single_data *csd)
 {
 	if (!csd) {
 		smp_mb(); /* NULL cur_csd after unlock. */
@@ -226,13 +226,13 @@ static void __csd_lock_record(call_single_data_t *csd)
 		  /* Or before unlock, as the case may be. */
 }
 
-static __always_inline void csd_lock_record(call_single_data_t *csd)
+static __always_inline void csd_lock_record(struct __call_single_data *csd)
 {
 	if (static_branch_unlikely(&csdlock_debug_enabled))
 		__csd_lock_record(csd);
 }
 
-static int csd_lock_wait_getcpu(call_single_data_t *csd)
+static int csd_lock_wait_getcpu(struct __call_single_data *csd)
 {
 	unsigned int csd_type;
 
@@ -282,7 +282,7 @@ static const char *csd_lock_get_type(unsigned int type)
 	return (type >= ARRAY_SIZE(seq_type)) ? "?" : seq_type[type];
 }
 
-static void csd_lock_print_extended(call_single_data_t *csd, int cpu)
+static void csd_lock_print_extended(struct __call_single_data *csd, int cpu)
 {
 	struct cfd_seq_local *seq = &per_cpu(cfd_seq_local, cpu);
 	unsigned int srccpu = csd->node.src;
@@ -321,7 +321,7 @@ static void csd_lock_print_extended(call_single_data_t *csd, int cpu)
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
  * so waiting on other types gets much less information.
  */
-static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
+static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *ts1, int *bug_id)
 {
 	int cpu = -1;
 	int cpux;
@@ -387,7 +387,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
  * previous function call. For multi-cpu calls its even more interesting
  * as we'll have to ensure no other cpu is observing our csd.
  */
-static void __csd_lock_wait(call_single_data_t *csd)
+static void __csd_lock_wait(struct __call_single_data *csd)
 {
 	int bug_id = 0;
 	u64 ts0, ts1;
@@ -401,7 +401,7 @@ static void __csd_lock_wait(call_single_data_t *csd)
 	smp_acquire__after_ctrl_dep();
 }
 
-static __always_inline void csd_lock_wait(call_single_data_t *csd)
+static __always_inline void csd_lock_wait(struct __call_single_data *csd)
 {
 	if (static_branch_unlikely(&csdlock_debug_enabled)) {
 		__csd_lock_wait(csd);
@@ -431,17 +431,17 @@ static void __smp_call_single_queue_debug(int cpu, struct llist_node *node)
 #else
 #define cfd_seq_store(var, src, dst, type)
 
-static void csd_lock_record(call_single_data_t *csd)
+static void csd_lock_record(struct __call_single_data *csd)
 {
 }
 
-static __always_inline void csd_lock_wait(call_single_data_t *csd)
+static __always_inline void csd_lock_wait(struct __call_single_data *csd)
 {
 	smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
 }
 #endif
 
-static __always_inline void csd_lock(call_single_data_t *csd)
+static __always_inline void csd_lock(struct __call_single_data *csd)
 {
 	csd_lock_wait(csd);
 	csd->node.u_flags |= CSD_FLAG_LOCK;
@@ -454,7 +454,7 @@ static __always_inline void csd_lock(call_single_data_t *csd)
 	smp_wmb();
 }
 
-static __always_inline void csd_unlock(call_single_data_t *csd)
+static __always_inline void csd_unlock(struct __call_single_data *csd)
 {
 	WARN_ON(!(csd->node.u_flags & CSD_FLAG_LOCK));
 
@@ -501,7 +501,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
  * for execution on the given CPU. data must already have
  * ->func, ->info, and ->flags set.
  */
-static int generic_exec_single(int cpu, call_single_data_t *csd)
+static int generic_exec_single(int cpu, struct __call_single_data *csd)
 {
 	if (cpu == smp_processor_id()) {
 		smp_call_func_t func = csd->func;
@@ -784,7 +784,7 @@ EXPORT_SYMBOL(smp_call_function_single);
  * NOTE: Be careful, there is unfortunately no current debugging facility to
  * validate the correctness of this serialization.
  */
-int smp_call_function_single_async(int cpu, call_single_data_t *csd)
+int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
 {
 	int err = 0;
 
diff --git a/kernel/up.c b/kernel/up.c
index df50828cc2f0..a38b8b095251 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -25,7 +25,7 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-int smp_call_function_single_async(int cpu, call_single_data_t *csd)
+int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
 {
 	unsigned long flags;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] [v2] smp: fix smp_call_function_single_async prototype
  2021-05-05 21:12 [PATCH] [v2] smp: fix smp_call_function_single_async prototype Arnd Bergmann
@ 2021-05-06  1:19 ` Huang, Ying
  2021-05-06  7:54   ` Arnd Bergmann
  2021-05-06 10:10 ` Peter Zijlstra
  2021-05-06 13:48 ` [tip: locking/urgent] smp: Fix " tip-bot2 for Arnd Bergmann
  2 siblings, 1 reply; 9+ messages in thread
From: Huang, Ying @ 2021-05-06  1:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, Arnd Bergmann, Jens Axboe, Jian Cai, Guenter Roeck,
	Peter Zijlstra, Borislav Petkov, Eric Dumazet, Juergen Gross,
	Michael Ellerman, Thomas Gleixner, Nathan Chancellor,
	Nick Desaulniers, Ingo Molnar, Frederic Weisbecker, He Ying,
	Andrew Morton, Paul E. McKenney, clang-built-linux

Arnd Bergmann <arnd@kernel.org> writes:

> From: Arnd Bergmann <arnd@arndb.de>
>
> As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
> call_single_data"), the smp code prefers 32-byte aligned call_single_data
> objects for performance reasons, but the block layer includes an instance
> of this structure in the main 'struct request' that is more sensitive
> to size than to performance here, see 4ccafe032005 ("block: unalign
> call_single_data in struct request").
>
> The result is a violation of the calling conventions that clang correctly
> points out:
>
> block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
>                 smp_call_function_single_async(cpu, &rq->csd);

Can this be silenced by

		smp_call_function_single_async(cpu, (call_single_data_t *)&rq->csd);

Best Regards,
Huang, Ying

[snip]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [v2] smp: fix smp_call_function_single_async prototype
  2021-05-06  1:19 ` Huang, Ying
@ 2021-05-06  7:54   ` Arnd Bergmann
  2021-05-06  8:14     ` Huang, Ying
  0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2021-05-06  7:54 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Linux Kernel Mailing List, Jens Axboe, Jian Cai, Guenter Roeck,
	Peter Zijlstra, Borislav Petkov, Eric Dumazet, Juergen Gross,
	Michael Ellerman, Thomas Gleixner, Nathan Chancellor,
	Nick Desaulniers, Ingo Molnar, Frederic Weisbecker, He Ying,
	Andrew Morton, Paul E. McKenney, clang-built-linux

On Thu, May 6, 2021 at 3:20 AM Huang, Ying <ying.huang@intel.com> wrote:
>
> Arnd Bergmann <arnd@kernel.org> writes:
>
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
> > call_single_data"), the smp code prefers 32-byte aligned call_single_data
> > objects for performance reasons, but the block layer includes an instance
> > of this structure in the main 'struct request' that is more sensitive
> > to size than to performance here, see 4ccafe032005 ("block: unalign
> > call_single_data in struct request").
> >
> > The result is a violation of the calling conventions that clang correctly
> > points out:
> >
> > block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
> >                 smp_call_function_single_async(cpu, &rq->csd);
>
> Can this be silenced by
>
>                 smp_call_function_single_async(cpu, (call_single_data_t *)&rq->csd);

Probably, but casting from smaller alignment to larger alignment is undefined
behavior and I'd rather not go there in case this triggers some runtime
misbehavior or ubsan check in the future. Making the function accept a
pointer with the smaller alignment avoids getting into undefined behavior
and doesn't require a cast.
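
As a minimal standalone illustration (hypothetical types, not kernel code),
the cast is accepted by the compiler, but the converted pointer need not
satisfy the stricter alignment, which is undefined behavior per C11 6.3.2.3:

struct item { long a, b, c, d; };                       /* 8-byte alignment */
typedef struct item aligned_item __attribute__((aligned(32)));

extern void consume(aligned_item *p);                   /* wants 32-byte alignment */

struct outer {
        long pad;
        struct item member;                             /* at offset 8 */
};

void example(struct outer *o)
{
        /* compiles, but &o->member is typically not 32-byte aligned */
        consume((aligned_item *)&o->member);
}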

       Arnd

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [v2] smp: fix smp_call_function_single_async prototype
  2021-05-06  7:54   ` Arnd Bergmann
@ 2021-05-06  8:14     ` Huang, Ying
  2021-05-06  8:30       ` Arnd Bergmann
  0 siblings, 1 reply; 9+ messages in thread
From: Huang, Ying @ 2021-05-06  8:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, Jens Axboe, Jian Cai, Guenter Roeck,
	Peter Zijlstra, Borislav Petkov, Eric Dumazet, Juergen Gross,
	Michael Ellerman, Thomas Gleixner, Nathan Chancellor,
	Nick Desaulniers, Ingo Molnar, Frederic Weisbecker, He Ying,
	Andrew Morton, Paul E. McKenney, clang-built-linux

Arnd Bergmann <arnd@kernel.org> writes:

> On Thu, May 6, 2021 at 3:20 AM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Arnd Bergmann <arnd@kernel.org> writes:
>>
>> > From: Arnd Bergmann <arnd@arndb.de>
>> >
>> > As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
>> > call_single_data"), the smp code prefers 32-byte aligned call_single_data
>> > objects for performance reasons, but the block layer includes an instance
>> > of this structure in the main 'struct request' that is more sensitive
>> > to size than to performance here, see 4ccafe032005 ("block: unalign
>> > call_single_data in struct request").
>> >
>> > The result is a violation of the calling conventions that clang correctly
>> > points out:
>> >
>> > block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
>> >                 smp_call_function_single_async(cpu, &rq->csd);
>>
>> Can this be silenced by
>>
>>                 smp_call_function_single_async(cpu, (call_single_data_t *)&rq->csd);
>
> Probably, but casting from smaller alignment to larger alignment is undefined
> behavior

We cannot avoid type casts in the Linux kernel, such as container_of(); is
there some difference here?

> and I'd rather not go there in case this triggers some runtime
> misbehavior or ubsan check in the future. Making the function accept a
> pointer with the smaller alignment avoids getting into undefined behavior
> and doesn't require a cast.

In its raw form as above, this looks bad.  If we encapsulate it, it may
look better, for example,

static inline int __smp_call_function_single_async(int cpu, struct __call_single_data *csd)
{
        return smp_call_function_single_async(cpu, (call_single_data_t *)csd);
}

Then, we can do

        __smp_call_function_single_async(cpu, &rq->csd);

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [v2] smp: fix smp_call_function_single_async prototype
  2021-05-06  8:14     ` Huang, Ying
@ 2021-05-06  8:30       ` Arnd Bergmann
  2021-05-06 12:03         ` Huang, Ying
  0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2021-05-06  8:30 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Linux Kernel Mailing List, Jens Axboe, Jian Cai, Guenter Roeck,
	Peter Zijlstra, Borislav Petkov, Eric Dumazet, Juergen Gross,
	Michael Ellerman, Thomas Gleixner, Nathan Chancellor,
	Nick Desaulniers, Ingo Molnar, Frederic Weisbecker, He Ying,
	Andrew Morton, Paul E. McKenney, clang-built-linux

On Thu, May 6, 2021 at 10:14 AM Huang, Ying <ying.huang@intel.com> wrote:
>
> Arnd Bergmann <arnd@kernel.org> writes:
>
> > On Thu, May 6, 2021 at 3:20 AM Huang, Ying <ying.huang@intel.com> wrote:
> >>
> >> Arnd Bergmann <arnd@kernel.org> writes:
> >>
> >> > From: Arnd Bergmann <arnd@arndb.de>
> >> >
> >> > As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
> >> > call_single_data"), the smp code prefers 32-byte aligned call_single_data
> >> > objects for performance reasons, but the block layer includes an instance
> >> > of this structure in the main 'struct request' that is more sensitive
> >> > to size than to performance here, see 4ccafe032005 ("block: unalign
> >> > call_single_data in struct request").
> >> >
> >> > The result is a violation of the calling conventions that clang correctly
> >> > points out:
> >> >
> >> > block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
> >> >                 smp_call_function_single_async(cpu, &rq->csd);
> >>
> >> Can this be silenced by
> >>
> >>                 smp_call_function_single_async(cpu, (call_single_data_t *)&rq->csd);
> >
> > Probably, but casting from smaller alignment to larger alignment is undefined
> > behavior
>
> We cannot avoid type casts in the Linux kernel, such as container_of(); is
> there some difference here?

container_of() does not cause any alignment problems. Assuming the outer
structure is aligned correctly, then the inner structure also is.

> > and I'd rather not go there in case this triggers some runtime
> > misbehavior or ubsan check in the future. Making the function accept a
> > pointer with the smaller alignment avoids getting into undefined behavior
> > and doesn't require a cast.
>
> In its raw form as above, this looks bad.  If we encapsulate it, it may
> look better, for example,
>
> static inline int __smp_call_function_single_async(int cpu, struct __call_single_data *csd)
> {
>         smp_call_function_single_async(cpu, (call_single_data_t *)csd);
> }
>
> Then, we can do
>
>         __smp_call_function_single_async(cpu, &rq->csd);

Same problem, it's still calling a function that expects stricter alignment.
It would work if we do it the other way around though:

static inline int smp_call_function_single_async(int cpu, call_single_data_t *csd)
{
        return __smp_call_function_single_async(cpu, csd);
}

That should even work without the cast.
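
Spelled out as a sketch (illustrative only, not necessarily what was merged),
the direction above would be:

int __smp_call_function_single_async(int cpu, struct __call_single_data *csd);

static inline int smp_call_function_single_async(int cpu, call_single_data_t *csd)
{
        /* call_single_data_t is the same struct with stricter alignment,
         * so the pointer converts implicitly to the less aligned type */
        return __smp_call_function_single_async(cpu, csd);
}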

        Arnd

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [v2] smp: fix smp_call_function_single_async prototype
  2021-05-05 21:12 [PATCH] [v2] smp: fix smp_call_function_single_async prototype Arnd Bergmann
  2021-05-06  1:19 ` Huang, Ying
@ 2021-05-06 10:10 ` Peter Zijlstra
  2021-05-06 13:48 ` [tip: locking/urgent] smp: Fix " tip-bot2 for Arnd Bergmann
  2 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2021-05-06 10:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, Arnd Bergmann, Jens Axboe, Jian Cai, Guenter Roeck,
	Huang, Ying, Borislav Petkov, Eric Dumazet, Juergen Gross,
	Michael Ellerman, Thomas Gleixner, Nathan Chancellor,
	Nick Desaulniers, Ingo Molnar, Frederic Weisbecker, He Ying,
	Andrew Morton, Paul E. McKenney, clang-built-linux

On Wed, May 05, 2021 at 11:12:42PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
> call_single_data"), the smp code prefers 32-byte aligned call_single_data
> objects for performance reasons, but the block layer includes an instance
> of this structure in the main 'struct request' that is more sensitive
> to size than to performance here, see 4ccafe032005 ("block: unalign
> call_single_data in struct request").
> 
> The result is a violation of the calling conventions that clang correctly
> points out:
> 
> block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
>                 smp_call_function_single_async(cpu, &rq->csd);
> 
> It does seem that the usage of the call_single_data without cache line
> alignment should still be allowed by the smp code, so just change the
> function prototype so it accepts both, but leave the default alignment
> unchanged for the other users. This seems better to me than adding
> a local hack to shut up an otherwise correct warning in the caller.
> 
> Link: https://lore.kernel.org/linux-block/20210330230249.709221-1-jiancai@google.com/
> Link: https://github.com/ClangBuiltLinux/linux/issues/1328
> Acked-by: Jens Axboe <axboe@kernel.dk>
> Cc: Jian Cai <jiancai@google.com>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Huang, Ying" <ying.huang@intel.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [v2] smp: fix smp_call_function_single_async prototype
  2021-05-06  8:30       ` Arnd Bergmann
@ 2021-05-06 12:03         ` Huang, Ying
  2021-05-06 14:30           ` Arnd Bergmann
  0 siblings, 1 reply; 9+ messages in thread
From: Huang, Ying @ 2021-05-06 12:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, Jens Axboe, Jian Cai, Guenter Roeck,
	Peter Zijlstra, Borislav Petkov, Eric Dumazet, Juergen Gross,
	Michael Ellerman, Thomas Gleixner, Nathan Chancellor,
	Nick Desaulniers, Ingo Molnar, Frederic Weisbecker, He Ying,
	Andrew Morton, Paul E. McKenney, clang-built-linux

Arnd Bergmann <arnd@kernel.org> writes:

> On Thu, May 6, 2021 at 10:14 AM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Arnd Bergmann <arnd@kernel.org> writes:
>>
>> > On Thu, May 6, 2021 at 3:20 AM Huang, Ying <ying.huang@intel.com> wrote:
>> >>
>> >> Arnd Bergmann <arnd@kernel.org> writes:
>> >>
>> >> > From: Arnd Bergmann <arnd@arndb.de>
>> >> >
>> >> > As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
>> >> > call_single_data"), the smp code prefers 32-byte aligned call_single_data
>> >> > objects for performance reasons, but the block layer includes an instance
>> >> > of this structure in the main 'struct request' that is more sensitive
>> >> > to size than to performance here, see 4ccafe032005 ("block: unalign
>> >> > call_single_data in struct request").
>> >> >
>> >> > The result is a violation of the calling conventions that clang correctly
>> >> > points out:
>> >> >
>> >> > block/blk-mq.c:630:39: warning: passing 8-byte aligned argument
>> >> > to 32-byte aligned parameter 2 of
>> >> > 'smp_call_function_single_async' may result in an unaligned
>> >> > pointer access [-Walign-mismatch]
>> >> >                 smp_call_function_single_async(cpu, &rq->csd);
>> >>
>> >> Can this be silenced by
>> >>
>> >>                 smp_call_function_single_async(cpu, (call_single_data_t *)&rq->csd);
>> >
>> > Probably, but casting from smaller alignment to larger alignment is undefined
>> > behavior
>>
>> We cannot avoid type casts in the Linux kernel, such as container_of(); is
>> there some difference here?
>
> container_of() does not cause any alignment problems. Assuming the outer
> structure is aligned correctly, then the inner structure also is.

So you think that the compiler may generate different code depending on
the data structure alignment (8 vs. 32 here)?  I think that it doesn't
on x86.  Do you know whether it does on any architecture?  But I understand
that this is possible, at least in theory.

>> > and I'd rather not go there in case this triggers some runtime
>> > misbehavior or ubsan check in the future. Making the function accept a
>> > pointer with the smaller alignment avoids getting into undefined behavior
>> > and doesn't require a cast.
>>
>> In its raw form as above, this looks bad.  If we encapsulate it, it may
>> look better, for example,
>>
>> static inline int __smp_call_function_single_async(int cpu, struct __call_single_data *csd)
>> {
>>         return smp_call_function_single_async(cpu, (call_single_data_t *)csd);
>> }
>>
>> Then, we can do
>>
>>         __smp_call_function_single_async(cpu, &rq->csd);
>
> Same problem, it's still calling a function that expects stricter alignment.
> It would work if we do it the other way around though:
>
> static inline int smp_call_function_single_async(int cpu, call_single_data_t *csd)
> {
>         return __smp_call_function_single_async(cpu, csd);
> }
>
> That should even work without the cast.

Yes.  This looks good!

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: locking/urgent] smp: Fix smp_call_function_single_async prototype
  2021-05-05 21:12 [PATCH] [v2] smp: fix smp_call_function_single_async prototype Arnd Bergmann
  2021-05-06  1:19 ` Huang, Ying
  2021-05-06 10:10 ` Peter Zijlstra
@ 2021-05-06 13:48 ` tip-bot2 for Arnd Bergmann
  2 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Arnd Bergmann @ 2021-05-06 13:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Arnd Bergmann, Peter Zijlstra (Intel), Jens Axboe, x86, linux-kernel

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     1139aeb1c521eb4a050920ce6c64c36c4f2a3ab7
Gitweb:        https://git.kernel.org/tip/1139aeb1c521eb4a050920ce6c64c36c4f2a3ab7
Author:        Arnd Bergmann <arnd@arndb.de>
AuthorDate:    Wed, 05 May 2021 23:12:42 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 06 May 2021 15:33:49 +02:00

smp: Fix smp_call_function_single_async prototype

As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
call_single_data"), the smp code prefers 32-byte aligned call_single_data
objects for performance reasons, but the block layer includes an instance
of this structure in the main 'struct request' that is more sensitive
to size than to performance here, see 4ccafe032005 ("block: unalign
call_single_data in struct request").

The result is a violation of the calling conventions that clang correctly
points out:

block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
                smp_call_function_single_async(cpu, &rq->csd);

It does seem that the usage of the call_single_data without cache line
alignment should still be allowed by the smp code, so just change the
function prototype so it accepts both, but leave the default alignment
unchanged for the other users. This seems better to me than adding
a local hack to shut up an otherwise correct warning in the caller.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.kernel.org/r/20210505211300.3174456-1-arnd@kernel.org
---
 include/linux/smp.h |  2 +-
 kernel/smp.c        | 26 +++++++++++++-------------
 kernel/up.c         |  2 +-
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 84a0b48..f0d3ef6 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -53,7 +53,7 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
 void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 			   void *info, bool wait, const struct cpumask *mask);
 
-int smp_call_function_single_async(int cpu, call_single_data_t *csd);
+int smp_call_function_single_async(int cpu, struct __call_single_data *csd);
 
 /*
  * Call a function on all processors
diff --git a/kernel/smp.c b/kernel/smp.c
index e210749..52bf159 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -211,7 +211,7 @@ static u64 cfd_seq_inc(unsigned int src, unsigned int dst, unsigned int type)
 	} while (0)
 
 /* Record current CSD work for current CPU, NULL to erase. */
-static void __csd_lock_record(call_single_data_t *csd)
+static void __csd_lock_record(struct __call_single_data *csd)
 {
 	if (!csd) {
 		smp_mb(); /* NULL cur_csd after unlock. */
@@ -226,13 +226,13 @@ static void __csd_lock_record(call_single_data_t *csd)
 		  /* Or before unlock, as the case may be. */
 }
 
-static __always_inline void csd_lock_record(call_single_data_t *csd)
+static __always_inline void csd_lock_record(struct __call_single_data *csd)
 {
 	if (static_branch_unlikely(&csdlock_debug_enabled))
 		__csd_lock_record(csd);
 }
 
-static int csd_lock_wait_getcpu(call_single_data_t *csd)
+static int csd_lock_wait_getcpu(struct __call_single_data *csd)
 {
 	unsigned int csd_type;
 
@@ -282,7 +282,7 @@ static const char *csd_lock_get_type(unsigned int type)
 	return (type >= ARRAY_SIZE(seq_type)) ? "?" : seq_type[type];
 }
 
-static void csd_lock_print_extended(call_single_data_t *csd, int cpu)
+static void csd_lock_print_extended(struct __call_single_data *csd, int cpu)
 {
 	struct cfd_seq_local *seq = &per_cpu(cfd_seq_local, cpu);
 	unsigned int srccpu = csd->node.src;
@@ -321,7 +321,7 @@ static void csd_lock_print_extended(call_single_data_t *csd, int cpu)
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
  * so waiting on other types gets much less information.
  */
-static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
+static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *ts1, int *bug_id)
 {
 	int cpu = -1;
 	int cpux;
@@ -387,7 +387,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
  * previous function call. For multi-cpu calls its even more interesting
  * as we'll have to ensure no other cpu is observing our csd.
  */
-static void __csd_lock_wait(call_single_data_t *csd)
+static void __csd_lock_wait(struct __call_single_data *csd)
 {
 	int bug_id = 0;
 	u64 ts0, ts1;
@@ -401,7 +401,7 @@ static void __csd_lock_wait(call_single_data_t *csd)
 	smp_acquire__after_ctrl_dep();
 }
 
-static __always_inline void csd_lock_wait(call_single_data_t *csd)
+static __always_inline void csd_lock_wait(struct __call_single_data *csd)
 {
 	if (static_branch_unlikely(&csdlock_debug_enabled)) {
 		__csd_lock_wait(csd);
@@ -431,17 +431,17 @@ static void __smp_call_single_queue_debug(int cpu, struct llist_node *node)
 #else
 #define cfd_seq_store(var, src, dst, type)
 
-static void csd_lock_record(call_single_data_t *csd)
+static void csd_lock_record(struct __call_single_data *csd)
 {
 }
 
-static __always_inline void csd_lock_wait(call_single_data_t *csd)
+static __always_inline void csd_lock_wait(struct __call_single_data *csd)
 {
 	smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
 }
 #endif
 
-static __always_inline void csd_lock(call_single_data_t *csd)
+static __always_inline void csd_lock(struct __call_single_data *csd)
 {
 	csd_lock_wait(csd);
 	csd->node.u_flags |= CSD_FLAG_LOCK;
@@ -454,7 +454,7 @@ static __always_inline void csd_lock(call_single_data_t *csd)
 	smp_wmb();
 }
 
-static __always_inline void csd_unlock(call_single_data_t *csd)
+static __always_inline void csd_unlock(struct __call_single_data *csd)
 {
 	WARN_ON(!(csd->node.u_flags & CSD_FLAG_LOCK));
 
@@ -501,7 +501,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
  * for execution on the given CPU. data must already have
  * ->func, ->info, and ->flags set.
  */
-static int generic_exec_single(int cpu, call_single_data_t *csd)
+static int generic_exec_single(int cpu, struct __call_single_data *csd)
 {
 	if (cpu == smp_processor_id()) {
 		smp_call_func_t func = csd->func;
@@ -784,7 +784,7 @@ EXPORT_SYMBOL(smp_call_function_single);
  * NOTE: Be careful, there is unfortunately no current debugging facility to
  * validate the correctness of this serialization.
  */
-int smp_call_function_single_async(int cpu, call_single_data_t *csd)
+int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
 {
 	int err = 0;
 
diff --git a/kernel/up.c b/kernel/up.c
index bf20b4a..c732130 100644
--- a/kernel/up.c
+++ b/kernel/up.c
@@ -25,7 +25,7 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info,
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
-int smp_call_function_single_async(int cpu, call_single_data_t *csd)
+int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
 {
 	unsigned long flags;
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] [v2] smp: fix smp_call_function_single_async prototype
  2021-05-06 12:03         ` Huang, Ying
@ 2021-05-06 14:30           ` Arnd Bergmann
  0 siblings, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2021-05-06 14:30 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Linux Kernel Mailing List, Jens Axboe, Jian Cai, Guenter Roeck,
	Peter Zijlstra, Borislav Petkov, Eric Dumazet, Juergen Gross,
	Michael Ellerman, Thomas Gleixner, Nathan Chancellor,
	Nick Desaulniers, Ingo Molnar, Frederic Weisbecker, He Ying,
	Andrew Morton, Paul E. McKenney, clang-built-linux

On Thu, May 6, 2021 at 2:03 PM Huang, Ying <ying.huang@intel.com> wrote:
> Arnd Bergmann <arnd@kernel.org> writes:
> > On Thu, May 6, 2021 at 10:14 AM Huang, Ying <ying.huang@intel.com> wrote:
> >>
> >> We cannot avoid type casts in the Linux kernel, such as container_of(); is
> >> there some difference here?
> >
> > container_of() does not cause any alignment problems. Assuming the outer
> > structure is aligned correctly, then the inner structure also is.
>
> So you think that the compiler may generate different code depending on
> the data structure alignment (8 vs. 32 here)?  I think that it doesn't
> on x86.  Do you know whether it does on any architecture?  But I understand
> that this is possible, at least in theory.

It probably won't generate any different code because that would be silly, but
it's also not a good idea to rely on that. In theory the compiler might e.g.
construct an offset into the structure using a bitwise-or instruction instead of
an addition if the alignment tells it that the lower bits are always zero.
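
As a hypothetical standalone example of that kind of assumption (whether a
given compiler actually emits this is not guaranteed):

#include <stdint.h>

struct data {
        uint64_t a;
        uint64_t b;
} __attribute__((aligned(32)));

uint64_t *addr_of_b(struct data *p)
{
        /* offset of b is 8; since p is assumed 32-byte aligned, the
         * compiler may form the address as (p | 8) rather than (p + 8),
         * which would be wrong for a merely 8-byte aligned object */
        return &p->b;
}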

        Arnd

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, newest message: 2021-05-06 14:31 UTC

Thread overview: 9+ messages
2021-05-05 21:12 [PATCH] [v2] smp: fix smp_call_function_single_async prototype Arnd Bergmann
2021-05-06  1:19 ` Huang, Ying
2021-05-06  7:54   ` Arnd Bergmann
2021-05-06  8:14     ` Huang, Ying
2021-05-06  8:30       ` Arnd Bergmann
2021-05-06 12:03         ` Huang, Ying
2021-05-06 14:30           ` Arnd Bergmann
2021-05-06 10:10 ` Peter Zijlstra
2021-05-06 13:48 ` [tip: locking/urgent] smp: Fix " tip-bot2 for Arnd Bergmann
