All of lore.kernel.org
 help / color / mirror / Atom feed
* linux-next: tracebacks in workqueue.c/__flush_work()
@ 2019-02-02 22:20 Guenter Roeck
  2019-02-03  1:21 ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Guenter Roeck @ 2019-02-02 22:20 UTC (permalink / raw)
  To: linux-kernel, Tetsuo Handa, Tejun Heo

Commit "workqueue: Try to catch flush_work() without INIT_WORK()" added
a warning if flush_work() is called without worker function.

This results in the following tracebacks, typically observed during
system shutdown.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 101 at kernel/workqueue.c:3018 __flush_work+0x2a4/0x2e0
Modules linked in:
CPU: 0 PID: 101 Comm: umount Not tainted 5.0.0-rc4-next-20190201 #1
        fffffc0007dcbd18 0000000000000000 fffffc00003338a0 fffffc00003517d4
        fffffc00003517d4 fffffc0000e56c98 fffffc0000e56c98 fffffc0000ebc1d8
        fffffc0000ec0bd8 ffffffffa8024010 0000000000000bca 0000000000000000
        fffffc00003d3ea4 fffffc0000e56c98 fffffc0000e56c60 fffffc0000ebc1d8
        fffffc0000ec0bd8 0000000000000000 0000000000000001 0000000000000000
        fffffc000782d520 0000000000000000 fffffc000044ef50 fffffc0007c4b540
Trace:
[<fffffc00003338a0>] __warn+0x160/0x190
[<fffffc00003517d4>] __flush_work+0x2a4/0x2e0
[<fffffc00003517d4>] __flush_work+0x2a4/0x2e0
[<fffffc00003d3ea4>] lru_add_drain_all+0xe4/0x190
[<fffffc000044ef50>] shrink_dcache_sb+0x70/0xb0
[<fffffc0000478dc4>] invalidate_bh_lru+0x44/0x80
[<fffffc00003a94fc>] on_each_cpu_cond+0x5c/0x90
[<fffffc0000478d80>] invalidate_bh_lru+0x0/0x80
[<fffffc000047fe7c>] invalidate_bdev+0x3c/0x70
[<fffffc0000432ca8>] reconfigure_super+0x178/0x2c0
[<fffffc000045ee64>] ksys_umount+0x664/0x680
[<fffffc000045ee9c>] sys_umount+0x1c/0x30
[<fffffc00003115d4>] entSys+0xa4/0xc0
[<fffffc00003115d4>] entSys+0xa4/0xc0

---[ end trace 613cea34708701f1 ]---

The problem is seen with several (but not all) architectures. Affected
architectures/platforms are:
	alpha
	arm:versatilepb
	m68k
	mips, mips64 (boot from IDE drive or MMC, SMP disabled)
	parisc (nosmp builds)
	sparc, sparc64 (nosmp builds)

There may be others; several of my tests fail with build failures.

If/when it is seen, the problem is persistent.

Common denominator seems to be that SMP is disabled. It does appear that
for_each_cpu() ignores the mask for nosmp builds, but I don't really
understand why.

Guenter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: linux-next: tracebacks in workqueue.c/__flush_work()
  2019-02-02 22:20 linux-next: tracebacks in workqueue.c/__flush_work() Guenter Roeck
@ 2019-02-03  1:21 ` Tetsuo Handa
  2019-02-03 23:46   ` Rusty Russell
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2019-02-03  1:21 UTC (permalink / raw)
  To: Guenter Roeck, Chris Metcalf, Rusty Russell
  Cc: linux-kernel, Tejun Heo, linux-mm

(Adding Chris Metcalf and Rusty Russell.)

If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
commits listed below.

  Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
  expects that has_work is evaluated by for_each_cpu().

  Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
  assumes that for_each_cpu() does not need to evaluate has_work.

  Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
  expects that has_work is evaluated by for_each_cpu().

What should we do? Do we explicitly evaluate has_mask if NR_CPUS == 1 ?

 mm/swap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/swap.c b/mm/swap.c
index 4929bc1..5f07734 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -698,7 +698,8 @@ void lru_add_drain_all(void)
 	}
 
 	for_each_cpu(cpu, &has_work)
-		flush_work(&per_cpu(lru_add_drain_work, cpu));
+		if (NR_CPUS > 1 || cpumask_test_cpu(cpu, &has_work))
+			flush_work(&per_cpu(lru_add_drain_work, cpu));
 
 	mutex_unlock(&lock);
 }

On 2019/02/03 7:20, Guenter Roeck wrote:
> Commit "workqueue: Try to catch flush_work() without INIT_WORK()" added
> a warning if flush_work() is called without worker function.
> 
> This results in the following tracebacks, typically observed during
> system shutdown.
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 101 at kernel/workqueue.c:3018 __flush_work+0x2a4/0x2e0
> Modules linked in:
> CPU: 0 PID: 101 Comm: umount Not tainted 5.0.0-rc4-next-20190201 #1
>        fffffc0007dcbd18 0000000000000000 fffffc00003338a0 fffffc00003517d4
>        fffffc00003517d4 fffffc0000e56c98 fffffc0000e56c98 fffffc0000ebc1d8
>        fffffc0000ec0bd8 ffffffffa8024010 0000000000000bca 0000000000000000
>        fffffc00003d3ea4 fffffc0000e56c98 fffffc0000e56c60 fffffc0000ebc1d8
>        fffffc0000ec0bd8 0000000000000000 0000000000000001 0000000000000000
>        fffffc000782d520 0000000000000000 fffffc000044ef50 fffffc0007c4b540
> Trace:
> [<fffffc00003338a0>] __warn+0x160/0x190
> [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0
> [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0
> [<fffffc00003d3ea4>] lru_add_drain_all+0xe4/0x190
> [<fffffc000044ef50>] shrink_dcache_sb+0x70/0xb0
> [<fffffc0000478dc4>] invalidate_bh_lru+0x44/0x80
> [<fffffc00003a94fc>] on_each_cpu_cond+0x5c/0x90
> [<fffffc0000478d80>] invalidate_bh_lru+0x0/0x80
> [<fffffc000047fe7c>] invalidate_bdev+0x3c/0x70
> [<fffffc0000432ca8>] reconfigure_super+0x178/0x2c0
> [<fffffc000045ee64>] ksys_umount+0x664/0x680
> [<fffffc000045ee9c>] sys_umount+0x1c/0x30
> [<fffffc00003115d4>] entSys+0xa4/0xc0
> [<fffffc00003115d4>] entSys+0xa4/0xc0
> 
> ---[ end trace 613cea34708701f1 ]---
> 
> The problem is seen with several (but not all) architectures. Affected
> architectures/platforms are:
>     alpha
>     arm:versatilepb
>     m68k
>     mips, mips64 (boot from IDE drive or MMC, SMP disabled)
>     parisc (nosmp builds)
>     sparc, sparc64 (nosmp builds)
> 
> There may be others; several of my tests fail with build failures.
> 
> If/when it is seen, the problem is persistent.
> 
> Common denominator seems to be that SMP is disabled. It does appear that
> for_each_cpu() ignores the mask for nosmp builds, but I don't really
> understand why.
> 
> Guenter
> 


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: linux-next: tracebacks in workqueue.c/__flush_work()
  2019-02-03  1:21 ` Tetsuo Handa
@ 2019-02-03 23:46   ` Rusty Russell
  2019-02-06  6:31     ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2019-02-03 23:46 UTC (permalink / raw)
  To: Tetsuo Handa, Guenter Roeck, Chris Metcalf
  Cc: linux-kernel, Tejun Heo, linux-mm

Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> (Adding Chris Metcalf and Rusty Russell.)
>
> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
> commits listed below.
>
>   Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
>   expects that has_work is evaluated by for_each_cpu().
>
>   Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
>   assumes that for_each_cpu() does not need to evaluate has_work.
>
>   Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
>   expects that has_work is evaluated by for_each_cpu().
>
> What should we do? Do we explicitly evaluate has_mask if NR_CPUS == 1 ?

No, fix the API to be least-surprise.  Fix 2d3854a37e8b767a too.

Doing anything else would be horrible, IMHO.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: linux-next: tracebacks in workqueue.c/__flush_work()
  2019-02-03 23:46   ` Rusty Russell
@ 2019-02-06  6:31     ` Tetsuo Handa
  2019-02-06 14:36       ` Guenter Roeck
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2019-02-06  6:31 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Guenter Roeck, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm,
	linux-arch

(Adding linux-arch ML.)

Rusty Russell wrote:
> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> > (Adding Chris Metcalf and Rusty Russell.)
> >
> > If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
> > evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
> > previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
> > commits listed below.
> >
> >   Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
> >   expects that has_work is evaluated by for_each_cpu().
> >
> >   Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
> >   assumes that for_each_cpu() does not need to evaluate has_work.
> >
> >   Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
> >   expects that has_work is evaluated by for_each_cpu().
> >
> > What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ?
> 
> No, fix the API to be least-surprise.  Fix 2d3854a37e8b767a too.
> 
> Doing anything else would be horrible, IMHO.
> 

Fixing 2d3854a37e8b767a might involve subtle changes. If we do

----------
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 147bdec..1ec5321 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -129,7 +129,7 @@ static inline unsigned int cpumask_check(unsigned int cpu)
 	return cpu;
 }
 
-#if NR_CPUS == 1
+#if NR_CPUS == 1 && 0
 /* Uniprocessor.  Assume all masks are "1". */
 static inline unsigned int cpumask_first(const struct cpumask *srcp)
 {
diff --git a/lib/Makefile b/lib/Makefile
index e1b59da..da6f99c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -28,7 +28,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 
 lib-$(CONFIG_PRINTK) += dump_stack.o
 lib-$(CONFIG_MMU) += ioremap.o
-lib-$(CONFIG_SMP) += cpumask.o
+lib-y += cpumask.o
 
 lib-y	+= kobject.o klist.o
 obj-y	+= lockref.o
----------

then we get e.g. a build failure like below.

----------
arch/x86/kernel/cpu/cacheinfo.o: In function `_populate_cache_leaves':
cacheinfo.c:(.text+0xb20): undefined reference to `cpu_llc_shared_map'
cacheinfo.c:(.text+0xb48): undefined reference to `cpu_llc_shared_map'
cacheinfo.c:(.text+0xb64): undefined reference to `cpu_llc_shared_map'
make: *** [vmlinux] Error 1
----------

This build failure is caused due to the

  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);

line which cpu_llc_shared_mask() depends on is in arch/x86/kernel/smpboot.c
and the

  obj-$(CONFIG_SMP)               += smpboot.o

line is in arch/x86/kernel/Makefile . We could try

----------
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c4d1023..bf95da3 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -23,6 +23,10 @@
 
 #include "cpu.h"
 
+#ifndef CONFIG_SMP
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
+#endif
+
 #define LVL_1_INST	1
 #define LVL_1_DATA	2
 #define LVL_2		3
----------

or

----------
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c4d1023..b8a22b6 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -886,6 +886,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 	 * to derive shared_cpu_map.
 	 */
 	if (index == 3) {
+#ifdef CONFIG_SMP
 		for_each_cpu(i, cpu_llc_shared_mask(cpu)) {
 			this_cpu_ci = get_cpu_cacheinfo(i);
 			if (!this_cpu_ci->info_list)
@@ -898,6 +899,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 						&this_leaf->shared_cpu_map);
 			}
 		}
+#endif
 	} else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
 		unsigned int apicid, nshared, first, last;
 
----------

but I don't know whether this is a correct fix, for for_each_cpu() currently
always executes the loop because for_each_cpu() does not evaluate
cpu_llc_shared_mask(cpu) argument. But if cpu_llc_shared_mask(cpu) argument
is evaluated by for_each_cpu(), and given that nobody updates cpu_llc_shared_map
if CONFIG_SMP=n, I guess that this for_each_cpu() becomes a no-op loop. I can't
evaluate whether this change is safe, and there might be similar code in other
architectures.

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: linux-next: tracebacks in workqueue.c/__flush_work()
  2019-02-06  6:31     ` Tetsuo Handa
@ 2019-02-06 14:36       ` Guenter Roeck
  2019-02-06 14:57         ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Guenter Roeck @ 2019-02-06 14:36 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm,
	linux-arch

On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote:
> (Adding linux-arch ML.)
> 
> Rusty Russell wrote:
> > Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> > > (Adding Chris Metcalf and Rusty Russell.)
> > >
> > > If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
> > > evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
> > > previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
> > > commits listed below.
> > >
> > >   Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
> > >   expects that has_work is evaluated by for_each_cpu().
> > >
> > >   Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
> > >   assumes that for_each_cpu() does not need to evaluate has_work.
> > >
> > >   Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
> > >   expects that has_work is evaluated by for_each_cpu().
> > >
> > > What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ?
> > 
> > No, fix the API to be least-surprise.  Fix 2d3854a37e8b767a too.
> > 
> > Doing anything else would be horrible, IMHO.
> > 
> 
> Fixing 2d3854a37e8b767a might involve subtle changes. If we do
> 

Why not fix the macros ?

#define for_each_cpu(cpu, mask)                 \
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

does not really make sense since it does not evaluate mask.

#define for_each_cpu(cpu, mask)                 \
        for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++)

or something similar might do it.

Guenter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: linux-next: tracebacks in workqueue.c/__flush_work()
  2019-02-06 14:36       ` Guenter Roeck
@ 2019-02-06 14:57         ` Tetsuo Handa
  2019-02-06 16:23           ` Guenter Roeck
  0 siblings, 1 reply; 8+ messages in thread
From: Tetsuo Handa @ 2019-02-06 14:57 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm,
	linux-arch

On 2019/02/06 23:36, Guenter Roeck wrote:
> On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote:
>> (Adding linux-arch ML.)
>>
>> Rusty Russell wrote:
>>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
>>>> (Adding Chris Metcalf and Rusty Russell.)
>>>>
>>>> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
>>>> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
>>>> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
>>>> commits listed below.
>>>>
>>>>   Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
>>>>   expects that has_work is evaluated by for_each_cpu().
>>>>
>>>>   Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
>>>>   assumes that for_each_cpu() does not need to evaluate has_work.
>>>>
>>>>   Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
>>>>   expects that has_work is evaluated by for_each_cpu().
>>>>
>>>> What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ?
>>>
>>> No, fix the API to be least-surprise.  Fix 2d3854a37e8b767a too.
>>>
>>> Doing anything else would be horrible, IMHO.
>>>
>>
>> Fixing 2d3854a37e8b767a might involve subtle changes. If we do
>>
> 
> Why not fix the macros ?
> 
> #define for_each_cpu(cpu, mask)                 \
>         for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
> 
> does not really make sense since it does not evaluate mask.
> 
> #define for_each_cpu(cpu, mask)                 \
>         for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++)
> 
> or something similar might do it.

Fixing macros is fine, The problem is that "mask" becomes evaluated
which might be currently undefined or unassigned if CONFIG_SMP=n.
Evaluating "mask" generates expected behavior for lru_add_drain_all()
case. But there might be cases where evaluating "mask" generate
unexpected behavior/results.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: linux-next: tracebacks in workqueue.c/__flush_work()
  2019-02-06 14:57         ` Tetsuo Handa
@ 2019-02-06 16:23           ` Guenter Roeck
  2019-02-06 16:38             ` Tetsuo Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Guenter Roeck @ 2019-02-06 16:23 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm,
	linux-arch

On Wed, Feb 06, 2019 at 11:57:45PM +0900, Tetsuo Handa wrote:
> On 2019/02/06 23:36, Guenter Roeck wrote:
> > On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote:
> >> (Adding linux-arch ML.)
> >>
> >> Rusty Russell wrote:
> >>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> >>>> (Adding Chris Metcalf and Rusty Russell.)
> >>>>
> >>>> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
> >>>> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
> >>>> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
> >>>> commits listed below.
> >>>>
> >>>>   Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
> >>>>   expects that has_work is evaluated by for_each_cpu().
> >>>>
> >>>>   Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
> >>>>   assumes that for_each_cpu() does not need to evaluate has_work.
> >>>>
> >>>>   Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
> >>>>   expects that has_work is evaluated by for_each_cpu().
> >>>>
> >>>> What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ?
> >>>
> >>> No, fix the API to be least-surprise.  Fix 2d3854a37e8b767a too.
> >>>
> >>> Doing anything else would be horrible, IMHO.
> >>>
> >>
> >> Fixing 2d3854a37e8b767a might involve subtle changes. If we do
> >>
> > 
> > Why not fix the macros ?
> > 
> > #define for_each_cpu(cpu, mask)                 \
> >         for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
> > 
> > does not really make sense since it does not evaluate mask.
> > 
> > #define for_each_cpu(cpu, mask)                 \
> >         for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++)
> > 
> > or something similar might do it.
> 
> Fixing macros is fine, The problem is that "mask" becomes evaluated
> which might be currently undefined or unassigned if CONFIG_SMP=n.
> Evaluating "mask" generates expected behavior for lru_add_drain_all()
> case. But there might be cases where evaluating "mask" generate
> unexpected behavior/results.

Interesting notion. I would have assumed that passing a parameter
to a function or macro implies that this parameter may be used.

This makes me wonder - what is the point of ", (mask)" in the current
macros ? It doesn't make sense to me.

Anyway, I agree that fixing the macro might result in some failures.
However, I would argue that those failures would actually be bugs,
hidden by the buggy macros. But of course that it just my opinion.

Guenter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: linux-next: tracebacks in workqueue.c/__flush_work()
  2019-02-06 16:23           ` Guenter Roeck
@ 2019-02-06 16:38             ` Tetsuo Handa
  0 siblings, 0 replies; 8+ messages in thread
From: Tetsuo Handa @ 2019-02-06 16:38 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm,
	linux-arch

On 2019/02/07 1:23, Guenter Roeck wrote:
> On Wed, Feb 06, 2019 at 11:57:45PM +0900, Tetsuo Handa wrote:
>> On 2019/02/06 23:36, Guenter Roeck wrote:
>>> On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote:
>>>> (Adding linux-arch ML.)
>>>>
>>>> Rusty Russell wrote:
>>>>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
>>>>>> (Adding Chris Metcalf and Rusty Russell.)
>>>>>>
>>>>>> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not
>>>>>> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at
>>>>>> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three
>>>>>> commits listed below.
>>>>>>
>>>>>>   Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective")
>>>>>>   expects that has_work is evaluated by for_each_cpu().
>>>>>>
>>>>>>   Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything")
>>>>>>   assumes that for_each_cpu() does not need to evaluate has_work.
>>>>>>
>>>>>>   Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().")
>>>>>>   expects that has_work is evaluated by for_each_cpu().
>>>>>>
>>>>>> What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ?
>>>>>
>>>>> No, fix the API to be least-surprise.  Fix 2d3854a37e8b767a too.
>>>>>
>>>>> Doing anything else would be horrible, IMHO.
>>>>>
>>>>
>>>> Fixing 2d3854a37e8b767a might involve subtle changes. If we do
>>>>
>>>
>>> Why not fix the macros ?
>>>
>>> #define for_each_cpu(cpu, mask)                 \
>>>         for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
>>>
>>> does not really make sense since it does not evaluate mask.
>>>
>>> #define for_each_cpu(cpu, mask)                 \
>>>         for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++)
>>>
>>> or something similar might do it.
>>
>> Fixing macros is fine, The problem is that "mask" becomes evaluated
>> which might be currently undefined or unassigned if CONFIG_SMP=n.
>> Evaluating "mask" generates expected behavior for lru_add_drain_all()
>> case. But there might be cases where evaluating "mask" generate
>> unexpected behavior/results.
> 
> Interesting notion. I would have assumed that passing a parameter
> to a function or macro implies that this parameter may be used.
> 
> This makes me wonder - what is the point of ", (mask)" in the current
> macros ? It doesn't make sense to me.

I guess it is to avoid "unused argument" warning; but optimization
accepted passing even "undefined argument".

> 
> Anyway, I agree that fixing the macro might result in some failures.
> However, I would argue that those failures would actually be bugs,
> hidden by the buggy macros. But of course that it just my opinion.

Yes, they are bugs which should be fixed. But since suddenly changing
these macros might break something, I suggest temporarily managing at
lru_add_drain_all() side for now, and make sure we have enough period
at linux-next.git for testing changes to these macros.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-02-06 16:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-02 22:20 linux-next: tracebacks in workqueue.c/__flush_work() Guenter Roeck
2019-02-03  1:21 ` Tetsuo Handa
2019-02-03 23:46   ` Rusty Russell
2019-02-06  6:31     ` Tetsuo Handa
2019-02-06 14:36       ` Guenter Roeck
2019-02-06 14:57         ` Tetsuo Handa
2019-02-06 16:23           ` Guenter Roeck
2019-02-06 16:38             ` Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.