* linux-next: tracebacks in workqueue.c/__flush_work() @ 2019-02-02 22:20 Guenter Roeck 2019-02-03 1:21 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Guenter Roeck @ 2019-02-02 22:20 UTC (permalink / raw) To: linux-kernel, Tetsuo Handa, Tejun Heo Commit "workqueue: Try to catch flush_work() without INIT_WORK()" added a warning if flush_work() is called without worker function. This results in the following tracebacks, typically observed during system shutdown. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 101 at kernel/workqueue.c:3018 __flush_work+0x2a4/0x2e0 Modules linked in: CPU: 0 PID: 101 Comm: umount Not tainted 5.0.0-rc4-next-20190201 #1 fffffc0007dcbd18 0000000000000000 fffffc00003338a0 fffffc00003517d4 fffffc00003517d4 fffffc0000e56c98 fffffc0000e56c98 fffffc0000ebc1d8 fffffc0000ec0bd8 ffffffffa8024010 0000000000000bca 0000000000000000 fffffc00003d3ea4 fffffc0000e56c98 fffffc0000e56c60 fffffc0000ebc1d8 fffffc0000ec0bd8 0000000000000000 0000000000000001 0000000000000000 fffffc000782d520 0000000000000000 fffffc000044ef50 fffffc0007c4b540 Trace: [<fffffc00003338a0>] __warn+0x160/0x190 [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0 [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0 [<fffffc00003d3ea4>] lru_add_drain_all+0xe4/0x190 [<fffffc000044ef50>] shrink_dcache_sb+0x70/0xb0 [<fffffc0000478dc4>] invalidate_bh_lru+0x44/0x80 [<fffffc00003a94fc>] on_each_cpu_cond+0x5c/0x90 [<fffffc0000478d80>] invalidate_bh_lru+0x0/0x80 [<fffffc000047fe7c>] invalidate_bdev+0x3c/0x70 [<fffffc0000432ca8>] reconfigure_super+0x178/0x2c0 [<fffffc000045ee64>] ksys_umount+0x664/0x680 [<fffffc000045ee9c>] sys_umount+0x1c/0x30 [<fffffc00003115d4>] entSys+0xa4/0xc0 [<fffffc00003115d4>] entSys+0xa4/0xc0 ---[ end trace 613cea34708701f1 ]--- The problem is seen with several (but not all) architectures. Affected architectures/platforms are: alpha arm:versatilepb m68k mips, mips64 (boot from IDE drive or MMC, SMP disabled) parisc (nosmp builds) sparc, sparc64 (nosmp builds) There may be others; several of my tests fail with build failures. If/when it is seen, the problem is persistent. Common denominator seems to be that SMP is disabled. It does appear that for_each_cpu() ignores the mask for nosmp builds, but I don't really understand why. Guenter ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: linux-next: tracebacks in workqueue.c/__flush_work() 2019-02-02 22:20 linux-next: tracebacks in workqueue.c/__flush_work() Guenter Roeck @ 2019-02-03 1:21 ` Tetsuo Handa 2019-02-03 23:46 ` Rusty Russell 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2019-02-03 1:21 UTC (permalink / raw) To: Guenter Roeck, Chris Metcalf, Rusty Russell Cc: linux-kernel, Tejun Heo, linux-mm (Adding Chris Metcalf and Rusty Russell.) If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at previous for_each_online_cpu() loop. Guenter Roeck found a problem among three commits listed below. Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective") expects that has_work is evaluated by for_each_cpu(). Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything") assumes that for_each_cpu() does not need to evaluate has_work. Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") expects that has_work is evaluated by for_each_cpu(). What should we do? Do we explicitly evaluate has_mask if NR_CPUS == 1 ? mm/swap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/swap.c b/mm/swap.c index 4929bc1..5f07734 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -698,7 +698,8 @@ void lru_add_drain_all(void) } for_each_cpu(cpu, &has_work) - flush_work(&per_cpu(lru_add_drain_work, cpu)); + if (NR_CPUS > 1 || cpumask_test_cpu(cpu, &has_work)) + flush_work(&per_cpu(lru_add_drain_work, cpu)); mutex_unlock(&lock); } On 2019/02/03 7:20, Guenter Roeck wrote: > Commit "workqueue: Try to catch flush_work() without INIT_WORK()" added > a warning if flush_work() is called without worker function. > > This results in the following tracebacks, typically observed during > system shutdown. > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 101 at kernel/workqueue.c:3018 __flush_work+0x2a4/0x2e0 > Modules linked in: > CPU: 0 PID: 101 Comm: umount Not tainted 5.0.0-rc4-next-20190201 #1 > fffffc0007dcbd18 0000000000000000 fffffc00003338a0 fffffc00003517d4 > fffffc00003517d4 fffffc0000e56c98 fffffc0000e56c98 fffffc0000ebc1d8 > fffffc0000ec0bd8 ffffffffa8024010 0000000000000bca 0000000000000000 > fffffc00003d3ea4 fffffc0000e56c98 fffffc0000e56c60 fffffc0000ebc1d8 > fffffc0000ec0bd8 0000000000000000 0000000000000001 0000000000000000 > fffffc000782d520 0000000000000000 fffffc000044ef50 fffffc0007c4b540 > Trace: > [<fffffc00003338a0>] __warn+0x160/0x190 > [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0 > [<fffffc00003517d4>] __flush_work+0x2a4/0x2e0 > [<fffffc00003d3ea4>] lru_add_drain_all+0xe4/0x190 > [<fffffc000044ef50>] shrink_dcache_sb+0x70/0xb0 > [<fffffc0000478dc4>] invalidate_bh_lru+0x44/0x80 > [<fffffc00003a94fc>] on_each_cpu_cond+0x5c/0x90 > [<fffffc0000478d80>] invalidate_bh_lru+0x0/0x80 > [<fffffc000047fe7c>] invalidate_bdev+0x3c/0x70 > [<fffffc0000432ca8>] reconfigure_super+0x178/0x2c0 > [<fffffc000045ee64>] ksys_umount+0x664/0x680 > [<fffffc000045ee9c>] sys_umount+0x1c/0x30 > [<fffffc00003115d4>] entSys+0xa4/0xc0 > [<fffffc00003115d4>] entSys+0xa4/0xc0 > > ---[ end trace 613cea34708701f1 ]--- > > The problem is seen with several (but not all) architectures. Affected > architectures/platforms are: > alpha > arm:versatilepb > m68k > mips, mips64 (boot from IDE drive or MMC, SMP disabled) > parisc (nosmp builds) > sparc, sparc64 (nosmp builds) > > There may be others; several of my tests fail with build failures. > > If/when it is seen, the problem is persistent. > > Common denominator seems to be that SMP is disabled. It does appear that > for_each_cpu() ignores the mask for nosmp builds, but I don't really > understand why. > > Guenter > ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: linux-next: tracebacks in workqueue.c/__flush_work() 2019-02-03 1:21 ` Tetsuo Handa @ 2019-02-03 23:46 ` Rusty Russell 2019-02-06 6:31 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Rusty Russell @ 2019-02-03 23:46 UTC (permalink / raw) To: Tetsuo Handa, Guenter Roeck, Chris Metcalf Cc: linux-kernel, Tejun Heo, linux-mm Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > (Adding Chris Metcalf and Rusty Russell.) > > If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not > evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at > previous for_each_online_cpu() loop. Guenter Roeck found a problem among three > commits listed below. > > Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective") > expects that has_work is evaluated by for_each_cpu(). > > Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything") > assumes that for_each_cpu() does not need to evaluate has_work. > > Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") > expects that has_work is evaluated by for_each_cpu(). > > What should we do? Do we explicitly evaluate has_mask if NR_CPUS == 1 ? No, fix the API to be least-surprise. Fix 2d3854a37e8b767a too. Doing anything else would be horrible, IMHO. Cheers, Rusty. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: linux-next: tracebacks in workqueue.c/__flush_work() 2019-02-03 23:46 ` Rusty Russell @ 2019-02-06 6:31 ` Tetsuo Handa 2019-02-06 14:36 ` Guenter Roeck 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2019-02-06 6:31 UTC (permalink / raw) To: Rusty Russell Cc: Guenter Roeck, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm, linux-arch (Adding linux-arch ML.) Rusty Russell wrote: > Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > > (Adding Chris Metcalf and Rusty Russell.) > > > > If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not > > evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at > > previous for_each_online_cpu() loop. Guenter Roeck found a problem among three > > commits listed below. > > > > Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective") > > expects that has_work is evaluated by for_each_cpu(). > > > > Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything") > > assumes that for_each_cpu() does not need to evaluate has_work. > > > > Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") > > expects that has_work is evaluated by for_each_cpu(). > > > > What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ? > > No, fix the API to be least-surprise. Fix 2d3854a37e8b767a too. > > Doing anything else would be horrible, IMHO. > Fixing 2d3854a37e8b767a might involve subtle changes. If we do ---------- diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 147bdec..1ec5321 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -129,7 +129,7 @@ static inline unsigned int cpumask_check(unsigned int cpu) return cpu; } -#if NR_CPUS == 1 +#if NR_CPUS == 1 && 0 /* Uniprocessor. Assume all masks are "1". */ static inline unsigned int cpumask_first(const struct cpumask *srcp) { diff --git a/lib/Makefile b/lib/Makefile index e1b59da..da6f99c 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -28,7 +28,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \ lib-$(CONFIG_PRINTK) += dump_stack.o lib-$(CONFIG_MMU) += ioremap.o -lib-$(CONFIG_SMP) += cpumask.o +lib-y += cpumask.o lib-y += kobject.o klist.o obj-y += lockref.o ---------- then we get e.g. a build failure like below. ---------- arch/x86/kernel/cpu/cacheinfo.o: In function `_populate_cache_leaves': cacheinfo.c:(.text+0xb20): undefined reference to `cpu_llc_shared_map' cacheinfo.c:(.text+0xb48): undefined reference to `cpu_llc_shared_map' cacheinfo.c:(.text+0xb64): undefined reference to `cpu_llc_shared_map' make: *** [vmlinux] Error 1 ---------- This build failure is caused due to the DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map); line which cpu_llc_shared_mask() depends on is in arch/x86/kernel/smpboot.c and the obj-$(CONFIG_SMP) += smpboot.o line is in arch/x86/kernel/Makefile . We could try ---------- diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c index c4d1023..bf95da3 100644 --- a/arch/x86/kernel/cpu/cacheinfo.c +++ b/arch/x86/kernel/cpu/cacheinfo.c @@ -23,6 +23,10 @@ #include "cpu.h" +#ifndef CONFIG_SMP +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map); +#endif + #define LVL_1_INST 1 #define LVL_1_DATA 2 #define LVL_2 3 ---------- or ---------- diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c index c4d1023..b8a22b6 100644 --- a/arch/x86/kernel/cpu/cacheinfo.c +++ b/arch/x86/kernel/cpu/cacheinfo.c @@ -886,6 +886,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index, * to derive shared_cpu_map. */ if (index == 3) { +#ifdef CONFIG_SMP for_each_cpu(i, cpu_llc_shared_mask(cpu)) { this_cpu_ci = get_cpu_cacheinfo(i); if (!this_cpu_ci->info_list) @@ -898,6 +899,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index, &this_leaf->shared_cpu_map); } } +#endif } else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { unsigned int apicid, nshared, first, last; ---------- but I don't know whether this is a correct fix, for for_each_cpu() currently always executes the loop because for_each_cpu() does not evaluate cpu_llc_shared_mask(cpu) argument. But if cpu_llc_shared_mask(cpu) argument is evaluated by for_each_cpu(), and given that nobody updates cpu_llc_shared_map if CONFIG_SMP=n, I guess that this for_each_cpu() becomes a no-op loop. I can't evaluate whether this change is safe, and there might be similar code in other architectures. ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: linux-next: tracebacks in workqueue.c/__flush_work() 2019-02-06 6:31 ` Tetsuo Handa @ 2019-02-06 14:36 ` Guenter Roeck 2019-02-06 14:57 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Guenter Roeck @ 2019-02-06 14:36 UTC (permalink / raw) To: Tetsuo Handa Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm, linux-arch On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote: > (Adding linux-arch ML.) > > Rusty Russell wrote: > > Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > > > (Adding Chris Metcalf and Rusty Russell.) > > > > > > If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not > > > evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at > > > previous for_each_online_cpu() loop. Guenter Roeck found a problem among three > > > commits listed below. > > > > > > Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective") > > > expects that has_work is evaluated by for_each_cpu(). > > > > > > Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything") > > > assumes that for_each_cpu() does not need to evaluate has_work. > > > > > > Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") > > > expects that has_work is evaluated by for_each_cpu(). > > > > > > What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ? > > > > No, fix the API to be least-surprise. Fix 2d3854a37e8b767a too. > > > > Doing anything else would be horrible, IMHO. > > > > Fixing 2d3854a37e8b767a might involve subtle changes. If we do > Why not fix the macros ? #define for_each_cpu(cpu, mask) \ for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) does not really make sense since it does not evaluate mask. #define for_each_cpu(cpu, mask) \ for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++) or something similar might do it. Guenter ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: linux-next: tracebacks in workqueue.c/__flush_work() 2019-02-06 14:36 ` Guenter Roeck @ 2019-02-06 14:57 ` Tetsuo Handa 2019-02-06 16:23 ` Guenter Roeck 0 siblings, 1 reply; 8+ messages in thread From: Tetsuo Handa @ 2019-02-06 14:57 UTC (permalink / raw) To: Guenter Roeck Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm, linux-arch On 2019/02/06 23:36, Guenter Roeck wrote: > On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote: >> (Adding linux-arch ML.) >> >> Rusty Russell wrote: >>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: >>>> (Adding Chris Metcalf and Rusty Russell.) >>>> >>>> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not >>>> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at >>>> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three >>>> commits listed below. >>>> >>>> Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective") >>>> expects that has_work is evaluated by for_each_cpu(). >>>> >>>> Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything") >>>> assumes that for_each_cpu() does not need to evaluate has_work. >>>> >>>> Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") >>>> expects that has_work is evaluated by for_each_cpu(). >>>> >>>> What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ? >>> >>> No, fix the API to be least-surprise. Fix 2d3854a37e8b767a too. >>> >>> Doing anything else would be horrible, IMHO. >>> >> >> Fixing 2d3854a37e8b767a might involve subtle changes. If we do >> > > Why not fix the macros ? > > #define for_each_cpu(cpu, mask) \ > for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) > > does not really make sense since it does not evaluate mask. > > #define for_each_cpu(cpu, mask) \ > for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++) > > or something similar might do it. Fixing macros is fine, The problem is that "mask" becomes evaluated which might be currently undefined or unassigned if CONFIG_SMP=n. Evaluating "mask" generates expected behavior for lru_add_drain_all() case. But there might be cases where evaluating "mask" generate unexpected behavior/results. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: linux-next: tracebacks in workqueue.c/__flush_work() 2019-02-06 14:57 ` Tetsuo Handa @ 2019-02-06 16:23 ` Guenter Roeck 2019-02-06 16:38 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Guenter Roeck @ 2019-02-06 16:23 UTC (permalink / raw) To: Tetsuo Handa Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm, linux-arch On Wed, Feb 06, 2019 at 11:57:45PM +0900, Tetsuo Handa wrote: > On 2019/02/06 23:36, Guenter Roeck wrote: > > On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote: > >> (Adding linux-arch ML.) > >> > >> Rusty Russell wrote: > >>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > >>>> (Adding Chris Metcalf and Rusty Russell.) > >>>> > >>>> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not > >>>> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at > >>>> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three > >>>> commits listed below. > >>>> > >>>> Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective") > >>>> expects that has_work is evaluated by for_each_cpu(). > >>>> > >>>> Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything") > >>>> assumes that for_each_cpu() does not need to evaluate has_work. > >>>> > >>>> Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") > >>>> expects that has_work is evaluated by for_each_cpu(). > >>>> > >>>> What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ? > >>> > >>> No, fix the API to be least-surprise. Fix 2d3854a37e8b767a too. > >>> > >>> Doing anything else would be horrible, IMHO. > >>> > >> > >> Fixing 2d3854a37e8b767a might involve subtle changes. If we do > >> > > > > Why not fix the macros ? > > > > #define for_each_cpu(cpu, mask) \ > > for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) > > > > does not really make sense since it does not evaluate mask. > > > > #define for_each_cpu(cpu, mask) \ > > for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++) > > > > or something similar might do it. > > Fixing macros is fine, The problem is that "mask" becomes evaluated > which might be currently undefined or unassigned if CONFIG_SMP=n. > Evaluating "mask" generates expected behavior for lru_add_drain_all() > case. But there might be cases where evaluating "mask" generate > unexpected behavior/results. Interesting notion. I would have assumed that passing a parameter to a function or macro implies that this parameter may be used. This makes me wonder - what is the point of ", (mask)" in the current macros ? It doesn't make sense to me. Anyway, I agree that fixing the macro might result in some failures. However, I would argue that those failures would actually be bugs, hidden by the buggy macros. But of course that it just my opinion. Guenter ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: linux-next: tracebacks in workqueue.c/__flush_work() 2019-02-06 16:23 ` Guenter Roeck @ 2019-02-06 16:38 ` Tetsuo Handa 0 siblings, 0 replies; 8+ messages in thread From: Tetsuo Handa @ 2019-02-06 16:38 UTC (permalink / raw) To: Guenter Roeck Cc: Rusty Russell, Chris Metcalf, linux-kernel, Tejun Heo, linux-mm, linux-arch On 2019/02/07 1:23, Guenter Roeck wrote: > On Wed, Feb 06, 2019 at 11:57:45PM +0900, Tetsuo Handa wrote: >> On 2019/02/06 23:36, Guenter Roeck wrote: >>> On Wed, Feb 06, 2019 at 03:31:09PM +0900, Tetsuo Handa wrote: >>>> (Adding linux-arch ML.) >>>> >>>> Rusty Russell wrote: >>>>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: >>>>>> (Adding Chris Metcalf and Rusty Russell.) >>>>>> >>>>>> If NR_CPUS == 1 due to CONFIG_SMP=n, for_each_cpu(cpu, &has_work) loop does not >>>>>> evaluate "struct cpumask has_work" modified by cpumask_set_cpu(cpu, &has_work) at >>>>>> previous for_each_online_cpu() loop. Guenter Roeck found a problem among three >>>>>> commits listed below. >>>>>> >>>>>> Commit 5fbc461636c32efd ("mm: make lru_add_drain_all() selective") >>>>>> expects that has_work is evaluated by for_each_cpu(). >>>>>> >>>>>> Commit 2d3854a37e8b767a ("cpumask: introduce new API, without changing anything") >>>>>> assumes that for_each_cpu() does not need to evaluate has_work. >>>>>> >>>>>> Commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") >>>>>> expects that has_work is evaluated by for_each_cpu(). >>>>>> >>>>>> What should we do? Do we explicitly evaluate has_work if NR_CPUS == 1 ? >>>>> >>>>> No, fix the API to be least-surprise. Fix 2d3854a37e8b767a too. >>>>> >>>>> Doing anything else would be horrible, IMHO. >>>>> >>>> >>>> Fixing 2d3854a37e8b767a might involve subtle changes. If we do >>>> >>> >>> Why not fix the macros ? >>> >>> #define for_each_cpu(cpu, mask) \ >>> for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) >>> >>> does not really make sense since it does not evaluate mask. >>> >>> #define for_each_cpu(cpu, mask) \ >>> for ((cpu) = 0; (cpu) < 1 && cpumask_test_cpu((cpu), (mask)); (cpu)++) >>> >>> or something similar might do it. >> >> Fixing macros is fine, The problem is that "mask" becomes evaluated >> which might be currently undefined or unassigned if CONFIG_SMP=n. >> Evaluating "mask" generates expected behavior for lru_add_drain_all() >> case. But there might be cases where evaluating "mask" generate >> unexpected behavior/results. > > Interesting notion. I would have assumed that passing a parameter > to a function or macro implies that this parameter may be used. > > This makes me wonder - what is the point of ", (mask)" in the current > macros ? It doesn't make sense to me. I guess it is to avoid "unused argument" warning; but optimization accepted passing even "undefined argument". > > Anyway, I agree that fixing the macro might result in some failures. > However, I would argue that those failures would actually be bugs, > hidden by the buggy macros. But of course that it just my opinion. Yes, they are bugs which should be fixed. But since suddenly changing these macros might break something, I suggest temporarily managing at lru_add_drain_all() side for now, and make sure we have enough period at linux-next.git for testing changes to these macros. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-02-06 16:39 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-02-02 22:20 linux-next: tracebacks in workqueue.c/__flush_work() Guenter Roeck 2019-02-03 1:21 ` Tetsuo Handa 2019-02-03 23:46 ` Rusty Russell 2019-02-06 6:31 ` Tetsuo Handa 2019-02-06 14:36 ` Guenter Roeck 2019-02-06 14:57 ` Tetsuo Handa 2019-02-06 16:23 ` Guenter Roeck 2019-02-06 16:38 ` Tetsuo Handa
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.