* Re: race condition in schedule_on_each_cpu()
[not found] <tencent_0777D84B54B4163A3B85255A@qq.com>
@ 2013-06-06 21:23 ` Tejun Heo
2013-06-07 1:34 ` weiqi
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2013-06-06 21:23 UTC (permalink / raw)
To: 韦奇; +Cc: torvalds, linux-kernel
Hello,
On Thu, Jun 06, 2013 at 06:14:46PM +0800, 韦奇 wrote:
> Hello, Tejun Heo
> thanks for your help,
> 1) I've tested two kernel versions against this problem:
> latest 3.10-rc3: (https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.10-rc3.tar.xz)
> latest 3.0 branch, 3.0.80: (https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.0.80.tar.xz)
>
> Both work fine when hot-removing the raid disk.
Thanks for verifying.
> | preemption          | machine 1 | machine 2 | kversion    |
> -------------------------------------------------------------
> | Fully Preemptible   | stuck     | not stuck | 3.0.30-rt50 |
> | Low-Latency Desktop | not stuck | not stuck | 3.0.30-rt50 |
> | Low-Latency Desktop | not stuck | --        | 3.0.30      |
> | default             | not stuck | --        | 3.0.80      |
> | default             | not stuck | --        | 3.10-rc3    |
>
> Could you suggest some way to debug this problem? For example, how
> does one debug a workqueue deadlock? I want to find the deadlock point.
I looked through the logs but the only worker depletion related
patches which pop up are around CPU hotplugs, so I don't think they
apply here. If the problem is relatively easy to reproduce && you
can't move onto a newer kernel, I'm afraid bisection probably is the
best option.
Thanks!
--
tejun
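As a concrete illustration of the bisection workflow suggested above, here is a minimal, self-contained `git bisect run` session. The throwaway repository, commit messages, and `broken` marker file are all invented for demonstration; on a real kernel tree the run script would build and test the kernel instead.

```shell
# Minimal git-bisect demo in a throwaway repo (everything here is
# fabricated for illustration, not taken from the thread above).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.email demo@example.com
git config user.name demo
# Create 10 commits; from commit 7 onward a marker file "broken"
# exists, standing in for the regression being hunted.
for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -ge 7 ]; then touch broken; fi
    git add -A
    git commit -q --allow-empty -m "commit $i"
    git tag "c$i"
done
# Bisect between known-good c1 and known-bad c10; the test command
# exits 0 for good commits and non-zero for bad ones.
git bisect start c10 c1 >/dev/null
git bisect run sh -c '! test -f broken' >/dev/null
# Print the subject of the first bad commit found by the bisection.
git show -s --format=%s refs/bisect/bad
# → commit 7
```

`git bisect run` only needs a command whose exit status distinguishes good commits (0) from bad ones (non-zero); it then drives the binary search over the commit range automatically.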
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: race condition in schedule_on_each_cpu()
2013-06-06 21:23 ` race condition in schedule_on_each_cpu() Tejun Heo
@ 2013-06-07 1:34 ` weiqi
2013-06-07 2:24 ` weiqi
0 siblings, 1 reply; 6+ messages in thread
From: weiqi @ 2013-06-07 1:34 UTC (permalink / raw)
To: Tejun Heo; +Cc: 韦奇, torvalds, linux-kernel
It's preemption-mode related.

On 3.0.30-rt50, only a kernel configured with the highest preemption
level ("Fully Preemptible Kernel (RT)") shows the problem.

With "Preemptible Kernel" or "Preemptible Kernel (Low-Latency
Desktop)", the problem does not happen.
> I looked through the logs but the only worker depletion related
> patches which pop up are around CPU hotplugs, so I don't think they
> apply here. If the problem is relatively easy to reproduce && you can't
> move onto a newer kernel, I'm afraid bisection probably is the best
> option. Thanks!
* Re: race condition in schedule_on_each_cpu()
2013-06-07 1:34 ` weiqi
@ 2013-06-07 2:24 ` weiqi
2013-06-07 23:22 ` Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: weiqi @ 2013-06-07 2:24 UTC (permalink / raw)
To: Tejun Heo; +Cc: weiqi, torvalds, linux-kernel
In the previous message, you mentioned:

>> by the way, I'm wondering about what's the race condition before
>> which doesn't exist now

> Before the commit you originally quoted, the calling thread could be
> preempted and migrated to another CPU before get_online_cpus() thus
> ending up executing the function twice on the new cpu but skipping the
> old one.

Could this situation happen with the "Fully Preemptible" config on
3.0.30-rt50?
On 2013-06-07 09:34, weiqi@kylinos.com.cn wrote:
> It's preemption-mode related.
>
> On 3.0.30-rt50, only a kernel configured with the highest preemption
> level ("Fully Preemptible Kernel (RT)") shows the problem.
>
> With "Preemptible Kernel" or "Preemptible Kernel (Low-Latency
> Desktop)", the problem does not happen.
>
>
> > I looked through the logs but the only worker depletion related
> > patches which pop up are around CPU hotplugs, so I don't think they
> > apply here. If the problem is relatively easy to reproduce && you
> > can't move onto a newer kernel, I'm afraid bisection probably is the
> > best option. Thanks!
>
* Re: race condition in schedule_on_each_cpu()
2013-06-07 2:24 ` weiqi
@ 2013-06-07 23:22 ` Tejun Heo
[not found] ` <51B27744.6090507@kylinos.com.cn>
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2013-06-07 23:22 UTC (permalink / raw)
To: weiqi; +Cc: torvalds, linux-kernel
On Fri, Jun 07, 2013 at 10:24:50AM +0800, weiqi@kylinos.com.cn wrote:
> > Before the commit you originally quoted, the calling thread could be
> > preempted and migrated to another CPU before get_online_cpus() thus
> > ending up executing the function twice on the new cpu but skipping
> > the old one.
>
> Could this situation happen with the "Fully Preemptible" config on
> 3.0.30-rt50?
Sure, if the fix hasn't been backported.
Thanks.
--
tejun
* Re: race condition in schedule_on_each_cpu()
[not found] ` <51B27744.6090507@kylinos.com.cn>
@ 2013-06-08 11:30 ` weiqi
0 siblings, 0 replies; 6+ messages in thread
From: weiqi @ 2013-06-08 11:30 UTC (permalink / raw)
To: Tejun Heo; +Cc: weiqi, torvalds, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 403 bytes --]
Hello Tejun Heo,
I've backported the schedule_on_each_cpu() "direct execution" patch to
3.0.30-rt50, and it fixed my problem.
The attached patch is the one that worked.
However, I do not understand why machine 1 can expose the problem but
machine 2 cannot.
Since it seems related to the rt-kernel's preemption level, could this
difference be due to CPU performance?
What do you think?
Thank you~
[-- Attachment #2: direct_execution.patch --]
[-- Type: text/plain, Size: 1170 bytes --]
diff -up linux-3.0.30-rt50/kernel/workqueue.c.bak linux-3.0.30-rt50/kernel/workqueue.c
--- linux-3.0.30-rt50/kernel/workqueue.c.bak	2013-06-08 19:09:06.801059232 +0800
+++ linux-3.0.30-rt50/kernel/workqueue.c	2013-06-08 19:09:15.680069626 +0800
@@ -1922,6 +1922,7 @@ static int worker_thread(void *__worker)
 	/* tell the scheduler that this is a workqueue worker */
 	worker->task->flags |= PF_WQ_WORKER;
+	smp_mb();
 woke_up:
 	spin_lock_irq(&gcwq->lock);
@@ -2736,6 +2737,7 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
 int schedule_on_each_cpu(work_func_t func)
 {
 	int cpu;
+	int orig = -1;
 	struct work_struct __percpu *works;
 
 	works = alloc_percpu(struct work_struct);
@@ -2744,13 +2746,20 @@ int schedule_on_each_cpu(work_func_t fun
 	get_online_cpus();
 
+	if (current->flags & PF_WQ_WORKER)
+		orig = raw_smp_processor_id();
+
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);
 
 		INIT_WORK(work, func);
-		schedule_work_on(cpu, work);
+		if (cpu != orig)
+			schedule_work_on(cpu, work);
 	}
 
+	if (orig >= 0)
+		func(per_cpu_ptr(works, orig));
+
 	for_each_online_cpu(cpu)
 		flush_work(per_cpu_ptr(works, cpu));
* Re: race condition in schedule_on_each_cpu()
[not found] ` <51A821F3.1000605@kylinos.com.cn>
@ 2013-05-31 5:03 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2013-05-31 5:03 UTC (permalink / raw)
To: weiqi; +Cc: torvalds, linux-kernel
On Fri, May 31, 2013 at 12:07:15PM +0800, weiqi@kylinos.com.cn wrote:
>
> >the only way for them to get stuck is if there aren't enough execution
> >resources (ie. if a new thread can't be created) but OOM killers would
> >have been activated if that were the case.
>
> The following is a detailed description of our scenario:
>
> 1. after turning off the disk array, the ps results are shown
> in *ps*, which indicate that kworker/1:0 and kworker/1:2 are stuck
>
> 2. the call stacks for the kworkers are shown in *stack_xxx.txt*
>
> 3. the workqueue operations during that period are shown in
> *out.txt*, captured with ftrace
> (we added a new trace point /workqueue_queue_work_insert/,
> immediately before insert_wq_barrier, in the function
> start_flush_work. its implementation is shown in
> *trace_insert_wq_barrier.txt*)
> from the results in *grep_kwork1:0_from_out.txt*, we can see:
> kworker/1:0 is stuck after starting the work
> /fc_starget_delete/ at time 360.801271, and we catch the
> insert_wq_barrier trace info after this point
>
>
> 4. from out.txt, we can see that altogether three
> /fc_starget_delete/ works are enqueued.
> after the point of deadlock, kworker/1:1 and kworker/1:3 are
> still executing ...
>
>
> 5. if we let scsi_transport_fc use only one worker thread,
> i.e., change scsi_transport_fc.c : fc_host_setup()
> alloc_workqueue(fc_host->work_q_name, 0, 0) to
> alloc_workqueue(fc_host->work_q_name, WQ_UNBOUND, 1)
>
> alloc_workqueue(fc_host->devloss_work_q_name, 0, 0) to
> alloc_workqueue(fc_host->devloss_work_q_name, WQ_UNBOUND, 1)
>
> the deadlock won't occur.
> >Can you please test a recent kernel? How easily can you reproduce the
> >issue?
> >
> It occurs every time we hot-remove the disk array.
>
> I'll test a recent kernel after a while, but this problem in 3.0.30
> really confuses me
Yeah, it definitely sounds like concurrency depletion. There have
been some fixes and substantial changes in the area, so I really wanna
find out whether the problem is reproducible in recent vanilla kernel
- say, v3.9 or, even better, v3.10-rc2. Can you please try to
reproduce the problem with a newer kernel?
> by the way, I'm wondering about what's the race condition before
> which doesn't exist now
Before the commit you originally quoted, the calling thread could be
preempted and migrated to another CPU before get_online_cpus() thus
ending up executing the function twice on the new cpu but skipping the
old one.
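The race described here can be sketched in kernel-style pseudocode. This is a schematic reconstruction from the description above, not the exact historical source; the placement of the cpu-id read and the preemption point is the assumption being illustrated.

```c
/* Pre-fix schedule_on_each_cpu(), schematically. Caller starts on CPU0. */
int orig = raw_smp_processor_id();  /* reads 0 while running on CPU0 */

/* <-- fully preemptible kernel: preempted here, migrated to CPU1,
 *     before get_online_cpus() is reached --> */

get_online_cpus();
for_each_online_cpu(cpu) {
	INIT_WORK(per_cpu_ptr(works, cpu), func);
	if (cpu != orig)                /* orig is still 0 ... */
		schedule_work_on(cpu, per_cpu_ptr(works, cpu));
}
func(per_cpu_ptr(works, orig));         /* ... but this call now runs on CPU1 */
/* Net effect: func() executes twice on CPU1 (once queued, once direct)
 * and never on CPU0 -- the "twice on the new cpu, skipping the old one"
 * behavior described above. */
```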
Thanks.
--
tejun
end of thread, other threads:[~2013-06-08 11:30 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <tencent_0777D84B54B4163A3B85255A@qq.com>
2013-06-06 21:23 ` race condition in schedule_on_each_cpu() Tejun Heo
2013-06-07 1:34 ` weiqi
2013-06-07 2:24 ` weiqi
2013-06-07 23:22 ` Tejun Heo
[not found] ` <51B27744.6090507@kylinos.com.cn>
2013-06-08 11:30 ` weiqi
[not found] <51A7FFE8.6060204@kylinos.com.cn>
[not found] ` <20130531023246.GD30479@mtj.dyndns.org>
[not found] ` <51A821F3.1000605@kylinos.com.cn>
2013-05-31 5:03 ` Tejun Heo