* [PATCH] ipvs: Remove unused variable ret from sync_thread_master() @ 2013-11-12 13:53 Geert Uytterhoeven 2013-11-12 14:13 ` Peter Zijlstra 0 siblings, 1 reply; 18+ messages in thread From: Geert Uytterhoeven @ 2013-11-12 13:53 UTC (permalink / raw) To: Peter Zijlstra, Oleg Nesterov, Ingo Molnar Cc: netdev, linux-kernel, Geert Uytterhoeven net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master': net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable] Introduced by commit 35a2af94c7ce7130ca292c68b1d27fcfdb648f6b ("sched/wait: Make the __wait_event*() interface more friendly") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> --- net/netfilter/ipvs/ip_vs_sync.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c index f63c2388f38d..d258125c4202 100644 --- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -1637,7 +1637,7 @@ static int sync_thread_master(void *data) continue; } while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) { - int ret = __wait_event_interruptible(*sk_sleep(sk), + __wait_event_interruptible(*sk_sleep(sk), sock_writeable(sk) || kthread_should_stop()); if (unlikely(kthread_should_stop())) -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 13:53 [PATCH] ipvs: Remove unused variable ret from sync_thread_master() Geert Uytterhoeven @ 2013-11-12 14:13 ` Peter Zijlstra 2013-11-12 14:21 ` David Laight 0 siblings, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2013-11-12 14:13 UTC (permalink / raw) To: Geert Uytterhoeven; +Cc: Oleg Nesterov, Ingo Molnar, netdev, linux-kernel On Tue, Nov 12, 2013 at 02:53:16PM +0100, Geert Uytterhoeven wrote: > net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master': > net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable] > > Introduced by commit 35a2af94c7ce7130ca292c68b1d27fcfdb648f6b ("sched/wait: > Make the __wait_event*() interface more friendly") > > Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> > --- > net/netfilter/ipvs/ip_vs_sync.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c > index f63c2388f38d..d258125c4202 100644 > --- a/net/netfilter/ipvs/ip_vs_sync.c > +++ b/net/netfilter/ipvs/ip_vs_sync.c > @@ -1637,7 +1637,7 @@ static int sync_thread_master(void *data) > continue; > } > while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) { > - int ret = __wait_event_interruptible(*sk_sleep(sk), So ideally there's be a comment here why we're using interruptible but then ignore interruptions. Julian said ( http://lkml.kernel.org/r/alpine.LFD.2.00.1310012245020.1782@ja.ssi.bg ): " Yes, your patch looks ok to me. In the past we used ssleep() but IPVS users were confused why IPVS threads increase the load average. So, we switched to _interruptible calls and later the socket polling was added. " So maybe add something like /* * (Ab)use interruptible sleep to avoid increasing * the load avg. */ > + __wait_event_interruptible(*sk_sleep(sk), > sock_writeable(sk) || > kthread_should_stop()); > if (unlikely(kthread_should_stop())) > -- > 1.7.9.5 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 14:13 ` Peter Zijlstra @ 2013-11-12 14:21 ` David Laight 2013-11-12 14:31 ` Peter Zijlstra 2013-11-12 14:52 ` Peter Zijlstra 0 siblings, 2 replies; 18+ messages in thread From: David Laight @ 2013-11-12 14:21 UTC (permalink / raw) To: Peter Zijlstra, Geert Uytterhoeven Cc: Oleg Nesterov, Ingo Molnar, netdev, linux-kernel > > @@ -1637,7 +1637,7 @@ static int sync_thread_master(void *data) > > continue; > > } > > while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) { > > - int ret = __wait_event_interruptible(*sk_sleep(sk), > > So ideally there's be a comment here why we're using interruptible but > then ignore interruptions. > > Julian said ( > http://lkml.kernel.org/r/alpine.LFD.2.00.1310012245020.1782@ja.ssi.bg ): > > " Yes, your patch looks ok to me. In the past > we used ssleep() but IPVS users were confused why > IPVS threads increase the load average. So, we > switched to _interruptible calls and later the socket > polling was added. " I've done this in the past so that the code sleeps interruptibly unless there is a signal pending - which would cause it to return early. /* Tell scheduler we are going to sleep... */ if (signal_pending(current)) /* We don't want waking immediately (again) */ sleep_state = TASK_UNINTERRUPTIBLE; else sleep_state = TASK_INTERRUPTIBLE; set_current_state(sleep_state); Shame there isn't a process flag to indicate that the process will sleep uninterruptibly and that it doesn't matter. So don't count to the load average and don't emit a warning if it has been sleeping for a long time. David ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 14:21 ` David Laight @ 2013-11-12 14:31 ` Peter Zijlstra 2013-11-12 14:38 ` David Laight 2013-11-12 16:26 ` Oleg Nesterov 2013-11-12 14:52 ` Peter Zijlstra 1 sibling, 2 replies; 18+ messages in thread From: Peter Zijlstra @ 2013-11-12 14:31 UTC (permalink / raw) To: David Laight Cc: Geert Uytterhoeven, Oleg Nesterov, Ingo Molnar, netdev, linux-kernel On Tue, Nov 12, 2013 at 02:21:39PM -0000, David Laight wrote: > > > @@ -1637,7 +1637,7 @@ static int sync_thread_master(void *data) > > > continue; > > > } > > > while (ip_vs_send_sync_msg(tinfo->sock, sb->mesg) < 0) { > > > - int ret = __wait_event_interruptible(*sk_sleep(sk), > > > > So ideally there's be a comment here why we're using interruptible but > > then ignore interruptions. > > > > Julian said ( > > http://lkml.kernel.org/r/alpine.LFD.2.00.1310012245020.1782@ja.ssi.bg ): > > > > " Yes, your patch looks ok to me. In the past > > we used ssleep() but IPVS users were confused why > > IPVS threads increase the load average. So, we > > switched to _interruptible calls and later the socket > > polling was added. " > > I've done this in the past so that the code sleeps interruptibly > unless there is a signal pending - which would cause it to return > early. > > /* Tell scheduler we are going to sleep... */ > if (signal_pending(current)) > /* We don't want waking immediately (again) */ > sleep_state = TASK_UNINTERRUPTIBLE; > else > sleep_state = TASK_INTERRUPTIBLE; > set_current_state(sleep_state); If this is for kernel threads, I think you can wipe the pending state; not entirely sure how signals interact with kthreads; Oleg will know. ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 14:31 ` Peter Zijlstra @ 2013-11-12 14:38 ` David Laight 2013-11-12 16:26 ` Oleg Nesterov 1 sibling, 0 replies; 18+ messages in thread From: David Laight @ 2013-11-12 14:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Geert Uytterhoeven, Oleg Nesterov, Ingo Molnar, netdev, linux-kernel > > I've done this in the past so that the code sleeps interruptibly > > unless there is a signal pending - which would cause it to return > > early. > > > > /* Tell scheduler we are going to sleep... */ > > if (signal_pending(current)) > > /* We don't want waking immediately (again) */ > > sleep_state = TASK_UNINTERRUPTIBLE; > > else > > sleep_state = TASK_INTERRUPTIBLE; > > set_current_state(sleep_state); > > If this is for kernel threads, I think you can wipe the pending state; > not entirely sure how signals interact with kthreads; Oleg will know. I did take me a while to manage to use kthreads, and we probably still have customers who insist on using pre 2.6.18 kernels! (In which case the processes are children of a daemon that only return to userspace in order to exit.) We also need to send signals (to get the process out of blocking socket calls), so I have to use force_sig() to get the signal noticed. The other issue was tidyup - I had to find module_put_and_exit(). David ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 14:31 ` Peter Zijlstra 2013-11-12 14:38 ` David Laight @ 2013-11-12 16:26 ` Oleg Nesterov 1 sibling, 0 replies; 18+ messages in thread From: Oleg Nesterov @ 2013-11-12 16:26 UTC (permalink / raw) To: Peter Zijlstra Cc: David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On 11/12, Peter Zijlstra wrote: > > On Tue, Nov 12, 2013 at 02:21:39PM -0000, David Laight wrote: > > > > /* Tell scheduler we are going to sleep... */ > > if (signal_pending(current)) > > /* We don't want waking immediately (again) */ > > sleep_state = TASK_UNINTERRUPTIBLE; > > else > > sleep_state = TASK_INTERRUPTIBLE; > > set_current_state(sleep_state); > > If this is for kernel threads, I think you can wipe the pending state; Yes, unless this kthread does allow_signal() signal_pending() can't be true. Oleg. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 14:21 ` David Laight 2013-11-12 14:31 ` Peter Zijlstra @ 2013-11-12 14:52 ` Peter Zijlstra 2013-11-12 16:21 ` Oleg Nesterov 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2013-11-12 14:52 UTC (permalink / raw) To: David Laight Cc: Geert Uytterhoeven, Oleg Nesterov, Ingo Molnar, netdev, linux-kernel On Tue, Nov 12, 2013 at 02:21:39PM -0000, David Laight wrote: > Shame there isn't a process flag to indicate that the process > will sleep uninterruptibly and that it doesn't matter. > So don't count to the load average and don't emit a warning > if it has been sleeping for a long time. A process flag wouldn't work, because the task could block waiting for actual work to complete in other sleeps. However, we could do something like the below; which would allow us writing things like: (void)___wait_event(*sk_sleep(sk), sock_writeable(sk) || kthread_should_stop(), TASK_UNINTERRUPTIBLE | TASK_IDLE, 0, 0, schedule()); Marking the one wait-for-more-work as TASK_IDLE such that it doesn't contribute to the load avg. --- fs/proc/array.c | 23 ++++++++++++----------- include/linux/sched.h | 8 +++++--- include/trace/events/sched.h | 3 ++- 3 files changed, 19 insertions(+), 15 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index 1bd2077187fd..da9a9c8b5bba 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -133,17 +133,18 @@ static inline void task_name(struct seq_file *m, struct task_struct *p) * simple bit tests. */ static const char * const task_state_array[] = { - "R (running)", /* 0 */ - "S (sleeping)", /* 1 */ - "D (disk sleep)", /* 2 */ - "T (stopped)", /* 4 */ - "t (tracing stop)", /* 8 */ - "Z (zombie)", /* 16 */ - "X (dead)", /* 32 */ - "x (dead)", /* 64 */ - "K (wakekill)", /* 128 */ - "W (waking)", /* 256 */ - "P (parked)", /* 512 */ + "R (running)", /* 0 */ + "S (sleeping)", /* 1 */ + "D (disk sleep)", /* 2 */ + "T (stopped)", /* 4 */ + "t (tracing stop)", /* 8 */ + "Z (zombie)", /* 16 */ + "X (dead)", /* 32 */ + "x (dead)", /* 64 */ + "K (wakekill)", /* 128 */ + "W (waking)", /* 256 */ + "P (parked)", /* 512 */ + "I (idle)", /* 1024 */ }; static inline const char *get_task_state(struct task_struct *tsk) diff --git a/include/linux/sched.h b/include/linux/sched.h index 045b0d227846..a4a9e80688d8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -145,9 +145,10 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq); #define TASK_WAKEKILL 128 #define TASK_WAKING 256 #define TASK_PARKED 512 -#define TASK_STATE_MAX 1024 +#define TASK_IDLE 1024 +#define TASK_STATE_MAX 2048 -#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWP" +#define TASK_STATE_TO_CHAR_STR "RSDTtZXxKWPI" extern char ___assert_task_state[1 - 2*!!( sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)]; @@ -173,7 +174,8 @@ extern char ___assert_task_state[1 - 2*!!( ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0) #define task_contributes_to_load(task) \ ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \ - (task->flags & PF_FROZEN) == 0) + (task->flags & PF_FROZEN) == 0 && \ + (task->state & TASK_IDLE) == 0) #define __set_task_state(tsk, state_value) \ do { (tsk)->state = (state_value); } while (0) diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 04c308413a5d..4553a2495c25 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -144,7 +144,8 @@ TRACE_EVENT(sched_switch, __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|", { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" }, { 16, "Z" }, { 32, "X" }, { 64, "x" }, - { 128, "K" }, { 256, "W" }, { 512, "P" }) : "R", + { 128, "K" }, { 256, "W" }, { 512, "P" }, + { 1024, "I" }) : "R", __entry->prev_state & TASK_STATE_MAX ? "+" : "", __entry->next_comm, __entry->next_pid, __entry->next_prio) ); ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 14:52 ` Peter Zijlstra @ 2013-11-12 16:21 ` Oleg Nesterov 2013-11-12 16:56 ` oom-kill && frozen() Oleg Nesterov 2013-11-12 17:00 ` [PATCH] ipvs: Remove unused variable ret from sync_thread_master() Peter Zijlstra 0 siblings, 2 replies; 18+ messages in thread From: Oleg Nesterov @ 2013-11-12 16:21 UTC (permalink / raw) To: Peter Zijlstra, Tejun Heo Cc: David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On 11/12, Peter Zijlstra wrote: > > On Tue, Nov 12, 2013 at 02:21:39PM -0000, David Laight wrote: > > Shame there isn't a process flag to indicate that the process > > will sleep uninterruptibly and that it doesn't matter. > > So don't count to the load average and don't emit a warning > > if it has been sleeping for a long time. > > A process flag wouldn't work, because the task could block waiting for > actual work to complete in other sleeps. > > However, we could do something like the below; which would allow us > writing things like: > > (void)___wait_event(*sk_sleep(sk), > sock_writeable(sk) || kthread_should_stop(), > TASK_UNINTERRUPTIBLE | TASK_IDLE, 0, 0, > schedule()); > > Marking the one wait-for-more-work as TASK_IDLE such that it doesn't > contribute to the load avg. Agreed, I thought about additional bit too. > static const char * const task_state_array[] = { > - "R (running)", /* 0 */ > - "S (sleeping)", /* 1 */ > - "D (disk sleep)", /* 2 */ > - "T (stopped)", /* 4 */ > - "t (tracing stop)", /* 8 */ > - "Z (zombie)", /* 16 */ > - "X (dead)", /* 32 */ > - "x (dead)", /* 64 */ > - "K (wakekill)", /* 128 */ > - "W (waking)", /* 256 */ > - "P (parked)", /* 512 */ > + "R (running)", /* 0 */ > + "S (sleeping)", /* 1 */ > + "D (disk sleep)", /* 2 */ > + "T (stopped)", /* 4 */ > + "t (tracing stop)", /* 8 */ > + "Z (zombie)", /* 16 */ > + "X (dead)", /* 32 */ > + "x (dead)", /* 64 */ > + "K (wakekill)", /* 128 */ > + "W (waking)", /* 256 */ > + "P (parked)", /* 512 */ > + "I (idle)", /* 1024 */ > }; but I am not sure about what /proc/ should report in this case... > #define task_contributes_to_load(task) \ > ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \ > - (task->flags & PF_FROZEN) == 0) > + (task->flags & PF_FROZEN) == 0 && \ > + (task->state & TASK_IDLE) == 0) perhaps (task->state & (TASK_UNINTERRUPTIBLE | TASK_IDLE)) == TASK_UNINTERRUPTIBLE can save an insn. I am also wondering if it makes any sense to turn PF_FROZEN into TASK_FROZEN, something like (incomplete, probably racy) patch below. Note that it actually adds the new state, not the the qualifier. Oleg. --- x/include/linux/freezer.h +++ x/include/linux/freezer.h @@ -23,7 +23,7 @@ extern unsigned int freeze_timeout_msecs */ static inline bool frozen(struct task_struct *p) { - return p->flags & PF_FROZEN; + return p->state & TASK_FROZEN; } extern bool freezing_slow_path(struct task_struct *p); --- x/kernel/freezer.c +++ x/kernel/freezer.c @@ -57,16 +57,13 @@ bool __refrigerator(bool check_kthr_stop pr_debug("%s entered refrigerator\n", current->comm); for (;;) { - set_current_state(TASK_UNINTERRUPTIBLE); - spin_lock_irq(&freezer_lock); - current->flags |= PF_FROZEN; - if (!freezing(current) || - (check_kthr_stop && kthread_should_stop())) - current->flags &= ~PF_FROZEN; + if (freezing(current) && + !(check_kthr_stop && kthread_should_stop())) + set_current_state(TASK_FROZEN); spin_unlock_irq(&freezer_lock); - if (!(current->flags & PF_FROZEN)) + if (!(current->state & TASK_FROZEN)) break; was_frozen = true; schedule(); @@ -148,8 +145,7 @@ void __thaw_task(struct task_struct *p) * refrigerator. */ spin_lock_irqsave(&freezer_lock, flags); - if (frozen(p)) - wake_up_process(p); + try_to_wake_up(p, TASK_FROZEN, 0); spin_unlock_irqrestore(&freezer_lock, flags); } ^ permalink raw reply [flat|nested] 18+ messages in thread
* oom-kill && frozen() 2013-11-12 16:21 ` Oleg Nesterov @ 2013-11-12 16:56 ` Oleg Nesterov 2013-11-13 3:20 ` Tejun Heo 2013-11-12 17:00 ` [PATCH] ipvs: Remove unused variable ret from sync_thread_master() Peter Zijlstra 1 sibling, 1 reply; 18+ messages in thread From: Oleg Nesterov @ 2013-11-12 16:56 UTC (permalink / raw) To: Peter Zijlstra, Tejun Heo, David Rientjes Cc: David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On 11/12, Oleg Nesterov wrote: > > I am also wondering if it makes any sense to turn PF_FROZEN into > TASK_FROZEN, something like (incomplete, probably racy) patch below. > Note that it actually adds the new state, not the the qualifier. As for the current usage of PF_FROZEN... David, it seems that oom_scan_process_thread()->__thaw_task() is dead? Probably this was fine before, when __thaw_task() cleared the "need to freeze" condition, iirc it was PF_FROZEN. But today __thaw_task() can't help, no? the task will simply schedule() in D state again. Oleg. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: oom-kill && frozen() 2013-11-12 16:56 ` oom-kill && frozen() Oleg Nesterov @ 2013-11-13 3:20 ` Tejun Heo 2013-11-13 17:07 ` Oleg Nesterov 0 siblings, 1 reply; 18+ messages in thread From: Tejun Heo @ 2013-11-13 3:20 UTC (permalink / raw) To: Oleg Nesterov Cc: Peter Zijlstra, David Rientjes, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel Hello, On Tue, Nov 12, 2013 at 05:56:43PM +0100, Oleg Nesterov wrote: > On 11/12, Oleg Nesterov wrote: > > I am also wondering if it makes any sense to turn PF_FROZEN into > > TASK_FROZEN, something like (incomplete, probably racy) patch below. > > Note that it actually adds the new state, not the the qualifier. > > As for the current usage of PF_FROZEN... David, it seems that > oom_scan_process_thread()->__thaw_task() is dead? Probably this > was fine before, when __thaw_task() cleared the "need to freeze" > condition, iirc it was PF_FROZEN. > > But today __thaw_task() can't help, no? the task will simply > schedule() in D state again. Yeah, it'll have to be actively excluded using e.g. PF_FREEZER_SKIP, which, BTW, can usually only be manipulated by the task itself. I've been saying this multiple times but this is yet another cost of having "frozen" as a separate completely alien task state, which is essentially close to being undefined when viewed from userland. We're spreading broken behaviors and complexity throughout the kernel by making this half broken state visible. :( Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: oom-kill && frozen() 2013-11-13 3:20 ` Tejun Heo @ 2013-11-13 17:07 ` Oleg Nesterov 2013-11-13 17:42 ` Peter Zijlstra 2013-11-13 19:11 ` __refrigerator() && saved task->state Oleg Nesterov 0 siblings, 2 replies; 18+ messages in thread From: Oleg Nesterov @ 2013-11-13 17:07 UTC (permalink / raw) To: Tejun Heo Cc: Peter Zijlstra, David Rientjes, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On 11/13, Tejun Heo wrote: > > Hello, > > On Tue, Nov 12, 2013 at 05:56:43PM +0100, Oleg Nesterov wrote: > > On 11/12, Oleg Nesterov wrote: > > > I am also wondering if it makes any sense to turn PF_FROZEN into > > > TASK_FROZEN, something like (incomplete, probably racy) patch below. > > > Note that it actually adds the new state, not the the qualifier. > > > > As for the current usage of PF_FROZEN... David, it seems that > > oom_scan_process_thread()->__thaw_task() is dead? Probably this > > was fine before, when __thaw_task() cleared the "need to freeze" > > condition, iirc it was PF_FROZEN. > > > > But today __thaw_task() can't help, no? the task will simply > > schedule() in D state again. > > Yeah, it'll have to be actively excluded using e.g. PF_FREEZER_SKIP, > which, BTW, can usually only be manipulated by the task itself. Oh, yes, yes, yes, I agree. PF_FREEZER_SKIP and the growing number of freezable_schedule() makes this all more confusing. In fact I was think about something like 1. Add the new __TASK_FREEZABLE qualifier 2. Turn freezable_schedule() into void freezable_schedule(void) { spin_lock_irq(¤t->pi_lock); if (current->state) current->state |= __TASK_FREEZABLE spin_unlock_irq(¤t->pi_lock); schedule(); try_to_freeze(); } 3. Kill PF_FREEZER_SKIP/freezer_do_not_count/count/should_skip 4. Change freeze_task() and fake_signal_wake_up() - wake_up_state(p, TASK_INTERRUPTIBLE); + wake_up_state(p, TASK_INTERRUPTIBLE | __TASK_FREEZABLE); Unfortunately, this can only work if the caller can tolerate the false wakeup. We can even fix wait_for_vfork_done(), but say ptrace_stop() can't work this way. And even if we can make this work, the very fact that freezable_schedule() does schedule() twice does not look right. _Perhaps_ we can do something like "selective wakeup"? IOW, ignoring the races/details, 1. Add __TASK_FROZEN qualifier _and_ state 2. Change frozen(), static inline bool frozen(struct task_struct *p) { return p->state & __TASK_FROZEN; } 2. Change freezable_schedule(), void freezable_schedule(void) { spin_lock_irq(¤t->pi_lock); if (current->state) current->state |= __TASK_FROZEN; spin_unlock_irq(¤t->pi_lock); schedule(); } 3. Change __refrigerator() to use saved_state | __TASK_FROZEN too. 4. Finally, change try_to_wake_up() path to do - p->state = TASK_WAKING; + p->state &= ~state; + if (p->state & ~(TASK_DEAD | TASK_WAKEKILL | TASK_PARKED)) + return; + else + p->state = TASK_WAKING; IOW, if the task sleeps in, say, TASK_INTERRUPTIBLE | __TASK_FROZEN then it need both try_to_wake_up(TASK_INTERRUPTIBLE) and try_to_wake_up(__TASK_FROZEN) to wake up. 5. Kill PF_FREEZER_SKIP / etc. Unfortunately, 4. is obviously needs more changes, although at first glance nothing really nontrivial... we need a common helper for try_to_wake_up() and ttwu_remote() which checks/changes ->state and we need to avoid "stat" if we do not actually wake up. Hmm. and this all makes me think that at least s/PF_FROZEN/TASK_FROZEN/ as a first step actually makes some sense... Note the "qualifier _and_ state" above. Tejun, Peter, do you think this makes any sense? I am just curious, but "selective wakeup" looks potentially useful. And what about oom_scan_process_thread() ? should we simply kill this dead frozen/__thaw_task code or should we change freezing() to respect TIF_MEMDIE? Oleg. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: oom-kill && frozen() 2013-11-13 17:07 ` Oleg Nesterov @ 2013-11-13 17:42 ` Peter Zijlstra 2013-11-13 18:15 ` Oleg Nesterov 2013-11-13 19:11 ` __refrigerator() && saved task->state Oleg Nesterov 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2013-11-13 17:42 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, David Rientjes, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On Wed, Nov 13, 2013 at 06:07:24PM +0100, Oleg Nesterov wrote: > 4. Finally, change try_to_wake_up() path to do > > - p->state = TASK_WAKING; > + p->state &= ~state; > + if (p->state & ~(TASK_DEAD | TASK_WAKEKILL | TASK_PARKED)) > + return; > + else > + p->state = TASK_WAKING; > > IOW, if the task sleeps in, say, TASK_INTERRUPTIBLE | __TASK_FROZEN > then it need both try_to_wake_up(TASK_INTERRUPTIBLE) and > try_to_wake_up(__TASK_FROZEN) to wake up. > Tejun, Peter, do you think this makes any sense? I am just curious, but > "selective wakeup" looks potentially useful. I've never looked at any of this freeze stuff, so I cannot comment too much. However we should be very careful not to add too much to relative hot paths for the relative rare case of freezing stuff. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: oom-kill && frozen() 2013-11-13 17:42 ` Peter Zijlstra @ 2013-11-13 18:15 ` Oleg Nesterov 0 siblings, 0 replies; 18+ messages in thread From: Oleg Nesterov @ 2013-11-13 18:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Tejun Heo, David Rientjes, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On 11/13, Peter Zijlstra wrote: > > On Wed, Nov 13, 2013 at 06:07:24PM +0100, Oleg Nesterov wrote: > > 4. Finally, change try_to_wake_up() path to do > > > > - p->state = TASK_WAKING; > > + p->state &= ~state; > > + if (p->state & ~(TASK_DEAD | TASK_WAKEKILL | TASK_PARKED)) > > + return; > > + else > > + p->state = TASK_WAKING; > > > > IOW, if the task sleeps in, say, TASK_INTERRUPTIBLE | __TASK_FROZEN > > then it need both try_to_wake_up(TASK_INTERRUPTIBLE) and > > try_to_wake_up(__TASK_FROZEN) to wake up. > > > Tejun, Peter, do you think this makes any sense? I am just curious, but > > "selective wakeup" looks potentially useful. > > I've never looked at any of this freeze stuff, so I cannot comment too > much. However we should be very careful not to add too much to relative > hot paths for the relative rare case of freezing stuff. Yes, yes, sure. Plus my description was confusing and incomplete (and I am sure I missed something more). I forgot to mention that freeze_task() should also add the FROZEN state if it sees FROZEN qualifier, this needs more changes. So please forget. May be I'll _try_ to make something working (at least for discussion), but I am not sure. Oleg. ^ permalink raw reply [flat|nested] 18+ messages in thread
* __refrigerator() && saved task->state 2013-11-13 17:07 ` Oleg Nesterov 2013-11-13 17:42 ` Peter Zijlstra @ 2013-11-13 19:11 ` Oleg Nesterov 2013-11-13 19:14 ` Peter Zijlstra 1 sibling, 1 reply; 18+ messages in thread From: Oleg Nesterov @ 2013-11-13 19:11 UTC (permalink / raw) To: Tejun Heo Cc: Peter Zijlstra, David Rientjes, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel Sorry for noise, but I am totally confused. Could you please remind why __refrigerator() saves/restores task->state? I can see only one reason: set_freezable() kernel threads which check kthread_should_stop() and do try_to_freeze() by hand. But does this save/restore actually help? For example kauditd_thread() looks obviously racy exactly because try_to_freeze() can return in TASK_INTERRUPTIBLE state after wake_up(kauditd_wait) was already called, we can miss an event. At first glance it would be better to simply kill this logic? If it was called with ->state != 0, the caller is going to schedule() and it probably executes the wait_event-like code, in this case it would me more safe to pretend the task got a spurious wakeup? (as for kauditd_thread() in particular, it looks wrong in any case, even kthread_should_stop() check doesn't look right, it needs kthread_freezable_should_stop() afaics). But I guess I missed something else... Oleg. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: __refrigerator() && saved task->state 2013-11-13 19:11 ` __refrigerator() && saved task->state Oleg Nesterov @ 2013-11-13 19:14 ` Peter Zijlstra 2013-11-13 19:40 ` Oleg Nesterov 0 siblings, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2013-11-13 19:14 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, David Rientjes, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On Wed, Nov 13, 2013 at 08:11:43PM +0100, Oleg Nesterov wrote: > At first glance it would be better to simply kill this logic? If > it was called with ->state != 0, the caller is going to schedule() > and it probably executes the wait_event-like code, in this case > it would me more safe to pretend the task got a spurious wakeup? Note that in general the kernel cannot deal with spurious wakeups :/ Most proper locks and wait primitives can, but there's enough open-coded crap out there that can not. I tried.. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: __refrigerator() && saved task->state 2013-11-13 19:14 ` Peter Zijlstra @ 2013-11-13 19:40 ` Oleg Nesterov 0 siblings, 0 replies; 18+ messages in thread From: Oleg Nesterov @ 2013-11-13 19:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Tejun Heo, David Rientjes, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On 11/13, Peter Zijlstra wrote: > > On Wed, Nov 13, 2013 at 08:11:43PM +0100, Oleg Nesterov wrote: > > At first glance it would be better to simply kill this logic? If > > it was called with ->state != 0, the caller is going to schedule() > > and it probably executes the wait_event-like code, in this case > > it would me more safe to pretend the task got a spurious wakeup? > > Note that in general the kernel cannot deal with spurious wakeups :/ > > Most proper locks and wait primitives can, but there's enough open-coded > crap out there that can not. Oh yes, I understand. My point is, "restore the old state" in this case looks worse simply because you miss any wakeup in between which was going to clear that state. And afaics only kthreads can call __refrigerator() in !RUNNING. But let me repeat, I am almost sure I missed something else. Oleg. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 16:21 ` Oleg Nesterov 2013-11-12 16:56 ` oom-kill && frozen() Oleg Nesterov @ 2013-11-12 17:00 ` Peter Zijlstra 2013-11-12 18:04 ` Oleg Nesterov 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2013-11-12 17:00 UTC (permalink / raw) To: Oleg Nesterov Cc: Tejun Heo, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On Tue, Nov 12, 2013 at 05:21:36PM +0100, Oleg Nesterov wrote: > On 11/12, Peter Zijlstra wrote: > > > > On Tue, Nov 12, 2013 at 02:21:39PM -0000, David Laight wrote: > > > Shame there isn't a process flag to indicate that the process > > > will sleep uninterruptibly and that it doesn't matter. > > > So don't count to the load average and don't emit a warning > > > if it has been sleeping for a long time. > > > > A process flag wouldn't work, because the task could block waiting for > > actual work to complete in other sleeps. > > > > However, we could do something like the below; which would allow us > > writing things like: > > > > (void)___wait_event(*sk_sleep(sk), > > sock_writeable(sk) || kthread_should_stop(), > > TASK_UNINTERRUPTIBLE | TASK_IDLE, 0, 0, > > schedule()); > > > > Marking the one wait-for-more-work as TASK_IDLE such that it doesn't > > contribute to the load avg. > > Agreed, I thought about additional bit too. > > > static const char * const task_state_array[] = { > > - "R (running)", /* 0 */ > > - "S (sleeping)", /* 1 */ > > - "D (disk sleep)", /* 2 */ > > - "T (stopped)", /* 4 */ > > - "t (tracing stop)", /* 8 */ > > - "Z (zombie)", /* 16 */ > > - "X (dead)", /* 32 */ > > - "x (dead)", /* 64 */ > > - "K (wakekill)", /* 128 */ > > - "W (waking)", /* 256 */ > > - "P (parked)", /* 512 */ > > + "R (running)", /* 0 */ > > + "S (sleeping)", /* 1 */ > > + "D (disk sleep)", /* 2 */ > > + "T (stopped)", /* 4 */ > > + "t (tracing stop)", /* 8 */ > > + "Z (zombie)", /* 16 */ > > + "X (dead)", /* 32 */ > > + "x (dead)", /* 64 */ > > + "K (wakekill)", /* 128 */ > > + "W (waking)", /* 256 */ > > + "P (parked)", /* 512 */ > > + "I (idle)", /* 1024 */ > > }; > > but I am not sure about what /proc/ should report in this case... We have to put in something... BUILD_BUG_ON(1 + ilog2(TASK_STATE_MAX) != ARRAY_SIZE(task_state_array)); However, since we always set it together with TASK_UNINTERUPTIBLE userspace shouldn't actually ever see the I thing. > > #define task_contributes_to_load(task) \ > > ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \ > > - (task->flags & PF_FROZEN) == 0) > > + (task->flags & PF_FROZEN) == 0 && \ > > + (task->state & TASK_IDLE) == 0) > > perhaps > > (task->state & (TASK_UNINTERRUPTIBLE | TASK_IDLE)) == TASK_UNINTERRUPTIBLE > > can save an insn. Fair enough. > I am also wondering if it makes any sense to turn PF_FROZEN into > TASK_FROZEN, something like (incomplete, probably racy) patch below. > Note that it actually adds the new state, not the the qualifier. > > --- x/include/linux/freezer.h > +++ x/include/linux/freezer.h > @@ -23,7 +23,7 @@ extern unsigned int freeze_timeout_msecs > */ > static inline bool frozen(struct task_struct *p) > { > - return p->flags & PF_FROZEN; > + return p->state & TASK_FROZEN; do we want == there? Does it make sense to allow it be set with other state flags? > } > > extern bool freezing_slow_path(struct task_struct *p); > --- x/kernel/freezer.c > +++ x/kernel/freezer.c > @@ -57,16 +57,13 @@ bool __refrigerator(bool check_kthr_stop > pr_debug("%s entered refrigerator\n", current->comm); > > for (;;) { > - set_current_state(TASK_UNINTERRUPTIBLE); > - > spin_lock_irq(&freezer_lock); > - current->flags |= PF_FROZEN; > - if (!freezing(current) || > - (check_kthr_stop && kthread_should_stop())) > - current->flags &= ~PF_FROZEN; > + if (freezing(current) && > + !(check_kthr_stop && kthread_should_stop())) > + set_current_state(TASK_FROZEN); > spin_unlock_irq(&freezer_lock); > > - if (!(current->flags & PF_FROZEN)) > + if (!(current->state & TASK_FROZEN)) > break; > was_frozen = true; > schedule(); > @@ -148,8 +145,7 @@ void __thaw_task(struct task_struct *p) > * refrigerator. > */ > spin_lock_irqsave(&freezer_lock, flags); > - if (frozen(p)) > - wake_up_process(p); > + try_to_wake_up(p, TASK_FROZEN, 0); > spin_unlock_irqrestore(&freezer_lock, flags); > } Should work I suppose... I'm not entirely sure why that's a PF to begin with. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] ipvs: Remove unused variable ret from sync_thread_master() 2013-11-12 17:00 ` [PATCH] ipvs: Remove unused variable ret from sync_thread_master() Peter Zijlstra @ 2013-11-12 18:04 ` Oleg Nesterov 0 siblings, 0 replies; 18+ messages in thread From: Oleg Nesterov @ 2013-11-12 18:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Tejun Heo, David Laight, Geert Uytterhoeven, Ingo Molnar, netdev, linux-kernel On 11/12, Peter Zijlstra wrote: > > On Tue, Nov 12, 2013 at 05:21:36PM +0100, Oleg Nesterov wrote: > > On 11/12, Peter Zijlstra wrote: > > > > > > static const char * const task_state_array[] = { > > > ... > > > + "I (idle)", /* 1024 */ > > > }; > > > > but I am not sure about what /proc/ should report in this case... > > We have to put in something... > > BUILD_BUG_ON(1 + ilog2(TASK_STATE_MAX) != ARRAY_SIZE(task_state_array)); > > However, since we always set it together with TASK_UNINTERUPTIBLE > userspace shouldn't actually ever see the I thing. OOPS. I didn't know that get_task_state() does &= TASK_REPORT. So it can never report anything > EXIT_DEAD. Perhaps we should change BUILD_BUG_ON() and shrink task_state_array? --- x/include/linux/sched.h +++ x/include/linux/sched.h @@ -163,7 +163,7 @@ extern char ___assert_task_state[1 - 2*! /* get_task_state() */ #define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE | \ TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \ - __TASK_TRACED) + __TASK_TRACED | EXIT_ZOMBIE | EXIT_DEAD) #define task_is_traced(task) ((task->state & __TASK_TRACED) != 0) #define task_is_stopped(task) ((task->state & __TASK_STOPPED) != 0) --- x/fs/proc/array.c +++ x/fs/proc/array.c @@ -148,16 +148,12 @@ static const char * const task_state_arr static inline const char *get_task_state(struct task_struct *tsk) { - unsigned int state = (tsk->state & TASK_REPORT) | tsk->exit_state; + unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT; const char * const *p = &task_state_array[0]; - BUILD_BUG_ON(1 + ilog2(TASK_STATE_MAX) != ARRAY_SIZE(task_state_array)); + BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)); - while (state) { - p++; - state >>= 1; - } - return *p; + return task_state_array[fls(state)]; } static inline void task_state(struct seq_file *m, struct pid_namespace *ns, > > I am also wondering if it makes any sense to turn PF_FROZEN into > > TASK_FROZEN, something like (incomplete, probably racy) patch below. > > Note that it actually adds the new state, not the the qualifier. > > > > --- x/include/linux/freezer.h > > +++ x/include/linux/freezer.h > > @@ -23,7 +23,7 @@ extern unsigned int freeze_timeout_msecs > > */ > > static inline bool frozen(struct task_struct *p) > > { > > - return p->flags & PF_FROZEN; > > + return p->state & TASK_FROZEN; > > do we want == there? Not really, but if we add TASK_FROZEN then we should probably add task_is_frozen() just for consistency (frozen() should use this helper) and all task_is_* helpers do "&". And, > Does it make sense to allow it be set with other > state flags? Not sure, but _perhaps_ TASK_FROZEN | TASK_KILLABLE can be used too, say, we can add CGROUP_FROZEN_KILLABLE. Probably makes no sense, I dunno. > > --- x/kernel/freezer.c > > +++ x/kernel/freezer.c > > @@ -57,16 +57,13 @@ bool __refrigerator(bool check_kthr_stop > > pr_debug("%s entered refrigerator\n", current->comm); > > > > for (;;) { > > - set_current_state(TASK_UNINTERRUPTIBLE); > > - > > spin_lock_irq(&freezer_lock); > > - current->flags |= PF_FROZEN; > > - if (!freezing(current) || > > - (check_kthr_stop && kthread_should_stop())) > > - current->flags &= ~PF_FROZEN; > > + if (freezing(current) && > > + !(check_kthr_stop && kthread_should_stop())) > > + set_current_state(TASK_FROZEN); > > spin_unlock_irq(&freezer_lock); > > > > - if (!(current->flags & PF_FROZEN)) > > + if (!(current->state & TASK_FROZEN)) > > break; > > was_frozen = true; > > schedule(); > > @@ -148,8 +145,7 @@ void __thaw_task(struct task_struct *p) > > * refrigerator. > > */ > > spin_lock_irqsave(&freezer_lock, flags); > > - if (frozen(p)) > > - wake_up_process(p); > > + try_to_wake_up(p, TASK_FROZEN, 0); > > spin_unlock_irqrestore(&freezer_lock, flags); > > } > > Should work I suppose... And perhaps we can simplify this a bit. Not sure we can kill freezer_lock altogether, but at least __thaw_task() can avoid it. But I am still not sure the new TASK_ state really makes sense, even if it looks a bit more clean to me. OTOH, personally I think that /proc/pid/status should report "F (frozen)" if frozen(), but this doesn't need TASK_FROZEN of course. > I'm not entirely sure why that's a PF to begin > with. Oh, it had more PF_'s until tj improved (and cleanuped) this code ;) Oleg. ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2013-11-13 19:39 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-11-12 13:53 [PATCH] ipvs: Remove unused variable ret from sync_thread_master() Geert Uytterhoeven 2013-11-12 14:13 ` Peter Zijlstra 2013-11-12 14:21 ` David Laight 2013-11-12 14:31 ` Peter Zijlstra 2013-11-12 14:38 ` David Laight 2013-11-12 16:26 ` Oleg Nesterov 2013-11-12 14:52 ` Peter Zijlstra 2013-11-12 16:21 ` Oleg Nesterov 2013-11-12 16:56 ` oom-kill && frozen() Oleg Nesterov 2013-11-13 3:20 ` Tejun Heo 2013-11-13 17:07 ` Oleg Nesterov 2013-11-13 17:42 ` Peter Zijlstra 2013-11-13 18:15 ` Oleg Nesterov 2013-11-13 19:11 ` __refrigerator() && saved task->state Oleg Nesterov 2013-11-13 19:14 ` Peter Zijlstra 2013-11-13 19:40 ` Oleg Nesterov 2013-11-12 17:00 ` [PATCH] ipvs: Remove unused variable ret from sync_thread_master() Peter Zijlstra 2013-11-12 18:04 ` Oleg Nesterov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.