From: Oleg Nesterov <oleg@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	David Rientjes <rientjes@google.com>,
	David Laight <David.Laight@ACULAB.COM>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Ingo Molnar <mingo@kernel.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: oom-kill && frozen()
Date: Wed, 13 Nov 2013 18:07:24 +0100
Message-ID: <20131113170724.GA17739@redhat.com>
In-Reply-To: <20131113032053.GA19394@mtj.dyndns.org>

On 11/13, Tejun Heo wrote:
>
> Hello,
>
> On Tue, Nov 12, 2013 at 05:56:43PM +0100, Oleg Nesterov wrote:
> > On 11/12, Oleg Nesterov wrote:
> > > I am also wondering if it makes any sense to turn PF_FROZEN into
> > > TASK_FROZEN, something like (incomplete, probably racy) patch below.
> > > Note that it actually adds the new state, not the the qualifier.
> >
> > As for the current usage of PF_FROZEN... David, it seems that
> > oom_scan_process_thread()->__thaw_task() is dead? Probably this
> > was fine before, when __thaw_task() cleared the "need to freeze"
> > condition, iirc it was PF_FROZEN.
> >
> > But today __thaw_task() can't help, no? the task will simply
> > schedule() in D state again.
>
> Yeah, it'll have to be actively excluded using e.g. PF_FREEZER_SKIP,
> which, BTW, can usually only be manipulated by the task itself.

Oh, yes, yes, yes, I agree. PF_FREEZER_SKIP and the growing number of
freezable_schedule() callers make this all more confusing.

In fact I was thinking about something like

	1. Add the new __TASK_FREEZABLE qualifier

	2. Turn freezable_schedule() into

		void freezable_schedule(void)
		{
			spin_lock_irq(&current->pi_lock);
			/* still sleeping (not TASK_RUNNING)? mark it freezable */
			if (current->state)
				current->state |= __TASK_FREEZABLE;
			spin_unlock_irq(&current->pi_lock);

			schedule();

			try_to_freeze();
		}

	3. Kill PF_FREEZER_SKIP, freezer_do_not_count(), freezer_count() and
	   freezer_should_skip()

	4. Change freeze_task() and fake_signal_wake_up()

		-	wake_up_state(p, TASK_INTERRUPTIBLE);
		+	wake_up_state(p, TASK_INTERRUPTIBLE | __TASK_FREEZABLE);
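A userspace toy model of the first variant may help to see what the extra bit buys. It is only a sketch under several assumptions: ->state is a plain bitmask with no pi_lock, runqueue or memory ordering; __TASK_FREEZABLE gets an arbitrary unused bit; mark_freezable() is a hypothetical name for the state update in the new freezable_schedule(); and wake_up_state() here only mimics the "any matching bit wakes the task" check done by try_to_wake_up().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only.  TASK_* values are illustrative;
 * __TASK_FREEZABLE is the proposed qualifier (hypothetical bit).
 */
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define __TASK_FREEZABLE	0x1000

struct task { unsigned int state; };

/* Hypothetical: the ->state update done by freezable_schedule(), no locking. */
static void mark_freezable(struct task *p)
{
	if (p->state != TASK_RUNNING)
		p->state |= __TASK_FREEZABLE;
}

/* Mimics try_to_wake_up(): any bit shared with @state wakes the task. */
static bool wake_up_state(struct task *p, unsigned int state)
{
	assert(state != 0);
	if (!(p->state & state))
		return false;
	p->state = TASK_RUNNING;
	return true;
}
```

With the task sleeping in TASK_UNINTERRUPTIBLE | __TASK_FREEZABLE, a plain TASK_INTERRUPTIBLE wakeup is ignored, but the freezer's TASK_INTERRUPTIBLE | __TASK_FREEZABLE wakeup matches via the new bit and wakes it even though the condition the task sleeps on may still be false.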

Unfortunately, this can only work if the caller can tolerate a
false wakeup. We can even fix wait_for_vfork_done(), but, say,
ptrace_stop() can't work this way.

And even if we can make this work, the very fact that freezable_schedule()
calls schedule() twice does not look right.
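What "tolerate a false wakeup" means can be shown with another toy model; all names are hypothetical and no kernel API is involved. A waiter that re-checks its condition in a loop simply sleeps again after the freezer's wakeup, while a waiter that sleeps once and assumes it was woken for a reason gets it wrong. The model does not try to capture the specific reasons ptrace_stop() can't work this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Why the sleeper was woken, in this toy model. */
enum wake_reason { WAKE_EVENT, WAKE_FREEZER };

/*
 * Waiter that re-checks its condition after every wakeup, like the usual
 * kernel wait loop.  @wakeups is a scripted sequence of @n wakeups; returns
 * how many times it slept before the awaited event arrived.
 */
static int looping_waiter(const enum wake_reason *wakeups, int n)
{
	int sleeps = 0;

	for (;;) {
		assert(sleeps < n);	/* the script must contain the event */
		/* "schedule()" returns with the next scripted wakeup */
		if (wakeups[sleeps++] == WAKE_EVENT)
			return sleeps;	/* condition is now true */
		/* false wakeup: condition still false, sleep again */
	}
}

/* Waiter that sleeps once; returns whether its "I was woken for a reason"
 * assumption actually held. */
static bool oneshot_waiter(const enum wake_reason *wakeups)
{
	return wakeups[0] == WAKE_EVENT;
}
```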

_Perhaps_ we can do something like "selective wakeup"? IOW, ignoring the
races/details,

	1. Add __TASK_FROZEN qualifier _and_ state

	2. Change frozen(),

		static inline bool frozen(struct task_struct *p)
		{
			return p->state & __TASK_FROZEN;
		}

	3. Change freezable_schedule(),

		void freezable_schedule(void)
		{
			spin_lock_irq(&current->pi_lock);
			if (current->state)
				current->state |= __TASK_FROZEN;
			spin_unlock_irq(&current->pi_lock);

			schedule();
		}

	4. Change __refrigerator() to use saved_state | __TASK_FROZEN
	   too.

	5. Finally, change the try_to_wake_up() path to do

		-	p->state = TASK_WAKING;
		+	p->state &= ~state;
		+	if (p->state & ~(TASK_DEAD | TASK_WAKEKILL | TASK_PARKED))
		+		return;
		+	else
		+		p->state = TASK_WAKING;

	   IOW, if the task sleeps in, say, TASK_INTERRUPTIBLE | __TASK_FROZEN
	   then it needs both try_to_wake_up(TASK_INTERRUPTIBLE) and
	   try_to_wake_up(__TASK_FROZEN) to wake up.

	6. Kill PF_FREEZER_SKIP, etc.

Unfortunately, step 5 obviously needs more changes, although at first
glance nothing really nontrivial: we need a common helper for
try_to_wake_up() and ttwu_remote() which checks/changes ->state, and we
need to avoid the "stat" accounting if we do not actually wake up.
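Below is a userspace sketch of the "selective wakeup" rule from the try_to_wake_up() change above, written as the kind of common helper it mentions. The helper name ttwu_selective(), the struct and the __TASK_FROZEN bit are hypothetical, the other constant values are only illustrative, and races, ttwu_remote() itself and the stats are ignored. frozen() is modelled as the proposed state-bit test.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; __TASK_FROZEN is the proposed (hypothetical) bit. */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_DEAD		0x0040
#define TASK_WAKEKILL		0x0080
#define TASK_WAKING		0x0100
#define TASK_PARKED		0x0200
#define __TASK_FROZEN		0x0400

struct task { unsigned int state; };

/* The proposed frozen(): a state bit instead of PF_FROZEN. */
static bool frozen(const struct task *p)
{
	return p->state & __TASK_FROZEN;
}

/*
 * Hypothetical common helper: clear the bits this waker is responsible
 * for and report whether the task should really be woken now.
 */
static bool ttwu_selective(struct task *p, unsigned int state)
{
	assert(state != 0);
	if (!(p->state & state))
		return false;		/* not sleeping in any of these states */

	p->state &= ~state;
	if (p->state & ~(TASK_DEAD | TASK_WAKEKILL | TASK_PARKED))
		return false;		/* another wakeup is still needed */

	p->state = TASK_WAKING;
	return true;
}
```

Either wakeup alone only clears its own bit; the task reaches TASK_WAKING only after both, in either order.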

Hmm, and this all makes me think that at least s/PF_FROZEN/TASK_FROZEN/ as
a first step actually makes some sense... Note the "qualifier _and_ state"
above.

Tejun, Peter, do you think this makes any sense? I am just curious, but
"selective wakeup" looks potentially useful.



And what about oom_scan_process_thread()? Should we simply kill this
dead frozen()/__thaw_task() code, or should we change freezing() to respect
TIF_MEMDIE?
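For the second option, a simplified model of freezing() that respects TIF_MEMDIE. The struct fields and the global flag are hypothetical stand-ins for PF_NOFREEZE, TIF_MEMDIE and "a freeze is in progress", and other checks in the real code (cgroup freezer etc.) are left out. The idea it illustrates: if freezing() returns false for the OOM victim, the __thaw_task() wakeup is no longer followed by another sleep in the refrigerator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state involved. */
struct task {
	bool nofreeze;	/* models PF_NOFREEZE */
	bool memdie;	/* models TIF_MEMDIE, set by the OOM killer */
};

static bool system_freezing;	/* models "a freeze is in progress" */

/* Simplified freezing(): should @p stay in (or enter) the refrigerator? */
static bool freezing(const struct task *p)
{
	assert(p != 0);
	if (!system_freezing)
		return false;
	if (p->nofreeze)
		return false;
	if (p->memdie)
		return false;	/* proposed: let the OOM victim run and exit */
	return true;
}
```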

Oleg.



Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131113170724.GA17739@redhat.com \
    --to=oleg@redhat.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=geert@linux-m68k.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.