From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1753733Ab1LURhk (ORCPT );
    Wed, 21 Dec 2011 12:37:40 -0500
Received: from mail-iy0-f174.google.com ([209.85.210.174]:40368
    "EHLO mail-iy0-f174.google.com" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1751730Ab1LURhi (ORCPT );
    Wed, 21 Dec 2011 12:37:38 -0500
Date: Wed, 21 Dec 2011 09:37:33 -0800
From: Tejun Heo
To: tip-bot for Daisuke Nishimura
Cc: linux-tip-commits@vger.kernel.org, linux-kernel@vger.kernel.org,
    hpa@zytor.com, mingo@redhat.com, a.p.zijlstra@chello.nl,
    pjt@google.com, tglx@linutronix.de, mingo@elte.hu,
    Frederic Weisbecker
Subject: Re: [tip:sched/core] sched: Fix cgroup movement of forking process
Message-ID: <20111221173733.GF9213@google.com>
References: <20111215143655.662676b0.nishimura@mxp.nes.nec.co.jp>
    <20111221172632.GD9213@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20111221172632.GD9213@google.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(cc'ing Frederic)

On Wed, Dec 21, 2011 at 09:26:32AM -0800, Tejun Heo wrote:
> Hello, guys.
>
> On Wed, Dec 21, 2011 at 03:44:14AM -0800, tip-bot for Daisuke Nishimura wrote:
> > sched: Fix cgroup movement of forking process
> >
> > There is a small race between task_fork_fair() and sched_move_task(),
> > which is trying to move the parent.
> >
> >         task_fork_fair()                 sched_move_task()
> > --------------------------------+---------------------------------
> >   cfs_rq = task_cfs_rq(current)
> >     -> cfs_rq is the "old" one.
> >   curr = cfs_rq->curr
> >     -> curr is set to the parent.
> >                                     task_rq_lock()
> >                                     dequeue_task()
> >                                       ->parent.se.vruntime -= (old)cfs_rq->min_vruntime
> >                                     enqueue_task()
> >                                       ->parent.se.vruntime += (new)cfs_rq->min_vruntime
> >                                     task_rq_unlock()
> >   raw_spin_lock_irqsave(rq->lock)
> >   se->vruntime = curr->vruntime
> >     -> vruntime of the child is set to that of the parent
> >        which has already been updated by sched_move_task().
> >   se->vruntime -= (old)cfs_rq->min_vruntime.
> >   raw_spin_unlock_irqrestore(rq->lock)
> >
> > As a result, vruntime of the child becomes far bigger than expected,
> > if (new)cfs_rq->min_vruntime >> (old)cfs_rq->min_vruntime.
> >
> > This patch fixes this problem by setting "cfs_rq" and "curr" after
> > holding the rq->lock.
>
> The race shouldn't happen with threadgroup locking scheduled to be
> merged for the coming merge window.  sched_fork() and cgroup migration
> become exclusive and won't happen concurrently.  Would still make
> sense for -stable tho.

I retract that.  sched_move_task() can also be called from
cgroup_exit() which is outside of threadgroup locking.

Frederic, so, it seems we actually have race conditions here.  I
really wish cgroup made sure that things like this can't happen even
if we pay a bit of overhead in relatively cold paths.  I could be
being unrealistic tho.  Any ideas?

Thanks.

--
tejun
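
For reference, the arithmetic in the quoted race can be replayed in plain userspace C. This is only a sketch of the bookkeeping described in the commit message, not kernel code: the struct and function names below (cfs_rq_model, child_offset_racy(), child_offset_fixed()) are hypothetical, and the numbers used with them are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cfs_rq; only min_vruntime is modelled. */
struct cfs_rq_model {
    uint64_t min_vruntime;
};

/*
 * Effect of sched_move_task() on the parent as described in the quoted
 * commit message: dequeue subtracts the old queue's min_vruntime,
 * enqueue adds the new queue's min_vruntime.
 */
static uint64_t move_parent(uint64_t parent_vruntime,
                            const struct cfs_rq_model *old_rq,
                            const struct cfs_rq_model *new_rq)
{
    parent_vruntime -= old_rq->min_vruntime;
    parent_vruntime += new_rq->min_vruntime;
    return parent_vruntime;
}

/*
 * Racy ordering: cfs_rq was sampled before the move, so the child
 * copies the already-moved parent vruntime but subtracts the OLD
 * queue's min_vruntime.
 */
static int64_t child_offset_racy(uint64_t parent_vruntime,
                                 const struct cfs_rq_model *old_rq,
                                 const struct cfs_rq_model *new_rq)
{
    uint64_t moved = move_parent(parent_vruntime, old_rq, new_rq);
    return (int64_t)(moved - old_rq->min_vruntime);
}

/*
 * Fixed ordering: cfs_rq is looked up after taking rq->lock, so the
 * child sees the NEW queue and subtracts the matching min_vruntime.
 */
static int64_t child_offset_fixed(uint64_t parent_vruntime,
                                  const struct cfs_rq_model *old_rq,
                                  const struct cfs_rq_model *new_rq)
{
    uint64_t moved = move_parent(parent_vruntime, old_rq, new_rq);
    return (int64_t)(moved - new_rq->min_vruntime);
}
```

With an old queue at min_vruntime 1000, a new queue at 1000000 and a parent at 1500 (all invented values), the racy ordering leaves the child with an offset of 999500 while the fixed ordering leaves 500, which is the parent's original distance from its queue's min_vruntime.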