From: "Yu, Fenghua"
To: Thomas Gleixner
CC: Ingo Molnar, "Anvin, H Peter", "Luck, Tony", Tejun Heo, "Borislav Petkov",
    Stephane Eranian, Peter Zijlstra, Marcelo Tosatti, "David Carrillo-Cisneros",
    "Shankar, Ravi V", Vikas Shivappa, "Prakhya, Sai Praneeth", linux-kernel, x86
Subject: RE: [PATCH 24/32] Task fork and exit for rdtgroup
Date: Wed, 13 Jul 2016 21:22:52 +0000
Message-ID: <3E5A0FA7E9CA944F9D5414FEC6C712205DFD4BC8@ORSMSX106.amr.corp.intel.com>
References: <1468371785-53231-1-git-send-email-fenghua.yu@intel.com>
 <1468371785-53231-25-git-send-email-fenghua.yu@intel.com>
 <3E5A0FA7E9CA944F9D5414FEC6C712205DFD4A75@ORSMSX106.amr.corp.intel.com>

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, July 13, 2016 2:03 PM
>
> On Wed, 13 Jul 2016, Yu, Fenghua wrote:
> > On Wed, July 2016, Thomas Gleixner wrote:
> > > On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > > > +void rdtgroup_post_fork(struct task_struct *child)
> > > > +{
> > > > +	if (!use_rdtgroup_tasks)
> > > > +		return;
> > > > +
> > > > +	spin_lock_irq(&rdtgroup_task_lock);
> > > > +	if (list_empty(&child->rg_list)) {
> > >
> > > Why would the list be non-empty after a fork?
> >
> > In this situation for a pid:
> > 1. rdtgroup_fork(): rg_list = null.
> > 2. setup_task_rg_lists(): rg_list is set up.
> > 3. rdtgroup_fork(): rg_list is not empty.
>
> Why would rdtgroup_fork() be called twice for a given thread?
>
> > This situation happens only during rscctrl mount time. Before mount,
> > post_fork() returns from !use_rdtgroup_tasks and doesn't set up
> > rg_list. After mount, rg_list is always empty in post_fork(). But we
> > need to check rg_list for the above situation.
> >
> > Does that make sense?
>
> No, that does not make any sense at all.
>
> > Any suggestion for a better solution?
>
> The problem you have is:
>
>     fork
>         list_init(rg_list);
>         write_lock(tasklist_lock);
>
>         task becomes visible
>
>         write_unlock(tasklist_lock);
>
>         rdtgroup_post_fork();
>             if (!use_rdtgroup_tasks)
>                 return;
>
>             spin_lock_irq(&rdtgroup_task_lock);
>             list_add();
>             spin_unlock_irq(&rdtgroup_task_lock);
>
> I have no idea why this lock must be taken with _irq, but that's another
> story. Let's look at the mount side:
>
>     spin_lock_irq(&rdtgroup_task_lock);
>     read_lock(&tasklist_lock);
>
>     do_each_thread(g, p) {
>         WARN_ON(More magic crap happening there)
>
>         spin_lock_irq(&p->sighand->siglock);
>         list_add();
>         spin_unlock_irq(&p->sighand->siglock);
>         ^^^^
> Great: You undo the irq disable of (&rdtgroup_task_lock) above! Oh well....
>
>     read_unlock(&tasklist_lock);
>     spin_unlock_irq(&rdtgroup_task_lock);
>
> So you need all this magic in rdtgroup_post_fork() and setup_task_rg_lists()
> just because you blindly positioned rdtgroup_post_fork() at the point where
> the cgroup_post_fork() stuff is. But you did not think a second about the
> locking rules here, otherwise they would be documented somewhere.
>
> You need a read_lock(&tasklist_lock) for the mount part anyway. So why
> don't you do the obvious:
>
>     fork
>         list_init(rg_list);
>         write_lock(tasklist_lock);
>
>         rdtgroup_post_fork();
>             if (use_rdtgroup_tasks)
>                 spin_lock(&rdtgroup_task_lock);
>                 list_add();
>                 spin_unlock(&rdtgroup_task_lock);
>
>         task becomes visible
>
>         write_unlock(tasklist_lock);
>
> And reorder the lock ordering in the mount path:
>
>     read_lock(&tasklist_lock);
>     spin_lock(&rdtgroup_task_lock);
>
> Now using rdtgroup_task_lock to protect current->rdtgroup is horrible as
> well. You need task->sighand->siglock in the mount path anyway to prevent
> exit races. So you can simplify the whole magic to:
>
>     fork
>         list_init(rg_list);
>         write_lock(tasklist_lock);
>
>         spin_lock(&current->sighand->siglock);
>
>         rdtgroup_post_fork();
>             if (use_rdtgroup_tasks)
>                 list_add();
>
>         spin_unlock(&current->sighand->siglock);
>         write_unlock(tasklist_lock);
>
> That removes an extra lock/unlock operation from the fork path because
> current->sighand->siglock is taken inside of the tasklist_lock write-locked
> section already.
>
> So you need protection for use_rdtgroup_tasks, which is a complete misnomer
> btw. (rdtgroup_active would be too obvious, right?). That protection is
> simple because you can set that flag with tasklist_lock read-locked, which
> you hold anyway for iterating all threads in the mount path.
>
> Aside of that you need to take tsk->sighand->siglock when you change
> tsk->rdtgroup, but that's a no-brainer and it gives you the extra benefit
> that you can protect such an operation against exit of the task that way by
> checking PF_EXITING under the lock. I don't see any protection against exit
> in your current implementation when a task is moved to a different partition.
>
> Please sit down and describe the complete locking and protection scheme of
> this stuff. I'm not going to figure this out from the obscure code another
> time.
>
> Thanks,
>
> 	tglx

Sure, I'll rethink the locking and protection scheme for the tasks.

Thanks.

-Fenghua
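
For reference, a minimal C sketch of the simplified fork-path and mount-path
ordering suggested above. It assumes the rg_list and use_rdtgroup_tasks names
from the discussion plus a hypothetical global rdtgroup_task_list, and elides
the rest of copy_process(); it is illustrative only, not code from the patch
series.

    /*
     * Fork path: link the child under current's siglock while tasklist_lock
     * is still write-locked, i.e. before the child becomes visible to the
     * mount-path iterator.
     */
    static void rdtgroup_post_fork_sketch(struct task_struct *child)
    {
    	INIT_LIST_HEAD(&child->rg_list);

    	write_lock_irq(&tasklist_lock);
    	spin_lock(&current->sighand->siglock);

    	if (use_rdtgroup_tasks)			/* aka rdtgroup_active */
    		list_add_tail(&child->rg_list, &rdtgroup_task_list);

    	spin_unlock(&current->sighand->siglock);

    	/* ... child is attached to the thread lists (becomes visible) ... */

    	write_unlock_irq(&tasklist_lock);
    }

    /*
     * Mount path: tasklist_lock first, then each task's siglock; skip tasks
     * that are already exiting, and set the flag while still holding
     * tasklist_lock.
     */
    static void rdtgroup_setup_task_lists_sketch(void)
    {
    	struct task_struct *g, *p;

    	read_lock(&tasklist_lock);
    	do_each_thread(g, p) {
    		spin_lock(&p->sighand->siglock);
    		if (!(p->flags & PF_EXITING))
    			list_add_tail(&p->rg_list, &rdtgroup_task_list);
    		spin_unlock(&p->sighand->siglock);
    	} while_each_thread(g, p);
    	use_rdtgroup_tasks = true;
    	read_unlock(&tasklist_lock);
    }

The point of this ordering is that list_add() runs before the child is
reachable by do_each_thread(), so the fork and mount paths cannot race over
rg_list, and checking PF_EXITING under siglock closes the exit race.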