* Attaching a process to cgroups
@ 2012-06-19 18:58 Alexey Vlasov
  2012-06-20  3:34 ` Daisuke Nishimura
  2012-06-20 12:28 ` Mike Galbraith
  0 siblings, 2 replies; 20+ messages in thread
From: Alexey Vlasov @ 2012-06-19 18:58 UTC (permalink / raw)
  To: linux-kernel

Hi.

Is it possible to somehow speed up attaching a pid to a cgroup?
The problem is that writing the pid to the tasks file incurs a strange delay:

22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
22:28:00.788389 lseek(3, 0, SEEK_SET)   = 0 <0.000004>
22:28:00.788426 write(3, "16317\n", 6)  = 6 <0.128094>
22:28:00.916578 close(3)                = 0 <0.000006>

For comparison, here is the same test writing the pid to a file placed on tmpfs:

22:24:41.892562 open("/tmp/w_test-l24-apache1_4bdf3d13/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000010>
22:24:41.892597 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
22:24:41.892631 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5685b6f000 <0.000006>
22:24:41.892664 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
22:24:41.892701 lseek(3, 6, SEEK_SET)   = 6 <0.000004>
22:24:41.892738 write(3, "25966\n", 6)  = 6 <0.000008>
22:24:41.892767 close(3)                = 0 <0.000005>

Here it completes immediately.
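
For reference, the attach boils down to something like this (a minimal C sketch of
the sequence strace shows above, using raw open/write instead of stdio; error
handling trimmed):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* write one pid into a cgroup tasks file;
 * the single write() is where the delay shows up */
int attach_pid(const char *tasks_path, pid_t pid)
{
	char buf[32];
	int fd, len;

	fd = open(tasks_path, O_WRONLY | O_APPEND);
	if (fd < 0)
		return -1;
	len = snprintf(buf, sizeof(buf), "%d\n", (int)pid);
	if (write(fd, buf, len) != len) {
		close(fd);
		return -1;
	}
	return close(fd);
}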

-- 
BRGDS. Alexey Vlasov.


* Re: Attaching a process to cgroups
  2012-06-19 18:58 Attaching a process to cgroups Alexey Vlasov
@ 2012-06-20  3:34 ` Daisuke Nishimura
  2012-06-20 11:08   ` Alexey Vlasov
  2012-06-20 12:28 ` Mike Galbraith
  1 sibling, 1 reply; 20+ messages in thread
From: Daisuke Nishimura @ 2012-06-20  3:34 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: Daisuke Nishimura, linux-kernel

Hi.

What does "cat /sys/fs/cgroup/.../apache/memory.move_charge_at_immigrate" show ?

If it shows non-zero value, you can make the pid attachment faster by writing "0" to
memory.move_charge_at_immigrate before attaching the process.
But note that if you disable the feature, current memory usage of the process is not
moved to the new cgroup.
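
A minimal sketch of that order of operations (untested; the directory argument and
helper names are only illustrative):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* write a short string to a control file inside one cgroup directory */
static int cg_write(const char *cgroup_dir, const char *file, const char *val)
{
	char path[512];
	ssize_t len = strlen(val);
	int fd, ret;

	snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = (write(fd, val, len) == len) ? 0 : -1;
	close(fd);
	return ret;
}

/* disable move-charge first, then attach: the attach no longer has to move
 * the task's existing memory charges into the new cgroup */
int attach_without_move_charge(const char *memcg_dir, pid_t pid)
{
	char buf[32];

	if (cg_write(memcg_dir, "memory.move_charge_at_immigrate", "0"))
		return -1;
	snprintf(buf, sizeof(buf), "%d\n", (int)pid);
	return cg_write(memcg_dir, "tasks", buf);
}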

Thanks,
Daisuke Nishimura.

On Tue, 19 Jun 2012 22:58:56 +0400
Alexey Vlasov <renton@renton.name> wrote:

> Hi.
> 
> Is it possible to somehow speed up attaching a pid to a cgroup?
> The problem is that writing the pid to the tasks file incurs a strange delay:
> 
> 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788389 lseek(3, 0, SEEK_SET)   = 0 <0.000004>
> 22:28:00.788426 write(3, "16317\n", 6)  = 6 <0.128094>
> 22:28:00.916578 close(3)                = 0 <0.000006>
> 
> For comparison, here is the same test writing the pid to a file placed on tmpfs:
> 
> 22:24:41.892562 open("/tmp/w_test-l24-apache1_4bdf3d13/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000010>
> 22:24:41.892597 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
> 22:24:41.892631 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5685b6f000 <0.000006>
> 22:24:41.892664 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
> 22:24:41.892701 lseek(3, 6, SEEK_SET)   = 6 <0.000004>
> 22:24:41.892738 write(3, "25966\n", 6)  = 6 <0.000008>
> 22:24:41.892767 close(3)                = 0 <0.000005>
> 
> Here it completes immediately.
> 
> -- 
> BRGDS. Alexey Vlasov.


* Re: Attaching a process to cgroups
  2012-06-20  3:34 ` Daisuke Nishimura
@ 2012-06-20 11:08   ` Alexey Vlasov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Vlasov @ 2012-06-20 11:08 UTC (permalink / raw)
  To: Daisuke Nishimura; +Cc: linux-kernel

On Wed, Jun 20, 2012 at 12:34:31PM +0900, Daisuke Nishimura wrote:
> 
> What does "cat /sys/fs/cgroup/.../apache/memory.move_charge_at_immigrate" show ?

Here it shows zero.

By the way, such a delay occurs not only in the memory controller but also in
others, such as cpu and blkio.

-- 
BRGDS. Alexey Vlasov.


* Re: Attaching a process to cgroups
  2012-06-19 18:58 Attaching a process to cgroups Alexey Vlasov
  2012-06-20  3:34 ` Daisuke Nishimura
@ 2012-06-20 12:28 ` Mike Galbraith
  2012-06-21  7:54   ` Alexey Vlasov
  2012-07-25 13:36   ` Alexey Vlasov
  1 sibling, 2 replies; 20+ messages in thread
From: Mike Galbraith @ 2012-06-20 12:28 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel

On Tue, 2012-06-19 at 22:58 +0400, Alexey Vlasov wrote: 
> Hi.
> 
> Is it possible to somehow speed up attaching a pid to a cgroup?
> The problem is that writing the pid to the tasks file incurs a strange delay:
> 
> 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788389 lseek(3, 0, SEEK_SET)   = 0 <0.000004>
> 22:28:00.788426 write(3, "16317\n", 6)  = 6 <0.128094>
> 22:28:00.916578 close(3)                = 0 <0.000006>


kernel/cgroup.c::cgroup_attach_task()
{
...
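	/* every attach blocks here for a full RCU grace period; this is where
	 * the ~128 ms write() in the strace above is spent */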
	synchronize_rcu();
...
}



* Re: Attaching a process to cgroups
  2012-06-20 12:28 ` Mike Galbraith
@ 2012-06-21  7:54   ` Alexey Vlasov
  2012-06-21  8:23     ` Mike Galbraith
  2012-07-25 13:36   ` Alexey Vlasov
  1 sibling, 1 reply; 20+ messages in thread
From: Alexey Vlasov @ 2012-06-21  7:54 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> 
> kernel/cgroup.c::cgroup_attach_task()
> {
> ...
> 	synchronize_rcu();
> ...
> }

So nothing can be done here? (I mean if only I knew how to fix it I
wouldn't ask about it ;)


* Re: Attaching a process to cgroups
  2012-06-21  7:54   ` Alexey Vlasov
@ 2012-06-21  8:23     ` Mike Galbraith
  2012-06-21  8:26       ` Mike Galbraith
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Mike Galbraith @ 2012-06-21  8:23 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel

On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote: 
> On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > 
> > kernel/cgroup.c::cgroup_attach_task()
> > {
> > ...
> > 	synchronize_rcu();
> > ...
> > }
> 
> So nothing can be done here? (I mean if only I knew how to fix it I
> wouldn't ask about it ;)

Sure, kill the obnoxious thing, it's sitting right in the middle of the
userspace interface.

I banged on it a while back (wrt explosive android patches), extracted
RCU from the userspace interface.  It seemed to work great, much faster,
couldn't make it explode.  I wouldn't bet anything I wasn't willing to
immediately part with that the result was really really safe though ;-)

-Mike



* Re: Attaching a process to cgroups
  2012-06-21  8:23     ` Mike Galbraith
@ 2012-06-21  8:26       ` Mike Galbraith
  2012-06-26 18:06       ` Paul E. McKenney
  2012-07-23 20:41       ` Andrea Righi
  2 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2012-06-21  8:26 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel

On Thu, 2012-06-21 at 10:23 +0200, Mike Galbraith wrote: 
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote: 
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > 
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > 	synchronize_rcu();
> > > ...
> > > }
> > 
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
> 
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.

Um, lest anyone misunderstand, no, I don't mean by whacking instances of
synchronize_rcu() without further ado :)

-Mike



* Re: Attaching a process to cgroups
  2012-06-21  8:23     ` Mike Galbraith
  2012-06-21  8:26       ` Mike Galbraith
@ 2012-06-26 18:06       ` Paul E. McKenney
  2012-06-27  7:23         ` Mike Galbraith
  2012-07-23 20:41       ` Andrea Righi
  2 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2012-06-26 18:06 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Alexey Vlasov, linux-kernel

On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote: 
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > 
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > 	synchronize_rcu();
> > > ...
> > > }
> > 
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
> 
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.
> 
> I banged on it a while back (wrt explosive android patches), extracted
> RCU from the userspace interface.  It seemed to work great, much faster,
> couldn't make it explode.  I wouldn't bet anything I wasn't willing to
> immediately part with that the result was really really safe though ;-)

Or replace it with synchronize_rcu_expedited().  You can "get lucky"
for quite some time removing synchronize_rcu() calls!
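
In cgroup_attach_task() that would be a one-line change, along these lines (sketch
only, not a tested patch):

kernel/cgroup.c::cgroup_attach_task()
{
...
	/* expedited grace period: returns much sooner than synchronize_rcu(),
	 * at the cost of extra work/IPIs on the other CPUs */
	synchronize_rcu_expedited();
...
}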

							Thanx, Paul



* Re: Attaching a process to cgroups
  2012-06-26 18:06       ` Paul E. McKenney
@ 2012-06-27  7:23         ` Mike Galbraith
  2012-06-27 17:10           ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2012-06-27  7:23 UTC (permalink / raw)
  To: paulmck; +Cc: Alexey Vlasov, linux-kernel

On Tue, 2012-06-26 at 11:06 -0700, Paul E. McKenney wrote: 
> On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote: 
> > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > > 
> > > > kernel/cgroup.c::cgroup_attach_task()
> > > > {
> > > > ...
> > > > 	synchronize_rcu();
> > > > ...
> > > > }
> > > 
> > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > wouldn't ask about it ;)
> > 
> > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > userspace interface.
> > 
> > I banged on it a while back (wrt explosive android patches), extracted
> > RCU from the userspace interface.  It seemed to work great, much faster,
> > couldn't make it explode.  I wouldn't bet anything I wasn't willing to
> > immediately part with that the result was really really safe though ;-)
> 
> Or replace it with synchronize_rcu_expedited().  You can "get lucky"
> for quite some time removing synchronize_rcu() calls!

s/remove/replace, but yup.  A company that wanted to use the android
patches plus my tinkering showed a fix they needed on top to close a
race discovered in their testing.  So yeah, even when all seems fine,
extracting synchronize_rcu() may expose evils you couldn't encounter
before, and didn't happen to encounter afterward.

-Mike



* Re: Attaching a process to cgroups
  2012-06-27  7:23         ` Mike Galbraith
@ 2012-06-27 17:10           ` Paul E. McKenney
  2012-06-28  2:40             ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2012-06-27 17:10 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Alexey Vlasov, linux-kernel

On Wed, Jun 27, 2012 at 09:23:31AM +0200, Mike Galbraith wrote:
> On Tue, 2012-06-26 at 11:06 -0700, Paul E. McKenney wrote: 
> > On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote: 
> > > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > > > 
> > > > > kernel/cgroup.c::cgroup_attach_task()
> > > > > {
> > > > > ...
> > > > > 	synchronize_rcu();
> > > > > ...
> > > > > }
> > > > 
> > > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > > wouldn't ask about it ;)
> > > 
> > > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > > userspace interface.
> > > 
> > > I banged on it a while back (wrt explosive android patches), extracted
> > > RCU from the userspace interface.  It seemed to work great, much faster,
> > > couldn't make it explode.  I wouldn't bet anything I wasn't willing to
> > > immediately part with that the result was really really safe though ;-)
> > 
> > Or replace it with synchronize_rcu_expedited().  You can "get lucky"
> > for quite some time removing synchronize_rcu() calls!
> 
> s/remove/replace, but yup.  A company that wanted to use the android
> patches plus my tinkering showed a fix they needed on top to close a
> race discovered in their testing.  So yeah, even when all seems fine,
> extracting synchronize_rcu() may expose evils you couldn't encounter
> before, and didn't happen to encounter afterward.

I really did mean "remove".  Removing a synchronize_rcu() does result
in a race, but often an extremely low-probability race.  So you can
remove a synchronize_rcu() and get lucky for a long time, but sooner
or later, something will explode.
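
Schematically, the pattern that breaks (not cgroup code; struct foo, global_ptr and
do_something_with() are just placeholders):

/* updater, holding the update-side lock */
static void update_and_free(struct foo *new)
{
	struct foo *old = global_ptr;

	rcu_assign_pointer(global_ptr, new);
	synchronize_rcu();	/* wait for readers that may still see "old" */
	kfree(old);		/* without the wait, a reader can still be using "old" */
}

/* reader */
static void read_side(void)
{
	struct foo *p;

	rcu_read_lock();
	p = rcu_dereference(global_ptr);
	do_something_with(p);	/* use-after-free if kfree(old) ran too early */
	rcu_read_unlock();
}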

							Thanx, Paul



* Re: Attaching a process to cgroups
  2012-06-27 17:10           ` Paul E. McKenney
@ 2012-06-28  2:40             ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2012-06-28  2:40 UTC (permalink / raw)
  To: paulmck; +Cc: Alexey Vlasov, linux-kernel

On Wed, 2012-06-27 at 10:10 -0700, Paul E. McKenney wrote:

> I really did mean "remove".  Removing a synchronize_rcu() does result
> in a race, but often an extremely low-probability race.  So you can
> remove a synchronize_rcu() and get lucky for a long time, but sooner
> or later, something will explode.

Ah.  Android patches did "replace", and it still didn't take long at all
for bad luck to happen.. so "remove" for that particular instance would
probably result in "sooner" variety explosions :)

-Mike



* Re: Attaching a process to cgroups
  2012-06-21  8:23     ` Mike Galbraith
  2012-06-21  8:26       ` Mike Galbraith
  2012-06-26 18:06       ` Paul E. McKenney
@ 2012-07-23 20:41       ` Andrea Righi
  2012-07-24  1:19         ` Mike Galbraith
  2 siblings, 1 reply; 20+ messages in thread
From: Andrea Righi @ 2012-07-23 20:41 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Alexey Vlasov, Paul E. McKenney, linux-kernel

On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote: 
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > 
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > 	synchronize_rcu();
> > > ...
> > > }
> > 
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
> 
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.
> 
> I banged on it a while back (wrt explosive android patches), extracted
> RCU from the userspace interface.  It seemed to work great, much faster,
> couldn't make it explode.  I wouldn't bet anything I wasn't willing to
> immediately part with that the result was really really safe though ;-)
> 
> -Mike

JFYI,

I'm testing the following patch on a bunch of hosts and I wasn't able to
make any of them explode, even running a multi-threaded
cgroup-intensive workload, but probably I was just lucky (or unlucky,
depending on the point of view).

It is basically the same Not-signed-off-by work posted by Mike a while
ago: https://lkml.org/lkml/2011/4/12/599.

In addition, I totally removed the synchronize_rcu() call from
cgroup_attach_task() and applied the call_rcu -> schedule_work conversion
also to css_set. The latter looks unnecessary to me from a logical
point of view, or maybe I'm missing something, because I can't explain
why I can't trigger any BUG / oops with it.

Mike, did you make any progress from your old patch?

Thanks,
-Andrea

---
 include/linux/cgroup.h |    2 ++
 kernel/cgroup.c        |   91 +++++++++++++++++++++++++++++-------------------
 2 files changed, 58 insertions(+), 35 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d3f5fba..2bab2d6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -212,6 +212,7 @@ struct cgroup {
 
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
+	struct work_struct work;
 
 	/* List of events which userspace want to receive */
 	struct list_head event_list;
@@ -260,6 +261,7 @@ struct css_set {
 
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
+	struct work_struct work;
 };
 
 /*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b303dfc..aa7cc06 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -396,6 +396,21 @@ static struct hlist_head *css_set_hash(struct cgroup_subsys_state *css[])
  * compiled into their kernel but not actually in use */
 static int use_task_css_set_links __read_mostly;
 
+static void free_css_set_work(struct work_struct *work)
+{
+	struct css_set *cg = container_of(work, struct css_set, work);
+
+	kfree(cg);
+}
+
+static void free_css_set_rcu(struct rcu_head *obj)
+{
+	struct css_set *cg = container_of(obj, struct css_set, rcu_head);
+
+	INIT_WORK(&cg->work, free_css_set_work);
+	schedule_work(&cg->work);
+}
+
 static void __put_css_set(struct css_set *cg, int taskexit)
 {
 	struct cg_cgroup_link *link;
@@ -433,7 +448,7 @@ static void __put_css_set(struct css_set *cg, int taskexit)
 	}
 
 	write_unlock(&css_set_lock);
-	kfree_rcu(cg, rcu_head);
+	call_rcu(&cg->rcu_head, free_css_set_rcu);
 }
 
 /*
@@ -875,44 +890,52 @@ static int cgroup_call_pre_destroy(struct cgroup *cgrp)
 	return ret;
 }
 
-static void cgroup_diput(struct dentry *dentry, struct inode *inode)
+static void free_cgroup_work(struct work_struct *work)
 {
-	/* is dentry a directory ? if so, kfree() associated cgroup */
-	if (S_ISDIR(inode->i_mode)) {
-		struct cgroup *cgrp = dentry->d_fsdata;
-		struct cgroup_subsys *ss;
-		BUG_ON(!(cgroup_is_removed(cgrp)));
-		/* It's possible for external users to be holding css
-		 * reference counts on a cgroup; css_put() needs to
-		 * be able to access the cgroup after decrementing
-		 * the reference count in order to know if it needs to
-		 * queue the cgroup to be handled by the release
-		 * agent */
-		synchronize_rcu();
+	struct cgroup *cgrp = container_of(work, struct cgroup, work);
+	struct cgroup_subsys *ss;
 
-		mutex_lock(&cgroup_mutex);
-		/*
-		 * Release the subsystem state objects.
-		 */
-		for_each_subsys(cgrp->root, ss)
-			ss->destroy(cgrp);
+	mutex_lock(&cgroup_mutex);
+	/*
+	 * Release the subsystem state objects.
+	 */
+	for_each_subsys(cgrp->root, ss)
+		ss->destroy(cgrp);
 
-		cgrp->root->number_of_cgroups--;
-		mutex_unlock(&cgroup_mutex);
+	cgrp->root->number_of_cgroups--;
+	mutex_unlock(&cgroup_mutex);
 
-		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
-		 */
-		deactivate_super(cgrp->root->sb);
+	/*
+	 * Drop the active superblock reference that we took when we
+	 * created the cgroup
+	 */
+	deactivate_super(cgrp->root->sb);
 
-		/*
-		 * if we're getting rid of the cgroup, refcount should ensure
-		 * that there are no pidlists left.
-		 */
-		BUG_ON(!list_empty(&cgrp->pidlists));
+	/*
+	 * if we're getting rid of the cgroup, refcount should ensure
+	 * that there are no pidlists left.
+	 */
+	BUG_ON(!list_empty(&cgrp->pidlists));
+
+	kfree(cgrp);
+}
+
+static void free_cgroup_rcu(struct rcu_head *obj)
+{
+	struct cgroup *cgrp = container_of(obj, struct cgroup, rcu_head);
+
+	INIT_WORK(&cgrp->work, free_cgroup_work);
+	schedule_work(&cgrp->work);
+}
+
+static void cgroup_diput(struct dentry *dentry, struct inode *inode)
+{
+	/* is dentry a directory ? if so, kfree() associated cgroup */
+	if (S_ISDIR(inode->i_mode)) {
+		struct cgroup *cgrp = dentry->d_fsdata;
 
-		kfree_rcu(cgrp, rcu_head);
+		BUG_ON(!(cgroup_is_removed(cgrp)));
+		call_rcu(&cgrp->rcu_head, free_cgroup_rcu);
 	} else {
 		struct cfent *cfe = __d_cfe(dentry);
 		struct cgroup *cgrp = dentry->d_parent->d_fsdata;
@@ -1990,8 +2013,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 			ss->attach(cgrp, &tset);
 	}
 
-	synchronize_rcu();
-
 	/*
 	 * wake up rmdir() waiter. the rmdir should fail since the cgroup
 	 * is no longer empty.


* Re: Attaching a process to cgroups
  2012-07-23 20:41       ` Andrea Righi
@ 2012-07-24  1:19         ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2012-07-24  1:19 UTC (permalink / raw)
  To: Andrea Righi; +Cc: Alexey Vlasov, Paul E. McKenney, linux-kernel

On Mon, 2012-07-23 at 22:41 +0200, Andrea Righi wrote: 
> On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote: 
> > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > > 
> > > > kernel/cgroup.c::cgroup_attach_task()
> > > > {
> > > > ...
> > > > 	synchronize_rcu();
> > > > ...
> > > > }
> > > 
> > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > wouldn't ask about it ;)
> > 
> > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > userspace interface.
> > 
> > I banged on it a while back (wrt explosive android patches), extracted
> > RCU from the userspace interface.  It seemed to work great, much faster,
> > couldn't make it explode.  I wouldn't bet anything I wasn't willing to
> > immediately part with that the result was really really safe though ;-)
> > 
> > -Mike
> 
> JFYI,
> 
> I'm testing the following patch in a bunch of hosts and I wasn't able to
> make any of them to explode, even running a multi-threaded
> cgroup-intensive workload, but probably I was just lucky (or unlucky,
> depending on the point of view).
> 
> It is basically the same Not-signed-off-by work posted by Mike a while
> ago: https://lkml.org/lkml/2011/4/12/599.
> 
> In addition, I totally removed the synchronize_rcu() call from
> cgroup_attach_task() and added the call_rcu -> schedule_work removal
> also for css_set. The latter looks unnecessary to me from a logical
> point of view, or maybe I'm missing something, because I can't explain
> why with it I can't trigger any BUG / oops.
> 
> Mike, did you make any progress from your old patch?

No, it worked, but I couldn't prove it was really safe, so I let it drop.

-Mike



* Re: Attaching a process to cgroups
  2012-06-20 12:28 ` Mike Galbraith
  2012-06-21  7:54   ` Alexey Vlasov
@ 2012-07-25 13:36   ` Alexey Vlasov
  2012-07-25 13:57     ` Mike Galbraith
  1 sibling, 1 reply; 20+ messages in thread
From: Alexey Vlasov @ 2012-07-25 13:36 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, paulmck

Hi.

Now I've got almost 5k cgroups in use and it has gotten even worse.

Before, only the write was slow; now everything connected with cgroups
is barely working.

Could it be connected with synchronize_rcu()?

Hanging on read():

# strace -ttT cat /proc/cgroups 

17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>
17:30:51.272302 write(1, "#subsys_name\thierarchy\tnum_cgrou"..., 112#subsys_name       hierarchy       num_cgroups     enabled
cpuacct 1       16844   1
memory  2       16844   1
devices 3       16844   1
blkio   4       16844   1
) = 112 <0.000018>
17:30:51.272363 read(3, "", 32768)      = 0 <0.000007>
17:30:51.272405 brk(0x62d000)           = 0x62d000 <0.000011>
17:30:51.272444 close(3)                = 0 <0.000011>
17:30:51.272488 close(1)                = 0 <0.000007>
17:30:51.272518 close(2)                = 0 <0.000005>

On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> On Tue, 2012-06-19 at 22:58 +0400, Alexey Vlasov wrote: 

> > Is it possible to somehow speed up attaching a pid to a cgroup?
> > The problem is that writing the pid to the tasks file incurs a strange delay:
> > 
> > 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> > 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> > 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> > 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> > 22:28:00.788389 lseek(3, 0, SEEK_SET)   = 0 <0.000004>
> > 22:28:00.788426 write(3, "16317\n", 6)  = 6 <0.128094>
> > 22:28:00.916578 close(3)                = 0 <0.000006>
> 
> 
> kernel/cgroup.c::cgroup_attach_task()
> {
> ...
> 	synchronize_rcu();
> ...
> }
> 


* Re: Attaching a process to cgroups
  2012-07-25 13:36   ` Alexey Vlasov
@ 2012-07-25 13:57     ` Mike Galbraith
  2012-07-26 13:02       ` Alexey Vlasov
  2012-08-08 16:40       ` Alexey Vlasov
  0 siblings, 2 replies; 20+ messages in thread
From: Mike Galbraith @ 2012-07-25 13:57 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel, paulmck

On Wed, 2012-07-25 at 17:36 +0400, Alexey Vlasov wrote: 
> Hi.
> 
> Now I've got almost 5k cgroups in use and it has gotten even worse.
> 
> Before, only the write was slow; now everything connected with cgroups
> is barely working.
> 
> Could it be connected with synchronize_rcu()?

I'd profile it with perf, and expect to find a large pile of cycles.

> Hanging on read():
> 
> # strace -ttT cat /proc/cgroups 
> 
> 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>

Ew.. zillion cgroups is definitely a bad idea.

-Mike



* Re: Attaching a process to cgroups
  2012-07-25 13:57     ` Mike Galbraith
@ 2012-07-26 13:02       ` Alexey Vlasov
  2012-07-26 14:44         ` Mike Galbraith
  2012-08-08 16:40       ` Alexey Vlasov
  1 sibling, 1 reply; 20+ messages in thread
From: Alexey Vlasov @ 2012-07-26 13:02 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, paulmck

On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> 
> I'd profile it with perf, and expect to find a large pile of cycles.

I did it as follows:
# perf stat cat /proc/self/cgroup 

4:blkio:/
3:devices:/
2:memory:/
1:cpuacct:/

 Performance counter stats for 'cat /proc/self/cgroup':

          0.472513 task-clock                #    0.000 CPUs utilized          
                 1 context-switches          #    0.002 M/sec                  
                 1 CPU-migrations            #    0.002 M/sec                  
               169 page-faults               #    0.358 M/sec                  
           1111521 cycles                    #    2.352 GHz                    
            784737 stalled-cycles-frontend   #   70.60% frontend cycles idle   
            445520 stalled-cycles-backend    #   40.08% backend  cycles idle   
            576622 instructions              #    0.52  insns per cycle        
                                             #    1.36  stalled cycles per insn
            120032 branches                  #  254.029 M/sec                  
              6577 branch-misses             #    5.48% of all branches        

       9.114631804 seconds time elapsed

# perf report --sort comm,dso
Kernel address maps (/proc/{kallsyms,modules}) were restricted.

Check /proc/sys/kernel/kptr_restrict before running 'perf record'.

If some relocation was applied (e.g. kexec) symbols may be misresolved.

Samples in kernel modules can't be resolved as well.

# ========
# captured on: Thu Jul 26 16:23:06 2012
# hostname : l24
# os release : 3.3.3-1gb-c-s-m
# perf version : 3.2
# arch : x86_64
# nrcpus online : 24
# nrcpus avail : 24
# cpudesc : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
# total memory : 74181032 kB
# cmdline : /usr/sbin/perf record cat /proc/self/cgroup 
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1758, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1766, 1767, 1768, 1769, 1770, 1771, 1772, 1773, 1774, 1775, 1776, 1777, 1778, 1779, 1780, 1781 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Events: 21  cycles
#
# Overhead  Command      Shared Object
# ........  .......  .................
#
   100.00%      cat  [kernel.kallsyms]

but unfortunately I don't know what to do next.

I also checked the same thing on another server running a 2.6.37 kernel; it has
some thousands of cgroups too, and there it somehow works immediately.

-- 
Alexey Vlasov


* Re: Attaching a process to cgroups
  2012-07-26 13:02       ` Alexey Vlasov
@ 2012-07-26 14:44         ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2012-07-26 14:44 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: linux-kernel, paulmck

On Thu, 2012-07-26 at 17:02 +0400, Alexey Vlasov wrote: 
> On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> > 
> > I'd profile it with perf, and expect to find a large pile of cycles.
> 
> I did it as follows:
> # perf stat cat /proc/self/cgroup 
> 
> 4:blkio:/
> 3:devices:/
> 2:memory:/
> 1:cpuacct:/
> 
>  Performance counter stats for 'cat /proc/self/cgroup':
> 
>           0.472513 task-clock                #    0.000 CPUs utilized          
>                  1 context-switches          #    0.002 M/sec                  
>                  1 CPU-migrations            #    0.002 M/sec                  
>                169 page-faults               #    0.358 M/sec                  
>            1111521 cycles                    #    2.352 GHz                    
>             784737 stalled-cycles-frontend   #   70.60% frontend cycles idle   
>             445520 stalled-cycles-backend    #   40.08% backend  cycles idle   
>             576622 instructions              #    0.52  insns per cycle        
>                                              #    1.36  stalled cycles per insn
>             120032 branches                  #  254.029 M/sec                  
>               6577 branch-misses             #    5.48% of all branches        
> 
>        9.114631804 seconds time elapsed

Sleepy box.

> # perf report --sort comm,dso

perf report --sort symbol,dso won't toss everything in one basket.

> Kernel address maps (/proc/{kallsyms,modules}) were restricted.
> 
> Check /proc/sys/kernel/kptr_restrict before running 'perf record'.
> 
> If some relocation was applied (e.g. kexec) symbols may be misresolved.
> 
> Samples in kernel modules can't be resolved as well.
> 
> # ========
> # captured on: Thu Jul 26 16:23:06 2012
> # hostname : l24
> # os release : 3.3.3-1gb-c-s-m
> # perf version : 3.2
> # arch : x86_64
> # nrcpus online : 24
> # nrcpus avail : 24
> # cpudesc : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
> # total memory : 74181032 kB
> # cmdline : /usr/sbin/perf record cat /proc/self/cgroup
>  
> # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1758, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1766, 1767, 1768, 1769, 1770, 1771, 1772, 1773, 1774, 1775, 1776, 1777, 1778, 1779, 1780, 1781 }
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # HEADER_NUMA_TOPOLOGY info available, use -I to display
> # ========
> #
> # Events: 21  cycles
> #
> # Overhead  Command      Shared Object
> # ........  .......  .................
> #
>    100.00%      cat  [kernel.kallsyms]
> 
> but I don't know what next unfortunately.

I'll have to pass.  I would just stop creating thousands of cgroups ;-)
  
They've become a lot more scalable fairly recently, but if you populate
those thousands with frequently runnable tasks, I suspect you'll still
see huge truckloads of scheduler in profiles.. maybe even nothing else. 

> I also checked the same thing on another server running a 2.6.37 kernel; it has
> some thousands of cgroups too, and there it somehow works immediately.






* Re: Attaching a process to cgroups
  2012-07-25 13:57     ` Mike Galbraith
  2012-07-26 13:02       ` Alexey Vlasov
@ 2012-08-08 16:40       ` Alexey Vlasov
  2012-08-08 16:51         ` Paul E. McKenney
  1 sibling, 1 reply; 20+ messages in thread
From: Alexey Vlasov @ 2012-08-08 16:40 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, paulmck

On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> > Hanging on read():
> > 
> > # strace -ttT cat /proc/cgroups 
> > 
> > 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> > 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> > 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> > 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> > 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>
 
In the end I changed it to synchronize_rcu_expedited() and all the
delays are gone, both when writing and when reading cgroup files.

-- 
BRGDS. Alexey Vlasov.


* Re: Attaching a process to cgroups
  2012-08-08 16:40       ` Alexey Vlasov
@ 2012-08-08 16:51         ` Paul E. McKenney
  2012-08-10  9:53           ` Alexey Vlasov
  0 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2012-08-08 16:51 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: Mike Galbraith, linux-kernel

On Wed, Aug 08, 2012 at 08:40:33PM +0400, Alexey Vlasov wrote:
> On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> > > Hanging on read():
> > > 
> > > # strace -ttT cat /proc/cgroups 
> > > 
> > > 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> > > 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> > > 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> > > 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> > > 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>
> 
> In the end I changed it to synchronize_rcu_expedited() and all the
> delays are gone, both when writing and when reading cgroup files.

Is the writing and reading from cgroups something that your workload
does all the time, or is it something that happens only on occasional
updates to your cgroup configuration?

							Thanx, Paul



* Re: Attaching a process to cgroups
  2012-08-08 16:51         ` Paul E. McKenney
@ 2012-08-10  9:53           ` Alexey Vlasov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Vlasov @ 2012-08-10  9:53 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Mike Galbraith, linux-kernel

On Wed, Aug 08, 2012 at 09:51:29AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 08, 2012 at 08:40:33PM +0400, Alexey Vlasov wrote:
> > 
> > In the end I changed it to synchronize_rcu_expedited() and all the
> > delays are gone, both when writing and when reading cgroup files.
> 
> Is the writing and reading from cgroups something that your workload
> does all the time, or is it something that happens only on occasional
> updates to your cgroup configuration?

There was always some delay on writing. It reproduces easily: create
about 1000 groups (maybe even one group is enough, I didn't actually
check) and write a pid to the tasks file of a group. I described it in
my first message.

Delays on reading appeared when processes started being rotated through
the tasks files actively, and perhaps also from the updating of counters
(cpuacct.stat, memory.stat) across the cgroup hierarchy. The load
average grew from 10 to 500, and all programs that read cgroup files in
/proc (htop, for example) practically stopped working.

-- 
BRGDS. Alexey Vlasov.

