* [PATCH] cgroup: disable irqs while holding css_set_lock
From: Daniel Bristot de Oliveira @ 2016-06-06 22:24 UTC
  To: linux-kernel
  Cc: Rik van Riel, Luis Claudio R. Goncalves, Tejun Heo, Li Zefan,
	Johannes Weiner, Juri Lelli, cgroups

While testing the deadline scheduler + cgroup setup I hit this
warning.

[  132.612935] ------------[ cut here ]------------
[  132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
[  132.612952] Modules linked in: (a ton of modules...)
[  132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2
[  132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[  132.612982]  0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
[  132.612984]  0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
[  132.612985]  00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
[  132.612986] Call Trace:
[  132.612987]  <IRQ>  [<ffffffff813d229e>] dump_stack+0x63/0x85
[  132.612994]  [<ffffffff810a652b>] __warn+0xcb/0xf0
[  132.612997]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[  132.612999]  [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20
[  132.613000]  [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80
[  132.613008]  [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20
[  132.613010]  [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10
[  132.613015]  [<ffffffff811388ac>] put_css_set+0x5c/0x60
[  132.613016]  [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0
[  132.613017]  [<ffffffff810a3912>] __put_task_struct+0x42/0x140
[  132.613018]  [<ffffffff810e776a>] dl_task_timer+0xca/0x250
[  132.613027]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[  132.613030]  [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270
[  132.613031]  [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190
[  132.613034]  [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60
[  132.613035]  [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50
[  132.613037]  [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0
[  132.613038]  <EOI>  [<ffffffff81063466>] ? native_safe_halt+0x6/0x10
[  132.613043]  [<ffffffff81037a4e>] default_idle+0x1e/0xd0
[  132.613044]  [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20
[  132.613046]  [<ffffffff810e8fda>] default_idle_call+0x2a/0x40
[  132.613047]  [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340
[  132.613048]  [<ffffffff81050235>] start_secondary+0x155/0x190
[  132.613049] ---[ end trace f91934d162ce9977 ]---

The warning is caused by the spin_(lock|unlock)_bh(&css_set_lock) calls
running in interrupt context. Convert spin_lock_bh to spin_lock_irqsave
to avoid this problem - and other problems that come from sharing a
spinlock with an interrupt.
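
For context, a minimal sketch (illustrative, not the kernel code) of the
locking pattern involved. The _bh variants only disable softirqs, so an
unlock from hard-IRQ context - as in put_css_set() called via
dl_task_timer() above - ends up in __local_bh_enable_ip(), which warns
when hard IRQs are active. The irqsave variant is safe from any context:

	/* Sketch only: example_lock stands in for css_set_lock. */
	static DEFINE_SPINLOCK(example_lock);

	static void safe_from_any_context(void)
	{
		unsigned long flags;

		/*
		 * Disables hard IRQs and remembers their previous
		 * state, so this is correct whether called from
		 * process, softirq, or hard-IRQ context.
		 */
		spin_lock_irqsave(&example_lock, flags);
		/* ... touch the state protected by the lock ... */
		spin_unlock_irqrestore(&example_lock, flags);
	}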


Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: cgroups@vger.kernel.org
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
---
 kernel/cgroup.c | 165 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 99 insertions(+), 66 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 86cb5c6..71c4d1a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -837,6 +837,7 @@ static void put_css_set_locked(struct css_set *cset)
 
 static void put_css_set(struct css_set *cset)
 {
+	unsigned long flags;
 	/*
 	 * Ensure that the refcount doesn't hit zero while any readers
 	 * can see it. Similar to atomic_dec_and_lock(), but for an
@@ -845,9 +846,9 @@ static void put_css_set(struct css_set *cset)
 	if (atomic_add_unless(&cset->refcount, -1, 1))
 		return;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	put_css_set_locked(cset);
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 }
 
 /*
@@ -1064,17 +1065,18 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 	struct cgrp_cset_link *link;
 	struct cgroup_subsys *ss;
 	unsigned long key;
+	unsigned long flags;
 	int ssid;
 
 	lockdep_assert_held(&cgroup_mutex);
 
 	/* First see if we already have a cgroup group that matches
 	 * the desired set */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	cset = find_existing_css_set(old_cset, cgrp, template);
 	if (cset)
 		get_css_set(cset);
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	if (cset)
 		return cset;
@@ -1102,7 +1104,7 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 	 * find_existing_css_set() */
 	memcpy(cset->subsys, template, sizeof(cset->subsys));
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	/* Add reference counts and links from the new css_set. */
 	list_for_each_entry(link, &old_cset->cgrp_links, cgrp_link) {
 		struct cgroup *c = link->cgrp;
@@ -1128,7 +1130,7 @@ static struct css_set *find_css_set(struct css_set *old_cset,
 		css_get(css);
 	}
 
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	return cset;
 }
@@ -1179,6 +1181,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 {
 	struct cgroup *cgrp = &root->cgrp;
 	struct cgrp_cset_link *link, *tmp_link;
+	unsigned long flags;
 
 	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
@@ -1192,7 +1195,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	 * Release all the links from cset_links to this hierarchy's
 	 * root cgroup
 	 */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 
 	list_for_each_entry_safe(link, tmp_link, &cgrp->cset_links, cset_link) {
 		list_del(&link->cset_link);
@@ -1200,7 +1203,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 		kfree(link);
 	}
 
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	if (!list_empty(&root->root_list)) {
 		list_del(&root->root_list);
@@ -1586,6 +1589,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 		struct cgroup *scgrp = &src_root->cgrp;
 		struct cgroup_subsys_state *css = cgroup_css(scgrp, ss);
 		struct css_set *cset;
+		unsigned long flags;
 
 		WARN_ON(!css || cgroup_css(dcgrp, ss));
 
@@ -1600,11 +1604,11 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 		ss->root = dst_root;
 		css->cgroup = dcgrp;
 
-		spin_lock_bh(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, flags);
 		hash_for_each(css_set_table, i, cset, hlist)
 			list_move_tail(&cset->e_cset_node[ss->id],
 				       &dcgrp->e_csets[ss->id]);
-		spin_unlock_bh(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, flags);
 
 		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
@@ -1635,15 +1639,16 @@ static int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node,
 	char *buf = NULL;
 	struct cgroup_root *kf_cgroot = cgroup_root_from_kf(kf_root);
 	struct cgroup *ns_cgroup;
+	unsigned long flags;
 
 	buf = kmalloc(PATH_MAX, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	ns_cgroup = current_cgns_cgroup_from_root(kf_cgroot);
 	len = kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX);
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	if (len >= PATH_MAX)
 		len = -ERANGE;
@@ -1896,8 +1901,9 @@ static bool use_task_css_set_links __read_mostly;
 static void cgroup_enable_task_cg_lists(void)
 {
 	struct task_struct *p, *g;
+	unsigned long flags;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 
 	if (use_task_css_set_links)
 		goto out_unlock;
@@ -1936,7 +1942,7 @@ static void cgroup_enable_task_cg_lists(void)
 	} while_each_thread(g, p);
 	read_unlock(&tasklist_lock);
 out_unlock:
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 }
 
 static void init_cgroup_housekeeping(struct cgroup *cgrp)
@@ -1985,6 +1991,7 @@ static int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	struct cgroup *root_cgrp = &root->cgrp;
 	struct css_set *cset;
 	int i, ret;
+	unsigned long flags;
 
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -2043,13 +2050,13 @@ static int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	 * Link the root cgroup in this hierarchy into all the css_set
 	 * objects.
 	 */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	hash_for_each(css_set_table, i, cset, hlist) {
 		link_css_set(&tmp_links, cset, root_cgrp);
 		if (css_set_populated(cset))
 			cgroup_update_populated(root_cgrp, true);
 	}
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	BUG_ON(!list_empty(&root_cgrp->self.children));
 	BUG_ON(atomic_read(&root->nr_cgrps) != 1);
@@ -2254,13 +2261,14 @@ out_mount:
 	if (!IS_ERR(dentry) && ns != &init_cgroup_ns) {
 		struct dentry *nsdentry;
 		struct cgroup *cgrp;
+		unsigned long flags;
 
 		mutex_lock(&cgroup_mutex);
-		spin_lock_bh(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, flags);
 
 		cgrp = cset_cgroup_from_root(ns->root_cset, root);
 
-		spin_unlock_bh(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, flags);
 		mutex_unlock(&cgroup_mutex);
 
 		nsdentry = kernfs_node_dentry(cgrp->kn, dentry->d_sb);
@@ -2335,13 +2343,14 @@ char *cgroup_path_ns(struct cgroup *cgrp, char *buf, size_t buflen,
 		     struct cgroup_namespace *ns)
 {
 	char *ret;
+	unsigned long flags;
 
 	mutex_lock(&cgroup_mutex);
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 
 	ret = cgroup_path_ns_locked(cgrp, buf, buflen, ns);
 
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	mutex_unlock(&cgroup_mutex);
 
 	return ret;
@@ -2367,9 +2376,10 @@ char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 	struct cgroup *cgrp;
 	int hierarchy_id = 1;
 	char *path = NULL;
+	unsigned long flags;
 
 	mutex_lock(&cgroup_mutex);
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 
 	root = idr_get_next(&cgroup_hierarchy_idr, &hierarchy_id);
 
@@ -2382,7 +2392,7 @@ char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 			path = buf;
 	}
 
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	mutex_unlock(&cgroup_mutex);
 	return path;
 }
@@ -2535,6 +2545,7 @@ static int cgroup_taskset_migrate(struct cgroup_taskset *tset,
 	struct task_struct *task, *tmp_task;
 	struct css_set *cset, *tmp_cset;
 	int ssid, failed_ssid, ret;
+	unsigned long flags;
 
 	/* methods shouldn't be called if no task is actually migrating */
 	if (list_empty(&tset->src_csets))
@@ -2557,7 +2568,7 @@ static int cgroup_taskset_migrate(struct cgroup_taskset *tset,
 	 * the new cgroup.  There are no failure cases after here, so this
 	 * is the commit point.
 	 */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_for_each_entry(cset, &tset->src_csets, mg_node) {
 		list_for_each_entry_safe(task, tmp_task, &cset->mg_tasks, cg_list) {
 			struct css_set *from_cset = task_css_set(task);
@@ -2568,7 +2579,7 @@ static int cgroup_taskset_migrate(struct cgroup_taskset *tset,
 			put_css_set_locked(from_cset);
 		}
 	}
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	/*
 	 * Migration is committed, all target tasks are now on dst_csets.
@@ -2597,13 +2608,13 @@ out_cancel_attach:
 		}
 	} while_each_subsys_mask();
 out_release_tset:
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_splice_init(&tset->dst_csets, &tset->src_csets);
 	list_for_each_entry_safe(cset, tmp_cset, &tset->src_csets, mg_node) {
 		list_splice_tail_init(&cset->mg_tasks, &cset->tasks);
 		list_del_init(&cset->mg_node);
 	}
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	return ret;
 }
 
@@ -2631,10 +2642,11 @@ static bool cgroup_may_migrate_to(struct cgroup *dst_cgrp)
 static void cgroup_migrate_finish(struct list_head *preloaded_csets)
 {
 	struct css_set *cset, *tmp_cset;
+	unsigned long flags;
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_for_each_entry_safe(cset, tmp_cset, preloaded_csets, mg_preload_node) {
 		cset->mg_src_cgrp = NULL;
 		cset->mg_dst_cgrp = NULL;
@@ -2642,7 +2654,7 @@ static void cgroup_migrate_finish(struct list_head *preloaded_csets)
 		list_del_init(&cset->mg_preload_node);
 		put_css_set_locked(cset);
 	}
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 }
 
 /**
@@ -2777,13 +2789,14 @@ static int cgroup_migrate(struct task_struct *leader, bool threadgroup,
 {
 	struct cgroup_taskset tset = CGROUP_TASKSET_INIT(tset);
 	struct task_struct *task;
+	unsigned long flags;
 
 	/*
 	 * Prevent freeing of tasks while we take a snapshot. Tasks that are
 	 * already PF_EXITING could be freed from underneath us unless we
 	 * take an rcu_read_lock.
 	 */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	rcu_read_lock();
 	task = leader;
 	do {
@@ -2792,7 +2805,7 @@ static int cgroup_migrate(struct task_struct *leader, bool threadgroup,
 			break;
 	} while_each_thread(leader, task);
 	rcu_read_unlock();
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	return cgroup_taskset_migrate(&tset, root);
 }
@@ -2811,12 +2824,13 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
 	LIST_HEAD(preloaded_csets);
 	struct task_struct *task;
 	int ret;
+	unsigned long flags;
 
 	if (!cgroup_may_migrate_to(dst_cgrp))
 		return -EBUSY;
 
 	/* look up all src csets */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	rcu_read_lock();
 	task = leader;
 	do {
@@ -2826,7 +2840,7 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp,
 			break;
 	} while_each_thread(leader, task);
 	rcu_read_unlock();
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	/* prepare dst csets and commit */
 	ret = cgroup_migrate_prepare_dst(&preloaded_csets);
@@ -2858,10 +2872,11 @@ static int cgroup_procs_write_permission(struct task_struct *task,
 		struct super_block *sb = of->file->f_path.dentry->d_sb;
 		struct cgroup *cgrp;
 		struct inode *inode;
+		unsigned long flags;
 
-		spin_lock_bh(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, flags);
 		cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
-		spin_unlock_bh(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, flags);
 
 		while (!cgroup_is_descendant(dst_cgrp, cgrp))
 			cgrp = cgroup_parent(cgrp);
@@ -2954,6 +2969,7 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 {
 	struct cgroup_root *root;
 	int retval = 0;
+	unsigned long flags;
 
 	mutex_lock(&cgroup_mutex);
 	for_each_root(root) {
@@ -2962,9 +2978,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 		if (root == &cgrp_dfl_root)
 			continue;
 
-		spin_lock_bh(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, flags);
 		from_cgrp = task_cgroup_from_root(from, root);
-		spin_unlock_bh(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, flags);
 
 		retval = cgroup_attach_task(from_cgrp, tsk, false);
 		if (retval)
@@ -3074,13 +3090,14 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 	struct cgroup *dsct;
 	struct css_set *src_cset;
 	int ret;
+	unsigned long flags;
 
 	lockdep_assert_held(&cgroup_mutex);
 
 	percpu_down_write(&cgroup_threadgroup_rwsem);
 
 	/* look up all csses currently attached to @cgrp's subtree */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
 		struct cgrp_cset_link *link;
 
@@ -3088,14 +3105,14 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 			cgroup_migrate_add_src(link->cset, dsct,
 					       &preloaded_csets);
 	}
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	/* NULL dst indicates self on default hierarchy */
 	ret = cgroup_migrate_prepare_dst(&preloaded_csets);
 	if (ret)
 		goto out_finish;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_for_each_entry(src_cset, &preloaded_csets, mg_preload_node) {
 		struct task_struct *task, *ntask;
 
@@ -3107,7 +3124,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 		list_for_each_entry_safe(task, ntask, &src_cset->tasks, cg_list)
 			cgroup_taskset_add(task, &tset);
 	}
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	ret = cgroup_taskset_migrate(&tset, cgrp->root);
 out_finish:
@@ -3907,11 +3924,12 @@ static int cgroup_task_count(const struct cgroup *cgrp)
 {
 	int count = 0;
 	struct cgrp_cset_link *link;
+	unsigned long flags;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_for_each_entry(link, &cgrp->cset_links, cset_link)
 		count += atomic_read(&link->cset->refcount);
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	return count;
 }
 
@@ -4244,12 +4262,14 @@ static void css_task_iter_advance(struct css_task_iter *it)
 void css_task_iter_start(struct cgroup_subsys_state *css,
 			 struct css_task_iter *it)
 {
+	unsigned long flags;
+
 	/* no one should try to iterate before mounting cgroups */
 	WARN_ON_ONCE(!use_task_css_set_links);
 
 	memset(it, 0, sizeof(*it));
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 
 	it->ss = css->ss;
 
@@ -4262,7 +4282,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
 
 	css_task_iter_advance_css_set(it);
 
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 }
 
 /**
@@ -4275,12 +4295,14 @@ void css_task_iter_start(struct cgroup_subsys_state *css,
  */
 struct task_struct *css_task_iter_next(struct css_task_iter *it)
 {
+	unsigned long flags;
+
 	if (it->cur_task) {
 		put_task_struct(it->cur_task);
 		it->cur_task = NULL;
 	}
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 
 	if (it->task_pos) {
 		it->cur_task = list_entry(it->task_pos, struct task_struct,
@@ -4289,7 +4311,7 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
 		css_task_iter_advance(it);
 	}
 
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	return it->cur_task;
 }
@@ -4302,11 +4324,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
  */
 void css_task_iter_end(struct css_task_iter *it)
 {
+	unsigned long flags;
+
 	if (it->cur_cset) {
-		spin_lock_bh(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, flags);
 		list_del(&it->iters_node);
 		put_css_set_locked(it->cur_cset);
-		spin_unlock_bh(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, flags);
 	}
 
 	if (it->cur_task)
@@ -4331,6 +4355,7 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	struct css_task_iter it;
 	struct task_struct *task;
 	int ret;
+	unsigned long flags;
 
 	if (!cgroup_may_migrate_to(to))
 		return -EBUSY;
@@ -4338,10 +4363,10 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from)
 	mutex_lock(&cgroup_mutex);
 
 	/* all tasks in @from are being moved, all csets are source */
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_for_each_entry(link, &from->cset_links, cset_link)
 		cgroup_migrate_add_src(link->cset, to, &preloaded_csets);
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	ret = cgroup_migrate_prepare_dst(&preloaded_csets);
 	if (ret)
@@ -5425,6 +5450,7 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	struct cgroup_subsys_state *css;
 	struct cgrp_cset_link *link;
 	int ssid;
+	unsigned long flags;
 
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -5451,10 +5477,10 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	 */
 	cgrp->self.flags &= ~CSS_ONLINE;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_for_each_entry(link, &cgrp->cset_links, cset_link)
 		link->cset->dead = true;
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 
 	/* initiate massacre of all css's */
 	for_each_css(css, ssid, cgrp)
@@ -5718,6 +5744,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 	char *buf, *path;
 	int retval;
 	struct cgroup_root *root;
+	unsigned long flags;
 
 	retval = -ENOMEM;
 	buf = kmalloc(PATH_MAX, GFP_KERNEL);
@@ -5725,7 +5752,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 		goto out;
 
 	mutex_lock(&cgroup_mutex);
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 
 	for_each_root(root) {
 		struct cgroup_subsys *ss;
@@ -5778,7 +5805,7 @@ int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
 
 	retval = 0;
 out_unlock:
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	mutex_unlock(&cgroup_mutex);
 	kfree(buf);
 out:
@@ -5897,6 +5924,7 @@ void cgroup_cancel_fork(struct task_struct *child)
 void cgroup_post_fork(struct task_struct *child)
 {
 	struct cgroup_subsys *ss;
+	unsigned long flags;
 	int i;
 
 	/*
@@ -5923,13 +5951,13 @@ void cgroup_post_fork(struct task_struct *child)
 	if (use_task_css_set_links) {
 		struct css_set *cset;
 
-		spin_lock_bh(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, flags);
 		cset = task_css_set(current);
 		if (list_empty(&child->cg_list)) {
 			get_css_set(cset);
 			css_set_move_task(child, NULL, cset, false);
 		}
-		spin_unlock_bh(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, flags);
 	}
 
 	/*
@@ -5965,6 +5993,7 @@ void cgroup_exit(struct task_struct *tsk)
 {
 	struct cgroup_subsys *ss;
 	struct css_set *cset;
+	unsigned long flags;
 	int i;
 
 	/*
@@ -5974,9 +6003,9 @@ void cgroup_exit(struct task_struct *tsk)
 	cset = task_css_set(tsk);
 
 	if (!list_empty(&tsk->cg_list)) {
-		spin_lock_bh(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, flags);
 		css_set_move_task(tsk, cset, NULL, false);
-		spin_unlock_bh(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, flags);
 	} else {
 		get_css_set(cset);
 	}
@@ -6036,6 +6065,7 @@ static void cgroup_release_agent(struct work_struct *work)
 		container_of(work, struct cgroup, release_agent_work);
 	char *pathbuf = NULL, *agentbuf = NULL, *path;
 	char *argv[3], *envp[3];
+	unsigned long flags;
 
 	mutex_lock(&cgroup_mutex);
 
@@ -6044,9 +6074,9 @@ static void cgroup_release_agent(struct work_struct *work)
 	if (!pathbuf || !agentbuf)
 		goto out;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	path = cgroup_path_ns_locked(cgrp, pathbuf, PATH_MAX, &init_cgroup_ns);
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	if (!path)
 		goto out;
 
@@ -6293,6 +6323,7 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 {
 	struct cgroup_namespace *new_ns;
 	struct css_set *cset;
+	unsigned long irq_flags;
 
 	BUG_ON(!old_ns);
 
@@ -6306,12 +6337,12 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
 		return ERR_PTR(-EPERM);
 
 	mutex_lock(&cgroup_mutex);
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, irq_flags);
 
 	cset = task_css_set(current);
 	get_css_set(cset);
 
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, irq_flags);
 	mutex_unlock(&cgroup_mutex);
 
 	new_ns = alloc_cgroup_ns();
@@ -6430,12 +6461,13 @@ static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
 	struct cgrp_cset_link *link;
 	struct css_set *cset;
 	char *name_buf;
+	unsigned long flags;
 
 	name_buf = kmalloc(NAME_MAX + 1, GFP_KERNEL);
 	if (!name_buf)
 		return -ENOMEM;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	rcu_read_lock();
 	cset = rcu_dereference(current->cgroups);
 	list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
@@ -6446,7 +6478,7 @@ static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
 			   c->root->hierarchy_id, name_buf);
 	}
 	rcu_read_unlock();
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	kfree(name_buf);
 	return 0;
 }
@@ -6456,8 +6488,9 @@ static int cgroup_css_links_read(struct seq_file *seq, void *v)
 {
 	struct cgroup_subsys_state *css = seq_css(seq);
 	struct cgrp_cset_link *link;
+	unsigned long flags;
 
-	spin_lock_bh(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, flags);
 	list_for_each_entry(link, &css->cgroup->cset_links, cset_link) {
 		struct css_set *cset = link->cset;
 		struct task_struct *task;
@@ -6480,7 +6513,7 @@ static int cgroup_css_links_read(struct seq_file *seq, void *v)
 	overflow:
 		seq_puts(seq, "  ...\n");
 	}
-	spin_unlock_bh(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, flags);
 	return 0;
 }
 
-- 
2.5.5


* Re: [PATCH] cgroup: disable irqs while holding css_set_lock
From: Juri Lelli @ 2016-06-07 10:14 UTC
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Rik van Riel, Luis Claudio R. Goncalves, Tejun Heo,
	Li Zefan, Johannes Weiner, cgroups

Hi,

On 06/06/16 19:24, Daniel Bristot de Oliveira wrote:
> While testing the deadline scheduler + cgroup setup I hit this
> warning.
> 
> [  132.612935] ------------[ cut here ]------------
> [  132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
> 
> [... full backtrace snipped, see the original posting above ...]
> 
> The warning is caused by the spin_(lock|unlock)_bh(&css_set_lock) calls
> running in interrupt context. Convert spin_lock_bh to spin_lock_irqsave
> to avoid this problem - and other problems that come from sharing a
> spinlock with an interrupt.
> 

Interesting. And your test is using the cpuset controller to partition
DEADLINE tasks and then modifying groups concurrently?

Best,

- Juri


* Re: [PATCH] cgroup: disable irqs while holding css_set_lock
From: Daniel Bristot de Oliveira @ 2016-06-07 12:39 UTC
  To: Juri Lelli
  Cc: linux-kernel, Rik van Riel, Luis Claudio R. Goncalves, Tejun Heo,
	Li Zefan, Johannes Weiner, cgroups

Ciao Juri,

On 06/07/2016 07:14 AM, Juri Lelli wrote:
> Interesting. And your test is using the cpuset controller to partition
> DEADLINE tasks and then modifying groups concurrently?

Yes. I was studying the partitioning/admission control of the
deadline scheduler, to document it.

I was using the minimal task from sched deadline's documentation
as the load (the ./m in the below script).
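
For reference, a simplified single-threaded sketch of such a task, along
the lines of the example in the sched deadline documentation (the struct
layout, the x86_64 syscall number, and the 10ms/30ms parameters below
are assumptions for illustration, not the exact ./m I ran):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <stdint.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#define SCHED_DEADLINE		6
	#define __NR_sched_setattr	314	/* x86_64 */

	struct sched_attr {
		uint32_t size;
		uint32_t sched_policy;
		uint64_t sched_flags;
		int32_t  sched_nice;
		uint32_t sched_priority;
		/* SCHED_DEADLINE parameters, in nanoseconds */
		uint64_t sched_runtime;
		uint64_t sched_deadline;
		uint64_t sched_period;
	};

	int main(void)
	{
		struct sched_attr attr = {
			.size		= sizeof(attr),
			.sched_policy	= SCHED_DEADLINE,
			.sched_runtime	= 10 * 1000 * 1000,	/* 10 ms */
			.sched_deadline	= 30 * 1000 * 1000,	/* 30 ms */
			.sched_period	= 30 * 1000 * 1000,	/* 30 ms */
		};

		printf("main thread [%d]\n", getpid());

		/*
		 * Admission control happens here; it fails with EPERM
		 * when the task's affinity does not span the whole
		 * root domain.
		 */
		if (syscall(__NR_sched_setattr, 0, &attr, 0)) {
			perror("sched_setattr");
			return 1;
		}

		for (;;)	/* burn CPU under the reservation */
			;
	}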

Here is the script I was using in the test:
-----------%<------------------------------------------------------------
#!/bin/sh

# I am running on a 8 cpus box, you need to adjust the
# cpu mask to match to your cpu topology.

cd /sys/fs/cgroup/cpuset

# global settings
# echo 1 > cpuset.cpu_exclusive
echo 0 > cpuset.sched_load_balance

# a cpuset to run ordinary load:

if [ ! -d ordinary ]; then
	mkdir ordinary
	echo 0-3 > ordinary/cpuset.cpus
	echo 0 > ordinary/cpuset.mems
	echo 0 > ordinary/cpuset.cpu_exclusive
	# the load balance can be enabled on this cpuset.
	echo 1 > ordinary/cpuset.sched_load_balance
fi

# move all threads to ordinary cpuset 
ps -eL -o lwp | while read tid; do
	echo $tid >> ordinary/tasks 2> /dev/null || echo "thread $tid is pinned or died"
done

echo $$ > ordinary/tasks
cat /proc/self/cpuset
~/m &

# a single cpu cpuset (partitioned)
if [ ! -d partitioned ]; then
	mkdir partitioned
	echo 4 > partitioned/cpuset.cpus
	echo 0 > partitioned/cpuset.mems
	echo 0 > partitioned/cpuset.cpu_exclusive
fi

echo $$ > partitioned/tasks
cat /proc/self/cpuset
~/m &

# a set of cpus (clustered)
if [ ! -d clustered ]; then
	mkdir clustered
	echo 5-7 > clustered/cpuset.cpus
	echo 0 > clustered/cpuset.mems
	echo 0 > clustered/cpuset.cpu_exclusive
	# the load balance can be enabled on this cpuset.
	echo 1 > clustered/cpuset.sched_load_balance
fi

echo $$ > clustered/tasks
cat /proc/self/cpuset
~/m
----------->%------------------------------------------------------------

The problem rarely reproduces.

-- Daniel


* Re: [PATCH] cgroup: disable irqs while holding css_set_lock
From: Juri Lelli @ 2016-06-07 13:30 UTC
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Rik van Riel, Luis Claudio R. Goncalves, Tejun Heo,
	Li Zefan, Johannes Weiner, cgroups

On 07/06/16 09:39, Daniel Bristot de Oliveira wrote:
> Ciao Juri,
> 

Ciao, :-)

> On 06/07/2016 07:14 AM, Juri Lelli wrote:
> > Interesting. And your test is using the cpuset controller to partition
> > DEADLINE tasks and then modifying groups concurrently?
> 
> Yes. I was studying the partitioning/admission control of the
> deadline scheduler, to document it.
> 
> I was using the minimal task from sched deadline's documentation
> as the load (the ./m in the below script).
> 
> Here is the script I was using in the test:

Thanks for sharing it. It is somewhat similar to some of my test
scripts, but I've got a question below.

> -----------%<------------------------------------------------------------
> #!/bin/sh
> 
> # I am running on a 8 cpus box, you need to adjust the
> # cpu mask to match to your cpu topology.
> 
> cd /sys/fs/cgroup/cpuset
> 
> # global settings
> # echo 1 > cpuset.cpu_exclusive
> echo 0 > cpuset.sched_load_balance
> 
> # a cpuset to run ordinary load:
> 
> if [ ! -d ordinary ]; then
> 	mkdir ordinary
> 	echo 0-3 > ordinary/cpuset.cpus
> 	echo 0 > ordinary/cpuset.mems
> 	echo 0 > ordinary/cpuset.cpu_exclusive
> 	# the load balance can be enabled on this cpuset.
> 	echo 1 > ordinary/cpuset.sched_load_balance
> fi
> 
> # move all threads to ordinary cpuset 
> ps -eL -o lwp | while read tid; do
> 	echo $tid >> ordinary/tasks 2> /dev/null || echo "thread $tid is pinned or died"
> done
> 
> echo $$ > ordinary/tasks
> cat /proc/self/cpuset
> ~/m &
> 
> # a single cpu cpuset (partitioned)
> if [ ! -d partitioned ]; then
> 	mkdir partitioned
> 	echo 4 > partitioned/cpuset.cpus
> 	echo 0 > partitioned/cpuset.mems
> 	echo 0 > partitioned/cpuset.cpu_exclusive
> fi
> 
> echo $$ > partitioned/tasks
> cat /proc/self/cpuset
> ~/m &
> 
> # a set of cpus (clustered)
> if [ ! -d clustered ]; then
> 	mkdir clustered
> 	echo 5-7 > clustered/cpuset.cpus
> 	echo 0 > clustered/cpuset.mems
> 	echo 0 > clustered/cpuset.cpu_exclusive

So, this and the partitioned one could actually overlap, since we don't
set cpu_exclusive. Is that right?

I guess the affinity mask of both m processes gets set correctly, but I'm
not sure if we are missing a check in the admission control. Can you
actually create two overlapping sets and get DEADLINE tasks running in
them? For example, what happens if partitioned is [4] and clustered is
[4-7]? Does setattr() fail?

It is not really related to this patch, I'm just wondering if there is
another problem lying around.

Thanks,

- Juri

> 	# the load balance can be enabled on this cpuset.
> 	echo 1 > clustered/cpuset.sched_load_balance
> fi
> 
> echo $$ > clustered/tasks
> cat /proc/self/cpuset
> ~/m
> ----------->%------------------------------------------------------------
> 
> The problem rarely reproduces.
> 
> -- Daniel
> 


* Re: [PATCH] cgroup: disable irqs while holding css_set_lock
From: Daniel Bristot de Oliveira @ 2016-06-07 15:13 UTC
  To: linux-kernel
  Cc: Rik van Riel, Luis Claudio R. Goncalves, Tejun Heo, Li Zefan,
	Johannes Weiner, Juri Lelli, cgroups

Oops.

While doing further tests on my patch I found a problem:

[   82.390739] =================================
[   82.390749] [ INFO: inconsistent lock state ]
[   82.390759] 4.7.0-rc2+ #5 Not tainted
[   82.390768] ---------------------------------
[   82.390777] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[   82.390790] swapper/5/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[   82.390801]  (css_set_lock){?.+...}, at: [<ffffffff8115e1cc>] put_css_set+0x4c/0x70
[   82.390854] {HARDIRQ-ON-W} state was registered at:
[   82.390865]   [<ffffffff8110a4ef>] mark_held_locks+0x6f/0xa0
[   82.390880]   [<ffffffff8110a5c5>] trace_hardirqs_on_caller+0xa5/0x1b0
[   82.390896]   [<ffffffff8110a6dd>] trace_hardirqs_on+0xd/0x10
[   82.390910]   [<ffffffff8188c59c>] _raw_spin_unlock_irq+0x2c/0x40
[   82.390926]   [<ffffffff8116361c>] cgroup_mount+0x59c/0xc10
[   82.390941]   [<ffffffff812871a8>] mount_fs+0x38/0x170
[   82.390956]   [<ffffffff812a866b>] vfs_kern_mount+0x6b/0x150
[   82.390979]   [<ffffffff812ab3a0>] do_mount+0x250/0xe80
[   82.391012]   [<ffffffff812ac305>] SyS_mount+0x95/0xe0
[   82.391044]   [<ffffffff8188cc7c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[   82.391071] irq event stamp: 92782
[   82.391084] hardirqs last  enabled at (92779): [<ffffffff8103e69e>] default_idle+0x1e/0x160
[   82.391116] hardirqs last disabled at (92780): [<ffffffff8188d9a1>] apic_timer_interrupt+0x91/0xa0
[   82.391152] softirqs last  enabled at (92782): [<ffffffff810b64d1>] _local_bh_enable+0x21/0x50
[   82.391184] softirqs last disabled at (92781): [<ffffffff810b750c>] irq_enter+0x6c/0x90
[   82.391202] 
               other info that might help us debug this:
[   82.391215]  Possible unsafe locking scenario:

[   82.391228]        CPU0
[   82.391233]        ----
[   82.391239]   lock(css_set_lock);
[   82.391249]   <Interrupt>
[   82.391255]     lock(css_set_lock);
[   82.391265] 
                *** DEADLOCK ***
[   82.391460] no locks held by swapper/5/0.
[   82.391621] 
               stack backtrace:
[   82.391910] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2+ #5
[   82.392045] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[   82.392180]  0000000000000086 8f22cb6bc668205e ffff88013fd43c20 ffffffff81438973
[   82.392315]  ffff88013a45a680 ffffffff82857680 ffff88013fd43c70 ffffffff81109e45
[   82.392448]  0000000000000000 0000000000000000 ffff880100000001 0000000000000002
[   82.392578] Call Trace:
[   82.392691]  <IRQ>  [<ffffffff81438973>] dump_stack+0x85/0xc2
[   82.392817]  [<ffffffff81109e45>] print_usage_bug+0x215/0x240
[   82.392939]  [<ffffffff8110a3c0>] mark_lock+0x550/0x610
[   82.393052]  [<ffffffff811093a0>] ? check_usage_backwards+0x160/0x160
[   82.393167]  [<ffffffff8110b10e>] __lock_acquire+0x6ee/0x1440
[   82.393281]  [<ffffffff8112787d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[   82.393396]  [<ffffffff8110b1ee>] ? __lock_acquire+0x7ce/0x1440
[   82.393509]  [<ffffffff8110b228>] ? __lock_acquire+0x808/0x1440
[   82.393621]  [<ffffffff8110c2d0>] lock_acquire+0xf0/0x1d0
[   82.393731]  [<ffffffff8115e1cc>] ? put_css_set+0x4c/0x70
[   82.393850]  [<ffffffff8188cbc9>] _raw_spin_lock_irqsave+0x49/0x5d
[   82.393985]  [<ffffffff8115e1cc>] ? put_css_set+0x4c/0x70
[   82.394093]  [<ffffffff8115e1cc>] put_css_set+0x4c/0x70
[   82.394201]  [<ffffffff81165591>] cgroup_free+0x91/0x120
[   82.394310]  [<ffffffff810ad0d2>] __put_task_struct+0x42/0x140
[   82.394419]  [<ffffffff810fd064>] dl_task_timer+0xc4/0x2c0
[   82.394528]  [<ffffffff810fcfa0>] ? enqueue_task_dl+0x480/0x480
[   82.394638]  [<ffffffff811349cb>] __hrtimer_run_queues+0xfb/0x4c0
[   82.394749]  [<ffffffff8113556b>] hrtimer_interrupt+0xab/0x1b0
[   82.394864]  [<ffffffff81059798>] local_apic_timer_interrupt+0x38/0x60
[   82.394976]  [<ffffffff8188f91d>] smp_apic_timer_interrupt+0x3d/0x50
[   82.395087]  [<ffffffff8188d9a6>] apic_timer_interrupt+0x96/0xa0
[   82.395196]  <EOI>  [<ffffffff8106b806>] ? native_safe_halt+0x6/0x10
[   82.395310]  [<ffffffff8110a6dd>] ? trace_hardirqs_on+0xd/0x10
[   82.395420]  [<ffffffff8103e6a3>] default_idle+0x23/0x160
[   82.395528]  [<ffffffff8103ef6f>] arch_cpu_idle+0xf/0x20
[   82.395637]  [<ffffffff810fecaa>] default_idle_call+0x2a/0x40
[   82.395744]  [<ffffffff810fefea>] cpu_startup_entry+0x32a/0x400
[   82.395854]  [<ffffffff81057f5a>] start_secondary+0x15a/0x190


This problem is fixed with this change:

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c695d55..fcc8aee 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1928,8 +1928,12 @@ static void cgroup_enable_task_cg_lists(void)
                 * entry won't be deleted though the process has exited.
                 * Do it while holding siglock so that we don't end up
                 * racing against cgroup_exit().
+                *
+                * Interrupts were already disabled when acquiring
+                * the css_set_lock, so we do not need to disable it
+                * again when acquiring the sighand->siglock here.
                 */
-               spin_lock_irq(&p->sighand->siglock);
+               spin_lock(&p->sighand->siglock);
                if (!(p->flags & PF_EXITING)) {
                        struct css_set *cset = task_css_set(p);
 
@@ -1938,7 +1942,7 @@ static void cgroup_enable_task_cg_lists(void)
                        list_add_tail(&p->cg_list, &cset->tasks);
                        get_css_set(cset);
                }
-               spin_unlock_irq(&p->sighand->siglock);
+               spin_unlock(&p->sighand->siglock);
        } while_each_thread(g, p);
        read_unlock(&tasklist_lock);
 out_unlock:
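
In other words (a minimal sketch of the nesting rule, not kernel code):
once an outer lock has disabled hard IRQs, inner locks taken inside that
critical section must not use the _irq variants - spin_unlock_irq()
would unconditionally re-enable IRQs while the outer lock is still held,
which is exactly what lockdep caught above. The plain spin_lock() is
both sufficient and cheaper:

	static DEFINE_SPINLOCK(outer_lock);	/* think: css_set_lock */
	static DEFINE_SPINLOCK(inner_lock);	/* think: sighand->siglock */

	static void nested_section(void)
	{
		unsigned long flags;

		spin_lock_irqsave(&outer_lock, flags);	/* IRQs off from here */
		spin_lock(&inner_lock);		/* already IRQ-safe */
		/* ... section protected by both locks ... */
		spin_unlock(&inner_lock);
		spin_unlock_irqrestore(&outer_lock, flags); /* restore prior state */
	}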

I will prepare a v2 of the patch including this change.

I am sorry for the noise.

-- Daniel


* Re: [PATCH] cgroup: disable irqs while holding css_set_lock
From: Daniel Bristot de Oliveira @ 2016-06-07 15:26 UTC
  To: Juri Lelli, Daniel Bristot de Oliveira
  Cc: linux-kernel, Rik van Riel, Luis Claudio R. Goncalves, Tejun Heo,
	Li Zefan, Johannes Weiner, cgroups

Ciao Juri,

On 06/07/2016 10:30 AM, Juri Lelli wrote:
> So, this and the partitioned one could actually overlap, since we don't
> set cpu_exclusive. Is that right?
> 
> I guess affinity mask of both m processes gets set correclty, but I'm
> not sure if we are missing one check in the admission control. Can you
> actually create two overlapping sets and get DEADLINE tasks running in
> them? For example, what happens if partitioned is [4] and clustered is
> [4-7]? Does setattr() fail?

That is what I was trying to understand/break. Fortunately, I still
can't break it! Bravo! :-)

In the test you mentioned, the task in the "clustered" cpuset fails
sched_setattr().

Test output example:
+ '[' '!' -d partitioned ']'
+ mkdir partitioned
+ echo 4
+ echo 0
+ echo 0
+ echo 1155
+ cat /proc/self/cpuset
/partitioned
+ /root/m
main thread [1162]
deadline thread started [1164]
+ '[' '!' -d clustered ']'
+ mkdir clustered
+ echo 4-7
+ echo 0
+ echo 0
+ echo 1
+ echo 1155
+ cat /proc/self/cpuset
+ /root/m
/clustered
+ /root/m
main thread [1166]
sched_setattr: Operation not permitted

I will let you know if I find something odd.

-- Daniel


* Re: [PATCH] cgroup: disable irqs while holding css_set_lock
From: Juri Lelli @ 2016-06-07 16:11 UTC
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Rik van Riel, Luis Claudio R. Goncalves, Tejun Heo,
	Li Zefan, Johannes Weiner, cgroups

On 07/06/16 12:26, Daniel Bristot de Oliveira wrote:
> Ciao Juri,
> 
> On 06/07/2016 10:30 AM, Juri Lelli wrote:
> > So, this and the partitioned one could actually overlap, since we don't
> > set cpu_exclusive. Is that right?
> > 
> > I guess the affinity mask of both m processes gets set correctly, but I'm
> > not sure if we are missing a check in the admission control. Can you
> > actually create two overlapping sets and get DEADLINE tasks running in
> > them? For example, what happens if partitioned is [4] and clustered is
> > [4-7]? Does setattr() fail?
> 
> That is what I was trying to understand/break. Fortunately, I still
> can't break it! bravo! :-)
> 
> In the test you mentioned the task in the "clustered" fails to
> sched_setattr().
> 
> Test output example:
> + '[' '!' -d partitioned ']'
> + mkdir partitioned
> + echo 4
> + echo 0
> + echo 0
> + echo 1155
> + cat /proc/self/cpuset
> /partitioned
> + /root/m
> main thread [1162]
> deadline thread started [1164]

Mmm. The other one is good, but this one still looks suspect to me. We
shouldn't have created a new root domain at this point (since
cpu_exclusive is not set), so we shouldn't be able to admit DEADLINE
tasks with a smaller affinity mask than the full root domain mask (which
should still be 0-7 at this point) - if I'm interpreting your script
correctly and I'm not wrong about the root domain handling.

Unfortunately I'm not able to test this myself right now. Hopefully next
week.

Thanks a lot for testing, though!

Best,

- Juri

> + '[' '!' -d clustered ']'
> + mkdir clustered
> + echo 4-7
> + echo 0
> + echo 0
> + echo 1
> + echo 1155
> + cat /proc/self/cpuset
> + /root/m
> /clustered
> + /root/m
> main thread [1166]
> sched_setattr: Operation not permitted
> 
> I will let you know if I find something odd.
> 
> -- Daniel
> 

