* [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues
@ 2014-03-14 16:38 Frederic Weisbecker
  2014-03-14 16:38 ` [PATCH 1/3] workqueue: Move workqueue bus attr to device attribute Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-14 16:38 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
	Mike Galbraith, Paul E. McKenney, Tejun Heo, Viresh Kumar

There are several types of workqueues. Some of them are bound to specific
CPUs, some others are unbound and can be executed on any CPU.

A tiny subset of the unbound workqueues has a sysfs representation
in /sys/devices/virtual/workqueue/, with a cpumask file that can
be used to tweak their CPU affinity.

But the vast majority of unbound workqueues aren't visible in sysfs.

They are a problem nowadays because people who work on CPU isolation
(HPC, real time, etc.) want to be able to migrate all the unbound
workqueues away from specific CPUs.

There are several possible ways to solve this:

1) Affine the !WQ_SYSFS unbound workqueues to the CPUs outside the
full dynticks mask. Full dynticks is expected to be a component in
many CPU isolation configurations and its CPU mapping can be a good
way to retrieve the desired set of isolated CPUs.

Among the drawbacks, though, are the lack of consistency with the
WQ_SYSFS workqueue affinity interface, ordering issues between
workqueue and dynticks subsystem initialization, and intrusion by
the workqueue subsystem into dynticks internals.

2) Implement a sysfs directory for each unbound !WQ_SYSFS workqueue. That
sounds like a very nice solution as it uses an existing and known
interface. But workqueues appearing in the sysfs hierarchy are liable to
become stable ABI. And this is definitely not what we want.

This could be worked around with a specific Kconfig symbol to make sure
that these workqueues won't be considered a stable ABI. But we all know
that all distros will enable this Kconfig symbol, and that nobody reads,
nor cares about, warnings in Kconfig help text, which thereby won't
protect us against anything.

3) Implement a single sysfs directory exposing properties of all !WQ_SYSFS
unbound workqueues. It would contain only the cpumask file to control
the affinity of all these workqueues. But more can be added later.

This complicates the code a bit, although not that much compared to
solution 2, which requires some plumbing to cope with workqueues created
before sysfs initialization, as reported by Mike (I played with that a bit
as well, as I took that direction initially). But it deals with all the
issues previously described.

This patchset explores the third solution.
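For illustration, the intended usage would look roughly as follows. The
device name "anon_wqs" and the sysfs location come from patch 3; since
the file only exists with this series applied, the sketch runs against a
stand-in file rather than the real sysfs path:

```shell
# Stand-in for /sys/devices/virtual/workqueue/anon_wqs/cpumask (the
# path the anon device from patch 3 would get); the real file only
# exists once this series is applied.
WQ=$(mktemp -d)
printf 'ff\n' > "$WQ/cpumask"

cat "$WQ/cpumask"        # current affinity of all anon workqueues
echo f > "$WQ/cpumask"   # hex mask 0xf: confine them to CPUs 0-3
                         # (the store handler uses cpumask_parse(),
                         #  which expects a hex bitmap, not a CPU list)
cat "$WQ/cpumask"
```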

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	core/workqueue-anon-sysfs

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      workqueue: Move workqueue bus attr to device attribute
      workqueues: Account unbound workqueue in a separate list
      workqueue: Add anon workqueue sysfs hierarchy


 kernel/workqueue.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 140 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] workqueue: Move workqueue bus attr to device attribute
  2014-03-14 16:38 [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Frederic Weisbecker
@ 2014-03-14 16:38 ` Frederic Weisbecker
  2014-03-14 16:38 ` [PATCH 2/3] workqueues: Account unbound workqueue in a separate list Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-14 16:38 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
	Mike Galbraith, Paul E. McKenney, Tejun Heo, Viresh Kumar

A workqueue directory implements at least two files: max_active and
per_cpu. Since these are constant across WQ_SYSFS workqueues, they are
implemented as bus attributes.

Then come the attributes that only apply to unbound workqueues. Those
are implemented as device attributes.

Now we are planning to implement a specific workqueue directory in order
to control the CPU affinity of all non-ABI unbound workqueues at once.
So far we are only interested in the cpumask file in this virtual
workqueue directory: we can't have a single max_active or per_cpu file
covering all non-ABI unbound workqueues, for example. And for now it's
not desirable to share more all-in-one properties beyond the CPU affinity.

So we want to be able to create a directory containing only cpumask.
Let's move the constant workqueue attrs to device attributes so that
we can skip them when we want.

Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Not-Yet-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/workqueue.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 193e977..4d230e3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3133,7 +3133,6 @@ static ssize_t per_cpu_show(struct device *dev, struct device_attribute *attr,
 
 	return scnprintf(buf, PAGE_SIZE, "%d\n", (bool)!(wq->flags & WQ_UNBOUND));
 }
-static DEVICE_ATTR_RO(per_cpu);
 
 static ssize_t max_active_show(struct device *dev,
 			       struct device_attribute *attr, char *buf)
@@ -3156,14 +3155,12 @@ static ssize_t max_active_store(struct device *dev,
 	workqueue_set_max_active(wq, val);
 	return count;
 }
-static DEVICE_ATTR_RW(max_active);
 
-static struct attribute *wq_sysfs_attrs[] = {
-	&dev_attr_per_cpu.attr,
-	&dev_attr_max_active.attr,
-	NULL,
+static struct device_attribute wq_sysfs_default_attrs[] = {
+	__ATTR(per_cpu, 0444, per_cpu_show, NULL),
+	__ATTR(max_active, 0644, max_active_show, max_active_store),
+	__ATTR_NULL,
 };
-ATTRIBUTE_GROUPS(wq_sysfs);
 
 static ssize_t wq_pool_ids_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
@@ -3313,7 +3310,6 @@ static struct device_attribute wq_sysfs_unbound_attrs[] = {
 
 static struct bus_type wq_subsys = {
 	.name				= "workqueue",
-	.dev_groups			= wq_sysfs_groups,
 };
 
 static int __init wq_sysfs_init(void)
@@ -3346,6 +3342,7 @@ static void wq_device_release(struct device *dev)
  */
 int workqueue_sysfs_register(struct workqueue_struct *wq)
 {
+	struct device_attribute *attr;
 	struct wq_device *wq_dev;
 	int ret;
 
@@ -3379,8 +3376,16 @@ int workqueue_sysfs_register(struct workqueue_struct *wq)
 		return ret;
 	}
 
+	for (attr = wq_sysfs_default_attrs; attr->attr.name; attr++) {
+		ret = device_create_file(&wq_dev->dev, attr);
+		if (ret) {
+			device_unregister(&wq_dev->dev);
+			wq->wq_dev = NULL;
+			return ret;
+		}
+	}
+
 	if (wq->flags & WQ_UNBOUND) {
-		struct device_attribute *attr;
 
 		for (attr = wq_sysfs_unbound_attrs; attr->attr.name; attr++) {
 			ret = device_create_file(&wq_dev->dev, attr);
-- 
1.8.3.1



* [PATCH 2/3] workqueues: Account unbound workqueue in a separate list
  2014-03-14 16:38 [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Frederic Weisbecker
  2014-03-14 16:38 ` [PATCH 1/3] workqueue: Move workqueue bus attr to device attribute Frederic Weisbecker
@ 2014-03-14 16:38 ` Frederic Weisbecker
  2014-03-14 18:17   ` Kevin Hilman
  2014-03-14 16:38 ` [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy Frederic Weisbecker
  2014-03-14 18:06 ` [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Kevin Hilman
  3 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-14 16:38 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
	Mike Galbraith, Paul E. McKenney, Tejun Heo, Viresh Kumar

The workqueues are all listed in a global list protected by a big mutex,
and this big mutex is taken in apply_workqueue_attrs() as well.

Now as we plan to implement a directory to control the cpumask of
all non-ABI unbound workqueues, we want to be able to iterate over all
unbound workqueues and call apply_workqueue_attrs() for each of
them with the new cpumask.

But there is a deadlock risk on the way: we need to iterate the list
of workqueues under wq_pool_mutex, but apply_workqueue_attrs()
itself takes wq_pool_mutex.

The easiest solution to work around this is to keep track of unbound
workqueues in a separate list with a separate mutex.

It's not very pretty unfortunately.

Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Not-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/workqueue.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4d230e3..ad8f727 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -232,6 +232,7 @@ struct wq_device;
 struct workqueue_struct {
 	struct list_head	pwqs;		/* WR: all pwqs of this wq */
 	struct list_head	list;		/* PL: list of all workqueues */
+	struct list_head	unbound_list;	/* PL: list of unbound workqueues */
 
 	struct mutex		mutex;		/* protects this wq */
 	int			work_color;	/* WQ: current work color */
@@ -288,9 +289,11 @@ static bool wq_numa_enabled;		/* unbound NUMA affinity enabled */
 static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
 
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
+static DEFINE_MUTEX(wq_unbound_mutex);	/* protects list of unbound workqueues */
 static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
 
 static LIST_HEAD(workqueues);		/* PL: list of all workqueues */
+static LIST_HEAD(workqueues_unbound);	/* PL: list of unbound workqueues */
 static bool workqueue_freezing;		/* PL: have wqs started freezing? */
 
 /* the per-cpu worker pools */
@@ -4263,6 +4266,12 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
 
 	mutex_unlock(&wq_pool_mutex);
 
+	if (wq->flags & WQ_UNBOUND) {
+		mutex_lock(&wq_unbound_mutex);
+		list_add(&wq->unbound_list, &workqueues_unbound);
+		mutex_unlock(&wq_unbound_mutex);
+	}
+
 	return wq;
 
 err_free_wq:
@@ -4318,6 +4327,12 @@ void destroy_workqueue(struct workqueue_struct *wq)
 	list_del_init(&wq->list);
 	mutex_unlock(&wq_pool_mutex);
 
+	if (wq->flags & WQ_UNBOUND) {
+		mutex_lock(&wq_unbound_mutex);
+		list_del(&wq->unbound_list);
+		mutex_unlock(&wq_unbound_mutex);
+	}
+
 	workqueue_sysfs_unregister(wq);
 
 	if (wq->rescuer) {
-- 
1.8.3.1



* [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy
  2014-03-14 16:38 [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Frederic Weisbecker
  2014-03-14 16:38 ` [PATCH 1/3] workqueue: Move workqueue bus attr to device attribute Frederic Weisbecker
  2014-03-14 16:38 ` [PATCH 2/3] workqueues: Account unbound workqueue in a separate list Frederic Weisbecker
@ 2014-03-14 16:38 ` Frederic Weisbecker
  2014-03-14 19:08   ` Kevin Hilman
  2014-03-14 18:06 ` [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Kevin Hilman
  3 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-14 16:38 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Christoph Lameter, Kevin Hilman,
	Mike Galbraith, Paul E. McKenney, Tejun Heo, Viresh Kumar

We call "anon workqueues" the set of unbound workqueues that don't
carry the WQ_SYSFS flag.

They are a problem nowadays because people who work on CPU isolation
(HPC, real time, etc.) want to be able to migrate all the unbound
workqueues away to a single CPU. This control is possible through sysfs,
but only for WQ_SYSFS workqueues.

Now we need to deal with the other unbound workqueues. There are two
possible solutions:

1) Implement a sysfs directory for each unbound !WQ_SYSFS workqueue. This
could be done with a specific Kconfig symbol to make sure that these
workqueues won't be considered a stable ABI. But we all know that all
distros will enable this Kconfig symbol and that a warning in the Kconfig
help text won't protect against anything.

2) Implement a single sysfs directory containing only the cpumask file
to control the affinity of all the !WQ_SYSFS workqueues.

This patch implements the second solution.

Two issues I have seen though:

* This triggers the following warning in apply_workqueue_attrs():

	/* creating multiple pwqs breaks ordering guarantee */
	if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
		return -EINVAL;

  I haven't dug into the details yet.

* wq_calc_node_cpumask() says that if NUMA affinity is not enabled,
  cpumask is always used. Which suggests that if NUMA affinity is enabled
  the cpumask may be ignored?

Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Not-Yet-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/workqueue.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 112 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ad8f727..aabee1f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -289,7 +289,8 @@ static bool wq_numa_enabled;		/* unbound NUMA affinity enabled */
 static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
 
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
-static DEFINE_MUTEX(wq_unbound_mutex);	/* protects list of unbound workqueues */
+/* protects list of unbound workqueues and wq_anon_cpumask */
+static DEFINE_MUTEX(wq_unbound_mutex);
 static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
 
 static LIST_HEAD(workqueues);		/* PL: list of all workqueues */
@@ -3311,13 +3312,122 @@ static struct device_attribute wq_sysfs_unbound_attrs[] = {
 	__ATTR_NULL,
 };
 
+/* Protected by wq_unbound_mutex */
+static cpumask_t wq_anon_cpumask;
+static ssize_t wq_anon_cpumask_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	int written;
+
+	mutex_lock(&wq_unbound_mutex);
+	written = cpumask_scnprintf(buf, PAGE_SIZE, &wq_anon_cpumask);
+	mutex_unlock(&wq_unbound_mutex);
+
+	written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
+
+	return written;
+}
+
+/* Must be called with wq_unbound_mutex held */
+static int wq_anon_cpumask_set(cpumask_var_t cpumask)
+{
+	struct workqueue_attrs *attrs;
+	struct workqueue_struct *wq;
+	int ret = 0;
+
+	list_for_each_entry(wq, &workqueues_unbound, unbound_list) {
+		if (wq->flags & WQ_SYSFS)
+			continue;
+		attrs = wq_sysfs_prep_attrs(wq);
+		if (!attrs)
+			return -ENOMEM;
+
+		cpumask_copy(attrs->cpumask, cpumask);
+		ret = apply_workqueue_attrs(wq, attrs);
+		free_workqueue_attrs(attrs);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+static ssize_t wq_anon_cpumask_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	cpumask_var_t cpumask;
+	int ret = -EINVAL;
+
+	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
+		return -ENOMEM;
+
+	ret = cpumask_parse(buf, cpumask);
+	if (ret)
+		goto out;
+
+	get_online_cpus();
+	if (cpumask_intersects(cpumask, cpu_online_mask)) {
+		mutex_lock(&wq_unbound_mutex);
+		ret = wq_anon_cpumask_set(cpumask);
+		if (!ret)
+			cpumask_copy(&wq_anon_cpumask, cpumask);
+		mutex_unlock(&wq_unbound_mutex);
+	}
+	put_online_cpus();
+out:
+	free_cpumask_var(cpumask);
+	return ret ? ret : count;
+}
+
+static void device_release(struct device *dev)
+{
+	kfree(dev);
+}
+
+static struct device_attribute wq_sysfs_anon_attr =
+	__ATTR(cpumask, 0644, wq_anon_cpumask_show, wq_anon_cpumask_store);
+
 static struct bus_type wq_subsys = {
 	.name				= "workqueue",
 };
 
 static int __init wq_sysfs_init(void)
 {
-	return subsys_virtual_register(&wq_subsys, NULL);
+	struct device *anon_dev;
+	int ret;
+
+	ret = subsys_virtual_register(&wq_subsys, NULL);
+	if (ret < 0)
+		return ret;
+
+	mutex_lock(&wq_unbound_mutex);
+	cpumask_copy(&wq_anon_cpumask, cpu_possible_mask);
+	mutex_unlock(&wq_unbound_mutex);
+
+	anon_dev = kzalloc(sizeof(*anon_dev), GFP_KERNEL);
+	if (!anon_dev)
+		return -ENOMEM;
+
+	anon_dev->bus = &wq_subsys;
+	anon_dev->init_name = "anon_wqs";
+	anon_dev->release = device_release;
+
+	ret = device_register(anon_dev);
+	if (ret) {
+		kfree(anon_dev);
+		return ret;
+	}
+
+	ret = device_create_file(anon_dev, &wq_sysfs_anon_attr);
+	if (ret) {
+		device_unregister(anon_dev);
+		return ret;
+	}
+
+	kobject_uevent(&anon_dev->kobj, KOBJ_ADD);
+
+	return 0;
 }
 core_initcall(wq_sysfs_init);
 
-- 
1.8.3.1



* Re: [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues
  2014-03-14 16:38 [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2014-03-14 16:38 ` [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy Frederic Weisbecker
@ 2014-03-14 18:06 ` Kevin Hilman
  3 siblings, 0 replies; 12+ messages in thread
From: Kevin Hilman @ 2014-03-14 18:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Christoph Lameter, Mike Galbraith, Paul E. McKenney,
	Tejun Heo, Viresh Kumar

Frederic Weisbecker <fweisbec@gmail.com> writes:

> There are several types of workqueues. Some of them are bound to specific
> CPUs, some others are unbound and can be executed on any CPU.
>
> A tiny subset of the unbound workqueues has a sysfs representation
> in /sys/devices/virtual/workqueue/, with a cpumask file that can
> be used to tweak their CPU affinity.
>
> But the vast majority of unbound workqueues aren't visible in sysfs.
>
> They are a problem nowadays because people who work on CPU isolation
> (HPC, real time, etc.) want to be able to migrate all the unbound
> workqueues away from specific CPUs.
>
> There are several possible ways to solve this:
>
> 1) Affine the !WQ_SYSFS unbound workqueues to the CPUs outside the
> full dynticks mask. Full dynticks is expected to be a component in
> many CPU isolation configurations and its CPU mapping can be a good
> way to retrieve the desired set of isolated CPUs.
>
> Among the drawbacks, though, are the lack of consistency with the
> WQ_SYSFS workqueue affinity interface, ordering issues between
> workqueue and dynticks subsystem initialization, and intrusion by
> the workqueue subsystem into dynticks internals.
>
> 2) Implement a sysfs directory for each unbound !WQ_SYSFS workqueue. That
> sounds like a very nice solution as it uses an existing and known
> interface. But workqueues appearing in the sysfs hierarchy are liable to
> become stable ABI. And this is definitely not what we want.
>
> This could be worked around with a specific Kconfig symbol to make sure
> that these workqueues won't be considered a stable ABI. But we all know
> that all distros will enable this Kconfig symbol, and that nobody reads,
> nor cares about, warnings in Kconfig help text, which thereby won't
> protect us against anything.
>
> 3) Implement a single sysfs directory exposing properties of all !WQ_SYSFS
> unbound workqueues. It would contain only the cpumask file to control
> the affinity of all these workqueues. But more can be added later.
>
> This complicates the code a bit, although not that much compared to
> solution 2, which requires some plumbing to cope with workqueues created
> before sysfs initialization, as reported by Mike (I played with that a bit
> as well, as I took that direction initially). But it deals with all the
> issues previously described.

Since I also tinkered with (1) and (2) and ran into some of the same
issues, I think (3) is a good way to go.  It also doesn't tie the affinity
to full_nohz, and leaves it up to userspace, which addresses a concern of
Mike's with earlier proposals as well.

Kevin


* Re: [PATCH 2/3] workqueues: Account unbound workqueue in a separate list
  2014-03-14 16:38 ` [PATCH 2/3] workqueues: Account unbound workqueue in a separate list Frederic Weisbecker
@ 2014-03-14 18:17   ` Kevin Hilman
  2014-03-15 12:40     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Hilman @ 2014-03-14 18:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Christoph Lameter, Mike Galbraith, Paul E. McKenney,
	Tejun Heo, Viresh Kumar

Frederic Weisbecker <fweisbec@gmail.com> writes:

> The workqueues are all listed in a global list protected by a big mutex,
> and this big mutex is taken in apply_workqueue_attrs() as well.
>
> Now as we plan to implement a directory to control the cpumask of
> all non-ABI unbound workqueues, we want to be able to iterate over all
> unbound workqueues and call apply_workqueue_attrs() for each of
> them with the new cpumask.
>
> But there is a deadlock risk on the way: we need to iterate the list
> of workqueues under wq_pool_mutex, but apply_workqueue_attrs()
> itself takes wq_pool_mutex.
>
> The easiest solution to work around this is to keep track of unbound
> workqueues in a separate list with a separate mutex.
>
> It's not very pretty unfortunately.
>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Kevin Hilman <khilman@linaro.org>
> Cc: Mike Galbraith <bitbucket@online.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Not-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  kernel/workqueue.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 4d230e3..ad8f727 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -232,6 +232,7 @@ struct wq_device;
>  struct workqueue_struct {
>  	struct list_head	pwqs;		/* WR: all pwqs of this wq */
>  	struct list_head	list;		/* PL: list of all workqueues */
> +	struct list_head	unbound_list;	/* PL: list of unbound workqueues */
>  
>  	struct mutex		mutex;		/* protects this wq */
>  	int			work_color;	/* WQ: current work color */
> @@ -288,9 +289,11 @@ static bool wq_numa_enabled;		/* unbound NUMA affinity enabled */
>  static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
>  
>  static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
> +static DEFINE_MUTEX(wq_unbound_mutex);	/* protects list of unbound workqueues */
>  static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
>  
>  static LIST_HEAD(workqueues);		/* PL: list of all workqueues */
> +static LIST_HEAD(workqueues_unbound);	/* PL: list of unbound workqueues */
>  static bool workqueue_freezing;		/* PL: have wqs started freezing? */
>  
>  /* the per-cpu worker pools */
> @@ -4263,6 +4266,12 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
>  
>  	mutex_unlock(&wq_pool_mutex);
>  
> +	if (wq->flags & WQ_UNBOUND) {
> +		mutex_lock(&wq_unbound_mutex);
> +		list_add(&wq->unbound_list, &workqueues_unbound);
> +		mutex_unlock(&wq_unbound_mutex);
> +	}
> +
>  	return wq;
>  
>  err_free_wq:
> @@ -4318,6 +4327,12 @@ void destroy_workqueue(struct workqueue_struct *wq)
>  	list_del_init(&wq->list);
>  	mutex_unlock(&wq_pool_mutex);
>  
> +	if (wq->flags & WQ_UNBOUND) {
> +		mutex_lock(&wq_unbound_mutex);
> +		list_del(&wq->unbound_list);
> +		mutex_unlock(&wq_unbound_mutex);
> +	}
> +
>  	workqueue_sysfs_unregister(wq);
>  
>  	if (wq->rescuer) {

Looks good, except for a minor nit: I think you're missing an init of the
new list:

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cc708f23d801..a01592f08321 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4309,6 +4309,7 @@ struct workqueue_struct
*__alloc_workqueue_key(const char *fmt,

        lockdep_init_map(&wq->lockdep_map, lock_name, key, 0);
        INIT_LIST_HEAD(&wq->list);
+       INIT_LIST_HEAD(&wq->unbound_list);

        if (alloc_and_link_pwqs(wq) < 0)
                goto err_free_wq;


Kevin


* Re: [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy
  2014-03-14 16:38 ` [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy Frederic Weisbecker
@ 2014-03-14 19:08   ` Kevin Hilman
  2014-03-17 14:02     ` Frederic Weisbecker
  2014-03-22 17:01     ` Frederic Weisbecker
  0 siblings, 2 replies; 12+ messages in thread
From: Kevin Hilman @ 2014-03-14 19:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Christoph Lameter, Mike Galbraith, Paul E. McKenney,
	Tejun Heo, Viresh Kumar

Frederic Weisbecker <fweisbec@gmail.com> writes:

> We call "anon workqueues" the set of unbound workqueues that don't
> carry the WQ_SYSFS flag.
>
> They are a problem nowadays because people who work on CPU isolation
> (HPC, real time, etc.) want to be able to migrate all the unbound
> workqueues away to a single CPU. This control is possible through sysfs,
> but only for WQ_SYSFS workqueues.
>
> Now we need to deal with the other unbound workqueues. There are two
> possible solutions:
>
> 1) Implement a sysfs directory for each unbound !WQ_SYSFS workqueue. This
> could be done with a specific Kconfig symbol to make sure that these
> workqueues won't be considered a stable ABI. But we all know that all
> distros will enable this Kconfig symbol and that a warning in the Kconfig
> help text won't protect against anything.
>
> 2) Implement a single sysfs directory containing only the cpumask file
> to control the affinity of all the !WQ_SYSFS workqueues.
>
> This patch implements the second solution.
>
> Two issues I have seen though:
>
> * This triggers the following warning in apply_workqueue_attrs():
>
> 	/* creating multiple pwqs breaks ordering guarantee */
> 	if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
> 		return -EINVAL;
>
>   I haven't dug into the details yet.

I tried to test this series and ran into this too for the kmmcd
workqueue.  Looking at the commit that introduced this check, it looks
like changing attributes will break the ordering constraints[1], so it's
prevented altogether.  hmmm...

Kevin

[1]
commit 8719dceae2f98a578507c0f6b49c93f320bd729c
Author: Tejun Heo <tj@kernel.org>
Date:   Tue Mar 12 11:30:04 2013 -0700

workqueue: reject adjusting max_active or applying attrs to ordered
workqueues

Adjusting max_active of or applying new workqueue_attrs to an ordered
workqueue breaks its ordering guarantee.  The former is obvious.  The
latter is because applying attrs creates a new pwq (pool_workqueue) and
there is no ordering constraint between the old and new pwqs.

Make apply_workqueue_attrs() and workqueue_set_max_active() trigger
WARN_ON() if those operations are requested on an ordered workqueue
and fail / ignore respectively.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>


* Re: [PATCH 2/3] workqueues: Account unbound workqueue in a separate list
  2014-03-14 18:17   ` Kevin Hilman
@ 2014-03-15 12:40     ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-15 12:40 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: LKML, Christoph Lameter, Mike Galbraith, Paul E. McKenney,
	Tejun Heo, Viresh Kumar

On Fri, Mar 14, 2014 at 11:17:35AM -0700, Kevin Hilman wrote:
> Frederic Weisbecker <fweisbec@gmail.com> writes:
> 
> > The workqueues are all listed in a global list protected by a big mutex,
> > and this big mutex is taken in apply_workqueue_attrs() as well.
> >
> > Now as we plan to implement a directory to control the cpumask of
> > all non-ABI unbound workqueues, we want to be able to iterate over all
> > unbound workqueues and call apply_workqueue_attrs() for each of
> > them with the new cpumask.
> >
> > But there is a deadlock risk on the way: we need to iterate the list
> > of workqueues under wq_pool_mutex, but apply_workqueue_attrs()
> > itself takes wq_pool_mutex.
> >
> > The easiest solution to work around this is to keep track of unbound
> > workqueues in a separate list with a separate mutex.
> >
> > It's not very pretty unfortunately.
> >
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Kevin Hilman <khilman@linaro.org>
> > Cc: Mike Galbraith <bitbucket@online.de>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Not-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  kernel/workqueue.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 4d230e3..ad8f727 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -232,6 +232,7 @@ struct wq_device;
> >  struct workqueue_struct {
> >  	struct list_head	pwqs;		/* WR: all pwqs of this wq */
> >  	struct list_head	list;		/* PL: list of all workqueues */
> > +	struct list_head	unbound_list;	/* PL: list of unbound workqueues */
> >  
> >  	struct mutex		mutex;		/* protects this wq */
> >  	int			work_color;	/* WQ: current work color */
> > @@ -288,9 +289,11 @@ static bool wq_numa_enabled;		/* unbound NUMA affinity enabled */
> >  static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
> >  
> >  static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
> > +static DEFINE_MUTEX(wq_unbound_mutex);	/* protects list of unbound workqueues */
> >  static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
> >  
> >  static LIST_HEAD(workqueues);		/* PL: list of all workqueues */
> > +static LIST_HEAD(workqueues_unbound);	/* PL: list of unbound workqueues */
> >  static bool workqueue_freezing;		/* PL: have wqs started freezing? */
> >  
> >  /* the per-cpu worker pools */
> > @@ -4263,6 +4266,12 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
> >  
> >  	mutex_unlock(&wq_pool_mutex);
> >  
> > +	if (wq->flags & WQ_UNBOUND) {
> > +		mutex_lock(&wq_unbound_mutex);
> > +		list_add(&wq->unbound_list, &workqueues_unbound);
> > +		mutex_unlock(&wq_unbound_mutex);
> > +	}
> > +
> >  	return wq;
> >  
> >  err_free_wq:
> > @@ -4318,6 +4327,12 @@ void destroy_workqueue(struct workqueue_struct *wq)
> >  	list_del_init(&wq->list);
> >  	mutex_unlock(&wq_pool_mutex);
> >  
> > +	if (wq->flags & WQ_UNBOUND) {
> > +		mutex_lock(&wq_unbound_mutex);
> > +		list_del(&wq->unbound_list);
> > +		mutex_unlock(&wq_unbound_mutex);
> > +	}
> > +
> >  	workqueue_sysfs_unregister(wq);
> >  
> >  	if (wq->rescuer) {
> 
> Looks good, except for a minor nit: I think you're missing an init of the
> new list:
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index cc708f23d801..a01592f08321 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4309,6 +4309,7 @@ struct workqueue_struct
> *__alloc_workqueue_key(const char *fmt,
> 
>         lockdep_init_map(&wq->lockdep_map, lock_name, key, 0);
>         INIT_LIST_HEAD(&wq->list);
> +       INIT_LIST_HEAD(&wq->unbound_list);

Actually that's only needed for the head of a list. Nodes don't need such initialization.

Thanks.

> 
>         if (alloc_and_link_pwqs(wq) < 0)
>                 goto err_free_wq;
> 
> 
> Kevin


* Re: [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy
  2014-03-14 19:08   ` Kevin Hilman
@ 2014-03-17 14:02     ` Frederic Weisbecker
  2014-03-22 17:01     ` Frederic Weisbecker
  1 sibling, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-17 14:02 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: LKML, Christoph Lameter, Mike Galbraith, Paul E. McKenney,
	Tejun Heo, Viresh Kumar

On Fri, Mar 14, 2014 at 12:08:39PM -0700, Kevin Hilman wrote:
> Frederic Weisbecker <fweisbec@gmail.com> writes:
> 
> > We call "anon workqueues" the set of unbound workqueues that don't
> > carry the WQ_SYSFS flag.
> >
> > They are a problem nowadays because people who work on CPU isolation
> > (HPC, Real time, etc...) want to be able to migrate all the unbound
> > workqueues away to a single CPU. This control is possible through sysfs
> > but only with WQ_SYSFS workqueues.
> >
> > Now we need to deal with the other unbound workqueues. There are two
> > possible solutions:
> >
> > 1) Implement a sysfs directory for each unbound !WQ_SYSFS. This could
> > be done with a specific Kconfig to make sure that these workqueues
> > won't be considered as a stable ABI. But we all know that all distros
> > will enable this Kconfig symbol and that a warning in the Kconfig help
> > text won't protect against anything.
> >
> > 2) Implement a single sysfs directory containing only the cpumask file
> > to control the affinity of all the !WQ_SYSFS workqueues.
> >
> > This patch implements the second solution.
> >
> > Two issues I have seen though:
> >
> > * This triggers the following warning in apply_workqueue_attrs():
> >
> > 	/* creating multiple pwqs breaks ordering guarantee */
> > 	if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
> > 		return -EINVAL;
> >
> >   I haven't yet checked into the details.
> 
> I tried to test this series and ran into this too for the kmmcd
> workqueue.  Looking at the commit that introduced this check, it looks
> like changing attributes will break the ordering constraints[1], so it's
> prevented altogether.  hmmm...
> 
> Kevin
> 
> [1]
> commit 8719dceae2f98a578507c0f6b49c93f320bd729c
> Author: Tejun Heo <tj@kernel.org>
> Date:   Tue Mar 12 11:30:04 2013 -0700
> 
> workqueue: reject adjusting max_active or applying attrs to ordered
> workqueues
> 
> Adjusting max_active of or applying new workqueue_attrs to an ordered
> workqueue breaks its ordering guarantee.  The former is obvious.  The
> latter is because applying attrs creates a new pwq (pool_workqueue) and
> there is no ordering constraint between the old and new pwqs.

Ah I see. The way apply_workqueue_attrs() applies the cpumask, by creating new
pwqs, does break the ordering.

Hmm, looks like some more plumbing is required.

> 
> Make apply_workqueue_attrs() and workqueue_set_max_active() trigger
> WARN_ON() if those operations are requested on an ordered workqueue
> and fail / ignore respectively.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>


* Re: [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy
  2014-03-14 19:08   ` Kevin Hilman
  2014-03-17 14:02     ` Frederic Weisbecker
@ 2014-03-22 17:01     ` Frederic Weisbecker
  2014-03-22 18:55       ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-22 17:01 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: LKML, Christoph Lameter, Mike Galbraith, Paul E. McKenney,
	Tejun Heo, Viresh Kumar

On Fri, Mar 14, 2014 at 12:08:39PM -0700, Kevin Hilman wrote:
> Frederic Weisbecker <fweisbec@gmail.com> writes:
> 
> > We call "anon workqueues" the set of unbound workqueues that don't
> > carry the WQ_SYSFS flag.
> >
> > They are a problem nowadays because people who work on CPU isolation
> > (HPC, Real time, etc...) want to be able to migrate all the unbound
> > workqueues away to a single CPU. This control is possible through sysfs
> > but only with WQ_SYSFS workqueues.
> >
> > Now we need to deal with the other unbound workqueues. There are two
> > possible solutions:
> >
> > 1) Implement a sysfs directory for each unbound !WQ_SYSFS. This could
> > be done with a specific Kconfig to make sure that these workqueues
> > won't be considered as a stable ABI. But we all know that all distros
> > will enable this Kconfig symbol and that a warning in the Kconfig help
> > text won't protect against anything.
> >
> > 2) Implement a single sysfs directory containing only the cpumask file
> > to control the affinity of all the !WQ_SYSFS workqueues.
> >
> > This patch implements the second solution.
> >
> > Two issues I have seen though:
> >
> > * This triggers the following warning in apply_workqueue_attrs():
> >
> > 	/* creating multiple pwqs breaks ordering guarantee */
> > 	if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
> > 		return -EINVAL;
> >
> >   I haven't yet checked into the details.
> 
> I tried to test this series and ran into this too for the kmmcd
> workqueue.  Looking at the commit that introduced this check, it looks
> like changing attributes will break the ordering constraints[1], so it's
> prevented altogether.  hmmm...

So I stared at this and it's indeed hard to get correct. On workqueue creation,
apply_workqueue_attrs() is called a first time to attach the worker pools
to it.

Non-ordered workqueues seem to have a worker pool per NUMA node while ordered
workqueues have the same worker pool for all CPUs (due to no_numa being forced
to true). That's what enforces the ordering.

So if we apply new attrs, the worker pool is replaced. But during the replacement,
a work can be queued on the new pool while other works may still be executing on the old one.

I have some random ideas to solve that but all of them imply performance issues:

1) Take a per-workqueue mutex when a work executes on an ordered workqueue. Although
  contention should be very rare (only while we replace the workqueue attrs and
  switch to a new worker), frequent locking may have a visible impact.

2) Have a separate worker pool for all ordered workqueues. But we may lose a bit of
  serialization with other workqueues along the way.

Actually the second solution could work since only ordered workqueues should use
the default global worker pool most of the time. Ah no wait, non-ordered workqueues
only allocate per-node worker pools on NUMA...

Grrr.


* Re: [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy
  2014-03-22 17:01     ` Frederic Weisbecker
@ 2014-03-22 18:55       ` Tejun Heo
  2014-03-22 22:04         ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2014-03-22 18:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Kevin Hilman, LKML, Christoph Lameter, Mike Galbraith,
	Paul E. McKenney, Viresh Kumar

On Sat, Mar 22, 2014 at 06:01:18PM +0100, Frederic Weisbecker wrote:
> 1) Take a per-workqueue mutex when a work executes on an ordered workqueue. Although
>   contention should be very rare (only while we replace the workqueue attrs and
>   switch to a new worker), frequent locking may have a visible impact.
> 
> 2) Have a separate worker pool for all ordered workqueues. But we may lose a bit of
>   serialization with other workqueues along the way.

Ordered workqueues are always bound to the fallback default worker
pool.  Why not just adjust cpus_allowed of the worker pool?

Thanks.

-- 
tejun


* Re: [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy
  2014-03-22 18:55       ` Tejun Heo
@ 2014-03-22 22:04         ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2014-03-22 22:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Kevin Hilman, LKML, Christoph Lameter, Mike Galbraith,
	Paul E. McKenney, Viresh Kumar

On Sat, Mar 22, 2014 at 02:55:51PM -0400, Tejun Heo wrote:
> On Sat, Mar 22, 2014 at 06:01:18PM +0100, Frederic Weisbecker wrote:
> > 1) Take a per-workqueue mutex when a work executes on an ordered workqueue. Although
> >   contention should be very rare (only while we replace the workqueue attrs and
> >   switch to a new worker), frequent locking may have a visible impact.
> > 
> > 2) Have a separate worker pool for all ordered workqueues. But we may lose a bit of
> >   serialization with other workqueues along the way.
> 
> Ordered workqueues are always bound to the fallback default worker
> pool.  Why not just adjust cpus_allowed of the worker pool?

Now that you tell me, that sounds obvious enough. I'll try something.

Thanks.

> 
> Thanks.
> 
> -- 
> tejun


end of thread, other threads:[~2014-03-22 22:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-14 16:38 [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Frederic Weisbecker
2014-03-14 16:38 ` [PATCH 1/3] workqueue: Move workqueue bus attr to device attribute Frederic Weisbecker
2014-03-14 16:38 ` [PATCH 2/3] workqueues: Account unbound workqueue in a seperate list Frederic Weisbecker
2014-03-14 18:17   ` Kevin Hilman
2014-03-15 12:40     ` Frederic Weisbecker
2014-03-14 16:38 ` [PATCH 3/3] workqueue: Add anon workqueue sysfs hierarchy Frederic Weisbecker
2014-03-14 19:08   ` Kevin Hilman
2014-03-17 14:02     ` Frederic Weisbecker
2014-03-22 17:01     ` Frederic Weisbecker
2014-03-22 18:55       ` Tejun Heo
2014-03-22 22:04         ` Frederic Weisbecker
2014-03-14 18:06 ` [RFC PATCH 0/3] workqueue: Control cpu affinity of !WQ_SYSFS unbound workqueues Kevin Hilman
