* [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS
@ 2014-07-25 16:55 Lina Iyer
  2014-07-25 16:55 ` [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification Lina Iyer
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Lina Iyer @ 2014-07-25 16:55 UTC (permalink / raw)
  To: linux-pm
  Cc: daniel.lezcano, linus.walleij, arnd.bergmann, rjw, tglx, Lina Iyer

This series of patches adds a new feature to allow per-cpu PM QoS.

The first patch modifies the IRQ manager to allow multiple clients to register for
IRQ SMP affinity change notifications. Today, only one client can register for
notification per IRQ, and the PM QoS framework is now also interested in knowing
when the SMP affinity of an IRQ changes. With the current implementation, a
second registration for the change notification releases the existing
notification callback and registers the new one. Modify the notification
mechanism to use a list of notifiers instead of a single data structure.
Also, a client that wants to de-register from the notification now needs to
call a separate API instead of the overloaded registration call with a NULL
argument.

The next two patches reorganize the PM QoS framework to allow the PM QoS and Dev
PM QoS frameworks to specify a request type. Most requestors of PM QoS do not
know or care about the CPU(s) on which the QoS needs to take effect, and in many
cases it is still desirable to have the QoS apply to all available CPUs. However,
a driver working in conjunction with an IRQ balancer, or one that has a specific
CPU requirement, can specify a QoS request for just that set of CPUs. For
example, a driver that needs a performance QoS for a certain IRQ, but does not
want to affect the general power consumption of all the CPUs in the system, can
issue a QoS request that is affine only to the CPU(s) on which that IRQ can be
triggered.

The change adds the ability to specify a PM QoS request type and introduces two
new request types in addition to the default, which applies to all CPUs.

PM_QOS_REQ_AFFINE_CORES: This allows drivers to specify the set of CPUs to which
the request should apply.

PM_QOS_REQ_AFFINE_IRQ: This allows drivers to specify an IRQ whose affinity the
QoS request should track. The IRQ SMP affinity notification is used internally to
set the cpumask of the affected CPUs.

The request type defaults to PM_QOS_REQ_ALL_CORES when not explicitly specified,
which applies the request to all CPUs.
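
As a rough illustration (a sketch only; the struct fields and function names
follow the definitions proposed in patch 3 of this series), a driver could
restrict a CPU_DMA_LATENCY request to CPUs 0 and 1 like this:

/* Hedged sketch: field names (type, cpus_affine) as proposed in patch 3. */
#include <linux/pm_qos.h>
#include <linux/cpumask.h>

static struct pm_qos_request my_qos_req;

static void my_driver_add_qos(void)
{
	my_qos_req.type = PM_QOS_REQ_AFFINE_CORES;
	cpumask_clear(&my_qos_req.cpus_affine);
	cpumask_set_cpu(0, &my_qos_req.cpus_affine);
	cpumask_set_cpu(1, &my_qos_req.cpus_affine);

	/* 100us latency tolerance, applied only to CPUs 0 and 1 */
	pm_qos_add_request(&my_qos_req, PM_QOS_CPU_DMA_LATENCY, 100);
}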

There is also a provision to read the aggregated QoS value for a constraint,
either for a set of CPUs or for a single CPU. CPUidle governors can use this
feature to get the QoS value for the CPU(s) they are interested in.
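
For instance, a cpuidle governor could consult the per-cpu value before picking
an idle state. This is a hedged sketch that assumes the pm_qos_request_for_cpu()
API added in patch 3; it is not an existing governor:

#include <linux/cpuidle.h>
#include <linux/pm_qos.h>

/* Pick the deepest state whose exit latency fits the per-cpu constraint. */
static int my_governor_select(struct cpuidle_driver *drv,
			      struct cpuidle_device *dev)
{
	int latency_req = pm_qos_request_for_cpu(PM_QOS_CPU_DMA_LATENCY,
						 dev->cpu);
	int i;

	for (i = drv->state_count - 1; i >= 0; i--) {
		if (drv->states[i].exit_latency <= latency_req)
			return i;
	}

	return 0;
}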



Lina Iyer (3):
  irq: Allow multiple clients to register for irq affinity notification
  QoS: Modify data structures and function arguments for scalability.
  QoS: Enhance framework to support cpu/irq specific QoS requests

 Documentation/power/pm_qos_interface.txt |  18 +++
 drivers/base/power/qos.c                 |  14 +--
 drivers/infiniband/hw/qib/qib_iba7322.c  |   4 +-
 include/linux/interrupt.h                |  12 +-
 include/linux/irq.h                      |   1 +
 include/linux/irqdesc.h                  |   6 +-
 include/linux/pm_qos.h                   |  23 +++-
 kernel/irq/irqdesc.c                     |   1 +
 kernel/irq/manage.c                      |  77 ++++++++-----
 kernel/power/qos.c                       | 184 ++++++++++++++++++++++++++++++-
 lib/cpu_rmap.c                           |   2 +-
 11 files changed, 289 insertions(+), 53 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification
  2014-07-25 16:55 [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Lina Iyer
@ 2014-07-25 16:55 ` Lina Iyer
  2014-08-01 14:48   ` Daniel Lezcano
  2014-07-25 16:55 ` [RFC] [PATCH 2/3] QoS: Modify data structures and function arguments for scalability Lina Iyer
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Lina Iyer @ 2014-07-25 16:55 UTC (permalink / raw)
  To: linux-pm
  Cc: daniel.lezcano, linus.walleij, arnd.bergmann, rjw, tglx, Lina Iyer

The current implementation allows only a single notification callback
whenever the IRQ's SMP affinity is changed. Adding a second notification
punts the existing notifier function out of registration.

Add a list of notifiers, allowing multiple clients to register for irq
affinity notifications.
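
A hedged sketch of the intended usage with the list-based API introduced here
(the client structure and callbacks are illustrative placeholders, not part of
the patch):

#include <linux/interrupt.h>
#include <linux/kref.h>

static void my_affinity_notify(struct irq_affinity_notify *notify,
			       const cpumask_t *mask)
{
	/* React to the new affinity mask, e.g. update per-cpu bookkeeping. */
}

static void my_affinity_release(struct kref *ref)
{
	/* Last reference dropped; nothing dynamic to free in this sketch. */
}

static struct irq_affinity_notify my_notify = {
	.notify  = my_affinity_notify,
	.release = my_affinity_release,
};

static int my_driver_watch_irq(unsigned int irq)
{
	/* Another client registering on the same IRQ no longer evicts us. */
	return irq_set_affinity_notifier(irq, &my_notify);
}

static void my_driver_unwatch_irq(void)
{
	irq_release_affinity_notifier(&my_notify);
}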

Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
---
 drivers/infiniband/hw/qib/qib_iba7322.c |  4 +-
 include/linux/interrupt.h               | 12 ++++-
 include/linux/irq.h                     |  1 +
 include/linux/irqdesc.h                 |  6 ++-
 kernel/irq/irqdesc.c                    |  1 +
 kernel/irq/manage.c                     | 77 ++++++++++++++++++++-------------
 lib/cpu_rmap.c                          |  2 +-
 7 files changed, 66 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index a7eb325..62cb77d 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -3345,9 +3345,7 @@ static void reset_dca_notifier(struct qib_devdata *dd, struct qib_msix_entry *m)
 		"Disabling notifier on HCA %d irq %d\n",
 		dd->unit,
 		m->msix.vector);
-	irq_set_affinity_notifier(
-		m->msix.vector,
-		NULL);
+	irq_release_affinity_notifier(m->notifier);
 	m->notifier = NULL;
 }
 
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 698ad05..c1e227c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -203,7 +203,7 @@ static inline int check_wakeup_irqs(void) { return 0; }
  * struct irq_affinity_notify - context for notification of IRQ affinity changes
  * @irq:		Interrupt to which notification applies
  * @kref:		Reference count, for internal use
- * @work:		Work item, for internal use
+ * @list:		Add to the notifier list, for internal use
  * @notify:		Function to be called on change.  This will be
  *			called in process context.
  * @release:		Function to be called on release.  This will be
@@ -214,7 +214,7 @@ static inline int check_wakeup_irqs(void) { return 0; }
 struct irq_affinity_notify {
 	unsigned int irq;
 	struct kref kref;
-	struct work_struct work;
+	struct list_head list;
 	void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
 	void (*release)(struct kref *ref);
 };
@@ -265,6 +265,8 @@ extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
 extern int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
 
+extern int
+irq_release_affinity_notifier(struct irq_affinity_notify *notify);
 #else /* CONFIG_SMP */
 
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
@@ -295,6 +297,12 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
 {
 	return 0;
 }
+
+static inline int
+irq_release_affinity_notifier(struct irq_affinity_notify *notify)
+{
+	return 0;
+}
 #endif /* CONFIG_SMP */
 
 /*
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 62af592..2634a48 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -20,6 +20,7 @@
 #include <linux/errno.h>
 #include <linux/topology.h>
 #include <linux/wait.h>
+#include <linux/list.h>
 
 #include <asm/irq.h>
 #include <asm/ptrace.h>
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 472c021..db3509e 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -31,7 +31,8 @@ struct irq_desc;
  * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
  * @lock:		locking for SMP
  * @affinity_hint:	hint to user space for preferred irq affinity
- * @affinity_notify:	context for notification of affinity changes
+ * @affinity_notify:	list of notification clients for affinity changes
+ * @affinity_work:	Work queue for handling affinity change notifications
  * @pending_mask:	pending rebalanced interrupts
  * @threads_oneshot:	bitfield to handle shared oneshot threads
  * @threads_active:	number of irqaction threads currently running
@@ -60,7 +61,8 @@ struct irq_desc {
 	struct cpumask		*percpu_enabled;
 #ifdef CONFIG_SMP
 	const struct cpumask	*affinity_hint;
-	struct irq_affinity_notify *affinity_notify;
+	struct list_head	affinity_notify;
+	struct work_struct	affinity_work;
 #ifdef CONFIG_GENERIC_PENDING_IRQ
 	cpumask_var_t		pending_mask;
 #endif
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 1487a12..c95e1f3 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -91,6 +91,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
 	for_each_possible_cpu(cpu)
 		*per_cpu_ptr(desc->kstat_irqs, cpu) = 0;
 	desc_smp_init(desc, node);
+	INIT_LIST_HEAD(&desc->affinity_notify);
 }
 
 int nr_irqs = NR_IRQS;
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 88657d7..cd7fc48 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -209,10 +209,9 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
 		irq_copy_pending(desc, mask);
 	}
 
-	if (desc->affinity_notify) {
-		kref_get(&desc->affinity_notify->kref);
-		schedule_work(&desc->affinity_notify->work);
-	}
+	if (!list_empty(&desc->affinity_notify))
+		schedule_work(&desc->affinity_work);
+
 	irqd_set(data, IRQD_AFFINITY_SET);
 
 	return ret;
@@ -248,14 +247,14 @@ EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
 
 static void irq_affinity_notify(struct work_struct *work)
 {
-	struct irq_affinity_notify *notify =
-		container_of(work, struct irq_affinity_notify, work);
-	struct irq_desc *desc = irq_to_desc(notify->irq);
+	struct irq_desc *desc =
+			container_of(work, struct irq_desc, affinity_work);
 	cpumask_var_t cpumask;
 	unsigned long flags;
+	struct irq_affinity_notify *notify;
 
 	if (!desc || !alloc_cpumask_var(&cpumask, GFP_KERNEL))
-		goto out;
+		return;
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
 	if (irq_move_pending(&desc->irq_data))
@@ -264,11 +263,20 @@ static void irq_affinity_notify(struct work_struct *work)
 		cpumask_copy(cpumask, desc->irq_data.affinity);
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 
-	notify->notify(notify, cpumask);
+	list_for_each_entry(notify, &desc->affinity_notify, list) {
+		/**
+		 * Check and get the kref only if the kref has not been
+		 * released by now. It's possible that the reference count
+		 * is already 0; we don't want to notify clients that have
+		 * already been released.
+		 */
+		if (!kref_get_unless_zero(&notify->kref))
+			continue;
+		notify->notify(notify, cpumask);
+		kref_put(&notify->kref, notify->release);
+	}
 
 	free_cpumask_var(cpumask);
-out:
-	kref_put(&notify->kref, notify->release);
 }
 
 /**
@@ -286,8 +294,6 @@ int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
-	struct irq_affinity_notify *old_notify;
-	unsigned long flags;
 
 	/* The release function is promised process context */
 	might_sleep();
@@ -295,25 +301,37 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
 	if (!desc)
 		return -EINVAL;
 
-	/* Complete initialisation of *notify */
-	if (notify) {
-		notify->irq = irq;
-		kref_init(&notify->kref);
-		INIT_WORK(&notify->work, irq_affinity_notify);
+	if (!notify) {
+		WARN("%s called with NULL notifier - use irq_release_affinity_notifier function instead.\n",
+				__func__);
+		return -EINVAL;
 	}
 
-	raw_spin_lock_irqsave(&desc->lock, flags);
-	old_notify = desc->affinity_notify;
-	desc->affinity_notify = notify;
-	raw_spin_unlock_irqrestore(&desc->lock, flags);
-
-	if (old_notify)
-		kref_put(&old_notify->kref, old_notify->release);
+	notify->irq = irq;
+	kref_init(&notify->kref);
+	INIT_LIST_HEAD(&notify->list);
+	list_add(&desc->affinity_notify, &notify->list);
 
 	return 0;
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_notifier);
 
+/**
+ *	irq_release_affinity_notifier - Remove us from notifications
+ *	@notify: Context for notification
+ */
+int irq_release_affinity_notifier(struct irq_affinity_notify *notify)
+{
+	if (!notify)
+		return -EINVAL;
+
+	list_del(&notify->list);
+	kref_put(&notify->kref, notify->release);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(irq_release_affinity_notifier);
+
 #ifndef CONFIG_AUTO_IRQ_AFFINITY
 /*
  * Generic version of the affinity autoselector.
@@ -348,6 +366,8 @@ setup_affinity(unsigned int irq, struct irq_desc *desc, struct cpumask *mask)
 		if (cpumask_intersects(mask, nodemask))
 			cpumask_and(mask, mask, nodemask);
 	}
+	INIT_LIST_HEAD(&desc->affinity_notify);
+	INIT_WORK(&desc->affinity_work, irq_affinity_notify);
 	irq_do_set_affinity(&desc->irq_data, mask, false);
 	return 0;
 }
@@ -1414,14 +1434,13 @@ EXPORT_SYMBOL_GPL(remove_irq);
 void free_irq(unsigned int irq, void *dev_id)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_affinity_notify *notify;
 
 	if (!desc || WARN_ON(irq_settings_is_per_cpu_devid(desc)))
 		return;
 
-#ifdef CONFIG_SMP
-	if (WARN_ON(desc->affinity_notify))
-		desc->affinity_notify = NULL;
-#endif
+	list_for_each_entry(notify, &desc->affinity_notify, list)
+		kref_put(&notify->kref, notify->release);
 
 	chip_bus_lock(desc);
 	kfree(__free_irq(irq, dev_id));
diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
index 4f134d8..0c8da50 100644
--- a/lib/cpu_rmap.c
+++ b/lib/cpu_rmap.c
@@ -235,7 +235,7 @@ void free_irq_cpu_rmap(struct cpu_rmap *rmap)
 
 	for (index = 0; index < rmap->used; index++) {
 		glue = rmap->obj[index];
-		irq_set_affinity_notifier(glue->notify.irq, NULL);
+		irq_release_affinity_notifier(&glue->notify);
 	}
 
 	cpu_rmap_put(rmap);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC] [PATCH 2/3] QoS: Modify data structures and function arguments for scalability.
  2014-07-25 16:55 [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Lina Iyer
  2014-07-25 16:55 ` [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification Lina Iyer
@ 2014-07-25 16:55 ` Lina Iyer
  2014-07-25 16:55 ` [RFC] [PATCH 3/3] QoS: Enhance framework to support cpu/irq specific QoS requests Lina Iyer
  2014-08-01 11:54 ` [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Daniel Lezcano
  3 siblings, 0 replies; 9+ messages in thread
From: Lina Iyer @ 2014-07-25 16:55 UTC (permalink / raw)
  To: linux-pm
  Cc: daniel.lezcano, linus.walleij, arnd.bergmann, rjw, tglx,
	Lina Iyer, Praveen Chidambaram

pm_qos_add_request() takes a handle to the plist node that is used internally
to store the request, but this does not extend well. Also, the dev_pm_qos
request structure embeds the plist node directly, which breaks the 'derives
from pm_qos_request' relationship.

Use pm_qos_request objects instead of passing around the internal priority-list
node.

Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
---
 drivers/base/power/qos.c | 14 +++++++-------
 include/linux/pm_qos.h   |  7 ++++---
 kernel/power/qos.c       | 14 ++++++++------
 3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 36b9eb4..67a66b1 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -143,7 +143,7 @@ static int apply_constraint(struct dev_pm_qos_request *req,
 	switch(req->type) {
 	case DEV_PM_QOS_RESUME_LATENCY:
 		ret = pm_qos_update_target(&qos->resume_latency,
-					   &req->data.pnode, action, value);
+					   &req->data.lat, action, value);
 		if (ret) {
 			value = pm_qos_read_value(&qos->resume_latency);
 			blocking_notifier_call_chain(&dev_pm_notifiers,
@@ -153,7 +153,7 @@ static int apply_constraint(struct dev_pm_qos_request *req,
 		break;
 	case DEV_PM_QOS_LATENCY_TOLERANCE:
 		ret = pm_qos_update_target(&qos->latency_tolerance,
-					   &req->data.pnode, action, value);
+					   &req->data.lat, action, value);
 		if (ret) {
 			value = pm_qos_read_value(&qos->latency_tolerance);
 			req->dev->power.set_latency_tolerance(req->dev, value);
@@ -254,7 +254,7 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
 
 	/* Flush the constraints lists for the device. */
 	c = &qos->resume_latency;
-	plist_for_each_entry_safe(req, tmp, &c->list, data.pnode) {
+	plist_for_each_entry_safe(req, tmp, &c->list, data.lat.node) {
 		/*
 		 * Update constraints list and call the notification
 		 * callbacks if needed
@@ -263,7 +263,7 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
 		memset(req, 0, sizeof(*req));
 	}
 	c = &qos->latency_tolerance;
-	plist_for_each_entry_safe(req, tmp, &c->list, data.pnode) {
+	plist_for_each_entry_safe(req, tmp, &c->list, data.lat.node) {
 		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_DEFAULT_VALUE);
 		memset(req, 0, sizeof(*req));
 	}
@@ -378,7 +378,7 @@ static int __dev_pm_qos_update_request(struct dev_pm_qos_request *req,
 	switch(req->type) {
 	case DEV_PM_QOS_RESUME_LATENCY:
 	case DEV_PM_QOS_LATENCY_TOLERANCE:
-		curr_value = req->data.pnode.prio;
+		curr_value = req->data.lat.node.prio;
 		break;
 	case DEV_PM_QOS_FLAGS:
 		curr_value = req->data.flr.flags;
@@ -831,8 +831,8 @@ s32 dev_pm_qos_get_user_latency_tolerance(struct device *dev)
 	mutex_lock(&dev_pm_qos_mtx);
 	ret = IS_ERR_OR_NULL(dev->power.qos)
 		|| !dev->power.qos->latency_tolerance_req ?
-			PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT :
-			dev->power.qos->latency_tolerance_req->data.pnode.prio;
+		PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT :
+		dev->power.qos->latency_tolerance_req->data.lat.node.prio;
 	mutex_unlock(&dev_pm_qos_mtx);
 	return ret;
 }
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 9ab4bf7..e1b763d 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -60,7 +60,7 @@ enum dev_pm_qos_req_type {
 struct dev_pm_qos_request {
 	enum dev_pm_qos_req_type type;
 	union {
-		struct plist_node pnode;
+		struct pm_qos_request lat;
 		struct pm_qos_flags_request flr;
 	} data;
 	struct device *dev;
@@ -112,7 +112,8 @@ static inline int dev_pm_qos_request_active(struct dev_pm_qos_request *req)
 	return req->dev != NULL;
 }
 
-int pm_qos_update_target(struct pm_qos_constraints *c, struct plist_node *node,
+int pm_qos_update_target(struct pm_qos_constraints *c,
+			 struct pm_qos_request *req,
 			 enum pm_qos_req_action action, int value);
 bool pm_qos_update_flags(struct pm_qos_flags *pqf,
 			 struct pm_qos_flags_request *req,
@@ -210,7 +211,7 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val);
 
 static inline s32 dev_pm_qos_requested_resume_latency(struct device *dev)
 {
-	return dev->power.qos->resume_latency_req->data.pnode.prio;
+	return dev->power.qos->resume_latency_req->data.lat.node.prio;
 }
 
 static inline s32 dev_pm_qos_requested_flags(struct device *dev)
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index 884b770..d0b9c0f 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -161,19 +161,21 @@ static inline void pm_qos_set_value(struct pm_qos_constraints *c, s32 value)
  * pm_qos_update_target - manages the constraints list and calls the notifiers
  *  if needed
  * @c: constraints data struct
- * @node: request to add to the list, to update or to remove
+ * @req: request to add to the list, to update or to remove
  * @action: action to take on the constraints list
  * @value: value of the request to add or update
  *
  * This function returns 1 if the aggregated constraint value has changed, 0
  *  otherwise.
  */
-int pm_qos_update_target(struct pm_qos_constraints *c, struct plist_node *node,
+int pm_qos_update_target(struct pm_qos_constraints *c,
+			 struct pm_qos_request *req,
 			 enum pm_qos_req_action action, int value)
 {
 	unsigned long flags;
 	int prev_value, curr_value, new_value;
 	int ret;
+	struct plist_node *node = &req->node;
 
 	spin_lock_irqsave(&pm_qos_lock, flags);
 	prev_value = pm_qos_get_value(c);
@@ -310,7 +312,7 @@ static void __pm_qos_update_request(struct pm_qos_request *req,
 	if (new_value != req->node.prio)
 		pm_qos_update_target(
 			pm_qos_array[req->pm_qos_class]->constraints,
-			&req->node, PM_QOS_UPDATE_REQ, new_value);
+			req, PM_QOS_UPDATE_REQ, new_value);
 }
 
 /**
@@ -355,7 +357,7 @@ void pm_qos_add_request(struct pm_qos_request *req,
 	INIT_DELAYED_WORK(&req->work, pm_qos_work_fn);
 	trace_pm_qos_add_request(pm_qos_class, value);
 	pm_qos_update_target(pm_qos_array[pm_qos_class]->constraints,
-			     &req->node, PM_QOS_ADD_REQ, value);
+			     req, PM_QOS_ADD_REQ, value);
 }
 EXPORT_SYMBOL_GPL(pm_qos_add_request);
 
@@ -409,7 +411,7 @@ void pm_qos_update_request_timeout(struct pm_qos_request *req, s32 new_value,
 	if (new_value != req->node.prio)
 		pm_qos_update_target(
 			pm_qos_array[req->pm_qos_class]->constraints,
-			&req->node, PM_QOS_UPDATE_REQ, new_value);
+			req, PM_QOS_UPDATE_REQ, new_value);
 
 	schedule_delayed_work(&req->work, usecs_to_jiffies(timeout_us));
 }
@@ -437,7 +439,7 @@ void pm_qos_remove_request(struct pm_qos_request *req)
 
 	trace_pm_qos_remove_request(req->pm_qos_class, PM_QOS_DEFAULT_VALUE);
 	pm_qos_update_target(pm_qos_array[req->pm_qos_class]->constraints,
-			     &req->node, PM_QOS_REMOVE_REQ,
+			     req, PM_QOS_REMOVE_REQ,
 			     PM_QOS_DEFAULT_VALUE);
 	memset(req, 0, sizeof(*req));
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC] [PATCH 3/3] QoS: Enhance framework to support cpu/irq specific QoS requests
  2014-07-25 16:55 [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Lina Iyer
  2014-07-25 16:55 ` [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification Lina Iyer
  2014-07-25 16:55 ` [RFC] [PATCH 2/3] QoS: Modify data structures and function arguments for scalability Lina Iyer
@ 2014-07-25 16:55 ` Lina Iyer
  2014-08-01 15:58   ` Daniel Lezcano
  2014-08-01 11:54 ` [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Daniel Lezcano
  3 siblings, 1 reply; 9+ messages in thread
From: Lina Iyer @ 2014-07-25 16:55 UTC (permalink / raw)
  To: linux-pm
  Cc: daniel.lezcano, linus.walleij, arnd.bergmann, rjw, tglx,
	Lina Iyer, Praveen Chidambaram

A QoS request for CPU_DMA_LATENCY can be better optimized if the request can be
set only for the required CPUs and not all CPUs. This helps save power on the
other cores while still guaranteeing the quality of service where it is needed.

Enhance the QoS constraints data structures to support a target value for each
core. Requests can specify whether the QoS applies to all cores (the default),
to a selected subset of the cores, or to the core(s) that a given IRQ is affine
to.

QoS requests that need to track an IRQ can be set to apply only to the CPUs to
which the IRQ's smp_affinity attribute is set. The QoS framework automatically
tracks IRQ migration between cores and updates the QoS so that it applies only
to the core(s) the IRQ has been migrated to.

Idle and other interested drivers can request a PM QoS value for a constraint
across all CPUs, for a specific CPU, or for a set of CPUs. Separate APIs have
been added to query the aggregated value for an individual CPU or for a
cpumask. The default behaviour of PM QoS is maintained, i.e., requests that do
not specify a type continue to apply to all cores. Requests that want to be
affine to a set of CPUs or to an IRQ can set the request type and either the
CPU mask or the IRQ number in the pm_qos_request structure, depending on the
type. Updating a request does not reset its type.

The userspace sysfs interface does not support CPU/IRQ affinity.
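
A hedged sketch of the IRQ-tracking usage described above (field names as
defined in this patch; the IRQ number is a hypothetical placeholder):

#include <linux/pm_qos.h>

static struct pm_qos_request my_irq_qos;

static void my_driver_bind_qos_to_irq(unsigned int my_dev_irq)
{
	my_irq_qos.type = PM_QOS_REQ_AFFINE_IRQ;
	my_irq_qos.irq = my_dev_irq;

	/*
	 * pm_qos_add_request() copies the IRQ's current smp_affinity into
	 * cpus_affine and registers an affinity notifier, so the request
	 * follows the IRQ if it migrates to other cores.
	 */
	pm_qos_add_request(&my_irq_qos, PM_QOS_CPU_DMA_LATENCY, 50);
}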

Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
---
 Documentation/power/pm_qos_interface.txt |  18 ++++
 include/linux/pm_qos.h                   |  16 +++
 kernel/power/qos.c                       | 170 +++++++++++++++++++++++++++++++
 3 files changed, 204 insertions(+)

diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt
index a5da5c7..32b864d 100644
--- a/Documentation/power/pm_qos_interface.txt
+++ b/Documentation/power/pm_qos_interface.txt
@@ -41,6 +41,17 @@ registered notifiers are called only if the target value is now different.
 Clients of pm_qos need to save the returned handle for future use in other
 pm_qos API functions.
 
+The handle is a pm_qos_request object. By default the request object sets the
+request type to PM_QOS_REQ_ALL_CORES, in which case, the PM QoS request
+applies to all cores. However, the driver can also specify a request type to
+be either of
+        PM_QOS_REQ_ALL_CORES,
+        PM_QOS_REQ_AFFINE_CORES,
+        PM_QOS_REQ_AFFINE_IRQ,
+
+Specify the cpumask when type is set to PM_QOS_REQ_AFFINE_CORES and specify
+the IRQ number with PM_QOS_REQ_AFFINE_IRQ.
+
 void pm_qos_update_request(handle, new_target_value):
 Will update the list element pointed to by the handle with the new target value
 and recompute the new aggregated target, calling the notification tree if the
@@ -54,6 +65,13 @@ the request.
 int pm_qos_request(param_class):
 Returns the aggregated value for a given PM QoS class.
 
+int pm_qos_request_for_cpu(param_class, cpu):
+Returns the aggregated value for a given PM QoS class for the specified cpu.
+
+int pm_qos_request_for_cpumask(param_class, cpumask):
+Returns the aggregated value for a given PM QoS class for the specified
+cpumask.
+
 int pm_qos_request_active(handle):
 Returns if the request is still active, i.e. it has not been removed from a
 PM QoS class constraints list.
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index e1b763d..05d7c46 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -9,6 +9,8 @@
 #include <linux/miscdevice.h>
 #include <linux/device.h>
 #include <linux/workqueue.h>
+#include <linux/cpumask.h>
+#include <linux/interrupt.h>
 
 enum {
 	PM_QOS_RESERVED = 0,
@@ -40,7 +42,18 @@ enum pm_qos_flags_status {
 #define PM_QOS_FLAG_NO_POWER_OFF	(1 << 0)
 #define PM_QOS_FLAG_REMOTE_WAKEUP	(1 << 1)
 
+enum pm_qos_req_type {
+	PM_QOS_REQ_ALL_CORES = 0,
+	PM_QOS_REQ_AFFINE_CORES,
+	PM_QOS_REQ_AFFINE_IRQ,
+};
+
 struct pm_qos_request {
+	enum pm_qos_req_type type;
+	struct cpumask cpus_affine;
+	uint32_t irq;
+	/* Internal structure members */
+	struct irq_affinity_notify irq_notify;
 	struct plist_node node;
 	int pm_qos_class;
 	struct delayed_work work; /* for pm_qos_update_request_timeout */
@@ -80,6 +93,7 @@ enum pm_qos_type {
 struct pm_qos_constraints {
 	struct plist_head list;
 	s32 target_value;	/* Do not change to 64 bit */
+	s32 target_per_cpu[NR_CPUS];
 	s32 default_value;
 	s32 no_constraint_value;
 	enum pm_qos_type type;
@@ -127,6 +141,8 @@ void pm_qos_update_request_timeout(struct pm_qos_request *req,
 void pm_qos_remove_request(struct pm_qos_request *req);
 
 int pm_qos_request(int pm_qos_class);
+int pm_qos_request_for_cpu(int pm_qos_class, int cpu);
+int pm_qos_request_for_cpumask(int pm_qos_class, struct cpumask *mask);
 int pm_qos_add_notifier(int pm_qos_class, struct notifier_block *notifier);
 int pm_qos_remove_notifier(int pm_qos_class, struct notifier_block *notifier);
 int pm_qos_request_active(struct pm_qos_request *req);
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index d0b9c0f..92d8521 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -41,6 +41,9 @@
 #include <linux/platform_device.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
+#include <linux/delay.h>
 
 #include <linux/uaccess.h>
 #include <linux/export.h>
@@ -65,6 +68,8 @@ static BLOCKING_NOTIFIER_HEAD(cpu_dma_lat_notifier);
 static struct pm_qos_constraints cpu_dma_constraints = {
 	.list = PLIST_HEAD_INIT(cpu_dma_constraints.list),
 	.target_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
+	.target_per_cpu = { [0 ... (NR_CPUS - 1)] =
+				PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE },
 	.default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
 	.no_constraint_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
 	.type = PM_QOS_MIN,
@@ -79,6 +84,8 @@ static BLOCKING_NOTIFIER_HEAD(network_lat_notifier);
 static struct pm_qos_constraints network_lat_constraints = {
 	.list = PLIST_HEAD_INIT(network_lat_constraints.list),
 	.target_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
+	.target_per_cpu = { [0 ... (NR_CPUS - 1)] =
+				PM_QOS_NETWORK_LAT_DEFAULT_VALUE },
 	.default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
 	.no_constraint_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
 	.type = PM_QOS_MIN,
@@ -94,6 +101,8 @@ static BLOCKING_NOTIFIER_HEAD(network_throughput_notifier);
 static struct pm_qos_constraints network_tput_constraints = {
 	.list = PLIST_HEAD_INIT(network_tput_constraints.list),
 	.target_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
+	.target_per_cpu = { [0 ... (NR_CPUS - 1)] =
+				PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE },
 	.default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
 	.no_constraint_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
 	.type = PM_QOS_MAX,
@@ -157,6 +166,34 @@ static inline void pm_qos_set_value(struct pm_qos_constraints *c, s32 value)
 	c->target_value = value;
 }
 
+static inline void pm_qos_set_value_for_cpus(struct pm_qos_constraints *c)
+{
+	struct pm_qos_request *req = NULL;
+	int cpu;
+	s32 qos_val[NR_CPUS] = { [0 ... (NR_CPUS - 1)] = c->default_value };
+
+	plist_for_each_entry(req, &c->list, node) {
+		for_each_cpu(cpu, &req->cpus_affine) {
+			switch (c->type) {
+			case PM_QOS_MIN:
+				if (qos_val[cpu] > req->node.prio)
+					qos_val[cpu] = req->node.prio;
+				break;
+			case PM_QOS_MAX:
+				if (req->node.prio > qos_val[cpu])
+					qos_val[cpu] = req->node.prio;
+				break;
+			default:
+				BUG();
+				break;
+			}
+		}
+	}
+
+	for_each_possible_cpu(cpu)
+		c->target_per_cpu[cpu] = qos_val[cpu];
+}
+
 /**
  * pm_qos_update_target - manages the constraints list and calls the notifiers
  *  if needed
@@ -206,6 +243,7 @@ int pm_qos_update_target(struct pm_qos_constraints *c,
 
 	curr_value = pm_qos_get_value(c);
 	pm_qos_set_value(c, curr_value);
+	pm_qos_set_value_for_cpus(c);
 
 	spin_unlock_irqrestore(&pm_qos_lock, flags);
 
@@ -298,6 +336,44 @@ int pm_qos_request(int pm_qos_class)
 }
 EXPORT_SYMBOL_GPL(pm_qos_request);
 
+int pm_qos_request_for_cpu(int pm_qos_class, int cpu)
+{
+	return pm_qos_array[pm_qos_class]->constraints->target_per_cpu[cpu];
+}
+EXPORT_SYMBOL(pm_qos_request_for_cpu);
+
+int pm_qos_request_for_cpumask(int pm_qos_class, struct cpumask *mask)
+{
+	unsigned long irqflags;
+	int cpu;
+	struct pm_qos_constraints *c = NULL;
+	int val;
+
+	spin_lock_irqsave(&pm_qos_lock, irqflags);
+	c = pm_qos_array[pm_qos_class]->constraints;
+	val = c->default_value;
+
+	for_each_cpu(cpu, mask) {
+		switch (c->type) {
+		case PM_QOS_MIN:
+			if (c->target_per_cpu[cpu] < val)
+				val = c->target_per_cpu[cpu];
+			break;
+		case PM_QOS_MAX:
+			if (c->target_per_cpu[cpu] > val)
+				val = c->target_per_cpu[cpu];
+			break;
+		default:
+			BUG();
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&pm_qos_lock, irqflags);
+
+	return val;
+}
+EXPORT_SYMBOL(pm_qos_request_for_cpumask);
+
 int pm_qos_request_active(struct pm_qos_request *req)
 {
 	return req->pm_qos_class != 0;
@@ -330,6 +406,35 @@ static void pm_qos_work_fn(struct work_struct *work)
 	__pm_qos_update_request(req, PM_QOS_DEFAULT_VALUE);
 }
 
+static void pm_qos_irq_release(struct kref *ref)
+{
+	unsigned long flags;
+	struct irq_affinity_notify *notify = container_of(ref,
+			struct irq_affinity_notify, kref);
+	struct pm_qos_request *req = container_of(notify,
+			struct pm_qos_request, irq_notify);
+
+	spin_lock_irqsave(&pm_qos_lock, flags);
+	cpumask_clear(&req->cpus_affine);
+	spin_unlock_irqrestore(&pm_qos_lock, flags);
+}
+
+static void pm_qos_irq_notify(struct irq_affinity_notify *notify,
+		const cpumask_t *mask)
+{
+	unsigned long flags;
+	struct pm_qos_request *req = container_of(notify,
+			struct pm_qos_request, irq_notify);
+	struct pm_qos_constraints *c =
+		pm_qos_array[req->pm_qos_class]->constraints;
+
+	spin_lock_irqsave(&pm_qos_lock, flags);
+	cpumask_copy(&req->cpus_affine, mask);
+	spin_unlock_irqrestore(&pm_qos_lock, flags);
+
+	pm_qos_update_target(c, req, PM_QOS_UPDATE_REQ, req->node.prio);
+}
+
 /**
  * pm_qos_add_request - inserts new qos request into the list
  * @req: pointer to a preallocated handle
@@ -353,6 +458,51 @@ void pm_qos_add_request(struct pm_qos_request *req,
 		WARN(1, KERN_ERR "pm_qos_add_request() called for already added request\n");
 		return;
 	}
+
+	switch (req->type) {
+	case PM_QOS_REQ_AFFINE_CORES:
+		if (cpumask_empty(&req->cpus_affine)) {
+			req->type = PM_QOS_REQ_ALL_CORES;
+			cpumask_setall(&req->cpus_affine);
+			WARN(1, KERN_ERR "Affine cores not set for request with affinity flag\n");
+		}
+		break;
+
+	case PM_QOS_REQ_AFFINE_IRQ:
+		if (irq_can_set_affinity(req->irq)) {
+			int ret = 0;
+			struct irq_desc *desc = irq_to_desc(req->irq);
+			struct cpumask *mask = desc->irq_data.affinity;
+
+			/* Get the current affinity */
+			cpumask_copy(&req->cpus_affine, mask);
+			req->irq_notify.irq = req->irq;
+			req->irq_notify.notify = pm_qos_irq_notify;
+			req->irq_notify.release = pm_qos_irq_release;
+
+			ret = irq_set_affinity_notifier(req->irq,
+					&req->irq_notify);
+			if (ret) {
+				WARN(1, KERN_ERR "IRQ affinity notify set failed\n");
+				req->type = PM_QOS_REQ_ALL_CORES;
+				cpumask_setall(&req->cpus_affine);
+			}
+		} else {
+			req->type = PM_QOS_REQ_ALL_CORES;
+			cpumask_setall(&req->cpus_affine);
+			WARN(1, KERN_ERR "IRQ-%d not set for request with affinity flag\n",
+					req->irq);
+		}
+		break;
+
+	default:
+		WARN(1, KERN_ERR "Unknown request type %d\n", req->type);
+		/* fall through */
+	case PM_QOS_REQ_ALL_CORES:
+		cpumask_setall(&req->cpus_affine);
+		break;
+	}
+
 	req->pm_qos_class = pm_qos_class;
 	INIT_DELAYED_WORK(&req->work, pm_qos_work_fn);
 	trace_pm_qos_add_request(pm_qos_class, value);
@@ -426,10 +576,16 @@ void pm_qos_update_request_timeout(struct pm_qos_request *req, s32 new_value,
  */
 void pm_qos_remove_request(struct pm_qos_request *req)
 {
+	int count = 10;
+
 	if (!req) /*guard against callers passing in null */
 		return;
 		/* silent return to keep pcm code cleaner */
 
+	/* Remove ourselves from the irq notification */
+	if (req->type == PM_QOS_REQ_AFFINE_IRQ)
+		irq_release_affinity_notifier(&req->irq_notify);
+
 	if (!pm_qos_request_active(req)) {
 		WARN(1, KERN_ERR "pm_qos_remove_request() called for unknown object\n");
 		return;
@@ -441,6 +597,20 @@ void pm_qos_remove_request(struct pm_qos_request *req)
 	pm_qos_update_target(pm_qos_array[req->pm_qos_class]->constraints,
 			     req, PM_QOS_REMOVE_REQ,
 			     PM_QOS_DEFAULT_VALUE);
+
+	/**
+	 * The 'release' callback of the notifier will not be called while
+	 * there are still active users of the irq_notify object, i.e. while
+	 * the kref count is non-zero. This can happen if a 'notify' callback
+	 * is in progress when pm_qos_remove_request() is called. Wait until
+	 * the release callback clears the cpus_affine mask.
+	 */
+	if (req->type == PM_QOS_REQ_AFFINE_IRQ) {
+		while (!cpumask_empty(&req->cpus_affine) && count--)
+			msleep(50);
+		BUG_ON(!count);
+	}
+
 	memset(req, 0, sizeof(*req));
 }
 EXPORT_SYMBOL_GPL(pm_qos_remove_request);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS
  2014-07-25 16:55 [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Lina Iyer
                   ` (2 preceding siblings ...)
  2014-07-25 16:55 ` [RFC] [PATCH 3/3] QoS: Enhance framework to support cpu/irq specific QoS requests Lina Iyer
@ 2014-08-01 11:54 ` Daniel Lezcano
  3 siblings, 0 replies; 9+ messages in thread
From: Daniel Lezcano @ 2014-08-01 11:54 UTC (permalink / raw)
  To: Lina Iyer, linux-pm; +Cc: linus.walleij, arnd.bergmann, rjw, tglx

On 07/25/2014 06:55 PM, Lina Iyer wrote:
> This series of patches adds a new feature to allow per-cpu PM QoS.

Hi Lina,

this description lacks the reason why allowing per-cpu PM QoS is
needed. Could you give the context of the proposed feature?

> The first patch modifies the IRQ manager to allow multiple clients to register for
> IRQ SMP affinity change notifications. Today, only one client can register for
> notification per IRQ, and the PM QoS framework is now also interested in knowing
> when the SMP affinity of an IRQ changes.

Why ?

> With the current implementation, a
> second registration for the change notification releases the existing
> notification callback and registers the new one. Modify the notification
> mechanism to use a list of notifiers instead of a single data structure.

Maybe this is not necessary if the notification is always needed. Even
if that breaks the code encapsulation, directly invoking a pm_qos
function when the irq affinity is changed, in addition to the notification,
should suffice instead of adding a list of notifiers with all the
races that implies.

> Also, a client that wants to de-register from the notification now needs to
> call a separate API instead of the overloaded registration call with a NULL
> argument.
>
> The next two patches reorganize the PM QoS framework to allow the PM QoS and Dev
> PM QoS frameworks to specify a request type. Most requestors of PM QoS do not
> know or care about the CPU(s) on which the QoS needs to take effect, and in many
> cases it is still desirable to have the QoS apply to all available CPUs. However,
> a driver working in conjunction with an IRQ balancer, or one that has a specific
> CPU requirement, can specify a QoS request for just that set of CPUs. For
> example, a driver that needs a performance QoS for a certain IRQ, but does not
> want to affect the general power consumption of all the CPUs in the system, can
> issue a QoS request that is affine only to the CPU(s) on which that IRQ can be
> triggered.
>
> The change adds the ability to specify a PM QoS request type and introduces two
> new request types in addition to the default, which applies to all CPUs.
>
> PM_QOS_REQ_AFFINE_CORES: This allows drivers to specify the set of CPUs to which
> the request should apply.
>
> PM_QOS_REQ_AFFINE_IRQ: This allows drivers to specify an IRQ whose affinity the
> QoS request should track. The IRQ SMP affinity notification is used internally to
> set the cpumask of the affected CPUs.
>
> The request type defaults to PM_QOS_REQ_ALL_CORES when not explicitly specified,
> which applies the request to all CPUs.
>
> There is also a provision to read the aggregated QoS value for a constraint,
> either for a set of CPUs or for a single CPU. CPUidle governors can use this
> feature to get the QoS value for the CPU(s) they are interested in.

IMHO, it would be desirable to first implement per-cpu pm_qos without
adding any extra new requests or features, so the review can focus
on how to do that.

> Lina Iyer (3):
>    irq: Allow multiple clients to register for irq affinity notification
>    QoS: Modify data structures and function arguments for scalability.
>    QoS: Enhance framework to support cpu/irq specific QoS requests
>
>   Documentation/power/pm_qos_interface.txt |  18 +++
>   drivers/base/power/qos.c                 |  14 +--
>   drivers/infiniband/hw/qib/qib_iba7322.c  |   4 +-
>   include/linux/interrupt.h                |  12 +-
>   include/linux/irq.h                      |   1 +
>   include/linux/irqdesc.h                  |   6 +-
>   include/linux/pm_qos.h                   |  23 +++-
>   kernel/irq/irqdesc.c                     |   1 +
>   kernel/irq/manage.c                      |  77 ++++++++-----
>   kernel/power/qos.c                       | 184 ++++++++++++++++++++++++++++++-
>   lib/cpu_rmap.c                           |   2 +-
>   11 files changed, 289 insertions(+), 53 deletions(-)
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification
  2014-07-25 16:55 ` [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification Lina Iyer
@ 2014-08-01 14:48   ` Daniel Lezcano
  2014-08-01 15:26     ` Lina Iyer
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Lezcano @ 2014-08-01 14:48 UTC (permalink / raw)
  To: Lina Iyer, linux-pm; +Cc: linus.walleij, arnd.bergmann, rjw, tglx

On 07/25/2014 06:55 PM, Lina Iyer wrote:
> The current implementation allows only a single notification callback
> whenever the IRQ's SMP affinity is changed. Adding a second notification
> punts the existing notifier function out of registration.
>
> Add a list of notifiers, allowing multiple clients to register for irq
> affinity notifications.

As commented before, you should explain why this is needed.

Perhaps it will be easier to introduce if the order of the patchset is inverted:

1. per cpu pm_qos (code changes to prepare the new features but with the 
same functionalities)

2. irq affinity change notification

3. new pm_qos requests

I am not a big fan of using lists; isn't there another solution?

> Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> ---
>   drivers/infiniband/hw/qib/qib_iba7322.c |  4 +-
>   include/linux/interrupt.h               | 12 ++++-
>   include/linux/irq.h                     |  1 +
>   include/linux/irqdesc.h                 |  6 ++-
>   kernel/irq/irqdesc.c                    |  1 +
>   kernel/irq/manage.c                     | 77 ++++++++++++++++++++-------------
>   lib/cpu_rmap.c                          |  2 +-
>   7 files changed, 66 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
> index a7eb325..62cb77d 100644
> --- a/drivers/infiniband/hw/qib/qib_iba7322.c
> +++ b/drivers/infiniband/hw/qib/qib_iba7322.c
> @@ -3345,9 +3345,7 @@ static void reset_dca_notifier(struct qib_devdata *dd, struct qib_msix_entry *m)
>   		"Disabling notifier on HCA %d irq %d\n",
>   		dd->unit,
>   		m->msix.vector);
> -	irq_set_affinity_notifier(
> -		m->msix.vector,
> -		NULL);
> +	irq_release_affinity_notifier(m->notifier);
>   	m->notifier = NULL;
>   }
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 698ad05..c1e227c 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -203,7 +203,7 @@ static inline int check_wakeup_irqs(void) { return 0; }
>    * struct irq_affinity_notify - context for notification of IRQ affinity changes
>    * @irq:		Interrupt to which notification applies
>    * @kref:		Reference count, for internal use
> - * @work:		Work item, for internal use
> + * @list:		Add to the notifier list, for internal use
>    * @notify:		Function to be called on change.  This will be
>    *			called in process context.
>    * @release:		Function to be called on release.  This will be
> @@ -214,7 +214,7 @@ static inline int check_wakeup_irqs(void) { return 0; }
>   struct irq_affinity_notify {
>   	unsigned int irq;
>   	struct kref kref;
> -	struct work_struct work;
> +	struct list_head list;
>   	void (*notify)(struct irq_affinity_notify *, const cpumask_t *mask);
>   	void (*release)(struct kref *ref);
>   };
> @@ -265,6 +265,8 @@ extern int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m);
>   extern int
>   irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
>
> +extern int
> +irq_release_affinity_notifier(struct irq_affinity_notify *notify);
>   #else /* CONFIG_SMP */
>
>   static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
> @@ -295,6 +297,12 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
>   {
>   	return 0;
>   }
> +
> +static inline int
> +irq_release_affinity_notifier(struct irq_affinity_notify *notify)
> +{
> +	return 0;
> +}
>   #endif /* CONFIG_SMP */
>
>   /*
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index 62af592..2634a48 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -20,6 +20,7 @@
>   #include <linux/errno.h>
>   #include <linux/topology.h>
>   #include <linux/wait.h>
> +#include <linux/list.h>
>
>   #include <asm/irq.h>
>   #include <asm/ptrace.h>
> diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
> index 472c021..db3509e 100644
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -31,7 +31,8 @@ struct irq_desc;
>    * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
>    * @lock:		locking for SMP
>    * @affinity_hint:	hint to user space for preferred irq affinity
> - * @affinity_notify:	context for notification of affinity changes
> + * @affinity_notify:	list of notification clients for affinity changes
> + * @affinity_work:	Work queue for handling affinity change notifications
>    * @pending_mask:	pending rebalanced interrupts
>    * @threads_oneshot:	bitfield to handle shared oneshot threads
>    * @threads_active:	number of irqaction threads currently running
> @@ -60,7 +61,8 @@ struct irq_desc {
>   	struct cpumask		*percpu_enabled;
>   #ifdef CONFIG_SMP
>   	const struct cpumask	*affinity_hint;
> -	struct irq_affinity_notify *affinity_notify;
> +	struct list_head	affinity_notify;
> +	struct work_struct	affinity_work;
>   #ifdef CONFIG_GENERIC_PENDING_IRQ
>   	cpumask_var_t		pending_mask;
>   #endif
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 1487a12..c95e1f3 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -91,6 +91,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
>   	for_each_possible_cpu(cpu)
>   		*per_cpu_ptr(desc->kstat_irqs, cpu) = 0;
>   	desc_smp_init(desc, node);
> +	INIT_LIST_HEAD(&desc->affinity_notify);
>   }
>
>   int nr_irqs = NR_IRQS;
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 88657d7..cd7fc48 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -209,10 +209,9 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
>   		irq_copy_pending(desc, mask);
>   	}
>
> -	if (desc->affinity_notify) {
> -		kref_get(&desc->affinity_notify->kref);
> -		schedule_work(&desc->affinity_notify->work);
> -	}

Could you explain the kref change ?

> +	if (!list_empty(&desc->affinity_notify))
> +		schedule_work(&desc->affinity_work);
> +
>   	irqd_set(data, IRQD_AFFINITY_SET);
>
>   	return ret;
> @@ -248,14 +247,14 @@ EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
>
>   static void irq_affinity_notify(struct work_struct *work)
>   {
> -	struct irq_affinity_notify *notify =
> -		container_of(work, struct irq_affinity_notify, work);
> -	struct irq_desc *desc = irq_to_desc(notify->irq);
> +	struct irq_desc *desc =
> +			container_of(work, struct irq_desc, affinity_work);

Could you put cleanups in separate patches ? This is not directly 
related to the change of this patch.

>   	cpumask_var_t cpumask;
>   	unsigned long flags;
> +	struct irq_affinity_notify *notify;
>
>   	if (!desc || !alloc_cpumask_var(&cpumask, GFP_KERNEL))
> -		goto out;
> +		return;
>
>   	raw_spin_lock_irqsave(&desc->lock, flags);
>   	if (irq_move_pending(&desc->irq_data))
> @@ -264,11 +263,20 @@ static void irq_affinity_notify(struct work_struct *work)
>   		cpumask_copy(cpumask, desc->irq_data.affinity);
>   	raw_spin_unlock_irqrestore(&desc->lock, flags);
>
> -	notify->notify(notify, cpumask);
> +	list_for_each_entry(notify, &desc->affinity_notify, list) {
> +		/**
> +		 * Check and get the kref only if the kref has not been
> +		 * released by now. It's possible that the reference count
> +		 * is already 0; we don't want to notify clients that have
> +		 * already been released.
> +		 */
> +		if (!kref_get_unless_zero(&notify->kref))
> +			continue;
> +		notify->notify(notify, cpumask);
> +		kref_put(&notify->kref, notify->release);
> +	}
>
>   	free_cpumask_var(cpumask);
> -out:
> -	kref_put(&notify->kref, notify->release);
>   }
>
>   /**
> @@ -286,8 +294,6 @@ int
>   irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
>   {
>   	struct irq_desc *desc = irq_to_desc(irq);
> -	struct irq_affinity_notify *old_notify;
> -	unsigned long flags;
>
>   	/* The release function is promised process context */
>   	might_sleep();
> @@ -295,25 +301,37 @@ irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify)
>   	if (!desc)
>   		return -EINVAL;
>
> -	/* Complete initialisation of *notify */
> -	if (notify) {
> -		notify->irq = irq;
> -		kref_init(&notify->kref);
> -		INIT_WORK(&notify->work, irq_affinity_notify);
> +	if (!notify) {
> +		WARN("%s called with NULL notifier - use irq_release_affinity_notifier function instead.\n",
> +				__func__);
> +		return -EINVAL;
>   	}
>
> -	raw_spin_lock_irqsave(&desc->lock, flags);
> -	old_notify = desc->affinity_notify;
> -	desc->affinity_notify = notify;
> -	raw_spin_unlock_irqrestore(&desc->lock, flags);
> -
> -	if (old_notify)
> -		kref_put(&old_notify->kref, old_notify->release);
> +	notify->irq = irq;
> +	kref_init(&notify->kref);
> +	INIT_LIST_HEAD(&notify->list);
> +	list_add(&desc->affinity_notify, &notify->list);

Is this race free ? It is possible there are two callers of the 
irq_set_affinity_notifier or 
irq_set_affinity_notifier/irq_release_affinity_notifier, no ?

>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(irq_set_affinity_notifier);
>
> +/**
> + *	irq_release_affinity_notifier - Remove us from notifications
> + *	@notify: Context for notification
> + */
> +int irq_release_affinity_notifier(struct irq_affinity_notify *notify)
> +{
> +	if (!notify)
> +		return -EINVAL;
> +
> +	list_del(&notify->list);
> +	kref_put(&notify->kref, notify->release);

Same comment as above.

> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(irq_release_affinity_notifier);
> +
>   #ifndef CONFIG_AUTO_IRQ_AFFINITY
>   /*
>    * Generic version of the affinity autoselector.
> @@ -348,6 +366,8 @@ setup_affinity(unsigned int irq, struct irq_desc *desc, struct cpumask *mask)
>   		if (cpumask_intersects(mask, nodemask))
>   			cpumask_and(mask, mask, nodemask);
>   	}
> +	INIT_LIST_HEAD(&desc->affinity_notify);
> +	INIT_WORK(&desc->affinity_work, irq_affinity_notify);
>   	irq_do_set_affinity(&desc->irq_data, mask, false);
>   	return 0;
>   }
> @@ -1414,14 +1434,13 @@ EXPORT_SYMBOL_GPL(remove_irq);
>   void free_irq(unsigned int irq, void *dev_id)
>   {
>   	struct irq_desc *desc = irq_to_desc(irq);
> +	struct irq_affinity_notify *notify;
>
>   	if (!desc || WARN_ON(irq_settings_is_per_cpu_devid(desc)))
>   		return;
>
> -#ifdef CONFIG_SMP
> -	if (WARN_ON(desc->affinity_notify))
> -		desc->affinity_notify = NULL;
> -#endif
> +	list_for_each_entry(notify, &desc->affinity_notify, list)
> +		kref_put(&notify->kref, notify->release);

Same comment as above and probably you should keep the warning.

>   	chip_bus_lock(desc);
>   	kfree(__free_irq(irq, dev_id));
> diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
> index 4f134d8..0c8da50 100644
> --- a/lib/cpu_rmap.c
> +++ b/lib/cpu_rmap.c
> @@ -235,7 +235,7 @@ void free_irq_cpu_rmap(struct cpu_rmap *rmap)
>
>   	for (index = 0; index < rmap->used; index++) {
>   		glue = rmap->obj[index];
> -		irq_set_affinity_notifier(glue->notify.irq, NULL);
> +		irq_release_affinity_notifier(&glue->notify);
>   	}
>
>   	cpu_rmap_put(rmap);
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification
  2014-08-01 14:48   ` Daniel Lezcano
@ 2014-08-01 15:26     ` Lina Iyer
  0 siblings, 0 replies; 9+ messages in thread
From: Lina Iyer @ 2014-08-01 15:26 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: linus.walleij, arnd.bergmann, rjw, tglx, linux-pm

On Fri Aug  1 08:48:13 2014, Daniel Lezcano wrote:
> As commented before, you should explain why this is needed.
>
> Perhaps it will be easier to introduce if the order of the patchset is inverted:
>
> 1. per cpu pm_qos (code changes to prepare the new features but with
> the same functionalities)
>
> 2. irq affinity change notification
>
> 3. new pm_qos requests
>
> I am not a big fan of using lists; isn't there another solution?
I will reorganize the patches. Even in the absence of the PM QoS
changes, the IRQ affinity notifier changes make sense, just to break
the assumption that there can only be one notifier callback.

I will see whether another framework suits this better than a list. It is
very easy to reuse existing data structures with lists.


>> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
>> index 88657d7..cd7fc48 100644
>> --- a/kernel/irq/manage.c
>> +++ b/kernel/irq/manage.c
>> @@ -209,10 +209,9 @@ int irq_set_affinity_locked(struct irq_data
>> *data, const struct cpumask *mask,
>>           irq_copy_pending(desc, mask);
>>       }
>>
>> -    if (desc->affinity_notify) {
>> -        kref_get(&desc->affinity_notify->kref);
>> -        schedule_work(&desc->affinity_notify->work);
>> -    }
>
> Could you explain the kref change ?
Since there was only one possible notification object, the notifier
object had a kref that helped release it if another object was
registered or the IRQ desc went away. The kref also covered the case
where a work item was scheduled for notification while the driver tried
to de-register the notification.
I have not changed the place of the kref.
But since multiple notifications can now be sent out, doing a
kref_get() before scheduling the work does not make much sense. Which
object would we use for the kref_get()? Hence kref_get_unless_zero(),
which ensures that I notify only the drivers that still have a
registered (kref count > 0) notifier callback.

>
>> +    if (!list_empty(&desc->affinity_notify))
>> +        schedule_work(&desc->affinity_work);
>> +
>>       irqd_set(data, IRQD_AFFINITY_SET);
>>
>>       return ret;
>> @@ -248,14 +247,14 @@ EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
>>
>>   static void irq_affinity_notify(struct work_struct *work)
>>   {
>> -    struct irq_affinity_notify *notify =
>> -        container_of(work, struct irq_affinity_notify, work);
>> -    struct irq_desc *desc = irq_to_desc(notify->irq);
>> +    struct irq_desc *desc =
>> +            container_of(work, struct irq_desc, affinity_work);
>
> Could you put cleanups in separate patches ? This is not directly
> related to the change of this patch.
It is directly related to this patch. The current code uses a single
irq_affinity_notify object, so it can get the notify object from the
work item with container_of() and thereby the irq_desc from the IRQ
number. I had to use another way to get to the same irq_desc object.


>> -    if (old_notify)
>> -        kref_put(&old_notify->kref, old_notify->release);
>> +    notify->irq = irq;
>> +    kref_init(&notify->kref);
>> +    INIT_LIST_HEAD(&notify->list);
>> +    list_add(&desc->affinity_notify, &notify->list);
>
> Is this race free ? It is possible there are two callers of the
> irq_set_affinity_notifier or
> irq_set_affinity_notifier/irq_release_affinity_notifier, no ?
Hmm... You are right. May need a lock around the list.


>> +    list_del(&notify->list);
>> +    kref_put(&notify->kref, notify->release);
>
> Same comment as above.
Lock may be needed.

>> -#endif
>> +    list_for_each_entry(notify, &desc->affinity_notify, list)
>> +        kref_put(&notify->kref, notify->release);
>
> Same comment as above and probably you should keep the warning.
Okay.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 3/3] QoS: Enhance framework to support cpu/irq specific QoS requests
  2014-07-25 16:55 ` [RFC] [PATCH 3/3] QoS: Enhance framework to support cpu/irq specific QoS requests Lina Iyer
@ 2014-08-01 15:58   ` Daniel Lezcano
  2014-08-01 17:11     ` Lina Iyer
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Lezcano @ 2014-08-01 15:58 UTC (permalink / raw)
  To: Lina Iyer, linux-pm
  Cc: linus.walleij, arnd.bergmann, rjw, tglx, Praveen Chidambaram

On 07/25/2014 06:55 PM, Lina Iyer wrote:
> A QoS request for CPU_DMA_LATENCY can be better optimized if the request
> can be set only for the required CPUs and not all CPUs. This helps save
> power on the other cores while still guaranteeing the quality of service
> where it is needed.
>
> Enhance the QoS constraints data structures to support a target value for
> each core. Requests can specify whether the QoS applies to all cores (the
> default), to a selected subset of the cores, or to the core(s) that a given
> IRQ is affine to.
>
> QoS requests that need to track an IRQ can be set to apply only to the
> cpus to which the IRQ's smp_affinity attribute is set. The QoS
> framework will automatically track IRQ migration between the cores; the
> QoS is then applied only to the core(s) that the IRQ has been
> migrated to.
>
> Idle and other interested drivers can request a PM QoS value for a
> constraint across all cpus, for a specific cpu, or for a set of cpus.
> Separate APIs have been added to query the value for an individual cpu
> or a cpumask.  The default behaviour of PM QoS is maintained, i.e.,
> requests that do not specify a request type continue to apply to all
> cores.  Requests that want to specify an affinity to cpu(s) or an irq
> can set the type in the PM QoS request data structure and provide
> either the mask of the cpus or the IRQ number, depending on the type.
> Updating the request does not reset its type.
>
> The userspace sysfs interface does not support CPU/IRQ affinity.
>
> Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
> Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> ---
>   Documentation/power/pm_qos_interface.txt |  18 ++++
>   include/linux/pm_qos.h                   |  16 +++
>   kernel/power/qos.c                       | 170 +++++++++++++++++++++++++++++++
>   3 files changed, 204 insertions(+)
>
> diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt
> index a5da5c7..32b864d 100644
> --- a/Documentation/power/pm_qos_interface.txt
> +++ b/Documentation/power/pm_qos_interface.txt
> @@ -41,6 +41,17 @@ registered notifiers are called only if the target value is now different.
>   Clients of pm_qos need to save the returned handle for future use in other
>   pm_qos API functions.
>
> +The handle is a pm_qos_request object. By default the request object sets the
> +request type to PM_QOS_REQ_ALL_CORES, in which case, the PM QoS request
> +applies to all cores. However, the driver can also specify a request type to
> +be either of
> +        PM_QOS_REQ_ALL_CORES,
> +        PM_QOS_REQ_AFFINE_CORES,
> +        PM_QOS_REQ_AFFINE_IRQ,
> +
> +Specify the cpumask when type is set to PM_QOS_REQ_AFFINE_CORES and specify
> +the IRQ number with PM_QOS_REQ_AFFINE_IRQ.
> +
>   void pm_qos_update_request(handle, new_target_value):
>   Will update the list element pointed to by the handle with the new target value
>   and recompute the new aggregated target, calling the notification tree if the
> @@ -54,6 +65,13 @@ the request.
>   int pm_qos_request(param_class):
>   Returns the aggregated value for a given PM QoS class.
>
> +int pm_qos_request_for_cpu(param_class, cpu):
> +Returns the aggregated value for a given PM QoS class for the specified cpu.
> +
> +int pm_qos_request_for_cpumask(param_class, cpumask):
> +Returns the aggregated value for a given PM QoS class for the specified
> +cpumask.
> +

You can get rid of 'pm_qos_request_for_cpu' because it is just as easy 
to call 'pm_qos_request_for_cpumask' with cpumask_of(cpu).

IMO, you should split this patch into several. First bring in the per-cpu 
support without adding new APIs, and then bring in the different changes 
one by one.

>   int pm_qos_request_active(handle):
>   Returns if the request is still active, i.e. it has not been removed from a
>   PM QoS class constraints list.
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index e1b763d..05d7c46 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -9,6 +9,8 @@
>   #include <linux/miscdevice.h>
>   #include <linux/device.h>
>   #include <linux/workqueue.h>
> +#include <linux/cpumask.h>
> +#include <linux/interrupt.h>
>
>   enum {
>   	PM_QOS_RESERVED = 0,
> @@ -40,7 +42,18 @@ enum pm_qos_flags_status {
>   #define PM_QOS_FLAG_NO_POWER_OFF	(1 << 0)
>   #define PM_QOS_FLAG_REMOTE_WAKEUP	(1 << 1)
>
> +enum pm_qos_req_type {
> +	PM_QOS_REQ_ALL_CORES = 0,
> +	PM_QOS_REQ_AFFINE_CORES,
> +	PM_QOS_REQ_AFFINE_IRQ,
> +};
> +
>   struct pm_qos_request {
> +	enum pm_qos_req_type type;
> +	struct cpumask cpus_affine;
> +	uint32_t irq;
> +	/* Internal structure members */
> +	struct irq_affinity_notify irq_notify;
>   	struct plist_node node;
>   	int pm_qos_class;
>   	struct delayed_work work; /* for pm_qos_update_request_timeout */
> @@ -80,6 +93,7 @@ enum pm_qos_type {
>   struct pm_qos_constraints {
>   	struct plist_head list;
>   	s32 target_value;	/* Do not change to 64 bit */
> +	s32 target_per_cpu[NR_CPUS];
>   	s32 default_value;
>   	s32 no_constraint_value;
>   	enum pm_qos_type type;
> @@ -127,6 +141,8 @@ void pm_qos_update_request_timeout(struct pm_qos_request *req,
>   void pm_qos_remove_request(struct pm_qos_request *req);
>
>   int pm_qos_request(int pm_qos_class);
> +int pm_qos_request_for_cpu(int pm_qos_class, int cpu);
> +int pm_qos_request_for_cpumask(int pm_qos_class, struct cpumask *mask);
>   int pm_qos_add_notifier(int pm_qos_class, struct notifier_block *notifier);
>   int pm_qos_remove_notifier(int pm_qos_class, struct notifier_block *notifier);
>   int pm_qos_request_active(struct pm_qos_request *req);
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index d0b9c0f..92d8521 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -41,6 +41,9 @@
>   #include <linux/platform_device.h>
>   #include <linux/init.h>
>   #include <linux/kernel.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
> +#include <linux/delay.h>
>
>   #include <linux/uaccess.h>
>   #include <linux/export.h>
> @@ -65,6 +68,8 @@ static BLOCKING_NOTIFIER_HEAD(cpu_dma_lat_notifier);
>   static struct pm_qos_constraints cpu_dma_constraints = {
>   	.list = PLIST_HEAD_INIT(cpu_dma_constraints.list),
>   	.target_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> +	.target_per_cpu = { [0 ... (NR_CPUS - 1)] =
> +				PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE },
>   	.default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
>   	.no_constraint_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
>   	.type = PM_QOS_MIN,
> @@ -79,6 +84,8 @@ static BLOCKING_NOTIFIER_HEAD(network_lat_notifier);
>   static struct pm_qos_constraints network_lat_constraints = {
>   	.list = PLIST_HEAD_INIT(network_lat_constraints.list),
>   	.target_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> +	.target_per_cpu = { [0 ... (NR_CPUS - 1)] =
> +				PM_QOS_NETWORK_LAT_DEFAULT_VALUE },
>   	.default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
>   	.no_constraint_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
>   	.type = PM_QOS_MIN,
> @@ -94,6 +101,8 @@ static BLOCKING_NOTIFIER_HEAD(network_throughput_notifier);
>   static struct pm_qos_constraints network_tput_constraints = {
>   	.list = PLIST_HEAD_INIT(network_tput_constraints.list),
>   	.target_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> +	.target_per_cpu = { [0 ... (NR_CPUS - 1)] =
> +				PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE },
>   	.default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
>   	.no_constraint_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
>   	.type = PM_QOS_MAX,
> @@ -157,6 +166,34 @@ static inline void pm_qos_set_value(struct pm_qos_constraints *c, s32 value)
>   	c->target_value = value;
>   }

Wouldn't it make sense to create a per_cpu variable for the pm qos 
constraints instead of an array indexed by cpu?

> +static inline void pm_qos_set_value_for_cpus(struct pm_qos_constraints *c)
> +{
> +	struct pm_qos_request *req = NULL;
> +	int cpu;
> +	s32 qos_val[NR_CPUS] = { [0 ... (NR_CPUS - 1)] = c->default_value };

The kernel stack size is small, IIRC, 2xPAGE_SIZE. NR_CPUS could be big 
(1024). That may lead to a stack overflow on some systems.

> +
> +	plist_for_each_entry(req, &c->list, node) {
> +		for_each_cpu(cpu, &req->cpus_affine) {
> +			switch (c->type) {
> +			case PM_QOS_MIN:
> +				if (qos_val[cpu] > req->node.prio)
> +					qos_val[cpu] = req->node.prio;
> +				break;
> +			case PM_QOS_MAX:
> +				if (req->node.prio > qos_val[cpu])
> +					qos_val[cpu] = req->node.prio;
> +				break;
> +			default:
> +				BUG();
> +				break;
> +			}
> +		}
> +	}
> +
> +	for_each_possible_cpu(cpu)
> +		c->target_per_cpu[cpu] = qos_val[cpu];
> +}
> +
>   /**
>    * pm_qos_update_target - manages the constraints list and calls the notifiers
>    *  if needed
> @@ -206,6 +243,7 @@ int pm_qos_update_target(struct pm_qos_constraints *c,
>
>   	curr_value = pm_qos_get_value(c);
>   	pm_qos_set_value(c, curr_value);
> +	pm_qos_set_value_for_cpus(c);
>
>   	spin_unlock_irqrestore(&pm_qos_lock, flags);
>
> @@ -298,6 +336,44 @@ int pm_qos_request(int pm_qos_class)
>   }
>   EXPORT_SYMBOL_GPL(pm_qos_request);
>
> +int pm_qos_request_for_cpu(int pm_qos_class, int cpu)
> +{
> +	return pm_qos_array[pm_qos_class]->constraints->target_per_cpu[cpu];
> +}
> +EXPORT_SYMBOL(pm_qos_request_for_cpu);
> +
> +int pm_qos_request_for_cpumask(int pm_qos_class, struct cpumask *mask)
> +{
> +	unsigned long irqflags;
> +	int cpu;
> +	struct pm_qos_constraints *c = NULL;
> +	int val;
> +
> +	spin_lock_irqsave(&pm_qos_lock, irqflags);
> +	c = pm_qos_array[pm_qos_class]->constraints;
> +	val = c->default_value;
> +
> +	for_each_cpu(cpu, mask) {
> +		switch (c->type) {
> +		case PM_QOS_MIN:
> +			if (c->target_per_cpu[cpu] < val)
> +				val = c->target_per_cpu[cpu];
> +			break;
> +		case PM_QOS_MAX:
> +			if (c->target_per_cpu[cpu] > val)
> +				val = c->target_per_cpu[cpu];
> +			break;
> +		default:
> +			BUG();
> +			break;
> +		}
> +	}
> +	spin_unlock_irqrestore(&pm_qos_lock, irqflags);
> +
> +	return val;
> +}
> +EXPORT_SYMBOL(pm_qos_request_for_cpumask);
> +
>   int pm_qos_request_active(struct pm_qos_request *req)
>   {
>   	return req->pm_qos_class != 0;
> @@ -330,6 +406,35 @@ static void pm_qos_work_fn(struct work_struct *work)
>   	__pm_qos_update_request(req, PM_QOS_DEFAULT_VALUE);
>   }
>
> +static void pm_qos_irq_release(struct kref *ref)
> +{
> +	unsigned long flags;
> +	struct irq_affinity_notify *notify = container_of(ref,
> +			struct irq_affinity_notify, kref);
> +	struct pm_qos_request *req = container_of(notify,
> +			struct pm_qos_request, irq_notify);
> +
> +	spin_lock_irqsave(&pm_qos_lock, flags);
> +	cpumask_clear(&req->cpus_affine);
> +	spin_unlock_irqrestore(&pm_qos_lock, flags);
> +}
> +
> +static void pm_qos_irq_notify(struct irq_affinity_notify *notify,
> +		const cpumask_t *mask)
> +{
> +	unsigned long flags;
> +	struct pm_qos_request *req = container_of(notify,
> +			struct pm_qos_request, irq_notify);
> +	struct pm_qos_constraints *c =
> +		pm_qos_array[req->pm_qos_class]->constraints;
> +
> +	spin_lock_irqsave(&pm_qos_lock, flags);
> +	cpumask_copy(&req->cpus_affine, mask);
> +	spin_unlock_irqrestore(&pm_qos_lock, flags);
> +
> +	pm_qos_update_target(c, req, PM_QOS_UPDATE_REQ, req->node.prio);
> +}
> +
>   /**
>    * pm_qos_add_request - inserts new qos request into the list
>    * @req: pointer to a preallocated handle
> @@ -353,6 +458,51 @@ void pm_qos_add_request(struct pm_qos_request *req,
>   		WARN(1, KERN_ERR "pm_qos_add_request() called for already added request\n");
>   		return;
>   	}
> +
> +	switch (req->type) {
> +	case PM_QOS_REQ_AFFINE_CORES:
> +		if (cpumask_empty(&req->cpus_affine)) {
> +			req->type = PM_QOS_REQ_ALL_CORES;
> +			cpumask_setall(&req->cpus_affine);
> +			WARN(1, KERN_ERR "Affine cores not set for request with affinity flag\n");
> +		}
> +		break;
> +
> +	case PM_QOS_REQ_AFFINE_IRQ:
> +		if (irq_can_set_affinity(req->irq)) {
> +			int ret = 0;
> +			struct irq_desc *desc = irq_to_desc(req->irq);
> +			struct cpumask *mask = desc->irq_data.affinity;
> +
> +			/* Get the current affinity */
> +			cpumask_copy(&req->cpus_affine, mask);
> +			req->irq_notify.irq = req->irq;
> +			req->irq_notify.notify = pm_qos_irq_notify;
> +			req->irq_notify.release = pm_qos_irq_release;
> +
> +			ret = irq_set_affinity_notifier(req->irq,
> +					&req->irq_notify);
> +			if (ret) {
> +				WARN(1, KERN_ERR "IRQ affinity notify set failed\n");
> +				req->type = PM_QOS_REQ_ALL_CORES;
> +				cpumask_setall(&req->cpus_affine);
> +			}
> +		} else {
> +			req->type = PM_QOS_REQ_ALL_CORES;
> +			cpumask_setall(&req->cpus_affine);
> +			WARN(1, KERN_ERR "IRQ-%d not set for request with affinity flag\n",
> +					req->irq);
> +		}
> +		break;
> +
> +	default:
> +		WARN(1, KERN_ERR "Unknown request type %d\n", req->type);
> +		/* fall through */
> +	case PM_QOS_REQ_ALL_CORES:
> +		cpumask_setall(&req->cpus_affine);
> +		break;
> +	}
> +
>   	req->pm_qos_class = pm_qos_class;
>   	INIT_DELAYED_WORK(&req->work, pm_qos_work_fn);
>   	trace_pm_qos_add_request(pm_qos_class, value);
> @@ -426,10 +576,16 @@ void pm_qos_update_request_timeout(struct pm_qos_request *req, s32 new_value,
>    */
>   void pm_qos_remove_request(struct pm_qos_request *req)
>   {
> +	int count = 10;
> +
>   	if (!req) /*guard against callers passing in null */
>   		return;
>   		/* silent return to keep pcm code cleaner */
>
> +	/* Remove ourselves from the irq notification */
> +	if (req->type == PM_QOS_REQ_AFFINE_IRQ)
> +		irq_release_affinity_notifier(&req->irq_notify);
> +
>   	if (!pm_qos_request_active(req)) {
>   		WARN(1, KERN_ERR "pm_qos_remove_request() called for unknown object\n");
>   		return;
> @@ -441,6 +597,20 @@ void pm_qos_remove_request(struct pm_qos_request *req)
>   	pm_qos_update_target(pm_qos_array[req->pm_qos_class]->constraints,
>   			     req, PM_QOS_REMOVE_REQ,
>   			     PM_QOS_DEFAULT_VALUE);
> +
> +	/**
> +	 * The 'release' callback of the notifier would not be called unless
> +	 * there are no active users of the irq_notify object, i.e, kref count
> +	 * is non-zero. This could happen if there is an active 'notify'
> +	 * callback happening while the pm_qos_remove request is called. Wait
> +	 * until the release callback clears the cpus_affine mask.
> +	 */
> +	if (req->type == PM_QOS_REQ_AFFINE_IRQ) {
> +		while (!cpumask_empty(&req->cpus_affine) && count--)
> +			msleep(50);
> +		BUG_ON(!count);
> +	}

You shouldn't use this approach but a locking mechanism.

>   	memset(req, 0, sizeof(*req));
>   }
>   EXPORT_SYMBOL_GPL(pm_qos_remove_request);
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 3/3] QoS: Enhance framework to support cpu/irq specific QoS requests
  2014-08-01 15:58   ` Daniel Lezcano
@ 2014-08-01 17:11     ` Lina Iyer
  0 siblings, 0 replies; 9+ messages in thread
From: Lina Iyer @ 2014-08-01 17:11 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Lina Iyer, linux-pm, linus.walleij, arnd.bergmann, rjw, tglx,
	Praveen Chidambaram

On Fri, 1 Aug 2014, Daniel Lezcano wrote:

> On 07/25/2014 06:55 PM, Lina Iyer wrote:
>> +int pm_qos_request_for_cpumask(param_class, cpumask):
>> +Returns the aggregated value for a given PM QoS class for the specified
>> +cpumask.
>> +
>
> You can get rid of 'pm_qos_request_for_cpu' because it is just as easy to
> call 'pm_qos_request_for_cpumask' with cpumask_of(cpu).
Sure, but pm_qos_request_for_cpu() returns without taking the lock, which 
pm_qos_request_for_cpumask() cannot do. Is that still desirable?

>
> IMO, you should split this patch into several. First bring in the per-cpu
> support without adding new APIs, and then bring in the different changes
> one by one.
Will do.

>>   	.no_constraint_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
>>   	.type = PM_QOS_MAX,
>> @@ -157,6 +166,34 @@ static inline void pm_qos_set_value(struct 
>> pm_qos_constraints *c, s32 value)
>>   	c->target_value = value;
>>   }
>
> Wouldn't it make sense to create a per_cpu variable for the pm qos
> constraints instead of an array indexed by cpu?
Sure. But we would most likely not access it from the respective cpu. 
It can be done, though.
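
For reference, the per-cpu variant could look roughly like this (only a 
sketch, assuming dynamic per-cpu storage hanging off the constraints 
structure):

	/* in struct pm_qos_constraints, instead of s32 target_per_cpu[NR_CPUS]: */
	s32 __percpu *target_per_cpu;

	/* allocated once per constraint class: */
	c->target_per_cpu = alloc_percpu(s32);

	/* writer side, under pm_qos_lock: */
	for_each_possible_cpu(cpu)
		*per_cpu_ptr(c->target_per_cpu, cpu) = qos_val[cpu];

	/* reader side, e.g. pm_qos_request_for_cpu(): */
	return *per_cpu_ptr(c->target_per_cpu, cpu);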

>
>> +static inline void pm_qos_set_value_for_cpus(struct pm_qos_constraints *c)
>> +{
>> +	struct pm_qos_request *req = NULL;
>> +	int cpu;
>> +	s32 qos_val[NR_CPUS] = { [0 ... (NR_CPUS - 1)] = c->default_value };
>
> The kernel stack size is small, IIRC, 2xPAGE_SIZE. NR_CPUS could be big 
> (1024). That may lead to a stack overflow on some systems.
>
Good point. I will allocate this.
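
Roughly what I have in mind (a sketch; since this runs under pm_qos_lock 
it would need GFP_ATOMIC or a preallocated buffer):

static inline void pm_qos_set_value_for_cpus(struct pm_qos_constraints *c)
{
	struct pm_qos_request *req;
	s32 *qos_val;
	int cpu;

	/* heap allocation instead of an NR_CPUS-sized stack array */
	qos_val = kmalloc_array(nr_cpu_ids, sizeof(*qos_val), GFP_ATOMIC);
	if (!qos_val)
		return;

	for_each_possible_cpu(cpu)
		qos_val[cpu] = c->default_value;

	plist_for_each_entry(req, &c->list, node) {
		for_each_cpu(cpu, &req->cpus_affine) {
			if (c->type == PM_QOS_MIN)
				qos_val[cpu] = min(qos_val[cpu], req->node.prio);
			else if (c->type == PM_QOS_MAX)
				qos_val[cpu] = max(qos_val[cpu], req->node.prio);
		}
	}

	for_each_possible_cpu(cpu)
		c->target_per_cpu[cpu] = qos_val[cpu];

	kfree(qos_val);
}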

>> +	 * there are no active users of the irq_notify object, i.e, kref 
>> count
>> +	 * is non-zero. This could happen if there is an active 'notify'
>> +	 * callback happening while the pm_qos_remove request is called. Wait
>> +	 * until the release callback clears the cpus_affine mask.
>> +	 */
>> +	if (req->type == PM_QOS_REQ_AFFINE_IRQ) {
>> +		while (!cpumask_empty(&req->cpus_affine) && count--)
>> +			msleep(50);
>> +		BUG_ON(!count);
>> +	}
>
> You shouldn't use this approach but a locking mechanism.
>
Okay, will change.
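
Something along these lines, perhaps (a sketch; the completion field is 
a made-up addition to struct pm_qos_request, initialized with 
init_completion() in pm_qos_add_request()):

static void pm_qos_irq_release(struct kref *ref)
{
	unsigned long flags;
	struct irq_affinity_notify *notify =
		container_of(ref, struct irq_affinity_notify, kref);
	struct pm_qos_request *req =
		container_of(notify, struct pm_qos_request, irq_notify);

	spin_lock_irqsave(&pm_qos_lock, flags);
	cpumask_clear(&req->cpus_affine);
	spin_unlock_irqrestore(&pm_qos_lock, flags);

	/* wake up the remover instead of letting it poll with msleep() */
	complete(&req->irq_released);
}

/* and in pm_qos_remove_request(), instead of the msleep() loop: */
	if (req->type == PM_QOS_REQ_AFFINE_IRQ)
		wait_for_completion(&req->irq_released);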


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-08-01 17:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-25 16:55 [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Lina Iyer
2014-07-25 16:55 ` [RFC] [PATCH 1/3] irq: Allow multiple clients to register for irq affinity notification Lina Iyer
2014-08-01 14:48   ` Daniel Lezcano
2014-08-01 15:26     ` Lina Iyer
2014-07-25 16:55 ` [RFC] [PATCH 2/3] QoS: Modify data structures and function arguments for scalability Lina Iyer
2014-07-25 16:55 ` [RFC] [PATCH 3/3] QoS: Enhance framework to support cpu/irq specific QoS requests Lina Iyer
2014-08-01 15:58   ` Daniel Lezcano
2014-08-01 17:11     ` Lina Iyer
2014-08-01 11:54 ` [RFC] [PATCH 0/3] IRQ affinity notifier and per-cpu PM QoS Daniel Lezcano
