All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 0/4] acpi: cppc optimization patches
@ 2016-02-10 20:05 Prashanth Prakash
  2016-02-10 20:05 ` [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations Prashanth Prakash
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Prashanth Prakash @ 2016-02-10 20:05 UTC (permalink / raw)
  To: rjw
  Cc: linux-acpi, linaro-acpi, ashwin.chaugule, alexey.klimov, timur,
	Prashanth Prakash

Changes in V3: Addressed the review comments from Alexey, Ashwin and Timur

Changes in V2: Fixed compilation error on i386

This series introduces changes to reduce the time required to send a frequency
transition requests to the platform while using the cppc-cpufreq driver.

With these changes we see significant improvement in the average time to send
freq. transition request to the platform. Profiling on an ARM platform showed
that the average transaction time per request reduced from 200us to under 20us.
Ashwin Chaugule (1):
  ACPI / CPPC: Optimize PCC Read Write operations

Prashanth Prakash (3):
  acpi: cppc: optimized cpc_read and cpc_write
  mailbox: pcc: optimized pcc_send_data
  acpi: cppc: replace writeX/readX to PCC with relaxed version

 drivers/acpi/cppc_acpi.c | 183 +++++++++++++++++++++++++++++++++++------------
 drivers/mailbox/pcc.c    | 111 ++++++++++++++++++++++++++--
 2 files changed, 241 insertions(+), 53 deletions(-)

-- 
Qualcomm Technologies, Inc. on behalf
of the Qualcomm Innovation Center, Inc.  The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-10 20:05 [PATCH V3 0/4] acpi: cppc optimization patches Prashanth Prakash
@ 2016-02-10 20:05 ` Prashanth Prakash
  2016-02-10 20:42   ` Timur Tabi
  2016-02-15 16:37   ` Alexey Klimov
  2016-02-10 20:06 ` [PATCH V3 2/4] acpi: cppc: optimized cpc_read and cpc_write Prashanth Prakash
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Prashanth Prakash @ 2016-02-10 20:05 UTC (permalink / raw)
  To: rjw
  Cc: linux-acpi, linaro-acpi, ashwin.chaugule, alexey.klimov, timur,
	Prashanth Prakash

From: Ashwin Chaugule <ashwin.chaugule@linaro.org>

Previously the send_pcc_cmd() code checked if the
PCC operation had completed before returning from
the function. This check was performed regardless
of the PCC op type (i.e. Read/Write). Knowing
the type of cmd can be used to optimize the check
and avoid needless waiting. e.g. with Write ops,
the actual Writing is done before calling send_pcc_cmd().
And the subsequent Writes will check if the channel is
free at the entry of send_pcc_cmd() anyway.

However, for Read cmds, we need to wait for the cmd
completion bit to be flipped, since the actual Read
ops follow after returning from the send_pcc_cmd(). So,
only do the looping check at the end for Read ops.

Also, instead of using udelay() calls, use ktime as a
means to check for deadlines. The current deadline
in which the Remote should flip the cmd completion bit
is defined as N * Nominal latency. Where N is arbitrary
and large enough to work on slow emulators and Nominal
latency comes from the ACPI table (PCCT). This helps
in working around the CONFIG_HZ effects on udelay()
and also avoids needing different ACPI tables for Silicon
and Emulation platforms.

Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
---
 drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 28 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 6730f96..6ae327e 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -39,6 +39,7 @@
 
 #include <linux/cpufreq.h>
 #include <linux/delay.h>
+#include <linux/ktime.h>
 
 #include <acpi/cppc_acpi.h>
 /*
@@ -63,25 +64,53 @@ static struct mbox_chan *pcc_channel;
 static void __iomem *pcc_comm_addr;
 static u64 comm_base_addr;
 static int pcc_subspace_idx = -1;
-static u16 pcc_cmd_delay;
 static bool pcc_channel_acquired;
+static ktime_t deadline;
 
 /*
  * Arbitrary Retries in case the remote processor is slow to respond
- * to PCC commands.
+ * to PCC commands. Keeping it high enough to cover emulators where
+ * the processors run painfully slow.
  */
 #define NUM_RETRIES 500
 
+static int check_pcc_chan(void)
+{
+	int ret = -EIO;
+	struct acpi_pcct_shared_memory __iomem *generic_comm_base = pcc_comm_addr;
+	ktime_t next_deadline = ktime_add(ktime_get(), deadline);
+
+	/* Retry in case the remote processor was too slow to catch up. */
+	while (!ktime_after(ktime_get(), next_deadline)) {
+		if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
+			ret = 0;
+			break;
+		}
+		/*
+		 * Reducing the bus traffic in case this loop takes longer than
+		 * a few retries.
+		 */
+		udelay(3);
+	}
+
+	return ret;
+}
+
 static int send_pcc_cmd(u16 cmd)
 {
-	int retries, result = -EIO;
-	struct acpi_pcct_hw_reduced *pcct_ss = pcc_channel->con_priv;
+	int ret = -EIO;
 	struct acpi_pcct_shared_memory *generic_comm_base =
 		(struct acpi_pcct_shared_memory *) pcc_comm_addr;
-	u32 cmd_latency = pcct_ss->latency;
 
-	/* Min time OS should wait before sending next command. */
-	udelay(pcc_cmd_delay);
+	/*
+	 * For CMD_WRITE we know for a fact the caller should have checked
+	 * the channel before writing to PCC space
+	 */
+	if (cmd == CMD_READ) {
+		ret = check_pcc_chan();
+		if (ret)
+			return ret;
+	}
 
 	/* Write to the shared comm region. */
 	writew(cmd, &generic_comm_base->command);
@@ -90,26 +119,25 @@ static int send_pcc_cmd(u16 cmd)
 	writew(0, &generic_comm_base->status);
 
 	/* Ring doorbell */
-	result = mbox_send_message(pcc_channel, &cmd);
-	if (result < 0) {
+	ret = mbox_send_message(pcc_channel, &cmd);
+	if (ret < 0) {
 		pr_err("Err sending PCC mbox message. cmd:%d, ret:%d\n",
-				cmd, result);
-		return result;
+				cmd, ret);
+		return ret;
 	}
 
-	/* Wait for a nominal time to let platform process command. */
-	udelay(cmd_latency);
-
-	/* Retry in case the remote processor was too slow to catch up. */
-	for (retries = NUM_RETRIES; retries > 0; retries--) {
-		if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
-			result = 0;
-			break;
-		}
-	}
+	/*
+	 * For READs we need to ensure the cmd completed to ensure
+	 * the ensuing read()s can proceed. For WRITEs we dont care
+	 * because the actual write()s are done before coming here
+	 * and the next READ or WRITE will check if the channel
+	 * is busy/free at the entry of this call.
+	 */
+	if (cmd == CMD_READ)
+		ret = check_pcc_chan();
 
-	mbox_client_txdone(pcc_channel, result);
-	return result;
+	mbox_client_txdone(pcc_channel, ret);
+	return ret;
 }
 
 static void cppc_chan_tx_done(struct mbox_client *cl, void *msg, int ret)
@@ -306,6 +334,7 @@ static int register_pcc_channel(int pcc_subspace_idx)
 {
 	struct acpi_pcct_hw_reduced *cppc_ss;
 	unsigned int len;
+	u64 usecs_lat;
 
 	if (pcc_subspace_idx >= 0) {
 		pcc_channel = pcc_mbox_request_channel(&cppc_mbox_cl,
@@ -335,7 +364,14 @@ static int register_pcc_channel(int pcc_subspace_idx)
 		 */
 		comm_base_addr = cppc_ss->base_address;
 		len = cppc_ss->length;
-		pcc_cmd_delay = cppc_ss->min_turnaround_time;
+
+		/*
+		 * cppc_ss->latency is just a Nominal value. In reality
+		 * the remote processor could be much slower to reply.
+		 * So add an arbitrary amount of wait on top of Nominal.
+		 */
+		usecs_lat = NUM_RETRIES * cppc_ss->latency;
+		deadline = ns_to_ktime(usecs_lat * NSEC_PER_USEC);
 
 		pcc_comm_addr = acpi_os_ioremap(comm_base_addr, len);
 		if (!pcc_comm_addr) {
@@ -604,7 +640,7 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
 			(ref_perf->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) ||
 			(nom_perf->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM)) {
 		/* Ring doorbell once to update PCC subspace */
-		if (send_pcc_cmd(CMD_READ)) {
+		if (send_pcc_cmd(CMD_READ) < 0) {
 			ret = -EIO;
 			goto out_err;
 		}
@@ -662,7 +698,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 	if ((delivered_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) ||
 			(reference_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM)) {
 		/* Ring doorbell once to update PCC subspace */
-		if (send_pcc_cmd(CMD_READ)) {
+		if (send_pcc_cmd(CMD_READ) < 0) {
 			ret = -EIO;
 			goto out_err;
 		}
@@ -713,6 +749,13 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
 
 	spin_lock(&pcc_lock);
 
+	/* If this is PCC reg, check if channel is free before writing */
+	if (desired_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
+		ret = check_pcc_chan();
+		if (ret)
+			goto busy_channel;
+	}
+
 	/*
 	 * Skip writing MIN/MAX until Linux knows how to come up with
 	 * useful values.
@@ -722,10 +765,10 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
 	/* Is this a PCC reg ?*/
 	if (desired_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
 		/* Ring doorbell so Remote can get our perf request. */
-		if (send_pcc_cmd(CMD_WRITE))
+		if (send_pcc_cmd(CMD_WRITE) < 0)
 			ret = -EIO;
 	}
-
+busy_channel:
 	spin_unlock(&pcc_lock);
 
 	return ret;
-- 
Qualcomm Technologies, Inc. on behalf
of the Qualcomm Innovation Center, Inc.  The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V3 2/4] acpi: cppc: optimized cpc_read and cpc_write
  2016-02-10 20:05 [PATCH V3 0/4] acpi: cppc optimization patches Prashanth Prakash
  2016-02-10 20:05 ` [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations Prashanth Prakash
@ 2016-02-10 20:06 ` Prashanth Prakash
  2016-02-10 20:06 ` [PATCH V3 3/4] mailbox: pcc: optimized pcc_send_data Prashanth Prakash
  2016-02-10 20:06 ` [PATCH V3 4/4] acpi: cppc: replace writeX/readX to PCC with relaxed version Prashanth Prakash
  3 siblings, 0 replies; 16+ messages in thread
From: Prashanth Prakash @ 2016-02-10 20:06 UTC (permalink / raw)
  To: rjw
  Cc: linux-acpi, linaro-acpi, ashwin.chaugule, alexey.klimov, timur,
	Prashanth Prakash

cpc_read and cpc_write are used while holding the pcc_lock spin_lock,
so they need to be as fast as possible. acpi_os_read/write_memory
APIs linearly search through a list for cached mapping which is
quite expensive. Since the PCC subspace is already mapped into
virtual address space during initialization, we can just add the
offset and access the necessary CPPC registers.

This patch + similar changes to PCC driver reduce the time per freq.
transition from around 200us to about 20us for cppc cpufreq driver

Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/acpi/cppc_acpi.c | 80 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 6ae327e..b7d92a4 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -67,6 +67,9 @@ static int pcc_subspace_idx = -1;
 static bool pcc_channel_acquired;
 static ktime_t deadline;
 
+/* pcc mapped address + header size + offset within PCC subspace */
+#define GET_PCC_VADDR(offs) (pcc_comm_addr + 0x8 + (offs))
+
 /*
  * Arbitrary Retries in case the remote processor is slow to respond
  * to PCC commands. Keeping it high enough to cover emulators where
@@ -582,29 +585,74 @@ void acpi_cppc_processor_exit(struct acpi_processor *pr)
 }
 EXPORT_SYMBOL_GPL(acpi_cppc_processor_exit);
 
-static u64 get_phys_addr(struct cpc_reg *reg)
-{
-	/* PCC communication addr space begins at byte offset 0x8. */
-	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM)
-		return (u64)comm_base_addr + 0x8 + reg->address;
-	else
-		return reg->address;
-}
+/*
+ * Since cpc_read and cpc_write are called while holding pcc_lock, it should be
+ * as fast as possible. We have already mapped the PCC subspace during init, so
+ * we can directly write to it.
+ */
 
-static void cpc_read(struct cpc_reg *reg, u64 *val)
+static int cpc_read(struct cpc_reg *reg, u64 *val)
 {
-	u64 addr = get_phys_addr(reg);
+	int ret_val = 0;
+
+	*val = 0;
+	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
+		void __iomem *vaddr = GET_PCC_VADDR(reg->address);
 
-	acpi_os_read_memory((acpi_physical_address)addr,
-			val, reg->bit_width);
+		switch (reg->bit_width) {
+		case 8:
+			*val = readb(vaddr);
+			break;
+		case 16:
+			*val = readw(vaddr);
+			break;
+		case 32:
+			*val = readl(vaddr);
+			break;
+		case 64:
+			*val = readq(vaddr);
+			break;
+		default:
+			pr_debug("Error: Cannot read %u bit width from PCC\n",
+				reg->bit_width);
+			ret_val = -EFAULT;
+		}
+	} else
+		ret_val = acpi_os_read_memory((acpi_physical_address)reg->address,
+					val, reg->bit_width);
+	return ret_val;
 }
 
-static void cpc_write(struct cpc_reg *reg, u64 val)
+static int cpc_write(struct cpc_reg *reg, u64 val)
 {
-	u64 addr = get_phys_addr(reg);
+	int ret_val = 0;
+
+	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
+		void __iomem *vaddr = GET_PCC_VADDR(reg->address);
 
-	acpi_os_write_memory((acpi_physical_address)addr,
-			val, reg->bit_width);
+		switch (reg->bit_width) {
+		case 8:
+			writeb(val, vaddr);
+			break;
+		case 16:
+			writew(val, vaddr);
+			break;
+		case 32:
+			writel(val, vaddr);
+			break;
+		case 64:
+			writeq(val, vaddr);
+			break;
+		default:
+			pr_debug("Error: Cannot write %u bit width to PCC\n",
+				reg->bit_width);
+			ret_val = -EFAULT;
+			break;
+		}
+	} else
+		ret_val = acpi_os_write_memory((acpi_physical_address)reg->address,
+				val, reg->bit_width);
+	return ret_val;
 }
 
 /**
-- 
Qualcomm Technologies, Inc. on behalf
of the Qualcomm Innovation Center, Inc.  The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V3 3/4] mailbox: pcc: optimized pcc_send_data
  2016-02-10 20:05 [PATCH V3 0/4] acpi: cppc optimization patches Prashanth Prakash
  2016-02-10 20:05 ` [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations Prashanth Prakash
  2016-02-10 20:06 ` [PATCH V3 2/4] acpi: cppc: optimized cpc_read and cpc_write Prashanth Prakash
@ 2016-02-10 20:06 ` Prashanth Prakash
  2016-02-10 20:06 ` [PATCH V3 4/4] acpi: cppc: replace writeX/readX to PCC with relaxed version Prashanth Prakash
  3 siblings, 0 replies; 16+ messages in thread
From: Prashanth Prakash @ 2016-02-10 20:06 UTC (permalink / raw)
  To: rjw
  Cc: linux-acpi, linaro-acpi, ashwin.chaugule, alexey.klimov, timur,
	Prashanth Prakash

pcc_send_data can be invoked during the execution of performance
critical code as in cppc_cpufreq driver. With acpi_* APIs, the
doorbell register accessed in pcc_send_data if present in system
memory will be searched (in cached virt to phys addr mapping),
mapped, read/written and then unmapped. These operations take
significant amount of time.

This patch maps the performance critical doorbell register
during init and then reads/writes to it directly using the
mapped virtual address. This patch + similar changes to CPPC
acpi driver reduce the time per freq. transition from around
200us to about 20us for cppc cpufreq driver

Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/mailbox/pcc.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 104 insertions(+), 7 deletions(-)

diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
index 8f779a1..0ddf638 100644
--- a/drivers/mailbox/pcc.c
+++ b/drivers/mailbox/pcc.c
@@ -63,6 +63,7 @@
 #include <linux/platform_device.h>
 #include <linux/mailbox_controller.h>
 #include <linux/mailbox_client.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
 
 #include "mailbox.h"
 
@@ -70,6 +71,9 @@
 
 static struct mbox_chan *pcc_mbox_channels;
 
+/* Array of cached virtual address for doorbell registers */
+static void __iomem **pcc_doorbell_vaddr;
+
 static struct mbox_controller pcc_mbox_ctrl = {};
 /**
  * get_pcc_channel - Given a PCC subspace idx, get
@@ -160,6 +164,66 @@ void pcc_mbox_free_channel(struct mbox_chan *chan)
 }
 EXPORT_SYMBOL_GPL(pcc_mbox_free_channel);
 
+/*
+ * PCC can be used with perf critical drivers such as CPPC
+ * So it makes sense to locally cache the virtual address and
+ * use it to read/write to PCC registers such as doorbell register
+ *
+ * The below read_register and write_registers are used to read and
+ * write from perf critical registers such as PCC doorbell register
+ */
+static int read_register(void __iomem *vaddr, u64 *val, unsigned int bit_width)
+{
+	int ret_val = 0;
+
+	switch (bit_width) {
+	case 8:
+		*val = readb(vaddr);
+		break;
+	case 16:
+		*val = readw(vaddr);
+		break;
+	case 32:
+		*val = readl(vaddr);
+		break;
+	case 64:
+		*val = readq(vaddr);
+		break;
+	default:
+		pr_debug("Error: Cannot read register of %u bit width",
+			bit_width);
+		ret_val = -EFAULT;
+		break;
+	}
+	return ret_val;
+}
+
+static int write_register(void __iomem *vaddr, u64 val, unsigned int bit_width)
+{
+	int ret_val = 0;
+
+	switch (bit_width) {
+	case 8:
+		writeb(val, vaddr);
+		break;
+	case 16:
+		writew(val, vaddr);
+		break;
+	case 32:
+		writel(val, vaddr);
+		break;
+	case 64:
+		writeq(val, vaddr);
+		break;
+	default:
+		pr_debug("Error: Cannot write register of %u bit width",
+			bit_width);
+		ret_val = -EFAULT;
+		break;
+	}
+	return ret_val;
+}
+
 /**
  * pcc_send_data - Called from Mailbox Controller code. Used
  *		here only to ring the channel doorbell. The PCC client
@@ -175,21 +239,39 @@ EXPORT_SYMBOL_GPL(pcc_mbox_free_channel);
 static int pcc_send_data(struct mbox_chan *chan, void *data)
 {
 	struct acpi_pcct_hw_reduced *pcct_ss = chan->con_priv;
-	struct acpi_generic_address doorbell;
+	struct acpi_generic_address *doorbell;
 	u64 doorbell_preserve;
 	u64 doorbell_val;
 	u64 doorbell_write;
+	u32 id = chan - pcc_mbox_channels;
+	int ret = 0;
+
+	if (id >= pcc_mbox_ctrl.num_chans) {
+		pr_debug("pcc_send_data: Invalid mbox_chan passed\n");
+		return -ENOENT;
+	}
 
-	doorbell = pcct_ss->doorbell_register;
+	doorbell = &pcct_ss->doorbell_register;
 	doorbell_preserve = pcct_ss->preserve_mask;
 	doorbell_write = pcct_ss->write_mask;
 
 	/* Sync notification from OS to Platform. */
-	acpi_read(&doorbell_val, &doorbell);
-	acpi_write((doorbell_val & doorbell_preserve) | doorbell_write,
-			&doorbell);
-
-	return 0;
+	if (pcc_doorbell_vaddr[id]) {
+		ret = read_register(pcc_doorbell_vaddr[id], &doorbell_val,
+			doorbell->bit_width);
+		if (ret)
+			return ret;
+		ret = write_register(pcc_doorbell_vaddr[id],
+			(doorbell_val & doorbell_preserve) | doorbell_write,
+			doorbell->bit_width);
+	} else {
+		ret = acpi_read(&doorbell_val, doorbell);
+		if (ret)
+			return ret;
+		ret = acpi_write((doorbell_val & doorbell_preserve) | doorbell_write,
+			doorbell);
+	}
+	return ret;
 }
 
 static const struct mbox_chan_ops pcc_chan_ops = {
@@ -265,14 +347,29 @@ static int __init acpi_pcc_probe(void)
 		return -ENOMEM;
 	}
 
+	pcc_doorbell_vaddr = kcalloc(count, sizeof(void *), GFP_KERNEL);
+	if (!pcc_doorbell_vaddr) {
+		kfree(pcc_mbox_channels);
+		return -ENOMEM;
+	}
+
 	/* Point to the first PCC subspace entry */
 	pcct_entry = (struct acpi_subtable_header *) (
 		(unsigned long) pcct_tbl + sizeof(struct acpi_table_pcct));
 
 	for (i = 0; i < count; i++) {
+		struct acpi_generic_address *db_reg;
+		struct acpi_pcct_hw_reduced *pcct_ss;
 		pcc_mbox_channels[i].con_priv = pcct_entry;
 		pcct_entry = (struct acpi_subtable_header *)
 			((unsigned long) pcct_entry + pcct_entry->length);
+
+		/* If doorbell is in system memory cache the virt address */
+		pcct_ss = (struct acpi_pcct_hw_reduced *)pcct_entry;
+		db_reg = &pcct_ss->doorbell_register;
+		if (db_reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
+			pcc_doorbell_vaddr[i] = acpi_os_ioremap(db_reg->address,
+							db_reg->bit_width/8);
 	}
 
 	pcc_mbox_ctrl.num_chans = count;
-- 
Qualcomm Technologies, Inc. on behalf
of the Qualcomm Innovation Center, Inc.  The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V3 4/4] acpi: cppc: replace writeX/readX to PCC with relaxed version
  2016-02-10 20:05 [PATCH V3 0/4] acpi: cppc optimization patches Prashanth Prakash
                   ` (2 preceding siblings ...)
  2016-02-10 20:06 ` [PATCH V3 3/4] mailbox: pcc: optimized pcc_send_data Prashanth Prakash
@ 2016-02-10 20:06 ` Prashanth Prakash
  3 siblings, 0 replies; 16+ messages in thread
From: Prashanth Prakash @ 2016-02-10 20:06 UTC (permalink / raw)
  To: rjw
  Cc: linux-acpi, linaro-acpi, ashwin.chaugule, alexey.klimov, timur,
	Prashanth Prakash

We do not have a strict read/write order requirement while accessing
PCC subspace. The only requirement is all access should be committed
before triggering the PCC doorbell to transfer the ownership of PCC
to the platform and this requirement is enforced by the PCC driver.

Profiling on a many core system shows improvement of about 1.8us on
average per freq change request(about 10% improvement on average).
Since these operations are executed while holding the pcc_lock,
reducing this time helps the CPPC implementation to scale much
better as the number of cores increases.

Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/acpi/cppc_acpi.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index b7d92a4..2f144dd 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -116,10 +116,10 @@ static int send_pcc_cmd(u16 cmd)
 	}
 
 	/* Write to the shared comm region. */
-	writew(cmd, &generic_comm_base->command);
+	writew_relaxed(cmd, &generic_comm_base->command);
 
 	/* Flip CMD COMPLETE bit */
-	writew(0, &generic_comm_base->status);
+	writew_relaxed(0, &generic_comm_base->status);
 
 	/* Ring doorbell */
 	ret = mbox_send_message(pcc_channel, &cmd);
@@ -601,16 +601,16 @@ static int cpc_read(struct cpc_reg *reg, u64 *val)
 
 		switch (reg->bit_width) {
 		case 8:
-			*val = readb(vaddr);
+			*val = readb_relaxed(vaddr);
 			break;
 		case 16:
-			*val = readw(vaddr);
+			*val = readw_relaxed(vaddr);
 			break;
 		case 32:
-			*val = readl(vaddr);
+			*val = readl_relaxed(vaddr);
 			break;
 		case 64:
-			*val = readq(vaddr);
+			*val = readq_relaxed(vaddr);
 			break;
 		default:
 			pr_debug("Error: Cannot read %u bit width from PCC\n",
@@ -632,16 +632,16 @@ static int cpc_write(struct cpc_reg *reg, u64 val)
 
 		switch (reg->bit_width) {
 		case 8:
-			writeb(val, vaddr);
+			writeb_relaxed(val, vaddr);
 			break;
 		case 16:
-			writew(val, vaddr);
+			writew_relaxed(val, vaddr);
 			break;
 		case 32:
-			writel(val, vaddr);
+			writel_relaxed(val, vaddr);
 			break;
 		case 64:
-			writeq(val, vaddr);
+			writeq_relaxed(val, vaddr);
 			break;
 		default:
 			pr_debug("Error: Cannot write %u bit width to PCC\n",
-- 
Qualcomm Technologies, Inc. on behalf
of the Qualcomm Innovation Center, Inc.  The Qualcomm Innovation Center, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-10 20:05 ` [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations Prashanth Prakash
@ 2016-02-10 20:42   ` Timur Tabi
  2016-02-10 21:15     ` Ashwin Chaugule
  2016-02-15 16:37   ` Alexey Klimov
  1 sibling, 1 reply; 16+ messages in thread
From: Timur Tabi @ 2016-02-10 20:42 UTC (permalink / raw)
  To: Prashanth Prakash, rjw
  Cc: linux-acpi, linaro-acpi, ashwin.chaugule, alexey.klimov

Prashanth Prakash wrote:

> +static int check_pcc_chan(void)
> +{
> +	int ret = -EIO;
> +	struct acpi_pcct_shared_memory __iomem *generic_comm_base = pcc_comm_addr;
> +	ktime_t next_deadline = ktime_add(ktime_get(), deadline);
> +
> +	/* Retry in case the remote processor was too slow to catch up. */
> +	while (!ktime_after(ktime_get(), next_deadline)) {
> +		if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
> +			ret = 0;
> +			break;
> +		}
> +		/*
> +		 * Reducing the bus traffic in case this loop takes longer than
> +		 * a few retries.
> +		 */
> +		udelay(3);
> +	}

Like I said last time, you really should use 
readq_relaxed_poll_timeout(), which does exactly the same thing as this 
loop, but more elegantly.  I think this will work:

u32 status;
ret = readq_relaxed_poll_timeout(&generic_comm_base->status, status, 
status & PCC_CMD_COMPLETE, 3, deadline);
return ret ? -EIO : 0;
...
deadline = NUM_RETRIES * cppc_ss->latency;

This lets you completely eliminate all usage of ktime.  You can 
eliminate the global variable 'deadline' also, if you can figure out how 
to pass the cppc_ss object to check_pcc_chan().

> -	/* Wait for a nominal time to let platform process command. */
> -	udelay(cmd_latency);
> -
> -	/* Retry in case the remote processor was too slow to catch up. */
> -	for (retries = NUM_RETRIES; retries > 0; retries--) {
> -		if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
> -			result = 0;
> -			break;
> -		}
> -	}
> +	/*
> +	 * For READs we need to ensure the cmd completed to ensure
> +	 * the ensuing read()s can proceed. For WRITEs we dont care

"don't"



-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-10 20:42   ` Timur Tabi
@ 2016-02-10 21:15     ` Ashwin Chaugule
  2016-02-10 21:57       ` Timur Tabi
  0 siblings, 1 reply; 16+ messages in thread
From: Ashwin Chaugule @ 2016-02-10 21:15 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Prashanth Prakash, Rafael J. Wysocki, linux acpi,
	Linaro ACPI Mailman List, Alexey Klimov

On 10 February 2016 at 15:42, Timur Tabi <timur@codeaurora.org> wrote:
> Prashanth Prakash wrote:
>
>> +static int check_pcc_chan(void)
>> +{
>> +       int ret = -EIO;
>> +       struct acpi_pcct_shared_memory __iomem *generic_comm_base =
>> pcc_comm_addr;
>> +       ktime_t next_deadline = ktime_add(ktime_get(), deadline);
>> +
>> +       /* Retry in case the remote processor was too slow to catch up. */
>> +       while (!ktime_after(ktime_get(), next_deadline)) {
>> +               if (readw_relaxed(&generic_comm_base->status) &
>> PCC_CMD_COMPLETE) {
>> +                       ret = 0;
>> +                       break;
>> +               }
>> +               /*
>> +                * Reducing the bus traffic in case this loop takes longer
>> than
>> +                * a few retries.
>> +                */
>> +               udelay(3);
>> +       }
>
>
> Like I said last time, you really should use readq_relaxed_poll_timeout(),
> which does exactly the same thing as this loop, but more elegantly.

Nope.

>I think
> this will work:
>
> u32 status;
> ret = readq_relaxed_poll_timeout(&generic_comm_base->status, status, status
> & PCC_CMD_COMPLETE, 3, deadline);
> return ret ? -EIO : 0;
> ...
> deadline = NUM_RETRIES * cppc_ss->latency;
>
> This lets you completely eliminate all usage of ktime.

..Only to be re substituted by the macro all over again. So, there
really is no value in replacing this with a macro.  Also,
readx_relaxed_poll_timeout() has a usleep(), which will kill all the
optimization here. Its also horribly wrong in this context.

The alternative readx_poll_timeout_atomic() has a udelay() but it also
adds way more conditionals than this code. So, there is no need to
change things for the sake of cosmetic reasons.

> You can eliminate
> the global variable 'deadline' also, if you can figure out how to pass the
> cppc_ss object to check_pcc_chan().
>
>> -       /* Wait for a nominal time to let platform process command. */
>> -       udelay(cmd_latency);
>> -
>> -       /* Retry in case the remote processor was too slow to catch up. */
>> -       for (retries = NUM_RETRIES; retries > 0; retries--) {
>> -               if (readw_relaxed(&generic_comm_base->status) &
>> PCC_CMD_COMPLETE) {
>> -                       result = 0;
>> -                       break;
>> -               }
>> -       }
>> +       /*
>> +        * For READs we need to ensure the cmd completed to ensure
>> +        * the ensuing read()s can proceed. For WRITEs we dont care
>
>
> "don't"
>

Ashwin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-10 21:15     ` Ashwin Chaugule
@ 2016-02-10 21:57       ` Timur Tabi
  2016-02-10 22:17         ` Ashwin Chaugule
  0 siblings, 1 reply; 16+ messages in thread
From: Timur Tabi @ 2016-02-10 21:57 UTC (permalink / raw)
  To: Ashwin Chaugule
  Cc: Prashanth Prakash, Rafael J. Wysocki, linux acpi,
	Linaro ACPI Mailman List, Alexey Klimov

Ashwin Chaugule wrote:
> ..Only to be re substituted by the macro all over again. So, there
> really is no value in replacing this with a macro.  Also,
> readx_relaxed_poll_timeout() has a usleep(), which will kill all the
> optimization here. Its also horribly wrong in this context.
>
> The alternative readx_poll_timeout_atomic() has a udelay() but it also
> adds way more conditionals than this code. So, there is no need to
> change things for the sake of cosmetic reasons.

This is not a cosmetic change.  Calling a built-in macro instead of 
re-inventing the wheel is not cosmetic.  That's like saying that people 
shouldn't bother using min() or max() because they can just calculate 
the minimum and maximum themselves.  The macros exist for a reason, and 
people should be using them when applicable.

Also, the number of conditionals is not really much of an issue, I think.

The while loop has two conditionals:

1) The while-expression !ktime_after(ktime_get(), next_deadline)
2) The if-statement readw_relaxed(&generic_comm_base->status) & 
PCC_CMD_COMPLETE

readl_relaxed_poll_timeout_atomic() has four:

1) if (cond) break;
2) if (timeout_us && ktime_compare(ktime_get(), timeout) > 0)
3) if (delay_us) udelay(delay_us);	
4) (cond) ? 0 : -ETIMEDOUT; \

The third one is compiled-out because delay_us is a constant.  The 
fourth test is to handle the race condition between timeout and success. 
  If we time out, but the register value changes at the last 
microsecond, then we report success anyway.

And since this is a macro instead of a function, the code is generally 
optimized better.  The compiler typically can optimize macros better 
than functions, so there's still a chance that the macro version will be 
more optimized than the function anyway.

Finally, there's the convention of using a built-in macro/function over 
hand-coding something, even if it's not quite as optimized.  I don't 
think check_pcc_chan() is that time-critical that a single additional 
conditional is really a problem.  This while-loop really does re-invent 
the wheel.

Anyway, I don't want to belabor this point.  The decision is Rafael's now.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-10 21:57       ` Timur Tabi
@ 2016-02-10 22:17         ` Ashwin Chaugule
  0 siblings, 0 replies; 16+ messages in thread
From: Ashwin Chaugule @ 2016-02-10 22:17 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Prashanth Prakash, Rafael J. Wysocki, linux acpi,
	Linaro ACPI Mailman List, Alexey Klimov

On 10 February 2016 at 16:57, Timur Tabi <timur@codeaurora.org> wrote:

> Anyway, I don't want to belabor this point.  The decision is Rafael's now.

Thank you.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-10 20:05 ` [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations Prashanth Prakash
  2016-02-10 20:42   ` Timur Tabi
@ 2016-02-15 16:37   ` Alexey Klimov
  2016-02-16 18:47     ` Ashwin Chaugule
  1 sibling, 1 reply; 16+ messages in thread
From: Alexey Klimov @ 2016-02-15 16:37 UTC (permalink / raw)
  To: Prashanth Prakash; +Cc: rjw, linux-acpi, linaro-acpi, ashwin.chaugule, timur

Hi Prashanth and Ashwin,

On Wed, Feb 10, 2016 at 01:05:59PM -0700, Prashanth Prakash wrote:
> From: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> 
> Previously the send_pcc_cmd() code checked if the
> PCC operation had completed before returning from
> the function. This check was performed regardless
> of the PCC op type (i.e. Read/Write). Knowing
> the type of cmd can be used to optimize the check
> and avoid needless waiting. e.g. with Write ops,
> the actual Writing is done before calling send_pcc_cmd().
> And the subsequent Writes will check if the channel is
> free at the entry of send_pcc_cmd() anyway.
> 
> However, for Read cmds, we need to wait for the cmd
> completion bit to be flipped, since the actual Read
> ops follow after returning from the send_pcc_cmd(). So,
> only do the looping check at the end for Read ops.
> 
> Also, instead of using udelay() calls, use ktime as a
> means to check for deadlines. The current deadline
> in which the Remote should flip the cmd completion bit
> is defined as N * Nominal latency. Where N is arbitrary
> and large enough to work on slow emulators and Nominal
> latency comes from the ACPI table (PCCT). This helps
> in working around the CONFIG_HZ effects on udelay()
> and also avoids needing different ACPI tables for Silicon
> and Emulation platforms.
> 
> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
> ---
>  drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 71 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 6730f96..6ae327e 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -39,6 +39,7 @@
>  
>  #include <linux/cpufreq.h>
>  #include <linux/delay.h>
> +#include <linux/ktime.h>
>  
>  #include <acpi/cppc_acpi.h>
>  /*
> @@ -63,25 +64,53 @@ static struct mbox_chan *pcc_channel;
>  static void __iomem *pcc_comm_addr;
>  static u64 comm_base_addr;
>  static int pcc_subspace_idx = -1;
> -static u16 pcc_cmd_delay;
>  static bool pcc_channel_acquired;
> +static ktime_t deadline;
>  
>  /*
>   * Arbitrary Retries in case the remote processor is slow to respond
> - * to PCC commands.
> + * to PCC commands. Keeping it high enough to cover emulators where
> + * the processors run painfully slow.
>   */
>  #define NUM_RETRIES 500
>  
> +static int check_pcc_chan(void)
> +{
> +	int ret = -EIO;
> +	struct acpi_pcct_shared_memory __iomem *generic_comm_base = pcc_comm_addr;
> +	ktime_t next_deadline = ktime_add(ktime_get(), deadline);
> +
> +	/* Retry in case the remote processor was too slow to catch up. */
> +	while (!ktime_after(ktime_get(), next_deadline)) {
> +		if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {

What do you think about polling for error too?
Firmware might set error bit and kernel will spin here for full cycle (maybe even
more than one time).


> +			ret = 0;
> +			break;
> +		}
> +		/*
> +		 * Reducing the bus traffic in case this loop takes longer than
> +		 * a few retries.
> +		 */
> +		udelay(3);
> +	}
> +
> +	return ret;
> +}
> +
>  static int send_pcc_cmd(u16 cmd)
>  {
> -	int retries, result = -EIO;
> -	struct acpi_pcct_hw_reduced *pcct_ss = pcc_channel->con_priv;
> +	int ret = -EIO;
>  	struct acpi_pcct_shared_memory *generic_comm_base =
>  		(struct acpi_pcct_shared_memory *) pcc_comm_addr;
> -	u32 cmd_latency = pcct_ss->latency;
>  
> -	/* Min time OS should wait before sending next command. */
> -	udelay(pcc_cmd_delay);

Ouch.
This delay is Minimum Request Turnaround Time and specification is strict about this:
The minimum amount of time that OSPM must wait after the
completion of a command before issuing the next command, in microseconds.

And you're going to completely remove delay that described as "must wait" regardless if it's
write or read command. In my view completion of a command is when CMD complete bit is set to 1.
This field in table is not optional.

Even worse, what if platform report latency = 0 (or very close to zero) and turnaround time as very large number?

Well, my other comments here are minor things but this one looks like main issue of this patch.



> +	/*
> +	 * For CMD_WRITE we know for a fact the caller should have checked
> +	 * the channel before writing to PCC space
> +	 */
> +	if (cmd == CMD_READ) {
> +		ret = check_pcc_chan();
> +		if (ret)
> +			return ret;
> +	}
>  
>  	/* Write to the shared comm region. */
>  	writew(cmd, &generic_comm_base->command);
> @@ -90,26 +119,25 @@ static int send_pcc_cmd(u16 cmd)
>  	writew(0, &generic_comm_base->status);
>  
>  	/* Ring doorbell */
> -	result = mbox_send_message(pcc_channel, &cmd);
> -	if (result < 0) {
> +	ret = mbox_send_message(pcc_channel, &cmd);
> +	if (ret < 0) {
>  		pr_err("Err sending PCC mbox message. cmd:%d, ret:%d\n",
> -				cmd, result);
> -		return result;
> +				cmd, ret);
> +		return ret;
>  	}
>  
> -	/* Wait for a nominal time to let platform process command. */
> -	udelay(cmd_latency);
> -
> -	/* Retry in case the remote processor was too slow to catch up. */
> -	for (retries = NUM_RETRIES; retries > 0; retries--) {
> -		if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
> -			result = 0;
> -			break;
> -		}
> -	}
> +	/*
> +	 * For READs we need to ensure the cmd completed to ensure
> +	 * the ensuing read()s can proceed. For WRITEs we dont care
> +	 * because the actual write()s are done before coming here
> +	 * and the next READ or WRITE will check if the channel
> +	 * is busy/free at the entry of this call.
> +	 */
> +	if (cmd == CMD_READ)
> +		ret = check_pcc_chan();
>  
> -	mbox_client_txdone(pcc_channel, result);
> -	return result;
> +	mbox_client_txdone(pcc_channel, ret);
> +	return ret;
>  }
>  
>  static void cppc_chan_tx_done(struct mbox_client *cl, void *msg, int ret)
> @@ -306,6 +334,7 @@ static int register_pcc_channel(int pcc_subspace_idx)
>  {
>  	struct acpi_pcct_hw_reduced *cppc_ss;
>  	unsigned int len;
> +	u64 usecs_lat;
>  
>  	if (pcc_subspace_idx >= 0) {
>  		pcc_channel = pcc_mbox_request_channel(&cppc_mbox_cl,
> @@ -335,7 +364,14 @@ static int register_pcc_channel(int pcc_subspace_idx)
>  		 */
>  		comm_base_addr = cppc_ss->base_address;
>  		len = cppc_ss->length;
> -		pcc_cmd_delay = cppc_ss->min_turnaround_time;
> +
> +		/*
> +		 * cppc_ss->latency is just a Nominal value. In reality
> +		 * the remote processor could be much slower to reply.
> +		 * So add an arbitrary amount of wait on top of Nominal.

Specification requires platform to report accurate values.
If remote party requires arbitrary amount on top that means someone screwed up
values in firmware.
That's not only slowness of remote processor but broken/inaccurate values given by firmware.
I think if you want this comment you need to blame firmware too.


> +		 */
> +		usecs_lat = NUM_RETRIES * cppc_ss->latency;
> +		deadline = ns_to_ktime(usecs_lat * NSEC_PER_USEC);
>  
>  		pcc_comm_addr = acpi_os_ioremap(comm_base_addr, len);
>  		if (!pcc_comm_addr) {
> @@ -604,7 +640,7 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>  			(ref_perf->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) ||
>  			(nom_perf->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM)) {
>  		/* Ring doorbell once to update PCC subspace */
> -		if (send_pcc_cmd(CMD_READ)) {
> +		if (send_pcc_cmd(CMD_READ) < 0) {

Somewhere in previous email I asked you to point me to specs with words about init values before
issuing first PCC command. Thanks for that.
However I still think that comment should be added about init value of status register.
Please add it unless you have something to come up with. I think such comment will be useful.


>  			ret = -EIO;
>  			goto out_err;
>  		}
> @@ -662,7 +698,7 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>  	if ((delivered_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) ||
>  			(reference_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM)) {
>  		/* Ring doorbell once to update PCC subspace */
> -		if (send_pcc_cmd(CMD_READ)) {
> +		if (send_pcc_cmd(CMD_READ) < 0) {
>  			ret = -EIO;
>  			goto out_err;
>  		}
> @@ -713,6 +749,13 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
>  
>  	spin_lock(&pcc_lock);
>  
> +	/* If this is PCC reg, check if channel is free before writing */
> +	if (desired_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
> +		ret = check_pcc_chan();
> +		if (ret)
> +			goto busy_channel;
> +	}
> +
>  	/*
>  	 * Skip writing MIN/MAX until Linux knows how to come up with
>  	 * useful values.
> @@ -722,10 +765,10 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
>  	/* Is this a PCC reg ?*/
>  	if (desired_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
>  		/* Ring doorbell so Remote can get our perf request. */
> -		if (send_pcc_cmd(CMD_WRITE))
> +		if (send_pcc_cmd(CMD_WRITE) < 0)
>  			ret = -EIO;
>  	}
> -
> +busy_channel:
>  	spin_unlock(&pcc_lock);
>  
>  	return ret;

Best regards,
Alexey.
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-15 16:37   ` Alexey Klimov
@ 2016-02-16 18:47     ` Ashwin Chaugule
  2016-02-16 19:10       ` Rafael J. Wysocki
  2016-02-29 17:39       ` Alexey Klimov
  0 siblings, 2 replies; 16+ messages in thread
From: Ashwin Chaugule @ 2016-02-16 18:47 UTC (permalink / raw)
  To: Alexey Klimov
  Cc: Prashanth Prakash, Rafael J. Wysocki, linux acpi,
	Linaro ACPI Mailman List, Timur Tabi

Hi Alexey,

On 15 February 2016 at 11:37, Alexey Klimov <alexey.klimov@arm.com> wrote:
> Hi Prashanth and Ashwin,
>
> On Wed, Feb 10, 2016 at 01:05:59PM -0700, Prashanth Prakash wrote:
>> From: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>>
>> Previously the send_pcc_cmd() code checked if the
>> PCC operation had completed before returning from
>> the function. This check was performed regardless
>> of the PCC op type (i.e. Read/Write). Knowing
>> the type of cmd can be used to optimize the check
>> and avoid needless waiting. e.g. with Write ops,
>> the actual Writing is done before calling send_pcc_cmd().
>> And the subsequent Writes will check if the channel is
>> free at the entry of send_pcc_cmd() anyway.
>>
>> However, for Read cmds, we need to wait for the cmd
>> completion bit to be flipped, since the actual Read
>> ops follow after returning from the send_pcc_cmd(). So,
>> only do the looping check at the end for Read ops.
>>
>> Also, instead of using udelay() calls, use ktime as a
>> means to check for deadlines. The current deadline
>> in which the Remote should flip the cmd completion bit
>> is defined as N * Nominal latency. Where N is arbitrary
>> and large enough to work on slow emulators and Nominal
>> latency comes from the ACPI table (PCCT). This helps
>> in working around the CONFIG_HZ effects on udelay()
>> and also avoids needing different ACPI tables for Silicon
>> and Emulation platforms.
>>
>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
>> ---
>>  drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 71 insertions(+), 28 deletions(-)

[..]

>>
>>  /*
>>   * Arbitrary Retries in case the remote processor is slow to respond
>> - * to PCC commands.
>> + * to PCC commands. Keeping it high enough to cover emulators where
>> + * the processors run painfully slow.
>>   */
>>  #define NUM_RETRIES 500
>>
>> +static int check_pcc_chan(void)
>> +{
>> +     int ret = -EIO;
>> +     struct acpi_pcct_shared_memory __iomem *generic_comm_base = pcc_comm_addr;
>> +     ktime_t next_deadline = ktime_add(ktime_get(), deadline);
>> +
>> +     /* Retry in case the remote processor was too slow to catch up. */
>> +     while (!ktime_after(ktime_get(), next_deadline)) {
>> +             if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
>
> What do you think about polling for error too?
> Firmware might set error bit and kernel will spin here for full cycle (maybe even
> more than one time).
>

Hm. I dont think thats useful since we dont do anything special if the
firmware errors out. Perhaps at some later point we can stop sending
new requests if there are firmware errors, but then how would we know
when to resume?

>
>> +                     ret = 0;
>> +                     break;
>> +             }
>> +             /*
>> +              * Reducing the bus traffic in case this loop takes longer than
>> +              * a few retries.
>> +              */
>> +             udelay(3);
>> +     }
>> +
>> +     return ret;
>> +}
>> +
>>  static int send_pcc_cmd(u16 cmd)
>>  {
>> -     int retries, result = -EIO;
>> -     struct acpi_pcct_hw_reduced *pcct_ss = pcc_channel->con_priv;
>> +     int ret = -EIO;
>>       struct acpi_pcct_shared_memory *generic_comm_base =
>>               (struct acpi_pcct_shared_memory *) pcc_comm_addr;
>> -     u32 cmd_latency = pcct_ss->latency;
>>
>> -     /* Min time OS should wait before sending next command. */
>> -     udelay(pcc_cmd_delay);
>
> Ouch.
> This delay is Minimum Request Turnaround Time and specification is strict about this:
> The minimum amount of time that OSPM must wait after the
> completion of a command before issuing the next command, in microseconds.
>
> And you're going to completely remove delay that described as "must wait" regardless if it's
> write or read command. In my view completion of a command is when CMD complete bit is set to 1.
> This field in table is not optional.

The spec has a whole bunch of delays which are highly questionable.
e.g. the Min Request Turnaround time and the Max periodic access rate.
Both seem quite useless since their original intention was to avoid
the OS from flooding the BMC. But one can argue that the firmware can
throttle the OS requests via the Command Completion bit anyway. I also
didnt like the usage of udelays and prefer the ktime math, but that
seems overkill given the questionable value of these fields.

>
> Even worse, what if platform report latency = 0 (or very close to zero) and turnaround time as very large number?
>

Such a firmware would need serious rework. Besides, if for whatever
reason they choose to have it, the command complete bit can be used to
throttle the OS requests.

>> +             /*
>> +              * cppc_ss->latency is just a Nominal value. In reality
>> +              * the remote processor could be much slower to reply.
>> +              * So add an arbitrary amount of wait on top of Nominal.
>
> Specification requires platform to report accurate values.
> If remote party requires arbitrary amount on top that means someone screwed up
> values in firmware.
> That's not only slowness of remote processor but broken/inaccurate values given by firmware.
> I think if you want this comment you need to blame firmware too.
>

Not necessarily. The firmware could have other threads going on, which
sporadically cause additional ( > Nominal) latencies and the OS must
honor that. Its only a "Nominal" value after all.

>
>> +              */
>> +             usecs_lat = NUM_RETRIES * cppc_ss->latency;
>> +             deadline = ns_to_ktime(usecs_lat * NSEC_PER_USEC);
>>
>>               pcc_comm_addr = acpi_os_ioremap(comm_base_addr, len);
>>               if (!pcc_comm_addr) {
>> @@ -604,7 +640,7 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>>                       (ref_perf->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) ||
>>                       (nom_perf->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM)) {
>>               /* Ring doorbell once to update PCC subspace */
>> -             if (send_pcc_cmd(CMD_READ)) {
>> +             if (send_pcc_cmd(CMD_READ) < 0) {
>
> Somewhere in previous email I asked you to point me to specs with words about init values before
> issuing first PCC command. Thanks for that.
> However I still think that comment should be added about init value of status register.
> Please add it unless you have something to come up with. I think such comment will be useful.
>
>

Ok. We'll add it if we need to revise this patchset.


Thanks,
Ashwin.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-16 18:47     ` Ashwin Chaugule
@ 2016-02-16 19:10       ` Rafael J. Wysocki
  2016-02-16 19:33         ` Ashwin Chaugule
  2016-02-29 17:39       ` Alexey Klimov
  1 sibling, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-02-16 19:10 UTC (permalink / raw)
  To: Ashwin Chaugule
  Cc: Alexey Klimov, Prashanth Prakash, Rafael J. Wysocki, linux acpi,
	Linaro ACPI Mailman List, Timur Tabi

On Tue, Feb 16, 2016 at 7:47 PM, Ashwin Chaugule
<ashwin.chaugule@linaro.org> wrote:
> Hi Alexey,
>
> On 15 February 2016 at 11:37, Alexey Klimov <alexey.klimov@arm.com> wrote:
>> Hi Prashanth and Ashwin,
>>
>> On Wed, Feb 10, 2016 at 01:05:59PM -0700, Prashanth Prakash wrote:
>>> From: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>>>
>>> Previously the send_pcc_cmd() code checked if the
>>> PCC operation had completed before returning from
>>> the function. This check was performed regardless
>>> of the PCC op type (i.e. Read/Write). Knowing
>>> the type of cmd can be used to optimize the check
>>> and avoid needless waiting. e.g. with Write ops,
>>> the actual Writing is done before calling send_pcc_cmd().
>>> And the subsequent Writes will check if the channel is
>>> free at the entry of send_pcc_cmd() anyway.
>>>
>>> However, for Read cmds, we need to wait for the cmd
>>> completion bit to be flipped, since the actual Read
>>> ops follow after returning from the send_pcc_cmd(). So,
>>> only do the looping check at the end for Read ops.
>>>
>>> Also, instead of using udelay() calls, use ktime as a
>>> means to check for deadlines. The current deadline
>>> in which the Remote should flip the cmd completion bit
>>> is defined as N * Nominal latency. Where N is arbitrary
>>> and large enough to work on slow emulators and Nominal
>>> latency comes from the ACPI table (PCCT). This helps
>>> in working around the CONFIG_HZ effects on udelay()
>>> and also avoids needing different ACPI tables for Silicon
>>> and Emulation platforms.
>>>
>>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>>> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
>>> ---
>>>  drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>  1 file changed, 71 insertions(+), 28 deletions(-)
>
> [..]
>
>>>
>>>  /*
>>>   * Arbitrary Retries in case the remote processor is slow to respond
>>> - * to PCC commands.
>>> + * to PCC commands. Keeping it high enough to cover emulators where
>>> + * the processors run painfully slow.
>>>   */
>>>  #define NUM_RETRIES 500
>>>
>>> +static int check_pcc_chan(void)
>>> +{
>>> +     int ret = -EIO;
>>> +     struct acpi_pcct_shared_memory __iomem *generic_comm_base = pcc_comm_addr;
>>> +     ktime_t next_deadline = ktime_add(ktime_get(), deadline);
>>> +
>>> +     /* Retry in case the remote processor was too slow to catch up. */
>>> +     while (!ktime_after(ktime_get(), next_deadline)) {
>>> +             if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
>>
>> What do you think about polling for error too?
>> Firmware might set error bit and kernel will spin here for full cycle (maybe even
>> more than one time).
>>
>
> Hm. I dont think thats useful since we dont do anything special if the
> firmware errors out. Perhaps at some later point we can stop sending
> new requests if there are firmware errors, but then how would we know
> when to resume?
>
>>
>>> +                     ret = 0;
>>> +                     break;
>>> +             }
>>> +             /*
>>> +              * Reducing the bus traffic in case this loop takes longer than
>>> +              * a few retries.
>>> +              */
>>> +             udelay(3);
>>> +     }
>>> +
>>> +     return ret;
>>> +}
>>> +
>>>  static int send_pcc_cmd(u16 cmd)
>>>  {
>>> -     int retries, result = -EIO;
>>> -     struct acpi_pcct_hw_reduced *pcct_ss = pcc_channel->con_priv;
>>> +     int ret = -EIO;
>>>       struct acpi_pcct_shared_memory *generic_comm_base =
>>>               (struct acpi_pcct_shared_memory *) pcc_comm_addr;
>>> -     u32 cmd_latency = pcct_ss->latency;
>>>
>>> -     /* Min time OS should wait before sending next command. */
>>> -     udelay(pcc_cmd_delay);
>>
>> Ouch.
>> This delay is Minimum Request Turnaround Time and specification is strict about this:
>> The minimum amount of time that OSPM must wait after the
>> completion of a command before issuing the next command, in microseconds.
>>
>> And you're going to completely remove delay that described as "must wait" regardless if it's
>> write or read command. In my view completion of a command is when CMD complete bit is set to 1.
>> This field in table is not optional.
>
> The spec has a whole bunch of delays which are highly questionable.

Why does your opinion matter here if I may ask?

If everybody follows this line of reasoning, there's no point in
having any specifications in the first place.

If you don't like the spec, please submit requests to modify it.  Or
follow it otherwise.

> e.g. the Min Request Turnaround time and the Max periodic access rate.
> Both seem quite useless since their original intention was to avoid
> the OS from flooding the BMC. But one can argue that the firmware can
> throttle the OS requests via the Command Completion bit anyway.

Yes, they can.  What does that have to do with the other fields?

> I also didnt like the usage of udelays and prefer the ktime math, but that
> seems overkill given the questionable value of these fields.
>
>>
>> Even worse, what if platform report latency = 0 (or very close to zero) and turnaround time as very large number?
>>
>
> Such a firmware would need serious rework. Besides, if for whatever
> reason they choose to have it, the command complete bit can be used to
> throttle the OS requests.

Yes, it can.  Is that a good enough argument for not respecting the
spec on the OS side?  I don't think so.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-16 19:10       ` Rafael J. Wysocki
@ 2016-02-16 19:33         ` Ashwin Chaugule
  2016-02-16 19:39           ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Ashwin Chaugule @ 2016-02-16 19:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alexey Klimov, Prashanth Prakash, Rafael J. Wysocki, linux acpi,
	Linaro ACPI Mailman List, Timur Tabi

Hi Rafael,

On 16 February 2016 at 14:10, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Feb 16, 2016 at 7:47 PM, Ashwin Chaugule
> <ashwin.chaugule@linaro.org> wrote:
>> Hi Alexey,
>>
>> On 15 February 2016 at 11:37, Alexey Klimov <alexey.klimov@arm.com> wrote:
>>> Hi Prashanth and Ashwin,
>>>
>>> On Wed, Feb 10, 2016 at 01:05:59PM -0700, Prashanth Prakash wrote:
>>>> From: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>>>>
>>>> Previously the send_pcc_cmd() code checked if the
>>>> PCC operation had completed before returning from
>>>> the function. This check was performed regardless
>>>> of the PCC op type (i.e. Read/Write). Knowing
>>>> the type of cmd can be used to optimize the check
>>>> and avoid needless waiting. e.g. with Write ops,
>>>> the actual Writing is done before calling send_pcc_cmd().
>>>> And the subsequent Writes will check if the channel is
>>>> free at the entry of send_pcc_cmd() anyway.
>>>>
>>>> However, for Read cmds, we need to wait for the cmd
>>>> completion bit to be flipped, since the actual Read
>>>> ops follow after returning from the send_pcc_cmd(). So,
>>>> only do the looping check at the end for Read ops.
>>>>
>>>> Also, instead of using udelay() calls, use ktime as a
>>>> means to check for deadlines. The current deadline
>>>> in which the Remote should flip the cmd completion bit
>>>> is defined as N * Nominal latency. Where N is arbitrary
>>>> and large enough to work on slow emulators and Nominal
>>>> latency comes from the ACPI table (PCCT). This helps
>>>> in working around the CONFIG_HZ effects on udelay()
>>>> and also avoids needing different ACPI tables for Silicon
>>>> and Emulation platforms.
>>>>
>>>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>>>> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
>>>> ---
>>>>  drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>>  1 file changed, 71 insertions(+), 28 deletions(-)
>>
>> [..]
>>
>>>> -     u32 cmd_latency = pcct_ss->latency;
>>>>
>>>> -     /* Min time OS should wait before sending next command. */
>>>> -     udelay(pcc_cmd_delay);
>>>
>>> Ouch.
>>> This delay is Minimum Request Turnaround Time and specification is strict about this:
>>> The minimum amount of time that OSPM must wait after the
>>> completion of a command before issuing the next command, in microseconds.
>>>
>>> And you're going to completely remove delay that described as "must wait" regardless if it's
>>> write or read command. In my view completion of a command is when CMD complete bit is set to 1.
>>> This field in table is not optional.
>>
>> The spec has a whole bunch of delays which are highly questionable.
>
> Why does your opinion matter here if I may ask?
>

We asked the folks who added those delays to the spec before making
this change. So this is not unfounded.

> If everybody follows this line of reasoning, there's no point in
> having any specifications in the first place.
>
> If you don't like the spec, please submit requests to modify it.  Or
> follow it otherwise.

But I agree that for the sake of spec correctness we should include
this delay and let firmware choose what value to put in it and deal
with the latencies for each request.

>
>> e.g. the Min Request Turnaround time and the Max periodic access rate.
>> Both seem quite useless since their original intention was to avoid
>> the OS from flooding the BMC. But one can argue that the firmware can
>> throttle the OS requests via the Command Completion bit anyway.
>
> Yes, they can.  What does that have to do with the other fields?

If the intention was only to control the OS request rate, then why
isnt Command Completion sufficient for that purpose? The firmware can
flip the command complete bit only when it is ready to receive the
next request.

>
>> I also didnt like the usage of udelays and prefer the ktime math, but that
>> seems overkill given the questionable value of these fields.
>>
>>>
>>> Even worse, what if platform report latency = 0 (or very close to zero) and turnaround time as very large number?
>>>
>>
>> Such a firmware would need serious rework. Besides, if for whatever
>> reason they choose to have it, the command complete bit can be used to
>> throttle the OS requests.
>
> Yes, it can.  Is that a good enough argument for not respecting the
> spec on the OS side?  I don't think so.

In principle I agree, but with CPPC, these fields seemed redundant,
hence the patch. But I'm not strongly opposed either way, so we'll add
the latency field back and leave it up to the platform and vendors.

Thanks,
Ashwin.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-16 19:33         ` Ashwin Chaugule
@ 2016-02-16 19:39           ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2016-02-16 19:39 UTC (permalink / raw)
  To: Ashwin Chaugule
  Cc: Rafael J. Wysocki, Alexey Klimov, Prashanth Prakash, linux acpi,
	Linaro ACPI Mailman List, Timur Tabi

On Tuesday, February 16, 2016 02:33:00 PM Ashwin Chaugule wrote:
> Hi Rafael,
> 
> On 16 February 2016 at 14:10, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Tue, Feb 16, 2016 at 7:47 PM, Ashwin Chaugule
> > <ashwin.chaugule@linaro.org> wrote:
> >> Hi Alexey,
> >>
> >> On 15 February 2016 at 11:37, Alexey Klimov <alexey.klimov@arm.com> wrote:
> >>> Hi Prashanth and Ashwin,
> >>>
> >>> On Wed, Feb 10, 2016 at 01:05:59PM -0700, Prashanth Prakash wrote:
> >>>> From: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> >>>>
> >>>> Previously the send_pcc_cmd() code checked if the
> >>>> PCC operation had completed before returning from
> >>>> the function. This check was performed regardless
> >>>> of the PCC op type (i.e. Read/Write). Knowing
> >>>> the type of cmd can be used to optimize the check
> >>>> and avoid needless waiting. e.g. with Write ops,
> >>>> the actual Writing is done before calling send_pcc_cmd().
> >>>> And the subsequent Writes will check if the channel is
> >>>> free at the entry of send_pcc_cmd() anyway.
> >>>>
> >>>> However, for Read cmds, we need to wait for the cmd
> >>>> completion bit to be flipped, since the actual Read
> >>>> ops follow after returning from the send_pcc_cmd(). So,
> >>>> only do the looping check at the end for Read ops.
> >>>>
> >>>> Also, instead of using udelay() calls, use ktime as a
> >>>> means to check for deadlines. The current deadline
> >>>> in which the Remote should flip the cmd completion bit
> >>>> is defined as N * Nominal latency. Where N is arbitrary
> >>>> and large enough to work on slow emulators and Nominal
> >>>> latency comes from the ACPI table (PCCT). This helps
> >>>> in working around the CONFIG_HZ effects on udelay()
> >>>> and also avoids needing different ACPI tables for Silicon
> >>>> and Emulation platforms.
> >>>>
> >>>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> >>>> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
> >>>> ---
> >>>>  drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
> >>>>  1 file changed, 71 insertions(+), 28 deletions(-)
> >>
> >> [..]
> >>
> >>>> -     u32 cmd_latency = pcct_ss->latency;
> >>>>
> >>>> -     /* Min time OS should wait before sending next command. */
> >>>> -     udelay(pcc_cmd_delay);
> >>>
> >>> Ouch.
> >>> This delay is Minimum Request Turnaround Time and specification is strict about this:
> >>> The minimum amount of time that OSPM must wait after the
> >>> completion of a command before issuing the next command, in microseconds.
> >>>
> >>> And you're going to completely remove delay that described as "must wait" regardless if it's
> >>> write or read command. In my view completion of a command is when CMD complete bit is set to 1.
> >>> This field in table is not optional.
> >>
> >> The spec has a whole bunch of delays which are highly questionable.
> >
> > Why does your opinion matter here if I may ask?
> >
> 
> We asked the folks who added those delays to the spec before making
> this change. So this is not unfounded.

That's fine, but then the authors of the firmware your code has to deal
with may not have that conversation and may expect certain type of behavior
from it (as indicated by the spec).

> > If everybody follows this line of reasoning, there's no point in
> > having any specifications in the first place.
> >
> > If you don't like the spec, please submit requests to modify it.  Or
> > follow it otherwise.
> 
> But I agree that for the sake of spec correctness we should include
> this delay and let firmware choose what value to put in it and deal
> with the latencies for each request.
> 
> >
> >> e.g. the Min Request Turnaround time and the Max periodic access rate.
> >> Both seem quite useless since their original intention was to avoid
> >> the OS from flooding the BMC. But one can argue that the firmware can
> >> throttle the OS requests via the Command Completion bit anyway.
> >
> > Yes, they can.  What does that have to do with the other fields?
> 
> If the intention was only to control the OS request rate, then why
> isnt Command Completion sufficient for that purpose? The firmware can
> flip the command complete bit only when it is ready to receive the
> next request.
> 
> >
> >> I also didnt like the usage of udelays and prefer the ktime math, but that
> >> seems overkill given the questionable value of these fields.
> >>
> >>>
> >>> Even worse, what if platform report latency = 0 (or very close to zero) and turnaround time as very large number?
> >>>
> >>
> >> Such a firmware would need serious rework. Besides, if for whatever
> >> reason they choose to have it, the command complete bit can be used to
> >> throttle the OS requests.
> >
> > Yes, it can.  Is that a good enough argument for not respecting the
> > spec on the OS side?  I don't think so.
> 
> In principle I agree, but with CPPC, these fields seemed redundant,
> hence the patch. But I'm not strongly opposed either way, so we'll add
> the latency field back and leave it up to the platform and vendors.

Sounds good.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-16 18:47     ` Ashwin Chaugule
  2016-02-16 19:10       ` Rafael J. Wysocki
@ 2016-02-29 17:39       ` Alexey Klimov
  2016-02-29 19:20         ` Prakash, Prashanth
  1 sibling, 1 reply; 16+ messages in thread
From: Alexey Klimov @ 2016-02-29 17:39 UTC (permalink / raw)
  To: Ashwin Chaugule
  Cc: Prashanth Prakash, Rafael J. Wysocki, linux acpi,
	Linaro ACPI Mailman List, Timur Tabi

On Tue, Feb 16, 2016 at 01:47:15PM -0500, Ashwin Chaugule wrote:
> Hi Alexey,
> 
> On 15 February 2016 at 11:37, Alexey Klimov <alexey.klimov@arm.com> wrote:
> > Hi Prashanth and Ashwin,
> >
> > On Wed, Feb 10, 2016 at 01:05:59PM -0700, Prashanth Prakash wrote:
> >> From: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> >>
> >> Previously the send_pcc_cmd() code checked if the
> >> PCC operation had completed before returning from
> >> the function. This check was performed regardless
> >> of the PCC op type (i.e. Read/Write). Knowing
> >> the type of cmd can be used to optimize the check
> >> and avoid needless waiting. e.g. with Write ops,
> >> the actual Writing is done before calling send_pcc_cmd().
> >> And the subsequent Writes will check if the channel is
> >> free at the entry of send_pcc_cmd() anyway.
> >>
> >> However, for Read cmds, we need to wait for the cmd
> >> completion bit to be flipped, since the actual Read
> >> ops follow after returning from the send_pcc_cmd(). So,
> >> only do the looping check at the end for Read ops.
> >>
> >> Also, instead of using udelay() calls, use ktime as a
> >> means to check for deadlines. The current deadline
> >> in which the Remote should flip the cmd completion bit
> >> is defined as N * Nominal latency. Where N is arbitrary
> >> and large enough to work on slow emulators and Nominal
> >> latency comes from the ACPI table (PCCT). This helps
> >> in working around the CONFIG_HZ effects on udelay()
> >> and also avoids needing different ACPI tables for Silicon
> >> and Emulation platforms.
> >>
> >> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> >> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
> >> ---
> >>  drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
> >>  1 file changed, 71 insertions(+), 28 deletions(-)
> 
> [..]
> 
> >>
> >>  /*
> >>   * Arbitrary Retries in case the remote processor is slow to respond
> >> - * to PCC commands.
> >> + * to PCC commands. Keeping it high enough to cover emulators where
> >> + * the processors run painfully slow.
> >>   */
> >>  #define NUM_RETRIES 500
> >>
> >> +static int check_pcc_chan(void)
> >> +{
> >> +     int ret = -EIO;
> >> +     struct acpi_pcct_shared_memory __iomem *generic_comm_base = pcc_comm_addr;
> >> +     ktime_t next_deadline = ktime_add(ktime_get(), deadline);
> >> +
> >> +     /* Retry in case the remote processor was too slow to catch up. */
> >> +     while (!ktime_after(ktime_get(), next_deadline)) {
> >> +             if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
> >
> > What do you think about polling for error too?
> > Firmware might set error bit and kernel will spin here for full cycle (maybe even
> > more than one time).
> >
> 
> Hm. I dont think thats useful since we dont do anything special if the
> firmware errors out. Perhaps at some later point we can stop sending
> new requests if there are firmware errors, but then how would we know
> when to resume?

The suggestion is to jump out of polling loop quickly instead of spinning for full cycle.
With longer delays given by firmware the spinning issue on error will be worse.
This is not going to change behaviour on error paths (or in other words I don't suggest
to change error path behaviour right now).

Does specification describe any resume after error path? I don't see anything.

<snip>

Best regards,
Alexey Klimov.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations
  2016-02-29 17:39       ` Alexey Klimov
@ 2016-02-29 19:20         ` Prakash, Prashanth
  0 siblings, 0 replies; 16+ messages in thread
From: Prakash, Prashanth @ 2016-02-29 19:20 UTC (permalink / raw)
  To: Alexey Klimov
  Cc: Rafael J. Wysocki, linux acpi, Linaro ACPI Mailman List,
	Timur Tabi, Ashwin Chaugule

Hi Alexey,

On 2/29/2016 10:39 AM, Alexey Klimov wrote:
> On Tue, Feb 16, 2016 at 01:47:15PM -0500, Ashwin Chaugule wrote:
>> Hi Alexey,
>>
>> On 15 February 2016 at 11:37, Alexey Klimov <alexey.klimov@arm.com> wrote:
>>> Hi Prashanth and Ashwin,
>>>
>>> On Wed, Feb 10, 2016 at 01:05:59PM -0700, Prashanth Prakash wrote:
>>>> From: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>>>>
>>>> Previously the send_pcc_cmd() code checked if the
>>>> PCC operation had completed before returning from
>>>> the function. This check was performed regardless
>>>> of the PCC op type (i.e. Read/Write). Knowing
>>>> the type of cmd can be used to optimize the check
>>>> and avoid needless waiting. e.g. with Write ops,
>>>> the actual Writing is done before calling send_pcc_cmd().
>>>> And the subsequent Writes will check if the channel is
>>>> free at the entry of send_pcc_cmd() anyway.
>>>>
>>>> However, for Read cmds, we need to wait for the cmd
>>>> completion bit to be flipped, since the actual Read
>>>> ops follow after returning from the send_pcc_cmd(). So,
>>>> only do the looping check at the end for Read ops.
>>>>
>>>> Also, instead of using udelay() calls, use ktime as a
>>>> means to check for deadlines. The current deadline
>>>> in which the Remote should flip the cmd completion bit
>>>> is defined as N * Nominal latency. Where N is arbitrary
>>>> and large enough to work on slow emulators and Nominal
>>>> latency comes from the ACPI table (PCCT). This helps
>>>> in working around the CONFIG_HZ effects on udelay()
>>>> and also avoids needing different ACPI tables for Silicon
>>>> and Emulation platforms.
>>>>
>>>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>>>> Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
>>>> ---
>>>>  drivers/acpi/cppc_acpi.c | 99 ++++++++++++++++++++++++++++++++++--------------
>>>>  1 file changed, 71 insertions(+), 28 deletions(-)
>> [..]
>>
>>>>  /*
>>>>   * Arbitrary Retries in case the remote processor is slow to respond
>>>> - * to PCC commands.
>>>> + * to PCC commands. Keeping it high enough to cover emulators where
>>>> + * the processors run painfully slow.
>>>>   */
>>>>  #define NUM_RETRIES 500
>>>>
>>>> +static int check_pcc_chan(void)
>>>> +{
>>>> +     int ret = -EIO;
>>>> +     struct acpi_pcct_shared_memory __iomem *generic_comm_base = pcc_comm_addr;
>>>> +     ktime_t next_deadline = ktime_add(ktime_get(), deadline);
>>>> +
>>>> +     /* Retry in case the remote processor was too slow to catch up. */
>>>> +     while (!ktime_after(ktime_get(), next_deadline)) {
>>>> +             if (readw_relaxed(&generic_comm_base->status) & PCC_CMD_COMPLETE) {
>>> What do you think about polling for error too?
>>> Firmware might set error bit and kernel will spin here for full cycle (maybe even
>>> more than one time).
>>>
>> Hm. I dont think thats useful since we dont do anything special if the
>> firmware errors out. Perhaps at some later point we can stop sending
>> new requests if there are firmware errors, but then how would we know
>> when to resume?
> The suggestion is to jump out of polling loop quickly instead of spinning for full cycle.
> With longer delays given by firmware the spinning issue on error will be worse.
> This is not going to change behaviour on error paths (or in other words I don't suggest
> to change error path behaviour right now).
>
> Does specification describe any resume after error path? I don't see anything.
The specification doesn't really say how to handle the error. While it doesn't say resume after
error, it  also doesn't say abort all commands after an error.
The only thing I see with respect to error bit is "If set, an error occurred executing the last
command."

So, all we know is there was a an error with the execution of the last command and I am not
sure it is correct to assume all the future usage of the channel should be aborted because the
failure of the last command.

Given the above ambiguity, I think it is better to keep the current implementation as is. If
some platforms shows a behavior where after an error the channel becomes completely
useless, then we can maintain some sort of internal state that decides not to use the channel
after say X number of consecutive errors.
> <snip>
>
> Best regards,
> Alexey Klimov.
>
>
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-02-29 19:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-10 20:05 [PATCH V3 0/4] acpi: cppc optimization patches Prashanth Prakash
2016-02-10 20:05 ` [PATCH V3 1/4] ACPI / CPPC: Optimize PCC Read Write operations Prashanth Prakash
2016-02-10 20:42   ` Timur Tabi
2016-02-10 21:15     ` Ashwin Chaugule
2016-02-10 21:57       ` Timur Tabi
2016-02-10 22:17         ` Ashwin Chaugule
2016-02-15 16:37   ` Alexey Klimov
2016-02-16 18:47     ` Ashwin Chaugule
2016-02-16 19:10       ` Rafael J. Wysocki
2016-02-16 19:33         ` Ashwin Chaugule
2016-02-16 19:39           ` Rafael J. Wysocki
2016-02-29 17:39       ` Alexey Klimov
2016-02-29 19:20         ` Prakash, Prashanth
2016-02-10 20:06 ` [PATCH V3 2/4] acpi: cppc: optimized cpc_read and cpc_write Prashanth Prakash
2016-02-10 20:06 ` [PATCH V3 3/4] mailbox: pcc: optimized pcc_send_data Prashanth Prakash
2016-02-10 20:06 ` [PATCH V3 4/4] acpi: cppc: replace writeX/readX to PCC with relaxed version Prashanth Prakash

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.